Re: [Intel-gfx] [PATCH 5/5] drm/amdgpu: implement amdgpu_gem_prime_move_notify

2019-11-05 Thread Koenig, Christian
Am 05.11.19 um 14:50 schrieb Daniel Vetter:
> On Tue, Nov 5, 2019 at 2:39 PM Christian König
>  wrote:
>> Am 05.11.19 um 11:52 schrieb Daniel Vetter:
>>> On Tue, Oct 29, 2019 at 11:40:49AM +0100, Christian König wrote:
 Implement the importer side of unpinned DMA-buf handling.

 Signed-off-by: Christian König 
 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 28 -
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  6 +
2 files changed, 33 insertions(+), 1 deletion(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
 index 3629cfe53aad..af39553c51ad 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
 @@ -456,7 +456,33 @@ amdgpu_dma_buf_create_obj(struct drm_device *dev, 
 struct dma_buf *dma_buf)
   return ERR_PTR(ret);
}

 +/**
 + * amdgpu_dma_buf_move_notify - &attach.move_notify implementation
 + *
 + * @attach: the DMA-buf attachment
 + *
 + * Invalidate the DMA-buf attachment, making sure that we re-create the
 + * mapping before the next use.
 + */
 +static void
 +amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
 +{
 +struct ttm_operation_ctx ctx = { false, false };
 +struct drm_gem_object *obj = attach->importer_priv;
 +struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
 +struct ttm_placement placement = {};
 +int r;
 +
 +if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
 +return;
 +
 +r = ttm_bo_validate(&bo->tbo, &placement, &ctx);
 +if (r)
 +DRM_ERROR("Failed to invalidate DMA-buf import (%d)\n", r);
>>> Where do you update pagetables?
>>>
>>> The only thing I've found is in the amdgpu CS code, which is way too late
>>> for this stuff here. Plus TTM doesn't handle virtual memory at all (aside
>>> from the gart tt), so clearly you need to call into amdgpu code somewhere
>>> for this. But I didn't find it, neither in your ->move_notify nor the
>>> ->move callback in ttm_bo_driver.
>>>
>>> How does this work?
>> Page tables are not updated until the next command submission, e.g. in
>> amdgpu_cs.c
>>
>> This is safe since all previous command submissions are added to the
>> dma_resv object as fences and the dma_buf can't be moved before those
>> are signaled.
> Hm, I thought you still allow explicit buffer lists for each cs in
> amdgpu? Code looks at least like that, not everything goes through the
> context working set stuff.
>
> How do you prevent the security leak if userspace simply lies about
> not using a given buffer in a batch, and then abusing that to read
> that virtual address range anyway and peek at whatever is now going to
> be there when an eviction happened?

Oh, yeah, that is a really good point. And no, that isn't handled 
correctly at all.

I've wanted to rework that for quite some time now, but I always ran into 
issues with TTM.

Thanks for the notice; I need to put my TTM rework before this. 
Crap, that adds a whole bunch of TODOs to my list.

Regards,
Christian.

> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
 +}
 +
static const struct dma_buf_attach_ops amdgpu_dma_buf_attach_ops = {
 +.move_notify = amdgpu_dma_buf_move_notify
};

/**
 @@ -492,7 +518,7 @@ struct drm_gem_object *amdgpu_gem_prime_import(struct 
 drm_device *dev,
   return obj;

   attach = dma_buf_dynamic_attach(dma_buf, dev->dev,
 -&amdgpu_dma_buf_attach_ops, NULL);
 +&amdgpu_dma_buf_attach_ops, obj);
   if (IS_ERR(attach)) {
   drm_gem_object_put(obj);
   return ERR_CAST(attach);
 diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
 b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
 index ac776d2620eb..cfa46341c9a7 100644
 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
 +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
 @@ -861,6 +861,9 @@ int amdgpu_bo_pin_restricted(struct amdgpu_bo *bo, u32 
 domain,
   return 0;
   }

 +if (bo->tbo.base.import_attach)
 +dma_buf_pin(bo->tbo.base.import_attach);
 +
   bo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
   /* force to pin into visible video ram */
   if (!(bo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS))
 @@ -944,6 +947,9 @@ int amdgpu_bo_unpin(struct amdgpu_bo *bo)

   amdgpu_bo_subtract_pin_size(bo);

 +if (bo->tbo.base.import_attach)
 +dma_buf_unpin(bo->tbo.base.import_attach);
 +
   for (i = 0; i < bo->placement.num_placement; i++) {
   bo->placements[i].lpfn = 0;
   bo->placements[i].flags &= ~TTM
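The lazy-revalidation scheme Christian describes (move_notify only invalidates, page tables are rebuilt at the next command submission) can be sketched as a standalone toy model. All names below are illustrative stand-ins, not the real amdgpu/TTM API:

```c
#include <stdbool.h>

/* Toy model of the importer-side scheme discussed above: move_notify
 * only marks the mapping stale, and the GPU page tables are rebuilt
 * lazily at the next command submission (as amdgpu does in amdgpu_cs.c).
 * All names are illustrative stand-ins, not the real kernel API. */
struct toy_bo {
	bool mapping_valid;  /* page tables point at the current backing */
	int  backing_page;   /* pretend physical location of the buffer */
};

/* called by the exporter when it moves the buffer */
static void toy_move_notify(struct toy_bo *bo, int new_page)
{
	bo->backing_page = new_page;
	bo->mapping_valid = false;   /* importer only invalidates here */
}

/* called at command submission; revalidates before the GPU runs */
static int toy_command_submit(struct toy_bo *bo)
{
	if (!bo->mapping_valid)
		bo->mapping_valid = true;  /* rebuild page tables here */
	return bo->backing_page;     /* address the GPU will now use */
}
```

The hole Daniel points out maps onto this model directly: revalidation only happens for buffers userspace actually lists in the submission, so a lying client could skip the toy_command_submit() step for a moved buffer and keep reading through the stale address.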

Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations

2019-11-04 Thread Koenig, Christian
Am 04.11.19 um 18:37 schrieb Daniel Vetter:
> Full audit of everyone:
>
> - i915, radeon, amdgpu should be clean per their maintainers.
>
> - vram helpers should be fine, they don't do command submission, so
>really no business holding struct_mutex while doing copy_*_user. But
>I haven't checked them all.
>
> - panfrost seems to dma_resv_lock only in panfrost_job_push, which
>looks clean.
>
> - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(),
>copying from/to userspace happens all in v3d_lookup_bos which is
>outside of the critical section.
>
> - vmwgfx has a bunch of ioctls that do their own copy_*_user:
>- vmw_execbuf_process: First this does some copies in
>  vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself.
>  Then comes the usual ttm reserve/validate sequence, then actual
>  submission/fencing, then unreserving, and finally some more
>  copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of
>  details, but looks all safe.
>- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be
>  seen, seems to only create a fence and copy it out.
>- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be
>  found there.
>Summary: vmwgfx seems to be fine too.
>
> - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the
>copying from userspace before even looking up objects through their
>handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
>
> - qxl only has qxl_execbuffer_ioctl, which calls into
>qxl_process_single_command. There's a lovely comment before the
>__copy_from_user_inatomic that the slowpath should be copied from
>i915, but I guess that never happened. Try not to be unlucky and get
>your CS data evicted between when it's written and the kernel tries
>to read it. The only other copy_from_user is for relocs, but those
>are done before qxl_release_reserve_list(), which seems to be the
>only thing reserving buffers (in the ttm/dma_resv sense) in that
>code. So looks safe.
>
> - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in
>usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this
>everywhere and needs to be fixed up.
>
> v2: Thomas pointed at that vmwgfx calls dma_resv_init while it holds a
> dma_resv lock of a different object already. Christian mentioned that
> ttm core does this too for ghost objects. intel-gfx-ci highlighted
> that i915 has similar issues.
>
> Unfortunately we can't do this in the usual module init functions,
> because kernel threads don't have an ->mm - we have to wait around for
> some user thread to do this.
>
> Solution is to spawn a worker (but only once). It's horrible, but it
> works.
>
> v3: We can allocate mm! (Chris). Horrible worker hack out, clean
> initcall solution in.
>
> v4: Annotate with __init (Rob Herring)
>
> Cc: Rob Herring 
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: Chris Wilson 
> Cc: Thomas Zimmermann 
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Eric Anholt 
> Cc: Dave Airlie 
> Cc: Gerd Hoffmann 
> Cc: Ben Skeggs 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 
> Reviewed-by: Christian König 
> Reviewed-by: Chris Wilson 
> Tested-by: Chris Wilson 
> Signed-off-by: Daniel Vetter 

What's holding you back from committing that?

Christian.

> ---
>   drivers/dma-buf/dma-resv.c | 24 
>   1 file changed, 24 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 709002515550..a05ff542be22 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -34,6 +34,7 @@
>   
>   #include 
>   #include 
> +#include 
>   
>   /**
>* DOC: Reservation Object Overview
> @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list)
>   kfree_rcu(list, rcu);
>   }
>   
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> +static void __init dma_resv_lockdep(void)
> +{
> + struct mm_struct *mm = mm_alloc();
> + struct dma_resv obj;
> +
> + if (!mm)
> + return;
> +
> + dma_resv_init(&obj);
> +
> + down_read(&mm->mmap_sem);
> + ww_mutex_lock(&obj.lock, NULL);
> + fs_reclaim_acquire(GFP_KERNEL);
> + fs_reclaim_release(GFP_KERNEL);
> + ww_mutex_unlock(&obj.lock);
> + up_read(&mm->mmap_sem);
> + 
> + mmput(mm);
> +}
> +subsys_initcall(dma_resv_lockdep);
> +#endif
> +
>   /**
>* dma_resv_init - initialize a reservation object
>* @obj: the reservation object
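The nesting this initcall primes lockdep with can be read off the function body: mmap_sem is taken outermost, the dma_resv ww_mutex may nest inside it, and memory reclaim is entered innermost. A minimal userspace sketch of that documented order, with trace strings standing in for the kernel primitives (none of these names are real kernel API):

```c
#include <string.h>

/* Toy illustration of the nesting order dma_resv_lockdep() above teaches
 * lockdep: mmap_sem (read side) outermost, the dma_resv ww_mutex nested
 * inside it, and memory reclaim (GFP_KERNEL allocation) innermost.
 * Only the ordering is the point; this is not real kernel API. */
static char trace[64];

static void take_mmap_sem(void)  { strcat(trace, "mmap_sem "); } /* fault paths hold this */
static void take_resv_lock(void) { strcat(trace, "resv "); }     /* must work under mmap_sem */
static void enter_reclaim(void)  { strcat(trace, "reclaim"); }   /* must never take a resv lock */

/* the exact sequence the initcall performs once at boot */
static const char *documented_order(void)
{
	take_mmap_sem();
	take_resv_lock();
	enter_reclaim();
	/* the real code then unlocks in reverse order */
	return trace;
}
```

Priming this once is enough: lockdep records the ordering and will then flag any code path that acquires the same locks in a conflicting order.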

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-10-17 Thread Koenig, Christian
Am 16.10.19 um 16:23 schrieb Daniel Vetter:
> On Wed, Oct 16, 2019 at 3:46 PM Koenig, Christian
>  wrote:
>> Am 08.10.19 um 10:55 schrieb Daniel Vetter:
>>> On Wed, Oct 02, 2019 at 08:37:50AM +0000, Koenig, Christian wrote:
>>>> Hi Daniel,
>>>>
>>>> Once more a ping on this. Any more comments or can we get it committed?
>>> Sorry got a bit smashed past weeks, but should be resurrected now back
>>> from xdc.
>> And any more thoughts on this? I mean we are blocked for months on this
>> now :(
> I replied to both 1 and 2 in this series on 8th Oct. You even replied
> to me in the thread on patch 2 ... but not here (I top posted since
> this detour here just me being confused).

Ok, in this case it's my fault. I totally missed your reply on 1 and 
thought that the reply on 2 was actually for a different thread.

I'm going to submit the TTM changes separately, because that is actually a 
bug fix for a completely different issue which just happens to surface 
because we change the locking.

Thanks,
Christian.

> -Daniel
>
>> Thanks,
>> Christian.
>>
>>> -Daniel
>>>> Thanks,
>>>> Christian.
>>>>
>>>> Am 24.09.19 um 11:50 schrieb Christian König:
>>>>> Am 17.09.19 um 16:56 schrieb Daniel Vetter:
>>>>>> [SNIP]
>>>>>>>>>>>>>>>> +/* When either the importer or the exporter
>>>>>>>>>>>>>>>> can't handle dynamic
>>>>>>>>>>>>>>>> + * mappings we cache the mapping here to avoid issues
>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>> + * reservation object lock.
>>>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>>>> +if (dma_buf_attachment_is_dynamic(attach) !=
>>>>>>>>>>>>>>>> +dma_buf_is_dynamic(dmabuf)) {
>>>>>>>>>>>>>>>> +struct sg_table *sgt;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +if (dma_buf_is_dynamic(attach->dmabuf))
>>>>>>>>>>>>>>>> + dma_resv_lock(attach->dmabuf->resv, NULL);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +sgt = dmabuf->ops->map_dma_buf(attach,
>>>>>>>>>>>>>>>> DMA_BIDIRECTIONAL);
>>>>>>>>>>>>>>> Now we're back to enforcing DMA_BIDI, which works nicely
>>>>>>>>>>>>>>> around the
>>>>>>>>>>>>>>> locking pain, but apparently upsets the arm-soc folks who
>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>> control
>>>>>>>>>>>>>>> this better.
>>>>>>>>>>>>>> Take another look at dma_buf_map_attachment(), we still try
>>>>>>>>>>>>>> to get the
>>>>>>>>>>>>>> caching there for ARM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What we do here is to bidirectionally map the buffer to avoid
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> locking hydra when importer and exporter disagree on locking.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the ARM folks can easily avoid that by switching to
>>>>>>>>>>>>>> dynamic locking
>>>>>>>>>>>>>> for both.
>>>>>>>>>>>> So you still break the contract between importer and exporter,
>>>>>>>>>>>> except not
>>>>>>>>>>>> for anything that's run in intel-gfx-ci so all is good?
>>>>>>>>>>> No, the contract between importer and exporter stays exactly the
>>>>>>>>>>> same it
>>>>>>>>>>> is currently as long as you don't switch to dynamic dma-buf
>>>>>>>>>>> handling.
>>>>>>>>>>>
>>>>>>>>>>> There is no functional change for the ARM folks here. The only
>>>>>>>>>>> change
>>>>>>>>>>> which takes effect is between i915 and amdgpu and that is perfectly
>>>>>>>>>>> covered by intel-gfx-ci.
>>>>>>>>>> There's people who want to run amdgpu on ARM?
>>>>>>>>> Sure there are, we even recently fixed some bugs for this.
>>>>>>>>>
>>>>>>>>> But as far as I know there is no one currently who is affected by
>>>>>>>>> this
>>>>>>>>> change on ARM with amdgpu.
>>>>>>>> But don't you break them with this now?
>>>>>>> No, we see the bidirectional attachment as compatible with the other
>>>>>>> ones.
>>>>>>>
>>>>>>>> amdgpu will soon set the dynamic flag on exports, which forces the
>>>>>>>> caching
>>>>>>>> at create time (to avoid the locking fun), which will then result in a
>>>>>>>> EBUSY at map_attachment time because we have a cached mapping, but
>>>>>>>> it's
>>>>>>>> the wrong type.
>>>>>>> See the check in dma_buf_map_attachment():
>>>>>>>
>>>>>>> if (attach->dir != direction && attach->dir != 
>>>>>>> DMA_BIDIRECTIONAL)
>>>>>>> return ERR_PTR(-EBUSY);
>>>>>> Hm, I misread this. So yeah should work, +/- the issue that we might
>>>>>> not flush enough. But I guess that can be fixed whenever, it's not
>>>>>> like dma-api semantics are a great fit for us. Maybe a fixme comment
>>>>>> would be useful here ... I'll look at this tomorrow or so because atm
>>>>>> brain is slow, I'm down with the usual post-conference cold it seems
>>>>>> :-/
>>>>> Hope you are feeling better now, adding a comment is of course not a
>>>>> problem.
>>>>>
>>>>> With that fixed, can I get a reviewed-by or at least an acked-by?
>>>>>
>>>>> I want to land at least some parts of those changes now.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> -Daniel
>>>>>>
>


Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-10-16 Thread Koenig, Christian
Am 08.10.19 um 10:55 schrieb Daniel Vetter:
> On Wed, Oct 02, 2019 at 08:37:50AM +0000, Koenig, Christian wrote:
>> Hi Daniel,
>>
>> Once more a ping on this. Any more comments or can we get it committed?
> Sorry got a bit smashed past weeks, but should be resurrected now back
> from xdc.

And any more thoughts on this? I mean we are blocked for months on this 
now :(

Thanks,
Christian.

> -Daniel
>> Thanks,
>> Christian.
>>
>> Am 24.09.19 um 11:50 schrieb Christian König:
>>> Am 17.09.19 um 16:56 schrieb Daniel Vetter:
>>>> [SNIP]
>>>>>>>>>>>>>>    +    /* When either the importer or the exporter
>>>>>>>>>>>>>> can't handle dynamic
>>>>>>>>>>>>>> + * mappings we cache the mapping here to avoid issues
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> + * reservation object lock.
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +    if (dma_buf_attachment_is_dynamic(attach) !=
>>>>>>>>>>>>>> +    dma_buf_is_dynamic(dmabuf)) {
>>>>>>>>>>>>>> +    struct sg_table *sgt;
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    if (dma_buf_is_dynamic(attach->dmabuf))
>>>>>>>>>>>>>> + dma_resv_lock(attach->dmabuf->resv, NULL);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +    sgt = dmabuf->ops->map_dma_buf(attach,
>>>>>>>>>>>>>> DMA_BIDIRECTIONAL);
>>>>>>>>>>>>> Now we're back to enforcing DMA_BIDI, which works nicely
>>>>>>>>>>>>> around the
>>>>>>>>>>>>> locking pain, but apparently upsets the arm-soc folks who
>>>>>>>>>>>>> want to
>>>>>>>>>>>>> control
>>>>>>>>>>>>> this better.
>>>>>>>>>>>> Take another look at dma_buf_map_attachment(), we still try
>>>>>>>>>>>> to get the
>>>>>>>>>>>> caching there for ARM.
>>>>>>>>>>>>
>>>>>>>>>>>> What we do here is to bidirectionally map the buffer to avoid
>>>>>>>>>>>> the
>>>>>>>>>>>> locking hydra when importer and exporter disagree on locking.
>>>>>>>>>>>>
>>>>>>>>>>>> So the ARM folks can easily avoid that by switching to
>>>>>>>>>>>> dynamic locking
>>>>>>>>>>>> for both.
>>>>>>>>>> So you still break the contract between importer and exporter,
>>>>>>>>>> except not
>>>>>>>>>> for anything that's run in intel-gfx-ci so all is good?
>>>>>>>>> No, the contract between importer and exporter stays exactly the
>>>>>>>>> same it
>>>>>>>>> is currently as long as you don't switch to dynamic dma-buf
>>>>>>>>> handling.
>>>>>>>>>
>>>>>>>>> There is no functional change for the ARM folks here. The only
>>>>>>>>> change
>>>>>>>>> which takes effect is between i915 and amdgpu and that is perfectly
>>>>>>>>> covered by intel-gfx-ci.
>>>>>>>> There's people who want to run amdgpu on ARM?
>>>>>>> Sure there are, we even recently fixed some bugs for this.
>>>>>>>
>>>>>>> But as far as I know there is no one currently who is affected by
>>>>>>> this
>>>>>>> change on ARM with amdgpu.
>>>>>> But don't you break them with this now?
>>>>> No, we see the bidirectional attachment as compatible with the other
>>>>> ones.
>>>>>
>>>>>> amdgpu will soon set the dynamic flag on exports, which forces the
>>>>>> caching
>>>>>> at create time (to avoid the locking fun), which will then result in a
>>>>>> EBUSY at map_attachment time because we have a cached mapping, but
>>>>>> it's
>>>>>> the wrong type.
>>>>> See the check in dma_buf_map_attachment():
>>>>>
>>>>>    if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
>>>>>    return ERR_PTR(-EBUSY);
>>>> Hm, I misread this. So yeah should work, +/- the issue that we might
>>>> not flush enough. But I guess that can be fixed whenever, it's not
>>>> like dma-api semantics are a great fit for us. Maybe a fixme comment
>>>> would be useful here ... I'll look at this tomorrow or so because atm
>>>> brain is slow, I'm down with the usual post-conference cold it seems
>>>> :-/
>>> Hope you are feeling better now, adding a comment is of course not a
>>> problem.
>>>
>>> With that fixed, can I get a reviewed-by or at least an acked-by?
>>>
>>> I want to land at least some parts of those changes now.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> -Daniel
>>>>


Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-10-02 Thread Koenig, Christian
Hi Daniel,

Once more a ping on this. Any more comments or can we get it committed?

Thanks,
Christian.

Am 24.09.19 um 11:50 schrieb Christian König:
> Am 17.09.19 um 16:56 schrieb Daniel Vetter:
>> [SNIP]
   +    /* When either the importer or the exporter 
 can't handle dynamic
 + * mappings we cache the mapping here to avoid issues 
 with the
 + * reservation object lock.
 + */
 +    if (dma_buf_attachment_is_dynamic(attach) !=
 +    dma_buf_is_dynamic(dmabuf)) {
 +    struct sg_table *sgt;
 +
 +    if (dma_buf_is_dynamic(attach->dmabuf))
 + dma_resv_lock(attach->dmabuf->resv, NULL);
 +
 +    sgt = dmabuf->ops->map_dma_buf(attach, 
 DMA_BIDIRECTIONAL);
>>> Now we're back to enforcing DMA_BIDI, which works nicely 
>>> around the
>>> locking pain, but apparently upsets the arm-soc folks who 
>>> want to
>>> control
>>> this better.
>> Take another look at dma_buf_map_attachment(), we still try 
>> to get the
>> caching there for ARM.
>>
>> What we do here is to bidirectionally map the buffer to avoid 
>> the
>> locking hydra when importer and exporter disagree on locking.
>>
>> So the ARM folks can easily avoid that by switching to 
>> dynamic locking
>> for both.
 So you still break the contract between importer and exporter, 
 except not
 for anything that's run in intel-gfx-ci so all is good?
>>> No, the contract between importer and exporter stays exactly the 
>>> same it
>>> is currently as long as you don't switch to dynamic dma-buf 
>>> handling.
>>>
>>> There is no functional change for the ARM folks here. The only 
>>> change
>>> which takes effect is between i915 and amdgpu and that is perfectly
>>> covered by intel-gfx-ci.
>> There's people who want to run amdgpu on ARM?
> Sure there are, we even recently fixed some bugs for this.
>
> But as far as I know there is no one currently who is affected by 
> this
> change on ARM with amdgpu.
 But don't you break them with this now?
>>> No, we see the bidirectional attachment as compatible with the other 
>>> ones.
>>>
 amdgpu will soon set the dynamic flag on exports, which forces the 
 caching
 at create time (to avoid the locking fun), which will then result in a
 EBUSY at map_attachment time because we have a cached mapping, but 
 it's
 the wrong type.
>>> See the check in dma_buf_map_attachment():
>>>
>>>   if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
>>>   return ERR_PTR(-EBUSY);
>> Hm, I misread this. So yeah should work, +/- the issue that we might
>> not flush enough. But I guess that can be fixed whenever, it's not
>> like dma-api semantics are a great fit for us. Maybe a fixme comment
>> would be useful here ... I'll look at this tomorrow or so because atm
>> brain is slow, I'm down with the usual post-conference cold it seems
>> :-/
>
> Hope you are feeling better now, adding a comment is of course not a 
> problem.
>
> With that fixed, can I get a reviewed-by or at least an acked-by?
>
> I want to land at least some parts of those changes now.
>
> Regards,
> Christian.
>
>> -Daniel
>>
>


Re: [Intel-gfx] [PATCH] doc: Update references to previously renamed files

2019-09-27 Thread Koenig, Christian
Am 27.09.19 um 13:15 schrieb Anna Karas:
> Update references to reservation.c and reservation.h since these files
> have been renamed to dma-resv.c and dma-resv.h respectively.
>
> Cc: Christian König 
> Link: https://patchwork.freedesktop.org/patch/323401/?series=65037&rev=1
> Signed-off-by: Anna Karas 

Reviewed-by: Christian König 

You should also send that to a couple more mailing lists; only CCing 
intel-gfx is not really appropriate for that code.

Leave me a note when you need to get this committed to drm-misc-fixes.

Regards,
Christian.

> ---
>   Documentation/driver-api/dma-buf.rst | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/driver-api/dma-buf.rst 
> b/Documentation/driver-api/dma-buf.rst
> index b541e97c7ab1..c78db28519f7 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -118,13 +118,13 @@ Kernel Functions and Structures Reference
>   Reservation Objects
>   ---
>   
> -.. kernel-doc:: drivers/dma-buf/reservation.c
> +.. kernel-doc:: drivers/dma-buf/dma-resv.c
>  :doc: Reservation Object Overview
>   
> -.. kernel-doc:: drivers/dma-buf/reservation.c
> +.. kernel-doc:: drivers/dma-buf/dma-resv.c
>  :export:
>   
> -.. kernel-doc:: include/linux/reservation.h
> +.. kernel-doc:: include/linux/dma-resv.h
>  :internal:
>   
>   DMA Fences


Re: [Intel-gfx] [PATCH] drm/i915: Update references to previously renamed files

2019-09-27 Thread Koenig, Christian
Am 26.09.19 um 16:32 schrieb Anna Karas:
> Update references to reservation.c and reservation.h since these files
> have been renamed to dma-resv.c and dma-resv.h respectively.

The subject line is wrong since this isn't an i915-related patch, but 
apart from that it looks good to me.

Regards,
Christian.

>
> Cc: Christian König 
> Link: https://patchwork.freedesktop.org/patch/323401/?series=65037&rev=1
> Signed-off-by: Anna Karas 
> ---
>   Documentation/driver-api/dma-buf.rst | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/driver-api/dma-buf.rst 
> b/Documentation/driver-api/dma-buf.rst
> index b541e97c7ab1..c78db28519f7 100644
> --- a/Documentation/driver-api/dma-buf.rst
> +++ b/Documentation/driver-api/dma-buf.rst
> @@ -118,13 +118,13 @@ Kernel Functions and Structures Reference
>   Reservation Objects
>   ---
>   
> -.. kernel-doc:: drivers/dma-buf/reservation.c
> +.. kernel-doc:: drivers/dma-buf/dma-resv.c
>  :doc: Reservation Object Overview
>   
> -.. kernel-doc:: drivers/dma-buf/reservation.c
> +.. kernel-doc:: drivers/dma-buf/dma-resv.c
>  :export:
>   
> -.. kernel-doc:: include/linux/reservation.h
> +.. kernel-doc:: include/linux/dma-resv.h
>  :internal:
>   
>   DMA Fences


Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-09-24 Thread Koenig, Christian
Am 17.09.19 um 16:56 schrieb Daniel Vetter:
> [SNIP]
>>>   +/* When either the importer or the exporter can't handle 
>>> dynamic
>>> + * mappings we cache the mapping here to avoid issues with the
>>> + * reservation object lock.
>>> + */
>>> +if (dma_buf_attachment_is_dynamic(attach) !=
>>> +dma_buf_is_dynamic(dmabuf)) {
>>> +struct sg_table *sgt;
>>> +
>>> +if (dma_buf_is_dynamic(attach->dmabuf))
>>> +dma_resv_lock(attach->dmabuf->resv, NULL);
>>> +
>>> +sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
>> Now we're back to enforcing DMA_BIDI, which works nicely around the
>> locking pain, but apparently upsets the arm-soc folks who want to
>> control
>> this better.
> Take another look at dma_buf_map_attachment(), we still try to get the
> caching there for ARM.
>
> What we do here is to bidirectionally map the buffer to avoid the
> locking hydra when importer and exporter disagree on locking.
>
> So the ARM folks can easily avoid that by switching to dynamic locking
> for both.
>>> So you still break the contract between importer and exporter, except 
>>> not
>>> for anything that's run in intel-gfx-ci so all is good?
>> No, the contract between importer and exporter stays exactly the same it
>> is currently as long as you don't switch to dynamic dma-buf handling.
>>
>> There is no functional change for the ARM folks here. The only change
>> which takes effect is between i915 and amdgpu and that is perfectly
>> covered by intel-gfx-ci.
> There's people who want to run amdgpu on ARM?
 Sure there are, we even recently fixed some bugs for this.

 But as far as I know there is no one currently who is affected by this
 change on ARM with amdgpu.
>>> But don't you break them with this now?
>> No, we see the bidirectional attachment as compatible with the other ones.
>>
>>> amdgpu will soon set the dynamic flag on exports, which forces the caching
>>> at create time (to avoid the locking fun), which will then result in a
>>> EBUSY at map_attachment time because we have a cached mapping, but it's
>>> the wrong type.
>> See the check in dma_buf_map_attachment():
>>
>>   if (attach->dir != direction && attach->dir != DMA_BIDIRECTIONAL)
>>   return ERR_PTR(-EBUSY);
> Hm, I misread this. So yeah should work, +/- the issue that we might
> not flush enough. But I guess that can be fixed whenever, it's not
> like dma-api semantics are a great fit for us. Maybe a fixme comment
> would be useful here ... I'll look at this tomorrow or so because atm
> brain is slow, I'm down with the usual post-conference cold it seems
> :-/

Hope you are feeling better now, adding a comment is of course not a 
problem.

With that fixed, can I get a reviewed-by or at least an acked-by?

I want to land at least some parts of those changes now.

Regards,
Christian.

> -Daniel
>
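The check Christian quotes from dma_buf_map_attachment() can be modeled in a few lines. A hedged sketch, with an illustrative enum standing in for the real dma_data_direction:

```c
#include <stdbool.h>

/* Toy version of the quoted check: a cached mapping can be reused when
 * its direction matches the request, or when it was created
 * bidirectional, which subsumes both directions. The enum is an
 * illustrative stand-in, not the real dma-mapping API. */
enum toy_dir { TOY_BIDIRECTIONAL, TOY_TO_DEVICE, TOY_FROM_DEVICE };

static bool toy_cached_mapping_usable(enum toy_dir cached, enum toy_dir req)
{
	/* mirrors: if (attach->dir != direction &&
	 *              attach->dir != DMA_BIDIRECTIONAL)
	 *                  return ERR_PTR(-EBUSY);           */
	return cached == req || cached == TOY_BIDIRECTIONAL;
}
```

Since the caching path always maps DMA_BIDIRECTIONAL, the -EBUSY branch can never trigger for it; the trade-off, as Daniel notes, is that flushing may be more conservative than a direction-specific mapping would allow.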


Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-09-17 Thread Koenig, Christian
Am 17.09.19 um 15:45 schrieb Daniel Vetter:
> On Tue, Sep 17, 2019 at 01:24:10PM +0000, Koenig, Christian wrote:
>> Am 17.09.19 um 15:13 schrieb Daniel Vetter:
>>> On Tue, Sep 17, 2019 at 12:40:51PM +, Koenig, Christian wrote:
>>>> Am 17.09.19 um 14:31 schrieb Daniel Vetter:
>>>>> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
>>>>>> Ping? Any further comment on this or can't we merge at least the locking
>>>>>> change?
>>>>> I was at plumbers ...
>>>>>> Christian.
>>>>>>
>>>>>> Am 11.09.19 um 12:53 schrieb Christian König:
>>>>>>> Am 03.09.19 um 10:05 schrieb Daniel Vetter:
>>>>>>>> On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
>>>>>>>>> This patch is a stripped down version of the locking changes
>>>>>>>>> necessary to support dynamic DMA-buf handling.
>>>>>>>>>
>>>>>>>>> For compatibility we cache the DMA-buf mapping as soon as
>>>>>>>>> exporter/importer disagree on the dynamic handling.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Christian König 
>>>>>>>>> ---
>>>>>>>>>  drivers/dma-buf/dma-buf.c | 90
>>>>>>>>> ---
>>>>>>>>>  include/linux/dma-buf.h   | 51 +-
>>>>>>>>>  2 files changed, 133 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>>>>>> index 433d91d710e4..65052d52602b 100644
>>>>>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>>>>>> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct
>>>>>>>>> dma_buf_export_info *exp_info)
>>>>>>>>>  return ERR_PTR(-EINVAL);
>>>>>>>>>  }
>>>>>>>>>  +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
>>>>>>>>> +    exp_info->ops->dynamic_mapping))
>>>>>>>>> +    return ERR_PTR(-EINVAL);
>>>>>>>>> +
>>>>>>>>>  if (!try_module_get(exp_info->owner))
>>>>>>>>>  return ERR_PTR(-ENOENT);
>>>>>>>>>  @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
>>>>>>>>>  EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>>>    /**
>>>>>>>>> - * dma_buf_attach - Add the device to dma_buf's attachments
>>>>>>>>> list; optionally,
>>>>>>>>> + * dma_buf_dynamic_attach - Add the device to dma_buf's
>>>>>>>>> attachments list; optionally,
>>>>>>>>>   * calls attach() of dma_buf_ops to allow device-specific
>>>>>>>>> attach functionality
>>>>>>>>> - * @dmabuf:    [in]    buffer to attach device to.
>>>>>>>>> - * @dev:    [in]    device to be attached.
>>>>>>>>> + * @dmabuf:    [in]    buffer to attach device to.
>>>>>>>>> + * @dev:    [in]    device to be attached.
>>>>>>>>> + * @dynamic_mapping:    [in]    calling convention for map/unmap
>>>>>>>>>   *
>>>>>>>>>   * Returns struct dma_buf_attachment pointer for this
>>>>>>>>> attachment. Attachments
>>>>>>>>>   * must be cleaned up by calling dma_buf_detach().
>>>>>>>>> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>>>   * accessible to @dev, and cannot be moved to a more suitable
>>>>>>>>> place. This is
>>>>>>>>>   * indicated with the error code -EBUSY.
>>>>>>>>>   */
>>>>>>>>> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>>>> -  struct device *dev)
>>>>>>>>> +struct dma_buf_attachment *
>>>>>>>>
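As far as the quoted (truncated) patch shows, the API change makes dma_buf_attach() a special case of dma_buf_dynamic_attach() with dynamic_mapping = false. A toy sketch of that shape, using stub structures rather than the real kernel ones:

```c
#include <stdbool.h>

/* Toy sketch of the API change in the patch above: the legacy attach
 * entry point becomes a thin wrapper around the dynamic variant with
 * dynamic_mapping = false. All structures here are stubs. */
struct toy_dmabuf { int id; };
struct toy_device { int id; };
struct toy_attachment {
	struct toy_dmabuf *dmabuf;
	struct toy_device *dev;
	bool dynamic_mapping;   /* calling convention for map/unmap */
};

static struct toy_attachment *
toy_dynamic_attach(struct toy_dmabuf *buf, struct toy_device *dev,
		   bool dynamic_mapping)
{
	static struct toy_attachment a;  /* static storage keeps this standalone */
	a.dmabuf = buf;
	a.dev = dev;
	a.dynamic_mapping = dynamic_mapping;
	return &a;
}

/* legacy entry point keeps the old (static-mapping) convention */
static struct toy_attachment *
toy_attach(struct toy_dmabuf *buf, struct toy_device *dev)
{
	return toy_dynamic_attach(buf, dev, false);
}
```

Keeping the old entry point as a wrapper is what lets existing importers stay on the static convention while dynamic importers opt in explicitly.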

Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-09-17 Thread Koenig, Christian
Am 17.09.19 um 15:13 schrieb Daniel Vetter:
> On Tue, Sep 17, 2019 at 12:40:51PM +0000, Koenig, Christian wrote:
>> Am 17.09.19 um 14:31 schrieb Daniel Vetter:
>>> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
>>>> Ping? Any further comment on this or can't we merge at least the locking
>>>> change?
>>> I was at plumbers ...
>>>> Christian.
>>>>
>>>> Am 11.09.19 um 12:53 schrieb Christian König:
>>>>> Am 03.09.19 um 10:05 schrieb Daniel Vetter:
>>>>>> On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
>>>>>>> This patch is a stripped down version of the locking changes
>>>>>>> necessary to support dynamic DMA-buf handling.
>>>>>>>
>>>>>>> For compatibility we cache the DMA-buf mapping as soon as
>>>>>>> exporter/importer disagree on the dynamic handling.
>>>>>>>
>>>>>>> Signed-off-by: Christian König 
>>>>>>> ---
>>>>>>>     drivers/dma-buf/dma-buf.c | 90
>>>>>>> ---
>>>>>>>     include/linux/dma-buf.h   | 51 +-
>>>>>>>     2 files changed, 133 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>>>> index 433d91d710e4..65052d52602b 100644
>>>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>>>> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct
>>>>>>> dma_buf_export_info *exp_info)
>>>>>>>     return ERR_PTR(-EINVAL);
>>>>>>>     }
>>>>>>>     +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
>>>>>>> +    exp_info->ops->dynamic_mapping))
>>>>>>> +    return ERR_PTR(-EINVAL);
>>>>>>> +
>>>>>>>     if (!try_module_get(exp_info->owner))
>>>>>>>     return ERR_PTR(-ENOENT);
>>>>>>>     @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
>>>>>>>     EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>       /**
>>>>>>> - * dma_buf_attach - Add the device to dma_buf's attachments
>>>>>>> list; optionally,
>>>>>>> + * dma_buf_dynamic_attach - Add the device to dma_buf's
>>>>>>> attachments list; optionally,
>>>>>>>      * calls attach() of dma_buf_ops to allow device-specific
>>>>>>> attach functionality
>>>>>>> - * @dmabuf:    [in]    buffer to attach device to.
>>>>>>> - * @dev:    [in]    device to be attached.
>>>>>>> + * @dmabuf:    [in]    buffer to attach device to.
>>>>>>> + * @dev:    [in]    device to be attached.
>>>>>>> + * @dynamic_mapping:    [in]    calling convention for map/unmap
>>>>>>>      *
>>>>>>>      * Returns struct dma_buf_attachment pointer for this
>>>>>>> attachment. Attachments
>>>>>>>      * must be cleaned up by calling dma_buf_detach().
>>>>>>> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
>>>>>>>      * accessible to @dev, and cannot be moved to a more suitable
>>>>>>> place. This is
>>>>>>>      * indicated with the error code -EBUSY.
>>>>>>>      */
>>>>>>> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>> -  struct device *dev)
>>>>>>> +struct dma_buf_attachment *
>>>>>>> +dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
>>>>>>> +   bool dynamic_mapping)
>>>>>>>     {
>>>>>>>     struct dma_buf_attachment *attach;
>>>>>>>     int ret;
>>>>>>> @@ -677,6 +683,7 @@ struct dma_buf_attachment
>>>>>>> *dma_buf_attach(struct dma_buf *dmabuf,
>>>>>>>       attach->dev = dev;
>>>>>>>     attach->dmabuf = dmabuf;
>>>>>>> +    attach->dynamic_mapping = dynamic_mapping;
>>>>>

Re: [Intel-gfx] [PATCH 1/4] dma-buf: change DMA-buf locking convention

2019-09-17 Thread Koenig, Christian
On 17.09.19 at 14:31, Daniel Vetter wrote:
> On Mon, Sep 16, 2019 at 02:23:13PM +0200, Christian König wrote:
>> Ping? Any further comments on this, or can we at least merge the locking
>> change?
> I was at plumbers ...
>> Christian.
>>
>> On 11.09.19 at 12:53, Christian König wrote:
>>> On 03.09.19 at 10:05, Daniel Vetter wrote:
 On Thu, Aug 29, 2019 at 04:29:14PM +0200, Christian König wrote:
> This patch is a stripped down version of the locking changes
> necessary to support dynamic DMA-buf handling.
>
> For compatibility we cache the DMA-buf mapping as soon as
> exporter/importer disagree on the dynamic handling.
>
> Signed-off-by: Christian König 
> ---
>    drivers/dma-buf/dma-buf.c | 90
> ---
>    include/linux/dma-buf.h   | 51 +-
>    2 files changed, 133 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 433d91d710e4..65052d52602b 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -525,6 +525,10 @@ struct dma_buf *dma_buf_export(const struct
> dma_buf_export_info *exp_info)
>    return ERR_PTR(-EINVAL);
>    }
>    +    if (WARN_ON(exp_info->ops->cache_sgt_mapping &&
> +    exp_info->ops->dynamic_mapping))
> +    return ERR_PTR(-EINVAL);
> +
>    if (!try_module_get(exp_info->owner))
>    return ERR_PTR(-ENOENT);
>    @@ -645,10 +649,11 @@ void dma_buf_put(struct dma_buf *dmabuf)
>    EXPORT_SYMBOL_GPL(dma_buf_put);
>      /**
> - * dma_buf_attach - Add the device to dma_buf's attachments
> list; optionally,
> + * dma_buf_dynamic_attach - Add the device to dma_buf's
> attachments list; optionally,
>     * calls attach() of dma_buf_ops to allow device-specific
> attach functionality
> - * @dmabuf:    [in]    buffer to attach device to.
> - * @dev:    [in]    device to be attached.
> + * @dmabuf:    [in]    buffer to attach device to.
> + * @dev:    [in]    device to be attached.
> + * @dynamic_mapping:    [in]    calling convention for map/unmap
>     *
>     * Returns struct dma_buf_attachment pointer for this
> attachment. Attachments
>     * must be cleaned up by calling dma_buf_detach().
> @@ -662,8 +667,9 @@ EXPORT_SYMBOL_GPL(dma_buf_put);
>     * accessible to @dev, and cannot be moved to a more suitable
> place. This is
>     * indicated with the error code -EBUSY.
>     */
> -struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> -  struct device *dev)
> +struct dma_buf_attachment *
> +dma_buf_dynamic_attach(struct dma_buf *dmabuf, struct device *dev,
> +   bool dynamic_mapping)
>    {
>    struct dma_buf_attachment *attach;
>    int ret;
> @@ -677,6 +683,7 @@ struct dma_buf_attachment
> *dma_buf_attach(struct dma_buf *dmabuf,
>      attach->dev = dev;
>    attach->dmabuf = dmabuf;
> +    attach->dynamic_mapping = dynamic_mapping;
>      mutex_lock(&dmabuf->lock);
>    @@ -685,16 +692,64 @@ struct dma_buf_attachment
> *dma_buf_attach(struct dma_buf *dmabuf,
>    if (ret)
>    goto err_attach;
>    }
> +    dma_resv_lock(dmabuf->resv, NULL);
>    list_add(&attach->node, &dmabuf->attachments);
> +    dma_resv_unlock(dmabuf->resv);
>      mutex_unlock(&dmabuf->lock);
>    +    /* When either the importer or the exporter can't handle dynamic
> + * mappings we cache the mapping here to avoid issues with the
> + * reservation object lock.
> + */
> +    if (dma_buf_attachment_is_dynamic(attach) !=
> +    dma_buf_is_dynamic(dmabuf)) {
> +    struct sg_table *sgt;
> +
> +    if (dma_buf_is_dynamic(attach->dmabuf))
> +    dma_resv_lock(attach->dmabuf->resv, NULL);
> +
> +    sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
>>>> Now we're back to enforcing DMA_BIDI, which works nicely around the
>>>> locking pain, but apparently upsets the arm-soc folks who want to
>>>> control this better.
>>> Take another look at dma_buf_map_attachment(), we still try to get the
>>> caching there for ARM.
>>>
>>> What we do here is to bidirectionally map the buffer to avoid the
>>> locking hydra when importer and exporter disagree on locking.
>>>
>>> So the ARM folks can easily avoid that by switching to dynamic locking
>>> for both.
> So you still break the contract between importer and exporter, except not
> for anything that's run in intel-gfx-ci so all is good?

No, the contract between importer and exporter stays exactly the same as
it is currently, as long as you don't switch to dynamic dma-buf handling.

There is no functional change for the
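The compatibility path quoted earlier in this thread caches the sg_table at attach time exactly when importer and exporter disagree on dynamic handling, so that map/unmap never has to bridge two different locking conventions. Reduced to a predicate, the check `dma_buf_attachment_is_dynamic(attach) != dma_buf_is_dynamic(dmabuf)` looks like this (userspace sketch, hypothetical name):

```c
#include <stdbool.h>

/* Hypothetical reduction of the attach-time check in
 * dma_buf_dynamic_attach(): a mapping is created and cached at attach
 * time whenever the two sides disagree about dynamic handling. Only
 * when both sides are dynamic does mapping move to map/unmap time,
 * under the dma_resv lock. */
static bool must_cache_at_attach(bool exporter_dynamic, bool importer_dynamic)
{
    return exporter_dynamic != importer_dynamic;
}
```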

Re: [Intel-gfx] [PATCH] dma_resv: prime lockdep annotations

2019-09-03 Thread Koenig, Christian
On 03.09.19 at 10:16, Daniel Vetter wrote:
> On Thu, Aug 22, 2019 at 07:53:53AM +0000, Koenig, Christian wrote:
>> On 22.08.19 at 08:54, Daniel Vetter wrote:
>>> Full audit of everyone:
>>>
>>> - i915, radeon, amdgpu should be clean per their maintainers.
>>>
>>> - vram helpers should be fine, they don't do command submission, so
>>> really no business holding struct_mutex while doing copy_*_user. But
>>> I haven't checked them all.
>>>
>>> - panfrost seems to dma_resv_lock only in panfrost_job_push, which
>>> looks clean.
>>>
>>> - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(),
>>> copying from/to userspace happens all in v3d_lookup_bos which is
>>> outside of the critical section.
>>>
>>> - vmwgfx has a bunch of ioctls that do their own copy_*_user:
>>> - vmw_execbuf_process: First this does some copies in
>>>   vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself.
>>>   Then comes the usual ttm reserve/validate sequence, then actual
>>>   submission/fencing, then unreserving, and finally some more
>>>   copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of
>>>   details, but looks all safe.
>>> - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be
>>>   seen, seems to only create a fence and copy it out.
>>> - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be
>>>   found there.
>>> Summary: vmwgfx seems to be fine too.
>>>
>>> - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the
>>> copying from userspace before even looking up objects through their
>>> handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
>>>
>>> - qxl only has qxl_execbuffer_ioctl, which calls into
>>> qxl_process_single_command. There's a lovely comment before the
>>> __copy_from_user_inatomic that the slowpath should be copied from
>>> i915, but I guess that never happened. Try not to be unlucky and get
>>> your CS data evicted between when it's written and the kernel tries
>>> to read it. The only other copy_from_user is for relocs, but those
>>> are done before qxl_release_reserve_list(), which seems to be the
>>> only thing reserving buffers (in the ttm/dma_resv sense) in that
>>> code. So looks safe.
>>>
>>> - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in
>>> usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this
>>> everywhere and needs to be fixed up.
>>>
>>> v2: Thomas pointed out that vmwgfx calls dma_resv_init while it holds a
>>> dma_resv lock of a different object already. Christian mentioned that
>>> ttm core does this too for ghost objects. intel-gfx-ci highlighted
>>> that i915 has similar issues.
>>>
>>> Unfortunately we can't do this in the usual module init functions,
>>> because kernel threads don't have an ->mm - we have to wait around for
>>> some user thread to do this.
>>>
>>> Solution is to spawn a worker (but only once). It's horrible, but it
>>> works.
>>>
>>> v3: We can allocate mm! (Chris). Horrible worker hack out, clean
>>> initcall solution in.
>>>
>>> Cc: Alex Deucher 
>>> Cc: Christian König 
>>> Cc: Chris Wilson 
>>> Cc: Thomas Zimmermann 
>>> Cc: Rob Herring 
>>> Cc: Tomeu Vizoso 
>>> Cc: Eric Anholt 
>>> Cc: Dave Airlie 
>>> Cc: Gerd Hoffmann 
>>> Cc: Ben Skeggs 
>>> Cc: "VMware Graphics" 
>>> Cc: Thomas Hellstrom 
>>> Signed-off-by: Daniel Vetter 
>> Reviewed-by: Christian König 
> Did you get a chance to give this a spin on the amd CI?

No, and sorry, I totally forgot to ask about that.

Going to try to bring this up tomorrow once more, but don't expect that 
I can get this tested anytime soon.

Christian.

> -Daniel
>
>>> ---
>>>drivers/dma-buf/dma-resv.c | 24 
>>>1 file changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>>> index 42a8f3f11681..d233ef4cf0d7 100644
>>> --- a/drivers/dma-buf/dma-resv.c
>>> +++ b/drivers/dma-buf/dma-resv.c
>>> @@ -34,6 +34,7 @@
>>>
>>>#include 
>>>#include 
>>> +#inclu

Re: [Intel-gfx] [PATCH] dma-buf: Give dma-fence-array distinct lockclasses

2019-08-25 Thread Koenig, Christian
On 24.08.19 at 21:12, Chris Wilson wrote:
> Quoting Koenig, Christian (2019-08-24 20:04:43)
>> On 24.08.19 at 15:58, Chris Wilson wrote:
>>> In order to allow dma-fence-array as a generic container for fences, we
>>> need to allow for it to contain other dma-fence-arrays. By giving each
>>> dma-fence-array construction its own lockclass, we allow different
>>> types of dma-fence-array to nest, but still do not allow one class of
>>> dma-fence-array to contain itself (even though they have distinct
>>> locks).
>>>
>>> In practice, this means that each subsystem gets its own dma-fence-array
>>> class and we can freely use dma-fence-arrays as containers within the
>>> dmabuf core without angering lockdep.
>> I've considered this as well, e.g. to use the dma_fence_array
>> implementation instead of coming up with the dma_fence_chain container.
>>
>> But as it turned out, when userspace can control nesting it is trivial
>> to chain enough dma_fence_arrays together to cause an in-kernel stack
>> overflow, which in turn creates a really nice attack vector.
>>
>> So as long as userspace has control over dma_fence_array nesting this is
>> a clear NAK and actually extremely dangerous.
> You are proposing to use recursive dma_fence_array containers for
> dma_resv...

Huh? Where? I've tried rather hard to avoid that.

That was certainly not intentional,
Christian.

>
>> It actually took me quite a while to make the dma_fence_chain container
>> recursion-free, to avoid stuff like this.
> Sure, we've been avoiding recursion for years.
> -Chris

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] dma-buf: Give dma-fence-array distinct lockclasses

2019-08-24 Thread Koenig, Christian
On 24.08.19 at 15:58, Chris Wilson wrote:
> In order to allow dma-fence-array as a generic container for fences, we
> need to allow for it to contain other dma-fence-arrays. By giving each
> dma-fence-array construction its own lockclass, we allow different
> types of dma-fence-array to nest, but still do not allow one class of
> dma-fence-array to contain itself (even though they have distinct
> locks).
>
> In practice, this means that each subsystem gets its own dma-fence-array
> class and we can freely use dma-fence-arrays as containers within the
> dmabuf core without angering lockdep.

I've considered this as well, e.g. to use the dma_fence_array
implementation instead of coming up with the dma_fence_chain container.

But as it turned out, when userspace can control nesting it is trivial
to chain enough dma_fence_arrays together to cause an in-kernel stack
overflow, which in turn creates a really nice attack vector.

So as long as userspace has control over dma_fence_array nesting this is 
a clear NAK and actually extremely dangerous.

It actually took me quite a while to make the dma_fence_chain container
recursion-free, to avoid stuff like this.

Regards,
Christian.
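To illustrate why userspace-controlled container nesting is a NAK: a userspace toy model (all names hypothetical) of a fence container that may hold other containers. Recursive signalling burns one stack frame per nesting level, so a user who controls the nesting controls the kernel stack depth; the dma_fence_chain approach walks links iteratively instead.

```c
#include <stdlib.h>

/* Toy fence container: one nested child is enough to show the problem. */
struct toy_fence {
    struct toy_fence *child;
};

static int max_depth;

/* Recursive traversal, as naive nested-container signalling would do:
 * stack usage grows linearly with the nesting the user built. */
static void toy_signal(struct toy_fence *f, int depth)
{
    if (depth > max_depth)
        max_depth = depth;
    if (f->child)
        toy_signal(f->child, depth + 1);
}

/* The recursion-free alternative: walk the links iteratively, so stack
 * usage stays constant no matter how deep the chain is. */
static int toy_signal_iterative(const struct toy_fence *f)
{
    int depth = 0;

    for (; f; f = f->child)
        depth++;
    return depth;
}
```

With a kernel stack of a few pages, a few thousand malicious nesting levels in the recursive variant are enough to overflow it; the iterative variant is immune by construction.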

>
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Daniel Vetter 
> ---
>   drivers/dma-buf/dma-fence-array.c | 13 -
>   include/linux/dma-fence-array.h   | 16 
>   2 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c 
> b/drivers/dma-buf/dma-fence-array.c
> index d3fbd950be94..d9bcdbb66d46 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -147,10 +147,11 @@ EXPORT_SYMBOL(dma_fence_array_ops);
>* If @signal_on_any is true the fence array signals if any fence in the 
> array
>* signals, otherwise it signals when all fences in the array signal.
>*/
> -struct dma_fence_array *dma_fence_array_create(int num_fences,
> -struct dma_fence **fences,
> -u64 context, unsigned seqno,
> -bool signal_on_any)
> +struct dma_fence_array *__dma_fence_array_create(int num_fences,
> +  struct dma_fence **fences,
> +  u64 context, unsigned seqno,
> +  bool signal_on_any,
> +  struct lock_class_key *key)
>   {
>   struct dma_fence_array *array;
>   size_t size = sizeof(*array);
> @@ -162,6 +163,8 @@ struct dma_fence_array *dma_fence_array_create(int 
> num_fences,
>   return NULL;
>   
>   spin_lock_init(&array->lock);
> + lockdep_set_class(&array->lock, key);
> +
>   dma_fence_init(&array->base, &dma_fence_array_ops, &array->lock,
>  context, seqno);
>   init_irq_work(&array->work, irq_dma_fence_array_work);
> @@ -174,7 +177,7 @@ struct dma_fence_array *dma_fence_array_create(int 
> num_fences,
>   
>   return array;
>   }
> -EXPORT_SYMBOL(dma_fence_array_create);
> +EXPORT_SYMBOL(__dma_fence_array_create);
>   
>   /**
>* dma_fence_match_context - Check if all fences are from the given context
> diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
> index 303dd712220f..1395f9428cdb 100644
> --- a/include/linux/dma-fence-array.h
> +++ b/include/linux/dma-fence-array.h
> @@ -74,10 +74,18 @@ to_dma_fence_array(struct dma_fence *fence)
>   return container_of(fence, struct dma_fence_array, base);
>   }
>   
> -struct dma_fence_array *dma_fence_array_create(int num_fences,
> -struct dma_fence **fences,
> -u64 context, unsigned seqno,
> -bool signal_on_any);
> +#define dma_fence_array_create(num, fences, context, seqno, any) ({ \
> + static struct lock_class_key __key; \
> + \
> + __dma_fence_array_create((num), (fences), (context), (seqno), (any), \
> +  &__key);   \
> +})
> +
> +struct dma_fence_array *__dma_fence_array_create(int num_fences,
> +  struct dma_fence **fences,
> +  u64 context, unsigned seqno,
> +  bool signal_on_any,
> +  struct lock_class_key *key);
>   
>   bool dma_fence_match_context(struct dma_fence *fence, u64 context);
>   


Re: [Intel-gfx] [PATCH] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-22 Thread Koenig, Christian
On 22.08.19 at 15:06, Daniel Vetter wrote:
> On Thu, Aug 22, 2019 at 07:56:56AM +0000, Koenig, Christian wrote:
>> On 22.08.19 at 08:49, Daniel Vetter wrote:
>>> With nouveau fixed, all ttm-using drivers have the correct nesting of
>>> mmap_sem vs dma_resv, and we can just lock the buffer.
>>>
>>> Assuming I didn't screw up anything with my audit of course.
>>>
>>> v2:
>>> - Don't forget wu_mutex (Christian König)
>>> - Keep the mmap_sem-less wait optimization (Thomas)
>>> - Use _lock_interruptible to be good citizens (Thomas)
>>>
>>> Reviewed-by: Christian König 
> btw I realized I didn't remove your r-b, since v1 was broken.
>
> For formality, can you pls reaffirm, or still something broken?

My r-b is still valid.

Only problem I see is that neither of us seems to have a good idea about 
the different VM_FAULT_* replies.

But that worked before so it should still work now,
Christian.

>
> Also from the other thread: Reviewed-by: Thomas Hellström 
> 
>
> Thanks, Daniel
>
>>> Signed-off-by: Daniel Vetter 
>>> Cc: Christian Koenig 
>>> Cc: Huang Rui 
>>> Cc: Gerd Hoffmann 
>>> Cc: "VMware Graphics" 
>>> Cc: Thomas Hellstrom 
>>> ---
>>>drivers/gpu/drm/ttm/ttm_bo.c  | 36 ---
>>>drivers/gpu/drm/ttm/ttm_bo_util.c |  1 -
>>>drivers/gpu/drm/ttm/ttm_bo_vm.c   | 18 +---
>>>include/drm/ttm/ttm_bo_api.h  |  4 
>>>4 files changed, 5 insertions(+), 54 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 20ff56f27aa4..d1ce5d315d5b 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref)
>>> dma_fence_put(bo->moving);
>>> if (!ttm_bo_uses_embedded_gem_object(bo))
>>> dma_resv_fini(&bo->base._resv);
>>> -   mutex_destroy(&bo->wu_mutex);
>>> bo->destroy(bo);
>>> ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
>>>}
>>> @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
>>> INIT_LIST_HEAD(&bo->ddestroy);
>>> INIT_LIST_HEAD(&bo->swap);
>>> INIT_LIST_HEAD(&bo->io_reserve_lru);
>>> -   mutex_init(&bo->wu_mutex);
>>> bo->bdev = bdev;
>>> bo->type = type;
>>> bo->num_pages = num_pages;
>>> @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
>>> ;
>>>}
>>>EXPORT_SYMBOL(ttm_bo_swapout_all);
>>> -
>>> -/**
>>> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to 
>>> become
>>> - * unreserved
>>> - *
>>> - * @bo: Pointer to buffer
>>> - */
>>> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
>>> -{
>>> -   int ret;
>>> -
>>> -   /*
>>> -* In the absence of a wait_unlocked API,
>>> -* Use the bo::wu_mutex to avoid triggering livelocks due to
>>> -* concurrent use of this function. Note that this use of
>>> -* bo::wu_mutex can go away if we change locking order to
>>> -* mmap_sem -> bo::reserve.
>>> -*/
>>> -   ret = mutex_lock_interruptible(&bo->wu_mutex);
>>> -   if (unlikely(ret != 0))
>>> -   return -ERESTARTSYS;
>>> -   if (!dma_resv_is_locked(bo->base.resv))
>>> -   goto out_unlock;
>>> -   ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
>>> -   if (ret == -EINTR)
>>> -   ret = -ERESTARTSYS;
>>> -   if (unlikely(ret != 0))
>>> -   goto out_unlock;
>>> -   dma_resv_unlock(bo->base.resv);
>>> -
>>> -out_unlock:
>>> -   mutex_unlock(&bo->wu_mutex);
>>> -   return ret;
>>> -}
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> index fe81c565e7ef..82ea26a49959 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>>> @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct 
>>> ttm_buffer_object *bo,
>>> INIT_LIST_HEAD(&fbo->base.lru);
>>> INIT_LIST_HEAD(&fbo->base.swap);
>>> INIT_LIST_HEAD(&fbo->base.io

Re: [Intel-gfx] [PATCH] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-22 Thread Koenig, Christian
On 22.08.19 at 08:49, Daniel Vetter wrote:
> With nouveau fixed, all ttm-using drivers have the correct nesting of
> mmap_sem vs dma_resv, and we can just lock the buffer.
>
> Assuming I didn't screw up anything with my audit of course.
>
> v2:
> - Don't forget wu_mutex (Christian König)
> - Keep the mmap_sem-less wait optimization (Thomas)
> - Use _lock_interruptible to be good citizens (Thomas)
>
> Reviewed-by: Christian König 
> Signed-off-by: Daniel Vetter 
> Cc: Christian Koenig 
> Cc: Huang Rui 
> Cc: Gerd Hoffmann 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c  | 36 ---
>   drivers/gpu/drm/ttm/ttm_bo_util.c |  1 -
>   drivers/gpu/drm/ttm/ttm_bo_vm.c   | 18 +---
>   include/drm/ttm/ttm_bo_api.h  |  4 
>   4 files changed, 5 insertions(+), 54 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 20ff56f27aa4..d1ce5d315d5b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -162,7 +162,6 @@ static void ttm_bo_release_list(struct kref *list_kref)
>   dma_fence_put(bo->moving);
>   if (!ttm_bo_uses_embedded_gem_object(bo))
>   dma_resv_fini(&bo->base._resv);
> - mutex_destroy(&bo->wu_mutex);
>   bo->destroy(bo);
>   ttm_mem_global_free(bdev->glob->mem_glob, acc_size);
>   }
> @@ -1319,7 +1318,6 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev,
>   INIT_LIST_HEAD(&bo->ddestroy);
>   INIT_LIST_HEAD(&bo->swap);
>   INIT_LIST_HEAD(&bo->io_reserve_lru);
> - mutex_init(&bo->wu_mutex);
>   bo->bdev = bdev;
>   bo->type = type;
>   bo->num_pages = num_pages;
> @@ -1954,37 +1952,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
>   ;
>   }
>   EXPORT_SYMBOL(ttm_bo_swapout_all);
> -
> -/**
> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
> - * unreserved
> - *
> - * @bo: Pointer to buffer
> - */
> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
> -{
> - int ret;
> -
> - /*
> -  * In the absence of a wait_unlocked API,
> -  * Use the bo::wu_mutex to avoid triggering livelocks due to
> -  * concurrent use of this function. Note that this use of
> -  * bo::wu_mutex can go away if we change locking order to
> -  * mmap_sem -> bo::reserve.
> -  */
> - ret = mutex_lock_interruptible(&bo->wu_mutex);
> - if (unlikely(ret != 0))
> - return -ERESTARTSYS;
> - if (!dma_resv_is_locked(bo->base.resv))
> - goto out_unlock;
> - ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
> - if (ret == -EINTR)
> - ret = -ERESTARTSYS;
> - if (unlikely(ret != 0))
> - goto out_unlock;
> - dma_resv_unlock(bo->base.resv);
> -
> -out_unlock:
> - mutex_unlock(&bo->wu_mutex);
> - return ret;
> -}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
> b/drivers/gpu/drm/ttm/ttm_bo_util.c
> index fe81c565e7ef..82ea26a49959 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
> @@ -508,7 +508,6 @@ static int ttm_buffer_object_transfer(struct 
> ttm_buffer_object *bo,
>   INIT_LIST_HEAD(&fbo->base.lru);
>   INIT_LIST_HEAD(&fbo->base.swap);
>   INIT_LIST_HEAD(&fbo->base.io_reserve_lru);
> - mutex_init(&fbo->base.wu_mutex);
>   fbo->base.moving = NULL;
>   drm_vma_node_reset(&fbo->base.base.vma_node);
>   atomic_set(&fbo->base.cpu_writers, 0);
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 76eedb963693..a61a35e57d1c 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -125,30 +125,22 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
>   &bdev->man[bo->mem.mem_type];
>   struct vm_area_struct cvma;
>   
> - /*
> -  * Work around locking order reversal in fault / nopfn
> -  * between mmap_sem and bo_reserve: Perform a trylock operation
> -  * for reserve, and if it fails, retry the fault after waiting
> -  * for the buffer to become unreserved.
> -  */
>   if (unlikely(!dma_resv_trylock(bo->base.resv))) {
>   if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
>   if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {

Not an expert on fault handling, but shouldn't this now be one if?

E.g. if FAULT_FLAG_RETRY_NOWAIT is set we should return VM_FAULT_NOPAGE 
instead of VM_FAULT_RETRY.

But really take that with a grain of salt,
Christian.

>   ttm_bo_get(bo);
>   up_read(&vmf->vma->vm_mm->mmap_sem);
> - (void) ttm_bo_wait_unreserved(bo);
> + if (!dma_resv_lock_interruptible(bo->base.resv,
> +  NULL))
> + dma_resv_unlock(bo->base.resv);
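The retry dance in the fault handler above can be modelled in userspace (toy model, hypothetical names): the fault path may not sleep on the buffer lock while mmap_sem is held, because the locking order everywhere else is mmap_sem -> dma_resv. So on trylock failure the handler drops mmap_sem, takes the lock once purely to wait until it is free, releases it again, and asks the core MM to retry the fault.

```c
#include <stdbool.h>

enum fault_ret { FAULT_RETRY, FAULT_NOPAGE };

struct toy_bo {
    bool resv_locked;   /* stands in for the bo's dma_resv lock */
};

static bool resv_trylock(struct toy_bo *bo)
{
    if (bo->resv_locked)
        return false;
    bo->resv_locked = true;
    return true;
}

static void resv_unlock(struct toy_bo *bo)
{
    bo->resv_locked = false;
}

/* Mirrors the fixed ttm_bo_vm_fault() flow: on trylock failure, drop
 * mmap_sem first, then block on the lock only to throttle the retry
 * (lock + immediate unlock), and return RETRY to the core MM. */
static enum fault_ret toy_fault(struct toy_bo *bo, bool *mmap_sem_held)
{
    if (!resv_trylock(bo)) {
        *mmap_sem_held = false;  /* up_read(&vmf->vma->vm_mm->mmap_sem) */
        /* dma_resv_lock_interruptible(); dma_resv_unlock();
         * the real code blocks here until the holder drops the lock */
        return FAULT_RETRY;
    }
    /* ... install PTEs under the lock ... */
    resv_unlock(bo);
    return FAULT_NOPAGE;
}
```

This is exactly why wu_mutex and ttm_bo_wait_unreserved() become unnecessary: the plain lock/unlock pair already provides the "wait until unreserved" semantics once mmap_sem has been dropped.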

Re: [Intel-gfx] [PATCH] dma_resv: prime lockdep annotations

2019-08-22 Thread Koenig, Christian
On 22.08.19 at 08:54, Daniel Vetter wrote:
> Full audit of everyone:
>
> - i915, radeon, amdgpu should be clean per their maintainers.
>
> - vram helpers should be fine, they don't do command submission, so
>really no business holding struct_mutex while doing copy_*_user. But
>I haven't checked them all.
>
> - panfrost seems to dma_resv_lock only in panfrost_job_push, which
>looks clean.
>
> - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(),
>copying from/to userspace happens all in v3d_lookup_bos which is
>outside of the critical section.
>
> - vmwgfx has a bunch of ioctls that do their own copy_*_user:
>- vmw_execbuf_process: First this does some copies in
>  vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself.
>  Then comes the usual ttm reserve/validate sequence, then actual
>  submission/fencing, then unreserving, and finally some more
>  copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of
>  details, but looks all safe.
>- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be
>  seen, seems to only create a fence and copy it out.
>- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be
>  found there.
>Summary: vmwgfx seems to be fine too.
>
> - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the
>copying from userspace before even looking up objects through their
>handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
>
> - qxl only has qxl_execbuffer_ioctl, which calls into
>qxl_process_single_command. There's a lovely comment before the
>__copy_from_user_inatomic that the slowpath should be copied from
>i915, but I guess that never happened. Try not to be unlucky and get
>your CS data evicted between when it's written and the kernel tries
>to read it. The only other copy_from_user is for relocs, but those
>are done before qxl_release_reserve_list(), which seems to be the
>only thing reserving buffers (in the ttm/dma_resv sense) in that
>code. So looks safe.
>
> - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in
>usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this
>everywhere and needs to be fixed up.
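The rule this audit enforces — no copy_*_user while a dma_resv lock is held, because the resulting page fault takes mmap_sem and may recurse into reclaim, which can in turn need dma_resv — can be sketched as a userspace toy (hypothetical names):

```c
#include <stdbool.h>
#include <string.h>

static bool resv_held;

/* Stand-in for copy_from_user(): faulting here while resv_held would
 * be exactly the mmap_sem-vs-dma_resv inversion the audit looks for,
 * so the toy refuses (where real lockdep would splat). */
static bool toy_copy_from_user(void *dst, const void *src, size_t n)
{
    if (resv_held)
        return false;
    memcpy(dst, src, n);
    return true;
}

/* The safe ordering found in v3d/vmwgfx above: all user copies happen
 * strictly before the reservation critical section. */
static bool toy_submit_safe(void *args, const void *uptr, size_t n)
{
    if (!toy_copy_from_user(args, uptr, n))
        return false;
    resv_held = true;            /* dma_resv_lock() */
    /* ... validate buffers, submit, attach fences ... */
    resv_held = false;           /* dma_resv_unlock() */
    return true;
}
```

nouveau_gem_ioctl_pushbuf(), per the audit, does the copy inside the critical section, i.e. the forbidden ordering.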
>
> v2: Thomas pointed out that vmwgfx calls dma_resv_init while it holds a
> dma_resv lock of a different object already. Christian mentioned that
> ttm core does this too for ghost objects. intel-gfx-ci highlighted
> that i915 has similar issues.
>
> Unfortunately we can't do this in the usual module init functions,
> because kernel threads don't have an ->mm - we have to wait around for
> some user thread to do this.
>
> Solution is to spawn a worker (but only once). It's horrible, but it
> works.
>
> v3: We can allocate mm! (Chris). Horrible worker hack out, clean
> initcall solution in.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: Chris Wilson 
> Cc: Thomas Zimmermann 
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Eric Anholt 
> Cc: Dave Airlie 
> Cc: Gerd Hoffmann 
> Cc: Ben Skeggs 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/dma-resv.c | 24 
>   1 file changed, 24 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 42a8f3f11681..d233ef4cf0d7 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -34,6 +34,7 @@
>   
>   #include 
>   #include 
> +#include 
>   
>   /**
>* DOC: Reservation Object Overview
> @@ -95,6 +96,29 @@ static void dma_resv_list_free(struct dma_resv_list *list)
>   kfree_rcu(list, rcu);
>   }
>   
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> +static void dma_resv_lockdep(void)
> +{
> + struct mm_struct *mm = mm_alloc();
> + struct dma_resv obj;
> +
> + if (!mm)
> + return;
> +
> + dma_resv_init(&obj);
> +
> + down_read(&mm->mmap_sem);
> + ww_mutex_lock(&obj.lock, NULL);
> + fs_reclaim_acquire(GFP_KERNEL);
> + fs_reclaim_release(GFP_KERNEL);
> + ww_mutex_unlock(&obj.lock);
> + up_read(&mm->mmap_sem);
> + 
> + mmput(mm);
> +}
> +subsys_initcall(dma_resv_lockdep);
> +#endif
> +
>   /**
>* dma_resv_init - initialize a reservation object
>* @obj: the reservation object
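What the dma_resv_lockdep() initcall above buys: it acquires the locks once at boot in the intended order (mmap_sem -> dma_resv -> fs_reclaim), so lockdep records that ordering as canonical and flags any later inversion immediately, without a real deadlock ever having to happen. A toy userspace sketch of the idea (hypothetical; unlike real lockdep, it only catches direct two-class inversions, not longer cycles):

```c
#include <stdbool.h>

enum { MMAP_SEM, DMA_RESV, FS_RECLAIM, NCLASS };

static bool edge[NCLASS][NCLASS];   /* edge[a][b]: a was held while taking b */
static int held[NCLASS], nheld;

/* Record ordering edges on acquire; refuse if the reverse edge was
 * already observed, i.e. a lock-order inversion. */
static bool toy_acquire(int cls)
{
    for (int i = 0; i < nheld; i++) {
        if (edge[cls][held[i]])
            return false;           /* inversion detected */
        edge[held[i]][cls] = true;
    }
    held[nheld++] = cls;
    return true;
}

static void toy_release(int cls)
{
    (void)cls;                      /* toy: strictly LIFO release */
    nheld--;
}

/* The equivalent of dma_resv_lockdep(): teach the checker the canonical
 * order once at init, so every later inversion is caught immediately. */
static void toy_prime(void)
{
    toy_acquire(MMAP_SEM);
    toy_acquire(DMA_RESV);
    toy_acquire(FS_RECLAIM);
    toy_release(FS_RECLAIM);
    toy_release(DMA_RESV);
    toy_release(MMAP_SEM);
}
```

The fs_reclaim_acquire()/fs_reclaim_release() pair in the real patch plays the same priming role for the memory-reclaim pseudo-lock: it asserts that allocations under dma_resv must be safe against reclaim recursing back into dma_resv.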


Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations

2019-08-21 Thread Koenig, Christian


On 21.08.2019 at 20:28, "Thomas Hellström (VMware)" wrote:
On 8/21/19 8:11 PM, Daniel Vetter wrote:
> On Wed, Aug 21, 2019 at 7:06 PM Thomas Hellström (VMware)
>  wrote:
>> On 8/21/19 6:34 PM, Daniel Vetter wrote:
>>> On Wed, Aug 21, 2019 at 05:54:27PM +0200, Thomas Hellström (VMware) wrote:
 On 8/20/19 4:53 PM, Daniel Vetter wrote:
> Full audit of everyone:
>
> - i915, radeon, amdgpu should be clean per their maintainers.
>
> - vram helpers should be fine, they don't do command submission, so
>  really no business holding struct_mutex while doing copy_*_user. But
>  I haven't checked them all.
>
> - panfrost seems to dma_resv_lock only in panfrost_job_push, which
>  looks clean.
>
> - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(),
>  copying from/to userspace happens all in v3d_lookup_bos which is
>  outside of the critical section.
>
> - vmwgfx has a bunch of ioctls that do their own copy_*_user:
>  - vmw_execbuf_process: First this does some copies in
>vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself.
>Then comes the usual ttm reserve/validate sequence, then actual
>submission/fencing, then unreserving, and finally some more
>copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of
>details, but looks all safe.
>  - vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be
>seen, seems to only create a fence and copy it out.
>  - a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be
>found there.
>  Summary: vmwgfx seems to be fine too.
>
> - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the
>  copying from userspace before even looking up objects through their
>  handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
>
> - qxl only has qxl_execbuffer_ioctl, which calls into
>  qxl_process_single_command. There's a lovely comment before the
>  __copy_from_user_inatomic that the slowpath should be copied from
>  i915, but I guess that never happened. Try not to be unlucky and get
>  your CS data evicted between when it's written and the kernel tries
>  to read it. The only other copy_from_user is for relocs, but those
>  are done before qxl_release_reserve_list(), which seems to be the
>  only thing reserving buffers (in the ttm/dma_resv sense) in that
>  code. So looks safe.
>
> - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in
>  usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this
>  everywhere and needs to be fixed up.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: Chris Wilson 
> Cc: Thomas Zimmermann 
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Eric Anholt 
> Cc: Dave Airlie 
> Cc: Gerd Hoffmann 
> Cc: Ben Skeggs 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 
> Signed-off-by: Daniel Vetter 
> ---
> drivers/dma-buf/dma-resv.c | 12 
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 42a8f3f11681..3edca10d3faf 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -34,6 +34,7 @@
> #include 
> #include 
> +#include 
> /**
>  * DOC: Reservation Object Overview
> @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj)
>  &reservation_seqcount_class);
>  RCU_INIT_POINTER(obj->fence, NULL);
>  RCU_INIT_POINTER(obj->fence_excl, NULL);
> +
> +   if (IS_ENABLED(CONFIG_LOCKDEP)) {
> +   if (current->mm)
> +   down_read(¤t->mm->mmap_sem);
> +   ww_mutex_lock(&obj->lock, NULL);
> +   fs_reclaim_acquire(GFP_KERNEL);
> +   fs_reclaim_release(GFP_KERNEL);
> +   ww_mutex_unlock(&obj->lock);
> +   if (current->mm)
> +   up_read(¤t->mm->mmap_sem);
> +   }
> }
> EXPORT_SYMBOL(dma_resv_init);
 I assume that if this had been easy to do and maintain using only
 lockdep annotations instead of actually acquiring the locks, it would have
 been done?
>>> There's might_lock(), plus a pile of macros, but they don't map obviously,
>>> so there's a pretty good chance I'd accidentally end up with the wrong type
>>> of annotation. Easier to just take the locks quickly, and stuff all of that
>>> into a lockdep-only section to avoid overhead.
>>>
 Otherwise LGTM.

 Reviewed-by: Thomas Hellström 

 Will test this and let you know if it trips on vmwgfx, but it really
 shouldn't.
>>> Thanks, Daniel
>> One thing that strikes me is 

Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-21 Thread Koenig, Christian
Am 21.08.19 um 16:47 schrieb Daniel Vetter:
> On Wed, Aug 21, 2019 at 4:27 PM Thomas Hellström (VMware)
>  wrote:
>> On 8/21/19 4:09 PM, Daniel Vetter wrote:
>>> On Wed, Aug 21, 2019 at 2:47 PM Thomas Hellström (VMware)
>>>  wrote:
 On 8/21/19 2:40 PM, Thomas Hellström (VMware) wrote:
> On 8/20/19 4:53 PM, Daniel Vetter wrote:
> [SNIP]
>> but to keep the mm latency optimization using the RETRY functionality:
> Still no idea why this is needed? All the comments here and the code
> and history seem like they've been about the mmap_sem vs dma_resv
> inversion between driver ioctls and fault handling here. Once that's
> officially fixed there's no reason to play games with retry loops here
> - previously that was necessary because the old ttm_bo_vm_fault had a
> busy spin and that's definitely not nice. If it's needed I think it
> should be a second patch on top, to keep this all clear. I had to
> audit an enormous amount of code, I'd like to make sure I didn't miss
> anything before we start to make this super fancy again. Further
> patches on top is obviously all fine with me.

I think this is just an optimization to not hold the mmap_sem while 
waiting for the dma_resv lock.

I agree that it shouldn't be necessary, but maybe it's a good idea for 
performance. I'm also OK with removing it, because I'm not sure it's 
worth it.

But Thomas noted correctly that we should probably do it in a separate 
patch so that when somebody points out "Hey my system is slower now!" 
he's able to bisect to the change.

Christian.
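
As a side note, the control flow being debated — trylock the reservation while mmap_sem is held, and on contention drop mmap_sem and ask the core MM to retry the fault — can be sketched in userspace C. The names, lock types, and return codes below are illustrative stand-ins (C11 atomic_flag spinlocks instead of rwsem/ww_mutex), not the kernel API:

```c
#include <assert.h>
#include <stdatomic.h>

enum fault_ret { FAULT_DONE, FAULT_RETRY };

typedef atomic_flag lock_t;                    /* stand-in for rwsem/ww_mutex */
#define try_lock(l)  (!atomic_flag_test_and_set(l))
#define unlock(l)    atomic_flag_clear(l)

/* Called with *mmap_sem already held, as a fault handler would be. */
static enum fault_ret handle_fault(lock_t *mmap_sem, lock_t *resv)
{
	if (!try_lock(resv)) {
		/* Contended: drop mmap_sem so we don't block the rest of the
		 * address space while sleeping, and ask the caller to retry
		 * the fault from scratch. */
		unlock(mmap_sem);
		return FAULT_RETRY;
	}
	/* ... fault in the pages while holding the reservation ... */
	unlock(resv);
	return FAULT_DONE;
}
```

The point of the optimization is exactly this early drop of mmap_sem: the handler never sleeps on the reservation while also holding the address-space lock.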

> -Daniel
>
>> Thanks,
>> Thomas
>>

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-20 Thread Koenig, Christian
Am 20.08.19 um 17:41 schrieb Daniel Vetter:
> On Tue, Aug 20, 2019 at 5:34 PM Koenig, Christian
>  wrote:
>> Am 20.08.19 um 17:21 schrieb Daniel Vetter:
>>> On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian
>>>  wrote:
>>>> Am 20.08.19 um 16:53 schrieb Daniel Vetter:
>>>>> With nouveau fixed, all ttm-using drivers have the correct nesting of
>>>>> mmap_sem vs dma_resv, and we can just lock the buffer.
>>>>>
>>>>> Assuming I didn't screw up anything with my audit of course.
>>>>>
>>>>> Signed-off-by: Daniel Vetter 
>>>>> Cc: Christian Koenig 
>>>>> Cc: Huang Rui 
>>>>> Cc: Gerd Hoffmann 
>>>>> Cc: "VMware Graphics" 
>>>>> Cc: Thomas Hellstrom 
>>>> Yes, please. But one more point: you can now remove bo->wu_mutex as well!
>>> Ah right totally forgot about that in my enthusiasm after all the
>>> auditing and fixing nouveau.
>>>
>>>> Apart from that Reviewed-by: Christian König 
>>> Thanks, I already respun the patches, so will be in the next version.
>>> Can you pls also give this a spin on the amdgpu CI, just to make sure
>>> it's all fine? With full lockdep ofc. And then reply with a t-b.
>> I can ask for this on our call tomorrow, but I fear our CI
>> infrastructure is not ready yet.
> I thought you have some internal branch you all commit amdgpu stuff
> for, and then Alex goes and collects the pieces that are ready?

No, that part is correct. The problem is that this branch is not QA 
tested regularly as far as I know.

> Or does that all blow up if you push a patch which touches code outside
> of the dkms?

No, but the problem is related to that.

See the release branches for dkms are separate and indeed QA tested 
regularly.

But changes from amd-staging-drm-next are only cherry-picked into those 
at certain intervals.

Well going to discuss that tomorrow,
Christian.

> -Daniel
>
>> Christian.
>>
>>> Thanks, Daniel
>>>>> ---
>>>>> drivers/gpu/drm/ttm/ttm_bo.c| 34 -
>>>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +
>>>>> include/drm/ttm/ttm_bo_api.h|  1 -
>>>>> 3 files changed, 1 insertion(+), 60 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> index 20ff56f27aa4..a952dd624b06 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>> @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
>>>>> ;
>>>>> }
>>>>> EXPORT_SYMBOL(ttm_bo_swapout_all);
>>>>> -
>>>>> -/**
>>>>> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to 
>>>>> become
>>>>> - * unreserved
>>>>> - *
>>>>> - * @bo: Pointer to buffer
>>>>> - */
>>>>> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
>>>>> -{
>>>>> - int ret;
>>>>> -
>>>>> - /*
>>>>> -  * In the absence of a wait_unlocked API,
>>>>> -  * use the bo::wu_mutex to avoid triggering livelocks due to
>>>>> -  * concurrent use of this function. Note that this use of
>>>>> -  * bo::wu_mutex can go away if we change locking order to
>>>>> -  * mmap_sem -> bo::reserve.
>>>>> -  */
>>>>> - ret = mutex_lock_interruptible(&bo->wu_mutex);
>>>>> - if (unlikely(ret != 0))
>>>>> - return -ERESTARTSYS;
>>>>> - if (!dma_resv_is_locked(bo->base.resv))
>>>>> - goto out_unlock;
>>>>> - ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
>>>>> - if (ret == -EINTR)
>>>>> - ret = -ERESTARTSYS;
>>>>> - if (unlikely(ret != 0))
>>>>> - goto out_unlock;
>>>>> - dma_resv_unlock(bo->base.resv);
>>>>> -
>>>>> -out_unlock:
>>>>> - mutex_unlock(&bo->wu_mutex);
>>>>> - return ret;
>>>>> -}
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
>>>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c

Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-20 Thread Koenig, Christian
Am 20.08.19 um 17:21 schrieb Daniel Vetter:
> On Tue, Aug 20, 2019 at 5:16 PM Koenig, Christian
>  wrote:
>> Am 20.08.19 um 16:53 schrieb Daniel Vetter:
>>> With nouveau fixed, all ttm-using drivers have the correct nesting of
>>> mmap_sem vs dma_resv, and we can just lock the buffer.
>>>
>>> Assuming I didn't screw up anything with my audit of course.
>>>
>>> Signed-off-by: Daniel Vetter 
>>> Cc: Christian Koenig 
>>> Cc: Huang Rui 
>>> Cc: Gerd Hoffmann 
>>> Cc: "VMware Graphics" 
>>> Cc: Thomas Hellstrom 
>> Yes, please. But one more point: you can now remove bo->wu_mutex as well!
> Ah right totally forgot about that in my enthusiasm after all the
> auditing and fixing nouveau.
>
>> Apart from that Reviewed-by: Christian König 
> Thanks, I already respun the patches, so will be in the next version.
> Can you pls also give this a spin on the amdgpu CI, just to make sure
> it's all fine? With full lockdep ofc. And then reply with a t-b.

I can ask for this on our call tomorrow, but I fear our CI 
infrastructure is not ready yet.

Christian.

>
> Thanks, Daniel
>>> ---
>>>drivers/gpu/drm/ttm/ttm_bo.c| 34 -
>>>drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +
>>>include/drm/ttm/ttm_bo_api.h|  1 -
>>>3 files changed, 1 insertion(+), 60 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 20ff56f27aa4..a952dd624b06 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
>>>;
>>>}
>>>EXPORT_SYMBOL(ttm_bo_swapout_all);
>>> -
>>> -/**
>>> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to 
>>> become
>>> - * unreserved
>>> - *
>>> - * @bo: Pointer to buffer
>>> - */
>>> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
>>> -{
>>> - int ret;
>>> -
>>> - /*
>>> -  * In the absence of a wait_unlocked API,
>>> -  * use the bo::wu_mutex to avoid triggering livelocks due to
>>> -  * concurrent use of this function. Note that this use of
>>> -  * bo::wu_mutex can go away if we change locking order to
>>> -  * mmap_sem -> bo::reserve.
>>> -  */
>>> - ret = mutex_lock_interruptible(&bo->wu_mutex);
>>> - if (unlikely(ret != 0))
>>> - return -ERESTARTSYS;
>>> - if (!dma_resv_is_locked(bo->base.resv))
>>> - goto out_unlock;
>>> - ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
>>> - if (ret == -EINTR)
>>> - ret = -ERESTARTSYS;
>>> - if (unlikely(ret != 0))
>>> - goto out_unlock;
>>> - dma_resv_unlock(bo->base.resv);
>>> -
>>> -out_unlock:
>>> - mutex_unlock(&bo->wu_mutex);
>>> - return ret;
>>> -}
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> index 76eedb963693..505e1787aeea 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
>>>&bdev->man[bo->mem.mem_type];
>>>struct vm_area_struct cvma;
>>>
>>> - /*
>>> -  * Work around locking order reversal in fault / nopfn
>>> -  * between mmap_sem and bo_reserve: Perform a trylock operation
>>> -  * for reserve, and if it fails, retry the fault after waiting
>>> -  * for the buffer to become unreserved.
>>> -  */
>>> - if (unlikely(!dma_resv_trylock(bo->base.resv))) {
>>> - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
>>> - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
>>> - ttm_bo_get(bo);
>>> - up_read(&vmf->vma->vm_mm->mmap_sem);
>>> - (void) ttm_bo_wait_unreserved(bo);
>>> - ttm_bo_put(bo);
>>> - }
>>> -
>>> - return VM_FAULT_RETRY;
>>> - }
>>> -
>>> - /*

Re: [Intel-gfx] [PATCH 3/3] drm/ttm: remove ttm_bo_wait_unreserved

2019-08-20 Thread Koenig, Christian
Am 20.08.19 um 16:53 schrieb Daniel Vetter:
> With nouveau fixed, all ttm-using drivers have the correct nesting of
> mmap_sem vs dma_resv, and we can just lock the buffer.
>
> Assuming I didn't screw up anything with my audit of course.
>
> Signed-off-by: Daniel Vetter 
> Cc: Christian Koenig 
> Cc: Huang Rui 
> Cc: Gerd Hoffmann 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 

Yes, please. But one more point: you can now remove bo->wu_mutex as well!

Apart from that Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c| 34 -
>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 26 +
>   include/drm/ttm/ttm_bo_api.h|  1 -
>   3 files changed, 1 insertion(+), 60 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 20ff56f27aa4..a952dd624b06 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1954,37 +1954,3 @@ void ttm_bo_swapout_all(struct ttm_bo_device *bdev)
>   ;
>   }
>   EXPORT_SYMBOL(ttm_bo_swapout_all);
> -
> -/**
> - * ttm_bo_wait_unreserved - interruptible wait for a buffer object to become
> - * unreserved
> - *
> - * @bo: Pointer to buffer
> - */
> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo)
> -{
> - int ret;
> -
> - /*
> -  * In the absence of a wait_unlocked API,
> -  * use the bo::wu_mutex to avoid triggering livelocks due to
> -  * concurrent use of this function. Note that this use of
> -  * bo::wu_mutex can go away if we change locking order to
> -  * mmap_sem -> bo::reserve.
> -  */
> - ret = mutex_lock_interruptible(&bo->wu_mutex);
> - if (unlikely(ret != 0))
> - return -ERESTARTSYS;
> - if (!dma_resv_is_locked(bo->base.resv))
> - goto out_unlock;
> - ret = dma_resv_lock_interruptible(bo->base.resv, NULL);
> - if (ret == -EINTR)
> - ret = -ERESTARTSYS;
> - if (unlikely(ret != 0))
> - goto out_unlock;
> - dma_resv_unlock(bo->base.resv);
> -
> -out_unlock:
> - mutex_unlock(&bo->wu_mutex);
> - return ret;
> -}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 76eedb963693..505e1787aeea 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -125,31 +125,7 @@ static vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
>   &bdev->man[bo->mem.mem_type];
>   struct vm_area_struct cvma;
>   
> - /*
> -  * Work around locking order reversal in fault / nopfn
> -  * between mmap_sem and bo_reserve: Perform a trylock operation
> -  * for reserve, and if it fails, retry the fault after waiting
> -  * for the buffer to become unreserved.
> -  */
> - if (unlikely(!dma_resv_trylock(bo->base.resv))) {
> - if (vmf->flags & FAULT_FLAG_ALLOW_RETRY) {
> - if (!(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) {
> - ttm_bo_get(bo);
> - up_read(&vmf->vma->vm_mm->mmap_sem);
> - (void) ttm_bo_wait_unreserved(bo);
> - ttm_bo_put(bo);
> - }
> -
> - return VM_FAULT_RETRY;
> - }
> -
> - /*
> -  * If we'd want to change locking order to
> -  * mmap_sem -> bo::reserve, we'd use a blocking reserve here
> -  * instead of retrying the fault...
> -  */
> - return VM_FAULT_NOPAGE;
> - }
> + dma_resv_lock(bo->base.resv, NULL);
>   
>   /*
>* Refuse to fault imported pages. This should be handled
> diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
> index 43c4929a2171..6b50e624e3e2 100644
> --- a/include/drm/ttm/ttm_bo_api.h
> +++ b/include/drm/ttm/ttm_bo_api.h
> @@ -765,7 +765,6 @@ ssize_t ttm_bo_io(struct ttm_bo_device *bdev, struct file 
> *filp,
>   int ttm_bo_swapout(struct ttm_bo_global *glob,
>   struct ttm_operation_ctx *ctx);
>   void ttm_bo_swapout_all(struct ttm_bo_device *bdev);
> -int ttm_bo_wait_unreserved(struct ttm_buffer_object *bo);
>   
>   /**
>* ttm_bo_uses_embedded_gem_object - check if the given bo uses the


Re: [Intel-gfx] [PATCH 1/3] dma_resv: prime lockdep annotations

2019-08-20 Thread Koenig, Christian
Am 20.08.19 um 16:53 schrieb Daniel Vetter:
> Full audit of everyone:
>
> - i915, radeon, amdgpu should be clean per their maintainers.
>
> - vram helpers should be fine, they don't do command submission, so
>really no business holding struct_mutex while doing copy_*_user. But
>I haven't checked them all.
>
> - panfrost seems to dma_resv_lock only in panfrost_job_push, which
>looks clean.
>
> - v3d holds dma_resv locks in the tail of its v3d_submit_cl_ioctl(),
>copying from/to userspace happens all in v3d_lookup_bos which is
>outside of the critical section.
>
> - vmwgfx has a bunch of ioctls that do their own copy_*_user:
>- vmw_execbuf_process: First this does some copies in
>  vmw_execbuf_cmdbuf() and also in the vmw_execbuf_process() itself.
>  Then comes the usual ttm reserve/validate sequence, then actual
>  submission/fencing, then unreserving, and finally some more
>  copy_to_user in vmw_execbuf_copy_fence_user. Glossing over tons of
>  details, but looks all safe.
>- vmw_fence_event_ioctl: No ttm_reserve/dma_resv_lock anywhere to be
>  seen, seems to only create a fence and copy it out.
>- a pile of smaller ioctl in vmwgfx_ioctl.c, no reservations to be
>  found there.
>Summary: vmwgfx seems to be fine too.
>
> - virtio: There's virtio_gpu_execbuffer_ioctl, which does all the
>copying from userspace before even looking up objects through their
>handles, so safe. Plus the getparam/getcaps ioctl, also both safe.
>
> - qxl only has qxl_execbuffer_ioctl, which calls into
>qxl_process_single_command. There's a lovely comment before the
>__copy_from_user_inatomic that the slowpath should be copied from
>i915, but I guess that never happened. Try not to be unlucky and get
>your CS data evicted between when it's written and the kernel tries
>to read it. The only other copy_from_user is for relocs, but those
>are done before qxl_release_reserve_list(), which seems to be the
>only thing reserving buffers (in the ttm/dma_resv sense) in that
>code. So looks safe.
>
> - A debugfs file in nouveau_debugfs_pstate_set() and the usif ioctl in
>usif_ioctl() look safe. nouveau_gem_ioctl_pushbuf() otoh breaks this
>everywhere and needs to be fixed up.
>
> Cc: Alex Deucher 
> Cc: Christian König 
> Cc: Chris Wilson 
> Cc: Thomas Zimmermann 
> Cc: Rob Herring 
> Cc: Tomeu Vizoso 
> Cc: Eric Anholt 
> Cc: Dave Airlie 
> Cc: Gerd Hoffmann 
> Cc: Ben Skeggs 
> Cc: "VMware Graphics" 
> Cc: Thomas Hellstrom 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/dma-resv.c | 12 
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 42a8f3f11681..3edca10d3faf 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -34,6 +34,7 @@
>   
>   #include 
>   #include 
> +#include 
>   
>   /**
>* DOC: Reservation Object Overview
> @@ -107,6 +108,17 @@ void dma_resv_init(struct dma_resv *obj)
>   &reservation_seqcount_class);
>   RCU_INIT_POINTER(obj->fence, NULL);
>   RCU_INIT_POINTER(obj->fence_excl, NULL);
> +
> + if (IS_ENABLED(CONFIG_LOCKDEP)) {
> + if (current->mm)
> + down_read(¤t->mm->mmap_sem);
> + ww_mutex_lock(&obj->lock, NULL);
> + fs_reclaim_acquire(GFP_KERNEL);
> + fs_reclaim_release(GFP_KERNEL);
> + ww_mutex_unlock(&obj->lock);
> + if (current->mm)
> + up_read(¤t->mm->mmap_sem);
> + }
>   }
>   EXPORT_SYMBOL(dma_resv_init);
>   


Re: [Intel-gfx] [PATCH v2] dma-fence: Store the timestamp in the same union as the cb_list

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 17:30 schrieb Chris Wilson:
> The timestamp and the cb_list are mutually exclusive, the cb_list can
> only be added to prior to being signaled (and once signaled we drain),
> while the timestamp is only valid upon being signaled. Both the
> timestamp and the cb_list are only valid while the fence is alive, and
> as soon as no references are held can be replaced by the rcu_head.
>
> By reusing the union for the timestamp, we squeeze the base dma_fence
> struct to 64 bytes on x86-64.
>
> v2: Sort the union chronologically
>
> Suggested-by: Christian König 
> Signed-off-by: Chris Wilson 
> Cc: Christian König 

I can't judge the correctness of the vmw and Intel stuff, so only 
Acked-by: Christian König .

> ---
>   drivers/dma-buf/dma-fence.c | 16 +++---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 13 ++--
>   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 +++
>   include/linux/dma-fence.h   | 23 -
>   4 files changed, 37 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 8a6d0250285d..2c136aee3e79 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -129,6 +129,7 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
>   struct dma_fence_cb *cur, *tmp;
> + struct list_head cb_list;
>   
>   lockdep_assert_held(fence->lock);
>   
> @@ -136,16 +137,16 @@ int dma_fence_signal_locked(struct dma_fence *fence)
> &fence->flags)))
>   return -EINVAL;
>   
> + /* Stash the cb_list before replacing it with the timestamp */
> + list_replace(&fence->cb_list, &cb_list);
> +
>   fence->timestamp = ktime_get();
>   set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>   trace_dma_fence_signaled(fence);
>   
> - if (!list_empty(&fence->cb_list)) {
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> - INIT_LIST_HEAD(&cur->node);
> - cur->func(fence, cur);
> - }
> - INIT_LIST_HEAD(&fence->cb_list);
> + list_for_each_entry_safe(cur, tmp, &cb_list, node) {
> + INIT_LIST_HEAD(&cur->node);
> + cur->func(fence, cur);
>   }
>   
>   return 0;
> @@ -231,7 +232,8 @@ void dma_fence_release(struct kref *kref)
>   
>   trace_dma_fence_destroy(fence);
>   
> - if (WARN(!list_empty(&fence->cb_list),
> + if (WARN(!list_empty(&fence->cb_list) &&
> +  !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags),
>"Fence %s:%s:%llx:%llx released with pending signals!\n",
>fence->ops->get_driver_name(fence),
>fence->ops->get_timeline_name(fence),
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 2bc9c460e78d..09c68dda2098 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -114,18 +114,18 @@ __dma_fence_signal__timestamp(struct dma_fence *fence, 
> ktime_t timestamp)
>   }
>   
>   static void
> -__dma_fence_signal__notify(struct dma_fence *fence)
> +__dma_fence_signal__notify(struct dma_fence *fence,
> +const struct list_head *list)
>   {
>   struct dma_fence_cb *cur, *tmp;
>   
>   lockdep_assert_held(fence->lock);
>   lockdep_assert_irqs_disabled();
>   
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> + list_for_each_entry_safe(cur, tmp, list, node) {
>   INIT_LIST_HEAD(&cur->node);
>   cur->func(fence, cur);
>   }
> - INIT_LIST_HEAD(&fence->cb_list);
>   }
>   
>   void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> @@ -187,11 +187,12 @@ void intel_engine_breadcrumbs_irq(struct 
> intel_engine_cs *engine)
>   list_for_each_safe(pos, next, &signal) {
>   struct i915_request *rq =
>   list_entry(pos, typeof(*rq), signal_link);
> -
> - __dma_fence_signal__timestamp(&rq->fence, timestamp);
> + struct list_head cb_list;
>   
>   spin_lock(&rq->lock);
> - __dma_fence_signal__notify(&rq->fence);
> + list_replace(&rq->fence.cb_list, &cb_list);
> + __dma_fence_signal__timestamp(&rq->fence, timestamp);
> + __dma_fence_signal__notify(&rq->fence, &cb_list);
>   spin_unlock(&rq->lock);
>   
>   i915_request_put(rq);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 434dfadb0e52..178a6cd1a06f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -185,6 +185,9 @@ static long vmw_fence_wait(struct dma_fence *f, bool 
> intr, signed long timeout)
>   
>   spin_lock(
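
The struct shrink the commit message describes — the timestamp and the cb_list are never valid at the same time, so they can share storage — can be illustrated with a userspace sketch. The field types are approximations (two pointers for a list_head, a 64-bit integer for ktime_t, assuming an LP64 target); this is not the actual struct dma_fence layout:

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *next, *prev;   /* mirrors struct list_head */
};

/* Before: separate fields, both always present. */
struct fence_before {
	struct list_node cb_list;        /* only used until the fence signals */
	long long timestamp;             /* only valid once the fence signals */
};

/* After: the mutually exclusive fields overlay each other in a union. */
struct fence_after {
	union {
		struct list_node cb_list;
		long long timestamp;
	};
};
```

On an LP64 target the union shaves 8 bytes off the struct (24 down to 16 here), which is the same effect that squeezes the real dma_fence down to 64 bytes on x86-64.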

Re: [Intel-gfx] [PATCH 6/6] dma-fence: Store the timestamp in the same union as the cb_list

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 17:27 schrieb Chris Wilson:
> Quoting Koenig, Christian (2019-08-17 16:20:12)
>> Am 17.08.19 um 16:47 schrieb Chris Wilson:
>>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>>> index 89d96e3e6df6..2c21115b1a37 100644
>>> --- a/drivers/dma-buf/dma-fence.c
>>> +++ b/drivers/dma-buf/dma-fence.c
>>> @@ -129,6 +129,7 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>>>int dma_fence_signal_locked(struct dma_fence *fence)
>>>{
>>>struct dma_fence_cb *cur, *tmp;
>>> + struct list_head cb_list;
>>>
>>>lockdep_assert_held(fence->lock);
>>>
>>> @@ -136,16 +137,16 @@ int dma_fence_signal_locked(struct dma_fence *fence)
>>>  &fence->flags)))
>>>return -EINVAL;
>>>
>>> + /* Stash the cb_list before replacing it with the timestamp */
>>> + list_replace(&fence->cb_list, &cb_list);
>> Stashing the timestamp instead is probably fewer bytes to modify.
> My thinking was to pass the timestamp to the notify callbacks, we need
> to stash the list and set the timestamp first.

I don't see much of a reason for callbacks to use the timestamp; they 
could just call ktime_get() and would most likely get the same, or at 
least a very close, value.

> Nothing that I'm aware of uses the timestamp (just the sync file debug
> which weston was considering using at one point)... So I guess we don't
> care? But I would say we should do that as a separate step in case
> someone does.

Yeah, agree.

Christian.

> -Chris
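
The stash-then-notify ordering Chris describes — detach the callback list first so the union slot is free to hold the timestamp before the callbacks run — can be sketched in userspace C with a minimal list_head analogue. Names are hypothetical, not the kernel code; like the kernel's list_replace(), the old head is left stale rather than reinitialized, which is fine precisely because the timestamp overwrites it:

```c
#include <assert.h>
#include <stddef.h>

struct node { struct node *next, *prev; };

static void list_init(struct node *h) { h->next = h->prev = h; }

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Same semantics as the kernel's list_replace(): the new head takes over
 * the old head's nodes; the old head is left stale. */
static void list_replace(struct node *old, struct node *new_head)
{
	new_head->next = old->next;
	new_head->next->prev = new_head;
	new_head->prev = old->prev;
	new_head->prev->next = new_head;
}

static int fired;
static void run_cb(struct node *n) { (void)n; fired++; }

static void fence_signal(struct node *cb_list, long long *timestamp)
{
	struct node stash;

	list_replace(cb_list, &stash);   /* 1: stash the callbacks           */
	*timestamp = 42;                 /* 2: the union slot cb_list lived
					  *    in is now free for the stamp  */
	for (struct node *cur = stash.next, *tmp = cur->next;
	     cur != &stash; cur = tmp, tmp = cur->next)
		run_cb(cur);             /* 3: notify only after stashing    */
}
```

The ordering matters: writing the timestamp before stashing would corrupt the still-live list in the real union-based layout.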


Re: [Intel-gfx] [PATCH v3] dma-fence: Simply wrap dma_fence_signal_locked with dma_fence_signal

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 17:23 schrieb Chris Wilson:
> Currently dma_fence_signal() tries to avoid the spinlock and only takes
> it if absolutely required to walk the callback list. However, to allow
> for some users to surreptitiously insert lazy signal callbacks that
> do not depend on enabling the signaling mechanism around every fence,
> we always need to notify the callbacks on signaling. As such, we will
> always need to take the spinlock and dma_fence_signal() effectively
> becomes a clone of dma_fence_signal_locked().
>
> v2: Update the test_and_set_bit() before entering the spinlock.
> v3: Drop the test_[and_set]_bit() before the spinlock, it's a caller
> error so expected to be very unlikely.
>
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Daniel Vetter 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/dma-fence.c | 44 ++---
>   1 file changed, 12 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ff0cd6eae766..8a6d0250285d 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -129,25 +129,16 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
>   struct dma_fence_cb *cur, *tmp;
> - int ret = 0;
>   
>   lockdep_assert_held(fence->lock);
>   
> - if (WARN_ON(!fence))
> + if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +   &fence->flags)))
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> - ret = -EINVAL;
> -
> - /*
> -  * we might have raced with the unlocked dma_fence_signal,
> -  * still run through all callbacks
> -  */
> - } else {
> - fence->timestamp = ktime_get();
> - set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
> - }
> + fence->timestamp = ktime_get();
> + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> + trace_dma_fence_signaled(fence);
>   
>   if (!list_empty(&fence->cb_list)) {
>   list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> @@ -156,7 +147,8 @@ int dma_fence_signal_locked(struct dma_fence *fence)
>   }
>   INIT_LIST_HEAD(&fence->cb_list);
>   }
> - return ret;
> +
> + return 0;
>   }
>   EXPORT_SYMBOL(dma_fence_signal_locked);
>   
> @@ -176,28 +168,16 @@ EXPORT_SYMBOL(dma_fence_signal_locked);
>   int dma_fence_signal(struct dma_fence *fence)
>   {
>   unsigned long flags;
> + int ret;
>   
>   if (!fence)
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> - return -EINVAL;
> -
> - fence->timestamp = ktime_get();
> - set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
> -
> - if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
> - struct dma_fence_cb *cur, *tmp;
> + spin_lock_irqsave(fence->lock, flags);
> + ret = dma_fence_signal_locked(fence);
> + spin_unlock_irqrestore(fence->lock, flags);
>   
> - spin_lock_irqsave(fence->lock, flags);
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> - list_del_init(&cur->node);
> - cur->func(fence, cur);
> - }
> - spin_unlock_irqrestore(fence->lock, flags);
> - }
> - return 0;
> + return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_signal);
>   
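
The end state of the patch — dma_fence_signal() reduced to lock, delegate, unlock — has the following shape. This is a userspace sketch with a C11 atomic_flag standing in for the irqsave spinlock and hypothetical names, not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag fence_lock = ATOMIC_FLAG_INIT;
static int signaled;

static void spin_lock(void)   { while (atomic_flag_test_and_set(&fence_lock)) ; }
static void spin_unlock(void) { atomic_flag_clear(&fence_lock); }

/* Caller must hold fence_lock; -1 plays the role of -EINVAL. */
static int fence_signal_locked(void)
{
	if (signaled)
		return -1;               /* already signaled */
	signaled = 1;
	/* ... timestamp + callback notification would go here ... */
	return 0;
}

/* The unlocked entry point becomes a thin wrapper around the
 * _locked variant, so callbacks are always notified under the lock. */
static int fence_signal(void)
{
	int ret;

	spin_lock();
	ret = fence_signal_locked();
	spin_unlock();
	return ret;
}
```

Always taking the lock costs a little on the fast path, but it guarantees that lazily-registered callbacks can never race with signaling — which is the trade the commit message argues for.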


Re: [Intel-gfx] [PATCH 6/6] dma-fence: Store the timestamp in the same union as the cb_list

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 16:47 schrieb Chris Wilson:
> The timestamp and the cb_list are mutually exclusive, the cb_list can
> only be added to prior to being signaled (and once signaled we drain),
> while the timestamp is only valid upon being signaled. Both the
> timestamp and the cb_list are only valid while the fence is alive, and
> as soon as no references are held can be replaced by the rcu_head.
>
> By reusing the union for the timestamp, we squeeze the base dma_fence
> struct to 64 bytes on x86-64.
>
> Suggested-by: Christian König 
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> ---
>   drivers/dma-buf/dma-fence.c | 16 +---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 13 +++--
>   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |  3 +++
>   include/linux/dma-fence.h   | 17 +
>   4 files changed, 32 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 89d96e3e6df6..2c21115b1a37 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -129,6 +129,7 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
>   struct dma_fence_cb *cur, *tmp;
> + struct list_head cb_list;
>   
>   lockdep_assert_held(fence->lock);
>   
> @@ -136,16 +137,16 @@ int dma_fence_signal_locked(struct dma_fence *fence)
> &fence->flags)))
>   return -EINVAL;
>   
> + /* Stash the cb_list before replacing it with the timestamp */
> + list_replace(&fence->cb_list, &cb_list);

Stashing the timestamp instead is probably fewer bytes to modify.

> +
>   fence->timestamp = ktime_get();
>   set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
>   trace_dma_fence_signaled(fence);
>   
> - if (!list_empty(&fence->cb_list)) {
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> - INIT_LIST_HEAD(&cur->node);
> - cur->func(fence, cur);
> - }
> - INIT_LIST_HEAD(&fence->cb_list);
> + list_for_each_entry_safe(cur, tmp, &cb_list, node) {
> + INIT_LIST_HEAD(&cur->node);
> + cur->func(fence, cur);
>   }
>   
>   return 0;
> @@ -234,7 +235,8 @@ void dma_fence_release(struct kref *kref)
>   
>   trace_dma_fence_destroy(fence);
>   
> - if (WARN(!list_empty(&fence->cb_list),
> + if (WARN(!list_empty(&fence->cb_list) &&
> +  !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags),
>"Fence %s:%s:%llx:%llx released with pending signals!\n",
>fence->ops->get_driver_name(fence),
>fence->ops->get_timeline_name(fence),
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
> b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 2bc9c460e78d..09c68dda2098 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -114,18 +114,18 @@ __dma_fence_signal__timestamp(struct dma_fence *fence, 
> ktime_t timestamp)
>   }
>   
>   static void
> -__dma_fence_signal__notify(struct dma_fence *fence)
> +__dma_fence_signal__notify(struct dma_fence *fence,
> +const struct list_head *list)
>   {
>   struct dma_fence_cb *cur, *tmp;
>   
>   lockdep_assert_held(fence->lock);
>   lockdep_assert_irqs_disabled();
>   
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> + list_for_each_entry_safe(cur, tmp, list, node) {
>   INIT_LIST_HEAD(&cur->node);
>   cur->func(fence, cur);
>   }
> - INIT_LIST_HEAD(&fence->cb_list);
>   }
>   
>   void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> @@ -187,11 +187,12 @@ void intel_engine_breadcrumbs_irq(struct 
> intel_engine_cs *engine)
>   list_for_each_safe(pos, next, &signal) {
>   struct i915_request *rq =
>   list_entry(pos, typeof(*rq), signal_link);
> -
> - __dma_fence_signal__timestamp(&rq->fence, timestamp);
> + struct list_head cb_list;
>   
>   spin_lock(&rq->lock);
> - __dma_fence_signal__notify(&rq->fence);
> + list_replace(&rq->fence.cb_list, &cb_list);
> + __dma_fence_signal__timestamp(&rq->fence, timestamp);
> + __dma_fence_signal__notify(&rq->fence, &cb_list);
>   spin_unlock(&rq->lock);
>   
>   i915_request_put(rq);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 434dfadb0e52..178a6cd1a06f 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -185,6 +185,9 @@ static long vmw_fence_wait(struct dma_fence *f, bool 
> intr, signed long timeout)
>   
>   spin_lock(f->lock);
>   
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->flags))

Re: [Intel-gfx] [PATCH 5/6] dma-fence: Simply wrap dma_fence_signal_locked with dma_fence_signal

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 16:47 schrieb Chris Wilson:
> Currently dma_fence_signal() tries to avoid the spinlock and only takes
> it if absolutely required to walk the callback list. However, to allow
> for some users to surreptitiously insert lazy signal callbacks that
> do not depend on enabling the signaling mechanism around every fence,
> we always need to notify the callbacks on signaling. As such, we will
> always need to take the spinlock and dma_fence_signal() effectively
> becomes a clone of dma_fence_signal_locked().
>
> v2: Update the test_and_set_bit() before entering the spinlock.
>
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Daniel Vetter 
> ---
>   drivers/dma-buf/dma-fence.c | 43 +++--
>   1 file changed, 13 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ff0cd6eae766..89d96e3e6df6 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -129,25 +129,16 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
>   struct dma_fence_cb *cur, *tmp;
> - int ret = 0;
>   
>   lockdep_assert_held(fence->lock);
>   
> - if (WARN_ON(!fence))
> + if (unlikely(test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +   &fence->flags)))
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> - ret = -EINVAL;
> -
> - /*
> -  * we might have raced with the unlocked dma_fence_signal,
> -  * still run through all callbacks
> -  */
> - } else {
> - fence->timestamp = ktime_get();
> - set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
> - }
> + fence->timestamp = ktime_get();
> + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> + trace_dma_fence_signaled(fence);
>   
>   if (!list_empty(&fence->cb_list)) {
>   list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> @@ -156,7 +147,8 @@ int dma_fence_signal_locked(struct dma_fence *fence)
>   }
>   INIT_LIST_HEAD(&fence->cb_list);
>   }
> - return ret;
> +
> + return 0;
>   }
>   EXPORT_SYMBOL(dma_fence_signal_locked);
>   
> @@ -176,28 +168,19 @@ EXPORT_SYMBOL(dma_fence_signal_locked);
>   int dma_fence_signal(struct dma_fence *fence)
>   {
>   unsigned long flags;
> + int ret;
>   
>   if (!fence)
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   return -EINVAL;

Actually I think we can completely drop this extra test. Signaling a 
fence twice shouldn't be the fast path we should optimize for.

Apart from that it looks good to me,
Christian.

>   
> - fence->timestamp = ktime_get();
> - set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
> -
> - if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
> - struct dma_fence_cb *cur, *tmp;
> + spin_lock_irqsave(fence->lock, flags);
> + ret = dma_fence_signal_locked(fence);
> + spin_unlock_irqrestore(fence->lock, flags);
>   
> - spin_lock_irqsave(fence->lock, flags);
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> - list_del_init(&cur->node);
> - cur->func(fence, cur);
> - }
> - spin_unlock_irqrestore(fence->lock, flags);
> - }
> - return 0;
> + return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_signal);
>   

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH] dma-buf: Shrink size of struct dma_fence

2019-08-17 Thread Koenig, Christian
Am 17.08.19 um 13:39 schrieb Chris Wilson:
> Rearrange the couple of 32-bit atomics hidden amongst the field of
> pointers that unnecessarily caused the compiler to insert some padding,
> shrinks the size of the base struct dma_fence from 80 to 72 bytes on
> x86-64.
>
> Signed-off-by: Chris Wilson 
> Cc: Christian König 

Reviewed-by: Christian König 

BTW: We could also put the timestamp in the union if we want.

E.g. the cb_list should only be used while the fence is unsignaled, the 
timestamp while it is signaled, and the rcu while it is being freed.

Would save another 8 bytes, bringing us down to 64.

Christian.

> ---
>   include/linux/dma-fence.h | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 404aa748eda6..2ce4d877d33e 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -63,7 +63,7 @@ struct dma_fence_cb;
>* been completed, or never called at all.
>*/
>   struct dma_fence {
> - struct kref refcount;
> + spinlock_t *lock;
>   const struct dma_fence_ops *ops;
>   /* We clear the callback list on kref_put so that by the time we
>* release the fence it is unused. No one should be adding to the 
> cb_list
> @@ -73,11 +73,11 @@ struct dma_fence {
>   struct rcu_head rcu;
>   struct list_head cb_list;
>   };
> - spinlock_t *lock;
>   u64 context;
>   u64 seqno;
> - unsigned long flags;
>   ktime_t timestamp;
> + unsigned long flags;
> + struct kref refcount;
>   int error;
>   };
>   

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 5/4] dma-fence: Have dma_fence_signal call signal_locked

2019-08-16 Thread Koenig, Christian
Am 15.08.19 um 21:29 schrieb Chris Wilson:
> Quoting Chris Wilson (2019-08-15 20:03:13)
>> Quoting Daniel Vetter (2019-08-15 19:48:42)
>>> On Thu, Aug 15, 2019 at 8:46 PM Chris Wilson  
>>> wrote:
 Quoting Daniel Vetter (2019-08-14 18:20:53)
> On Sun, Aug 11, 2019 at 10:15:23AM +0100, Chris Wilson wrote:
>> Now that dma_fence_signal always takes the spinlock to flush the
>> cb_list, simply take the spinlock and call dma_fence_signal_locked() to
>> avoid code repetition.
>>
>> Suggested-by: Christian König 
>> Signed-off-by: Chris Wilson 
>> Cc: Christian König 
> Hm, I think this largely defeats the point of having the lockless signal
> enabling trickery in dma_fence. Maybe that part isn't needed by anyone,
> but feels like a thing that needs a notch more thought. And if we need it,
> maybe a bit more cleanup.
 You mean dma_fence_enable_sw_signaling(). The only user appears to be to
 flush fences, which is actually the intent of always notifying the signal
 cb. By always doing the callbacks, we can avoid installing the interrupt
 and completely saturating CPUs with irqs, instead doing a batch in a
 leisurely timer callback if not flushed naturally.
>>> Yeah I'm not against ditching this,
>> I was just thinking aloud working out what the current use case in ttm
>> was for.
>>
>>> but can't we ditch a lot more if
>>> we just always take the spinlock in those paths now anyways? Kinda not
>>> worth having the complexity anymore.
>> You would be able to drop the was_set from dma_fence_add_callback. Say
>>
>> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
>> index 59ac96ec7ba8..e566445134a2 100644
>> --- a/drivers/dma-buf/dma-fence.c
>> +++ b/drivers/dma-buf/dma-fence.c
>> @@ -345,38 +345,31 @@ int dma_fence_add_callback(struct dma_fence *fence, 
>> struct dma_fence_cb *cb,
>> dma_fence_func_t func)
>>   {
>>  unsigned long flags;
>> -   int ret = 0;
>> -   bool was_set;
>> +   int ret = -ENOENT;
>>
>>  if (WARN_ON(!fence || !func))
>>  return -EINVAL;
>>
>> -   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
>> -   INIT_LIST_HEAD(&cb->node);
>> +   INIT_LIST_HEAD(&cb->node);
>> +   cb->func = func;
>> +
>> +   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>>  return -ENOENT;
>> -   }
>>
>>  spin_lock_irqsave(fence->lock, flags);
>> -
>> -   was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>> -  &fence->flags);
>> -
>> -   if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> -   ret = -ENOENT;
>> -   else if (!was_set && fence->ops->enable_signaling) {
>> +   if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) &&
>> +   !test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>> + &fence->flags)) {
>>  trace_dma_fence_enable_signal(fence);
>>
>> -   if (!fence->ops->enable_signaling(fence)) {
>> +   if (!fence->ops->enable_signaling ||
>> +   fence->ops->enable_signaling(fence)) {
>> +   list_add_tail(&cb->node, &fence->cb_list);
>> +   ret = 0;
>> +   } else {
>>  dma_fence_signal_locked(fence);
>> -   ret = -ENOENT;
>>  }
>>  }
>> -
>> -   if (!ret) {
>> -   cb->func = func;
>> -   list_add_tail(&cb->node, &fence->cb_list);
>> -   } else
>> -   INIT_LIST_HEAD(&cb->node);
>>  spin_unlock_irqrestore(fence->lock, flags);
>>
>>  return ret;
>>
>> Not a whole lot changes in terms of branches and serialising
>> instructions. One less baffling sequence to worry about.
> Fwiw,
> Function old new   delta
> dma_fence_add_callback   338 302 -36

Well, since the sequence number change didn't work out, I'm now working 
on something where I replace the shared fences list with a 
reference-counted version where we also have an active and a staged 
view of the fences.

This removes the whole problem of keeping things alive while inside the 
RCU grace period and also removes the retry looping etc. Additionally, 
we can get rid of most of the memory barriers while adding and 
manipulating fences.

The end result in a totally artificial command submission test case is a 
61% performance improvement. This is so much that I'm actually still 
checking whether it isn't caused by a bug somewhere.

Will probably need a few more weeks till this is done, but yeah, there 
is huge potential for optimization here,
Christian.

>
> Almost certainly more shaving if you stare.
> -Chris


Re: [Intel-gfx] [PATCH] dma-buf: Restore seqlock around dma_resv updates

2019-08-15 Thread Koenig, Christian
Am 14.08.19 um 22:07 schrieb Daniel Vetter:
> On Wed, Aug 14, 2019 at 07:26:44PM +0100, Chris Wilson wrote:
>> Quoting Chris Wilson (2019-08-14 19:24:01)
>>> This reverts
>>> 67c97fb79a7f ("dma-buf: add reservation_object_fences helper")
>>> dd7a7d1ff2f1 ("drm/i915: use new reservation_object_fences helper")
>>> 0e1d8083bddb ("dma-buf: further relax reservation_object_add_shared_fence")
>>> 5d344f58da76 ("dma-buf: nuke reservation_object seq number")
> Oh I didn't realize they landed already.
>
>>> The scenario that defeats simply grabbing a set of shared/exclusive
>>> fences and using them blissfully under RCU is that any of those fences
>>> may be reallocated by a SLAB_TYPESAFE_BY_RCU fence slab cache. In this
>>> scenario, while keeping the rcu_read_lock we need to establish that no
>>> fence was changed in the dma_resv after a read (or full) memory barrier.
> So if I'm reading correctly what Chris is saying here I guess my half
> comment in that other thread pointed at a real oversight. Since I still
> haven't checked and it's too late for real review not more for now.

Yeah, the root of the problem is that I didn't realize that fences 
could be reused while in the RCU grace period.

Need to take a step back and try to come up with a completely new 
approach for synchronization.

>>> Signed-off-by: Chris Wilson 
>>> Cc: Chris Wilson 
>> I said I needed to go lie down, that proves it.
> Yeah I guess we need to wait for Christian to wake up and have a working
> brain.

Well, in that case you will need to wait a few more years for my kids 
to reach school age :)

Cheers,
Christian.

> -Daniel
>


Re: [Intel-gfx] [PATCH] dma-buf: Restore seqlock around dma_resv updates

2019-08-15 Thread Koenig, Christian
Am 14.08.19 um 20:26 schrieb Chris Wilson:
> Quoting Chris Wilson (2019-08-14 19:24:01)
>> This reverts
>> 67c97fb79a7f ("dma-buf: add reservation_object_fences helper")
>> dd7a7d1ff2f1 ("drm/i915: use new reservation_object_fences helper")
>> 0e1d8083bddb ("dma-buf: further relax reservation_object_add_shared_fence")
>> 5d344f58da76 ("dma-buf: nuke reservation_object seq number")
>>
>> The scenario that defeats simply grabbing a set of shared/exclusive
>> fences and using them blissfully under RCU is that any of those fences
>> may be reallocated by a SLAB_TYPESAFE_BY_RCU fence slab cache. In this
>> scenario, while keeping the rcu_read_lock we need to establish that no
>> fence was changed in the dma_resv after a read (or full) memory barrier.
>>
>> Signed-off-by: Chris Wilson 

Acked-by: Christian König 

>> Cc: Chris Wilson 
> I said I needed to go lie down, that proves it.
>
> Cc: Christian König 
>
>> Cc: Daniel Vetter 
>> ---
>>   drivers/dma-buf/dma-buf.c |  31 -
>>   drivers/dma-buf/dma-resv.c| 109 -
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   7 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_busy.c  |  24 ++--
>>   include/linux/dma-resv.h  | 113 --
>>   5 files changed, 175 insertions(+), 109 deletions(-)
>>
>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>> index b3400d6524ab..433d91d710e4 100644
>> --- a/drivers/dma-buf/dma-buf.c
>> +++ b/drivers/dma-buf/dma-buf.c
>> @@ -199,7 +199,7 @@ static __poll_t dma_buf_poll(struct file *file, 
>> poll_table *poll)
>>  struct dma_resv_list *fobj;
>>  struct dma_fence *fence_excl;
>>  __poll_t events;
>> -   unsigned shared_count;
>> +   unsigned shared_count, seq;
>>   
>>  dmabuf = file->private_data;
>>  if (!dmabuf || !dmabuf->resv)
>> @@ -213,8 +213,21 @@ static __poll_t dma_buf_poll(struct file *file, 
>> poll_table *poll)
>>  if (!events)
>>  return 0;
>>   
>> +retry:
>> +   seq = read_seqcount_begin(&resv->seq);
>>  rcu_read_lock();
>> -   dma_resv_fences(resv, &fence_excl, &fobj, &shared_count);
>> +
>> +   fobj = rcu_dereference(resv->fence);
>> +   if (fobj)
>> +   shared_count = fobj->shared_count;
>> +   else
>> +   shared_count = 0;
>> +   fence_excl = rcu_dereference(resv->fence_excl);
>> +   if (read_seqcount_retry(&resv->seq, seq)) {
>> +   rcu_read_unlock();
>> +   goto retry;
>> +   }
>> +
>>  if (fence_excl && (!(events & EPOLLOUT) || shared_count == 0)) {
>>  struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
>>  __poll_t pevents = EPOLLIN;
>> @@ -1144,6 +1157,7 @@ static int dma_buf_debug_show(struct seq_file *s, void 
>> *unused)
>>  struct dma_resv *robj;
>>  struct dma_resv_list *fobj;
>>  struct dma_fence *fence;
>> +   unsigned seq;
>>  int count = 0, attach_count, shared_count, i;
>>  size_t size = 0;
>>   
>> @@ -1174,9 +1188,16 @@ static int dma_buf_debug_show(struct seq_file *s, 
>> void *unused)
>>  buf_obj->name ?: "");
>>   
>>  robj = buf_obj->resv;
>> -   rcu_read_lock();
>> -   dma_resv_fences(robj, &fence, &fobj, &shared_count);
>> -   rcu_read_unlock();
>> +   while (true) {
>> +   seq = read_seqcount_begin(&robj->seq);
>> +   rcu_read_lock();
>> +   fobj = rcu_dereference(robj->fence);
>> +   shared_count = fobj ? fobj->shared_count : 0;
>> +   fence = rcu_dereference(robj->fence_excl);
>> +   if (!read_seqcount_retry(&robj->seq, seq))
>> +   break;
>> +   rcu_read_unlock();
>> +   }
>>   
>>  if (fence)
>>  seq_printf(s, "\tExclusive fence: %s %s 
>> %ssignalled\n",
>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>> index f5142683c851..42a8f3f11681 100644
>> --- a/drivers/dma-buf/dma-resv.c
>> +++ b/drivers/dma-buf/dma-resv.c
>> @@ -49,6 +49,12 @@
>>   DEFINE_WD_CLASS(reservation_ww_class);
>>   EXPORT_SYMBOL(reservation_ww_class);
>>   
>> +struct lock_class_key reservation_seqcount_class;
>> +EXPORT_SYMBOL(reservation_seqcount_class);
>> +
>> +const char reservation_seqcount_string[] = "reservation_seqcount";
>> +EXPORT_SYMBOL(reservation_seqcount_string);
>> +
>>   /**
>>* dma_resv_list_alloc - allocate fence list
>>* @shared_max: number of fences we need space for
>> @@ -96,6 +102,9 @@ static void dma_resv_list_free(struct dma_resv_list *list)
>>   void dma_resv_init(struct dma_resv *obj)
>>   {
>>  ww_mutex_init(&obj->lock, &reservation_ww_class);
>> +
>> +   __seqcount_init(&obj->seq, reserv

Re: [Intel-gfx] [PATCH 4/4] dma-buf: nuke reservation_object seq number

2019-08-14 Thread Koenig, Christian
Am 14.08.19 um 19:48 schrieb Chris Wilson:
> Quoting Chris Wilson (2019-08-14 18:38:20)
>> Quoting Chris Wilson (2019-08-14 18:22:53)
>>> Quoting Chris Wilson (2019-08-14 18:06:18)
 Quoting Chris Wilson (2019-08-14 17:42:48)
> Quoting Daniel Vetter (2019-08-14 16:39:08)
> +   } while (rcu_access_pointer(obj->fence_excl) != *excl);
>> What if someone is real fast (like really real fast) and recycles the
>> exclusive fence so you read the same pointer twice, but everything else
>> changed? reused fence pointer is a lot more likely than seqlock wrapping
>> around.
> It's an exclusive fence. If it is replaced, it must be later than all
> the shared fences (and dependent on them directly or indirectly), and
> so still a consistent snapshot.
 An extension of that argument says we don't even need to loop,

 *list = rcu_dereference(obj->fence);
 *shared_count = *list ? (*list)->shared_count : 0;
 smp_rmb();
 *excl = rcu_dereference(obj->fence_excl);

 Gives a consistent snapshot. It doesn't matter if the fence_excl is
 before or after the shared_list -- if it's after, it's a superset of all
 fences, if it's before, we have a correct list of shared fences the
 earlier fence_excl.
>>> The problem is that the point of the loop is that we do need a check on
>>> the fences after the full memory barrier.
>>>
>>> Thinking of the rationale beaten out for dma_fence_get_excl_rcu_safe()
>>>
>>> We don't have a full memory barrier here, so this cannot be used safely
>>> in light of fence reallocation.
>> i.e. we need to restore the loops in the callers,
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
>> index a2aff1d8290e..f019062c8cd7 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c
>> @@ -110,6 +110,7 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>>   * to report the overall busyness. This is what the wait-ioctl does.
>>   *
>>   */
>> +retry:
>>  dma_resv_fences(obj->base.resv, &excl, &list, &shared_count);
>>
>>  /* Translate the exclusive fence to the READ *and* WRITE engine */
>> @@ -122,6 +123,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>>  args->busy |= busy_check_reader(fence);
>>  }
>>
>> +   smp_rmb();
>> +   if (excl != rcu_access_pointer(obj->base.resv->fence_excl))
>> +   goto retry;
>> +
>>
>> wrap that up as
>>
>> static inline bool
>> dma_resv_fences_retry(struct dma_resv *resv, struct dma_fence *excl)
>> {
>>  smp_rmb();
>>  return excl != rcu_access_pointer(resv->fence_excl);
>> }
> I give up. It's not just the fence_excl that's an issue here.
>
> Any of the shared fences may be replaced after dma_resv_fences()
> and so the original shared fence pointer may be reassigned (even under
> RCU).

Yeah, but this should be harmless. See, fences are always replaced 
either when they are signaled anyway or by later fences from the same 
context.

And existing fences shouldn't be re-used while under RCU, or is anybody 
still using SLAB_TYPESAFE_BY_RCU?

Christian.

>   The only defense against that is the seqcount.
>
> I totally screwed that up.
> -Chris


Re: [Intel-gfx] linux-next: manual merge of the drm-misc tree with the drm and drm-intel trees

2019-08-13 Thread Koenig, Christian
Am 14.08.19 um 04:54 schrieb Stephen Rothwell:
> Hi all,
>
> Today's linux-next merge of the drm-misc tree got a conflict in:
>
>drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>drivers/gpu/drm/i915/i915_vma.c
>drivers/gpu/drm/i915/i915_gem_batch_pool.c
>drivers/gpu/drm/i915/gem/i915_gem_object.c
>drivers/gpu/drm/i915/gt/intel_engine_pool.c
>
> between commits:
>
>a93615f900bd ("drm/i915: Throw away the active object retirement 
> complexity")
>12c255b5dad1 ("drm/i915: Provide an i915_active.acquire callback")
>cd2a4eaf8c79 ("drm/i915: Report resv_obj allocation failure")
>b40d73784ffc ("drm/i915: Replace struct_mutex for batch pool 
> serialisation")
>ab2f7a5c18b5 ("drm/amdgpu: Implement VRAM wipe on release")
>0c159ffef628 ("drm/i915/gem: Defer obj->base.resv fini until RCU callback")
>
> from the drm and drm-intel trees and commit:
>
>52791eeec1d9 ("dma-buf: rename reservation_object to dma_resv")
>
> from the drm-misc tree.
>
> I fixed it up (see below and I added the following merge fix patch) and
> can carry the fix as necessary. This is now fixed as far as linux-next
> is concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the
> conflicting tree to minimise any particularly complex conflicts.
>
> From: Stephen Rothwell 
> Date: Wed, 14 Aug 2019 12:48:39 +1000
> Subject: [PATCH] drm: fix up fallout from "dma-buf: rename reservation_object 
> to dma_resv"
>
> Signed-off-by: Stephen Rothwell 

Yeah, it was to be expected that this causes a bit of trouble.

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 4 ++--
>   drivers/gpu/drm/i915/gem/i915_gem_object.c  | 2 +-
>   drivers/gpu/drm/i915/gt/intel_engine_pool.c | 8 
>   3 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index dfd4aa68c806..6ebe61e14f29 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1242,7 +1242,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
> *bo)
>   !(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE))
>   return;
>   
> - reservation_object_lock(bo->base.resv, NULL);
> + dma_resv_lock(bo->base.resv, NULL);
>   
>   r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence);
>   if (!WARN_ON(r)) {
> @@ -1250,7 +1250,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object 
> *bo)
>   dma_fence_put(fence);
>   }
>   
> - reservation_object_unlock(bo->base.resv);
> + dma_resv_unlock(bo->base.resv);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 3929c3a6b281..67dc61e02c9f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -154,7 +154,7 @@ static void __i915_gem_free_object_rcu(struct rcu_head 
> *head)
>   container_of(head, typeof(*obj), rcu);
>   struct drm_i915_private *i915 = to_i915(obj->base.dev);
>   
> - reservation_object_fini(&obj->base._resv);
> + dma_resv_fini(&obj->base._resv);
>   i915_gem_object_free(obj);
>   
>   GEM_BUG_ON(!atomic_read(&i915->mm.free_count));
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
> index 03d90b49584a..4cd54c569911 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pool.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
> @@ -43,12 +43,12 @@ static int pool_active(struct i915_active *ref)
>   {
>   struct intel_engine_pool_node *node =
>   container_of(ref, typeof(*node), active);
> - struct reservation_object *resv = node->obj->base.resv;
> + struct dma_resv *resv = node->obj->base.resv;
>   int err;
>   
> - if (reservation_object_trylock(resv)) {
> - reservation_object_add_excl_fence(resv, NULL);
> - reservation_object_unlock(resv);
> + if (dma_resv_trylock(resv)) {
> + dma_resv_add_excl_fence(resv, NULL);
> + dma_resv_unlock(resv);
>   }
>   
>   err = i915_gem_object_pin_pages(node->obj);


Re: [Intel-gfx] [PATCH 3/4] dma-fence: Refactor signaling for manual invocation

2019-08-13 Thread Koenig, Christian
Am 13.08.19 um 10:25 schrieb Chris Wilson:
> Quoting Koenig, Christian (2019-08-13 07:59:28)
>> Am 12.08.19 um 16:53 schrieb Chris Wilson:
>>> Quoting Koenig, Christian (2019-08-12 15:50:59)
>>>> Am 12.08.19 um 16:43 schrieb Chris Wilson:
>>>>> Quoting Koenig, Christian (2019-08-12 15:34:32)
>>>>>> Am 10.08.19 um 17:34 schrieb Chris Wilson:
>>>>>>> Move the duplicated code within dma-fence.c into the header for wider
>>>>>>> reuse. In the process apply a small micro-optimisation to only prune the
>>>>>>> fence->cb_list once rather than use list_del on every entry.
>>>>>>>
>>>>>>> Signed-off-by: Chris Wilson 
>>>>>>> Cc: Tvrtko Ursulin 
>>>>>>> ---
>>>>>>>  drivers/dma-buf/Makefile|  10 +-
>>>>>>>  drivers/dma-buf/dma-fence-trace.c   |  28 +++
>>>>>>>  drivers/dma-buf/dma-fence.c |  33 +--
>>>>>>>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  32 +--
>>>>>>>  include/linux/dma-fence-impl.h  |  83 +++
>>>>>>>  include/linux/dma-fence-types.h | 258 
>>>>>>> 
>>>>>>>  include/linux/dma-fence.h   | 228 +
>>>>>> Mhm, I don't really see the value in creating more header files.
>>>>>>
>>>>>> Especially I'm pretty sure that the types should stay in dma-fence.h
>>>>> iirc, when I included the trace.h from dma-fence.h or dma-fence-impl.h
>>>>> without separating the types, amdgpu failed to compile (which is more
>>>>> than likely to be simply due to be first drm in the list to compile).
>>>> Ah, but why do you want to include trace.h in a header in the first place?
>>>>
>>>> That's usually not something I would recommend either.
>>> The problem is that we do emit a tracepoint as part of the sequence I
>>> want to put into the reusable chunk of code.
>> Ok, we are going in circles here. Why do you want to reuse the code then?
> I am reusing the code to avoid fun and games with signal-vs-free.

Yeah, but that doesn't seem to be valid.

See, the dma_fence should more or less be a self-contained object, and 
you now expose quite a bit of its internal functionality in headers.

And creating headers which, when included, make other drivers fail to 
compile sounds like a rather bad idea to me.

Please explain the background a bit more.

Thanks,
Christian.

> -Chris


Re: [Intel-gfx] [PATCH 3/4] dma-fence: Refactor signaling for manual invocation

2019-08-12 Thread Koenig, Christian
Am 12.08.19 um 16:53 schrieb Chris Wilson:
> Quoting Koenig, Christian (2019-08-12 15:50:59)
>> Am 12.08.19 um 16:43 schrieb Chris Wilson:
>>> Quoting Koenig, Christian (2019-08-12 15:34:32)
>>>> Am 10.08.19 um 17:34 schrieb Chris Wilson:
>>>>> Move the duplicated code within dma-fence.c into the header for wider
>>>>> reuse. In the process apply a small micro-optimisation to only prune the
>>>>> fence->cb_list once rather than use list_del on every entry.
>>>>>
>>>>> Signed-off-by: Chris Wilson 
>>>>> Cc: Tvrtko Ursulin 
>>>>> ---
>>>>> drivers/dma-buf/Makefile|  10 +-
>>>>> drivers/dma-buf/dma-fence-trace.c   |  28 +++
>>>>> drivers/dma-buf/dma-fence.c |  33 +--
>>>>> drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  32 +--
>>>>> include/linux/dma-fence-impl.h  |  83 +++
>>>>> include/linux/dma-fence-types.h | 258 
>>>>> include/linux/dma-fence.h   | 228 +
>>>> Mhm, I don't really see the value in creating more header files.
>>>>
>>>> Especially I'm pretty sure that the types should stay in dma-fence.h
>>> iirc, when I included the trace.h from dma-fence.h or dma-fence-impl.h
>>> without separating the types, amdgpu failed to compile (which is more
>>> than likely to be simply due to be first drm in the list to compile).
>> Ah, but why do you want to include trace.h in a header in the first place?
>>
>> That's usually not something I would recommend either.
> The problem is that we do emit a tracepoint as part of the sequence I
> want to put into the reusable chunk of code.

Ok, we are going in circles here. Why do you want to reuse the code then?

Christian.

> -Chris


Re: [Intel-gfx] [PATCH] dma-buf/sw_sync: Synchronize signal vs syncpt free

2019-08-12 Thread Koenig, Christian
Am 12.08.19 um 17:42 schrieb Chris Wilson:
> During release of the syncpt, we remove it from the list of syncpt and
> the tree, but only if it has not already been removed. However, during
> signaling, we first remove the syncpt from the list. So, if we
> concurrently free and signal the syncpt, the free may decide that it is
> not part of the tree and immediately free itself -- meanwhile the
> signaler goes on to use the now-freed data structure.
>
> In particular, we get struck by commit 0e2f733addbf ("dma-buf: make
> dma_fence structure a bit smaller v2") as the cb_list is immediately
> clobbered by the kfree_rcu.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111381
> Fixes: d3862e44daa7 ("dma-buf/sw-sync: Fix locking around sync_timeline 
> lists")
> References: 0e2f733addbf ("dma-buf: make dma_fence structure a bit smaller 
> v2")
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Sean Paul 
> Cc: Gustavo Padovan 
> Cc: Christian König 
> Cc:  # v4.14+

Acked-by: Christian König 

> ---
>   drivers/dma-buf/sw_sync.c | 13 +
>   1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
> index 051f6c2873c7..27b1d549ed38 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -132,17 +132,14 @@ static void timeline_fence_release(struct dma_fence 
> *fence)
>   {
>   struct sync_pt *pt = dma_fence_to_sync_pt(fence);
>   struct sync_timeline *parent = dma_fence_parent(fence);
> + unsigned long flags;
>   
> + spin_lock_irqsave(fence->lock, flags);
>   if (!list_empty(&pt->link)) {
> - unsigned long flags;
> -
> - spin_lock_irqsave(fence->lock, flags);
> - if (!list_empty(&pt->link)) {
> - list_del(&pt->link);
> - rb_erase(&pt->node, &parent->pt_tree);
> - }
> - spin_unlock_irqrestore(fence->lock, flags);
> + list_del(&pt->link);
> + rb_erase(&pt->node, &parent->pt_tree);
>   }
> + spin_unlock_irqrestore(fence->lock, flags);
>   
>   sync_timeline_put(parent);
>   dma_fence_free(fence);


Re: [Intel-gfx] [PATCH 3/4] dma-fence: Refactor signaling for manual invocation

2019-08-12 Thread Koenig, Christian
Am 12.08.19 um 16:43 schrieb Chris Wilson:
> Quoting Koenig, Christian (2019-08-12 15:34:32)
>> Am 10.08.19 um 17:34 schrieb Chris Wilson:
>>> Move the duplicated code within dma-fence.c into the header for wider
>>> reuse. In the process apply a small micro-optimisation to only prune the
>>> fence->cb_list once rather than use list_del on every entry.
>>>
>>> Signed-off-by: Chris Wilson 
>>> Cc: Tvrtko Ursulin 
>>> ---
>>>drivers/dma-buf/Makefile|  10 +-
>>>drivers/dma-buf/dma-fence-trace.c   |  28 +++
>>>drivers/dma-buf/dma-fence.c |  33 +--
>>>drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  32 +--
>>>include/linux/dma-fence-impl.h  |  83 +++
>>>include/linux/dma-fence-types.h | 258 
>>>include/linux/dma-fence.h   | 228 +
>> Mhm, I don't really see the value in creating more header files.
>>
>> Especially I'm pretty sure that the types should stay in dma-fence.h
> iirc, when I included the trace.h from dma-fence.h or dma-fence-impl.h
> without separating the types, amdgpu failed to compile (which is more
> than likely to be simply due to it being the first drm in the list to compile).

Ah, but why do you want to include trace.h in a header in the first place?

That's usually not something I would recommend either.

Christian.

>
> Doing more work wasn't through choice.
> -Chris


Re: [Intel-gfx] [PATCH 3/4] dma-fence: Refactor signaling for manual invocation

2019-08-12 Thread Koenig, Christian
On 10.08.19 at 17:34, Chris Wilson wrote:
> Move the duplicated code within dma-fence.c into the header for wider
> reuse. In the process apply a small micro-optimisation to only prune the
> fence->cb_list once rather than use list_del on every entry.
>
> Signed-off-by: Chris Wilson 
> Cc: Tvrtko Ursulin 
> ---
>   drivers/dma-buf/Makefile|  10 +-
>   drivers/dma-buf/dma-fence-trace.c   |  28 +++
>   drivers/dma-buf/dma-fence.c |  33 +--
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  32 +--
>   include/linux/dma-fence-impl.h  |  83 +++
>   include/linux/dma-fence-types.h | 258 
>   include/linux/dma-fence.h   | 228 +

Mhm, I don't really see the value in creating more header files.

Especially I'm pretty sure that the types should stay in dma-fence.h

Christian.

>   7 files changed, 386 insertions(+), 286 deletions(-)
>   create mode 100644 drivers/dma-buf/dma-fence-trace.c
>   create mode 100644 include/linux/dma-fence-impl.h
>   create mode 100644 include/linux/dma-fence-types.h
>
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
> index e8c7310cb800..65c43778e571 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,6 +1,12 @@
>   # SPDX-License-Identifier: GPL-2.0-only
> -obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
> -  reservation.o seqno-fence.o
> +obj-y := \
> + dma-buf.o \
> + dma-fence.o \
> + dma-fence-array.o \
> + dma-fence-chain.o \
> + dma-fence-trace.o \
> + reservation.o \
> + seqno-fence.o
>   obj-$(CONFIG_SYNC_FILE) += sync_file.o
>   obj-$(CONFIG_SW_SYNC)   += sw_sync.o sync_debug.o
>   obj-$(CONFIG_UDMABUF)   += udmabuf.o
> diff --git a/drivers/dma-buf/dma-fence-trace.c b/drivers/dma-buf/dma-fence-trace.c
> new file mode 100644
> index ..eb6f282be4c0
> --- /dev/null
> +++ b/drivers/dma-buf/dma-fence-trace.c
> @@ -0,0 +1,28 @@
> +/*
> + * Fence mechanism for dma-buf and to allow for asynchronous dma access
> + *
> + * Copyright (C) 2012 Canonical Ltd
> + * Copyright (C) 2012 Texas Instruments
> + *
> + * Authors:
> + * Rob Clark 
> + * Maarten Lankhorst 
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +
> +#include 
> +
> +#define CREATE_TRACE_POINTS
> +#include 
> +
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 59ac96ec7ba8..027a6a894abd 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -14,15 +14,9 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   
> -#define CREATE_TRACE_POINTS
> -#include 
> -
> -EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> -EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> -EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
> -
>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>   static struct dma_fence dma_fence_stub;
>   
> @@ -128,7 +122,6 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>*/
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
> - struct dma_fence_cb *cur, *tmp;
>   int ret = 0;
>   
>   lockdep_assert_held(fence->lock);
> @@ -136,7 +129,7 @@ int dma_fence_signal_locked(struct dma_fence *fence)
>   if (WARN_ON(!fence))
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
> + if (!__dma_fence_signal(fence)) {
>   ret = -EINVAL;
>   
>   /*
> @@ -144,15 +137,10 @@ int dma_fence_signal_locked(struct dma_fence *fence)
>* still run through all callbacks
>*/
>   } else {
> - fence->timestamp = ktime_get();
> - set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
> - trace_dma_fence_signaled(fence);
> + __dma_fence_signal__timestamp(fence, ktime_get());
>   }
>   
> - list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
> - list_del_init(&cur->node);
> - cur->func(fence, cur);
> - }
> + __dma_fence_signal__notify(fence);
>   return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_signal_locked);
> @@ -177,21 +165,14 @@ int dma_fence_signal(struct dma_fence *fence)
>   if (!fence)
>   return -EINVAL;
>   
> - if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence

Re: [Intel-gfx] [PATCH v5] dma-fence: Propagate errors to dma-fence-array container

2019-08-11 Thread Koenig, Christian
On 11.08.19 at 18:25, Chris Wilson wrote:
> When one of the array of fences is signaled, propagate its errors to the
> parent fence-array (keeping the first error to be raised).
>
> v2: Opencode cmpxchg_local to avoid compiler freakout.
> v3: Be careful not to flag an error if we race against signal-on-any.
> v4: Same applies to installing the signal cb.
> v5: Use cmpxchg to only set the error once before using a nifty idea by
> Christian to avoid changing the status after emitting the signal.
>
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Gustavo Padovan 
> Cc: Christian König 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/dma-fence-array.c | 32 ++-
>   1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index 12c6f64c0bc2..d3fbd950be94 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -13,6 +13,8 @@
>   #include 
>   #include 
>   
> +#define PENDING_ERROR 1
> +
>   static const char *dma_fence_array_get_driver_name(struct dma_fence *fence)
>   {
>   return "dma_fence_array";
> @@ -23,10 +25,29 @@ static const char *dma_fence_array_get_timeline_name(struct dma_fence *fence)
>   return "unbound";
>   }
>   
> +static void dma_fence_array_set_pending_error(struct dma_fence_array *array,
> +   int error)
> +{
> + /*
> +  * Propagate the first error reported by any of our fences, but only
> +  * before we ourselves are signaled.
> +  */
> + if (error)
> + cmpxchg(&array->base.error, PENDING_ERROR, error);
> +}
> +
> +static void dma_fence_array_clear_pending_error(struct dma_fence_array *array)
> +{
> + /* Clear the error flag if not actually set. */
> + cmpxchg(&array->base.error, PENDING_ERROR, 0);
> +}
> +
>   static void irq_dma_fence_array_work(struct irq_work *wrk)
>   {
>   struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
>   
> + dma_fence_array_clear_pending_error(array);
> +
>   dma_fence_signal(&array->base);
>   dma_fence_put(&array->base);
>   }
> @@ -38,6 +59,8 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
>   container_of(cb, struct dma_fence_array_cb, cb);
>   struct dma_fence_array *array = array_cb->array;
>   
> + dma_fence_array_set_pending_error(array, f->error);
> +
>   if (atomic_dec_and_test(&array->num_pending))
>   irq_work_queue(&array->work);
>   else
> @@ -63,9 +86,14 @@ static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
>   dma_fence_get(&array->base);
>   if (dma_fence_add_callback(array->fences[i], &cb[i].cb,
>  dma_fence_array_cb_func)) {
> + int error = array->fences[i]->error;
> +
> + dma_fence_array_set_pending_error(array, error);
>   dma_fence_put(&array->base);
> - if (atomic_dec_and_test(&array->num_pending))
> + if (atomic_dec_and_test(&array->num_pending)) {
> + dma_fence_array_clear_pending_error(array);
>   return false;
> + }
>   }
>   }
>   
> @@ -142,6 +170,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
>   atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
>   array->fences = fences;
>   
> + array->base.error = PENDING_ERROR;
> +
>   return array;
>   }
>   EXPORT_SYMBOL(dma_fence_array_create);


Re: [Intel-gfx] [PATCH 5/4] dma-fence: Have dma_fence_signal call signal_locked

2019-08-11 Thread Koenig, Christian
On 11.08.19 at 11:15, Chris Wilson wrote:
> Now that dma_fence_signal always takes the spinlock to flush the
> cb_list, simply take the spinlock and call dma_fence_signal_locked() to
> avoid code repetition.
>
> Suggested-by: Christian König 
> Signed-off-by: Chris Wilson 
> Cc: Christian König 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/dma-fence.c | 32 
>   1 file changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ab4a456bba04..367b71084d34 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -122,26 +122,18 @@ EXPORT_SYMBOL(dma_fence_context_alloc);
>*/
>   int dma_fence_signal_locked(struct dma_fence *fence)
>   {
> - int ret = 0;
> -
> - lockdep_assert_held(fence->lock);
> -
>   if (WARN_ON(!fence))
>   return -EINVAL;
>   
> - if (!__dma_fence_signal(fence)) {
> - ret = -EINVAL;
> + lockdep_assert_held(fence->lock);
>   
> - /*
> -  * we might have raced with the unlocked dma_fence_signal,
> -  * still run through all callbacks
> -  */
> - } else {
> - __dma_fence_signal__timestamp(fence, ktime_get());
> - }
> + if (!__dma_fence_signal(fence))
> + return -EINVAL;
>   
> + __dma_fence_signal__timestamp(fence, ktime_get());
>   __dma_fence_signal__notify(fence);
> - return ret;
> +
> + return 0;
>   }
>   EXPORT_SYMBOL(dma_fence_signal_locked);
>   
> @@ -161,19 +153,19 @@ EXPORT_SYMBOL(dma_fence_signal_locked);
>   int dma_fence_signal(struct dma_fence *fence)
>   {
>   unsigned long flags;
> + int ret;
>   
>   if (!fence)
>   return -EINVAL;
>   
> - if (!__dma_fence_signal(fence))
> - return -EINVAL;
> -
> - __dma_fence_signal__timestamp(fence, ktime_get());
> + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> + return 0;
>   
>   spin_lock_irqsave(fence->lock, flags);
> - __dma_fence_signal__notify(fence);
> + ret = dma_fence_signal_locked(fence);
>   spin_unlock_irqrestore(fence->lock, flags);
> - return 0;
> +
> + return ret;
>   }
>   EXPORT_SYMBOL(dma_fence_signal);
>   


Re: [Intel-gfx] [PATCH v4] dma-fence: Propagate errors to dma-fence-array container

2019-08-11 Thread Koenig, Christian
How about this instead:

Setting array->base.error = 1 during initialization.

Then cmpxchg(array->base.error, 1, error) whenever a fence in the array 
signals.

And then finally cmpxchg(array->base.error, 1, 0) when the array itself 
signals.

Christian.
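
The sentinel scheme sketched above can be modelled with C11 atomics. This is a hedged userspace illustration, not the kernel implementation: atomic_compare_exchange_strong() stands in for the kernel's cmpxchg(), and PENDING_ERROR plays the role of the reserved value 1, which a real fence error can never take (errors are negative, success is 0).

```c
#include <assert.h>
#include <stdatomic.h>

/* sentinel meaning "no error recorded yet" */
#define PENDING_ERROR 1

/* called whenever one of the array's fences signals with error e */
static void set_pending_error(atomic_int *error, int e)
{
	int expected = PENDING_ERROR;

	/* only the first reported error can replace the sentinel */
	if (e)
		atomic_compare_exchange_strong(error, &expected, e);
}

/* called when the array itself signals */
static void clear_pending_error(atomic_int *error)
{
	int expected = PENDING_ERROR;

	/* collapse an untouched sentinel to 0 (success); a recorded
	 * error is left alone */
	atomic_compare_exchange_strong(error, &expected, 0);
}
```

The nice property is that the status can no longer change after the array has signaled: once the final compare-exchange has run, the sentinel is gone and no later fence error can claim the slot.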

On 11.08.19 at 14:21, Chris Wilson wrote:
> When one of the array of fences is signaled, propagate its errors to the
> parent fence-array (keeping the first error to be raised).
>
> v2: Opencode cmpxchg_local to avoid compiler freakout.
> v3: Be careful not to flag an error if we race against signal-on-any.
> v4: Same applies to installing the signal cb.
>
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Gustavo Padovan 
> Cc: Christian König 
> ---
>   drivers/dma-buf/dma-fence-array.c | 37 ++-
>   include/linux/dma-fence-array.h   |  2 ++
>   2 files changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index 12c6f64c0bc2..4d574dff0ba9 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -23,10 +23,37 @@ static const char *dma_fence_array_get_timeline_name(struct dma_fence *fence)
>   return "unbound";
>   }
>   
> +static void dma_fence_array_set_error(struct dma_fence_array *array)
> +{
> + int error = READ_ONCE(array->pending_error);
> +
> + if (!array->base.error && error)
> + dma_fence_set_error(&array->base, error);
> +}
> +
> +static void dma_fence_array_set_pending_error(struct dma_fence_array *array,
> +   int error)
> +{
> + /*
> +  * Propagate the first error reported by any of our fences, but only
> +  * before we ourselves are signaled.
> +  *
> +  * Note that this may race with multiple fences completing
> +  * simultaneously in error, but only one error will be kept, not
> +  * necessarily the first. So long as we propagate an error if any
> +  * fences were in error before we are signaled we should be telling
> +  * an acceptable truth.
> +  */
> + if (error && !array->pending_error)
> + WRITE_ONCE(array->pending_error, error);
> +}
> +
>   static void irq_dma_fence_array_work(struct irq_work *wrk)
>   {
>   struct dma_fence_array *array = container_of(wrk, typeof(*array), work);
>   
> + dma_fence_array_set_error(array);
> +
>   dma_fence_signal(&array->base);
>   dma_fence_put(&array->base);
>   }
> @@ -38,6 +65,8 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
>   container_of(cb, struct dma_fence_array_cb, cb);
>   struct dma_fence_array *array = array_cb->array;
>   
> + dma_fence_array_set_pending_error(array, f->error);
> +
>   if (atomic_dec_and_test(&array->num_pending))
>   irq_work_queue(&array->work);
>   else
> @@ -63,9 +92,14 @@ static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
>   dma_fence_get(&array->base);
>   if (dma_fence_add_callback(array->fences[i], &cb[i].cb,
>  dma_fence_array_cb_func)) {
> + int error = array->fences[i]->error;
> +
> + dma_fence_array_set_pending_error(array, error);
>   dma_fence_put(&array->base);
> - if (atomic_dec_and_test(&array->num_pending))
> + if (atomic_dec_and_test(&array->num_pending)) {
> + dma_fence_array_set_error(array);
>   return false;
> + }
>   }
>   }
>   
> @@ -141,6 +175,7 @@ struct dma_fence_array *dma_fence_array_create(int num_fences,
>   array->num_fences = num_fences;
>   atomic_set(&array->num_pending, signal_on_any ? 1 : num_fences);
>   array->fences = fences;
> + array->pending_error = 0;
>   
>   return array;
>   }
> diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h
> index 303dd712220f..faaf70c524ae 100644
> --- a/include/linux/dma-fence-array.h
> +++ b/include/linux/dma-fence-array.h
> @@ -42,6 +42,8 @@ struct dma_fence_array {
>   atomic_t num_pending;
>   struct dma_fence **fences;
>   
> + int pending_error;
> +
>   struct irq_work work;
>   };
>   


Re: [Intel-gfx] [PATCH 4/4] dma-fence: Always execute signal callbacks

2019-08-11 Thread Koenig, Christian
On 10.08.19 at 17:34, Chris Wilson wrote:
> Allow for some users to surreptitiously insert lazy signal callbacks that
> do not depend on enabling the signaling mechanism around every fence.
> (The cost of interrupts is too darn high, to revive an old meme.)
> This means that we may have a cb_list even if the signaling bit is not
> enabled, so always notify the callbacks.
>
> The cost is that dma_fence_signal() must always acquire the spinlock to
> ensure that the cb_list is flushed.
>
> Signed-off-by: Chris Wilson 
> ---
>   drivers/dma-buf/dma-fence.c | 8 +++-
>   1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 027a6a894abd..ab4a456bba04 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -170,11 +170,9 @@ int dma_fence_signal(struct dma_fence *fence)
>   
>   __dma_fence_signal__timestamp(fence, ktime_get());
>   
> - if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
> - spin_lock_irqsave(fence->lock, flags);
> - __dma_fence_signal__notify(fence);
> - spin_unlock_irqrestore(fence->lock, flags);
> - }
> + spin_lock_irqsave(fence->lock, flags);
> + __dma_fence_signal__notify(fence);
> + spin_unlock_irqrestore(fence->lock, flags);

If we now always grab the spinlock anyway, I suggest merging 
dma_fence_signal() and dma_fence_signal_locked().

Christian.
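
The merge suggested here (and implemented by the 5/4 patch elsewhere in this thread) boils down to the common pattern of an unlocked wrapper around a _locked worker. A minimal userspace sketch, with a pthread mutex standing in for fence->lock, -1 for -EINVAL, and a counter for the cb_list walk; all names are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t fence_lock = PTHREAD_MUTEX_INITIALIZER;
static bool signaled;
static int notify_count;

/* The _locked variant does all the work; the caller holds fence_lock. */
static int fence_signal_locked(void)
{
	if (signaled)
		return -1;	/* already signaled */

	signaled = true;
	notify_count++;		/* "run" the callbacks exactly once */
	return 0;
}

/* The unlocked variant is now just lock + locked variant,
 * with no duplicated signaling body. */
static int fence_signal(void)
{
	int ret;

	pthread_mutex_lock(&fence_lock);
	ret = fence_signal_locked();
	pthread_mutex_unlock(&fence_lock);

	return ret;
}
```

Keeping a single signaling body also removes the risk of the two paths drifting apart, which is exactly the code repetition the patch complains about.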

>   return 0;
>   }
>   EXPORT_SYMBOL(dma_fence_signal);


Re: [Intel-gfx] [PATCH 1/4] dma-fence: Propagate errors to dma-fence-array container

2019-08-11 Thread Koenig, Christian
On 10.08.19 at 17:34, Chris Wilson wrote:
> When one of the array of fences is signaled, propagate its errors to the
> parent fence-array (keeping the first error to be raised).
>
> v2: Opencode cmpxchg_local to avoid compiler freakout.
>
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Gustavo Padovan 
> ---
>   drivers/dma-buf/dma-fence-array.c | 15 +++
>   1 file changed, 15 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index 12c6f64c0bc2..d90675bb4fcc 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -13,6 +13,12 @@
>   #include 
>   #include 
>   
> +static void fence_set_error_once(struct dma_fence *fence, int error)

I would use a dma_fence_array prefix for all names in the file.

> +{
> + if (!fence->error && error)
> + dma_fence_set_error(fence, error);
> +}
> +
>   static const char *dma_fence_array_get_driver_name(struct dma_fence *fence)
>   {
>   return "dma_fence_array";
> @@ -38,6 +44,13 @@ static void dma_fence_array_cb_func(struct dma_fence *f,
>   container_of(cb, struct dma_fence_array_cb, cb);
>   struct dma_fence_array *array = array_cb->array;
>   
> + /*
> +  * Propagate the first error reported by any of our fences, but only
> +  * before we ourselves are signaled.
> +  */
> + if (atomic_read(&array->num_pending) > 0)
> + fence_set_error_once(&array->base, f->error);

That is racy even if you check the atomic, because num_pending can be 
initialized to 1 for signal-on-any arrays as well.

I suggest instead to test in dma_fence_array_set_error_once() whether we 
got an error, and if so to grab the sequence lock and test whether we are 
already signaled.

Christian.

> +
>   if (atomic_dec_and_test(&array->num_pending))
>   irq_work_queue(&array->work);
>   else
> @@ -63,6 +76,8 @@ static bool dma_fence_array_enable_signaling(struct dma_fence *fence)
>   dma_fence_get(&array->base);
>   if (dma_fence_add_callback(array->fences[i], &cb[i].cb,
>  dma_fence_array_cb_func)) {
> + fence_set_error_once(&array->base,
> +  array->fences[i]->error);
>   dma_fence_put(&array->base);
>   if (atomic_dec_and_test(&array->num_pending))
>   return false;


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/6] dma-buf: add dynamic DMA-buf handling v13

2019-08-08 Thread Koenig, Christian
On 08.08.19 at 09:29, Daniel Vetter wrote:
> On Thu, Aug 8, 2019 at 9:09 AM Koenig, Christian
>  wrote:
>> On 07.08.19 at 23:19, Daniel Vetter wrote:
>>> On Wed, Jul 31, 2019 at 10:55:02AM +0200, Daniel Vetter wrote:
>>>> On Thu, Jun 27, 2019 at 09:28:11AM +0200, Christian König wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> those fails look like something random to me and not related to my patch
>>>>> set. Correct?
>>>> First one I looked at has the reservation_obj all over:
>>>>
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13438/fi-cml-u/igt@gem_exec_fe...@basic-busy-default.html
>>>>
>>>> So 5 second guees is ... probably real?
>>>>
>>>> Note that with the entire lmem stuff going on right now there's massive
>>>> discussions about how we're doing resv_obj vs obj->mm.lock the wrong way
>>>> round in i915, so I'm not surprised at all that you managed to trip this.
>>>>
>>>> The way I see it right now is that obj->mm.lock needs to be limited to
>>>> dealing with the i915 shrinker interactions only, and only for i915 native
>>>> objects. And for dma-bufs we need to make sure it's not anywhere in the
>>>> callchain. Unfortunately that's a massive refactor I guess ...
>>> Thought about this some more, aside from just breaking i915 or waiting
>>> until it's refactored (Both not awesome) I think the only option is get
>>> back to the original caching. And figure out whether we really need to
>>> take the direction into account for that, or whether upgrading to
>>> bidirectional unconditionally won't be ok. I think there's only really two
>>> cases where this matters:
>>>
>>> - display drivers using the cma/dma_alloc helpers. Everything is allocated
>>> fully coherent, cpu side wc, no flushing.
>>>
>>> - Everyone else (on platforms where there's actually some flushing going
>>> on) is for rendering gpus, and those always map bidirectional and want
>>> the mapping cached for as long as possible.
>>>
>>> With that we could go back to creating the cached mapping at attach time
>>> and avoid inflicting the reservation object lock to places that would keel
>>> over.
>>>
>>> Thoughts?
>> Actually we had a not so nice internal mail thread with our hardware
>> guys and it looks like we have tons of hardware bugs/exceptions where
>> sometimes PCIe BARs are only readable or only writable. So it turned out
>> that always caching with bidirectional won't work for us either.
>>
>> In addition to that, I'm not sure how i915 actually triggered the issue,
>> because with the current code that shouldn't be possible.
>>
>> But independent of that, I came to the conclusion that we first need to
>> get to a common view of what the fences in the reservation mean, or
>> otherwise the whole stuff here isn't going to work smoothly either.
>>
>> So working on that for now and when that's finished I will come back to
>> this problem here again.
> Yeah makes sense. I think we also need to clarify a bit the existing
> rules around reservatrion_object, dma_fence signaling, and how that
> nests with everything else (like memory allocation/fs_reclaim critical
> sections, or mmap_sem).
>
> Ignore the drivers which just pin everything system memory (mostly
> just socs) I think we have a bunch of groups, and they're all somewhat
> incompatible with each another. Examples:
>
> - old ttm drivers (anything except amdgpu) nest the mmap_sem within
> the reservation_object. That allows you to do copy_*_user while
> holding reservations, simplifying command submission since you don't
> need fallback paths when you take a fault. But means you have this
> awkward trylock in the mmap path with no forward progress guarantee at
> all.
>
> amdgpu fixed that (but left ttm alone), i915 also works like that with
> mmap_sem being the outer lock.

By the way that is incorrect. Neither amdgpu nor radeon uses 
copy_to/from_user while holding the reservation lock.

The last time I checked the only driver still doing that was nouveau.

Maybe time to add a might_lock() so that we will be informed about 
misuse by lockdep?

Christian.
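
The might_lock() idea can be illustrated with a toy userspace checker. This is purely illustrative: lockdep's real might_lock() records a would-be acquisition in the lock dependency graph, whereas here a held flag and one hard-coded rule stand in for that machinery, and all names are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_lock { const char *name; bool held; };

static struct toy_lock reservation = { "reservation_obj", false };
static struct toy_lock mmap_lock   = { "mmap_sem", false };

static bool inversion_detected;

/* annotate: the calling code path may acquire 'lock' */
static void might_lock(struct toy_lock *lock)
{
	/* rule under test: mmap_sem must never nest inside the reservation */
	if (lock == &mmap_lock && reservation.held)
		inversion_detected = true;
}

/* copy_from_user() may fault and therefore may take mmap_sem */
static void toy_copy_from_user(void)
{
	might_lock(&mmap_lock);
	/* ... the actual copy would happen here ... */
}
```

The value of the annotation is that the violation is reported even on runs where no fault actually occurs, which is exactly why lockdep would flag a driver doing copy_to/from_user under the reservation lock.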

>
> - other is reservation_object vs memory allocations. Currently all
> drivers assume you can allocate memory while holding a reservation,
> but i915 gem folks seem to have some plans to change that for i915.
> Which isn't going to work I think, so we 

Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/6] dma-buf: add dynamic DMA-buf handling v13

2019-08-08 Thread Koenig, Christian
On 07.08.19 at 23:19, Daniel Vetter wrote:
> On Wed, Jul 31, 2019 at 10:55:02AM +0200, Daniel Vetter wrote:
>> On Thu, Jun 27, 2019 at 09:28:11AM +0200, Christian König wrote:
>>> Hi Daniel,
>>>
>>> those fails look like something random to me and not related to my patch
>>> set. Correct?
>> First one I looked at has the reservation_obj all over:
>>
>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13438/fi-cml-u/igt@gem_exec_fe...@basic-busy-default.html
>>
>> So 5 second guees is ... probably real?
>>
>> Note that with the entire lmem stuff going on right now there's massive
>> discussions about how we're doing resv_obj vs obj->mm.lock the wrong way
>> round in i915, so I'm not surprised at all that you managed to trip this.
>>
>> The way I see it right now is that obj->mm.lock needs to be limited to
>> dealing with the i915 shrinker interactions only, and only for i915 native
>> objects. And for dma-bufs we need to make sure it's not anywhere in the
>> callchain. Unfortunately that's a massive refactor I guess ...
> Thought about this some more, aside from just breaking i915 or waiting
> until it's refactored (Both not awesome) I think the only option is get
> back to the original caching. And figure out whether we really need to
> take the direction into account for that, or whether upgrading to
> bidirectional unconditionally won't be ok. I think there's only really two
> cases where this matters:
>
> - display drivers using the cma/dma_alloc helpers. Everything is allocated
>fully coherent, cpu side wc, no flushing.
>
> - Everyone else (on platforms where there's actually some flushing going
>on) is for rendering gpus, and those always map bidirectional and want
>the mapping cached for as long as possible.
>
> With that we could go back to creating the cached mapping at attach time
> and avoid inflicting the reservation object lock to places that would keel
> over.
>
> Thoughts?

Actually we had a not so nice internal mail thread with our hardware 
guys and it looks like we have tons of hardware bugs/exceptions where 
sometimes PCIe BARs are only readable or only writable. So it turned out 
that always caching with bidirectional won't work for us either.

In addition to that, I'm not sure how i915 actually triggered the issue, 
because with the current code that shouldn't be possible.

But independent of that, I came to the conclusion that we first need to 
get to a common view of what the fences in the reservation mean, or 
otherwise the whole stuff here isn't going to work smoothly either.

So working on that for now and when that's finished I will come back to 
this problem here again.

Regards,
Christian.


> -Daniel
>


Re: [Intel-gfx] [PATCH 8/8] dma-buf: nuke reservation_object seq number

2019-08-07 Thread Koenig, Christian
On 07.08.19 at 14:19, Chris Wilson wrote:
> Quoting Christian König (2019-08-07 13:08:38)
>> On 06.08.19 at 21:57, Chris Wilson wrote:
>>> If we add to shared-list during the read, ... Hmm, actually we should
>>> return num_list, i.e.
>>>
>>> do {
>>>*list = rcu_dereference(obj->fence);
>>>num_list = *list ? (*list)->count : 0;
>>>smp_rmb();
>>> } while (...)
>>>
>>> return num_list.
>>>
>>> as the relationship between the count and the fence entries is also
>>> determined by the mb in add_shared_fence.
>> I've read that multiple times now, but can't follow. Why should we do this?
>>
>> The only important thing is that the readers see the new fence before
>> the increment of the number of fences.
> Exactly. We order the store so that the fence is in the list before we
> update the count (so that we don't read garbage because the fence isn't
> there yet).
>
> But we don't have the equivalent here for the read once the rmb is
> removed from the seqcount_read_begin/end looping. We need to see the
> update in the same order as was stored, and only use the coherent
> portion of the list.

Ok that makes sense. Going to fix up the code regarding to that.

Christian.

> -Chris
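
The pairing discussed above, a write barrier in add_shared_fence ordering the fence store before the count update and a matching read barrier ordering the count read before the entry reads, maps onto C11 release/acquire semantics. A userspace sketch of the publish/consume pattern; the names, the flat array, and the int "fences" are all stand-ins for illustration:

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_FENCES 8

static int fences[MAX_FENCES];	/* shared fence slots */
static atomic_int count;	/* published element count */

/* writer side: store the entry first, then publish the new count.
 * The release store plays the role of the kernel's write barrier. */
static void add_shared_fence(int fence)
{
	int n = atomic_load_explicit(&count, memory_order_relaxed);

	fences[n] = fence;			/* entry first ... */
	atomic_store_explicit(&count, n + 1,	/* ... then count  */
			      memory_order_release);
}

/* reader side: acquire the count first; entries [0, n) are then
 * guaranteed coherent, matching the smp_rmb() in the loop above. */
static int read_fences(int *out)
{
	int n = atomic_load_explicit(&count, memory_order_acquire);

	for (int i = 0; i < n; i++)
		out[i] = fences[i];
	return n;
}
```

This is exactly the "only use the coherent portion of the list" point: a reader may see an old count, but never a count covering an entry that has not been stored yet.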

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
On 31.07.19 at 02:51, Brian Welty wrote:
[SNIP]
>> +/*
>> + * Memory types for drm_mem_region
>> + */
> #define DRM_MEM_SWAP?
 btw what did you have in mind for this? Since we use shmem we kinda don't
 know whether the BO is actually swapped out or not, at least on the i915
 side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.
>>> Yeah, the problem is not everybody can use shmem. For some use cases you
>>> have to use memory allocated through dma_alloc_coherent().
>>>
>>> So to be able to swap this out you need a separate domain to copy it
>>> from whatever is backing it currently to shmem.
>>>
>>> So we essentially have:
>>> DRM_MEM_SYS_SWAPABLE
>>> DRM_MEM_SYS_NOT_GPU_MAPPED
>>> DRM_MEM_SYS_GPU_MAPPED
>>>
>>> Or something like that.
>> Yeah i915-gem is similar. We opportunistically keep the pages pinned
>> sometimes even if not currently mapped into the (what ttm calls) TT.
>> So I think these three for system memory make sense for us too. I
>> think that's similar (at least in spirit) to the dma_alloc cache you
>> have going on. Maybe instead of the somewhat cumbersome NOT_GPU_MAPPED
>> we could have something like PINNED or so. Although it's not
>> permanently pinned, so maybe that's confusing too.
>>
> Okay, I see now I was far off the mark with what I thought TTM_PL_SYSTEM
> was.  The discussion helped clear up several bits of confusion on my part.
>  From proposed names, I find MAPPED and PINNED slightly confusing.
> In terms of backing store description, maybe these are a little better:
>DRM_MEM_SYS_UNTRANSLATED  (TTM_PL_SYSTEM)
>DRM_MEM_SYS_TRANSLATED(TTM_PL_TT or i915's SYSTEM)

That's still not correct. Let me describe what each of the three stands for:

1. The backing store is a shmem file so the individual pages are 
swapable by the core OS.
2. The backing store is allocated GPU accessible but not currently in use 
by the GPU.
3. The backing store is currently in use by the GPU.

For i915 all three of those are basically the same and you don't really 
need to worry about the distinction much.

But for other drivers that's certainly not true and we need this 
distinction of the backing store of an object.

I'm just not sure how we would handle that for cgroups. From experience 
we certainly want a limit over all three states, but you usually also 
want to limit #3 alone.

And you also want to limit the amount of bytes moved between those 
states because each state transition might have a bandwidth cost 
associated with it.
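
One way to picture this accounting (a limit over all three states, a separate limit on state #3, and a cost charged per transition) is a small sketch. All names are illustrative, not a proposed kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* The three system-memory states described above. */
enum mem_state {
	MEM_SWAPPABLE,	/* 1: shmem-backed, swappable by the core OS */
	MEM_GPU_MAPPED,	/* 2: GPU accessible, not currently in use   */
	MEM_GPU_ACTIVE,	/* 3: currently in use by the GPU            */
	MEM_NR_STATES
};

struct mem_accounting {
	size_t bytes[MEM_NR_STATES];	/* per-state totals, so #3 can be
					 * limited on its own */
	size_t bytes_moved;		/* transition bandwidth accounting */
};

/* a cgroup-style limit would apply to the sum over all three states */
static size_t total_bytes(const struct mem_accounting *a)
{
	size_t sum = 0;

	for (int i = 0; i < MEM_NR_STATES; i++)
		sum += a->bytes[i];
	return sum;
}

/* every state transition is charged to the bandwidth counter */
static void move_bytes(struct mem_accounting *a, enum mem_state from,
		       enum mem_state to, size_t n)
{
	a->bytes[from] -= n;
	a->bytes[to] += n;
	a->bytes_moved += n;
}
```

The point of the third counter is that moving bytes between states does not change the total, yet still consumes bandwidth that one may want to limit independently.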

> Are these allowed to be both overlapping? Or non-overlapping (partitioned)?
> Per Christian's point about removing .start, seems it doesn't need to
> matter.

You should probably completely drop the idea of this being regions.

And we should also rename them to something like drm_mem_domains to make 
that clear.

> Whatever we define for these sub-types, does it make sense for SYSTEM and
> VRAM to each have them defined?

No, absolutely not. VRAM as well as other private memory types are 
completely driver specific.

> I'm unclear how DRM_MEM_SWAP (or DRM_MEM_SYS_SWAPABLE) would get
> configured by driver...  this is a fixed size partition of host memory?
> Or it is a kind of dummy memory region just for swap implementation?

#1 and #2 in my example above should probably not be configured by the 
driver itself.

And yes seeing those as special for state handling sounds like the 
correct approach to me.

Regards,
Christian.

> TTM was clearly missing that, resulting in a whole bunch of extra and
> rather complicated handling.
>
>> +#define DRM_MEM_SYSTEM 0
>> +#define DRM_MEM_STOLEN 1
> I think we need a better naming for that.
>
> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
> least for TTM this is the system memory currently GPU accessible.
 Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
 translation table window into system memory. Not the same thing at all.
>>> Thought so. The closest I have in mind is GTT, but everything else works
>>> as well.
>> Would your GPU_MAPPED above work for TT? I think we'll also need
>> STOLEN, I'm even hearing noises that there's going to be stolen for
>> discrete vram for us ... Also if we expand I guess we need to teach
>> ttm to cope with more, or maybe treat the DRM one as some kind of
>> sub-flavour.
> Daniel, maybe what i915 calls stolen could just be DRM_MEM_RESERVED or
> DRM_MEM_PRIV.  Or maybe can argue it falls into UNTRANSLATED type that
> I suggested above, I'm not sure.
>
> -Brian
>
>
>> -Daniel
>>
>>> Christian.
>>>
 -Daniel

> Thanks for looking into that,
> Christian.
>
> Am 30.07.19 um 02:32 schrieb Brian Welty:
>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>  I fixed the nit with ordering of header includes that Sam noted. ]
>>
>> This RFC series is first implementation of some ideas expressed
>> earlier 

Re: [Intel-gfx] [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
Am 30.07.19 um 11:38 schrieb Daniel Vetter:
> On Tue, Jul 30, 2019 at 08:45:57AM +0000, Koenig, Christian wrote:
>> Yeah, that looks like a good start. Just a couple of random design
>> comments/requirements.
>>
>> First of all please restructure the changes so that you more or less
>> have the following:
>> 1. Adding of the new structures and functionality without any change to
>> existing code.
>> 2. Replacing the existing functionality in TTM and all of its drivers.
>> 3. Replacing the existing functionality in i915.
>>
>> This should make it much easier to review the new functionality when it
>> is not mixed with existing TTM stuff.
>>
>>
>> Second please completely drop the concept of gpu_offset or start of the
>> memory region like here:
>>> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
>> At least on AMD hardware we have the following address spaces which are
>> sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus
>> addresses and physical addresses.
>>
>> Pushing a concept of a general GPU address space into the memory
>> management was a rather bad design mistake in TTM and we should not
>> repeat that here.
>>
>> A region should only consist of a size in bytes and (internal to the
>> region manager) allocations in that region.
>>
>>
>> Third please don't use any CPU or architecture specific types in any
>> data structures:
>>> +struct drm_mem_region {
>>> +   resource_size_t start; /* within GPU physical address space */
>>> +   resource_size_t io_start; /* BAR address (CPU accessible) */
>>> +   resource_size_t size;
>> I know that resource_size_t is mostly 64 bit on modern architectures, but
>> dGPUs are completely separate to the architecture and we always need
>> 64bits here at least for AMD hardware.
>>
>> So this should either be always uint64_t, or something like
>> gpu_resource_size which depends on what the compiled in drivers require
>> if we really need that.
>>
>> And by the way: Please always use bytes for things like sizes and not
>> number of pages, cause page size is again CPU/architecture specific and
>> GPU drivers don't necessarily care about that.
>>
>>
>> And here also a few direct comments on the code:
>>> +   union {
>>> +   struct drm_mm *mm;
>>> +   /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
>>> +   void *priv;
>>> +   };
>> Maybe just always use void *mm here.
>>
>>> +   spinlock_t move_lock;
>>> +   struct dma_fence *move;
>> That is TTM specific and I'm not sure if we want it in the common memory
>> management handling.
>>
>> If we want that here we should probably replace the lock with some rcu
>> and atomic fence pointer exchange first.
>>
>>> +/*
>>> + * Memory types for drm_mem_region
>>> + */
>> #define DRM_MEM_SWAP    ?
> btw what did you have in mind for this? Since we use shmem we kinda don't
> know whether the BO is actually swapped out or not, at least on the i915
> side. So this would be more NOT_CURRENTLY_PINNED_AND_POSSIBLY_SWAPPED_OUT.

Yeah, the problem is that not everybody can use shmem. For some use cases you 
have to use memory allocated through dma_alloc_coherent().

So to be able to swap this out you need a separate domain to copy it 
from whatever is backing it currently to shmem.

So we essentially have:
DRM_MEM_SYS_SWAPABLE
DRM_MEM_SYS_NOT_GPU_MAPPED
DRM_MEM_SYS_GPU_MAPPED

Or something like that.

>> TTM was clearly missing that, resulting in a whole bunch of extra
>> and rather complicated handling.
>>
>>> +#define DRM_MEM_SYSTEM 0
>>> +#define DRM_MEM_STOLEN 1
>> I think we need a better naming for that.
>>
>> STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at
>> least for TTM this is the system memory currently GPU accessible.
> Yup this is wrong, for i915 we use this as stolen, for ttm it's the gpu
> translation table window into system memory. Not the same thing at all.

Thought so. The closest I have in mind is GTT, but everything else works 
as well.

Christian.

> -Daniel
>
>>
>> Thanks for looking into that,
>> Christian.
>>
>> Am 30.07.19 um 02:32 schrieb Brian Welty:
>>> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>>> I fixed the nit with ordering of header includes that Sam noted. ]
>>>
>>> This RFC series is first implementation of some ideas expressed
>>>

Re: [Intel-gfx] [RFC PATCH 0/3] Propose new struct drm_mem_region

2019-07-30 Thread Koenig, Christian
Yeah, that looks like a good start. Just a couple of random design 
comments/requirements.

First of all please restructure the changes so that you more or less 
have the following:
1. Adding of the new structures and functionality without any change to 
existing code.
2. Replacing the existing functionality in TTM and all of its drivers.
3. Replacing the existing functionality in i915.

This should make it much easier to review the new functionality when it 
is not mixed with existing TTM stuff.


Second please completely drop the concept of gpu_offset or start of the 
memory region like here:
> drm_printf(p, "gpu_offset: 0x%08llX\n", man->region.start);
At least on AMD hardware we have the following address spaces which are 
sometimes even partially overlapping: VM, MC, SYSTEM, FB, AGP, XGMI, bus 
addresses and physical addresses.

Pushing a concept of a general GPU address space into the memory 
management was a rather bad design mistake in TTM and we should not 
repeat that here.

A region should only consist of a size in bytes and (internal to the 
region manager) allocations in that region.


Third please don't use any CPU or architecture specific types in any 
data structures:
> +struct drm_mem_region {
> + resource_size_t start; /* within GPU physical address space */
> + resource_size_t io_start; /* BAR address (CPU accessible) */
> + resource_size_t size;

I know that resource_size_t is mostly 64 bit on modern architectures, but 
dGPUs are completely separate to the architecture and we always need 
64bits here at least for AMD hardware.

So this should either be always uint64_t, or something like 
gpu_resource_size which depends on what the compiled in drivers require 
if we really need that.

And by the way: Please always use bytes for things like sizes and not 
number of pages, cause page size is again CPU/architecture specific and 
GPU drivers don't necessarily care about that.


And here also a few direct comments on the code:
> + union {
> + struct drm_mm *mm;
> + /* FIXME (for i915): struct drm_buddy_mm *buddy_mm; */
> + void *priv;
> + };
Maybe just always use void *mm here.

> + spinlock_t move_lock;
> + struct dma_fence *move;

That is TTM specific and I'm not sure if we want it in the common memory 
management handling.

If we want that here we should probably replace the lock with some rcu 
and atomic fence pointer exchange first.

> +/*
> + * Memory types for drm_mem_region
> + */

#define DRM_MEM_SWAP    ?

TTM was clearly missing that, resulting in a whole bunch of extra 
and rather complicated handling.

> +#define DRM_MEM_SYSTEM   0
> +#define DRM_MEM_STOLEN   1

I think we need a better naming for that.

STOLEN sounds way too much like stolen VRAM for integrated GPUs, but at 
least for TTM this is the system memory currently GPU accessible.


Thanks for looking into that,
Christian.

Am 30.07.19 um 02:32 schrieb Brian Welty:
> [ By request, resending to include amd-gfx + intel-gfx.  Since resending,
>I fixed the nit with ordering of header includes that Sam noted. ]
>
> This RFC series is first implementation of some ideas expressed
> earlier on dri-devel [1].
>
> Some of the goals (open for much debate) are:
>- Create common base structure (subclass) for memory regions (patch #1)
>- Create common memory region types (patch #2)
>- Create common set of memory_region function callbacks (based on
>  ttm_mem_type_manager_funcs and intel_memory_regions_ops)
>- Create common helpers that operate on drm_mem_region to be leveraged
>  by both TTM drivers and i915, reducing code duplication
>- Above might start with refactoring ttm_bo_manager.c as these are
>  helpers for using drm_mm's range allocator and could be made to
>  operate on DRM structures instead of TTM ones.
>- Larger goal might be to make LRU management of GEM objects common, and
>  migrate those fields into drm_mem_region and drm_gem_object strucures.
>
> Patches 1-2 implement the proposed struct drm_mem_region and adds
> associated common set of definitions for memory region type.
>
> Patch #3 is update to i915 and is based upon another series which is
> in progress to add vram support to i915 [2].
>
> [1] https://lists.freedesktop.org/archives/dri-devel/2019-June/224501.html
> [2] https://lists.freedesktop.org/archives/intel-gfx/2019-June/203649.html
>
> Brian Welty (3):
>drm: introduce new struct drm_mem_region
>drm: Introduce DRM_MEM defines for specifying type of drm_mem_region
>drm/i915: Update intel_memory_region to use nested drm_mem_region
>
>   drivers/gpu/drm/i915/gem/i915_gem_object.c|  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  2 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c   | 10 ++---
>   drivers/gpu/drm/i915/i915_gpu_error.c |  2 +-
>   drivers/gpu/drm/i915/i915_query.c |  2 +-
>   drivers/gpu/drm/i915/intel_memory_region.c

Re: [Intel-gfx] [PATCH v1 02/11] drm: drop uapi dependency from drm_print.h

2019-07-29 Thread Koenig, Christian
Am 29.07.19 um 16:35 schrieb Sam Ravnborg:
Even then it is so useless (which drm driver is this message for???) that I
 want to remove them all :(
>>> Yeah, agree. I mean it is nice if the core drm functions use a prefix
>>> for debug output.
>>>
>>> But I actually don't see the point for individual drivers.
>> We should all migrate to the versions with device...
> Just to do an xkcd.com/927 I have considered:
>
> drm_err(const struct drm_device *drm, ...)
> drm_info(const struct drm_device *drm, ...)
>
> drm_kms_err(const struct drm_device *drm, ...)
> drm_kms_info(const struct drm_device *drm, ...))

Why not get rid of those completely and just use dev_err, dev_warn, 
pr_err, pr_warn etc?

I mean is it useful to have this extra printing subsystem in DRM while 
the standard Linux one actually does a better job?

Christian.

>
> And so on for vbl, prime, lease, state etc.
>
> Then we could fish out relevant info from the drm device and present
> this nicely.
>
> For the cases where no drm_device is available the fallback is:
> drm_dev_err(const struct drm_device *drm, ...)
> drm_dev_info(const struct drm_device *drm, ...))
>
> We could modify the existing UPPERCASE macros to be placeholders for
> the more reader friendly lower cases variants and base it all on the
> established infrastructure.
>
> But this is just idle thinking, as only a serious patch would stir the
> relevant discussions.
>
> For now this is far from the top of my TODO list.
>
>
>   Sam

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-19 Thread Koenig, Christian
Am 19.07.19 um 11:31 schrieb Daniel Vetter:
> On Fri, Jul 19, 2019 at 09:14:05AM +0000, Koenig, Christian wrote:
>> Am 19.07.19 um 10:57 schrieb Daniel Vetter:
>>> On Tue, Jul 16, 2019 at 04:21:53PM +0200, Christian König wrote:
>>>> Am 26.06.19 um 14:29 schrieb Daniel Vetter:
>>>> [SNIP]
>>> Well my mail here preceded the entire amdkfd eviction_fence discussion.
>>> With that I'm not sure anymore, since we don't really need two approaches
>>> of the same thing. And if the plan is to move amdkfd over from the
>>> eviction_fence trick to using the invalidate callback here, then I think
>>> we might need some clarifications on what exactly that means.
>> Mhm, I thought that this was orthogonal. I mean the invalidation
>> callbacks for a buffer are independent of how the driver is going to
>> use it in the end.
>>
>> Or do you mean that we could use fences and save us from adding just
>> another mechanism for the same signaling thing?
>>
>> That could of course work, but I had the impression that you are not
>> very in favor of that.
> It won't, since you can either use the fence as the invalidate callback,
> or as a fence (for implicit sync). But not both.

Why not both? I mean implicit sync is an artifact you need to handle 
separately anyway.

> But I also don't think it's a good idea to have 2 invalidation mechanisms,
> and since we do have one merged in-tree already would be good to proof
> that the new one is up to the existing challenge.

Ok, how to proceed then? Should I fix up the implicit syncing of fences 
first? I've got a couple of ideas for that as well.

This way we won't have any driver specific definition of what the fences 
in a reservation object mean any more.

> For context: I spend way too much time reading ttm, amdgpu/kfd and i915-gem
> code and my overall impression is that everyone's just running around in
> opposite directions and it's one huge hairball of a mess. With a pretty
> even distribution of equally "eek this is horrible" but also "wow this is
> much better than what the other driver does". So that's why I'm even more
> on the "are we sure this is the right thing" train.

Totally agree on that, but we should also not make the mistake we have 
seen on Windows of trying to force all drivers into a common memory 
management.

That didn't work out that well in the end, and I would rather go down 
the route of moving logic into separate components and falling back to 
driver-specific logic where we find that the common approach doesn't work.

Christian.

> -Daniel


Re: [Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13

2019-07-19 Thread Koenig, Christian
Am 19.07.19 um 10:57 schrieb Daniel Vetter:
> On Tue, Jul 16, 2019 at 04:21:53PM +0200, Christian König wrote:
>> Am 26.06.19 um 14:29 schrieb Daniel Vetter:
>>> On Wed, Jun 26, 2019 at 02:23:05PM +0200, Christian König wrote:
>>> [SNIP]
>>> Also I looked at CI results and stuff, I guess you indeed didn't break the
> world because Chris Wilson has fought back struct_mutex far enough by now.
>>>
>>> Other issue is that since 2 weeks or so our CI started filtering all
>>> lockdep splats, so you need to dig into results yourself.
>>>
>>> btw on that, I think in the end the reservation_obj locking will at best
>>> be consistent between amdgpu and i915: I just remembered that all other
>>> ttm drivers have the mmap_sem vs resv_obj locking the wrong way round, and
>>> the trylock in ttm_bo_fault. So that part alone is hopeless, but I guess
>>> since radeon.ko is abandonware no one will ever fix that.
>>>
>>> So I think in the end it boils down to whether that's good enough to land
>>> it, or not.
>> Well can you give me an rb then? I mean at some point I have to push it to
>> drm-misc-next.
> Well my mail here preceded the entire amdkfd eviction_fence discussion.
> With that I'm not sure anymore, since we don't really need two approaches
> of the same thing. And if the plan is to move amdkfd over from the
> eviction_fence trick to using the invalidate callback here, then I think
> we might need some clarifications on what exactly that means.

Mhm, I thought that this was orthogonal. I mean the invalidation 
callbacks for a buffer are independent of how the driver is going to 
use it in the end.

Or do you mean that we could use fences and save us from adding just 
another mechanism for the same signaling thing?

That could of course work, but I had the impression that you are not 
very in favor of that.

Christian.

> -Daniel
>
>> Christian.
>>
>>


Re: [Intel-gfx] [PATCH v1 11/11] drm: drop uapi dependency from drm_file.h

2019-07-18 Thread Koenig, Christian
Am 18.07.19 um 18:15 schrieb Sam Ravnborg:
> drm_file used drm_magic_t from uapi/drm/drm.h.
> This is a simple unsigned int.
> Just opencode it as such to break the dependency from this header file
> to uapi.

Mhm, why do you want to remove UAPI dependency here in the first place?

I mean the type can't change because it is UAPI, but open-coding it is 
rather bad from a documentation point of view.

Christian.

>
> Signed-off-by: Sam Ravnborg 
> Suggested-by: Daniel Vetter 
> Cc: Sean Paul 
> Cc: Liviu Dudau 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Jani Nikula 
> Cc: Eric Anholt 
> ---
>   include/drm/drm_file.h | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 67af60bb527a..046cd1bf91eb 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -34,8 +34,6 @@
>   #include 
>   #include 
>   
> -#include 
> -
>   #include 
>   
>   struct dma_fence;
> @@ -227,7 +225,7 @@ struct drm_file {
>   struct pid *pid;
>   
>   /** @magic: Authentication magic, see @authenticated. */
> - drm_magic_t magic;
> + unsigned int magic;
>   
>   /**
>* @lhead:


Re: [Intel-gfx] [PATCH v1 02/11] drm: drop uapi dependency from drm_print.h

2019-07-18 Thread Koenig, Christian
Am 18.07.19 um 18:46 schrieb Chris Wilson:
> Quoting Sam Ravnborg (2019-07-18 17:14:58)
>> drm_print.h used DRM_NAME - thus adding a dependency from
>> include/drm/drm_print.h => uapi/drm/drm.h
>>
>> Hardcode the name "drm" to break this dependency.
>> The idea is that there shall be a minimal dependency
>> between include/drm/* and uapi/*
>>
>> Signed-off-by: Sam Ravnborg 
>> Suggested-by: Daniel Vetter 
>> Reviewed-by: Daniel Vetter 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Sean Paul 
>> Cc: David Airlie 
>> Cc: Rob Clark 
>> Cc: Sean Paul 
>> Cc: Chris Wilson 
>> Cc: Daniel Vetter 
>> ---
>>   include/drm/drm_print.h | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
>> index a5d6f2f3e430..760d1bd0eaf1 100644
>> --- a/include/drm/drm_print.h
>> +++ b/include/drm/drm_print.h
>> @@ -32,8 +32,6 @@
>>   #include 
>>   #include 
>>   
>> -#include 
>> -
>>   /**
>>* DOC: print
>>*
>> @@ -287,7 +285,7 @@ void drm_err(const char *format, ...);
>>   /* Macros to make printk easier */
>>   
>>   #define _DRM_PRINTK(once, level, fmt, ...) \
>> -   printk##once(KERN_##level "[" DRM_NAME "] " fmt, ##__VA_ARGS__)
>> +   printk##once(KERN_##level "[drm] " fmt, ##__VA_ARGS__)
> I guess I'm the only one who
>
> #undef DRM_NAME
> #define DRM_NAME i915
>
> just so that I didn't have inane logs?
>
> One might suggest that instead of hardcoding it, follow the pr_fmt()
> pattern and only add "[drm]" for the drm core.
>
> Even then it is so useless (which drm driver is this message for???) that I
> want to remove them all :(

Yeah, agree. I mean it is nice if the core drm functions use a prefix 
for debug output.

But I actually don't see the point for individual drivers.
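For reference, the pr_fmt() pattern Chris mentions lets each subsystem or driver define its prefix once instead of hardcoding "[drm]" in every print macro. A userspace illustration (the real kernel macros live in printk.h; the buffer argument here is only so the result can be checked):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Each subsystem/driver defines the prefix once... */
#define pr_fmt(fmt) "[drm] " fmt

/* ...and every print macro picks it up automatically. Writes into a
 * buffer instead of the kernel log so the output is observable. */
#define pr_err(buf, fmt, ...) sprintf(buf, pr_fmt(fmt), ##__VA_ARGS__)
```

A driver could then redefine pr_fmt to its own name (e.g. "[i915] ") and get attributable log lines for free.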

Christian.

> -Chris


Re: [Intel-gfx] [PATCH v3 2/3] drm: plumb attaching dev thru to prime_pin/unpin

2019-07-16 Thread Koenig, Christian
Am 16.07.19 um 23:37 schrieb Rob Clark:
> From: Rob Clark 
>
> Needed in the following patch for cache operations.

Well have you seen that those callbacks are deprecated?
>* Deprecated hook in favour of &drm_gem_object_funcs.pin.

>* Deprecated hook in favour of &drm_gem_object_funcs.unpin.
>

I would rather say that if you want to extend something, it would be 
better to switch over to the per-GEM-object functions first.

Regards,
Christian.

>
> Signed-off-by: Rob Clark 
> ---
> v3: rebased on drm-tip
>
>   drivers/gpu/drm/drm_gem.c   | 8 
>   drivers/gpu/drm/drm_internal.h  | 4 ++--
>   drivers/gpu/drm/drm_prime.c | 4 ++--
>   drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 4 ++--
>   drivers/gpu/drm/msm/msm_drv.h   | 4 ++--
>   drivers/gpu/drm/msm/msm_gem_prime.c | 4 ++--
>   drivers/gpu/drm/nouveau/nouveau_gem.h   | 4 ++--
>   drivers/gpu/drm/nouveau/nouveau_prime.c | 4 ++--
>   drivers/gpu/drm/qxl/qxl_prime.c | 4 ++--
>   drivers/gpu/drm/radeon/radeon_prime.c   | 4 ++--
>   drivers/gpu/drm/vgem/vgem_drv.c | 4 ++--
>   include/drm/drm_drv.h   | 5 ++---
>   12 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 84689ccae885..af2549c45027 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1215,22 +1215,22 @@ void drm_gem_print_info(struct drm_printer *p, 
> unsigned int indent,
>   obj->dev->driver->gem_print_info(p, indent, obj);
>   }
>   
> -int drm_gem_pin(struct drm_gem_object *obj)
> +int drm_gem_pin(struct drm_gem_object *obj, struct device *dev)
>   {
>   if (obj->funcs && obj->funcs->pin)
>   return obj->funcs->pin(obj);
>   else if (obj->dev->driver->gem_prime_pin)
> - return obj->dev->driver->gem_prime_pin(obj);
> + return obj->dev->driver->gem_prime_pin(obj, dev);
>   else
>   return 0;
>   }
>   
> -void drm_gem_unpin(struct drm_gem_object *obj)
> +void drm_gem_unpin(struct drm_gem_object *obj, struct device *dev)
>   {
>   if (obj->funcs && obj->funcs->unpin)
>   obj->funcs->unpin(obj);
>   else if (obj->dev->driver->gem_prime_unpin)
> - obj->dev->driver->gem_prime_unpin(obj);
> + obj->dev->driver->gem_prime_unpin(obj, dev);
>   }
>   
>   void *drm_gem_vmap(struct drm_gem_object *obj)
> diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
> index 51a2055c8f18..e64090373e3a 100644
> --- a/drivers/gpu/drm/drm_internal.h
> +++ b/drivers/gpu/drm/drm_internal.h
> @@ -133,8 +133,8 @@ void drm_gem_release(struct drm_device *dev, struct 
> drm_file *file_private);
>   void drm_gem_print_info(struct drm_printer *p, unsigned int indent,
>   const struct drm_gem_object *obj);
>   
> -int drm_gem_pin(struct drm_gem_object *obj);
> -void drm_gem_unpin(struct drm_gem_object *obj);
> +int drm_gem_pin(struct drm_gem_object *obj, struct device *dev);
> +void drm_gem_unpin(struct drm_gem_object *obj, struct device *dev);
>   void *drm_gem_vmap(struct drm_gem_object *obj);
>   void drm_gem_vunmap(struct drm_gem_object *obj, void *vaddr);
>   
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 189d980402ad..126860432ff9 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -575,7 +575,7 @@ int drm_gem_map_attach(struct dma_buf *dma_buf,
>   {
>   struct drm_gem_object *obj = dma_buf->priv;
>   
> - return drm_gem_pin(obj);
> + return drm_gem_pin(obj, attach->dev);
>   }
>   EXPORT_SYMBOL(drm_gem_map_attach);
>   
> @@ -593,7 +593,7 @@ void drm_gem_map_detach(struct dma_buf *dma_buf,
>   {
>   struct drm_gem_object *obj = dma_buf->priv;
>   
> - drm_gem_unpin(obj);
> + drm_gem_unpin(obj, attach->dev);
>   }
>   EXPORT_SYMBOL(drm_gem_map_detach);
>   
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> index a05292e8ed6f..67e69a5f00f2 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
> @@ -43,7 +43,7 @@ int etnaviv_gem_prime_mmap(struct drm_gem_object *obj,
>   return etnaviv_obj->ops->mmap(etnaviv_obj, vma);
>   }
>   
> -int etnaviv_gem_prime_pin(struct drm_gem_object *obj)
> +int etnaviv_gem_prime_pin(struct drm_gem_object *obj, struct device *dev)
>   {
>   if (!obj->import_attach) {
>   struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> @@ -55,7 +55,7 @@ int etnaviv_gem_prime_pin(struct drm_gem_object *obj)
>   return 0;
>   }
>   
> -void etnaviv_gem_prime_unpin(struct drm_gem_object *obj)
> +void etnaviv_gem_prime_unpin(struct drm_gem_object *obj, struct device *dev)
>   {
>   if (!obj->import_attach) {
>   struct etnaviv_gem_object *etnaviv_obj = to_etnaviv

Re: [Intel-gfx] [PATCH v6 07/11] drm/i915: add syncobj timeline support

2019-07-15 Thread Koenig, Christian
Hi Lionel,

sorry for the delayed response, I'm just back from vacation.

Am 03.07.19 um 11:17 schrieb Lionel Landwerlin:
> On 03/07/2019 11:56, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2019-07-01 12:34:33)
>>> +   syncobj = drm_syncobj_find(eb->file, 
>>> user_fence.handle);
>>> +   if (!syncobj) {
>>> +   DRM_DEBUG("Invalid syncobj handle provided\n");
>>> +   err = -EINVAL;
>>> +   goto err;
>>> +   }
>>> +
>>> +   if (user_fence.flags & I915_EXEC_FENCE_WAIT) {
>>> +   fence = drm_syncobj_fence_get(syncobj);
>>> +   if (!fence) {
>>> +   DRM_DEBUG("Syncobj handle has no 
>>> fence\n");
>>> +   drm_syncobj_put(syncobj);
>>> +   err = -EINVAL;
>>> +   goto err;
>>> +   }
>>> +
>>> +   err = dma_fence_chain_find_seqno(&fence, 
>>> point);
>> I'm very dubious about chain_find_seqno().
>>
>> It returns -EINVAL if the point is older than the first in the chain --
>> it is in an unknown state, but may be signaled since we remove signaled
>> links from the chain. If we are waiting for an already signaled syncpt,
>> we should not be erring out!
>
>
> You're right, I got this wrong.
>
> We can get fence = NULL if the point is already signaled.
>
> The easiest would be to skip it from the list, or add the stub fence.
>
>
> I guess the CTS got lucky that it always got the point needed before 
> it was garbage collected...

The topmost point is never garbage collected. So IIRC the check is 
actually correct and you should never get NULL here.
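A loose userspace model of the dma_fence_chain_find_seqno() semantics under discussion, where signaled links get garbage collected but the topmost point is always kept; this is a simplification of the real implementation, not a reimplementation of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loose model of a dma_fence_chain timeline: nodes ordered by seqno,
 * linked from the topmost point down via ->prev. */
struct chain_node {
    uint64_t seqno;
    int signaled;
    struct chain_node *prev; /* next lower seqno still in the chain */
};

/* Find the node covering @seqno. Returns NULL both when the point is
 * already signaled and when it does not exist yet; the real code
 * distinguishes the latter with an error return. */
static struct chain_node *
chain_find_seqno(struct chain_node *top, uint64_t seqno)
{
    struct chain_node *node = top;

    /* Points above the topmost entry have not been added yet. */
    if (!top || seqno > top->seqno)
        return NULL;

    while (node->prev && node->prev->seqno >= seqno)
        node = node->prev;

    return node->signaled ? NULL : node;
}
```

Because the topmost node survives garbage collection, a lookup for the latest point always resolves, which is the property relied on above.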

>>
>> Do we allow later requests to insert earlier syncpt into the chain? If
>> so, then the request we wait on here may be woefully inaccurate and
>> quite easily lead to cycles in the fence tree. We have no way of
>> resolving such deadlocks -- we would have to treat this fence as a
>> foreign fence and install a backup timer. Alternatively, we only allow
>> this to return the exact fence for a syncpt, and proxies for the rest.
>
>
> Adding points < latest added point is forbidden.
>
> I wish we enforced it a bit more than what's currently done in 
> drm_syncobj_add_point().
>
> In my view we should :
>
>     - lock the syncobj in get_timeline_fence_array() do the sanity 
> check there.
>
>     - keep the lock until we add the point to the timeline
>
>     - unlock once added
>
>
> That way we would ensure that the application cannot generate invalid 
> timelines and error out if it does.
>
> We could do the same for host signaling in 
> drm_syncobj_timeline_signal_ioctl/drm_syncobj_transfer_to_timeline 
> (there the locking a lot shorter).
>
> That requires holding the lock for longer than maybe other driver 
> would prefer.
>
>
> Ccing Christian who can tell whether that's out of question for AMD.

Yeah, adding the lock was the only other option I could see as well, but 
we intentionally decided against that.

Since we have multiple out sync objects we would need to use a ww_mutex 
as lock here.

That in turn would result in another rather complicated dance for 
deadlock avoidance, something which each driver would have to implement 
correctly.

That doesn't sound like a good idea to me just to improve error checking.

As long as it is only within the same process, userspace could check 
that as well before doing the submission.

Regards,
Christian.



>
>
> Cheers,
>
>
> -Lionel
>
>
>>> +   if (err || !fence) {
>>> +   DRM_DEBUG("Syncobj handle missing 
>>> requested point\n");
>>> +   drm_syncobj_put(syncobj);
>>> +   err = err != 0 ? err : -EINVAL;
>>> +   goto err;
>>> +   }
>>> +   }
>>> +
>>> +   /*
>>> +    * For timeline syncobjs we need to preallocate 
>>> chains for
>>> +    * later signaling.
>>> +    */
>>> +   if (point != 0 && user_fence.flags & 
>>> I915_EXEC_FENCE_SIGNAL) {
>>> +   fences[n].chain_fence =
>>> + kmalloc(sizeof(*fences[n].chain_fence),
>>> +   GFP_KERNEL);
>>> +   if (!fences[n].chain_fence) {
>>> +   dma_fence_put(fence);
>>> +   drm_syncobj_put(syncobj);
>>> +   err = -ENOMEM;
>>> +   DRM_DEBUG("Unable to alloc 
>>> chain_fence\n");
>>> +   goto err;
>>> +   }
>> What happens if we later try to insert two fences for the same syncpt?
>> Should we not reserve the slot in the chain to reject duplicates?
>> -Chris
>>
>


Re: [Intel-gfx] [PATCH 1/2] dma-buf: Expand reservation_list to fill allocation

2019-07-14 Thread Koenig, Christian
Am 12.07.19 um 10:03 schrieb Chris Wilson:
> Since kmalloc() will round up the allocation to the next slab size or
> page, it will normally return a pointer to a memory block bigger than we
> asked for. We can query for the actual size of the allocated block using
> ksize() and expand our variable size reservation_list to take advantage
> of that extra space.
>
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Michel Dänzer 

Reviewed-by: Christian König 

BTW: I was wondering if we shouldn't replace the reservation_object_list 
with a dma_fence_chain.

That would cost us a bit more memory and be slightly slower when 
querying the fences in the container.

But it would be much faster at adding new fences and would massively 
simplify waiting for or returning all fences currently in the container.

Christian.
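For context, the ksize() trick from the patch below can be mimicked in userspace with glibc's malloc_usable_size(): allocate the variable-sized array, then grow the capacity to whatever the allocator actually handed back. The struct here is a simplified stand-in for reservation_object_list:

```c
#include <assert.h>
#include <malloc.h> /* malloc_usable_size(), glibc-specific */
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for reservation_object_list. */
struct fence_list {
    size_t shared_max; /* capacity of shared[] */
    void *shared[];    /* flexible array of fence pointers */
};

static struct fence_list *fence_list_alloc(size_t n)
{
    struct fence_list *list;

    list = malloc(offsetof(struct fence_list, shared) + n * sizeof(void *));
    if (!list)
        return NULL;

    /* Same computation as the patch, with malloc_usable_size()
     * standing in for the kernel's ksize(). */
    list->shared_max =
        (malloc_usable_size(list) - offsetof(struct fence_list, shared)) /
        sizeof(void *);
    return list;
}
```

Since allocators round requests up to a slab/bin size, shared_max typically ends up larger than n at no extra memory cost.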

> ---
>   drivers/dma-buf/reservation.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index a6ac2b3a0185..80ecc1283d15 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -153,7 +153,9 @@ int reservation_object_reserve_shared(struct 
> reservation_object *obj,
>   RCU_INIT_POINTER(new->shared[j++], fence);
>   }
>   new->shared_count = j;
> - new->shared_max = max;
> + new->shared_max =
> + (ksize(new) - offsetof(typeof(*new), shared)) /
> + sizeof(*new->shared);
>   
>   preempt_disable();
>   write_seqcount_begin(&obj->seq);
> @@ -169,7 +171,7 @@ int reservation_object_reserve_shared(struct 
> reservation_object *obj,
>   return 0;
>   
>   /* Drop the references to the signaled fences */
> - for (i = k; i < new->shared_max; ++i) {
> + for (i = k; i < max; ++i) {
>   struct dma_fence *fence;
>   
>   fence = rcu_dereference_protected(new->shared[i],


Re: [Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v12

2019-06-26 Thread Koenig, Christian
Am 26.06.19 um 10:17 schrieb Daniel Vetter:
> On Wed, Jun 26, 2019 at 09:49:03AM +0200, Christian König wrote:
>> Am 25.06.19 um 18:05 schrieb Daniel Vetter:
>>> On Tue, Jun 25, 2019 at 02:46:49PM +0200, Christian König wrote:
 On the exporter side we add optional explicit pinning callbacks. If those
 callbacks are implemented the framework no longer caches sg tables and the
 map/unmap callbacks are always called with the lock of the reservation 
 object
 held.

 On the importer side we add an optional invalidate callback. This callback 
 is
 used by the exporter to inform the importers that their mappings should be
 destroyed as soon as possible.

 This allows the exporter to provide the mappings without the need to pin
 the backing store.

 v2: don't try to invalidate mappings when the callback is NULL,
   lock the reservation obj while using the attachments,
   add helper to set the callback
 v3: move flag for invalidation support into the DMA-buf,
   use new attach_info structure to set the callback
 v4: use importer_priv field instead of mangling exporter priv.
 v5: drop invalidation_supported flag
 v6: squash together with pin/unpin changes
 v7: pin/unpin takes an attachment now
 v8: nuke dma_buf_attachment_(map|unmap)_locked,
   everything is now handled backward compatible
 v9: always cache when export/importer don't agree on dynamic handling
 v10: minimal style cleanup
 v11: drop automatically re-entry avoidance
 v12: rename callback to move_notify

 Signed-off-by: Christian König 
>>> One thing I've forgotten, just stumbled over ttm_bo->moving. For pinned
>>> buffer sharing that's not needed, and I think for dynamic buffer sharing
>>> it's also not going to be the primary requirement. But I think there's two
>>> reasons we should maybe look into moving that from ttm_bo to resv_obj:
>> That is already part of the resv_obj. The difference is that radeon is
>> overwriting the one in the resv_obj during CS while amdgpu isn't.
> I'm confused here: Atm ->moving isn't in resv_obj, there's only one
> exclusive fence. And yes you need to set that every time you do a move
> (because a move needs to be pretty exclusive access). But I'm not seeing a
> separate not_quite_exclusive fence slot for moves.

Yeah, but shouldn't that be sufficient? I mean, why would anybody other 
than the exporter need to know when a BO is moving?

>> So for amdgpu we keep an extra copy in ttm_bo->moving to keep the page fault
>> handler from unnecessary waiting for a fence in radeon.
> Yeah that's the main one. The other is in CS (at least for i915) we could
> run pipeline texture uploads in parallel with other rendering and stuff
> like that (with multiple engines, which atm is also not there yet). I
> think that could be somewhat useful for vk drivers.
>
> Anyway, totally not understand what you wanted to tell me here in these
> two lines.

Sorry, it's 33°C in my home office here and I mixed up radeon/amdgpu in 
the sentence above.

>>> - You sound like you want to use this a lot more, even internally in
>>> amdgpu. For that I do think the sepearate dma_fence just to make sure
>>> the buffer is accessible will be needed in resv_obj.
>>>
>>> - Once we have ->moving I think there's some good chances to extract a bit
>>> of the eviction/pipeline bo move boilerplate from ttm, and maybe use it
>>> in other drivers. i915 could already make use of this in upstream, since
>>> we already pipeline get_pages and clflush of buffers. Ofc once we have
>>> vram support, even more useful.
>> I actually indeed wanted to add more stuff to the reservation object
>> implementation, like finally cleaning up the distinction of readers/writers.
> Hm, more details? Not ringing a bell ...

I'm not yet sure about the details either, so please just wait until I've 
sorted that all out for myself first.

>> And cleaning up the fence removal hack we have in the KFD for freed up BOs.
>> That would also allow for getting rid of this in the long term.
> Hm, what's that for?

When the KFD frees up memory it removes its eviction fence from the 
reservation object instead of setting it as signaled and adding a new 
one to all the other reservation objects still in use.

Christian.

> -Daniel
>
>> Christian.
>>
>>> And doing that slight semantic change is much easier once we only have a
>>> few dynamic exporters/importers. And since it's a pure opt-in optimization
>>> (you can always fall back to the exclusive fence) it should be easy to
>>> roll out.
>>>
>>> Thoughts about moving ttm_bo->moving to resv_obj? Ofc strictly only as a
>>> follow up. Plus maybe with a clearer name :-)
>>>
>>> Cheers, Daniel
>>>


Re: [Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v10

2019-06-25 Thread Koenig, Christian
Am 24.06.19 um 16:41 schrieb Daniel Vetter:
> On Mon, Jun 24, 2019 at 03:58:00PM +0200, Christian König wrote:
>> Am 24.06.19 um 13:23 schrieb Koenig, Christian:
>>> Am 21.06.19 um 18:27 schrieb Daniel Vetter:
>>>
>>>> So I pondered a few ideas while working out:
>>>>
>>>> 1) We drop this filtering. Importer needs to keep track of all its
>>>> mappings and filter out invalidates that aren't for that specific importer
>>>> (either because already invalidated, or not yet mapped, or whatever).
>>>> Feels fragile.
>>>>
>>>> [SNIP]
>>> [SNIP]
>>>
>>> I will take a moment and look into #1 as well, but I still don't see the
>>> need to change anything.
>> That turned out much cleaner than I thought it would be. Essentially it is
>> only a single extra line of code in amdgpu.
>>
>> Going to send that out as a patch set in a minute.
> Yeah I mean kinda expected that because:
> - everything's protected with ww_mutex anyway
> - importer needs to keep track of mappings anways
> So really all it needs to do is not be stupid and add the mapping it just
> created to its tracking while still holding the ww_mutex. Similar on
> invalidate/unmap.
>
> With that all we need is a huge note in the docs that importers need to
> keep track of their mappings and dtrt (with all the examples here spelled
> out in the appropriate kerneldoc). And then I'm happy :-)

Should I also rename the invalidate callback to move_notify? That would 
kind of make sense since we are not necessarily directly invalidating 
mappings.

Christian.

>
> Cheers, Daniel


Re: [Intel-gfx] [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v10

2019-06-24 Thread Koenig, Christian
Am 21.06.19 um 18:27 schrieb Daniel Vetter:
>>> Your scenario here is new, and iirc my suggestion back then was to
>>> count the number of pending mappings so you don't go around calling
>>> ->invalidate on mappings that don't exist.
>> Well the key point is we don't call invalidate on mappings, but we call
>> invalidate on attachments.
>>
>> When the invalidate on an attachment is received all the importer should at
>> least start to tear down all mappings.
> Hm, so either we invalidate mappings instead (pretty big change for
> dma-buf, but maybe worth it). Or importers need to deal with invalidate on
> stuff they're don't even have mapped anywhere anyway.

Actually, I don't see a problem with this, but see below.

>> [SNIP]
>>> - your scenario, where you call ->invalidate on an attachment which
>>> doesn't have a mapping. I'll call that very lazy accounting, feels
>>> like a bug :-) It's also very easy to fix by keeping track who
>>> actually has a mapping, and then you fix it everywhere, not just for
>>> the specific case of a recursion into the same caller.
>> Yeah, exactly. Unfortunately it's not so easy to handle as just a counter.
>>
>> When somebody unmaps a mapping you need to know if that is already
>> invalidated or not. And this requires tracking of each mapping.
> Yeah we'd need to track mappings. Well, someone has to track mappings, and
> atm it seems to be a mix of both importer and exporter (and dma-buf.c).

Maybe I'm missing something, but I don't see the mix?

Only the importer is responsible for tracking mappings, e.g. the importer 
calls dma_buf_map_attachment() when it needs a mapping and calls 
dma_buf_unmap_attachment() when it is done with the mapping.

In between those two calls neither the exporter nor the DMA-buf framework 
cares about which mappings currently exist. And apart from debugging I 
actually don't see a reason why they should.

>> [SNIP]
>>> But I guess there's other fixes too possible.
>>>
>>> Either way none of this is about recursion, I think the recursive case
>>> is simply the one where you've hit this already. Drivers will have to
>>> handle all these additional ->invalidates no matter what with your
>>> current proposal. After all the point here is that the exporter can
>>> move the buffers around whenever it feels like, for whatever reasons.
>> The recursion case is still perfectly valid. In the importer I need to
>> ignore invalidations which are caused by creating a mapping.
>>
>> Otherwise it is perfectly possible that we invalidate a mapping because of
>> its creation which will result in creating a new one
>>
>> So even if you fix up your mapping case, you absolutely still need this to
>> prevent recursion :)
> Hm, but if we stop tracking attachments and instead start tracking
> mappings, then how is that possible:

Yeah, but why should we do this? I don't see a benefit here. Importers 
just create/destroy mappings as they need them.

> 1. importer has no mappings
> 2. importer creates attachment. still no mapping
> 3. importer calls dma_buf_attach_map_sg, still no mapping at this point
> 4. we call into the exporter implementation. still no mapping
> 5. exporter does whatever it does. still no mapping
> 6. exporter finishes. conceptually from the dma-buf pov, _this_ is where
> the mapping starts to exist.
> 7. invalidates (hey the exporter maybe changed its mind!) are totally
> fine, and will be serialized with ww_mutex.
>
> So I kinda don understand why the exporter here is allowed to call
> invalidate too early (the mapping doesn't exist yet from dma-buf pov), and
> dma-buf needs to filter it out.
>
> But anywhere else where we might call ->invalidate and there's not yet a
> mapping (again purely from dma-buf pov), there the importer is supposed to
> do the filter.

Maybe this becomes clearer if we call the callback "moved" instead of 
"invalidated"?

I mean this is actually all about the exporter informing the importer 
that the DMA-buf in question is moving to a new location.

That we need to create a new mapping and destroy the old one at some 
point is an implementation detail on the importer side.

I mean, the long-term idea is to use this as notification that a buffer 
is moving inside the same driver as well. And in that particular case I 
actually don't think that we would create mappings at all. Thinking more 
about it, this is actually a really good argument in favor of the 
implementation as it currently stands.

> Someone needs to keep track of all this, and I want clear
> responsibilities. What they are exactly is not that important.

Clear responsibilities is indeed a good idea.

>>> We could also combine the last two with some helpers, e.g. if your
>>> exporter really expects importers to delay the unmap until it's no
>>> longer in use, then we could do a small helper which puts all these
>>> unmaps onto a list with a worker. But I think you want to integrate
>>> that into your exporters lru management directly.
>>>
 So this is just the most defensive thing I

Re: [Intel-gfx] [PATCH 09/59] drm/prime: Align gem_prime_export with obj_funcs.export

2019-06-17 Thread Koenig, Christian
Am 14.06.19 um 22:35 schrieb Daniel Vetter:
> The idea is that gem_prime_export is deprecated in favor of
> obj_funcs.export. That's much easier to do if both have matching
> function signatures.
>
> Signed-off-by: Daniel Vetter 
> Cc: Russell King 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Sean Paul 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Zhenyu Wang 
> Cc: Zhi Wang 
> Cc: Jani Nikula 
> Cc: Joonas Lahtinen 
> Cc: Rodrigo Vivi 
> Cc: Tomi Valkeinen 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: "David (ChunMing) Zhou" 
> Cc: Thierry Reding 
> Cc: Jonathan Hunter 
> Cc: Dave Airlie 
> Cc: Eric Anholt 
> Cc: "Michel Dänzer" 
> Cc: Chris Wilson 
> Cc: Huang Rui 
> Cc: Felix Kuehling 
> Cc: Hawking Zhang 
> Cc: Feifei Xu 
> Cc: Jim Qu 
> Cc: Evan Quan 
> Cc: Matthew Auld 
> Cc: Mika Kuoppala 
> Cc: Thomas Zimmermann 
> Cc: Kate Stewart 
> Cc: Sumit Semwal 
> Cc: Jilayne Lovejoy 
> Cc: Thomas Gleixner 
> Cc: Mikulas Patocka 
> Cc: Greg Kroah-Hartman 
> Cc: Junwei Zhang 
> Cc: intel-gvt-...@lists.freedesktop.org
> Cc: intel-gfx@lists.freedesktop.org
> Cc: amd-...@lists.freedesktop.org
> Cc: linux-te...@vger.kernel.org

Acked-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c  | 7 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.h  | 3 +--
>   drivers/gpu/drm/armada/armada_gem.c  | 5 ++---
>   drivers/gpu/drm/armada/armada_gem.h  | 3 +--
>   drivers/gpu/drm/drm_prime.c  | 9 -
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   | 5 ++---
>   drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c | 8 
>   drivers/gpu/drm/i915/gvt/dmabuf.c| 2 +-
>   drivers/gpu/drm/i915/i915_drv.h  | 3 +--
>   drivers/gpu/drm/omapdrm/omap_gem.h   | 3 +--
>   drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c| 5 ++---
>   drivers/gpu/drm/radeon/radeon_drv.c  | 3 +--
>   drivers/gpu/drm/radeon/radeon_prime.c| 5 ++---
>   drivers/gpu/drm/tegra/gem.c  | 7 +++
>   drivers/gpu/drm/tegra/gem.h  | 3 +--
>   drivers/gpu/drm/udl/udl_dmabuf.c | 5 ++---
>   drivers/gpu/drm/udl/udl_drv.h| 3 +--
>   drivers/gpu/drm/vc4/vc4_bo.c | 5 ++---
>   drivers/gpu/drm/vc4/vc4_drv.h| 3 +--
>   drivers/gpu/drm/vgem/vgem_fence.c| 2 +-
>   include/drm/drm_drv.h| 4 ++--
>   include/drm/drm_prime.h  | 3 +--
>   22 files changed, 39 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index 489041df1f45..4809d4a5d72a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -345,8 +345,7 @@ const struct dma_buf_ops amdgpu_dmabuf_ops = {
>* Returns:
>* Shared DMA buffer representing the GEM BO from the given device.
>*/
> -struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
> - struct drm_gem_object *gobj,
> +struct dma_buf *amdgpu_gem_prime_export(struct drm_gem_object *gobj,
>   int flags)
>   {
>   struct amdgpu_bo *bo = gem_to_amdgpu_bo(gobj);
> @@ -356,9 +355,9 @@ struct dma_buf *amdgpu_gem_prime_export(struct drm_device 
> *dev,
>   bo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID)
>   return ERR_PTR(-EPERM);
>   
> - buf = drm_gem_prime_export(dev, gobj, flags);
> + buf = drm_gem_prime_export(gobj, flags);
>   if (!IS_ERR(buf)) {
> - buf->file->f_mapping = dev->anon_inode->i_mapping;
> + buf->file->f_mapping = gobj->dev->anon_inode->i_mapping;
>   buf->ops = &amdgpu_dmabuf_ops;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.h
> index c7056cbe8685..7f73a4f94204 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.h
> @@ -30,8 +30,7 @@ struct drm_gem_object *
>   amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
>struct dma_buf_attachment *attach,
>struct sg_table *sg);
> -struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
> - struct drm_gem_object *gobj,
> +struct dma_buf *amdgpu_gem_prime_export(struct drm_gem_object *gobj,
>   int flags);
>   struct drm_gem_object *amdgpu_gem_prime_import(struct drm_device *dev,
>   struct dma_buf *dma_buf);
> diff --git a/drivers/gpu/drm/armada/armada_gem.c 
> b/drivers/gpu/drm/armada/armada_gem.c
> index 642d0e70d0f8..7e7fcc3f1f7f 100644
> --- 

Re: [Intel-gfx] [PATCH] dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc

2019-06-04 Thread Koenig, Christian
Am 04.06.19 um 14:39 schrieb Chris Wilson:
> If we have to drop the seqcount & rcu lock to perform a krealloc, we
> have to restart the loop. In doing so, be careful not to lose track of
> the already acquired exclusive fence.
>
> Fixes: fedf54132d24 ("dma-buf: Restart reservation_object_get_fences_rcu() 
> after writes") #v4.10
> Signed-off-by: Chris Wilson 
> Cc: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Sumit Semwal 
> Cc: sta...@vger.kernel.org
> ---
>   drivers/dma-buf/reservation.c | 6 ++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 4d32e2c67862..704503df4892 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -365,6 +365,12 @@ int reservation_object_get_fences_rcu(struct 
> reservation_object *obj,
>  GFP_NOWAIT | __GFP_NOWARN);
>   if (!nshared) {
>   rcu_read_unlock();
> +
> + if (fence_excl) {
> + dma_fence_put(fence_excl);
> + fence_excl = NULL;
> + }
> +

dma_fence_put() is NULL-safe, so there is no need for the if.

But apart from that a good catch,
Christian.

>   nshared = krealloc(shared, sz, GFP_KERNEL);
>   if (nshared) {
>   shared = nshared;


Re: [Intel-gfx] [PATCH 13/13] drm: allow render capable master with DRM_AUTH ioctls

2019-05-27 Thread Koenig, Christian
Am 27.05.19 um 14:10 schrieb Emil Velikov:
> On 2019/05/27, Christian König wrote:
>> Am 27.05.19 um 10:17 schrieb Emil Velikov:
>>> From: Emil Velikov 
>>>
>>> There are cases (in mesa and applications) where one would open the
>>> primary node without properly authenticating the client.
>>>
>>> Sometimes we don't check if the authentication succeeds, but there's
>>> also cases we simply forget to do it.
>>>
>>> The former was a case for Mesa where it did not not check the return
>>> value of drmGetMagic() [1]. That was fixed recently although, there's
>>> the question of older drivers or other apps that exbibit this behaviour.
>>>
>>> While omitting the call results in issues as seen in [2] and [3].
>>>
>>> In the libva case, libva itself doesn't authenticate the DRM client and
>>> the vaGetDisplayDRM documentation doesn't mention if the app should
>>> either.
>>>
>>> As of today, the official vainfo utility doesn't authenticate.
>>>
>>> To workaround issues like these, some users resort to running their apps
>>> under sudo. Which admittedly isn't always a good idea.
>>>
>>> Since any DRIVER_RENDER driver has sufficient isolation between clients,
>>> we can use that, for unauthenticated [primary node] ioctls that require
>>> DRM_AUTH. But only if the respective ioctl is tagged as DRM_RENDER_ALLOW.
>>>
>>> v2:
>>> - Rework/simplify if check (Daniel V)
>>> - Add examples to commit messages, elaborate. (Daniel V)
>>>
>>> v3:
>>> - Use single unlikely (Daniel V)
>>>
>>> v4:
>>> - Patch was reverted because it broke AMDGPU, apply again. The AMDGPU
>>> issue is fixed with earlier patch.
>> As far as I can see this only affects the following two IOCTLs after
>> removing DRM_AUTH from the DRM_RENDER_ALLOW IOCTLs:
>>> DRM_IOCTL_DEF(DRM_IOCTL_PRIME_HANDLE_TO_FD,
>>> drm_prime_handle_to_fd_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>>      DRM_IOCTL_DEF(DRM_IOCTL_PRIME_FD_TO_HANDLE,
>>> drm_prime_fd_to_handle_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW)
>> So I think it would be simpler to just remove DRM_AUTH from those two
>> instead of allowing it for everybody.
>>
> If I understand you correctly this will remove DRM_AUTH also for drivers
> which expose only a primary node. I'm not sure if that is a good idea.

That's a good point, but I have doubts that those drivers implement the 
necessary callbacks and/or set the core feature flag for the IOCTLs.

So the worst that could happen is that the returned error changes from 
-EACCES to -EOPNOTSUPP/-ENOSYS.

Regards,
Christian.

> That said, if others are OK with the idea I will prepare a patch.
>
> Thanks
> Emil


Re: [Intel-gfx] [PATCH] dma-buf: add struct dma_buf_attach_info v2

2019-05-03 Thread Koenig, Christian
Am 03.05.19 um 14:09 schrieb Daniel Vetter:
> [CAUTION: External Email]
>
> On Fri, May 03, 2019 at 02:05:47PM +0200, Christian König wrote:
>> Am 30.04.19 um 19:31 schrieb Russell King - ARM Linux admin:
>>> On Tue, Apr 30, 2019 at 01:10:02PM +0200, Christian König wrote:
 Add a structure for the parameters of dma_buf_attach, this makes it much 
 easier
 to add new parameters later on.
>>> I don't understand this reasoning.  What are the "new parameters" that
>>> are being proposed, and why do we need to put them into memory to pass
>>> them across this interface?
>>>
>>> If the intention is to make it easier to change the interface, passing
>>> parameters in this manner mean that it's easy for the interface to
>>> change and drivers not to notice the changes, since the compiler will
>>> not warn (unless some member of the structure that the driver is using
>>> gets removed, in which case it will error.)
>>>
>>> Additions to the structure will go unnoticed by drivers - what if the
>>> caller is expecting some different kind of behaviour, and the driver
>>> ignores that new addition?
>> Well, exactly that's the intention here: That the drivers using this
>> interface should be able to ignore the new additions for now as long as they
>> are not going to use them.
>>
>> The background is that we have multiple interface changes in the pipeline,
>> and each step requires new optional parameters.
>>
>>> This doesn't seem to me like a good idea.
>> Well, the obvious alternatives are:
>>
>> a) Change all drivers to explicitly provide NULL/0 for the new parameters.
>>
>> b) Use a wrapper, so that the function signature of dma_buf_attach stays the
>> same.
>>
>> Key point here is that I have an invalidation callback change, a P2P patch
>> set and some locking changes which all require adding new parameters or
>> flags. And at each step I would then start to change all drivers, adding
>> some more NULL pointers or flags with 0 default value.
>>
>> I'm actually perfectly fine going down any route, but this just seemed to me
>> simplest and with the least risk of breaking anything. Opinions?
> I think given all our discussions and plans the argument object makes tons
> of sense. Much easier to document well than a long list of parameters.
> Maybe we should make it const, so it could work like an ops/func table and
> we could store it as a pointer in the dma_buf_attachment?

Yeah, the invalidation callback and the P2P flags are constant, but the 
importer_priv field isn't.

We could pass the importer_priv field as a separate parameter and the 
other two as a const structure.

A third alternative would be to throw out all the DRM abstraction, embed 
the attachment structure in the buffer object and get rid of the 
importer_priv field completely (probably the cleanest alternative, but 
also the most work to do).

Christian.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output

2019-03-13 Thread Koenig, Christian
Am 13.03.19 um 18:33 schrieb Michel Dänzer:
> [SNIP]
>>> Copy how? Using a GPU engine?
>> CPU maybe? Though I suppose that won't work if the buffer isn't CPU
>> accesible :/
> Well we do have a debug path for accessing invisible memory with the
> CPU.
>
> E.g. three registers: DATA and auto increment OFFSET_LO/HI. So you can
> just read/write DATA over and over again if you want to access some
> memory.
 Right. I assume that'll be very slow, but I guess it could do when the
 memory isn't directly CPU accessible.
>>> Just made a quick test and reading 33423360 bytes (4096x2040x4) using
>>> that interfaces takes about 13 seconds.
>>>
>>> IIRC we don't use the auto increment optimization yet, so that can
>>> probably be improved by a factor of 3 or more.
> I'd assume only writes are needed, no reads.

I've played around with that for a moment, and with a bit of optimization 
I can actually get about 20 MB/s write performance out of the debugging 
interface.

Overwriting a 4K framebuffer would then take less than 2 seconds.

That's not ideal, but I think it's perfectly reasonable for a panic screen.

Christian.

Re: [Intel-gfx] [PATCH v2 1/3] drm: Add support for panic message output

2019-03-13 Thread Koenig, Christian
Am 13.03.19 um 17:16 schrieb Kazlauskas, Nicholas:
> On 3/13/19 11:54 AM, Christian König wrote:
>> Am 13.03.19 um 16:38 schrieb Michel Dänzer:
>>> On 2019-03-13 2:37 p.m., Christian König wrote:
 Am 13.03.19 um 14:31 schrieb Ville Syrjälä:
> On Wed, Mar 13, 2019 at 10:35:08AM +0100, Michel Dänzer wrote:
>> On 2019-03-12 6:15 p.m., Noralf Trønnes wrote:
>>> Den 12.03.2019 17.17, skrev Ville Syrjälä:
 On Tue, Mar 12, 2019 at 11:47:04AM +0100, Michel Dänzer wrote:
> On 2019-03-11 6:42 p.m., Noralf Trønnes wrote:
>> This adds support for outputting kernel messages on panic().
>> A kernel message dumper is used to dump the log. The dumper
>> iterates
>> over each DRM device and it's crtc's to find suitable
>> framebuffers.
>>
>> All the other dumpers are run before this one except mtdoops.
>> Only atomic drivers are supported.
>>
>> Signed-off-by: Noralf Trønnes 
>> ---
>>     [...]
>>
>> diff --git a/include/drm/drm_framebuffer.h
>> b/include/drm/drm_framebuffer.h
>> index f0b34c977ec5..f3274798ecfe 100644
>> --- a/include/drm/drm_framebuffer.h
>> +++ b/include/drm/drm_framebuffer.h
>> @@ -94,6 +94,44 @@ struct drm_framebuffer_funcs {
>>      struct drm_file *file_priv, unsigned flags,
>>      unsigned color, struct drm_clip_rect *clips,
>>      unsigned num_clips);
>> +
>> +    /**
>> + * @panic_vmap:
>> + *
>> + * Optional callback for panic handling.
>> + *
>> + * For vmapping the selected framebuffer in a panic context.
>> Must
>> + * be super careful about locking (only trylocking allowed).
>> + *
>> + * RETURNS:
>> + *
>> + * NULL if it didn't work out, otherwise an opaque cookie
>> which is
>> + * passed to @panic_draw_xy. It can be anything: vmap area,
>> structure
>> + * with more details, just a few flags, ...
>> + */
>> +    void *(*panic_vmap)(struct drm_framebuffer *fb);
> FWIW, the panic_vmap hook cannot work in general with the
> amdgpu/radeon
> drivers:
>
> Framebuffers are normally tiled, writing to them with the CPU
> results in
> garbled output.
>
>>> In which case the driver needs to support the ->panic_draw_xy
>>> callback,
>>> or maybe it's possible to make a generic helper for tiled buffers.
>> I'm afraid that won't help, at least not without porting big chunks of
>> https://gitlab.freedesktop.org/mesa/mesa/tree/master/src/amd/addrlib
>> into the kernel, none of which will be used for anything else.
>>
>>
> There would need to be a mechanism for switching scanout to a
> linear,
> CPU accessible framebuffer.
 I suppose panic_vmap() could just provide a linear temp buffer
 to the panic handler, and panic_unmap() could copy the contents
 over to the real fb.
>> Copy how? Using a GPU engine?
> CPU maybe? Though I suppose that won't work if the buffer isn't CPU
> accesible :/
 Well we do have a debug path for accessing invisible memory with the
 CPU.

 E.g. three registers: DATA and auto increment OFFSET_LO/HI. So you can
 just read/write DATA over and over again if you want to access some
 memory.
>>> Right. I assume that'll be very slow, but I guess it could do when the
>>> memory isn't directly CPU accessible.
>> Just made a quick test and reading 33423360 bytes (4096x2040x4) using
>> that interfaces takes about 13 seconds.
>>
>> IIRC we don't use the auto increment optimization yet, so that can
>> probably be improved by a factor of 3 or more.
>>
 But turning of tilling etc is still extremely tricky when the system is
 already unstable.
>>> Maybe we could add a little hook to the display code, which just
>>> disables tiling for scanout and maybe disables non-primary planes, but
>>> doesn't touch anything else. Harry / Nicholas, does that seem feasible?
>>>
>>>
>>> I'm coming around from "this is never going to work" to "it might
>>> actually work" with our hardware...
>> Yeah, agree. It's a bit tricky, but doable.
> A "disable_tiling" hook or something along those lines could work for
> display. It's a little bit non trivial when you want to start dealing
> with locking and any active DRM commits, but we have a global lock
> around all our hardware programming anyway that makes that easier to
> deal with.
>
> I think we can just re-commit and update the existing hardware state
> with only the tiling info for every plane reset to off. For most buffers
> I don't think we'd have to really consider changing anything else here
> as long as you respect the current FB si

Re: [Intel-gfx] [PATCH 0/6] drm/drv: Remove drm_dev_unplug()

2019-02-04 Thread Koenig, Christian
Adding Andrey who looked into cleaning this up a while ago as well.

Christian.

Am 03.02.19 um 16:41 schrieb Noralf Trønnes:
> This series removes drm_dev_unplug() and moves the unplugged state
> setting to drm_dev_unregister(). All drivers will now have access to the
> unplugged state if they so desire.
>
> The drm_device ref handling wrt to the last fd closed after unregister
> have been simplified, which also fixed a double drm_dev_unregister()
> situation.
>
> Noralf.
>
> Noralf Trønnes (6):
>drm: Fix drm_release() and device unplug
>drm/drv: Prepare to remove drm_dev_unplug()
>drm/amd: Use drm_dev_unregister()
>drm/udl: Use drm_dev_unregister()
>drm/xen: Use drm_dev_unregister()
>drm/drv: Remove drm_dev_unplug()
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 +-
>   drivers/gpu/drm/drm_drv.c   | 48 -
>   drivers/gpu/drm/drm_file.c  |  6 ++--
>   drivers/gpu/drm/udl/udl_drv.c   |  3 +-
>   drivers/gpu/drm/xen/xen_drm_front.c |  7 ++--
>   include/drm/drm_drv.h   | 11 +++---
>   6 files changed, 27 insertions(+), 51 deletions(-)
>



Re: [Intel-gfx] [PATCH] dma-buf: Enhance dma-fence tracing

2019-01-22 Thread Koenig, Christian
Am 22.01.19 um 00:20 schrieb Chris Wilson:
> Rather than every backend and GPU driver reinventing the same wheel for
> user level debugging of HW execution, the common dma-fence framework
> should include the tracing infrastructure required for most client API
> level flow visualisation.
>
> With these common dma-fence level tracepoints, the userspace tools can
> establish a detailed view of the client <-> HW flow across different
> kernels. There is a strong ask to have this available, so that the
> userspace developer can effectively assess if they're doing a good job
> about feeding the beast of a GPU hardware.
>
> In the case of needing to look into more fine-grained details of how
> kernel internals work towards the goal of feeding the beast, the tools
> may optionally amend the dma-fence tracing information with the driver
> implementation specific. But for such cases, the tools should have a
> graceful degradation in case the expected extra tracepoints have
> changed or their format differs from the expected, as the kernel
> implementation internals are not expected to stay the same.
>
> It is important to distinguish between tracing for the purpose of client
> flow visualisation and tracing for the purpose of low-level kernel
> debugging. The latter is highly implementation specific, tied to
> a particular HW and driver, whereas the former addresses a common goal
> of user level tracing and likely a common set of userspace tools.
> Having made the distinction that these tracepoints will be consumed for
> client API tooling, we raise the spectre of tracepoint ABI stability. It
> is hoped that by defining a common set of dma-fence tracepoints, we avoid
> the pitfall of exposing low level details and so restrict ourselves only
> to the high level flow that is applicable to all drivers and hardware.
> Thus we can guarantee that this set of tracepoints will remain stable
> (with the emphasis on depicting client <-> HW flow as opposed to
> driver <-> HW).
>
> In terms of specific changes to the dma-fence tracing, we remove the
> emission of the strings for every tracepoint (reserving them for
> dma_fence_init for cases where they have unique dma_fence_ops, and
> preferring to have descriptors for the whole fence context). Strings do
> not pack as well into the ftrace ringbuffer and we would prefer to
> reduce the amount of indirect callbacks required for frequent tracepoint
> emission.
>
> Signed-off-by: Chris Wilson 
> Cc: Joonas Lahtinen 
> Cc: Tvrtko Ursulin 
> Cc: Alex Deucher 
> Cc: "Christian König" 
> Cc: Eric Anholt 
> Cc: Pierre-Loup Griffais 
> Cc: Michael Sartain 
> Cc: Steven Rostedt 

In general yes please! If possible please separate out the changes to 
the common dma_fence infrastructure from the i915 changes.

One thing I'm wondering is why the enable_signaling trace point doesn't 
need to be exported any more. Is that only used internally in the common 
infrastructure?
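The packing rationale from the commit message can be sketched with a small userspace model (all struct and function names here are invented for illustration; this is not the proposed tracepoint layout): the context name travels once in a create record, while the frequent per-fence records stay small, fixed-size integers that merely reference it.

```c
/* Illustrative model of string-free tracepoint records. The names
 * toy_context_create_rec / toy_fence_rec are hypothetical. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_context_create_rec {
	uint64_t context;
	char name[32];		/* the string, emitted once per context */
};

struct toy_fence_rec {
	uint64_t context;	/* refers back to the create record */
	uint64_t seqno;
};

/* Emitted once, when the fence context is created. */
static struct toy_context_create_rec
toy_trace_context_create(uint64_t context, const char *name)
{
	struct toy_context_create_rec rec = { .context = context };

	strncpy(rec.name, name, sizeof(rec.name) - 1);
	return rec;
}

/* Emitted frequently; packs into the ring buffer without any string. */
static struct toy_fence_rec toy_trace_fence(uint64_t context, uint64_t seqno)
{
	struct toy_fence_rec rec = { .context = context, .seqno = seqno };

	return rec;
}
```

The frequent record is a fraction of the size of the one-time descriptor, which is the whole point of dropping the per-event strings.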

Apart from that I'm on sick leave today, so give me at least a few days 
to recover and take a closer look.

Thanks,
Christian.

> ---
>   drivers/dma-buf/dma-fence.c |   9 +-
>   drivers/gpu/drm/i915/i915_gem_clflush.c |   5 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c  |   1 -
>   drivers/gpu/drm/i915/i915_request.c |  16 +-
>   drivers/gpu/drm/i915/i915_timeline.c|   5 +
>   drivers/gpu/drm/i915/i915_trace.h   | 134 ---
>   drivers/gpu/drm/i915/intel_guc_submission.c |  10 ++
>   drivers/gpu/drm/i915/intel_lrc.c|   6 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>   include/trace/events/dma_fence.h| 177 +++-
>   10 files changed, 214 insertions(+), 151 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 3aa8733f832a..5c93ed34b1ff 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -27,8 +27,15 @@
>   #define CREATE_TRACE_POINTS
>   #include 
>   
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_create);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_context_destroy);
> +
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_await);
>   EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
> -EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_start);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_execute_end);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_start);
> +EXPORT_TRACEPOINT_SYMBOL(dma_fence_wait_end);
>   
>   static DEFINE_SPINLOCK(dma_fence_stub_lock);
>   static struct dma_fence dma_fence_stub;
> diff --git a/drivers/gpu/drm/i915/i915_gem_clflush.c 
> b/drivers/gpu/drm/i915/i915_gem_clflush.c
> index 8e74c23cbd91..435c1303ecc8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_clflush.c
> +++ b/drivers/gpu/drm/i915/i915_gem_clflush.c
> @@ -22,6 +22,8 @@
>*
>*/
>   
> +#include 
> +
>   #include "i915_drv.h"
>   #include "intel_frontbuffer.h"
>   #include "i915_gem_clflush.h"
> @@ -73,6 +75,7 @@ static void i915_clflush_work(

Re: [Intel-gfx] [PATCH 06/10] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2018-12-13 Thread Koenig, Christian
Am 13.12.18 um 18:26 schrieb Daniel Vetter:
>>> Code sharing just because the code looks similar is imo a really
>>> bad idea, when the semantics are entirely different (that was also the
>>> reason behind not reusing all the cpu event stuff for dma_fence, they're
>>> not normal cpu events).
>> Ok, the last sentence is what I don't understand.
>>
>> What exactly is the semantic difference between the dma_fence_wait and
>> the wait_event interface?
>>
>> I mean the wait_event interface was introduced to prevent drivers from
>> openly coding an event interface and getting it wrong all the time.
>>
>> So a good part of the bugs we have seen around waiting for dma-fences
>> are exactly why wait_event was invented in the first place.
>>
>> The only big thing I can see missing in the wait_event interface is
>> waiting for many events at the same time, but that should be a rather
>> easy addition.
> So this bikeshed was years ago, maybe I should type a patch to
> document it, but as far as I remember the big difference is:
>
> - wait_event and friends generally Just Work. It can go wrong of
> course, but the usual pattern is that the waker-side does and
> uncoditional wake_up_all, and hence all the waiter needs to do is add
> themselves to the waiter list.
>
> - dma_buf otoh is entirely different: We wanted to support all kinds
> of signalling modes, including having interrupts disabled by default
> (not sure whether we actually achieve this still with all the cpu-side
> scheduling the big drivers do). Which means the waker does not
> unconditionally call wake_up_all, at least not timeline, and waiters
> need to call dma_fence_enable_signalling before they can add
> themselves to the waiter list and call schedule().

Well that is not something I'm questioning because we really need this 
behavior as well.

But all of this can be perfectly implemented on top of wake_up_all.
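That layering can be sketched in userspace C (an illustrative model only — pthreads stand in for the kernel's waitqueues, and all `toy_` names are invented): the condvar broadcast plays the role of wake_up_all(), and the waiter's opt-in flag plays the role of dma_fence_enable_signaling(), which is exactly the extra step a plain wait_event() user never needs.

```c
/* Hypothetical model of lazy fence signaling on top of a broadcast
 * primitive. Not kernel code. Build with -lpthread. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_fence {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool signaling_enabled;	/* set by the first waiter */
	bool signaled;
};

static void toy_fence_init(struct toy_fence *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->cond, NULL);
	f->signaling_enabled = false;
	f->signaled = false;
}

/* Producer side: only bothers with a wakeup if a waiter opted in. */
static void toy_fence_signal(struct toy_fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaled = true;
	if (f->signaling_enabled)
		pthread_cond_broadcast(&f->cond);	/* wake_up_all() analogue */
	pthread_mutex_unlock(&f->lock);
}

/* Waiter side: must enable signaling before sleeping. */
static void toy_fence_wait(struct toy_fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->signaling_enabled = true;
	while (!f->signaled)
		pthread_cond_wait(&f->cond, &f->lock);
	pthread_mutex_unlock(&f->lock);
}

static void *toy_producer(void *arg)
{
	toy_fence_signal(arg);
	return NULL;
}

static int toy_fence_demo(void)
{
	struct toy_fence f;
	pthread_t t;

	toy_fence_init(&f);
	pthread_create(&t, NULL, toy_producer, &f);
	toy_fence_wait(&f);
	pthread_join(t, NULL);
	return f.signaled ? 0 : -1;
}
```

The check-flag-under-lock pattern also covers the classic race where the signal lands between the waiter's test and its sleep.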

> The other big difference is how you check for the classic wakeup races
> where the event happens between when you checked for it and when you
> go to sleep. Because hw is involved, the rules are again a bit
> different, and they're different between drivers because hw is
> incoherent/broken in all kinds of ways. So there's also really tricky
> things going on between adding the waiter to the waiter list and
> dma_fence_enable_signalling. For pure cpu events you can ignore this
> and bake the few necessary barriers into the various macros, dma_fence
> needs more.

Ah, yes I think I know what you mean with that and I also consider this 
a bad idea as well.

Only very few drivers actually need this behavior and the ones who do 
should be perfectly able to implement this inside the driver code.

The crux is that leaking this behavior into the dma-fence made it 
unnecessarily complicated and resulted in quite a bunch of unnecessary 
irq_work and delayed_work usage.

I will take a look at this over the holidays. Shouldn't be too hard to 
fix, and it actually has some value beyond being just a nice cleanup.

Regards,
Christian.

>
> Adding Maarten, maybe there was more. I definitely remember huge&very
> long discussions about all this.
> -Daniel



Re: [Intel-gfx] [PATCH 06/10] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2018-12-13 Thread Koenig, Christian
Am 13.12.18 um 17:01 schrieb Daniel Vetter:
> On Thu, Dec 13, 2018 at 12:24:57PM +0000, Koenig, Christian wrote:
>> Am 13.12.18 um 13:21 schrieb Chris Wilson:
>>> Quoting Koenig, Christian (2018-12-13 12:11:10)
>>>> Am 13.12.18 um 12:37 schrieb Chris Wilson:
>>>>> Quoting Chunming Zhou (2018-12-11 10:34:45)
>>>>>> From: Christian König 
>>>>>>
>>>>>> Implement finding the right timeline point in drm_syncobj_find_fence.
>>>>>>
>>>>>> v2: return -EINVAL when the point is not submitted yet.
>>>>>> v3: fix reference counting bug, add flags handling as well
>>>>>>
>>>>>> Signed-off-by: Christian König 
>>>>>> ---
>>>>>> drivers/gpu/drm/drm_syncobj.c | 43 
>>>>>> ---
>>>>>> 1 file changed, 40 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c 
>>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>>> index 76ce13dafc4d..d964b348ecba 100644
>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>> @@ -231,16 +231,53 @@ int drm_syncobj_find_fence(struct drm_file 
>>>>>> *file_private,
>>>>>>   struct dma_fence **fence)
>>>>>> {
>>>>>>struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
>>>>>> handle);
>>>>>> -   int ret = 0;
>>>>>> +   struct syncobj_wait_entry wait;
>>>>>> +   int ret;
>>>>>> 
>>>>>>if (!syncobj)
>>>>>>return -ENOENT;
>>>>>> 
>>>>>>*fence = drm_syncobj_fence_get(syncobj);
>>>>>> -   if (!*fence) {
>>>>>> +   drm_syncobj_put(syncobj);
>>>>>> +
>>>>>> +   if (*fence) {
>>>>>> +   ret = dma_fence_chain_find_seqno(fence, point);
>>>>>> +   if (!ret)
>>>>>> +   return 0;
>>>>>> +   dma_fence_put(*fence);
>>>>>> +   } else {
>>>>>>ret = -EINVAL;
>>>>>>}
>>>>>> -   drm_syncobj_put(syncobj);
>>>>>> +
>>>>>> +   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
>>>>>> +   return ret;
>>>>>> +
>>>>>> +   memset(&wait, 0, sizeof(wait));
>>>>>> +   wait.task = current;
>>>>>> +   wait.point = point;
>>>>>> +   drm_syncobj_fence_add_wait(syncobj, &wait);
>>>>>> +
>>>>>> +   do {
>>>>>> +   set_current_state(TASK_INTERRUPTIBLE);
>>>>>> +   if (wait.fence) {
>>>>>> +   ret = 0;
>>>>>> +   break;
>>>>>> +   }
>>>>>> +
>>>>>> +   if (signal_pending(current)) {
>>>>>> +   ret = -ERESTARTSYS;
>>>>>> +   break;
>>>>>> +   }
>>>>>> +
>>>>>> +   schedule();
>>>>>> +   } while (1);
>>>>> I've previously used a dma_fence_proxy so that we could do nonblocking
>>>>> waits on future submits. That would be preferrable (a requirement for
>>>>> our stupid BKL-driven code).
>>>> That is exactly what I would definitely NAK.
>>>>
>>>> I would rather say we should come up with a wait_multiple_events() macro
>>>> and completely nuke the custom implementation of this in:
>>>> 1. dma_fence_default_wait and dma_fence_wait_any_timeout
>>>> 2. the radeon fence implementation
>>>> 3. the nouveau fence implementation
>>>> 4. the syncobj code
>>>>
>>>> Cause all of them do exactly the same. The dma_fence implementation
>>>> unfortunately came up with a custom event handling mechanism instead of
>>>> extending the core Linux wait_event() system.
>>> I don't want a blocking wait at all.

Re: [Intel-gfx] [PATCH 06/10] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2018-12-13 Thread Koenig, Christian
Am 13.12.18 um 13:21 schrieb Chris Wilson:
> Quoting Koenig, Christian (2018-12-13 12:11:10)
>> Am 13.12.18 um 12:37 schrieb Chris Wilson:
>>> Quoting Chunming Zhou (2018-12-11 10:34:45)
>>>> From: Christian König 
>>>>
>>>> Implement finding the right timeline point in drm_syncobj_find_fence.
>>>>
>>>> v2: return -EINVAL when the point is not submitted yet.
>>>> v3: fix reference counting bug, add flags handling as well
>>>>
>>>> Signed-off-by: Christian König 
>>>> ---
>>>>drivers/gpu/drm/drm_syncobj.c | 43 ---
>>>>1 file changed, 40 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>>> index 76ce13dafc4d..d964b348ecba 100644
>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>> @@ -231,16 +231,53 @@ int drm_syncobj_find_fence(struct drm_file 
>>>> *file_private,
>>>>  struct dma_fence **fence)
>>>>{
>>>>   struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
>>>> handle);
>>>> -   int ret = 0;
>>>> +   struct syncobj_wait_entry wait;
>>>> +   int ret;
>>>>
>>>>   if (!syncobj)
>>>>   return -ENOENT;
>>>>
>>>>   *fence = drm_syncobj_fence_get(syncobj);
>>>> -   if (!*fence) {
>>>> +   drm_syncobj_put(syncobj);
>>>> +
>>>> +   if (*fence) {
>>>> +   ret = dma_fence_chain_find_seqno(fence, point);
>>>> +   if (!ret)
>>>> +   return 0;
>>>> +   dma_fence_put(*fence);
>>>> +   } else {
>>>>   ret = -EINVAL;
>>>>   }
>>>> -   drm_syncobj_put(syncobj);
>>>> +
>>>> +   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
>>>> +   return ret;
>>>> +
>>>> +   memset(&wait, 0, sizeof(wait));
>>>> +   wait.task = current;
>>>> +   wait.point = point;
>>>> +   drm_syncobj_fence_add_wait(syncobj, &wait);
>>>> +
>>>> +   do {
>>>> +   set_current_state(TASK_INTERRUPTIBLE);
>>>> +   if (wait.fence) {
>>>> +   ret = 0;
>>>> +   break;
>>>> +   }
>>>> +
>>>> +   if (signal_pending(current)) {
>>>> +   ret = -ERESTARTSYS;
>>>> +   break;
>>>> +   }
>>>> +
>>>> +   schedule();
>>>> +   } while (1);
>>> I've previously used a dma_fence_proxy so that we could do nonblocking
>>> waits on future submits. That would be preferrable (a requirement for
>>> our stupid BKL-driven code).
>> That is exactly what I would definitely NAK.
>>
>> I would rather say we should come up with a wait_multiple_events() macro
>> and completely nuke the custom implementation of this in:
>> 1. dma_fence_default_wait and dma_fence_wait_any_timeout
>> 2. the radeon fence implementation
>> 3. the nouveau fence implementation
>> 4. the syncobj code
>>
>> Cause all of them do exactly the same. The dma_fence implementation
>> unfortunately came up with a custom event handling mechanism instead of
>> extending the core Linux wait_event() system.
> I don't want a blocking wait at all.

Ok I wasn't clear enough :) That is exactly what I would NAK!

The wait must be blocking, otherwise you would allow wait-before-signal.

Christian.

> -Chris



Re: [Intel-gfx] [PATCH 06/10] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2018-12-13 Thread Koenig, Christian
Am 13.12.18 um 12:37 schrieb Chris Wilson:
> Quoting Chunming Zhou (2018-12-11 10:34:45)
>> From: Christian König 
>>
>> Implement finding the right timeline point in drm_syncobj_find_fence.
>>
>> v2: return -EINVAL when the point is not submitted yet.
>> v3: fix reference counting bug, add flags handling as well
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/drm_syncobj.c | 43 ---
>>   1 file changed, 40 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index 76ce13dafc4d..d964b348ecba 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -231,16 +231,53 @@ int drm_syncobj_find_fence(struct drm_file 
>> *file_private,
>> struct dma_fence **fence)
>>   {
>>  struct drm_syncobj *syncobj = drm_syncobj_find(file_private, 
>> handle);
>> -   int ret = 0;
>> +   struct syncobj_wait_entry wait;
>> +   int ret;
>>   
>>  if (!syncobj)
>>  return -ENOENT;
>>   
>>  *fence = drm_syncobj_fence_get(syncobj);
>> -   if (!*fence) {
>> +   drm_syncobj_put(syncobj);
>> +
>> +   if (*fence) {
>> +   ret = dma_fence_chain_find_seqno(fence, point);
>> +   if (!ret)
>> +   return 0;
>> +   dma_fence_put(*fence);
>> +   } else {
>>  ret = -EINVAL;
>>  }
>> -   drm_syncobj_put(syncobj);
>> +
>> +   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
>> +   return ret;
>> +
>> +   memset(&wait, 0, sizeof(wait));
>> +   wait.task = current;
>> +   wait.point = point;
>> +   drm_syncobj_fence_add_wait(syncobj, &wait);
>> +
>> +   do {
>> +   set_current_state(TASK_INTERRUPTIBLE);
>> +   if (wait.fence) {
>> +   ret = 0;
>> +   break;
>> +   }
>> +
>> +   if (signal_pending(current)) {
>> +   ret = -ERESTARTSYS;
>> +   break;
>> +   }
>> +
>> +   schedule();
>> +   } while (1);
> I've previously used a dma_fence_proxy so that we could do nonblocking
> waits on future submits. That would be preferrable (a requirement for
> our stupid BKL-driven code).

That is exactly what I would definitely NAK.

I would rather say we should come up with a wait_multiple_events() macro 
and completely nuke the custom implementation of this in:
1. dma_fence_default_wait and dma_fence_wait_any_timeout
2. the radeon fence implementation
3. the nouveau fence implementation
4. the syncobj code

Cause all of them do exactly the same. The dma_fence implementation 
unfortunately came up with a custom event handling mechanism instead of 
extending the core Linux wait_event() system.

This in turn led to a lot of this duplicated handling.

Christian.

> -Chris



Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new drm_syncobj_add_point interface v2

2018-12-12 Thread Koenig, Christian
> Key point is that our Vulkan guys came back and said that this
> wouldn't be sufficient, but I honestly don't fully understand why.
> Hm, sounds like we really need those testscases (vk cts on top of mesa, igt)
> so we can talk about the exact corner cases we care about and why.
Yes, that's why I made it mandatory that David provides an igt test case 
along with the ones in libdrm.

> I guess one thing that might happen is that userspace leaves out a number
> and never sets that fence, relying on the >= semantics of the monitored
> fence to unblock that thread. E.g. when skipping a frame in one of the
> auxiliary workloads. For that case we'd need to make sure we don't just wait
> for the given fence to materialize, but also any fences later in the timeline.
Correct and that's also how we have implemented it.
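That >= behavior can be sketched as a minimal userspace model (hypothetical `toy_` names; this is not the amdgpu or WDDM2 implementation): signaling only advances a monotonic payload, and a wait for point N is satisfied by any payload of at least N, so a skipped point can never strand a waiter.

```c
/* Illustrative model of a monitored fence / timeline payload. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_timeline {
	uint64_t payload;	/* highest value signaled so far */
};

/* Out-of-order or skipped signals collapse into "highest value wins". */
static void toy_timeline_signal(struct toy_timeline *t, uint64_t point)
{
	if (point > t->payload)
		t->payload = point;
}

/* A wait for point N is satisfied by any payload >= N, including a
 * later point signaled while N itself was skipped. */
static bool toy_timeline_point_done(const struct toy_timeline *t,
				    uint64_t point)
{
	return t->payload >= point;
}
```

Signaling 1 and then 4 (skipping 2 and 3) still unblocks a waiter on point 3, which is the semantics discussed above.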

> But we can't decide that without understanding the actual use-case that
> needs to be supported at the other end of the stack, and how all the bits in
> between should look like.
>
> I guess we're back to "uapi design without userspace doesn't make sense" ...
Yeah, well chicken and egg problem. Amdvlk probably won't make the code 
to support this public until the kernel has accepted it and the kernel 
doesn't accept it until the amdvlk patches are public.

David can you take care of this and release the userspace patches as well?

In addition, except for a bit of polishing the UAPI stayed the same 
from the very beginning while being reviewed multiple times now. So it 
seems to be rather sane.

> That seems against the spirit of vulkan, which is very much about "you get all
> the pieces". It also might dig us a hole in the future, if we ever get around 
> to
> moving towards a WDDM2 style memory management model. For future
> proving I think it would make sense if we implement the minimal uapi we
> need for vk timelines, not the strictest guarantees we can get away with
> (without performance impact) with current drivers.
Well I'm repeating myself, but while this seems to be a good idea for a 
userspace API it is not necessary for a kernel API.

In other words userspace can make all the mess it wants as long as it 
stays inside the same process, but when it starts to mess with 
inter-process communication (e.g. X or Wayland) the interface should be 
watertight and not allow the mess to leak between processes.

And what we can always do is to make the restriction looser, but 
tightening it when userspace already depends on a behavior is not 
possible any more.

Regards,
Christian.

Am 12.12.18 um 12:39 schrieb Zhou, David(ChunMing):
> + Daniel Rakos and Jason Ekstrand.
>
>   Below is the background, which is from Daniel R and should explain why:
> " ISVs, especially those coming from D3D12, are unsatisfied with the behavior 
> of the Vulkan semaphores as they are unhappy with the fact that for every 
> single dependency they need to use separate semaphores due to their binary 
> nature.
> Compared to that a synchronization primitive like D3D12 monitored fences 
> enable one of those to be used to track a sequence of operations by simply 
> associating timeline values to the completion of individual operations. This 
> allows them to track the lifetime and usage of resources and the ordered 
> completion of sequences.
> Besides that, they also want to use a single synchronization primitive to be 
> able to handle GPU-to-GPU and GPU-to-CPU dependencies, compared to using 
> semaphores for the former and fences for the latter.
> In addition, compared to legacy semaphores, timeline semaphores are proposed 
> to support wait-before-signal, i.e. allow enqueueing a semaphore wait 
> operation with a wait value that is larger than any of the already enqueued 
> signal values. This seems to be a hard requirement for ISVs without UMD-side 
> queue batching, and even UMD-side queue batching doesn’t help the situation 
> when such a semaphore is externally shared with another API. Thus in order to 
> properly support wait-before-signal the KMD implementation has to also be 
> able to support such dependencies.
> "
>
> Btw, we already added a test case to igt, and it has been tested by many 
> existing tests, like the libdrm unit tests, igt-related tests, the Vulkan 
> CTS, and Steam games.
>
> -David
>> -Original Message-
>> From: Daniel Vetter 
>> Sent: Wednesday, December 12, 2018 7:15 PM
>> To: Koenig, Christian 
>> Cc: Zhou, David(ChunMing) ; dri-devel > de...@lists.freedesktop.org>; amd-gfx list ;
>> intel-gfx ; Christian König
>> 
>> Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new
>> drm_syncobj_add_point interface v2
>>
>> On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian

Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new drm_syncobj_add_point interface v2

2018-12-12 Thread Koenig, Christian
Am 12.12.18 um 11:49 schrieb Daniel Vetter:
> On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
>> From: Christian König 
>>
>> Use the dma_fence_chain object to create a timeline of fence objects
>> instead of just replacing the existing fence.
>>
>> v2: rebase and cleanup
>>
>> Signed-off-by: Christian König 
> Somewhat jumping back into this. Not sure we discussed this already or
> not. I'm a bit unclear on why we have to chain the fences in the timeline:
>
> - The timeline stuff is modelled after the WDDM2 monitored fences. Which
>really are just u64 counters in memory somewhere (I think could be
>system ram or vram). Because WDDM2 has the memory management entirely
>separated from rendering synchronization it totally allows userspace to
>create loops and deadlocks and everything else nasty using this - the
>memory manager won't deadlock because these monitored fences never leak
>into the buffer manager. And if CS deadlock, gpu reset takes care of the
>mess.
>
> - This has a few consequences, as in they seem to indeed work like a
>memory location: Userspace incrementing out-of-order (because they run
>batches updating the same fence on different engines) is totally fine,
>as is doing anything else "stupid".
>
> - Now on linux we can't allow anything, because we need to make sure that
>deadlocks don't leak into the memory manager. But as long as we block
>until the underlying dma_fence has materialized, nothing userspace can
>do will lead to such a deadlock. Even if userspace ends up submitting
>jobs without enough built-in synchronization, leading to out-of-order
>signalling of fences on that "timeline". And I don't think that would
>pose a problem for us.
>
> Essentially I think we can look at timeline syncobj as a dma_fence
> container indexed through an integer, and there's no need to enforce that
> the timline works like a real dma_fence timeline, with all it's
> guarantees. It's just a pile of (possibly, if userspace is stupid)
> unrelated dma_fences. You could implement the entire thing in userspace
> after all, except for the "we want to share these timeline objects between
> processes" problem.
>
> tldr; I think we can drop the dma_fence_chain complexity completely. Or at
> least I'm not really understanding why it's needed.
>
> Of course that means drivers cannot treat a drm_syncobj timeline as a
> dma_fence timeline. But given the future fences stuff and all that, that's
> already out of the window anyway.
>
> What am I missing?

Good question, since that was exactly my initial idea as well.

Key point is that our Vulkan guys came back and said that this wouldn't 
be sufficient, but I honestly don't fully understand why.

Anyway that's why David came up with using the fence array to wait for 
all previously added fences, which I then later on extended into this 
chain container.

I have to admit that it is way more defensively implemented this way. 
E.g. there are far fewer things userspace can do wrong.

The principal idea is that when they mess things up they are always 
going to wait more than necessary, but never less.
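A simplified model of that chain container (hypothetical `toy_` names; the kernel's dma_fence_chain additionally handles reference counting, RCU and garbage collection): each node records the timeline seqno it advances the syncobj to, and a point lookup walks back to the oldest node still satisfying the request, so a confused userspace at worst waits on a newer fence than strictly needed, never on an older one.

```c
/* Illustrative model of a fence chain and point lookup. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_chain_node {
	uint64_t seqno;			/* timeline point this node advances to */
	struct toy_chain_node *prev;	/* older point, or NULL */
};

/* Return the oldest node with seqno >= point, i.e. the earliest fence
 * whose completion guarantees the point. NULL means the point is newer
 * than anything on the chain (not submitted yet). */
static struct toy_chain_node *
toy_chain_find_seqno(struct toy_chain_node *head, uint64_t point)
{
	struct toy_chain_node *node = head;

	if (!node || node->seqno < point)
		return NULL;
	while (node->prev && node->prev->seqno >= point)
		node = node->prev;
	return node;
}
```

With a chain 1 ← 2 ← 4, a lookup for point 3 lands on the node for point 4: more waiting than strictly necessary, but never less.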

Christian.

> -Daniel
>
>> ---
>>   drivers/gpu/drm/drm_syncobj.c | 37 +++
>>   include/drm/drm_syncobj.h |  5 +
>>   2 files changed, 42 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index e19525af0cce..51f798e2194f 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj 
>> *syncobj,
>>  spin_unlock(&syncobj->lock);
>>   }
>>   
>> +/**
>> + * drm_syncobj_add_point - add new timeline point to the syncobj
>> + * @syncobj: sync object to add timeline point do
>> + * @chain: chain node to use to add the point
>> + * @fence: fence to encapsulate in the chain node
>> + * @point: sequence number to use for the point
>> + *
>> + * Add the chain node as new timeline point to the syncobj.
>> + */
>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
>> +   struct dma_fence_chain *chain,
>> +   struct dma_fence *fence,
>> +   uint64_t point)
>> +{
>> +struct syncobj_wait_entry *cur, *tmp;
>> +struct dma_fence *prev;
>> +
>> +dma_fence_get(fence);
>> +
>> +spin_lock(&syncobj->lock);
>> +
>> +prev = rcu_dereference_protected(syncobj->fence,
>> + lockdep_is_held(&syncobj->lock));
>> +dma_fence_chain_init(chain, prev, fence, point);
>> +rcu_assign_pointer(syncobj->fence, &chain->base);
>> +
>> +list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
>> +list_del_init(&cur->node);
>> +syncobj_wait_syncobj_func(syncobj, cur);
>> +}
>> +spin_unlock(&syncobj->lock);
>> +
>> +/* Walk the chain once to trigger garbage collection */
>> +dma_

Re: [Intel-gfx] [PATCH 1/4] mm: Check if mmu notifier callbacks are allowed to fail

2018-12-10 Thread Koenig, Christian
Patches #1 and #3 are Reviewed-by: Christian König 


Patch #2 is Acked-by: Christian König  because 
I can't judge if adding the counter in the thread structure is actually 
a good idea.

In patch #4 I honestly don't understand at all how this stuff works, so 
no-comment from my side on this.

Christian.

Am 10.12.18 um 11:36 schrieb Daniel Vetter:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Inspired by some confusion we had discussing i915 mmu notifiers and
> whether we could use the newly-introduced return value to handle some
> corner cases. Until we realized that these are only for when a task
> has been killed by the oom reaper.
>
> An alternative approach would be to split the callback into two
> versions, one with the int return value, and the other with void
> return value like in older kernels. But that's a lot more churn for
> fairly little gain I think.
>
> Summary from the m-l discussion on why we want something at warning
> level: This allows automated tooling in CI to catch bugs without
> humans having to look at everything. If we just upgrade the existing
> pr_info to a pr_warn, then we'll have false positives. And as-is, no
> one will ever spot the problem since it's lost in the massive amounts
> of overall dmesg noise.
>
> v2: Drop the full WARN_ON backtrace in favour of just a pr_warn for
> the problematic case (Michal Hocko).
>
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: "Christian König" 
> Cc: David Rientjes 
> Cc: Daniel Vetter 
> Cc: "Jérôme Glisse" 
> Cc: linux...@kvack.org
> Cc: Paolo Bonzini 
> Signed-off-by: Daniel Vetter 
> ---
>   mm/mmu_notifier.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..ccc22f21b735 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,9 @@ int __mmu_notifier_invalidate_range_start(struct 
> mm_struct *mm,
>   pr_info("%pS callback failed with %d in 
> %sblockable context.\n",
>   
> mn->ops->invalidate_range_start, _ret,
>   !blockable ? "non-" : "");
> + if (blockable)
> + pr_warn("%pS callback failure not 
> allowed\n",
> + 
> mn->ops->invalidate_range_start);
>   ret = _ret;
>   }
>   }



Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for igt: add timeline test cases (rev2)

2018-12-07 Thread Koenig, Christian
Am 07.12.18 um 14:58 schrieb Daniel Vetter:
> On Fri, Dec 7, 2018 at 11:29 AM Chris Wilson  wrote:
>> Quoting Patchwork (2018-12-07 10:27:46)
>>> == Series Details ==
>>>
>>> Series: igt: add timeline test cases (rev2)
>>> URL   : https://patchwork.freedesktop.org/series/53743/
>>> State : failure
>>>
>>> == Summary ==
>>>
>>> CI Bug Log - changes from CI_DRM_5281 -> IGTPW_2133
>>> 
>>>
>>> Summary
>>> ---
>>>
>>>**FAILURE**
>>>
>>>Serious unknown changes coming with IGTPW_2133 absolutely need to be
>>>verified manually.
>>>
>>>If you think the reported changes have nothing to do with the changes
>>>introduced in IGTPW_2133, please notify your bug team to allow them
>>>to document this new failure mode, which will reduce false positives in 
>>> CI.
>>>
>>>External URL: 
>>> https://patchwork.freedesktop.org/api/1.0/series/53743/revisions/2/mbox/
>>>
>>> Possible new issues
>>> ---
>>>
>>>Here are the unknown changes that may have been introduced in IGTPW_2133:
>>>
>>> ### IGT changes ###
>>>
>>>  Possible regressions 
>>>
>>>* igt@amdgpu/amd_basic@userptr:
>>>  - fi-kbl-8809g:   PASS -> DMESG-WARN
>> What fortuitous timing! Maybe you would like to take a stab at the
>> use-after-free in amdgpu's mmu_notifier.
> Adding Christian König.

Philip Yang is already working on this. We want to replace the MMU notifier 
with HMM as soon as possible.

Christian.

> -Daniel



Re: [Intel-gfx] [PATCH] drm/i915: Compile fix for 64b dma-fence seqno

2018-12-07 Thread Koenig, Christian
Am 07.12.18 um 13:34 schrieb Mika Kuoppala:
> Many errs of the form:
> drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function 
> ‘__igt_reset_evict_vma’:
> ./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of 
> type ‘unsigned int’, but argum
>
> Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2")
> Cc: Christian König 
> Cc: Chunming Zhou 
> Cc: Chris Wilson 
> Cc: Joonas Lahtinen 
> Signed-off-by: Mika Kuoppala 

Ah, crap! Now I see my mistake.

I searched for dereferences of a fence object, but in this case the 
fence object is embedded in a parent object.

Patch is Acked-by: Christian König , but there 
are probably a couple of more cases like this I missed.

Christian.

> ---
>   drivers/gpu/drm/i915/i915_gem.c  |  4 ++--
>   drivers/gpu/drm/i915/i915_gem_context.c  |  8 
>   drivers/gpu/drm/i915/i915_request.c  | 12 ++--
>   drivers/gpu/drm/i915/intel_lrc.c |  6 +++---
>   drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 14 +++---
>   5 files changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d36a9755ad91..649847b87e41 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3187,7 +3187,7 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
>*/
>   
>   if (i915_request_completed(request)) {
> - GEM_TRACE("%s pardoned global=%d (fence %llx:%d), current %d\n",
> + GEM_TRACE("%s pardoned global=%d (fence %llx:%lld), current 
> %d\n",
> engine->name, request->global_seqno,
> request->fence.context, request->fence.seqno,
> intel_engine_get_seqno(engine));
> @@ -3311,7 +3311,7 @@ static void nop_submit_request(struct i915_request 
> *request)
>   {
>   unsigned long flags;
>   
> - GEM_TRACE("%s fence %llx:%d -> -EIO\n",
> + GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
> request->engine->name,
> request->fence.context, request->fence.seqno);
>   dma_fence_set_error(&request->fence, -EIO);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 371c07087095..4ec386950f75 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -649,7 +649,7 @@ last_request_on_engine(struct i915_timeline *timeline,
>   rq = i915_gem_active_raw(&timeline->last_request,
>&engine->i915->drm.struct_mutex);
>   if (rq && rq->engine == engine) {
> - GEM_TRACE("last request for %s on engine %s: %llx:%d\n",
> + GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
> timeline->name, engine->name,
> rq->fence.context, rq->fence.seqno);
>   GEM_BUG_ON(rq->timeline != timeline);
> @@ -686,14 +686,14 @@ static bool engine_has_kernel_context_barrier(struct intel_engine_cs *engine)
>* switch-to-kernel-context?
>*/
>   if (!i915_timeline_sync_is_later(barrier, &rq->fence)) {
> - GEM_TRACE("%s needs barrier for %llx:%d\n",
> + GEM_TRACE("%s needs barrier for %llx:%lld\n",
> ring->timeline->name,
> rq->fence.context,
> rq->fence.seqno);
>   return false;
>   }
>   
> - GEM_TRACE("%s has barrier after %llx:%d\n",
> + GEM_TRACE("%s has barrier after %llx:%lld\n",
> ring->timeline->name,
> rq->fence.context,
> rq->fence.seqno);
> @@ -749,7 +749,7 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
>   if (prev->gem_context == i915->kernel_context)
>   continue;
>   
> - GEM_TRACE("add barrier on %s for %llx:%d\n",
> + GEM_TRACE("add barrier on %s for %llx:%lld\n",
> engine->name,
> prev->fence.context,
> prev->fence.seqno);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index ca95ab2f4cfa..cefefc11d922 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -270,7 +270,7 @@ static void free_capture_list(struct i915_request *request)
>   static void __retire_engine_request(struct intel_engine_cs *engine,
>   struct i915_request *rq)
>   {
> - GEM_TRACE("%s(%s) fence %llx:%d, global=%d, current %d\n",
> + GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",

Re: [Intel-gfx] linux-next: build failure after merge of the drm-misc tree

2018-12-07 Thread Koenig, Christian
Hi Stephen,

yeah, that is a known problem. I missed the change during rebase of the 
revert.

Please see patch "2312f9842854 drm/v3d: fix broken build" which is 
already in drm-misc-next and fixes the issue.

Christian.

Am 06.12.18 um 03:32 schrieb Stephen Rothwell:
> Hi all,
>
> After merging the drm-misc tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/gpu/drm/v3d/v3d_gem.c: In function 'v3d_submit_tfu_ioctl':
> drivers/gpu/drm/v3d/v3d_gem.c:719:3: error: too many arguments to function 'drm_syncobj_replace_fence'
> drm_syncobj_replace_fence(sync_out, 0, sched_done_fence);
> ^
> In file included from drivers/gpu/drm/v3d/v3d_gem.c:5:
> include/drm/drm_syncobj.h:134:6: note: declared here
>   void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>^
>
> Caused by commit
>
>0b258ed1a219 ("drm: revert "expand replace_fence to support timeline point v2"")
>
> interacting with commit
>
>1584f16ca96e ("drm/v3d: Add support for submitting jobs to the TFU")
>
> I have used the drm-misc tree from next-20181205 for today.
>

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/selftests: Compile fix for 64b dma-fence seqno

2018-12-07 Thread Koenig, Christian
Am 07.12.18 um 13:22 schrieb Chris Wilson:
> Many errs of the form:
> drivers/gpu/drm/i915/selftests/intel_hangcheck.c: In function ‘__igt_reset_evict_vma’:
> ./include/linux/kern_levels.h:5:18: error: format ‘%x’ expects argument of type ‘unsigned int’, but argum
>
> Fixes: b312d8ca3a7c ("dma-buf: make fence sequence numbers 64 bit v2")
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Chunming Zhou 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 12 ++--
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 60a4bd9405be..34e200d32b7d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -450,7 +450,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>   if (!wait_until_running(&h, rq)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s: Failed to start request %x, at %x\n",
> + pr_err("%s: Failed to start request %llx, at %x\n",
>  __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   intel_engine_dump(engine, &p,
> "%s\n", engine->name);
> @@ -728,7 +728,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>   if (!wait_until_running(&h, rq)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s: Failed to start request %x, at %x\n",
> + pr_err("%s: Failed to start request %llx, at %x\n",
>  __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   intel_engine_dump(engine, &p,
> "%s\n", engine->name);
> @@ -927,7 +927,7 @@ static int igt_reset_wait(void *arg)
>   if (!wait_until_running(&h, rq)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s: Failed to start request %x, at %x\n",
> + pr_err("%s: Failed to start request %llx, at %x\n",
>  __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
>   
> @@ -1106,7 +1106,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>   if (!wait_until_running(&h, rq)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s: Failed to start request %x, at %x\n",
> + pr_err("%s: Failed to start request %llx, at %x\n",
>  __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
>   
> @@ -1301,7 +1301,7 @@ static int igt_reset_queue(void *arg)
>   if (!wait_until_running(&h, prev)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s(%s): Failed to start request %x, at %x\n",
> + pr_err("%s(%s): Failed to start request %llx, at %x\n",
>  __func__, engine->name,
>  prev->fence.seqno, hws_seqno(&h, prev));
>   intel_engine_dump(engine, &p,
> @@ -1412,7 +1412,7 @@ static int igt_handle_error(void *arg)
>   if (!wait_until_running(&h, rq)) {
>   struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> - pr_err("%s: Failed to start request %x, at %x\n",
> + pr_err("%s: Failed to start request %llx, at %x\n",
>  __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
>   



Re: [Intel-gfx] [PATCH RFC 2/5] cgroup: Add mechanism to register vendor specific DRM devices

2018-11-27 Thread Koenig, Christian
Hi Harish,

Am 26.11.18 um 21:59 schrieb Kasiviswanathan, Harish:
> Thanks Tejun,Eric and Christian for your replies.
>
> We want GPUs resource management to work seamlessly with containers and 
> container orchestration. With the Intel / bpf based approach this is not 
> possible.

I think one lesson learned is that we should describe this goal in the 
patch cover letter when sending it out. That could have avoided about 
half of the initial confusion.

>  From your response we gather the following. GPU resources need to be 
> abstracted. We will send a new proposal in same vein. Our current thinking is 
> to start with a single abstracted resource and build a framework that can be 
> expanded to include additional resources. We plan to start with “GPU cores”. 
> We believe all GPUs have some concept of cores or compute unit.

Sounds good; just one comment on creating a framework: before doing 
something like this, think for a moment about whether it makes more 
sense to extend the existing cgroup framework instead. That approach 
usually wins because you rarely need something fundamentally new.

Regards,
Christian.

>
> Your feedback is highly appreciated.
>
> Best Regards,
> Harish
>
>
>
> From: amd-gfx  on behalf of Tejun Heo 
> 
> Sent: Tuesday, November 20, 2018 5:30 PM
> To: Ho, Kenny
> Cc: cgro...@vger.kernel.org; intel-gfx@lists.freedesktop.org; 
> y2ke...@gmail.com; amd-...@lists.freedesktop.org; 
> dri-de...@lists.freedesktop.org
> Subject: Re: [PATCH RFC 2/5] cgroup: Add mechanism to register vendor 
> specific DRM devices
>
>
> Hello,
>
> On Tue, Nov 20, 2018 at 10:21:14PM +, Ho, Kenny wrote:
>> By this reply, are you suggesting that vendor specific resources
>> will never be acceptable to be managed under cgroup?  Let say a user
> I wouldn't say never but whatever which gets included as a cgroup
> controller should have clearly defined resource abstractions and the
> control schemes around them including support for delegation.  AFAICS,
> gpu side still seems to have a long way to go (and it's not clear
> whether that's somewhere it will or needs to end up).
>
>> want to have similar functionality as what cgroup is offering but to
>> manage vendor specific resources, what would you suggest as a
>> solution?  When you say keeping vendor specific resource regulation
>> inside drm or specific drivers, do you mean we should replicate the
>> cgroup infrastructure there or do you mean either drm or specific
>> driver should query existing hierarchy (such as device or perhaps
>> cpu) for the process organization information?
>>
>> To put the questions in more concrete terms, let say a user wants to
>> expose certain part of a gpu to a particular cgroup similar to the
>> way selective cpu cores are exposed to a cgroup via cpuset, how
>> should we go about enabling such functionality?
> Do what the intel driver or bpf is doing?  It's not difficult to hook
> into cgroup for identification purposes.
>
> Thanks.
>



Re: [Intel-gfx] [PATCH RFC 4/5] drm/amdgpu: Add accounting of command submission via DRM cgroup

2018-11-23 Thread Koenig, Christian
Am 23.11.18 um 18:36 schrieb Eric Anholt:
> Christian König  writes:
>
>> Am 20.11.18 um 21:57 schrieb Eric Anholt:
>>> Kenny Ho  writes:
>>>
 Account for the number of command submitted to amdgpu by type on a per
 cgroup basis, for the purpose of profiling/monitoring applications.
>>> For profiling other drivers, I've used perf tracepoints, which let you
>>> get useful timelines of multiple events in the driver.  Have you made
>>> use of this stat for productive profiling?
>> Yes, but this is not related to profiling at all.
>>
>> What we want to do is to limit the resource usage of processes.
> That sounds great, and something I'd be interested in for vc4.  However,
> as far as I saw explained here, this patch doesn't let you limit
> resource usage of a process and is only useful for
> "profiling/monitoring" so I'm wondering how it is useful for that
> purpose.

Ok, good to know. I haven't looked at this in depth, but if this is just 
for accounting then it would certainly be missing the goal.

Christian.


Re: [Intel-gfx] [PATCH 2/3] mm, notifier: Catch sleeping/blocking for !blockable

2018-11-22 Thread Koenig, Christian
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
>
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: David Rientjes 
> Cc: "Christian König" 
> Cc: Daniel Vetter 
> Cc: "Jérôme Glisse" 
> Cc: linux...@kvack.org
> Signed-off-by: Daniel Vetter 
> ---
>   mm/mmu_notifier.c | 8 +++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 59e102589a25..4d282cfb296e 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>   id = srcu_read_lock(&srcu);
>   hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>   if (mn->ops->invalidate_range_start) {
> - int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + int _ret;
> +
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_disable();
> + _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> + if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> + preempt_enable();

Just for the sake of better documenting this how about adding this to 
include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond)    for((cond) ? preempt_disable() : (void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of 
errors).

Christian.

>   if (_ret) {
>   pr_info("%pS callback failed with %d in %sblockable context.\n",
>   mn->ops->invalidate_range_start, _ret,



Re: [Intel-gfx] [PATCH 1/3] mm: Check if mmu notifier callbacks are allowed to fail

2018-11-22 Thread Koenig, Christian
Am 22.11.18 um 17:51 schrieb Daniel Vetter:
> Just a bit of paranoia, since if we start pushing this deep into
> callchains it's hard to spot all places where an mmu notifier
> implementation might fail when it's not allowed to.
>
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: "Christian König" 
> Cc: David Rientjes 
> Cc: Daniel Vetter 
> Cc: "Jérôme Glisse" 
> Cc: linux...@kvack.org
> Cc: Paolo Bonzini 
> Signed-off-by: Daniel Vetter 

Acked-by: Christian König 

> ---
>   mm/mmu_notifier.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 5119ff846769..59e102589a25 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -190,6 +190,8 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>   pr_info("%pS callback failed with %d in %sblockable context.\n",
>   mn->ops->invalidate_range_start, _ret,
>   !blockable ? "non-" : "");
> + WARN(blockable,"%pS callback failure not allowed\n",
> +  mn->ops->invalidate_range_start);
>   ret = _ret;
>   }
>   }



Re: [Intel-gfx] [PATCH 0/5] drm/gem: Add drm_gem_object_funcs

2018-11-12 Thread Koenig, Christian
Am 10.11.18 um 15:56 schrieb Noralf Trønnes:
> This patchset adds a GEM object function table and makes use of it in
> the CMA helper.
>
> This was originally part of a shmem helper series[1] that didn't make
> it. Daniel and Christian showed interest in the vtable part so I have
> hooked it up to some refactoring in tinydrm in order to have a user. The
> tinydrm refactoring is part of a long term plan to get rid of
> tinydrm.ko.
>
> Noralf.
>
> [1] https://patchwork.freedesktop.org/series/27184/
>
> Noralf Trønnes (5):
>drm/driver: Add defaults for .gem_prime_export/import callbacks
>drm/prime: Add drm_gem_prime_mmap()
>drm/gem: Add drm_gem_object_funcs
>drm/cma-helper: Add DRM_GEM_CMA_VMAP_DRIVER_OPS
>drm/tinydrm: Use DRM_GEM_CMA_VMAP_DRIVER_OPS

Acked-by: Christian König  for the series.

Regards,
Christian.

>
>   Documentation/gpu/todo.rst |  13 +++
>   drivers/gpu/drm/drm_client.c   |  12 +--
>   drivers/gpu/drm/drm_gem.c  | 109 ++--
>   drivers/gpu/drm/drm_gem_cma_helper.c   |  86 
>   drivers/gpu/drm/drm_prime.c|  79 +++
>   drivers/gpu/drm/tinydrm/core/tinydrm-core.c|  71 --
>   drivers/gpu/drm/tinydrm/core/tinydrm-helpers.c |   6 ++
>   drivers/gpu/drm/tinydrm/hx8357d.c  |   4 +-
>   drivers/gpu/drm/tinydrm/ili9225.c  |   5 +-
>   drivers/gpu/drm/tinydrm/ili9341.c  |   4 +-
>   drivers/gpu/drm/tinydrm/mi0283qt.c |   6 +-
>   drivers/gpu/drm/tinydrm/mipi-dbi.c |  10 +-
>   drivers/gpu/drm/tinydrm/repaper.c  |   4 +-
>   drivers/gpu/drm/tinydrm/st7586.c   |   5 +-
>   drivers/gpu/drm/tinydrm/st7735r.c  |   4 +-
>   include/drm/drm_drv.h  |   4 +
>   include/drm/drm_gem.h  | 131 +
>   include/drm/drm_gem_cma_helper.h   |  24 +
>   include/drm/drm_prime.h|   1 +
>   include/drm/tinydrm/tinydrm.h  |  35 ++-
>   20 files changed, 462 insertions(+), 151 deletions(-)
>



Re: [Intel-gfx] [PATCH] drm/syncobj: Avoid kmalloc(GFP_KERNEL) under spinlock

2018-10-26 Thread Koenig, Christian
Am 26.10.18 um 10:28 schrieb zhoucm1:
> Thanks, Could you help to submit to drm-misc again?

Done.

Christian.

>
> -David
>
>
> On 2018年10月26日 15:43, Christian König wrote:
>> Am 26.10.18 um 08:20 schrieb Chunming Zhou:
>>> drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function drm_syncobj_find_signal_pt_for_point called on line 390 inside lock on line 389 but uses GFP_KERNEL
>>>
>>>    Find functions that refer to GFP_KERNEL but are called with locks held.
>>>
>>> Generated by: scripts/coccinelle/locks/call_kern.cocci
>>>
>>> v2:
>>> syncobj->timeline still needs protect.
>>>
>>> v3:
>>> use a global signaled fence instead of re-allocation.
>>>
>>> v4:
>>> Don't need moving lock.
>>> Don't expose func.
>>>
>>> v5:
>>> rename func and directly return.
>>>
>>> Tested by: syncobj_wait and ./deqp-vk -n dEQP-VK.*semaphore* with
>>> lock debug kernel options enabled.
>>>
>>> Signed-off-by: Chunming Zhou 
>>> Cc: Maarten Lankhorst 
>>> Cc: intel-gfx@lists.freedesktop.org
>>> Cc: Christian König 
>>> Cc: Chris Wilson 
>>> CC: Julia Lawall 
>>> Reviewed-by: Chris Wilson 
>>
>> Reviewed-by: Christian König 
>>
>>> ---
>>>   drivers/gpu/drm/drm_syncobj.c | 36 ++-
>>>   1 file changed, 19 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>>> index b7eaa603f368..d1c6f21c72b5 100644
>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>> @@ -80,6 +80,23 @@ struct drm_syncobj_signal_pt {
>>>   struct list_head list;
>>>   };
>>>   +static DEFINE_SPINLOCK(signaled_fence_lock);
>>> +static struct dma_fence signaled_fence;
>>> +
>>> +static struct dma_fence *drm_syncobj_get_stub_fence(void)
>>> +{
>>> +    spin_lock(&signaled_fence_lock);
>>> +    if (!signaled_fence.ops) {
>>> +    dma_fence_init(&signaled_fence,
>>> +   &drm_syncobj_stub_fence_ops,
>>> +   &signaled_fence_lock,
>>> +   0, 0);
>>> +    dma_fence_signal_locked(&signaled_fence);
>>> +    }
>>> +    spin_unlock(&signaled_fence_lock);
>>> +
>>> +    return dma_fence_get(&signaled_fence);
>>> +}
>>>   /**
>>>    * drm_syncobj_find - lookup and reference a sync object.
>>>    * @file_private: drm file private pointer
>>> @@ -113,23 +130,8 @@ static struct dma_fence
>>>   struct drm_syncobj_signal_pt *signal_pt;
>>>     if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
>>> -    (point <= syncobj->timeline)) {
>>> -    struct drm_syncobj_stub_fence *fence =
>>> -    kzalloc(sizeof(struct drm_syncobj_stub_fence),
>>> -    GFP_KERNEL);
>>> -
>>> -    if (!fence)
>>> -    return NULL;
>>> -    spin_lock_init(&fence->lock);
>>> -    dma_fence_init(&fence->base,
>>> -   &drm_syncobj_stub_fence_ops,
>>> -   &fence->lock,
>>> -   syncobj->timeline_context,
>>> -   point);
>>> -
>>> -    dma_fence_signal(&fence->base);
>>> -    return &fence->base;
>>> -    }
>>> +    (point <= syncobj->timeline))
>>> +    return drm_syncobj_get_stub_fence();
>>>     list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
>>>   if (point > signal_pt->value)
>>
>



Re: [Intel-gfx] [PATCH] dma-buf: Update reservation shared_count after adding the new fence

2018-10-26 Thread Koenig, Christian
Am 26.10.18 um 10:03 schrieb Chris Wilson:
> We need to serialise the addition of a new fence into the shared list
> such that the fence is visible before we claim it is there. Otherwise a
> concurrent reader of the shared fence list will see an uninitialised
> fence slot before it is set.
>
><4> [109.613162] general protection fault:  [#1] PREEMPT SMP PTI
><4> [109.613177] CPU: 1 PID: 1357 Comm: gem_busy Tainted: G U 4.19.0-rc8-CI-CI_DRM_5035+ #1
><4> [109.613189] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
><4> [109.613252] RIP: 0010:i915_gem_busy_ioctl+0x146/0x380 [i915]
><4> [109.613261] Code: 0b 43 04 49 83 c6 08 4d 39 e6 89 43 04 74 6d 4d 8b 3e e8 5d 54 f4 e0 85 c0 74 0d 80 3d 08 71 1d 00 00 0f 84 bb 00 00 00 31 c0 <49> 81 7f 08 20 3a 2c a0 75 cc 41 8b 97 50 02 00 00 49 8b 8f a8 00
><4> [109.613283] RSP: 0018:c944bcf8 EFLAGS: 00010246
><4> [109.613292] RAX:  RBX: c944bdc0 RCX: 0001
><4> [109.613302] RDX:  RSI:  RDI: 822474a0
><4> [109.613311] RBP: c944bd28 R08: 88021e158680 R09: 0001
><4> [109.613321] R10: 0040 R11:  R12: 88021e1641b8
><4> [109.613331] R13: 0003 R14: 88021e1641b0 R15: 6b6b6b6b6b6b6b6b
><4> [109.613341] FS:  7f9c9fc84980() GS:880227a4() knlGS:
><4> [109.613352] CS:  0010 DS:  ES:  CR0: 80050033
><4> [109.613360] CR2: 7f9c9fcb8000 CR3: 0002247d4005 CR4: 000606e0
>
> Fixes: 27836b641c1b ("dma-buf: remove shared fence staging in reservation object")
> Testcase: igt/gem_busy/close-race
> Signed-off-by: Chris Wilson 
> Cc: Christian König 
> Cc: Junwei Zhang 
> Cc: Huang Rui 
> Cc: Sumit Semwal 

Reviewed-by: Christian König 

> ---
>   drivers/dma-buf/reservation.c | 14 +++---
>   1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 5fb4fd461908..c1618335ca99 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -147,16 +147,17 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>struct dma_fence *fence)
>   {
>   struct reservation_object_list *fobj;
> - unsigned int i;
> + unsigned int i, count;
>   
>   dma_fence_get(fence);
>   
>   fobj = reservation_object_get_list(obj);
> + count = fobj->shared_count;
>   
>   preempt_disable();
>   write_seqcount_begin(&obj->seq);
>   
> - for (i = 0; i < fobj->shared_count; ++i) {
> + for (i = 0; i < count; ++i) {
>   struct dma_fence *old_fence;
>   
>   old_fence = rcu_dereference_protected(fobj->shared[i],
> @@ -169,14 +170,13 @@ void reservation_object_add_shared_fence(struct reservation_object *obj,
>   }
>   
>   BUG_ON(fobj->shared_count >= fobj->shared_max);
> - fobj->shared_count++;
> + count++;
>   
>   replace:
> - /*
> -  * memory barrier is added by write_seqcount_begin,
> -  * fobj->shared_count is protected by this lock too
> -  */
>   RCU_INIT_POINTER(fobj->shared[i], fence);
> + /* pointer update must be visible before we extend the shared_count */
> + smp_store_mb(fobj->shared_count, count);
> +
>   write_seqcount_end(&obj->seq);
>   preempt_enable();
>   }



Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings v3

2018-10-25 Thread Koenig, Christian
Am 25.10.18 um 12:36 schrieb Maarten Lankhorst:
> Op 25-10-18 om 12:21 schreef Chunming Zhou:
>> drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function drm_syncobj_find_signal_pt_for_point called on line 390 inside lock on line 389 but uses GFP_KERNEL
>>
>>Find functions that refer to GFP_KERNEL but are called with locks held.
>>
>> Generated by: scripts/coccinelle/locks/call_kern.cocci
>>
>> v2:
>> syncobj->timeline still needs protect.
>>
>> v3:
>> use a global signaled fence instead of re-allocation.
>>
>> Signed-off-by: Chunming Zhou 
>> Cc: Maarten Lankhorst 
>> Cc: intel-gfx@lists.freedesktop.org
>> Cc: Christian König 
>> ---
>>   drivers/gpu/drm/drm_drv.c |  2 ++
>>   drivers/gpu/drm/drm_syncobj.c | 52 +--
>>   include/drm/drm_syncobj.h |  1 +
>>   3 files changed, 34 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> index 36e8e9cbec52..0a6f1023d6c3 100644
>> --- a/drivers/gpu/drm/drm_drv.c
>> +++ b/drivers/gpu/drm/drm_drv.c
>> @@ -37,6 +37,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   
>>   #include "drm_crtc_internal.h"
>>   #include "drm_legacy.h"
>> @@ -1003,6 +1004,7 @@ static int __init drm_core_init(void)
>>  if (ret < 0)
>>  goto error;
>>   
>> +drm_syncobj_stub_fence_init();
>>  drm_core_init_complete = true;
>>   
>>  DRM_DEBUG("Initialized\n");
>> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
>> index b7eaa603f368..6b3f5a06e4d3 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -80,6 +80,27 @@ struct drm_syncobj_signal_pt {
>>  struct list_head list;
>>   };
>>   
>> +static struct drm_syncobj_stub_fence stub_signaled_fence;
>> +static void global_stub_fence_release(struct dma_fence *fence)
>> +{
>> +/* it is impossible to come here */
>> +BUG();
>> +}
> WARN_ON_ONCE(1)? No need to halt the machine.
>
>> +static const struct dma_fence_ops global_stub_fence_ops = {
>> +.get_driver_name = drm_syncobj_stub_fence_get_name,
>> +.get_timeline_name = drm_syncobj_stub_fence_get_name,
>> +.release = global_stub_fence_release,
>> +};
>> +
>> +void drm_syncobj_stub_fence_init(void)
>> +{
>> +spin_lock_init(&stub_signaled_fence.lock);
>> +dma_fence_init(&stub_signaled_fence.base,
>> +   &global_stub_fence_ops,
>> +   &stub_signaled_fence.lock,
>> +   0, 0);
>> +dma_fence_signal(&stub_signaled_fence.base);
>> +}
>>   /**
>>* drm_syncobj_find - lookup and reference a sync object.
>>* @file_private: drm file private pointer
>> @@ -111,24 +132,14 @@ static struct dma_fence
>>uint64_t point)
>>   {
>>  struct drm_syncobj_signal_pt *signal_pt;
>> +struct dma_fence *f = NULL;
>>   
>> +spin_lock(&syncobj->pt_lock);
>>  if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
>>  (point <= syncobj->timeline)) {
>> -struct drm_syncobj_stub_fence *fence =
>> -kzalloc(sizeof(struct drm_syncobj_stub_fence),
>> -GFP_KERNEL);
>> -
>> -if (!fence)
>> -return NULL;
>> -spin_lock_init(&fence->lock);
>> -dma_fence_init(&fence->base,
>> -   &drm_syncobj_stub_fence_ops,
>> -   &fence->lock,
>> -   syncobj->timeline_context,
>> -   point);
>> -
>> -dma_fence_signal(&fence->base);
>> -return &fence->base;
>> +dma_fence_get(&stub_signaled_fence.base);
>> +spin_unlock(&syncobj->pt_lock);
>> +return &stub_signaled_fence.base;
>>  }
>>   
>>  list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
>> @@ -137,9 +148,12 @@ static struct dma_fence
>>  if ((syncobj->type == DRM_SYNCOBJ_TYPE_BINARY) &&
>>  (point != signal_pt->value))
>>  continue;
>> -return dma_fence_get(&signal_pt->fence_array->base);
>> +f = dma_fence_get(&signal_pt->fence_array->base);
>> +break;
>>  }
>> -return NULL;
>> +spin_unlock(&syncobj->pt_lock);
>> +
>> +return f;
>>   }
>>   
>>   static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
>> @@ -166,9 +180,7 @@ static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
>>  }
>>   
>>  mutex_lock(&syncobj->cb_mutex);
>> -spin_lock(&syncobj->pt_lock);
>>  *fence = drm_syncobj_find_signal_pt_for_point(syncobj, pt_value);
>> -spin_unlock(&syncobj->pt_lock);
>>  if (!*fence)
>>  drm_syncobj_add_callback_locked(syncobj, cb, func);
>>  mutex_unlock(&syncobj->cb_mutex);
>> @@ -379,11 +391,9 @@ drm_syncobj_point_get(struct drm_syncobj *syncobj, u64 point, u64 flags,
>>  

Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread Koenig, Christian
Am 25.10.18 um 11:28 schrieb zhoucm1:


On 2018年10月25日 17:23, Koenig, Christian wrote:
Am 25.10.18 um 11:20 schrieb zhoucm1:


On 2018年10月25日 17:11, Koenig, Christian wrote:
Am 25.10.18 um 11:03 schrieb zhoucm1:


On 2018年10月25日 16:56, Christian König wrote:
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+struct dma_fence *f = NULL;
+struct drm_syncobj_stub_fence *fence =
+kzalloc(sizeof(struct drm_syncobj_stub_fence),
+GFP_KERNEL);
  +if (!fence)
+return NULL;
+spin_lock(&syncobj->pt_lock);

How about using a single static stub fence like I suggested?
Sorry, I don't get your meanings, how to do that?

Add a new function drm_syncobj_stub_fence_init() which is called from 
drm_core_init() when the module is loaded.

In drm_syncobj_stub_fence_init() you initialize one static stub_fence which is 
then used over and over again.
It seems that would not work; we could need more than one stub fence.

Mhm, why? I mean it is just a signaled fence,

If A gets the global stub fence and doesn't put it yet, and then B comes along, 
how does B re-use the global stub fence? Is there anything I misunderstand?

dma_fence_get()? The whole thing is reference counted, every time you need it 
you grab another reference.

Since we globally initialize it the reference never becomes zero, so it is 
never released.

Christian.


David
context and sequence number are irrelevant.

Christian.


David

Since its reference count never goes down to zero it should never be freed. In 
doubt maybe add a .free callback which just calls BUG() to catch reference 
count issues.

Christian.


Thanks,
David







Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread Koenig, Christian
Am 25.10.18 um 11:20 schrieb zhoucm1:


On 2018年10月25日 17:11, Koenig, Christian wrote:
Am 25.10.18 um 11:03 schrieb zhoucm1:


On 2018年10月25日 16:56, Christian König wrote:
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+struct dma_fence *f = NULL;
+struct drm_syncobj_stub_fence *fence =
+kzalloc(sizeof(struct drm_syncobj_stub_fence),
+GFP_KERNEL);
  +if (!fence)
+return NULL;
+spin_lock(&syncobj->pt_lock);

How about using a single static stub fence like I suggested?
Sorry, I don't get your meanings, how to do that?

Add a new function drm_syncobj_stub_fence_init() which is called from 
drm_core_init() when the module is loaded.

In drm_syncobj_stub_fence_init() you initialize one static stub_fence which is 
then used over and over again.
It seems that would not work; we could need more than one stub fence.

Mhm, why? I mean it is just a signaled fence, context and sequence number are 
irrelevant.

Christian.


David

Since its reference count never goes down to zero it should never be freed. In 
doubt maybe add a .free callback which just calls BUG() to catch reference 
count issues.

Christian.


Thanks,
David





Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings v2

2018-10-25 Thread Koenig, Christian
Am 25.10.18 um 11:03 schrieb zhoucm1:


On 2018年10月25日 16:56, Christian König wrote:
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -111,15 +111,16 @@ static struct dma_fence
uint64_t point)
  {
  struct drm_syncobj_signal_pt *signal_pt;
+struct dma_fence *f = NULL;
+struct drm_syncobj_stub_fence *fence =
+kzalloc(sizeof(struct drm_syncobj_stub_fence),
+GFP_KERNEL);
  +if (!fence)
+return NULL;
+spin_lock(&syncobj->pt_lock);

How about using a single static stub fence like I suggested?
Sorry, I don't get your meanings, how to do that?

Add a new function drm_syncobj_stub_fence_init() which is called from 
drm_core_init() when the module is loaded.

In drm_syncobj_stub_fence_init() you initialize one static stub_fence which is 
then used over and over again.

Since its reference count never drops to zero it should never be freed. If in 
doubt, maybe add a .free callback which just calls BUG() to catch 
reference-count issues.

Christian.


Thanks,
David



Re: [Intel-gfx] [PATCH] drm: fix call_kern.cocci warnings (fwd)

2018-10-25 Thread Koenig, Christian
On 25.10.18 at 09:51, Maarten Lankhorst wrote:
> On 25-10-18 at 08:53, Christian König wrote:
>> On 25.10.18 at 03:28, Zhou, David(ChunMing) wrote:
>>> Reviewed-by: Chunming Zhou 
>> NAK, GFP_ATOMIC should be avoided.
>>
>> The correct solution is to move the allocation out of the spinlock or drop 
>> the lock and reacquire.
> Yeah +1. Especially in a case like this where it's obvious to prevent. :)

Another possibility would be to not allocate the dummy fence at all.

E.g. we just need a global instance of that which is always signaled and 
has a reference count of +1.

Christian.
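The correct fix named above (move the allocation out of the spinlock) follows a common pre-allocate-then-lock pattern, sketched here in userspace. A pthread mutex stands in for the kernel spinlock, and the list and function names are illustrative, not the drm_syncobj code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head;

static int list_add_value(int value)
{
	/* GFP_KERNEL-style allocation, done while no lock is held. */
	struct node *n = malloc(sizeof(*n));
	struct node *it;

	if (!n)
		return -1;
	n->value = value;

	pthread_mutex_lock(&list_lock);
	for (it = list_head; it; it = it->next) {
		if (it->value == value) {
			/* Already present: drop the lock, then discard
			 * the pre-allocated spare. */
			pthread_mutex_unlock(&list_lock);
			free(n);
			return 0;
		}
	}
	n->next = list_head;
	list_head = n;
	pthread_mutex_unlock(&list_lock);
	return 0;
}
```

The allocation may be wasted when the value is already present, but that is the usual price for never sleeping (or using GFP_ATOMIC) under a spinlock.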


Re: [Intel-gfx] [PATCH] drm: fix deadlock of syncobj v6

2018-10-23 Thread Koenig, Christian
On 23.10.18 at 11:37, Chunming Zhou wrote:
> v2:
> add a mutex between sync_cb execution and free.
> v3:
> clearly separating the roles for pt_lock and cb_mutex (Chris)
> v4:
> the cb_mutex should be taken outside of the pt_lock around
> this if() block. (Chris)
> v5:
> fix a corner case
> v6:
> tidy drm_syncobj_fence_get_or_add_callback up. (Chris)
>
> Tested by syncobj_basic and syncobj_wait of igt.
>
> Signed-off-by: Chunming Zhou 
> Cc: Daniel Vetter 
> Cc: Chris Wilson 
> Cc: Christian König 
> Cc: intel-gfx@lists.freedesktop.org
> Reviewed-by: Chris Wilson 

I've gone ahead and pushed this to drm-misc-next.

Regards,
Christian.

> ---
>   drivers/gpu/drm/drm_syncobj.c | 156 --
>   include/drm/drm_syncobj.h |   8 +-
>   2 files changed, 81 insertions(+), 83 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 57bf6006394d..b7eaa603f368 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -106,6 +106,42 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
> *file_private,
>   }
>   EXPORT_SYMBOL(drm_syncobj_find);
>   
> +static struct dma_fence
> +*drm_syncobj_find_signal_pt_for_point(struct drm_syncobj *syncobj,
> +   uint64_t point)
> +{
> + struct drm_syncobj_signal_pt *signal_pt;
> +
> + if ((syncobj->type == DRM_SYNCOBJ_TYPE_TIMELINE) &&
> + (point <= syncobj->timeline)) {
> + struct drm_syncobj_stub_fence *fence =
> + kzalloc(sizeof(struct drm_syncobj_stub_fence),
> + GFP_KERNEL);
> +
> + if (!fence)
> + return NULL;
> + spin_lock_init(&fence->lock);
> + dma_fence_init(&fence->base,
> +&drm_syncobj_stub_fence_ops,
> +&fence->lock,
> +syncobj->timeline_context,
> +point);
> +
> + dma_fence_signal(&fence->base);
> + return &fence->base;
> + }
> +
> + list_for_each_entry(signal_pt, &syncobj->signal_pt_list, list) {
> + if (point > signal_pt->value)
> + continue;
> + if ((syncobj->type == DRM_SYNCOBJ_TYPE_BINARY) &&
> + (point != signal_pt->value))
> + continue;
> + return dma_fence_get(&signal_pt->fence_array->base);
> + }
> + return NULL;
> +}
> +
>   static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
>   struct drm_syncobj_cb *cb,
>   drm_syncobj_func_t func)
> @@ -114,115 +150,71 @@ static void drm_syncobj_add_callback_locked(struct 
> drm_syncobj *syncobj,
>   list_add_tail(&cb->node, &syncobj->cb_list);
>   }
>   
> -static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
> -  struct dma_fence **fence,
> -  struct drm_syncobj_cb *cb,
> -  drm_syncobj_func_t func)
> +static void drm_syncobj_fence_get_or_add_callback(struct drm_syncobj 
> *syncobj,
> +   struct dma_fence **fence,
> +   struct drm_syncobj_cb *cb,
> +   drm_syncobj_func_t func)
>   {
> - int ret;
> + u64 pt_value = 0;
>   
> - ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
> - if (!ret)
> - return 1;
> + if (syncobj->type == DRM_SYNCOBJ_TYPE_BINARY) {
> + /*BINARY syncobj always wait on last pt */
> + pt_value = syncobj->signal_point;
>   
> - spin_lock(&syncobj->lock);
> - /* We've already tried once to get a fence and failed.  Now that we
> -  * have the lock, try one more time just to be sure we don't add a
> -  * callback when a fence has already been set.
> -  */
> - if (!list_empty(&syncobj->signal_pt_list)) {
> - spin_unlock(&syncobj->lock);
> - drm_syncobj_search_fence(syncobj, 0, 0, fence);
> - if (*fence)
> - return 1;
> - spin_lock(&syncobj->lock);
> - } else {
> - *fence = NULL;
> - drm_syncobj_add_callback_locked(syncobj, cb, func);
> - ret = 0;
> + if (pt_value == 0)
> + pt_value += DRM_SYNCOBJ_BINARY_POINT;
>   }
> - spin_unlock(&syncobj->lock);
>   
> - return ret;
> + mutex_lock(&syncobj->cb_mutex);
> + spin_lock(&syncobj->pt_lock);
> + *fence = drm_syncobj_find_signal_pt_for_point(syncobj, pt_value);
> + spin_unlock(&syncobj->pt_lock);
> + if (!*fence)
> + drm_syncobj_add_callback_locked(syncobj, cb, func);
> + mutex_unlock(&syncobj->cb_mutex);
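The locking rule this v6 settles on can be modelled in userspace: cb_mutex is always taken outside pt_lock, the fence lookup happens only under pt_lock, and the callback is registered under cb_mutex only when no fence was found. In this sketch pthread mutexes stand in for the kernel mutex/spinlock pair, and all names are illustrative:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t cb_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pt_lock  = PTHREAD_MUTEX_INITIALIZER;

static void *stored_fence;        /* stands in for the signal_pt list */
static void (*pending_cb)(void);  /* stands in for the cb_list */

/* Mirrors the reworked drm_syncobj_fence_get_or_add_callback():
 * a single lookup, no early return, and a fixed lock order that
 * avoids the original deadlock. */
static void *get_fence_or_add_callback(void (*cb)(void))
{
	void *fence;

	pthread_mutex_lock(&cb_mutex);
	pthread_mutex_lock(&pt_lock);
	fence = stored_fence;
	pthread_mutex_unlock(&pt_lock);
	if (!fence)
		pending_cb = cb;
	pthread_mutex_unlock(&cb_mutex);
	return fence;
}
```

Holding cb_mutex across both the lookup and the registration closes the race the old code papered over with a second lookup, which is why the early-return variant could be dropped.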

Re: [Intel-gfx] [PATCH] drm: fix deadlock of syncobj v5

2018-10-23 Thread Koenig, Christian
On 23.10.18 at 11:11, Chris Wilson wrote:
> Quoting zhoucm1 (2018-10-23 10:09:01)
>>
>> On 2018-10-23 17:01, Chris Wilson wrote:
>>> Quoting Chunming Zhou (2018-10-23 08:57:54)
 v2:
 add a mutex between sync_cb execution and free.
 v3:
 clearly separating the roles for pt_lock and cb_mutex (Chris)
 v4:
 the cb_mutex should be taken outside of the pt_lock around this if() 
 block. (Chris)
 v5:
 fix a corner case

 Tested by syncobj_basic and syncobj_wait of igt.

 Signed-off-by: Chunming Zhou 
 Cc: Daniel Vetter 
 Cc: Chris Wilson 
 Cc: Christian König 
 Cc: intel-gfx@lists.freedesktop.org
 Reviewed-by: Chris Wilson 
 ---
drivers/gpu/drm/drm_syncobj.c | 55 +++
include/drm/drm_syncobj.h |  8 +++--
2 files changed, 36 insertions(+), 27 deletions(-)

 diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
 index 57bf6006394d..679a56791e34 100644
 --- a/drivers/gpu/drm/drm_syncobj.c
 +++ b/drivers/gpu/drm/drm_syncobj.c
 @@ -125,23 +125,26 @@ static int 
 drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
   if (!ret)
   return 1;

 -   spin_lock(&syncobj->lock);
 +   mutex_lock(&syncobj->cb_mutex);
   /* We've already tried once to get a fence and failed.  Now that 
 we
* have the lock, try one more time just to be sure we don't add 
 a
* callback when a fence has already been set.
*/
 +   spin_lock(&syncobj->pt_lock);
   if (!list_empty(&syncobj->signal_pt_list)) {
 -   spin_unlock(&syncobj->lock);
 +   spin_unlock(&syncobj->pt_lock);
   drm_syncobj_search_fence(syncobj, 0, 0, fence);
>>> Hmm, just thinking of other ways of tidying this up
>>>
>>> mutex_lock(cb_lock);
>>> spin_lock(pt_lock);
>>> *fence = drm_syncobj_find_signal_pt_for_point();
>>> spin_unlock(pt_lock);
>>> if (!*fence)
>>>drm_syncobj_add_callback_locked(syncobj, cb, func);
>>> mutex_unlock(cb_lock);
>>>
>>> i.e. get rid of the early return and we can even drop the int return here
>>> as it is unimportant and unused.
>> Yes, do you need I send v6? or you make a separate patch as a improvment?
> Send it in reply, we still have some time before the shards catch up
> with the ml ;)

I'm idle anyway because I've locked myself out of the AMD VPN accidentally.

So just send me a ping when the v6 is ready to be committed and I can 
push it to drm-misc-next.

Christian.

> -Chris



Re: [Intel-gfx] [PATCH] drm/i915/selftests: Remove unused dmabuf->kmap routines, fix the build

2018-06-20 Thread Koenig, Christian


On 20.06.2018 18:22, Chris Wilson wrote:
Fix i915's CI build after the removal of the dmabuf->kmap interface that
left the mock routines intact.

In file included from drivers/gpu/drm/i915/i915_gem_dmabuf.c:335:0:
drivers/gpu/drm/i915/selftests/mock_dmabuf.c:104:13: error: 
‘mock_dmabuf_kunmap_atomic’ defined but not used [-Werror=unused-function]
 static void mock_dmabuf_kunmap_atomic(struct dma_buf *dma_buf, unsigned long 
page_num, void *addr)
drivers/gpu/drm/i915/selftests/mock_dmabuf.c:97:14: error: 
‘mock_dmabuf_kmap_atomic’ defined but not used [-Werror=unused-function]
 static void *mock_dmabuf_kmap_atomic(struct dma_buf *dma_buf, unsigned long 
page_num)

Fixes: f664a5269542 ("dma-buf: remove kmap_atomic interface")
Signed-off-by: Chris Wilson 

Reviewed-by: Christian König 

And sorry for the noise,
Christian.

Cc: Christian König 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
---
 drivers/gpu/drm/i915/selftests/mock_dmabuf.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_dmabuf.c 
b/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
index f81fda8ea45e..ca682caf1062 100644
--- a/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/selftests/mock_dmabuf.c
@@ -94,18 +94,6 @@ static void mock_dmabuf_vunmap(struct dma_buf *dma_buf, void 
*vaddr)
 vm_unmap_ram(vaddr, mock->npages);
 }

-static void *mock_dmabuf_kmap_atomic(struct dma_buf *dma_buf, unsigned long 
page_num)
-{
-   struct mock_dmabuf *mock = to_mock(dma_buf);
-
-   return kmap_atomic(mock->pages[page_num]);
-}
-
-static void mock_dmabuf_kunmap_atomic(struct dma_buf *dma_buf, unsigned long 
page_num, void *addr)
-{
-   kunmap_atomic(addr);
-}
-
 static void *mock_dmabuf_kmap(struct dma_buf *dma_buf, unsigned long page_num)
 {
 struct mock_dmabuf *mock = to_mock(dma_buf);
--
2.18.0.rc2

