[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-25 Thread Daniel Vetter
On Fri, Sep 23, 2016 at 07:59:44PM +0200, Christian König wrote:
> Am 23.09.2016 um 17:20 schrieb Chris Wilson:
> > On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > > > Currently we install a callback for performing poll on a dma-buf,
> > > > irrespective of the timeout. This involves taking a spinlock, as well as
> > > > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > > > multiple threads.
> > > > 
> > > > We can query whether the poll will block prior to installing the
> > > > callback to make the busy-query fast.
> > > > 
> > > > Single thread: 60% faster
> > > > 8 threads on 4 (+4 HT) cores: 600% faster
> > > > 
> > > > Still not quite the perfect scaling we get with a native busy ioctl, but
> > > > poll(dmabuf) is faster due to the quicker lookup of the object and
> > > > avoiding drm_ioctl().
> > > > 
> > > > Signed-off-by: Chris Wilson 
> > > > Cc: Sumit Semwal 
> > > > Cc: linux-media at vger.kernel.org
> > > > Cc: dri-devel at lists.freedesktop.org
> > > > Cc: linaro-mm-sig at lists.linaro.org
> > > > Reviewed-by: Daniel Vetter 
> > > Need to strike the r-b here, since Christian König pointed out that
> > > objects won't magically switch signalling on.
> > Oh, it also means that
> > 
> > commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> > Author: Jammy Zhou 
> > Date:   Wed Jan 21 18:35:47 2015 +0800
> > 
> >  reservation: wait only with non-zero timeout specified (v3)
> >  When the timeout value passed to reservation_object_wait_timeout_rcu
> >  is zero, no wait should be done if the fences are not signaled.
> >  Return '1' for idle and '0' for busy if the specified timeout is '0'
> >  to keep consistent with the case of non-zero timeout.
> >  v2: call fence_put if not signaled in the case of timeout==0
> >  v3: switch to reservation_object_test_signaled_rcu
> >  Signed-off-by: Jammy Zhou 
> >  Reviewed-by: Christian König 
> >  Reviewed-by: Alex Deucher 
> >  Reviewed-By: Maarten Lankhorst 
> >  Signed-off-by: Sumit Semwal 
> > 
> > is wrong. And reservation_object_test_signaled_rcu() is unreliable.
> 
> Ups indeed, that patch is wrong as well.
> 
> I suggest that we just enable the signaling in this case as well.

Will you/Zhou take care of this corner case? Just so I can't forget about
it ;-)

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-23 Thread Christian König
Am 23.09.2016 um 17:20 schrieb Chris Wilson:
> On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
>>> Currently we install a callback for performing poll on a dma-buf,
>>> irrespective of the timeout. This involves taking a spinlock, as well as
>>> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
>>> multiple threads.
>>>
>>> We can query whether the poll will block prior to installing the
>>> callback to make the busy-query fast.
>>>
>>> Single thread: 60% faster
>>> 8 threads on 4 (+4 HT) cores: 600% faster
>>>
>>> Still not quite the perfect scaling we get with a native busy ioctl, but
>>> poll(dmabuf) is faster due to the quicker lookup of the object and
>>> avoiding drm_ioctl().
>>>
>>> Signed-off-by: Chris Wilson 
>>> Cc: Sumit Semwal 
>>> Cc: linux-media at vger.kernel.org
>>> Cc: dri-devel at lists.freedesktop.org
>>> Cc: linaro-mm-sig at lists.linaro.org
>>> Reviewed-by: Daniel Vetter 
>> Need to strike the r-b here, since Christian König pointed out that
>> objects won't magically switch signalling on.
> Oh, it also means that
>
> commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
> Author: Jammy Zhou 
> Date:   Wed Jan 21 18:35:47 2015 +0800
>
>  reservation: wait only with non-zero timeout specified (v3)
>  
>  When the timeout value passed to reservation_object_wait_timeout_rcu
>  is zero, no wait should be done if the fences are not signaled.
>  
>  Return '1' for idle and '0' for busy if the specified timeout is '0'
>  to keep consistent with the case of non-zero timeout.
>  
>  v2: call fence_put if not signaled in the case of timeout==0
>  
>  v3: switch to reservation_object_test_signaled_rcu
>  
>  Signed-off-by: Jammy Zhou 
>  Reviewed-by: Christian König 
>  Reviewed-by: Alex Deucher 
>  Reviewed-By: Maarten Lankhorst 
>  Signed-off-by: Sumit Semwal 
>
> is wrong. And reservation_object_test_signaled_rcu() is unreliable.

Ups indeed, that patch is wrong as well.

I suggest that we just enable the signaling in this case as well.

Regards,
Christian.

> -Chris
>



[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-23 Thread Chris Wilson
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Sumit Semwal 
> > Cc: linux-media at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: linaro-mm-sig at lists.linaro.org
> > Reviewed-by: Daniel Vetter 
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

Oh, it also means that

commit fb8b7d2b9d80e1e71f379e57355936bd2b024be9
Author: Jammy Zhou 
Date:   Wed Jan 21 18:35:47 2015 +0800

reservation: wait only with non-zero timeout specified (v3)

When the timeout value passed to reservation_object_wait_timeout_rcu
is zero, no wait should be done if the fences are not signaled.

Return '1' for idle and '0' for busy if the specified timeout is '0'
to keep consistent with the case of non-zero timeout.

v2: call fence_put if not signaled in the case of timeout==0

v3: switch to reservation_object_test_signaled_rcu

Signed-off-by: Jammy Zhou 
Reviewed-by: Christian König 
Reviewed-by: Alex Deucher 
Reviewed-By: Maarten Lankhorst 
Signed-off-by: Sumit Semwal 

is wrong. And reservation_object_test_signaled_rcu() is unreliable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-23 Thread Chris Wilson
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Sumit Semwal 
> > Cc: linux-media at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: linaro-mm-sig at lists.linaro.org
> > Reviewed-by: Daniel Vetter 
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

Propagating a flag through to sync_file is trivial, but not through to
the dma_buf->resv. Looks like dma-buf will be without a fast busy query,
which I guess in the grand scheme of things (i.e. dma-buf itself is not
intended to be used as a fence) is not that important.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-23 Thread Daniel Vetter
On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> Currently we install a callback for performing poll on a dma-buf,
> irrespective of the timeout. This involves taking a spinlock, as well as
> unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> multiple threads.
> 
> We can query whether the poll will block prior to installing the
> callback to make the busy-query fast.
> 
> Single thread: 60% faster
> 8 threads on 4 (+4 HT) cores: 600% faster
> 
> Still not quite the perfect scaling we get with a native busy ioctl, but
> poll(dmabuf) is faster due to the quicker lookup of the object and
> avoiding drm_ioctl().
> 
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> Reviewed-by: Daniel Vetter 

Need to strike the r-b here, since Christian König pointed out that
objects won't magically switch signalling on.
-Daniel

> ---
>  drivers/dma-buf/dma-buf.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index cf04d249a6a4..c7a7bc579941 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, 
> poll_table *poll)
>   if (!events)
>   return 0;
>  
> + if (poll_does_not_wait(poll)) {
> + if (events & POLLOUT &&
> + !reservation_object_test_signaled_rcu(resv, true))
> + events &= ~(POLLOUT | POLLIN);
> +
> + if (events & POLLIN &&
> + !reservation_object_test_signaled_rcu(resv, false))
> + events &= ~POLLIN;
> +
> + return events;
> + }
> +
>  retry:
>   seq = read_seqcount_begin(>seq);
>   rcu_read_lock();
> -- 
> 2.9.3
> 
> ___
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH 11/11] dma-buf: Do a fast lockless check for poll with timeout=0

2016-09-23 Thread Chris Wilson
On Fri, Sep 23, 2016 at 03:50:44PM +0200, Daniel Vetter wrote:
> On Mon, Aug 29, 2016 at 08:08:34AM +0100, Chris Wilson wrote:
> > Currently we install a callback for performing poll on a dma-buf,
> > irrespective of the timeout. This involves taking a spinlock, as well as
> > unnecessary work, and greatly reduces scaling of poll(.timeout=0) across
> > multiple threads.
> > 
> > We can query whether the poll will block prior to installing the
> > callback to make the busy-query fast.
> > 
> > Single thread: 60% faster
> > 8 threads on 4 (+4 HT) cores: 600% faster
> > 
> > Still not quite the perfect scaling we get with a native busy ioctl, but
> > poll(dmabuf) is faster due to the quicker lookup of the object and
> > avoiding drm_ioctl().
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Sumit Semwal 
> > Cc: linux-media at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: linaro-mm-sig at lists.linaro.org
> > Reviewed-by: Daniel Vetter 
> 
> Need to strike the r-b here, since Christian König pointed out that
> objects won't magically switch signalling on.

The point being here that we don't even want to switch signaling on! :)

Christian's point was that not all fences guarantee forward progress
irrespective of whether signaling is enabled or not, and fences are not
required to guarantee forward progress without signaling even if they
provide an ops->signaled().
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre