Re: [PATCH 1/2] staging: android: ion: Add __GFP_NOWARN for system contig heap

2018-01-05 Thread Chris Wilson
Quoting Laura Abbott (2018-01-05 19:14:08)
> syzbot reported a warning from Ion:
> 
>   WARNING: CPU: 1 PID: 3485 at mm/page_alloc.c:3926
> 
>   ...
>   __alloc_pages_nodemask+0x9fb/0xd80 mm/page_alloc.c:4252
>   alloc_pages_current+0xb6/0x1e0 mm/mempolicy.c:2036
>   alloc_pages include/linux/gfp.h:492 [inline]
>   ion_system_contig_heap_allocate+0x40/0x2c0
>   drivers/staging/android/ion/ion_system_heap.c:374
>   ion_buffer_create drivers/staging/android/ion/ion.c:93 [inline]
>   ion_alloc+0x2c1/0x9e0 drivers/staging/android/ion/ion.c:420
>   ion_ioctl+0x26d/0x380 drivers/staging/android/ion/ion-ioctl.c:84
>   vfs_ioctl fs/ioctl.c:46 [inline]
>   do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
>   SYSC_ioctl fs/ioctl.c:701 [inline]
>   SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
> 
> This is a warning about attempting to allocate order > MAX_ORDER. This
> is coming from a userspace Ion allocation request. Since userspace is
> free to request however much memory it wants (and the kernel is free to
> deny its allocation), silence the allocation attempt with __GFP_NOWARN
> in case it fails.
> 
> Reported-by: syzbot+76e7efc4748495855...@syzkaller.appspotmail.com
> Reported-by: syzbot 
> Signed-off-by: Laura Abbott 
> ---
>  drivers/staging/android/ion/ion_system_heap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
> index 71c4228f8238..bc19cdd30637 100644
> --- a/drivers/staging/android/ion/ion_system_heap.c
> +++ b/drivers/staging/android/ion/ion_system_heap.c
> @@ -362,7 +362,7 @@ static int ion_system_contig_heap_allocate(struct ion_heap *heap,
> unsigned long i;
> int ret;
>  
> -   page = alloc_pages(low_order_gfp_flags, order);
> +   page = alloc_pages(low_order_gfp_flags | __GFP_NOWARN, order);

There's both high_order_gfp_flags and low_order_gfp_flags. The former
already includes __GFP_NOWARN and __GFP_NORETRY.

Interesting: ion_system_heap_create_pools() tries to mix low_order and
high_order flags, but it only ever uses the high_order ones (orders[0] == 8
forces a permanent switch from low_order_gfp to high_order_gfp; see the
sketch below).
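
For reference, the loop in question looks roughly like this (reconstructed
from the ion of this era, so treat the details as approximate): gfp_flags
is seeded once before the loop and never reset, so the first pass over
orders[] = {8, 4, 0} switches it for good:

	static int ion_system_heap_create_pools(struct ion_page_pool **pools)
	{
		gfp_t gfp_flags = low_order_gfp_flags;
		int i;

		for (i = 0; i < NUM_ORDERS; i++) {
			struct ion_page_pool *pool;

			/* orders[0] == 8 takes this branch on the first
			 * iteration, and gfp_flags then stays high_order
			 * for the order-4 and order-0 pools as well. */
			if (orders[i] > 4)
				gfp_flags = high_order_gfp_flags;

			pool = ion_page_pool_create(gfp_flags, orders[i]);
			if (!pool)
				goto err_create_pool;
			pools[i] = pool;
		}
		return 0;

	err_create_pool:
		ion_system_heap_destroy_pools(pools);
		return -ENOMEM;
	}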

There's no good reason for low_order_gfp, high_order_gfp to be static
rewritable variables.

For this instance, I would go farther and suggest you may want
__GFP_RETRY_MAYFAIL | __GFP_NOWARN to prevent userspace from triggering
the lowmemkiller/oomkiller here.
 
(I would kill low_order_gfp_flags / high_order_gfp_flags and avoid the
obfuscation.)
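
Concretely, for the contig heap that suggestion would look something like
this (a sketch, assuming the heap keeps its current GFP_HIGHUSER |
__GFP_ZERO base flags):

	page = alloc_pages(GFP_HIGHUSER | __GFP_ZERO |
			   __GFP_RETRY_MAYFAIL | __GFP_NOWARN, order);
	if (!page)
		return -ENOMEM;

__GFP_RETRY_MAYFAIL retries hard but fails the allocation rather than
invoking the OOM killer, which is the behaviour you want for a
userspace-sized request.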
-Chris


Re: [RFC v2 1/8] dma-buf/fence: add fence_collection fences

2016-04-26 Thread Chris Wilson
On Mon, Apr 25, 2016 at 07:33:21PM -0300, Gustavo Padovan wrote:
> +static const char *fence_collection_get_timeline_name(struct fence *fence)
> +{
> + return "no context";

"unbound" to distinguish from fence contexts within a timeline?

> +static bool fence_collection_enable_signaling(struct fence *fence)
> +{
> + struct fence_collection *collection = to_fence_collection(fence);
> + int i;
> +
> + for (i = 0 ; i < collection->num_fences ; i++) {
> + if (fence_add_callback(collection->fences[i].fence,
> +&collection->fences[i].cb,
> +collection_check_cb_func)) {
> + atomic_dec(&collection->num_pending_fences);
> + return false;

Don't stop, we need to enable all the others!

> + }
> + }
> +
> + return !!atomic_read(&collection->num_pending_fences);

Redundant !!

> +}
> +
> +static bool fence_collection_signaled(struct fence *fence)
> +{
> + struct fence_collection *collection = to_fence_collection(fence);
> +
> + return (atomic_read(&collection->num_pending_fences) == 0);

Redundant ()

> +static signed long fence_collection_wait(struct fence *fence, bool intr,
> +  signed long timeout)
> +{

What advantage does this have over fence_default_wait? You enable
signaling on all, then wait sequentially. The code looks redundant and
could just use fence_default_wait instead.
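
Putting the enable_signaling notes together, a fixed version might look
like this (a sketch reusing the RFC's names, untested):

	static bool fence_collection_enable_signaling(struct fence *fence)
	{
		struct fence_collection *collection = to_fence_collection(fence);
		int i;

		for (i = 0; i < collection->num_fences; i++) {
			/* An already-signaled fence just drops the pending
			 * count; keep going so that every remaining fence
			 * gets its callback installed. */
			if (fence_add_callback(collection->fences[i].fence,
					       &collection->fences[i].cb,
					       collection_check_cb_func))
				atomic_dec(&collection->num_pending_fences);
		}

		return atomic_read(&collection->num_pending_fences);
	}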
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [PATCH] dma-buf: Update docs for SYNC ioctl

2016-03-23 Thread Chris Wilson
On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote:
> Hi
> 
> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
> > On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
> >> My question was rather about why we do this? Semantics for EINTR are
> >> well defined, and with SA_RESTART (default on linux) user-space can
> >> ignore it. However, looping on EAGAIN is very uncommon, and it is not
> >> at all clear why it is needed?
> >>
> >> Returning an error to user-space makes sense if user-space has a
> >> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
> >> operation helps user-space at all? As someone without insight into the
> >> driver implementation, it is hard to tell why.. Any hints?
> >
> > The reason we return EAGAIN is to workaround a deadlock we face when
> > blocking on the GPU holding the struct_mutex (inside the client's
> > process), but the GPU is dead. As our locking is very, very coarse we
> > cannot restart the GPU without acquiring the struct_mutex being held by
> > the client so we wake the client up and tell them the resource they are
> > waiting on (the flush of the object from the GPU into the CPU domain) is
> > temporarily unavailable. If they try to immediately wait upon the ioctl
> > again, they are blocked waiting for the reset to occur before they may
> > complete their flush. There are a few other possible deadlocks that are
> > also avoided with EAGAIN (again, the issue is more or less the lack of
> > fine grained locking).
> 
> ...so you hijacked EAGAIN for all DRM ioctls just for a driver
> workaround?

No, we utilized the fact that EAGAIN was already enshrined by libdrm as
the de facto mechanism for repeating an ioctl, and reused that mechanism
to repeat the ioctl for a driver workaround.
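
That convention is the restart loop libdrm wraps around every ioctl,
i.e. roughly drmIoctl(), reproduced here from memory:

	int drmIoctl(int fd, unsigned long request, void *arg)
	{
		int ret;

		/* Restart on both signal interruption and the driver's
		 * "temporarily unavailable" workaround. */
		do {
			ret = ioctl(fd, request, arg);
		} while (ret == -1 && (errno == EINTR || errno == EAGAIN));
		return ret;
	}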
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [PATCH] dma-buf: Update docs for SYNC ioctl

2016-03-23 Thread Chris Wilson
On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote:
> My question was rather about why we do this? Semantics for EINTR are
> well defined, and with SA_RESTART (default on linux) user-space can
> ignore it. However, looping on EAGAIN is very uncommon, and it is not
> at all clear why it is needed?
> 
> Returning an error to user-space makes sense if user-space has a
> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync
> operation helps user-space at all? As someone without insight into the
> driver implementation, it is hard to tell why.. Any hints?

The reason we return EAGAIN is to workaround a deadlock we face when
blocking on the GPU holding the struct_mutex (inside the client's
process), but the GPU is dead. As our locking is very, very coarse we
cannot restart the GPU without acquiring the struct_mutex being held by
the client so we wake the client up and tell them the resource they are
waiting on (the flush of the object from the GPU into the CPU domain) is
temporarily unavailable. If they try to immediately wait upon the ioctl
again, they are blocked waiting for the reset to occur before they may
complete their flush. There are a few other possible deadlocks that are
also avoided with EAGAIN (again, the issue is more or less the lack of
fine grained locking).
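
In outline, the bail-out looks like this (helper names approximate to the
i915 of this era):

	ret = i915_mutex_lock_interruptible(dev);
	if (ret)
		return ret;	/* -EINTR, or -EAGAIN if a reset is pending */

	if (i915_reset_in_progress(&dev_priv->gpu_error)) {
		/* Drop the lock rather than block on a dead GPU, so the
		 * reset handler can take struct_mutex; the client sees
		 * -EAGAIN and repeats the ioctl once the reset completes. */
		mutex_unlock(&dev->struct_mutex);
		return -EAGAIN;
	}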
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH] dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()

2016-03-20 Thread Chris Wilson
Drivers, especially i915.ko, can fail during the initial migration of a
dma-buf for CPU access. However, the error code from the driver was not
being propagated back to ioctl and so userspace was blissfully ignorant
of the failure. Rendering corruption ensues.

Whilst fixing the ioctl to return the error code from
dma_buf_start_cpu_access(), also do the same for
dma_buf_end_cpu_access().  For most drivers, dma_buf_end_cpu_access()
cannot fail. i915.ko, however, like most drivers, wants to avoid being
uninterruptible (as would be required to guarantee no failure when
flushing the buffer to the device). As userspace already has to handle
errors from the SYNC_IOCTL, take advantage of this to be able to restart
the syscall across signals.

This fixes a coherency issue for i915.ko as well as reducing the
uninterruptible hold upon its BKL, the struct_mutex.
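
For context, the userspace side this makes meaningful looks roughly like
so (a minimal sketch against the uapi added by the commit below; the
restart loop is the usual drmIoctl() pattern):

	#include <errno.h>
	#include <sys/ioctl.h>
	#include <linux/dma-buf.h>

	static int dmabuf_sync(int fd, __u64 flags)
	{
		struct dma_buf_sync sync = { .flags = flags };
		int ret;

		/* Restart on EINTR/EAGAIN, as described above. */
		do {
			ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
		} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

		return ret;
	}

	/* usage:
	 *	dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE);
	 *	... CPU access through the mmap ...
	 *	dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
	 */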

Fixes commit c11e391da2a8fe973c3c2398452000bed505851e
Author: Daniel Vetter <daniel.vet...@ffwll.ch>
Date:   Thu Feb 11 20:04:51 2016 -0200

dma-buf: Add ioctls to allow userspace to flush

Testcase: igt/gem_concurrent_blit/*dmabuf*interruptible
Testcase: igt/prime_mmap_coherency/ioctl-errors
Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Tiago Vignatti <tiago.vigna...@intel.com>
Cc: Stéphane Marchesin <marc...@chromium.org>
Cc: David Herrmann <dh.herrm...@gmail.com>
Cc: Sumit Semwal <sumit.sem...@linaro.org>
Cc: Daniel Vetter <daniel.vet...@intel.com>
CC: linux-me...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: linaro-mm-...@lists.linaro.org
Cc: intel-...@lists.freedesktop.org
Cc: de...@driverdev.osuosl.org
---
 drivers/dma-buf/dma-buf.c | 17 +++--
 drivers/gpu/drm/i915/i915_gem_dmabuf.c| 15 +--
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c |  5 +++--
 drivers/gpu/drm/udl/udl_fb.c  |  4 ++--
 drivers/staging/android/ion/ion.c |  6 --
 include/linux/dma-buf.h   |  6 +++---
 6 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 9810d1df0691..774a60f4309a 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -259,6 +259,7 @@ static long dma_buf_ioctl(struct file *file,
struct dma_buf *dmabuf;
struct dma_buf_sync sync;
enum dma_data_direction direction;
+   int ret;
 
dmabuf = file->private_data;
 
@@ -285,11 +286,11 @@ static long dma_buf_ioctl(struct file *file,
}
 
if (sync.flags & DMA_BUF_SYNC_END)
-   dma_buf_end_cpu_access(dmabuf, direction);
+   ret = dma_buf_end_cpu_access(dmabuf, direction);
else
-   dma_buf_begin_cpu_access(dmabuf, direction);
+   ret = dma_buf_begin_cpu_access(dmabuf, direction);
 
-   return 0;
+   return ret;
default:
return -ENOTTY;
}
@@ -613,13 +614,17 @@ EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
  *
  * This call must always succeed.
  */
-void dma_buf_end_cpu_access(struct dma_buf *dmabuf,
-   enum dma_data_direction direction)
+int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+  enum dma_data_direction direction)
 {
+   int ret = 0;
+
WARN_ON(!dmabuf);
 
if (dmabuf->ops->end_cpu_access)
-   dmabuf->ops->end_cpu_access(dmabuf, direction);
+   ret = dmabuf->ops->end_cpu_access(dmabuf, direction);
+
+   return ret;
 }
 EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 1f3eef6fb345..0506016e18e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -228,25 +228,20 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire
return ret;
 }
 
-static void i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
+static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction)
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
struct drm_device *dev = obj->base.dev;
-   struct drm_i915_private *dev_priv = to_i915(dev);
-   bool was_interruptible;
int ret;
 
-   mutex_lock(&dev->struct_mutex);
-   was_interruptible = dev_priv->mm.interruptible;
-   dev_priv->mm.interruptible = false;
+   ret = i915_mutex_lock_interruptible(dev);
+   if (ret)
+   return ret;
 
ret = i915_gem_object_set_to_gtt_domain(obj, false);
-
-   dev_priv->mm.interruptible = was_interruptible;
	mutex_unlock(&dev->struct_mutex);
 
-   if (unlikely(ret))
-   DRM_ERROR("unable to flush buffer following CPU access; 
rendering may be corrup