Re: [PATCH] drm: Fix lock order reversal between mmap_sem and struct_mutex.

2009-02-19 Thread Peter Zijlstra
On Thu, 2009-02-19 at 18:04 -0800, Eric Anholt wrote:
> On Thu, 2009-02-19 at 23:26 +0100, Peter Zijlstra wrote:
> > On Thu, 2009-02-19 at 22:02 +0100, Thomas Hellstrom wrote:
> > >  
> > > It looks to me like the driver preferred locking order is
> > > 
> > > object_mutex (which happens to be the device global struct_mutex)
> > >   mmap_sem
> > >  offset_mutex.
> > > 
> > > So if one could avoid using the struct_mutex for object bookkeeping (A 
> > > separate lock) then
> > > vm_open() and vm_close() would adhere to that locking order as well, 
> > > simply by not taking the struct_mutex at all.
> > > 
> > > So only fault() remains, in which that locking order is reversed. 
> > > Personally I think the trylock ->reschedule->retry method with proper 
> > > commenting is a good solution. It will be the _only_ place where locking 
> > > order is reversed and it is done in a deadlock-safe manner. Note that 
> > > fault() doesn't really fail, but requests a retry from user-space with 
> > > rescheduling to give the process holding the struct_mutex time to 
> > > release it.
> > 
> > It doesn't do the reschedule -- need_resched() will check if the current
> > task was marked to be scheduled away, furthermore yield based locking
> > sucks chunks.

Imagine what would happen if your faulting task was the highest RT prio
task in the system, you'd end up with a life-lock.

> > What's so very difficult about pulling the copy_*_user() out from under
> > the locks?
> 
> That we're expecting the data movement to occur while holding device
> state in place.  For example, we write data through the GTT most of the
> time so we:
> 
> lock struct_mutex
> pin the object to the GTT
> flushing caches as needed
> copy_from_user
> unpin object
> unlock struct_mutex

So you cannot drop the lock once you've pinned the dst object?

> If I'm to pull the copy_from_user out, that means I have to:
> 
> alloc temporary storage
> for each block of temp storage size:
>   copy_from_user
>   lock struct_mutex
>   pin the object to the GTT
>   flush caches as needed
>   memcpy
>   unpin object
>   unlock struct_mutex
> 
> At this point of introducing our third copy of the user's data in our
> hottest path, we should probably ditch the pwrite path entirely and go
> to user mapping of the objects for performance.  Requiring user mapping
> (which has significant overhead) cuts the likelihood of moving from
> user-space object caching to kernel object caching in the future, which
> has the potential of saving steaming piles of memory.

Or you could get_user_pages() to fault the user pages and pin them, and
then do pagefault_disable() and use copy_from_user_inatomic or such, and
release the pages again.




--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Fix lock order reversal between mmap_sem and struct_mutex.

2009-02-19 Thread Peter Zijlstra
On Thu, 2009-02-19 at 22:02 +0100, Thomas Hellstrom wrote:
>  
> It looks to me like the driver preferred locking order is
> 
> object_mutex (which happens to be the device global struct_mutex)
>   mmap_sem
>  offset_mutex.
> 
> So if one could avoid using the struct_mutex for object bookkeeping (A 
> separate lock) then
> vm_open() and vm_close() would adhere to that locking order as well, 
> simply by not taking the struct_mutex at all.
> 
> So only fault() remains, in which that locking order is reversed. 
> Personally I think the trylock ->reschedule->retry method with proper 
> commenting is a good solution. It will be the _only_ place where locking 
> order is reversed and it is done in a deadlock-safe manner. Note that 
> fault() doesn't really fail, but requests a retry from user-space with 
> rescheduling to give the process holding the struct_mutex time to 
> release it.

It doesn't do the reschedule -- need_resched() will check if the current
task was marked to be scheduled away, furthermore yield based locking
sucks chunks.

What's so very difficult about pulling the copy_*_user() out from under
the locks?


--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


RE: [Intel-gfx] [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Zou, Nanhai
>-Original Message-
>From: intel-gfx-boun...@lists.freedesktop.org
>[mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf Of Jesse Barnes
>Sent: 2009年2月20日 8:48
>To: dri-devel@lists.sourceforge.net
>Cc: intel-...@lists.freedesktop.org
>Subject: Re: [Intel-gfx] [PATCH] i915: add page flipping ioctl
>
>On Thursday 19 February 2009 16:43:54 Jesse Barnes wrote:
>> On Thursday 19 February 2009 11:37:01 Chris Wilson wrote:
>> > With a few additional suggestions by Jesse, I've managed to get
>> > tear-free compositing working on i915. Here's the diff on top of the
>> > original patch (though obviously this is just a suggestion, still need
>> > to prevent multiple pending flips to the same plane and ensure that the
>> > old buffer is eventually unpinned, and you might choose to drop the
>> > mutex around the wait_for_vblank ;-):
>>
>> Yeah, looks pretty reasonable.  Here's what I've been working with.  It
>> adds a couple of more changes (and slightly different cleanups) over your
>> version: - added a seqno to wait for
>>   - wait for flip before queuing another
>> but it's missing thew 915 changes you added (btw did I get the pitch wrong?
>> or are you submitting a different value for your objects?).
>>
>> I added a new seqno since it's possible (even likely) that the next vblank
>> won't actually reflect the flip, if the GPU is busy processing a large
>> batchbuffer when the ring command goes in for example.
>>
>> Also it might be a good idea to wait for any previous flip before queuing a
>> new one.  Anyway still tracking down some issues with the X side, but it
>> seems like it's approaching readiness.
>
>Oh you want to actually see the code?
>


Hi Jesse,
See below some comments.
>diff --git a/drivers/gpu/drm/i915/i915_dma.c
>b/drivers/gpu/drm/i915/i915_dma.c
>index cbb8224..4e4fb61 100644
>--- a/drivers/gpu/drm/i915/i915_dma.c
>+++ b/drivers/gpu/drm/i915/i915_dma.c
>@@ -829,6 +829,152 @@ static int i915_set_status_page(struct drm_device *dev,
>void *data,
>   return 0;
> }
>
>+static int i915_pipe_to_plane(struct drm_device *dev, int pipe)
>+{
>+  drm_i915_private_t *dev_priv = dev->dev_private;
>+  u32 reg, pipe_mask;
>+
>+  if (pipe == 0)
>+  pipe_mask = DISPPLANE_SEL_PIPE_A;
>+  else
>+  pipe_mask = DISPPLANE_SEL_PIPE_B;
>+
>+  pipe_mask |= DISPLAY_PLANE_ENABLE;
>+
>+  reg = I915_READ(DSPACNTR);
>+  if ((reg & (DISPLAY_PLANE_ENABLE | DISPPLANE_SEL_PIPE_MASK)) ==
>+  pipe_mask)
>+  return 0;
>+
>+  reg = I915_READ(DSPBCNTR);
>+  if ((reg & (DISPLAY_PLANE_ENABLE | DISPPLANE_SEL_PIPE_MASK)) ==
>+  pipe_mask)
>+  return 1;
>+
>+  return -1;
>+}
>+
>+bool
>+i915_gem_flip_pending(struct drm_gem_object *obj)
>+{
>+  struct drm_device *dev = obj->dev;
>+  drm_i915_private_t *dev_priv = dev->dev_private;
>+  struct drm_i915_gem_object *obj_priv = obj->driver_private;
>+  unsigned long flags;
>+  bool pending = false;
>+
>+  spin_lock_irqsave(&dev_priv->vblank_lock, flags);
>+  if (!list_empty(&obj_priv->vblank_head))
>+  pending = true;
>+  spin_unlock_irqrestore(&dev_priv->vblank_lock, flags);
>+
>+  return pending;
>+}
>+
>+static int i915_gem_page_flip(struct drm_device *dev, void *data,
>+struct drm_file *file_priv)
>+{
>+  struct drm_i915_gem_page_flip *flip_data = data;
>+  drm_i915_private_t *dev_priv = dev->dev_private;
>+  struct drm_gem_object *obj;
>+  struct drm_i915_gem_object *obj_priv;
>+  unsigned long flags;
>+  uint32_t offset;
>+  int pitch, pipe, plane, tiled;
>+  int ret = 0;
>+  RING_LOCALS;
>+
>+  if (!(dev->driver->driver_features & DRIVER_GEM))
>+  return -ENODEV;
>+
>+  /*
>+   * Reject unknown flags so future userspace knows what we (don't)
>+   * support
>+   */
>+  if (flip_data->flags & (~I915_PAGE_FLIP_WAIT)) {
>+  DRM_ERROR("bad page flip flags\n");
>+  return -EINVAL;
>+  }
>+
>+  if ((pipe = flip_data->pipe) > 1) {
>+  DRM_ERROR("bad pipe\n");
>+  return -EINVAL;
>+  }
pipe should be unsigned int, if it is int, this check should be if 
(pipe > 1 || pipe < 0)

>+
>+  plane = i915_pipe_to_plane(dev, pipe);
>+  if (plane < 0) {
>+  DRM_ERROR("bad pipe (no planes enabled?)\n");
>+  return -EINVAL;
>+  }
>+
>+  obj = drm_gem_object_lookup(dev, file_priv, flip_data->handle);
>+  if (obj == NULL)
>+  return -EBADF;
>+
>+  if (i915_gem_flip_pending(obj))
>+  wait_for_completion(&obj_priv->vblank);
>+
>+  mutex_lock(&dev->struct_mutex);
>+
>+  obj_priv = obj->driver_private;
>+  if (obj_priv->tiling_mode != I915_TILING_NONE &&
>+  obj_priv->tiling_mode != I915_TILING_X) {
>+  DRM_ERROR("can only flip non-tiled or X tiled pages\n");
>+  

DRI2 flush extension

2009-02-19 Thread Alan Hourihane
Attached is a new DRI2 flush extension that allows the driver to perform
a "real flush" before dispatching a swap or Xserver copy operation.

Currently we do this before a DRI2CopyRegion() call.

This allows drivers a real "end of scene" flush to ensure rendering is
complete prior to a swap.

I've committed this already to the gallium-mesa-7.4 branch, but any
comments appreciated before I push to the master branch?

Thanks,

Alan.
commit b163d4f9ee2ab4d54daf7c17c097cae51c9c6db2
Author: Alan Hourihane 
Date:   Thu Feb 19 18:39:08 2009 +

glx: add support for a reallyFlush() function before swap occurs.

diff --git a/include/GL/internal/dri_interface.h b/include/GL/internal/dri_interface.h
index 27cc1be..a726b93 100644
--- a/include/GL/internal/dri_interface.h
+++ b/include/GL/internal/dri_interface.h
@@ -78,6 +78,7 @@ typedef struct __DRIswrastExtensionRec		__DRIswrastExtension;
 typedef struct __DRIbufferRec			__DRIbuffer;
 typedef struct __DRIdri2ExtensionRec		__DRIdri2Extension;
 typedef struct __DRIdri2LoaderExtensionRec	__DRIdri2LoaderExtension;
+typedef struct __DRI2flushExtensionRec	__DRI2flushExtension;
 
 /*...@}*/
 
@@ -245,6 +246,16 @@ struct __DRItexBufferExtensionRec {
 			 __DRIdrawable *pDraw);
 };
 
+/**
+ * Used by drivers that implement DRI2
+ */
+#define __DRI2_FLUSH "DRI2_Flush"
+#define __DRI2_FLUSH_VERSION 1
+struct __DRI2flushExtensionRec {
+__DRIextension base;
+void (*flush)(__DRIdrawable *drawable);
+};
+
 
 /**
  * XML document describing the configuration options supported by the
diff --git a/src/glx/x11/dri2_glx.c b/src/glx/x11/dri2_glx.c
index 639aa19..fdda852 100644
--- a/src/glx/x11/dri2_glx.c
+++ b/src/glx/x11/dri2_glx.c
@@ -207,7 +207,13 @@ static void dri2CopySubBuffer(__GLXDRIdrawable *pdraw,
 xrect.width = width;
 xrect.height = height;
 
+#ifdef __DRI2_FLUSH
+if (pdraw->psc->f)
+	(*pdraw->psc->f->flush)(pdraw->driDrawable);
+#endif
+
 region = XFixesCreateRegion(pdraw->psc->dpy, &xrect, 1);
+/* should get a fence ID back from here at some point */
 DRI2CopyRegion(pdraw->psc->dpy, pdraw->drawable, region,
 		   DRI2BufferFrontLeft, DRI2BufferBackLeft);
 XFixesDestroyRegion(pdraw->psc->dpy, region);
@@ -235,6 +241,11 @@ static void dri2WaitX(__GLXDRIdrawable *pdraw)
 xrect.width = priv->width;
 xrect.height = priv->height;
 
+#ifdef __DRI2_FLUSH
+if (pdraw->psc->f)
+	(*pdraw->psc->f->flush)(pdraw->driDrawable);
+#endif
+
 region = XFixesCreateRegion(pdraw->psc->dpy, &xrect, 1);
 DRI2CopyRegion(pdraw->psc->dpy, pdraw->drawable, region,
 		   DRI2BufferFakeFrontLeft, DRI2BufferFrontLeft);
@@ -255,6 +266,11 @@ static void dri2WaitGL(__GLXDRIdrawable *pdraw)
 xrect.width = priv->width;
 xrect.height = priv->height;
 
+#ifdef __DRI2_FLUSH
+if (pdraw->psc->f)
+	(*pdraw->psc->f->flush)(pdraw->driDrawable);
+#endif
+
 region = XFixesCreateRegion(pdraw->psc->dpy, &xrect, 1);
 DRI2CopyRegion(pdraw->psc->dpy, pdraw->drawable, region,
 		   DRI2BufferFrontLeft, DRI2BufferFakeFrontLeft);
diff --git a/src/glx/x11/dri_common.c b/src/glx/x11/dri_common.c
index 4fda649..90c3d8c 100644
--- a/src/glx/x11/dri_common.c
+++ b/src/glx/x11/dri_common.c
@@ -392,6 +392,13 @@ driBindExtensions(__GLXscreenConfigs *psc, int dri2)
 	}
 #endif
 
+#ifdef __DRI2_FLUSH
+	if ((strcmp(extensions[i]->name, __DRI2_FLUSH) == 0) && dri2) {
+	psc->f = (__DRI2flushExtension *) extensions[i];
+	/* internal driver extension, no GL extension exposed */
+	}
+#endif
+
 	/* Ignore unknown extensions */
 }
 }
diff --git a/src/glx/x11/glxclient.h b/src/glx/x11/glxclient.h
index 3e70759..caf58bb 100644
--- a/src/glx/x11/glxclient.h
+++ b/src/glx/x11/glxclient.h
@@ -519,6 +519,10 @@ struct __GLXscreenConfigsRec {
 const __DRItexBufferExtension *texBuffer;
 #endif
 
+#ifdef __DRI2_FLUSH
+const __DRI2flushExtension *f;
+#endif
+
 #endif
 
 /**
--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Fix lock order reversal between mmap_sem and struct_mutex.

2009-02-19 Thread Eric Anholt
On Thu, 2009-02-19 at 23:26 +0100, Peter Zijlstra wrote:
> On Thu, 2009-02-19 at 22:02 +0100, Thomas Hellstrom wrote:
> >  
> > It looks to me like the driver preferred locking order is
> > 
> > object_mutex (which happens to be the device global struct_mutex)
> >   mmap_sem
> >  offset_mutex.
> > 
> > So if one could avoid using the struct_mutex for object bookkeeping (A 
> > separate lock) then
> > vm_open() and vm_close() would adhere to that locking order as well, 
> > simply by not taking the struct_mutex at all.
> > 
> > So only fault() remains, in which that locking order is reversed. 
> > Personally I think the trylock ->reschedule->retry method with proper 
> > commenting is a good solution. It will be the _only_ place where locking 
> > order is reversed and it is done in a deadlock-safe manner. Note that 
> > fault() doesn't really fail, but requests a retry from user-space with 
> > rescheduling to give the process holding the struct_mutex time to 
> > release it.
> 
> It doesn't do the reschedule -- need_resched() will check if the current
> task was marked to be scheduled away, furthermore yield based locking
> sucks chunks.
> 
> What's so very difficult about pulling the copy_*_user() out from under
> the locks?

That we're expecting the data movement to occur while holding device
state in place.  For example, we write data through the GTT most of the
time so we:

lock struct_mutex
pin the object to the GTT
flushing caches as needed
copy_from_user
unpin object
unlock struct_mutex

If I'm to pull the copy_from_user out, that means I have to:

alloc temporary storage
for each block of temp storage size:
copy_from_user
lock struct_mutex
pin the object to the GTT
flush caches as needed
memcpy
unpin object
unlock struct_mutex

At this point of introducing our third copy of the user's data in our
hottest path, we should probably ditch the pwrite path entirely and go
to user mapping of the objects for performance.  Requiring user mapping
(which has significant overhead) cuts the likelihood of moving from
user-space object caching to kernel object caching in the future, which
has the potential of saving steaming piles of memory.

-- 
Eric Anholt
e...@anholt.net eric.anh...@intel.com




signature.asc
Description: This is a digitally signed message part
--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[git pull] drm fixes tree.

2009-02-19 Thread Dave Airlie

Hi Linus,

Please pull the 'drm-fixes' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-fixes

Intel: This contains one lockdep fix, the other is still under discussion 
along with a few locking and error path fixes. It also contains some 
minor kms updates for PLL clocks and LVDS found during Fedora integration.

Radeon: fixes a suspend/resume regression introduced by the master 
changes.

Dave.

 drivers/gpu/drm/drm_crtc.c |3 +-
 drivers/gpu/drm/drm_crtc_helper.c  |   21 +++-
 drivers/gpu/drm/drm_fops.c |3 +
 drivers/gpu/drm/drm_gem.c  |   79 +++-
 drivers/gpu/drm/i915/i915_drv.c|2 +
 drivers/gpu/drm/i915/i915_drv.h|2 +
 drivers/gpu/drm/i915/i915_gem.c|  119 +++-
 drivers/gpu/drm/i915/i915_gem_tiling.c |6 +-
 drivers/gpu/drm/i915/intel_bios.c  |8 ++
 drivers/gpu/drm/i915/intel_display.c   |  160 +---
 drivers/gpu/drm/i915/intel_fb.c|8 +-
 drivers/gpu/drm/i915/intel_lvds.c  |2 -
 drivers/gpu/drm/i915/intel_sdvo.c  |2 +-
 drivers/gpu/drm/i915/intel_tv.c|2 +-
 drivers/gpu/drm/radeon/radeon_cp.c |   21 +++-
 include/drm/drmP.h |2 +
 include/drm/drm_crtc.h |2 +-
 include/drm/drm_crtc_helper.h  |   10 +-
 18 files changed, 279 insertions(+), 173 deletions(-)

commit 3d16118dc825a654043dfe3e14371fdf2976994d
Author: etienne 
Date:   Fri Feb 20 09:44:45 2009 +1000

drm/radeon: update sarea copies of last_ variables on resume.

This fixes a regression reported in bug #12613.

[airlied: not I tweaked the patch slightly and fixed it by etienne did
all the hardwork so gets authorship]

Signed-off-by: etienne 
Signed-off-by: Dave Airlie 

commit ab00b3e5210954cbaff9207db874a9f03197e3ba
Author: Jesse Barnes 
Date:   Wed Feb 11 14:01:46 2009 -0800

drm/i915: Keep refs on the object over the lifetime of vmas for GTT mmap.

This fixes potential fault at fault time if the object was unreferenced
while the mapping still existed.  Now, while the mmap_offset only lives
for the lifetime of the object, the object also stays alive while a vma
exists that needs it.

Signed-off-by: Jesse Barnes 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit 496818f08a78476abdb307e241911536221239fc
Author: Jesse Barnes 
Date:   Wed Feb 11 13:28:14 2009 -0800

drm/i915: take struct mutex around fb unref

Need to do this in case the unref ends up doing a free.

Signed-off-by: Jesse Barnes 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit 43565a0648e664744ac9201c199681451355edcc
Author: Kristian Høgsberg 
Date:   Fri Feb 13 20:56:52 2009 -0500

drm: Use spread spectrum when the bios tells us it's ok.

Lifted from the DDX modesetting.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit a29f5ca3d691995266a4b1df313e32ff0509a03c
Author: Kristian Høgsberg 
Date:   Fri Feb 13 20:56:51 2009 -0500

drm: Collapse identical i8xx_clock() and i9xx_clock().

They used to be different.  Now they're identical.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit f3cade5c037054ce5f57651fe0b64eaa9781c753
Author: Kristian Høgsberg 
Date:   Fri Feb 13 20:56:50 2009 -0500

drm: Bring PLL limits in sync with DDX values.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit 7f9872e06d749afdc2029aa6b7ffe88cb3b8c5c2
Author: Kristian Høgsberg 
Date:   Fri Feb 13 20:56:49 2009 -0500

drm: Add locking around cursor gem operations.

We need to hold the struct_mutex around pinning and the phys object
operations.

Signed-off-by: Kristian Høgsberg 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit 5c3b82e2b229e78eb32f4ea12d16f3ebeeab3fc7
Author: Chris Wilson 
Date:   Wed Feb 11 13:25:09 2009 +

drm: Propagate failure from setting crtc base.

Check the error paths within intel_pipe_set_base() to first cleanup and
then report back the error.

Signed-off-by: Chris Wilson 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit e62fb64e6187ea9d8bcedb17ccaa045ed92d4b55
Author: Chris Wilson 
Date:   Wed Feb 11 16:39:21 2009 +

drm: Check for a NULL encoder when reverting on error path

We need to skip the connectors with a NULL encoder to match the success
path and avoid an OOPS.

Signed-off-by: Chris Wilson 
Signed-off-by: Eric Anholt 
Signed-off-by: Dave Airlie 

commit 85a7bb98582b60b7e9130159d2464eb0bbac13f7
Author: Chris Wilson 
Date:   Wed Feb 11 14:52:44 2009 +

drm/i915: Cleanup the hws on ringbuffer constrution failure.
   

Re: [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Jesse Barnes
On Thursday 19 February 2009 16:43:54 Jesse Barnes wrote:
> On Thursday 19 February 2009 11:37:01 Chris Wilson wrote:
> > With a few additional suggestions by Jesse, I've managed to get
> > tear-free compositing working on i915. Here's the diff on top of the
> > original patch (though obviously this is just a suggestion, still need
> > to prevent multiple pending flips to the same plane and ensure that the
> > old buffer is eventually unpinned, and you might choose to drop the
> > mutex around the wait_for_vblank ;-):
>
> Yeah, looks pretty reasonable.  Here's what I've been working with.  It
> adds a couple of more changes (and slightly different cleanups) over your
> version: - added a seqno to wait for
>   - wait for flip before queuing another
> but it's missing thew 915 changes you added (btw did I get the pitch wrong?
> or are you submitting a different value for your objects?).
>
> I added a new seqno since it's possible (even likely) that the next vblank
> won't actually reflect the flip, if the GPU is busy processing a large
> batchbuffer when the ring command goes in for example.
>
> Also it might be a good idea to wait for any previous flip before queuing a
> new one.  Anyway still tracking down some issues with the X side, but it
> seems like it's approaching readiness.

Oh you want to actually see the code?

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index cbb8224..4e4fb61 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -829,6 +829,152 @@ static int i915_set_status_page(struct drm_device *dev, 
void *data,
return 0;
 }
 
+static int i915_pipe_to_plane(struct drm_device *dev, int pipe)
+{
+   drm_i915_private_t *dev_priv = dev->dev_private;
+   u32 reg, pipe_mask;
+
+   if (pipe == 0)
+   pipe_mask = DISPPLANE_SEL_PIPE_A;
+   else
+   pipe_mask = DISPPLANE_SEL_PIPE_B;
+
+   pipe_mask |= DISPLAY_PLANE_ENABLE;
+
+   reg = I915_READ(DSPACNTR);
+   if ((reg & (DISPLAY_PLANE_ENABLE | DISPPLANE_SEL_PIPE_MASK)) ==
+   pipe_mask)
+   return 0;
+
+   reg = I915_READ(DSPBCNTR);
+   if ((reg & (DISPLAY_PLANE_ENABLE | DISPPLANE_SEL_PIPE_MASK)) ==
+   pipe_mask)
+   return 1;
+
+   return -1;
+}
+
+bool
+i915_gem_flip_pending(struct drm_gem_object *obj)
+{
+   struct drm_device *dev = obj->dev;
+   drm_i915_private_t *dev_priv = dev->dev_private;
+   struct drm_i915_gem_object *obj_priv = obj->driver_private;
+   unsigned long flags;
+   bool pending = false;
+
+   spin_lock_irqsave(&dev_priv->vblank_lock, flags);
+   if (!list_empty(&obj_priv->vblank_head))
+   pending = true;
+   spin_unlock_irqrestore(&dev_priv->vblank_lock, flags);
+
+   return pending;
+}
+
+static int i915_gem_page_flip(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+   struct drm_i915_gem_page_flip *flip_data = data;
+   drm_i915_private_t *dev_priv = dev->dev_private;
+   struct drm_gem_object *obj;
+   struct drm_i915_gem_object *obj_priv;
+   unsigned long flags;
+   uint32_t offset;
+   int pitch, pipe, plane, tiled;
+   int ret = 0;
+   RING_LOCALS;
+
+   if (!(dev->driver->driver_features & DRIVER_GEM))
+   return -ENODEV;
+
+   /*
+* Reject unknown flags so future userspace knows what we (don't)
+* support
+*/
+   if (flip_data->flags & (~I915_PAGE_FLIP_WAIT)) {
+   DRM_ERROR("bad page flip flags\n");
+   return -EINVAL;
+   }
+
+   if ((pipe = flip_data->pipe) > 1) {
+   DRM_ERROR("bad pipe\n");
+   return -EINVAL;
+   }
+
+   plane = i915_pipe_to_plane(dev, pipe);
+   if (plane < 0) {
+   DRM_ERROR("bad pipe (no planes enabled?)\n");
+   return -EINVAL;
+   }
+
+   obj = drm_gem_object_lookup(dev, file_priv, flip_data->handle);
+   if (obj == NULL)
+   return -EBADF;
+
+   if (i915_gem_flip_pending(obj))
+   wait_for_completion(&obj_priv->vblank);
+
+   mutex_lock(&dev->struct_mutex);
+
+   obj_priv = obj->driver_private;
+   if (obj_priv->tiling_mode != I915_TILING_NONE &&
+   obj_priv->tiling_mode != I915_TILING_X) {
+   DRM_ERROR("can only flip non-tiled or X tiled pages\n");
+   ret = -EINVAL;
+   goto out_unref;
+   }
+
+   ret = i915_gem_object_pin(obj, 0);
+   if (ret) {
+   DRM_ERROR("failed to pin object for flip\n");
+   ret = -EBUSY;
+   goto out_unref;
+   }
+   mutex_unlock(&dev->struct_mutex);
+
+   /*
+* Put the object in the GTT domain before the flip,
+* since there may be outstanding rendering
+*/
+   i915_gem_object_set_to_gtt_domain(obj, 0);
+
+   offset = obj_priv->gtt

Re: [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Jesse Barnes
On Thursday 19 February 2009 11:37:01 Chris Wilson wrote:
> With a few additional suggestions by Jesse, I've managed to get
> tear-free compositing working on i915. Here's the diff on top of the
> original patch (though obviously this is just a suggestion, still need
> to prevent multiple pending flips to the same plane and ensure that the
> old buffer is eventually unpinned, and you might choose to drop the
> mutex around the wait_for_vblank ;-):

Yeah, looks pretty reasonable.  Here's what I've been working with.  It adds a 
couple of more changes (and slightly different cleanups) over your version:
  - added a seqno to wait for
  - wait for flip before queuing another
but it's missing thew 915 changes you added (btw did I get the pitch wrong?  
or are you submitting a different value for your objects?).

I added a new seqno since it's possible (even likely) that the next vblank 
won't actually reflect the flip, if the GPU is busy processing a large 
batchbuffer when the ring command goes in for example.

Also it might be a good idea to wait for any previous flip before queuing a 
new one.  Anyway still tracking down some issues with the X side, but it seems 
like it's approaching readiness.

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Jesse Barnes
On Thursday 19 February 2009 06:55:28 Chris Wilson wrote:
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 3795dbc..93e677a 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -435,6 +435,8 @@ EXPORT_SYMBOL(drm_vblank_get);
>   */
>  void drm_vblank_put(struct drm_device *dev, int crtc)
>  {
> + BUG_ON (atomic_read (&dev->vblank_refcount[crtc]) == 0);
> +
>   /* Last user schedules interrupt disable */
>   if (atomic_dec_and_test(&dev->vblank_refcount[crtc]))
>   mod_timer(&dev->vblank_disable_timer, jiffies + 5*DRM_HZ);
> @@ -460,8 +462,9 @@ void drm_vblank_pre_modeset(struct drm_device *dev,
> int crtc)
>* so that interrupts remain enabled in the interim.
>*/
>   if (!dev->vblank_inmodeset[crtc]) {
> - dev->vblank_inmodeset[crtc] = 1;
> - drm_vblank_get(dev, crtc);
> + dev->vblank_inmodeset[crtc] = 0x1;
> + if (drm_vblank_get(dev, crtc) == 0)
> + dev->vblank_inmodeset[crtc] |= 0x2;
>   }
>  }
>  EXPORT_SYMBOL(drm_vblank_pre_modeset);
> @@ -473,9 +476,12 @@ void drm_vblank_post_modeset(struct drm_device
> *dev, int crtc)
>   if (dev->vblank_inmodeset[crtc]) {
>   spin_lock_irqsave(&dev->vbl_lock, irqflags);
>   dev->vblank_disable_allowed = 1;
> - dev->vblank_inmodeset[crtc] = 0;
>   spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
> - drm_vblank_put(dev, crtc);
> +
> + if (dev->vblank_inmodeset[crtc] & 0x2)
> + drm_vblank_put(dev, crtc);
> +
> + dev->vblank_inmodeset[crtc] = 0;
>   }
>  }
>  EXPORT_SYMBOL(drm_vblank_post_modeset);

Looks fine to me; checking for errors when calling functions is probably a 
good idea!

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings.

2009-02-19 Thread Andrew Morton
On Wed, 18 Feb 2009 15:41:02 -0800 (PST)
David Miller  wrote:

> 
> Platforms such as sparc64 have D-cache aliasing issues.  We
> cannot allow virtual mappings in different contexts to be such
> that two cache lines can be loaded for the same backing data.
> Updates to one cache line won't be seen by accesses to the other
> cache line.
> 
> Code in sparc64 and other architectures solve this problem by
> making sure that all userland mappings of MAP_SHARED objects have
> the same virtual address base.  They implement this by keying
> off of the page offset, and using that to choose a suitably
> consistent virtual address for mmap() requests.
> 
> Making things even worse, getting this wrong on sparc64 can result
> in hangs during DRM lock acquisition.  This is because, at least on
> UltraSPARC-III, normal loads consult the D-cache but atomics such
> as 'cas' (which is what cmpxchg() is implement using) only consult
> the L2 cache.  So if a D-cache alias is inserted, the load can
> see different data than the atomic, and we'll loop forever because
> the atomic compare-and-exchange will never complete successfully.
> 
> So to make this all work properly, we need to make sure that the
> hash address computed by drm_map_handle() preserves the SHMLBA
> relevant bits, and that's what this patch does for _DRM_SHM mappings.
> 
> As a historical note, many years ago this bug didn't exist because we
> used to just use the low 32-bits of the address as the hash and just
> hope for the best.  This preserved the SHMLBA bits properly.  But when
> the hashtab code was added to DRM, this was no longer the case.
> 
> ...
>  
>  #include 
> +#include 
> +#include 
>  #include "drmP.h"
>  

The inclusion of asm/shmparam.h direct from a driver is a bit risky. 
It assumes that asm/shmparam.h is compileable in isolation from the
additional things which include/linux/shm.h includes.  In particular,
asm/page.h.

eg:

arch/xtensa/include/asm/shmparam.h
#define SHMLBA  ((PAGE_SIZE > DCACHE_WAY_SIZE)? PAGE_SIZE : DCACHE_WAY_SIZE)


But including linux/shm.h here seems a bit silly.   We'll see..

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Fix lock order reversal between mmap_sem and struct_mutex.

2009-02-19 Thread Thomas Hellstrom
Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 16:59 -0800, Eric Anholt wrote:
>   
>> The basic problem was
>> mmap_sem (do_mmap()) -> struct_mutex (drm_gem_mmap(), i915_gem_fault())
>> struct_mutex (i915_gem_execbuffer()) -> mmap_sem (copy_from/to_user())
>> 
>
> That's not the only problem, there's also:
>
> dup_mmap()
>   down_write(mmap_sem)
>   vm_ops->open() -> drm_vm_open()
> mutex_lock(struct_mutex);
>
>   
>> We have plenty of places where we want to hold device state the same
>> (struct_mutex) while we move a non-trivial amount of data
>> (copy_from/to_user()), such as i915_gem_pwrite().  Solve this by moving the
>> easy things that needed struct_mutex with mmap_sem held to using a lock to
>> cover just those data structures (offset hash and offset manager), and do
>> trylock and reschedule in fault.
>> 
>
> So we establish,
>
>   mmap_sem
> offset_mutex
>
>   i915_gem_mmap_gtt_ioctl()
> mutex_lock(struct_mutex)
> i915_gem_create_mmap_offset()
>   mutex_lock(offset_mutex)
>
> However we still have
>
>   struct_mutex
> mmap_sem
>
> in basically every copy_*_user() case
>
> But you cannot seem to switch ->fault() to use offset_mutex, which would
> work out the inversion because you then have:
>
>   struct_mutex
> mmap_sem
>   offset_mutex
>
> So why bother with the offset_mutex? Instead you make your ->fault()
> fail randomly.
>
> I'm not sure what Wang Chen sees after this patch, but I should not be
> the exact same splat, still it would not at all surprise me if there's
> plenty left.
>
> The locking looks very fragile and I don't think this patch is helping
> anything, sorry.
>
>   
It looks to me like the driver preferred locking order is

object_mutex (which happens to be the device global struct_mutex)
  mmap_sem
 offset_mutex.

So if one could avoid using the struct_mutex for object bookkeeping (A 
separate lock) then
vm_open() and vm_close() would adhere to that locking order as well, 
simply by not taking the struct_mutex at all.

So only fault() remains, in which that locking order is reversed. 
Personally I think the trylock ->reschedule->retry method with proper 
commenting is a good solution. It will be the _only_ place where locking 
order is reversed and it is done in a deadlock-safe manner. Note that 
fault() doesn't really fail, but requests a retry from user-space with 
rescheduling to give the process holding the struct_mutex time to 
release it.

/Thomas






--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Chris Wilson
With a few additional suggestions by Jesse, I've managed to get
tear-free compositing working on i915. Here's the diff on top of the
original patch (though obviously this is just a suggestion, still need
to prevent multiple pending flips to the same plane and ensure that the
old buffer is eventually unpinned, and you might choose to drop the
mutex around the wait_for_vblank ;-):

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index bc440ff..b4d347c 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -835,19 +835,18 @@ static int i915_pipe_to_plane(struct drm_device *dev, int 
pipe)
u32 reg, pipe_mask;
 
if (pipe == 0)
-   pipe_mask = DISPPLANE_SEL_PIPE_A;
+   pipe_mask = DISPPLANE_SEL_PIPE_A | DISPLAY_PLANE_ENABLE;
else
-   pipe_mask = DISPPLANE_SEL_PIPE_B;
+   pipe_mask = DISPPLANE_SEL_PIPE_B | DISPLAY_PLANE_ENABLE;
 
+#define DISPPLANE_ENABLED_PIPE_MASK (DISPLAY_PLANE_ENABLE | 
DISPPLANE_SEL_PIPE_MASK)
reg = I915_READ(DSPACNTR);
-   if ((reg & DISPLAY_PLANE_ENABLE) &&
-   ((reg & DISPPLANE_SEL_PIPE_MASK) == pipe_mask))
-   return 0;
+   if ((reg & DISPPLANE_ENABLED_PIPE_MASK) == pipe_mask)
+   return 0;
 
reg = I915_READ(DSPBCNTR);
-   if ((reg & DISPLAY_PLANE_ENABLE) &&
-   ((reg & DISPPLANE_SEL_PIPE_MASK) == pipe_mask))
-   return 1;
+   if ((reg & DISPPLANE_ENABLED_PIPE_MASK) == pipe_mask)
+   return 1;
 
return -1;
 }
@@ -893,6 +892,8 @@ static int i915_gem_page_flip(struct drm_device *dev, void 
*data,
return -EBADF;
 
obj_priv = obj->driver_private;
+
+   mutex_lock(&dev->struct_mutex);
if (obj_priv->tiling_mode != I915_TILING_NONE &&
obj_priv->tiling_mode != I915_TILING_X) {
DRM_ERROR("can only flip non-tiled or X tiled pages\n");
@@ -903,13 +904,18 @@ static int i915_gem_page_flip(struct drm_device *dev, 
void *data,
ret = i915_gem_object_pin(obj, 0);
if (ret) {
DRM_ERROR("failed to pin object for flip\n");
-   ret = -EBUSY;
+   goto out_unref;
+   }
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, 0);
+   if (ret) {
+   DRM_ERROR("failed to set object to gtt domain for flip\n");
goto out_unref;
}
 
offset = obj_priv->gtt_offset;
pitch = obj_priv->stride;
-   tiled = !!(obj_priv->tiling_mode == I915_TILING_X);
+   tiled = obj_priv->tiling_mode == I915_TILING_X;
 
/*
 * Queue the flip with the lock held in case the flip happens
@@ -919,21 +925,27 @@ static int i915_gem_page_flip(struct drm_device *dev, 
void *data,
 
BEGIN_LP_RING(4);
OUT_RING(CMD_OP_DISPLAYBUFFER_INFO | (plane << 20));
-   OUT_RING((pitch / 8) << 3); /* flip queue and/or pitch */
-   OUT_RING(offset | tiled);
-   OUT_RING(((flip_data->x - 1) << 16) | (flip_data->y - 1));
+   OUT_RING(pitch);
+   if (IS_I965G(dev)) {
+   OUT_RING(offset | tiled);
+   OUT_RING(((flip_data->x - 1) << 16) | (flip_data->y - 1));
+   } else {
+   OUT_RING(offset);
+   OUT_RING(0);
+   }
ADVANCE_LP_RING();
 
-   list_add_tail(&obj_priv->vblank_head, &dev_priv->mm.vblank_list[pipe]);
-   drm_vblank_get(dev, pipe);
+   ret = drm_vblank_get(dev, pipe);
+   if (ret == 0)
+   list_add_tail(&obj_priv->vblank_head,
+ &dev_priv->mm.vblank_list[pipe]);
 
spin_unlock_irqrestore(&dev_priv->vblank_lock, flags);
 
-   if (flip_data->flags & I915_PAGE_FLIP_WAIT)
+   if (ret == 0 && flip_data->flags & I915_PAGE_FLIP_WAIT)
wait_for_completion(&obj_priv->vblank);
 
 out_unref:
-   mutex_lock(&dev->struct_mutex);
drm_gem_object_unreference(obj);
mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8f2dc72..28e8177 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2467,6 +2467,25 @@ i915_gem_ring_throttle(struct drm_device *dev, struct 
drm_file *file_priv)
return ret;
 }
 
+static void
+i915_gem_object_wait_for_vblank(struct drm_device *dev,
+   struct drm_gem_object *obj)
+{
+   drm_i915_private_t *dev_priv = dev->dev_private;
+   struct drm_i915_gem_object *obj_priv;
+   unsigned long irqflags;
+   int wait_for_vblank;
+
+   obj_priv = obj->driver_private;
+
+   spin_lock_irqsave(&dev_priv->vblank_lock, irqflags);
+   wait_for_vblank = !list_empty(&obj_priv->vblank_head);
+   spin_unlock_irqrestore(&dev_priv->vblank_lock, irqflags);
+
+   if (wait_for_vblank)
+   wait_for_completion(&obj_priv->vblank);
+}
+
 int
 i915_gem_execbuffer(str

Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Nick Piggin
On Thu, Feb 19, 2009 at 09:49:40AM -0500, Kristian Høgsberg wrote:
> 
> > Secondly, mmap_sem is not a recursive lock (very few kernel locks are,
> > and we generally frown upon recursive locking schemes), this means that
> > the fault handler still cannot function properly.
> 
> I understand, but we take it twice only as a read lock, so that should
> work, right?  We prevent the deadlock the lockdep validator warned about
> and as far as I can see, the patch doesn't introduce a new one.  But
> other than that I agree with the frowning on recursive locking, it's too
> often used to paper over badly thought out locking.

It doesn't work. rwsems are fair (otherwise there is terrible starvation
properties), so if another process does an interleaved down_write, then
the 2nd down_read will block until the down_write is serviced.

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Kristian Høgsberg
On Thu, 2009-02-19 at 16:17 +0100, Nick Piggin wrote:
> On Thu, Feb 19, 2009 at 09:49:40AM -0500, Kristian Høgsberg wrote:
> > 
> > > Secondly, mmap_sem is not a recursive lock (very few kernel locks are,
> > > and we generally frown upon recursive locking schemes), this means that
> > > the fault handler still cannot function properly.
> > 
> > I understand, but we take it twice only as a read lock, so that should
> > work, right?  We prevent the deadlock the lockdep validator warned about
> > and as far as I can see, the patch doesn't introduce a new one.  But
> > other than that I agree with the frowning on recursive locking, it's too
> > often used to paper over badly thought out locking.
> 
> It doesn't work. rwsems are fair (otherwise there is terrible starvation
> properties), so if another process does an interleaved down_write, then
> the 2nd down_read will block until the down_write is serviced.

Ooh, right, yes of course, ouch.

thanks,
Kristian



--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] i915: add page flipping ioctl

2009-02-19 Thread Chris Wilson
On Mon, 2009-02-16 at 10:55 -0800, Jesse Barnes wrote:
> On Sunday, February 15, 2009 4:02 pm Chris Wilson wrote:
> > On my i915, the flip never occurs and we wait forever on the the vblank.
> > So I presume the command we sent the chip is invalid, or we miss the
> > irq? (I double-checked with lockdep in case I had missed an obvious
> > dead-lock.) Comments on the patch itself inline.
> 
> Thanks a lot for looking at this and testing.
> 
> Hm, maybe you're not getting vblank interrupts at all?  Do you see the count 
> increasing?  Is the IRQ registered?

Apparently, the vblank is never being enabled... Adding a BUG_ON() to
drm_vblank_put() soon identified the unbalanced culprit. So it appears
that drm_vblank_pre_modeset() can fail to enable vblank, obviously
because at that point we have never enabled vblank in pipeconf - but we
unconditionally decrement the vblank reference counter afterwards. Ok,
so now I see lots of vblank interrupts but the display is still snafu.

The following patch is a crude fix.

>From 1c4fa54654a604e0ea604553b983dca737a76e99 Mon Sep 17 00:00:00 2001
From: Chris Wilson 
Date: Thu, 19 Feb 2009 14:48:22 +
Subject: [PATCH] drm: Correct unbalanced drm_vblank_put() during mode
setting.

The first time we install a mode, the vblank will be disabled for a pipe
and so drm_vblank_get() in drm_vblank_pre_modeset() will fail. As we
unconditionally call drm_vblank_put() afterwards, the vblank reference
counter becomes unbalanced.

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/drm_irq.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3795dbc..93e677a 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -435,6 +435,8 @@ EXPORT_SYMBOL(drm_vblank_get);
  */
 void drm_vblank_put(struct drm_device *dev, int crtc)
 {
+   BUG_ON (atomic_read (&dev->vblank_refcount[crtc]) == 0);
+
/* Last user schedules interrupt disable */
if (atomic_dec_and_test(&dev->vblank_refcount[crtc]))
mod_timer(&dev->vblank_disable_timer, jiffies + 5*DRM_HZ);
@@ -460,8 +462,9 @@ void drm_vblank_pre_modeset(struct drm_device *dev,
int crtc)
 * so that interrupts remain enabled in the interim.
 */
if (!dev->vblank_inmodeset[crtc]) {
-   dev->vblank_inmodeset[crtc] = 1;
-   drm_vblank_get(dev, crtc);
+   dev->vblank_inmodeset[crtc] = 0x1;
+   if (drm_vblank_get(dev, crtc) == 0)
+   dev->vblank_inmodeset[crtc] |= 0x2;
}
 }
 EXPORT_SYMBOL(drm_vblank_pre_modeset);
@@ -473,9 +476,12 @@ void drm_vblank_post_modeset(struct drm_device
*dev, int crtc)
if (dev->vblank_inmodeset[crtc]) {
spin_lock_irqsave(&dev->vbl_lock, irqflags);
dev->vblank_disable_allowed = 1;
-   dev->vblank_inmodeset[crtc] = 0;
spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
-   drm_vblank_put(dev, crtc);
+
+   if (dev->vblank_inmodeset[crtc] & 0x2)
+   drm_vblank_put(dev, crtc);
+
+   dev->vblank_inmodeset[crtc] = 0;
}
 }
 EXPORT_SYMBOL(drm_vblank_post_modeset);
-- 
1.6.0.4




--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: r200 lockup in radeon_cp_texture/radeon_freelist_get

2009-02-19 Thread Alex Deucher
On Thu, Feb 19, 2009 at 9:51 AM, Stephane Marchesin
 wrote:
> On Thu, Feb 19, 2009 at 15:46, Alex Deucher  wrote:
>> On Thu, Feb 19, 2009 at 7:26 AM, Roland Scheidegger
>>  wrote:
>>> On 19.02.2009 12:23, Arkadi Shishlov wrote:
 Roland Scheidegger wrote:
> I suspect you're hitting a r200 asic bug, which isn't present in rv250
> and other r200 family members. There are workarounds in the driver for
> this (see r200UpdateTextureState), but to my knowledge they are
> insufficient for two-pass ATI_fragment_shader shaders. There's also a

 You're right. I changed video card to RV280 9250SE and lockup goes away.
 Nice picture, a little slower than fglrx, probably due to 9250 being
 slower than 8500.
>>> doom3 is actually a performance mystery to me. On my 9000pro,
>>> performance seemed to be similar to fglrx, however using another OS it
>>> was vastly faster, and I always wondered how it could be tweaked...
>>> Hyperz doesn't seem to help much, neither does the mipmap optimization,
>>> yet still somehow it must be possible to make it run much faster.
>>>

> mesa test which last I heard showed errors (progs/tests/afsmultiarb)
> (you can switch the test between one and two pass shaders).
> If this is the problem, you could try running doom3 using the arb path I
> think something like doom3 +seta r_renderer arb might work, however arb
> path looks ugly (r_renderer defaults to "best" which will then choose
> "r200" on this card).

 Yes, its ugly and incorrect, some walls are not opaque but blends over
 another walls.
>>> Oh, that sounds like a bug. Ugly yes but I didn't see that.
>>>

> Unfortunately I don't know how this could be fixed, as documentation for
> these asic bugs is nonexistent (or at least non-public).

 If lockup could be reliable reproduced with simple test - like
 afsmultiarb in fresh X without WM - will it be helpfull to get mmio
 trace from fglrx and r200 drivers to compare?
>>> I think at least some of the asic bugs do not necessarily result in a
>>> lockup but also could result in misrendering. Someone might be able to
>>> figure out what fglrx does, I guess of particular interest would be the
>>> writes to these debug regs (0x2D90 through 0x2DBC). That said, it might
>>> not be easy to figure this out completely (could depend on which texture
>>> units are enabled in what pass, and depending on filtering on each of
>>> those). Or it could even be some completely unrelated bug.
>>>
>>>

 In the light of recent progress with AMD's attitude, can't you just ask
 fglrx guys about R200 bug?
>>> R200 is understandably not exactly top priority, and it seemed like the
>>> usual docs didn't cover it. Though maybe Alex wants to comment on this.
>>
>> Unfortunately, r200 is so old, it hard to find much information on it 
>> anymore.
>>
>
> The r200 docs were released under NDA to selected people. So I don't
> think the r200 docs have completely disappeared off the earh ;)

well, we still have those, but those don't seem to cover whatever
quirk or errata seems to be at play here.  That's the stuff that's
hard to dig up.

Alex

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: r200 lockup in radeon_cp_texture/radeon_freelist_get

2009-02-19 Thread Stephane Marchesin
On Thu, Feb 19, 2009 at 15:46, Alex Deucher  wrote:
> On Thu, Feb 19, 2009 at 7:26 AM, Roland Scheidegger
>  wrote:
>> On 19.02.2009 12:23, Arkadi Shishlov wrote:
>>> Roland Scheidegger wrote:
 I suspect you're hitting a r200 asic bug, which isn't present in rv250
 and other r200 family members. There are workarounds in the driver for
 this (see r200UpdateTextureState), but to my knowledge they are
 insufficient for two-pass ATI_fragment_shader shaders. There's also a
>>>
>>> You're right. I changed video card to RV280 9250SE and lockup goes away.
>>> Nice picture, a little slower than fglrx, probably due to 9250 being
>>> slower than 8500.
>> doom3 is actually a performance mystery to me. On my 9000pro,
>> performance seemed to be similar to fglrx, however using another OS it
>> was vastly faster, and I always wondered how it could be tweaked...
>> Hyperz doesn't seem to help much, neither does the mipmap optimization,
>> yet still somehow it must be possible to make it run much faster.
>>
>>>
 mesa test which last I heard showed errors (progs/tests/afsmultiarb)
 (you can switch the test between one and two pass shaders).
 If this is the problem, you could try running doom3 using the arb path I
 think something like doom3 +seta r_renderer arb might work, however arb
 path looks ugly (r_renderer defaults to "best" which will then choose
 "r200" on this card).
>>>
>>> Yes, its ugly and incorrect, some walls are not opaque but blends over
>>> another walls.
>> Oh, that sounds like a bug. Ugly yes but I didn't see that.
>>
>>>
 Unfortunately I don't know how this could be fixed, as documentation for
 these asic bugs is nonexistent (or at least non-public).
>>>
>>> If lockup could be reliable reproduced with simple test - like
>>> afsmultiarb in fresh X without WM - will it be helpfull to get mmio
>>> trace from fglrx and r200 drivers to compare?
>> I think at least some of the asic bugs do not necessarily result in a
>> lockup but also could result in misrendering. Someone might be able to
>> figure out what fglrx does, I guess of particular interest would be the
>> writes to these debug regs (0x2D90 through 0x2DBC). That said, it might
>> not be easy to figure this out completely (could depend on which texture
>> units are enabled in what pass, and depending on filtering on each of
>> those). Or it could even be some completely unrelated bug.
>>
>>
>>>
>>> In the light of recent progress with AMD's attitude, can't you just ask
>>> fglrx guys about R200 bug?
>> R200 is understandably not exactly top priority, and it seemed like the
>> usual docs didn't cover it. Though maybe Alex wants to comment on this.
>
> Unfortunately, r200 is so old, it hard to find much information on it anymore.
>

The r200 docs were released under NDA to selected people. So I don't
think the r200 docs have completely disappeared off the earh ;)

Stephane

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Kristian Høgsberg
On Thu, 2009-02-19 at 11:33 +0100, Peter Zijlstra wrote:
> On Thu, 2009-02-19 at 10:19 +0100, Peter Zijlstra wrote:
> > On Wed, 2009-02-18 at 11:38 -0500, k...@bitplanet.net wrote:
> > > From: Kristian Høgsberg 
> > > 
> > > A number of GEM operations (and legacy drm ones) want to copy data to
> > > or from userspace while holding the struct_mutex lock.  However, the
> > > fault handler calls us with the mmap_sem held and thus enforces the
> > > opposite locking order.  This patch downs the mmap_sem up front for
> > > those operations that access userspace data under the struct_mutex
> > > lock to ensure the locking order is consistent.
> > > 
> > > Signed-off-by: Kristian Høgsberg 
> > > ---
> > > 
> > > Here's a different and simpler attempt to fix the locking order
> > > problem.  We can just down_read() the mmap_sem pre-emptively up-front,
> > > and the locking order is respected.  It's simpler than the
> > > mutex_trylock() game, avoids introducing a new mutex.
> > > 
> 
> OK let me try that again -- my initial response was a tad curt :/

No that's fair, I was aware that the patch was probably borderline and
got the feedback I was looking for ;)

> While I appreciate your efforts in fixing GEM (I too have an interest in
> seeing it done), I cannot support your patch.
> 
> Firstly, you're using mmap_sem well outside its problem domain, this is
> bad form. Furthermore, holding it for extended durations for no good
> reason affects all other users.

Yup, agree.

> Secondly, mmap_sem is not a recursive lock (very few kernel locks are,
> and we generally frown upon recursive locking schemes), this means that
> the fault handler still cannot function properly.

I understand, but we take it twice only as a read lock, so that should
work, right?  We prevent the deadlock the lockdep validator warned about
and as far as I can see, the patch doesn't introduce a new one.  But
other than that I agree with the frowning on recursive locking, it's too
often used to paper over badly thought out locking.

cheers,
Kristian



--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[Bug 20211] Runing Kubrick game maximized freezes X and causes kernel panic

2009-02-19 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=20211


Alex Deucher  changed:

   What|Removed |Added

  Attachment #23111|application/x-trash |text/plain
  mime type||
  Attachment #23111|0   |1
   is patch||




-- 
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: r200 lockup in radeon_cp_texture/radeon_freelist_get

2009-02-19 Thread Alex Deucher
On Thu, Feb 19, 2009 at 7:26 AM, Roland Scheidegger
 wrote:
> On 19.02.2009 12:23, Arkadi Shishlov wrote:
>> Roland Scheidegger wrote:
>>> I suspect you're hitting a r200 asic bug, which isn't present in rv250
>>> and other r200 family members. There are workarounds in the driver for
>>> this (see r200UpdateTextureState), but to my knowledge they are
>>> insufficient for two-pass ATI_fragment_shader shaders. There's also a
>>
>> You're right. I changed video card to RV280 9250SE and lockup goes away.
>> Nice picture, a little slower than fglrx, probably due to 9250 being
>> slower than 8500.
> doom3 is actually a performance mystery to me. On my 9000pro,
> performance seemed to be similar to fglrx, however using another OS it
> was vastly faster, and I always wondered how it could be tweaked...
> Hyperz doesn't seem to help much, neither does the mipmap optimization,
> yet still somehow it must be possible to make it run much faster.
>
>>
>>> mesa test which last I heard showed errors (progs/tests/afsmultiarb)
>>> (you can switch the test between one and two pass shaders).
>>> If this is the problem, you could try running doom3 using the arb path I
>>> think something like doom3 +seta r_renderer arb might work, however arb
>>> path looks ugly (r_renderer defaults to "best" which will then choose
>>> "r200" on this card).
>>
>> Yes, its ugly and incorrect, some walls are not opaque but blends over
>> another walls.
> Oh, that sounds like a bug. Ugly yes but I didn't see that.
>
>>
>>> Unfortunately I don't know how this could be fixed, as documentation for
>>> these asic bugs is nonexistent (or at least non-public).
>>
>> If lockup could be reliable reproduced with simple test - like
>> afsmultiarb in fresh X without WM - will it be helpfull to get mmio
>> trace from fglrx and r200 drivers to compare?
> I think at least some of the asic bugs do not necessarily result in a
> lockup but also could result in misrendering. Someone might be able to
> figure out what fglrx does, I guess of particular interest would be the
> writes to these debug regs (0x2D90 through 0x2DBC). That said, it might
> not be easy to figure this out completely (could depend on which texture
> units are enabled in what pass, and depending on filtering on each of
> those). Or it could even be some completely unrelated bug.
>
>
>>
>> In the light of recent progress with AMD's attitude, can't you just ask
>> fglrx guys about R200 bug?
> R200 is understandably not exactly top priority, and it seemed like the
> usual docs didn't cover it. Though maybe Alex wants to comment on this.

Unfortunately, r200 is so old, it hard to find much information on it anymore.

Alex

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Only use DRM_IOCTL_UPDATE_DRAW compat wrapper for compat X86.

2009-02-19 Thread Arnd Bergmann
On Wednesday 18 February 2009, David Miller wrote:
> drm: Only use DRM_IOCTL_UPDATE_DRAW compat wrapper for compat X86.
> 
> Only X86 32-bit uses a different alignment for "unsigned long long"
> than it's 64-bit counterpart.
> 
> Therefore this compat translation is only correct, and only needed,
> when either CONFIG_X86 or CONFIG_IA64.
> 
> Signed-off-by: David S. Miller 

The patch is correct AFAICT, but I'd like to point out that the
problem could have been avoided (besides using a non-padded layout)
by using a compat_u64 member in the struct definition instead of
the packed attribute:

typedef struct drm_update_draw32 {
drm_drawable_t handle;
unsigned int type;
unsigned int num;
compat_u64 data;/**< Pointer */
} drm_update_draw32_t;

I find that too often __attribute__((packed)) is used on whole
structures where some other much more gentle solution can be
applied. In fact, there are very few files (e.g.
linux/unaligned/packed_struct.h) that look like they want all
of the implied meanings (pack members, drop alignment on whole
structure, access members as unaligned).

A grep for "packed" in compat_ioctl definitions revealed the
same bug as in drm_update_draw32 to be present in
raw32_config_request, and I'm rather sure that there are more
of these.

Arnd <><

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Nick Piggin
On Thu, Feb 19, 2009 at 10:19:05AM +0100, Peter Zijlstra wrote:
> On Wed, 2009-02-18 at 11:38 -0500, k...@bitplanet.net wrote:
> > From: Kristian Høgsberg 
> > 
> > A number of GEM operations (and legacy drm ones) want to copy data to
> > or from userspace while holding the struct_mutex lock.  However, the
> > fault handler calls us with the mmap_sem held and thus enforces the
> > opposite locking order.  This patch downs the mmap_sem up front for
> > those operations that access userspace data under the struct_mutex
> > lock to ensure the locking order is consistent.
> > 
> > Signed-off-by: Kristian Høgsberg 
> > ---
> > 
> > Here's a different and simpler attempt to fix the locking order
> > problem.  We can just down_read() the mmap_sem pre-emptively up-front,
> > and the locking order is respected.  It's simpler than the
> > mutex_trylock() game, avoids introducing a new mutex.

The "simple" way to fix this is to just allocate a temporary buffer
to copy a snapshot of the data going to/from userspace. Then do the
real usercopy to/from that buffer outside the locks.

You don't have any performance critical bulk copies (ie. that will
blow the L1 cache), do you? 


--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: r200 lockup in radeon_cp_texture/radeon_freelist_get

2009-02-19 Thread Arkadi Shishlov
Roland Scheidegger wrote:
> I suspect you're hitting a r200 asic bug, which isn't present in rv250
> and other r200 family members. There are workarounds in the driver for
> this (see r200UpdateTextureState), but to my knowledge they are
> insufficient for two-pass ATI_fragment_shader shaders. There's also a

You're right. I changed video card to RV280 9250SE and lockup goes away.
Nice picture, a little slower than fglrx, probably due to 9250 being slower 
than 8500.

> mesa test which last I heard showed errors (progs/tests/afsmultiarb)
> (you can switch the test between one and two pass shaders).
> If this is the problem, you could try running doom3 using the arb path I
> think something like doom3 +seta r_renderer arb might work, however arb
> path looks ugly (r_renderer defaults to "best" which will then choose
> "r200" on this card).

Yes, its ugly and incorrect, some walls are not opaque but blends over another 
walls.

> Unfortunately I don't know how this could be fixed, as documentation for
> these asic bugs is nonexistent (or at least non-public).

If lockup could be reliable reproduced with simple test - like afsmultiarb in 
fresh X without WM - 
will it be helpfull to get mmio trace from fglrx and r200 drivers to compare?

In the light of recent progress with AMD's attitude, can't you just ask fglrx 
guys about R200 bug?

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: r200 lockup in radeon_cp_texture/radeon_freelist_get

2009-02-19 Thread Roland Scheidegger
On 19.02.2009 12:23, Arkadi Shishlov wrote:
> Roland Scheidegger wrote:
>> I suspect you're hitting a r200 asic bug, which isn't present in rv250
>> and other r200 family members. There are workarounds in the driver for
>> this (see r200UpdateTextureState), but to my knowledge they are
>> insufficient for two-pass ATI_fragment_shader shaders. There's also a
> 
> You're right. I changed video card to RV280 9250SE and lockup goes away.
> Nice picture, a little slower than fglrx, probably due to 9250 being
> slower than 8500.
doom3 is actually a performance mystery to me. On my 9000pro,
performance seemed to be similar to fglrx, however using another OS it
was vastly faster, and I always wondered how it could be tweaked...
Hyperz doesn't seem to help much, neither does the mipmap optimization,
yet still somehow it must be possible to make it run much faster.

> 
>> mesa test which last I heard showed errors (progs/tests/afsmultiarb)
>> (you can switch the test between one and two pass shaders).
>> If this is the problem, you could try running doom3 using the arb path I
>> think something like doom3 +seta r_renderer arb might work, however arb
>> path looks ugly (r_renderer defaults to "best" which will then choose
>> "r200" on this card).
> 
> Yes, its ugly and incorrect, some walls are not opaque but blends over
> another walls.
Oh, that sounds like a bug. Ugly yes but I didn't see that.

> 
>> Unfortunately I don't know how this could be fixed, as documentation for
>> these asic bugs is nonexistent (or at least non-public).
> 
> If lockup could be reliable reproduced with simple test - like
> afsmultiarb in fresh X without WM - will it be helpfull to get mmio
> trace from fglrx and r200 drivers to compare?
I think at least some of the asic bugs do not necessarily result in a
lockup but also could result in misrendering. Someone might be able to
figure out what fglrx does, I guess of particular interest would be the
writes to these debug regs (0x2D90 through 0x2DBC). That said, it might
not be easy to figure this out completely (could depend on which texture
units are enabled in what pass, and depending on filtering on each of
those). Or it could even be some completely unrelated bug.


> 
> In the light of recent progress with AMD's attitude, can't you just ask
> fglrx guys about R200 bug?
R200 is understandably not exactly top priority, and it seemed like the
usual docs didn't cover it. Though maybe Alex wants to comment on this.

Roland

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Peter Zijlstra
On Thu, 2009-02-19 at 10:19 +0100, Peter Zijlstra wrote:
> On Wed, 2009-02-18 at 11:38 -0500, k...@bitplanet.net wrote:
> > From: Kristian Høgsberg 
> > 
> > A number of GEM operations (and legacy drm ones) want to copy data to
> > or from userspace while holding the struct_mutex lock.  However, the
> > fault handler calls us with the mmap_sem held and thus enforces the
> > opposite locking order.  This patch downs the mmap_sem up front for
> > those operations that access userspace data under the struct_mutex
> > lock to ensure the locking order is consistent.
> > 
> > Signed-off-by: Kristian Høgsberg 
> > ---
> > 
> > Here's a different and simpler attempt to fix the locking order
> > problem.  We can just down_read() the mmap_sem pre-emptively up-front,
> > and the locking order is respected.  It's simpler than the
> > mutex_trylock() game, avoids introducing a new mutex.
> > 

OK let me try that again -- my initial response was a tad curt :/

While I appreciate your efforts in fixing GEM (I too have an interest in
seeing it done), I cannot support your patch.

Firstly, you're using mmap_sem well outside its problem domain, this is
bad form. Furthermore, holding it for extended durations for no good
reason affects all other users.

Secondly, mmap_sem is not a recursive lock (very few kernel locks are,
and we generally frown upon recursive locking schemes), this means that
the fault handler still cannot function properly.





--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] drm: Take mmap_sem up front to avoid lock order violations.

2009-02-19 Thread Peter Zijlstra
On Wed, 2009-02-18 at 11:38 -0500, k...@bitplanet.net wrote:
> From: Kristian Høgsberg 
> 
> A number of GEM operations (and legacy drm ones) want to copy data to
> or from userspace while holding the struct_mutex lock.  However, the
> fault handler calls us with the mmap_sem held and thus enforces the
> opposite locking order.  This patch downs the mmap_sem up front for
> those operations that access userspace data under the struct_mutex
> lock to ensure the locking order is consistent.
> 
> Signed-off-by: Kristian Høgsberg 
> ---
> 
> Here's a different and simpler attempt to fix the locking order
> problem.  We can just down_read() the mmap_sem pre-emptively up-front,
> and the locking order is respected.  It's simpler than the
> mutex_trylock() game, avoids introducing a new mutex.
> 

Hell no!

for one, mmap_sem is not a recursive lock, so a pagefault will utterly
fail with this in place.

Secondly, holding mmap_sem for no good reason just sucks.

>  drivers/gpu/drm/i915/i915_dma.c |6 +-
>  drivers/gpu/drm/i915/i915_gem.c |   20 +---
>  2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 81f1cff..d8b58d9 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -642,9 +642,11 @@ static int i915_batchbuffer(struct drm_device *dev, void 
> *data,
>  sizeof(struct 
> drm_clip_rect)))
>   return -EFAULT;
>  
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>   ret = i915_dispatch_batchbuffer(dev, batch);
>   mutex_unlock(&dev->struct_mutex);
> + up_read(¤t->mm->mmap_sem);
>  
>   if (sarea_priv)
>   sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
> @@ -674,14 +676,16 @@ static int i915_cmdbuffer(struct drm_device *dev, void 
> *data,
>   return -EFAULT;
>   }
>  
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>   ret = i915_dispatch_cmdbuffer(dev, cmdbuf);
>   mutex_unlock(&dev->struct_mutex);
> + up_read(¤t->mm->mmap_sem);
> +
>   if (ret) {
>   DRM_ERROR("i915_dispatch_cmdbuffer failed\n");
>   return ret;
>   }
> -
>   if (sarea_priv)
>   sarea_priv->last_dispatch = READ_BREADCRUMB(dev_priv);
>   return 0;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9cd42f..3dd8b6e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -171,6 +171,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>   return -EINVAL;
>   }
>  
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>  
>   ret = i915_gem_object_set_cpu_read_domain_range(obj, args->offset,
> @@ -196,6 +197,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>  
>   drm_gem_object_unreference(obj);
>   mutex_unlock(&dev->struct_mutex);
> + up_read(¤t->mm->mmap_sem);
>  
>   return 0;
>  }
> @@ -264,7 +266,7 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct 
> drm_gem_object *obj,
>   if (!access_ok(VERIFY_READ, user_data, remain))
>   return -EFAULT;
>  
> -
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>   ret = i915_gem_object_pin(obj, 0);
>   if (ret) {
> @@ -315,6 +317,7 @@ i915_gem_gtt_pwrite(struct drm_device *dev, struct 
> drm_gem_object *obj,
>  fail:
>   i915_gem_object_unpin(obj);
>   mutex_unlock(&dev->struct_mutex);
> + up_read(¤t->mm->mmap_sem);
>  
>   return ret;
>  }
> @@ -328,6 +331,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev, struct 
> drm_gem_object *obj,
>   loff_t offset;
>   ssize_t written;
>  
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>  
>   ret = i915_gem_object_set_to_cpu_domain(obj, 1);
> @@ -350,6 +354,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev, struct 
> drm_gem_object *obj,
>   }
>  
>   mutex_unlock(&dev->struct_mutex);
> + up_read(¤t->mm->mmap_sem);
>  
>   return 0;
>  }
> @@ -2473,22 +2478,21 @@ i915_gem_execbuffer(struct drm_device *dev, void 
> *data,
>   goto pre_mutex_err;
>   }
>  
> + down_read(¤t->mm->mmap_sem);
>   mutex_lock(&dev->struct_mutex);
>  
>   i915_verify_inactive(dev, __FILE__, __LINE__);
>  
>   if (dev_priv->mm.wedged) {
>   DRM_ERROR("Execbuf while wedged\n");
> - mutex_unlock(&dev->struct_mutex);
>   ret = -EIO;
> - goto pre_mutex_err;
> + goto mutex_err;
>   }
>  
>   if (dev_priv->mm.suspended) {
>   DRM_ERROR("Execbuf while VT-switched.\n");
> - mutex_unlock(&dev->struct_mutex);
>   ret = -EBUSY;
> - goto pre_mutex_err;
> + goto mutex_err;
>   }
> 

[Bug 12613] [Suspend regression][DRM, RADEON]

2009-02-19 Thread bugme-daemon
http://bugzilla.kernel.org/show_bug.cgi?id=12613


mmvi...@yahoo.com changed:

   What|Removed |Added

 CC||mmvi...@yahoo.com




--- Comment #7 from mmvi...@yahoo.com  2009-02-19 01:52 ---
The patch posted by Etienne in http://lkml.org/lkml/2009/2/15/28 fixes a
similar problem with hibernation on my HP nx9005 laptop with ATI Technologies
Inc Radeon Mobility U1 (IGP 320M or something) graphics card.
I can't test suspend to ram on this machine due to another problem
(http://bugzilla.kernel.org/show_bug.cgi?id=5884).

I can make the computer hang with the following steps without the patch (in
2.6.29-rc5-122-g... something):

1) log in to fail safe session (no 3d fancies)
2) check glxgears  (works)
3) echo test > /sys/power/disk
4) echo disk > /sys/power/state (resumes itself after 5 seconds, as expected)
5) try glxgears (immediate hang (mouse doesn't move, caps lock light doesn't
change); fan gets on, so it is trying to do something; power button turns the
machine off normally, etc.)

With Etienne's patch the computer doesn't hang anymore with the steps described
above and even suspend to disk and resume with KDE 4 running works.

Not bad from a "shot in the dark" patch :)


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[Bug 20211] Runing Kubrick game maximized freezes X and causes kernel panic

2009-02-19 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=20211





--- Comment #1 from Jure Repinc   2009-02-19 01:30:09 PST 
---
Created an attachment (id=23111)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=23111)
Xorg.0.log


-- 
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[Bug 20211] New: Runing Kubrick game maximized freezes X and causes kernel panic

2009-02-19 Thread bugzilla-daemon
http://bugs.freedesktop.org/show_bug.cgi?id=20211

   Summary: Runing Kubrick game maximized freezes X and causes
kernel panic
   Product: DRI
   Version: XOrg CVS
  Platform: x86-64 (AMD64)
OS/Version: Linux (All)
Status: NEW
  Severity: critical
  Priority: medium
 Component: DRM/Radeon
AssignedTo: dri-devel@lists.sourceforge.net
ReportedBy: jlp.b...@gmail.com


Created an attachment (id=23110)
 --> (http://bugs.freedesktop.org/attachment.cgi?id=23110)
xorg.conf

My Xorg server is version 1.5.3, Linux kernel is 2.6.26.6, graphics card is:
01:00.0 VGA compatible controller: ATI Technologies Inc RV350 AR [Radeon 9600]

If I run Kubrick game from KDE 4.2.0 at its default size all is OK. But if I
maximize it over the entire screen (I use resolution 1280x1024) after some time
of runing in demo mode it freezes the X server, only mouse pounter can move.
When I check out the list of processes over SSH I can see kubrick using all
CPU. When I kill kubrick process the system completely locks up and keyboard
LEDs start flashing so I guess the kernel panics.


-- 
Configure bugmail: http://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel