Re: [Nouveau] [PATCH 09/17] drm/radeon: use common fence implementation for fences

2014-07-22 Thread Maarten Lankhorst
On 22-07-14 15:45, Christian König wrote:
> On 22.07.2014 15:26, Daniel Vetter wrote:
>> On Tue, Jul 22, 2014 at 02:19:57PM +0200, Christian König wrote:
>>> On 22.07.2014 13:57, Daniel Vetter wrote:
>>>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>>>> On 22.07.2014 06:05, Dave Airlie wrote:
>>>>>>> On 9 July 2014 22:29, Maarten Lankhorst 
>>>>>>>  wrote:
>>>>>>>> Signed-off-by: Maarten Lankhorst 
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/radeon/radeon.h|   15 +-
>>>>>>>>   drivers/gpu/drm/radeon/radeon_device.c |   60 -
>>>>>>>>   drivers/gpu/drm/radeon/radeon_fence.c  |  223 ++--
>>>>>>>>   3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>>>
>>>>>>>  From what I can see this is still suffering from the problem that we
>>>>>>> need to find a proper solution to,
>>>>>>>
>>>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>>>> re-reading things is:
>>>>>>>
>>>>>>> We really need to work out a better interface into the drivers to be
>>>>>>> able to avoid random atomic entrypoints,
>>>>>> Which is exactly what I criticized from the very beginning. Good to
>>>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>>>> I guess I've lost context a bit, but which atomic entry point are we
>>>>> talking about? Afaics the only one that's mandatory is the
>>>>> fence->signaled callback to check whether a fence really has been
>>>>> signalled. It's used internally by the fence code to avoid spurious
>>>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>>>> not the case then we can always track the signalled state in software and
>>>>> double-check in a worker thread before updating the sw state. And wrap
>>>>> this all up into a special fence class if there's more than one driver
>>>>> needing this.
>>>> One thing I've forgotten: The i915 scheduler that's floating around runs
>>>> its bottom half from irq context. So I really want to be able to check
>>>> fence state from irq context and I also want to make it possible
>>>> (possible! not mandatory) to register callbacks which are run from any
>>>> context asap after the fence is signalled.
>>> NAK, that's just the bad design I've talked about. Checking fence state
>>> inside the same driver from interrupt context is OK, because it's the
>>> driver's interrupt that we are talking about here.
>>>
>>> Checking fence status from another driver's interrupt context is what really
>>> concerns me here, cause your driver doesn't have the slightest idea if the
>>> called driver is really capable of checking the fence right now.
>> I guess my mail hasn't been clear then. If you don't like it we could add
>> a bit of glue to insulate the madness and bad design i915 might do from
>> radeon. That imo doesn't invalidate the overall fence interfaces.
>>
>> So what about the following:
>> - fence->enable_signaling is restricted to be called from process
>>    context. We don't use anything different yet, so it would boil down to
>>    adding a WARN_ON(in_interrupt()) or so to fence_enable_sw_signaling.
>>
>> - Make fence->signaled optional (already the case) and don't implement it
>>    in radeon (i.e. reduce this patch here). Only downside is that radeon
>>needs to correctly (i.e. without races or so) call fence_signal. And the
>>cross-driver synchronization might be a bit less efficient. Note that
>>you can call fence_signal from wherever you want to, so hopefully that
>>doesn't restrict your implementation.
>>
>> End result: No one calls into radeon from interrupt context, and this is
>> guaranteed.
>>
>> Would that be something you can agree to?
>
> No, the whole enable_signaling stuff should go away. No callback from the 
> driver into the fence code, only the other way around.
>
> fence->signaled as well as fence->wait should become mandatory

Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

2014-07-22 Thread Maarten Lankhorst
On 22-07-14 16:24, Christian König wrote:
>> No, you really shouldn't be doing much in the check anyway, it's meant to be 
>> a lightweight check. If you're not ready yet because of a lockup simply 
>> return not signaled yet.
> It's not only the lockup case from radeon I have in mind here. For userspace 
> queues it might be necessary to call copy_from_user to figure out if a fence 
> is signaled or not.
>
> Returning false all the time is probably not a good idea either.
Having userspace implement a fence sounds like an awful idea; why would you
want to do that?

A fence could be exported to userspace, but that would only mean userspace can
wait for it to be signaled, with an interface like poll.
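
To make that concrete, here is a sketch of what such a userspace wait could
look like. This is entirely hypothetical: nothing in this series exports a
fence as a file descriptor, and the fd semantics here are made up purely for
illustration.

#include <poll.h>
#include <stdio.h>

/*
 * Block until a (hypothetical) exported fence fd signals.
 * Returns 1 when signaled, 0 on timeout, negative on error.
 */
static int wait_fence_fd(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		perror("poll");
	return ret;
}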

~Maarten



Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

2014-07-22 Thread Maarten Lankhorst
On 22-07-14 14:19, Christian König wrote:
> On 22.07.2014 13:57, Daniel Vetter wrote:
>> On Tue, Jul 22, 2014 at 01:46:07PM +0200, Daniel Vetter wrote:
>>> On Tue, Jul 22, 2014 at 10:43:13AM +0200, Christian König wrote:
>>>> On 22.07.2014 06:05, Dave Airlie wrote:
>>>>> On 9 July 2014 22:29, Maarten Lankhorst  
>>>>> wrote:
>>>>>> Signed-off-by: Maarten Lankhorst 
>>>>>> ---
>>>>>>   drivers/gpu/drm/radeon/radeon.h|   15 +-
>>>>>>   drivers/gpu/drm/radeon/radeon_device.c |   60 -
>>>>>>   drivers/gpu/drm/radeon/radeon_fence.c  |  223 ++--
>>>>>>   3 files changed, 248 insertions(+), 50 deletions(-)
>>>>>>
>>>>>  From what I can see this is still suffering from the problem that we
>>>>> need to find a proper solution to,
>>>>>
>>>>> My summary of the issues after talking to Jerome and Ben and
>>>>> re-reading things is:
>>>>>
>>>>> We really need to work out a better interface into the drivers to be
>>>>> able to avoid random atomic entrypoints,
>>>> Which is exactly what I criticized from the very beginning. Good to
>>>> know that I'm not the only one thinking that this isn't such a good idea.
>>> I guess I've lost context a bit, but which atomic entry point are we
>>> talking about? Afaics the only one that's mandatory is the
>>> fence->signaled callback to check whether a fence really has been
>>> signalled. It's used internally by the fence code to avoid spurious
>>> wakeups. Afaik that should be doable already on any hardware. If that's
>>> not the case then we can always track the signalled state in software and
>>> double-check in a worker thread before updating the sw state. And wrap
>>> this all up into a special fence class if there's more than one driver
>>> needing this.
>> One thing I've forgotten: The i915 scheduler that's floating around runs
>> its bottom half from irq context. So I really want to be able to check
>> fence state from irq context and I also want to make it possible
>> (possible! not mandatory) to register callbacks which are run from any
>> context asap after the fence is signalled.
>
> NAK, that's just the bad design I've talked about. Checking fence state 
> inside the same driver from interrupt context is OK, because it's the driver's 
> interrupt that we are talking about here.
>
> Checking fence status from another driver's interrupt context is what really 
> concerns me here, cause your driver doesn't have the slightest idea if the 
> called driver is really capable of checking the fence right now.
I think there is a usecase for allowing atomic context with fence_is_signaled,
but I don't think there is one for interrupt context, so it's fine with me if
fence_is_signaled cannot be called from interrupt context or with irqs
disabled.

fence_enable_sw_signaling disables interrupts because it holds fence->lock, so
in theory it could be called from any context, including interrupts. But no
sane driver author does that, or at least I hope not.

Would a sanity check like the one below be enough to allay your fears?
8<---

diff --git a/include/linux/fence.h b/include/linux/fence.h
index d174585b874b..c1a4519ba2f5 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -143,6 +143,7 @@ struct fence_cb {
  * the second time will be a noop since it was already signaled.
  *
  * Notes on signaled:
+ * Called with interrupts enabled, and never from interrupt context.
  * May set fence->status if returning true.
  *
  * Notes on wait:
@@ -268,15 +269,29 @@ fence_is_signaled_locked(struct fence *fence)
 static inline bool
 fence_is_signaled(struct fence *fence)
 {
+   bool ret;
+
if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
return true;
 
-   if (fence->ops->signaled && fence->ops->signaled(fence)) {
+   if (!fence->ops->signaled)
+   return false;
+
+   if (config_enabled(CONFIG_PROVE_LOCKING))
+   WARN_ON(in_interrupt() || irqs_disabled());
+
+   if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP))
+   preempt_disable();
+
+   ret = fence->ops->signaled(fence);
+
+   if (config_enabled(CONFIG_DEBUG_ATOMIC_SLEEP))
+   preempt_enable();
+
+   if (ret)
fence_signal(fence);
-   return true;
-   }
 
-   return false;
+   return ret;
 }
 
 /**
8<--
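
For reference, the driver side of that contract stays trivial. A minimal
sketch of a ->signaled() implementation that respects the documented
constraint (all mydrv_* names and the seqno layout are made up; the other
fence_ops callbacks are assumed to be defined elsewhere):

struct mydrv_fence {
	struct fence base;
	u32 seqno;		/* value the GPU writes back on completion */
	u32 __iomem *seqno_reg;	/* hypothetical writeback location */
};

/* lightweight and never sleeps: one read plus a wraparound-safe compare */
static bool mydrv_fence_signaled(struct fence *f)
{
	struct mydrv_fence *fence = container_of(f, struct mydrv_fence, base);

	return (s32)(readl(fence->seqno_reg) - fence->seqno) >= 0;
}

static const struct fence_ops mydrv_fence_ops = {
	.get_driver_name = mydrv_get_driver_name,
	.get_timeline_name = mydrv_get_timeline_name,
	.enable_signaling = mydrv_fence_enable_signaling,
	.signaled = mydrv_fence_signaled,	/* optional, may be NULL */
	.wait = fence_default_wait,
};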

Re: [PATCH 09/17] drm/radeon: use common fence implementation for fences

2014-07-22 Thread Maarten Lankhorst
Hey,

On 22-07-14 06:05, Dave Airlie wrote:
> On 9 July 2014 22:29, Maarten Lankhorst  
> wrote:
>> Signed-off-by: Maarten Lankhorst 
>> ---
>>  drivers/gpu/drm/radeon/radeon.h|   15 +-
>>  drivers/gpu/drm/radeon/radeon_device.c |   60 -
>>  drivers/gpu/drm/radeon/radeon_fence.c  |  223 ++--
>>  3 files changed, 248 insertions(+), 50 deletions(-)
>>
> From what I can see this is still suffering from the problem that we
> need to find a proper solution to,
>
> My summary of the issues after talking to Jerome and Ben and
> re-reading things is:
>
> We really need to work out a better interface into the drivers to be
> able to avoid random atomic entrypoints,
> I'm sure you have some ideas and I think you really need to
> investigate them to move this thing forward,
> even if it means some issues with android sync pts.
>
> but none of the two major drivers seem to want the interface as-is so
> something needs to give
wait_queue_t (which radeon uses for fence_queue) uses atomic entrypoints too,
the most common one being autoremove_wake_function, which wakes up the thread
it was initialized from, and removes itself from the wait_queue_t list, in
atomic fashion. It's used by __wait_event_interruptible_locked; if something
internally wanted to add some arbitrary callback, it could already happen...

> My major question is why we need an atomic callback here at all, what
> scenario does it cover?
An atomic callback could do something like schedule_work(&work) (like 
nouveau_fence_work already does right now).
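
For illustration, a minimal sketch of that pattern against the common fence
API (fence_add_callback() and struct fence_cb come from this series; the
mydrv_* names are hypothetical):

struct mydrv_fence_work {
	struct fence_cb cb;		/* node handed to fence_add_callback */
	struct work_struct work;	/* the heavy lifting runs here */
};

static void mydrv_fence_work_func(struct work_struct *work)
{
	struct mydrv_fence_work *fw =
		container_of(work, struct mydrv_fence_work, work);

	/* process context: sleeping and taking locks are fine here */
	kfree(fw);
}

/* may be invoked from the signaling driver's interrupt handler */
static void mydrv_fence_cb(struct fence *fence, struct fence_cb *cb)
{
	struct mydrv_fence_work *fw =
		container_of(cb, struct mydrv_fence_work, cb);

	schedule_work(&fw->work);	/* atomic-safe: only queues the work */
}

static int mydrv_defer_until_signaled(struct fence *fence)
{
	struct mydrv_fence_work *fw = kmalloc(sizeof(*fw), GFP_KERNEL);

	if (!fw)
		return -ENOMEM;

	INIT_WORK(&fw->work, mydrv_fence_work_func);

	/* returns -ENOENT if the fence already signaled: run the work now */
	if (fence_add_callback(fence, &fw->cb, mydrv_fence_cb))
		schedule_work(&fw->work);

	return 0;
}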

I've also added some more experimental things in my unsubmitted branch, in a 
codepath that's taken when synchronization is used with multiple GPUs:

nouveau: I write the new seqno to the GART fence, for which I added a GPU wait
using SEMAPHORE_TRIGGER.ACQUIRE_GE.
radeon: I write to a memory location to unblock the execution ring; this will
probably be replaced by a call to the GPU scheduler.
i915: I write to the EXCC (condition code) register to unblock the ring
operation when it's waiting for the condition code.

But I want to emphasize that this is a hack, and driver maintainers will
probably NACK it. I think I will only submit the one for nouveau, which is sane
there because it schedules contexts in hardware. Even so, that part is not
final and will probably go through a few iterations before submission.


> Surely we can use a workqueue based callback to ask a driver to check
> its signalling; is it really that urgent?
Nothing prevents a driver from using that approach, even with those changes.

Driver maintainers can still NACK the use of fence_add_callback if they want
to, or choose not to export fences outside the driver. Because fences are then
not exported, nothing will change for them compared to the current situation.

~Maarten



[PATCH v2 09/17] drm/radeon: use common fence implementation for fences

2014-07-09 Thread Maarten Lankhorst
On 09-07-14 14:57, Deucher, Alexander wrote:
>> 
>> +static const char *radeon_fence_get_timeline_name(struct fence *f)
>> +{
>> +struct radeon_fence *fence = to_radeon_fence(f);
>> +switch (fence->ring) {
>> +case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
>> +case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
>> +case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
>> +case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
>> +case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
>> +case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
> Radeon supports vce rings on newer asics.  Probably want to add the case for 
> those here too.
>
> Alex
>
Indeed, how about this?
--8<---
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon.h|  15 +--
 drivers/gpu/drm/radeon/radeon_device.c |  60 -
 drivers/gpu/drm/radeon/radeon_fence.c  | 225 +++--
 3 files changed, 250 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 29d9cc04c04e..03a5567f2c2f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -116,9 +117,6 @@ extern int radeon_deep_color;
 #define RADEONFB_CONN_LIMIT4
 #define RADEON_BIOS_NUM_SCRATCH8
 
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ  0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX 0
@@ -350,12 +348,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {
+   struct fence base;
+
struct radeon_device*rdev;
-   struct kref kref;
/* protected by radeon_fence.lock */
uint64_tseq;
/* RB, DMA, etc. */
unsignedring;
+
+   wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2268,6 +2269,7 @@ struct radeon_device {
struct radeon_mman  mman;
struct radeon_fence_driver  fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t   fence_queue;
+   unsignedfence_context;
struct mutexring_lock;
struct radeon_ring  ring[RADEON_NUM_RINGS];
boolib_pool_ready;
@@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fab842d..86699df7c8f3 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+   rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 
0x%04X:0x%04X).\n",
radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
return 0;
 }
 
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+   uint32_t mask = 0;
+   int i;
+
+   if (!rdev->ddev->irq_enabled)
+   return mask;
+
+   /*
+* increase refcount on sw interrupts for all rings to stop
+* enabling interrupts in radeon_fence_enable_signaling during
+* gpu reset.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!rdev->ring[i].ready)
+   continue;
+
+   atomic_inc(&rdev->irq.ring_int[i]);
+   mask |= 1 << i;
+   }
+   return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+   unsigned long irqflags;
+   int i;
+
+   if (!mask)
+   return;
+
+   /*
+* undo refcount increase, and reset irqs to correct value.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!(mask & (1 << i)))
+   continue;
+
+   atomic_dec(&rdev->irq.ring_int[i]);
+   }
+
+   sp

Re: [PATCH 00/17] Convert TTM to the new fence interface.

2014-07-09 Thread Maarten Lankhorst
On 09-07-14 15:09, Mike Lothian wrote:
> Hi Maarten
>
> Will this stop the stuttering I've been seeing with DRI3 and PRIME? Or will
> other patches / plumbing be required?
>
No, that testing was with the whole series, including the parts where you
synchronized intel with radeon (iirc). Although it might, if you're lucky:
I noticed that I missed an int to long conversion, which resulted in a success
being reported as an error, disabling graphics acceleration entirely.

The series here simply converts the drivers to a common fence infrastructure,
and shouldn't cause any regressions or major behavioral changes. A separate
series is needed to make intel and radeon synchronized, and for that series
the support on the intel side is a hack. It should be possible to get the
radeon/nouveau changes upstreamed, but this conversion is required for that.

~Maarten



[PATCH 01/17] drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers

2014-07-09 Thread Maarten Lankhorst
It seems some drivers really want this as a parameter,
like vmwgfx.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/qxl_release.c|2 +-
 drivers/gpu/drm/radeon/radeon_object.c   |2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c  |2 +-
 drivers/gpu/drm/radeon/radeon_vm.c   |2 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   22 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |7 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |2 +-
 include/drm/ttm/ttm_execbuf_util.h   |9 +
 8 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 14e776f1d14e..2b43e5deb051 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -159,7 +159,7 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
if (list_is_singular(&release->bos))
return 0;
 
-   ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos);
+   ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, !no_intr);
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 6c717b257d6d..a3ed725ea641 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -438,7 +438,7 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
u64 bytes_moved = 0, initial_bytes_moved;
u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
 
-   r = ttm_eu_reserve_buffers(ticket, head);
+   r = ttm_eu_reserve_buffers(ticket, head, true);
if (unlikely(r != 0)) {
return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index a4ad270e8261..67b2a367df40 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -620,7 +620,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
INIT_LIST_HEAD(&head);
list_add(&tv.head, &head);
 
-   r = ttm_eu_reserve_buffers(&ticket, &head);
+   r = ttm_eu_reserve_buffers(&ticket, &head, true);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index eecff6bbd341..4c68852c3e72 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -364,7 +364,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
 INIT_LIST_HEAD(&head);
 list_add(&tv.head, &head);
 
-r = ttm_eu_reserve_buffers(&ticket, &head);
+r = ttm_eu_reserve_buffers(&ticket, &head, true);
 if (r)
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index e8dac8758528..39a11bbd2bac 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
  */
 
 int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-  struct list_head *list)
+  struct list_head *list, bool intr)
 {
struct ttm_bo_global *glob;
struct ttm_validate_buffer *entry;
@@ -140,7 +140,7 @@ retry:
if (entry->reserved)
continue;
 
-   ret = __ttm_bo_reserve(bo, true, (ticket == NULL), true,
+   ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
   ticket);
 
if (ret == -EDEADLK) {
@@ -153,13 +153,17 @@ retry:
ttm_eu_backoff_reservation_locked(list);
spin_unlock(&glob->lru_lock);
ttm_eu_list_ref_sub(list);
-   ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
-  ticket);
-   if (unlikely(ret != 0)) {
-   if (ret == -EINTR)
-   ret = -ERESTARTSYS;
-   goto err_fini;
-   }
+
+   if (intr) {
+   ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+  ticket);
+   if (unlikely(ret != 0)) {
+   if (ret == -EINTR)
+   ret = -ERESTARTSYS;
+   goto err_fini;
+   }
+   } else
+   ww_mutex_lock_slow(&bo->resv->lock, ticket);
 
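To show the new parameter at a call site, a small sketch (mirroring the
callers updated above; with intr == true the reserve may be interrupted):

	/* interruptible reserve, as the radeon/uvd/vm callers above now do */
	ret = ttm_eu_reserve_buffers(&ticket, &head, true);
	if (ret)
		return ret;	/* may be -ERESTARTSYS when intr == true */

	/* non-interruptible variant for paths that must not be aborted */
	ret = ttm_eu_reserve_buffers(&ticket, &head, false);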

[PATCH 10/17] drm/qxl: rework to new fence interface

2014-07-09 Thread Maarten Lankhorst
Final driver! \o/

This is not a proper dma_fence because the hardware may never signal
anything, so don't use dma-buf with qxl, ever.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/Makefile  |2 
 drivers/gpu/drm/qxl/qxl_cmd.c |5 -
 drivers/gpu/drm/qxl/qxl_debugfs.c |   12 ++-
 drivers/gpu/drm/qxl/qxl_drv.h |   22 ++---
 drivers/gpu/drm/qxl/qxl_fence.c   |   87 ---
 drivers/gpu/drm/qxl/qxl_kms.c |2 
 drivers/gpu/drm/qxl/qxl_object.c  |2 
 drivers/gpu/drm/qxl/qxl_release.c |  166 -
 drivers/gpu/drm/qxl/qxl_ttm.c |   97 --
 9 files changed, 220 insertions(+), 175 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

diff --git a/drivers/gpu/drm/qxl/Makefile b/drivers/gpu/drm/qxl/Makefile
index ea046ba691d2..ac0d74852e11 100644
--- a/drivers/gpu/drm/qxl/Makefile
+++ b/drivers/gpu/drm/qxl/Makefile
@@ -4,6 +4,6 @@
 
 ccflags-y := -Iinclude/drm
 
-qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_fence.o qxl_release.o
+qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_release.o
 
 obj-$(CONFIG_DRM_QXL)+= qxl.o
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 45fad7b45486..97823644d347 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -620,11 +620,6 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stall)
if (ret == -EBUSY)
return -EBUSY;
 
-   if (surf->fence.num_active_releases > 0 && stall == false) {
-   qxl_bo_unreserve(surf);
-   return -EBUSY;
-   }
-
if (stall)
mutex_unlock(&qdev->surf_evict_mutex);
 
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c
index c3c2bbdc6674..0d144e0646d6 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -57,11 +57,21 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
struct qxl_device *qdev = node->minor->dev->dev_private;
struct qxl_bo *bo;
 
+   spin_lock(&qdev->release_lock);
list_for_each_entry(bo, &qdev->gem.objects, list) {
+   struct reservation_object_list *fobj;
+   int rel;
+
+   rcu_read_lock();
+   fobj = rcu_dereference(bo->tbo.resv->fence);
+   rel = fobj ? fobj->shared_count : 0;
+   rcu_read_unlock();
+
seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
   (unsigned long)bo->gem_base.size, bo->pin_count,
-  bo->tbo.sync_obj, bo->fence.num_active_releases);
+  bo->tbo.sync_obj, rel);
}
+   spin_unlock(&qdev->release_lock);
return 0;
 }
 
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 36ed40ba773f..d547cbdebeb4 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -31,6 +31,7 @@
  * Definitions taken from spice-protocol, plus kernel driver specific bits.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -95,13 +96,6 @@ enum {
QXL_INTERRUPT_IO_CMD |\
QXL_INTERRUPT_CLIENT_MONITORS_CONFIG)
 
-struct qxl_fence {
-   struct qxl_device *qdev;
-   uint32_t num_active_releases;
-   uint32_t *release_ids;
-   struct radix_tree_root tree;
-};
-
 struct qxl_bo {
/* Protected by gem.mutex */
struct list_headlist;
@@ -113,13 +107,13 @@ struct qxl_bo {
unsignedpin_count;
void*kptr;
int type;
+
/* Constant after initialization */
struct drm_gem_object   gem_base;
bool is_primary; /* is this now a primary surface */
bool hw_surf_alloc;
struct qxl_surface surf;
uint32_t surface_id;
-   struct qxl_fence fence; /* per bo fence  - list of releases */
struct qxl_release *surf_create;
 };
 #define gem_to_qxl_bo(gobj) container_of((gobj), struct qxl_bo, gem_base)
@@ -191,6 +185,8 @@ enum {
  * spice-protocol/qxl_dev.h */
 #define QXL_MAX_RES 96
 struct qxl_release {
+   struct fence base;
+
int id;
int type;
uint32_t release_offset;
@@ -284,7 +280,11 @@ struct qxl_device {
uint8_t slot_gen_bits;
uint64_tva_slot_mask;
 
+   /* XXX: when rcu becomes available, release_lock can be killed */
+   spinlock_t  release_lock;
+   spinlock_t  fence_lock;
struct idr  release_idr;

[PATCH 17/17] drm/ttm: use rcu in core ttm

2014-07-09 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   76 +++---
 1 file changed, 13 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 31c4a6dd722d..6fe1f4bf37ed 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -466,66 +466,6 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
  ((HZ / 100) < 1) ? 1 : HZ / 100);
 }
 
-static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
-bool interruptible)
-{
-   struct ttm_bo_global *glob = bo->glob;
-   struct reservation_object_list *fobj;
-   struct fence *excl = NULL;
-   struct fence **shared = NULL;
-   u32 shared_count = 0, i;
-   int ret = 0;
-
-   fobj = reservation_object_get_list(bo->resv);
-   if (fobj && fobj->shared_count) {
-   shared = kmalloc(sizeof(*shared) * fobj->shared_count,
-GFP_KERNEL);
-
-   if (!shared) {
-   ret = -ENOMEM;
-   __ttm_bo_unreserve(bo);
-   spin_unlock(&glob->lru_lock);
-   return ret;
-   }
-
-   for (i = 0; i < fobj->shared_count; ++i) {
-   if (!fence_is_signaled(fobj->shared[i])) {
-   fence_get(fobj->shared[i]);
-   shared[shared_count++] = fobj->shared[i];
-   }
-   }
-   if (!shared_count) {
-   kfree(shared);
-   shared = NULL;
-   }
-   }
-
-   excl = reservation_object_get_excl(bo->resv);
-   if (excl && !fence_is_signaled(excl))
-   fence_get(excl);
-   else
-   excl = NULL;
-
-   __ttm_bo_unreserve(bo);
-   spin_unlock(&glob->lru_lock);
-
-   if (excl) {
-   ret = fence_wait(excl, interruptible);
-   fence_put(excl);
-   }
-
-   if (shared_count > 0) {
-   for (i = 0; i < shared_count; ++i) {
-   if (!ret)
-   ret = fence_wait(shared[i], interruptible);
-   fence_put(shared[i]);
-   }
-   kfree(shared);
-   }
-
-   return ret;
-}
-
 /**
  * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
@@ -549,9 +489,19 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
ret = ttm_bo_wait(bo, false, false, true);
 
if (ret && !no_wait_gpu) {
-   ret = ttm_bo_unreserve_and_wait(bo, interruptible);
-   if (ret)
-   return ret;
+   long lret;
+   ww_mutex_unlock(&bo->resv->lock);
+   spin_unlock(&glob->lru_lock);
+
+   lret = reservation_object_wait_timeout_rcu(bo->resv,
+  true,
+  interruptible,
+  30 * HZ);
+
+   if (lret < 0)
+   return lret;
+   else if (lret == 0)
+   return -EBUSY;
 
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);



[PATCH 16/17] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab

2014-07-09 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 20a1a866ceeb..79e950df3018 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,13 +567,16 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
int ret;
 
if (flags & drm_vmw_synccpu_allow_cs) {
-   ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
-   if (!ret) {
-   ret = ttm_bo_wait(bo, false, true,
- !!(flags & drm_vmw_synccpu_dontblock));
-   ttm_bo_unreserve(bo);
-   }
-   return ret;
+   long lret;
+   if (flags & drm_vmw_synccpu_dontblock)
+   return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
+
+   lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
+   if (!lret)
+   return -EBUSY;
+   else if (lret < 0)
+   return lret;
+   return 0;
}
 
ret = ttm_bo_synccpu_write_grab



[PATCH 14/17] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep

2014-07-09 Thread Maarten Lankhorst
With the conversion to the reservation api this should be safe.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 4beaa897adad..c2ca894f6507 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -863,33 +863,29 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
+   bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
int ret;
-   struct nouveau_fence *fence = NULL;
 
gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);
 
-   ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
-   if (!ret) {
-   ret = ttm_bo_wait(&nvbo->bo, true, true, true);
-   if (!no_wait && ret) {
-   struct fence *excl;
-
-   excl = reservation_object_get_excl(nvbo->bo.resv);
-   fence = nouveau_fence_ref((struct nouveau_fence *)excl);
-   }
+   if (no_wait)
+   ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
+   else {
+   long lret;
 
-   ttm_bo_unreserve(&nvbo->bo);
+   lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
+   if (!lret)
+   ret = -EBUSY;
+   else if (lret > 0)
+   ret = 0;
+   else
+   ret = lret;
}
drm_gem_object_unreference_unlocked(gem);
 
-   if (fence) {
-   ret = nouveau_fence_wait(fence, true, no_wait);
-   nouveau_fence_unref(&fence);
-   }
-
return ret;
 }
 



[PATCH 15/17] drm/radeon: use rcu waits in some ioctls

2014-07-09 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon_gem.c |   19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d09650c1d720..7ba883843668 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -107,9 +107,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
}
if (domain == RADEON_GEM_DOMAIN_CPU) {
/* Asking for cpu access wait for object idle */
-   r = radeon_bo_wait(robj, NULL, false);
-   if (r) {
-   printk(KERN_ERR "Failed to wait for object !\n");
+   r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+   if (!r)
+   r = -EBUSY;
+
+   if (r < 0 && r != -EINTR) {
+   printk(KERN_ERR "Failed to wait for object: %i\n", r);
return r;
}
}
@@ -357,14 +360,20 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
struct drm_radeon_gem_wait_idle *args = data;
struct drm_gem_object *gobj;
struct radeon_bo *robj;
-   int r;
+   int r = 0;
+   long ret;
 
gobj = drm_gem_object_lookup(dev, filp, args->handle);
if (gobj == NULL) {
return -ENOENT;
}
robj = gem_to_radeon_bo(gobj);
-   r = radeon_bo_wait(robj, NULL, false);
+   ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+   if (ret == 0)
+   r = -EBUSY;
+   else if (ret < 0)
+   r = ret;
+
/* callback hw specific functions if any */
if (rdev->asic->ioctl_wait_idle)
robj->rdev->asic->ioctl_wait_idle(rdev, robj);



[PATCH 12/17] drm/vmwgfx: rework to new fence interface

2014-07-09 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |2 
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c|  299 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h|   29 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |9 -
 4 files changed, 200 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index db30b790ad24..f3f8caa09cc8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2360,7 +2360,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
BUG_ON(fence == NULL);
 
fence_rep.handle = fence_handle;
-   fence_rep.seqno = fence->seqno;
+   fence_rep.seqno = fence->base.seqno;
vmw_update_seqno(dev_priv, &dev_priv->fifo);
fence_rep.passed_seqno = dev_priv->last_read_seqno;
}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 05b9eea8e875..77f416b7552c 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -46,6 +46,7 @@ struct vmw_fence_manager {
bool goal_irq_on; /* Protected by @goal_irq_mutex */
bool seqno_valid; /* Protected by @lock, and may not be set to true
 without the @goal_irq_mutex held. */
+   unsigned ctx;
 };
 
 struct vmw_user_fence {
@@ -80,6 +81,12 @@ struct vmw_event_fence_action {
uint32_t *tv_usec;
 };
 
+static struct vmw_fence_manager *
+fman_from_fence(struct vmw_fence_obj *fence)
+{
+   return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+}
+
 /**
  * Note on fencing subsystem usage of irqs:
  * Typically the vmw_fences_update function is called
@@ -102,25 +109,130 @@ struct vmw_event_fence_action {
  * objects with actions attached to them.
  */
 
-static void vmw_fence_obj_destroy_locked(struct kref *kref)
+static void vmw_fence_obj_destroy(struct fence *f)
 {
struct vmw_fence_obj *fence =
-   container_of(kref, struct vmw_fence_obj, kref);
+   container_of(f, struct vmw_fence_obj, base);
 
-   struct vmw_fence_manager *fman = fence->fman;
-   unsigned int num_fences;
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+   unsigned long irq_flags;
 
+   spin_lock_irqsave(&fman->lock, irq_flags);
list_del_init(&fence->head);
-   num_fences = --fman->num_fence_objects;
-   spin_unlock_irq(&fman->lock);
-   if (fence->destroy)
-   fence->destroy(fence);
-   else
-   kfree(fence);
+   --fman->num_fence_objects;
+   spin_unlock_irqrestore(&fman->lock, irq_flags);
+   fence->destroy(fence);
+}
 
-   spin_lock_irq(&fman->lock);
+static const char *vmw_fence_get_driver_name(struct fence *f)
+{
+   return "vmwgfx";
+}
+
+static const char *vmw_fence_get_timeline_name(struct fence *f)
+{
+   return "svga";
+}
+
+static bool vmw_fence_enable_signaling(struct fence *f)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
+
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+   __le32 __iomem *fifo_mem = fman->dev_priv->mmio_virt;
+   u32 seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
+   if (seqno - fence->base.seqno < VMW_FENCE_WRAP)
+   return false;
+
+   vmw_fifo_ping_host(fman->dev_priv, SVGA_SYNC_GENERIC);
+
+   return true;
+}
+
+struct vmwgfx_wait_cb {
+   struct fence_cb base;
+   struct task_struct *task;
+};
+
+static void
+vmwgfx_wait_cb(struct fence *fence, struct fence_cb *cb)
+{
+   struct vmwgfx_wait_cb *wait =
+   container_of(cb, struct vmwgfx_wait_cb, base);
+
+   wake_up_process(wait->task);
 }
 
+static void __vmw_fences_update(struct vmw_fence_manager *fman);
+
+static long vmw_fence_wait(struct fence *f, bool intr, signed long timeout)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
+
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+   struct vmw_private *dev_priv = fman->dev_priv;
+   struct vmwgfx_wait_cb cb;
+   long ret = timeout;
+   unsigned long irq_flags;
+
+   if (likely(vmw_fence_obj_signaled(fence)))
+   return timeout;
+
+   vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
+   vmw_seqno_waiter_add(dev_priv);
+
+   spin_lock_irqsave(f->lock, irq_flags);
+
+   if (intr && signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   goto out;
+   }
+
+   cb.base.func = vmwgfx_wait_cb;
+   cb.task = current;
+   list_add(&cb.base.node, &f->cb_list);
+
+   while (ret > 0) {

[PATCH 13/17] drm/ttm: flip the switch, and convert to dma_fence

2014-07-09 Thread Maarten Lankhorst

---
 drivers/gpu/drm/nouveau/nouveau_bo.c |   48 +---
 drivers/gpu/drm/nouveau/nouveau_fence.c  |   24 +---
 drivers/gpu/drm/nouveau/nouveau_fence.h  |2 
 drivers/gpu/drm/nouveau/nouveau_gem.c|   16 ++-
 drivers/gpu/drm/qxl/qxl_debugfs.c|6 +
 drivers/gpu/drm/qxl/qxl_drv.h|2 
 drivers/gpu/drm/qxl/qxl_kms.c|1 
 drivers/gpu/drm/qxl/qxl_object.h |4 -
 drivers/gpu/drm/qxl/qxl_release.c|3 -
 drivers/gpu/drm/qxl/qxl_ttm.c|  104 --
 drivers/gpu/drm/radeon/radeon_cs.c   |   10 +-
 drivers/gpu/drm/radeon/radeon_display.c  |   25 +++-
 drivers/gpu/drm/radeon/radeon_object.c   |4 -
 drivers/gpu/drm/radeon/radeon_ttm.c  |   34 --
 drivers/gpu/drm/radeon/radeon_uvd.c  |8 +
 drivers/gpu/drm/radeon/radeon_vm.c   |   14 ++
 drivers/gpu/drm/ttm/ttm_bo.c |  171 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c|   23 +---
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   10 --
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c   |   40 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   14 +-
 include/drm/ttm/ttm_bo_api.h |2 
 include/drm/ttm/ttm_bo_driver.h  |   26 -
 include/drm/ttm/ttm_execbuf_util.h   |   10 +-
 24 files changed, 208 insertions(+), 393 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 84aba3fa1bd0..5b8ccc39a282 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -92,13 +92,13 @@ nv10_bo_get_tile_region(struct drm_device *dev, int i)
 
 static void
 nv10_bo_put_tile_region(struct drm_device *dev, struct nouveau_drm_tile *tile,
-   struct nouveau_fence *fence)
+   struct fence *fence)
 {
struct nouveau_drm *drm = nouveau_drm(dev);
 
if (tile) {
spin_lock(&drm->tile.lock);
-   tile->fence = nouveau_fence_ref(fence);
+   tile->fence = nouveau_fence_ref((struct nouveau_fence *)fence);
tile->used = false;
spin_unlock(&drm->tile.lock);
}
@@ -965,7 +965,8 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
-   ret = ttm_bo_move_accel_cleanup(bo, fence,
+   ret = ttm_bo_move_accel_cleanup(bo,
+   &fence->base,
evict,
no_wait_gpu,
new_mem);
@@ -1151,8 +1152,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
 {
struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
struct drm_device *dev = drm->dev;
+   struct fence *fence = reservation_object_get_excl(bo->resv);
 
-   nv10_bo_put_tile_region(dev, *old_tile, bo->sync_obj);
+   nv10_bo_put_tile_region(dev, *old_tile, fence);
*old_tile = new_tile;
 }
 
@@ -1423,47 +1425,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
 }
 
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
-}
-
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
struct reservation_object *resv = nvbo->bo.resv;
 
-   nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
-   nvbo->bo.sync_obj = nouveau_fence_ref(fence);
-
reservation_object_add_excl_fence(resv, &fence->base);
 }
 
-static void *
-nouveau_bo_fence_ref(void *sync_obj)
-{
-   return nouveau_fence_ref(sync_obj);
-}
-
-static bool
-nouveau_bo_fence_signalled(void *sync_obj)
-{
-   return nouveau_fence_done(sync_obj);
-}
-
-static int
-nouveau_bo_fence_wait(void *sync_obj, bool lazy, bool intr)
-{
-   return nouveau_fence_wait(sync_obj, lazy, intr);
-}
-
-static int
-nouveau_bo_fence_flush(void *sync_obj)
-{
-   return 0;
-}
-
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
.ttm_tt_populate = &nouveau_ttm_tt_populate,
@@ -1474,11 +1443,6 @@ struct ttm_bo_driver nouveau_bo_driver = {
.move_notify = nouveau_bo_move_ntfy,
.move = nouveau_bo_move,
.verify_access = nouveau_bo_verify_access,
-   .sync_obj_signaled = nouveau_bo_fence_signalled,
-   .sync_obj_wait = nouveau_bo_fence_wait,
-   .sync_obj_flush = nouveau_bo_fence_flush,
-   .sync_obj_unref = nouveau_bo_fence_unref,
-   .sync_obj_ref = nouveau_bo_fence_ref,
.fault_reserve_notify = &nouveau_ttm_fault_reserve_notify,
.io_mem_reserve = &nouveau_ttm_io_mem_reserve,
.io_mem_free = 

[PATCH 07/17] drm/nouveau: rework to new fence interface

2014-07-09 Thread Maarten Lankhorst
From: Maarten Lankhorst 

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/core/core/event.c |4 
 drivers/gpu/drm/nouveau/nouveau_bo.c  |6 
 drivers/gpu/drm/nouveau/nouveau_display.c |4 
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  435 -
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   20 +
 drivers/gpu/drm/nouveau/nouveau_gem.c |   17 -
 drivers/gpu/drm/nouveau/nv04_fence.c  |4 
 drivers/gpu/drm/nouveau/nv10_fence.c  |4 
 drivers/gpu/drm/nouveau/nv17_fence.c  |2 
 drivers/gpu/drm/nouveau/nv50_fence.c  |2 
 drivers/gpu/drm/nouveau/nv84_fence.c  |   11 -
 11 files changed, 330 insertions(+), 179 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/core/event.c b/drivers/gpu/drm/nouveau/core/core/event.c
index ae81d3b5d8b7..5ddc28ec7660 100644
--- a/drivers/gpu/drm/nouveau/core/core/event.c
+++ b/drivers/gpu/drm/nouveau/core/core/event.c
@@ -139,14 +139,14 @@ nouveau_event_ref(struct nouveau_eventh *handler, struct nouveau_eventh **ref)
 void
 nouveau_event_trigger(struct nouveau_event *event, u32 types, int index)
 {
-   struct nouveau_eventh *handler;
+   struct nouveau_eventh *handler, *next;
unsigned long flags;
 
if (WARN_ON(index >= event->index_nr))
return;
 
spin_lock_irqsave(&event->list_lock, flags);
-   list_for_each_entry(handler, &event->list[index], head) {
+   list_for_each_entry_safe(handler, next, &event->list[index], head) {
if (!test_bit(NVKM_EVENT_ENABLE, &handler->flags))
continue;
if (!(handler->types & types))
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e98af2e9a1cb..84aba3fa1bd0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -959,7 +959,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
}
 
mutex_lock_nested(&chan->cli->mutex, SINGLE_DEPTH_NESTING);
-   ret = nouveau_fence_sync(bo->sync_obj, chan);
+   ret = nouveau_fence_sync(nouveau_bo(bo), chan);
if (ret == 0) {
ret = drm->ttm.move(chan, bo, &bo->mem, new_mem);
if (ret == 0) {
@@ -1432,10 +1432,12 @@ nouveau_bo_fence_unref(void **sync_obj)
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-   lockdep_assert_held(&nvbo->bo.resv->lock.base);
+   struct reservation_object *resv = nvbo->bo.resv;
 
nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
nvbo->bo.sync_obj = nouveau_fence_ref(fence);
+
+   reservation_object_add_excl_fence(resv, &fence->base);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 7928f8f07334..2c4798750b20 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -660,7 +660,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
spin_unlock_irqrestore(&dev->event_lock, flags);
 
/* Synchronize with the old framebuffer */
-   ret = nouveau_fence_sync(old_bo->bo.sync_obj, chan);
+   ret = nouveau_fence_sync(old_bo, chan);
if (ret)
goto fail;
 
@@ -721,7 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
-   ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
+   ret = nouveau_fence_sync(new_bo, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index ab5ea3b0d666..d24f8ce4341a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -32,91 +32,139 @@
 #include "nouveau_drm.h"
 #include "nouveau_dma.h"
 #include "nouveau_fence.h"
+#include 
 
 #include 
 
-struct fence_work {
-   struct work_struct base;
-   struct list_head head;
-   void (*func)(void *);
-   void *data;
-};
+static const struct fence_ops nouveau_fence_ops_uevent;
+static const struct fence_ops nouveau_fence_ops_legacy;
 
 static void
 nouveau_fence_signal(struct nouveau_fence *fence)
 {
-   struct fence_work *work, *temp;
+   fence_signal_locked(&fence->base);
+   list_del(&fence->head);
+
+   if (fence->base.ops == &nouveau_fence_ops_uevent &&
+   fence->event.head.next) {
+   struct nouveau_event *event;
 
-   list_for_each_entry_safe(work, temp, &fence->work, head) {
-   schedule_work(&work->base);
-   list

[PATCH 05/17] drm/ttm: call ttm_bo_wait while inside a reservation

2014-07-09 Thread Maarten Lankhorst
This is the last remaining function that doesn't use the reservation
lock completely to fence off access to a buffer.
---
 drivers/gpu/drm/ttm/ttm_bo.c |   25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4ab9f7171c4f..d7d34336f108 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -502,17 +502,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
if (ret)
return ret;
 
-   /*
-* remove sync_obj with ttm_bo_wait, the wait should be
-* finished, and no new wait object should have been added.
-*/
-   spin_lock(&bdev->fence_lock);
-   ret = ttm_bo_wait(bo, false, false, true);
-   WARN_ON(ret);
-   spin_unlock(&bdev->fence_lock);
-   if (ret)
-   return ret;
-
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);
 
@@ -528,8 +517,16 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
spin_unlock(&glob->lru_lock);
return 0;
}
-   } else
-   spin_unlock(&bdev->fence_lock);
+
+   /*
+* remove sync_obj with ttm_bo_wait, the wait should be
+* finished, and no new wait object should have been added.
+*/
+   spin_lock(&bdev->fence_lock);
+   ret = ttm_bo_wait(bo, false, false, true);
+   WARN_ON(ret);
+   }
+   spin_unlock(&bdev->fence_lock);
 
if (ret || unlikely(list_empty(&bo->ddestroy))) {
__ttm_bo_unreserve(bo);
@@ -1539,6 +1536,8 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
void *sync_obj;
int ret = 0;
 
+   lockdep_assert_held(&bo->resv->lock.base);
+
if (likely(bo->sync_obj == NULL))
return 0;
 



[PATCH 08/17] drm/radeon: add timeout argument to radeon_fence_wait_seq

2014-07-09 Thread Maarten Lankhorst
This makes it possible to wait for a specific amount of time,
rather than wait until infinity.

Signed-off-by: Maarten Lankhorst 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_fence.c |   60 ++---
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 913787085dfa..6435719fd45b 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
 }
 
 /**
- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for a specific sequence numbers
  *
  * @rdev: radeon device pointer
  * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
  *
  * Wait for the requested sequence number(s) to be written by any ring
  * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
  * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait timeout, or an error for all other cases.
  * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-bool intr)
+static long radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+ u64 *target_seq, bool intr,
+ long timeout)
 {
uint64_t last_seq[RADEON_NUM_RINGS];
bool signaled;
-   int i, r;
+   int i;
 
while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+   long r, waited;
+
+   waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
 
		/* Save current sequence values, used to check for GPU lockups */
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,11 +326,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
if (intr) {
r = wait_event_interruptible_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-|| rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
} else {
r = wait_event_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, target_seq))
-|| rdev->needs_reset), RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
}
 
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -337,6 +344,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
if (unlikely(r < 0))
return r;
 
+   timeout -= waited - r;
+
+   /*
+* If this is a timed wait and the wait completely timed out just return.
+*/
+   if (!timeout)
+   break;
+
if (unlikely(!signaled)) {
if (rdev->needs_reset)
return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
}
}
}
-   return 0;
+   return timeout;
 }
 
 /**
  * radeon_fence_wait - wait for a fence to signal
  *
  * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
  *
  * Wait for the requested fence to signal (all asics).
  * @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
uint64_t seq[RADEON_NUM_RINGS] = {};
-   int r;
+   long r;
 
if (fence == NULL) {
WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
return 0;
 
-   r = radeon_fence_wait_seq(fence->rdev, seq, intr);
-   if (r)
+   r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
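
To make the new return convention concrete, a hypothetical caller sketch
(radeon_fence_wait_seq_timeout is static, so this only illustrates the
semantics: positive = remaining jiffies, 0 = timed out, negative = error):

	long r = radeon_fence_wait_seq_timeout(rdev, seq, intr, 30 * HZ);
	if (r == 0)
		return -EBUSY;	/* the full 30 seconds elapsed */
	else if (r < 0)
		return r;	/* e.g. -ERESTARTSYS, or -EDEADLK on lockup */
	return 0;		/* signaled with r jiffies to spare */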

[PATCH 11/17] drm/vmwgfx: get rid of different types of fence_flags entirely

2014-07-09 Thread Maarten Lankhorst
Only one type was ever used. This is needed to simplify the fence
support in the next commit.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |5 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h |1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |   14 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |   50 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h   |8 +
 5 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 4a36bb1dc525..f15718cc631d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -792,15 +792,12 @@ static int vmw_sync_obj_flush(void *sync_obj)
 
 static bool vmw_sync_obj_signaled(void *sync_obj)
 {
-   return  vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj,
-  DRM_VMW_FENCE_FLAG_EXEC);
-
+   return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
 }
 
 static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
 {
return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
- DRM_VMW_FENCE_FLAG_EXEC,
  lazy, interruptible,
  VMW_FENCE_WAIT_TIMEOUT);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 6b252a887ae2..f217e9723b9e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -332,7 +332,6 @@ struct vmw_sw_context{
uint32_t *cmd_bounce;
uint32_t cmd_bounce_size;
struct list_head resource_list;
-   uint32_t fence_flags;
struct ttm_buffer_object *cur_query_bo;
struct list_head res_relocations;
uint32_t *buf_start;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index f8b25bc4e634..db30b790ad24 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -350,8 +350,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
vval_buf->validate_as_mob = validate_as_mob;
}
 
-   sw_context->fence_flags |= DRM_VMW_FENCE_FLAG_EXEC;
-
if (p_val_node)
*p_val_node = val_node;
 
@@ -2308,13 +2306,9 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,
 
if (p_handle != NULL)
ret = vmw_user_fence_create(file_priv, dev_priv->fman,
-   sequence,
-   DRM_VMW_FENCE_FLAG_EXEC,
-   p_fence, p_handle);
+   sequence, p_fence, p_handle);
else
-   ret = vmw_fence_create(dev_priv->fman, sequence,
-  DRM_VMW_FENCE_FLAG_EXEC,
-  p_fence);
+   ret = vmw_fence_create(dev_priv->fman, sequence, p_fence);
 
if (unlikely(ret != 0 && !synced)) {
(void) vmw_fallback_wait(dev_priv, false, false,
@@ -2387,8 +2381,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
ttm_ref_object_base_unref(vmw_fp->tfile,
  fence_handle, TTM_REF_USAGE);
DRM_ERROR("Fence copy error. Syncing.\n");
-   (void) vmw_fence_obj_wait(fence, fence->signal_mask,
- false, false,
+   (void) vmw_fence_obj_wait(fence, false, false,
  VMW_FENCE_WAIT_TIMEOUT);
}
 }
@@ -2438,7 +2431,6 @@ int vmw_execbuf_process(struct drm_file *file_priv,
sw_context->fp = vmw_fpriv(file_priv);
sw_context->cur_reloc = 0;
sw_context->cur_val_buf = 0;
-   sw_context->fence_flags = 0;
INIT_LIST_HEAD(&sw_context->resource_list);
sw_context->cur_query_bo = dev_priv->pinned_bo;
sw_context->last_query_ctx = NULL;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 436b013b4231..05b9eea8e875 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -207,9 +207,7 @@ void vmw_fence_manager_takedown(struct vmw_fence_manager *fman)
 }
 
 static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
- struct vmw_fence_obj *fence,
- u32 seqno,
- uint32_t mask,
+ struct vmw_fence_obj *fence, u32 seqno,
  void (*destroy) (struct vmw_fence_obj *fence))
 {
unsigned long irq_flags;
@@ -220,7 +218,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,

[PATCH 04/17] drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence

2014-07-09 Thread Maarten Lankhorst
This will ensure we always hold the required lock when calling those functions.
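
A minimal sketch of the call pattern this enforces (hypothetical caller,
error handling trimmed):

	ret = ttm_bo_reserve(&nvbo->bo, true, false, false, NULL);
	if (ret)
		return ret;
	nouveau_bo_fence(nvbo, fence);	/* satisfies the lockdep assert below */
	ttm_bo_unreserve(&nvbo->bo);
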
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |2 ++
 drivers/gpu/drm/nouveau/nouveau_display.c |   17 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..33eb7164525a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1431,6 +1431,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
struct nouveau_fence *old_fence = NULL;
 
+   lockdep_assert_held(&nvbo->bo.resv->lock.base);
+
spin_lock(&nvbo->bo.bdev->fence_lock);
old_fence = nvbo->bo.sync_obj;
nvbo->bo.sync_obj = new_fence;
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 47ad74255bf1..826b66c44235 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -716,6 +716,9 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
}
 
mutex_lock(&chan->cli->mutex);
+   ret = ttm_bo_reserve(&new_bo->bo, true, false, false, NULL);
+   if (ret)
+   goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
spin_lock(&new_bo->bo.bdev->fence_lock);
@@ -723,12 +726,18 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
spin_unlock(&new_bo->bo.bdev->fence_lock);
ret = nouveau_fence_sync(fence, chan);
nouveau_fence_unref(&fence);
-   if (ret)
+   if (ret) {
+   ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
+   }
 
-   ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
-   if (ret)
-   goto fail_unpin;
+   if (new_bo != old_bo) {
+   ttm_bo_unreserve(&new_bo->bo);
+
+   ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
+   if (ret)
+   goto fail_unpin;
+   }
 
/* Initialize a page flip struct */
*s = (struct nouveau_page_flip_state)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 09/17] drm/radeon: use common fence implementation for fences

2014-07-09 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon.h|   15 +-
 drivers/gpu/drm/radeon/radeon_device.c |   60 -
 drivers/gpu/drm/radeon/radeon_fence.c  |  223 ++--
 3 files changed, 248 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 29d9cc04c04e..03a5567f2c2f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/fence.h>
 
 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
@@ -116,9 +117,6 @@ extern int radeon_deep_color;
 #define RADEONFB_CONN_LIMIT4
 #define RADEON_BIOS_NUM_SCRATCH8
 
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ  0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX 0
@@ -350,12 +348,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {
+   struct fence base;
+
struct radeon_device*rdev;
-   struct kref kref;
/* protected by radeon_fence.lock */
uint64_tseq;
/* RB, DMA, etc. */
unsignedring;
+
+   wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2268,6 +2269,7 @@ struct radeon_device {
struct radeon_mman  mman;
struct radeon_fence_driver  fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t   fence_queue;
+   unsignedfence_context;
struct mutexring_lock;
struct radeon_ring  ring[RADEON_NUM_RINGS];
boolib_pool_ready;
@@ -2358,11 +2360,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fab842d..86699df7c8f3 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1213,6 +1213,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+   rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1607,6 +1608,54 @@ int radeon_resume_kms(struct drm_device *dev, bool resume, bool fbcon)
return 0;
 }
 
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)
+{
+   uint32_t mask = 0;
+   int i;
+
+   if (!rdev->ddev->irq_enabled)
+   return mask;
+
+   /*
+* increase refcount on sw interrupts for all rings to stop
+* enabling interrupts in radeon_fence_enable_signaling during
+* gpu reset.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!rdev->ring[i].ready)
+   continue;
+
+   atomic_inc(&rdev->irq.ring_int[i]);
+   mask |= 1 << i;
+   }
+   return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+   unsigned long irqflags;
+   int i;
+
+   if (!mask)
+   return;
+
+   /*
+* undo refcount increase, and reset irqs to correct value.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!(mask & (1 << i)))
+   continue;
+
+   atomic_dec(&rdev->irq.ring_int[i]);
+   }
+
+   spin_lock_irqsave(&rdev->irq.lock, irqflags);
+   radeon_irq_set(rdev);
+   spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
 /**
  * radeon_gpu_reset - reset the asic
  *
@@ -1624,6 +1673,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 
int i, r;
int resched;
+   uint32_t sw_mask;
 
down_write(&rdev->exclusive_lock);
 
@@ -1637,6 +1687,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
radeon_save_bios_scratch_regs(rdev);
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+   sw_mask = radeon_gpu_mask_sw_irq(rdev);
radeon_pm_suspend(rdev);
radeon_suspend(rdev);
 
@@ -1686,13 +1737,20 @@ retry:
radeon_pm_resume(rdev);
drm_helper_resume_force_mode(rdev->ddev);
 
+   radeon_gpu_unmask_sw_irq(rdev, sw_mask);
 

[PATCH 06/17] drm/ttm: kill fence_lock

2014-07-09 Thread Maarten Lankhorst
No users are left, kill it off! :D
Conversion to the reservation api is next on the list, after
that the functionality can be restored with rcu.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   25 +++---
 drivers/gpu/drm/nouveau/nouveau_display.c |6 --
 drivers/gpu/drm/nouveau/nouveau_gem.c |   16 +-
 drivers/gpu/drm/qxl/qxl_cmd.c |2 -
 drivers/gpu/drm/qxl/qxl_fence.c   |4 --
 drivers/gpu/drm/qxl/qxl_object.h  |2 -
 drivers/gpu/drm/qxl/qxl_release.c |2 -
 drivers/gpu/drm/radeon/radeon_display.c   |   12 +++--
 drivers/gpu/drm/radeon/radeon_object.c|2 -
 drivers/gpu/drm/ttm/ttm_bo.c  |   75 +++--
 drivers/gpu/drm/ttm/ttm_bo_util.c |5 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |3 -
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c|4 --
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   17 ++-
 include/drm/ttm/ttm_bo_api.h  |5 --
 include/drm/ttm/ttm_bo_driver.h   |3 -
 17 files changed, 45 insertions(+), 140 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 33eb7164525a..e98af2e9a1cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1196,9 +1196,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
}
 
/* Fallback to software copy. */
-   spin_lock(&bo->bdev->fence_lock);
ret = ttm_bo_wait(bo, true, intr, no_wait_gpu);
-   spin_unlock(&bo->bdev->fence_lock);
if (ret == 0)
ret = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
 
@@ -1425,26 +1423,19 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
 }
 
+static void
+nouveau_bo_fence_unref(void **sync_obj)
+{
+   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+}
+
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-   struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
-   struct nouveau_fence *old_fence = NULL;
-
lockdep_assert_held(&nvbo->bo.resv->lock.base);
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   old_fence = nvbo->bo.sync_obj;
-   nvbo->bo.sync_obj = new_fence;
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-   nouveau_fence_unref(&old_fence);
-}
-
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+   nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
+   nvbo->bo.sync_obj = nouveau_fence_ref(fence);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 826b66c44235..7928f8f07334 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -721,11 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
-   spin_lock(&new_bo->bo.bdev->fence_lock);
-   fence = nouveau_fence_ref(new_bo->bo.sync_obj);
-   spin_unlock(&new_bo->bo.bdev->fence_lock);
-   ret = nouveau_fence_sync(fence, chan);
-   nouveau_fence_unref(&fence);
+   ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6e1c58a880fe..6cd5298cbb53 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -105,9 +105,7 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
list_del(&vma->head);
 
if (mapped) {
-   spin_lock(&nvbo->bo.bdev->fence_lock);
fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
}
 
if (fence) {
@@ -432,17 +430,11 @@ retry:
 static int
 validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
 {
-   struct nouveau_fence *fence = NULL;
+   struct nouveau_fence *fence = nvbo->bo.sync_obj;
int ret = 0;
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-   if (fence) {
+   if (fence)
ret = nouveau_fence_sync(fence, chan);
-   nouveau_fence_unref(&fence);
-   }
 
return ret;
 }
@@ -661,9 +653,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
   

[PATCH 03/17] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

2014-07-09 Thread Maarten Lankhorst
Apart from some code inside ttm itself and nouveau_bo_vma_del,
this is the only place where ttm_bo_wait is used without a reservation.
Fix this so we can remove the fence_lock later on.

After the switch to rcu the reservation lock will be
removed again.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..6e1c58a880fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -886,17 +886,31 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
-   int ret = -EINVAL;
+   int ret;
+   struct nouveau_fence *fence = NULL;
 
gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
+   ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
+   if (!ret) {
+   spin_lock(&nvbo->bo.bdev->fence_lock);
+   ret = ttm_bo_wait(&nvbo->bo, true, true, true);
+   if (!no_wait && ret)
+   fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+   spin_unlock(&nvbo->bo.bdev->fence_lock);
+
+   ttm_bo_unreserve(&nvbo->bo);
+   }
drm_gem_object_unreference_unlocked(gem);
+
+   if (fence) {
+   ret = nouveau_fence_wait(fence, true, no_wait);
+   nouveau_fence_unref(&fence);
+   }
+
return ret;
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 02/17] drm/ttm: kill off some members of ttm_validate_buffer

2014-07-09 Thread Maarten Lankhorst
This reorders the list to keep track of what buffers are reserved,
so previous members are always unreserved.

This gets rid of some bookkeeping that's no longer needed,
while simplifying the code some.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/qxl_release.c   |1 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c  |  142 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |1 
 include/drm/ttm/ttm_execbuf_util.h  |3 -
 4 files changed, 50 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 2b43e5deb051..e85c4d274dc0 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -350,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release *release)
 
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
-   entry->reserved = false;
}
spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 39a11bbd2bac..6db47a72667e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -32,20 +32,12 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 
-static void ttm_eu_backoff_reservation_locked(struct list_head *list)
+static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
+ struct ttm_validate_buffer *entry)
 {
-   struct ttm_validate_buffer *entry;
-
-   list_for_each_entry(entry, list, head) {
+   list_for_each_entry_continue_reverse(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
-   if (!entry->reserved)
-   continue;
 
-   entry->reserved = false;
-   if (entry->removed) {
-   ttm_bo_add_to_lru(bo);
-   entry->removed = false;
-   }
__ttm_bo_unreserve(bo);
}
 }
@@ -56,27 +48,9 @@ static void ttm_eu_del_from_lru_locked(struct list_head *list)
 
list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
-   if (!entry->reserved)
-   continue;
+   unsigned put_count = ttm_bo_del_from_lru(bo);
 
-   if (!entry->removed) {
-   entry->put_count = ttm_bo_del_from_lru(bo);
-   entry->removed = true;
-   }
-   }
-}
-
-static void ttm_eu_list_ref_sub(struct list_head *list)
-{
-   struct ttm_validate_buffer *entry;
-
-   list_for_each_entry(entry, list, head) {
-   struct ttm_buffer_object *bo = entry->bo;
-
-   if (entry->put_count) {
-   ttm_bo_list_ref_sub(bo, entry->put_count, true);
-   entry->put_count = 0;
-   }
+   ttm_bo_list_ref_sub(bo, put_count, true);
}
 }
 
@@ -91,11 +65,18 @@ void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
 
entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;
+
spin_lock(&glob->lru_lock);
-   ttm_eu_backoff_reservation_locked(list);
+   list_for_each_entry(entry, list, head) {
+   struct ttm_buffer_object *bo = entry->bo;
+
+   ttm_bo_add_to_lru(bo);
+   __ttm_bo_unreserve(bo);
+   }
+   spin_unlock(&glob->lru_lock);
+
if (ticket)
ww_acquire_fini(ticket);
-   spin_unlock(&glob->lru_lock);
 }
 EXPORT_SYMBOL(ttm_eu_backoff_reservation);
 
@@ -121,64 +102,55 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
if (list_empty(list))
return 0;
 
-   list_for_each_entry(entry, list, head) {
-   entry->reserved = false;
-   entry->put_count = 0;
-   entry->removed = false;
-   }
-
entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;
 
if (ticket)
ww_acquire_init(ticket, &reservation_ww_class);
-retry:
+
list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
 
-   /* already slowpath reserved? */
-   if (entry->reserved)
-   continue;
-
ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
   ticket);
+   if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
+   __ttm_bo_unreserve(bo);
 
-   if (ret == -EDEADLK) {
-   /* uh oh, we lost out, drop every reservation and try
-* to only reserve th

[PATCH 00/17] Convert TTM to the new fence interface.

2014-07-09 Thread Maarten Lankhorst
This series applies on top of the driver-core-next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git

Before converting ttm to the new fence interface I had to fix some
drivers to require a reservation before poking with fence_obj.
After flipping the switch RCU becomes available instead, and
the extra reservations can be dropped again. :-)

I've done at least basic testing on all the drivers I've converted
at some point, but more testing is definitely welcomed!

---

Maarten Lankhorst (17):
  drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
  drm/ttm: kill off some members of ttm_validate_buffer
  drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
  drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
  drm/ttm: call ttm_bo_wait while inside a reservation
  drm/ttm: kill fence_lock
  drm/nouveau: rework to new fence interface
  drm/radeon: add timeout argument to radeon_fence_wait_seq
  drm/radeon: use common fence implementation for fences
  drm/qxl: rework to new fence interface
  drm/vmwgfx: get rid of different types of fence_flags entirely
  drm/vmwgfx: rework to new fence interface
  drm/ttm: flip the switch, and convert to dma_fence
  drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
  drm/radeon: use rcu waits in some ioctls
  drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
  drm/ttm: use rcu in core ttm

 drivers/gpu/drm/nouveau/core/core/event.c |4 
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   59 +---
 drivers/gpu/drm/nouveau/nouveau_display.c |   25 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  431 +++--
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   22 +
 drivers/gpu/drm/nouveau/nouveau_gem.c |   55 +---
 drivers/gpu/drm/nouveau/nv04_fence.c  |4 
 drivers/gpu/drm/nouveau/nv10_fence.c  |4 
 drivers/gpu/drm/nouveau/nv17_fence.c  |2 
 drivers/gpu/drm/nouveau/nv50_fence.c  |2 
 drivers/gpu/drm/nouveau/nv84_fence.c  |   11 -
 drivers/gpu/drm/qxl/Makefile  |2 
 drivers/gpu/drm/qxl/qxl_cmd.c |7 
 drivers/gpu/drm/qxl/qxl_debugfs.c |   16 +
 drivers/gpu/drm/qxl/qxl_drv.h |   20 -
 drivers/gpu/drm/qxl/qxl_fence.c   |   91 --
 drivers/gpu/drm/qxl/qxl_kms.c |1 
 drivers/gpu/drm/qxl/qxl_object.c  |2 
 drivers/gpu/drm/qxl/qxl_object.h  |6 
 drivers/gpu/drm/qxl/qxl_release.c |  172 ++--
 drivers/gpu/drm/qxl/qxl_ttm.c |   93 --
 drivers/gpu/drm/radeon/radeon.h   |   15 -
 drivers/gpu/drm/radeon/radeon_cs.c|   10 +
 drivers/gpu/drm/radeon/radeon_device.c|   60 
 drivers/gpu/drm/radeon/radeon_display.c   |   21 +
 drivers/gpu/drm/radeon/radeon_fence.c |  283 +++
 drivers/gpu/drm/radeon/radeon_gem.c   |   19 +
 drivers/gpu/drm/radeon/radeon_object.c|8 -
 drivers/gpu/drm/radeon/radeon_ttm.c   |   34 --
 drivers/gpu/drm/radeon/radeon_uvd.c   |   10 -
 drivers/gpu/drm/radeon/radeon_vm.c|   16 +
 drivers/gpu/drm/ttm/ttm_bo.c  |  187 ++---
 drivers/gpu/drm/ttm/ttm_bo_util.c |   28 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |3 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|  146 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c|   47 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |1 
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |   24 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c |  329 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h |   35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   43 +--
 include/drm/ttm/ttm_bo_api.h  |7 
 include/drm/ttm/ttm_bo_driver.h   |   29 --
 include/drm/ttm/ttm_execbuf_util.h|   22 +
 44 files changed, 1256 insertions(+), 1150 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

-- 
Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 0/9] Updated fence patch series

2014-07-02 Thread Maarten Lankhorst
On 02-07-14 07:37, Greg KH wrote:
> On Tue, Jul 01, 2014 at 12:57:02PM +0200, Maarten Lankhorst wrote:
>> So after some more hacking I've moved dma-buf to its own subdirectory,
>> drivers/dma-buf and applied the fence patches to its new place. I believe 
>> that the
>> first patch should be applied regardless, and the rest should be ready now.
>> :-)
>>
>> Changes to the fence api:
>> - release_fence -> fence_release etc.
>> - __fence_init -> fence_init
>> - __fence_signal -> fence_signal_locked
>> - __fence_is_signaled -> fence_is_signaled_locked
>> - Changing BUG_ON to WARN_ON in fence_later, and return NULL if it triggers.
>>
>> Android can expose fences to userspace. It's possible to make the new fence
>> mechanism expose the same fences to userspace by changing sync_fence_create
>> to take a struct fence instead of a struct sync_pt. No other change is 
>> needed,
>> because only the fence parts of struct sync_pt are used. But because the
>> userspace fences are a separate problem and I haven't really looked at it yet
>> I feel it should stay in staging, for now.
> Ok, that's reasonable.
>
> At first glance, this all looks "sane" to me, any objection from anyone
> if I merge this through my driver-core tree for 3.17?
>
Sounds good to me, let me know when you pull it in, so I can rebase my drm 
conversion on top of it. :-)

~Maarten
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/9] dma-buf: move to drivers/dma-buf

2014-07-01 Thread Maarten Lankhorst
On 01-07-14 13:06, Arend van Spriel wrote:
> On 01-07-14 12:57, Maarten Lankhorst wrote:
>> Signed-off-by: Maarten Lankhorst 
> It would help to use '-M' option with format-patch for this kind of rework.
>
> Regards,
> Arend
>
Thanks, was looking for some option but didn't find it.

Have a rediff below. :-)
8< 

Signed-off-by: Maarten Lankhorst 
---
 Documentation/DocBook/device-drivers.tmpl | 3 +--
 MAINTAINERS   | 4 ++--
 drivers/Makefile  | 1 +
 drivers/base/Makefile | 1 -
 drivers/dma-buf/Makefile  | 1 +
 drivers/{base => dma-buf}/dma-buf.c   | 0
 drivers/{base => dma-buf}/reservation.c   | 0
 7 files changed, 5 insertions(+), 5 deletions(-)
 create mode 100644 drivers/dma-buf/Makefile
 rename drivers/{base => dma-buf}/dma-buf.c (100%)
 rename drivers/{base => dma-buf}/reservation.c (100%)

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index cc63f30de166..ac61ebd92875 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -128,8 +128,7 @@ X!Edrivers/base/interface.c
 !Edrivers/base/bus.c
  
  Device Drivers DMA Management
-!Edrivers/base/dma-buf.c
-!Edrivers/base/reservation.c
+!Edrivers/dma-buf/dma-buf.c
 !Iinclude/linux/reservation.h
 !Edrivers/base/dma-coherent.c
 !Edrivers/base/dma-mapping.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 134483f206e4..c948e53a4ee6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2882,8 +2882,8 @@ S:Maintained
 L: linux-me...@vger.kernel.org
 L: dri-de...@lists.freedesktop.org
 L: linaro-mm-...@lists.linaro.org
-F: drivers/base/dma-buf*
-F: include/linux/dma-buf*
+F: drivers/dma-buf/
+F: include/linux/dma-buf* include/linux/reservation.h
 F: Documentation/dma-buf-sharing.txt
 T: git git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
 
diff --git a/drivers/Makefile b/drivers/Makefile
index f98b50d8251d..c00337be5351 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_FB_INTEL)  += video/fbdev/intelfb/
 
 obj-$(CONFIG_PARPORT)  += parport/
 obj-y  += base/ block/ misc/ mfd/ nfc/
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
 obj-$(CONFIG_NUBUS)+= nubus/
 obj-y  += macintosh/
 obj-$(CONFIG_IDE)  += ide/
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 04b314e0fa51..4aab26ec0292 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -10,7 +10,6 @@ obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
 obj-y  += power/
 obj-$(CONFIG_HAS_DMA)  += dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
-obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o
 obj-$(CONFIG_ISA)  += isa.o
 obj-$(CONFIG_FW_LOADER)+= firmware_class.o
 obj-$(CONFIG_NUMA) += node.o
diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
new file mode 100644
index 000000000000..4a4f4c9bacd0
--- /dev/null
+++ b/drivers/dma-buf/Makefile
@@ -0,0 +1 @@
+obj-y := dma-buf.o reservation.o
diff --git a/drivers/base/dma-buf.c b/drivers/dma-buf/dma-buf.c
similarity index 100%
rename from drivers/base/dma-buf.c
rename to drivers/dma-buf/dma-buf.c
diff --git a/drivers/base/reservation.c b/drivers/dma-buf/reservation.c
similarity index 100%
rename from drivers/base/reservation.c
rename to drivers/dma-buf/reservation.c
-- 
2.0.0


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 2/9] fence: dma-buf cross-device synchronization (v18)

2014-07-01 Thread Maarten Lankhorst
A fence can be attached to a buffer which is being filled or consumed
by hw, to allow userspace to pass the buffer without waiting to another
device.  For example, userspace can call page_flip ioctl to display the
next frame of graphics after kicking the GPU but while the GPU is still
rendering.  The display device sharing the buffer with the GPU would
attach a callback to get notified when the GPU's rendering-complete IRQ
fires, to update the scan-out address of the display, without having to
wake up userspace.

A driver must allocate a fence context for each execution ring that can
run in parallel. The function for this takes an argument specifying how many
contexts to allocate:
  + fence_context_alloc()

A fence is a transient, one-shot deal.  It is allocated and attached
to one or more dma-buf's.  When the one that attached it is done with
the pending operation, it can signal the fence:
  + fence_signal()

To get a rough approximation of whether a fence has fired, call:
  + fence_is_signaled()

The dma-buf-mgr handles tracking, and waiting on, the fences associated
with a dma-buf.

The one pending on the fence can add an async callback:
  + fence_add_callback()

The callback can optionally be cancelled with:
  + fence_remove_callback()

To wait synchronously, optionally with a timeout:
  + fence_wait()
  + fence_wait_timeout()

When emitting a fence, call:
  + trace_fence_emit()

To annotate that a fence is blocking on another fence, call:
  + trace_fence_annotate_wait_on(fence, on_fence)
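
As a rough sketch (hypothetical driver; every mydrv_* name is made up and
error handling is omitted), a driver backs its fences with a fence_ops,
allocates a context at init time, and signals from its completion irq:

  static const char *mydrv_get_driver_name(struct fence *f) { return "mydrv"; }
  static const char *mydrv_get_timeline_name(struct fence *f) { return "ring0"; }

  static bool mydrv_enable_signaling(struct fence *f)
  {
	/* enable the hw irq that will eventually call fence_signal() */
	return true; /* not signaled yet, callbacks will fire later */
  }

  static const struct fence_ops mydrv_fence_ops = {
	.get_driver_name = mydrv_get_driver_name,
	.get_timeline_name = mydrv_get_timeline_name,
	.enable_signaling = mydrv_enable_signaling,
	.wait = fence_default_wait,
  };

  /* device init: one context per ring that can run in parallel */
  dev->context = fence_context_alloc(1);

  /* when emitting work on the ring */
  fence_init(&f->base, &mydrv_fence_ops, &dev->fence_lock,
	     dev->context, ++dev->seqno);

  /* from the ring's completion interrupt */
  fence_signal(&f->base);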

A default software-only implementation is provided, which can be used
by drivers attaching a fence to a buffer when they have no other means
for hw sync.  But a memory backed fence is also envisioned, because it
is common that GPU's can write to, or poll on some memory location for
synchronization.  For example:

  fence = custom_get_fence(...);
  if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
struct dma_buf *fence_buf = seqno_fence->sync_buf;
get_dma_buf(fence_buf);

... tell the hw the memory location to wait ...
custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno);
  } else {
/* fall-back to sw sync */
fence_add_callback(fence, my_cb);
  }

On SoC platforms, if some other hw mechanism is provided for synchronizing
between IP blocks, it could be supported as an alternate implementation
with its own fence ops in a similar way.

enable_signaling callback is used to provide sw signaling in case a cpu
waiter is requested or no compatible hardware signaling could be used.

The intention is to provide a userspace interface (presumably via eventfd)
later, to be used in conjunction with dma-buf's mmap support for sw access
to buffers (or for userspace apps that would prefer to do their own
synchronization).

v1: Original
v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
that dma-fence didn't need to care about the sw->hw signaling path
(it can be handled same as sw->sw case), and therefore the fence->ops
can be simplified and more handled in the core.  So remove the signal,
add_callback, cancel_callback, and wait ops, and replace with a simple
enable_signaling() op which can be used to inform a fence supporting
hw->hw signaling that one or more devices which do not support hw
signaling are waiting (and therefore it should enable an irq or do
whatever is necessary in order that the CPU is notified when the
fence is passed).
v3: Fix locking fail in attach_fence() and get_fence()
v4: Remove tie-in w/ dma-buf..  after discussion w/ danvet and mlankhorst
we decided that we need to be able to attach one fence to N dma-buf's,
so using the list_head in dma-fence struct would be problematic.
v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments
about checking if fence fired or not. This is broken by design.
waitqueue_active during destruction is now fatal, since the signaller
should be holding a reference in enable_signalling until it signalled
the fence. Pass the original dma_fence_cb along, and call __remove_wait
in the dma_fence_callback handler, so that no cleanup needs to be
performed.
v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
fence wasn't signaled yet, for example for hardware fences that may
choose to signal blindly.
v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
header and fixed include mess. dma-fence.h now includes dma-buf.h
All members are now initialized, so kmalloc can be used for
allocating a dma-fence. More documentation added.
v9: Change compiler bitfields to flags, change return type of
enable_signaling to bool. Rework dma_fence_wait. Added
dma_fence_is_signaled and dma_fence_wait_timeout.
s/dma// and change exports to non GPL. Added fence_is_signaled 

[PATCH v2 9/9] reservation: add support for read-only access using rcu

2014-07-01 Thread Maarten Lankhorst
This adds some extra functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl and reservation_object_get_list require
the reservation object to be held; updating requires
write_seqcount_begin/end. If only the exclusive fence is needed,
rcu_dereference followed by fence_get_rcu can be used; if the shared
fences are needed, it's recommended to use the supplied functions.
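
As a sketch, the lockless read pattern these helpers are built on looks
roughly like this ("obj" is a hypothetical reservation_object; note that
fence_get_rcu can fail once the refcount has dropped to zero):

	struct fence *excl;
	unsigned seq;

	rcu_read_lock();
retry:
	seq = read_seqcount_begin(&obj->seq);
	excl = rcu_dereference(obj->fence_excl);
	if (excl && !fence_get_rcu(excl))
		goto retry; /* fence is being freed, resample */
	if (read_seqcount_retry(&obj->seq, seq)) {
		fence_put(excl); /* fence_put(NULL) is a no-op */
		goto retry;
	}
	rcu_read_unlock();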

Signed-off-by: Maarten Lankhorst 
Reviewed-By: Thomas Hellstrom 
---
 drivers/dma-buf/dma-buf.c |   47 --
 drivers/dma-buf/fence.c   |2 
 drivers/dma-buf/reservation.c |  336 ++---
 include/linux/fence.h |   17 ++
 include/linux/reservation.h   |   52 --
 5 files changed, 400 insertions(+), 54 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cb8379dfeed5..f3014c448e1e 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -137,7 +137,7 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
struct reservation_object_list *fobj;
struct fence *fence_excl;
unsigned long events;
-   unsigned shared_count;
+   unsigned shared_count, seq;
 
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -151,14 +151,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
 
-   ww_mutex_lock(&resv->lock, NULL);
+retry:
+   seq = read_seqcount_begin(&resv->seq);
+   rcu_read_lock();
 
-   fobj = resv->fence;
-   if (!fobj)
-   goto out;
-
-   shared_count = fobj->shared_count;
-   fence_excl = resv->fence_excl;
+   fobj = rcu_dereference(resv->fence);
+   if (fobj)
+   shared_count = fobj->shared_count;
+   else
+   shared_count = 0;
+   fence_excl = rcu_dereference(resv->fence_excl);
+   if (read_seqcount_retry(&resv->seq, seq)) {
+   rcu_read_unlock();
+   goto retry;
+   }
 
if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
@@ -176,14 +182,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(fence_excl, &dcb->cb,
+   if (!fence_get_rcu(fence_excl)) {
+   /* force a recheck */
+   events &= ~pevents;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   } else if (!fence_add_callback(fence_excl, &dcb->cb,
   dma_buf_poll_cb)) {
events &= ~pevents;
+   fence_put(fence_excl);
} else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
+   fence_put(fence_excl);
dma_buf_poll_cb(NULL, &dcb->cb);
}
}
@@ -205,13 +217,26 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
goto out;
 
for (i = 0; i < shared_count; ++i) {
-   struct fence *fence = fobj->shared[i];
+   struct fence *fence = rcu_dereference(fobj->shared[i]);
 
+   if (!fence_get_rcu(fence)) {
+   /*
+* fence refcount dropped to zero, this means
+* that fobj has been freed
+*
+* call dma_buf_poll_cb and force a recheck!
+*/
+   events &= ~POLLOUT;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   break;
+   }
if (!fence_add_callback(fence, &dcb->cb,
dma_buf_poll_cb)) {
+   fence_put(fence);
events &= ~POLLOUT;
break;
  

[PATCH v2 4/9] dma-buf: use reservation objects

2014-07-01 Thread Maarten Lankhorst
This allows reservation objects to be used in dma-buf. It's required
for implementing polling support on the fences that belong to a dma-buf.
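
A minimal sketch of the updated export call (hypothetical exporter; per the
hunk below, passing NULL for the new resv argument makes dma-buf embed and
initialize a reservation object of its own):

	dmabuf = dma_buf_export_named(priv, &mydrv_dmabuf_ops, size,
				      O_RDWR, KBUILD_MODNAME, NULL);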

Signed-off-by: Maarten Lankhorst 
Acked-by: Mauro Carvalho Chehab  #drivers/media/v4l2-core/
Acked-by: Thomas Hellstrom  #drivers/gpu/drm/ttm
Signed-off-by: Vincent Stehlé  #drivers/gpu/drm/armada/
---
 drivers/dma-buf/dma-buf.c  |   22 --
 drivers/gpu/drm/armada/armada_gem.c|2 +-
 drivers/gpu/drm/drm_prime.c|8 +++-
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |2 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |3 ++-
 drivers/gpu/drm/nouveau/nouveau_drm.c  |1 +
 drivers/gpu/drm/nouveau/nouveau_gem.h  |1 +
 drivers/gpu/drm/nouveau/nouveau_prime.c|7 +++
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c  |2 +-
 drivers/gpu/drm/radeon/radeon_drv.c|2 ++
 drivers/gpu/drm/radeon/radeon_prime.c  |8 
 drivers/gpu/drm/tegra/gem.c|2 +-
 drivers/gpu/drm/ttm/ttm_object.c   |2 +-
 drivers/media/v4l2-core/videobuf2-dma-contig.c |2 +-
 drivers/staging/android/ion/ion.c  |3 ++-
 include/drm/drmP.h |3 +++
 include/linux/dma-buf.h|9 ++---
 17 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 840c7fa80983..cd40ca22911f 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -25,10 +25,12 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/dma-buf.h>
+#include <linux/fence.h>
 #include <linux/anon_inodes.h>
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/reservation.h>
 
 static inline int is_dma_buf_file(struct file *);
 
@@ -56,6 +58,9 @@ static int dma_buf_release(struct inode *inode, struct file *file)
list_del(&dmabuf->list_node);
mutex_unlock(&db_list.lock);
 
+   if (dmabuf->resv == (struct reservation_object *)&dmabuf[1])
+   reservation_object_fini(dmabuf->resv);
+
kfree(dmabuf);
return 0;
 }
@@ -128,6 +133,7 @@ static inline int is_dma_buf_file(struct file *file)
  * @size:  [in]Size of the buffer
  * @flags: [in]mode flags for the file.
  * @exp_name:  [in]name of the exporting module - useful for debugging.
+ * @resv:  [in]reservation-object, NULL to allocate default one.
  *
  * Returns, on success, a newly created dma_buf object, which wraps the
  * supplied private data and operations for dma_buf_ops. On either missing
@@ -135,10 +141,17 @@ static inline int is_dma_buf_file(struct file *file)
  *
  */
 struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
-   size_t size, int flags, const char *exp_name)
+   size_t size, int flags, const char *exp_name,
+   struct reservation_object *resv)
 {
struct dma_buf *dmabuf;
struct file *file;
+   size_t alloc_size = sizeof(struct dma_buf);
+   if (!resv)
+   alloc_size += sizeof(struct reservation_object);
+   else
+   /* prevent &dma_buf[1] == dma_buf->resv */
+   alloc_size += 1;
 
if (WARN_ON(!priv || !ops
  || !ops->map_dma_buf
@@ -150,7 +163,7 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
return ERR_PTR(-EINVAL);
}
 
-   dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
+   dmabuf = kzalloc(alloc_size, GFP_KERNEL);
if (dmabuf == NULL)
return ERR_PTR(-ENOMEM);
 
@@ -158,6 +171,11 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
dmabuf->ops = ops;
dmabuf->size = size;
dmabuf->exp_name = exp_name;
+   if (!resv) {
+   resv = (struct reservation_object *)&dmabuf[1];
+   reservation_object_init(resv);
+   }
+   dmabuf->resv = resv;
 
file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
if (IS_ERR(file)) {
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index bb9b642d8485..7496f55611a5 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -539,7 +539,7 @@ armada_gem_prime_export(struct drm_device *dev, struct drm_gem_object *obj,
int flags)
 {
return dma_buf_export(obj, &armada_gem_prime_dmabuf_ops, obj->size,
- O_RDWR);
+ O_RDWR, NULL);
 }
 
 struct drm_gem_object *
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 304ca8cacbc4..99d578bad17e 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -336,7 +336,13 @@ static const struct dma_buf_ops drm_gem_prime

[PATCH v2 8/9] reservation: update api and add some helpers

2014-07-01 Thread Maarten Lankhorst
Move the list of shared fences to a struct, and return it in
reservation_object_get_list().
Add reservation_object_get_excl to get the exclusive fence.

Add reservation_object_reserve_shared(), which reserves space
in the reservation_object for 1 more shared fence.

reservation_object_add_shared_fence() and
reservation_object_add_excl_fence() are used to assign a new
fence to a reservation_object pointer, to complete a reservation.
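
A rough usage sketch (hypothetical driver; the ww_mutex is held and "fence"
was just emitted):

	if (write_access) {
		reservation_object_add_excl_fence(bo->resv, fence);
	} else {
		/* make room for one more shared fence before adding it */
		ret = reservation_object_reserve_shared(bo->resv);
		if (ret)
			return ret;
		reservation_object_add_shared_fence(bo->resv, fence);
	}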

Signed-off-by: Maarten Lankhorst 

Changes since v1:
- Add reservation_object_get_excl, reorder code a bit.
---
 Documentation/DocBook/device-drivers.tmpl |1 
 drivers/dma-buf/dma-buf.c |   35 ---
 drivers/dma-buf/reservation.c |  156 +
 include/linux/reservation.h   |   56 +-
 4 files changed, 229 insertions(+), 19 deletions(-)

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index ed0ef00cd7bc..dd3f278faa8a 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -133,6 +133,7 @@ X!Edrivers/base/interface.c
 !Edrivers/dma-buf/seqno-fence.c
 !Iinclude/linux/fence.h
 !Iinclude/linux/seqno-fence.h
+!Edrivers/dma-buf/reservation.c
 !Iinclude/linux/reservation.h
 !Edrivers/base/dma-coherent.c
 !Edrivers/base/dma-mapping.c
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 25e8c4165936..cb8379dfeed5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -134,7 +134,10 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 {
struct dma_buf *dmabuf;
struct reservation_object *resv;
+   struct reservation_object_list *fobj;
+   struct fence *fence_excl;
unsigned long events;
+   unsigned shared_count;
 
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -150,12 +153,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 
ww_mutex_lock(&resv->lock, NULL);
 
-   if (resv->fence_excl && (!(events & POLLOUT) ||
-resv->fence_shared_count == 0)) {
+   fobj = resv->fence;
+   if (!fobj)
+   goto out;
+
+   shared_count = fobj->shared_count;
+   fence_excl = resv->fence_excl;
+
+   if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
unsigned long pevents = POLLIN;
 
-   if (resv->fence_shared_count == 0)
+   if (shared_count == 0)
pevents |= POLLOUT;
 
spin_lock_irq(&dmabuf->poll.lock);
@@ -167,19 +176,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(resv->fence_excl,
-   &dcb->cb, dma_buf_poll_cb))
+   if (!fence_add_callback(fence_excl, &dcb->cb,
+  dma_buf_poll_cb)) {
events &= ~pevents;
-   else
+   } else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
dma_buf_poll_cb(NULL, &dcb->cb);
+   }
}
}
 
-   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   if ((events & POLLOUT) && shared_count > 0) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
int i;
 
@@ -194,15 +204,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!(events & POLLOUT))
goto out;
 
-   for (i = 0; i < resv->fence_shared_count; ++i)
-   if (!fence_add_callback(resv->fence_shared[i],
-   &dcb->cb, dma_buf_poll_cb)) {
+   for (i = 0; i < shared_count; ++i) {
+   struct fence *fence = fobj->shared[i];
+
+   if (!fence_add_callback(fence, &dcb->cb,
+   dma_buf_poll_cb)) {
events &= ~POLLOUT;
break;
}
+   }
 
/* No callback queued, wake up any additional waiters. */
-   if (i == resv->fence_shared_count)
+   if (i == shared_count)
dma_buf_poll_cb(NULL, &dc

[PATCH v2 6/9] reservation: add support for fences to enable cross-device synchronisation

2014-07-01 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
Reviewed-by: Rob Clark 
---
 include/linux/reservation.h |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index 813dae960ebd..f3f57460a205 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -6,7 +6,7 @@
  * Copyright (C) 2012 Texas Instruments
  *
  * Authors:
- * Rob Clark 
+ * Rob Clark 
  * Maarten Lankhorst 
  * Thomas Hellstrom 
  *
@@ -40,22 +40,40 @@
 #define _LINUX_RESERVATION_H
 
 #include <linux/ww_mutex.h>
+#include <linux/fence.h>
+#include <linux/slab.h>
 
 extern struct ww_class reservation_ww_class;
 
 struct reservation_object {
struct ww_mutex lock;
+
+   struct fence *fence_excl;
+   struct fence **fence_shared;
+   u32 fence_shared_count, fence_shared_max;
 };
 
 static inline void
 reservation_object_init(struct reservation_object *obj)
 {
ww_mutex_init(&obj->lock, &reservation_ww_class);
+
+   obj->fence_shared_count = obj->fence_shared_max = 0;
+   obj->fence_shared = NULL;
+   obj->fence_excl = NULL;
 }
 
 static inline void
 reservation_object_fini(struct reservation_object *obj)
 {
+   int i;
+
+   if (obj->fence_excl)
+   fence_put(obj->fence_excl);
+   for (i = 0; i < obj->fence_shared_count; ++i)
+   fence_put(obj->fence_shared[i]);
+   kfree(obj->fence_shared);
+
ww_mutex_destroy(&obj->lock);
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 7/9] dma-buf: add poll support, v3

2014-07-01 Thread Maarten Lankhorst
Thanks to Fengguang Wu for spotting a missing static cast.

v2:
- Kill unused variable need_shared.
v3:
- Clarify the BUG() in dma_buf_release some more. (Rob Clark)

Signed-off-by: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c |  108 +
 include/linux/dma-buf.h   |   12 +
 2 files changed, 120 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cd40ca22911f..25e8c4165936 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static inline int is_dma_buf_file(struct file *);
@@ -52,6 +53,16 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 
BUG_ON(dmabuf->vmapping_counter);
 
+   /*
+* Any fences that a dma-buf poll can wait on should be signaled
+* before releasing dma-buf. This is the responsibility of each
+* driver that uses the reservation objects.
+*
+* If you hit this BUG() it means someone dropped their ref to the
+* dma-buf while still having pending operation to the buffer.
+*/
+   BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+
dmabuf->ops->release(dmabuf);
 
mutex_lock(&db_list.lock);
@@ -108,10 +119,103 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
return base + offset;
 }
 
+static void dma_buf_poll_cb(struct fence *fence, struct fence_cb *cb)
+{
+   struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb;
+   unsigned long flags;
+
+   spin_lock_irqsave(&dcb->poll->lock, flags);
+   wake_up_locked_poll(dcb->poll, dcb->active);
+   dcb->active = 0;
+   spin_unlock_irqrestore(&dcb->poll->lock, flags);
+}
+
+static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
+{
+   struct dma_buf *dmabuf;
+   struct reservation_object *resv;
+   unsigned long events;
+
+   dmabuf = file->private_data;
+   if (!dmabuf || !dmabuf->resv)
+   return POLLERR;
+
+   resv = dmabuf->resv;
+
+   poll_wait(file, &dmabuf->poll, poll);
+
+   events = poll_requested_events(poll) & (POLLIN | POLLOUT);
+   if (!events)
+   return 0;
+
+   ww_mutex_lock(&resv->lock, NULL);
+
+   if (resv->fence_excl && (!(events & POLLOUT) ||
+resv->fence_shared_count == 0)) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
+   unsigned long pevents = POLLIN;
+
+   if (resv->fence_shared_count == 0)
+   pevents |= POLLOUT;
+
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active) {
+   dcb->active |= pevents;
+   events &= ~pevents;
+   } else
+   dcb->active = pevents;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (events & pevents) {
+   if (!fence_add_callback(resv->fence_excl,
+   &dcb->cb, dma_buf_poll_cb))
+   events &= ~pevents;
+   else
+   /*
+* No callback queued, wake up any additional
+* waiters.
+*/
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+   }
+
+   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
+   int i;
+
+   /* Only queue a new callback if no event has fired yet */
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active)
+   events &= ~POLLOUT;
+   else
+   dcb->active = POLLOUT;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (!(events & POLLOUT))
+   goto out;
+
+   for (i = 0; i < resv->fence_shared_count; ++i)
+   if (!fence_add_callback(resv->fence_shared[i],
+   &dcb->cb, dma_buf_poll_cb)) {
+   events &= ~POLLOUT;
+   break;
+   }
+
+   /* No callback queued, wake up any additional waiters. */
+   if (i == resv->fence_shared_count)
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+
+out:
+   ww_mutex_unlock(&resv->lock);
+   return events;
+}
+
 static const struct file_operations dma_buf_fops = {
 

[PATCH v2 0/9] Updated fence patch series

2014-07-01 Thread Maarten Lankhorst
So after some more hacking I've moved dma-buf to its own subdirectory,
drivers/dma-buf, and applied the fence patches to its new place. I believe
that the first patch should be applied regardless, and the rest should be
ready now. :-)

Changes to the fence api:
- release_fence -> fence_release etc.
- __fence_init -> fence_init
- __fence_signal -> fence_signal_locked
- __fence_is_signaled -> fence_is_signaled_locked
- Changing BUG_ON to WARN_ON in fence_later, and return NULL if it triggers.

Android can expose fences to userspace. It's possible to make the new fence
mechanism expose the same fences to userspace by changing sync_fence_create
to take a struct fence instead of a struct sync_pt. No other change is needed,
because only the fence parts of struct sync_pt are used. But because the
userspace fences are a separate problem and I haven't really looked at it yet
I feel it should stay in staging, for now.

---

Maarten Lankhorst (9):
  dma-buf: move to drivers/dma-buf
  fence: dma-buf cross-device synchronization (v18)
  seqno-fence: Hardware dma-buf implementation of fencing (v6)
  dma-buf: use reservation objects
  android: convert sync to fence api, v6
  reservation: add support for fences to enable cross-device synchronisation
  dma-buf: add poll support, v3
  reservation: update api and add some helpers
  reservation: add support for read-only access using rcu


 Documentation/DocBook/device-drivers.tmpl  |8 
 MAINTAINERS|4 
 drivers/Makefile   |1 
 drivers/base/Kconfig   |9 
 drivers/base/Makefile  |1 
 drivers/base/dma-buf.c |  743 
 drivers/base/reservation.c |   39 -
 drivers/dma-buf/Makefile   |1 
 drivers/dma-buf/dma-buf.c  |  907 
 drivers/dma-buf/fence.c|  431 +++
 drivers/dma-buf/reservation.c  |  477 +
 drivers/dma-buf/seqno-fence.c  |   73 ++
 drivers/gpu/drm/armada/armada_gem.c|2 
 drivers/gpu/drm/drm_prime.c|8 
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |2 
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |3 
 drivers/gpu/drm/nouveau/nouveau_drm.c  |1 
 drivers/gpu/drm/nouveau/nouveau_gem.h  |1 
 drivers/gpu/drm/nouveau/nouveau_prime.c|7 
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c  |2 
 drivers/gpu/drm/radeon/radeon_drv.c|2 
 drivers/gpu/drm/radeon/radeon_prime.c  |8 
 drivers/gpu/drm/tegra/gem.c|2 
 drivers/gpu/drm/ttm/ttm_object.c   |2 
 drivers/media/v4l2-core/videobuf2-dma-contig.c |2 
 drivers/staging/android/Kconfig|1 
 drivers/staging/android/Makefile   |2 
 drivers/staging/android/ion/ion.c  |3 
 drivers/staging/android/sw_sync.c  |6 
 drivers/staging/android/sync.c |  913 
 drivers/staging/android/sync.h |   79 +-
 drivers/staging/android/sync_debug.c   |  247 ++
 drivers/staging/android/trace/sync.h   |   12 
 include/drm/drmP.h |3 
 include/linux/dma-buf.h|   21 -
 include/linux/fence.h  |  360 +
 include/linux/reservation.h|   82 ++
 include/linux/seqno-fence.h|  116 +++
 include/trace/events/fence.h   |  128 +++
 39 files changed, 3258 insertions(+), 1451 deletions(-)
 delete mode 100644 drivers/base/dma-buf.c
 delete mode 100644 drivers/base/reservation.c
 create mode 100644 drivers/dma-buf/Makefile
 create mode 100644 drivers/dma-buf/dma-buf.c
 create mode 100644 drivers/dma-buf/fence.c
 create mode 100644 drivers/dma-buf/reservation.c
 create mode 100644 drivers/dma-buf/seqno-fence.c
 create mode 100644 drivers/staging/android/sync_debug.c
 create mode 100644 include/linux/fence.h
 create mode 100644 include/linux/seqno-fence.h
 create mode 100644 include/trace/events/fence.h

-- 
Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 1/9] dma-buf: move to drivers/dma-buf

2014-07-01 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 Documentation/DocBook/device-drivers.tmpl |3 
 MAINTAINERS   |4 
 drivers/Makefile  |1 
 drivers/base/Makefile |1 
 drivers/base/dma-buf.c|  743 -
 drivers/base/reservation.c|   39 --
 drivers/dma-buf/Makefile  |1 
 drivers/dma-buf/dma-buf.c |  743 +
 drivers/dma-buf/reservation.c |   39 ++
 9 files changed, 787 insertions(+), 787 deletions(-)
 delete mode 100644 drivers/base/dma-buf.c
 delete mode 100644 drivers/base/reservation.c
 create mode 100644 drivers/dma-buf/Makefile
 create mode 100644 drivers/dma-buf/dma-buf.c
 create mode 100644 drivers/dma-buf/reservation.c

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index cc63f30de166..ac61ebd92875 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -128,8 +128,7 @@ X!Edrivers/base/interface.c
 !Edrivers/base/bus.c
  
  Device Drivers DMA Management
-!Edrivers/base/dma-buf.c
-!Edrivers/base/reservation.c
+!Edrivers/dma-buf/dma-buf.c
 !Iinclude/linux/reservation.h
 !Edrivers/base/dma-coherent.c
 !Edrivers/base/dma-mapping.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 134483f206e4..c948e53a4ee6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2882,8 +2882,8 @@ S:Maintained
 L: linux-me...@vger.kernel.org
 L: dri-de...@lists.freedesktop.org
 L: linaro-mm-...@lists.linaro.org
-F: drivers/base/dma-buf*
-F: include/linux/dma-buf*
+F: drivers/dma-buf/
+F: include/linux/dma-buf* include/linux/reservation.h
 F: Documentation/dma-buf-sharing.txt
 T: git git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
 
diff --git a/drivers/Makefile b/drivers/Makefile
index f98b50d8251d..c00337be5351 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_FB_INTEL)  += video/fbdev/intelfb/
 
 obj-$(CONFIG_PARPORT)  += parport/
 obj-y  += base/ block/ misc/ mfd/ nfc/
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf/
 obj-$(CONFIG_NUBUS)+= nubus/
 obj-y  += macintosh/
 obj-$(CONFIG_IDE)  += ide/
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 04b314e0fa51..4aab26ec0292 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -10,7 +10,6 @@ obj-$(CONFIG_DMA_CMA) += dma-contiguous.o
 obj-y  += power/
 obj-$(CONFIG_HAS_DMA)  += dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
-obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o reservation.o
 obj-$(CONFIG_ISA)  += isa.o
 obj-$(CONFIG_FW_LOADER)+= firmware_class.o
 obj-$(CONFIG_NUMA) += node.o
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
deleted file mode 100644
index 840c7fa80983..000000000000
--- a/drivers/base/dma-buf.c
+++ /dev/null
@@ -1,743 +0,0 @@
-/*
- * Framework for buffer objects that can be shared across devices/subsystems.
- *
- * Copyright(C) 2011 Linaro Limited. All rights reserved.
- * Author: Sumit Semwal 
- *
- * Many thanks to linaro-mm-sig list, and specially
- * Arnd Bergmann , Rob Clark  and
- * Daniel Vetter  for their support in creation and
- * refining of this idea.
- *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License version 2 as published by
- * the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but WITHOUT
- * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
- * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
- * more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#include <linux/fs.h>
-#include <linux/slab.h>
-#include <linux/dma-buf.h>
-#include <linux/anon_inodes.h>
-#include <linux/export.h>
-#include <linux/debugfs.h>
-#include <linux/seq_file.h>
-
-static inline int is_dma_buf_file(struct file *);
-
-struct dma_buf_list {
-   struct list_head head;
-   struct mutex lock;
-};
-
-static struct dma_buf_list db_list;
-
-static int dma_buf_release(struct inode *inode, struct file *file)
-{
-   struct dma_buf *dmabuf;
-
-   if (!is_dma_buf_file(file))
-   return -EINVAL;
-
-   dmabuf = file->private_data;
-
-   BUG_ON(dmabuf->vmapping_counter);
-
-   dmabuf->ops->release(dmabuf);
-
-   mutex_lock(&db_list.lock);
-   list_del(&dmabuf->list_node);
-   mutex_unlock(&db_list.lock);
-
-   kfree(dmabuf);
-   return 0;
-}
-
-static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma)
-{
-   struct dma_buf *dmabuf;
-
-   if (!is_dma_buf_file(file))
-   return -EINVAL;
-
- 

[PATCH v2 3/9] seqno-fence: Hardware dma-buf implementation of fencing (v6)

2014-07-01 Thread Maarten Lankhorst
This type of fence can be used with hardware synchronization for simple
hardware that can block execution until the condition
(dma_buf[offset] - value) >= 0 has been met when WAIT_GEQUAL is used,
or (dma_buf[offset] != 0) has been met when WAIT_NONZERO is set.

A software fallback still has to be provided in case the fence is used
with a device that doesn't support this mechanism. It is useful to expose
this for graphics cards that have a hardware op to support it.

Some cards like i915 can export such a sync memory location, but don't
have an option to wait on it, so they need the software fallback.

I extended the original patch by Rob Clark.
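
As an illustration only (nothing below is part of this patch; the mydrv_*
names are invented), a driver whose hardware writes a seqno into a shared
page might wrap it like this:

  static struct fence *mydrv_fence_create(struct mydrv_ring *ring, u32 seqno)
  {
          struct seqno_fence *f = kmalloc(sizeof(*f), GFP_KERNEL);

          if (!f)
                  return NULL;

          /* sync_buf[seqno_ofs] is written by the hw on completion; capable
           * hw can wait on that location directly, everything else goes
           * through the sw fallback in mydrv_fence_ops. The sync_buf
           * reference is handed to the fence and dropped on release. */
          seqno_fence_init(f, &ring->fence_lock, ring->sync_buf,
                           ring->context, ring->seqno_ofs, seqno,
                           SEQNO_FENCE_WAIT_GEQUAL, &mydrv_fence_ops);

          return &f->base;
  }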

v1: Original
v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
not much was left of the file. Lots of documentation added.
v3: Use fence_ops instead of custom callbacks. Moved to own file
to avoid circular dependency between dma-buf.h and fence.h
v4: Add spinlock pointer to seqno_fence_init
v5: Add condition member to allow wait for != 0.
Fix small style errors pointed out by checkpatch.
v6: Move to a separate file. Fix up api changes in fences.

Signed-off-by: Maarten Lankhorst 
Reviewed-by: Rob Clark  #v4
---
 Documentation/DocBook/device-drivers.tmpl |2 +
 MAINTAINERS   |2 +-
 drivers/dma-buf/Makefile  |2 +-
 drivers/dma-buf/seqno-fence.c |   73 ++
 include/linux/seqno-fence.h   |  116 +
 5 files changed, 193 insertions(+), 2 deletions(-)
 create mode 100644 drivers/dma-buf/seqno-fence.c
 create mode 100644 include/linux/seqno-fence.h

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index e634657efb52..ed0ef00cd7bc 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -130,7 +130,9 @@ X!Edrivers/base/interface.c
  Device Drivers DMA Management
 !Edrivers/dma-buf/dma-buf.c
 !Edrivers/dma-buf/fence.c
+!Edrivers/dma-buf/seqno-fence.c
 !Iinclude/linux/fence.h
+!Iinclude/linux/seqno-fence.h
 !Iinclude/linux/reservation.h
 !Edrivers/base/dma-coherent.c
 !Edrivers/base/dma-mapping.c
diff --git a/MAINTAINERS b/MAINTAINERS
index ebc1ebf6f542..135929f6cf6a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2883,7 +2883,7 @@ L:linux-me...@vger.kernel.org
 L: dri-de...@lists.freedesktop.org
 L: linaro-mm-...@lists.linaro.org
 F: drivers/dma-buf/
-F: include/linux/dma-buf* include/linux/reservation.h include/linux/fence.h
+F: include/linux/dma-buf* include/linux/reservation.h include/linux/*fence.h
 F: Documentation/dma-buf-sharing.txt
 T: git git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
 
diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index d7825bfe630e..57a675f90cd0 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1 +1 @@
-obj-y := dma-buf.o fence.o reservation.o
+obj-y := dma-buf.o fence.o reservation.o seqno-fence.o
diff --git a/drivers/dma-buf/seqno-fence.c b/drivers/dma-buf/seqno-fence.c
new file mode 100644
index 000000000000..7d12a39a4b57
--- /dev/null
+++ b/drivers/dma-buf/seqno-fence.c
@@ -0,0 +1,73 @@
+/*
+ * seqno-fence, using a dma-buf to synchronize fencing
+ *
+ * Copyright (C) 2012 Texas Instruments
+ * Copyright (C) 2012-2014 Canonical Ltd
+ * Authors:
+ *   Rob Clark 
+ *   Maarten Lankhorst 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/slab.h>
+#include <linux/export.h>
+#include <linux/seqno-fence.h>
+
+static const char *seqno_fence_get_driver_name(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->get_driver_name(fence);
+}
+
+static const char *seqno_fence_get_timeline_name(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->get_timeline_name(fence);
+}
+
+static bool seqno_enable_signaling(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->enable_signaling(fence);
+}
+
+static bool seqno_signaled(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence);
+}
+
+static void seqno_release(struct fence *fence)
+{
+   struct seqno_fence *f = to_seqno_fence(fence);
+
+   dma_buf_put(f->sync_buf);
+   if (f->ops->release)
+   f->ops->release(fence);
+   else
+   kfree(f);
+}

[PATCH v2 5/9] android: convert sync to fence api, v6

2014-07-01 Thread Maarten Lankhorst
Just to show it's easy.

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.
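
To illustrate the mapping (sketch only; obj and value are placeholder
variables, error handling omitted): after this conversion a sync_pt
embeds a struct fence, so the generic fence calls work on Android sync
points directly.

  struct sync_pt *pt = sw_sync_pt_create(obj, value);

  /* pt->base is a struct fence, no sync-specific wait needed */
  fence_wait(&pt->base, true);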

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.
v5:
- Fix small style issues pointed out by Thomas Hellstrom.
v6:
- Fix for updates to fence api.

Signed-off-by: Maarten Lankhorst 
Acked-by: John Stultz 
---
 drivers/staging/android/Kconfig  |1 
 drivers/staging/android/Makefile |2 
 drivers/staging/android/sw_sync.c|6 
 drivers/staging/android/sync.c   |  913 +++---
 drivers/staging/android/sync.h   |   79 ++-
 drivers/staging/android/sync_debug.c |  247 +
 drivers/staging/android/trace/sync.h |   12 
 7 files changed, 609 insertions(+), 651 deletions(-)
 create mode 100644 drivers/staging/android/sync_debug.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 99e484f845f2..51607e9aa049 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -88,6 +88,7 @@ config SYNC
bool "Synchronization framework"
default n
select ANON_INODES
+   select DMA_SHARED_BUFFER
---help---
  This option enables the framework for synchronization between multiple
  drivers.  Sync implementations can take advantage of hardware
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 0a01e1914905..517ad5ffa429 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_ANDROID_TIMED_OUTPUT)  += timed_output.o
 obj-$(CONFIG_ANDROID_TIMED_GPIO)   += timed_gpio.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
 obj-$(CONFIG_ANDROID_INTF_ALARM_DEV)   += alarm-dev.o
-obj-$(CONFIG_SYNC) += sync.o
+obj-$(CONFIG_SYNC) += sync.o sync_debug.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o
diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
index 12a136ec1cec..a76db3ff87cb 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -50,7 +50,7 @@ static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *) sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return (struct sync_pt *) sw_sync_pt_create(obj, pt->value);
 }
@@ -59,7 +59,7 @@ static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return sw_sync_cmp(obj->value, pt->value) >= 0;
 }
@@ -97,7 +97,6 @@ static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
   char *str, int size)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-
snprintf(str, size, "%d", pt->value);
 }
 
@@ -157,7 +156,6 @@ static int sw_sync_open(struct inode *inode, struct file *file)
 static int sw_sync_release(struct inode *inode, struct file *file)
 {
struct sw_sync_timeline *obj = file->private_data;
-
sync_timeline_destroy(&obj->obj);
return 0;
 }
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 18174f7c871c..c9a0c2cdc81a 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -31,22 +31,13 @@
 #define CREATE_TRACE_POINTS
 #include "trace/sync.h"
 
-static void sync_fence_signal_pt(struct sync_pt *pt);
-static int _sync_pt_has_signaled(struct sync_pt *pt);
-static void sync_fence_free(struct kref *kref);
-static void sync_dump(void);
-
-static LIST_HEAD(sync_timeline_list_head);
-static DEFINE_SPINLOCK(sync_timeline_list_lock);
-
-static LIST_HEAD(sync_fence_list_head);
-static DEFINE_SPINLOCK(sync_fence_list_lock);
+static const struct fence_ops android_fence_ops;
+static const struct file_operations sync_fence_fops;
 
 struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
   int size, const char *name)
 {
struct sync_timeline *obj;
-   unsigned long flags;
 
if (size < sizeof(struct sync_timeline))
return NULL;
@@ -57,17 +48,14 @@ struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
 
kref_init(&obj->kref);

Re: [Nouveau] [PATCH v2 2/3] drm/ttm: introduce dma cache sync helpers

2014-06-24 Thread Maarten Lankhorst
op 24-06-14 14:23, Alexandre Courbot schreef:
> On Tue, Jun 24, 2014 at 7:55 PM, Alexandre Courbot  
> wrote:
>> On 06/24/2014 07:33 PM, Alexandre Courbot wrote:
>>> On 06/24/2014 07:02 PM, Russell King - ARM Linux wrote:
 On Tue, Jun 24, 2014 at 06:54:26PM +0900, Alexandre Courbot wrote:
> From: Lucas Stach 
>
> On architectures for which access to GPU memory is non-coherent,
> caches need to be flushed and invalidated explicitly at the
> appropriate places. Introduce two small helpers to make things
> easy for TTM-based drivers.

 Have you run this with DMA API debugging enabled?  I suspect you haven't,
 and I recommend that you do.
>>>
>>> # cat /sys/kernel/debug/dma-api/error_count
>>> 162621
>>>
>>> (╯°□°)╯︵ ┻━┻)
>>
>> *puts table back on its feet*
>>
>> So, yeah - TTM memory is not allocated using the DMA API, hence we cannot
>> use the DMA API to sync it. Thanks Russell for pointing it out.
>>
>> The only alternative I see here is to flush the CPU caches when syncing for
>> the device, and invalidate them for the other direction. Of course if the
>> device has caches on its side as well the opposite operation must also be
>> done for it. Guess the only way is to handle it all by ourselves here. :/
> ... and it really sucks. Basically if we cannot use the DMA API here
> we will lose the convenience of having a portable API that does just
> the right thing for the underlying platform. Without it we would have
> to duplicate arm_iommu_sync_single_for_cpu/device() and we would only
> have support for ARM.
>
> The usage of the DMA API that we are doing might be illegal, but in
> essence it does exactly what we need - at least for ARM. What are the
> alternatives?
Convert TTM to use the dma api? :-)

~Maarten


Re: [REPOST PATCH 4/8] android: convert sync to fence api, v5

2014-06-23 Thread Maarten Lankhorst
Hey,

op 20-06-14 22:52, Thierry Reding schreef:
> On Thu, Jun 19, 2014 at 02:28:14PM +0200, Daniel Vetter wrote:
>> On Thu, Jun 19, 2014 at 1:48 PM, Thierry Reding
>>  wrote:
> With these changes, can we pull the android sync logic out of
> drivers/staging/ now?
 Afaik the google guys never really looked at this and acked it. So I'm not
 sure whether they'll follow along. The other issue I have as the
 maintainer of gfx driver is that I don't want to implement support for two
 different sync object primitives (once for dma-buf and once for android
 syncpts), and my impression thus far has been that even with this we're
 not there.

 I'm trying to get our own android guys to upstream their i915 syncpts
 support, but thus far I haven't managed to convince them to throw people's
 time at this.
>>> This has been discussed a fair bit internally recently and some of our
>>> GPU experts have raised concerns that this may result in seriously
>>> degraded performance in our proprietary graphics stack. Now I don't care
>>> very much for the proprietary graphics stack, but by extension I would
>>> assume that the same restrictions are relevant for any open-source
>>> driver as well.
>>>
>>> I'm still trying to fully understand all the implications and at the
>>> same time get some of the people who raised concerns to join in this
>>> discussion. As I understand it the concern is mostly about explicit vs.
>>> implicit synchronization and having this mechanism in the kernel will
>>> implicitly synchronize all accesses to these buffers even in cases where
>>> it's not needed (read vs. write locks, etc.). In one particular instance
>>> it was even mentioned that this kind of implicit synchronization can
>>> lead to deadlocks in some use-cases (this was mentioned for Android
>>> compositing, but I suspect that the same may happen for Wayland or X
>>> compositors).
>> Well the implicit fences here actually can't deadlock. That's the
>> entire point behind using ww mutexes. I've also heard tons of
>> complaints about implicit enforced syncing (especially from opencl
>> people), but in the end drivers and always expose unsynchronized
>> access for specific cases. We do that in i915 for upload buffers and
>> other fun stuff. This is about shared stuff across different drivers
>> and different processes.
> Tegra K1 needs to share buffers across different drivers even for very
> basic use-cases since the GPU and display drivers are separate. So while
> I agree that the GPU driver can still use explicit synchronization for
> internal work, things aren't that simple in general.
>
> Let me try to reconstruct the use-case that caused the lock on Android:
> the compositor uses a hardware overlay to display an image. The system
> detects that there's little activity and instructs the compositor to put
> everything into one image and scan out only that (for power efficiency).
> Now with implicit locking the display driver has a lock on the image, so
> the GPU (used for compositing) needs to wait for it before it can
> composite everything into one image. But the display driver cannot
> release the lock on the image until the final image has been composited
> and can be displayed instead.
>
> This may not be technically a deadlock, but it's still a stalemate.
> Unless I'm missing something fundamental about DMA fences and ww mutexes
> I don't see how you can get out of this situation.
This sounds like a case for implicit shared fences. ;-) Reading and
scanning out would only wait for the last 'exclusive' fence, not on
each other.
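
Rough sketch of that asymmetry (illustrative only, using the helper names
added later in this series; 'write' is a made-up flag, and the
reservation lock is assumed held):

  struct fence *excl = reservation_object_get_excl(resv);
  struct reservation_object_list *list = reservation_object_get_list(resv);
  unsigned i;

  if (!write) {
          /* a reader only waits for the last writer */
          if (excl)
                  fence_wait(excl, true);
  } else {
          /* a writer waits for all readers and the last writer */
          for (i = 0; list && i < list->shared_count; ++i)
                  fence_wait(list->shared[i], true);
          if (excl)
                  fence_wait(excl, true);
  }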

But in drivers/drm I can encounter a similar issue: people expect to be
able to overwrite the contents of the currently displayed buffer, so I
'solved' it by not adding a fence on the buffer, only by waiting for
buffer idle before page flipping. The rationale is that the buffer is
pinned internally, and the backing storage cannot go away until
dma_buf_unmap_attachment is called. So when you render to the current
front buffer without queuing a page flip you get exactly what you
expect. ;-)

> Explicit vs. implicit synchronization may also become more of an issue
> as buffers are imported from other sources (such as cameras).
Yeah, but the kernel space primitives would in both cases be the same, so 
drivers don't need to implement 2 separate fencing mechanisms for that. :-)

~Maarten



Re: [REPOST PATCH 4/8] android: convert sync to fence api, v5

2014-06-19 Thread Maarten Lankhorst
op 19-06-14 17:22, Colin Cross schreef:
> On Wed, Jun 18, 2014 at 11:37 PM, Daniel Vetter  wrote:
>> On Wed, Jun 18, 2014 at 06:15:56PM -0700, Greg KH wrote:
>>> On Wed, Jun 18, 2014 at 12:37:11PM +0200, Maarten Lankhorst wrote:
>>>> Just to show it's easy.
>>>>
>>>> Android syncpoints can be mapped to a timeline. This removes the need
>>>> to maintain a separate api for synchronization. I've left the android
>>>> trace events in place, but the core fence events should already be
>>>> sufficient for debugging.
>>>>
>>>> v2:
>>>> - Call fence_remove_callback in sync_fence_free if not all fences have 
>>>> fired.
>>>> v3:
>>>> - Merge Colin Cross' bugfixes, and the android fence merge optimization.
>>>> v4:
>>>> - Merge with the upstream fixes.
>>>> v5:
>>>> - Fix small style issues pointed out by Thomas Hellstrom.
>>>>
>>>> Signed-off-by: Maarten Lankhorst 
>>>> Acked-by: John Stultz 
>>>> ---
>>>>  drivers/staging/android/Kconfig  |1
>>>>  drivers/staging/android/Makefile |2
>>>>  drivers/staging/android/sw_sync.c|6
>>>>  drivers/staging/android/sync.c   |  913 
>>>> +++---
>>>>  drivers/staging/android/sync.h   |   79 ++-
>>>>  drivers/staging/android/sync_debug.c |  247 +
>>>>  drivers/staging/android/trace/sync.h |   12
>>>>  7 files changed, 609 insertions(+), 651 deletions(-)
>>>>  create mode 100644 drivers/staging/android/sync_debug.c
>>> With these changes, can we pull the android sync logic out of
>>> drivers/staging/ now?
>> Afaik the google guys never really looked at this and acked it. So I'm not
>> sure whether they'll follow along. The other issue I have as the
>> maintainer of gfx driver is that I don't want to implement support for two
>> different sync object primitives (once for dma-buf and once for android
>> syncpts), and my impression thus far has been that even with this we're
>> not there.
> We have tested these patches to use dma fences to back the android
> sync driver and not found any major issues.  However, my understanding
> is that dma fences are designed for implicit sync, and explicit sync
> through the android sync driver is bolted on the side to share code.
> Android is not moving away from explicit sync, but we do wrap all of
> our userspace sync accesses through libsync
> (https://android.googlesource.com/platform/system/core/+/master/libsync/sync.c,
> ignore the sw_sync parts), so if the kernel supported a slightly
> different userspace explicit sync interface we could adapt to it
> fairly easily.  All we require is that individual kernel drivers need
> to be able to accept work alongisde an fd to wait on, and to return an
> fd that will signal when the work is done, and that userspace has some
> way to merge two of those fds, wait on an fd, and get some debugging
> info from an fd.  However, this patch set doesn't do that, it has no
> way to export a dma fence as an fd except through the android sync
> driver, so it is not yet ready to fully replace android sync.
>
Dma fences can be exported as android fences, I just didn't see a need
for it yet. :-)
To wait on all implicit fences attached to a dma-buf one could simply
poll the dma-buf directly, or use something like an android userspace
fence.

sync_fence_create takes a sync_pt as function argument, but I kept that
for source code compatibility, not because it uses any sync_pt
functions. Here's a patch to create a userspace fd for a dma-fence
instead of an android fence, applied on top of "android: convert sync
to fence api".

diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
index a76db3ff87cb..afc3c63b0438 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -184,7 +184,7 @@ static long sw_sync_ioctl_create_fence(struct sw_sync_timeline *obj,
}
 
data.name[sizeof(data.name) - 1] = '\0';
-   fence = sync_fence_create(data.name, pt);
+   fence = sync_fence_create(data.name, &pt->base);
if (fence == NULL) {
sync_pt_free(pt);
err = -ENOMEM;
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 70b09b5001ba..c89a6f954e41 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 }
 
 /* TODO: implement a create which takes mor

[REPOST PATCH 1/8] fence: dma-buf cross-device synchronization (v17)

2014-06-18 Thread Maarten Lankhorst
A fence can be attached to a buffer which is being filled or consumed
by hw, to allow userspace to pass the buffer without waiting to another
device.  For example, userspace can call page_flip ioctl to display the
next frame of graphics after kicking the GPU but while the GPU is still
rendering.  The display device sharing the buffer with the GPU would
attach a callback to get notified when the GPU's rendering-complete IRQ
fires, to update the scan-out address of the display, without having to
wake up userspace.

A driver must allocate a fence context for each execution ring that can
run in parallel. The function for this takes an argument with how many
contexts to allocate:
  + fence_context_alloc()

A fence is a transient, one-shot deal.  It is allocated and attached
to one or more dma-buf's.  When the one that attached it is done with
the pending operation, it can signal the fence:
  + fence_signal()

To get a rough approximation of whether a fence has fired, call:
  + fence_is_signaled()

The dma-buf-mgr handles tracking, and waiting on, the fences associated
with a dma-buf.

The one pending on the fence can add an async callback:
  + fence_add_callback()

The callback can optionally be cancelled with:
  + fence_remove_callback()

To wait synchronously, optionally with a timeout:
  + fence_wait()
  + fence_wait_timeout()
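
A minimal consumer-side sketch tying these calls together (the my_*
names are invented for illustration):

  struct my_waiter {
          struct fence_cb cb;
          struct task_struct *task;
  };

  static void my_cb(struct fence *fence, struct fence_cb *cb)
  {
          struct my_waiter *w = container_of(cb, struct my_waiter, cb);

          wake_up_process(w->task); /* runs asap after fence_signal() */
  }

  static long my_wait(struct my_waiter *w, struct fence *fence, bool block)
  {
          if (!block)
                  /* 0 if queued, -ENOENT if the fence already signaled */
                  return fence_add_callback(fence, &w->cb, my_cb);

          /* interruptible, bounded wait; 0 means timeout */
          return fence_wait_timeout(fence, true, HZ);
  }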

When emitting a fence, call:
  + trace_fence_emit()

To annotate that a fence is blocking on another fence, call:
  + trace_fence_annotate_wait_on(fence, on_fence)

A default software-only implementation is provided, which can be used
by drivers attaching a fence to a buffer when they have no other means
for hw sync.  But a memory backed fence is also envisioned, because it
is common that GPU's can write to, or poll on some memory location for
synchronization.  For example:

  fence = custom_get_fence(...);
  if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
dma_buf *fence_buf = seqno_fence->sync_buf;
get_dma_buf(fence_buf);

... tell the hw the memory location to wait ...
custom_wait_on(fence_buf, seqno_fence->seqno_ofs, fence->seqno);
  } else {
/* fall-back to sw sync */
fence_add_callback(fence, my_cb);
  }

On SoC platforms, if some other hw mechanism is provided for synchronizing
between IP blocks, it could be supported as an alternate implementation
with its own fence ops in a similar way.

enable_signaling callback is used to provide sw signaling in case a cpu
waiter is requested or no compatible hardware signaling could be used.

The intention is to provide a userspace interface (presumably via eventfd)
later, to be used in conjunction with dma-buf's mmap support for sw access
to buffers (or for userspace apps that would prefer to do their own
synchronization).

v1: Original
v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided
that dma-fence didn't need to care about the sw->hw signaling path
(it can be handled the same as the sw->sw case), and therefore the
fence->ops can be simplified and more of it handled in the core.  So
remove the signal,
add_callback, cancel_callback, and wait ops, and replace with a simple
enable_signaling() op which can be used to inform a fence supporting
hw->hw signaling that one or more devices which do not support hw
signaling are waiting (and therefore it should enable an irq or do
whatever is necessary in order that the CPU is notified when the
fence is passed).
v3: Fix locking fail in attach_fence() and get_fence()
v4: Remove tie-in w/ dma-buf..  after discussion w/ danvet and mlankhorst
we decided that we need to be able to attach one fence to N dma-buf's,
so using the list_head in dma-fence struct would be problematic.
v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments
about checking if fence fired or not. This is broken by design.
waitqueue_active during destruction is now fatal, since the signaller
should be holding a reference in enable_signalling until it signalled
the fence. Pass the original dma_fence_cb along, and call __remove_wait
in the dma_fence_callback handler, so that no cleanup needs to be
performed.
v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if
fence wasn't signaled yet, for example for hardware fences that may
choose to signal blindly.
v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to
header and fixed include mess. dma-fence.h now includes dma-buf.h
All members are now initialized, so kmalloc can be used for
allocating a dma-fence. More documentation added.
v9: Change compiler bitfields to flags, change return type of
enable_signaling to bool. Rework dma_fence_wait. Added
dma_fence_is_signaled and dma_fence_wait_timeout.
s/dma// and change exports to non GPL. Added fence_is_signaled 

[REPOST PATCH 2/8] seqno-fence: Hardware dma-buf implementation of fencing (v5)

2014-06-18 Thread Maarten Lankhorst
This type of fence can be used with hardware synchronization for simple
hardware that can block execution until the condition
(dma_buf[offset] - value) >= 0 has been met when WAIT_GEQUAL is used,
or (dma_buf[offset] != 0) has been met when WAIT_NONZERO is set.

A software fallback still has to be provided in case the fence is used
with a device that doesn't support this mechanism. It is useful to expose
this for graphics cards that have a hardware op to support it.

Some cards like i915 can export such a sync memory location, but don't
have an option to wait on it, so they need the software fallback.

I extended the original patch by Rob Clark.

v1: Original
v2: Renamed from bikeshed to seqno, moved into dma-fence.c since
not much was left of the file. Lots of documentation added.
v3: Use fence_ops instead of custom callbacks. Moved to own file
to avoid circular dependency between dma-buf.h and fence.h
v4: Add spinlock pointer to seqno_fence_init
v5: Add condition member to allow wait for != 0.
Fix small style errors pointed out by checkpatch.

Signed-off-by: Maarten Lankhorst 
Reviewed-by: Rob Clark  #v4
---
 Documentation/DocBook/device-drivers.tmpl |1 +
 drivers/base/fence.c  |   52 +
 include/linux/seqno-fence.h   |  119 +
 3 files changed, 172 insertions(+)
 create mode 100644 include/linux/seqno-fence.h

diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index 7eef81069d1b..6ca7a11fb893 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -131,6 +131,7 @@ X!Edrivers/base/interface.c
 !Edrivers/base/dma-buf.c
 !Edrivers/base/fence.c
 !Iinclude/linux/fence.h
+!Iinclude/linux/seqno-fence.h
 !Edrivers/base/reservation.c
 !Iinclude/linux/reservation.h
 !Edrivers/base/dma-coherent.c
diff --git a/drivers/base/fence.c b/drivers/base/fence.c
index 1da7f4d6542a..752a2dfa505f 100644
--- a/drivers/base/fence.c
+++ b/drivers/base/fence.c
@@ -25,6 +25,7 @@
 #include <linux/export.h>
 #include <linux/atomic.h>
 #include <linux/fence.h>
+#include <linux/seqno-fence.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/fence.h>
@@ -414,3 +415,54 @@ __fence_init(struct fence *fence, const struct fence_ops *ops,
trace_fence_init(fence);
 }
 EXPORT_SYMBOL(__fence_init);
+
+static const char *seqno_fence_get_driver_name(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->get_driver_name(fence);
+}
+
+static const char *seqno_fence_get_timeline_name(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->get_timeline_name(fence);
+}
+
+static bool seqno_enable_signaling(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->enable_signaling(fence);
+}
+
+static bool seqno_signaled(struct fence *fence)
+{
+   struct seqno_fence *seqno_fence = to_seqno_fence(fence);
+   return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence);
+}
+
+static void seqno_release(struct fence *fence)
+{
+   struct seqno_fence *f = to_seqno_fence(fence);
+
+   dma_buf_put(f->sync_buf);
+   if (f->ops->release)
+   f->ops->release(fence);
+   else
+   kfree(f);
+}
+
+static long seqno_wait(struct fence *fence, bool intr, signed long timeout)
+{
+   struct seqno_fence *f = to_seqno_fence(fence);
+   return f->ops->wait(fence, intr, timeout);
+}
+
+const struct fence_ops seqno_fence_ops = {
+   .get_driver_name = seqno_fence_get_driver_name,
+   .get_timeline_name = seqno_fence_get_timeline_name,
+   .enable_signaling = seqno_enable_signaling,
+   .signaled = seqno_signaled,
+   .wait = seqno_wait,
+   .release = seqno_release,
+};
+EXPORT_SYMBOL(seqno_fence_ops);
diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h
new file mode 100644
index 000000000000..b4d4aad3cadc
--- /dev/null
+++ b/include/linux/seqno-fence.h
@@ -0,0 +1,119 @@
+/*
+ * seqno-fence, using a dma-buf to synchronize fencing
+ *
+ * Copyright (C) 2012 Texas Instruments
+ * Copyright (C) 2012 Canonical Ltd
+ * Authors:
+ * Rob Clark 
+ *   Maarten Lankhorst 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __LINUX_SEQNO_FENCE_H
+#define __LINUX_SEQNO_FENCE_H
+
+#inclu

[REPOST PATCH 5/8] reservation: add support for fences to enable cross-device synchronisation

2014-06-18 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
Reviewed-by: Rob Clark 
---
 include/linux/reservation.h |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index 813dae960ebd..f3f57460a205 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -6,7 +6,7 @@
  * Copyright (C) 2012 Texas Instruments
  *
  * Authors:
- * Rob Clark 
+ * Rob Clark 
  * Maarten Lankhorst 
  * Thomas Hellstrom 
  *
@@ -40,22 +40,40 @@
 #define _LINUX_RESERVATION_H
 
 #include <linux/ww_mutex.h>
+#include <linux/fence.h>
+#include <linux/slab.h>
 
 extern struct ww_class reservation_ww_class;
 
 struct reservation_object {
struct ww_mutex lock;
+
+   struct fence *fence_excl;
+   struct fence **fence_shared;
+   u32 fence_shared_count, fence_shared_max;
 };
 
 static inline void
 reservation_object_init(struct reservation_object *obj)
 {
ww_mutex_init(&obj->lock, &reservation_ww_class);
+
+   obj->fence_shared_count = obj->fence_shared_max = 0;
+   obj->fence_shared = NULL;
+   obj->fence_excl = NULL;
 }
 
 static inline void
 reservation_object_fini(struct reservation_object *obj)
 {
+   int i;
+
+   if (obj->fence_excl)
+   fence_put(obj->fence_excl);
+   for (i = 0; i < obj->fence_shared_count; ++i)
+   fence_put(obj->fence_shared[i]);
+   kfree(obj->fence_shared);
+
ww_mutex_destroy(&obj->lock);
 }
 



[REPOST PATCH 7/8] reservation: update api and add some helpers

2014-06-18 Thread Maarten Lankhorst
Move the list of shared fences to a struct, and return it in
reservation_object_get_list().
Add reservation_object_get_excl to get the exclusive fence.

Add reservation_object_reserve_shared(), which reserves space
in the reservation_object for 1 more shared fence.

reservation_object_add_shared_fence() and
reservation_object_add_excl_fence() are used to assign a new
fence to a reservation_object pointer, to complete a reservation.

Signed-off-by: Maarten Lankhorst 

Changes since v1:
- Add reservation_object_get_excl, reorder code a bit.
---
 drivers/base/dma-buf.c  |   35 +++---
 drivers/base/fence.c|4 +
 drivers/base/reservation.c  |  156 +++
 include/linux/fence.h   |6 ++
 include/linux/reservation.h |   56 ++-
 5 files changed, 236 insertions(+), 21 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 25e8c4165936..cb8379dfeed5 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -134,7 +134,10 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 {
struct dma_buf *dmabuf;
struct reservation_object *resv;
+   struct reservation_object_list *fobj;
+   struct fence *fence_excl;
unsigned long events;
+   unsigned shared_count;
 
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -150,12 +153,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
 
ww_mutex_lock(&resv->lock, NULL);
 
-   if (resv->fence_excl && (!(events & POLLOUT) ||
-resv->fence_shared_count == 0)) {
+   fobj = resv->fence;
+   if (!fobj)
+   goto out;
+
+   shared_count = fobj->shared_count;
+   fence_excl = resv->fence_excl;
+
+   if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
unsigned long pevents = POLLIN;
 
-   if (resv->fence_shared_count == 0)
+   if (shared_count == 0)
pevents |= POLLOUT;
 
spin_lock_irq(&dmabuf->poll.lock);
@@ -167,19 +176,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(resv->fence_excl,
-   &dcb->cb, dma_buf_poll_cb))
+   if (!fence_add_callback(fence_excl, &dcb->cb,
+  dma_buf_poll_cb)) {
events &= ~pevents;
-   else
+   } else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
dma_buf_poll_cb(NULL, &dcb->cb);
+   }
}
}
 
-   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   if ((events & POLLOUT) && shared_count > 0) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
int i;
 
@@ -194,15 +204,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!(events & POLLOUT))
goto out;
 
-   for (i = 0; i < resv->fence_shared_count; ++i)
-   if (!fence_add_callback(resv->fence_shared[i],
-   &dcb->cb, dma_buf_poll_cb)) {
+   for (i = 0; i < shared_count; ++i) {
+   struct fence *fence = fobj->shared[i];
+
+   if (!fence_add_callback(fence, &dcb->cb,
+   dma_buf_poll_cb)) {
events &= ~POLLOUT;
break;
}
+   }
 
/* No callback queued, wake up any additional waiters. */
-   if (i == resv->fence_shared_count)
+   if (i == shared_count)
dma_buf_poll_cb(NULL, &dcb->cb);
}
 
diff --git a/drivers/base/fence.c b/drivers/base/fence.c
index 752a2dfa505f..74d1f7bcb467 100644
--- a/drivers/base/fence.c
+++ b/drivers/base/fence.c
@@ -170,7 +170,7 @@ void release_fence(struct kref *kref)
if (fence->ops->release)
fence->ops->release(fence);
else
-   kfree(fence);
+   free_fence(fence);
 }
 EXPORT_SYMBOL(release_fence);
 
@@ -448,7 +448,7 @@ static void seqno_release(struct fence *fence)
   

[REPOST PATCH 6/8] dma-buf: add poll support, v3

2014-06-18 Thread Maarten Lankhorst
Thanks to Fengguang Wu for spotting a missing static cast.

v2:
- Kill unused variable need_shared.
v3:
- Clarify the BUG() in dma_buf_release some more. (Rob Clark)
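
From userspace this then looks like any other pollable fd (sketch only;
per the implementation below, POLLIN waits for the exclusive fence,
POLLOUT for all shared and exclusive fences):

  struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };

  if (poll(&pfd, 1, 1000) > 0 && (pfd.revents & POLLOUT))
          /* all fences signaled, buffer is idle */;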

Signed-off-by: Maarten Lankhorst 
---
 drivers/base/dma-buf.c  |  108 +++
 include/linux/dma-buf.h |   12 +
 2 files changed, 120 insertions(+)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index cd40ca22911f..25e8c4165936 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -30,6 +30,7 @@
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/poll.h>
 #include <linux/reservation.h>
 
 static inline int is_dma_buf_file(struct file *);
@@ -52,6 +53,16 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 
BUG_ON(dmabuf->vmapping_counter);
 
+   /*
+* Any fences that a dma-buf poll can wait on should be signaled
+* before releasing dma-buf. This is the responsibility of each
+* driver that uses the reservation objects.
+*
+* If you hit this BUG() it means someone dropped their ref to the
+* dma-buf while still having pending operation to the buffer.
+*/
+   BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+
dmabuf->ops->release(dmabuf);
 
mutex_lock(&db_list.lock);
@@ -108,10 +119,103 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
return base + offset;
 }
 
+static void dma_buf_poll_cb(struct fence *fence, struct fence_cb *cb)
+{
+   struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb;
+   unsigned long flags;
+
+   spin_lock_irqsave(&dcb->poll->lock, flags);
+   wake_up_locked_poll(dcb->poll, dcb->active);
+   dcb->active = 0;
+   spin_unlock_irqrestore(&dcb->poll->lock, flags);
+}
+
+static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
+{
+   struct dma_buf *dmabuf;
+   struct reservation_object *resv;
+   unsigned long events;
+
+   dmabuf = file->private_data;
+   if (!dmabuf || !dmabuf->resv)
+   return POLLERR;
+
+   resv = dmabuf->resv;
+
+   poll_wait(file, &dmabuf->poll, poll);
+
+   events = poll_requested_events(poll) & (POLLIN | POLLOUT);
+   if (!events)
+   return 0;
+
+   ww_mutex_lock(&resv->lock, NULL);
+
+   if (resv->fence_excl && (!(events & POLLOUT) ||
+resv->fence_shared_count == 0)) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
+   unsigned long pevents = POLLIN;
+
+   if (resv->fence_shared_count == 0)
+   pevents |= POLLOUT;
+
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active) {
+   dcb->active |= pevents;
+   events &= ~pevents;
+   } else
+   dcb->active = pevents;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (events & pevents) {
+   if (!fence_add_callback(resv->fence_excl,
+   &dcb->cb, dma_buf_poll_cb))
+   events &= ~pevents;
+   else
+   /*
+* No callback queued, wake up any additional
+* waiters.
+*/
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+   }
+
+   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
+   int i;
+
+   /* Only queue a new callback if no event has fired yet */
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active)
+   events &= ~POLLOUT;
+   else
+   dcb->active = POLLOUT;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (!(events & POLLOUT))
+   goto out;
+
+   for (i = 0; i < resv->fence_shared_count; ++i)
+   if (!fence_add_callback(resv->fence_shared[i],
+   &dcb->cb, dma_buf_poll_cb)) {
+   events &= ~POLLOUT;
+   break;
+   }
+
+   /* No callback queued, wake up any additional waiters. */
+   if (i == resv->fence_shared_count)
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+
+out:
+   ww_mutex_unlock(&resv->lock);
+   return events;
+}
+
 static const struct file_operations dma_buf_fops = {
.relea

[REPOST PATCH 8/8] reservation: add support for read-only access using rcu

2014-06-18 Thread Maarten Lankhorst
This adds 4 more functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.
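
Sketch of the typical use (wait_all = true also waits on the shared
fences, intr = interruptible, timeout in jiffies):

  long ret = reservation_object_wait_timeout_rcu(resv, true, true,
                                                 msecs_to_jiffies(100));
  if (ret == 0)
          /* timed out */;
  else if (ret < 0)
          /* interrupted or other error */;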

Signed-off-by: Maarten Lankhorst 
Reviewed-By: Thomas Hellstrom 
---
 drivers/base/dma-buf.c  |   47 +-
 drivers/base/reservation.c  |  336 ---
 include/linux/fence.h   |   20 ++-
 include/linux/reservation.h |   52 +--
 4 files changed, 400 insertions(+), 55 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index cb8379dfeed5..f3014c448e1e 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -137,7 +137,7 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
struct reservation_object_list *fobj;
struct fence *fence_excl;
unsigned long events;
-   unsigned shared_count;
+   unsigned shared_count, seq;
 
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -151,14 +151,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
 
-   ww_mutex_lock(&resv->lock, NULL);
+retry:
+   seq = read_seqcount_begin(&resv->seq);
+   rcu_read_lock();
 
-   fobj = resv->fence;
-   if (!fobj)
-   goto out;
-
-   shared_count = fobj->shared_count;
-   fence_excl = resv->fence_excl;
+   fobj = rcu_dereference(resv->fence);
+   if (fobj)
+   shared_count = fobj->shared_count;
+   else
+   shared_count = 0;
+   fence_excl = rcu_dereference(resv->fence_excl);
+   if (read_seqcount_retry(&resv->seq, seq)) {
+   rcu_read_unlock();
+   goto retry;
+   }
 
if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
@@ -176,14 +182,20 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(fence_excl, &dcb->cb,
+   if (!fence_get_rcu(fence_excl)) {
+   /* force a recheck */
+   events &= ~pevents;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   } else if (!fence_add_callback(fence_excl, &dcb->cb,
   dma_buf_poll_cb)) {
events &= ~pevents;
+   fence_put(fence_excl);
} else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
+   fence_put(fence_excl);
dma_buf_poll_cb(NULL, &dcb->cb);
}
}
@@ -205,13 +217,26 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
goto out;
 
for (i = 0; i < shared_count; ++i) {
-   struct fence *fence = fobj->shared[i];
+   struct fence *fence = rcu_dereference(fobj->shared[i]);
 
+   if (!fence_get_rcu(fence)) {
+   /*
+* fence refcount dropped to zero, this means
+* that fobj has been freed
+*
+* call dma_buf_poll_cb and force a recheck!
+*/
+   events &= ~POLLOUT;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   break;
+   }
if (!fence_add_callback(fence, &dcb->cb,
dma_buf_poll_cb)) {
+   fence_put(fence);
events &= ~POLLOUT;
break;
}
+   fence_put(fence);
}
 
/* No callback queued, wake up any additional waiters. */
@@ -220,7 +245,7 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
}
 
 out:
-

[REPOST PATCH 4/8] android: convert sync to fence api, v5

2014-06-18 Thread Maarten Lankhorst
Just to show it's easy.

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.
v5:
- Fix small style issues pointed out by Thomas Hellstrom.

Signed-off-by: Maarten Lankhorst 
Acked-by: John Stultz 
---
 drivers/staging/android/Kconfig  |1 
 drivers/staging/android/Makefile |2 
 drivers/staging/android/sw_sync.c|6 
 drivers/staging/android/sync.c   |  913 +++---
 drivers/staging/android/sync.h   |   79 ++-
 drivers/staging/android/sync_debug.c |  247 +
 drivers/staging/android/trace/sync.h |   12 
 7 files changed, 609 insertions(+), 651 deletions(-)
 create mode 100644 drivers/staging/android/sync_debug.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 99e484f845f2..51607e9aa049 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -88,6 +88,7 @@ config SYNC
bool "Synchronization framework"
default n
select ANON_INODES
+   select DMA_SHARED_BUFFER
---help---
  This option enables the framework for synchronization between multiple
  drivers.  Sync implementations can take advantage of hardware
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 0a01e1914905..517ad5ffa429 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_ANDROID_TIMED_OUTPUT)  += timed_output.o
 obj-$(CONFIG_ANDROID_TIMED_GPIO)   += timed_gpio.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
 obj-$(CONFIG_ANDROID_INTF_ALARM_DEV)   += alarm-dev.o
-obj-$(CONFIG_SYNC) += sync.o
+obj-$(CONFIG_SYNC) += sync.o sync_debug.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o
diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
index 12a136ec1cec..a76db3ff87cb 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -50,7 +50,7 @@ static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *) sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return (struct sync_pt *) sw_sync_pt_create(obj, pt->value);
 }
@@ -59,7 +59,7 @@ static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return sw_sync_cmp(obj->value, pt->value) >= 0;
 }
@@ -97,7 +97,6 @@ static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
   char *str, int size)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
-
snprintf(str, size, "%d", pt->value);
 }
 
@@ -157,7 +156,6 @@ static int sw_sync_open(struct inode *inode, struct file *file)
 static int sw_sync_release(struct inode *inode, struct file *file)
 {
struct sw_sync_timeline *obj = file->private_data;
-
sync_timeline_destroy(&obj->obj);
return 0;
 }
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 18174f7c871c..70b09b5001ba 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -31,22 +31,13 @@
 #define CREATE_TRACE_POINTS
 #include "trace/sync.h"
 
-static void sync_fence_signal_pt(struct sync_pt *pt);
-static int _sync_pt_has_signaled(struct sync_pt *pt);
-static void sync_fence_free(struct kref *kref);
-static void sync_dump(void);
-
-static LIST_HEAD(sync_timeline_list_head);
-static DEFINE_SPINLOCK(sync_timeline_list_lock);
-
-static LIST_HEAD(sync_fence_list_head);
-static DEFINE_SPINLOCK(sync_fence_list_lock);
+static const struct fence_ops android_fence_ops;
+static const struct file_operations sync_fence_fops;
 
 struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
   int size, const char *name)
 {
struct sync_timeline *obj;
-   unsigned long flags;
 
if (size < sizeof(struct sync_timeline))
return NULL;
@@ -57,17 +48,14 @@ struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
 
kref_init(&obj->kref);
obj->ops = ops;
+  

[REPOST PATCH 0/8] fence synchronization patches

2014-06-18 Thread Maarten Lankhorst
The following series implements fence and converts dma-buf and
android sync to use it. Patch 5 and 6 add support for polling
to dma-buf, blocking until all fences are signaled.
Patch 7 and 8 provide some helpers, and allow use of RCU in the
reservation api. The helpers make it easier to convert ttm, and
make dealing with rcu less painful.

Patches slightly updated to fix compilation with armada and
new atomic primitives, but otherwise identical.

---

Maarten Lankhorst (8):
  fence: dma-buf cross-device synchronization (v17)
  seqno-fence: Hardware dma-buf implementation of fencing (v5)
  dma-buf: use reservation objects
  android: convert sync to fence api, v5
  reservation: add support for fences to enable cross-device synchronisation
  dma-buf: add poll support, v3
  reservation: update api and add some helpers
  reservation: add support for read-only access using rcu


 Documentation/DocBook/device-drivers.tmpl  |3 
 drivers/base/Kconfig   |9 
 drivers/base/Makefile  |2 
 drivers/base/dma-buf.c |  168 
 drivers/base/fence.c   |  468 
 drivers/base/reservation.c |  440 
 drivers/gpu/drm/armada/armada_gem.c|2 
 drivers/gpu/drm/drm_prime.c|8 
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |2 
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |3 
 drivers/gpu/drm/nouveau/nouveau_drm.c  |1 
 drivers/gpu/drm/nouveau/nouveau_gem.h  |1 
 drivers/gpu/drm/nouveau/nouveau_prime.c|7 
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c  |2 
 drivers/gpu/drm/radeon/radeon_drv.c|2 
 drivers/gpu/drm/radeon/radeon_prime.c  |8 
 drivers/gpu/drm/tegra/gem.c|2 
 drivers/gpu/drm/ttm/ttm_object.c   |2 
 drivers/media/v4l2-core/videobuf2-dma-contig.c |2 
 drivers/staging/android/Kconfig|1 
 drivers/staging/android/Makefile   |2 
 drivers/staging/android/ion/ion.c  |3 
 drivers/staging/android/sw_sync.c  |6 
 drivers/staging/android/sync.c |  913 
 drivers/staging/android/sync.h |   79 +-
 drivers/staging/android/sync_debug.c   |  247 ++
 drivers/staging/android/trace/sync.h   |   12 
 include/drm/drmP.h |3 
 include/linux/dma-buf.h|   21 -
 include/linux/fence.h  |  355 +
 include/linux/reservation.h|   82 ++
 include/linux/seqno-fence.h|  119 +++
 include/trace/events/fence.h   |  128 +++
 33 files changed, 2435 insertions(+), 668 deletions(-)
 create mode 100644 drivers/base/fence.c
 create mode 100644 drivers/staging/android/sync_debug.c
 create mode 100644 include/linux/fence.h
 create mode 100644 include/linux/seqno-fence.h
 create mode 100644 include/trace/events/fence.h



[REPOST PATCH 3/8] dma-buf: use reservation objects

2014-06-18 Thread Maarten Lankhorst
This allows reservation objects to be used in dma-buf. It's required
for implementing polling support on the fences that belong to a dma-buf.
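
Sketch of the resulting exporter call (gem_obj and my_dmabuf_ops are
placeholders): passing NULL makes dma-buf embed a reservation object of
its own, while passing the driver's existing one keeps dma-buf and
driver fencing on the same object.

  dmabuf = dma_buf_export(gem_obj, &my_dmabuf_ops, gem_obj->size,
                          O_RDWR, gem_obj->resv);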

Signed-off-by: Maarten Lankhorst 
Acked-by: Mauro Carvalho Chehab  #drivers/media/v4l2-core/
Acked-by: Thomas Hellstrom  #drivers/gpu/drm/ttm
Signed-off-by: Vincent Stehlé  #drivers/gpu/drm/armada/
---
 drivers/base/dma-buf.c |   22 --
 drivers/gpu/drm/armada/armada_gem.c|2 +-
 drivers/gpu/drm/drm_prime.c|8 +++-
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |2 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |3 ++-
 drivers/gpu/drm/nouveau/nouveau_drm.c  |1 +
 drivers/gpu/drm/nouveau/nouveau_gem.h  |1 +
 drivers/gpu/drm/nouveau/nouveau_prime.c|7 +++
 drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c  |2 +-
 drivers/gpu/drm/radeon/radeon_drv.c|2 ++
 drivers/gpu/drm/radeon/radeon_prime.c  |8 
 drivers/gpu/drm/tegra/gem.c|2 +-
 drivers/gpu/drm/ttm/ttm_object.c   |2 +-
 drivers/media/v4l2-core/videobuf2-dma-contig.c |2 +-
 drivers/staging/android/ion/ion.c  |3 ++-
 include/drm/drmP.h |3 +++
 include/linux/dma-buf.h|9 ++---
 17 files changed, 65 insertions(+), 14 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 840c7fa80983..cd40ca22911f 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -25,10 +25,12 @@
 #include <linux/fs.h>
 #include <linux/slab.h>
 #include <linux/dma-buf.h>
+#include <linux/fence.h>
 #include <linux/anon_inodes.h>
 #include <linux/export.h>
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
+#include <linux/reservation.h>
 
 static inline int is_dma_buf_file(struct file *);
 
@@ -56,6 +58,9 @@ static int dma_buf_release(struct inode *inode, struct file *file)
list_del(&dmabuf->list_node);
mutex_unlock(&db_list.lock);
 
+   if (dmabuf->resv == (struct reservation_object *)&dmabuf[1])
+   reservation_object_fini(dmabuf->resv);
+
kfree(dmabuf);
return 0;
 }
@@ -128,6 +133,7 @@ static inline int is_dma_buf_file(struct file *file)
  * @size:  [in]Size of the buffer
  * @flags: [in]mode flags for the file.
  * @exp_name:  [in]name of the exporting module - useful for debugging.
+ * @resv:  [in]reservation-object, NULL to allocate default one.
  *
  * Returns, on success, a newly created dma_buf object, which wraps the
  * supplied private data and operations for dma_buf_ops. On either missing
@@ -135,10 +141,17 @@ static inline int is_dma_buf_file(struct file *file)
  *
  */
 struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
-   size_t size, int flags, const char *exp_name)
+   size_t size, int flags, const char *exp_name,
+   struct reservation_object *resv)
 {
struct dma_buf *dmabuf;
struct file *file;
+   size_t alloc_size = sizeof(struct dma_buf);
+   if (!resv)
+   alloc_size += sizeof(struct reservation_object);
+   else
+   /* prevent &dma_buf[1] == dma_buf->resv */
+   alloc_size += 1;
 
if (WARN_ON(!priv || !ops
  || !ops->map_dma_buf
@@ -150,7 +163,7 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
return ERR_PTR(-EINVAL);
}
 
-   dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
+   dmabuf = kzalloc(alloc_size, GFP_KERNEL);
if (dmabuf == NULL)
return ERR_PTR(-ENOMEM);
 
@@ -158,6 +171,11 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops,
dmabuf->ops = ops;
dmabuf->size = size;
dmabuf->exp_name = exp_name;
+   if (!resv) {
+   resv = (struct reservation_object *)&dmabuf[1];
+   reservation_object_init(resv);
+   }
+   dmabuf->resv = resv;
 
file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
if (IS_ERR(file)) {
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index bb9b642d8485..7496f55611a5 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -539,7 +539,7 @@ armada_gem_prime_export(struct drm_device *dev, struct drm_gem_object *obj,
int flags)
 {
return dma_buf_export(obj, &armada_gem_prime_dmabuf_ops, obj->size,
- O_RDWR);
+ O_RDWR, NULL);
 }
 
 struct drm_gem_object *
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 304ca8cacbc4..99d578bad17e 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -336,7 +336,13 @@ static const struct dma_buf_ops drm_gem_prime_dmab

Re: [PATCH] fence: Use smp_mb__before_atomic()

2014-06-05 Thread Maarten Lankhorst

op 05-06-14 13:51, Rob Clark schreef:

On Wed, Jun 4, 2014 at 1:49 PM, Greg Kroah-Hartman
 wrote:

On Wed, Jun 04, 2014 at 03:28:33PM +0200, Thierry Reding wrote:

On Wed, Jun 04, 2014 at 04:57:07PM +0530, Sumit Semwal wrote:

Hi Greg,


On 30 May 2014 21:38, Greg Kroah-Hartman  wrote:

On Fri, May 30, 2014 at 10:15:05AM +0200, Thierry Reding wrote:

On Wed, May 28, 2014 at 01:51:45PM -0700, Greg Kroah-Hartman wrote:

On Wed, May 28, 2014 at 04:26:32PM +0200, Thierry Reding wrote:

From: Thierry Reding 

Commit febdbfe8a91c (arch: Prepare for smp_mb__{before,after}_atomic())
deprecated the smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}*()
functions in favour of the unified smp_mb__{before,after}_atomic().

Signed-off-by: Thierry Reding 
---
  drivers/base/fence.c | 4 ++--

Where does this file come from?  I've not seen it before, and it's not
in my tree.

I think it came in through Sumit's tree and it's only in linux-next I
believe.

Odd, linux-next is for merging things in Linus's next release.

And as I have never seen this code that will end up being my
responsibility to maintain, it seems strange that it will be merged in
the next kernel development cycle.

What broke down here with our review process that required something to
be merged without at least a cc: to me?

This is a new file added by Maarten's patches [1], that got reviewed
on dri-devel and other mailing lists. Since it was quite closely
associated with dma-buf, I figured I should take it through the
dma-buf tree.

I am sorry I didn't notice that you weren't CC'ed on these patches -
Sincere apologies, since I should've noticed that during the patch
review process - I would take part of the blame here as well  :(

I do realize now that, at least on my part, I should've asked you before
taking it through the dma-buf tree - I will make sure things like this
don't happen again through me.

May I request you to help us handle this - would it help if we add
Maarten as the maintainer for this file? Any other suggestions?

Perhaps something like the following would help?

diff --git a/MAINTAINERS b/MAINTAINERS
index fb39c9c3f0c1..d582f54adec8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2867,7 +2867,9 @@ L:linux-me...@vger.kernel.org
  L:   dri-de...@lists.freedesktop.org
  L:   linaro-mm-...@lists.linaro.org
  F:   drivers/base/dma-buf*
+F:   drivers/base/fence.c
  F:   include/linux/dma-buf*
+F:   include/linux/fence.h
  F:   Documentation/dma-buf-sharing.txt
  T:   git git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
@@ -2936,6 +2938,8 @@ T:git 
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
  S:   Supported
  F:   Documentation/kobject.txt
  F:   drivers/base/
+X:   drivers/base/dma-buf*
+X:   drivers/base/fence.c
  F:   fs/sysfs/
  F:   fs/debugfs/
  F:   include/linux/kobj*

That removes Greg from the list generated by get_maintainer.pl for
anything that touches the DMA-BUF files.

That doesn't really work for most people, I'll still be "responsible"
for the code.


Thinking about it, perhaps moving DMA-BUF into its own subdirectory
would be an option too, to make the separation more obvious.

That might be best for some of this.

But again, why is the fence.c code needed at all anyway?  I'm not sold
on that.

A fence serves as a way to synchronize between (for example) multiple
asynchronous GPUs.  There is definitely a need for this.  Otherwise
performance for optimus/prime type setups is going to suck.

I thought we had added something under Documentation/ about it, but I
can't find it now (although possibly looking at the wrong trees)..
there is at least a bit of a description in the commit msg:

   https://lkml.org/lkml/2014/2/24/602

I don't think the need for something like fence to augment dma-buf is
really in doubt.  Maybe it should live somewhere else, I'm not sure.
But it makes sense for it to live wherever dma-buf does, as they are
intended to work together.


Additionally we already have a user for this in the kernel: the GPU drivers 
that use TTM.

In my tree I have a branch that converts TTM fences to this fence mechanism,
roughly preserving the same functionality. As soon as the fence code is merged
I'll move TTM over to the new code.

Additionally I added a series on top that allows drm drivers to get additional
fences through dma-buf, allowing nouveau and i915 to work on the same buffer
object, with both drivers being aware that the other driver is performing work
on it.
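
[Editor's sketch] As a rough illustration of what that cross-driver awareness looks like at a call site (hypothetical helper; the real plumbing goes through the dma-buf's reservation object added by the series, and this assumes the pre-RCU reservation_object layout):

#include <linux/fence.h>
#include <linux/jiffies.h>
#include <linux/reservation.h>

/*
 * Sketch: before touching a shared buffer, an importer waits on the
 * exporter's exclusive fence. Assumes the caller holds the
 * reservation object's ww_mutex.
 */
static long example_sync_to_writer(struct reservation_object *resv)
{
        struct fence *excl = resv->fence_excl;

        if (!excl || fence_is_signaled(excl))
                return 0;       /* buffer is already idle */

        /* interruptible, capped so a wedged exporter can't hang us */
        return fence_wait_timeout(excl, true, 30 * HZ);
}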

~Maarten



[RFC PATCH v1.4 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq

2014-06-03 Thread Maarten Lankhorst

This makes it possible to wait for a specific amount of time,
rather than waiting forever.

Signed-off-by: Maarten Lankhorst 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_fence.c | 60 +++
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..b25b14231421 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct 
radeon_device *rdev, u64 *seq)
 }
 
 /**

- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
  *
  * @rdev: radeon device pointer
  * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
  *
  * Wait for the requested sequence number(s) to be written by any ring
  * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
  * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
  * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-bool intr)
+static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+u64 *target_seq, bool intr,
+long timeout)
 {
uint64_t last_seq[RADEON_NUM_RINGS];
bool signaled;
-   int i, r;
+   int i;
 
 	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {

+   long r, waited;
+
+   waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
 
 		/* Save current sequence values, used to check for GPU lockups */

for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,11 +326,11 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
if (intr) {
r = wait_event_interruptible_timeout(rdev->fence_queue, 
(
(signaled = radeon_fence_any_seq_signaled(rdev, 
target_seq))
-|| rdev->needs_reset), 
RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
} else {
r = wait_event_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, 
target_seq))
-|| rdev->needs_reset), 
RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
}
 
 		for (i = 0; i < RADEON_NUM_RINGS; ++i) {

@@ -337,6 +344,14 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
if (unlikely(r < 0))
return r;
 
+		timeout -= waited - r;

+
+   /*
+* If this is a timed wait and the wait completely timed out 
just return.
+*/
+   if (!timeout)
+   break;
+
if (unlikely(!signaled)) {
if (rdev->needs_reset)
return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
}
}
}
-   return 0;
+   return timeout;
 }
 
 /**

  * radeon_fence_wait - wait for a fence to signal
  *
  * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
  *
  * Wait for the requested fence to signal (all asics).
  * @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
uint64_t seq[RADEON_NUM_RINGS] = {};
-   int r;
+   long r;
 
 	if (fence == NULL) {

WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool 
intr)
if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
return 0;
 
-	r = radeon_fence_wait_seq(fence->rdev, seq, intr);

-   if (r)
+   r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, 
MAX_SCHEDULE_TIMEOUT);
+   if (r < 
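
[Editor's sketch] The remaining-time bookkeeping in the hunks above is easy to misread, so here is the same chunked-wait pattern as a standalone sketch (hypothetical names, not the driver code itself): each sleep is capped at the lockup-check period, and the caller's budget is reduced by the time actually slept, i.e. the chunk minus whatever wait_event_interruptible_timeout() reported as left over.

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/wait.h>

/* Stand-in for RADEON_FENCE_JIFFIES_TIMEOUT: one lockup-check period. */
#define EXAMPLE_CHUNK   (HZ / 2)

/*
 * Returns remaining jiffies on success, 0 on timeout, negative error
 * if interrupted. cond() is rechecked by the wait macro itself.
 */
static long example_wait_chunked(wait_queue_head_t *wq, bool (*cond)(void),
                                 long timeout)
{
        while (!cond()) {
                long r, waited = min(timeout, (long)EXAMPLE_CHUNK);

                r = wait_event_interruptible_timeout(*wq, cond(), waited);
                if (r < 0)
                        return r;       /* -ERESTARTSYS */

                /* r is the time left over, so (waited - r) was slept */
                timeout -= waited - r;
                if (!timeout)
                        return 0;       /* budget exhausted: timed out */

                /* a real driver runs its lockup check here */
        }
        return timeout;
}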

[RFC PATCH v1.3 08/16 2/2] drm/radeon: use common fence implementation for fences

2014-06-02 Thread Maarten Lankhorst

Signed-off-by: Maarten Lankhorst 
---
Oops: changed unsigned long in __radeon_fence_wait to long, fixing a subtle bug.

 drivers/gpu/drm/radeon/radeon.h|  15 +--
 drivers/gpu/drm/radeon/radeon_device.c |  60 -
 drivers/gpu/drm/radeon/radeon_fence.c  | 223 +++--
 3 files changed, 248 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8149e7cf4303..32a3f2fe70c5 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
 #define RADEONFB_CONN_LIMIT4
 #define RADEON_BIOS_NUM_SCRATCH8
 
-/* fence seq are set to this number when signaled */

-#define RADEON_FENCE_SIGNALED_SEQ  0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX 0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {

+   struct fence base;
+
struct radeon_device*rdev;
-   struct kref kref;
/* protected by radeon_fence.lock */
uint64_tseq;
/* RB, DMA, etc. */
unsignedring;
+
+   wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);

@@ -2257,6 +2258,7 @@ struct radeon_device {
struct radeon_mman  mman;
struct radeon_fence_driver  fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t   fence_queue;
+   unsignedfence_context;
struct mutexring_lock;
struct radeon_ring  ring[RADEON_NUM_RINGS];
boolib_pool_ready;
@@ -2347,11 +2349,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 
index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*

- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 14671406212f..9a7d9f63203e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+   rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",

radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1566,6 +1567,54 @@ int radeon_resume_kms(struct drm_device *dev, bool 
resume, bool fbcon)
return 0;
 }
 
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)

+{
+   uint32_t mask = 0;
+   int i;
+
+   if (!rdev->ddev->irq_enabled)
+   return mask;
+
+   /*
+* increase refcount on sw interrupts for all rings to stop
+* enabling interrupts in radeon_fence_enable_signaling during
+* gpu reset.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!rdev->ring[i].ready)
+   continue;
+
+   atomic_inc(&rdev->irq.ring_int[i]);
+   mask |= 1 << i;
+   }
+   return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+   unsigned long irqflags;
+   int i;
+
+   if (!mask)
+   return;
+
+   /*
+* undo refcount increase, and reset irqs to correct value.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!(mask & (1 << i)))
+   continue;
+
+   atomic_dec(&rdev->irq.ring_int[i]);
+   }
+
+   spin_lock_irqsave(&rdev->irq.lock, irqflags);
+   radeon_irq_set(rdev);
+   spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
 /**
  * radeon_gpu_reset - reset the asic
  *
@@ -1583,6 +1632,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 
 	int i, r;

int resched;
+   uint32_t sw_mask;
 
 	down_write(&rdev->exclusive_lock);
 
@@ -1596,6 +1646,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)

radeon_save_bios_scratch_regs(rdev);
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+   sw_mask = radeon_gpu_mask_sw_irq(rdev);
radeon_pm_suspend(rdev);
radeon_suspend(rdev);
 
@@ -1645,13 +1696,20 @@ retry:

radeon_pm_resume(rdev);
drm_helper_resume_f

[RFC PATCH v1.3 08/16 1/2] drm/radeon: add timeout argument to radeon_fence_wait_seq

2014-06-02 Thread Maarten Lankhorst

This makes it possible to wait for a specific amount of time,
rather than waiting forever.

Signed-off-by: Maarten Lankhorst 
---
 Split-out version; I noticed that I forgot to convert radeon_fence_wait_empty's
r to long, fixed.
 drivers/gpu/drm/radeon/radeon_fence.c | 60 +++
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..bf4bfe65a050 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -283,28 +283,35 @@ static bool radeon_fence_any_seq_signaled(struct 
radeon_device *rdev, u64 *seq)
 }
 
 /**

- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
  *
  * @rdev: radeon device pointer
  * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
  *
  * Wait for the requested sequence number(s) to be written by any ring
  * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
  * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
  * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-bool intr)
+static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+u64 *target_seq, bool intr,
+long timeout)
 {
uint64_t last_seq[RADEON_NUM_RINGS];
bool signaled;
-   int i, r;
+   int i;
 
 	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {

+   long r, waited = timeout;
+
+   waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
 
 		/* Save current sequence values, used to check for GPU lockups */

for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -319,13 +326,15 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
if (intr) {
r = wait_event_interruptible_timeout(rdev->fence_queue, 
(
(signaled = radeon_fence_any_seq_signaled(rdev, 
target_seq))
-|| rdev->needs_reset), 
RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
} else {
r = wait_event_timeout(rdev->fence_queue, (
(signaled = radeon_fence_any_seq_signaled(rdev, 
target_seq))
-|| rdev->needs_reset), 
RADEON_FENCE_JIFFIES_TIMEOUT);
+|| rdev->needs_reset), waited);
}
 
+		timeout -= waited - r;

+
for (i = 0; i < RADEON_NUM_RINGS; ++i) {
if (!target_seq[i])
continue;
@@ -337,6 +346,12 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
if (unlikely(r < 0))
return r;
 
+		/*

+* If this is a timed wait and the wait completely timed out 
just return.
+*/
+   if (!timeout)
+   break;
+
if (unlikely(!signaled)) {
if (rdev->needs_reset)
return -EDEADLK;
@@ -379,14 +394,14 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
}
}
}
-   return 0;
+   return timeout;
 }
 
 /**

  * radeon_fence_wait - wait for a fence to signal
  *
  * @fence: radeon fence object
- * @intr: use interruptable sleep
+ * @intr: use interruptible sleep
  *
  * Wait for the requested fence to signal (all asics).
  * @intr selects whether to use interruptable (true) or non-interruptable
@@ -396,7 +411,7 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
uint64_t seq[RADEON_NUM_RINGS] = {};
-   int r;
+   long r;
 
 	if (fence == NULL) {

WARN(1, "Querying an invalid fence : %p !\n", fence);
@@ -407,9 +422,10 @@ int radeon_fence_wait(struct radeon_fence *fence, bool 
intr)
if (seq[fence->ring] == RADEON_FENCE_SIGNALED_SEQ)
return 0;
 
-	r = radeon_fence_wait_seq

[RFC PATCH v1.2 08/16] drm/radeon: use common fence implementation for fences

2014-06-02 Thread Maarten Lankhorst

Changes since v1:
- Fixed interaction with reset handling.
  + Use exclusive_lock, either with trylock or blocking.
  + Bump sw irq refcount in the recovery function to prevent fiddling
with irq registers during gpu recovery.
- Add radeon lockup detection to the default fence wait function.
---
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
 #define RADEONFB_CONN_LIMIT4
 #define RADEON_BIOS_NUM_SCRATCH8
 
-/* fence seq are set to this number when signaled */

-#define RADEON_FENCE_SIGNALED_SEQ  0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX 0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {

+   struct fence base;
+
struct radeon_device*rdev;
-   struct kref kref;
/* protected by radeon_fence.lock */
uint64_tseq;
/* RB, DMA, etc. */
unsignedring;
+
+   wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);

@@ -2256,6 +2257,7 @@ struct radeon_device {
struct radeon_mman  mman;
struct radeon_fence_driver  fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t   fence_queue;
+   unsignedfence_context;
struct mutexring_lock;
struct radeon_ring  ring[RADEON_NUM_RINGS];
boolib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 
index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*

- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..6800a0f6dd33 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+   rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",

radeon_family_name[rdev->family], pdev->vendor, pdev->device,
@@ -1565,6 +1566,54 @@ int radeon_resume_kms(struct drm_device *dev, bool 
resume, bool fbcon)
return 0;
 }
 
+static uint32_t radeon_gpu_mask_sw_irq(struct radeon_device *rdev)

+{
+   uint32_t mask = 0;
+   int i;
+
+   if (!rdev->ddev->irq_enabled)
+   return mask;
+
+   /*
+* increase refcount on sw interrupts for all rings to stop
+* enabling interrupts in radeon_fence_enable_signaling during
+* gpu reset.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!rdev->ring[i].ready)
+   continue;
+
+   atomic_inc(&rdev->irq.ring_int[i]);
+   mask |= 1 << i;
+   }
+   return mask;
+}
+
+static void radeon_gpu_unmask_sw_irq(struct radeon_device *rdev, uint32_t mask)
+{
+   unsigned long irqflags;
+   int i;
+
+   if (!mask)
+   return;
+
+   /*
+* undo refcount increase, and reset irqs to correct value.
+*/
+
+   for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+   if (!(mask & (1 << i)))
+   continue;
+
+   atomic_dec(&rdev->irq.ring_int[i]);
+   }
+
+   spin_lock_irqsave(&rdev->irq.lock, irqflags);
+   radeon_irq_set(rdev);
+   spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+}
+
 /**
  * radeon_gpu_reset - reset the asic
  *
@@ -1582,6 +1631,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)
 
 	int i, r;

int resched;
+   uint32_t sw_mask;
 
 	down_write(&rdev->exclusive_lock);
 
@@ -1595,6 +1645,7 @@ int radeon_gpu_reset(struct radeon_device *rdev)

radeon_save_bios_scratch_regs(rdev);
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&rdev->mman.bdev);
+   sw_mask = radeon_gpu_mask_sw_irq(rdev);
radeon_pm_suspend(rdev);
radeon_suspend(rdev);
 
@@ -1644,13 +1695,20 @@ retry:

radeon_pm_resume(rdev);
drm_helper_resume_force_mode(rdev->ddev);
 
+	radeon_gpu_unmask_sw_irq(rdev, sw_mask);

ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched);
if (r) {
/* ba

Re: [PATCH] fence: Use smp_mb__before_atomic()

2014-05-28 Thread Maarten Lankhorst


On 28-05-14 16:26, Thierry Reding wrote:

From: Thierry Reding 

Commit febdbfe8a91c (arch: Prepare for smp_mb__{before,after}_atomic())
deprecated the smp_mb__{before,after}_{atomic,clear}_{dec,inc,bit}*()
functions in favour of the unified smp_mb__{before,after}_atomic().

Signed-off-by: Thierry Reding 

Acked-by: Maarten Lankhorst 

I saw the patches, but it's hard to clean up if it's not in the fences tree
yet. :-)


~maarten


Re: [RFC PATCH 2/2 with seqcount v3] reservation: add suppport for read-only access using rcu

2014-05-20 Thread Maarten Lankhorst

op 20-05-14 17:13, Thomas Hellstrom schreef:

On 05/19/2014 03:13 PM, Maarten Lankhorst wrote:

op 19-05-14 15:42, Thomas Hellstrom schreef:

Hi, Maarten!

Some nitpicks, and that krealloc within rcu lock still worries me.
Otherwise looks good.

/Thomas



On 04/23/2014 12:15 PM, Maarten Lankhorst wrote:

@@ -55,8 +60,8 @@ int reservation_object_reserve_shared(struct
reservation_object *obj)
   kfree(obj->staged);
   obj->staged = NULL;
   return 0;
-}
-max = old->shared_max * 2;
+} else
+max = old->shared_max * 2;

Perhaps as a separate reformatting patch?

I'll fold it in to the patch that added
reservation_object_reserve_shared.

+
+int reservation_object_get_fences_rcu(struct reservation_object *obj,
+  struct fence **pfence_excl,
+  unsigned *pshared_count,
+  struct fence ***pshared)
+{
+unsigned shared_count = 0;
+unsigned retry = 1;
+struct fence **shared = NULL, *fence_excl = NULL;
+int ret = 0;
+
+while (retry) {
+struct reservation_object_list *fobj;
+unsigned seq;
+
+seq = read_seqcount_begin(&obj->seq);
+
+rcu_read_lock();
+
+fobj = rcu_dereference(obj->fence);
+if (fobj) {
+struct fence **nshared;
+
+shared_count = ACCESS_ONCE(fobj->shared_count);

ACCESS_ONCE() shouldn't be needed inside the seqlock?

Yes it is: shared_count may be increased concurrently, leading to
different sizes for the krealloc and the memcpy if the ACCESS_ONCE
is removed. I could use shared_max here instead, which stays the
same, but that would waste more memory.

Maarten, another perhaps ignorant question WRT this:
does ACCESS_ONCE() guarantee that the value accessed is read atomically?

Well I've reworked the code to use shared_max, so this point is moot. :-)

On any arch I'm aware of it would work: either the old or the new value
would be visible, as long as natural alignment is used.
RCU uses the same trick in the rcu_dereference macro, so if this didn't
work, RCU wouldn't work either.
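
[Editor's sketch] For readers new to it, the structure under discussion is the standard seqcount-protected RCU read loop; a simplified sketch follows (hypothetical struct and helper; fence refcounting and the krealloc dance are elided). ACCESS_ONCE pins shared_count to a single read, so the loop bound and any size computed from it cannot disagree.

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/seqlock.h>

struct fence;

struct example_obj {
        seqcount_t seq;
        unsigned int shared_count;
        struct fence __rcu *shared[8];
};

static unsigned int example_snapshot(struct example_obj *obj,
                                     struct fence **out, unsigned int max)
{
        unsigned int seq, i, count;

        do {
                seq = read_seqcount_begin(&obj->seq);
                rcu_read_lock();
                count = min(ACCESS_ONCE(obj->shared_count), max);
                for (i = 0; i < count; ++i)
                        out[i] = rcu_dereference(obj->shared[i]);
                /* real code takes fence references before unlocking */
                rcu_read_unlock();
                /* retry if a writer raced with us */
        } while (read_seqcount_retry(&obj->seq, seq));

        return count;
}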

~Maarten


Re: linux-next: build failure after merge of the dma-buf tree

2014-05-20 Thread Maarten Lankhorst

Hey,

op 20-05-14 09:13, Stephen Rothwell schreef:

Hi Sumit,

After merging the dma-buf tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/gpu/drm/tegra/gem.c: In function 'tegra_gem_prime_export':
drivers/gpu/drm/tegra/gem.c:423:15: error: macro "dma_buf_export" requires 5 
arguments, but only 4 given
drivers/gpu/drm/tegra/gem.c:422:9: error: 'dma_buf_export' undeclared (first 
use in this function)

Caused by commit 8dfb1f0f8103 ("dma-buf: use reservation objects").
Grep is your friend ...

I have used the dma-buf tree from next-20140519 for today.


sumits, can you amend the commit?
---

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index bcf9895cef9f..1e9de41a14ea 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -419,7 +419,7 @@ struct dma_buf *tegra_gem_prime_export(struct drm_device 
*drm,
   int flags)
 {
return dma_buf_export(gem, &tegra_gem_prime_dmabuf_ops, gem->size,
- flags);
+ flags, NULL);
 }
 
 struct drm_gem_object *tegra_gem_prime_import(struct drm_device *drm,




Re: [RFC PATCH 2/2 with seqcount v3] reservation: add suppport for read-only access using rcu

2014-05-19 Thread Maarten Lankhorst

op 19-05-14 15:42, Thomas Hellstrom schreef:

Hi, Maarten!

Some nitpicks, and that krealloc within rcu lock still worries me.
Otherwise looks good.

/Thomas



On 04/23/2014 12:15 PM, Maarten Lankhorst wrote:

@@ -55,8 +60,8 @@ int reservation_object_reserve_shared(struct
reservation_object *obj)
  kfree(obj->staged);
  obj->staged = NULL;
  return 0;
-}
-max = old->shared_max * 2;
+} else
+max = old->shared_max * 2;

Perhaps as a separate reformatting patch?

I'll fold it in to the patch that added reservation_object_reserve_shared.

+
+int reservation_object_get_fences_rcu(struct reservation_object *obj,
+  struct fence **pfence_excl,
+  unsigned *pshared_count,
+  struct fence ***pshared)
+{
+unsigned shared_count = 0;
+unsigned retry = 1;
+struct fence **shared = NULL, *fence_excl = NULL;
+int ret = 0;
+
+while (retry) {
+struct reservation_object_list *fobj;
+unsigned seq;
+
+seq = read_seqcount_begin(&obj->seq);
+
+rcu_read_lock();
+
+fobj = rcu_dereference(obj->fence);
+if (fobj) {
+struct fence **nshared;
+
+shared_count = ACCESS_ONCE(fobj->shared_count);

ACCESS_ONCE() shouldn't be needed inside the seqlock?

Yes it is: shared_count may be increased concurrently, leading to different
sizes for the krealloc and the memcpy if the ACCESS_ONCE is removed. I could
use shared_max here instead, which stays the same, but that would waste more
memory.


+nshared = krealloc(shared, sizeof(*shared) *
shared_count, GFP_KERNEL);

Again, krealloc is a sleeping function, and not suitable within an
RCU read lock? I still think this krealloc should be moved to the start
of the retry loop, and we should start with a suitable guess of
shared_count (perhaps 0?). It's not like we're going to waste a lot of
memory.

But shared_count is only known when holding the rcu lock.

What about this change?

@@ -254,16 +254,27 @@ int reservation_object_get_fences_rcu(struct 
reservation_object *obj,
fobj = rcu_dereference(obj->fence);
if (fobj) {
struct fence **nshared;
+   size_t sz;
 
 			shared_count = ACCESS_ONCE(fobj->shared_count);

-   nshared = krealloc(shared, sizeof(*shared) * 
shared_count, GFP_KERNEL);
+   sz = sizeof(*shared) * shared_count;
+
+   nshared = krealloc(shared, sz,
+  GFP_NOWAIT | __GFP_NOWARN);
if (!nshared) {
+   rcu_read_unlock();
+   nshared = krealloc(shared, sz, GFP_KERNEL);
+   if (nshared) {
+   shared = nshared;
+   continue;
+   }
+
ret = -ENOMEM;
-   shared_count = retry = 0;
-   goto unlock;
+   shared_count = 0;
+   break;
}
shared = nshared;
-   memcpy(shared, fobj->shared, sizeof(*shared) * 
shared_count);
+   memcpy(shared, fobj->shared, sz);
} else
shared_count = 0;
fence_excl = rcu_dereference(obj->fence_excl);



+
+/*
+ * There could be a read_seqcount_retry here, but nothing cares
+ * about whether it's the old or newer fence pointers that are
+ * signale. That race could still have happened after checking

Typo.

Oops.



Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-19 Thread Maarten Lankhorst

op 19-05-14 14:30, Christian König schreef:

Am 19.05.2014 12:10, schrieb Maarten Lankhorst:

op 19-05-14 10:27, Christian König schreef:

Am 19.05.2014 10:00, schrieb Maarten Lankhorst:
[SNIP]
The problem here is that the whole approach collides with the way we do reset 
handling from a conceptual point of view. Every IOCTL or other call chain into 
the driver is protected by the read side of the exclusive_lock semaphore. So in 
the case of a GPU lockup we can take the write side of the semaphore and so 
make sure that we have nobody else accessing the hardware or internal driver 
structures only changed at init time.

Leaking a driver's IRQ context into another driver, as well as calling into a
driver in atomic context, is quite an uncommon approach and should be
considered very carefully.

I would rather vote for a completely synchronous interface only allowing
blocking waits and checks whether a fence is signaled from non-atomic context.

If a driver needs to avoid blocking it should just use a workqueue, and
checking a fence outside your own driver is probably better done in a
bottom-half handler anyway.


Except that you might want to do something like
fence_is_signaled() in another driver to check whether you need to
defer, or can submit the batch buffer immediately, saving a bunch of
context switches. Keeping is_signaled atomic is really useful here
because it means you can't do too many scary things in the is_signaled
handler.


This is indeed a nice optimization, but nothing more. If you want to provide an
is_signaled interface for atomic context then this should be optional, not
mandatory.

See below.

In case of enable_signaling it was the only sane solution, because
fence_signal can be called from irq context, and any calls after that to
fence_add_callback and fence_wait aren't allowed to do anything, so
fence_enable_sw_signaling and the default wait implementation must be
atomic. fence_wait itself doesn't have to be, so it's easy to grab
exclusive_lock there.


I don't think you understood my point here: Completely drop enable_signaling, 
it's unnecessary and only complicates the interface.

We purposely avoided exactly this paradigm in the past and I haven't seen any 
good argument to start with it now.


In the common case a lot more fences will be emitted than will be waited on.
This means it makes sense to delay signaling a fence with fence_signal for
as long as possible. But when a fence user wants to work with a fence,
some way is needed to ensure that the fence will complete. This is the idea
behind .enable_signaling: it tells the fence driver to call fence_signal on
the fence 'soon', because there are now waiters for it.

The atomic .signaled is optional and can be set to NULL, but in that case
there is no guarantee that fence_is_signaled will ever return true unless
fence_enable_sw_signaling is called (which calls .enable_signaling).

Providing a custom wait function is optional in the interface; if the default
wait function is used, all waiters are signaled when fence_signal is called.

Removing enable_signaling would only make sense if fence_signal was removed
too, but that would mean fence_is_signaled could no longer exist in the core
fence code, and would mean completely rewriting the interface.
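
[Editor's sketch] To make that contract concrete, a minimal and entirely hypothetical driver-side skeleton against the posted API might look as follows; .enable_signaling only arms whatever mechanism will eventually call fence_signal(), and the hw_* calls are stand-ins for real hardware access:

#include <linux/fence.h>
#include <linux/kernel.h>
#include <linux/types.h>

struct example_fence {
        struct fence base;
        u64 seq;
};

extern void example_hw_irq_get(void);           /* stand-in: arm completion irq */
extern bool example_hw_seq_passed(u64 seq);     /* stand-in: read hw seqno */

static inline struct example_fence *to_example_fence(struct fence *f)
{
        return container_of(f, struct example_fence, base);
}

static const char *example_get_driver_name(struct fence *f)
{
        return "example";
}

static const char *example_get_timeline_name(struct fence *f)
{
        return "example-ring0";
}

/*
 * Called by the core with f->lock held, possibly from atomic context.
 * Return false if the fence already signaled (a real driver would drop
 * the irq reference again here), true if fence_signal() comes later.
 */
static bool example_enable_signaling(struct fence *f)
{
        example_hw_irq_get();
        return !example_hw_seq_passed(to_example_fence(f)->seq);
}

static const struct fence_ops example_fence_ops = {
        .get_driver_name   = example_get_driver_name,
        .get_timeline_name = example_get_timeline_name,
        .enable_signaling  = example_enable_signaling,
        /* .signaled is optional; fence_default_wait builds on the above */
        .wait = fence_default_wait,
};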

~Maarten



Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-19 Thread Maarten Lankhorst

op 19-05-14 10:27, Christian König schreef:

Am 19.05.2014 10:00, schrieb Maarten Lankhorst:

op 15-05-14 18:13, Christian König schreef:

Am 15.05.2014 17:58, schrieb Maarten Lankhorst:

op 15-05-14 17:48, Christian König schreef:

Am 15.05.2014 16:18, schrieb Maarten Lankhorst:

op 15-05-14 15:19, Christian König schreef:

Am 15.05.2014 15:04, schrieb Maarten Lankhorst:

op 15-05-14 11:42, Christian König schreef:

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+ radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.


Overriding the default wait function sounds better, please implement it this 
way.

Thanks,
Christian. 


Does this modification look sane?

Adding the timeout is on my todo list for quite some time as well, so this part 
makes sense.


+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+struct radeon_fence *fence = to_radeon_fence(f);
+u64 target_seq[RADEON_NUM_RINGS] = {};
+
+target_seq[fence->ring] = fence->seq;
+return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, 
timeout);
+}

When this call is coming from outside the radeon driver you need to lock
rdev->exclusive_lock here to make sure not to interfere with a possible reset.

Ah thanks, I'll add that.


 .get_timeline_name = radeon_fence_get_timeline_name,
 .enable_signaling = radeon_fence_enable_signaling,
 .signaled = __radeon_fence_signaled,

Do we still need those callback when we implemented the wait callback?

.get_timeline_name is used for debugging (trace events).
.signaled is the non-blocking call to check if the fence is signaled or not.
.enable_signaling is used for adding callbacks upon fence completion, the 
default 'fence_default_wait' uses it, so
when it works no separate implementation is needed unless you want to do more 
than just waiting.
It's also used when fence_add_callback is called. i915 can be patched to use 
it. ;-)


I just meant enable_signaling, the other ones are fine with me. The problem 
with enable_signaling is that it's called with a spin lock held, so we can't 
sleep.

While resetting the GPU could be moved out into a timer, the problem here is that I
can't lock rdev->exclusive_lock in such situations.

This means when i915 would call into radeon to enable signaling for a fence we 
can't make sure that there is no GPU reset running on another CPU. And
touching the IRQ registers while a reset is going on is a really good recipe to 
lockup the whole system.

If you increase the irq counter on all rings before doing a gpu reset, adjust 
the state and call sw_irq_put when done, this race could never happen

Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-19 Thread Maarten Lankhorst

op 15-05-14 18:13, Christian König schreef:

Am 15.05.2014 17:58, schrieb Maarten Lankhorst:

op 15-05-14 17:48, Christian König schreef:

Am 15.05.2014 16:18, schrieb Maarten Lankhorst:

op 15-05-14 15:19, Christian König schreef:

Am 15.05.2014 15:04, schrieb Maarten Lankhorst:

op 15-05-14 11:42, Christian König schreef:

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+ radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.


Overriding the default wait function sounds better, please implement it this 
way.

Thanks,
Christian. 


Does this modification look sane?

Adding the timeout is on my todo list for quite some time as well, so this part 
makes sense.


+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+struct radeon_fence *fence = to_radeon_fence(f);
+u64 target_seq[RADEON_NUM_RINGS] = {};
+
+target_seq[fence->ring] = fence->seq;
+return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, 
timeout);
+}

When this call is coming from outside the radeon driver you need to lock
rdev->exclusive_lock here to make sure not to interfere with a possible reset.

Ah thanks, I'll add that.


 .get_timeline_name = radeon_fence_get_timeline_name,
 .enable_signaling = radeon_fence_enable_signaling,
 .signaled = __radeon_fence_signaled,

Do we still need those callback when we implemented the wait callback?

.get_timeline_name is used for debugging (trace events).
.signaled is the non-blocking call to check if the fence is signaled or not.
.enable_signaling is used for adding callbacks upon fence completion, the 
default 'fence_default_wait' uses it, so
when it works no separate implementation is needed unless you want to do more 
than just waiting.
It's also used when fence_add_callback is called. i915 can be patched to use 
it. ;-)


I just meant enable_signaling, the other ones are fine with me. The problem 
with enable_signaling is that it's called with a spin lock held, so we can't 
sleep.

While resetting the GPU could be moved out into a timer, the problem here is that I
can't lock rdev->exclusive_lock in such situations.

This means when i915 would call into radeon to enable signaling for a fence we 
can't make sure that there is no GPU reset running on another CPU. And
touching the IRQ registers while a reset is going on is a really good recipe to 
lockup the whole system.

If you increase the irq counter on all rings before doing a gpu reset, adjust 
the state and call sw_irq_put when done, this race could never happen. Or am I
missing something?


Beside that's being extremely ugly in the case of a hard 

Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-15 Thread Maarten Lankhorst

op 15-05-14 17:48, Christian König schreef:

Am 15.05.2014 16:18, schrieb Maarten Lankhorst:

op 15-05-14 15:19, Christian König schreef:

Am 15.05.2014 15:04, schrieb Maarten Lankhorst:

op 15-05-14 11:42, Christian König schreef:

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+ radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.


Overriding the default wait function sounds better, please implement it this 
way.

Thanks,
Christian. 


Does this modification look sane?

Adding the timeout is on my todo list for quite some time as well, so this part 
makes sense.


+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+struct radeon_fence *fence = to_radeon_fence(f);
+u64 target_seq[RADEON_NUM_RINGS] = {};
+
+target_seq[fence->ring] = fence->seq;
+return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, 
timeout);
+}

When this call is coming from outside the radeon driver you need to lock
rdev->exclusive_lock here to make sure not to interfere with a possible reset.

Ah thanks, I'll add that.


 .get_timeline_name = radeon_fence_get_timeline_name,
 .enable_signaling = radeon_fence_enable_signaling,
 .signaled = __radeon_fence_signaled,

Do we still need those callback when we implemented the wait callback?

.get_timeline_name is used for debugging (trace events).
.signaled is the non-blocking call to check if the fence is signaled or not.
.enable_signaling is used for adding callbacks upon fence completion, the 
default 'fence_default_wait' uses it, so
when it works no separate implementation is needed unless you want to do more 
than just waiting.
It's also used when fence_add_callback is called. i915 can be patched to use 
it. ;-)


I just meant enable_signaling, the other ones are fine with me. The problem 
with enable_signaling is that it's called with a spin lock held, so we can't 
sleep.

While resetting the GPU could be moved out into a timer, the problem here is that I
can't lock rdev->exclusive_lock in such situations.

This means when i915 would call into radeon to enable signaling for a fence we 
can't make sure that there is no GPU reset running on another CPU. And
touching the IRQ registers while a reset is going on is a really good recipe to 
lockup the whole system.

If you increase the irq counter on all rings before doing a gpu reset, adjust 
the state and call sw_irq_put when done, this race could never happen. Or am I
missing something?

~Maarten



Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-15 Thread Maarten Lankhorst

op 15-05-14 15:19, Christian König schreef:

Am 15.05.2014 15:04, schrieb Maarten Lankhorst:

op 15-05-14 11:42, Christian König schreef:

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.


Overriding the default wait function sounds better, please implement it this 
way.

Thanks,
Christian. 


Does this modification look sane?

Adding the timeout is on my todo list for quite some time as well, so this part 
makes sense.


+static long __radeon_fence_wait(struct fence *f, bool intr, long timeout)
+{
+struct radeon_fence *fence = to_radeon_fence(f);
+u64 target_seq[RADEON_NUM_RINGS] = {};
+
+target_seq[fence->ring] = fence->seq;
+return radeon_fence_wait_seq_timeout(fence->rdev, target_seq, intr, 
timeout);
+}

When this call is coming from outside the radeon driver you need to lock
rdev->exclusive_lock here to make sure not to interfere with a possible reset.

Ah thanks, I'll add that.


 .get_timeline_name = radeon_fence_get_timeline_name,
 .enable_signaling = radeon_fence_enable_signaling,
 .signaled = __radeon_fence_signaled,

Do we still need those callback when we implemented the wait callback?

.get_timeline_name is used for debugging (trace events).
.signaled is the non-blocking call to check if the fence is signaled or not.
.enable_signaling is used for adding callbacks upon fence completion, the 
default 'fence_default_wait' uses it, so
when it works no separate implementation is needed unless you want to do more 
than just waiting.
It's also used when fence_add_callback is called. i915 can be patched to use 
it. ;-)

~Maarten


Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-15 Thread Maarten Lankhorst

op 15-05-14 11:42, Christian König schreef:

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+ __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.


Overriding the default wait function sounds better, please implement it this 
way.

Thanks,
Christian. 


Does this modification look sane?

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index bc844f300d3f..2d415eb2834a 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -361,28 +361,35 @@ static bool radeon_fence_any_seq_signaled(struct 
radeon_device *rdev, u64 *seq)
 }
 
 /**

- * radeon_fence_wait_seq - wait for a specific sequence numbers
+ * radeon_fence_wait_seq_timeout - wait for specific sequence numbers
  *
  * @rdev: radeon device pointer
  * @target_seq: sequence number(s) we want to wait for
  * @intr: use interruptable sleep
+ * @timeout: maximum time to wait, or MAX_SCHEDULE_TIMEOUT for infinite wait
  *
  * Wait for the requested sequence number(s) to be written by any ring
  * (all asics).  Sequnce number array is indexed by ring id.
  * @intr selects whether to use interruptable (true) or non-interruptable
  * (false) sleep when waiting for the sequence number.  Helper function
  * for radeon_fence_wait_*().
- * Returns 0 if the sequence number has passed, error for all other cases.
+ * Returns remaining time if the sequence number has passed, 0 when
+ * the wait timed out, or an error for all other cases.
  * -EDEADLK is returned when a GPU lockup has been detected.
  */
-static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 *target_seq,
-bool intr)
+static int radeon_fence_wait_seq_timeout(struct radeon_device *rdev,
+u64 *target_seq, bool intr,
+long timeout)
 {
uint64_t last_seq[RADEON_NUM_RINGS];
bool signaled;
-   int i, r;
+   int i;
 
 	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {

+   long r, waited = timeout;
+
+   waited = timeout < RADEON_FENCE_JIFFIES_TIMEOUT ?
+timeout : RADEON_FENCE_JIFFIES_TIMEOUT;
 
 		/* Save current sequence values, used to check for GPU lockups */

for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -397,13 +404,15 @@ static int radeon_fence_wait_seq(struct radeon_device 
*rdev, u64 *target_seq,
if (intr) {
r = wait_event_interruptible_timeout(rdev->fence_queue, 
(
(signaled = radeon_fence_any_seq_signaled(rdev, 
target_s

Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-15 Thread Maarten Lankhorst

op 15-05-14 11:21, Christian König schreef:

Am 15.05.2014 03:06, schrieb Maarten Lankhorst:

op 14-05-14 17:29, Christian König schreef:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to me. How for example is lockup detection supposed to happen with this? 

It's not a race condition because fence_queue.lock is held when this function 
is called.

Ah, I see. That's also the reason why you moved the wake_up_all out of the 
processing function.

Correct. :-)

Lockup's a bit of a weird problem, the changes wouldn't allow core ttm code to 
handle the lockup any more,
but any driver specific wait code would still handle this. I did this by 
design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait 
function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 30*HZ for
example; it would still return and report that the function timed out.

Timeouts help with the detection of the lockup, but not at all with the 
handling of them.

What we essentially need is a wait callback into the driver that is called in 
non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout and can 
then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never gets a 
chance to wait for the hardware to become idle and so never has the opportunity 
to reset the whole thing.

You could set up a hangcheck timer like intel does, and end up with a reliable 
hangcheck detection that doesn't depend on cpu waits. :-) Or override the 
default wait function and restore the old behavior.

~Maarten



Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-14 Thread Maarten Lankhorst

On 14-05-14 17:29, Christian König wrote:

+/* did fence get signaled after we enabled the sw irq? */
+if (atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+return false;
+}
+
+fence->fence_wake.flags = 0;
+fence->fence_wake.private = NULL;
+fence->fence_wake.func = radeon_fence_check_signaled;
+__add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+fence_get(f);

That looks like a race condition to me. The fence needs to be added to the wait 
queue before the check, not after.

Apart from that, the whole approach looks like a really bad idea to me. How, for example, is lockup detection supposed to happen with this?

It's not a race condition because fence_queue.lock is held when this function 
is called.

Lockups are a bit of a weird problem: the changes wouldn't allow core ttm
code to handle a lockup any more, but any driver-specific wait code would
still handle it. I did this by design, because in future patches the wait
function may be called from outside of the radeon driver. The official wait
function takes a timeout parameter, so a lockup wouldn't be fatal if the
timeout is set to something like 30*HZ; the wait would still return and
report that it timed out.

~Maarten


[RFC PATCH v1 07/16] drm/nouveau: rework to new fence interface

2014-05-14 Thread Maarten Lankhorst
From: Maarten Lankhorst 

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/core/core/event.c |4 
 drivers/gpu/drm/nouveau/nouveau_bo.c  |6 
 drivers/gpu/drm/nouveau/nouveau_display.c |4 
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  434 -
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   20 +
 drivers/gpu/drm/nouveau/nouveau_gem.c |   17 -
 drivers/gpu/drm/nouveau/nv04_fence.c  |4 
 drivers/gpu/drm/nouveau/nv10_fence.c  |4 
 drivers/gpu/drm/nouveau/nv17_fence.c  |2 
 drivers/gpu/drm/nouveau/nv50_fence.c  |2 
 drivers/gpu/drm/nouveau/nv84_fence.c  |   11 -
 11 files changed, 329 insertions(+), 179 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/core/core/event.c 
b/drivers/gpu/drm/nouveau/core/core/event.c
index 3f3c76581a9e..167408b72099 100644
--- a/drivers/gpu/drm/nouveau/core/core/event.c
+++ b/drivers/gpu/drm/nouveau/core/core/event.c
@@ -118,14 +118,14 @@ nouveau_event_ref(struct nouveau_eventh *handler, struct nouveau_eventh **ref)
 void
 nouveau_event_trigger(struct nouveau_event *event, int index)
 {
-   struct nouveau_eventh *handler;
+   struct nouveau_eventh *handler, *next;
unsigned long flags;
 
if (WARN_ON(index >= event->index_nr))
return;
 
spin_lock_irqsave(&event->list_lock, flags);
-   list_for_each_entry(handler, &event->index[index].list, head) {
+   list_for_each_entry_safe(handler, next, &event->index[index].list, head) {
if (test_bit(NVKM_EVENT_ENABLE, &handler->flags) &&
handler->func(handler->priv, index) == NVKM_EVENT_DROP)
nouveau_event_put(handler);
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index e98af2e9a1cb..84aba3fa1bd0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -959,7 +959,7 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
}
 
mutex_lock_nested(&chan->cli->mutex, SINGLE_DEPTH_NESTING);
-   ret = nouveau_fence_sync(bo->sync_obj, chan);
+   ret = nouveau_fence_sync(nouveau_bo(bo), chan);
if (ret == 0) {
ret = drm->ttm.move(chan, bo, &bo->mem, new_mem);
if (ret == 0) {
@@ -1432,10 +1432,12 @@ nouveau_bo_fence_unref(void **sync_obj)
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-   lockdep_assert_held(&nvbo->bo.resv->lock.base);
+   struct reservation_object *resv = nvbo->bo.resv;
 
nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
nvbo->bo.sync_obj = nouveau_fence_ref(fence);
+
+   reservation_object_add_excl_fence(resv, &fence->base);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 6a0ca004bd19..eeb8762feaf0 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -660,7 +660,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
spin_unlock_irqrestore(&dev->event_lock, flags);
 
/* Synchronize with the old framebuffer */
-   ret = nouveau_fence_sync(old_bo->bo.sync_obj, chan);
+   ret = nouveau_fence_sync(old_bo, chan);
if (ret)
goto fail;
 
@@ -721,7 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
-   ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
+   ret = nouveau_fence_sync(new_bo, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index 90074d620e31..9a9e04985826 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -32,91 +32,139 @@
 #include "nouveau_drm.h"
 #include "nouveau_dma.h"
 #include "nouveau_fence.h"
+#include 
 
 #include 
 
-struct fence_work {
-   struct work_struct base;
-   struct list_head head;
-   void (*func)(void *);
-   void *data;
-};
+static const struct fence_ops nouveau_fence_ops_uevent;
+static const struct fence_ops nouveau_fence_ops_legacy;
 
 static void
 nouveau_fence_signal(struct nouveau_fence *fence)
 {
-   struct fence_work *work, *temp;
+   __fence_signal(&fence->base);
+   list_del(&fence->head);
+
+   if (fence->base.ops == &nouveau_fence_ops_uevent &&
+   fence->event.head.next) {
+   struct nouveau_event *event;
 
-   list_for_each_entry_safe(work, temp, &fence->work, head) {
-   schedule_

[RFC PATCH v1 09/16] drm/qxl: rework to new fence interface

2014-05-14 Thread Maarten Lankhorst
Final driver! \o/

This is not a proper dma_fence because the hardware may never signal
anything, so don't use dma-buf with qxl, ever.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/Makefile  |2 
 drivers/gpu/drm/qxl/qxl_cmd.c |5 -
 drivers/gpu/drm/qxl/qxl_debugfs.c |   12 ++-
 drivers/gpu/drm/qxl/qxl_drv.h |   22 ++---
 drivers/gpu/drm/qxl/qxl_fence.c   |   87 ---
 drivers/gpu/drm/qxl/qxl_kms.c |2 
 drivers/gpu/drm/qxl/qxl_object.c  |2 
 drivers/gpu/drm/qxl/qxl_release.c |  166 -
 drivers/gpu/drm/qxl/qxl_ttm.c |   97 --
 9 files changed, 220 insertions(+), 175 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

diff --git a/drivers/gpu/drm/qxl/Makefile b/drivers/gpu/drm/qxl/Makefile
index ea046ba691d2..ac0d74852e11 100644
--- a/drivers/gpu/drm/qxl/Makefile
+++ b/drivers/gpu/drm/qxl/Makefile
@@ -4,6 +4,6 @@
 
 ccflags-y := -Iinclude/drm
 
-qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_fence.o qxl_release.o
+qxl-y := qxl_drv.o qxl_kms.o qxl_display.o qxl_ttm.o qxl_fb.o qxl_object.o qxl_gem.o qxl_cmd.o qxl_image.o qxl_draw.o qxl_debugfs.o qxl_irq.o qxl_dumb.o qxl_ioctl.o qxl_release.o
 
 obj-$(CONFIG_DRM_QXL)+= qxl.o
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 45fad7b45486..97823644d347 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -620,11 +620,6 @@ static int qxl_reap_surf(struct qxl_device *qdev, struct qxl_bo *surf, bool stall
if (ret == -EBUSY)
return -EBUSY;
 
-   if (surf->fence.num_active_releases > 0 && stall == false) {
-   qxl_bo_unreserve(surf);
-   return -EBUSY;
-   }
-
if (stall)
mutex_unlock(&qdev->surf_evict_mutex);
 
diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c 
b/drivers/gpu/drm/qxl/qxl_debugfs.c
index c3c2bbdc6674..0d144e0646d6 100644
--- a/drivers/gpu/drm/qxl/qxl_debugfs.c
+++ b/drivers/gpu/drm/qxl/qxl_debugfs.c
@@ -57,11 +57,21 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data)
struct qxl_device *qdev = node->minor->dev->dev_private;
struct qxl_bo *bo;
 
+   spin_lock(&qdev->release_lock);
list_for_each_entry(bo, &qdev->gem.objects, list) {
+   struct reservation_object_list *fobj;
+   int rel;
+
+   rcu_read_lock();
+   fobj = rcu_dereference(bo->tbo.resv->fence);
+   rel = fobj ? fobj->shared_count : 0;
+   rcu_read_unlock();
+
seq_printf(m, "size %ld, pc %d, sync obj %p, num releases %d\n",
   (unsigned long)bo->gem_base.size, bo->pin_count,
-  bo->tbo.sync_obj, bo->fence.num_active_releases);
+  bo->tbo.sync_obj, rel);
}
+   spin_unlock(&qdev->release_lock);
return 0;
 }
 
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 36ed40ba773f..d547cbdebeb4 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -31,6 +31,7 @@
  * Definitions taken from spice-protocol, plus kernel driver specific bits.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -95,13 +96,6 @@ enum {
QXL_INTERRUPT_IO_CMD |\
QXL_INTERRUPT_CLIENT_MONITORS_CONFIG)
 
-struct qxl_fence {
-   struct qxl_device *qdev;
-   uint32_t num_active_releases;
-   uint32_t *release_ids;
-   struct radix_tree_root tree;
-};
-
 struct qxl_bo {
/* Protected by gem.mutex */
struct list_headlist;
@@ -113,13 +107,13 @@ struct qxl_bo {
unsignedpin_count;
void*kptr;
int type;
+
/* Constant after initialization */
struct drm_gem_object   gem_base;
bool is_primary; /* is this now a primary surface */
bool hw_surf_alloc;
struct qxl_surface surf;
uint32_t surface_id;
-   struct qxl_fence fence; /* per bo fence  - list of releases */
struct qxl_release *surf_create;
 };
 #define gem_to_qxl_bo(gobj) container_of((gobj), struct qxl_bo, gem_base)
@@ -191,6 +185,8 @@ enum {
  * spice-protocol/qxl_dev.h */
 #define QXL_MAX_RES 96
 struct qxl_release {
+   struct fence base;
+
int id;
int type;
uint32_t release_offset;
@@ -284,7 +280,11 @@ struct qxl_device {
uint8_t slot_gen_bits;
uint64_tva_slot_mask;
 
+   /* XXX: when rcu becomes available, release_lock can be killed */
+   spinlock_t  release_lock;
+   spinlock_t  fence_lock;
struct idr  release_idr;

[RFC PATCH v1 06/16] drm/ttm: kill fence_lock

2014-05-14 Thread Maarten Lankhorst
No users are left, kill it off! :D
Conversion to the reservation api is next on the list, after
that the functionality can be restored with rcu.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   25 +++---
 drivers/gpu/drm/nouveau/nouveau_display.c |6 --
 drivers/gpu/drm/nouveau/nouveau_gem.c |   16 +-
 drivers/gpu/drm/qxl/qxl_cmd.c |2 -
 drivers/gpu/drm/qxl/qxl_fence.c   |4 --
 drivers/gpu/drm/qxl/qxl_object.h  |2 -
 drivers/gpu/drm/qxl/qxl_release.c |2 -
 drivers/gpu/drm/radeon/radeon_display.c   |2 -
 drivers/gpu/drm/radeon/radeon_object.c|2 -
 drivers/gpu/drm/ttm/ttm_bo.c  |   75 +++--
 drivers/gpu/drm/ttm/ttm_bo_util.c |5 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |3 -
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|2 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c|4 --
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   17 ++-
 include/drm/ttm/ttm_bo_api.h  |5 --
 include/drm/ttm/ttm_bo_driver.h   |3 -
 17 files changed, 36 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 33eb7164525a..e98af2e9a1cb 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1196,9 +1196,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, bool evict, bool intr,
}
 
/* Fallback to software copy. */
-   spin_lock(&bo->bdev->fence_lock);
ret = ttm_bo_wait(bo, true, intr, no_wait_gpu);
-   spin_unlock(&bo->bdev->fence_lock);
if (ret == 0)
ret = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
 
@@ -1425,26 +1423,19 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
 }
 
+static void
+nouveau_bo_fence_unref(void **sync_obj)
+{
+   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+}
+
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
-   struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
-   struct nouveau_fence *old_fence = NULL;
-
lockdep_assert_held(&nvbo->bo.resv->lock.base);
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   old_fence = nvbo->bo.sync_obj;
-   nvbo->bo.sync_obj = new_fence;
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-   nouveau_fence_unref(&old_fence);
-}
-
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
+   nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
+   nvbo->bo.sync_obj = nouveau_fence_ref(fence);
 }
 
 static void *
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 61b8c3375135..6a0ca004bd19 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -721,11 +721,7 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
-   spin_lock(&new_bo->bo.bdev->fence_lock);
-   fence = nouveau_fence_ref(new_bo->bo.sync_obj);
-   spin_unlock(&new_bo->bo.bdev->fence_lock);
-   ret = nouveau_fence_sync(fence, chan);
-   nouveau_fence_unref(&fence);
+   ret = nouveau_fence_sync(new_bo->bo.sync_obj, chan);
if (ret) {
ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 6e1c58a880fe..6cd5298cbb53 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -105,9 +105,7 @@ nouveau_gem_object_unmap(struct nouveau_bo *nvbo, struct nouveau_vma *vma)
list_del(&vma->head);
 
if (mapped) {
-   spin_lock(&nvbo->bo.bdev->fence_lock);
fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
}
 
if (fence) {
@@ -432,17 +430,11 @@ retry:
 static int
 validate_sync(struct nouveau_channel *chan, struct nouveau_bo *nvbo)
 {
-   struct nouveau_fence *fence = NULL;
+   struct nouveau_fence *fence = nvbo->bo.sync_obj;
int ret = 0;
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   fence = nouveau_fence_ref(nvbo->bo.sync_obj);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
-
-   if (fence) {
+   if (fence)
ret = nouveau_fence_sync(fence, chan);
-   nouveau_fence_unref(&fence);
-   }
 
return ret;
 }
@@ -661,9 +653,7 @@ nouveau_gem_pushbuf_reloc_apply(struct nouveau_cli *cli,
   

[RFC PATCH v1 11/16] drm/vmwgfx: rework to new fence interface

2014-05-14 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |2 
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c|  299 ++
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h|   29 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |9 -
 4 files changed, 200 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index db30b790ad24..f3f8caa09cc8 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -2360,7 +2360,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
BUG_ON(fence == NULL);
 
fence_rep.handle = fence_handle;
-   fence_rep.seqno = fence->seqno;
+   fence_rep.seqno = fence->base.seqno;
vmw_update_seqno(dev_priv, &dev_priv->fifo);
fence_rep.passed_seqno = dev_priv->last_read_seqno;
}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 05b9eea8e875..5d595ca5d82a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -46,6 +46,7 @@ struct vmw_fence_manager {
bool goal_irq_on; /* Protected by @goal_irq_mutex */
bool seqno_valid; /* Protected by @lock, and may not be set to true
 without the @goal_irq_mutex held. */
+   unsigned ctx;
 };
 
 struct vmw_user_fence {
@@ -80,6 +81,12 @@ struct vmw_event_fence_action {
uint32_t *tv_usec;
 };
 
+static struct vmw_fence_manager *
+fman_from_fence(struct vmw_fence_obj *fence)
+{
+   return container_of(fence->base.lock, struct vmw_fence_manager, lock);
+}
+
 /**
  * Note on fencing subsystem usage of irqs:
  * Typically the vmw_fences_update function is called
@@ -102,25 +109,130 @@ struct vmw_event_fence_action {
  * objects with actions attached to them.
  */
 
-static void vmw_fence_obj_destroy_locked(struct kref *kref)
+static void vmw_fence_obj_destroy(struct fence *f)
 {
struct vmw_fence_obj *fence =
-   container_of(kref, struct vmw_fence_obj, kref);
+   container_of(f, struct vmw_fence_obj, base);
 
-   struct vmw_fence_manager *fman = fence->fman;
-   unsigned int num_fences;
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+   unsigned long irq_flags;
 
+   spin_lock_irqsave(&fman->lock, irq_flags);
list_del_init(&fence->head);
-   num_fences = --fman->num_fence_objects;
-   spin_unlock_irq(&fman->lock);
-   if (fence->destroy)
-   fence->destroy(fence);
-   else
-   kfree(fence);
+   --fman->num_fence_objects;
+   spin_unlock_irqrestore(&fman->lock, irq_flags);
+   fence->destroy(fence);
+}
 
-   spin_lock_irq(&fman->lock);
+static const char *vmw_fence_get_driver_name(struct fence *f)
+{
+   return "vmwgfx";
+}
+
+static const char *vmw_fence_get_timeline_name(struct fence *f)
+{
+   return "svga";
+}
+
+static bool vmw_fence_enable_signaling(struct fence *f)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
+
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+
+   __le32 __iomem *fifo_mem = fman->dev_priv->mmio_virt;
+   u32 seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
+   if (seqno - fence->base.seqno < VMW_FENCE_WRAP)
+   return false;
+
+   vmw_fifo_ping_host(fman->dev_priv, SVGA_SYNC_GENERIC);
+
+   return true;
+}
+
+struct vmwgfx_wait_cb {
+   struct fence_cb base;
+   struct task_struct *task;
+};
+
+static void
+vmwgfx_wait_cb(struct fence *fence, struct fence_cb *cb)
+{
+   struct vmwgfx_wait_cb *wait =
+   container_of(cb, struct vmwgfx_wait_cb, base);
+
+   wake_up_state(wait->task, TASK_NORMAL);
 }
 
+static void __vmw_fences_update(struct vmw_fence_manager *fman);
+
+static long vmw_fence_wait(struct fence *f, bool intr, signed long timeout)
+{
+   struct vmw_fence_obj *fence =
+   container_of(f, struct vmw_fence_obj, base);
+
+   struct vmw_fence_manager *fman = fman_from_fence(fence);
+   struct vmw_private *dev_priv = fman->dev_priv;
+   struct vmwgfx_wait_cb cb;
+   long ret = timeout;
+   unsigned long irq_flags;
+
+   if (likely(vmw_fence_obj_signaled(fence)))
+   return timeout;
+
+   vmw_fifo_ping_host(dev_priv, SVGA_SYNC_GENERIC);
+   vmw_seqno_waiter_add(dev_priv);
+
+   spin_lock_irqsave(f->lock, irq_flags);
+
+   if (intr && signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   goto out;
+   }
+
+   cb.base.func = vmwgfx_wait_cb;
+   cb.task = current;
+   list_add(&cb.base.node, &f->cb_list);

[RFC PATCH v1 14/16] drm/radeon: use rcu waits in some ioctls

2014-05-14 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon_gem.c |   19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c 
b/drivers/gpu/drm/radeon/radeon_gem.c
index d09650c1d720..7ba883843668 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -107,9 +107,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
}
if (domain == RADEON_GEM_DOMAIN_CPU) {
/* Asking for cpu access wait for object idle */
-   r = radeon_bo_wait(robj, NULL, false);
-   if (r) {
-   printk(KERN_ERR "Failed to wait for object !\n");
+   r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+   if (!r)
+   r = -EBUSY;
+
+   if (r < 0 && r != -EINTR) {
+   printk(KERN_ERR "Failed to wait for object: %i\n", r);
return r;
}
}
@@ -357,14 +360,20 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
struct drm_radeon_gem_wait_idle *args = data;
struct drm_gem_object *gobj;
struct radeon_bo *robj;
-   int r;
+   int r = 0;
+   long ret;
 
gobj = drm_gem_object_lookup(dev, filp, args->handle);
if (gobj == NULL) {
return -ENOENT;
}
robj = gem_to_radeon_bo(gobj);
-   r = radeon_bo_wait(robj, NULL, false);
+   ret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);
+   if (ret == 0)
+   r = -EBUSY;
+   else if (ret < 0)
+   r = ret;
+
/* callback hw specific functions if any */
if (rdev->asic->ioctl_wait_idle)
robj->rdev->asic->ioctl_wait_idle(rdev, robj);
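
Note the changed return convention in both hunks above: radeon_bo_wait()
returned 0 on success, while reservation_object_wait_timeout_rcu() returns
the remaining timeout on success, so the result has to be folded back into
an errno. Spelled out (illustration only):

        long lret;

        lret = reservation_object_wait_timeout_rcu(resv, true, true, 30 * HZ);
        if (lret == 0)
                return -EBUSY;          /* timed out, object still busy */
        if (lret < 0)
                return lret;            /* -ERESTARTSYS or another error */
        return 0;                       /* signaled within the timeout */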



[RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

2014-05-14 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon.h|   15 +--
 drivers/gpu/drm/radeon/radeon_device.c |1 
 drivers/gpu/drm/radeon/radeon_fence.c  |  189 +---
 3 files changed, 153 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 68528619834a..a7d839a158ae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -64,6 +64,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -113,9 +114,6 @@ extern int radeon_hard_reset;
 #define RADEONFB_CONN_LIMIT4
 #define RADEON_BIOS_NUM_SCRATCH8
 
-/* fence seq are set to this number when signaled */
-#define RADEON_FENCE_SIGNALED_SEQ  0LL
-
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
 #define RADEON_RING_TYPE_GFX_INDEX 0
@@ -347,12 +345,15 @@ struct radeon_fence_driver {
 };
 
 struct radeon_fence {
+   struct fence base;
+
struct radeon_device*rdev;
-   struct kref kref;
/* protected by radeon_fence.lock */
uint64_tseq;
/* RB, DMA, etc. */
unsignedring;
+
+   wait_queue_t fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -2256,6 +2257,7 @@ struct radeon_device {
struct radeon_mman  mman;
struct radeon_fence_driver  fence_drv[RADEON_NUM_RINGS];
wait_queue_head_t   fence_queue;
+   unsignedfence_context;
struct mutexring_lock;
struct radeon_ring  ring[RADEON_NUM_RINGS];
boolib_pool_ready;
@@ -2346,11 +2348,6 @@ u32 cik_mm_rdoorbell(struct radeon_device *rdev, u32 index);
 void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 
 /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
  * Registers read & write functions.
  */
 #define RREG8(reg) readb((rdev->rmmio) + (reg))
diff --git a/drivers/gpu/drm/radeon/radeon_device.c 
b/drivers/gpu/drm/radeon/radeon_device.c
index 0e770bbf7e29..501d0cf9eb8b 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1175,6 +1175,7 @@ int radeon_device_init(struct radeon_device *rdev,
for (i = 0; i < RADEON_NUM_RINGS; i++) {
rdev->ring[i].idx = i;
}
+   rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 
0x%04X:0x%04X).\n",
radeon_family_name[rdev->family], pdev->vendor, pdev->device,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index a77b1c13ea43..bc844f300d3f 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -39,6 +39,15 @@
 #include "radeon.h"
 #include "radeon_trace.h"
 
+static const struct fence_ops radeon_fence_ops;
+
+#define to_radeon_fence(p) \
+   ({  \
+   struct radeon_fence *__f;   \
+   __f = container_of((p), struct radeon_fence, base); \
+   __f->base.ops == &radeon_fence_ops ? __f : NULL;\
+   })
+
 /*
  * Fences
  * Fences mark an event in the GPUs pipeline and are used
@@ -111,30 +120,55 @@ int radeon_fence_emit(struct radeon_device *rdev,
  struct radeon_fence **fence,
  int ring)
 {
+   u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
/* we are protected by the ring emission mutex */
*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
if ((*fence) == NULL) {
return -ENOMEM;
}
-   kref_init(&((*fence)->kref));
-   (*fence)->rdev = rdev;
-   (*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
(*fence)->ring = ring;
+   __fence_init(&(*fence)->base, &radeon_fence_ops,
+&rdev->fence_queue.lock, rdev->fence_context + ring, seq);
+   (*fence)->rdev = rdev;
+   (*fence)->seq = seq;
radeon_fence_ring_emit(rdev, ring, *fence);
trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
return 0;
 }
 
 /**
- * radeon_fence_process - process a fence
+ * radeon_fence_check_signaled - callback from fence_queue
  *
- * @rdev: radeon_device pointer
- * @ring: ring index the fence is associated with
- *
- * Checks the current fence value and wakes the fence queue
- * if the sequence number has increased (all asics).
+ * this function is called with fence_queue lock held, which is 

[RFC PATCH v1 10/16] drm/vmwgfx: get rid of different types of fence_flags entirely

2014-05-14 Thread Maarten Lankhorst
Only one type was ever used. This is needed to simplify the fence
support in the next commit.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c  |5 +--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h |1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |   14 ++---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c   |   50 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h   |8 +
 5 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 4a36bb1dc525..f15718cc631d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -792,15 +792,12 @@ static int vmw_sync_obj_flush(void *sync_obj)
 
 static bool vmw_sync_obj_signaled(void *sync_obj)
 {
-   return  vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj,
-  DRM_VMW_FENCE_FLAG_EXEC);
-
+   return vmw_fence_obj_signaled((struct vmw_fence_obj *) sync_obj);
 }
 
 static int vmw_sync_obj_wait(void *sync_obj, bool lazy, bool interruptible)
 {
return vmw_fence_obj_wait((struct vmw_fence_obj *) sync_obj,
- DRM_VMW_FENCE_FLAG_EXEC,
  lazy, interruptible,
  VMW_FENCE_WAIT_TIMEOUT);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index 6b252a887ae2..f217e9723b9e 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -332,7 +332,6 @@ struct vmw_sw_context{
uint32_t *cmd_bounce;
uint32_t cmd_bounce_size;
struct list_head resource_list;
-   uint32_t fence_flags;
struct ttm_buffer_object *cur_query_bo;
struct list_head res_relocations;
uint32_t *buf_start;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index f8b25bc4e634..db30b790ad24 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -350,8 +350,6 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
vval_buf->validate_as_mob = validate_as_mob;
}
 
-   sw_context->fence_flags |= DRM_VMW_FENCE_FLAG_EXEC;
-
if (p_val_node)
*p_val_node = val_node;
 
@@ -2308,13 +2306,9 @@ int vmw_execbuf_fence_commands(struct drm_file *file_priv,
 
if (p_handle != NULL)
ret = vmw_user_fence_create(file_priv, dev_priv->fman,
-   sequence,
-   DRM_VMW_FENCE_FLAG_EXEC,
-   p_fence, p_handle);
+   sequence, p_fence, p_handle);
else
-   ret = vmw_fence_create(dev_priv->fman, sequence,
-  DRM_VMW_FENCE_FLAG_EXEC,
-  p_fence);
+   ret = vmw_fence_create(dev_priv->fman, sequence, p_fence);
 
if (unlikely(ret != 0 && !synced)) {
(void) vmw_fallback_wait(dev_priv, false, false,
@@ -2387,8 +2381,7 @@ vmw_execbuf_copy_fence_user(struct vmw_private *dev_priv,
ttm_ref_object_base_unref(vmw_fp->tfile,
  fence_handle, TTM_REF_USAGE);
DRM_ERROR("Fence copy error. Syncing.\n");
-   (void) vmw_fence_obj_wait(fence, fence->signal_mask,
- false, false,
+   (void) vmw_fence_obj_wait(fence, false, false,
  VMW_FENCE_WAIT_TIMEOUT);
}
 }
@@ -2438,7 +2431,6 @@ int vmw_execbuf_process(struct drm_file *file_priv,
sw_context->fp = vmw_fpriv(file_priv);
sw_context->cur_reloc = 0;
sw_context->cur_val_buf = 0;
-   sw_context->fence_flags = 0;
INIT_LIST_HEAD(&sw_context->resource_list);
sw_context->cur_query_bo = dev_priv->pinned_bo;
sw_context->last_query_ctx = NULL;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 436b013b4231..05b9eea8e875 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -207,9 +207,7 @@ void vmw_fence_manager_takedown(struct vmw_fence_manager *fman)
 }
 
 static int vmw_fence_obj_init(struct vmw_fence_manager *fman,
- struct vmw_fence_obj *fence,
- u32 seqno,
- uint32_t mask,
+ struct vmw_fence_obj *fence, u32 seqno,
  void (*destroy) (struct vmw_fence_obj *fence))
 {
unsigned long irq_flags;
@@ -220,7 +218,6 @@ static int vmw_fence_obj_init(struct vmw_fence_manager *fman,

[RFC PATCH v1 12/16] drm/ttm: flip the switch, and convert to dma_fence

2014-05-14 Thread Maarten Lankhorst

---
 drivers/gpu/drm/nouveau/nouveau_bo.c |   48 +---
 drivers/gpu/drm/nouveau/nouveau_fence.c  |   24 +---
 drivers/gpu/drm/nouveau/nouveau_fence.h  |2 
 drivers/gpu/drm/nouveau/nouveau_gem.c|   16 ++-
 drivers/gpu/drm/qxl/qxl_debugfs.c|6 +
 drivers/gpu/drm/qxl/qxl_drv.h|2 
 drivers/gpu/drm/qxl/qxl_kms.c|1 
 drivers/gpu/drm/qxl/qxl_object.h |4 -
 drivers/gpu/drm/qxl/qxl_release.c|3 -
 drivers/gpu/drm/qxl/qxl_ttm.c|  104 --
 drivers/gpu/drm/radeon/radeon_cs.c   |   10 +-
 drivers/gpu/drm/radeon/radeon_display.c  |   18 +++
 drivers/gpu/drm/radeon/radeon_object.c   |4 -
 drivers/gpu/drm/radeon/radeon_ttm.c  |   34 --
 drivers/gpu/drm/radeon/radeon_uvd.c  |8 +
 drivers/gpu/drm/radeon/radeon_vm.c   |2 
 drivers/gpu/drm/ttm/ttm_bo.c |  171 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c|   23 +---
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   10 --
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c   |   40 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   14 +-
 include/drm/ttm/ttm_bo_api.h |2 
 include/drm/ttm/ttm_bo_driver.h  |   26 -
 include/drm/ttm/ttm_execbuf_util.h   |   10 +-
 24 files changed, 197 insertions(+), 385 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 84aba3fa1bd0..5b8ccc39a282 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -92,13 +92,13 @@ nv10_bo_get_tile_region(struct drm_device *dev, int i)
 
 static void
 nv10_bo_put_tile_region(struct drm_device *dev, struct nouveau_drm_tile *tile,
-   struct nouveau_fence *fence)
+   struct fence *fence)
 {
struct nouveau_drm *drm = nouveau_drm(dev);
 
if (tile) {
spin_lock(&drm->tile.lock);
-   tile->fence = nouveau_fence_ref(fence);
+   tile->fence = nouveau_fence_ref((struct nouveau_fence *)fence);
tile->used = false;
spin_unlock(&drm->tile.lock);
}
@@ -965,7 +965,8 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
-   ret = ttm_bo_move_accel_cleanup(bo, fence,
+   ret = ttm_bo_move_accel_cleanup(bo,
+   &fence->base,
evict,
no_wait_gpu,
new_mem);
@@ -1151,8 +1152,9 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo,
 {
struct nouveau_drm *drm = nouveau_bdev(bo->bdev);
struct drm_device *dev = drm->dev;
+   struct fence *fence = reservation_object_get_excl(bo->resv);
 
-   nv10_bo_put_tile_region(dev, *old_tile, bo->sync_obj);
+   nv10_bo_put_tile_region(dev, *old_tile, fence);
*old_tile = new_tile;
 }
 
@@ -1423,47 +1425,14 @@ nouveau_ttm_tt_unpopulate(struct ttm_tt *ttm)
ttm_pool_unpopulate(ttm);
 }
 
-static void
-nouveau_bo_fence_unref(void **sync_obj)
-{
-   nouveau_fence_unref((struct nouveau_fence **)sync_obj);
-}
-
 void
 nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
 {
struct reservation_object *resv = nvbo->bo.resv;
 
-   nouveau_bo_fence_unref(&nvbo->bo.sync_obj);
-   nvbo->bo.sync_obj = nouveau_fence_ref(fence);
-
reservation_object_add_excl_fence(resv, &fence->base);
 }
 
-static void *
-nouveau_bo_fence_ref(void *sync_obj)
-{
-   return nouveau_fence_ref(sync_obj);
-}
-
-static bool
-nouveau_bo_fence_signalled(void *sync_obj)
-{
-   return nouveau_fence_done(sync_obj);
-}
-
-static int
-nouveau_bo_fence_wait(void *sync_obj, bool lazy, bool intr)
-{
-   return nouveau_fence_wait(sync_obj, lazy, intr);
-}
-
-static int
-nouveau_bo_fence_flush(void *sync_obj)
-{
-   return 0;
-}
-
 struct ttm_bo_driver nouveau_bo_driver = {
.ttm_tt_create = &nouveau_ttm_tt_create,
.ttm_tt_populate = &nouveau_ttm_tt_populate,
@@ -1474,11 +1443,6 @@ struct ttm_bo_driver nouveau_bo_driver = {
.move_notify = nouveau_bo_move_ntfy,
.move = nouveau_bo_move,
.verify_access = nouveau_bo_verify_access,
-   .sync_obj_signaled = nouveau_bo_fence_signalled,
-   .sync_obj_wait = nouveau_bo_fence_wait,
-   .sync_obj_flush = nouveau_bo_fence_flush,
-   .sync_obj_unref = nouveau_bo_fence_unref,
-   .sync_obj_ref = nouveau_bo_fence_ref,
.fault_reserve_notify = &nouveau_ttm_fault_reserve_notify,
.io_mem_reserve = &nouveau_ttm_io_mem_reserve,
.io_mem_free = &no

[RFC PATCH v1 16/16] drm/ttm: use rcu in core ttm

2014-05-14 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/ttm/ttm_bo.c |   76 +++---
 1 file changed, 13 insertions(+), 63 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 31c4a6dd722d..6fe1f4bf37ed 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -466,66 +466,6 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo)
  ((HZ / 100) < 1) ? 1 : HZ / 100);
 }
 
-static int ttm_bo_unreserve_and_wait(struct ttm_buffer_object *bo,
-bool interruptible)
-{
-   struct ttm_bo_global *glob = bo->glob;
-   struct reservation_object_list *fobj;
-   struct fence *excl = NULL;
-   struct fence **shared = NULL;
-   u32 shared_count = 0, i;
-   int ret = 0;
-
-   fobj = reservation_object_get_list(bo->resv);
-   if (fobj && fobj->shared_count) {
-   shared = kmalloc(sizeof(*shared) * fobj->shared_count,
-GFP_KERNEL);
-
-   if (!shared) {
-   ret = -ENOMEM;
-   __ttm_bo_unreserve(bo);
-   spin_unlock(&glob->lru_lock);
-   return ret;
-   }
-
-   for (i = 0; i < fobj->shared_count; ++i) {
-   if (!fence_is_signaled(fobj->shared[i])) {
-   fence_get(fobj->shared[i]);
-   shared[shared_count++] = fobj->shared[i];
-   }
-   }
-   if (!shared_count) {
-   kfree(shared);
-   shared = NULL;
-   }
-   }
-
-   excl = reservation_object_get_excl(bo->resv);
-   if (excl && !fence_is_signaled(excl))
-   fence_get(excl);
-   else
-   excl = NULL;
-
-   __ttm_bo_unreserve(bo);
-   spin_unlock(&glob->lru_lock);
-
-   if (excl) {
-   ret = fence_wait(excl, interruptible);
-   fence_put(excl);
-   }
-
-   if (shared_count > 0) {
-   for (i = 0; i < shared_count; ++i) {
-   if (!ret)
-   ret = fence_wait(shared[i], interruptible);
-   fence_put(shared[i]);
-   }
-   kfree(shared);
-   }
-
-   return ret;
-}
-
 /**
  * function ttm_bo_cleanup_refs_and_unlock
  * If bo idle, remove from delayed- and lru lists, and unref.
@@ -549,9 +489,19 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
ret = ttm_bo_wait(bo, false, false, true);
 
if (ret && !no_wait_gpu) {
-   ret = ttm_bo_unreserve_and_wait(bo, interruptible);
-   if (ret)
-   return ret;
+   long lret;
+   ww_mutex_unlock(&bo->resv->lock);
+   spin_unlock(&glob->lru_lock);
+
+   lret = reservation_object_wait_timeout_rcu(bo->resv,
+  true,
+  interruptible,
+  30 * HZ);
+
+   if (lret < 0)
+   return lret;
+   else if (lret == 0)
+   return -EBUSY;
 
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);



[RFC PATCH v1 13/16] drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep

2014-05-14 Thread Maarten Lankhorst
With the conversion to the reservation api this should be safe.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index 4beaa897adad..c2ca894f6507 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -863,33 +863,29 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
+   bool write = !!(req->flags & NOUVEAU_GEM_CPU_PREP_WRITE);
int ret;
-   struct nouveau_fence *fence = NULL;
 
gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);
 
-   ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
-   if (!ret) {
-   ret = ttm_bo_wait(&nvbo->bo, true, true, true);
-   if (!no_wait && ret) {
-   struct fence *excl;
-
-   excl = reservation_object_get_excl(nvbo->bo.resv);
-   fence = nouveau_fence_ref((struct nouveau_fence *)excl);
-   }
+   if (no_wait)
+   ret = reservation_object_test_signaled_rcu(nvbo->bo.resv, write) ? 0 : -EBUSY;
+   else {
+   long lret;
 
-   ttm_bo_unreserve(&nvbo->bo);
+   lret = reservation_object_wait_timeout_rcu(nvbo->bo.resv, write, true, 30 * HZ);
+   if (!lret)
+   ret = -EBUSY;
+   else if (lret > 0)
+   ret = 0;
+   else
+   ret = lret;
}
drm_gem_object_unreference_unlocked(gem);
 
-   if (fence) {
-   ret = nouveau_fence_wait(fence, true, no_wait);
-   nouveau_fence_unref(&fence);
-   }
-
return ret;
 }
 



[RFC PATCH v1 15/16] drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab

2014-05-14 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |   17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 20a1a866ceeb..79e950df3018 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -567,13 +567,16 @@ static int vmw_user_dmabuf_synccpu_grab(struct vmw_user_dma_buffer *user_bo,
int ret;
 
if (flags & drm_vmw_synccpu_allow_cs) {
-   ret = ttm_bo_reserve(bo, true, !!(flags & drm_vmw_synccpu_dontblock), false, 0);
-   if (!ret) {
-   ret = ttm_bo_wait(bo, false, true, !!(flags & drm_vmw_synccpu_dontblock));
-   ttm_bo_unreserve(bo);
-   }
-   return ret;
+   long lret;
+   if (flags & drm_vmw_synccpu_dontblock)
+   return reservation_object_test_signaled_rcu(bo->resv, true) ? 0 : -EBUSY;
+
+   lret = reservation_object_wait_timeout_rcu(bo->resv, true, true, MAX_SCHEDULE_TIMEOUT);
+   if (!lret)
+   return -EBUSY;
+   else if (lret < 0)
+   return lret;
+   return 0;
}
 
ret = ttm_bo_synccpu_write_grab



[RFC PATCH v1 04/16] drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence

2014-05-14 Thread Maarten Lankhorst
This will ensure we always hold the required lock when calling those functions.
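
The caller-side contract this enforces looks roughly like this (a sketch
distilled from the page-flip hunk below, not a new API):

        ret = ttm_bo_reserve(&nvbo->bo, true, false, false, NULL);
        if (ret)
                return ret;

        /* reservation held: both calls below are now legal */
        ret = nouveau_fence_sync(fence, chan);
        if (ret == 0)
                nouveau_bo_fence(nvbo, fence);

        ttm_bo_unreserve(&nvbo->bo);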
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |2 ++
 drivers/gpu/drm/nouveau/nouveau_display.c |   17 +
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index b6dc85c614be..33eb7164525a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1431,6 +1431,8 @@ nouveau_bo_fence(struct nouveau_bo *nvbo, struct nouveau_fence *fence)
struct nouveau_fence *new_fence = nouveau_fence_ref(fence);
struct nouveau_fence *old_fence = NULL;
 
+   lockdep_assert_held(&nvbo->bo.resv->lock.base);
+
spin_lock(&nvbo->bo.bdev->fence_lock);
old_fence = nvbo->bo.sync_obj;
nvbo->bo.sync_obj = new_fence;
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index da764a4ed958..61b8c3375135 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -716,6 +716,9 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
}
 
mutex_lock(&chan->cli->mutex);
+   ret = ttm_bo_reserve(&new_bo->bo, true, false, false, NULL);
+   if (ret)
+   goto fail_unpin;
 
/* synchronise rendering channel with the kernel's channel */
spin_lock(&new_bo->bo.bdev->fence_lock);
@@ -723,12 +726,18 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, struct drm_framebuffer *fb,
spin_unlock(&new_bo->bo.bdev->fence_lock);
ret = nouveau_fence_sync(fence, chan);
nouveau_fence_unref(&fence);
-   if (ret)
+   if (ret) {
+   ttm_bo_unreserve(&new_bo->bo);
goto fail_unpin;
+   }
 
-   ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
-   if (ret)
-   goto fail_unpin;
+   if (new_bo != old_bo) {
+   ttm_bo_unreserve(&new_bo->bo);
+
+   ret = ttm_bo_reserve(&old_bo->bo, true, false, false, NULL);
+   if (ret)
+   goto fail_unpin;
+   }
 
/* Initialize a page flip struct */
*s = (struct nouveau_page_flip_state)



[RFC PATCH v1 02/16] drm/ttm: kill off some members to ttm_validate_buffer

2014-05-14 Thread Maarten Lankhorst
This reorders the list to keep track of which buffers are reserved,
so previous members are always unreserved.

This gets rid of some bookkeeping that's no longer needed,
while simplifying the code some.
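
The resulting invariant is easiest to see in the error path; a condensed
sketch of the reserve loop together with the new reverse-backoff helper from
the diff below (not a verbatim copy):

        list_for_each_entry(entry, list, head) {
                ret = __ttm_bo_reserve(entry->bo, intr, (ticket == NULL),
                                       true, ticket);
                if (ret) {
                        /* everything before 'entry' is reserved, everything
                         * after it is not, so unwinding is a reverse walk
                         * from the point of failure */
                        ttm_eu_backoff_reservation_reverse(list, entry);
                        break;
                }
        }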

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/qxl_release.c   |1 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c  |  142 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c |1 
 include/drm/ttm/ttm_execbuf_util.h  |3 -
 4 files changed, 50 insertions(+), 97 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index 2b43e5deb051..e85c4d274dc0 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -350,7 +350,6 @@ void qxl_release_fence_buffer_objects(struct qxl_release 
*release)
 
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
-   entry->reserved = false;
}
spin_unlock(&bdev->fence_lock);
spin_unlock(&glob->lru_lock);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c 
b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index 39a11bbd2bac..6db47a72667e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -32,20 +32,12 @@
 #include 
 #include 
 
-static void ttm_eu_backoff_reservation_locked(struct list_head *list)
+static void ttm_eu_backoff_reservation_reverse(struct list_head *list,
+ struct ttm_validate_buffer *entry)
 {
-   struct ttm_validate_buffer *entry;
-
-   list_for_each_entry(entry, list, head) {
+   list_for_each_entry_continue_reverse(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
-   if (!entry->reserved)
-   continue;
 
-   entry->reserved = false;
-   if (entry->removed) {
-   ttm_bo_add_to_lru(bo);
-   entry->removed = false;
-   }
__ttm_bo_unreserve(bo);
}
 }
@@ -56,27 +48,9 @@ static void ttm_eu_del_from_lru_locked(struct list_head *list)
 
list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
-   if (!entry->reserved)
-   continue;
+   unsigned put_count = ttm_bo_del_from_lru(bo);
 
-   if (!entry->removed) {
-   entry->put_count = ttm_bo_del_from_lru(bo);
-   entry->removed = true;
-   }
-   }
-}
-
-static void ttm_eu_list_ref_sub(struct list_head *list)
-{
-   struct ttm_validate_buffer *entry;
-
-   list_for_each_entry(entry, list, head) {
-   struct ttm_buffer_object *bo = entry->bo;
-
-   if (entry->put_count) {
-   ttm_bo_list_ref_sub(bo, entry->put_count, true);
-   entry->put_count = 0;
-   }
+   ttm_bo_list_ref_sub(bo, put_count, true);
}
 }
 
@@ -91,11 +65,18 @@ void ttm_eu_backoff_reservation(struct ww_acquire_ctx *ticket,
 
entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;
+
spin_lock(&glob->lru_lock);
-   ttm_eu_backoff_reservation_locked(list);
+   list_for_each_entry(entry, list, head) {
+   struct ttm_buffer_object *bo = entry->bo;
+
+   ttm_bo_add_to_lru(bo);
+   __ttm_bo_unreserve(bo);
+   }
+   spin_unlock(&glob->lru_lock);
+
if (ticket)
ww_acquire_fini(ticket);
-   spin_unlock(&glob->lru_lock);
 }
 EXPORT_SYMBOL(ttm_eu_backoff_reservation);
 
@@ -121,64 +102,55 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
if (list_empty(list))
return 0;
 
-   list_for_each_entry(entry, list, head) {
-   entry->reserved = false;
-   entry->put_count = 0;
-   entry->removed = false;
-   }
-
entry = list_first_entry(list, struct ttm_validate_buffer, head);
glob = entry->bo->glob;
 
if (ticket)
ww_acquire_init(ticket, &reservation_ww_class);
-retry:
+
list_for_each_entry(entry, list, head) {
struct ttm_buffer_object *bo = entry->bo;
 
-   /* already slowpath reserved? */
-   if (entry->reserved)
-   continue;
-
ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
   ticket);
+   if (!ret && unlikely(atomic_read(&bo->cpu_writers) > 0)) {
+   __ttm_bo_unreserve(bo);
 
-   if (ret == -EDEADLK) {
-   /* uh oh, we lost out, drop every reservation and try
-* to only reserve th

[RFC PATCH v1 05/16] drm/ttm: call ttm_bo_wait while inside a reservation

2014-05-14 Thread Maarten Lankhorst
This is the last remaining function that doesn't use the reservation
lock completely to fence off access to a buffer.
---
 drivers/gpu/drm/ttm/ttm_bo.c |   25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4ab9f7171c4f..d7d34336f108 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -502,17 +502,6 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
if (ret)
return ret;
 
-   /*
-* remove sync_obj with ttm_bo_wait, the wait should be
-* finished, and no new wait object should have been added.
-*/
-   spin_lock(&bdev->fence_lock);
-   ret = ttm_bo_wait(bo, false, false, true);
-   WARN_ON(ret);
-   spin_unlock(&bdev->fence_lock);
-   if (ret)
-   return ret;
-
spin_lock(&glob->lru_lock);
ret = __ttm_bo_reserve(bo, false, true, false, 0);
 
@@ -528,8 +517,16 @@ static int ttm_bo_cleanup_refs_and_unlock(struct ttm_buffer_object *bo,
spin_unlock(&glob->lru_lock);
return 0;
}
-   } else
-   spin_unlock(&bdev->fence_lock);
+
+   /*
+* remove sync_obj with ttm_bo_wait, the wait should be
+* finished, and no new wait object should have been added.
+*/
+   spin_lock(&bdev->fence_lock);
+   ret = ttm_bo_wait(bo, false, false, true);
+   WARN_ON(ret);
+   }
+   spin_unlock(&bdev->fence_lock);
 
if (ret || unlikely(list_empty(&bo->ddestroy))) {
__ttm_bo_unreserve(bo);
@@ -1539,6 +1536,8 @@ int ttm_bo_wait(struct ttm_buffer_object *bo,
void *sync_obj;
int ret = 0;
 
+   lockdep_assert_held(&bo->resv->lock.base);
+
if (likely(bo->sync_obj == NULL))
return 0;
 



[RFC PATCH v1 03/16] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

2014-05-14 Thread Maarten Lankhorst
Apart from some code inside ttm itself and nouveau_bo_vma_del,
this is the only place where ttm_bo_wait is used without a reservation.
Fix this so we can remove the fence_lock later on.

After the switch to rcu the reservation lock will be
removed again.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c |   22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index c90c0dc0afe8..6e1c58a880fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -886,17 +886,31 @@ nouveau_gem_ioctl_cpu_prep(struct drm_device *dev, void *data,
struct drm_gem_object *gem;
struct nouveau_bo *nvbo;
bool no_wait = !!(req->flags & NOUVEAU_GEM_CPU_PREP_NOWAIT);
-   int ret = -EINVAL;
+   int ret;
+   struct nouveau_fence *fence = NULL;
 
gem = drm_gem_object_lookup(dev, file_priv, req->handle);
if (!gem)
return -ENOENT;
nvbo = nouveau_gem_object(gem);
 
-   spin_lock(&nvbo->bo.bdev->fence_lock);
-   ret = ttm_bo_wait(&nvbo->bo, true, true, no_wait);
-   spin_unlock(&nvbo->bo.bdev->fence_lock);
+   ret = ttm_bo_reserve(&nvbo->bo, true, false, false, 0);
+   if (!ret) {
+   spin_lock(&nvbo->bo.bdev->fence_lock);
+   ret = ttm_bo_wait(&nvbo->bo, true, true, true);
+   if (!no_wait && ret)
+   fence = nouveau_fence_ref(nvbo->bo.sync_obj);
+   spin_unlock(&nvbo->bo.bdev->fence_lock);
+
+   ttm_bo_unreserve(&nvbo->bo);
+   }
drm_gem_object_unreference_unlocked(gem);
+
+   if (fence) {
+   ret = nouveau_fence_wait(fence, true, no_wait);
+   nouveau_fence_unref(&fence);
+   }
+
return ret;
 }
 



[RFC PATCH v1 01/16] drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers

2014-05-14 Thread Maarten Lankhorst
It seems some drivers really want this as a parameter,
like vmwgfx.
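
Callers then choose the behaviour per call site; compressed from the hunks
below (illustration of intent only):

        /* user-triggered paths: let signals interrupt the reservation */
        r = ttm_eu_reserve_buffers(&ticket, &head, true);

        /* paths that must not fail with -ERESTARTSYS, e.g. qxl with no_intr */
        r = ttm_eu_reserve_buffers(&ticket, &head, false);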

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/qxl/qxl_release.c|2 +-
 drivers/gpu/drm/radeon/radeon_object.c   |2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c  |2 +-
 drivers/gpu/drm/radeon/radeon_vm.c   |2 +-
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   |   22 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |7 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |2 +-
 include/drm/ttm/ttm_execbuf_util.h   |9 +
 8 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index 14e776f1d14e..2b43e5deb051 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -159,7 +159,7 @@ int qxl_release_reserve_list(struct qxl_release *release, bool no_intr)
if (list_is_singular(&release->bos))
return 0;
 
-   ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos);
+   ret = ttm_eu_reserve_buffers(&release->ticket, &release->bos, !no_intr);
if (ret)
return ret;
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index 19bec0dbfa38..51bf80cdce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -438,7 +438,7 @@ int radeon_bo_list_validate(struct radeon_device *rdev,
u64 bytes_moved = 0, initial_bytes_moved;
u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
 
-   r = ttm_eu_reserve_buffers(ticket, head);
+   r = ttm_eu_reserve_buffers(ticket, head, true);
if (unlikely(r != 0)) {
return r;
}
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c 
b/drivers/gpu/drm/radeon/radeon_uvd.c
index 1b65ae2433cd..2f93fef15aab 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -620,7 +620,7 @@ static int radeon_uvd_send_msg(struct radeon_device *rdev,
INIT_LIST_HEAD(&head);
list_add(&tv.head, &head);
 
-   r = ttm_eu_reserve_buffers(&ticket, &head);
+   r = ttm_eu_reserve_buffers(&ticket, &head, true);
if (r)
return r;
 
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c 
b/drivers/gpu/drm/radeon/radeon_vm.c
index 2aae6ce49d32..f4fd72477a71 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -364,7 +364,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
 INIT_LIST_HEAD(&head);
 list_add(&tv.head, &head);
 
-r = ttm_eu_reserve_buffers(&ticket, &head);
+r = ttm_eu_reserve_buffers(&ticket, &head, true);
 if (r)
return r;
 
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c 
b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index e8dac8758528..39a11bbd2bac 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -112,7 +112,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
  */
 
 int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
-  struct list_head *list)
+  struct list_head *list, bool intr)
 {
struct ttm_bo_global *glob;
struct ttm_validate_buffer *entry;
@@ -140,7 +140,7 @@ retry:
if (entry->reserved)
continue;
 
-   ret = __ttm_bo_reserve(bo, true, (ticket == NULL), true,
+   ret = __ttm_bo_reserve(bo, intr, (ticket == NULL), true,
   ticket);
 
if (ret == -EDEADLK) {
@@ -153,13 +153,17 @@ retry:
ttm_eu_backoff_reservation_locked(list);
spin_unlock(&glob->lru_lock);
ttm_eu_list_ref_sub(list);
-   ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
-  ticket);
-   if (unlikely(ret != 0)) {
-   if (ret == -EINTR)
-   ret = -ERESTARTSYS;
-   goto err_fini;
-   }
+
+   if (intr) {
+   ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock, ticket);
+   if (unlikely(ret != 0)) {
+   if (ret == -EINTR)
+   ret = -ERESTARTSYS;
+   goto err_fini;
+   }
+   } else
+   ww_mutex_lock_slow(&bo->resv->lock, ticket);
 

[RFC PATCH v1 00/16] Convert all ttm drivers to use the new reservation interface

2014-05-14 Thread Maarten Lankhorst
This series depends on the previously posted reservation api patches.
Two of them are not yet in the for-next-fences branch of
git://git.linaro.org/people/sumit.semwal/linux-3.x.git

The missing patches are still in my vmwgfx_wip branch at
git://people.freedesktop.org/~mlankhorst/linux

All ttm drivers are converted to the fence api, fence_lock is removed
and rcu is used in its place.

qxl is the first driver to use shared fence slots, but when these patches
are applied it's easy to convert nouveau too. I've done it as part of the
cross-device gpu synchronization patch series.

---

Maarten Lankhorst (16):
  drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
  drm/ttm: kill off some members to ttm_validate_buffer
  drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
  drm/nouveau: require reservations for nouveau_fence_sync and 
nouveau_bo_fence
  drm/ttm: call ttm_bo_wait while inside a reservation
  drm/ttm: kill fence_lock
  drm/nouveau: rework to new fence interface
  drm/radeon: use common fence implementation for fences
  drm/qxl: rework to new fence interface
  drm/vmwgfx: get rid of different types of fence_flags entirely
  drm/vmwgfx: rework to new fence interface
  drm/ttm: flip the switch, and convert to dma_fence
  drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
  drm/radeon: use rcu waits in some ioctls
  drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
  drm/ttm: use rcu in core ttm

 drivers/gpu/drm/nouveau/core/core/event.c |4 
 drivers/gpu/drm/nouveau/nouveau_bo.c  |   59 +---
 drivers/gpu/drm/nouveau/nouveau_display.c |   25 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   |  430 +++--
 drivers/gpu/drm/nouveau/nouveau_fence.h   |   22 +
 drivers/gpu/drm/nouveau/nouveau_gem.c |   55 +---
 drivers/gpu/drm/nouveau/nv04_fence.c  |4 
 drivers/gpu/drm/nouveau/nv10_fence.c  |4 
 drivers/gpu/drm/nouveau/nv17_fence.c  |2 
 drivers/gpu/drm/nouveau/nv50_fence.c  |2 
 drivers/gpu/drm/nouveau/nv84_fence.c  |   11 -
 drivers/gpu/drm/qxl/Makefile  |2 
 drivers/gpu/drm/qxl/qxl_cmd.c |7 
 drivers/gpu/drm/qxl/qxl_debugfs.c |   16 +
 drivers/gpu/drm/qxl/qxl_drv.h |   20 -
 drivers/gpu/drm/qxl/qxl_fence.c   |   91 --
 drivers/gpu/drm/qxl/qxl_kms.c |1 
 drivers/gpu/drm/qxl/qxl_object.c  |2 
 drivers/gpu/drm/qxl/qxl_object.h  |6 
 drivers/gpu/drm/qxl/qxl_release.c |  172 ++--
 drivers/gpu/drm/qxl/qxl_ttm.c |   93 --
 drivers/gpu/drm/radeon/radeon.h   |   15 -
 drivers/gpu/drm/radeon/radeon_cs.c|   10 +
 drivers/gpu/drm/radeon/radeon_device.c|1 
 drivers/gpu/drm/radeon/radeon_display.c   |   20 +
 drivers/gpu/drm/radeon/radeon_fence.c |  191 ++---
 drivers/gpu/drm/radeon/radeon_gem.c   |   19 +
 drivers/gpu/drm/radeon/radeon_object.c|8 -
 drivers/gpu/drm/radeon/radeon_ttm.c   |   34 --
 drivers/gpu/drm/radeon/radeon_uvd.c   |   10 -
 drivers/gpu/drm/ttm/ttm_bo.c  |  187 ++---
 drivers/gpu/drm/ttm/ttm_bo_util.c |   28 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |3 
 drivers/gpu/drm/ttm/ttm_execbuf_util.c|  146 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c|   47 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |1 
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |   24 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c |  329 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h |   35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   43 +--
 include/drm/ttm/ttm_bo_api.h  |7 
 include/drm/ttm/ttm_bo_driver.h   |   29 --
 include/drm/ttm/ttm_execbuf_util.h|   22 +
 43 files changed, 1107 insertions(+), 1130 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c



Re: [RFC PATCH 2/2 with seqcount v3] reservation: add support for read-only access using rcu

2014-04-29 Thread Maarten Lankhorst

On 23-04-14 13:15, Maarten Lankhorst wrote:

This adds 4 more functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.
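
(A short caller-side sketch of these entry points, using the signatures
introduced by this patch:)

static long wait_idle(struct reservation_object *obj)
{
	/* wait_all = true: shared + exclusive; intr = true: interruptible.
	 * Returns 0 on timeout, <0 on error (e.g. -ERESTARTSYS), and the
	 * remaining jiffies (>0) when all fences have signaled. */
	return reservation_object_wait_timeout_rcu(obj, true, true,
						   msecs_to_jiffies(100));
}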

Signed-off-by: Maarten Lankhorst 
---
Using seqcount and fixing some lockdep bugs.
Changes since v2:
- Fix some crashes, remove some unneeded barriers when provided by seqcount 
writes
- Fix code to work correctly with sparse's RCU annotations.
- Create a global string for the seqcount lock to make lockdep happy.

Can I get this version reviewed? If it looks correct I'll mail the full series
because it's intertwined with the TTM conversion to use this code.

Ping, can anyone review this?


[RFC PATCH 2/2 with seqcount v3] reservation: add support for read-only access using rcu

2014-04-23 Thread Maarten Lankhorst

This adds 4 more functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.

Signed-off-by: Maarten Lankhorst 
---
Using seqcount and fixing some lockdep bugs.
Changes since v2:
- Fix some crashes, remove some unneeded barriers when provided by seqcount 
writes
- Fix code to work correctly with sparse's RCU annotations.
- Create a global string for the seqcount lock to make lockdep happy.

Can I get this version reviewed? If it looks correct I'll mail the full series
because it's intertwined with the TTM conversion to use this code.

See http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=vmwgfx_wip
---
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index d89a98d2c37b..0df673f812eb 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -137,7 +137,7 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
struct reservation_object_list *fobj;
struct fence *fence_excl;
unsigned long events;
-   unsigned shared_count;
+   unsigned shared_count, seq;
 
 	dmabuf = file->private_data;

if (!dmabuf || !dmabuf->resv)
@@ -151,14 +151,20 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
if (!events)
return 0;
 
-	ww_mutex_lock(&resv->lock, NULL);

+retry:
+   seq = read_seqcount_begin(&resv->seq);
+   rcu_read_lock();
 
-	fobj = resv->fence;

-   if (!fobj)
-   goto out;
-
-   shared_count = fobj->shared_count;
-   fence_excl = resv->fence_excl;
+   fobj = rcu_dereference(resv->fence);
+   if (fobj)
+   shared_count = fobj->shared_count;
+   else
+   shared_count = 0;
+   fence_excl = rcu_dereference(resv->fence_excl);
+   if (read_seqcount_retry(&resv->seq, seq)) {
+   rcu_read_unlock();
+   goto retry;
+   }
 
 	if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {

struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
@@ -176,14 +182,20 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
 		if (events & pevents) {

-   if (!fence_add_callback(fence_excl, &dcb->cb,
+   if (!fence_get_rcu(fence_excl)) {
+   /* force a recheck */
+   events &= ~pevents;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   } else if (!fence_add_callback(fence_excl, &dcb->cb,
   dma_buf_poll_cb)) {
events &= ~pevents;
+   fence_put(fence_excl);
} else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
+   fence_put(fence_excl);
dma_buf_poll_cb(NULL, &dcb->cb);
}
}
@@ -205,13 +217,26 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
goto out;
 
 		for (i = 0; i < shared_count; ++i) {

-   struct fence *fence = fobj->shared[i];
+   struct fence *fence = rcu_dereference(fobj->shared[i]);
 
+			if (!fence_get_rcu(fence)) {

+   /*
+* fence refcount dropped to zero, this means
+* that fobj has been freed
+*
+* call dma_buf_poll_cb and force a recheck!
+*/
+   events &= ~POLLOUT;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   break;
+   }
if (!fence_add_callback(fence, &dcb->cb,
dma_buf_poll_cb)) {
+   fence_put(fence);
events &= ~POLLOUT;
break;
}
+   fence_put(fence);
}
 
 		/* No callback queued, wake up any 

Re: [PATCH 2/2] [RFC v2 with seqcount] reservation: add support for read-only access using rcu

2014-04-14 Thread Maarten Lankhorst

On 11-04-14 21:35, Thomas Hellstrom wrote:

On 04/11/2014 08:09 PM, Maarten Lankhorst wrote:

On 11-04-14 12:11, Thomas Hellstrom wrote:

On 04/11/2014 11:24 AM, Maarten Lankhorst wrote:

On 11-04-14 10:38, Thomas Hellstrom wrote:

Hi, Maarten.

Here I believe we encounter a lot of locking inconsistencies.

First, it seems you're using a number of pointers as RCU pointers
without annotating them as such or using the correct rcu macros when
assigning those pointers.

Some pointers (like the pointers in the shared fence list) are used
both as RCU pointers (in dma_buf_poll(), for example) and as pointers
considered protected by the seqlock
(reservation_object_get_fences_rcu()), which I believe is OK, but then
the pointers must be assigned using the correct rcu macros. In the
memcpy in reservation_object_get_fences_rcu() we might get away with an
ugly typecast, but with a verbose comment that the pointers are
considered protected by the seqlock at that location.

So I've updated (attached) the headers with proper __rcu annotation
and
locking comments according to how they are being used in the various
reading functions.
I believe if we want to get rid of this we need to validate those
pointers using the seqlock as well.
This will generate a lot of sparse warnings in those places needing
rcu_dereference()
rcu_assign_pointer()
rcu_dereference_protected()

With this I think we can get rid of all ACCESS_ONCE macros: It's not
needed when the rcu_x() macros are used, and
it's never needed for the members protected by the seqlock, (provided
that the seq is tested). The only place where I think that's
*not* the case is at the krealloc in
reservation_object_get_fences_rcu().

Also I have some more comments in the
reservation_object_get_fences_rcu() function below:

I felt that the barriers needed for rcu were already provided by
checking the seqcount lock.
But looking at rcu_dereference makes it seem harmless to add it in
more places, it handles
the ACCESS_ONCE and barrier() for us.

And it makes the code more maintainable, and helps sparse doing a lot of
checking for us. I guess
we can tolerate a couple of extra barriers for that.


We could probably get away with using RCU_INIT_POINTER on the writer
side,
because the smp_wmb is already done by arranging seqcount updates
correctly.

Hmm. yes, probably. At least in the replace function. I think if we do
it in other places, we should add comments as to where
the smp_wmb() is located, for future reference.


Also, I saw that in a couple of places where you're checking the shared
pointers, you're not checking for NULL pointers, which I guess may
happen if shared_count and the pointers are not in full sync?


No, because shared_count is protected with seqcount. I only allow
appending to the array, so when
shared_count is validated by seqcount it means that the
[0...shared_count) indexes are valid and non-null.
What could happen though is that the fence at a specific index is
updated with another one from the same
context, but that's harmless.


Hmm, doesn't attaching an exclusive fence clear all shared fence
pointers from under a reader?


No, for that reason. It only resets shared_count to 0. This is harmless
because the shared fence pointers remain valid long enough thanks to RCU
delayed deletion. fence_get_rcu() will fail once the refcount has dropped
to zero, which is enough of a check to prevent errors, so there's no need
to explicitly clear the fence pointers.
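
(The fence_get_rcu() check described here is the usual
kref_get_unless_zero() pattern on the reader side, roughly:)

rcu_read_lock();
fence = rcu_dereference(fobj->shared[i]);
if (fence && !fence_get_rcu(fence))
	fence = NULL;	/* refcount already zero: fence is being freed */
rcu_read_unlock();
/* NULL forces the caller to recheck/retry; otherwise it holds a reference */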

~Maarten


Re: [PATCH 2/2] [RFC v2 with seqcount] reservation: add support for read-only access using rcu

2014-04-14 Thread Maarten Lankhorst

On 11-04-14 21:30, Thomas Hellstrom wrote:

Hi!

On 04/11/2014 08:09 PM, Maarten Lankhorst wrote:

On 11-04-14 12:11, Thomas Hellstrom wrote:

On 04/11/2014 11:24 AM, Maarten Lankhorst wrote:

On 11-04-14 10:38, Thomas Hellstrom wrote:

Hi, Maarten.

Here I believe we encounter a lot of locking inconsistencies.

First, it seems you're using a number of pointers as RCU pointers
without annotating them as such or using the correct rcu macros when
assigning those pointers.

Some pointers (like the pointers in the shared fence list) are used
both as RCU pointers (in dma_buf_poll(), for example) and as pointers
considered protected by the seqlock
(reservation_object_get_fences_rcu()), which I believe is OK, but then
the pointers must be assigned using the correct rcu macros. In the
memcpy in reservation_object_get_fences_rcu() we might get away with an
ugly typecast, but with a verbose comment that the pointers are
considered protected by the seqlock at that location.

So I've updated (attached) the headers with proper __rcu annotation
and
locking comments according to how they are being used in the various
reading functions.
I believe if we want to get rid of this we need to validate those
pointers using the seqlock as well.
This will generate a lot of sparse warnings in those places needing
rcu_dereference()
rcu_assign_pointer()
rcu_dereference_protected()

With this I think we can get rid of all ACCESS_ONCE macros: It's not
needed when the rcu_x() macros are used, and
it's never needed for the members protected by the seqlock, (provided
that the seq is tested). The only place where I think that's
*not* the case is at the krealloc in
reservation_object_get_fences_rcu().

Also I have some more comments in the
reservation_object_get_fences_rcu() function below:

I felt that the barriers needed for rcu were already provided by
checking the seqcount lock.
But looking at rcu_dereference makes it seem harmless to add it in
more places, it handles
the ACCESS_ONCE and barrier() for us.

And it makes the code more maintainable, and helps sparse doing a lot of
checking for us. I guess
we can tolerate a couple of extra barriers for that.


We could probably get away with using RCU_INIT_POINTER on the writer
side,
because the smp_wmb is already done by arranging seqcount updates
correctly.

Hmm. yes, probably. At least in the replace function. I think if we do
it in other places, we should add comments as to where
the smp_wmb() is located, for future reference.


Also, I saw that in a couple of places where you're checking the shared
pointers, you're not checking for NULL pointers, which I guess may
happen if shared_count and the pointers are not in full sync?


No, because shared_count is protected with seqcount. I only allow
appending to the array, so when
shared_count is validated by seqcount it means that the
[0...shared_count) indexes are valid and non-null.
What could happen though is that the fence at a specific index is
updated with another one from the same
context, but that's harmless.

Hmm.
Shouldn't we have a way to clean signaled fences from reservation
objects? Perhaps when we attach a new fence, or after a wait with
ww_mutex held? Otherwise we'd have a lot of completely unused fence
objects hanging around for no reason. I don't think we need to be as
picky as TTM, but I think we should do something?


Calling reservation_object_add_excl_fence with a NULL fence works; I do
this in ttm_bo_wait(). It requires the ww_mutex.
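
(A minimal sketch of that cleanup pattern, assuming the caller wants to
prune fences from an object it can prove idle:)

ww_mutex_lock(&resv->lock, NULL);
if (reservation_object_test_signaled_rcu(resv, true))
	/* a NULL exclusive fence also resets the shared list */
	reservation_object_add_excl_fence(resv, NULL);
ww_mutex_unlock(&resv->lock);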

~Maarten


Re: [PATCH 2/2] [RFC v2 with seqcount] reservation: add support for read-only access using rcu

2014-04-11 Thread Maarten Lankhorst

On 11-04-14 12:11, Thomas Hellstrom wrote:

On 04/11/2014 11:24 AM, Maarten Lankhorst wrote:

On 11-04-14 10:38, Thomas Hellstrom wrote:

Hi, Maarten.

Here I believe we encounter a lot of locking inconsistencies.

First, it seems you're using a number of pointers as RCU pointers
without annotating them as such or using the correct rcu macros when
assigning those pointers.

Some pointers (like the pointers in the shared fence list) are used
both as RCU pointers (in dma_buf_poll(), for example) and as pointers
considered protected by the seqlock
(reservation_object_get_fences_rcu()), which I believe is OK, but then
the pointers must be assigned using the correct rcu macros. In the
memcpy in reservation_object_get_fences_rcu() we might get away with an
ugly typecast, but with a verbose comment that the pointers are
considered protected by the seqlock at that location.

So I've updated (attached) the headers with proper __rcu annotation and
locking comments according to how they are being used in the various
reading functions.
I believe if we want to get rid of this we need to validate those
pointers using the seqlock as well.
This will generate a lot of sparse warnings in those places needing
rcu_dereference()
rcu_assign_pointer()
rcu_dereference_protected()

With this I think we can get rid of all ACCESS_ONCE macros: It's not
needed when the rcu_x() macros are used, and
it's never needed for the members protected by the seqlock, (provided
that the seq is tested). The only place where I think that's
*not* the case is at the krealloc in
reservation_object_get_fences_rcu().

Also I have some more comments in the
reservation_object_get_fences_rcu() function below:

I felt that the barriers needed for rcu were already provided by
checking the seqcount lock.
But looking at rcu_dereference makes it seem harmless to add it in
more places, it handles
the ACCESS_ONCE and barrier() for us.

And it makes the code more maintainable, and helps sparse doing a lot of
checking for us. I guess
we can tolerate a couple of extra barriers for that.


We could probably get away with using RCU_INIT_POINTER on the writer
side,
because the smp_wmb is already done by arranging seqcount updates
correctly.

Hmm. yes, probably. At least in the replace function. I think if we do
it in other places, we should add comments as to where
the smp_wmb() is located, for future reference.


Also, I saw that in a couple of places where you're checking the shared
pointers, you're not checking for NULL pointers, which I guess may
happen if shared_count and the pointers are not in full sync?


No, because shared_count is protected with seqcount. I only allow appending to 
the array, so when
shared_count is validated by seqcount it means that the [0...shared_count) 
indexes are valid and non-null.
What could happen though is that the fence at a specific index is updated with 
another one from the same
context, but that's harmless.
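
(On the read side that invariant looks roughly like this; a sketch, with
inspect_fence() standing in for whatever the caller does per fence:)

struct reservation_object_list *fobj;
unsigned seq, shared_count, i;

do {
	seq = read_seqcount_begin(&resv->seq);
	rcu_read_lock();
	fobj = rcu_dereference(resv->fence);
	shared_count = fobj ? fobj->shared_count : 0;
	for (i = 0; i < shared_count; ++i)
		/* non-NULL for [0, shared_count) once seq validates */
		inspect_fence(rcu_dereference(fobj->shared[i]));
	rcu_read_unlock();
} while (read_seqcount_retry(&resv->seq, seq));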

~Maarten


Re: [PATCH 2/2] [RFC v2 with seqcount] reservation: add support for read-only access using rcu

2014-04-11 Thread Maarten Lankhorst

On 11-04-14 10:38, Thomas Hellstrom wrote:

Hi, Maarten.

Here I believe we encounter a lot of locking inconsistencies.

First, it seems you're using a number of pointers as RCU pointers
without annotating them as such or using the correct rcu macros when
assigning those pointers.

Some pointers (like the pointers in the shared fence list) are used
both as RCU pointers (in dma_buf_poll(), for example) and as pointers
considered protected by the seqlock
(reservation_object_get_fences_rcu()), which I believe is OK, but then
the pointers must be assigned using the correct rcu macros. In the
memcpy in reservation_object_get_fences_rcu() we might get away with an
ugly typecast, but with a verbose comment that the pointers are
considered protected by the seqlock at that location.

So I've updated (attached) the headers with proper __rcu annotation and
locking comments according to how they are being used in the various
reading functions.
I believe if we want to get rid of this we need to validate those
pointers using the seqlock as well.
This will generate a lot of sparse warnings in those places needing
rcu_dereference()
rcu_assign_pointer()
rcu_dereference_protected()

With this I think we can get rid of all ACCESS_ONCE macros: It's not
needed when the rcu_x() macros are used, and
it's never needed for the members protected by the seqlock, (provided
that the seq is tested). The only place where I think that's
*not* the case is at the krealloc in reservation_object_get_fences_rcu().

Also I have some more comments in the
reservation_object_get_fences_rcu() function below:

I felt that the barriers needed for rcu were already provided by checking the 
seqcount lock.
But looking at rcu_dereference makes it seem harmless to add it in more places, 
it handles
the ACCESS_ONCE and barrier() for us.

We could probably get away with using RCU_INIT_POINTER on the writer side,
because the smp_wmb is already done by arranging seqcount updates correctly.


diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index d89a98d2c37b..ca6ef0c4b358 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c

+int reservation_object_get_fences_rcu(struct reservation_object *obj,
+  struct fence **pfence_excl,
+  unsigned *pshared_count,
+  struct fence ***pshared)
+{
+unsigned shared_count = 0;
+unsigned retry = 1;
+struct fence **shared = NULL, *fence_excl = NULL;
+int ret = 0;
+
+while (retry) {
+struct reservation_object_list *fobj;
+unsigned seq, retry;
You're shadowing retry?

Oops.



+
+seq = read_seqcount_begin(&obj->seq);
+
+rcu_read_lock();
+
+fobj = ACCESS_ONCE(obj->fence);
+if (fobj) {
+struct fence **nshared;
+
+shared_count = ACCESS_ONCE(fobj->shared_count);
+nshared = krealloc(shared, sizeof(*shared) * shared_count, GFP_KERNEL);

krealloc inside rcu_read_lock(). Better to put this first in the loop.

Except that shared_count isn't known until the rcu_read_lock is taken.

Thanks,
Thomas

~Maarten


[PATCH 2/2] [RFC v2 with seqcount] reservation: add support for read-only access using rcu

2014-04-10 Thread Maarten Lankhorst

On 10-04-14 13:08, Thomas Hellstrom wrote:

On 04/10/2014 12:07 PM, Maarten Lankhorst wrote:

Hey,

On 10-04-14 10:46, Thomas Hellstrom wrote:

Hi!

Ugh. This became more complicated than I thought, but I'm OK with moving
TTM over to fence while we sort out
how / if we're going to use this.

While reviewing, it struck me that this is kind of error-prone, and hard
to follow since we're operating on a structure that may be
continually updated under us, needing a lot of RCU-specific macros and
barriers.

Yeah, but with the exception of dma_buf_poll I don't think there is
anything else outside drivers/base/reservation.c that has to deal with rcu.


Also the rcu wait appears to not complete until there are no busy fences
left (new ones can be added while we wait) rather than
waiting on a snapshot of busy fences.

This has been by design, because 'wait for bo idle' types of functions
only care whether the bo is completely idle or not.

No, not when using RCU, because the bo may be busy again before the
function returns :)
Complete idleness can only be guaranteed if holding the reservation, or
otherwise making sure
that no new rendering is submitted to the buffer, so it's overkill to
wait for complete idleness here.

You're probably right, but it makes waiting a lot easier if I don't have to 
deal with memory allocations. :P

It would be easy to make a snapshot even without seqlocks, just copy
reservation_object_test_signaled_rcu to return a shared list if
test_all is set, or return pointer to exclusive otherwise.


I wonder if these issues can be addressed by having a function that
provides a snapshot of all busy fences: This can be accomplished
either by including the exclusive fence in the fence_list structure and
allocate a new such structure each time it is updated. The RCU reader
could then just make a copy of the current fence_list structure pointed
to by &obj->fence, but I'm not sure we want to reallocate *each* time we
update the fence pointer.

No, the most common operation is updating fence pointers, which is why
the current design makes that cheap. It's also why doing rcu reads is
more expensive.

The other approach uses a seqlock to obtain a consistent snapshot, and
I've attached an incomplete outline, and I'm not 100% whether it's OK to
combine RCU and seqlocks in this way...

Both these approaches have the benefit of hiding the RCU snapshotting in
a single function, that can then be used by any waiting
or polling function.


I think the middle way with using seqlocks to protect the fence_excl
pointer and shared list combination,
and using RCU to protect the refcounts for fences and the availability
of the list could work for our usecase
and might remove a bunch of memory barriers. But yeah that depends on
layering rcu and seqlocks.
No idea if that is allowed. But I suppose it is.

Also, you're being overly paranoid with seqlock reading; we would only
need something like this:

rcu_read_lock()
 preempt_disable()
 seq = read_seqcount_begin()
 read fence_excl, shared_count = ACCESS_ONCE(fence->shared_count)
 copy shared to a struct.
 if (read_seqcount_retry()) { unlock and retry }
   preempt_enable();
   use fence_get_rcu() to bump refcount on everything, if that fails
unlock, put, and retry
rcu_read_unlock()

But the shared list would still need to be RCU'd, to make sure we're
not reading freed garbage.

Ah, OK.
But I think we should use rcu inside seqcount, because
read_seqcount_begin() may spin for a long time if there are
many writers. Also I don't think the preempt_disable() is needed for
read_seq critical sections, other than that it might
decrease the risk of retries.


Reading the seqlock code makes me suspect that's the case too. The lockdep
code calls local_irq_disable, so it's probably safe without preemption
disabled.
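
(Concretely, the "rcu inside seqcount" ordering being discussed keeps the
RCU read-side section bounded even when read_seqcount_begin() spins; a
sketch:)

do {
	seq = read_seqcount_begin(&resv->seq);	/* may spin on writers */
	rcu_read_lock();
	/* snapshot fence_excl and the shared[] pointers here */
	rcu_read_unlock();			/* rcu section stays short */
} while (read_seqcount_retry(&resv->seq, seq));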

~Maarten

I like the ability of not allocating memory, so I kept 
reservation_object_wait_timeout_rcu mostly
the way it was. This code appears to fail on nouveau when using the shared 
members,
but I'm not completely sure whether the error is in nouveau or this code yet.

--8<
[RFC v2] reservation: add support for read-only access using rcu

This adds 4 more functions to deal with rcu.

reservation_object_get_fences_rcu() will obtain the list of shared
and exclusive fences without obtaining the ww_mutex.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.

Signed-off-by: Maarten Lankhorst 

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index d89a98d2c37b..ca6ef0c4b358 100644
--- a/drivers/base/dma-buf.c

Re: [PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-10 Thread Maarten Lankhorst

Hey,

On 10-04-14 10:46, Thomas Hellstrom wrote:

Hi!

Ugh. This became more complicated than I thought, but I'm OK with moving
TTM over to fence while we sort out
how / if we're going to use this.

While reviewing, it struck me that this is kind of error-prone, and hard
to follow since we're operating on a structure that may be
continually updated under us, needing a lot of RCU-specific macros and
barriers.

Yeah, but with the exception of dma_buf_poll I don't think there is
anything else outside drivers/base/reservation.c that has to deal with rcu.


Also the rcu wait appears to not complete until there are no busy fences
left (new ones can be added while we wait) rather than
waiting on a snapshot of busy fences.

This has been by design, because 'wait for bo idle' types of functions
only care whether the bo is completely idle or not.

It would be easy to make a snapshot even without seqlocks, just copy
reservation_object_test_signaled_rcu to return a shared list if test_all is 
set, or return pointer to exclusive otherwise.


I wonder if these issues can be addressed by having a function that
provides a snapshot of all busy fences: This can be accomplished
either by including the exclusive fence in the fence_list structure and
allocate a new such structure each time it is updated. The RCU reader
could then just make a copy of the current fence_list structure pointed
to by &obj->fence, but I'm not sure we want to reallocate *each* time we
update the fence pointer.

No, the most common operation is updating fence pointers, which is why
the current design makes that cheap. It's also why doing rcu reads is more 
expensive.

The other approach uses a seqlock to obtain a consistent snapshot, and
I've attached an incomplete outline, and I'm not 100% whether it's OK to
combine RCU and seqlocks in this way...

Both these approaches have the benefit of hiding the RCU snapshotting in
a single function, that can then be used by any waiting
or polling function.



I think the middle way with using seqlocks to protect the fence_excl pointer 
and shared list combination,
and using RCU to protect the refcounts for fences and the availability of the 
list could work for our usecase
and might remove a bunch of memory barriers. But yeah that depends on layering 
rcu and seqlocks.
No idea if that is allowed. But I suppose it is.

Also, you're being overly paranoid with seqlock reading; we would only
need something like this:

rcu_read_lock()
preempt_disable()
seq = read_seqcount_begin();
read fence_excl, shared_count = ACCESS_ONCE(fence->shared_count)
copy shared to a struct.
if (read_seqcount_retry()) { unlock and retry }
  preempt_enable();
  use fence_get_rcu() to bump refcount on everything, if that fails unlock, 
put, and retry
rcu_read_unlock()

But the shared list would still need to be RCU'd, to make sure we're not 
reading freed garbage.

~Maarten



[PATCH 1/2] reservation: update api and add some helpers

2014-04-09 Thread Maarten Lankhorst
Move the list of shared fences to a struct, and return it in
reservation_object_get_list().

Add reservation_object_reserve_shared(), which reserves space
in the reservation_object for 1 more shared fence.

reservation_object_add_shared_fence() and
reservation_object_add_excl_fence() are used to assign a new
fence to a reservation_object pointer, to complete a reservation.
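
(Presumably the split into reserve + add exists so the allocation can fail
before the driver commits anything; a caller-side sketch, with
submit_job() as a hypothetical hardware submission:)

/* ww_mutex held via the acquire ctx at this point */
ret = reservation_object_reserve_shared(bo->resv);
if (ret)
	return ret;		/* -ENOMEM, nothing submitted yet */

fence = submit_job(bo);		/* hypothetical */
reservation_object_add_shared_fence(bo->resv, fence); /* cannot fail now */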

Signed-off-by: Maarten Lankhorst 
---
 drivers/base/dma-buf.c  |   35 +++---
 drivers/base/fence.c|4 +
 drivers/base/reservation.c  |  154 +++
 include/linux/fence.h   |6 ++
 include/linux/reservation.h |   48 +++--
 kernel/sched/core.c |1 
 6 files changed, 228 insertions(+), 20 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 96338bf7f457..d89a98d2c37b 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -134,7 +134,10 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
 {
struct dma_buf *dmabuf;
struct reservation_object *resv;
+   struct reservation_object_list *fobj;
+   struct fence *fence_excl;
unsigned long events;
+   unsigned shared_count;
 
dmabuf = file->private_data;
if (!dmabuf || !dmabuf->resv)
@@ -150,12 +153,18 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
 
ww_mutex_lock(&resv->lock, NULL);
 
-   if (resv->fence_excl && (!(events & POLLOUT) ||
-resv->fence_shared_count == 0)) {
+   fobj = resv->fence;
+   if (!fobj)
+   goto out;
+
+   shared_count = fobj->shared_count;
+   fence_excl = resv->fence_excl;
+
+   if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
unsigned long pevents = POLLIN;
 
-   if (resv->fence_shared_count == 0)
+   if (shared_count == 0)
pevents |= POLLOUT;
 
spin_lock_irq(&dmabuf->poll.lock);
@@ -167,19 +176,20 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(resv->fence_excl,
-   &dcb->cb, dma_buf_poll_cb))
+   if (!fence_add_callback(fence_excl, &dcb->cb,
+  dma_buf_poll_cb)) {
events &= ~pevents;
-   else
+   } else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
dma_buf_poll_cb(NULL, &dcb->cb);
+   }
}
}
 
-   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   if ((events & POLLOUT) && shared_count > 0) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
int i;
 
@@ -194,15 +204,18 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
if (!(events & POLLOUT))
goto out;
 
-   for (i = 0; i < resv->fence_shared_count; ++i)
-   if (!fence_add_callback(resv->fence_shared[i],
-   &dcb->cb, dma_buf_poll_cb)) {
+   for (i = 0; i < shared_count; ++i) {
+   struct fence *fence = fobj->shared[i];
+
+   if (!fence_add_callback(fence, &dcb->cb,
+   dma_buf_poll_cb)) {
events &= ~POLLOUT;
break;
}
+   }
 
/* No callback queued, wake up any additional waiters. */
-   if (i == resv->fence_shared_count)
+   if (i == shared_count)
dma_buf_poll_cb(NULL, &dcb->cb);
}
 
diff --git a/drivers/base/fence.c b/drivers/base/fence.c
index 8fff13fb86cf..f780f9b3d418 100644
--- a/drivers/base/fence.c
+++ b/drivers/base/fence.c
@@ -170,7 +170,7 @@ void release_fence(struct kref *kref)
if (fence->ops->release)
fence->ops->release(fence);
else
-   kfree(fence);
+   free_fence(fence);
 }
 EXPORT_SYMBOL(release_fence);
 
@@ -448,7 +448,7 @@ static void seqno_release(struct fence *fence)
if (f->ops->release)
f->ops->release(fence);
els

[PATCH 0/2] Updates to fence api

2014-04-09 Thread Maarten Lankhorst
The following series implements small updates to the fence api.
I've found them useful when implementing the fence API in ttm and i915.

The last patch enables RCU on top of the api. I've found this less
useful, but it was the condition on which Thomas Hellstrom was ok
with converting TTM to fence, so I had to keep it in.

If nobody objects I'll probably merge that patch through drm, because
some care is needed in ttm before it can flip the switch on rcu.

---

Maarten Lankhorst (2):
  reservation: update api and add some helpers
  [RFC] reservation: add support for read-only access using rcu



[PATCH 2/2] [RFC] reservation: add support for read-only access using rcu

2014-04-09 Thread Maarten Lankhorst
This adds 3 more functions to deal with rcu.

reservation_object_wait_timeout_rcu() will wait on all fences of the
reservation_object, without obtaining the ww_mutex.

reservation_object_test_signaled_rcu() will test if all fences of the
reservation_object are signaled without using the ww_mutex.

reservation_object_get_excl() is added because touching the fence_excl
member directly will trigger a sparse warning.

Signed-off-by: Maarten Lankhorst 
---
 drivers/base/dma-buf.c  |   46 +++--
 drivers/base/reservation.c  |  147 +--
 include/linux/fence.h   |   22 ++
 include/linux/reservation.h |   40 
 4 files changed, 224 insertions(+), 31 deletions(-)

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index d89a98d2c37b..fc2d7546b8b0 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -151,14 +151,22 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
if (!events)
return 0;
 
-   ww_mutex_lock(&resv->lock, NULL);
+   rcu_read_lock();
 
-   fobj = resv->fence;
-   if (!fobj)
-   goto out;
+   fobj = rcu_dereference(resv->fence);
+   if (fobj) {
+   shared_count = ACCESS_ONCE(fobj->shared_count);
+   smp_mb(); /* shared_count needs transitivity wrt fence_excl */
+   } else
+   shared_count = 0;
+   fence_excl = rcu_dereference(resv->fence_excl);
 
-   shared_count = fobj->shared_count;
-   fence_excl = resv->fence_excl;
+   /*
+* This would have needed a smp_read_barrier_depends()
+* because shared_count needs to be read before shared[i], but
+* spin_lock_irq and spin_unlock_irq provide even stronger
+* guarantees.
+*/
 
if (fence_excl && (!(events & POLLOUT) || shared_count == 0)) {
struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
@@ -176,14 +184,20 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
spin_unlock_irq(&dmabuf->poll.lock);
 
if (events & pevents) {
-   if (!fence_add_callback(fence_excl, &dcb->cb,
+   if (!fence_get_rcu(fence_excl)) {
+   /* force a recheck */
+   events &= ~pevents;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   } else if (!fence_add_callback(fence_excl, &dcb->cb,
   dma_buf_poll_cb)) {
events &= ~pevents;
+   fence_put(fence_excl);
} else {
/*
 * No callback queued, wake up any additional
 * waiters.
 */
+   fence_put(fence_excl);
dma_buf_poll_cb(NULL, &dcb->cb);
}
}
@@ -205,13 +219,25 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
goto out;
 
for (i = 0; i < shared_count; ++i) {
-   struct fence *fence = fobj->shared[i];
-
+   struct fence *fence = fence_get_rcu(fobj->shared[i]);
+   if (!fence) {
+   /*
+* fence refcount dropped to zero, this means
+* that fobj has been freed
+*
+* call dma_buf_poll_cb and force a recheck!
+*/
+   events &= ~POLLOUT;
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   break;
+   }
if (!fence_add_callback(fence, &dcb->cb,
dma_buf_poll_cb)) {
+   fence_put(fence);
events &= ~POLLOUT;
break;
}
+   fence_put(fence);
}
 
/* No callback queued, wake up any additional waiters. */
@@ -220,7 +246,7 @@ static unsigned int dma_buf_poll(struct file *file, 
poll_table *poll)
}
 
 out:
-   ww_mutex_unlock(&resv->lock);
+   rcu_read_unlock();
return events;
 }
 
diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c
index b82a5b630a8e..4cdce63140b8 100644
--- a/drivers/base/reservation.c
+++ b/drivers/base/reservation.c
@@ -87,9 +87,13 @@ reservation_object_add_shared_inplace(struct 
rese

Re: nouveau crash due to missing channel (WAS: Re: [ANNOUNCE] 3.12.12-rt19)

2014-03-07 Thread Maarten Lankhorst

On 07-03-14 12:18, Sebastian Andrzej Siewior wrote:

* Fernando Lopez-Lezcano | 2014-03-01 17:48:29 [-0800]:


On 02/23/2014 10:47 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.12.12-rt19 patch set.

Just hit this Oops in my desktop at home:

[22328.388996] BUG: unable to handle kernel NULL pointer dereference
at 0008
[22328.389013] IP: []
nouveau_fence_wait_uevent.isra.2+0x22/0x440 [nouveau]

This is

| static int
| nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
|
| {
| struct nouveau_channel *chan = fence->channel;
| struct nouveau_fifo *pfifo = nouveau_fifo(chan->drm->device);

and chan is NULL.


[22328.389046] RAX:  RBX: 8807a68f8fa8 RCX:

[22328.389046] RDX: 0001 RSI: 8807a68f8fb0 RDI:
8807a68f8fa8
[22328.389047] RBP: 8807c09bdca0 R08: 045e R09:
e200
[22328.389047] R10: a0157d80 R11: 8807c09bdde0 R12:
0001
[22328.389047] R13:  R14: 8807d8493a80 R15:
8807a68f8fb0
[22328.389053] Call Trace:
[22328.389069]  [] nouveau_fence_wait+0x86/0x1a0 [nouveau]
[22328.389081]  [] nouveau_bo_fence_wait+0x15/0x20
[nouveau]
[22328.389084]  [] ttm_bo_wait+0x96/0x1a0 [ttm]
[22328.389095]  []
nouveau_gem_ioctl_cpu_prep+0x5c/0xf0 [nouveau]
[22328.389101]  [] drm_ioctl+0x502/0x630 [drm]
[22328.389114]  [] nouveau_drm_ioctl+0x51/0x90 [nouveau]

I can't find any kind of locking so my question is what ensures that chan is
not set to NULL between nouveau_fence_done() and
nouveau_fence_wait_uevent()? There are just a few opcodes in between but
nothing that pauses nouveau_fence_signal().

Absolutely nothing. :-) Worse still, there's no guarantee that channel isn't 
freed, but hopefully that is less likely to be an issue.
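
(Roughly, the window looks like this; a sketch, and a NULL check only
narrows it, since nothing holds a reference across the reads:)

struct nouveau_fifo *pfifo;
struct nouveau_channel *chan = fence->channel;	/* snapshot, races with signal */

if (!chan)	/* fence may have signaled after nouveau_fence_done() */
	return -ENODEV;
/* nothing pins the channel: it can still be freed before this dereference */
pfifo = nouveau_fifo(chan->drm->device);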

~Maarten



Re: [PATCH 4/6] android: convert sync to fence api, v4

2014-03-04 Thread Maarten Lankhorst

On 04-03-14 11:00, Daniel Vetter wrote:

On Tue, Mar 04, 2014 at 09:20:58AM +0100, Maarten Lankhorst wrote:

On 04-03-14 09:14, Daniel Vetter wrote:

On Tue, Mar 04, 2014 at 08:50:38AM +0100, Maarten Lankhorst wrote:

On 03-03-14 22:11, Daniel Vetter wrote:

On Mon, Feb 17, 2014 at 04:57:19PM +0100, Maarten Lankhorst wrote:

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.

Signed-off-by: Maarten Lankhorst 
---

Snipped everything but headers - Ian Lister from our android team is
signed up to have a more in-depth look at proper integration with android
syncpoints. Adding him to cc.


diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index 62e2255b1c1e..6036dbdc8e6f 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 

  struct sync_timeline;
  struct sync_pt;
@@ -40,8 +41,6 @@ struct sync_fence;
   * -1 if a will signal before b
   * @free_pt: called before sync_pt is freed
   * @release_obj: called before sync_timeline is freed
- * @print_obj: deprecated
- * @print_pt: deprecated
   * @fill_driver_data: write implementation specific driver data to data.
   *  should return an error if there is not enough room
   *  as specified by size.  This information is returned
@@ -67,13 +66,6 @@ struct sync_timeline_ops {
   /* optional */
   void (*release_obj)(struct sync_timeline *sync_timeline);

- /* deprecated */
- void (*print_obj)(struct seq_file *s,
-  struct sync_timeline *sync_timeline);
-
- /* deprecated */
- void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
-
   /* optional */
   int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);

@@ -104,42 +96,48 @@ struct sync_timeline {

   /* protected by child_list_lock */
   bool destroyed;
+ int context, value;

   struct list_head child_list_head;
   spinlock_t child_list_lock;

   struct list_head active_list_head;
- spinlock_t active_list_lock;

+#ifdef CONFIG_DEBUG_FS
   struct list_head sync_timeline_list;
+#endif
  };

  /**
   * struct sync_pt - sync point
- * @parent: sync_timeline to which this sync_pt belongs
+ * @fence: base fence class
   * @child_list: membership in sync_timeline.child_list_head
   * @active_list: membership in sync_timeline.active_list_head
+<<<<<<< current
   * @signaled_list: membership in temporary signaled_list on stack
   * @fence: sync_fence to which the sync_pt belongs
   * @pt_list: membership in sync_fence.pt_list_head
   * @status: 1: signaled, 0:active, <0: error
   * @timestamp: time which sync_pt status transitioned from active to
   *  signaled or error.
+===
+>>>>>>> patched

Conflict markers ...

Oops.

   */
  struct sync_pt {
- struct sync_timeline *parent;
- struct list_head child_list;
+ struct fence base;

Hm, embedding feels wrong, since that still means that I'll need to
implement two kinds of fences in i915 - one using the seqno fence to make
dma-buf sync work, and one to implement sync_pt to make the android folks
happy.

If I can dream I think we should have a pointer to an underlying fence
here, i.e. a struct sync_pt would just be a userspace interface wrapper to
do explicit syncing using native fences, instead of implicit syncing like
with dma-bufs. But this is all drive-by comments from a very cursory
high-level look. I might be full of myself again ;-)
-Daniel


No, the idea is that because an android syncpoint is simply another type of
dma-fence, if you deal with normal fences then android can automatically
be handled too. The userspace fence api android exposes could very easily
be made to work for dma-fence: just pass a dma-fence to sync_fence_create.
So exposing dma-fence would probably work for android too.

Hm, then why do we still have struct sync_pt around? Since it's just the
internal bit, with the userspace facing object being struct sync_fence,
I'd opt to shuffle any useful features into the core struct fence.
-Daniel

To keep compatibility with the android api. I think gradually converting
them is going to be more useful than forcing all drivers to use a new api
all at once. They could keep the android syncpoint api for exporting, as
long as they accept dma-fence for importing/waiting.

We don't have any users of the android sync_pt stuff (outside of the
framework itself). So any considerations for existing drivers for
upstreaming are imo moot. At least for the in-kernel interfaces used. For
the actual userspace interface I guess keeping the android syncpt ioctls
as-is ha

Re: [PATCH 4/6] android: convert sync to fence api, v4

2014-03-04 Thread Maarten Lankhorst

On 04-03-14 09:14, Daniel Vetter wrote:

On Tue, Mar 04, 2014 at 08:50:38AM +0100, Maarten Lankhorst wrote:

On 03-03-14 22:11, Daniel Vetter wrote:

On Mon, Feb 17, 2014 at 04:57:19PM +0100, Maarten Lankhorst wrote:

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.

Signed-off-by: Maarten Lankhorst 
---

Snipped everything but headers - Ian Lister from our android team is
signed up to have a more in-depth look at proper integration with android
syncpoints. Adding him to cc.


diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index 62e2255b1c1e..6036dbdc8e6f 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 

  struct sync_timeline;
  struct sync_pt;
@@ -40,8 +41,6 @@ struct sync_fence;
   * -1 if a will signal before b
   * @free_pt: called before sync_pt is freed
   * @release_obj: called before sync_timeline is freed
- * @print_obj: deprecated
- * @print_pt: deprecated
   * @fill_driver_data: write implementation specific driver data to data.
   *  should return an error if there is not enough room
   *  as specified by size.  This information is returned
@@ -67,13 +66,6 @@ struct sync_timeline_ops {
   /* optional */
   void (*release_obj)(struct sync_timeline *sync_timeline);

- /* deprecated */
- void (*print_obj)(struct seq_file *s,
-  struct sync_timeline *sync_timeline);
-
- /* deprecated */
- void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
-
   /* optional */
   int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);

@@ -104,42 +96,48 @@ struct sync_timeline {

   /* protected by child_list_lock */
   bool destroyed;
+ int context, value;

   struct list_head child_list_head;
   spinlock_t child_list_lock;

   struct list_head active_list_head;
- spinlock_t active_list_lock;

+#ifdef CONFIG_DEBUG_FS
   struct list_head sync_timeline_list;
+#endif
  };

  /**
   * struct sync_pt - sync point
- * @parent: sync_timeline to which this sync_pt belongs
+ * @fence: base fence class
   * @child_list: membership in sync_timeline.child_list_head
   * @active_list: membership in sync_timeline.active_list_head
+<<<<<<< current
   * @signaled_list: membership in temporary signaled_list on stack
   * @fence: sync_fence to which the sync_pt belongs
   * @pt_list: membership in sync_fence.pt_list_head
   * @status: 1: signaled, 0:active, <0: error
   * @timestamp: time which sync_pt status transitioned from active to
   *  signaled or error.
+===
+>>>>>>> patched

Conflict markers ...

Oops.

   */
  struct sync_pt {
- struct sync_timeline *parent;
- struct list_head child_list;
+ struct fence base;

Hm, embedding feels wrong, since that still means that I'll need to
implement two kinds of fences in i915 - one using the seqno fence to make
dma-buf sync work, and one to implement sync_pt to make the android folks
happy.

If I can dream I think we should have a pointer to an underlying fence
here, i.e. a struct sync_pt would just be a userspace interface wrapper to
do explicit syncing using native fences, instead of implicit syncing like
with dma-bufs. But this is all drive-by comments from a very cursory
high-level look. I might be full of myself again ;-)
-Daniel


No, the idea is that because an android syncpoint is simply another type of
dma-fence, if you deal with normal fences then android can automatically
be handled too. The userspace fence api android exposes could very easily
be made to work for dma-fence: just pass a dma-fence to sync_fence_create.
So exposing dma-fence would probably work for android too.

Hm, then why do we still have struct sync_pt around? Since it's just the
internal bit, with the userspace facing object being struct sync_fence,
I'd opt to shuffle any useful features into the core struct fence.
-Daniel

To keep compatibility with the android api. I think gradually converting
them is going to be more useful than forcing all drivers to use a new api
all at once. They could keep the android syncpoint api for exporting, as
long as they accept dma-fence for importing/waiting.

~Maarten


Re: [PATCH 4/6] android: convert sync to fence api, v4

2014-03-03 Thread Maarten Lankhorst

On 03-03-14 22:11, Daniel Vetter wrote:

On Mon, Feb 17, 2014 at 04:57:19PM +0100, Maarten Lankhorst wrote:

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.

Signed-off-by: Maarten Lankhorst 
---

Snipped everything but headers - Ian Lister from our android team is
signed up to have a more in-depth look at proper integration with android
syncpoints. Adding him to cc.


diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index 62e2255b1c1e..6036dbdc8e6f 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 

  struct sync_timeline;
  struct sync_pt;
@@ -40,8 +41,6 @@ struct sync_fence;
   * -1 if a will signal before b
   * @free_pt: called before sync_pt is freed
   * @release_obj: called before sync_timeline is freed
- * @print_obj: deprecated
- * @print_pt: deprecated
   * @fill_driver_data: write implementation specific driver data to data.
   *  should return an error if there is not enough room
   *  as specified by size.  This information is returned
@@ -67,13 +66,6 @@ struct sync_timeline_ops {
   /* optional */
   void (*release_obj)(struct sync_timeline *sync_timeline);

- /* deprecated */
- void (*print_obj)(struct seq_file *s,
-  struct sync_timeline *sync_timeline);
-
- /* deprecated */
- void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
-
   /* optional */
   int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);

@@ -104,42 +96,48 @@ struct sync_timeline {

   /* protected by child_list_lock */
   bool destroyed;
+ int context, value;

   struct list_head child_list_head;
   spinlock_t child_list_lock;

   struct list_head active_list_head;
- spinlock_t active_list_lock;

+#ifdef CONFIG_DEBUG_FS
   struct list_head sync_timeline_list;
+#endif
  };

  /**
   * struct sync_pt - sync point
- * @parent: sync_timeline to which this sync_pt belongs
+ * @fence: base fence class
   * @child_list: membership in sync_timeline.child_list_head
   * @active_list: membership in sync_timeline.active_list_head
+<<<<<<< current
   * @signaled_list: membership in temporary signaled_list on stack
   * @fence: sync_fence to which the sync_pt belongs
   * @pt_list: membership in sync_fence.pt_list_head
   * @status: 1: signaled, 0:active, <0: error
   * @timestamp: time which sync_pt status transitioned from active to
   *  signaled or error.
+===
+>>>>>>> patched

Conflict markers ...

Oops.

   */
  struct sync_pt {
- struct sync_timeline *parent;
- struct list_head child_list;
+ struct fence base;

Hm, embedding feels wrong, since that still means that I'll need to
implement two kinds of fences in i915 - one using the seqno fence to make
dma-buf sync work, and one to implement sync_pt to make the android folks
happy.

If I can dream I think we should have a pointer to an underlying fence
here, i.e. a struct sync_pt would just be a userspace interface wrapper to
do explicit syncing using native fences, instead of implicit syncing like
with dma-bufs. But this is all drive-by comments from a very cursory
high-level look. I might be full of myself again ;-)
-Daniel


No, the idea is that because an android syncpoint is simply another type of
dma-fence, if you deal with normal fences then android can automatically
be handled too. The userspace fence api android exposes could very easily
be made to work for dma-fence: just pass a dma-fence to sync_fence_create.
So exposing dma-fence would probably work for android too.
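
(Sketch of that idea; purely hypothetical, since it assumes
sync_fence_create() were relaxed to accept a plain struct fence instead
of a sync_pt:)

/* hypothetical export helper: any dma-fence as an android sync_fence */
struct sync_fence *export_as_sync_fence(struct fence *f, const char *name)
{
	return sync_fence_create(name, f);	/* assumed relaxed signature */
}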

~Maarten


[PATCH 4/6] android: convert sync to fence api, v5

2014-02-24 Thread Maarten Lankhorst
Just to show it's easy.

Android syncpoints can be mapped to a timeline. This removes the need
to maintain a separate api for synchronization. I've left the android
trace events in place, but the core fence events should already be
sufficient for debugging.

v2:
- Call fence_remove_callback in sync_fence_free if not all fences have fired.
v3:
- Merge Colin Cross' bugfixes, and the android fence merge optimization.
v4:
- Merge with the upstream fixes.
v5:
- Fix small style issues pointed out by Thomas Hellstrom.

Signed-off-by: Maarten Lankhorst 
---
 drivers/staging/android/Kconfig  |1 
 drivers/staging/android/Makefile |2 
 drivers/staging/android/sw_sync.c|4 
 drivers/staging/android/sync.c   |  903 --
 drivers/staging/android/sync.h   |   82 ++-
 drivers/staging/android/sync_debug.c |  247 +
 drivers/staging/android/trace/sync.h |   12 
 7 files changed, 611 insertions(+), 640 deletions(-)
 create mode 100644 drivers/staging/android/sync_debug.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index b91c758883bf..ecc8194242b5 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -77,6 +77,7 @@ config SYNC
bool "Synchronization framework"
default n
select ANON_INODES
+   select DMA_SHARED_BUFFER
---help---
  This option enables the framework for synchronization between multiple
  drivers.  Sync implementations can take advantage of hardware
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 0a01e1914905..517ad5ffa429 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -9,5 +9,5 @@ obj-$(CONFIG_ANDROID_TIMED_OUTPUT)  += timed_output.o
 obj-$(CONFIG_ANDROID_TIMED_GPIO)   += timed_gpio.o
 obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)+= lowmemorykiller.o
 obj-$(CONFIG_ANDROID_INTF_ALARM_DEV)   += alarm-dev.o
-obj-$(CONFIG_SYNC) += sync.o
+obj-$(CONFIG_SYNC) += sync.o sync_debug.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o
diff --git a/drivers/staging/android/sw_sync.c 
b/drivers/staging/android/sw_sync.c
index f24493ac65e3..a76db3ff87cb 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -50,7 +50,7 @@ static struct sync_pt *sw_sync_pt_dup(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *) sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return (struct sync_pt *) sw_sync_pt_create(obj, pt->value);
 }
@@ -59,7 +59,7 @@ static int sw_sync_pt_has_signaled(struct sync_pt *sync_pt)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
struct sw_sync_timeline *obj =
-   (struct sw_sync_timeline *)sync_pt->parent;
+   (struct sw_sync_timeline *)sync_pt_parent(sync_pt);
 
return sw_sync_cmp(obj->value, pt->value) >= 0;
 }
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 3d05f662110b..b2254e5a8b70 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -31,22 +31,13 @@
 #define CREATE_TRACE_POINTS
 #include "trace/sync.h"
 
-static void sync_fence_signal_pt(struct sync_pt *pt);
-static int _sync_pt_has_signaled(struct sync_pt *pt);
-static void sync_fence_free(struct kref *kref);
-static void sync_dump(void);
-
-static LIST_HEAD(sync_timeline_list_head);
-static DEFINE_SPINLOCK(sync_timeline_list_lock);
-
-static LIST_HEAD(sync_fence_list_head);
-static DEFINE_SPINLOCK(sync_fence_list_lock);
+static const struct fence_ops android_fence_ops;
+static const struct file_operations sync_fence_fops;
 
 struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
   int size, const char *name)
 {
struct sync_timeline *obj;
-   unsigned long flags;
 
if (size < sizeof(struct sync_timeline))
return NULL;
@@ -57,17 +48,14 @@ struct sync_timeline *sync_timeline_create(const struct sync_timeline_ops *ops,
 
kref_init(&obj->kref);
obj->ops = ops;
+   obj->context = fence_context_alloc(1);
strlcpy(obj->name, name, sizeof(obj->name));
 
INIT_LIST_HEAD(&obj->child_list_head);
-   spin_lock_init(&obj->child_list_lock);
-
INIT_LIST_HEAD(&obj->active_list_head);
-   spin_lock_init(&obj->active_list_lock);
+   spin_lock_init(&obj->child_list_lock);
 
-   spin_lock_irqsave(&sync_timeline_list_lock, flags);
-   list_add_tail(&obj->sync_timeline_list, &sync_timeline_list_head);
-   spin_unlock_irqrestore(&sync_timeline_list_lock, flags);

[PATCH 6/6] dma-buf: add poll support, v3

2014-02-24 Thread Maarten Lankhorst
Thanks to Fengguang Wu for spotting a missing static cast.

v2:
- Kill unused variable need_shared.
v3:
- Clarify the BUG() in dma_buf_release some more. (Rob Clark)

Signed-off-by: Maarten Lankhorst 
---
 drivers/base/dma-buf.c  |  108 +++
 include/linux/dma-buf.h |   12 +
 2 files changed, 120 insertions(+)
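
As a hypothetical userspace sketch (not part of the patch) of the
resulting semantics: POLLIN becomes ready once the exclusive fence has
signaled, and POLLOUT once all fences, shared and exclusive, have
signaled. Waiting until a dma-buf is safe to write could then look
like this (wait_for_dmabuf_idle is an illustrative name):

#include <errno.h>
#include <poll.h>

/* Block until all fences on the dma-buf have signaled (POLLOUT). */
static int wait_for_dmabuf_idle(int dmabuf_fd)
{
	struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLOUT };
	int ret;

	do {
		ret = poll(&pfd, 1, -1);
	} while (ret < 0 && errno == EINTR);

	return ret < 0 ? -errno : 0;
}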

diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 65d0f6201db4..84a9d0b66c99 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static inline int is_dma_buf_file(struct file *);
@@ -52,6 +53,16 @@ static int dma_buf_release(struct inode *inode, struct file *file)
 
BUG_ON(dmabuf->vmapping_counter);
 
+   /*
+* Any fences that a dma-buf poll can wait on should be signaled
+* before releasing dma-buf. This is the responsibility of each
+* driver that uses the reservation objects.
+*
+* If you hit this BUG() it means someone dropped their ref to the
+* dma-buf while still having pending operation to the buffer.
+*/
+   BUG_ON(dmabuf->cb_shared.active || dmabuf->cb_excl.active);
+
dmabuf->ops->release(dmabuf);
 
mutex_lock(&db_list.lock);
@@ -108,10 +119,103 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
return base + offset;
 }
 
+static void dma_buf_poll_cb(struct fence *fence, struct fence_cb *cb)
+{
+   struct dma_buf_poll_cb_t *dcb = (struct dma_buf_poll_cb_t *)cb;
+   unsigned long flags;
+
+   spin_lock_irqsave(&dcb->poll->lock, flags);
+   wake_up_locked_poll(dcb->poll, dcb->active);
+   dcb->active = 0;
+   spin_unlock_irqrestore(&dcb->poll->lock, flags);
+}
+
+static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
+{
+   struct dma_buf *dmabuf;
+   struct reservation_object *resv;
+   unsigned long events;
+
+   dmabuf = file->private_data;
+   if (!dmabuf || !dmabuf->resv)
+   return POLLERR;
+
+   resv = dmabuf->resv;
+
+   poll_wait(file, &dmabuf->poll, poll);
+
+   events = poll_requested_events(poll) & (POLLIN | POLLOUT);
+   if (!events)
+   return 0;
+
+   ww_mutex_lock(&resv->lock, NULL);
+
+   if (resv->fence_excl && (!(events & POLLOUT) ||
+resv->fence_shared_count == 0)) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_excl;
+   unsigned long pevents = POLLIN;
+
+   if (resv->fence_shared_count == 0)
+   pevents |= POLLOUT;
+
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active) {
+   dcb->active |= pevents;
+   events &= ~pevents;
+   } else
+   dcb->active = pevents;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (events & pevents) {
+   if (!fence_add_callback(resv->fence_excl,
+   &dcb->cb, dma_buf_poll_cb))
+   events &= ~pevents;
+   else
+   /*
+* No callback queued, wake up any additional
+* waiters.
+*/
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+   }
+
+   if ((events & POLLOUT) && resv->fence_shared_count > 0) {
+   struct dma_buf_poll_cb_t *dcb = &dmabuf->cb_shared;
+   int i;
+
+   /* Only queue a new callback if no event has fired yet */
+   spin_lock_irq(&dmabuf->poll.lock);
+   if (dcb->active)
+   events &= ~POLLOUT;
+   else
+   dcb->active = POLLOUT;
+   spin_unlock_irq(&dmabuf->poll.lock);
+
+   if (!(events & POLLOUT))
+   goto out;
+
+   for (i = 0; i < resv->fence_shared_count; ++i)
+   if (!fence_add_callback(resv->fence_shared[i],
+   &dcb->cb, dma_buf_poll_cb)) {
+   events &= ~POLLOUT;
+   break;
+   }
+
+   /* No callback queued, wake up any additional waiters. */
+   if (i == resv->fence_shared_count)
+   dma_buf_poll_cb(NULL, &dcb->cb);
+   }
+
+out:
+   ww_mutex_unlock(&resv->lock);
+   return events;
+}
+
 static const struct file_operations dma_buf_fops = {
.release	= dma_buf_release,
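
The BUG_ON in dma_buf_release above encodes a rule for drivers: any
fences a dma-buf poll could still be waiting on must have signaled
before the last reference is dropped. A hypothetical driver teardown
path (driver_buffer, driver_wait_idle and the field names are
illustrative, not from this patch) would look like:

/* Retire all pending fences before dropping the dma-buf reference. */
static void driver_release_buffer(struct driver_buffer *buf)
{
	driver_wait_idle(buf);		/* signals all attached fences */
	dma_buf_put(buf->dmabuf);	/* safe: no active poll callbacks */
}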

[PATCH 5/6] reservation: add support for fences to enable cross-device synchronisation

2014-02-24 Thread Maarten Lankhorst
Signed-off-by: Maarten Lankhorst 
Reviewed-by: Rob Clark 
---
 include/linux/reservation.h |   20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/reservation.h b/include/linux/reservation.h
index 813dae960ebd..f3f57460a205 100644
--- a/include/linux/reservation.h
+++ b/include/linux/reservation.h
@@ -6,7 +6,7 @@
  * Copyright (C) 2012 Texas Instruments
  *
  * Authors:
- * Rob Clark 
+ * Rob Clark 
  * Maarten Lankhorst 
  * Thomas Hellstrom 
  *
@@ -40,22 +40,40 @@
 #define _LINUX_RESERVATION_H
 
 #include 
+#include 
+#include 
 
 extern struct ww_class reservation_ww_class;
 
 struct reservation_object {
struct ww_mutex lock;
+
+   struct fence *fence_excl;
+   struct fence **fence_shared;
+   u32 fence_shared_count, fence_shared_max;
 };
 
 static inline void
 reservation_object_init(struct reservation_object *obj)
 {
ww_mutex_init(&obj->lock, &reservation_ww_class);
+
+   obj->fence_shared_count = obj->fence_shared_max = 0;
+   obj->fence_shared = NULL;
+   obj->fence_excl = NULL;
 }
 
 static inline void
 reservation_object_fini(struct reservation_object *obj)
 {
+   int i;
+
+   if (obj->fence_excl)
+   fence_put(obj->fence_excl);
+   for (i = 0; i < obj->fence_shared_count; ++i)
+   fence_put(obj->fence_shared[i]);
+   kfree(obj->fence_shared);
+
ww_mutex_destroy(&obj->lock);
 }
 

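As a hypothetical usage sketch (not part of this patch): a driver
replacing the exclusive fence would do so with the ww_mutex held,
taking a reference for the object and dropping the old one
(set_excl_fence is an illustrative name):

/*
 * Caller holds obj->lock. The shared fence slots are left untouched
 * here; a real implementation would also deal with those.
 */
static void set_excl_fence(struct reservation_object *obj,
			   struct fence *fence)
{
	struct fence *old = obj->fence_excl;

	obj->fence_excl = fence ? fence_get(fence) : NULL;
	if (old)
		fence_put(old);
}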