date:20140904

[PATCH v2] clk: Don't hold prepare_lock across debugfs creation

2014-09-04 Thread Stephen Boyd

Rob Clark reports a lockdep splat that involves the prepare_lock
chained with the mmap semaphore.

==
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: GW
---
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (qcom_iommu_lock){+.+...}:
   [] qcom_iommu_map+0x28/0x450
   [] iommu_map+0xc8/0x12c
   [] msm_iommu_map+0xb4/0x130
   [] msm_gem_get_iova_locked+0x9c/0xe8
   [] msm_gem_get_iova+0x4c/0x64
   [] mdp4_kms_init+0x4c4/0x6c0
   [] msm_load+0x2ac/0x34c
   [] drm_dev_register+0xac/0x108
   [] drm_platform_init+0x50/0xf0
   [] try_to_bring_up_master.part.3+0xc8/0x108
   [] component_master_add_with_match+0xa8/0x104
   [] msm_pdev_probe+0x64/0x70
   [] platform_drv_probe+0x2c/0x60
   [] driver_probe_device+0x108/0x234
   [] bus_for_each_drv+0x64/0x98
   [] device_attach+0x78/0x8c
   [] bus_probe_device+0x88/0xac
   [] deferred_probe_work_func+0x68/0x9c
   [] process_one_work+0x1a0/0x40c
   [] worker_thread+0x44/0x4d8
   [] kthread+0xd8/0xec
   [] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
   [] drm_gem_mmap+0x38/0xd0
   [] msm_gem_mmap+0xc/0x5c
   [] mmap_region+0x35c/0x6c8
   [] do_mmap_pgoff+0x314/0x398
   [] vm_mmap_pgoff+0x84/0xb4
   [] SyS_mmap_pgoff+0x94/0xbc
   [] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++}:
   [] filldir64+0x68/0x180
   [] dcache_readdir+0x188/0x22c
   [] iterate_dir+0x9c/0x11c
   [] SyS_getdents64+0x78/0xe8
   [] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
   [] __create_file+0x58/0x1dc
   [] debugfs_create_dir+0x1c/0x24
   [] clk_debug_create_subtree+0x20/0x170
   [] clk_debug_init+0xec/0x14c
   [] do_one_initcall+0x8c/0x1c8
   [] kernel_init_freeable+0x13c/0x1dc
   [] kernel_init+0x8/0xe8
   [] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
   [] mutex_lock_nested+0x70/0x3e8
   [] clk_prepare_lock+0x88/0xfc
   [] clk_prepare+0xc/0x24
   [] __enable_clocks.isra.4+0x18/0xa4
   [] __flush_iotlb_va+0xe0/0x114
   [] qcom_iommu_unmap+0xac/0x1f0
   [] iommu_unmap+0x9c/0xe8
   [] msm_iommu_unmap+0x64/0x84
   [] msm_gem_free_object+0x11c/0x338
   [] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
   [] drm_gem_object_release_handle+0x50/0x68
   [] idr_for_each+0xa8/0xdc
   [] drm_gem_release+0x1c/0x28
   [] drm_release+0x370/0x428
   [] __fput+0x98/0x1e8
   [] task_work_run+0xb0/0xfc
   [] do_exit+0x2ec/0x948
   [] do_group_exit+0x4c/0xb8
   [] get_signal+0x28c/0x6ac
   [] do_signal+0xc4/0x3e4
   [] do_work_pending+0xb4/0xc4
   [] work_pending+0xc/0x20

other info that might help us debug this:

Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);

 *** DEADLOCK ***

3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: GW 3.17.0-rc1-00050-g07a489b #802
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x98/0xb8)
[] (dump_stack) from [] (print_circular_bug+0x218/0x340)
[] (print_circular_bug) from [] (__lock_acquire+0x1d24/0x20b8)
[] (__lock_acquire) from [] (lock_acquire+0x9c/0xbc)
[] (lock_acquire) from [] (mutex_lock_nested+0x70/0x3e8)
[] (mutex_lock_nested) from [] (clk_prepare_lock+0x88/0xfc)
[] (clk_prepare_lock) from [] (clk_prepare+0xc/0x24)
[] (clk_prepare) from [] (__enable_clocks.isra.4+0x18/0xa4)
[] (__enable_clocks.isra.4) from [] (__flush_iotlb_va+0xe0/0x114)
[] (__flush_iotlb_va) from [] (qcom_iommu_unmap+0xac/0x1f0)
[] (qcom_iommu_unmap) from [] (iommu_unmap+0x9c/0xe8)
[] (iommu_unmap) from [] (msm_iommu_unmap+0x64/0x84)
[] (msm_iommu_unmap) from [] (msm_gem_free_object+0x11c/0x338)
[] (msm_gem_free_object) from [] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[] (drm_gem_object_handle_unreference_unlocked) from [] (drm_gem_object_release_handle+0x50/0x68)
[] (drm_gem_object_release_handle) from [] (idr_for_each+0xa8/0xdc)
[] (idr_for_e

[PATCH 1/2] clk: Make __clk_lookup() use a list instead of tree search

2014-09-04 Thread Stephen Boyd

On 09/04, Stephen Boyd wrote:
> In the near future we're going to move the prepare lock to be a
> per-clock ww_mutex. __clk_lookup() is called very deep in the
> set-rate path and we would like to avoid having to take all the
> locks in the clock tree to search for a clock (basically
> defeating the purpose of introducing per-clock locks). Introduce
> a new list that contains all clocks registered in the system and
> walk this list until the clock is found.
> 
> Signed-off-by: Stephen Boyd 
> ---

Actually this won't work. We can't grab the list lock while the
prepare lock is held. So we need to do the debugfs stuff with a
different lock and do it outside of the prepare lock.
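
A minimal userspace sketch of the idea in this reply, assuming pthread mutexes as stand-ins for the kernel locks. The names debug_lock, clk_node, fake_debugfs_create() and clk_register_sketch() are hypothetical and are not taken from the patch; the point is only that the debugfs step runs after prepare_lock has been dropped, under a lock of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the clk framework's locks. */
static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t debug_lock = PTHREAD_MUTEX_INITIALIZER;

struct clk_node {
	const char *name;
	struct clk_node *debug_next;	/* protected by debug_lock only */
	int has_debugfs;
};

static struct clk_node *debug_list;	/* protected by debug_lock */
static int prepare_lock_held;		/* bookkeeping for this sketch only */

/* Stand-in for debugfs_create_dir(); must never run under prepare_lock. */
void fake_debugfs_create(struct clk_node *clk)
{
	assert(!prepare_lock_held);
	clk->has_debugfs = 1;
}

/* Registration: tree manipulation under prepare_lock, debugfs afterwards. */
void clk_register_sketch(struct clk_node *clk)
{
	pthread_mutex_lock(&prepare_lock);
	prepare_lock_held = 1;
	/* ... link clk into the clock tree here ... */
	prepare_lock_held = 0;
	pthread_mutex_unlock(&prepare_lock);

	/* The filesystem work is serialized by a separate lock. */
	pthread_mutex_lock(&debug_lock);
	clk->debug_next = debug_list;
	debug_list = clk;
	fake_debugfs_create(clk);
	pthread_mutex_unlock(&debug_lock);
}
```

With this split, nothing that debugfs takes internally is ever acquired while prepare_lock is held, so the dependency from prepare_lock to the inode mutex is never recorded.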

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

[Bug 82667] "Distance" game closed alpha: Changing shadow settings segfaults

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=82667

--- Comment #3 from Christoph Haag  ---
Created attachment 105768
  --> https://bugs.freedesktop.org/attachment.cgi?id=105768&action=edit
The Book of Unwritten Tales 2 Early access - same backtrace

The same happens when changing the graphics quality setting in "The Book of
Unwritten Tales 2" Early Access (for kickstarter backers). Interestingly with
the mesa debug build it works sometimes(?), but when changing the settings a
few times, it very quickly crashes.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #15 from smoki  ---
 By default, in addition to the mesa options, 32-bit Debian passes:

/usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu -
-mtune=generic -march=i586

 That is with -mtune=generic -march=i586. With -mtune=native -march=native on
AMD Kabini, it passes this:

 /usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu -
-march=btver2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16
-msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mno-fma
-mno-fma4 -mno-xop -mbmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1
-mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase -mno-rdseed -mprfchw
-mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd
-mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param
l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=btver2

 So the Debian default works fine for the 10.2 branch, and for 10.3-devel up to
commit 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. Anything newer is affected:
the 10.3 branch and git master.

 Maybe some of those options need to be added to the defaults; I'm not sure
which of them makes it work normally :)


ccf vs iommu vs drm locking fun

2014-09-04 Thread Rob Clark

So, I was looking at the below lockdep splat, and discussing it a bit
w/ sboyd on IRC, and came to a slightly disturbing realization..

The interaction between prepare_lock and debugfs bits is a little bit
worrying.  In particular, it is probably not a good idea to assume
that anyone who needs to grab prepare_lock does not already hold
mmap_sem.  Not holding mmap_sem or locks that interact w/ mmap_sem is
going to be pretty hard to avoid, at least for gpu drivers that are
using iommus that are using CCF ;-)
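
The chain in the splat can be replayed with a small userspace sketch of the bookkeeping involved. This is not lockdep's implementation; the enum names and both functions are hypothetical. It records "B was taken while A was held" edges and refuses an edge that would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum lock_id {
	PREPARE_LOCK, I_MUTEX, MMAP_SEM, STRUCT_MUTEX, QCOM_IOMMU_LOCK,
	NR_LOCKS
};

/* dep[a][b] is true if b was ever acquired while a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
bool reaches(enum lock_id from, enum lock_id to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (dep[from][next] && !seen[next] &&
		    reaches((enum lock_id)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'taken' while holding 'held'". Returns false, and
 * records nothing, if the new edge would close a cycle.
 */
bool record_dependency(enum lock_id held, enum lock_id taken)
{
	bool seen[NR_LOCKS];

	memset(seen, 0, sizeof(seen));
	if (reaches(taken, held, seen))
		return false;
	dep[held][taken] = true;
	return true;
}
```

Feeding it the edges from the report in order (prepare_lock to i_mutex, i_mutex to mmap_sem, mmap_sem to struct_mutex, struct_mutex to qcom_iommu_lock) succeeds four times; the fifth edge, qcom_iommu_lock to prepare_lock, is the one that gets rejected.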

BR,
-R


--

[15928.894558]
[15928.894609] ==
[15928.895145] [ INFO: possible circular locking dependency detected ]
[15928.901141] 3.17.0-rc1-00050-g07a489b #802 Tainted: GW
[15928.907335] ---
[15928.907348] Xorg.bin/5413 is trying to acquire lock:
[15928.907417]  (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc
[15928.907424]
[15928.907424] but task is already holding lock:
[15928.907508]  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0
[15928.907519]
[15928.907519] which lock already depends on the new lock.
[15928.907519]
[15928.907532]
[15928.907532] the existing dependency chain (in reverse order) is:
[15928.907575]
[15928.907575] -> #4 (qcom_iommu_lock){+.+...}:
[15928.907611][] qcom_iommu_map+0x28/0x450
[15928.907634][] iommu_map+0xc8/0x12c
[15928.907662][] msm_iommu_map+0xb4/0x130
[15928.907681][] msm_gem_get_iova_locked+0x9c/0xe8
[15928.907714][] msm_gem_get_iova+0x4c/0x64
[15928.907765][] mdp4_kms_init+0x4c4/0x6c0
[15928.907813][] msm_load+0x2ac/0x34c
[15928.907846][] drm_dev_register+0xac/0x108
[15928.907868][] drm_platform_init+0x50/0xf0
[15928.907892][] try_to_bring_up_master.part.3+0xc8/0x108
[15928.907913][] component_master_add_with_match+0xa8/0x104
[15928.907934][] msm_pdev_probe+0x64/0x70
[15928.907955][] platform_drv_probe+0x2c/0x60
[15928.907983][] driver_probe_device+0x108/0x234
[15928.908003][] bus_for_each_drv+0x64/0x98
[15928.908040][] device_attach+0x78/0x8c
[15928.908082][] bus_probe_device+0x88/0xac
[15928.908126][] deferred_probe_work_func+0x68/0x9c
[15928.908182][] process_one_work+0x1a0/0x40c
[15928.908214][] worker_thread+0x44/0x4d8
[15928.908237][] kthread+0xd8/0xec
[15928.908262][] ret_from_fork+0x14/0x2c
[15928.908291]
[15928.908291] -> #3 (&dev->struct_mutex){+.+.+.}:
[15928.908311][] drm_gem_mmap+0x38/0xd0
[15928.908329][] msm_gem_mmap+0xc/0x5c
[15928.908358][] mmap_region+0x35c/0x6c8
[15928.908377][] do_mmap_pgoff+0x314/0x398
[15928.908398][] vm_mmap_pgoff+0x84/0xb4
[15928.908416][] SyS_mmap_pgoff+0x94/0xbc
[15928.908436][] ret_fast_syscall+0x0/0x48
[15928.908463]
[15928.908463] -> #2 (&mm->mmap_sem){++}:
[15928.908512][] filldir64+0x68/0x180
[15928.908558][] dcache_readdir+0x188/0x22c
[15928.908593][] iterate_dir+0x9c/0x11c
[15928.908616][] SyS_getdents64+0x78/0xe8
[15928.908640][] ret_fast_syscall+0x0/0x48
[15928.908671]
[15928.908671] -> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[15928.908706][] __create_file+0x58/0x1dc
[15928.908728][] debugfs_create_dir+0x1c/0x24
[15928.908761][] clk_debug_create_subtree+0x20/0x170
[15928.908790][] clk_debug_init+0xec/0x14c
[15928.908816][] do_one_initcall+0x8c/0x1c8
[15928.908846][] kernel_init_freeable+0x13c/0x1dc
[15928.908873][] kernel_init+0x8/0xe8
[15928.908898][] ret_from_fork+0x14/0x2c
[15928.908925]
[15928.908925] -> #0 (prepare_lock){+.+.+.}:
[15928.908948][] mutex_lock_nested+0x70/0x3e8
[15928.908970][] clk_prepare_lock+0x88/0xfc
[15928.909001][] clk_prepare+0xc/0x24
[15928.909022][] __enable_clocks.isra.4+0x18/0xa4
[15928.909041][] __flush_iotlb_va+0xe0/0x114
[15928.909071][] qcom_iommu_unmap+0xac/0x1f0
[15928.909093][] iommu_unmap+0x9c/0xe8
[15928.909112][] msm_iommu_unmap+0x64/0x84
[15928.909130][] msm_gem_free_object+0x11c/0x338
[15928.909149][] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
[15928.909166][] drm_gem_object_release_handle+0x50/0x68
[15928.909199][] idr_for_each+0xa8/0xdc
[15928.909225][] drm_gem_release+0x1c/0x28
[15928.909258][] drm_release+0x370/0x428
[15928.909302][] __fput+0x98/0x1e8
[15928.909339][] task_work_run+0xb0/0xfc
[15928.909386][] do_exit+0x2ec/0x948
[15928.909415][] do_group_exit+0x4c/0xb8
[15928.909455][] get_signal+0x28c/0x6ac
[15928.909507][] do_signal+0xc4/0x3e4
[15928.909548][] do_work_pending+0xb4/0xc4
[15928.909584][] work_pending+0xc/0x20
[15928.909595]
[15928.909595] other info that might help us debug this:
[15928.909595

[Bug 81644] Random crashes on RadeonSI with Chromium.

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=81644

--- Comment #83 from Aaron B  ---
This bisect put me here, which looks like it didn't go as planned again...

http://cgit.freedesktop.org/mesa/mesa/commit/?id=78578b759943cb198d34eedc00b3408c1599f6ec

I'm going to give up for now, maybe when I don't have so many other things
going on I'll bisect it over a week or so.


[PATCH 3/3] drm/radeon: allow concurrent buffer reads

2014-09-04 Thread Christian König

From: Christian König

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon_cs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index ec4840c..f662de4 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -183,7 +183,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
}

p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
-   p->relocs[i].tv.shared = false;
+   p->relocs[i].tv.shared = !r->write_domain;
p->relocs[i].handle = r->handle;

radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
@@ -260,7 +260,8 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
continue;

resv = p->relocs[i].robj->tbo.resv;
-   radeon_semaphore_sync_resv(p->ib.semaphore, resv, false);
+   radeon_semaphore_sync_resv(p->ib.semaphore, resv,
+  p->relocs[i].tv.shared);
}
 }

-- 
1.9.1
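
A rough userspace model of what the two hunks above appear to achieve, under the assumption that syncing with shared set waits only for the exclusive fence, while syncing with it clear also waits for every shared fence. The struct and function names are hypothetical, and fences are reduced to integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARED 8

/* Hypothetical, simplified reservation object: fences are just ids. */
struct resv_sketch {
	int excl_fence;			/* 0 means "no exclusive fence" */
	int shared_fences[MAX_SHARED];
	size_t shared_count;
};

/* A reloc is tracked as shared exactly when the command stream only reads it. */
bool reloc_is_shared(unsigned int write_domain)
{
	return !write_domain;
}

/*
 * Collect the fences a new submission has to wait for. Returns the
 * number of ids written to 'out', which must hold MAX_SHARED + 1 entries.
 */
size_t fences_to_wait_for(const struct resv_sketch *resv, bool shared,
			  int *out)
{
	size_t n = 0;

	assert(resv->shared_count <= MAX_SHARED);

	/* Everyone waits for the last writer. */
	if (resv->excl_fence)
		out[n++] = resv->excl_fence;

	/* Readers may overlap with other readers; writers may not. */
	if (!shared)
		for (size_t i = 0; i < resv->shared_count; i++)
			out[n++] = resv->shared_fences[i];

	return n;
}
```

Under that assumption, two command streams that only read the same buffer no longer serialize against each other, which is the "concurrent buffer reads" of the subject line.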

[PATCH 2/3] drm/radeon: add the infrastructure for concurrent buffer access

2014-09-04 Thread Christian König

From: Christian König

This allows us to specify if we want to sync to
the shared fences of a reservation object or not.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/cik.c  | 25 ++-
 drivers/gpu/drm/radeon/cik_sdma.c | 25 ++-
 drivers/gpu/drm/radeon/evergreen_dma.c| 24 +-
 drivers/gpu/drm/radeon/r100.c | 21 +
 drivers/gpu/drm/radeon/r200.c | 21 +
 drivers/gpu/drm/radeon/r600.c | 23 +-
 drivers/gpu/drm/radeon/r600_dma.c | 25 ++-
 drivers/gpu/drm/radeon/radeon.h   | 43 +-
 drivers/gpu/drm/radeon/radeon_asic.h  | 74 ---
 drivers/gpu/drm/radeon/radeon_benchmark.c | 30 ++---
 drivers/gpu/drm/radeon/radeon_cs.c|  8 +---
 drivers/gpu/drm/radeon/radeon_ib.c|  2 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c | 38 ++--
 drivers/gpu/drm/radeon/radeon_test.c  | 24 +++---
 drivers/gpu/drm/radeon/radeon_ttm.c   | 12 ++---
 drivers/gpu/drm/radeon/radeon_vm.c| 16 ++-
 drivers/gpu/drm/radeon/rv770_dma.c| 25 ++-
 drivers/gpu/drm/radeon/si_dma.c   | 25 ++-
 18 files changed, 253 insertions(+), 208 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 1f598ab..0b5a230 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -3959,18 +3959,19 @@ bool cik_semaphore_ring_emit(struct radeon_device *rdev,
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @num_gpu_pages: number of GPU pages to xfer
- * @fence: radeon fence object
+ * @resv: reservation object to sync to
  *
  * Copy GPU paging using the CP DMA engine (CIK+).
  * Used by the radeon ttm implementation to move pages if
  * registered as the asic copy callback.
  */
-int cik_copy_cpdma(struct radeon_device *rdev,
-  uint64_t src_offset, uint64_t dst_offset,
-  unsigned num_gpu_pages,
-  struct radeon_fence **fence)
+struct radeon_fence *cik_copy_cpdma(struct radeon_device *rdev,
+   uint64_t src_offset, uint64_t dst_offset,
+   unsigned num_gpu_pages,
+   struct reservation_object *resv)
 {
struct radeon_semaphore *sem = NULL;
+   struct radeon_fence *fence;
int ring_index = rdev->asic->copy.blit_ring_index;
struct radeon_ring *ring = &rdev->ring[ring_index];
u32 size_in_bytes, cur_size_in_bytes, control;
@@ -3980,7 +3981,7 @@ int cik_copy_cpdma(struct radeon_device *rdev,
r = radeon_semaphore_create(rdev, &sem);
if (r) {
DRM_ERROR("radeon: moving bo (%d).\n", r);
-   return r;
+   return ERR_PTR(r);
}

size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
@@ -3989,10 +3990,10 @@ int cik_copy_cpdma(struct radeon_device *rdev,
if (r) {
DRM_ERROR("radeon: moving bo (%d).\n", r);
radeon_semaphore_free(rdev, &sem, NULL);
-   return r;
+   return ERR_PTR(r);
}

-   radeon_semaphore_sync_to(sem, *fence);
+   radeon_semaphore_sync_resv(sem, resv, false);
radeon_semaphore_sync_rings(rdev, sem, ring->idx);

for (i = 0; i < num_loops; i++) {
@@ -4014,17 +4015,17 @@ int cik_copy_cpdma(struct radeon_device *rdev,
dst_offset += cur_size_in_bytes;
}

-   r = radeon_fence_emit(rdev, fence, ring->idx);
+   r = radeon_fence_emit(rdev, &fence, ring->idx);
if (r) {
radeon_ring_unlock_undo(rdev, ring);
radeon_semaphore_free(rdev, &sem, NULL);
-   return r;
+   return ERR_PTR(r);
}

radeon_ring_unlock_commit(rdev, ring, false);
-   radeon_semaphore_free(rdev, &sem, *fence);
+   radeon_semaphore_free(rdev, &sem, fence);

-   return r;
+   return fence;
 }

 /*
diff --git a/drivers/gpu/drm/radeon/cik_sdma.c b/drivers/gpu/drm/radeon/cik_sdma.c
index 192278b..c01a610 100644
--- a/drivers/gpu/drm/radeon/cik_sdma.c
+++ b/drivers/gpu/drm/radeon/cik_sdma.c
@@ -537,18 +537,19 @@ void cik_sdma_fini(struct radeon_device *rdev)
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @num_gpu_pages: number of GPU pages to xfer
- * @fence: radeon fence object
+ * @resv: reservation object to sync to
  *
  * Copy GPU paging using the DMA engine (CIK).
  * Used by the radeon ttm implementation to move pages if
  * registered as the asic copy callback.
  */
-int cik_copy_dma(struct radeon_device *rdev,
-uint64_t src_offset, uint64_t dst_offset,
-unsigned num_gpu_pages,
-struct radeon_fence **fence)
+struct radeon_fence *cik_copy_dma(struct radeon_device *rdev,
+ uint64_t src_offs
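
The copy functions in this patch switch from returning an int plus a fence out-parameter to returning the fence itself, with failures encoded via ERR_PTR(). A userspace imitation of that convention is sketched below; err_ptr(), ptr_err(), is_err(), fence_sketch and copy_sketch() are hypothetical lower-case stand-ins, not the kernel helpers themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR helpers. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error codes occupy the top MAX_ERRNO values of the address space. */
int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fence_sketch {
	unsigned int seq;
};

/* Hypothetical copy function: returns a fence, or an encoded -errno. */
struct fence_sketch *copy_sketch(unsigned int num_pages)
{
	struct fence_sketch *fence;

	if (num_pages == 0)
		return err_ptr(-EINVAL);

	fence = malloc(sizeof(*fence));
	if (!fence)
		return err_ptr(-ENOMEM);

	fence->seq = num_pages;
	return fence;
}
```

A caller checks is_err() on the result before dereferencing it and recovers the error code with ptr_err(), so a single return value carries both the success and the failure case.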

[PATCH 1/3] drm/ttm: allow fence to be added as shared

2014-09-04 Thread Christian König

From: Christian König

This patch adds a new flag to the ttm_validate_buffer list to
add the fence as shared to the reservation object.

Signed-off-by: Christian König
---
 drivers/gpu/drm/qxl/qxl_release.c|  1 +
 drivers/gpu/drm/radeon/radeon_cs.c   |  1 +
 drivers/gpu/drm/radeon/radeon_vm.c   |  3 +++
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   | 18 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |  3 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  5 +
 include/drm/ttm/ttm_execbuf_util.h   |  2 ++
 7 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index a6e19c8..446e71c 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -226,6 +226,7 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)

qxl_bo_ref(bo);
entry->tv.bo = &bo->tbo;
+   entry->tv.shared = false;
list_add_tail(&entry->tv.head, &release->bos);
return 0;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 6e3d1c8..cd517ab 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -183,6 +183,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
}

p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
+   p->relocs[i].tv.shared = false;
p->relocs[i].handle = r->handle;

radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 671ee56..1cce446 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -143,6 +143,7 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
list[0].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
list[0].tv.bo = &vm->page_directory->tbo;
+   list[0].tv.shared = false;
list[0].tiling_flags = 0;
list[0].handle = 0;
list_add(&list[0].tv.head, head);
@@ -156,6 +157,7 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
list[idx].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
list[idx].tv.bo = &list[idx].robj->tbo;
+   list[idx].tv.shared = false;
list[idx].tiling_flags = 0;
list[idx].handle = 0;
list_add(&list[idx++].tv.head, head);
@@ -395,6 +397,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,

 memset(&tv, 0, sizeof(tv));
 tv.bo = &bo->tbo;
+   tv.shared = false;

 INIT_LIST_HEAD(&head);
 list_add(&tv.head, &head);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index adafc0f..8ce508e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -119,8 +119,14 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
ret = -EBUSY;
}

-   if (!ret)
-   continue;
+   if (!ret) {
+   if (!entry->shared)
+   continue;
+
+   ret = reservation_object_reserve_shared(bo->resv);
+   if (!ret)
+   continue;
+   }

/* uh oh, we lost out, drop every reservation and try
 * to only reserve this buffer, then start over if
@@ -136,6 +142,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
ret = 0;
}

+   if (!ret && entry->shared)
+   ret = reservation_object_reserve_shared(bo->resv);
+
if (unlikely(ret != 0)) {
if (ret == -EINTR)
ret = -ERESTARTSYS;
@@ -183,7 +192,10 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,

list_for_each_entry(entry, list, head) {
bo = entry->bo;
-   reservation_object_add_excl_fence(bo->resv, fence);
+   if (entry->shared)
+   reservation_object_add_shared_fence(bo->resv, fence);
+   else
+   reservation_object_add_excl_fence(bo->resv, fence);
ttm_bo_add_to_lru(bo);
__ttm_bo_unreserve(bo);
}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 0ceaddc..b4de3b2 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -346,6 +346,7 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
++sw_context->cur_val_buf;
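
The ttm_execbuf_util.c hunks above follow a two-phase pattern: a slot for a shared fence is reserved while buffers are being reserved (a step that may fail), and the fence is attached later in a step that cannot fail. A simplified userspace model follows; resv_slots, reserve_shared() and fence_buffer() are hypothetical names, and the assumption that a new exclusive fence supersedes the shared ones is an illustrative simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct resv_slots {
	int excl;		/* exclusive (writer) fence id, 0 if none */
	int *shared;		/* shared (reader) fence ids */
	size_t count, max;
};

/* Make room for one more shared fence; may fail, so it runs before submit. */
int reserve_shared(struct resv_slots *r)
{
	size_t new_max;
	int *grown;

	if (r->count < r->max)
		return 0;

	new_max = r->max ? r->max * 2 : 4;
	grown = realloc(r->shared, new_max * sizeof(*grown));
	if (!grown)
		return -ENOMEM;

	r->shared = grown;
	r->max = new_max;
	return 0;
}

/* Attach the fence after submission; this step is not allowed to fail. */
void fence_buffer(struct resv_slots *r, int fence, bool shared)
{
	if (shared) {
		assert(r->count < r->max);	/* slot was reserved earlier */
		r->shared[r->count++] = fence;
	} else {
		r->excl = fence;
		r->count = 0;	/* assumed: a new writer supersedes readers */
	}
}
```

Splitting allocation from attachment is what lets the fencing step run after the point where the submission can no longer be backed out.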

[Bug 83510] Graphical glitches in Unreal Engine 4

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #4 from Clément Guérin  ---
Created attachment 105767
  --> https://bugs.freedesktop.org/attachment.cgi?id=105767&action=edit
Black pixel + low light glitch in Realistic Rendering demo

Should look like this: https://www.youtube.com/watch?v=3AAf_2vry8A


[Bug 83510] Graphical glitches in Unreal Engine 4

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #3 from Clément Guérin  ---
Created attachment 105766
  --> https://bugs.freedesktop.org/attachment.cgi?id=105766&action=edit
Black pixel glitch in Mobile Game demo


[Bug 83510] Graphical glitches in Unreal Engine 4

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #2 from Clément Guérin  ---
Created attachment 105765
  --> https://bugs.freedesktop.org/attachment.cgi?id=105765&action=edit
Black pixel glitch in Stylized demo


[Bug 83510] Graphical glitches in Unreal Engine 4

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #1 from Clément Guérin  ---
Created attachment 105764
  --> https://bugs.freedesktop.org/attachment.cgi?id=105764&action=edit
Motion blur glitch in Shooter Game


[Bug 83510] New: Graphical glitches in Unreal Engine 4

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83510

  Priority: medium
Bug ID: 83510
  Assignee: dri-devel at lists.freedesktop.org
   Summary: Graphical glitches in Unreal Engine 4
  Severity: normal
Classification: Unclassified
OS: Linux (All)
  Reporter: geecko.dev at free.fr
  Hardware: Other
Status: NEW
   Version: git
 Component: Drivers/Gallium/radeonsi
   Product: Mesa

While all the Linux demos are now running fine, there are some rendering errors
left.

- Some objects are blurred. It looks like motion blur on objects that don't
need it, like cameras, characters or weapons. You can see it in the Elemental
Demo or Shooter Game demo.
- Some pixels turn black when the camera is static, very noticeable on the sky
box in the Mobile Game demo.
- It seems that the lighting is too dark in some scenes, can be seen in the
Realistic Rendering demo or the Shooter Game demo with the Sanctuary map.

HD 7950, mesa-git and llvm-svn on latest Arch Linux.


TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> > Last time I tested, (and it seems like Michel is on the same track),
> > writing with the CPU to write-combined memory was substantially faster
> > than writing to cached memory, with the additional side-effect that CPU
> > caches are left unpolluted.
> > 
> > Moreover (although only tested on Intel's embedded chipsets), texturing
> > from cpu-cache-coherent PCI memory was a real GPU performance hog
> > compared to texturing from non-snooped memory. Hence, whenever a buffer
> > could be classified as GPU-read-only (or almost at least), it should be
> > placed in write-combined memory.
> 
> > Just a quick comment since this explicitly refers to intel chips: On
> desktop/laptop chips with the big shared l3/l4 caches it's the other way
> round. Cached uploads are substantially faster than wc and not using
> coherent access is a severe perf hit for texturing. I guess the hw guys
> worked really hard to hide the snooping costs so that the gpu can benefit
> from the massive bandwidth these caches can provide.

This is similar to modern POWER chips as well. We have pretty big L3's
(though not technically shared they are in a separate quadrant and we
have a shared L4 in the memory buffer) and our fabric is generally
optimized for cachable/coherent access performance. In fact, we only
have so many credits for NC accesses on the bus...

What that tells me is that when setting up the desired cachability
attributes for the mapping of a memory object, we need to consider these
things here:

  - The hard requirement of the HW (non-coherent GPUs require NC, AGP
does in some cases, etc...) which I think is basically already handled
using the placement attributes set by the GPU driver for the memory type

  - The optimal attributes (and platform hard requirements) for fast
memory accesses to an object by the processor.  From what I read here,
this can be NC+WC on older Intel, cachable on newer, etc...

  - The optimal attributes for fast GPU DMA accesses to the object in
system memory. Here too, this is fairly platform/chipset dependent.

Do we have flags in the DRM that tell us whether an object in memory is
more likely to be used by the GPU via DMA vs by the CPU via MMIO? On
powerpc (except in the old AGP case), I wouldn't mind requiring
cachable in both cases, but I can see the low latency crowd wanting the
former to be non-cachable, while dumb GPUs like AST, which don't do DMA,
would benefit greatly from the latter...
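
As a sketch only, the three considerations listed above could be combined roughly like this. Every name here (caching_sketch, platform_caching, pick_caching, the mostly_gpu_dma hint) is hypothetical; no such DRM flag is confirmed by this thread, which is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

enum caching_sketch { MAP_CACHED, MAP_WRITE_COMBINED, MAP_UNCACHED };

/* Hypothetical per-platform description of the three considerations. */
struct platform_caching {
	bool has_hw_requirement;		/* e.g. a non-coherent GPU */
	enum caching_sketch hw_required;	/* valid if the above is set */
	enum caching_sketch best_for_cpu;	/* fastest for CPU access */
	enum caching_sketch best_for_gpu_dma;	/* fastest for GPU DMA */
};

/*
 * Pick the mapping attribute for an object. 'mostly_gpu_dma' stands in
 * for the usage hint asked about above.
 */
enum caching_sketch pick_caching(const struct platform_caching *p,
				 bool mostly_gpu_dma)
{
	assert(p);

	/* A hard hardware requirement always wins. */
	if (p->has_hw_requirement)
		return p->hw_required;

	/* Otherwise optimize for whoever touches the object most. */
	return mostly_gpu_dma ? p->best_for_gpu_dma : p->best_for_cpu;
}
```

On a platform where cachable is best for both CPU and GPU access, both preferences would be MAP_CACHED and the hint would make no difference; it only matters where the two preferences diverge.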

Cheers,
Ben.

[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 16:52 +0900, Michel Dänzer wrote:
> >   #endif
> > +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
> > + /*
> > +  * Using a non-cachable mapping of system memory on
> > +  * cache coherent powerpc's can be fatal, let's make
> > +  * sure this doesn't happen and warn if it does. The
> > +  * only exception is powermac with AGP which has to
> > +  * take the risk.
> > +  */
> > + if (!machine_is(powermac) &&
> > + ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
> > +  (caching_flags & TTM_PL_FLAG_TT))) {
> > + pr_err_once("TTM: Attempt to use a non-cached"
> > + " mapping on RAM unsupported !\n");
> > + return tmp;
> 
> NAK, this breaks AGP on PowerMacs.

No it doesn't :-)

Cheers,
Ben.

[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #14 from Emil Velikov  ---
(In reply to comment #12)
>  That happens with default ./autogen blah blah
> 
>  As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

Unless you've provided --enable-debug, mesa does not mess around with the compiler
optimisation/debug options (-O* and -g*). And it never touches -mtune or
-march. It might be that the compiler is going for some different heuristics
before/after the commit, causing substantially different code to be generated?


[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #13 from smoki  ---
(In reply to comment #12)
>  That happens with default ./autogen blah blah
> 
>  As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

 But only almost normal; as I said earlier too, glretrace for example is slowish
again :) Very weird issues, all is fine on
37d43ebb28ce8be38f3d9b0805b8b14354ce786d.


[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()

2014-09-04 Thread Michel Dänzer

On 04.09.2014 18:34, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 16:52 +0900, Michel Dänzer wrote:
>>>#endif
>>> +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
>>> + /*
>>> +  * Using a non-cachable mapping of system memory on
>>> +  * cache coherent powerpc's can be fatal, let's make
>>> +  * sure this doesn't happen and warn if it does. The
>>> +  * only exception is powermac with AGP which has to
>>> +  * take the risk.
>>> +  */
>>> + if (!machine_is(powermac) &&
>>> + ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
>>> +  (caching_flags & TTM_PL_FLAG_TT))) {
>>> + pr_err_once("TTM: Attempt to use a non-cached"
>>> + " mapping on RAM unsupported !\n");
>>> + return tmp;
>>
>> NAK, this breaks AGP on PowerMacs.
>
> No it doesn't :-)

Yeah sorry, I was blind.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

[Bug 83861] radeon power management cause audio skips and glitch

2014-09-04 Thread bugzilla-dae...@bugzilla.kernel.org

https://bugzilla.kernel.org/show_bug.cgi?id=83861

Rafael J. Wysocki  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Other                       |Video(DRI - non Intel)
           Assignee|rjw at rjwysocki.net         |drivers_video-dri at kernel-bugs.osdl.org
            Product|Power Management            |Drivers

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #12 from smoki  ---
 That happens with default ./autogen blah blah

 As i said earlier if i pass -mtune=native -march=native flags perf is normal,
very weird and funny thing in the same time :D


[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #11 from smoki  ---
(In reply to comment #8) 
> Can you guys bisect?

 So the latest good commit is 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. After
07c65b85eada8dd34019763b6e82ed4257a9b4a6 there is corruption (so I can't test
those commits) all the way until 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938, one
week later, which has a fix for the 3.17 kernel. I took that patch, tried
07c65b85eada8dd34019763b6e82ed4257a9b4a6 again and bingo, blah, blah...

 So PIPE_USAGE_STREAM seems to be the main problem again, but unlike on 64bit,
on 32bit it can't be reverted to the old behavior: there it is unusable and
produces a lot of corruption.


[PATCH 2/2] clk: Don't hold prepare_lock across debugfs creation

2014-09-04 Thread Stephen Boyd

Rob Clark reports a lockdep splat that involves the prepare_lock
chained with the mmap semaphore.

==
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: GW
---
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (qcom_iommu_lock){+.+...}:
   [] qcom_iommu_map+0x28/0x450
   [] iommu_map+0xc8/0x12c
   [] msm_iommu_map+0xb4/0x130
   [] msm_gem_get_iova_locked+0x9c/0xe8
   [] msm_gem_get_iova+0x4c/0x64
   [] mdp4_kms_init+0x4c4/0x6c0
   [] msm_load+0x2ac/0x34c
   [] drm_dev_register+0xac/0x108
   [] drm_platform_init+0x50/0xf0
   [] try_to_bring_up_master.part.3+0xc8/0x108
   [] component_master_add_with_match+0xa8/0x104
   [] msm_pdev_probe+0x64/0x70
   [] platform_drv_probe+0x2c/0x60
   [] driver_probe_device+0x108/0x234
   [] bus_for_each_drv+0x64/0x98
   [] device_attach+0x78/0x8c
   [] bus_probe_device+0x88/0xac
   [] deferred_probe_work_func+0x68/0x9c
   [] process_one_work+0x1a0/0x40c
   [] worker_thread+0x44/0x4d8
   [] kthread+0xd8/0xec
   [] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
   [] drm_gem_mmap+0x38/0xd0
   [] msm_gem_mmap+0xc/0x5c
   [] mmap_region+0x35c/0x6c8
   [] do_mmap_pgoff+0x314/0x398
   [] vm_mmap_pgoff+0x84/0xb4
   [] SyS_mmap_pgoff+0x94/0xbc
   [] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++}:
   [] filldir64+0x68/0x180
   [] dcache_readdir+0x188/0x22c
   [] iterate_dir+0x9c/0x11c
   [] SyS_getdents64+0x78/0xe8
   [] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
   [] __create_file+0x58/0x1dc
   [] debugfs_create_dir+0x1c/0x24
   [] clk_debug_create_subtree+0x20/0x170
   [] clk_debug_init+0xec/0x14c
   [] do_one_initcall+0x8c/0x1c8
   [] kernel_init_freeable+0x13c/0x1dc
   [] kernel_init+0x8/0xe8
   [] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
   [] mutex_lock_nested+0x70/0x3e8
   [] clk_prepare_lock+0x88/0xfc
   [] clk_prepare+0xc/0x24
   [] __enable_clocks.isra.4+0x18/0xa4
   [] __flush_iotlb_va+0xe0/0x114
   [] qcom_iommu_unmap+0xac/0x1f0
   [] iommu_unmap+0x9c/0xe8
   [] msm_iommu_unmap+0x64/0x84
   [] msm_gem_free_object+0x11c/0x338
   [] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
   [] drm_gem_object_release_handle+0x50/0x68
   [] idr_for_each+0xa8/0xdc
   [] drm_gem_release+0x1c/0x28
   [] drm_release+0x370/0x428
   [] __fput+0x98/0x1e8
   [] task_work_run+0xb0/0xfc
   [] do_exit+0x2ec/0x948
   [] do_group_exit+0x4c/0xb8
   [] get_signal+0x28c/0x6ac
   [] do_signal+0xc4/0x3e4
   [] do_work_pending+0xb4/0xc4
   [] work_pending+0xc/0x20

other info that might help us debug this:

Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);

 *** DEADLOCK ***

3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: GW  3.17.0-rc1-00050-g07a489b #802
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x98/0xb8)
[] (dump_stack) from [] (print_circular_bug+0x218/0x340)
[] (print_circular_bug) from [] (__lock_acquire+0x1d24/0x20b8)
[] (__lock_acquire) from [] (lock_acquire+0x9c/0xbc)
[] (lock_acquire) from [] (mutex_lock_nested+0x70/0x3e8)
[] (mutex_lock_nested) from [] (clk_prepare_lock+0x88/0xfc)
[] (clk_prepare_lock) from [] (clk_prepare+0xc/0x24)
[] (clk_prepare) from [] (__enable_clocks.isra.4+0x18/0xa4)
[] (__enable_clocks.isra.4) from [] (__flush_iotlb_va+0xe0/0x114)
[] (__flush_iotlb_va) from [] (qcom_iommu_unmap+0xac/0x1f0)
[] (qcom_iommu_unmap) from [] (iommu_unmap+0x9c/0xe8)
[] (iommu_unmap) from [] (msm_iommu_unmap+0x64/0x84)
[] (msm_iommu_unmap) from [] (msm_gem_free_object+0x11c/0x338)
[] (msm_gem_free_object) from [] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[] (drm_gem_object_handle_unreference_unlocked) from [] (drm_gem_object_release_handle+0x50/0x68)
[] (drm_gem_object_release_handle) from [] (idr_for_each+0xa8/0xdc)
[] (idr_for_e

[PATCH 1/2] clk: Make __clk_lookup() use a list instead of tree search

2014-09-04 Thread Stephen Boyd

In the near future we're going to move the prepare lock to be a
per-clock ww_mutex. __clk_lookup() is called very deep in the
set-rate path and we would like to avoid having to take all the
locks in the clock tree to search for a clock (basically
defeating the purpose of introducing per-clock locks). Introduce
a new list that contains all clocks registered in the system and
walk this list until the clock is found.

Signed-off-by: Stephen Boyd 
---

Yeah this commit text could be updated and/or this could be squashed
into the next patch.

 drivers/clk/clk.c   | 52 +
 include/linux/clk-private.h |  1 +
 2 files changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index b76fa69b44cb..cf5df744cb21 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -33,8 +33,10 @@ static struct task_struct *enable_owner;
 static int prepare_refcnt;
 static int enable_refcnt;

+static DEFINE_MUTEX(clk_lookup_lock);
 static HLIST_HEAD(clk_root_list);
 static HLIST_HEAD(clk_orphan_list);
+static HLIST_HEAD(clk_lookup_list);
 static LIST_HEAD(clk_notifier_list);

 /***   locking ***/
@@ -670,46 +672,23 @@ out:
 }
 EXPORT_SYMBOL_GPL(__clk_is_enabled);

-static struct clk *__clk_lookup_subtree(const char *name, struct clk *clk)
-{
-   struct clk *child;
-   struct clk *ret;
-
-   if (!strcmp(clk->name, name))
-   return clk;
-
-   hlist_for_each_entry(child, &clk->children, child_node) {
-   ret = __clk_lookup_subtree(name, child);
-   if (ret)
-   return ret;
-   }
-
-   return NULL;
-}
-
 struct clk *__clk_lookup(const char *name)
 {
-   struct clk *root_clk;
-   struct clk *ret;
+   struct clk *clk;

if (!name)
return NULL;

-   /* search the 'proper' clk tree first */
-   hlist_for_each_entry(root_clk, &clk_root_list, child_node) {
-   ret = __clk_lookup_subtree(name, root_clk);
-   if (ret)
-   return ret;
+   mutex_lock(&clk_lookup_lock);
+   hlist_for_each_entry(clk, &clk_lookup_list, lookup_node) {
+   if (!strcmp(clk->name, name))
+   goto found;
}
+   clk = NULL;
+found:
+   mutex_unlock(&clk_lookup_lock);

-   /* if not found, then search the orphan tree */
-   hlist_for_each_entry(root_clk, &clk_orphan_list, child_node) {
-   ret = __clk_lookup_subtree(name, root_clk);
-   if (ret)
-   return ret;
-   }
-
-   return NULL;
+   return clk;
 }

 /*
@@ -1823,6 +1802,11 @@ int __clk_init(struct device *dev, struct clk *clk)

clk->parent = __clk_init_parent(clk);

+   /* Insert into clock lookup list */
+   mutex_lock(&clk_lookup_lock);
+   hlist_add_head(&clk->lookup_node, &clk_lookup_list);
+   mutex_unlock(&clk_lookup_lock);
+
/*
 * Populate clk->parent if parent has already been __clk_init'd.  If
 * parent has not yet been __clk_init'd then place clk in the orphan
@@ -2117,6 +2101,10 @@ void clk_unregister(struct clk *clk)

hlist_del_init(&clk->child_node);

+   mutex_lock(&clk_lookup_lock);
+   hlist_del_init(&clk->lookup_node);
+   mutex_unlock(&clk_lookup_lock);
+
if (clk->prepare_count)
pr_warn("%s: unregistering prepared clock: %s\n",
__func__, clk->name);
diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h
index efbf70b9fd84..3cd98a930006 100644
--- a/include/linux/clk-private.h
+++ b/include/linux/clk-private.h
@@ -48,6 +48,7 @@ struct clk {
unsigned long   accuracy;
struct hlist_head   children;
struct hlist_node   child_node;
+   struct hlist_node   lookup_node;
unsigned intnotifier_count;
 #ifdef CONFIG_DEBUG_FS
struct dentry   *dentry;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
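
For readers skimming the diff above, here is a rough userspace illustration of the idea, not the kernel code itself: a flat list guarded by its own mutex lets a lookup by name avoid walking the parent/child tree. The names `struct clk_entry`, `lookup_add()`, `lookup_del()` and `lookup_find()` are made up for this sketch, and pthread mutexes stand in for the kernel's.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct clk: only what the lookup needs. */
struct clk_entry {
	const char *name;
	struct clk_entry *next;	/* link in the flat lookup list */
};

static pthread_mutex_t lookup_lock = PTHREAD_MUTEX_INITIALIZER;
static struct clk_entry *lookup_list;	/* every registered clock */

/* Register: push onto the head of the list under the lookup lock. */
static void lookup_add(struct clk_entry *clk)
{
	pthread_mutex_lock(&lookup_lock);
	clk->next = lookup_list;
	lookup_list = clk;
	pthread_mutex_unlock(&lookup_lock);
}

/* Unregister: unlink the entry if it is on the list. */
static void lookup_del(struct clk_entry *clk)
{
	struct clk_entry **pp;

	pthread_mutex_lock(&lookup_lock);
	for (pp = &lookup_list; *pp; pp = &(*pp)->next) {
		if (*pp == clk) {
			*pp = clk->next;
			clk->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&lookup_lock);
}

/* Lookup: a linear walk that never touches the clock tree. */
static struct clk_entry *lookup_find(const char *name)
{
	struct clk_entry *clk;

	if (!name)
		return NULL;

	pthread_mutex_lock(&lookup_lock);
	for (clk = lookup_list; clk; clk = clk->next)
		if (!strcmp(clk->name, name))
			break;
	pthread_mutex_unlock(&lookup_lock);

	return clk;
}
```

The trade-off is the same as in the patch: the lookup becomes O(number of clocks), but it only needs the small list lock rather than whatever protects the tree.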

[Bug 79980] Random radeonsi crashes

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=79980

--- Comment #129 from AdrianG  ---
Radeon 8550g/8670m - doesn't get past the login screen with 3.17-rc3. At least in
rc1 I could get to the desktop, but then it would almost immediately hang.
(distro: Ubuntu 14.04 standard + Gnome 3.2).

Works like a charm on kernel 3.14*


[PATCH -v3 2/4] drm/i915: split intel_update_plane into check() and commit()

2014-09-04 Thread Ville Syrjälä

On Wed, Sep 03, 2014 at 05:10:16PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Due to the upcoming atomic modesetting feature we need to separate
> some update functions into a check step that can fail and a commit
> step that should, ideally, never fail.
> 
> This commit splits intel_update_plane() and its commit part can still
> fail due to the fb pinning procedure.

Crap. I wrote a reply and somehow it seems to have gotten lost before I
sent it out. I'll try to recall it all again...

> 
> Signed-off-by: Gustavo Padovan 
> ---
>  drivers/gpu/drm/i915/intel_sprite.c | 253 
> +---
>  1 file changed, 150 insertions(+), 103 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_sprite.c b/drivers/gpu/drm/i915/intel_sprite.c
> index 07a74ef..7b0d1a9 100644
> --- a/drivers/gpu/drm/i915/intel_sprite.c
> +++ b/drivers/gpu/drm/i915/intel_sprite.c
> @@ -845,57 +845,24 @@ static bool colorkey_enabled(struct intel_plane *intel_plane)
>  }
>  
>  static int
> -intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
> -struct drm_framebuffer *fb, int crtc_x, int crtc_y,
> -unsigned int crtc_w, unsigned int crtc_h,
> -uint32_t src_x, uint32_t src_y,
> -uint32_t src_w, uint32_t src_h)
> +intel_check_sprite_plane(struct drm_plane *plane,
> +  struct intel_plane_state *state)
>  {
> - struct drm_device *dev = plane->dev;
> - struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> + struct intel_crtc *intel_crtc = to_intel_crtc(state->crtc);
>   struct intel_plane *intel_plane = to_intel_plane(plane);
> - enum pipe pipe = intel_crtc->pipe;
> + struct drm_framebuffer *fb = state->fb;
>   struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
>   struct drm_i915_gem_object *obj = intel_fb->obj;
> - struct drm_i915_gem_object *old_obj = intel_plane->obj;
> - int ret;
> - bool primary_enabled;
> - bool visible;
> + int crtc_x, crtc_y;
> + unsigned int crtc_w, crtc_h;
> + uint32_t src_x, src_y, src_w, src_h;
> + struct drm_rect *src = &state->src;
> + struct drm_rect *dst = &state->dst;
> + struct drm_rect *orig_src = &state->orig_src;
> + const struct drm_rect *clip = &state->clip;
>   int hscale, vscale;
>   int max_scale, min_scale;
>   int pixel_size = drm_format_plane_cpp(fb->pixel_format, 0);
> - struct drm_rect src = {
> - /* sample coordinates in 16.16 fixed point */
> - .x1 = src_x,
> - .x2 = src_x + src_w,
> - .y1 = src_y,
> - .y2 = src_y + src_h,
> - };
> - struct drm_rect dst = {
> - /* integer pixels */
> - .x1 = crtc_x,
> - .x2 = crtc_x + crtc_w,
> - .y1 = crtc_y,
> - .y2 = crtc_y + crtc_h,
> - };
> - const struct drm_rect clip = {
> - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
> - .y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
> - };
> - const struct {
> - int crtc_x, crtc_y;
> - unsigned int crtc_w, crtc_h;
> - uint32_t src_x, src_y, src_w, src_h;
> - } orig = {
> - .crtc_x = crtc_x,
> - .crtc_y = crtc_y,
> - .crtc_w = crtc_w,
> - .crtc_h = crtc_h,
> - .src_x = src_x,
> - .src_y = src_y,
> - .src_w = src_w,
> - .src_h = src_h,
> - };
>  
>   /* Don't modify another pipe's plane */
>   if (intel_plane->pipe != intel_crtc->pipe) {
> @@ -927,55 +894,55 @@ intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc,
>   max_scale = intel_plane->max_downscale << 16;
>   min_scale = intel_plane->can_scale ? 1 : (1 << 16);
>  
> - drm_rect_rotate(&src, fb->width << 16, fb->height << 16,
> + drm_rect_rotate(src, fb->width << 16, fb->height << 16,
>   intel_plane->rotation);
>  
> - hscale = drm_rect_calc_hscale_relaxed(&src, &dst, min_scale, max_scale);
> + hscale = drm_rect_calc_hscale_relaxed(src, dst, min_scale, max_scale);
>   BUG_ON(hscale < 0);
>  
> - vscale = drm_rect_calc_vscale_relaxed(&src, &dst, min_scale, max_scale);
> + vscale = drm_rect_calc_vscale_relaxed(src, dst, min_scale, max_scale);
>   BUG_ON(vscale < 0);
>  
> - visible = drm_rect_clip_scaled(&src, &dst, &clip, hscale, vscale);
> + state->visible =  drm_rect_clip_scaled(src, dst, clip, hscale, vscale);
>  
> - crtc_x = dst.x1;
> - crtc_y = dst.y1;
> - crtc_w = drm_rect_width(&dst);
> - crtc_h = drm_rect_height(&dst);
> + crtc_x = dst->x1;
> + crtc_y = dst->y1;
> + crtc_w = drm_rect_width(dst);
> + crtc_h = drm_rect_height(dst);
>  
> - if (visible) {
> + if (state->visible) {
>   /* check again in case clipping clamped the results

ccf vs iommu vs drm locking fun

2014-09-04 Thread Stephen Boyd

On 09/04/14 17:46, Rob Clark wrote:
> So, I was looking at the below lockdep splat, and discussing it a bit
> w/ sboyd on IRC, and came to a slightly disturbing realization..
>
> The interaction between prepare_lock and debugfs bits is a little bit
> worrying.  In particular, it is probably not a good idea to assume
> that anyone who needs to grab prepare_lock does not already hold
> mmap_sem.  Not holding mmap_sem or locks that interact w/ mmap_sem is
> going to be pretty hard to avoid, at least for gpu drivers that are
> using iommus that are using CCF ;-)

I'm thinking one way to fix this is to replace the tree traversal for
debugfs registration with a list iteration of all registered clocks.
That way we don't hold the prepare mutex across debugfs directory/file
creation. This should break the chain.

Now that debugfs isn't a hierarchy, this becomes a lot easier, we just
need to keep a linked list of all the clocks that are registered. I
already have that patch for my wwmutex series, but I didn't convert
debugfs to use it. Two patches to follow.
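
A minimal userspace sketch of that idea, under the assumption that a second, dedicated lock protects the flat debug list: `clk_register_sketch()`, `fake_debugfs_create()` and `debug_lock` are illustrative names only, and the `prepare_held` flag exists purely so the sketch can show that the "debugfs" work never runs with the prepare lock held.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct clk_node {
	const char *name;
	struct clk_node *debug_next;	/* link in the flat debug list */
	bool has_dentry;
};

static pthread_mutex_t prepare_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t debug_lock = PTHREAD_MUTEX_INITIALIZER;
static struct clk_node *debug_list;
static bool prepare_held;		/* bookkeeping for the sketch only */
static int created_while_prepare_held;

/* Stand-in for debugfs_create_dir(), which may take filesystem locks. */
static void fake_debugfs_create(struct clk_node *clk)
{
	if (prepare_held)
		created_while_prepare_held++;
	clk->has_dentry = true;
}

static void clk_register_sketch(struct clk_node *clk)
{
	/* Tree manipulation happens under the prepare lock... */
	pthread_mutex_lock(&prepare_lock);
	prepare_held = true;
	/* (parent/child linking would go here) */
	prepare_held = false;
	pthread_mutex_unlock(&prepare_lock);

	/* ...but debug entries are created under a separate lock. */
	pthread_mutex_lock(&debug_lock);
	clk->debug_next = debug_list;
	debug_list = clk;
	fake_debugfs_create(clk);
	pthread_mutex_unlock(&debug_lock);
}
```

With this split, the filesystem locks taken during directory creation only ever nest inside `debug_lock`, so they no longer form a dependency on the prepare lock.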

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 16:59 +0900, Michel Dänzer wrote:
> 
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm 
> not alone, at least with AGP 1x it seems to work quite well for most 
> people. So I don't see the justification for intentionally breaking it 
> completely for all of us.

Oh I wasn't arguing for breaking it, just jesting. We need to keep it
working. It's amazing how well broken stuff actually works though :-)

I mean, it's architecturally broken and if we get a collision between
the cache and the NCU, the chip will crash. We just get lucky I suppose.

Anyway, I'll try a different approach tomorrow see how it goes.

Cheers,
Ben.

[PATCH -v3 4/4] drm/i915: split intel_primary_plane_setplane() into check() and commit()

2014-09-04 Thread Ville Syrjälä

On Wed, Sep 03, 2014 at 05:10:18PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> As a preparation for atomic updates we need to split the code to check
> everything we are going to commit first. This patch starts the work to
> split intel_primary_plane_setplane() into check() and commit() parts.
> 
> More work is expected on this to get a better split of the two steps.
> Ideally the commit() step should never fail.
> 
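
To make the check()/commit() split described above concrete, here is a small standalone sketch. It is not the i915 code: `struct plane_state_sketch`, `struct plane_hw_sketch`, `plane_check()`, `plane_commit()` and `plane_update()` are hypothetical, and the "hardware" is just a struct.

```c
#include <errno.h>
#include <stdbool.h>

struct rect { int x1, y1, x2, y2; };

/* Hypothetical, much-reduced counterpart of intel_plane_state. */
struct plane_state_sketch {
	struct rect dst;	/* requested position, integer pixels */
	struct rect clip;	/* visible area of the pipe */
	bool visible;		/* filled in by check() */
};

struct plane_hw_sketch {
	struct rect programmed;
	bool enabled;
};

/* check(): validate and precompute; may fail, touches no hardware. */
static int plane_check(struct plane_state_sketch *st)
{
	if (st->dst.x2 <= st->dst.x1 || st->dst.y2 <= st->dst.y1)
		return -EINVAL;

	st->visible = st->dst.x1 < st->clip.x2 && st->dst.x2 > st->clip.x1 &&
		      st->dst.y1 < st->clip.y2 && st->dst.y2 > st->clip.y1;
	return 0;
}

/* commit(): apply a state that check() already accepted; no error paths. */
static void plane_commit(struct plane_hw_sketch *hw,
			 const struct plane_state_sketch *st)
{
	hw->programmed = st->dst;
	hw->enabled = st->visible;
}

/* Legacy-style entry point: build the state, then check, then commit. */
static int plane_update(struct plane_hw_sketch *hw, struct rect dst,
			struct rect clip)
{
	struct plane_state_sketch st = { .dst = dst, .clip = clip };
	int ret = plane_check(&st);

	if (ret)
		return ret;
	plane_commit(hw, &st);
	return 0;
}
```

The point of the shape is that a failed check leaves the hardware state untouched, which is what an atomic update over several planes would rely on.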
> Signed-off-by: Gustavo Padovan 
> ---
>  drivers/gpu/drm/i915/intel_display.c | 130 
> +++
>  1 file changed, 72 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index c3f1967..1e3985b 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11663,63 +11663,37 @@ disable_unpin:
>  }
>  
>  static int
> -intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc,
> -  struct drm_framebuffer *fb, int crtc_x, int crtc_y,
> -  unsigned int crtc_w, unsigned int crtc_h,
> -  uint32_t src_x, uint32_t src_y,
> -  uint32_t src_w, uint32_t src_h)
> +intel_check_primary_plane(struct drm_plane *plane,
> +   struct intel_plane_state *state)
> +{
> + struct drm_crtc *crtc = state->crtc;
> + struct drm_framebuffer *fb = state->fb;
> + struct drm_rect *dest = &state->dst;
> + struct drm_rect *src = &state->src;
> + const struct drm_rect *clip = &state->clip;
> +
> + return drm_plane_helper_check_update(plane, crtc, fb,
> + src, dest, clip,
> + DRM_PLANE_HELPER_NO_SCALING,
> + DRM_PLANE_HELPER_NO_SCALING,
> + false, true, &state->visible);
> +}
> +
> +static int
> +intel_commit_primary_plane(struct drm_plane *plane,
> +struct intel_plane_state *state)
>  {
> + struct drm_crtc *crtc = state->crtc;
> + struct drm_framebuffer *fb = state->fb;
>   struct drm_device *dev = crtc->dev;
>   struct drm_i915_private *dev_priv = dev->dev_private;
>   struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>   struct drm_i915_gem_object *obj = intel_fb_obj(fb);
>   struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->fb);
> - struct drm_rect dest = {
> - /* integer pixels */
> - .x1 = crtc_x,
> - .y1 = crtc_y,
> - .x2 = crtc_x + crtc_w,
> - .y2 = crtc_y + crtc_h,
> - };
> - struct drm_rect src = {
> - /* 16.16 fixed point */
> - .x1 = src_x,
> - .y1 = src_y,
> - .x2 = src_x + src_w,
> - .y2 = src_y + src_h,
> - };
> - const struct drm_rect clip = {
> - /* integer pixels */
> - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
> - .y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
> - };
> - const struct {
> - int crtc_x, crtc_y;
> - unsigned int crtc_w, crtc_h;
> - uint32_t src_x, src_y, src_w, src_h;
> - } orig = {
> - .crtc_x = crtc_x,
> - .crtc_y = crtc_y,
> - .crtc_w = crtc_w,
> - .crtc_h = crtc_h,
> - .src_x = src_x,
> - .src_y = src_y,
> - .src_w = src_w,
> - .src_h = src_h,
> - };
>   struct intel_plane *intel_plane = to_intel_plane(plane);
> - bool visible;
> + struct drm_rect *src = &state->src;
>   int ret;
>  
> - ret = drm_plane_helper_check_update(plane, crtc, fb,
> - &src, &dest, &clip,
> - DRM_PLANE_HELPER_NO_SCALING,
> - DRM_PLANE_HELPER_NO_SCALING,
> - false, true, &visible);
> -
> - if (ret)
> - return ret;
> -
>   /*
>* If the CRTC isn't enabled, we're just pinning the framebuffer,
>* updating the fb pointer, and returning without touching the
> @@ -11754,7 +11728,7 @@ intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc,
>* happens if userspace explicitly disables the plane by passing fb=0
>* because plane->fb still gets set and pinned.
>*/
> - if (!visible) {
> + if (!state->visible) {
>   mutex_lock(&dev->struct_mutex);
>  
>   /*
> @@ -11801,7 +11775,7 @@ intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc,
>   intel_disable_fbc(dev);
>   }
>   }
> - ret = intel_pipe_set_base(crtc, src.x1, src.y1, fb);
> + ret = intel_pip

TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:

> > This will, from what I can tell, try to use the same caching mode as the
> > original object:
> >
> > if ((cur_placement & caching) != 0)
> > result |= (cur_placement & caching);
> >
> > And cur_placement comes from bo->mem.placement which as far as I can
> > tell is based on the placement array which the drivers set up.
> 
> This originates from the fact that when evicting GTT memory, on x86 it's
> unnecessary and undesirable to switch caching mode when going to system.

But that's what I don't quite understand. We have two different mappings
here. The VRAM and the memory object. We wouldn't be "switching"... we
are creating a temporary mapping for the memory object in order to do
the memcpy, but we seem to be doing it by using the caching attributes
of the VRAM object or am I missing something ? I don't see how that
makes sense so I suppose I'm missing something here :-)

> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.

That's very strange indeed. It's certainly an x86 specific artifact,
even if we were allowed by our hypervisor to map memory non-cachable
(the HW somewhat can), we tend to have a higher throughput by going
cachable, but that could be due to the way the PowerBus works (it's
basically very biased toward cachable transactions).

> I dislike the approach of rewriting placements. In some cases I think it
> won't even work, because placements are declared 'static const'
> 
> What I'd suggest is instead to intercept the driver response from
> init_mem_type() and filter out undesired caching modes from
> available_caching and default_caching, 

This was my original intent but Jerome seems to have different ideas
(see his proposed patches). I'm happy to revive mine as well and post it
as an alternative after I've tested it a bit more (tomorrow).

> perhaps also looking at whether
> the memory type is mappable or not. This should have the additional
> benefit of working everywhere, and if a caching mode is selected that's
> not available on the platform, you'll simply get an error. (I guess?)

You mean that if not mappable we don't bother filtering ?

The rule is really for me pretty simple:

   - If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable

   - If it's PCIe memory space (VRAM, registers, ...) it MUST be
non-cachable.
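
As a sketch of how such a rule could be applied when filtering what a memory type advertises (the flag values and the `filter_caching()` helper below are invented for illustration and are not the real TTM constants or API; write-combined is assumed to count as non-cachable):

```c
#include <stdint.h>

/* Hypothetical flag values for the sketch; not the real TTM constants. */
#define SK_PL_SYSTEM	(1u << 0)
#define SK_PL_TT	(1u << 1)
#define SK_PL_VRAM	(1u << 2)
#define SK_CACHED	(1u << 16)
#define SK_UNCACHED	(1u << 17)
#define SK_WC		(1u << 18)
#define SK_MASK_CACHING	(SK_CACHED | SK_UNCACHED | SK_WC)

/*
 * Restrict the caching modes a memory type may offer:
 * system-backed memory is forced to cached, everything else
 * (PCIe space) loses the cached option.
 */
static uint32_t filter_caching(uint32_t mem_type, uint32_t available)
{
	if (mem_type & (SK_PL_SYSTEM | SK_PL_TT))
		return SK_CACHED;
	return available & (SK_MASK_CACHING & ~SK_CACHED);
}
```

A result of 0 for a PCIe-space type would mean the driver offered only a cached mapping, which a caller could treat as an error.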

Cheers,
Ben.

> /Thomas
> 
> 
> >
> > Cheers,
> > Ben.
> >
> >
> > ___
> > dri-devel mailing list
> > dri-devel at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH -v3 3/4] drm/i915: split intel_cursor_plane_update() into check() and commit()

2014-09-04 Thread Ville Syrjälä

On Wed, Sep 03, 2014 at 05:10:17PM -0300, Gustavo Padovan wrote:
> From: Gustavo Padovan 
> 
> Due to the upcoming atomic modesetting feature we need to separate
> some update functions into a check step that can fail and a commit
> step that should, ideally, never fail.
> 
> The commit part can still fail, but that should be solved in another
> upcoming patch.
> 
> Signed-off-by: Gustavo Padovan 
> ---
>  drivers/gpu/drm/i915/intel_display.c | 104 
> ++-
>  1 file changed, 67 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 22d3902..c3f1967 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -11896,51 +11896,42 @@ intel_cursor_plane_disable(struct drm_plane *plane)
>  }
>  
>  static int
> -intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
> -   struct drm_framebuffer *fb, int crtc_x, int crtc_y,
> -   unsigned int crtc_w, unsigned int crtc_h,
> -   uint32_t src_x, uint32_t src_y,
> -   uint32_t src_w, uint32_t src_h)
> +intel_check_cursor_plane(struct drm_plane *plane,
> +  struct intel_plane_state *state)
>  {
> - struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> - struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
> - struct drm_i915_gem_object *obj = intel_fb->obj;
> - struct drm_rect dest = {
> - /* integer pixels */
> - .x1 = crtc_x,
> - .y1 = crtc_y,
> - .x2 = crtc_x + crtc_w,
> - .y2 = crtc_y + crtc_h,
> - };
> - struct drm_rect src = {
> - /* 16.16 fixed point */
> - .x1 = src_x,
> - .y1 = src_y,
> - .x2 = src_x + src_w,
> - .y2 = src_y + src_h,
> - };
> - const struct drm_rect clip = {
> - /* integer pixels */
> - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0,
> - .y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0,
> - };
> - bool visible;
> - int ret;
> + struct drm_crtc *crtc = state->crtc;
> + struct drm_framebuffer *fb = state->fb;
> + struct drm_rect *dest = &state->dst;
> + struct drm_rect *src = &state->src;
> + const struct drm_rect *clip = &state->clip;
>  
> - ret = drm_plane_helper_check_update(plane, crtc, fb,
> - &src, &dest, &clip,
> + return drm_plane_helper_check_update(plane, crtc, fb,
> + src, dest, clip,
>   DRM_PLANE_HELPER_NO_SCALING,
>   DRM_PLANE_HELPER_NO_SCALING,
> - true, true, &visible);
> - if (ret)
> - return ret;
> + true, true, &state->visible);
> +}
>  
> - crtc->cursor_x = crtc_x;
> - crtc->cursor_y = crtc_y;
> +static int
> +intel_commit_cursor_plane(struct drm_plane *plane,
> +   struct intel_plane_state *state)
> +{
> + struct drm_crtc *crtc = state->crtc;
> + struct drm_framebuffer *fb = state->fb;
> + struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> + struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
> + struct drm_i915_gem_object *obj = intel_fb->obj;
> + struct drm_rect *dest = &state->dst;
> + int crtc_w, crtc_h;
> +
> + crtc->cursor_x = state->dst.x1;
> + crtc->cursor_y = state->dst.y1;
>   if (fb != crtc->cursor->fb) {
> + crtc_w = drm_rect_width(dest);
> + crtc_h = drm_rect_height(dest);

These would need to be the original unclipped coordinates since we
program the cursor hardware with those and the hardware clips itself.
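
To illustrate the difference with a standalone sketch (the `struct cursor_state_sketch` type and the helpers are hypothetical; the real state structure may keep the original request differently): a 64x64 cursor hanging off the right edge of a 1920-wide pipe clips to 20 pixels wide, but the size to program is still 64.

```c
struct rect { int x1, y1, x2, y2; };

static int rect_width(const struct rect *r)  { return r->x2 - r->x1; }
static int rect_height(const struct rect *r) { return r->y2 - r->y1; }

/* Clip r against clip in place; returns 1 if any area remains visible. */
static int rect_clip(struct rect *r, const struct rect *clip)
{
	if (r->x1 < clip->x1) r->x1 = clip->x1;
	if (r->y1 < clip->y1) r->y1 = clip->y1;
	if (r->x2 > clip->x2) r->x2 = clip->x2;
	if (r->y2 > clip->y2) r->y2 = clip->y2;
	return r->x2 > r->x1 && r->y2 > r->y1;
}

/* Hypothetical state: keep the request next to its clipped version. */
struct cursor_state_sketch {
	struct rect orig_dst;	/* what userspace asked for */
	struct rect dst;	/* after clipping to the pipe */
	int visible;
};

static void cursor_check(struct cursor_state_sketch *st,
			 const struct rect *req, const struct rect *pipe)
{
	st->orig_dst = *req;
	st->dst = *req;
	st->visible = rect_clip(&st->dst, pipe);
}
```

A commit step built on this would take the width and height from `orig_dst` and only use `dst` and `visible` for the visibility decision.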

>   return intel_crtc_cursor_set_obj(crtc, obj, crtc_w, crtc_h);
>   } else {
> - intel_crtc_update_cursor(crtc, visible);
> + intel_crtc_update_cursor(crtc, state->visible);
>  
>   intel_frontbuffer_flip(crtc->dev,
>                      INTEL_FRONTBUFFER_CURSOR(intel_crtc->pipe));
> @@ -11948,6 +11939,45 @@ intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
>   return 0;
>   }
>  }
> +
> +static int
> +intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc,
> +   struct drm_framebuffer *fb, int crtc_x, int crtc_y,
> +   unsigned int crtc_w, unsigned int crtc_h,
> +   uint32_t src_x, uint32_t src_y,
> +   uint32_t src_w, uint32_t src_h)
> +{
> + struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> + struct intel_plane_state state;
> + int ret;
> +
> + state.crtc = crtc;
> + state.fb = fb;
>

TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
> > +#else /* CONFIG_X86 */
> > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> > +{
> > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> > + ttm->caching_state = tt_cached;
> > + *placement &= ~TTM_PL_MASK_CACHING;
> > + *placement |= TTM_PL_FLAG_CACHED;
> 
> NAK, this will break AGP on PowerMacs.

 ... which doesn't work reliably anyway with DRI2 :-)

The problem is ... with DRI1 I think we had tricks to take out the
AGP from the linear mapping but that went away, didn't it?

In any case, we are playing with fire on these by allowing the
cache paradox. It just happens that those old CPUs aren't *that*
aggressive at speculative prefetch and we probably rarely hit the
lockups that they would cause...

Michel, what do you recommend we do then? The patch I sent to
double check in ttm_io_prot() has a specific hack to avoid warning
on PowerMac for the above reason, but we need to fix Jerome's patch if we
want to keep that broken-by-design Mac AGP functionality going :-)

Maybe we could add a similar ifdef in the above ?

Cheers,
Ben.

[PATCH 4/9] drm/ast: Don't assume DVO enabled means SIL164 on uninitialized chips

2014-09-04 Thread Benjamin Herrenschmidt


It looks like the AST2400 comes up with the DVO enable bit set,
which causes us to incorrectly assume we have a SIL164 regardless
of the value of the scratch registers setup by the BMC firmware.

So let's limit that test to the case where the chip has already
been setup by a BIOS.
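
The decision can be reduced to a small pure function, shown here as a sketch only: `detect_tx()` and `enum tx_type_sketch` are made-up names, the register value is passed in rather than read from hardware, and the later override from the SoC scratch register is left out.

```c
#include <stdbool.h>
#include <stdint.h>

enum tx_type_sketch { TX_NONE, TX_SIL164 };

/*
 * crtc_a3:   value read from the VGACRA3 register
 * need_post: true when no firmware has initialised the chip yet
 */
static enum tx_type_sketch detect_tx(uint8_t crtc_a3, bool need_post)
{
	/* Only trust the DVO enable bit (0x80) if firmware set the chip up. */
	if (!need_post && (crtc_a3 & 0x80))
		return TX_SIL164;
	return TX_NONE;
}
```

With the chip at power-on reset (`need_post` true), the DVO bit is ignored even when it reads back as set.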

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/gpu/drm/ast/ast_main.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index 556d065..48998b2 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -158,16 +158,22 @@ static int ast_detect_chip(struct drm_device *dev, bool *need_post)
/*
 * VGACRA3 Enhanced Color Mode Register, check if DVO is already
 * enabled, in that case, assume we have a SIL164 TMDS transmitter
+*
+* Don't make that assumption if the chip wasn't enabled and
+* is at power-on reset, otherwise we'll incorrectly "detect" a
+* SIL164 when there is none.
 */
-   jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
-   if (jreg & 0x80)
-   ast->tx_chip_type = AST_TX_SIL164;
+   if (!*need_post) {
+   jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
+   if (jreg & 0x80)
+   ast->tx_chip_type = AST_TX_SIL164;
+   }

if ((ast->chip == AST2300) || (ast->chip == AST2400)) {
/*
 * On AST2300 and 2400, look the configuration set by the SoC in
 * the SOC scratch register #1 bits 11:8 (interestingly marked
-* as "reserved" in the spec
+* as "reserved" in the spec)
 */
jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff);
switch (jreg) {

[PATCH 3/9] drm/ast: Properly initialize P2A base before using it in ast_init_3rdtx()

2014-09-04 Thread Benjamin Herrenschmidt


If the P2A has been used to target other SOC registers before that
call, we're going to hit the wrong place, so make sure we set the
base address up properly before using it.

(P2A stands for PCIe to AHB bridge and is the bridge that allows
accessing the AST's internal AHB bus using a relocatable 64k
window in the second half of the PCIe MMIO BAR)

Signed-off-by: Benjamin Herrenschmidt 
---
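
[Illustration, not part of the patch: a rough sketch of the address
arithmetic behind a relocatable 64k window. It assumes the window starts
at BAR offset 0x10000 and that the SCU sits at AHB address 0x1e6e2000;
neither value is stated in this mail, they are inferred from the 0x12000
and 0x1202c offsets used in the diff. The function names are made up.]

```c
#include <assert.h>
#include <stdint.h>

#define P2A_WINDOW_SIZE   0x10000u /* 64k window, per the description */
#define P2A_WINDOW_OFFSET 0x10000u /* assumed: window starts at BAR + 64k */

/* Window base to program: the AHB address rounded down to 64k. */
uint32_t p2a_base_for(uint32_t ahb_addr)
{
	return ahb_addr & ~(P2A_WINDOW_SIZE - 1);
}

/* BAR offset at which ahb_addr is visible once the base is set. */
uint32_t p2a_bar_offset(uint32_t ahb_addr)
{
	return P2A_WINDOW_OFFSET + (ahb_addr & (P2A_WINDOW_SIZE - 1));
}

/* Usage: an assumed SCU2C address maps to the offset read in the diff. */
void p2a_example(void)
{
	assert(p2a_base_for(0x1e6e202cu) == 0x1e6e0000u);
	assert(p2a_bar_offset(0x1e6e202cu) == 0x1202cu);
}
```

If the base still points at some other 64k region, the same BAR offset
lands on a different AHB address, which is the failure this patch guards
against.
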
 drivers/gpu/drm/ast/ast_dp501.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/ast/ast_dp501.c b/drivers/gpu/drm/ast/ast_dp501.c
index 5da4b62..7e2ddde 100644
--- a/drivers/gpu/drm/ast/ast_dp501.c
+++ b/drivers/gpu/drm/ast/ast_dp501.c
@@ -400,7 +400,18 @@ void ast_init_3rdtx(struct drm_device *dev)
if (ast->tx_chip_type == AST_TX_SIL164)
ast_init_dvo(dev);
else {
+   /*
+* Set DAC source to VGA mode in SCU2C via the 
P2A
+* bridge. First configure the P2U to target 
the SCU
+* in case it isn't at this stage.
+*/
+   ast_write32(ast, 0xf004, 0x1e6e);
+   ast_write32(ast, 0xf000, 0x1);
+   /* Then unlock the SCU with the magic password 
*/
ast_write32(ast, 0x12000, 0x1688a8a8);
+   ast_write32(ast, 0x12000, 0x1688a8a8);
+   ast_write32(ast, 0x12000, 0x1688a8a8);
+   /* Finally, clear bits [17:16] of SCU2c */
data = ast_read32(ast, 0x1202c);
data &= 0xfffc;
ast_write32(ast, 0, data);

[PATCH 2/9] drm/ast: POST chip at probe time if VGA not enabled

2014-09-04 Thread Benjamin Herrenschmidt


We need to do it on machines without a BIOS such as POWER8. Also,
for detection to work without triggering PCIe errors, we need
to enable VGA early on, inside ast_detect_chip().

While touching those files, replace a few hard-coded register
numbers with the corresponding symbolic constants.

Signed-off-by: Benjamin Herrenschmidt 
---
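
[Illustration, not part of the patch: a small userspace model of the
need_post out-parameter. struct fake_chip and both functions are
hypothetical. The hunk that consumes need_post in ast_driver_load() is
cut off in this archive, so the caller below is only a guess based on
the subject line.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the chip state; not the driver's struct. */
struct fake_chip {
	bool vga_enabled;
	bool posted;
};

/* Mirrors the new out-parameter: enable VGA if needed and report it. */
void fake_detect_chip(struct fake_chip *chip, bool *need_post)
{
	assert(chip != NULL && need_post != NULL);
	if (!chip->vga_enabled) {
		chip->vga_enabled = true; /* later register reads rely on this */
		*need_post = true;
	} else {
		*need_post = false;
	}
}

/* Caller side: POST only when detection asked for it. */
void fake_driver_load(struct fake_chip *chip)
{
	bool need_post;

	fake_detect_chip(chip, &need_post);
	if (need_post)
		chip->posted = true; /* stands in for ast_post_gpu() */
}
```
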
 drivers/gpu/drm/ast/ast_drv.h  |  3 +++
 drivers/gpu/drm/ast/ast_main.c | 47 --
 drivers/gpu/drm/ast/ast_post.c | 23 +
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index a203a6f..78fc683 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -384,6 +384,9 @@ int ast_bo_push_sysram(struct ast_bo *bo);
 int ast_mmap(struct file *filp, struct vm_area_struct *vma);

 /* ast post */
+void ast_enable_vga(struct drm_device *dev);
+void ast_enable_mmio(struct drm_device *dev);
+bool ast_is_vga_enabled(struct drm_device *dev);
 void ast_post_gpu(struct drm_device *dev);
 u32 ast_mindwm(struct ast_private *ast, u32 r);
 void ast_moutdwm(struct ast_private *ast, u32 r, u32 v);
diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index c2ff793..556d065 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -63,7 +63,7 @@ uint8_t ast_get_index_reg_mask(struct ast_private *ast,
 }


-static int ast_detect_chip(struct drm_device *dev)
+static int ast_detect_chip(struct drm_device *dev, bool *need_post)
 {
struct ast_private *ast = dev->dev_private;
uint32_t data, jreg;
@@ -109,6 +109,21 @@ static int ast_detect_chip(struct drm_device *dev)
}
}

+   /*
+* If VGA isn't enabled, we need to enable now or subsequent
+* access to the scratch registers will fail. We also inform
+* our caller that it needs to POST the chip
+* (Assumption: VGA not enabled -> need to POST)
+*/
+   if (!ast_is_vga_enabled(dev)) {
+   ast_enable_vga(dev);
+   ast_enable_mmio(dev);
+   DRM_INFO("VGA not enabled on entry, requesting chip POST\n");
+   *need_post = true;
+   } else
+   *need_post = false;
+
+   /* Check if we support wide screen */
switch (ast->chip) {
case AST1180:
ast->support_wide_screen = true;
@@ -124,6 +139,7 @@ static int ast_detect_chip(struct drm_device *dev)
ast->support_wide_screen = true;
else {
ast->support_wide_screen = false;
+   /* Read SCU7c (silicon revision register) */
ast_write32(ast, 0xf004, 0x1e6e);
ast_write32(ast, 0xf000, 0x1);
data = ast_read32(ast, 0x1207c);
@@ -136,11 +152,23 @@ static int ast_detect_chip(struct drm_device *dev)
break;
}

+   /* Check 3rd Tx option (digital output afaik) */
ast->tx_chip_type = AST_TX_NONE;
+
+   /*
+* VGACRA3 Enhanced Color Mode Register, check if DVO is already
+* enabled, in that case, assume we have a SIL164 TMDS transmitter
+*/
jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
if (jreg & 0x80)
ast->tx_chip_type = AST_TX_SIL164;
+
if ((ast->chip == AST2300) || (ast->chip == AST2400)) {
+   /*
+* On AST2300 and 2400, look the configuration set by the SoC in
+* the SOC scratch register #1 bits 11:8 (interestingly marked
+* as "reserved" in the spec
+*/
jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 
0xff);
switch (jreg) {
case 0x04:
@@ -161,6 +189,17 @@ static int ast_detect_chip(struct drm_device *dev)
}
}

+   /* Print stuff for diagnostic purposes */
+   switch(ast->tx_chip_type) {
+   case AST_TX_SIL164:
+   DRM_INFO("Using Sil164 TMDS transmitter\n");
+   break;
+   case AST_TX_DP501:
+   DRM_INFO("Using DP501 DisplayPort transmitter\n");
+   break;
+   default:
+   DRM_INFO("Analog VGA only\n");
+   }
return 0;
 }

@@ -345,6 +384,7 @@ static u32 ast_get_vram_info(struct drm_device *dev)
 int ast_driver_load(struct drm_device *dev, unsigned long flags)
 {
struct ast_private *ast;
+   bool need_post;
int ret = 0;

ast = kzalloc(sizeof(struct ast_private), GFP_KERNEL);
@@ -379,7 +419,7 @@ int ast_driver_load(struct drm_device *dev, unsigned long 
flags)
}
}

-   ast_detect_chip(dev);
+   ast_detect_chip(dev, &need_post);

if (ast->chip != AST1180) {
ast_get_dram_info(dev);
@@ -387,6 +427,9 @@ int ast_driver_load(struct drm_device *dev

[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()

2014-09-04 Thread Benjamin Herrenschmidt

On all current cache-coherent powerpc processors, it is not legal
to map system memory non-cacheable. This will cause aliases with
the linear mapping, which can be fatal.

TTM should generally avoid it after Jerome's placement patches, but
let's add a sanity check anyway to catch any possible remaining issue.

Signed-off-by: Benjamin Herrenschmidt 
---
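
[Illustration, not part of the patch: a sketch of the condition the
sanity check tests, as a pure function. The PL_FLAG_* values and the
function name are illustrative only; the real flags live in the TTM
headers. The early return for cached mappings is assumed from the
previous patch in this series.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in the TTM headers. */
#define PL_FLAG_SYSTEM (1u << 0)
#define PL_FLAG_TT     (1u << 1)
#define PL_FLAG_VRAM   (1u << 2)
#define PL_FLAG_CACHED (1u << 16)

/*
 * True when a non-cached mapping of system RAM is requested on a
 * machine that is not exempt (the patch exempts powermac for AGP).
 */
bool noncached_ram_mapping_rejected(uint32_t flags, bool is_powermac)
{
	if (flags & PL_FLAG_CACHED)
		return false; /* cached mappings are always fine */
	if (is_powermac)
		return false; /* has to take the risk */
	return (flags & (PL_FLAG_SYSTEM | PL_FLAG_TT)) != 0;
}

/* Usage: RAM is rejected, VRAM is not. */
void sanity_check_example(void)
{
	assert(noncached_ram_mapping_rejected(PL_FLAG_TT, false));
	assert(!noncached_ram_mapping_rejected(PL_FLAG_VRAM, false));
}
```
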
 drivers/gpu/drm/ttm/ttm_bo_util.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index ef2ac3c..48095be 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -37,6 +37,9 @@
 #include 
 #include 
 #include 
+#if defined(__powerpc__)
+#include 
+#endif

 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
@@ -498,6 +501,20 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
tmp = pgprot_noncached(tmp);

 #endif
+#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
+   /*
+* Using a non-cachable mapping of system memory on
+* cache coherent powerpc's can be fatal, let's make
+* sure this doesn't happen and warn if it does. The
+* only exception is powermac with AGP which has to
+* take the risk.
+*/
+   if (!machine_is(powermac) &&
+   ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
+(caching_flags & TTM_PL_FLAG_TT))) {
+   pr_err_once("TTM: Attempt to use a non-cached"
+   " mapping on RAM unsupported !\n");
+   return tmp;
}
 #endif
 #if defined(__ia64__) || defined(__arm__) || defined(__powerpc__)

[PATCH 8/9] drm/ttm: Clean usage of ttm_io_prot() with TTM_PL_FLAG_CACHED

2014-09-04 Thread Benjamin Herrenschmidt

Today, most callers of ttm_io_prot() check TTM_PL_FLAG_CACHED before
calling it, since on some archs it will unconditionally create non-cached
mappings.

But not all callers do, which is incorrect as far as I can tell.

Instead, move that check inside ttm_io_prot() itself for all archs
and make powerpc use the same implementation as ia64 and arm.

Signed-off-by: Benjamin Herrenschmidt 
---
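
[Illustration, not part of the patch: a sketch of the resulting control
flow for the ia64/arm/powerpc branch, with the cached check hoisted to
the top. enum prot, the PL_FLAG_* values and io_prot() are hypothetical
stand-ins for pgprot_t, the TTM flags and ttm_io_prot().]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the page protection value. */
enum prot { PROT_CACHED, PROT_WC, PROT_UNCACHED };

/* Illustrative flag values; the real ones live in the TTM headers. */
#define PL_FLAG_CACHED   (1u << 16)
#define PL_FLAG_UNCACHED (1u << 17)
#define PL_FLAG_WC       (1u << 18)

enum prot io_prot(uint32_t flags, enum prot in)
{
	/* Cached mappings need no adjustment, whatever the arch. */
	if (flags & PL_FLAG_CACHED)
		return in;
	if (flags & PL_FLAG_WC)
		return PROT_WC;
	return PROT_UNCACHED;
}

/* Usage: callers no longer need their own TTM_PL_FLAG_CACHED test. */
void io_prot_example(void)
{
	assert(io_prot(PL_FLAG_CACHED, PROT_CACHED) == PROT_CACHED);
	assert(io_prot(PL_FLAG_WC, PROT_CACHED) == PROT_WC);
}
```
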
 drivers/gpu/drm/ttm/ttm_bo_util.c | 19 ---
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  5 ++---
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index e31d48c..ef2ac3c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -487,28 +487,27 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,

 pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 {
+   /* Cached mappings need no adjustment */
+   if (caching_flags & TTM_PL_FLAG_CACHED)
+   return tmp;
+
 #if defined(__i386__) || defined(__x86_64__)
if (caching_flags & TTM_PL_FLAG_WC)
tmp = pgprot_writecombine(tmp);
else if (boot_cpu_data.x86 > 3)
tmp = pgprot_noncached(tmp);

-#elif defined(__powerpc__)
-   if (!(caching_flags & TTM_PL_FLAG_CACHED)) {
-   pgprot_val(tmp) |= _PAGE_NO_CACHE;
-   if (caching_flags & TTM_PL_FLAG_UNCACHED)
-   pgprot_val(tmp) |= _PAGE_GUARDED;
+#endif
}
 #endif
-#if defined(__ia64__) || defined(__arm__)
+#if defined(__ia64__) || defined(__arm__) || defined(__powerpc__)
if (caching_flags & TTM_PL_FLAG_WC)
tmp = pgprot_writecombine(tmp);
else
tmp = pgprot_noncached(tmp);
 #endif
 #if defined(__sparc__) || defined(__mips__)
-   if (!(caching_flags & TTM_PL_FLAG_CACHED))
-   tmp = pgprot_noncached(tmp);
+   tmp = pgprot_noncached(tmp);
 #endif
return tmp;
 }
@@ -567,9 +566,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 * We need to use vmap to get the desired page protection
 * or to make the buffer object look contiguous.
 */
-   prot = (mem->placement & TTM_PL_FLAG_CACHED) ?
-   PAGE_KERNEL :
-   ttm_io_prot(mem->placement, PAGE_KERNEL);
+   prot = ttm_io_prot(mem->placement, PAGE_KERNEL);
map->bo_kmap_type = ttm_bo_map_vmap;
map->virtual = vmap(ttm->pages + start_page, num_pages,
0, prot);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 0ce48e5..4ce8dc1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -200,9 +200,8 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, 
struct vm_fault *vmf)
cvma.vm_page_prot);
} else {
ttm = bo->ttm;
-   if (!(bo->mem.placement & TTM_PL_FLAG_CACHED))
-   cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
-   cvma.vm_page_prot);
+   cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
+   cvma.vm_page_prot);

/* Allocate all page at once, most common usage */
if (ttm->bdev->driver->ttm_tt_populate(ttm)) {

[PATCH 7/9] drm: powerpc can use a simpler drm_io_prot()

2014-09-04 Thread Benjamin Herrenschmidt

What the powerpc code does is equivalent to the x86 code, so let's use
the x86 code as well.

Signed-off-by: Benjamin Herrenschmidt 
---
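
[Illustration, not part of the patch: a sketch comparing the removed
powerpc branch with the shared x86 branch. It assumes that on powerpc
pgprot_noncached() sets no-cache plus guarded and pgprot_writecombine()
sets no-cache only. The bit values and function names are made up.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, not the real powerpc PTE bits. */
#define PG_NO_CACHE 0x1u
#define PG_GUARDED  0x2u

/* Assumed powerpc meaning of the generic helpers. */
static uint32_t noncached(uint32_t prot)    { return prot | PG_NO_CACHE | PG_GUARDED; }
static uint32_t writecombine(uint32_t prot) { return prot | PG_NO_CACHE; }

/* The open-coded powerpc branch removed by the patch. */
uint32_t old_ppc_io_prot(uint32_t prot, bool is_registers)
{
	prot |= PG_NO_CACHE;
	if (is_registers)
		prot |= PG_GUARDED;
	return prot;
}

/* The x86 branch that powerpc now shares. */
uint32_t shared_io_prot(uint32_t prot, bool is_registers, bool want_wc)
{
	if (is_registers && !want_wc)
		return noncached(prot);
	return writecombine(prot);
}

/* Usage: both agree when write-combining is not requested. */
void drm_io_prot_example(void)
{
	assert(old_ppc_io_prot(0, true) == shared_io_prot(0, true, false));
	assert(old_ppc_io_prot(0, false) == shared_io_prot(0, false, false));
}
```

In this model the two versions only differ for a register map that also
asks for write-combining, where the shared code leaves the guarded bit
clear.
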
 drivers/gpu/drm/drm_vm.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 24e045c..ed02563 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -48,15 +48,11 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
pgprot_t tmp = vm_get_page_prot(vma->vm_flags);

-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
tmp = pgprot_noncached(tmp);
else
tmp = pgprot_writecombine(tmp);
-#elif defined(__powerpc__)
-   pgprot_val(tmp) |= _PAGE_NO_CACHE;
-   if (map->type == _DRM_REGISTERS)
-   pgprot_val(tmp) |= _PAGE_GUARDED;
 #elif defined(__ia64__)
if (efi_range_is_wc(vma->vm_start, vma->vm_end -
vma->vm_start))

[PATCH 6/9] drm/ttm: force cached mapping on non x86 platform

2014-09-04 Thread Benjamin Herrenschmidt

From: Jérôme Glisse 

People interested in providing uncached or write-combined mappings
on their architecture need to do the groundwork inside their
arch-specific code to allow breaking the linear kernel mapping so that
page mapping attributes can be updated. In the meantime, force cached
mappings on non-x86 architectures.

Signed-off-by: Jérôme Glisse 
Signed-off-by: Benjamin Herrenschmidt 
---

[Minor compile fixes on top of Jerome's original v3]
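
[Illustration, not part of the patch: a sketch of why the placement
argument becomes a pointer. On non-x86 the caching bits are rewritten in
place, so the caller sees what it will really get. The PL_* values and
the function name are illustrative only.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in the TTM headers. */
#define PL_FLAG_SYSTEM   (1u << 0)
#define PL_FLAG_TT       (1u << 1)
#define PL_FLAG_VRAM     (1u << 2)
#define PL_FLAG_CACHED   (1u << 16)
#define PL_FLAG_UNCACHED (1u << 17)
#define PL_FLAG_WC       (1u << 18)
#define PL_MASK_CACHING  (PL_FLAG_CACHED | PL_FLAG_UNCACHED | PL_FLAG_WC)

/* Force cached for anything backed by system RAM, leave the rest alone. */
void force_cached_for_ram(uint32_t *placement)
{
	assert(placement != NULL);
	if (*placement & (PL_FLAG_SYSTEM | PL_FLAG_TT)) {
		*placement &= ~PL_MASK_CACHING;
		*placement |= PL_FLAG_CACHED;
	}
}
```

A caller that asked for write-combined TT memory gets its flags back as
cached TT memory, while a VRAM placement is returned unchanged.
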

 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c|  2 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c| 32 +---
 include/drm/ttm/ttm_bo_driver.h |  2 +-
 5 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object 
*bo,
return r;
}

-   r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+   r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
if (unlikely(r)) {
goto out_cleanup;
}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object 
*bo,
goto out_err;
}

-   ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+   ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
if (ret)
goto out_err;

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 30e5d90..e31d48c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -59,7 +59,7 @@ int ttm_bo_move_ttm(struct ttm_buffer_object *bo,
old_mem->mem_type = TTM_PL_SYSTEM;
}

-   ret = ttm_tt_set_placement_caching(ttm, new_mem->placement);
+   ret = ttm_tt_set_placement_caching(ttm, &new_mem->placement);
if (unlikely(ret != 0))
return ret;

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..19ae8ee 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,

return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
- enum ttm_caching_state c_old,
- enum ttm_caching_state c_new)
-{
-   return 0;
-}
-#endif /* CONFIG_X86 */

 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,37 @@ out_err:
return ret;
 }

-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
enum ttm_caching_state state;

-   if (placement & TTM_PL_FLAG_WC)
+   if (*placement & TTM_PL_FLAG_WC)
state = tt_wc;
-   else if (placement & TTM_PL_FLAG_UNCACHED)
+   else if (*placement & TTM_PL_FLAG_UNCACHED)
state = tt_uncached;
else
state = tt_cached;

return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+   if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+   ttm->caching_state = tt_cached;
+   *placement &= ~TTM_PL_MASK_CACHING;
+   *placement |= TTM_PL_FLAG_CACHED;
+   } else {
+   if (*placement & TTM_PL_FLAG_WC)
+   ttm->caching_state = tt_wc;
+   else if (*placement & TTM_PL_FLAG_UNCACHED)
+   ttm->caching_state = tt_uncached;
+   else
+   ttm->caching_state = tt_cached;
+   }
+   return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);

 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t 
placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t 
*placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm,
  struct file *persistent_swap_stora

[PATCH 5/9] drm/ast: Cleanup analog init code path

2014-09-04 Thread Benjamin Herrenschmidt

Move the MMIO mangling to a separate routine and actually
disable the DVO output when using pure analog.

Signed-off-by: Benjamin Herrenschmidt 
---
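
[Illustration, not part of the patch: a sketch of a masked register
update as used for the "Disable DVO" step. It assumes that
ast_set_index_reg_mask() keeps the bits selected by the mask and ORs in
the new value; reg_mask_update() is a made-up name.]

```c
#include <assert.h>
#include <stdint.h>

/* Keep the bits set in mask, then OR in val. */
uint8_t reg_mask_update(uint8_t old, uint8_t mask, uint8_t val)
{
	return (uint8_t)((old & mask) | val);
}

/* Usage: mask 0xcf with value 0x00 clears bits 5:4 and keeps the rest. */
void reg_mask_example(void)
{
	assert(reg_mask_update(0xff, 0xcf, 0x00) == 0xcf);
	assert(reg_mask_update(0x30, 0xcf, 0x00) == 0x00);
}
```
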
 drivers/gpu/drm/ast/ast_dp501.c | 49 ++---
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_dp501.c b/drivers/gpu/drm/ast/ast_dp501.c
index 7e2ddde..76f07f3 100644
--- a/drivers/gpu/drm/ast/ast_dp501.c
+++ b/drivers/gpu/drm/ast/ast_dp501.c
@@ -379,11 +379,39 @@ static bool ast_init_dvo(struct drm_device *dev)
return true;
 }

+
+static void ast_init_analog(struct drm_device *dev)
+{
+   struct ast_private *ast = dev->dev_private;
+   u32 data;
+
+   /*
+* Set DAC source to VGA mode in SCU2C via the P2A
+* bridge. First configure the P2U to target the SCU
+* in case it isn't at this stage.
+*/
+   ast_write32(ast, 0xf004, 0x1e6e);
+   ast_write32(ast, 0xf000, 0x1);
+
+   /* Then unlock the SCU with the magic password */
+   ast_write32(ast, 0x12000, 0x1688a8a8);
+   ast_write32(ast, 0x12000, 0x1688a8a8);
+   ast_write32(ast, 0x12000, 0x1688a8a8);
+
+   /* Finally, clear bits [17:16] of SCU2c */
+   data = ast_read32(ast, 0x1202c);
+   data &= 0xfffc;
+   ast_write32(ast, 0, data);
+
+   /* Disable DVO */
+   ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xcf, 0x00);
+}
+
 void ast_init_3rdtx(struct drm_device *dev)
 {
struct ast_private *ast = dev->dev_private;
u8 jreg;
-   u32 data;
+
if (ast->chip == AST2300 || ast->chip == AST2400) {
jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 
0xff);
switch (jreg & 0x0e) {
@@ -399,23 +427,8 @@ void ast_init_3rdtx(struct drm_device *dev)
default:
if (ast->tx_chip_type == AST_TX_SIL164)
ast_init_dvo(dev);
-   else {
-   /*
-* Set DAC source to VGA mode in SCU2C via the 
P2A
-* bridge. First configure the P2U to target 
the SCU
-* in case it isn't at this stage.
-*/
-   ast_write32(ast, 0xf004, 0x1e6e);
-   ast_write32(ast, 0xf000, 0x1);
-   /* Then unlock the SCU with the magic password 
*/
-   ast_write32(ast, 0x12000, 0x1688a8a8);
-   ast_write32(ast, 0x12000, 0x1688a8a8);
-   ast_write32(ast, 0x12000, 0x1688a8a8);
-   /* Finally, clear bits [17:16] of SCU2c */
-   data = ast_read32(ast, 0x1202c);
-   data &= 0xfffc;
-   ast_write32(ast, 0, data);
-   }
+   else
+   ast_init_analog(dev);
}
}
 }

[PATCH 1/9] drm/ast: Try to use MMIO registers when PIO isn't supported

2014-09-04 Thread Benjamin Herrenschmidt

If the PIO resources haven't been assigned, then we have no choice
but to try the MMIO version. This is the case, for example, on
POWER8, which doesn't support PIO at all.

Chips rev 0x20 or later have MMIO decoding enabled by default.

Signed-off-by: Benjamin Herrenschmidt 
---
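
[Illustration, not part of the patch: a userspace sketch of the fallback
order. struct fake_ast and pick_ioregs() are hypothetical; the 0x380
offset is AST_IO_MM_OFFSET from the diff, and the pio_mapping argument
stands in for the result of pci_iomap().]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_MM_OFFSET 0x380 /* AST_IO_MM_OFFSET in the patch */

struct fake_ast {
	uint8_t *regs;   /* MMIO BAR mapping */
	uint8_t *ioregs; /* where the VGA-style registers are reached */
};

/* Returns false when neither path yields usable IO registers. */
bool pick_ioregs(struct fake_ast *ast, bool has_pio, uint8_t *pio_mapping)
{
	assert(ast != NULL && ast->regs != NULL);
	ast->ioregs = NULL;

	/* No IO space at all: reach the registers through MMIO. */
	if (!has_pio)
		ast->ioregs = ast->regs + IO_MM_OFFSET;

	/* Otherwise "map" the IO BAR as before. */
	if (!ast->ioregs)
		ast->ioregs = pio_mapping;

	return ast->ioregs != NULL;
}
```
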
 drivers/gpu/drm/ast/ast_drv.h  |  5 -
 drivers/gpu/drm/ast/ast_main.c | 20 +---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index 957d4fa..a203a6f 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -125,8 +125,9 @@ struct ast_gem_object;

 #define AST_IO_AR_PORT_WRITE   (0x40)
 #define AST_IO_MISC_PORT_WRITE (0x42)
+#define AST_IO_VGA_ENABLE_PORT (0x43)
 #define AST_IO_SEQ_PORT(0x44)
-#define AST_DAC_INDEX_READ (0x3c7)
+#define AST_IO_DAC_INDEX_READ  (0x47)
 #define AST_IO_DAC_INDEX_WRITE (0x48)
 #define AST_IO_DAC_DATA(0x49)
 #define AST_IO_GR_PORT (0x4E)
@@ -134,6 +135,8 @@ struct ast_gem_object;
 #define AST_IO_INPUT_STATUS1_READ  (0x5A)
 #define AST_IO_MISC_PORT_READ  (0x4C)

+#define AST_IO_MM_OFFSET   (0x380)
+
 #define __ast_read(x) \
 static inline u##x ast_read##x(struct ast_private *ast, u32 reg) { \
 u##x val = 0;\
diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index a2cc6be..c2ff793 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -359,10 +359,24 @@ int ast_driver_load(struct drm_device *dev, unsigned long 
flags)
ret = -EIO;
goto out_free;
}
-   ast->ioregs = pci_iomap(dev->pdev, 2, 0);
+
+   /*
+* If we don't have IO space at all, use MMIO now and
+* assume the chip has MMIO enabled by default (rev 0x20
+* and higher).
+*/
+   if (!(pci_resource_flags(dev->pdev, 2) & IORESOURCE_IO)) {
+   DRM_INFO("platform has no IO space, trying MMIO\n");
+   ast->ioregs = ast->regs + AST_IO_MM_OFFSET;
+   }
+
+   /* "map" IO regs if the above hasn't done so already */
if (!ast->ioregs) {
-   ret = -EIO;
-   goto out_free;
+   ast->ioregs = pci_iomap(dev->pdev, 2, 0);
+   if (!ast->ioregs) {
+   ret = -EIO;
+   goto out_free;
+   }
}

ast_detect_chip(dev);

TTM placement & caching issue/questions

2014-09-04 Thread Michel Dänzer

On 04.09.2014 16:59, Michel Dänzer wrote:
> On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
>> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>>> +#else /* CONFIG_X86 */
>>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
>>> *placement)
>>>> +{
>>>> + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>>> + ttm->caching_state = tt_cached;
>>>> + *placement &= ~TTM_PL_MASK_CACHING;
>>>> + *placement |= TTM_PL_FLAG_CACHED;
>>>
>>> NAK, this will break AGP on PowerMacs.
>>
>>   ... which doesn't work reliably anyway with DRI2 :-)
>
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm
> not alone, at least with AGP 1x it seems to work quite well for most
> people. So I don't see the justification for intentionally breaking it
> completely for all of us.

Even more so because PCI GART is unusably slow in general.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer

TTM placement & caching issue/questions

2014-09-04 Thread Michel Dänzer

On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>> +#else /* CONFIG_X86 */
>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t
>> *placement)
>>> +{
>>> + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>> + ttm->caching_state = tt_cached;
>>> + *placement &= ~TTM_PL_MASK_CACHING;
>>> + *placement |= TTM_PL_FLAG_CACHED;
>>
>> NAK, this will break AGP on PowerMacs.
>
>   ... which doesn't work reliably anyway with DRI2 :-)

Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm 
not alone, at least with AGP 1x it seems to work quite well for most 
people. So I don't see the justification for intentionally breaking it 
completely for all of us.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer

SI display gap for more than 2 displays

2014-09-04 Thread Sylvain BERTRAND

On Thu, Sep 04, 2014 at 03:52:20PM +0200, Sylvain BERTRAND wrote:
> Hi,
> 
> In si_program_display_gap we have DISP1_GAP and DISP2_GAP.
> 
> Where are DISP3_GAP to DISP6_GAP? What does expect this hardware
> block when more than 2 displays are connected? Is DISP2_GAP
> actually stand for DISP[3-6]_GAP?
> 
> Still in the same function, what happened to the pipes for
> DCCG_DISP[2-6]_SLOW_SELECT?

I noticed something else: in si_enable_display_gap, the
DISP1_GAP_MCHG and DISP2_GAP_MCHG fields of CG_DISPLAY_GAP_CNTL
are initialized with only DISP1 set to vblank, and are never
reprogrammed when displays change, unlike DISP[12]_GAP. That seems
inconsistent; is it expected?

regards,

-- 
Sylvain BERTRAND

[Bug 83505] New: AMD A4-5300 APU : only radeon.dpm=1 prevents random reboots with 3.16.1 kernel.

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83505

  Priority: medium
Bug ID: 83505
  Assignee: dri-devel at lists.freedesktop.org
   Summary: AMD A4-5300 APU : only radeon.dpm=1 prevents random
reboots with 3.16.1 kernel.
  Severity: minor
Classification: Unclassified
OS: Linux (All)
  Reporter: rpnpif at free.fr
  Hardware: x86-64 (AMD64)
Status: NEW
   Version: unspecified
 Component: DRM/Radeon
   Product: DRI

Created attachment 105760
  --> https://bugs.freedesktop.org/attachment.cgi?id=105760&action=edit
dmesg with radeon.dpm=1

I have read that for this AMD A4-5300 APU (Aruba), passing radeon.dpm=1 on the
kernel command line in GRUB is unnecessary because this option is already
the default in the kernel.

But if I don't use this option, Debian 7.6 reboots randomly after some minutes,
hours or days of work when I use Iceweasel/Firefox with Flash animations,
LibreOffice or other heavy applications.

With this option, there are no reboots. Everything works fine for several days.

This issue seems about the same as
https://bugs.freedesktop.org/show_bug.cgi?id=82201#c26 but I am not sure.

I am reporting this issue because dpm=1 should be unnecessary.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()

2014-09-04 Thread Michel Dänzer

On 04.09.2014 16:47, Benjamin Herrenschmidt wrote:
> On all current cache coherent powerpc processors, it is not legit
> to map system memory non-cachable. This will cause aliases with
> the linear mapping which can be fatal.
>
> The TTM should generally avoid it after Jerome placement patches but
> let's add a sanity check anyway to catch any possible remaining issue.
>
> Signed-off-by: Benjamin Herrenschmidt 

[...]

> @@ -498,6 +501,20 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t 
> tmp)
>   tmp = pgprot_noncached(tmp);
>
>   #endif
> +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
> + /*
> +  * Using a non-cachable mapping of system memory on
> +  * cache coherent powerpc's can be fatal, let's make
> +  * sure this doesn't happen and warn if it does. The
> +  * only exception is powermac with AGP which has to
> +  * take the risk.
> +  */
> + if (!machine_is(powermac) &&
> + ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
> +  (caching_flags & TTM_PL_FLAG_TT))) {
> + pr_err_once("TTM: Attempt to use a non-cached"
> + " mapping on RAM unsupported !\n");
> + return tmp;

NAK, this breaks AGP on PowerMacs.


-- 
Earthling Michel Dänzer|  http://www.amd.com
Libre software enthusiast  |Mesa and X developer

[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83416

--- Comment #18 from Laurent carlier  ---
(In reply to comment #17)
> Does this Mesa patch help?
> 
> https://bugs.freedesktop.org/attachment.cgi?id=105755

No, it doesn't help

[Bug 81239] Evolution window content not shown fully (only desktop background)

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=81239

--- Comment #11 from Paul Menzel  ---
I created ticket 736069 [1] in the GNOME Bugzilla bug tracker.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=736069

TTM placement & caching issue/questions

2014-09-04 Thread Michel Dänzer

On 04.09.2014 11:36, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
>> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
>>>
>>>> So in the meantime the attached patch should work, it just silently ignore
>>>> the caching attribute request on non x86 instead of pretending that things
>>>> are setup as expected and then latter the radeon ou nouveau hw unsetting
>>>> the snoop bit.
>>>>
>>>> It's not tested but i think it should work.
>>>
>>> I'm still getting placements with !CACHED going from bo_memcpy in
>>> ttm_io_prot() though ... I'm looking at filtering the placement
>>> attributes instead.
>>>
>>> Ben.
>>
>> Ok so this one should do the trick.
>
> Ok final version ... famous last word.

[...]

> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> + ttm->caching_state = tt_cached;
> + *placement &= ~TTM_PL_MASK_CACHING;
> + *placement |= TTM_PL_FLAG_CACHED;

NAK, this will break AGP on PowerMacs.


-- 
Earthling Michel Dänzer    |  http://www.amd.com
Libre software enthusiast  |Mesa and X developer

TTM placement & caching issue/questions

2014-09-04 Thread Michel Dänzer

On 04.09.2014 10:55, Jerome Glisse wrote:
>
> While i agree about the issue of incoherent double map of same page, i
> think we have more issue. For instance lattely AMD have been pushing a
> lot of patches to move things to use uncached memory for radeon and as
> usual thoses patches comes with no comment to the motivations of those
> changes.

That would have been a fair review comment...


> What i understand is that uncached mapping for some frequently use buffer
> give a significant performance boost (i am assuming this has to do with
> all the snoop pci transaction overhead).

Exactly, although it's a win even if the data is written by the CPU only 
once and read by the GPU only once.


> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

GTT with AGP being one exception.


-- 
Earthling Michel Dänzer    |  http://www.amd.com
Libre software enthusiast  |Mesa and X developer

[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83416

--- Comment #17 from Grigori Goronzy  ---
Does this Mesa patch help?

https://bugs.freedesktop.org/attachment.cgi?id=105755

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Maarten Lankhorst

Hey,

On 04-09-14 at 15:34, Christian König wrote:
>> I need to check the docs how to do this correctly,
> The docs don't really cover this case.
>
> For the GPU waiting on an address there is an extra document just for this 
> case which I don't have at hand right now. But IIRC it was recommended to use 
> the local memory of the device waiting on the semaphore. I'm just not sure if 
> that's for pure performance reasons to avoid accessing the bus or if there's 
> a hard and unavoidable hardware reason to do so.
>
> For the GPU signaling case there is a special bit in the semaphore 
> instructions that you need to set if any user outside of the GPU should see 
> the write.
>
> In general it is explicitly supported to use semaphores for inter device 
> synchronization on the bus (that's what the block is made for), but it's not 
> intended to be used for synchronization between the CPU and the device. So 
> I'm not sure if things like cache snooping is implemented and correctly 
> supported.
>
> Well I see the feature more like nice to have and needs a bunch of testing, 
> so I would say either wait with the patch for now or make it optional to use 
> or something like that.
You're right, it's meant as something 'nice to have'. This is why it came after 
the patch that exports reservation_object to/from dma-buf. :-)

~Maarten

SI display gap for more than 2 displays

2014-09-04 Thread Sylvain BERTRAND

Hi,

In si_program_display_gap we have DISP1_GAP and DISP2_GAP.

Where are DISP3_GAP to DISP6_GAP? What does this hardware block
expect when more than 2 displays are connected? Does DISP2_GAP
actually stand for DISP[3-6]_GAP?

Still in the same function, what happened to the pipes for
DCCG_DISP[2-6]_SLOW_SELECT?

regards,

-- 
Sylvain

P.S. It seems that all this was "fixed" in CI with new hardware
blocks, but I'm focussing on SI blocks.

[Bug 83500] si_dma_copy_tile causes GPU hangs

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83500

--- Comment #2 from Grigori Goronzy  ---
Created attachment 105755
  --> https://bugs.freedesktop.org/attachment.cgi?id=105755&action=edit
Better fix

This is a possibly better fix that only disables DMA if 1D tiling is involved.
Please give it a try.

[PATCH v2] drm/radeon: reduce memory footprint for debugging

2014-09-04 Thread Andy Shevchenko

There is no need to use hex_dump_to_buffer() since we have a kernel helper to
dump up to 64 bytes just via printk(). In our case the actual size is 15 bytes.

Signed-off-by: Andy Shevchenko 
---
 drivers/gpu/drm/radeon/atombios_dp.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_dp.c 
b/drivers/gpu/drm/radeon/atombios_dp.c
index 95ea276..95d5d4a 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -405,16 +405,13 @@ bool radeon_dp_getdpcd(struct radeon_connector 
*radeon_connector)
u8 msg[DP_DPCD_SIZE];
int ret;

-   char dpcd_hex_dump[DP_DPCD_SIZE * 3];
-
ret = drm_dp_dpcd_read(&radeon_connector->ddc_bus->aux, DP_DPCD_REV, 
msg,
   DP_DPCD_SIZE);
if (ret > 0) {
memcpy(dig_connector->dpcd, msg, DP_DPCD_SIZE);

-   hex_dump_to_buffer(dig_connector->dpcd, 
sizeof(dig_connector->dpcd),
-  32, 1, dpcd_hex_dump, sizeof(dpcd_hex_dump), 
false);
-   DRM_DEBUG_KMS("DPCD: %s\n", dpcd_hex_dump);
+   DRM_DEBUG_KMS("DPCD: %*ph\n", (int)sizeof(dig_connector->dpcd),
+ dig_connector->dpcd);

radeon_dp_probe_oui(radeon_connector);

-- 
2.1.0
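
The patch above relies on the kernel's `%*ph` printk extension, which takes an int length first and the buffer pointer second (the order used in this v2) and prints the bytes as space-separated hex. The kernel format specifier is not available in userspace, so the following is only a minimal userspace sketch of the output shape, assuming lower-case hex and the 64-byte cap mentioned in the commit message. The helper name `format_ph` is hypothetical and is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace stand-in for the kernel's "%*ph" (hypothetical helper):
 * writes 'len' bytes of 'buf' as lower-case hex, space separated.
 * Returns the number of characters written, or -1 if 'out' is too small.
 */
static int format_ph(char *out, size_t outlen, const unsigned char *buf, int len)
{
	size_t pos = 0;
	int i;

	assert(out != NULL);
	if (outlen == 0)
		return -1;
	out[0] = '\0';
	if (len > 64)
		len = 64;	/* mirror the 64-byte limit of the kernel helper */

	for (i = 0; i < len; i++) {
		int n = snprintf(out + pos, outlen - pos, "%s%02x",
				 i ? " " : "", (unsigned int)buf[i]);

		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;	/* output buffer too small */
		pos += (size_t)n;
	}
	return (int)pos;
}
```

With this sketch, a 15-byte DPCD buffer needs 15 * 3 = 45 bytes of output space including the terminator, which matches the `DP_DPCD_SIZE * 3` stack buffer the patch removes.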

[PATCH] drm/radeon: reduce memory footprint for debugging

2014-09-04 Thread Andy Shevchenko

There is no need to use hex_dump_to_buffer() since we have a kernel helper to
dump up to 64 bytes just via printk(). In our case the actual size is 15 bytes.

Signed-off-by: Andy Shevchenko 
---
 drivers/gpu/drm/radeon/atombios_dp.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_dp.c 
b/drivers/gpu/drm/radeon/atombios_dp.c
index 95ea276..4e75c48 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -405,16 +405,13 @@ bool radeon_dp_getdpcd(struct radeon_connector 
*radeon_connector)
u8 msg[DP_DPCD_SIZE];
int ret;

-   char dpcd_hex_dump[DP_DPCD_SIZE * 3];
-
ret = drm_dp_dpcd_read(&radeon_connector->ddc_bus->aux, DP_DPCD_REV, 
msg,
   DP_DPCD_SIZE);
if (ret > 0) {
memcpy(dig_connector->dpcd, msg, DP_DPCD_SIZE);

-   hex_dump_to_buffer(dig_connector->dpcd, 
sizeof(dig_connector->dpcd),
-  32, 1, dpcd_hex_dump, sizeof(dpcd_hex_dump), 
false);
-   DRM_DEBUG_KMS("DPCD: %s\n", dpcd_hex_dump);
+   DRM_DEBUG_KMS("DPCD: %*ph\n", dig_connector->dpcd,
+ (int)sizeof(dig_connector->dpcd));

radeon_dp_probe_oui(radeon_connector);

-- 
2.1.0

[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83416

Laurent carlier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #16 from Laurent carlier  ---
Bad luck, it's hanging again! -> reopened

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Christian König

> I need to check the docs how to do this correctly,
The docs don't really cover this case.

For the GPU waiting on an address there is an extra document just for 
this case which I don't have at hand right now. But IIRC it was 
recommended to use the local memory of the device waiting on the 
semaphore. I'm just not sure if that's for pure performance reasons to 
avoid accessing the bus or if there's a hard and unavoidable hardware 
reason to do so.

For the GPU signaling case there is a special bit in the semaphore 
instructions that you need to set if any user outside of the GPU should 
see the write.

In general it is explicitly supported to use semaphores for inter device 
synchronization on the bus (that's what the block is made for), but it's 
not intended to be used for synchronization between the CPU and the 
device. So I'm not sure if things like cache snooping are implemented and 
correctly supported.

Well I see the feature more like nice to have and needs a bunch of 
testing, so I would say either wait with the patch for now or make it 
optional to use or something like that.

Regards,
Christian.

On 04.09.2014 at 14:25, Christian König wrote:
> On 04.09.2014 at 14:08, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 04-09-14 at 13:54, Christian König wrote:
>>> On 04.09.2014 at 13:42, Maarten Lankhorst wrote:
>>>> Use the semaphore mechanism to make this happen, this uses signaling
>>>> from the cpu instead of signaling by the gpu.
>>> I'm not sure if this will work reliable when the semaphores are in 
>>> system memory. We might need to reserve some VRAM for them instead.
>>>
>>> Regards,
>>> Christian.
>> Why would it be unreliable? I mostly kept it in semaphore for 
>> simplicity.
>
> The semaphore block tries to avoid memory accesses whenever possible.
>
> For example when a signal for address A arrives the block doesn't 
> necessary writes that to memory but instead tries to match it 
> immediately with a wait for address A. Similar is true if a wait for 
> address A arrives and the semaphore block thinks it knows the memory 
> value at address A.
>
> Also I'm not sure if the semaphore block really polls the memory 
> address for changes, instead it might just snoop the MC for writes to 
> this address. Since CPU writes to system memory aren't seen by the GPU 
> MC the semaphore block would never know something changed.
>
> I need to check the docs how to do this correctly,
> Christian.
>
>>
>> ~Maarten
>>
>

TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 22:36 -0400, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
> > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> > > 
> > > > So in the meantime the attached patch should work, it just silently 
> > > > ignore
> > > > the caching attribute request on non x86 instead of pretending that 
> > > > things
> > > > are setup as expected and then latter the radeon ou nouveau hw unsetting
> > > > the snoop bit.
> > > > 
> > > > It's not tested but i think it should work.
> > > 
> > > I'm still getting placements with !CACHED going from bo_memcpy in
> > > ttm_io_prot() though ... I'm looking at filtering the placement
> > > attributes instead.
> > > 
> > > Ben.
> > 
> > Ok so this one should do the trick.
> 
> Ok final version ... famous last word.

Minus a couple of obvious typos that prevent it from building, it seems
to do the trick for me with the AST driver, no more bad mappings.

I'll still send a patch that catches the incorrect mapping attempts
inside ttm_io_prot() and warns to help future debugging and avoid
"random" behaviour. (I need to fix other things in the powerpc code
in there anyway).

Cheers,
Ben.

[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83416

Laurent carlier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from Laurent carlier  ---
I can confirm that 8bd67231797e5d79d72a4e91b37ea81da30c6df3 is fixing the hang.

Thanks Marek, closing!

[Bug 81644] Random crashes on RadeonSI with Chromium.

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=81644

--- Comment #82 from Aaron B  ---
I'm still bisecting, but I just want to say I suck at it and I'll probably need
at least 2 bisects to the same point, if not more. I'm trying to be patient,
but on the old Mesa's the glitch just takes so long to do, even when I set it
up to do it.

So, should I skip to bisecting if this DMA patch that was just proposed is the
source of our problem, also?

https://bugs.freedesktop.org/show_bug.cgi?id=83500

[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process

2014-09-04 Thread Daniel Vetter

On Thu, Sep 04, 2014 at 11:52:15AM +, Gupta, Sourab wrote:
> On Thu, 2014-09-04 at 10:01 +, Daniel Vetter wrote:
> > Interface design discussions should happen in public (so that
> > non-intel people can jump in, which happens rather often for other
> > drivers actually). But at least include internal mailing lists next
> > time around. Also adding dri-devel.
> > 
> > The problem I see with your approach is that "process-wise" is not a
> > solid concept with drm. We can dump information per open drm file, but
> > that file descriptor can be shared between processes. And the latest
> > generation of linux compositor protocols (like dri3) actually take
> > advantage of this.
> 
> By "process-wise" sharing, do you mean the sharing of the drm file
> across different processes (having different tgid's), or is it sharing
> across the threads of a single process (having same tgid)?
> Sorry, we are not aware of the sharing of drm file across processes in
> dri3 protocols, as in android userspace, we have not come across such
> scenario. Can you please shed some light on it.
> 
> In our design, we have a tgid based accounting mechanism. As long as the
> drm file is shared within the threads of the same process, its resources
> (objects and memory) are accounted together. But if the drm file is
> shared across different processes (diff tgid's), this case is still an
> open.
> Will our tgid based accounting cover the dri3 usecases also (if they
> share drm file within same tgid)?

Well in unix a file descriptor is simply not tied to a process/thread at
all, so if you expose accounting data for resources which are tied to file
descriptors then that doesn't work. E.g.
- fork inherits all the file descriptors from its parent, same for exec
- you can pass file descriptors explicitly between processes over unix
  domain sockets (this is what dri3 does).

So if you'd use the tgid of the process that opened the file you'd account
everything to the X server with dri3. Which is not really useful.
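
A minimal sketch of the first point, assuming a plain pipe as a stand-in for a drm file (the function name is hypothetical): the parent is the only process that ever creates the descriptor, yet the child uses it after fork. Accounting by the opener's tgid would charge the child's activity to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the pid of the process that wrote into the pipe, or -1 on error.
 * The pipe is created by the caller's process; the child only inherits it.
 */
static pid_t writer_of_inherited_fd(void)
{
	int fds[2];
	pid_t child, writer = -1;

	if (pipe(fds) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (child == 0) {
		/* Child: never called pipe(), yet fds[1] is valid here. */
		pid_t me = getpid();
		ssize_t n = write(fds[1], &me, sizeof(me));

		_exit(n == (ssize_t)sizeof(me) ? 0 : 1);
	}

	/* Parent: the process that originally opened the descriptor. */
	close(fds[1]);
	if (read(fds[0], &writer, sizeof(writer)) != (ssize_t)sizeof(writer))
		writer = -1;
	close(fds[0]);
	waitpid(child, NULL, 0);
	return writer;
}
```

The returned pid differs from the caller's own pid, which is the mismatch described above. Passing a descriptor over a unix domain socket, as dri3 does, leads to the same situation between unrelated processes.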

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Christian König

On 04.09.2014 at 14:08, Maarten Lankhorst wrote:
> Hey,
>
> On 04-09-14 at 13:54, Christian König wrote:
>> On 04.09.2014 at 13:42, Maarten Lankhorst wrote:
>>> Use the semaphore mechanism to make this happen, this uses signaling
>>> from the cpu instead of signaling by the gpu.
>> I'm not sure if this will work reliable when the semaphores are in system 
>> memory. We might need to reserve some VRAM for them instead.
>>
>> Regards,
>> Christian.
> Why would it be unreliable? I mostly kept it in semaphore for simplicity.

The semaphore block tries to avoid memory accesses whenever possible.

For example when a signal for address A arrives the block doesn't 
necessarily write that to memory but instead tries to match it 
immediately with a wait for address A. The same is true if a wait for 
address A arrives and the semaphore block thinks it knows the memory 
value at address A.

Also I'm not sure if the semaphore block really polls the memory address 
for changes, instead it might just snoop the MC for writes to this 
address. Since CPU writes to system memory aren't seen by the GPU MC the 
semaphore block would never know something changed.

I need to check the docs how to do this correctly,
Christian.

>
> ~Maarten
>

[Bug 83432] r600_query.c:269:r600_emit_query_end: Assertion `ctx->num_pipelinestat_queries > 0' failed [Gallium HUD]

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83432

Marek Olšák changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Marek Olšák ---
Fixed by 3dbf55c1be5a8867616e475d943c776d8245d0c. Closing.

[PATCH 14/19] drm: Don't update vblank timestamp when the counter didn't change

2014-09-04 Thread Mario Kleiner

I thought about this one again and, contrary to my previous comment, now think
it's fine, also for drivers without hw vblank counter queries.

-mario



On Wed, Aug 6, 2014 at 1:49 PM,  wrote:

> From: Ville Syrjälä
>
> If we already have a timestamp for the current vblank counter, don't
> update it with a new timestmap. Small errors can creep in between two
> timestamp queries for the same vblank count, which could be confusing to
> userspace when it queries the timestamp for the same vblank sequence
> number twice.
>
> This problem gets exposed when the vblank disable timer is not used
> (or is set to expire quickly) and thus we can get multiple vblank
> disable<->enable transition during the same frame which would all
> attempt to update the timestamp with the latest estimate.
>
> Testcase: igt/kms_flip/flip-vs-expired-vblank
> Signed-off-by: Ville Syrjälä
> ---
>  drivers/gpu/drm/drm_irq.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index af33df1..0523f5b 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -106,6 +106,9 @@ static void drm_update_vblank_count(struct drm_device
> *dev, int crtc)
> DRM_DEBUG("enabling vblank interrupts on crtc %d, missed %d\n",
>   crtc, diff);
>
> +   if (diff == 0)
> +   return;
> +
> /* Reinitialize corresponding vblank timestamp if high-precision
> query
>  * available. Skip this step if query unsupported or failed. Will
>  * reinitialize delayed at next vblank interrupt in that case.
> --
> 1.8.5.5
>
>

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Maarten Lankhorst

Hey,

On 04-09-14 at 13:54, Christian König wrote:
> On 04.09.2014 at 13:42, Maarten Lankhorst wrote:
>> Use the semaphore mechanism to make this happen, this uses signaling
>> from the cpu instead of signaling by the gpu.
>
> I'm not sure if this will work reliable when the semaphores are in system 
> memory. We might need to reserve some VRAM for them instead.
>
> Regards,
> Christian.
Why would it be unreliable? I mostly kept it in semaphore for simplicity.

~Maarten

[Bug 83500] si_dma_copy_tile causes GPU hangs

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83500

--- Comment #1 from Marek Olšák ---
Thank you very much for tracking this down.

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Christian König

On 04.09.2014 at 13:42, Maarten Lankhorst wrote:
> Use the semaphore mechanism to make this happen, this uses signaling
> from the cpu instead of signaling by the gpu.

I'm not sure if this will work reliably when the semaphores are in 
system memory. We might need to reserve some VRAM for them instead.

Regards,
Christian.

>
> Signed-off-by: Maarten Lankhorst 
> ---
>   drivers/gpu/drm/radeon/radeon.h   |  17 ++-
>   drivers/gpu/drm/radeon/radeon_cs.c|  30 ++---
>   drivers/gpu/drm/radeon/radeon_fence.c |  13 ++-
>   drivers/gpu/drm/radeon/radeon_semaphore.c | 184 
> ++
>   4 files changed, 221 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index dddb2b7dd752..cd18fa7f801c 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -359,6 +359,11 @@ struct radeon_fence_driver {
>   struct delayed_work lockup_work;
>   };
>   
> +struct radeon_fence_cb {
> + struct fence_cb base;
> + struct fence *fence;
> +};
> +
>   struct radeon_fence {
>   struct fence base;
>   
> @@ -368,6 +373,10 @@ struct radeon_fence {
>   unsigned                ring;
>   
>   wait_queue_t            fence_wake;
> +
> + atomic_t                num_cpu_cbs;
> + struct radeon_fence_cb  *cpu_cbs;
> + uint32_t                *cpu_sema;
>   };
>   
>   int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
> @@ -574,9 +583,11 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
>*/
>   struct radeon_semaphore {
>   struct radeon_sa_bo *sa_bo;
> - signed  waiters;
> + signed  waiters, cpu_waiters, cpu_waiters_max;
>   uint64_t                gpu_addr;
>   struct radeon_fence     *sync_to[RADEON_NUM_RINGS];
> + uint32_t                *cpu_sema;
> + struct radeon_fence_cb  *cpu_cbs;
>   };
>   
>   int radeon_semaphore_create(struct radeon_device *rdev,
> @@ -587,6 +598,10 @@ bool radeon_semaphore_emit_wait(struct radeon_device 
> *rdev, int ring,
>   struct radeon_semaphore *semaphore);
>   void radeon_semaphore_sync_to(struct radeon_semaphore *semaphore,
> struct radeon_fence *fence);
> +int radeon_semaphore_sync_obj(struct radeon_device *rdev,
> +   struct radeon_semaphore *semaphore,
> +   struct reservation_object *resv);
> +
>   int radeon_semaphore_sync_rings(struct radeon_device *rdev,
>   struct radeon_semaphore *semaphore,
>   int waiting_ring);
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
> b/drivers/gpu/drm/radeon/radeon_cs.c
> index 8ad4e2cfae15..b141f5bd029d 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -250,32 +250,16 @@ static int radeon_cs_get_ring(struct radeon_cs_parser 
> *p, u32 ring, s32 priority
>   
>   static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
>   {
> - int i;
> -
> - for (i = 0; i < p->nrelocs; i++) {
> - struct reservation_object *resv;
> - struct fence *fence;
> - struct radeon_fence *rfence;
> - int r;
> + int i, ret = 0;
>   
> + for (i = 0; !ret && i < p->nrelocs; i++) {
>   if (!p->relocs[i].robj)
>   continue;
>   
> - resv = p->relocs[i].robj->tbo.resv;
> - fence = reservation_object_get_excl(resv);
> - if (!fence)
> - continue;
> - rfence = to_radeon_fence(fence);
> - if (!rfence || rfence->rdev != p->rdev) {
> - r = fence_wait(fence, true);
> - if (r)
> - return r;
> - continue;
> - }
> -
> - radeon_semaphore_sync_to(p->ib.semaphore, rfence);
> + ret = radeon_semaphore_sync_obj(p->rdev, p->ib.semaphore,
> + p->relocs[i].robj->tbo.resv);
>   }
> - return 0;
> + return ret;
>   }
>   
>   /* XXX: note that this is called from the legacy UMS CS ioctl as well */
> @@ -442,6 +426,10 @@ static void radeon_cs_parser_fini(struct 
> radeon_cs_parser *parser, int error, bo
>*/
>   list_sort(NULL, &parser->validated, cmp_size_smaller_first);
>   
> + /* must be called with all reservation_objects still held */
> + radeon_semaphore_free(parser->rdev, &parser->ib.semaphore,
> +   parser->ib.fence);
> +
>   ttm_eu_fence_buffer_objects(&parser->ticket,
>   &parser->validated,
>   &parser->

[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83418

Marek Olšák changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Marek Olšák ---
I reverted the problematic commit as 8bd67231797e5d79d72a4e91b37ea81da30c6df3.
Closing.

[Bug 75112] Meta Bug for HyperZ issues on r600g and radeonsi

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=75112

Bug 75112 depends on bug 83418, which changed state.

Bug 83418 Summary: EU IV is incorrectly rendered after git1409011930.d571f2
https://bugs.freedesktop.org/show_bug.cgi?id=83418

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

[Bug 79980] Random radeonsi crashes

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=79980

--- Comment #128 from Grigori Goronzy  ---
You might want to try the patch in
https://bugs.freedesktop.org/show_bug.cgi?id=83500

Maybe some of these issues have a common cause.

[Bug 83500] New: si_dma_copy_tile causes GPU hangs

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83500

          Priority: medium
            Bug ID: 83500
          Assignee: dri-devel at lists.freedesktop.org
           Summary: si_dma_copy_tile causes GPU hangs
          Severity: normal
    Classification: Unclassified
                OS: Linux (All)
          Reporter: greg at chown.ath.cx
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: git
         Component: Drivers/Gallium/radeonsi
           Product: Mesa

Created attachment 105745
  --> https://bugs.freedesktop.org/attachment.cgi?id=105745&action=edit
Workaround

Async DMA linear to tiled copies are causing GPU hangs in some cases. On Cape
Verde, I can easily trigger this as described in [1]. The game Brutal Legend
also triggers similar hangs when it streams assets during gameplay.

Disabling usage of this function and using the resource_copy_region fallback
instead fixes all hangs. The attached patch does that.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=79980#c124

[PATCH 3/7] drm/radeon: cope with foreign fences inside the reservation object

2014-09-04 Thread Christian König

On 04.09.2014 at 13:40, Maarten Lankhorst wrote:
> Not the whole world is a radeon! :-)
>
> Signed-off-by: Maarten Lankhorst 
> ---
>   drivers/gpu/drm/radeon/radeon.h | 11 -
>   drivers/gpu/drm/radeon/radeon_cs.c  | 32 +
>   drivers/gpu/drm/radeon/radeon_display.c | 41 
> -
>   drivers/gpu/drm/radeon/radeon_fence.c   |  3 +++
>   drivers/gpu/drm/radeon/radeon_mode.h|  1 +
>   5 files changed, 61 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index d80dc547a105..dddb2b7dd752 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -699,17 +699,6 @@ void radeon_doorbell_free(struct radeon_device *rdev, 
> u32 doorbell);
>* IRQS.
>*/
>   
> -struct radeon_flip_work {
> - struct work_struct              flip_work;
> - struct work_struct              unpin_work;
> - struct radeon_device            *rdev;
> - int                             crtc_id;
> - uint64_t                        base;
> - struct drm_pending_vblank_event *event;
> - struct radeon_bo                *old_rbo;
> - struct radeon_fence             *fence;
> -};
> -

Please keep this structure where it was, apart from that the patch looks 
good at first glance.

Christian.

>   struct r500_irq_stat_regs {
>   u32 disp_int;
>   u32 hdmi0_status;
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
> b/drivers/gpu/drm/radeon/radeon_cs.c
> index 6e3d1c8f3483..8ad4e2cfae15 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -248,23 +248,34 @@ static int radeon_cs_get_ring(struct radeon_cs_parser 
> *p, u32 ring, s32 priority
>   return 0;
>   }
>   
> -static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
> +static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
>   {
>   int i;
>   
>   for (i = 0; i < p->nrelocs; i++) {
>   struct reservation_object *resv;
>   struct fence *fence;
> + struct radeon_fence *rfence;
> + int r;
>   
>   if (!p->relocs[i].robj)
>   continue;
>   
>   resv = p->relocs[i].robj->tbo.resv;
>   fence = reservation_object_get_excl(resv);
> + if (!fence)
> + continue;
> + rfence = to_radeon_fence(fence);
> + if (!rfence || rfence->rdev != p->rdev) {
> + r = fence_wait(fence, true);
> + if (r)
> + return r;
> + continue;
> + }
>   
> - radeon_semaphore_sync_to(p->ib.semaphore,
> -  (struct radeon_fence *)fence);
> + radeon_semaphore_sync_to(p->ib.semaphore, rfence);
>   }
> + return 0;
>   }
>   
>   /* XXX: note that this is called from the legacy UMS CS ioctl as well */
> @@ -474,13 +485,19 @@ static int radeon_cs_ib_chunk(struct radeon_device 
> *rdev,
>   return r;
>   }
>   
> + r = radeon_cs_sync_rings(parser);
> + if (r) {
> + if (r != -ERESTARTSYS)
> + DRM_ERROR("Failed to sync rings: %i\n", r);
> + return r;
> + }
> +
>   if (parser->ring == R600_RING_TYPE_UVD_INDEX)
>   radeon_uvd_note_usage(rdev);
>   else if ((parser->ring == TN_RING_TYPE_VCE1_INDEX) ||
>(parser->ring == TN_RING_TYPE_VCE2_INDEX))
>   radeon_vce_note_usage(rdev);
>   
> - radeon_cs_sync_rings(parser);
>   r = radeon_ib_schedule(rdev, &parser->ib, NULL, true);
>   if (r) {
>   DRM_ERROR("Failed to schedule IB !\n");
> @@ -567,7 +584,12 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device 
> *rdev,
>   if (r) {
>   goto out;
>   }
> - radeon_cs_sync_rings(parser);
> + r = radeon_cs_sync_rings(parser);
> + if (r) {
> + if (r != -ERESTARTSYS)
> + DRM_ERROR("Failed to sync rings: %i\n", r);
> + goto out;
> + }
>   radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence);
>   
>   if ((rdev->family >= CHIP_TAHITI) &&
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
> b/drivers/gpu/drm/radeon/radeon_display.c
> index bc894c17b2f9..715b2d95346c 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -36,6 +36,17 @@
>   
>   #include 
>   
> +struct radeon_flip_work {
> + struct work_struct  flip_work;
> + struct work_struct  unpin_work;
> + struct radeon_device *rdev;
> + int crtc_id;
> + uint64_t base;
> + struct drm_pending_vblank_event *event;
> + struct radeon_bo *old_rbo;
> + struct fence *fence;

[PATCH 7/7] drm/nouveau: allow asynchronous waiting using gart fences

2014-09-04 Thread Maarten Lankhorst

This requires allocating a fence sooner to annotate any
cross-dev fences, and making sure that enough memory is
available before emitting the fence.

The current seqno is written to the GART bo on completion,
and a list of finished fences is kept to allow arbitrary depth.
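
As a rough illustration of the ordering described above (allocate the fence
first, then sync, queue the work, and only then emit), here is a minimal
standalone sketch. All names (demo_fence, demo_fence_new, demo_fence_emit,
demo_move) are hypothetical stand-ins rather than the nouveau functions, and
the sync and move steps are reduced to simulated return codes.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver fence; not the nouveau structure. */
struct demo_fence {
	int refcount;
	int emitted;	/* set once the fence has been submitted */
};

/* Allocation can fail, so it happens before any work is queued. */
static int demo_fence_new(struct demo_fence **pfence)
{
	struct demo_fence *f = calloc(1, sizeof(*f));

	if (!f)
		return -ENOMEM;
	f->refcount = 1;
	*pfence = f;
	return 0;
}

/* In this sketch emitting cannot fail; a real driver may need ring space. */
static int demo_fence_emit(struct demo_fence *f)
{
	f->emitted = 1;
	return 0;
}

static void demo_fence_unref(struct demo_fence **pfence)
{
	if (*pfence && --(*pfence)->refcount == 0)
		free(*pfence);
	*pfence = NULL;
}

/*
 * sync_ret and move_ret simulate the results of the sync and copy steps.
 * Returns 0 on success, or the first error encountered.
 */
static int demo_move(int sync_ret, int move_ret, int *emitted)
{
	struct demo_fence *fence = NULL;
	int ret;

	*emitted = 0;
	ret = demo_fence_new(&fence);
	if (ret)
		return ret;

	ret = sync_ret;			/* annotate / wait on other fences */
	if (ret == 0)
		ret = move_ret;		/* queue the actual work */
	if (ret == 0)
		ret = demo_fence_emit(fence);
	if (ret == 0)
		*emitted = fence->emitted;

	demo_fence_unref(&fence);	/* dropped on every path */
	return ret;
}
```

With the fence allocated up front, the sketch needs only one unref that
covers both the success path and every error path after allocation.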

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c  |  28 ++--
 drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
 drivers/gpu/drm/nouveau/nouveau_display.c |  45 ---
 drivers/gpu/drm/nouveau/nouveau_fence.c   | 212 ++
 drivers/gpu/drm/nouveau/nouveau_fence.h   |  29 ++--
 drivers/gpu/drm/nouveau/nouveau_gem.c |  25 ++--
 drivers/gpu/drm/nouveau/nv04_fence.c  |   9 +-
 drivers/gpu/drm/nouveau/nv10_fence.c  |   9 +-
 drivers/gpu/drm/nouveau/nv84_fence.c  |  31 +++--
 drivers/gpu/drm/nouveau/nvc0_fence.c  |   4 +-
 10 files changed, 305 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f89b4a7c93fe..24c941927926 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -970,21 +970,21 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int 
evict, bool intr,
}

mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING);
-   ret = nouveau_fence_sync(nouveau_bo(bo), chan, true);
-   if (ret == 0) {
+   ret = nouveau_fence_new(chan, &fence);
+   if (ret)
+   goto out;
+
+   ret = nouveau_fence_sync(nouveau_bo(bo), fence, true);
+   if (ret == 0)
ret = drm->ttm.move(chan, bo, &bo->mem, new_mem);
-   if (ret == 0) {
-   ret = nouveau_fence_new(chan, false, &fence);
-   if (ret == 0) {
-   ret = ttm_bo_move_accel_cleanup(bo,
-   &fence->base,
-   evict,
-   no_wait_gpu,
-   new_mem);
-   nouveau_fence_unref(&fence);
-   }
-   }
-   }
+   if (ret == 0)
+   ret = nouveau_fence_emit(fence);
+   if (ret == 0)
+   ret = ttm_bo_move_accel_cleanup(bo, &fence->base, evict,
+   no_wait_gpu, new_mem);
+   nouveau_fence_unref(&fence);
+
+out:
mutex_unlock(&cli->mutex);
return ret;
 }
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index d639750379d6..1e5c76dfed3a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -46,9 +46,11 @@ nouveau_channel_idle(struct nouveau_channel *chan)
struct nouveau_fence *fence = NULL;
int ret;

-   ret = nouveau_fence_new(chan, false, &fence);
+   ret = nouveau_fence_new(chan, &fence);
if (!ret) {
-   ret = nouveau_fence_wait(fence, false, false);
+   ret = nouveau_fence_emit(fence);
+   if (!ret)
+   ret = nouveau_fence_wait(fence, false, false);
nouveau_fence_unref(&fence);
}

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index a9ec525c0994..adbf870686aa 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -26,6 +26,7 @@

 #include 
 #include 
+#include 

 #include 

@@ -36,7 +37,6 @@
 #include "nouveau_gem.h"
 #include "nouveau_connector.h"
 #include "nv50_display.h"
-
 #include "nouveau_fence.h"

 #include 
@@ -644,7 +644,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
   struct nouveau_bo *old_bo,
   struct nouveau_bo *new_bo,
   struct nouveau_page_flip_state *s,
-  struct nouveau_fence **pfence)
+  struct nouveau_fence *fence)
 {
struct nouveau_fence_chan *fctx = chan->fence;
struct nouveau_drm *drm = chan->drm;
@@ -657,11 +657,6 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
list_add_tail(&s->head, &fctx->flip);
spin_unlock_irqrestore(&dev->event_lock, flags);

-   /* Synchronize with the old framebuffer */
-   ret = nouveau_fence_sync(old_bo, chan, false);
-   if (ret)
-   goto fail;
-
/* Emit the pageflip */
ret = RING_SPACE(chan, 2);
if (ret)
@@ -674,7 +669,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan,
OUT_RING  (chan, 0x);
FIRE_RING (chan);

-   ret = nouveau_fence_new(chan, false, pfence);
+   ret = nouveau_fence_emit(fence);
if (ret)
goto fail;

@@ -700,6 +695,12 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, stru

[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences

2014-09-04 Thread Maarten Lankhorst

Use the semaphore mechanism to make this happen. The semaphore is signaled
from the CPU instead of by the GPU.
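
A simplified model of CPU-side signaling, assuming a single semaphore word
that is released once the last foreign fence has signaled. The names
(demo_cpu_sema, demo_sema_init, demo_fence_signaled) are hypothetical, and
the patch itself may well track its callbacks differently.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model; not the radeon_semaphore structure. */
struct demo_cpu_sema {
	atomic_int pending;	/* foreign fences that have not signaled yet */
	uint32_t *sema_word;	/* memory the GPU would wait on */
};

static void demo_sema_init(struct demo_cpu_sema *s, uint32_t *word,
			   int nfences)
{
	atomic_init(&s->pending, nfences);
	s->sema_word = word;
	*word = 0;		/* 0 = GPU keeps waiting */
}

/* Called once per foreign fence, from that fence's completion callback. */
static void demo_fence_signaled(struct demo_cpu_sema *s)
{
	/* atomic_fetch_sub returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&s->pending, 1) == 1)
		*s->sema_word = 1;	/* CPU releases the semaphore */
}
```

The submitting thread does not block in this model; the release happens
from whichever callback runs last.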

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon.h   |  17 ++-
 drivers/gpu/drm/radeon/radeon_cs.c|  30 ++---
 drivers/gpu/drm/radeon/radeon_fence.c |  13 ++-
 drivers/gpu/drm/radeon/radeon_semaphore.c | 184 ++
 4 files changed, 221 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index dddb2b7dd752..cd18fa7f801c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -359,6 +359,11 @@ struct radeon_fence_driver {
struct delayed_work lockup_work;
 };

+struct radeon_fence_cb {
+   struct fence_cb base;
+   struct fence *fence;
+};
+
 struct radeon_fence {
struct fence base;

@@ -368,6 +373,10 @@ struct radeon_fence {
unsigned ring;

wait_queue_t fence_wake;
+
+   atomic_t num_cpu_cbs;
+   struct radeon_fence_cb  *cpu_cbs;
+   uint32_t *cpu_sema;
 };

 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -574,9 +583,11 @@ int radeon_mode_dumb_mmap(struct drm_file *filp,
  */
 struct radeon_semaphore {
struct radeon_sa_bo *sa_bo;
-   signed  waiters;
+   signed  waiters, cpu_waiters, cpu_waiters_max;
uint64_t gpu_addr;
struct radeon_fence *sync_to[RADEON_NUM_RINGS];
+   uint32_t *cpu_sema;
+   struct radeon_fence_cb  *cpu_cbs;
 };

 int radeon_semaphore_create(struct radeon_device *rdev,
@@ -587,6 +598,10 @@ bool radeon_semaphore_emit_wait(struct radeon_device 
*rdev, int ring,
struct radeon_semaphore *semaphore);
 void radeon_semaphore_sync_to(struct radeon_semaphore *semaphore,
  struct radeon_fence *fence);
+int radeon_semaphore_sync_obj(struct radeon_device *rdev,
+ struct radeon_semaphore *semaphore,
+ struct reservation_object *resv);
+
 int radeon_semaphore_sync_rings(struct radeon_device *rdev,
struct radeon_semaphore *semaphore,
int waiting_ring);
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 8ad4e2cfae15..b141f5bd029d 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -250,32 +250,16 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, 
u32 ring, s32 priority

 static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 {
-   int i;
-
-   for (i = 0; i < p->nrelocs; i++) {
-   struct reservation_object *resv;
-   struct fence *fence;
-   struct radeon_fence *rfence;
-   int r;
+   int i, ret = 0;

+   for (i = 0; !ret && i < p->nrelocs; i++) {
if (!p->relocs[i].robj)
continue;

-   resv = p->relocs[i].robj->tbo.resv;
-   fence = reservation_object_get_excl(resv);
-   if (!fence)
-   continue;
-   rfence = to_radeon_fence(fence);
-   if (!rfence || rfence->rdev != p->rdev) {
-   r = fence_wait(fence, true);
-   if (r)
-   return r;
-   continue;
-   }
-
-   radeon_semaphore_sync_to(p->ib.semaphore, rfence);
+   ret = radeon_semaphore_sync_obj(p->rdev, p->ib.semaphore,
+   p->relocs[i].robj->tbo.resv);
}
-   return 0;
+   return ret;
 }

 /* XXX: note that this is called from the legacy UMS CS ioctl as well */
@@ -442,6 +426,10 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser 
*parser, int error, bo
 */
list_sort(NULL, &parser->validated, cmp_size_smaller_first);

+   /* must be called with all reservation_objects still held */
+   radeon_semaphore_free(parser->rdev, &parser->ib.semaphore,
+ parser->ib.fence);
+
ttm_eu_fence_buffer_objects(&parser->ticket,
&parser->validated,
&parser->ib.fence->base);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c 
b/drivers/gpu/drm/radeon/radeon_fence.c
index 0262fe2580d2..7687a7f8f41b 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -142,6 +142,8 @@ int radeon_fence_emit(struct radeon_device *rdev,
(*fence)->ring = ring;
fence_init(&(*fence)->base,

[PATCH 5/7] drm/nouveau: export reservation_object from dmabuf to ttm

2014-09-04 Thread Maarten Lankhorst

Adds an extra argument to nouveau_bo_new, which is used in nouveau_prime.c.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/nouveau/dispnv04/crtc.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c| 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo.h| 1 +
 drivers/gpu/drm/nouveau/nouveau_chan.c  | 2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c | 6 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c | 5 -
 drivers/gpu/drm/nouveau/nv17_fence.c| 2 +-
 drivers/gpu/drm/nouveau/nv50_display.c  | 6 +++---
 drivers/gpu/drm/nouveau/nv50_fence.c| 2 +-
 drivers/gpu/drm/nouveau/nv84_fence.c| 4 ++--
 11 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index b90aa5c1f90a..fca6a1f9c20c 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1127,7 +1127,7 @@ nv04_crtc_create(struct drm_device *dev, int crtc_num)
drm_mode_crtc_set_gamma_size(&nv_crtc->base, 256);

ret = nouveau_bo_new(dev, 64*64*4, 0x100, TTM_PL_FLAG_VRAM,
-0, 0x, NULL, &nv_crtc->cursor.nvbo);
+0, 0x, NULL, NULL, &nv_crtc->cursor.nvbo);
if (!ret) {
ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM);
if (!ret) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index bda32276bcc2..f89b4a7c93fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -181,7 +181,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
 int
 nouveau_bo_new(struct drm_device *dev, int size, int align,
   uint32_t flags, uint32_t tile_mode, uint32_t tile_flags,
-  struct sg_table *sg,
+  struct sg_table *sg, struct reservation_object *robj,
   struct nouveau_bo **pnvbo)
 {
struct nouveau_drm *drm = nouveau_drm(dev);
@@ -230,7 +230,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
ret = ttm_bo_init(&drm->ttm.bdev, &nvbo->bo, size,
  type, &nvbo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size, sg,
- NULL, nouveau_bo_del_ttm);
+ robj, nouveau_bo_del_ttm);
if (ret) {
/* ttm will call nouveau_bo_del_ttm if it fails.. */
return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h 
b/drivers/gpu/drm/nouveau/nouveau_bo.h
index ae95b2d43b36..d20c0b5c4e31 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -68,6 +68,7 @@ extern struct ttm_bo_driver nouveau_bo_driver;
 void nouveau_bo_move_init(struct nouveau_drm *);
 int  nouveau_bo_new(struct drm_device *, int size, int align, u32 flags,
u32 tile_mode, u32 tile_flags, struct sg_table *sg,
+   struct reservation_object *robj,
struct nouveau_bo **);
 int  nouveau_bo_pin(struct nouveau_bo *, u32 flags);
 int  nouveau_bo_unpin(struct nouveau_bo *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 99cd9e4a2aa6..d639750379d6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -106,7 +106,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct 
nvif_device *device,
if (nouveau_vram_pushbuf)
target = TTM_PL_FLAG_VRAM;

-   ret = nouveau_bo_new(drm->dev, size, 0, target, 0, 0, NULL,
+   ret = nouveau_bo_new(drm->dev, size, 0, target, 0, 0, NULL, NULL,
&chan->push.buffer);
if (ret == 0) {
ret = nouveau_bo_pin(chan->push.buffer, target);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c 
b/drivers/gpu/drm/nouveau/nouveau_fence.c
index decfe6c4ac07..574517a396fd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -195,8 +195,12 @@ nouveau_fence_work(struct fence *fence,

work = kmalloc(sizeof(*work), GFP_KERNEL);
if (!work) {
+   /*
+* this might not be a nouveau fence any more,
+* so force a lazy wait here
+*/
WARN_ON(nouveau_fence_wait((struct nouveau_fence *)fence,
-  false, false));
+  true, false));
goto err;
}

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index b7dbd16904e0..1bc4eb33b60f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -165,7 +165,7 @@ nouveau_gem_new(struct drm_device *dev, int size, int 
align, uint32_t domain,
flags |= TTM_PL_FLAG_SYSTEM;

ret

[PATCH 4/7] drm/radeon: export reservation_object from dmabuf to ttm

2014-09-04 Thread Maarten Lankhorst

Adds an extra argument to radeon_bo_create, which is used in radeon_prime.c.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/cik.c  | 4 ++--
 drivers/gpu/drm/radeon/evergreen.c| 6 +++---
 drivers/gpu/drm/radeon/r600.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_benchmark.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_device.c| 2 +-
 drivers/gpu/drm/radeon/radeon_gart.c  | 2 +-
 drivers/gpu/drm/radeon/radeon_gem.c   | 2 +-
 drivers/gpu/drm/radeon/radeon_object.c| 8 +---
 drivers/gpu/drm/radeon/radeon_object.h| 1 +
 drivers/gpu/drm/radeon/radeon_prime.c | 5 -
 drivers/gpu/drm/radeon/radeon_ring.c  | 2 +-
 drivers/gpu/drm/radeon/radeon_sa.c| 2 +-
 drivers/gpu/drm/radeon/radeon_test.c  | 5 +++--
 drivers/gpu/drm/radeon/radeon_ttm.c   | 2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c   | 3 ++-
 drivers/gpu/drm/radeon/radeon_vce.c   | 3 ++-
 drivers/gpu/drm/radeon/radeon_vm.c| 5 +++--
 17 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 1f598ab3b9a7..d984de903928 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4689,7 +4689,7 @@ static int cik_mec_init(struct radeon_device *rdev)
r = radeon_bo_create(rdev,
 rdev->mec.num_mec *rdev->mec.num_pipe * 
MEC_HPD_SIZE * 2,
 PAGE_SIZE, true,
-RADEON_GEM_DOMAIN_GTT, 0, NULL,
+RADEON_GEM_DOMAIN_GTT, 0, NULL, NULL,
 &rdev->mec.hpd_eop_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create HDP EOP bo failed\n", 
r);
@@ -4860,7 +4860,7 @@ static int cik_cp_compute_resume(struct radeon_device 
*rdev)
 sizeof(struct bonaire_mqd),
 PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_GTT, 0, NULL,
-&rdev->ring[idx].mqd_obj);
+NULL, &rdev->ring[idx].mqd_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create MQD bo 
failed\n", r);
return r;
diff --git a/drivers/gpu/drm/radeon/evergreen.c 
b/drivers/gpu/drm/radeon/evergreen.c
index dbca60c7d097..c6ccef6c3596 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -4023,7 +4023,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
if (rdev->rlc.save_restore_obj == NULL) {
r = radeon_bo_create(rdev, dws * 4, PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-&rdev->rlc.save_restore_obj);
+NULL, &rdev->rlc.save_restore_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create RLC sr bo 
failed\n", r);
return r;
@@ -4102,7 +4102,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
if (rdev->rlc.clear_state_obj == NULL) {
r = radeon_bo_create(rdev, dws * 4, PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-&rdev->rlc.clear_state_obj);
+NULL, &rdev->rlc.clear_state_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create RLC c bo 
failed\n", r);
sumo_rlc_fini(rdev);
@@ -4179,7 +4179,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
r = radeon_bo_create(rdev, rdev->rlc.cp_table_size,
 PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-&rdev->rlc.cp_table_obj);
+NULL, &rdev->rlc.cp_table_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create RLC cp table 
bo failed\n", r);
sumo_rlc_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index a95ced569d84..94e82c6b03ca 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1430,7 +1430,7 @@ int r600_vram_scratch_init(struct radeon_device *rdev)
if (rdev->vram_scratch.robj == NULL) {
r = radeon_bo_create(rdev, RADEON_GPU_PAGE_SIZE,
 PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM,
-0, NULL, &rdev->vram_scratch.robj);
+

[PATCH 3/7] drm/radeon: cope with foreign fences inside the reservation object

2014-09-04 Thread Maarten Lankhorst

Not the whole world is a radeon! :-)
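
The check in the diff below boils down to: treat a fence as native only if
it was created by this driver and belongs to this device, and otherwise
fall back to a plain CPU wait. A minimal standalone sketch of that decision,
using hypothetical types and names (demo_fence, to_demo_native,
demo_pick_sync) and assuming the downcast is validated by comparing an ops
pointer:

```c
#include <stddef.h>

struct demo_fence_ops { const char *name; };

/* Generic fence, as any driver might hand it over. */
struct demo_fence { const struct demo_fence_ops *ops; };

struct demo_device { int id; };

/* Driver-specific fence; base must stay first so the cast below is valid. */
struct demo_native_fence {
	struct demo_fence base;
	struct demo_device *dev;
};

static const struct demo_fence_ops demo_native_ops = { "native" };

/* Returns NULL when the fence was not created by this driver. */
static struct demo_native_fence *to_demo_native(struct demo_fence *f)
{
	if (f->ops != &demo_native_ops)
		return NULL;
	return (struct demo_native_fence *)f;
}

enum demo_sync { DEMO_SYNC_NONE, DEMO_SYNC_CPU_WAIT, DEMO_SYNC_HW };

static enum demo_sync demo_pick_sync(struct demo_fence *f,
				     struct demo_device *dev)
{
	struct demo_native_fence *nf;

	if (!f)
		return DEMO_SYNC_NONE;		/* nothing to wait for */
	nf = to_demo_native(f);
	if (!nf || nf->dev != dev)
		return DEMO_SYNC_CPU_WAIT;	/* foreign driver or device */
	return DEMO_SYNC_HW;			/* same device: hardware sync */
}
```

Note that a fence from a second device of the same driver also counts as
foreign here, which mirrors the rdev comparison in the patch.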

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/radeon/radeon.h | 11 -
 drivers/gpu/drm/radeon/radeon_cs.c  | 32 +
 drivers/gpu/drm/radeon/radeon_display.c | 41 -
 drivers/gpu/drm/radeon/radeon_fence.c   |  3 +++
 drivers/gpu/drm/radeon/radeon_mode.h|  1 +
 5 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index d80dc547a105..dddb2b7dd752 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -699,17 +699,6 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 
doorbell);
  * IRQS.
  */

-struct radeon_flip_work {
-   struct work_struct  flip_work;
-   struct work_struct  unpin_work;
-   struct radeon_device *rdev;
-   int crtc_id;
-   uint64_t base;
-   struct drm_pending_vblank_event *event;
-   struct radeon_bo *old_rbo;
-   struct radeon_fence *fence;
-};
-
 struct r500_irq_stat_regs {
u32 disp_int;
u32 hdmi0_status;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 6e3d1c8f3483..8ad4e2cfae15 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -248,23 +248,34 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, 
u32 ring, s32 priority
return 0;
 }

-static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
+static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 {
int i;

for (i = 0; i < p->nrelocs; i++) {
struct reservation_object *resv;
struct fence *fence;
+   struct radeon_fence *rfence;
+   int r;

if (!p->relocs[i].robj)
continue;

resv = p->relocs[i].robj->tbo.resv;
fence = reservation_object_get_excl(resv);
+   if (!fence)
+   continue;
+   rfence = to_radeon_fence(fence);
+   if (!rfence || rfence->rdev != p->rdev) {
+   r = fence_wait(fence, true);
+   if (r)
+   return r;
+   continue;
+   }

-   radeon_semaphore_sync_to(p->ib.semaphore,
-(struct radeon_fence *)fence);
+   radeon_semaphore_sync_to(p->ib.semaphore, rfence);
}
+   return 0;
 }

 /* XXX: note that this is called from the legacy UMS CS ioctl as well */
@@ -474,13 +485,19 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
return r;
}

+   r = radeon_cs_sync_rings(parser);
+   if (r) {
+   if (r != -ERESTARTSYS)
+   DRM_ERROR("Failed to sync rings: %i\n", r);
+   return r;
+   }
+
if (parser->ring == R600_RING_TYPE_UVD_INDEX)
radeon_uvd_note_usage(rdev);
else if ((parser->ring == TN_RING_TYPE_VCE1_INDEX) ||
 (parser->ring == TN_RING_TYPE_VCE2_INDEX))
radeon_vce_note_usage(rdev);

-   radeon_cs_sync_rings(parser);
r = radeon_ib_schedule(rdev, &parser->ib, NULL, true);
if (r) {
DRM_ERROR("Failed to schedule IB !\n");
@@ -567,7 +584,12 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device 
*rdev,
if (r) {
goto out;
}
-   radeon_cs_sync_rings(parser);
+   r = radeon_cs_sync_rings(parser);
+   if (r) {
+   if (r != -ERESTARTSYS)
+   DRM_ERROR("Failed to sync rings: %i\n", r);
+   goto out;
+   }
radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence);

if ((rdev->family >= CHIP_TAHITI) &&
diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
b/drivers/gpu/drm/radeon/radeon_display.c
index bc894c17b2f9..715b2d95346c 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -36,6 +36,17 @@

 #include 

+struct radeon_flip_work {
+   struct work_struct  flip_work;
+   struct work_struct  unpin_work;
+   struct radeon_device *rdev;
+   int crtc_id;
+   uint64_t base;
+   struct drm_pending_vblank_event *event;
+   struct radeon_bo *old_rbo;
+   struct fence *fence;
+};
+
 static void avivo_crtc_load_lut(struct drm_crtc *crtc)
 {
struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
@@ -402,14 +413,21 @@ static void radeon_flip_work_func(struct work_struct 
*__work)

 down_read(&rdev->exclusive_lock);
if (work->fence) {
-   r = radeon_fence_wai

[PATCH 2/7] drm/ttm: add reservation_object as argument to ttm_bo_init

2014-09-04 Thread Maarten Lankhorst

This allows importing reservation objects from dma-bufs.
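
A minimal sketch of what an optional reservation object argument could look
like, assuming the buffer object falls back to a private one when the caller
passes NULL. The types and names (demo_resv, demo_bo, demo_bo_init) are
hypothetical and much simpler than the TTM ones.

```c
#include <stddef.h>

struct demo_resv { int users; };

struct demo_bo {
	struct demo_resv *resv;		/* the reservation object in use */
	struct demo_resv own_resv;	/* private fallback */
};

/* shared may be NULL, which is what every non-importing caller passes. */
static void demo_bo_init(struct demo_bo *bo, struct demo_resv *shared)
{
	bo->own_resv.users = 0;
	bo->resv = shared ? shared : &bo->own_resv;
	bo->resv->users++;
}

/* Two objects synchronize against each other only if this holds. */
static int demo_bo_shares_resv(const struct demo_bo *a,
			       const struct demo_bo *b)
{
	return a->resv == b->resv;
}
```

This is why most call sites in the diff simply gain one extra NULL: only an
importer has a foreign reservation object to hand in.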

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/ast/ast_ttm.c|  2 +-
 drivers/gpu/drm/bochs/bochs_mm.c |  2 +-
 drivers/gpu/drm/cirrus/cirrus_ttm.c  |  2 +-
 drivers/gpu/drm/mgag200/mgag200_ttm.c|  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c |  2 +-
 drivers/gpu/drm/radeon/radeon_object.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c | 24 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  2 +-
 include/drm/ttm/ttm_bo_api.h |  2 ++
 10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 8008ea0bc76c..58c19cfe6af0 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -339,7 +339,7 @@ int ast_bo_create(struct drm_device *dev, int size, int 
align,
ret = ttm_bo_init(&ast->ttm.bdev, &astbo->bo, size,
  ttm_bo_type_device, &astbo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size,
- NULL, ast_bo_ttm_destroy);
+ NULL, NULL, ast_bo_ttm_destroy);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index 2af30e7607d7..6c50a7a44864 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -377,7 +377,7 @@ static int bochs_bo_create(struct drm_device *dev, int 
size, int align,
ret = ttm_bo_init(&bochs->ttm.bdev, &bochsbo->bo, size,
  ttm_bo_type_device, &bochsbo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size,
- NULL, bochs_bo_ttm_destroy);
+ NULL, NULL, bochs_bo_ttm_destroy);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 3e7d758330a9..b3b3d16d1279 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -343,7 +343,7 @@ int cirrus_bo_create(struct drm_device *dev, int size, int 
align,
ret = ttm_bo_init(&cirrus->ttm.bdev, &cirrusbo->bo, size,
  ttm_bo_type_device, &cirrusbo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size,
- NULL, cirrus_bo_ttm_destroy);
+ NULL, NULL, cirrus_bo_ttm_destroy);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c 
b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index be883ef5a1d3..398b6fb161a6 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -339,7 +339,7 @@ int mgag200_bo_create(struct drm_device *dev, int size, int 
align,
ret = ttm_bo_init(&mdev->ttm.bdev, &mgabo->bo, size,
  ttm_bo_type_device, &mgabo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size,
- NULL, mgag200_bo_ttm_destroy);
+ NULL, NULL, mgag200_bo_ttm_destroy);
if (ret)
return ret;

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index eea74b127b03..bda32276bcc2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -230,7 +230,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
ret = ttm_bo_init(&drm->ttm.bdev, &nvbo->bo, size,
  type, &nvbo->placement,
  align >> PAGE_SHIFT, false, NULL, acc_size, sg,
- nouveau_bo_del_ttm);
+ NULL, nouveau_bo_del_ttm);
if (ret) {
/* ttm will call nouveau_bo_del_ttm if it fails.. */
return ret;
diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index 69c104c3240f..cdeaf08fdc74 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -110,7 +110,7 @@ int qxl_bo_create(struct qxl_device *qdev,

r = ttm_bo_init(&qdev->mman.bdev, &bo->tbo, size, type,
&bo->placement, 0, !kernel, NULL, size,
-   NULL, &qxl_ttm_bo_destroy);
+   NULL, NULL, &qxl_ttm_bo_destroy);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
dev_err(qdev->dev,
diff --git a/drivers/gpu/drm/radeon/radeon_object.c 
b/drivers/gpu/drm/radeon/radeon_object.c
index aadbd36e64b9..61f3f16bbcbc 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -209,7 +209,7 @@ int radeon_bo_create(struct radeon_device *rdev,
down_read(&rdev->pm.mclk_lock);
r = ttm_bo_init(&rdev->mman.bdev, &bo->tbo, size, type,

[PATCH 1/7] drm: Pass dma-buf as argument to gem_prime_import_sg_table

2014-09-04 Thread Maarten Lankhorst

Allows importing reservation_objects from a dma-buf.
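
The idea can be sketched as follows: once the import callback receives the
attachment instead of a bare size, it can reach both the size and the
reservation object through the dma-buf. The structures below are hypothetical
reductions (demo_dmabuf, demo_attachment, demo_gem), not the kernel
definitions.

```c
#include <stddef.h>

struct demo_resv { int dummy; };

struct demo_dmabuf {
	size_t size;
	struct demo_resv *resv;
};

struct demo_attachment { struct demo_dmabuf *dmabuf; };

struct demo_gem {
	size_t size;
	struct demo_resv *resv;
};

/* Before: the callback only got a size_t. Now it derives everything. */
static void demo_import(struct demo_gem *obj,
			const struct demo_attachment *attach)
{
	obj->size = attach->dmabuf->size;
	obj->resv = attach->dmabuf->resv;	/* not reachable from a size */
}
```

Drivers that only need the size, as in the msm and CMA hunks below, just read
it through the attachment.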

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/drm_gem_cma_helper.c| 5 +++--
 drivers/gpu/drm/drm_prime.c | 2 +-
 drivers/gpu/drm/msm/msm_drv.h   | 2 +-
 drivers/gpu/drm/msm/msm_gem_prime.c | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_gem.h   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c | 5 +++--
 drivers/gpu/drm/qxl/qxl_drv.h   | 2 +-
 drivers/gpu/drm/qxl/qxl_prime.c | 2 +-
 drivers/gpu/drm/radeon/radeon_drv.c | 2 +-
 drivers/gpu/drm/radeon/radeon_prime.c   | 5 +++--
 include/drm/drmP.h  | 3 ++-
 include/drm/drm_gem_cma_helper.h| 3 ++-
 12 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c 
b/drivers/gpu/drm/drm_gem_cma_helper.c
index e467e67af6e7..08646af2ddc2 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -316,7 +316,8 @@ out:
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_get_sg_table);

 struct drm_gem_object *
-drm_gem_cma_prime_import_sg_table(struct drm_device *dev, size_t size,
+drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
+ struct dma_buf_attachment *attach,
  struct sg_table *sgt)
 {
struct drm_gem_cma_object *cma_obj;
@@ -325,7 +326,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev, 
size_t size,
return ERR_PTR(-EINVAL);

/* Create a CMA GEM buffer. */
-   cma_obj = __drm_gem_cma_create(dev, size);
+   cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
if (IS_ERR(cma_obj))
return ERR_CAST(cma_obj);

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 99d578bad17e..dc4711f30382 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -522,7 +522,7 @@ struct drm_gem_object *drm_gem_prime_import(struct 
drm_device *dev,
goto fail_detach;
}

-   obj = dev->driver->gem_prime_import_sg_table(dev, dma_buf->size, sgt);
+   obj = dev->driver->gem_prime_import_sg_table(dev, attach, sgt);
if (IS_ERR(obj)) {
ret = PTR_ERR(obj);
goto fail_unmap;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 8a2c5fd0893e..a0dc2592ffc1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -170,7 +170,7 @@ struct sg_table *msm_gem_prime_get_sg_table(struct 
drm_gem_object *obj);
 void *msm_gem_prime_vmap(struct drm_gem_object *obj);
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-   size_t size, struct sg_table *sg);
+   struct dma_buf_attachment *attach, struct sg_table *sg);
 int msm_gem_prime_pin(struct drm_gem_object *obj);
 void msm_gem_prime_unpin(struct drm_gem_object *obj);
 void *msm_gem_vaddr_locked(struct drm_gem_object *obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c 
b/drivers/gpu/drm/msm/msm_gem_prime.c
index d48f9fc5129b..b75f9940ee9e 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -37,9 +37,9 @@ void msm_gem_prime_vunmap(struct drm_gem_object *obj, void 
*vaddr)
 }

 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-   size_t size, struct sg_table *sg)
+   struct dma_buf_attachment *attach, struct sg_table *sg)
 {
-   return msm_gem_import(dev, size, sg);
+   return msm_gem_import(dev, attach->dmabuf->size, sg);
 }

 int msm_gem_prime_pin(struct drm_gem_object *obj)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.h 
b/drivers/gpu/drm/nouveau/nouveau_gem.h
index ddab762d81fe..e4049faca780 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.h
@@ -39,7 +39,7 @@ struct reservation_object *nouveau_gem_prime_res_obj(struct 
drm_gem_object *);
 extern void nouveau_gem_prime_unpin(struct drm_gem_object *);
 extern struct sg_table *nouveau_gem_prime_get_sg_table(struct drm_gem_object 
*);
 extern struct drm_gem_object *nouveau_gem_prime_import_sg_table(
-   struct drm_device *, size_t size, struct sg_table *);
+   struct drm_device *, struct dma_buf_attachment *, struct sg_table *);
 extern void *nouveau_gem_prime_vmap(struct drm_gem_object *);
 extern void nouveau_gem_prime_vunmap(struct drm_gem_object *, void *);

diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c 
b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 1f51008e4d26..2215cdba587d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -23,6 +23,7 @@
  */

 #include 
+#include 

 #include "nouveau_drm.h"
 #include "nouveau_gem.h"
@@ -56,7 +57,7 @@ void nouveau_gem_prime_vunmap(struct drm_gem_object *obj, 
void *vaddr)
 }

 struct drm_gem_object *nouveau_gem_prime_import_sg_t

[PATCH 0/7] cross-dev synchronization in TTM through dma-buf.

2014-09-04 Thread Maarten Lankhorst

So this is finally it. After all the work on writing fence support, cross-dev
synchronization is now possible. :-)

The last 2 patches of this series are not needed for cross-dev to work, but
without them any waits on cross-device fences will be done synchronously.
I've previously tested this with i915, but the patches for i915 fail to apply
again because of the execlist changes, so I haven't tried with the latest
drm-next changes.

I would like to have the first 2 patches applied on drm-next, and the 
radeon/nouveau specific patches when they go through their review.

[Bug 75112] Meta Bug for HyperZ issues on r600g and radeonsi

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=75112

smoki  changed:

   What|Removed |Added

 Depends on||83418

-- 
You are receiving this mail because:
You are the assignee for the bug.

[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83418

smoki  changed:

   What|Removed |Added

 Blocks||75112


[Bug 44126] [r300g] 0ad: carpet textures "flash" and get hidden by ground texture.

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=44126

--- Comment #4 from Marek Olšák ---
Flickering when 2 primitives exactly intersect each other and vertex positions
are not equal is a quite common programming mistake in games. Drivers cannot do
anything about it.


[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83418

--- Comment #7 from José Suárez ---
As stated by smoki, reverting that commit indeed fixes the problem.


TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:

> So in the meantime the attached patch should work, it just silently ignores
> the caching attribute request on non x86 instead of pretending that things
> are set up as expected and then later the radeon or nouveau hw unsetting
> the snoop bit.
> 
> It's not tested but i think it should work.

I'm still getting placements with !CACHED going from bo_memcpy in
ttm_io_prot() though ... I'm looking at filtering the placement
attributes instead.

Ben.


TTM placement & caching issue/questions

2014-09-04 Thread Thomas Hellstrom

On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
>> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
>>> Last time I tested, (and it seems like Michel is on the same track),
>>> writing with the CPU to write-combined memory was substantially faster
>>> than writing to cached memory, with the additional side-effect that CPU
>>> caches are left unpolluted.
>>>
>>> Moreover (although only tested on Intel's embedded chipsets), texturing
>>> from cpu-cache-coherent PCI memory was a real GPU performance hog
>>> compared to texturing from non-snooped memory. Hence, whenever a buffer
>>> could be classified as GPU-read-only (or almost at least), it should be
>>> placed in write-combined memory.
>> Just a quick comment since this explicitly refers to intel chips: On
>> desktop/laptop chips with the big shared l3/l4 caches it's the other way
>> round. Cached uploads are substantially faster than wc and not using
>> coherent access is a severe perf hit for texturing. I guess the hw guys
>> worked really hard to hide the snooping costs so that the gpu can benefit
>> from the massive bandwidth these caches can provide.
> This is similar to modern POWER chips as well. We have pretty big L3's
> (though not technically shared they are in a separate quadrant and we
> have a shared L4 in the memory buffer) and our fabric is generally
> optimized for cachable/coherent access performance. In fact, we only
> have so many credits for NC accesses on the bus...
>

Thanks both of you for the update. I haven't dealt with real hardware
for a while..

/Thomas

TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

On Wed, 2014-09-03 at 21:55 -0400, Jerome Glisse wrote:
> So i think we need to get a platform flag and/or set_pages_array_wc|uc
> needs to fail and this would fallback to cached mapping if the fallback
> code still works. So if your arch properly returns an error for those
> cache changing functions then you should be fine.
> 
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

Can't I just filter the mem_type definitions in the mem_type_manager
with something along the lines of this totally untested patch ?

Or do I *also* need to make those set_page_array_* things fail ?

--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1308,6 +1308,24 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned 
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);

+static void ttm_bo_filter_mem_type(struct ttm_bo_device *bdev, unsigned type,
+  struct ttm_mem_type_manager *man)
+{
+   /*
+* On some architectures/platforms, we cannot allow non-cachable
+* mappings of system memory. This can be a problem with AGP on
+* old G5 systems vs. TTM_PL_TT but we don't really have a choice
+* at this point on ppc64 at least and the AGP on these never
+* worked reliably anyway.
+*/
+#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
+   if (type == TTM_PL_SYSTEM || type == TTM_PL_TT) {
+   man->available_caching &= TTM_PL_FLAG_CACHED;
+   man->default_caching &= man->available_caching;
+   }
+#endif
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
unsigned long p_size)
 {
@@ -1327,6 +1345,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned ty
return ret;
man->bdev = bdev;

+   ttm_bo_filter_mem_type(bdev, type, man);
+
ret = 0;
if (type != TTM_PL_SYSTEM) {
ret = (*man->func->init)(man, p_size);
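
For comparison, the alternative Jerome describes in the quoted text (the arch hook refuses the attribute change and TTM falls back to a cached mapping) could look roughly like the userspace model below. All names here are hypothetical and only illustrate the control flow; this is not the TTM implementation.

```c
/* Hypothetical caching modes, standing in for the TTM placement flags. */
enum caching_model { CACHING_CACHED, CACHING_WC, CACHING_UNCACHED };

/*
 * Hypothetical arch hook: returns 0 on success, negative when the
 * architecture cannot change the attributes of linear-mapped pages.
 */
typedef int (*set_caching_fn)(enum caching_model requested);

/* Example hook for an arch that cannot break the linear mapping. */
static int arch_refuses(enum caching_model requested)
{
    (void)requested;
    return -1;
}

/* Example hook for an arch that supports the change (x86-like). */
static int arch_accepts(enum caching_model requested)
{
    (void)requested;
    return 0;
}

/*
 * Try the requested mode; if the arch hook fails, fall back to a
 * cached mapping, which is assumed to always work.
 */
static enum caching_model set_caching_with_fallback(set_caching_fn arch_set,
                                                    enum caching_model requested)
{
    if (requested == CACHING_CACHED)
        return CACHING_CACHED;      /* nothing to change */
    if (arch_set(requested) == 0)
        return requested;           /* arch honoured the request */
    return CACHING_CACHED;          /* arch refused: stay cached */
}
```

The filtering patch above avoids ever issuing the request, whereas this model issues it and recovers from the failure.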

[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #10 from smoki  ---
OK, I will bisect this; I now have a pretty clear case here, something like a
3x performance drop in OpenJK :)


[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process

2014-09-04 Thread Daniel Vetter

On Thu, Sep 4, 2014 at 9:03 AM, Gupta, Sourab  wrote:
> On Wed, 2014-09-03 at 13:09 +, Daniel Vetter wrote:
>> On Wed, Sep 03, 2014 at 11:49:52AM +, Gupta, Sourab wrote:
>> > On Wed, 2014-09-03 at 10:58 +, Daniel Vetter wrote:
>> > > On Wed, Sep 03, 2014 at 03:39:55PM +0530, sourab.gupta at intel.com 
>> > > wrote:
>> > > > From: Sourab Gupta 
>> > > >
>> > > > Currently the Graphics Driver provides an interface through which
>> > > > one can get a snapshot of the overall Graphics memory consumption.
>> > > > Also there is an interface available, which provides information
>> > > > about the several memory related attributes of every single Graphics
>> > > > buffer created by the various clients.
>> > > >
>> > > > There is a requirement of a new interface for achieving below
>> > > > functionalities:
>> > > > 1) Need to provide Client based detailed information about the
>> > > > distribution of Graphics memory
>> > > > 2) Need to provide an interface which can provide info about the
>> > > > sharing of Graphics buffers between the clients.
>> > > >
>> > > > The client based interface would also aid in debugging of
>> > > > memory usage/consumption by each client & debug memleak related issues.
>> > > >
>> > > > With this new interface,
>> > > > 1) In case of memleak scenarios, we can easily zero in on the culprit
>> > > > client which is unexpectedly holding on the Graphics buffers for an
>> > > > inordinate amount of time.
>> > > > 2) We can get an estimate of the instantaneous memory footprint of
>> > > > every Graphics client.
>> > > > 3) We can now trace all the processes sharing a particular Graphics 
>> > > > buffer.
>> > > >
>> > > > By means of this patch we try to provide a sysfs interface to achieve
>> > > > the mentioned functionalities.
>> > > >
>> > > > There are two files created in sysfs:
>> > > > 'i915_gem_meminfo' will provide summary of the graphics resources used 
>> > > > by
>> > > > each graphics client.
>> > > > 'i915_gem_objinfo' will provide detailed view of each object created by
>> > > > individual clients.
>> > > >
>> > > > v2: Changes made for
>> > > > - adding support to report user virtual addresses of mapped buffers
>> > > > - replacing pid based reporting with tgid based one
>> > > > - checkpatch and other misc cleanup
>> > > >
>> > > > Signed-off-by: Sourab Gupta 
>> > > > Signed-off-by: Akash Goel 
>> > >
>> > > Sorry I didn't spot this the first time around, but I think sysfs is the
>> > > wrong place for this.
>> > >
>> > > Generally sysfs is for setting/reading per-object values, and it has the
>> > > big rule that there should be only _one_ value per file. The error state
>> > > is a bit an exception, but otoh it's also just the full dump as a binary
>> > > file (which for historical reasons is printed as ascii).
>> > >
>> > > The other issue is that imo this should be a generic interface, so that 
>> > > we
>> > > can write a gpu_top tool for dumping memory consumers which works on all
>> > > linux platforms.
>> > >
>> > > To avoid delaying for a long time can we just move ahead by putting this
>> > > into debugfs?
>> > >
>> > > Also in debugfs there's already a lot of this stuff around - why is that
>> > > not sufficient and could we extend it somehow with the missing bits?
>> > >
>> > > Thanks, Daniel
>> >
>> > Hi Daniel,
>> >
>> > Thanks for your inputs.
>> > We had originally put the patch in sysfs, as there was a requirement for
>> > this feature to be available in production kernels also.
>> > We can move it to debugfs to move ahead with this. I'll submit the
>> > debugfs version of this patch next time.
>>
>> Yeah sysfs is the only place where we have a stable api, but that also
>> implies that requirements are a _lot_ more stringent. At least we need
>> testcases to make sure the interface actually do what we want them to do,
>> and to make sure we don't break the interface by accident.
>>
>> > Also,
>> > we developed this new interface to overcome the deficiencies of existing
>> > interface. With this new interface, we can provide client based detailed
>> > information about the distribution of Graphics memory. This gives
>> > information about the various states of the graphics objects opened per
>> > process (summarized as well as detailed info)
>> > It also gives information about Graphics buffers shared between the
>> > clients, and gives user mapped virtual address of all the mapped
>> > graphics buffers.
>> > It was not feasible to fit all this info in the existing interface. So
>> > we decided to go ahead with new interface for these functionality.
>>
>> Well the problem is that adding more files like that increases the
>> maintenance burden. So if there's some way to compute the information you
>> want from information already provided in debugfs, then I prefer we do
>> that at first.
>> -Daniel
>
> Hi Daniel,
>
> We went through the existing debugfs interfaces, but we couldn't derive
> the information we need from these inter

[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process

2014-09-04 Thread Gupta, Sourab

On Thu, 2014-09-04 at 10:01 +, Daniel Vetter wrote:
> On Thu, Sep 4, 2014 at 9:03 AM, Gupta, Sourab  
> wrote:
> > On Wed, 2014-09-03 at 13:09 +, Daniel Vetter wrote:
> >> On Wed, Sep 03, 2014 at 11:49:52AM +, Gupta, Sourab wrote:
> >> > On Wed, 2014-09-03 at 10:58 +, Daniel Vetter wrote:
> >> > > On Wed, Sep 03, 2014 at 03:39:55PM +0530, sourab.gupta at intel.com 
> >> > > wrote:
> >> > > > From: Sourab Gupta 
> >> > > >
> >> > > > Currently the Graphics Driver provides an interface through which
> >> > > > one can get a snapshot of the overall Graphics memory consumption.
> >> > > > Also there is an interface available, which provides information
> >> > > > about the several memory related attributes of every single Graphics
> >> > > > buffer created by the various clients.
> >> > > >
> >> > > > There is a requirement of a new interface for achieving below
> >> > > > functionalities:
> >> > > > 1) Need to provide Client based detailed information about the
> >> > > > distribution of Graphics memory
> >> > > > 2) Need to provide an interface which can provide info about the
> >> > > > sharing of Graphics buffers between the clients.
> >> > > >
> >> > > > The client based interface would also aid in debugging of
> >> > > > memory usage/consumption by each client & debug memleak related 
> >> > > > issues.
> >> > > >
> >> > > > With this new interface,
> >> > > > 1) In case of memleak scenarios, we can easily zero in on the culprit
> >> > > > client which is unexpectedly holding on the Graphics buffers for an
> >> > > > inordinate amount of time.
> >> > > > 2) We can get an estimate of the instantaneous memory footprint of
> >> > > > every Graphics client.
> >> > > > 3) We can now trace all the processes sharing a particular Graphics 
> >> > > > buffer.
> >> > > >
> >> > > > By means of this patch we try to provide a sysfs interface to achieve
> >> > > > the mentioned functionalities.
> >> > > >
> >> > > > There are two files created in sysfs:
> >> > > > 'i915_gem_meminfo' will provide summary of the graphics resources 
> >> > > > used by
> >> > > > each graphics client.
> >> > > > 'i915_gem_objinfo' will provide detailed view of each object created 
> >> > > > by
> >> > > > individual clients.
> >> > > >
> >> > > > v2: Changes made for
> >> > > > - adding support to report user virtual addresses of mapped 
> >> > > > buffers
> >> > > > - replacing pid based reporting with tgid based one
> >> > > > - checkpatch and other misc cleanup
> >> > > >
> >> > > > Signed-off-by: Sourab Gupta 
> >> > > > Signed-off-by: Akash Goel 
> >> > >
> >> > > Sorry I didn't spot this the first time around, but I think sysfs is 
> >> > > the
> >> > > wrong place for this.
> >> > >
> >> > > Generally sysfs is for setting/reading per-object values, and it has 
> >> > > the
> >> > > big rule that there should be only _one_ value per file. The error 
> >> > > state
> >> > > is a bit an exception, but otoh it's also just the full dump as a 
> >> > > binary
> >> > > file (which for historical reasons is printed as ascii).
> >> > >
> >> > > The other issue is that imo this should be a generic interface, so 
> >> > > that we
> >> > > can write a gpu_top tool for dumping memory consumers which works on 
> >> > > all
> >> > > linux platforms.
> >> > >
> >> > > To avoid delaying for a long time can we just move ahead by putting 
> >> > > this
> >> > > into debugfs?
> >> > >
> >> > > Also in debugfs there's already a lot of this stuff around - why is 
> >> > > that
> >> > > not sufficient and could we extend it somehow with the missing bits?
> >> > >
> >> > > Thanks, Daniel
> >> >
> >> > Hi Daniel,
> >> >
> >> > Thanks for your inputs.
> >> > We had originally put the patch in sysfs, as there was a requirement for
> >> > this feature to be available in production kernels also.
> >> > We can move it to debugfs to move ahead with this. I'll submit the
> >> > debugfs version of this patch next time.
> >>
> >> Yeah sysfs is the only place where we have a stable api, but that also
> >> implies that requirements are a _lot_ more stringent. At least we need
> >> testcases to make sure the interface actually do what we want them to do,
> >> and to make sure we don't break the interface by accident.
> >>
> >> > Also,
> >> > we developed this new interface to overcome the deficiencies of existing
> >> > interface. With this new interface, we can provide client based detailed
> >> > information about the distribution of Graphics memory. This gives
> >> > information about the various states of the graphics objects opened per
> >> > process (summarized as well as detailed info)
> >> > It also gives information about Graphics buffers shared between the
> >> > clients, and gives user mapped virtual address of all the mapped
> >> > graphics buffers.
> >> > It was not feasible to fit all this info in the existing interface. So
> >> > we decided to go ahead with new interface for these functionality.
> >>
> >> Well the problem is

TTM placement & caching issue/questions

2014-09-04 Thread Daniel Vetter

On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> Last time I tested, (and it seems like Michel is on the same track),
> writing with the CPU to write-combined memory was substantially faster
> than writing to cached memory, with the additional side-effect that CPU
> caches are left unpolluted.
> 
> Moreover (although only tested on Intel's embedded chipsets), texturing
> from cpu-cache-coherent PCI memory was a real GPU performance hog
> compared to texturing from non-snooped memory. Hence, whenever a buffer
> could be classified as GPU-read-only (or almost at least), it should be
> placed in write-combined memory.

Just a quick comment since this explicitly refers to intel chips: On
desktop/laptop chips with the big shared l3/l4 caches it's the other way
round. Cached uploads are substantially faster than wc and not using
coherent access is a severe perf hit for texturing. I guess the hw guys
worked really hard to hide the snooping costs so that the gpu can benefit
from the massive bandwidth these caches can provide.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

TTM placement & caching issue/questions

2014-09-04 Thread Thomas Hellstrom

On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote:
>
>>> This will, from what I can tell, try to use the same caching mode as the
>>> original object:
>>>
>>> if ((cur_placement & caching) != 0)
>>> result |= (cur_placement & caching);
>>>
>>> And cur_placement comes from bo->mem.placement which as far as I can
>>> tell is based on the placement array which the drivers set up.
>> This originates from the fact that when evicting GTT memory, on x86 it's
>> unnecessary and undesirable to switch caching mode when going to system.
> But that's what I don't quite understand. We have two different mappings
> here. The VRAM and the memory object. We wouldn't be "switching"... we
> are creating a temporary mapping for the memory object in order to do
> the memcpy, but we seem to be doing it by using the caching attributes
> of the VRAM object or am I missing something ? I don't see how that
> makes sense so I suppose I'm missing something here :-)

Well, the intention when TTM was written was that the driver writer
should be smart enough that when he wanted a move from uncached VRAM to
system, he'd request cached system in the placement flags in the first
place.  If TTM somehow overrides such a request, that's a bug in TTM.

If the move, for example, is a result of an eviction, then the driver
evict_flags() function should ideally look at the current placement and
decide about a suitable placement based on that: vram-to-system moves
should generally request cacheable memory if the next access is expected
by the CPU. Probably write-combined otherwise.
If the move is the result of a TTM swapout, TTM will automatically
select cachable system, and for most other moves, I think the driver
writer is in full control.
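
As a rough sketch of the policy described above, the choice of caching for a move to system memory could be modelled as follows. The enum, the function name and the flag values are hypothetical and exist only for illustration; a real driver would express this through its evict_flags() placement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values, not taken from the TTM headers. */
#define PL_FLAG_CACHED (1u << 16)
#define PL_FLAG_WC     (1u << 18)

/* Why the buffer is being moved to system memory (hypothetical). */
enum move_reason { MOVE_EVICTION, MOVE_SWAPOUT };

/*
 * Pick the caching mode to request for the system-memory copy:
 * - swapout always gets cachable system memory,
 * - an eviction gets cachable memory when the CPU is expected to
 *   touch the buffer next, and write-combined memory otherwise.
 */
static uint32_t pick_system_caching(enum move_reason reason, bool cpu_access_next)
{
    if (reason == MOVE_SWAPOUT)
        return PL_FLAG_CACHED;
    return cpu_access_next ? PL_FLAG_CACHED : PL_FLAG_WC;
}
```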

>
>> Last time I tested, (and it seems like Michel is on the same track),
>> writing with the CPU to write-combined memory was substantially faster
>> than writing to cached memory, with the additional side-effect that CPU
>> caches are left unpolluted.
> That's very strange indeed. It's certainly an x86 specific artifact,
> even if we were allowed by our hypervisor to map memory non-cachable
> (the HW somewhat can), we tend to have a higher throughput by going
> cachable, but that could be due to the way the PowerBus works (it's
> basically very biased toward cachable transactions).
>
>> I dislike the approach of rewriting placements. In some cases I think it
>> won't even work, because placements are declared 'static const'
>>
>> What I'd suggest is instead to intercept the driver response from
>> init_mem_type() and filter out undesired caching modes from
>> available_caching and default_caching, 
> This was my original intent but Jerome seems to have different ideas
> (see his proposed patches). I'm happy to revive mine as well and post it
> as an alternative after I've tested it a bit more (tomorrow).
>
>> perhaps also looking at whether
>> the memory type is mappable or not. This should have the additional
>> benefit of working everywhere, and if a caching mode is selected that's
>> not available on the platform, you'll simply get an error. (I guess?)
> You mean that if not mappable we don't bother filtering ?
>
> The rule is really for me pretty simple:
>
>- If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable
>
>- If it's PCIe memory space (VRAM, registers, ...) it MUST be
> non-cachable.

Yes, something along these lines. I guess checking for VRAM or
TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick

/Thomas
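
The rule agreed on in this exchange (system memory must be cachable, fixed memory such as VRAM must be non-cachable) can be sketched as a filter applied to what the driver reports for a memory type. This is a userspace model under stated assumptions: the struct, the function name and the flag values are made up for illustration, and treating write-combined as an acceptable non-cachable mode for fixed memory is also an assumption.

```c
#include <stdint.h>

/* Illustrative flag values, not taken from the TTM headers. */
#define PL_FLAG_CACHED     (1u << 16)
#define PL_FLAG_UNCACHED   (1u << 17)
#define PL_FLAG_WC         (1u << 18)
#define MEMTYPE_FLAG_FIXED (1u << 0)

/* Minimal stand-in for the per-memory-type manager state. */
struct mem_type_model {
    uint32_t flags;             /* MEMTYPE_FLAG_* */
    uint32_t available_caching; /* modes the driver offers */
    uint32_t default_caching;   /* mode used when nothing else matches */
};

/*
 * Mask out caching modes the platform cannot honour.
 * Returns 0 on success, -1 if no usable mode is left.
 */
static int filter_caching(struct mem_type_model *man)
{
    uint32_t allowed = (man->flags & MEMTYPE_FLAG_FIXED)
        ? (PL_FLAG_UNCACHED | PL_FLAG_WC)  /* VRAM, registers, ... */
        : PL_FLAG_CACHED;                  /* system memory */

    man->available_caching &= allowed;
    man->default_caching &= man->available_caching;
    return man->available_caching ? 0 : -1;
}
```

A driver that offers only modes the platform forbids gets an error from the filter in this model, which matches the "you'll simply get an error" expectation in the quoted text.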


[Bug 83436] Sudden framerate drops in multiple games

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #9 from Maciej  ---
Did an update today, performance decrease is still there, but fps drops are
gone. I had no other apps running in the background, so I'm not sure what's up.
However fps drops in TF2 are still a thing.

As for bisecting, I really have no skills to do that, I'm just a gamer with AMD
card :/


[git pull] drm fixes

2014-09-04 Thread Dave Airlie


Hi Linus,

Just i915 and vmwgfx fixes.

i915 contains a bunch of fixes for recent regressions in outputs;
vmwgfx fixes a possible infinite loop and a bad return code.

Dave.

The following changes since commit 59753a805499f1ffbca4ac0a24b3dff67bf1:

  Merge tag 'backlight-fixes-3.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight (2014-08-28 
10:47:10 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 3aacfda0ecd9040521fbfb4a2c53cd6bf77ae4ee:

  Merge tag 'drm-intel-fixes-2014-09-03' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes (2014-09-04 11:20:00 
+1000)



Dave Airlie (4):
  Merge tag 'drm-intel-fixes-2014-08-28' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes
  drm/i915: handle G45/GM45 pulse detection connected state.
  Merge branch 'vmwgfx-fixes-3.17' of 
git://people.freedesktop.org/~thomash/linux into drm-fixes
  Merge tag 'drm-intel-fixes-2014-09-03' of 
git://anongit.freedesktop.org/drm-intel into drm-fixes

Mathias Krause (1):
  drm/i915: Remove bogus __init annotation from DMI callbacks

Paulo Zanoni (1):
  drm/i915: fix plane/cursor handling when runtime suspended

Scot Doyle (2):
  drm/i915: Ignore VBT backlight presence check on Acer C720 (4005U)
  drm/i915: don't warn if backlight unexpectedly enabled

Thomas Hellstrom (2):
  drm/vmwgfx: Fix an incorrect OOM return value
  drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle

Ville Syrjälä (2):
  drm/i915: Move intel_ddi_set_vc_payload_alloc(false) to 
haswell_crtc_disable()
  drm/i915: Fix lock dropping in intel_tv_detect()

 drivers/gpu/drm/i915/intel_bios.c   |  2 +-
 drivers/gpu/drm/i915/intel_crt.c|  2 +-
 drivers/gpu/drm/i915/intel_display.c| 34 +---
 drivers/gpu/drm/i915/intel_dp.c | 55 ++---
 drivers/gpu/drm/i915/intel_lvds.c   |  2 +-
 drivers/gpu/drm/i915/intel_panel.c  |  8 ++---
 drivers/gpu/drm/i915/intel_tv.c | 10 --
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 11 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c|  3 +-
 9 files changed, 88 insertions(+), 39 deletions(-)

[Bug 82828] Regression: Crash in 3Dmark2001

2014-09-04 Thread bugzilla-dae...@freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=82828

--- Comment #20 from Pavel Ondračka ---
Your patch does indeed fix the crashing tests, I still see some piglit
regressions but that should be either bug 82882 or bug 82978.
Thanks for the fix.


TTM placement & caching issue/questions

2014-09-04 Thread Benjamin Herrenschmidt

Hi folks !

I've been tracking down some problems with the recent DRI on powerpc and
stumbled upon something that doesn't look right, and not necessarily
only for us.

Now it's possible that I haven't fully understood the code here and I
also don't know to what extent some of that behaviour is necessary for
some platforms such as Intel GTT bits.

What I've observed with a simple/dumb (no DMA) driver like AST (but this
probably happens more generally) is that when evicting a BO from VRAM
into System memory, the TTM tries to preserve the existing caching
attributes of the VRAM object.

From what I can tell, we end up with going from VRAM to System memory
type, and we eventually call ttm_bo_select_caching() to select the
caching option for the target.

This will, from what I can tell, try to use the same caching mode as the
original object:

if ((cur_placement & caching) != 0)
result |= (cur_placement & caching);

And cur_placement comes from bo->mem.placement which as far as I can
tell is based on the placement array which the drivers set up.

Now they tend to uniformly set up the placement for System memory as
TTM_PL_MASK_CACHING which enables all caching modes.

So I end up with, for example, my System memory BOs having
TTM_PL_FLAG_CACHED not set (though they also don't have
TTM_PL_FLAG_UNCACHED) and TTM_PL_FLAG_WC.

We don't seem to use the man->default_caching (which will have
TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the
proposed placement and the existing caching mode.
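
The selection order described above can be modelled in userspace. This is a sketch only: the flag values, the name `select_caching`, and the fallback order after the manager default are assumptions made for illustration, not a copy of the TTM source.

```c
#include <stdint.h>

/* Illustrative flag values, not taken from the TTM headers. */
#define PL_FLAG_CACHED   (1u << 16)
#define PL_FLAG_UNCACHED (1u << 17)
#define PL_FLAG_WC       (1u << 18)
#define PL_MASK_CACHING  (PL_FLAG_CACHED | PL_FLAG_UNCACHED | PL_FLAG_WC)

/*
 * Model of the selection order:
 * 1. keep any caching bit shared by the current and proposed placement,
 * 2. otherwise use the manager default if the proposal allows it,
 * 3. otherwise pick CACHED, then WC, then UNCACHED (assumed order).
 */
static uint32_t select_caching(uint32_t default_caching,
                               uint32_t cur_placement,
                               uint32_t proposed_placement)
{
    uint32_t caching = proposed_placement & PL_MASK_CACHING;
    uint32_t result = proposed_placement & ~PL_MASK_CACHING;

    if (cur_placement & caching)
        result |= cur_placement & caching;
    else if (default_caching & caching)
        result |= default_caching;
    else if (caching & PL_FLAG_CACHED)
        result |= PL_FLAG_CACHED;
    else if (caching & PL_FLAG_WC)
        result |= PL_FLAG_WC;
    else if (caching & PL_FLAG_UNCACHED)
        result |= PL_FLAG_UNCACHED;

    return result;
}
```

With a write-combined current placement and a proposal that allows every caching mode, the model returns WC rather than the CACHED default, which is the behaviour reported in this message.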

Now this is a problem for several reasons that I can think of:

 - On a number of powerpc platforms, such as all our 64-bit server ones
for example, it's actually illegal to map system memory non-cached. The
system is fully cache coherent for all possible DMA originators (that we
care about at least) and mapping memory non-cachable while it's mapped
cachable in the linear mapping can cause a nasty cache paradox which, when
detected by HW, can checkstop the system.

 - A similar issue exists, afaik, on ARM >= v7, so anything mapped
non-cachable must be removed from the linear mapping explicitly since
otherwise it can be speculatively prefetched into the cache.

 - I don't know about x86, but even then, it looks quite sub-optimal to
map the memory backing of the BOs and access it using a WC rather than a
cachable mapping attribute.

Now, some folks on IRC mentioned that there might be reasons for the
current behaviour as to not change the caching attributes when going
in/out of the GTT on Intel, I don't know how that relates and how that
works, but maybe that should be enforced by having a different placement
mask specifically on those chipsets.

Dave, should we change the various PCI drivers for generally coherent
devices such that the System memory type doesn't allow placements
without the CACHED attribute ? Or at least on coherent platforms ? How do
we detect that ? Should we have a TTM helper to establish the default
memory placement attributes that "normal PCI" drivers call to set that
up so we can have all the necessary arch ifdefs in one single place, at
least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ?

Non-PCI and "special" drivers like Intel can use a different set of
placement attributes to represent the requirements of those specific
platforms (mostly thinking of embedded ARM here which under some
circumstances might actually require non-cached mappings).
Or am I missing another part of the puzzle ?

As it is, things are broken for me even for dumb drivers, and I suspect
to a large extent with radeon and nouveau too, though in some cases we
might get away with it most of the time ... until the machine locks up
for some unexplainable reason... This might cause problems on existing
distros such as RHEL7 with our radeon adapters even.

Any suggestion of what's the best approach to fix it ? I'm happy to
produce the patches but I'm not that familiar with the TTM so I would
like to make sure I'm on the right track first :-)

Cheers,
Ben.

[PATCH 6/9] drm/ttm: force cached mapping on non x86 platform

2014-09-04 Thread Thomas Hellstrom

On 09/04/2014 09:46 AM, Benjamin Herrenschmidt wrote:
> From: Jérôme Glisse 
>
> People interested in providing uncached or write combined mapping
> on their architecture need to do the ground work inside their arch
> specific code to allow to break the linear kernel mapping so that
> page mapping attributes can be updated, in the meantime force cached
> mapping for non x86 architecture.
I don't like this patch. Please see my previous email.

/Thomas
