[PATCH v2] clk: Don't hold prepare_lock across debugfs creation
Rob Clark reports a lockdep splat that involves the prepare_lock chained with the mmap semaphore.

==
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: GW
---
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (qcom_iommu_lock){+.+...}:
       [] qcom_iommu_map+0x28/0x450
       [] iommu_map+0xc8/0x12c
       [] msm_iommu_map+0xb4/0x130
       [] msm_gem_get_iova_locked+0x9c/0xe8
       [] msm_gem_get_iova+0x4c/0x64
       [] mdp4_kms_init+0x4c4/0x6c0
       [] msm_load+0x2ac/0x34c
       [] drm_dev_register+0xac/0x108
       [] drm_platform_init+0x50/0xf0
       [] try_to_bring_up_master.part.3+0xc8/0x108
       [] component_master_add_with_match+0xa8/0x104
       [] msm_pdev_probe+0x64/0x70
       [] platform_drv_probe+0x2c/0x60
       [] driver_probe_device+0x108/0x234
       [] bus_for_each_drv+0x64/0x98
       [] device_attach+0x78/0x8c
       [] bus_probe_device+0x88/0xac
       [] deferred_probe_work_func+0x68/0x9c
       [] process_one_work+0x1a0/0x40c
       [] worker_thread+0x44/0x4d8
       [] kthread+0xd8/0xec
       [] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
       [] drm_gem_mmap+0x38/0xd0
       [] msm_gem_mmap+0xc/0x5c
       [] mmap_region+0x35c/0x6c8
       [] do_mmap_pgoff+0x314/0x398
       [] vm_mmap_pgoff+0x84/0xb4
       [] SyS_mmap_pgoff+0x94/0xbc
       [] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++}:
       [] filldir64+0x68/0x180
       [] dcache_readdir+0x188/0x22c
       [] iterate_dir+0x9c/0x11c
       [] SyS_getdents64+0x78/0xe8
       [] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
       [] __create_file+0x58/0x1dc
       [] debugfs_create_dir+0x1c/0x24
       [] clk_debug_create_subtree+0x20/0x170
       [] clk_debug_init+0xec/0x14c
       [] do_one_initcall+0x8c/0x1c8
       [] kernel_init_freeable+0x13c/0x1dc
       [] kernel_init+0x8/0xe8
       [] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
       [] mutex_lock_nested+0x70/0x3e8
       [] clk_prepare_lock+0x88/0xfc
       [] clk_prepare+0xc/0x24
       [] __enable_clocks.isra.4+0x18/0xa4
       [] __flush_iotlb_va+0xe0/0x114
       [] qcom_iommu_unmap+0xac/0x1f0
       [] iommu_unmap+0x9c/0xe8
       [] msm_iommu_unmap+0x64/0x84
       [] msm_gem_free_object+0x11c/0x338
       [] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
       [] drm_gem_object_release_handle+0x50/0x68
       [] idr_for_each+0xa8/0xdc
       [] drm_gem_release+0x1c/0x28
       [] drm_release+0x370/0x428
       [] __fput+0x98/0x1e8
       [] task_work_run+0xb0/0xfc
       [] do_exit+0x2ec/0x948
       [] do_group_exit+0x4c/0xb8
       [] get_signal+0x28c/0x6ac
       [] do_signal+0xc4/0x3e4
       [] do_work_pending+0xb4/0xc4
       [] work_pending+0xc/0x20

other info that might help us debug this:

Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);

 *** DEADLOCK ***

3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: GW 3.17.0-rc1-00050-g07a489b #802
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x98/0xb8)
[] (dump_stack) from [] (print_circular_bug+0x218/0x340)
[] (print_circular_bug) from [] (__lock_acquire+0x1d24/0x20b8)
[] (__lock_acquire) from [] (lock_acquire+0x9c/0xbc)
[] (lock_acquire) from [] (mutex_lock_nested+0x70/0x3e8)
[] (mutex_lock_nested) from [] (clk_prepare_lock+0x88/0xfc)
[] (clk_prepare_lock) from [] (clk_prepare+0xc/0x24)
[] (clk_prepare) from [] (__enable_clocks.isra.4+0x18/0xa4)
[] (__enable_clocks.isra.4) from [] (__flush_iotlb_va+0xe0/0x114)
[] (__flush_iotlb_va) from [] (qcom_iommu_unmap+0xac/0x1f0)
[] (qcom_iommu_unmap) from [] (iommu_unmap+0x9c/0xe8)
[] (iommu_unmap) from [] (msm_iommu_unmap+0x64/0x84)
[] (msm_iommu_unmap) from [] (msm_gem_free_object+0x11c/0x338)
[] (msm_gem_free_object) from [] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[] (drm_gem_object_handle_unreference_unlocked) from [] (drm_gem_object_release_handle+0x50/0x68)
[] (drm_gem_object_release_handle) from [] (idr_for_each+0xa8/0xdc)
[] (idr_for_e
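The report describes a five-lock cycle. As a rough illustration of why lockdep flags it, the sketch below encodes only the dependency edges listed in the splat (an edge A -> B meaning B was acquired while A was held) and runs a depth-first search for a cycle. The `enum lock_id` values, `build_chain()` and `has_cycle()` are hypothetical names for this example; lockdep's real dependency graph and search are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical indices for the five locks named in the splat. */
enum lock_id {
	PREPARE_LOCK,
	I_MUTEX,         /* &sb->s_type->i_mutex_key#3 */
	MMAP_SEM,        /* &mm->mmap_sem */
	STRUCT_MUTEX,    /* &dev->struct_mutex */
	QCOM_IOMMU_LOCK,
	NR_LOCKS
};

/* dep[a][b] == true means "b was acquired while a was held". */
static bool dep[NR_LOCKS][NR_LOCKS];

static void add_dep(enum lock_id held, enum lock_id acquired)
{
	dep[held][acquired] = true;
}

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool dfs(int node, int *state)
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!dep[node][next])
			continue;
		if (state[next] == 1)
			return true;             /* back edge: cycle found */
		if (state[next] == 0 && dfs(next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

static bool has_cycle(void)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && dfs(n, state))
			return true;
	return false;
}

/* Rebuild the chain from the report; with_debugfs_edge toggles entry #1. */
static void build_chain(bool with_debugfs_edge)
{
	memset(dep, 0, sizeof(dep));
	if (with_debugfs_edge)
		add_dep(PREPARE_LOCK, I_MUTEX);      /* #1: debugfs_create_dir() under prepare_lock */
	add_dep(I_MUTEX, MMAP_SEM);                  /* #2: filldir64() */
	add_dep(MMAP_SEM, STRUCT_MUTEX);             /* #3: drm_gem_mmap() */
	add_dep(STRUCT_MUTEX, QCOM_IOMMU_LOCK);      /* #4: qcom_iommu_map() */
	add_dep(QCOM_IOMMU_LOCK, PREPARE_LOCK);      /* #0: clk_prepare() from qcom_iommu_unmap() */
}
```

With `build_chain(true)` the chain from the report is reproduced and `has_cycle()` returns true; `build_chain(false)` drops the prepare_lock -> i_mutex edge introduced by creating debugfs entries in `clk_debug_init()`, and the cycle disappears.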
[PATCH 1/2] clk: Make __clk_lookup() use a list instead of tree search
On 09/04, Stephen Boyd wrote:
> In the near future we're going to move the prepare lock to be a
> per-clock ww_mutex. __clk_lookup() is called very deep in the
> set-rate path and we would like to avoid having to take all the
> locks in the clock tree to search for a clock (basically
> defeating the purpose of introducing per-clock locks). Introduce
> a new list that contains all clocks registered in the system and
> walk this list until the clock is found.
>
> Signed-off-by: Stephen Boyd
> ---

Actually this won't work. We can't grab the list lock while the prepare lock is held. So we need to do the debugfs stuff with a different lock and do it outside of the prepare lock.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
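The quoted patch replaces the tree search in `__clk_lookup()` with a walk over a flat list of every registered clock. A minimal userspace sketch of that idea, assuming a singly linked list with push-front registration, is shown below; `struct toy_clk`, `toy_clk_register()` and `toy_clk_lookup()` are hypothetical stand-ins for the CCF internals. It deliberately leaves out locking, which is exactly the problem raised in the reply: whatever lock protects the list must not be taken while prepare_lock is held.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct clk: only what the lookup needs. */
struct toy_clk {
	const char *name;
	struct toy_clk *parent;     /* tree linkage, not used by the lookup */
	struct toy_clk *all_next;   /* linkage on the flat list of all clocks */
};

/* Head of the list holding every registered clock (no locking here). */
static struct toy_clk *all_clocks;

static void toy_clk_register(struct toy_clk *clk)
{
	assert(clk && clk->name);
	clk->all_next = all_clocks; /* push to front: O(1) insertion */
	all_clocks = clk;
}

/* Linear walk: no need to descend into (or lock) each subtree. */
static struct toy_clk *toy_clk_lookup(const char *name)
{
	if (!name)
		return NULL;
	for (struct toy_clk *c = all_clocks; c; c = c->all_next)
		if (strcmp(c->name, name) == 0)
			return c;
	return NULL;
}
```

The lookup cost is linear in the number of clocks, traded for never having to touch the per-subtree state that per-clock locks would protect.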
[Bug 82667] "Distance" game closed alpha: Changing shadow settings segfaults
https://bugs.freedesktop.org/show_bug.cgi?id=82667

--- Comment #3 from Christoph Haag ---
Created attachment 105768
  --> https://bugs.freedesktop.org/attachment.cgi?id=105768&action=edit
The Book of Unwritten Tales 2 Early access - same backtrace

The same happens when changing the graphics quality setting in "The Book of Unwritten Tales 2" Early Access (for Kickstarter backers).

Interestingly, with the Mesa debug build it sometimes(?) works, but after changing the settings a few times it crashes very quickly.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140904/0bba279b/attachment.html>
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #15 from smoki ---
By default, what Debian 32-bit passes in addition to the Mesa options is:

/usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu - -mtune=generic -march=i586

That is with -mtune=generic -march=i586. With -mtune=native -march=native on AMD Kabini, it passes this:

/usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu - -march=btver2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mbmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=btver2

So the Debian default works fine for the 10.2 branch, and for 10.3-devel up to commit 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. But anything newer is affected: the 10.3 branch and git master.

Maybe some of those options need to be added to the default; I'm not sure which of them make it work normally :)
ccf vs iommu vs drm locking fun
So, I was looking at the lockdep splat below and discussing it a bit with sboyd on IRC, and came to a slightly disturbing realization: the interaction between prepare_lock and the debugfs bits is a little worrying. In particular, it is probably not a good idea to assume that anyone who needs to grab prepare_lock does not already hold mmap_sem. Holding mmap_sem, or locks that interact with mmap_sem, is going to be pretty hard to avoid, at least for GPU drivers that use IOMMUs that in turn use CCF ;-)

BR,
-R

--
[15928.894558]
[15928.894609] ==
[15928.895145] [ INFO: possible circular locking dependency detected ]
[15928.901141] 3.17.0-rc1-00050-g07a489b #802 Tainted: GW
[15928.907335] ---
[15928.907348] Xorg.bin/5413 is trying to acquire lock:
[15928.907417]  (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc
[15928.907424]
[15928.907424] but task is already holding lock:
[15928.907508]  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0
[15928.907519]
[15928.907519] which lock already depends on the new lock.
[15928.907519]
[15928.907532]
[15928.907532] the existing dependency chain (in reverse order) is:
[15928.907575]
[15928.907575] -> #4 (qcom_iommu_lock){+.+...}:
[15928.907611]        [] qcom_iommu_map+0x28/0x450
[15928.907634]        [] iommu_map+0xc8/0x12c
[15928.907662]        [] msm_iommu_map+0xb4/0x130
[15928.907681]        [] msm_gem_get_iova_locked+0x9c/0xe8
[15928.907714]        [] msm_gem_get_iova+0x4c/0x64
[15928.907765]        [] mdp4_kms_init+0x4c4/0x6c0
[15928.907813]        [] msm_load+0x2ac/0x34c
[15928.907846]        [] drm_dev_register+0xac/0x108
[15928.907868]        [] drm_platform_init+0x50/0xf0
[15928.907892]        [] try_to_bring_up_master.part.3+0xc8/0x108
[15928.907913]        [] component_master_add_with_match+0xa8/0x104
[15928.907934]        [] msm_pdev_probe+0x64/0x70
[15928.907955]        [] platform_drv_probe+0x2c/0x60
[15928.907983]        [] driver_probe_device+0x108/0x234
[15928.908003]        [] bus_for_each_drv+0x64/0x98
[15928.908040]        [] device_attach+0x78/0x8c
[15928.908082]        [] bus_probe_device+0x88/0xac
[15928.908126]        [] deferred_probe_work_func+0x68/0x9c
[15928.908182]        [] process_one_work+0x1a0/0x40c
[15928.908214]        [] worker_thread+0x44/0x4d8
[15928.908237]        [] kthread+0xd8/0xec
[15928.908262]        [] ret_from_fork+0x14/0x2c
[15928.908291]
[15928.908291] -> #3 (&dev->struct_mutex){+.+.+.}:
[15928.908311]        [] drm_gem_mmap+0x38/0xd0
[15928.908329]        [] msm_gem_mmap+0xc/0x5c
[15928.908358]        [] mmap_region+0x35c/0x6c8
[15928.908377]        [] do_mmap_pgoff+0x314/0x398
[15928.908398]        [] vm_mmap_pgoff+0x84/0xb4
[15928.908416]        [] SyS_mmap_pgoff+0x94/0xbc
[15928.908436]        [] ret_fast_syscall+0x0/0x48
[15928.908463]
[15928.908463] -> #2 (&mm->mmap_sem){++}:
[15928.908512]        [] filldir64+0x68/0x180
[15928.908558]        [] dcache_readdir+0x188/0x22c
[15928.908593]        [] iterate_dir+0x9c/0x11c
[15928.908616]        [] SyS_getdents64+0x78/0xe8
[15928.908640]        [] ret_fast_syscall+0x0/0x48
[15928.908671]
[15928.908671] -> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
[15928.908706]        [] __create_file+0x58/0x1dc
[15928.908728]        [] debugfs_create_dir+0x1c/0x24
[15928.908761]        [] clk_debug_create_subtree+0x20/0x170
[15928.908790]        [] clk_debug_init+0xec/0x14c
[15928.908816]        [] do_one_initcall+0x8c/0x1c8
[15928.908846]        [] kernel_init_freeable+0x13c/0x1dc
[15928.908873]        [] kernel_init+0x8/0xe8
[15928.908898]        [] ret_from_fork+0x14/0x2c
[15928.908925]
[15928.908925] -> #0 (prepare_lock){+.+.+.}:
[15928.908948]        [] mutex_lock_nested+0x70/0x3e8
[15928.908970]        [] clk_prepare_lock+0x88/0xfc
[15928.909001]        [] clk_prepare+0xc/0x24
[15928.909022]        [] __enable_clocks.isra.4+0x18/0xa4
[15928.909041]        [] __flush_iotlb_va+0xe0/0x114
[15928.909071]        [] qcom_iommu_unmap+0xac/0x1f0
[15928.909093]        [] iommu_unmap+0x9c/0xe8
[15928.909112]        [] msm_iommu_unmap+0x64/0x84
[15928.909130]        [] msm_gem_free_object+0x11c/0x338
[15928.909149]        [] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
[15928.909166]        [] drm_gem_object_release_handle+0x50/0x68
[15928.909199]        [] idr_for_each+0xa8/0xdc
[15928.909225]        [] drm_gem_release+0x1c/0x28
[15928.909258]        [] drm_release+0x370/0x428
[15928.909302]        [] __fput+0x98/0x1e8
[15928.909339]        [] task_work_run+0xb0/0xfc
[15928.909386]        [] do_exit+0x2ec/0x948
[15928.909415]        [] do_group_exit+0x4c/0xb8
[15928.909455]        [] get_signal+0x28c/0x6ac
[15928.909507]        [] do_signal+0xc4/0x3e4
[15928.909548]        [] do_work_pending+0xb4/0xc4
[15928.909584]        [] work_pending+0xc/0x20
[15928.909595]
[15928.909595] other info that might help us debug this:
[15928.909595
[Bug 81644] Random crashes on RadeonSI with Chromium.
https://bugs.freedesktop.org/show_bug.cgi?id=81644

--- Comment #83 from Aaron B ---
This bisect put me here, which looks like it didn't go as planned again...

http://cgit.freedesktop.org/mesa/mesa/commit/?id=78578b759943cb198d34eedc00b3408c1599f6ec

I'm going to give up for now; maybe when I don't have so many other things going on, I'll bisect it over a week or so.
[PATCH 3/3] drm/radeon: allow concurrent buffer reads
From: Christian König

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon_cs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index ec4840c..f662de4 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -183,7 +183,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		}
 
 		p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
-		p->relocs[i].tv.shared = false;
+		p->relocs[i].tv.shared = !r->write_domain;
 		p->relocs[i].handle = r->handle;
 
 		radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
@@ -260,7 +260,8 @@ static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
 			continue;
 
 		resv = p->relocs[i].robj->tbo.resv;
-		radeon_semaphore_sync_resv(p->ib.semaphore, resv, false);
+		radeon_semaphore_sync_resv(p->ib.semaphore, resv,
+					   p->relocs[i].tv.shared);
 	}
 }
 
-- 
1.9.1
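Patch 3/3 marks a relocation as shared when it has no write domain and passes that flag to `radeon_semaphore_sync_resv()`. Assuming the usual reservation-object convention (a writer installs an exclusive fence, readers add shared fences), a reader only has to wait for the last writer, whereas a writer must also wait for every outstanding reader; that asymmetry is what lets concurrent reads proceed. The toy model below encodes that rule; `struct toy_resv`, the integer fence ids and `fences_to_wait()` are hypothetical and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 4

/* Hypothetical reservation object: fences are plain ids, 0 = none. */
struct toy_resv {
	int excl;                    /* last writer's fence */
	int shared[TOY_MAX_SHARED];  /* outstanding readers' fences */
	size_t shared_count;
};

/*
 * Collect the fences a new submission must wait for.
 *   shared == true : the submission only reads the buffer (no write domain)
 *   shared == false: it writes, so it must also wait for all readers
 * out must have room for 1 + TOY_MAX_SHARED ids; returns how many were stored.
 */
static size_t fences_to_wait(const struct toy_resv *resv, bool shared, int *out)
{
	size_t n = 0;

	if (resv->excl)
		out[n++] = resv->excl;
	if (shared)
		return n;
	assert(resv->shared_count <= TOY_MAX_SHARED);
	for (size_t i = 0; i < resv->shared_count; i++)
		out[n++] = resv->shared[i];
	return n;
}
```

For a buffer with one writer fence and two reader fences, a read-only submission waits on one fence while a writing submission waits on all three.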
[PATCH 2/3] drm/radeon: add the infrastructure for concurrent buffer access
From: Christian König

This allows us to specify if we want to sync to the shared fences of a
reservation object or not.

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/cik.c              | 25 ++-
 drivers/gpu/drm/radeon/cik_sdma.c         | 25 ++-
 drivers/gpu/drm/radeon/evergreen_dma.c    | 24 +-
 drivers/gpu/drm/radeon/r100.c             | 21 +
 drivers/gpu/drm/radeon/r200.c             | 21 +
 drivers/gpu/drm/radeon/r600.c             | 23 +-
 drivers/gpu/drm/radeon/r600_dma.c         | 25 ++-
 drivers/gpu/drm/radeon/radeon.h           | 43 +-
 drivers/gpu/drm/radeon/radeon_asic.h      | 74 ---
 drivers/gpu/drm/radeon/radeon_benchmark.c | 30 ++---
 drivers/gpu/drm/radeon/radeon_cs.c        |  8 +---
 drivers/gpu/drm/radeon/radeon_ib.c        |  2 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c | 38 ++--
 drivers/gpu/drm/radeon/radeon_test.c      | 24 +++---
 drivers/gpu/drm/radeon/radeon_ttm.c       | 12 ++---
 drivers/gpu/drm/radeon/radeon_vm.c        | 16 ++-
 drivers/gpu/drm/radeon/rv770_dma.c        | 25 ++-
 drivers/gpu/drm/radeon/si_dma.c           | 25 ++-
 18 files changed, 253 insertions(+), 208 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 1f598ab..0b5a230 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -3959,18 +3959,19 @@ bool cik_semaphore_ring_emit(struct radeon_device *rdev,
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @num_gpu_pages: number of GPU pages to xfer
- * @fence: radeon fence object
+ * @resv: reservation object to sync to
  *
  * Copy GPU paging using the CP DMA engine (CIK+).
  * Used by the radeon ttm implementation to move pages if
  * registered as the asic copy callback.
  */
-int cik_copy_cpdma(struct radeon_device *rdev,
-		   uint64_t src_offset, uint64_t dst_offset,
-		   unsigned num_gpu_pages,
-		   struct radeon_fence **fence)
+struct radeon_fence *cik_copy_cpdma(struct radeon_device *rdev,
+				    uint64_t src_offset, uint64_t dst_offset,
+				    unsigned num_gpu_pages,
+				    struct reservation_object *resv)
 {
 	struct radeon_semaphore *sem = NULL;
+	struct radeon_fence *fence;
 	int ring_index = rdev->asic->copy.blit_ring_index;
 	struct radeon_ring *ring = &rdev->ring[ring_index];
 	u32 size_in_bytes, cur_size_in_bytes, control;
@@ -3980,7 +3981,7 @@ int cik_copy_cpdma(struct radeon_device *rdev,
 	r = radeon_semaphore_create(rdev, &sem);
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
-		return r;
+		return ERR_PTR(r);
 	}
 
 	size_in_bytes = (num_gpu_pages << RADEON_GPU_PAGE_SHIFT);
@@ -3989,10 +3990,10 @@ int cik_copy_cpdma(struct radeon_device *rdev,
 	if (r) {
 		DRM_ERROR("radeon: moving bo (%d).\n", r);
 		radeon_semaphore_free(rdev, &sem, NULL);
-		return r;
+		return ERR_PTR(r);
 	}
 
-	radeon_semaphore_sync_to(sem, *fence);
+	radeon_semaphore_sync_resv(sem, resv, false);
 	radeon_semaphore_sync_rings(rdev, sem, ring->idx);
 
 	for (i = 0; i < num_loops; i++) {
@@ -4014,17 +4015,17 @@ int cik_copy_cpdma(struct radeon_device *rdev,
 		dst_offset += cur_size_in_bytes;
 	}
 
-	r = radeon_fence_emit(rdev, fence, ring->idx);
+	r = radeon_fence_emit(rdev, &fence, ring->idx);
 	if (r) {
 		radeon_ring_unlock_undo(rdev, ring);
 		radeon_semaphore_free(rdev, &sem, NULL);
-		return r;
+		return ERR_PTR(r);
 	}
 
 	radeon_ring_unlock_commit(rdev, ring, false);
-	radeon_semaphore_free(rdev, &sem, *fence);
+	radeon_semaphore_free(rdev, &sem, fence);
 
-	return r;
+	return fence;
 }
 
 /*
diff --git a/drivers/gpu/drm/radeon/cik_sdma.c b/drivers/gpu/drm/radeon/cik_sdma.c
index 192278b..c01a610 100644
--- a/drivers/gpu/drm/radeon/cik_sdma.c
+++ b/drivers/gpu/drm/radeon/cik_sdma.c
@@ -537,18 +537,19 @@ void cik_sdma_fini(struct radeon_device *rdev)
  * @src_offset: src GPU address
  * @dst_offset: dst GPU address
  * @num_gpu_pages: number of GPU pages to xfer
- * @fence: radeon fence object
+ * @resv: reservation object to sync to
  *
  * Copy GPU paging using the DMA engine (CIK).
  * Used by the radeon ttm implementation to move pages if
  * registered as the asic copy callback.
  */
-int cik_copy_dma(struct radeon_device *rdev,
-		 uint64_t src_offset, uint64_t dst_offset,
-		 unsigned num_gpu_pages,
-		 struct radeon_fence **fence)
+struct radeon_fence *cik_copy_dma(struct radeon_device *rdev,
+				  uint64_t src_offs
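Patch 2/3 changes the copy callbacks (for example `cik_copy_cpdma()`) from returning an `int` and passing the fence back through `struct radeon_fence **` to returning the fence itself, with failures encoded as `ERR_PTR(r)`. The kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` helpers come from `include/linux/err.h`; the sketch below re-implements the same idea in userspace under hypothetical `toy_*` names, with `toy_copy()` standing in for a copy callback, to show how a caller can tell a fence from an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace re-implementation of the include/linux/err.h idea: the last
 * TOY_MAX_ERRNO addresses are never valid object pointers, so a negative
 * errno can travel through a pointer return value.
 */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical fence: just a sequence number. */
struct toy_fence {
	unsigned long seq;
};

/* Hypothetical copy callback: returns the emitted fence or an encoded error. */
static struct toy_fence *toy_copy(struct toy_fence *fence, bool ring_full)
{
	assert(fence);
	if (ring_full)
		return toy_err_ptr(-EBUSY);
	fence->seq++;               /* "emit" the fence */
	return fence;
}
```

A caller would then check `if (toy_is_err(f)) return toy_ptr_err(f);` before using the fence, which keeps a single return value instead of an output parameter plus a status code.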
[PATCH 1/3] drm/ttm: allow fence to be added as shared
From: Christian König

This patch adds a new flag to the ttm_validate_buffer list to add the
fence as shared to the reservation object.

Signed-off-by: Christian König
---
 drivers/gpu/drm/qxl/qxl_release.c        |  1 +
 drivers/gpu/drm/radeon/radeon_cs.c       |  1 +
 drivers/gpu/drm/radeon/radeon_vm.c       |  3 +++
 drivers/gpu/drm/ttm/ttm_execbuf_util.c   | 18 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c  |  3 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  5 +
 include/drm/ttm/ttm_execbuf_util.h       |  2 ++
 7 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index a6e19c8..446e71c 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -226,6 +226,7 @@ int qxl_release_list_add(struct qxl_release *release, struct qxl_bo *bo)
 
 	qxl_bo_ref(bo);
 	entry->tv.bo = &bo->tbo;
+	entry->tv.shared = false;
 	list_add_tail(&entry->tv.head, &release->bos);
 	return 0;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 6e3d1c8..cd517ab 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -183,6 +183,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		}
 
 		p->relocs[i].tv.bo = &p->relocs[i].robj->tbo;
+		p->relocs[i].tv.shared = false;
 		p->relocs[i].handle = r->handle;
 
 		radeon_cs_buckets_add(&buckets, &p->relocs[i].tv.head,
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 671ee56..1cce446 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -143,6 +143,7 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
 	list[0].tv.bo = &vm->page_directory->tbo;
+	list[0].tv.shared = false;
 	list[0].tiling_flags = 0;
 	list[0].handle = 0;
 	list_add(&list[0].tv.head, head);
@@ -156,6 +157,7 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].prefered_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].allowed_domains = RADEON_GEM_DOMAIN_VRAM;
 		list[idx].tv.bo = &list[idx].robj->tbo;
+		list[idx].tv.shared = false;
 		list[idx].tiling_flags = 0;
 		list[idx].handle = 0;
 		list_add(&list[idx++].tv.head, head);
@@ -395,6 +397,7 @@ static int radeon_vm_clear_bo(struct radeon_device *rdev,
 
 	memset(&tv, 0, sizeof(tv));
 	tv.bo = &bo->tbo;
+	tv.shared = false;
 
 	INIT_LIST_HEAD(&head);
 	list_add(&tv.head, &head);
diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
index adafc0f..8ce508e 100644
--- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
+++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
@@ -119,8 +119,14 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 			ret = -EBUSY;
 		}
 
-		if (!ret)
-			continue;
+		if (!ret) {
+			if (!entry->shared)
+				continue;
+
+			ret = reservation_object_reserve_shared(bo->resv);
+			if (!ret)
+				continue;
+		}
 
 		/* uh oh, we lost out, drop every reservation and try
 		 * to only reserve this buffer, then start over if
@@ -136,6 +142,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
 			ret = 0;
 		}
 
+		if (!ret && entry->shared)
+			ret = reservation_object_reserve_shared(bo->resv);
+
 		if (unlikely(ret != 0)) {
 			if (ret == -EINTR)
 				ret = -ERESTARTSYS;
@@ -183,7 +192,10 @@ void ttm_eu_fence_buffer_objects(struct ww_acquire_ctx *ticket,
 
 	list_for_each_entry(entry, list, head) {
 		bo = entry->bo;
-		reservation_object_add_excl_fence(bo->resv, fence);
+		if (entry->shared)
+			reservation_object_add_shared_fence(bo->resv, fence);
+		else
+			reservation_object_add_excl_fence(bo->resv, fence);
 		ttm_bo_add_to_lru(bo);
 		__ttm_bo_unreserve(bo);
 	}
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index 0ceaddc..b4de3b2 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -346,6 +346,7 @@ static int vmw_bo_to_validate_list(struct vmw_sw_context *sw_context,
 		++sw_context->cur_val_buf;
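Patch 1/3 reserves room for a shared fence while the buffers are being reserved (`reservation_object_reserve_shared()` in `ttm_eu_reserve_buffers()`) and only later, in `ttm_eu_fence_buffer_objects()`, adds the fence as shared or exclusive depending on `entry->shared`. Doing the reservation up front means the fencing step, which runs after the command submission, has nothing left that can fail. The model below illustrates that two-phase pattern under simplifying assumptions: all `toy_*` names are hypothetical, the shared list has a fixed capacity instead of growing, and adding an exclusive fence is assumed to drop the existing shared fences.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SHARED_CAP 2

/* Hypothetical reservation object with a fixed-capacity shared list. */
struct toy_resv {
	int excl;                     /* exclusive fence id, 0 = none */
	int shared[TOY_SHARED_CAP];
	size_t shared_count;
	size_t shared_reserved;       /* slots promised by toy_reserve_shared() */
};

/* Phase 1 (reservation time, may fail): guarantee one more shared slot. */
static int toy_reserve_shared(struct toy_resv *r)
{
	if (r->shared_count + r->shared_reserved >= TOY_SHARED_CAP)
		return -ENOMEM;       /* a real implementation would try to grow */
	r->shared_reserved++;
	return 0;
}

/* Phase 2 (fencing time, must not fail): consume a reserved slot. */
static void toy_add_fence(struct toy_resv *r, int fence, bool shared)
{
	if (shared) {
		assert(r->shared_reserved > 0);
		r->shared_reserved--;
		r->shared[r->shared_count++] = fence;
	} else {
		/* A new writer supersedes the old writer and all readers. */
		r->excl = fence;
		r->shared_count = 0;
	}
}
```

Any allocation failure therefore surfaces in phase 1, where `ttm_eu_reserve_buffers()` can still back off and return an error to the caller.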
[Bug 83510] Graphical glitches in Unreal Engine 4
https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #4 from Clément Guérin ---
Created attachment 105767
  --> https://bugs.freedesktop.org/attachment.cgi?id=105767&action=edit
Black pixel + low light glitch in Realistic Rendering demo

Should look like this: https://www.youtube.com/watch?v=3AAf_2vry8A
[Bug 83510] Graphical glitches in Unreal Engine 4
https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #3 from Clément Guérin ---
Created attachment 105766
  --> https://bugs.freedesktop.org/attachment.cgi?id=105766&action=edit
Black pixel glitch in Mobile Game demo
[Bug 83510] Graphical glitches in Unreal Engine 4
https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #2 from Clément Guérin ---
Created attachment 105765
  --> https://bugs.freedesktop.org/attachment.cgi?id=105765&action=edit
Black pixel glitch in Stylized demo
[Bug 83510] Graphical glitches in Unreal Engine 4
https://bugs.freedesktop.org/show_bug.cgi?id=83510

--- Comment #1 from Clément Guérin ---
Created attachment 105764
  --> https://bugs.freedesktop.org/attachment.cgi?id=105764&action=edit
Motion blur glitch in Shooter Game
[Bug 83510] New: Graphical glitches in Unreal Engine 4
https://bugs.freedesktop.org/show_bug.cgi?id=83510

Priority: medium
Bug ID: 83510
Assignee: dri-devel at lists.freedesktop.org
Summary: Graphical glitches in Unreal Engine 4
Severity: normal
Classification: Unclassified
OS: Linux (All)
Reporter: geecko.dev at free.fr
Hardware: Other
Status: NEW
Version: git
Component: Drivers/Gallium/radeonsi
Product: Mesa

While all the Linux demos now run fine, some rendering errors remain.

- Some objects are blurred. It looks like motion blur applied to objects that don't need it, such as cameras, characters or weapons. You can see it in the Elemental Demo or the Shooter Game demo.
- Some pixels turn black when the camera is static; this is very noticeable on the sky box in the Mobile Game demo.
- The lighting seems too dark in some scenes; this can be seen in the Realistic Rendering demo or in the Shooter Game demo with the Sanctuary map.

HD 7950, mesa-git and llvm-svn on the latest Arch Linux.
TTM placement & caching issue/questions
On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
> > Last time I tested, (and it seems like Michel is on the same track),
> > writing with the CPU to write-combined memory was substantially faster
> > than writing to cached memory, with the additional side-effect that CPU
> > caches are left unpolluted.
> >
> > Moreover (although only tested on Intel's embedded chipsets), texturing
> > from cpu-cache-coherent PCI memory was a real GPU performance hog
> > compared to texturing from non-snooped memory. Hence, whenever a buffer
> > could be classified as GPU-read-only (or almost at least), it should be
> > placed in write-combined memory.
>
> Just a quick comment since this explicitly refers to intel chips: On
> desktop/laptop chips with the big shared l3/l4 caches it's the other way
> round. Cached uploads are substantially faster than wc and not using
> coherent access is a severe perf hit for texturing. I guess the hw guys
> worked really hard to hide the snooping costs so that the gpu can benefit
> from the massive bandwidth these caches can provide.

This is similar on modern POWER chips as well. We have pretty big L3s (though not technically shared, they are in a separate quadrant, and we have a shared L4 in the memory buffer) and our fabric is generally optimized for cachable/coherent access performance. In fact, we only have so many credits for NC accesses on the bus...

What that tells me is that when setting up the desired cachability attributes for the mapping of a memory object, we need to consider these things:

 - The hard requirements of the HW (non-coherent GPUs require NC, AGP does in some cases, etc...), which I think are basically already handled using the placement attributes set by the GPU driver for the memory type.

 - The optimal attributes (and platform hard requirements) for fast memory accesses to an object by the processor. (From what I read here, this can be NC+WC on older Intel, cachable on newer, etc...)

 - The optimal attributes for fast GPU DMA accesses to the object in system memory. Here too, this is fairly platform/chipset dependent.

Do we have flags in the DRM that tell us whether an object in memory is more likely to be used by the GPU via DMA or by the CPU via MMIO? On powerpc (except in the old AGP case), I wouldn't care and would require cachable in both cases, but I can see the low-latency crowd wanting the former to be non-cachable, while dumb GPUs like AST that don't do DMA would benefit greatly from the latter...

Cheers,
Ben.
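The three considerations listed above can be read as an ordered decision. The sketch below is one hypothetical way to encode it: none of the `toy_*` fields correspond to existing DRM or TTM flags (the email asks whether such flags exist), and the outcome is simplified to cached versus write-combined. The ordering (hard hardware requirements first, then platform preference, then expected usage) is an assumption drawn from the discussion, not an agreed policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes: cached vs non-cached write-combined mapping. */
enum toy_caching { TOY_CACHED, TOY_WC };

/* Hypothetical inputs; none of these are existing DRM/TTM flags. */
struct toy_placement_info {
	bool hw_requires_noncached;  /* e.g. non-coherent GPU, some AGP setups */
	bool coherent_access_fast;   /* large shared caches: newer Intel, POWER */
	bool gpu_read_mostly;        /* buffer is (almost) only read by the GPU */
};

static enum toy_caching toy_pick_caching(const struct toy_placement_info *p)
{
	assert(p != NULL);
	/* 1. Hard hardware requirements always win. */
	if (p->hw_requires_noncached)
		return TOY_WC;
	/* 2. Platforms tuned for coherent access prefer cached mappings. */
	if (p->coherent_access_fast)
		return TOY_CACHED;
	/* 3. Elsewhere, GPU-read-mostly buffers go write-combined. */
	return p->gpu_read_mostly ? TOY_WC : TOY_CACHED;
}
```

The third rule reflects Thomas's measurements on older Intel parts; Daniel's and Ben's comments are what put rule 2 ahead of it.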
[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()
On Thu, 2014-09-04 at 16:52 +0900, Michel Dänzer wrote:
> >  #endif
> > +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
> > +	/*
> > +	 * Using a non-cachable mapping of system memory on
> > +	 * cache coherent powerpc's can be fatal, let's make
> > +	 * sure this doesn't happen and warn if it does. The
> > +	 * only exception is powermac with AGP which has to
> > +	 * take the risk.
> > +	 */
> > +	if (!machine_is(powermac) &&
> > +	    ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
> > +	     (caching_flags & TTM_PL_FLAG_TT))) {
> > +		pr_err_once("TTM: Attempt to use a non-cached"
> > +			    " mapping on RAM unsupported !\n");
> > +		return tmp;
>
> NAK, this breaks AGP on PowerMacs.

No it doesn't :-)

Cheers,
Ben.
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #14 from Emil Velikov ---
(In reply to comment #12)
> That happens with default ./autogen blah blah
>
> As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

Unless you've provided --enable-debug, Mesa doesn't mess around with the compiler optimisation/debug options (-O* and -g*), and it never touches -mtune or -march.

Might it be that the compiler applies different heuristics before/after the commit, causing substantially different code to be generated?
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #13 from smoki ---
(In reply to comment #12)
> That happens with default ./autogen blah blah
>
> As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

But only almost normal: as I said earlier too, glretrace for example is slowish again :) Very weird issues; all is fine on 37d43ebb28ce8be38f3d9b0805b8b14354ce786d.
[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()
On 04.09.2014 18:34, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 16:52 +0900, Michel Dänzer wrote:
>>>  #endif
>>> +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
>>> +	/*
>>> +	 * Using a non-cachable mapping of system memory on
>>> +	 * cache coherent powerpc's can be fatal, let's make
>>> +	 * sure this doesn't happen and warn if it does. The
>>> +	 * only exception is powermac with AGP which has to
>>> +	 * take the risk.
>>> +	 */
>>> +	if (!machine_is(powermac) &&
>>> +	    ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
>>> +	     (caching_flags & TTM_PL_FLAG_TT))) {
>>> +		pr_err_once("TTM: Attempt to use a non-cached"
>>> +			    " mapping on RAM unsupported !\n");
>>> +		return tmp;
>>
>> NAK, this breaks AGP on PowerMacs.
>
> No it doesn't :-)

Yeah sorry, I was blind.

-- 
Earthling Michel Dänzer           |                  http://www.amd.com
Libre software enthusiast         |                Mesa and X developer
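The NAK was withdrawn because of the `!machine_is(powermac)` term: the condition short-circuits on PowerMacs, so AGP there never reaches `return tmp`. A userspace model of the predicate makes this visible. It assumes, as the patch comment says, that the check only matters when a non-cached mapping of system memory is being set up; the flag values, the `is_powermac` parameter (standing in for `machine_is(powermac)`) and `toy_refuse_noncached()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real TTM_PL_FLAG_* constants differ. */
#define TOY_PL_FLAG_SYSTEM (1u << 0)
#define TOY_PL_FLAG_TT     (1u << 1)
#define TOY_PL_FLAG_VRAM   (1u << 2)

/*
 * Model of the proposed powerpc check in ttm_io_prot(). Returns true when
 * the non-cached mapping would be refused (the patch then returns tmp
 * unchanged after printing the error once).
 */
static bool toy_refuse_noncached(uint32_t caching_flags, bool is_powermac)
{
	if (is_powermac)
		return false;   /* PowerMac (AGP) is explicitly exempt */
	return (caching_flags & TOY_PL_FLAG_SYSTEM) ||
	       (caching_flags & TOY_PL_FLAG_TT);
}
```

On a PowerMac even a TT placement passes through, while the same placement on any other cache-coherent powerpc is refused; VRAM placements are never affected.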
[Bug 83861] radeon power management cause audio skips and glitch
https://bugzilla.kernel.org/show_bug.cgi?id=83861

Rafael J. Wysocki changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|Other                       |Video(DRI - non Intel)
           Assignee|rjw at rjwysocki.net         |drivers_video-dri at kernel-bu
                   |                            |gs.osdl.org
            Product|Power Management            |Drivers

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #12 from smoki ---
That happens with the default ./autogen blah blah.

As I said earlier, if I pass the -mtune=native -march=native flags, perf is normal: a very weird and funny thing at the same time :D
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #11 from smoki ---
(In reply to comment #8)
> Can you guys bisect?

So the latest good commit is 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. After 07c65b85eada8dd34019763b6e82ed4257a9b4a6 there is corruption (so I couldn't test those commits) all the way until 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938, one week later, which contains a fix for the 3.17 kernel. I took that patch, tried 07c65b85eada8dd34019763b6e82ed4257a9b4a6 again and bingo, blah, blah...

So PIPE_USAGE_STREAM seems to be the main problem again, but unlike on 64-bit, on 32-bit it can't be reverted to the old behavior: on 32-bit that is unusable and produces a lot of corruption.
[PATCH 2/2] clk: Don't hold prepare_lock across debugfs creation
Rob Clark reports a lockdep splat that involves the prepare_lock chained with the mmap semaphore.

======================================================
[ INFO: possible circular locking dependency detected ]
3.17.0-rc1-00050-g07a489b #802 Tainted: G        W
-------------------------------------------------------
Xorg.bin/5413 is trying to acquire lock:
 (prepare_lock){+.+.+.}, at: [] clk_prepare_lock+0x88/0xfc

but task is already holding lock:
 (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (qcom_iommu_lock){+.+...}:
       [] qcom_iommu_map+0x28/0x450
       [] iommu_map+0xc8/0x12c
       [] msm_iommu_map+0xb4/0x130
       [] msm_gem_get_iova_locked+0x9c/0xe8
       [] msm_gem_get_iova+0x4c/0x64
       [] mdp4_kms_init+0x4c4/0x6c0
       [] msm_load+0x2ac/0x34c
       [] drm_dev_register+0xac/0x108
       [] drm_platform_init+0x50/0xf0
       [] try_to_bring_up_master.part.3+0xc8/0x108
       [] component_master_add_with_match+0xa8/0x104
       [] msm_pdev_probe+0x64/0x70
       [] platform_drv_probe+0x2c/0x60
       [] driver_probe_device+0x108/0x234
       [] bus_for_each_drv+0x64/0x98
       [] device_attach+0x78/0x8c
       [] bus_probe_device+0x88/0xac
       [] deferred_probe_work_func+0x68/0x9c
       [] process_one_work+0x1a0/0x40c
       [] worker_thread+0x44/0x4d8
       [] kthread+0xd8/0xec
       [] ret_from_fork+0x14/0x2c

-> #3 (&dev->struct_mutex){+.+.+.}:
       [] drm_gem_mmap+0x38/0xd0
       [] msm_gem_mmap+0xc/0x5c
       [] mmap_region+0x35c/0x6c8
       [] do_mmap_pgoff+0x314/0x398
       [] vm_mmap_pgoff+0x84/0xb4
       [] SyS_mmap_pgoff+0x94/0xbc
       [] ret_fast_syscall+0x0/0x48

-> #2 (&mm->mmap_sem){++}:
       [] filldir64+0x68/0x180
       [] dcache_readdir+0x188/0x22c
       [] iterate_dir+0x9c/0x11c
       [] SyS_getdents64+0x78/0xe8
       [] ret_fast_syscall+0x0/0x48

-> #1 (&sb->s_type->i_mutex_key#3){+.+.+.}:
       [] __create_file+0x58/0x1dc
       [] debugfs_create_dir+0x1c/0x24
       [] clk_debug_create_subtree+0x20/0x170
       [] clk_debug_init+0xec/0x14c
       [] do_one_initcall+0x8c/0x1c8
       [] kernel_init_freeable+0x13c/0x1dc
       [] kernel_init+0x8/0xe8
       [] ret_from_fork+0x14/0x2c

-> #0 (prepare_lock){+.+.+.}:
       [] mutex_lock_nested+0x70/0x3e8
       [] clk_prepare_lock+0x88/0xfc
       [] clk_prepare+0xc/0x24
       [] __enable_clocks.isra.4+0x18/0xa4
       [] __flush_iotlb_va+0xe0/0x114
       [] qcom_iommu_unmap+0xac/0x1f0
       [] iommu_unmap+0x9c/0xe8
       [] msm_iommu_unmap+0x64/0x84
       [] msm_gem_free_object+0x11c/0x338
       [] drm_gem_object_handle_unreference_unlocked+0xfc/0x130
       [] drm_gem_object_release_handle+0x50/0x68
       [] idr_for_each+0xa8/0xdc
       [] drm_gem_release+0x1c/0x28
       [] drm_release+0x370/0x428
       [] __fput+0x98/0x1e8
       [] task_work_run+0xb0/0xfc
       [] do_exit+0x2ec/0x948
       [] do_group_exit+0x4c/0xb8
       [] get_signal+0x28c/0x6ac
       [] do_signal+0xc4/0x3e4
       [] do_work_pending+0xb4/0xc4
       [] work_pending+0xc/0x20

other info that might help us debug this:

Chain exists of:
  prepare_lock --> &dev->struct_mutex --> qcom_iommu_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(qcom_iommu_lock);
                               lock(&dev->struct_mutex);
                               lock(qcom_iommu_lock);
  lock(prepare_lock);

 *** DEADLOCK ***

3 locks held by Xorg.bin/5413:
 #0:  (drm_global_mutex){+.+.+.}, at: [] drm_release+0x34/0x428
 #1:  (&dev->struct_mutex){+.+.+.}, at: [] drm_gem_object_handle_unreference_unlocked+0xcc/0x130
 #2:  (qcom_iommu_lock){+.+...}, at: [] qcom_iommu_unmap+0x1c/0x1f0

stack backtrace:
CPU: 1 PID: 5413 Comm: Xorg.bin Tainted: G        W      3.17.0-rc1-00050-g07a489b #802
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x98/0xb8)
[] (dump_stack) from [] (print_circular_bug+0x218/0x340)
[] (print_circular_bug) from [] (__lock_acquire+0x1d24/0x20b8)
[] (__lock_acquire) from [] (lock_acquire+0x9c/0xbc)
[] (lock_acquire) from [] (mutex_lock_nested+0x70/0x3e8)
[] (mutex_lock_nested) from [] (clk_prepare_lock+0x88/0xfc)
[] (clk_prepare_lock) from [] (clk_prepare+0xc/0x24)
[] (clk_prepare) from [] (__enable_clocks.isra.4+0x18/0xa4)
[] (__enable_clocks.isra.4) from [] (__flush_iotlb_va+0xe0/0x114)
[] (__flush_iotlb_va) from [] (qcom_iommu_unmap+0xac/0x1f0)
[] (qcom_iommu_unmap) from [] (iommu_unmap+0x9c/0xe8)
[] (iommu_unmap) from [] (msm_iommu_unmap+0x64/0x84)
[] (msm_iommu_unmap) from [] (msm_gem_free_object+0x11c/0x338)
[] (msm_gem_free_object) from [] (drm_gem_object_handle_unreference_unlocked+0xfc/0x130)
[] (drm_gem_object_handle_unreference_unlocked) from [] (drm_gem_object_release_handle+0x50/0x68)
[] (drm_gem_object_release_handle) from [] (idr_for_each+0xa8/0xdc)
[] (idr_for_e
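The dependency chain in the report forms a cycle. As a rough illustration of what lockdep is flagging (this is not how lockdep itself is implemented), the sketch below records each "acquired B while holding A" edge listed in the splat and runs a depth-first search for a back edge. The enum names are made up for the sketch; the call sites in the comments are taken from the stack traces above.

```c
#include <stdbool.h>

/* Locks named in the splat (hypothetical enum for this sketch). */
enum lock_id {
	PREPARE_LOCK,
	I_MUTEX,		/* &sb->s_type->i_mutex_key#3 */
	MMAP_SEM,		/* &mm->mmap_sem */
	STRUCT_MUTEX,		/* &dev->struct_mutex */
	QCOM_IOMMU_LOCK,
	NR_LOCKS
};

/* dep[a][b] is true when b was acquired while a was held. */
static bool dep[NR_LOCKS][NR_LOCKS];

static void add_dep(enum lock_id held, enum lock_id acquired)
{
	dep[held][acquired] = true;
}

static void drop_dep(enum lock_id held, enum lock_id acquired)
{
	dep[held][acquired] = false;
}

/* DFS colours: 0 = unvisited, 1 = on the current path, 2 = finished. */
static bool dfs_finds_cycle(int node, int color[])
{
	color[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!dep[node][next])
			continue;
		if (color[next] == 1)
			return true;	/* back edge: circular dependency */
		if (color[next] == 0 && dfs_finds_cycle(next, color))
			return true;
	}
	color[node] = 2;
	return false;
}

static bool has_circular_dependency(void)
{
	int color[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (color[n] == 0 && dfs_finds_cycle(n, color))
			return true;
	return false;
}

/* The five "acquired while holding" edges shown in the report. */
static void record_splat_chain(void)
{
	add_dep(PREPARE_LOCK, I_MUTEX);		/* clk_debug_init -> debugfs_create_dir */
	add_dep(I_MUTEX, MMAP_SEM);		/* dcache_readdir -> filldir64 */
	add_dep(MMAP_SEM, STRUCT_MUTEX);	/* mmap_region -> drm_gem_mmap */
	add_dep(STRUCT_MUTEX, QCOM_IOMMU_LOCK);	/* msm_gem_get_iova_locked -> qcom_iommu_map */
	add_dep(QCOM_IOMMU_LOCK, PREPARE_LOCK);	/* qcom_iommu_unmap -> clk_prepare */
}
```

Dropping the prepare_lock -> i_mutex edge, i.e. no longer creating debugfs entries while holding prepare_lock, leaves an acyclic graph, which is the edge this patch series goes after.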
[PATCH 1/2] clk: Make __clk_lookup() use a list instead of tree search
In the near future we're going to move the prepare lock to be a per-clock ww_mutex. __clk_lookup() is called very deep in the set-rate path and we would like to avoid having to take all the locks in the clock tree to search for a clock (basically defeating the purpose of introducing per-clock locks). Introduce a new list that contains all clocks registered in the system and walk this list until the clock is found. Signed-off-by: Stephen Boyd --- Yeah this commit text could be updated and/or this could be squashed into the next patch. drivers/clk/clk.c | 52 + include/linux/clk-private.h | 1 + 2 files changed, 21 insertions(+), 32 deletions(-) diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index b76fa69b44cb..cf5df744cb21 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -33,8 +33,10 @@ static struct task_struct *enable_owner; static int prepare_refcnt; static int enable_refcnt; +static DEFINE_MUTEX(clk_lookup_lock); static HLIST_HEAD(clk_root_list); static HLIST_HEAD(clk_orphan_list); +static HLIST_HEAD(clk_lookup_list); static LIST_HEAD(clk_notifier_list); /*** locking ***/ @@ -670,46 +672,23 @@ out: } EXPORT_SYMBOL_GPL(__clk_is_enabled); -static struct clk *__clk_lookup_subtree(const char *name, struct clk *clk) -{ - struct clk *child; - struct clk *ret; - - if (!strcmp(clk->name, name)) - return clk; - - hlist_for_each_entry(child, &clk->children, child_node) { - ret = __clk_lookup_subtree(name, child); - if (ret) - return ret; - } - - return NULL; -} - struct clk *__clk_lookup(const char *name) { - struct clk *root_clk; - struct clk *ret; + struct clk *clk; if (!name) return NULL; - /* search the 'proper' clk tree first */ - hlist_for_each_entry(root_clk, &clk_root_list, child_node) { - ret = __clk_lookup_subtree(name, root_clk); - if (ret) - return ret; + mutex_lock(&clk_lookup_lock); + hlist_for_each_entry(clk, &clk_lookup_list, lookup_node) { + if (!strcmp(clk->name, name)) + goto found; } + clk = NULL; +found: + mutex_unlock(&clk_lookup_lock); - 
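The difference between the removed recursive search and the new flat walk can be shown in user space. This is a sketch with a hypothetical, stripped-down struct clk (plain pointers instead of hlist_node); locking is only indicated in comments, at the points where the patch takes clk_lookup_lock.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, stripped-down clock: tree links (first_child,
 * next_sibling) plus one link on a flat lookup list (lookup_next),
 * standing in for the patch's new lookup_node.
 */
struct clk {
	const char *name;
	struct clk *first_child;
	struct clk *next_sibling;
	struct clk *lookup_next;
};

static struct clk *clk_lookup_list;	/* every registered clock */

/* The patch does the equivalent hlist_add_head() under clk_lookup_lock. */
static void clk_add_to_lookup_list(struct clk *clk)
{
	clk->lookup_next = clk_lookup_list;
	clk_lookup_list = clk;
}

/* Old style: recursive search through a chain of sibling subtrees. */
static struct clk *lookup_tree(const char *name, struct clk *clk)
{
	for (; clk; clk = clk->next_sibling) {
		struct clk *ret;

		if (!strcmp(clk->name, name))
			return clk;
		ret = lookup_tree(name, clk->first_child);
		if (ret)
			return ret;
	}
	return NULL;
}

/* New style: flat walk that never follows parent/child links. */
static struct clk *lookup_list(const char *name)
{
	struct clk *clk;

	if (!name)
		return NULL;
	/* The patch holds clk_lookup_lock around this loop. */
	for (clk = clk_lookup_list; clk; clk = clk->lookup_next)
		if (!strcmp(clk->name, name))
			return clk;
	return NULL;
}
```

Both searches find the same clock; the list walk simply never touches the tree links, which is what would let __clk_lookup() avoid the per-clock locks protecting the tree.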
/* if not found, then search the orphan tree */ - hlist_for_each_entry(root_clk, &clk_orphan_list, child_node) { - ret = __clk_lookup_subtree(name, root_clk); - if (ret) - return ret; - } - - return NULL; + return clk; } /* @@ -1823,6 +1802,11 @@ int __clk_init(struct device *dev, struct clk *clk) clk->parent = __clk_init_parent(clk); + /* Insert into clock lookup list */ + mutex_lock(&clk_lookup_lock); + hlist_add_head(&clk->lookup_node, &clk_lookup_list); + mutex_unlock(&clk_lookup_lock); + /* * Populate clk->parent if parent has already been __clk_init'd. If * parent has not yet been __clk_init'd then place clk in the orphan @@ -2117,6 +2101,10 @@ void clk_unregister(struct clk *clk) hlist_del_init(&clk->child_node); + mutex_lock(&clk_lookup_lock); + hlist_del_init(&clk->lookup_node); + mutex_unlock(&clk_lookup_lock); + if (clk->prepare_count) pr_warn("%s: unregistering prepared clock: %s\n", __func__, clk->name); diff --git a/include/linux/clk-private.h b/include/linux/clk-private.h index efbf70b9fd84..3cd98a930006 100644 --- a/include/linux/clk-private.h +++ b/include/linux/clk-private.h @@ -48,6 +48,7 @@ struct clk { unsigned long accuracy; struct hlist_head children; struct hlist_node child_node; + struct hlist_node lookup_node; unsigned intnotifier_count; #ifdef CONFIG_DEBUG_FS struct dentry *dentry; -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
[Bug 79980] Random radeonsi crashes
https://bugs.freedesktop.org/show_bug.cgi?id=79980 --- Comment #129 from AdrianG --- Radeon 8550g/8670m - doesn't get past the login screen with 3.17-rc3. At least in rc1 I could get to the desktop, but then it would almost immediately hang (distro: Ubuntu 14.04 standard + Gnome 3.2). Works like a charm on kernel 3.14*
[PATCH -v3 2/4] drm/i915: split intel_update_plane into check() and commit()
On Wed, Sep 03, 2014 at 05:10:16PM -0300, Gustavo Padovan wrote: > From: Gustavo Padovan > > Due to the upcoming atomic modesetting feature we need to separate > some update functions into a check step that can fail and a commit > step that should, ideally, never fail. > > This commit splits intel_update_plane() and its commit part can still > fail due to the fb pinning procedure. Crap. I wrote a reply and somehow it seems to have gotten lost before I sent it out. I'll try to recall it all again... > > Signed-off-by: Gustavo Padovan > --- > drivers/gpu/drm/i915/intel_sprite.c | 253 > +--- > 1 file changed, 150 insertions(+), 103 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_sprite.c > b/drivers/gpu/drm/i915/intel_sprite.c > index 07a74ef..7b0d1a9 100644 > --- a/drivers/gpu/drm/i915/intel_sprite.c > +++ b/drivers/gpu/drm/i915/intel_sprite.c > @@ -845,57 +845,24 @@ static bool colorkey_enabled(struct intel_plane > *intel_plane) > } > > static int > -intel_update_plane(struct drm_plane *plane, struct drm_crtc *crtc, > -struct drm_framebuffer *fb, int crtc_x, int crtc_y, > -unsigned int crtc_w, unsigned int crtc_h, > -uint32_t src_x, uint32_t src_y, > -uint32_t src_w, uint32_t src_h) > +intel_check_sprite_plane(struct drm_plane *plane, > + struct intel_plane_state *state) > { > - struct drm_device *dev = plane->dev; > - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); > + struct intel_crtc *intel_crtc = to_intel_crtc(state->crtc); > struct intel_plane *intel_plane = to_intel_plane(plane); > - enum pipe pipe = intel_crtc->pipe; > + struct drm_framebuffer *fb = state->fb; > struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); > struct drm_i915_gem_object *obj = intel_fb->obj; > - struct drm_i915_gem_object *old_obj = intel_plane->obj; > - int ret; > - bool primary_enabled; > - bool visible; > + int crtc_x, crtc_y; > + unsigned int crtc_w, crtc_h; > + uint32_t src_x, src_y, src_w, src_h; > + struct drm_rect *src = &state->src; > + struct drm_rect 
*dst = &state->dst; > + struct drm_rect *orig_src = &state->orig_src; > + const struct drm_rect *clip = &state->clip; > int hscale, vscale; > int max_scale, min_scale; > int pixel_size = drm_format_plane_cpp(fb->pixel_format, 0); > - struct drm_rect src = { > - /* sample coordinates in 16.16 fixed point */ > - .x1 = src_x, > - .x2 = src_x + src_w, > - .y1 = src_y, > - .y2 = src_y + src_h, > - }; > - struct drm_rect dst = { > - /* integer pixels */ > - .x1 = crtc_x, > - .x2 = crtc_x + crtc_w, > - .y1 = crtc_y, > - .y2 = crtc_y + crtc_h, > - }; > - const struct drm_rect clip = { > - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0, > - .y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0, > - }; > - const struct { > - int crtc_x, crtc_y; > - unsigned int crtc_w, crtc_h; > - uint32_t src_x, src_y, src_w, src_h; > - } orig = { > - .crtc_x = crtc_x, > - .crtc_y = crtc_y, > - .crtc_w = crtc_w, > - .crtc_h = crtc_h, > - .src_x = src_x, > - .src_y = src_y, > - .src_w = src_w, > - .src_h = src_h, > - }; > > /* Don't modify another pipe's plane */ > if (intel_plane->pipe != intel_crtc->pipe) { > @@ -927,55 +894,55 @@ intel_update_plane(struct drm_plane *plane, struct > drm_crtc *crtc, > max_scale = intel_plane->max_downscale << 16; > min_scale = intel_plane->can_scale ? 
1 : (1 << 16); > > - drm_rect_rotate(&src, fb->width << 16, fb->height << 16, > + drm_rect_rotate(src, fb->width << 16, fb->height << 16, > intel_plane->rotation); > > - hscale = drm_rect_calc_hscale_relaxed(&src, &dst, min_scale, max_scale); > + hscale = drm_rect_calc_hscale_relaxed(src, dst, min_scale, max_scale); > BUG_ON(hscale < 0); > > - vscale = drm_rect_calc_vscale_relaxed(&src, &dst, min_scale, max_scale); > + vscale = drm_rect_calc_vscale_relaxed(src, dst, min_scale, max_scale); > BUG_ON(vscale < 0); > > - visible = drm_rect_clip_scaled(&src, &dst, &clip, hscale, vscale); > + state->visible = drm_rect_clip_scaled(src, dst, clip, hscale, vscale); > > - crtc_x = dst.x1; > - crtc_y = dst.y1; > - crtc_w = drm_rect_width(&dst); > - crtc_h = drm_rect_height(&dst); > + crtc_x = dst->x1; > + crtc_y = dst->y1; > + crtc_w = drm_rect_width(dst); > + crtc_h = drm_rect_height(dst); > > - if (visible) { > + if (state->visible) { > /* check again in case clipping clamped the results
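The check/commit split can be sketched generically: check() validates the request and derives state (here only the clipped destination and the visibility flag) and is the only step allowed to fail, while commit() just applies that state. The types and function names are hypothetical and far simpler than intel_plane_state; source-rectangle scaling and rotation are left out.

```c
#include <errno.h>
#include <stdbool.h>

struct rect { int x1, y1, x2, y2; };

/* Hypothetical, much-reduced plane state produced by check(). */
struct plane_state {
	struct rect dst;	/* requested destination, clipped in place */
	struct rect clip;	/* visible area of the CRTC */
	bool visible;
};

/* Hypothetical hardware registers, touched only by commit(). */
struct plane_hw {
	struct rect programmed;
	bool enabled;
};

static int max_i(int a, int b) { return a > b ? a : b; }
static int min_i(int a, int b) { return a < b ? a : b; }

/* check(): validates and derives state; may fail; never touches hardware. */
static int plane_check(struct plane_state *s)
{
	if (s->dst.x2 <= s->dst.x1 || s->dst.y2 <= s->dst.y1)
		return -EINVAL;

	s->dst.x1 = max_i(s->dst.x1, s->clip.x1);
	s->dst.y1 = max_i(s->dst.y1, s->clip.y1);
	s->dst.x2 = min_i(s->dst.x2, s->clip.x2);
	s->dst.y2 = min_i(s->dst.y2, s->clip.y2);
	s->visible = s->dst.x2 > s->dst.x1 && s->dst.y2 > s->dst.y1;
	return 0;
}

/* commit(): applies what check() computed; has no failure path. */
static void plane_commit(struct plane_hw *hw, const struct plane_state *s)
{
	hw->enabled = s->visible;
	if (s->visible)
		hw->programmed = s->dst;
}

/* The old single entry point becomes check() followed by commit(). */
static int plane_update(struct plane_hw *hw, struct plane_state *s)
{
	int ret = plane_check(s);

	if (ret)
		return ret;
	plane_commit(hw, s);
	return 0;
}
```

Keeping every failure in check() is what later allows an atomic update to validate all planes before committing any of them.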
ccf vs iommu vs drm locking fun
On 09/04/14 17:46, Rob Clark wrote: > So, I was looking at the below lockdep splat, and discussing it a bit > w/ sboyd on IRC, and came to a slightly disturbing realization.. > > The interaction between prepare_lock and debugfs bits is a little bit > worrying. In particular, it is probably not a good idea to assume > that anyone who needs to grab prepare_lock does not already hold > mmap_sem. Not holding mmap_sem or locks that interact w/ mmap_sem is > going to be pretty hard to avoid, at least for gpu drivers that are > using iommus that are using CCF ;-) I'm thinking one way to fix this is to replace the tree traversal for debugfs registration with a list iteration of all registered clocks. That way we don't hold the prepare mutex across debugfs directory/file creation. This should break the chain. Now that debugfs isn't a hierarchy, this becomes a lot easier: we just need to keep a linked list of all registered clocks. I already have that patch for my wwmutex series, but I didn't convert debugfs to use it. Two patches to follow. -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
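The proposal can be modelled in a single-threaded sketch: clocks go on a flat list under a separate lock at registration time, and the debugfs pass walks that list without ever taking prepare_lock. Locks are represented as "held" flags so the example runs without threads, and clk_debug_lock, create_debugfs_dir() and the *_model() functions are hypothetical stand-ins, not the code in the follow-up patches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model: a "mutex" is just a held flag. */
struct fake_mutex { bool held; };

static struct fake_mutex prepare_lock;
static struct fake_mutex clk_debug_lock;	/* hypothetical list lock */

static void fake_lock(struct fake_mutex *m)   { m->held = true; }
static void fake_unlock(struct fake_mutex *m) { m->held = false; }

struct clk {
	const char *name;
	struct clk *debug_next;	/* link on the flat list of registered clocks */
	bool has_dentry;
};

static struct clk *clk_debug_list;
static int dirs_created_under_prepare_lock;

/* Stand-in for debugfs_create_dir(), which takes i_mutex internally. */
static void create_debugfs_dir(struct clk *clk)
{
	if (prepare_lock.held)
		dirs_created_under_prepare_lock++;
	clk->has_dentry = true;
}

/* Registration: tree work under prepare_lock, list insertion under its own lock. */
static void clk_register_model(struct clk *clk)
{
	fake_lock(&prepare_lock);
	/* ... link clk into the parent/child tree here ... */
	fake_unlock(&prepare_lock);

	fake_lock(&clk_debug_lock);
	clk->debug_next = clk_debug_list;
	clk_debug_list = clk;
	fake_unlock(&clk_debug_lock);
}

/* Late init: walk the flat list; prepare_lock is never taken here. */
static void clk_debug_init_model(void)
{
	fake_lock(&clk_debug_lock);
	for (struct clk *c = clk_debug_list; c; c = c->debug_next)
		create_debugfs_dir(c);
	fake_unlock(&clk_debug_lock);
}
```

Because the list needs no tree walk, the directory-creation step (and the i_mutex it takes) no longer nests inside prepare_lock, which is the edge that closed the cycle.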
TTM placement & caching issue/questions
On Thu, 2014-09-04 at 16:59 +0900, Michel Dänzer wrote: > > Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm > not alone, at least with AGP 1x it seems to work quite well for most > people. So I don't see the justification for intentionally breaking it > completely for all of us. Oh, I wasn't arguing for breaking it, just jesting. We need to keep it working. It's amazing how well broken stuff actually works, though :-) I mean, it's architecturally broken, and if we get a collision between the cache and the NCU, the chip will crash. We just get lucky, I suppose. Anyway, I'll try a different approach tomorrow and see how it goes. Cheers, Ben.
[PATCH -v3 4/4] drm/i915: split intel_primary_plane_setplane() into check() and commit()
On Wed, Sep 03, 2014 at 05:10:18PM -0300, Gustavo Padovan wrote: > From: Gustavo Padovan > > As a preparation for atomic updates we need to split the code to check > everything we are going to commit first. This patch starts the work to > split intel_primary_plane_setplane() into check() and commit() parts. > > More work is expected on this to get a better split of the two steps. > Ideally the commit() step should never fail. > > Signed-off-by: Gustavo Padovan > --- > drivers/gpu/drm/i915/intel_display.c | 130 > +++ > 1 file changed, 72 insertions(+), 58 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index c3f1967..1e3985b 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -11663,63 +11663,37 @@ disable_unpin: > } > > static int > -intel_primary_plane_setplane(struct drm_plane *plane, struct drm_crtc *crtc, > - struct drm_framebuffer *fb, int crtc_x, int crtc_y, > - unsigned int crtc_w, unsigned int crtc_h, > - uint32_t src_x, uint32_t src_y, > - uint32_t src_w, uint32_t src_h) > +intel_check_primary_plane(struct drm_plane *plane, > + struct intel_plane_state *state) > +{ > + struct drm_crtc *crtc = state->crtc; > + struct drm_framebuffer *fb = state->fb; > + struct drm_rect *dest = &state->dst; > + struct drm_rect *src = &state->src; > + const struct drm_rect *clip = &state->clip; > + > + return drm_plane_helper_check_update(plane, crtc, fb, > + src, dest, clip, > + DRM_PLANE_HELPER_NO_SCALING, > + DRM_PLANE_HELPER_NO_SCALING, > + false, true, &state->visible); > +} > + > +static int > +intel_commit_primary_plane(struct drm_plane *plane, > +struct intel_plane_state *state) > { > + struct drm_crtc *crtc = state->crtc; > + struct drm_framebuffer *fb = state->fb; > struct drm_device *dev = crtc->dev; > struct drm_i915_private *dev_priv = dev->dev_private; > struct intel_crtc *intel_crtc = to_intel_crtc(crtc); > struct drm_i915_gem_object *obj = 
intel_fb_obj(fb); > struct drm_i915_gem_object *old_obj = intel_fb_obj(plane->fb); > - struct drm_rect dest = { > - /* integer pixels */ > - .x1 = crtc_x, > - .y1 = crtc_y, > - .x2 = crtc_x + crtc_w, > - .y2 = crtc_y + crtc_h, > - }; > - struct drm_rect src = { > - /* 16.16 fixed point */ > - .x1 = src_x, > - .y1 = src_y, > - .x2 = src_x + src_w, > - .y2 = src_y + src_h, > - }; > - const struct drm_rect clip = { > - /* integer pixels */ > - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0, > - .y2 = intel_crtc->active ? intel_crtc->config.pipe_src_h : 0, > - }; > - const struct { > - int crtc_x, crtc_y; > - unsigned int crtc_w, crtc_h; > - uint32_t src_x, src_y, src_w, src_h; > - } orig = { > - .crtc_x = crtc_x, > - .crtc_y = crtc_y, > - .crtc_w = crtc_w, > - .crtc_h = crtc_h, > - .src_x = src_x, > - .src_y = src_y, > - .src_w = src_w, > - .src_h = src_h, > - }; > struct intel_plane *intel_plane = to_intel_plane(plane); > - bool visible; > + struct drm_rect *src = &state->src; > int ret; > > - ret = drm_plane_helper_check_update(plane, crtc, fb, > - &src, &dest, &clip, > - DRM_PLANE_HELPER_NO_SCALING, > - DRM_PLANE_HELPER_NO_SCALING, > - false, true, &visible); > - > - if (ret) > - return ret; > - > /* >* If the CRTC isn't enabled, we're just pinning the framebuffer, >* updating the fb pointer, and returning without touching the > @@ -11754,7 +11728,7 @@ intel_primary_plane_setplane(struct drm_plane *plane, > struct drm_crtc *crtc, >* happens if userspace explicitly disables the plane by passing fb=0 >* because plane->fb still gets set and pinned. >*/ > - if (!visible) { > + if (!state->visible) { > mutex_lock(&dev->struct_mutex); > > /* > @@ -11801,7 +11775,7 @@ intel_primary_plane_setplane(struct drm_plane *plane, > struct drm_crtc *crtc, > intel_disable_fbc(dev); > } > } > - ret = intel_pipe_set_base(crtc, src.x1, src.y1, fb); > + ret = intel_pip
TTM placement & caching issue/questions
On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote: > > This will, from what I can tell, try to use the same caching mode as the > > original object: > > > > if ((cur_placement & caching) != 0) > > result |= (cur_placement & caching); > > > > And cur_placement comes from bo->mem.placement which as far as I can > > tell is based on the placement array which the drivers set up. > > This originates from the fact that when evicting GTT memory, on x86 it's > unnecessary and undesirable to switch caching mode when going to system. But that's what I don't quite understand. We have two different mappings here: the VRAM and the memory object. We wouldn't be "switching"... we are creating a temporary mapping for the memory object in order to do the memcpy, but we seem to be doing it using the caching attributes of the VRAM object, or am I missing something? I don't see how that makes sense, so I suppose I'm missing something here :-) > Last time I tested, (and it seems like Michel is on the same track), > writing with the CPU to write-combined memory was substantially faster > than writing to cached memory, with the additional side-effect that CPU > caches are left unpolluted. That's very strange indeed. It's certainly an x86-specific artifact: even if we were allowed by our hypervisor to map memory non-cachable (the HW somewhat can), we tend to get higher throughput by going cachable, but that could be due to the way the PowerBus works (it's basically very biased toward cachable transactions). > I dislike the approach of rewriting placements. In some cases I think it > won't even work, because placements are declared 'static const' > > What I'd suggest is instead to intercept the driver response from > init_mem_type() and filter out undesired caching modes from > available_caching and default_caching, This was my original intent, but Jerome seems to have different ideas (see his proposed patches).
I'm happy to revive mine as well and post it as an alternative after I've tested it a bit more (tomorrow). > perhaps also looking at whether > the memory type is mappable or not. This should have the additional > benefit of working everywhere, and if a caching mode is selected that's > not available on the platform, you'll simply get an error. (I guess?) You mean that if it's not mappable we don't bother filtering? For me the rule is really pretty simple:

- If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable.
- If it's PCIe memory space (VRAM, registers, ...), it MUST be non-cachable.

Cheers, Ben. > /Thomas
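Ben's rule, applied the way Thomas suggests (filtering what the driver reports in available_caching/default_caching instead of rewriting placements), could look roughly like the sketch below. The CACHING_* flags and struct mem_type_caps are hypothetical and only mirror the roles of the TTM caching flags and memory-type fields; the PowerMac AGP exception discussed in this thread is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caching flags mirroring the roles of TTM's caching flags. */
#define CACHING_CACHED    (1u << 0)
#define CACHING_UNCACHED  (1u << 1)
#define CACHING_WC        (1u << 2)

/* Hypothetical subset of what a driver reports for one memory type. */
struct mem_type_caps {
	uint32_t available_caching;
	uint32_t default_caching;
};

/*
 * System memory (PL_SYSTEM/PL_TT) must be cachable; bus memory (VRAM,
 * registers) must not be. Returns -1 if nothing usable is left, so a
 * driver asking for an impossible mode gets an error instead of a
 * silently wrong mapping.
 */
static int filter_caching(struct mem_type_caps *caps, bool is_system_memory)
{
	uint32_t allowed = is_system_memory ? CACHING_CACHED
					    : (CACHING_UNCACHED | CACHING_WC);

	caps->available_caching &= allowed;
	if (!caps->available_caching)
		return -1;

	/* Keep the driver's default if it is still allowed, else pick one. */
	if (!(caps->default_caching & caps->available_caching)) {
		if (caps->available_caching & CACHING_CACHED)
			caps->default_caching = CACHING_CACHED;
		else if (caps->available_caching & CACHING_WC)
			caps->default_caching = CACHING_WC;
		else
			caps->default_caching = CACHING_UNCACHED;
	}
	return 0;
}
```

Filtering at this level leaves 'static const' placement arrays untouched, which addresses Thomas's objection to rewriting placements.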
[PATCH -v3 3/4] drm/i915: split intel_cursor_plane_update() into check() and commit()
On Wed, Sep 03, 2014 at 05:10:17PM -0300, Gustavo Padovan wrote: > From: Gustavo Padovan > > Due to the upcoming atomic modesetting feature we need to separate > some update functions into a check step that can fail and a commit > step that should, ideally, never fail. > > The commit part can still fail, but that should be solved in another > upcoming patch. > > Signed-off-by: Gustavo Padovan > --- > drivers/gpu/drm/i915/intel_display.c | 104 > ++- > 1 file changed, 67 insertions(+), 37 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 22d3902..c3f1967 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -11896,51 +11896,42 @@ intel_cursor_plane_disable(struct drm_plane *plane) > } > > static int > -intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc, > - struct drm_framebuffer *fb, int crtc_x, int crtc_y, > - unsigned int crtc_w, unsigned int crtc_h, > - uint32_t src_x, uint32_t src_y, > - uint32_t src_w, uint32_t src_h) > +intel_check_cursor_plane(struct drm_plane *plane, > + struct intel_plane_state *state) > { > - struct intel_crtc *intel_crtc = to_intel_crtc(crtc); > - struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); > - struct drm_i915_gem_object *obj = intel_fb->obj; > - struct drm_rect dest = { > - /* integer pixels */ > - .x1 = crtc_x, > - .y1 = crtc_y, > - .x2 = crtc_x + crtc_w, > - .y2 = crtc_y + crtc_h, > - }; > - struct drm_rect src = { > - /* 16.16 fixed point */ > - .x1 = src_x, > - .y1 = src_y, > - .x2 = src_x + src_w, > - .y2 = src_y + src_h, > - }; > - const struct drm_rect clip = { > - /* integer pixels */ > - .x2 = intel_crtc->active ? intel_crtc->config.pipe_src_w : 0, > - .y2 = intel_crtc->active ? 
intel_crtc->config.pipe_src_h : 0, > - }; > - bool visible; > - int ret; > + struct drm_crtc *crtc = state->crtc; > + struct drm_framebuffer *fb = state->fb; > + struct drm_rect *dest = &state->dst; > + struct drm_rect *src = &state->src; > + const struct drm_rect *clip = &state->clip; > > - ret = drm_plane_helper_check_update(plane, crtc, fb, > - &src, &dest, &clip, > + return drm_plane_helper_check_update(plane, crtc, fb, > + src, dest, clip, > DRM_PLANE_HELPER_NO_SCALING, > DRM_PLANE_HELPER_NO_SCALING, > - true, true, &visible); > - if (ret) > - return ret; > + true, true, &state->visible); > +} > > - crtc->cursor_x = crtc_x; > - crtc->cursor_y = crtc_y; > +static int > +intel_commit_cursor_plane(struct drm_plane *plane, > + struct intel_plane_state *state) > +{ > + struct drm_crtc *crtc = state->crtc; > + struct drm_framebuffer *fb = state->fb; > + struct intel_crtc *intel_crtc = to_intel_crtc(crtc); > + struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); > + struct drm_i915_gem_object *obj = intel_fb->obj; > + struct drm_rect *dest = &state->dst; > + int crtc_w, crtc_h; > + > + crtc->cursor_x = state->dst.x1; > + crtc->cursor_y = state->dst.y1; > if (fb != crtc->cursor->fb) { > + crtc_w = drm_rect_width(dest); > + crtc_h = drm_rect_height(dest); These would need to be the original unclipped coordinates since we program the cursor hardware with those and the hardware clips itself. 
> return intel_crtc_cursor_set_obj(crtc, obj, crtc_w, crtc_h); > } else { > - intel_crtc_update_cursor(crtc, visible); > + intel_crtc_update_cursor(crtc, state->visible); > > intel_frontbuffer_flip(crtc->dev, > > INTEL_FRONTBUFFER_CURSOR(intel_crtc->pipe)); > @@ -11948,6 +11939,45 @@ intel_cursor_plane_update(struct drm_plane *plane, > struct drm_crtc *crtc, > return 0; > } > } > + > +static int > +intel_cursor_plane_update(struct drm_plane *plane, struct drm_crtc *crtc, > + struct drm_framebuffer *fb, int crtc_x, int crtc_y, > + unsigned int crtc_w, unsigned int crtc_h, > + uint32_t src_x, uint32_t src_y, > + uint32_t src_w, uint32_t src_h) > +{ > + struct intel_crtc *intel_crtc = to_intel_crtc(crtc); > + struct intel_plane_state state; > + int ret; > + > + state.crtc = crtc; > + state.fb = fb; >
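Ville's point is that the cursor commit should program the hardware from the unclipped request, because the cursor hardware does its own clipping. A sketch with hypothetical types: the state keeps the original rectangle next to the clipped one, and commit() uses the original for both size and position (the code this patch removes also used the unclipped crtc_x/crtc_y for the cursor position).

```c
#include <stdbool.h>

struct rect { int x1, y1, x2, y2; };

static int rect_width(const struct rect *r)  { return r->x2 - r->x1; }
static int rect_height(const struct rect *r) { return r->y2 - r->y1; }

/* Hypothetical cursor state keeping the unclipped request next to the clipped one. */
struct cursor_state {
	struct rect orig_dst;	/* position and size as requested by userspace */
	struct rect dst;	/* clipped to the CRTC; only used for visibility */
	bool visible;
};

/* Hypothetical cursor registers. */
struct cursor_hw {
	int x, y, w, h;
	bool on;
};

static void cursor_commit(struct cursor_hw *hw, const struct cursor_state *s)
{
	/*
	 * The cursor hardware clips on its own, so program it with the
	 * original (unclipped) position and size rather than with s->dst.
	 */
	hw->x = s->orig_dst.x1;
	hw->y = s->orig_dst.y1;
	hw->w = rect_width(&s->orig_dst);
	hw->h = rect_height(&s->orig_dst);
	hw->on = s->visible;
}
```

With a 64x64 cursor hanging off the top-left corner, the clipped rectangle would shrink it to 32x48, while the hardware still expects the full 64x64 image.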
TTM placement & caching issue/questions
On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote: > > +#else /* CONFIG_X86 */ > > +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t > *placement) > > +{ > > + if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) { > > + ttm->caching_state = tt_cached; > > + *placement &= ~TTM_PL_MASK_CACHING; > > + *placement |= TTM_PL_FLAG_CACHED; > > NAK, this will break AGP on PowerMacs. ... which doesn't work reliably anyway with DRI2 :-) The problem is ... with DRI1 I think we had tricks to take the AGP out of the linear mapping, but that went away, didn't we? In any case, we are playing with fire on these by allowing the cache paradox. It just happens that those old CPUs aren't *that* aggressive at speculative prefetch, and we probably rarely hit the lockups that they would cause... Michel, what do you recommend we do then? The patch I sent to double-check in ttm_io_prot() has a specific hack to avoid warning on PowerMac for the above reason, but we need to fix Jerome's patch if we want to keep that broken-by-design Mac AGP functionality going :-) Maybe we could add a similar ifdef in the above? Cheers, Ben.
[PATCH 4/9] drm/ast: Don't assume DVO enabled means SIL164 on uninitialized chips
It looks like the AST2400 comes up with the DVO enable bit set, which causes us to incorrectly assume we have a SIL164 regardless of the value of the scratch registers set up by the BMC firmware. So let's limit that test to the case where the chip has already been set up by a BIOS. Signed-off-by: Benjamin Herrenschmidt --- drivers/gpu/drm/ast/ast_main.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c index 556d065..48998b2 100644 --- a/drivers/gpu/drm/ast/ast_main.c +++ b/drivers/gpu/drm/ast/ast_main.c @@ -158,16 +158,22 @@ static int ast_detect_chip(struct drm_device *dev, bool *need_post) /* * VGACRA3 Enhanced Color Mode Register, check if DVO is already * enabled, in that case, assume we have a SIL164 TMDS transmitter +* +* Don't make that assumption if we the chip wasn't enabled and +* is at power-on reset, otherwise we'll incorrectly "detect" a +* SIL164 when there is none. */ - jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff); - if (jreg & 0x80) - ast->tx_chip_type = AST_TX_SIL164; + if (!*need_post) { + jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff); + if (jreg & 0x80) + ast->tx_chip_type = AST_TX_SIL164; + } if ((ast->chip == AST2300) || (ast->chip == AST2400)) { /* * On AST2300 and 2400, look the configuration set by the SoC in * the SOC scratch register #1 bits 11:8 (interestingly marked -* as "reserved" in the spec +* as "reserved" in the spec) */ jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff); switch (jreg) {
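The rule this patch introduces reduces to a small predicate: the DVO-enable bit (bit 7 of VGACRA3) only implies a SIL164 if firmware has already initialised the chip. The function and enum below are hypothetical; in the driver this logic sits inline in ast_detect_chip().

```c
#include <stdbool.h>
#include <stdint.h>

enum tx_chip { TX_NONE, TX_SIL164 };

/*
 * need_post: true when the chip is still at power-on reset (no BIOS ran).
 * cra3:      value read from VGACRA3 (Enhanced Color Mode Register).
 */
static enum tx_chip detect_sil164(bool need_post, uint8_t cra3)
{
	/* At power-on reset the AST2400 may have the DVO bit set anyway. */
	if (need_post)
		return TX_NONE;
	return (cra3 & 0x80) ? TX_SIL164 : TX_NONE;
}
```

On AST2300/AST2400 the SoC scratch register checked afterwards can still override this result, as in the unchanged code that follows the hunk.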
[PATCH 3/9] drm/ast: Properly initialize P2A base before using it in ast_init_3rdtx()
If the P2A has been used to target other SOC registers before that call, we're going to hit the wrong place, so make sure we set the base address up properly before using it. (P2A stands for PCIe-to-AHB bridge and is the bridge that allows accessing the AST's internal AHB bus using a relocatable 64k window in the second half of the PCIe MMIO BAR.) Signed-off-by: Benjamin Herrenschmidt --- drivers/gpu/drm/ast/ast_dp501.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/ast/ast_dp501.c b/drivers/gpu/drm/ast/ast_dp501.c index 5da4b62..7e2ddde 100644 --- a/drivers/gpu/drm/ast/ast_dp501.c +++ b/drivers/gpu/drm/ast/ast_dp501.c @@ -400,7 +400,18 @@ void ast_init_3rdtx(struct drm_device *dev) if (ast->tx_chip_type == AST_TX_SIL164) ast_init_dvo(dev); else { + /* +* Set DAC source to VGA mode in SCU2C via the P2A +* bridge. First configure the P2U to target the SCU +* in case it isn't at this stage. +*/ + ast_write32(ast, 0xf004, 0x1e6e); + ast_write32(ast, 0xf000, 0x1); + /* Then unlock the SCU with the magic password */ ast_write32(ast, 0x12000, 0x1688a8a8); + ast_write32(ast, 0x12000, 0x1688a8a8); + ast_write32(ast, 0x12000, 0x1688a8a8); + /* Finally, clear bits [17:16] of SCU2c */ data = ast_read32(ast, 0x1202c); data &= 0xfffc; ast_write32(ast, 0, data);
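The P2A window arithmetic behind the offsets in this patch can be sketched as follows: the 64k-aligned part of the AHB address selects the window, and the low 16 bits are accessed at 0x10000 plus that offset in the MMIO BAR. The helper names are hypothetical, and the SCU base address of 0x1e6e2000 used in the checks is an assumption, chosen because it is consistent with the 0x12000 and 0x1202c offsets used in the patch.

```c
#include <stdint.h>

/* MMIO offsets the patch uses to program and enable the P2A window. */
#define P2A_BASE_REG    0xf004u		/* 64k-aligned AHB base of the window */
#define P2A_ENABLE_REG  0xf000u		/* the patch writes 0x1 here */
#define P2A_WINDOW_OFF  0x10000u	/* window in the second half of the MMIO BAR */

/* Value to write to P2A_BASE_REG so that ahb_addr falls inside the window. */
static uint32_t p2a_window_base(uint32_t ahb_addr)
{
	return ahb_addr & 0xffff0000u;
}

/* MMIO offset at which ahb_addr is reachable once the window is set. */
static uint32_t p2a_mmio_offset(uint32_t ahb_addr)
{
	return P2A_WINDOW_OFF + (ahb_addr & 0x0000ffffu);
}
```

This is also why the patch must reprogram the base first: if an earlier user left the window pointing elsewhere, offset 0x12000 would reach some other SoC block rather than the SCU.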
[PATCH 2/9] drm/ast: POST chip at probe time if VGA not enabled
We need to do it on machines without a BIOS such as POWER8. Also, for
detection to work without triggering PCIe errors, we need to enable VGA
early on, inside ast_detect_chip().

While touching those files, replace a few hard-coded register numbers
with the corresponding symbolic constants.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ast/ast_drv.h  |  3 +++
 drivers/gpu/drm/ast/ast_main.c | 47 --
 drivers/gpu/drm/ast/ast_post.c | 23 +
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index a203a6f..78fc683 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -384,6 +384,9 @@ int ast_bo_push_sysram(struct ast_bo *bo);
 int ast_mmap(struct file *filp, struct vm_area_struct *vma);
 
 /* ast post */
+void ast_enable_vga(struct drm_device *dev);
+void ast_enable_mmio(struct drm_device *dev);
+bool ast_is_vga_enabled(struct drm_device *dev);
 void ast_post_gpu(struct drm_device *dev);
 u32 ast_mindwm(struct ast_private *ast, u32 r);
 void ast_moutdwm(struct ast_private *ast, u32 r, u32 v);
diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index c2ff793..556d065 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -63,7 +63,7 @@ uint8_t ast_get_index_reg_mask(struct ast_private *ast,
 }
 
 
-static int ast_detect_chip(struct drm_device *dev)
+static int ast_detect_chip(struct drm_device *dev, bool *need_post)
 {
 	struct ast_private *ast = dev->dev_private;
 	uint32_t data, jreg;
@@ -109,6 +109,21 @@ static int ast_detect_chip(struct drm_device *dev)
 		}
 	}
 
+	/*
+	 * If VGA isn't enabled, we need to enable now or subsequent
+	 * access to the scratch registers will fail. We also inform
+	 * our caller that it needs to POST the chip
+	 * (Assumption: VGA not enabled -> need to POST)
+	 */
+	if (!ast_is_vga_enabled(dev)) {
+		ast_enable_vga(dev);
+		ast_enable_mmio(dev);
+		DRM_INFO("VGA not enabled on entry, requesting chip POST\n");
+		*need_post = true;
+	} else
+		*need_post = false;
+
+	/* Check if we support wide screen */
 	switch (ast->chip) {
 	case AST1180:
 		ast->support_wide_screen = true;
@@ -124,6 +139,7 @@ static int ast_detect_chip(struct drm_device *dev)
 			ast->support_wide_screen = true;
 		else {
 			ast->support_wide_screen = false;
+			/* Read SCU7c (silicon revision register) */
 			ast_write32(ast, 0xf004, 0x1e6e);
 			ast_write32(ast, 0xf000, 0x1);
 			data = ast_read32(ast, 0x1207c);
@@ -136,11 +152,23 @@ static int ast_detect_chip(struct drm_device *dev)
 		break;
 	}
 
+	/* Check 3rd Tx option (digital output afaik) */
 	ast->tx_chip_type = AST_TX_NONE;
+
+	/*
+	 * VGACRA3 Enhanced Color Mode Register, check if DVO is already
+	 * enabled, in that case, assume we have a SIL164 TMDS transmitter
+	 */
 	jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
 	if (jreg & 0x80)
 		ast->tx_chip_type = AST_TX_SIL164;
+
 	if ((ast->chip == AST2300) || (ast->chip == AST2400)) {
+		/*
+		 * On AST2300 and 2400, look the configuration set by the SoC in
+		 * the SOC scratch register #1 bits 11:8 (interestingly marked
+		 * as "reserved" in the spec
+		 */
 		jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff);
 		switch (jreg) {
 		case 0x04:
@@ -161,6 +189,17 @@ static int ast_detect_chip(struct drm_device *dev)
 		}
 	}
 
+	/* Print stuff for diagnostic purposes */
+	switch(ast->tx_chip_type) {
+	case AST_TX_SIL164:
+		DRM_INFO("Using Sil164 TMDS transmitter\n");
+		break;
+	case AST_TX_DP501:
+		DRM_INFO("Using DP501 DisplayPort transmitter\n");
+		break;
+	default:
+		DRM_INFO("Analog VGA only\n");
+	}
 	return 0;
 }
 
@@ -345,6 +384,7 @@ static u32 ast_get_vram_info(struct drm_device *dev)
 int ast_driver_load(struct drm_device *dev, unsigned long flags)
 {
 	struct ast_private *ast;
+	bool need_post;
 	int ret = 0;
 
 	ast = kzalloc(sizeof(struct ast_private), GFP_KERNEL);
@@ -379,7 +419,7 @@ int ast_driver_load(struct drm_device *dev, unsigned long flags)
 		}
 	}
 
-	ast_detect_chip(dev);
+	ast_detect_chip(dev, &need_post);
 
 	if (ast->chip != AST1180) {
 		ast_get_dram_info(dev);
@@ -387,6 +427,9 @@ int ast_driver_load(struct drm_device *dev
[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()
On all current cache coherent powerpc processors, it is not legit
to map system memory non-cachable. This will cause aliases with
the linear mapping which can be fatal.

The TTM should generally avoid it after Jerome placement patches but
let's add a sanity check anyway to catch any possible remaining issue.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index ef2ac3c..48095be 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -37,6 +37,9 @@
 #include
 #include
 #include
+#if defined(__powerpc__)
+#include
+#endif
 
 void ttm_bo_free_old_node(struct ttm_buffer_object *bo)
 {
@@ -498,6 +501,20 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 		tmp = pgprot_noncached(tmp);
 
 #endif
+#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
+	/*
+	 * Using a non-cachable mapping of system memory on
+	 * cache coherent powerpc's can be fatal, let's make
+	 * sure this doesn't happen and warn if it does. The
+	 * only exception is powermac with AGP which has to
+	 * take the risk.
+	 */
+	if (!machine_is(powermac) &&
+	    ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
+	     (caching_flags & TTM_PL_FLAG_TT))) {
+		pr_err_once("TTM: Attempt to use a non-cached"
+			    " mapping on RAM unsupported !\n");
+		return tmp;
 	}
 #endif
 #if defined(__ia64__) || defined(__arm__) || defined(__powerpc__)
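The check added by this patch reduces to a predicate on the placement flags. A minimal user-space sketch of that predicate follows; it is not kernel code, the `SKETCH_PL_*` bit values are illustrative stand-ins for the TTM flags, and the `is_powermac` parameter stands in for `machine_is(powermac)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placement bits; the real TTM values live in the TTM headers. */
#define SKETCH_PL_FLAG_SYSTEM (1u << 0)
#define SKETCH_PL_FLAG_TT     (1u << 1)
#define SKETCH_PL_FLAG_VRAM   (1u << 2)

/*
 * Models the powerpc sanity check: a non-cached mapping of memory placed
 * in SYSTEM or TT would alias the cacheable linear mapping of RAM, so it
 * is refused unless the machine is a PowerMac (AGP is allowed the risk).
 * Returns true when the caller should keep the cached protection instead.
 */
bool sketch_refuse_uncached(uint32_t caching_flags, bool is_powermac)
{
	if (is_powermac)
		return false;
	return (caching_flags & (SKETCH_PL_FLAG_SYSTEM | SKETCH_PL_FLAG_TT)) != 0;
}
```

A `true` result corresponds to the patch's `pr_err_once()` followed by `return tmp`, i.e. falling back to the unmodified protection.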
[PATCH 8/9] drm/ttm: Clean usage of ttm_io_prot() with TTM_PL_FLAG_CACHED
Today, most callers of ttm_io_prot() check TTM_PL_FLAG_CACHED before
calling it since on some archs it will unconditionally create non-cached
mappings. But not all callers do, which is incorrect as far as I can tell.

Instead, move that check inside ttm_io_prot() itself for all archs
and make powerpc use the same implementation as ia64 and arm.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 19 ---
 drivers/gpu/drm/ttm/ttm_bo_vm.c   |  5 ++---
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index e31d48c..ef2ac3c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -487,28 +487,27 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
 
 pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
 {
+	/* Cached mappings need no adjustment */
+	if (caching_flags & TTM_PL_FLAG_CACHED)
+		return tmp;
+
 #if defined(__i386__) || defined(__x86_64__)
 	if (caching_flags & TTM_PL_FLAG_WC)
 		tmp = pgprot_writecombine(tmp);
 	else if (boot_cpu_data.x86 > 3)
 		tmp = pgprot_noncached(tmp);
 
-#elif defined(__powerpc__)
-	if (!(caching_flags & TTM_PL_FLAG_CACHED)) {
-		pgprot_val(tmp) |= _PAGE_NO_CACHE;
-		if (caching_flags & TTM_PL_FLAG_UNCACHED)
-			pgprot_val(tmp) |= _PAGE_GUARDED;
+#endif
 	}
 #endif
-#if defined(__ia64__) || defined(__arm__)
+#if defined(__ia64__) || defined(__arm__) || defined(__powerpc__)
 	if (caching_flags & TTM_PL_FLAG_WC)
 		tmp = pgprot_writecombine(tmp);
 	else
 		tmp = pgprot_noncached(tmp);
 #endif
 #if defined(__sparc__) || defined(__mips__)
-	if (!(caching_flags & TTM_PL_FLAG_CACHED))
-		tmp = pgprot_noncached(tmp);
+	tmp = pgprot_noncached(tmp);
 #endif
 	return tmp;
 }
@@ -567,9 +566,7 @@ static int ttm_bo_kmap_ttm(struct ttm_buffer_object *bo,
 		 * We need to use vmap to get the desired page protection
 		 * or to make the buffer object look contiguous.
 		 */
-		prot = (mem->placement & TTM_PL_FLAG_CACHED) ?
-			PAGE_KERNEL :
-			ttm_io_prot(mem->placement, PAGE_KERNEL);
+		prot = ttm_io_prot(mem->placement, PAGE_KERNEL);
 		map->bo_kmap_type = ttm_bo_map_vmap;
 		map->virtual = vmap(ttm->pages + start_page, num_pages,
 				    0, prot);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 0ce48e5..4ce8dc1 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -200,9 +200,8 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 						cvma.vm_page_prot);
 	} else {
 		ttm = bo->ttm;
-		if (!(bo->mem.placement & TTM_PL_FLAG_CACHED))
-			cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
-							cvma.vm_page_prot);
+		cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
+						cvma.vm_page_prot);
 
 		/* Allocate all page at once, most common usage */
 		if (ttm->bdev->driver->ttm_tt_populate(ttm)) {
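With the `TTM_PL_FLAG_CACHED` test hoisted to the top of `ttm_io_prot()`, the ia64/arm/powerpc path becomes a three-way decision. The sketch below models only that decision in user space; the flag values and the `sketch_prot` enum are assumptions standing in for the real TTM flags and the `pgprot_*()` helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical caching bits standing in for TTM_PL_FLAG_{CACHED,UNCACHED,WC}. */
#define SKETCH_PL_FLAG_CACHED   (1u << 16)
#define SKETCH_PL_FLAG_UNCACHED (1u << 17)
#define SKETCH_PL_FLAG_WC       (1u << 18)

/* The protection the sketch would pick; stands in for a pgprot_t. */
enum sketch_prot {
	SKETCH_PROT_UNCHANGED,    /* returned as-is (cached) */
	SKETCH_PROT_WRITECOMBINE, /* pgprot_writecombine() */
	SKETCH_PROT_NONCACHED,    /* pgprot_noncached() */
};

/*
 * Models the ia64/arm/powerpc path of ttm_io_prot() after the patch:
 * the CACHED check happens once, up front, so callers no longer need
 * to test the flag themselves before calling.
 */
enum sketch_prot sketch_io_prot(uint32_t caching_flags)
{
	if (caching_flags & SKETCH_PL_FLAG_CACHED)
		return SKETCH_PROT_UNCHANGED;
	if (caching_flags & SKETCH_PL_FLAG_WC)
		return SKETCH_PROT_WRITECOMBINE;
	return SKETCH_PROT_NONCACHED;
}
```

Because CACHED wins over every other bit, a placement carrying both CACHED and WC is left untouched, which is the behaviour the removed caller-side checks used to provide.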
[PATCH 7/9] drm: powerpc can use a simpler drm_io_prot()
What the code does is equivalent to the x86 code, so let's use it as well.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/drm_vm.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index 24e045c..ed02563 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -48,15 +48,11 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
 {
 	pgprot_t tmp = vm_get_page_prot(vma->vm_flags);
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
 	if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
 		tmp = pgprot_noncached(tmp);
 	else
 		tmp = pgprot_writecombine(tmp);
-#elif defined(__powerpc__)
-	pgprot_val(tmp) |= _PAGE_NO_CACHE;
-	if (map->type == _DRM_REGISTERS)
-		pgprot_val(tmp) |= _PAGE_GUARDED;
 #elif defined(__ia64__)
 	if (efi_range_is_wc(vma->vm_start, vma->vm_end -
 				    vma->vm_start))
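The shared x86/powerpc branch of `drm_io_prot()` picks between two protections. A hedged user-space model of that choice is below; the enum and flag names are hypothetical stand-ins for `_DRM_REGISTERS`, `_DRM_WRITE_COMBINING` and the `pgprot_*()` helpers. The equivalence claimed in the commit message presumably relies on powerpc's `pgprot_noncached()` adding `_PAGE_NO_CACHE | _PAGE_GUARDED` and `pgprot_writecombine()` adding `_PAGE_NO_CACHE` alone. One difference visible in the diff itself: register maps flagged `_DRM_WRITE_COMBINING` now get write-combining on powerpc, whereas the removed branch always guarded register maps.

```c
#include <assert.h>

/* Hypothetical stand-ins for the DRM map types and flag used by drm_io_prot(). */
enum sketch_map_type { SKETCH_DRM_FRAME_BUFFER, SKETCH_DRM_REGISTERS };
#define SKETCH_DRM_WRITE_COMBINING 0x1u

enum sketch_prot { SKETCH_PROT_NONCACHED, SKETCH_PROT_WRITECOMBINE };

/*
 * Models the shared x86/powerpc branch: register maps are uncached unless
 * the map explicitly asks for write-combining; everything else (e.g. a
 * frame buffer) is mapped write-combined.
 */
enum sketch_prot sketch_drm_io_prot(enum sketch_map_type type, unsigned int flags)
{
	if (type == SKETCH_DRM_REGISTERS && !(flags & SKETCH_DRM_WRITE_COMBINING))
		return SKETCH_PROT_NONCACHED;
	return SKETCH_PROT_WRITECOMBINE;
}
```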
[PATCH 6/9] drm/ttm: force cached mapping on non x86 platform
From: Jérôme Glisse

People interested in providing uncached or write-combined mappings on
their architecture need to do the groundwork inside their arch-specific
code to allow breaking the linear kernel mapping so that page mapping
attributes can be updated. In the meantime, force cached mappings for
non-x86 architectures.

Signed-off-by: Jérôme Glisse
Signed-off-by: Benjamin Herrenschmidt
---

[Minor compile fixes on top of Jerome original v3]

 drivers/gpu/drm/radeon/radeon_ttm.c |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c        |  2 +-
 drivers/gpu/drm/ttm/ttm_bo_util.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c        | 32 +---
 include/drm/ttm/ttm_bo_driver.h     |  2 +-
 5 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 72afe82..4dd5060 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -304,7 +304,7 @@ static int radeon_move_vram_ram(struct ttm_buffer_object *bo,
 		return r;
 	}
 
-	r = ttm_tt_set_placement_caching(bo->ttm, tmp_mem.placement);
+	r = ttm_tt_set_placement_caching(bo->ttm, &tmp_mem.placement);
 	if (unlikely(r)) {
 		goto out_cleanup;
 	}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 3da89d5..4dc21c3 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -305,7 +305,7 @@ static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
 				goto out_err;
 		}
 
-		ret = ttm_tt_set_placement_caching(bo->ttm, mem->placement);
+		ret = ttm_tt_set_placement_caching(bo->ttm, &mem->placement);
 		if (ret)
 			goto out_err;
 
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index 30e5d90..e31d48c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -59,7 +59,7 @@ int ttm_bo_move_ttm(struct ttm_buffer_object *bo,
 		old_mem->mem_type = TTM_PL_SYSTEM;
 	}
 
-	ret = ttm_tt_set_placement_caching(ttm, new_mem->placement);
+	ret = ttm_tt_set_placement_caching(ttm, &new_mem->placement);
 	if (unlikely(ret != 0))
 		return ret;
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index bf080ab..19ae8ee 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -89,14 +89,6 @@ static inline int ttm_tt_set_page_caching(struct page *p,
 
 	return ret;
 }
-#else /* CONFIG_X86 */
-static inline int ttm_tt_set_page_caching(struct page *p,
-					  enum ttm_caching_state c_old,
-					  enum ttm_caching_state c_new)
-{
-	return 0;
-}
-#endif /* CONFIG_X86 */
 
 /*
  * Change caching policy for the linear kernel map
@@ -149,19 +141,37 @@ out_err:
 	return ret;
 }
 
-int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement)
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
 {
 	enum ttm_caching_state state;
 
-	if (placement & TTM_PL_FLAG_WC)
+	if (*placement & TTM_PL_FLAG_WC)
 		state = tt_wc;
-	else if (placement & TTM_PL_FLAG_UNCACHED)
+	else if (*placement & TTM_PL_FLAG_UNCACHED)
 		state = tt_uncached;
 	else
 		state = tt_cached;
 
 	return ttm_tt_set_caching(ttm, state);
 }
+#else /* CONFIG_X86 */
+int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
+{
+	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
+		ttm->caching_state = tt_cached;
+		*placement &= ~TTM_PL_MASK_CACHING;
+		*placement |= TTM_PL_FLAG_CACHED;
+	} else {
+		if (*placement & TTM_PL_FLAG_WC)
+			ttm->caching_state = tt_wc;
+		else if (*placement & TTM_PL_FLAG_UNCACHED)
+			ttm->caching_state = tt_uncached;
+		else
+			ttm->caching_state = tt_cached;
+	}
+	return 0;
+}
+#endif /* CONFIG_X86 */
 EXPORT_SYMBOL(ttm_tt_set_placement_caching);
 
 void ttm_tt_destroy(struct ttm_tt *ttm)
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 1d9f0f1..cbc5ad2 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -669,7 +669,7 @@ extern int ttm_tt_swapin(struct ttm_tt *ttm);
  * hit RAM. This function may be very costly as it involves global TLB
  * and cache flushes and potential page splitting / combining.
  */
-extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement);
+extern int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement);
 extern int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_stora
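On non-x86 builds the new `ttm_tt_set_placement_caching()` rewrites the caller's placement instead of changing page attributes. The user-space sketch below models that rewrite; all `SK_*` names and bit values are hypothetical. One assumption worth flagging: in the TTM headers of this period `TTM_PL_TT` is a memory-type index (1) while `TTM_PL_FLAG_TT` is the matching bit, so the sketch tests two flag bits, which is presumably what the `(TTM_PL_TT | TTM_PL_FLAG_SYSTEM)` test intends.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values: memory-type bits and caching bits. */
#define SK_PL_FLAG_SYSTEM   (1u << 0)
#define SK_PL_FLAG_TT       (1u << 1)
#define SK_PL_FLAG_VRAM     (1u << 2)
#define SK_PL_FLAG_CACHED   (1u << 16)
#define SK_PL_FLAG_UNCACHED (1u << 17)
#define SK_PL_FLAG_WC       (1u << 18)
#define SK_PL_MASK_CACHING  (SK_PL_FLAG_CACHED | SK_PL_FLAG_UNCACHED | SK_PL_FLAG_WC)

enum sk_caching_state { SK_TT_CACHED, SK_TT_WC, SK_TT_UNCACHED };

/*
 * Models the non-x86 variant: placements backed by system pages (SYSTEM
 * or TT) are rewritten in place to CACHED, because the linear kernel
 * mapping cannot be changed; other placements keep the caching they
 * asked for. Returns the resulting caching state.
 */
enum sk_caching_state sk_set_placement_caching(uint32_t *placement)
{
	if (*placement & (SK_PL_FLAG_SYSTEM | SK_PL_FLAG_TT)) {
		*placement &= ~SK_PL_MASK_CACHING;
		*placement |= SK_PL_FLAG_CACHED;
		return SK_TT_CACHED;
	}
	if (*placement & SK_PL_FLAG_WC)
		return SK_TT_WC;
	if (*placement & SK_PL_FLAG_UNCACHED)
		return SK_TT_UNCACHED;
	return SK_TT_CACHED;
}
```

Passing the placement by pointer is what lets the callers (radeon, `ttm_bo_handle_move_mem()`, `ttm_bo_move_ttm()`) see the downgraded caching flags afterwards.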
[PATCH 5/9] drm/ast: Cleanup analog init code path
Move the MMIO mangling to a separate routine and actually disable
the DVO output when using pure analog.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ast/ast_dp501.c | 49 ++---
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_dp501.c b/drivers/gpu/drm/ast/ast_dp501.c
index 7e2ddde..76f07f3 100644
--- a/drivers/gpu/drm/ast/ast_dp501.c
+++ b/drivers/gpu/drm/ast/ast_dp501.c
@@ -379,11 +379,39 @@ static bool ast_init_dvo(struct drm_device *dev)
 	return true;
 }
 
+
+static void ast_init_analog(struct drm_device *dev)
+{
+	struct ast_private *ast = dev->dev_private;
+	u32 data;
+
+	/*
+	 * Set DAC source to VGA mode in SCU2C via the P2A
+	 * bridge. First configure the P2U to target the SCU
+	 * in case it isn't at this stage.
+	 */
+	ast_write32(ast, 0xf004, 0x1e6e);
+	ast_write32(ast, 0xf000, 0x1);
+
+	/* Then unlock the SCU with the magic password */
+	ast_write32(ast, 0x12000, 0x1688a8a8);
+	ast_write32(ast, 0x12000, 0x1688a8a8);
+	ast_write32(ast, 0x12000, 0x1688a8a8);
+
+	/* Finally, clear bits [17:16] of SCU2c */
+	data = ast_read32(ast, 0x1202c);
+	data &= 0xfffc;
+	ast_write32(ast, 0, data);
+
+	/* Disable DVO */
+	ast_set_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xcf, 0x00);
+}
+
 void ast_init_3rdtx(struct drm_device *dev)
 {
 	struct ast_private *ast = dev->dev_private;
 	u8 jreg;
-	u32 data;
+
 	if (ast->chip == AST2300 || ast->chip == AST2400) {
 		jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff);
 		switch (jreg & 0x0e) {
@@ -399,23 +427,8 @@ void ast_init_3rdtx(struct drm_device *dev)
 		default:
 			if (ast->tx_chip_type == AST_TX_SIL164)
 				ast_init_dvo(dev);
-			else {
-				/*
-				 * Set DAC source to VGA mode in SCU2C via the P2A
-				 * bridge. First configure the P2U to target the SCU
-				 * in case it isn't at this stage.
-				 */
-				ast_write32(ast, 0xf004, 0x1e6e);
-				ast_write32(ast, 0xf000, 0x1);
-				/* Then unlock the SCU with the magic password */
-				ast_write32(ast, 0x12000, 0x1688a8a8);
-				ast_write32(ast, 0x12000, 0x1688a8a8);
-				ast_write32(ast, 0x12000, 0x1688a8a8);
-				/* Finally, clear bits [17:16] of SCU2c */
-				data = ast_read32(ast, 0x1202c);
-				data &= 0xfffc;
-				ast_write32(ast, 0, data);
-			}
+			else
+				ast_init_analog(dev);
 		}
 	}
 }
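Clearing bits [17:16] of a 32-bit register value corresponds to ANDing with `0xfffcffff`. The short sketch below derives that mask from the bit positions, so the comment and the constant agree by construction. The macro and `scu2c_clear_dac_source()` are hypothetical names, and the sketch models only the arithmetic of the read-modify-write, not the P2A bridge setup or the SCU unlock sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with bits [17:16] cleared and all other bits set: 0xfffcffff. */
#define SCU2C_DAC_SRC_CLEAR_MASK (~(UINT32_C(3) << 16))

/*
 * Clears bits [17:16] of a 32-bit SCU2C value while preserving every
 * other bit, i.e. the "clear bits [17:16] of SCU2c" step applied to
 * the value read back from the SCU.
 */
uint32_t scu2c_clear_dac_source(uint32_t scu2c)
{
	return scu2c & SCU2C_DAC_SRC_CLEAR_MASK;
}
```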
[PATCH 4/9] drm/ast: Don't assume DVO enabled means SIL164 on uninitialized chips
It looks like the AST2400 comes up with the DVO enable bit set, which
causes us to incorrectly assume we have a SIL164 regardless of the value
of the scratch registers set up by the BMC firmware.

So let's limit that test to the case where the chip has already been
set up by a BIOS.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ast/ast_main.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index 556d065..48998b2 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -158,16 +158,22 @@ static int ast_detect_chip(struct drm_device *dev, bool *need_post)
 	/*
 	 * VGACRA3 Enhanced Color Mode Register, check if DVO is already
 	 * enabled, in that case, assume we have a SIL164 TMDS transmitter
+	 *
+	 * Don't make that assumption if we the chip wasn't enabled and
+	 * is at power-on reset, otherwise we'll incorrectly "detect" a
+	 * SIL164 when there is none.
 	 */
-	jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
-	if (jreg & 0x80)
-		ast->tx_chip_type = AST_TX_SIL164;
+	if (!*need_post) {
+		jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
+		if (jreg & 0x80)
+			ast->tx_chip_type = AST_TX_SIL164;
+	}
 
 	if ((ast->chip == AST2300) || (ast->chip == AST2400)) {
 		/*
 		 * On AST2300 and 2400, look the configuration set by the SoC in
 		 * the SOC scratch register #1 bits 11:8 (interestingly marked
-		 * as "reserved" in the spec
+		 * as "reserved" in the spec)
 		 */
 		jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff);
 		switch (jreg) {
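The patched heuristic can be modelled as a pure function of `need_post` and the VGACRA3 value. This sketch covers only the SIL164 guess; the later AST2300/AST2400 scratch-register override is deliberately left out, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_tx_chip { SKETCH_TX_NONE, SKETCH_TX_SIL164 };

/* Bit 7 of VGACRA3, tested as (jreg & 0x80) in the patch. */
#define SKETCH_VGACRA3_DVO_EN 0x80u

/*
 * The DVO-enable bit is only taken as evidence of a SIL164 when firmware
 * has already initialized the chip (need_post == false); at power-on
 * reset the bit may be set without any transmitter present.
 */
enum sketch_tx_chip sketch_detect_tx(bool need_post, uint8_t vgacra3)
{
	if (!need_post && (vgacra3 & SKETCH_VGACRA3_DVO_EN))
		return SKETCH_TX_SIL164;
	return SKETCH_TX_NONE;
}
```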
[PATCH 2/9] drm/ast: POST chip at probe time if VGA not enabled
We need to do it on machines without a BIOS such as POWER8. Also, for
detection to work without triggering PCIe errors, we need to enable VGA
early on, inside ast_detect_chip().

While touching those files, replace a few hard-coded register numbers
with the corresponding symbolic constants.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ast/ast_drv.h  |  3 +++
 drivers/gpu/drm/ast/ast_main.c | 47 --
 drivers/gpu/drm/ast/ast_post.c | 23 +
 3 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index a203a6f..78fc683 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -384,6 +384,9 @@ int ast_bo_push_sysram(struct ast_bo *bo);
 int ast_mmap(struct file *filp, struct vm_area_struct *vma);
 
 /* ast post */
+void ast_enable_vga(struct drm_device *dev);
+void ast_enable_mmio(struct drm_device *dev);
+bool ast_is_vga_enabled(struct drm_device *dev);
 void ast_post_gpu(struct drm_device *dev);
 u32 ast_mindwm(struct ast_private *ast, u32 r);
 void ast_moutdwm(struct ast_private *ast, u32 r, u32 v);
diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index c2ff793..556d065 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -63,7 +63,7 @@ uint8_t ast_get_index_reg_mask(struct ast_private *ast,
 }
 
 
-static int ast_detect_chip(struct drm_device *dev)
+static int ast_detect_chip(struct drm_device *dev, bool *need_post)
 {
 	struct ast_private *ast = dev->dev_private;
 	uint32_t data, jreg;
@@ -109,6 +109,21 @@ static int ast_detect_chip(struct drm_device *dev)
 		}
 	}
 
+	/*
+	 * If VGA isn't enabled, we need to enable now or subsequent
+	 * access to the scratch registers will fail. We also inform
+	 * our caller that it needs to POST the chip
+	 * (Assumption: VGA not enabled -> need to POST)
+	 */
+	if (!ast_is_vga_enabled(dev)) {
+		ast_enable_vga(dev);
+		ast_enable_mmio(dev);
+		DRM_INFO("VGA not enabled on entry, requesting chip POST\n");
+		*need_post = true;
+	} else
+		*need_post = false;
+
+	/* Check if we support wide screen */
 	switch (ast->chip) {
 	case AST1180:
 		ast->support_wide_screen = true;
@@ -124,6 +139,7 @@ static int ast_detect_chip(struct drm_device *dev)
 			ast->support_wide_screen = true;
 		else {
 			ast->support_wide_screen = false;
+			/* Read SCU7c (silicon revision register) */
 			ast_write32(ast, 0xf004, 0x1e6e);
 			ast_write32(ast, 0xf000, 0x1);
 			data = ast_read32(ast, 0x1207c);
@@ -136,11 +152,23 @@ static int ast_detect_chip(struct drm_device *dev)
 		break;
 	}
 
+	/* Check 3rd Tx option (digital output afaik) */
 	ast->tx_chip_type = AST_TX_NONE;
+
+	/*
+	 * VGACRA3 Enhanced Color Mode Register, check if DVO is already
+	 * enabled, in that case, assume we have a SIL164 TMDS transmitter
+	 */
 	jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xa3, 0xff);
 	if (jreg & 0x80)
 		ast->tx_chip_type = AST_TX_SIL164;
+
 	if ((ast->chip == AST2300) || (ast->chip == AST2400)) {
+		/*
+		 * On AST2300 and 2400, look the configuration set by the SoC in
+		 * the SOC scratch register #1 bits 11:8 (interestingly marked
+		 * as "reserved" in the spec
+		 */
 		jreg = ast_get_index_reg_mask(ast, AST_IO_CRTC_PORT, 0xd1, 0xff);
 		switch (jreg) {
 		case 0x04:
@@ -161,6 +189,17 @@ static int ast_detect_chip(struct drm_device *dev)
 		}
 	}
 
+	/* Print stuff for diagnostic purposes */
+	switch(ast->tx_chip_type) {
+	case AST_TX_SIL164:
+		DRM_INFO("Using Sil164 TMDS transmitter\n");
+		break;
+	case AST_TX_DP501:
+		DRM_INFO("Using DP501 DisplayPort transmitter\n");
+		break;
+	default:
+		DRM_INFO("Analog VGA only\n");
+	}
 	return 0;
 }
 
@@ -345,6 +384,7 @@ static u32 ast_get_vram_info(struct drm_device *dev)
 int ast_driver_load(struct drm_device *dev, unsigned long flags)
 {
 	struct ast_private *ast;
+	bool need_post;
 	int ret = 0;
 
 	ast = kzalloc(sizeof(struct ast_private), GFP_KERNEL);
@@ -379,7 +419,7 @@ int ast_driver_load(struct drm_device *dev, unsigned long flags)
 		}
 	}
 
-	ast_detect_chip(dev);
+	ast_detect_chip(dev, &need_post);
 
 	if (ast->chip != AST1180) {
 		ast_get_dram_info(dev);
@@ -387,6 +427,9 @@ int ast_driver_load(struct drm_device *dev,
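The new POST decision in `ast_detect_chip()` is a small state transition. This sketch models it in user space with a hypothetical `struct sketch_chip`; the two boolean writes stand in for `ast_enable_vga()` and `ast_enable_mmio()`, whose register-level behaviour is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the chip state the probe cares about. */
struct sketch_chip {
	bool vga_enabled;
	bool mmio_enabled;
};

/*
 * If VGA decoding is off, turn VGA and MMIO on (so later scratch-register
 * reads work) and report that the chip still needs to be POSTed. Relies
 * on the patch's stated assumption: "VGA not enabled -> need to POST".
 */
bool sketch_detect_need_post(struct sketch_chip *chip)
{
	if (chip->vga_enabled)
		return false;
	chip->vga_enabled = true;  /* stands in for ast_enable_vga() */
	chip->mmio_enabled = true; /* stands in for ast_enable_mmio() */
	return true;
}
```

A second probe of the same chip reports no POST needed, since the first call left VGA enabled.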
[PATCH 1/9] drm/ast: Try to use MMIO registers when PIO isn't supported
If the PIO resources haven't been assigned, then we have no choice but
to try to use the MMIO version. This is the case, for example, on POWER8,
which doesn't support PIO at all.

Chips rev 0x20 or later have MMIO decoding enabled by default.

Signed-off-by: Benjamin Herrenschmidt
---
 drivers/gpu/drm/ast/ast_drv.h  |  5 -
 drivers/gpu/drm/ast/ast_main.c | 20 +---
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_drv.h b/drivers/gpu/drm/ast/ast_drv.h
index 957d4fa..a203a6f 100644
--- a/drivers/gpu/drm/ast/ast_drv.h
+++ b/drivers/gpu/drm/ast/ast_drv.h
@@ -125,8 +125,9 @@ struct ast_gem_object;
 
 #define AST_IO_AR_PORT_WRITE		(0x40)
 #define AST_IO_MISC_PORT_WRITE		(0x42)
+#define AST_IO_VGA_ENABLE_PORT		(0x43)
 #define AST_IO_SEQ_PORT			(0x44)
-#define AST_DAC_INDEX_READ			(0x3c7)
+#define AST_IO_DAC_INDEX_READ		(0x47)
 #define AST_IO_DAC_INDEX_WRITE		(0x48)
 #define AST_IO_DAC_DATA			(0x49)
 #define AST_IO_GR_PORT			(0x4E)
@@ -134,6 +135,8 @@ struct ast_gem_object;
 #define AST_IO_INPUT_STATUS1_READ	(0x5A)
 #define AST_IO_MISC_PORT_READ		(0x4C)
 
+#define AST_IO_MM_OFFSET		(0x380)
+
 #define __ast_read(x) \
 static inline u##x ast_read##x(struct ast_private *ast, u32 reg) { \
 	u##x val = 0;\
diff --git a/drivers/gpu/drm/ast/ast_main.c b/drivers/gpu/drm/ast/ast_main.c
index a2cc6be..c2ff793 100644
--- a/drivers/gpu/drm/ast/ast_main.c
+++ b/drivers/gpu/drm/ast/ast_main.c
@@ -359,10 +359,24 @@ int ast_driver_load(struct drm_device *dev, unsigned long flags)
 		ret = -EIO;
 		goto out_free;
 	}
-	ast->ioregs = pci_iomap(dev->pdev, 2, 0);
+
+	/*
+	 * If we don't have IO space at all, use MMIO now and
+	 * assume the chip has MMIO enabled by default (rev 0x20
+	 * and higher).
+	 */
+	if (!(pci_resource_flags(dev->pdev, 2) & IORESOURCE_IO)) {
+		DRM_INFO("platform has no IO space, trying MMIO\n");
+		ast->ioregs = ast->regs + AST_IO_MM_OFFSET;
+	}
+
+	/* "map" IO regs if the above hasn't done so already */
 	if (!ast->ioregs) {
-		ret = -EIO;
-		goto out_free;
+		ast->ioregs = pci_iomap(dev->pdev, 2, 0);
+		if (!ast->ioregs) {
+			ret = -EIO;
+			goto out_free;
+		}
 	}
 
 	ast_detect_chip(dev);
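The register-window selection in `ast_driver_load()` can be modelled as picking a base pointer. In this hedged sketch the struct and function are hypothetical, `0x380` is the patch's `AST_IO_MM_OFFSET`, and a `NULL` result stands for the `-EIO` path taken when `pci_iomap()` fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IO_MM_OFFSET 0x380 /* value of AST_IO_MM_OFFSET in the patch */

/* Hypothetical summary of what the probe knows about the device. */
struct sketch_dev {
	bool bar2_is_io;      /* BAR 2 has IORESOURCE_IO set */
	uint8_t *regs;        /* already-mapped MMIO register BAR */
	uint8_t *pio_mapping; /* what mapping BAR 2 would return, or NULL */
};

/*
 * Without an I/O BAR, reach the VGA I/O registers through their MMIO
 * alias at regs + 0x380; otherwise map BAR 2 as before.
 */
uint8_t *sketch_pick_ioregs(const struct sketch_dev *d)
{
	if (!d->bar2_is_io)
		return d->regs + SKETCH_IO_MM_OFFSET;
	return d->pio_mapping;
}
```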
TTM placement & caching issue/questions
On 04.09.2014 16:59, Michel Dänzer wrote:
> On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
>> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>>> +#else /* CONFIG_X86 */
>>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>>>> +{
>>>> +	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>>> +		ttm->caching_state = tt_cached;
>>>> +		*placement &= ~TTM_PL_MASK_CACHING;
>>>> +		*placement |= TTM_PL_FLAG_CACHED;
>>>
>>> NAK, this will break AGP on PowerMacs.
>>
>> ... which doesn't work reliably anyway with DRI2 :-)
>
> Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm
> not alone, at least with AGP 1x it seems to work quite well for most
> people. So I don't see the justification for intentionally breaking it
> completely for all of us.

Even more so because PCI GART is unusably slow in general.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
TTM placement & caching issue/questions
On 04.09.2014 16:54, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 16:19 +0900, Michel Dänzer wrote:
>>> +#else /* CONFIG_X86 */
>>> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
>>> +{
>>> +	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
>>> +		ttm->caching_state = tt_cached;
>>> +		*placement &= ~TTM_PL_MASK_CACHING;
>>> +		*placement |= TTM_PL_FLAG_CACHED;
>>
>> NAK, this will break AGP on PowerMacs.
>
> ... which doesn't work reliably anyway with DRI2 :-)

Define 'not reliably'. I have uptimes of weeks, and I'm pretty sure I'm
not alone, at least with AGP 1x it seems to work quite well for most
people. So I don't see the justification for intentionally breaking it
completely for all of us.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
SI display gap for more than 2 displays
On Thu, Sep 04, 2014 at 03:52:20PM +0200, Sylvain BERTRAND wrote:
> Hi,
>
> In si_program_display_gap we have DISP1_GAP and DISP2_GAP.
>
> Where are DISP3_GAP to DISP6_GAP? What does this hardware block expect
> when more than 2 displays are connected? Does DISP2_GAP actually stand
> for DISP[3-6]_GAP?
>
> Still in the same function, what happened to the pipes for
> DCCG_DISP[2-6]_SLOW_SELECT?

I noticed something else: in si_enable_display_gap, the DISP1_GAP_MCHG
and DISP2_GAP_MCHG fields from CG_DISPLAY_GAP_CNTL are initialized to
vblank with DISP1 only, and are never reprogrammed for new displays the
way DISP[12]_GAP are. This seems inconsistent; is it expected?

regards,

-- 
Sylvain BERTRAND
[Bug 83505] New: AMD A4-5300 APU : only radeon.dpm=1 prevents random reboots with 3.16.1 kernel.
https://bugs.freedesktop.org/show_bug.cgi?id=83505

          Priority: medium
            Bug ID: 83505
          Assignee: dri-devel at lists.freedesktop.org
           Summary: AMD A4-5300 APU : only radeon.dpm=1 prevents random reboots with 3.16.1 kernel.
          Severity: minor
    Classification: Unclassified
                OS: Linux (All)
          Reporter: rpnpif at free.fr
          Hardware: x86-64 (AMD64)
            Status: NEW
           Version: unspecified
         Component: DRM/Radeon
           Product: DRI

Created attachment 105760
  --> https://bugs.freedesktop.org/attachment.cgi?id=105760&action=edit
dmesg with radeon.dpm=1

I have read that for this AMD A4-5300 APU (Aruba), passing radeon.dpm=1
on the kernel command line in GRUB is useless, because this option is
already enabled by default in the kernel.

But if I don't use this option, Debian 7.6 reboots randomly after some
minutes, hours or days of work when I use Iceweasel/Firefox with some
Flash animation, LibreOffice or another heavy application.

With this option, there are no reboots. Everything has worked fine for
several days.

This issue seems to be about the same as
https://bugs.freedesktop.org/show_bug.cgi?id=82201#c26 but I am not sure.

I am reporting this issue because dpm=1 should be redundant.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140904/b194d598/attachment.html>
[PATCH 9/9] drm/ttm: Sanity check mapping attributes on powerpc in ttm_io_prot()
On 04.09.2014 16:47, Benjamin Herrenschmidt wrote:
> On all current cache coherent powerpc processors, it is not legit
> to map system memory non-cachable. This will cause aliases with
> the linear mapping which can be fatal.
>
> The TTM should generally avoid it after Jerome placement patches but
> let's add a sanity check anyway to catch any possible remaining issue.
>
> Signed-off-by: Benjamin Herrenschmidt

[...]

> @@ -498,6 +501,20 @@ pgprot_t ttm_io_prot(uint32_t caching_flags, pgprot_t tmp)
>  		tmp = pgprot_noncached(tmp);
>
>  #endif
> +#if defined(__powerpc__) && !defined(CONFIG_NOT_COHERENT_CACHE)
> +	/*
> +	 * Using a non-cachable mapping of system memory on
> +	 * cache coherent powerpc's can be fatal, let's make
> +	 * sure this doesn't happen and warn if it does. The
> +	 * only exception is powermac with AGP which has to
> +	 * take the risk.
> +	 */
> +	if (!machine_is(powermac) &&
> +	    ((caching_flags & TTM_PL_FLAG_SYSTEM) ||
> +	     (caching_flags & TTM_PL_FLAG_TT))) {
> +		pr_err_once("TTM: Attempt to use a non-cached"
> +			    " mapping on RAM unsupported !\n");
> +		return tmp;

NAK, this breaks AGP on PowerMacs.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start
https://bugs.freedesktop.org/show_bug.cgi?id=83416

--- Comment #18 from Laurent carlier ---
(In reply to comment #17)
> Does this Mesa patch help?
>
> https://bugs.freedesktop.org/attachment.cgi?id=105755

No, it doesn't help.
[Bug 81239] Evolution window content not shown fully (only desktop background)
https://bugs.freedesktop.org/show_bug.cgi?id=81239

--- Comment #11 from Paul Menzel ---
I created ticket 736069 [1] in the GNOME Bugzilla bug tracker.

[1] https://bugzilla.gnome.org/show_bug.cgi?id=736069
TTM placement & caching issue/questions
On 04.09.2014 11:36, Jerome Glisse wrote:
> On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote:
>> On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote:
>>> On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
>>>
>>>> So in the meantime the attached patch should work, it just silently
>>>> ignore the caching attribute request on non x86 instead of pretending
>>>> that things are setup as expected and then latter the radeon ou nouveau
>>>> hw unsetting the snoop bit. It's not tested but i think it should work.
>>>
>>> I'm still getting placements with !CACHED going from bo_memcpy in
>>> ttm_io_prot() though ... I'm looking at filtering the placement
>>> attributes instead.
>>>
>>> Ben.
>>
>> Ok so this one should do the trick.
>
> Ok final version ... famous last word.

[...]

> +#else /* CONFIG_X86 */
> +int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t *placement)
> +{
> +	if (*placement & (TTM_PL_TT | TTM_PL_FLAG_SYSTEM)) {
> +		ttm->caching_state = tt_cached;
> +		*placement &= ~TTM_PL_MASK_CACHING;
> +		*placement |= TTM_PL_FLAG_CACHED;

NAK, this will break AGP on PowerMacs.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
TTM placement & caching issue/questions
On 04.09.2014 10:55, Jerome Glisse wrote:
>
> While i agree about the issue of incoherent double map of same page, i
> think we have more issue. For instance lattely AMD have been pushing a
> lot of patches to move things to use uncached memory for radeon and as
> usual thoses patches comes with no comment to the motivations of those
> changes.

That would have been a fair review comment...

> What i understand is that uncached mapping for some frequently use buffer
> give a significant performance boost (i am assuming this has to do with
> all the snoop pci transaction overhead).

Exactly, although it's a win even if the data is written by the CPU only
once and read by the GPU only once.

> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

GTT with AGP being one exception.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer
[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start
https://bugs.freedesktop.org/show_bug.cgi?id=83416

--- Comment #17 from Grigori Goronzy ---
Does this Mesa patch help?

https://bugs.freedesktop.org/attachment.cgi?id=105755
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
Hey,

On 04-09-14 at 15:34, Christian König wrote:
>> I need to check the docs how to do this correctly,
> The docs don't really cover this case.
>
> For the GPU waiting on an address there is an extra document just for this
> case which I don't have at hand right now. But IIRC it was recommended to use
> the local memory of the device waiting on the semaphore. I'm just not sure if
> that's for pure performance reasons to avoid accessing the bus or if there's
> a hard and unavoidable hardware reason to do so.
>
> For the GPU signaling case there is a special bit in the semaphore
> instructions that you need to set if any user outside of the GPU should see
> the write.
>
> In general it is explicitly supported to use semaphores for inter device
> synchronization on the bus (that's what the block is made for), but it's not
> intended to be used for synchronization between the CPU and the device. So
> I'm not sure if things like cache snooping is implemented and correctly
> supported.
>
> Well I see the feature more like nice to have and needs a bunch of testing,
> so I would say either wait with the patch for now or make it optional to use
> or something like that.

You're right, it's meant as something 'nice to have'. This is why it came
after the patch that exports reservation_object to/from dma-buf. :-)

~Maarten
SI display gap for more than 2 displays
Hi,

In si_program_display_gap we have DISP1_GAP and DISP2_GAP.

Where are DISP3_GAP to DISP6_GAP? What does this hardware block expect
when more than 2 displays are connected? Does DISP2_GAP actually stand
for DISP[3-6]_GAP?

Still in the same function, what happened to the pipes for
DCCG_DISP[2-6]_SLOW_SELECT?

regards,

-- 
Sylvain

P.S. It seems that all this was "fixed" in CI with new hardware blocks,
but I'm focussing on SI blocks.
[Bug 83500] si_dma_copy_tile causes GPU hangs
https://bugs.freedesktop.org/show_bug.cgi?id=83500

--- Comment #2 from Grigori Goronzy ---
Created attachment 105755
  --> https://bugs.freedesktop.org/attachment.cgi?id=105755&action=edit
Better fix

This is a possibly better fix that only disables DMA if 1D tiling is
involved. Please give it a try.
[PATCH v2] drm/radeon: reduce memory footprint for debugging
There is no need to use hex_dump_to_buffer() since we have a kernel
helper to dump up to 64 bytes just via printk(). In our case the actual
size is 15 bytes.

Signed-off-by: Andy Shevchenko
---
 drivers/gpu/drm/radeon/atombios_dp.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
index 95ea276..95d5d4a 100644
--- a/drivers/gpu/drm/radeon/atombios_dp.c
+++ b/drivers/gpu/drm/radeon/atombios_dp.c
@@ -405,16 +405,13 @@ bool radeon_dp_getdpcd(struct radeon_connector *radeon_connector)
 	u8 msg[DP_DPCD_SIZE];
 	int ret;
 
-	char dpcd_hex_dump[DP_DPCD_SIZE * 3];
-
 	ret = drm_dp_dpcd_read(&radeon_connector->ddc_bus->aux, DP_DPCD_REV, msg,
 			       DP_DPCD_SIZE);
 	if (ret > 0) {
 		memcpy(dig_connector->dpcd, msg, DP_DPCD_SIZE);
-		hex_dump_to_buffer(dig_connector->dpcd, sizeof(dig_connector->dpcd),
-				   32, 1, dpcd_hex_dump, sizeof(dpcd_hex_dump), false);
-		DRM_DEBUG_KMS("DPCD: %s\n", dpcd_hex_dump);
+		DRM_DEBUG_KMS("DPCD: %*ph\n", (int)sizeof(dig_connector->dpcd),
+			      dig_connector->dpcd);
 
 		radeon_dp_probe_oui(radeon_connector);
 
-- 
2.1.0
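To see what the `%*ph` line produces, here is a user-space approximation of the kernel's format: each byte as two lowercase hex digits, separated by single spaces. It is an illustration only; `sketch_hex_ph()` is a hypothetical helper, not a kernel API. In the kernel, `%*ph` consumes an `int` length first and then the buffer pointer, which is the argument order this v2 patch uses (the earlier posting passes them the other way round).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Approximates the kernel's "%*ph" output in user space: every byte as
 * two lowercase hex digits, separated by single spaces. 'out' must hold
 * at least 3 * len + 1 bytes. Returns 'out'.
 */
char *sketch_hex_ph(const unsigned char *buf, size_t len, char *out)
{
	char *p = out;
	size_t i;

	*p = '\0';
	for (i = 0; i < len; i++)
		p += sprintf(p, i ? " %02x" : "%02x", (unsigned int)buf[i]);
	return out;
}
```

For a 15-byte DPCD block this yields a 44-character line, with no intermediate `char[DP_DPCD_SIZE * 3]` buffer needed on the kernel side.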
[PATCH] drm/radeon: reduce memory footprint for debugging
There is no need to use hex_dump_to_buffer() since we have a kernel helper to dump up to 64 bytes just via printk(). In our case the actual size is 15 bytes. Signed-off-by: Andy Shevchenko --- drivers/gpu/drm/radeon/atombios_dp.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c index 95ea276..4e75c48 100644 --- a/drivers/gpu/drm/radeon/atombios_dp.c +++ b/drivers/gpu/drm/radeon/atombios_dp.c @@ -405,16 +405,13 @@ bool radeon_dp_getdpcd(struct radeon_connector *radeon_connector) u8 msg[DP_DPCD_SIZE]; int ret; - char dpcd_hex_dump[DP_DPCD_SIZE * 3]; - ret = drm_dp_dpcd_read(&radeon_connector->ddc_bus->aux, DP_DPCD_REV, msg, DP_DPCD_SIZE); if (ret > 0) { memcpy(dig_connector->dpcd, msg, DP_DPCD_SIZE); - hex_dump_to_buffer(dig_connector->dpcd, sizeof(dig_connector->dpcd), - 32, 1, dpcd_hex_dump, sizeof(dpcd_hex_dump), false); - DRM_DEBUG_KMS("DPCD: %s\n", dpcd_hex_dump); + DRM_DEBUG_KMS("DPCD: %*ph\n", dig_connector->dpcd, + (int)sizeof(dig_connector->dpcd)); radeon_dp_probe_oui(radeon_connector); -- 2.1.0
[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start
https://bugs.freedesktop.org/show_bug.cgi?id=83416 Laurent carlier changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #16 from Laurent carlier --- Bad luck, it's hanging again! -> reopened
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
> I need to check the docs on how to do this correctly, The docs don't really cover this case. For the GPU waiting on an address there is an extra document just for this case, which I don't have at hand right now. But IIRC it was recommended to use the local memory of the device waiting on the semaphore. I'm just not sure whether that's purely for performance reasons, to avoid accessing the bus, or whether there's a hard and unavoidable hardware reason to do so. For the GPU signaling case there is a special bit in the semaphore instructions that you need to set if any user outside of the GPU should see the write. In general, using semaphores for inter-device synchronization on the bus is explicitly supported (that's what the block is made for), but they are not intended for synchronization between the CPU and the device. So I'm not sure whether things like cache snooping are implemented and correctly supported. I see the feature more as a nice-to-have that needs a bunch of testing, so I would say either hold off on the patch for now or make it optional, or something like that. Regards, Christian. On 04.09.2014 at 14:25, Christian König wrote: > On 04.09.2014 at 14:08, Maarten Lankhorst wrote: >> Hey, >> >> On 04-09-14 at 13:54, Christian König wrote: >>> On 04.09.2014 at 13:42, Maarten Lankhorst wrote: Use the semaphore mechanism to make this happen, this uses signaling from the cpu instead of signaling by the gpu. >>> I'm not sure if this will work reliably when the semaphores are in >>> system memory. We might need to reserve some VRAM for them instead. >>> >>> Regards, >>> Christian. >> Why would it be unreliable? I mostly kept it in semaphore for >> simplicity. > > The semaphore block tries to avoid memory accesses whenever possible. > > For example, when a signal for address A arrives, the block doesn't > necessarily write that to memory but instead tries to match it > immediately with a wait for address A.
The same is true if a wait for > address A arrives and the semaphore block thinks it knows the memory > value at address A. > > Also, I'm not sure whether the semaphore block really polls the memory > address for changes; instead it might just snoop the MC for writes to > this address. Since CPU writes to system memory aren't seen by the GPU > MC, the semaphore block would never know something changed. > > I need to check the docs on how to do this correctly, > Christian. > >> >> ~Maarten >> >
TTM placement & caching issue/questions
On Wed, 2014-09-03 at 22:36 -0400, Jerome Glisse wrote: > On Wed, Sep 03, 2014 at 10:31:18PM -0400, Jerome Glisse wrote: > > On Thu, Sep 04, 2014 at 12:25:23PM +1000, Benjamin Herrenschmidt wrote: > > > On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote: > > > > > > > So in the meantime the attached patch should work; it just silently > > > > ignores > > > > the caching attribute request on non-x86 instead of pretending that > > > > things > > > > are set up as expected and then having the radeon or nouveau hw later unset > > > > the snoop bit. > > > > > > > > It's not tested but I think it should work. > > > > > > I'm still getting placements with !CACHED going from bo_memcpy in > > > ttm_io_prot() though ... I'm looking at filtering the placement > > > attributes instead. > > > > > > Ben. > > > > Ok so this one should do the trick. > > Ok final version ... famous last words. Minus a couple of obvious typos that prevent it from building, it seems to do the trick for me with the AST driver, no more bad mappings. I'll still send a patch that catches the incorrect mapping attempts inside ttm_io_prot() and warns to help future debugging and avoid "random" behaviour. (I need to fix other things in the powerpc code in there anyway). Cheers, Ben.
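The approach in Jerome's patch, and the filtering Ben mentions, amounts to rewriting the caching part of a placement request instead of honoring one the platform cannot provide. The sketch below rests on two assumptions. First, the flag names and values are hypothetical and are not TTM's real TTM_PL_FLAG_* definitions. Second, the boolean platform_has_uc_wc stands in for whatever architecture check the real patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caching flags, loosely modeled on TTM placement flags. */
#define TOY_PL_FLAG_CACHED   (1u << 16)
#define TOY_PL_FLAG_UNCACHED (1u << 17)
#define TOY_PL_FLAG_WC       (1u << 18)
#define TOY_PL_MASK_CACHING  \
	(TOY_PL_FLAG_CACHED | TOY_PL_FLAG_UNCACHED | TOY_PL_FLAG_WC)

/*
 * Filter a placement's caching request. When the platform cannot honor
 * uncached or write-combined mappings, replace the request with CACHED so
 * later code never assumes a mapping type that was not actually set up.
 * Bits outside the caching mask pass through unchanged.
 */
uint32_t toy_filter_caching(uint32_t placement, bool platform_has_uc_wc)
{
	if (platform_has_uc_wc)
		return placement;
	return (placement & ~TOY_PL_MASK_CACHING) | TOY_PL_FLAG_CACHED;
}
```

Filtering at placement time means the mapping code only ever sees requests it can satisfy. Ben's proposed warning in ttm_io_prot() would then catch any path that still bypasses the filter.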
[Bug 83416] [radeonsi] Serious Sam 3 lockup during its start
https://bugs.freedesktop.org/show_bug.cgi?id=83416 Laurent carlier changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #15 from Laurent carlier --- I can confirm that 8bd67231797e5d79d72a4e91b37ea81da30c6df3 is fixing the hang. Thanks Marek, closing!
[Bug 81644] Random crashes on RadeonSI with Chromium.
https://bugs.freedesktop.org/show_bug.cgi?id=81644 --- Comment #82 from Aaron B --- I'm still bisecting, but I just want to say I suck at it and I'll probably need at least two bisects to the same point, if not more. I'm trying to be patient, but on the old Mesa versions the glitch takes so long to show up, even when I set things up to trigger it. So, should I skip ahead and check whether the issue addressed by this just-proposed DMA patch is also the source of our problem? https://bugs.freedesktop.org/show_bug.cgi?id=83500
[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process
On Thu, Sep 04, 2014 at 11:52:15AM +, Gupta, Sourab wrote: > On Thu, 2014-09-04 at 10:01 +, Daniel Vetter wrote: > > Interface design discussions should happen in public (so that > > non-intel people can jump in, which happens rather often for other > > drivers actually). But at least include internal mailing lists next > > time around. Also adding dri-devel. > > > > The problem I see with your approach is that "process-wise" is not a > > solid concept with drm. We can dump information per open drm file, but > > that file descriptor can be shared between processes. And the latest > > generation of linux compositor protocols (like dri3) actually take > > advantage of this. > > By "process-wise" sharing, do you mean the sharing of the drm file > across different processes (having different tgid's), or is it sharing > across the threads of a single process (having same tgid)? > Sorry, we are not aware of the sharing of drm file across processes in > dri3 protocols, as in android userspace, we have not come across such a > scenario. Can you please shed some light on it. > > In our design, we have a tgid based accounting mechanism. As long as the > drm file is shared within the threads of the same process, its resources > (objects and memory) are accounted together. But if the drm file is > shared across different processes (diff tgid's), this case is still an > open question. > Will our tgid based accounting cover the dri3 usecases also (if they > share drm file within same tgid)? Well, in Unix a file descriptor is simply not tied to a process/thread at all, so if you expose accounting data for resources which are tied to file descriptors then that doesn't work. E.g. - fork inherits all the file descriptors from its parent, and the same goes for exec - you can pass file descriptors explicitly between processes over unix domain sockets (this is what dri3 does). So if you'd use the tgid of the process that opened the file you'd account everything to the X server with dri3.
Which is not really useful. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
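Daniel's second example, passing a file descriptor between processes over a Unix domain socket, uses the standard POSIX SCM_RIGHTS control message. The helper names send_fd() and recv_fd() in the minimal sketch below are illustrative and do not come from any driver. The receiver gets a new descriptor number for the same open file, and nothing about that file records which process opened it. That is why accounting by the opener's tgid misattributes the usage. For simplicity the test drives both ends of a socketpair() in one process; across processes the calls are the same.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one file descriptor over a connected Unix domain socket.
 * Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
	char dummy = 'x'; /* at least one byte of ordinary data is sent */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union { /* union guarantees cmsghdr alignment for the buffer */
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```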
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
On 04.09.2014 at 14:08, Maarten Lankhorst wrote: > Hey, > > On 04-09-14 at 13:54, Christian König wrote: >> On 04.09.2014 at 13:42, Maarten Lankhorst wrote: >>> Use the semaphore mechanism to make this happen, this uses signaling >>> from the cpu instead of signaling by the gpu. >> I'm not sure if this will work reliably when the semaphores are in system >> memory. We might need to reserve some VRAM for them instead. >> >> Regards, >> Christian. > Why would it be unreliable? I mostly kept it in semaphore for simplicity. The semaphore block tries to avoid memory accesses whenever possible. For example, when a signal for address A arrives, the block doesn't necessarily write that to memory but instead tries to match it immediately with a wait for address A. The same is true if a wait for address A arrives and the semaphore block thinks it knows the memory value at address A. Also, I'm not sure whether the semaphore block really polls the memory address for changes; instead it might just snoop the MC for writes to this address. Since CPU writes to system memory aren't seen by the GPU MC, the semaphore block would never know something changed. I need to check the docs on how to do this correctly, Christian. > > ~Maarten >
[Bug 83432] r600_query.c:269:r600_emit_query_end: Assertion `ctx->num_pipelinestat_queries > 0' failed [Gallium HUD]
https://bugs.freedesktop.org/show_bug.cgi?id=83432 Marek Olšák changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #5 from Marek Olšák --- Fixed by 3dbf55c1be5a8867616e475d943c776d8245d0c. Closing.
[PATCH 14/19] drm: Don't update vblank timestamp when the counter didn't change
I thought about this one again and, contrary to my previous comment, now think it's fine, also for drivers without hw vblank counter queries. -mario On Wed, Aug 6, 2014 at 1:49 PM, wrote: > From: Ville Syrjälä > > If we already have a timestamp for the current vblank counter, don't > update it with a new timestamp. Small errors can creep in between two > timestamp queries for the same vblank count, which could be confusing to > userspace when it queries the timestamp for the same vblank sequence > number twice. > > This problem gets exposed when the vblank disable timer is not used > (or is set to expire quickly) and thus we can get multiple vblank > disable<->enable transitions during the same frame which would all > attempt to update the timestamp with the latest estimate. > > Testcase: igt/kms_flip/flip-vs-expired-vblank > Signed-off-by: Ville Syrjälä > --- > drivers/gpu/drm/drm_irq.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c > index af33df1..0523f5b 100644 > --- a/drivers/gpu/drm/drm_irq.c > +++ b/drivers/gpu/drm/drm_irq.c > @@ -106,6 +106,9 @@ static void drm_update_vblank_count(struct drm_device > *dev, int crtc) > DRM_DEBUG("enabling vblank interrupts on crtc %d, missed %d\n", > crtc, diff); > > + if (diff == 0) > + return; > + > /* Reinitialize corresponding vblank timestamp if high-precision > query > * available. Skip this step if query unsupported or failed. Will > * reinitialize delayed at next vblank interrupt in that case. > -- > 1.8.5.5
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
Hey, On 04-09-14 at 13:54, Christian König wrote: > On 04.09.2014 at 13:42, Maarten Lankhorst wrote: >> Use the semaphore mechanism to make this happen, this uses signaling >> from the cpu instead of signaling by the gpu. > > I'm not sure if this will work reliably when the semaphores are in system > memory. We might need to reserve some VRAM for them instead. > > Regards, > Christian. Why would it be unreliable? I mostly kept it in semaphore for simplicity. ~Maarten
[Bug 83500] si_dma_copy_tile causes GPU hangs
https://bugs.freedesktop.org/show_bug.cgi?id=83500 --- Comment #1 from Marek Olšák --- Thank you very much for tracking this down.
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
On 04.09.2014 at 13:42, Maarten Lankhorst wrote: > Use the semaphore mechanism to make this happen, this uses signaling > from the cpu instead of signaling by the gpu. I'm not sure if this will work reliably when the semaphores are in system memory. We might need to reserve some VRAM for them instead. Regards, Christian. > > Signed-off-by: Maarten Lankhorst > --- > drivers/gpu/drm/radeon/radeon.h | 17 ++- > drivers/gpu/drm/radeon/radeon_cs.c| 30 ++--- > drivers/gpu/drm/radeon/radeon_fence.c | 13 ++- > drivers/gpu/drm/radeon/radeon_semaphore.c | 184 > ++ > 4 files changed, 221 insertions(+), 23 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index dddb2b7dd752..cd18fa7f801c 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -359,6 +359,11 @@ struct radeon_fence_driver { > struct delayed_work lockup_work; > }; > > +struct radeon_fence_cb { > + struct fence_cb base; > + struct fence *fence; > +}; > + > struct radeon_fence { > struct fence base; > > @@ -368,6 +373,10 @@ struct radeon_fence { > unsignedring; > > wait_queue_tfence_wake; > + > + atomic_tnum_cpu_cbs; > + struct radeon_fence_cb *cpu_cbs; > + uint32_t*cpu_sema; > }; > > int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); > @@ -574,9 +583,11 @@ int radeon_mode_dumb_mmap(struct drm_file *filp, >*/ > struct radeon_semaphore { > struct radeon_sa_bo *sa_bo; > - signed waiters; > + signed waiters, cpu_waiters, cpu_waiters_max; > uint64_tgpu_addr; > struct radeon_fence *sync_to[RADEON_NUM_RINGS]; > + uint32_t*cpu_sema; > + struct radeon_fence_cb *cpu_cbs; > }; > > int radeon_semaphore_create(struct radeon_device *rdev, > @@ -587,6 +598,10 @@ bool radeon_semaphore_emit_wait(struct radeon_device > *rdev, int ring, > struct radeon_semaphore *semaphore); > void radeon_semaphore_sync_to(struct radeon_semaphore *semaphore, > struct radeon_fence *fence); > +int radeon_semaphore_sync_obj(struct radeon_device
*rdev, > + struct radeon_semaphore *semaphore, > + struct reservation_object *resv); > + > int radeon_semaphore_sync_rings(struct radeon_device *rdev, > struct radeon_semaphore *semaphore, > int waiting_ring); > diff --git a/drivers/gpu/drm/radeon/radeon_cs.c > b/drivers/gpu/drm/radeon/radeon_cs.c > index 8ad4e2cfae15..b141f5bd029d 100644 > --- a/drivers/gpu/drm/radeon/radeon_cs.c > +++ b/drivers/gpu/drm/radeon/radeon_cs.c > @@ -250,32 +250,16 @@ static int radeon_cs_get_ring(struct radeon_cs_parser > *p, u32 ring, s32 priority > > static int radeon_cs_sync_rings(struct radeon_cs_parser *p) > { > - int i; > - > - for (i = 0; i < p->nrelocs; i++) { > - struct reservation_object *resv; > - struct fence *fence; > - struct radeon_fence *rfence; > - int r; > + int i, ret = 0; > > + for (i = 0; !ret && i < p->nrelocs; i++) { > if (!p->relocs[i].robj) > continue; > > - resv = p->relocs[i].robj->tbo.resv; > - fence = reservation_object_get_excl(resv); > - if (!fence) > - continue; > - rfence = to_radeon_fence(fence); > - if (!rfence || rfence->rdev != p->rdev) { > - r = fence_wait(fence, true); > - if (r) > - return r; > - continue; > - } > - > - radeon_semaphore_sync_to(p->ib.semaphore, rfence); > + ret = radeon_semaphore_sync_obj(p->rdev, p->ib.semaphore, > + p->relocs[i].robj->tbo.resv); > } > - return 0; > + return ret; > } > > /* XXX: note that this is called from the legacy UMS CS ioctl as well */ > @@ -442,6 +426,10 @@ static void radeon_cs_parser_fini(struct > radeon_cs_parser *parser, int error, bo >*/ > list_sort(NULL, &parser->validated, cmp_size_smaller_first); > > + /* must be called with all reservation_objects still held */ > + radeon_semaphore_free(parser->rdev, &parser->ib.semaphore, > + parser->ib.fence); > + > ttm_eu_fence_buffer_objects(&parser->ticket, > &parser->validated, > &parser->
[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2
https://bugs.freedesktop.org/show_bug.cgi?id=83418 Marek Olšák changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #8 from Marek Olšák --- I reverted the problematic commit as 8bd67231797e5d79d72a4e91b37ea81da30c6df3. Closing.
[Bug 75112] Meta Bug for HyperZ issues on r600g and radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=75112 Bug 75112 depends on bug 83418, which changed state. Bug 83418 Summary: EU IV is incorrectly rendered after git1409011930.d571f2 https://bugs.freedesktop.org/show_bug.cgi?id=83418 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug 79980] Random radeonsi crashes
https://bugs.freedesktop.org/show_bug.cgi?id=79980 --- Comment #128 from Grigori Goronzy --- You might want to try the patch in https://bugs.freedesktop.org/show_bug.cgi?id=83500 Maybe some of these issues have a common cause.
[Bug 83500] New: si_dma_copy_tile causes GPU hangs
https://bugs.freedesktop.org/show_bug.cgi?id=83500 Priority: medium Bug ID: 83500 Assignee: dri-devel at lists.freedesktop.org Summary: si_dma_copy_tile causes GPU hangs Severity: normal Classification: Unclassified OS: Linux (All) Reporter: greg at chown.ath.cx Hardware: x86-64 (AMD64) Status: NEW Version: git Component: Drivers/Gallium/radeonsi Product: Mesa Created attachment 105745 --> https://bugs.freedesktop.org/attachment.cgi?id=105745&action=edit Workaround Async DMA linear-to-tiled copies are causing GPU hangs in some cases. On Cape Verde, I can easily trigger this as described in [1]. The game Brutal Legend also triggers similar hangs when it streams assets during gameplay. Disabling usage of this function and using the resource_copy_region fallback instead fixes all hangs. The attached patch does that. [1] https://bugs.freedesktop.org/show_bug.cgi?id=79980#c124
[PATCH 3/7] drm/radeon: cope with foreign fences inside the reservation object
On 04.09.2014 at 13:40, Maarten Lankhorst wrote: > Not the whole world is a radeon! :-) > > Signed-off-by: Maarten Lankhorst > --- > drivers/gpu/drm/radeon/radeon.h | 11 - > drivers/gpu/drm/radeon/radeon_cs.c | 32 + > drivers/gpu/drm/radeon/radeon_display.c | 41 > - > drivers/gpu/drm/radeon/radeon_fence.c | 3 +++ > drivers/gpu/drm/radeon/radeon_mode.h| 1 + > 5 files changed, 61 insertions(+), 27 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h > index d80dc547a105..dddb2b7dd752 100644 > --- a/drivers/gpu/drm/radeon/radeon.h > +++ b/drivers/gpu/drm/radeon/radeon.h > @@ -699,17 +699,6 @@ void radeon_doorbell_free(struct radeon_device *rdev, > u32 doorbell); >* IRQS. >*/ > > -struct radeon_flip_work { > - struct work_struct flip_work; > - struct work_struct unpin_work; > - struct radeon_device*rdev; > - int crtc_id; > - uint64_tbase; > - struct drm_pending_vblank_event *event; > - struct radeon_bo*old_rbo; > - struct radeon_fence *fence; > -}; > - Please keep this structure where it was; apart from that, the patch looks good at first glance. Christian.
> struct r500_irq_stat_regs { > u32 disp_int; > u32 hdmi0_status; > diff --git a/drivers/gpu/drm/radeon/radeon_cs.c > b/drivers/gpu/drm/radeon/radeon_cs.c > index 6e3d1c8f3483..8ad4e2cfae15 100644 > --- a/drivers/gpu/drm/radeon/radeon_cs.c > +++ b/drivers/gpu/drm/radeon/radeon_cs.c > @@ -248,23 +248,34 @@ static int radeon_cs_get_ring(struct radeon_cs_parser > *p, u32 ring, s32 priority > return 0; > } > > -static void radeon_cs_sync_rings(struct radeon_cs_parser *p) > +static int radeon_cs_sync_rings(struct radeon_cs_parser *p) > { > int i; > > for (i = 0; i < p->nrelocs; i++) { > struct reservation_object *resv; > struct fence *fence; > + struct radeon_fence *rfence; > + int r; > > if (!p->relocs[i].robj) > continue; > > resv = p->relocs[i].robj->tbo.resv; > fence = reservation_object_get_excl(resv); > + if (!fence) > + continue; > + rfence = to_radeon_fence(fence); > + if (!rfence || rfence->rdev != p->rdev) { > + r = fence_wait(fence, true); > + if (r) > + return r; > + continue; > + } > > - radeon_semaphore_sync_to(p->ib.semaphore, > - (struct radeon_fence *)fence); > + radeon_semaphore_sync_to(p->ib.semaphore, rfence); > } > + return 0; > } > > /* XXX: note that this is called from the legacy UMS CS ioctl as well */ > @@ -474,13 +485,19 @@ static int radeon_cs_ib_chunk(struct radeon_device > *rdev, > return r; > } > > + r = radeon_cs_sync_rings(parser); > + if (r) { > + if (r != -ERESTARTSYS) > + DRM_ERROR("Failed to sync rings: %i\n", r); > + return r; > + } > + > if (parser->ring == R600_RING_TYPE_UVD_INDEX) > radeon_uvd_note_usage(rdev); > else if ((parser->ring == TN_RING_TYPE_VCE1_INDEX) || >(parser->ring == TN_RING_TYPE_VCE2_INDEX)) > radeon_vce_note_usage(rdev); > > - radeon_cs_sync_rings(parser); > r = radeon_ib_schedule(rdev, &parser->ib, NULL, true); > if (r) { > DRM_ERROR("Failed to schedule IB !\n"); > @@ -567,7 +584,12 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device > *rdev, > if (r) { > goto out; > } > - radeon_cs_sync_rings(parser); > 
+ r = radeon_cs_sync_rings(parser); > + if (r) { > + if (r != -ERESTARTSYS) > + DRM_ERROR("Failed to sync rings: %i\n", r); > + goto out; > + } > radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence); > > if ((rdev->family >= CHIP_TAHITI) && > diff --git a/drivers/gpu/drm/radeon/radeon_display.c > b/drivers/gpu/drm/radeon/radeon_display.c > index bc894c17b2f9..715b2d95346c 100644 > --- a/drivers/gpu/drm/radeon/radeon_display.c > +++ b/drivers/gpu/drm/radeon/radeon_display.c > @@ -36,6 +36,17 @@ > > #include > > +struct radeon_flip_work { > + struct work_struct flip_work; > + struct work_struct unpin_work; > + struct radeon_device*rdev; > + int crtc_id; > + uint64_tbase; > + struct drm_pending_vblank_event *event; > + struct radeon_bo*old_rbo; > + struct fence*fence;
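The control flow this patch gives radeon_cs_sync_rings() has four parts:

- skip buffers that have no exclusive fence;
- block on the CPU with fence_wait() for fences that are not radeon fences from the same device;
- return any error from that wait (the callers then print DRM_ERROR unless it is -ERESTARTSYS);
- otherwise let the GPU wait through radeon_semaphore_sync_to().

The following is a simplified userspace model of that logic. Every type and helper in it is a hypothetical stand-in, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence; 'device' identifies the driver
 * instance that will signal it. */
struct toy_fence {
	const void *device;
	int wait_result; /* what a CPU wait would return (0 or -errno) */
	int cpu_waited;  /* set when the CPU waited on this fence */
	int gpu_synced;  /* set when a GPU semaphore sync was recorded */
};

/* CPU-side wait, standing in for fence_wait(fence, true). */
static int toy_cpu_wait(struct toy_fence *f)
{
	f->cpu_waited = 1;
	return f->wait_result;
}

/*
 * For each exclusive fence: skip empty slots, wait on the CPU for
 * foreign fences, and record a GPU semaphore sync for native ones.
 * Returns 0, or the first error from a CPU wait.
 */
int toy_sync_rings(const void *self, struct toy_fence **fences, size_t n)
{
	size_t i;

	assert(fences != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct toy_fence *f = fences[i];
		int r;

		if (!f)
			continue;            /* no exclusive fence */
		if (f->device != self) {
			r = toy_cpu_wait(f); /* foreign: block on the CPU */
			if (r)
				return r;
			continue;
		}
		f->gpu_synced = 1;           /* native: GPU semaphore wait */
	}
	return 0;
}
```

Waiting on the CPU is the conservative choice for foreign fences. The radeon semaphore path can only wait on fences that a ring of the same device will signal.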
[PATCH 7/7] drm/nouveau: allow asynchronous waiting using gart fences
This requires allocating a fence sooner to annotate any cross-dev fences, and making sure that enough memory is available before emitting the fence. The current seqno is written to the GART bo on completion, and a list of finished fences is kept to allow arbitrary depth. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/nouveau/nouveau_bo.c | 28 ++-- drivers/gpu/drm/nouveau/nouveau_chan.c| 6 +- drivers/gpu/drm/nouveau/nouveau_display.c | 45 --- drivers/gpu/drm/nouveau/nouveau_fence.c | 212 ++ drivers/gpu/drm/nouveau/nouveau_fence.h | 29 ++-- drivers/gpu/drm/nouveau/nouveau_gem.c | 25 ++-- drivers/gpu/drm/nouveau/nv04_fence.c | 9 +- drivers/gpu/drm/nouveau/nv10_fence.c | 9 +- drivers/gpu/drm/nouveau/nv84_fence.c | 31 +++-- drivers/gpu/drm/nouveau/nvc0_fence.c | 4 +- 10 files changed, 305 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index f89b4a7c93fe..24c941927926 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -970,21 +970,21 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict, bool intr, } mutex_lock_nested(&cli->mutex, SINGLE_DEPTH_NESTING); - ret = nouveau_fence_sync(nouveau_bo(bo), chan, true); - if (ret == 0) { + ret = nouveau_fence_new(chan, &fence); + if (ret) + goto out; + + ret = nouveau_fence_sync(nouveau_bo(bo), fence, true); + if (ret == 0) ret = drm->ttm.move(chan, bo, &bo->mem, new_mem); - if (ret == 0) { - ret = nouveau_fence_new(chan, false, &fence); - if (ret == 0) { - ret = ttm_bo_move_accel_cleanup(bo, - &fence->base, - evict, - no_wait_gpu, - new_mem); - nouveau_fence_unref(&fence); - } - } - } + if (ret == 0) + ret = nouveau_fence_emit(fence); + if (ret == 0) + ret = ttm_bo_move_accel_cleanup(bo, &fence->base, evict, + no_wait_gpu, new_mem); + nouveau_fence_unref(&fence); + +out: mutex_unlock(&cli->mutex); return ret; } diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c 
b/drivers/gpu/drm/nouveau/nouveau_chan.c index d639750379d6..1e5c76dfed3a 100644 --- a/drivers/gpu/drm/nouveau/nouveau_chan.c +++ b/drivers/gpu/drm/nouveau/nouveau_chan.c @@ -46,9 +46,11 @@ nouveau_channel_idle(struct nouveau_channel *chan) struct nouveau_fence *fence = NULL; int ret; - ret = nouveau_fence_new(chan, false, &fence); + ret = nouveau_fence_new(chan, &fence); if (!ret) { - ret = nouveau_fence_wait(fence, false, false); + ret = nouveau_fence_emit(fence); + if (!ret) + ret = nouveau_fence_wait(fence, false, false); nouveau_fence_unref(&fence); } diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index a9ec525c0994..adbf870686aa 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -26,6 +26,7 @@ #include #include +#include #include @@ -36,7 +37,6 @@ #include "nouveau_gem.h" #include "nouveau_connector.h" #include "nv50_display.h" - #include "nouveau_fence.h" #include @@ -644,7 +644,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan, struct nouveau_bo *old_bo, struct nouveau_bo *new_bo, struct nouveau_page_flip_state *s, - struct nouveau_fence **pfence) + struct nouveau_fence *fence) { struct nouveau_fence_chan *fctx = chan->fence; struct nouveau_drm *drm = chan->drm; @@ -657,11 +657,6 @@ nouveau_page_flip_emit(struct nouveau_channel *chan, list_add_tail(&s->head, &fctx->flip); spin_unlock_irqrestore(&dev->event_lock, flags); - /* Synchronize with the old framebuffer */ - ret = nouveau_fence_sync(old_bo, chan, false); - if (ret) - goto fail; - /* Emit the pageflip */ ret = RING_SPACE(chan, 2); if (ret) @@ -674,7 +669,7 @@ nouveau_page_flip_emit(struct nouveau_channel *chan, OUT_RING (chan, 0x); FIRE_RING (chan); - ret = nouveau_fence_new(chan, false, pfence); + ret = nouveau_fence_emit(fence); if (ret) goto fail; @@ -700,6 +695,12 @@ nouveau_crtc_page_flip(struct drm_crtc *crtc, stru
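The commit message describes writing the current seqno to a GART buffer object on completion and retiring finished fences from a list. Deciding which fences have completed from a 32-bit seqno must tolerate wraparound, and a common idiom is to compare through a signed difference. The sketch below illustrates that idiom in general terms. It is not taken from this patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A fence with sequence number fence_seqno has signaled once the
 * completed seqno read back from memory has reached it. The signed
 * 32-bit difference keeps this correct across wraparound, provided
 * fewer than 2^31 fences are outstanding. (Converting an out-of-range
 * uint32_t to int32_t is implementation-defined in C; mainstream
 * compilers use two's complement.)
 */
bool seqno_passed(uint32_t completed, uint32_t fence_seqno)
{
	return (int32_t)(completed - fence_seqno) >= 0;
}

/*
 * Count how many leading entries of 'pending' (in emission order) have
 * signaled. On an in-order channel, stop at the first unsignaled fence,
 * since later fences cannot have completed before it.
 */
size_t retire_signaled(const uint32_t *pending, size_t n, uint32_t completed)
{
	size_t i = 0;

	assert(pending != NULL || n == 0);
	while (i < n && seqno_passed(completed, pending[i]))
		i++;
	return i;
}
```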
[PATCH 6/7] drm/radeon: allow asynchronous waiting on foreign fences
Use the semaphore mechanism to make this happen, this uses signaling from the cpu instead of signaling by the gpu. Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/radeon/radeon.h | 17 ++- drivers/gpu/drm/radeon/radeon_cs.c| 30 ++--- drivers/gpu/drm/radeon/radeon_fence.c | 13 ++- drivers/gpu/drm/radeon/radeon_semaphore.c | 184 ++ 4 files changed, 221 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h index dddb2b7dd752..cd18fa7f801c 100644 --- a/drivers/gpu/drm/radeon/radeon.h +++ b/drivers/gpu/drm/radeon/radeon.h @@ -359,6 +359,11 @@ struct radeon_fence_driver { struct delayed_work lockup_work; }; +struct radeon_fence_cb { + struct fence_cb base; + struct fence *fence; +}; + struct radeon_fence { struct fence base; @@ -368,6 +373,10 @@ struct radeon_fence { unsignedring; wait_queue_tfence_wake; + + atomic_tnum_cpu_cbs; + struct radeon_fence_cb *cpu_cbs; + uint32_t*cpu_sema; }; int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring); @@ -574,9 +583,11 @@ int radeon_mode_dumb_mmap(struct drm_file *filp, */ struct radeon_semaphore { struct radeon_sa_bo *sa_bo; - signed waiters; + signed waiters, cpu_waiters, cpu_waiters_max; uint64_tgpu_addr; struct radeon_fence *sync_to[RADEON_NUM_RINGS]; + uint32_t*cpu_sema; + struct radeon_fence_cb *cpu_cbs; }; int radeon_semaphore_create(struct radeon_device *rdev, @@ -587,6 +598,10 @@ bool radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring, struct radeon_semaphore *semaphore); void radeon_semaphore_sync_to(struct radeon_semaphore *semaphore, struct radeon_fence *fence); +int radeon_semaphore_sync_obj(struct radeon_device *rdev, + struct radeon_semaphore *semaphore, + struct reservation_object *resv); + int radeon_semaphore_sync_rings(struct radeon_device *rdev, struct radeon_semaphore *semaphore, int waiting_ring); diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c index 8ad4e2cfae15..b141f5bd029d 100644 
--- a/drivers/gpu/drm/radeon/radeon_cs.c +++ b/drivers/gpu/drm/radeon/radeon_cs.c @@ -250,32 +250,16 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority static int radeon_cs_sync_rings(struct radeon_cs_parser *p) { - int i; - - for (i = 0; i < p->nrelocs; i++) { - struct reservation_object *resv; - struct fence *fence; - struct radeon_fence *rfence; - int r; + int i, ret = 0; + for (i = 0; !ret && i < p->nrelocs; i++) { if (!p->relocs[i].robj) continue; - resv = p->relocs[i].robj->tbo.resv; - fence = reservation_object_get_excl(resv); - if (!fence) - continue; - rfence = to_radeon_fence(fence); - if (!rfence || rfence->rdev != p->rdev) { - r = fence_wait(fence, true); - if (r) - return r; - continue; - } - - radeon_semaphore_sync_to(p->ib.semaphore, rfence); + ret = radeon_semaphore_sync_obj(p->rdev, p->ib.semaphore, + p->relocs[i].robj->tbo.resv); } - return 0; + return ret; } /* XXX: note that this is called from the legacy UMS CS ioctl as well */ @@ -442,6 +426,10 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo */ list_sort(NULL, &parser->validated, cmp_size_smaller_first); + /* must be called with all reservation_objects still held */ + radeon_semaphore_free(parser->rdev, &parser->ib.semaphore, + parser->ib.fence); + ttm_eu_fence_buffer_objects(&parser->ticket, &parser->validated, &parser->ib.fence->base); diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c index 0262fe2580d2..7687a7f8f41b 100644 --- a/drivers/gpu/drm/radeon/radeon_fence.c +++ b/drivers/gpu/drm/radeon/radeon_fence.c @@ -142,6 +142,8 @@ int radeon_fence_emit(struct radeon_device *rdev, (*fence)->ring = ring; fence_init(&(*fence)->base,
[PATCH 5/7] drm/nouveau: export reservation_object from dmabuf to ttm
Adds an extra argument to nouveau_bo_new, which is used in nouveau_prime.c.

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/nouveau/dispnv04/crtc.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c    | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_bo.h    | 1 +
 drivers/gpu/drm/nouveau/nouveau_chan.c  | 2 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c | 6 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c | 5 -
 drivers/gpu/drm/nouveau/nv17_fence.c    | 2 +-
 drivers/gpu/drm/nouveau/nv50_display.c  | 6 +++---
 drivers/gpu/drm/nouveau/nv50_fence.c    | 2 +-
 drivers/gpu/drm/nouveau/nv84_fence.c    | 4 ++--
 11 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
index b90aa5c1f90a..fca6a1f9c20c 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
@@ -1127,7 +1127,7 @@ nv04_crtc_create(struct drm_device *dev, int crtc_num)
 	drm_mode_crtc_set_gamma_size(&nv_crtc->base, 256);
 
 	ret = nouveau_bo_new(dev, 64*64*4, 0x100, TTM_PL_FLAG_VRAM,
-			     0, 0x, NULL, &nv_crtc->cursor.nvbo);
+			     0, 0x, NULL, NULL, &nv_crtc->cursor.nvbo);
 	if (!ret) {
 		ret = nouveau_bo_pin(nv_crtc->cursor.nvbo, TTM_PL_FLAG_VRAM);
 		if (!ret) {
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index bda32276bcc2..f89b4a7c93fe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -181,7 +181,7 @@ nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
 int
 nouveau_bo_new(struct drm_device *dev, int size, int align,
 	       uint32_t flags, uint32_t tile_mode, uint32_t tile_flags,
-	       struct sg_table *sg,
+	       struct sg_table *sg, struct reservation_object *robj,
 	       struct nouveau_bo **pnvbo)
 {
 	struct nouveau_drm *drm = nouveau_drm(dev);
@@ -230,7 +230,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&drm->ttm.bdev, &nvbo->bo, size,
 			  type, &nvbo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size, sg,
-			  NULL, nouveau_bo_del_ttm);
+			  robj, nouveau_bo_del_ttm);
 	if (ret) {
 		/* ttm will call nouveau_bo_del_ttm if it fails.. */
 		return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.h b/drivers/gpu/drm/nouveau/nouveau_bo.h
index ae95b2d43b36..d20c0b5c4e31 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.h
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.h
@@ -68,6 +68,7 @@ extern struct ttm_bo_driver nouveau_bo_driver;
 void nouveau_bo_move_init(struct nouveau_drm *);
 int  nouveau_bo_new(struct drm_device *, int size, int align, u32 flags,
 		    u32 tile_mode, u32 tile_flags, struct sg_table *sg,
+		    struct reservation_object *robj,
 		    struct nouveau_bo **);
 int  nouveau_bo_pin(struct nouveau_bo *, u32 flags);
 int  nouveau_bo_unpin(struct nouveau_bo *);
diff --git a/drivers/gpu/drm/nouveau/nouveau_chan.c b/drivers/gpu/drm/nouveau/nouveau_chan.c
index 99cd9e4a2aa6..d639750379d6 100644
--- a/drivers/gpu/drm/nouveau/nouveau_chan.c
+++ b/drivers/gpu/drm/nouveau/nouveau_chan.c
@@ -106,7 +106,7 @@ nouveau_channel_prep(struct nouveau_drm *drm, struct nvif_device *device,
 	if (nouveau_vram_pushbuf)
 		target = TTM_PL_FLAG_VRAM;
 
-	ret = nouveau_bo_new(drm->dev, size, 0, target, 0, 0, NULL,
+	ret = nouveau_bo_new(drm->dev, size, 0, target, 0, 0, NULL, NULL,
 			    &chan->push.buffer);
 	if (ret == 0) {
 		ret = nouveau_bo_pin(chan->push.buffer, target);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index decfe6c4ac07..574517a396fd 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -195,8 +195,12 @@ nouveau_fence_work(struct fence *fence,
 
 	work = kmalloc(sizeof(*work), GFP_KERNEL);
 	if (!work) {
+		/*
+		 * this might not be a nouveau fence any more,
+		 * so force a lazy wait here
+		 */
 		WARN_ON(nouveau_fence_wait((struct nouveau_fence *)fence,
-					   false, false));
+					   true, false));
 		goto err;
 	}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
index b7dbd16904e0..1bc4eb33b60f 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -165,7 +165,7 @@ nouveau_gem_new(struct drm_device *dev, int size, int align, uint32_t domain,
 		flags |= TTM_PL_FLAG_SYSTEM;
 
 	ret
[PATCH 4/7] drm/radeon: export reservation_object from dmabuf to ttm
Adds an extra argument to radeon_bo_create, which is used in radeon_prime.c.

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/radeon/cik.c              | 4 ++--
 drivers/gpu/drm/radeon/evergreen.c        | 6 +++---
 drivers/gpu/drm/radeon/r600.c             | 4 ++--
 drivers/gpu/drm/radeon/radeon_benchmark.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_device.c    | 2 +-
 drivers/gpu/drm/radeon/radeon_gart.c      | 2 +-
 drivers/gpu/drm/radeon/radeon_gem.c       | 2 +-
 drivers/gpu/drm/radeon/radeon_object.c    | 8 +---
 drivers/gpu/drm/radeon/radeon_object.h    | 1 +
 drivers/gpu/drm/radeon/radeon_prime.c     | 5 -
 drivers/gpu/drm/radeon/radeon_ring.c      | 2 +-
 drivers/gpu/drm/radeon/radeon_sa.c        | 2 +-
 drivers/gpu/drm/radeon/radeon_test.c      | 5 +++--
 drivers/gpu/drm/radeon/radeon_ttm.c       | 2 +-
 drivers/gpu/drm/radeon/radeon_uvd.c       | 3 ++-
 drivers/gpu/drm/radeon/radeon_vce.c       | 3 ++-
 drivers/gpu/drm/radeon/radeon_vm.c        | 5 +++--
 17 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 1f598ab3b9a7..d984de903928 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4689,7 +4689,7 @@ static int cik_mec_init(struct radeon_device *rdev)
 		r = radeon_bo_create(rdev,
 				     rdev->mec.num_mec *rdev->mec.num_pipe * MEC_HPD_SIZE * 2,
 				     PAGE_SIZE, true,
-				     RADEON_GEM_DOMAIN_GTT, 0, NULL,
+				     RADEON_GEM_DOMAIN_GTT, 0, NULL, NULL,
 				     &rdev->mec.hpd_eop_obj);
 		if (r) {
 			dev_warn(rdev->dev, "(%d) create HDP EOP bo failed\n", r);
@@ -4860,7 +4860,7 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
 					     sizeof(struct bonaire_mqd),
 					     PAGE_SIZE, true,
 					     RADEON_GEM_DOMAIN_GTT, 0, NULL,
-					     &rdev->ring[idx].mqd_obj);
+					     NULL, &rdev->ring[idx].mqd_obj);
 			if (r) {
 				dev_warn(rdev->dev, "(%d) create MQD bo failed\n", r);
 				return r;
diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index dbca60c7d097..c6ccef6c3596 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -4023,7 +4023,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
 		if (rdev->rlc.save_restore_obj == NULL) {
 			r = radeon_bo_create(rdev, dws * 4, PAGE_SIZE, true,
 					     RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-					     &rdev->rlc.save_restore_obj);
+					     NULL, &rdev->rlc.save_restore_obj);
 			if (r) {
 				dev_warn(rdev->dev, "(%d) create RLC sr bo failed\n", r);
 				return r;
@@ -4102,7 +4102,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
 		if (rdev->rlc.clear_state_obj == NULL) {
 			r = radeon_bo_create(rdev, dws * 4, PAGE_SIZE, true,
 					     RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-					     &rdev->rlc.clear_state_obj);
+					     NULL, &rdev->rlc.clear_state_obj);
 			if (r) {
 				dev_warn(rdev->dev, "(%d) create RLC c bo failed\n", r);
 				sumo_rlc_fini(rdev);
@@ -4179,7 +4179,7 @@ int sumo_rlc_init(struct radeon_device *rdev)
 			r = radeon_bo_create(rdev, rdev->rlc.cp_table_size,
 					     PAGE_SIZE, true,
 					     RADEON_GEM_DOMAIN_VRAM, 0, NULL,
-					     &rdev->rlc.cp_table_obj);
+					     NULL, &rdev->rlc.cp_table_obj);
 			if (r) {
 				dev_warn(rdev->dev, "(%d) create RLC cp table bo failed\n", r);
 				sumo_rlc_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index a95ced569d84..94e82c6b03ca 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -1430,7 +1430,7 @@ int r600_vram_scratch_init(struct radeon_device *rdev)
 	if (rdev->vram_scratch.robj == NULL) {
 		r = radeon_bo_create(rdev, RADEON_GPU_PAGE_SIZE,
 				     PAGE_SIZE, true, RADEON_GEM_DOMAIN_VRAM,
-				     0, NULL, &rdev->vram_scratch.robj);
+
[PATCH 3/7] drm/radeon: cope with foreign fences inside the reservation object
Not the whole world is a radeon! :-)

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/radeon/radeon.h         | 11 -
 drivers/gpu/drm/radeon/radeon_cs.c      | 32 +
 drivers/gpu/drm/radeon/radeon_display.c | 41 -
 drivers/gpu/drm/radeon/radeon_fence.c   |  3 +++
 drivers/gpu/drm/radeon/radeon_mode.h    |  1 +
 5 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index d80dc547a105..dddb2b7dd752 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -699,17 +699,6 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
  * IRQS.
  */
 
-struct radeon_flip_work {
-	struct work_struct		flip_work;
-	struct work_struct		unpin_work;
-	struct radeon_device		*rdev;
-	int				crtc_id;
-	uint64_t			base;
-	struct drm_pending_vblank_event *event;
-	struct radeon_bo		*old_rbo;
-	struct radeon_fence		*fence;
-};
-
 struct r500_irq_stat_regs {
 	u32 disp_int;
 	u32 hdmi0_status;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 6e3d1c8f3483..8ad4e2cfae15 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -248,23 +248,34 @@ static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority
 	return 0;
 }
 
-static void radeon_cs_sync_rings(struct radeon_cs_parser *p)
+static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 {
 	int i;
 
 	for (i = 0; i < p->nrelocs; i++) {
 		struct reservation_object *resv;
 		struct fence *fence;
+		struct radeon_fence *rfence;
+		int r;
 
 		if (!p->relocs[i].robj)
 			continue;
 
 		resv = p->relocs[i].robj->tbo.resv;
 		fence = reservation_object_get_excl(resv);
+		if (!fence)
+			continue;
+		rfence = to_radeon_fence(fence);
+		if (!rfence || rfence->rdev != p->rdev) {
+			r = fence_wait(fence, true);
+			if (r)
+				return r;
+			continue;
+		}
 
-		radeon_semaphore_sync_to(p->ib.semaphore,
-					 (struct radeon_fence *)fence);
+		radeon_semaphore_sync_to(p->ib.semaphore, rfence);
 	}
+	return 0;
 }
 
 /* XXX: note that this is called from the legacy UMS CS ioctl as well */
@@ -474,13 +485,19 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 		return r;
 	}
 
+	r = radeon_cs_sync_rings(parser);
+	if (r) {
+		if (r != -ERESTARTSYS)
+			DRM_ERROR("Failed to sync rings: %i\n", r);
+		return r;
+	}
+
 	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
 		radeon_uvd_note_usage(rdev);
 	else if ((parser->ring == TN_RING_TYPE_VCE1_INDEX) ||
 		 (parser->ring == TN_RING_TYPE_VCE2_INDEX))
 		radeon_vce_note_usage(rdev);
 
-	radeon_cs_sync_rings(parser);
 	r = radeon_ib_schedule(rdev, &parser->ib, NULL, true);
 	if (r) {
 		DRM_ERROR("Failed to schedule IB !\n");
@@ -567,7 +584,12 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if (r) {
 		goto out;
 	}
-	radeon_cs_sync_rings(parser);
+	r = radeon_cs_sync_rings(parser);
+	if (r) {
+		if (r != -ERESTARTSYS)
+			DRM_ERROR("Failed to sync rings: %i\n", r);
+		goto out;
+	}
 	radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence);
 
 	if ((rdev->family >= CHIP_TAHITI) &&
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index bc894c17b2f9..715b2d95346c 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -36,6 +36,17 @@
 
 #include 
 
+struct radeon_flip_work {
+	struct work_struct		flip_work;
+	struct work_struct		unpin_work;
+	struct radeon_device		*rdev;
+	int				crtc_id;
+	uint64_t			base;
+	struct drm_pending_vblank_event *event;
+	struct radeon_bo		*old_rbo;
+	struct fence			*fence;
+};
+
 static void avivo_crtc_load_lut(struct drm_crtc *crtc)
 {
 	struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
@@ -402,14 +413,21 @@ static void radeon_flip_work_func(struct work_struct *__work)
 	down_read(&rdev->exclusive_lock);
 	if (work->fence) {
-		r = radeon_fence_wai
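The decision that this patch adds to `radeon_cs_sync_rings()` can be isolated as a small pure function. The sketch below uses hypothetical stand-in types (`dev_sketch`, `xfence_sketch`) rather than the kernel's `struct fence` and `struct radeon_fence`; a NULL owner plays the role of `to_radeon_fence()` returning NULL for a fence that some other driver created. It mirrors the three cases in the diff: no exclusive fence, a fence from this device (GPU semaphore sync), and anything else (a blocking CPU wait via `fence_wait()`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and a fence. */
struct dev_sketch { int id; };
struct xfence_sketch {
	const struct dev_sketch *owner;  /* NULL: created by another driver */
};

enum sync_action { SYNC_NONE, SYNC_CPU_WAIT, SYNC_GPU_SEMAPHORE };

static enum sync_action choose_sync(const struct dev_sketch *self,
				    const struct xfence_sketch *excl)
{
	if (!excl)
		return SYNC_NONE;            /* nothing to wait for */
	if (!excl->owner || excl->owner != self)
		return SYNC_CPU_WAIT;        /* foreign fence: block in fence_wait() */
	return SYNC_GPU_SEMAPHORE;           /* own fence: radeon_semaphore_sync_to() */
}
```

A fence from a second radeon device lands in the CPU-wait branch too, matching the `rfence->rdev != p->rdev` test in the diff. Per the cover letter, the last two patches of the series are what turn such waits from synchronous into asynchronous ones.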
[PATCH 2/7] drm/ttm: add reservation_object as argument to ttm_bo_init
This allows importing reservation objects from dma-bufs.

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/ast/ast_ttm.c            |  2 +-
 drivers/gpu/drm/bochs/bochs_mm.c         |  2 +-
 drivers/gpu/drm/cirrus/cirrus_ttm.c      |  2 +-
 drivers/gpu/drm/mgag200/mgag200_ttm.c    |  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c     |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c         |  2 +-
 drivers/gpu/drm/radeon/radeon_object.c   |  2 +-
 drivers/gpu/drm/ttm/ttm_bo.c             | 24 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |  2 +-
 include/drm/ttm/ttm_bo_api.h             |  2 ++
 10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 8008ea0bc76c..58c19cfe6af0 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -339,7 +339,7 @@ int ast_bo_create(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&ast->ttm.bdev, &astbo->bo, size,
 			  ttm_bo_type_device, &astbo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size,
-			  NULL, ast_bo_ttm_destroy);
+			  NULL, NULL, ast_bo_ttm_destroy);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index 2af30e7607d7..6c50a7a44864 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -377,7 +377,7 @@ static int bochs_bo_create(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&bochs->ttm.bdev, &bochsbo->bo, size,
 			  ttm_bo_type_device, &bochsbo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size,
-			  NULL, bochs_bo_ttm_destroy);
+			  NULL, NULL, bochs_bo_ttm_destroy);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 3e7d758330a9..b3b3d16d1279 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -343,7 +343,7 @@ int cirrus_bo_create(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&cirrus->ttm.bdev, &cirrusbo->bo, size,
 			  ttm_bo_type_device, &cirrusbo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size,
-			  NULL, cirrus_bo_ttm_destroy);
+			  NULL, NULL, cirrus_bo_ttm_destroy);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index be883ef5a1d3..398b6fb161a6 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -339,7 +339,7 @@ int mgag200_bo_create(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&mdev->ttm.bdev, &mgabo->bo, size,
 			  ttm_bo_type_device, &mgabo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size,
-			  NULL, mgag200_bo_ttm_destroy);
+			  NULL, NULL, mgag200_bo_ttm_destroy);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index eea74b127b03..bda32276bcc2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -230,7 +230,7 @@ nouveau_bo_new(struct drm_device *dev, int size, int align,
 	ret = ttm_bo_init(&drm->ttm.bdev, &nvbo->bo, size,
 			  type, &nvbo->placement,
 			  align >> PAGE_SHIFT, false, NULL, acc_size, sg,
-			  nouveau_bo_del_ttm);
+			  NULL, nouveau_bo_del_ttm);
 	if (ret) {
 		/* ttm will call nouveau_bo_del_ttm if it fails.. */
 		return ret;
diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
index 69c104c3240f..cdeaf08fdc74 100644
--- a/drivers/gpu/drm/qxl/qxl_object.c
+++ b/drivers/gpu/drm/qxl/qxl_object.c
@@ -110,7 +110,7 @@ int qxl_bo_create(struct qxl_device *qdev,
 
 	r = ttm_bo_init(&qdev->mman.bdev, &bo->tbo, size, type,
 			&bo->placement, 0, !kernel, NULL, size,
-			NULL, &qxl_ttm_bo_destroy);
+			NULL, NULL, &qxl_ttm_bo_destroy);
 	if (unlikely(r != 0)) {
 		if (r != -ERESTARTSYS)
 			dev_err(qdev->dev,
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index aadbd36e64b9..61f3f16bbcbc 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -209,7 +209,7 @@ int radeon_bo_create(struct radeon_device *rdev,
 	down_read(&rdev->pm.mclk_lock);
 	r = ttm_bo_init(&rdev->mman.bdev, &bo->tbo, size, type,
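The new `ttm_bo_init()` argument lets a driver hand in a reservation object that already exists, for example one belonging to an imported dma-buf, while every existing caller simply passes NULL. A minimal sketch of the selection this implies, assuming a buffer object falls back to a private, embedded reservation object when none is supplied; `resv_sketch`, `bo_sketch` and `bo_init_sketch()` are hypothetical names, and the locking that TTM performs around this is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct reservation_object. */
struct resv_sketch { int fence_count; };

/* Hypothetical stand-in for the relevant part of a TTM buffer object. */
struct bo_sketch {
	struct resv_sketch private_resv;  /* used when nothing is imported */
	struct resv_sketch *resv;         /* object actually used for fencing */
};

static void bo_init_sketch(struct bo_sketch *bo, struct resv_sketch *imported)
{
	if (imported) {
		bo->resv = imported;          /* share the exporter's fences */
	} else {
		bo->private_resv.fence_count = 0;
		bo->resv = &bo->private_resv; /* previous behavior: per-BO object */
	}
}
```

Two importers of the same dma-buf then see the same fences, which is the property the cross-device synchronization in this series relies on.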
[PATCH 1/7] drm: Pass dma-buf as argument to gem_prime_import_sg_table
Allows importing reservation_objects from a dma-buf.

Signed-off-by: Maarten Lankhorst
---
 drivers/gpu/drm/drm_gem_cma_helper.c    | 5 +++--
 drivers/gpu/drm/drm_prime.c             | 2 +-
 drivers/gpu/drm/msm/msm_drv.h           | 2 +-
 drivers/gpu/drm/msm/msm_gem_prime.c     | 4 ++--
 drivers/gpu/drm/nouveau/nouveau_gem.h   | 2 +-
 drivers/gpu/drm/nouveau/nouveau_prime.c | 5 +++--
 drivers/gpu/drm/qxl/qxl_drv.h           | 2 +-
 drivers/gpu/drm/qxl/qxl_prime.c         | 2 +-
 drivers/gpu/drm/radeon/radeon_drv.c     | 2 +-
 drivers/gpu/drm/radeon/radeon_prime.c   | 5 +++--
 include/drm/drmP.h                      | 3 ++-
 include/drm/drm_gem_cma_helper.h        | 3 ++-
 12 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index e467e67af6e7..08646af2ddc2 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -316,7 +316,8 @@ out:
 EXPORT_SYMBOL_GPL(drm_gem_cma_prime_get_sg_table);
 
 struct drm_gem_object *
-drm_gem_cma_prime_import_sg_table(struct drm_device *dev, size_t size,
+drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
+				  struct dma_buf_attachment *attach,
 				  struct sg_table *sgt)
 {
 	struct drm_gem_cma_object *cma_obj;
@@ -325,7 +326,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev, size_t size,
 		return ERR_PTR(-EINVAL);
 
 	/* Create a CMA GEM buffer. */
-	cma_obj = __drm_gem_cma_create(dev, size);
+	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
 	if (IS_ERR(cma_obj))
 		return ERR_CAST(cma_obj);
 
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 99d578bad17e..dc4711f30382 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -522,7 +522,7 @@ struct drm_gem_object *drm_gem_prime_import(struct drm_device *dev,
 		goto fail_detach;
 	}
 
-	obj = dev->driver->gem_prime_import_sg_table(dev, dma_buf->size, sgt);
+	obj = dev->driver->gem_prime_import_sg_table(dev, attach, sgt);
 	if (IS_ERR(obj)) {
 		ret = PTR_ERR(obj);
 		goto fail_unmap;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 8a2c5fd0893e..a0dc2592ffc1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -170,7 +170,7 @@ struct sg_table *msm_gem_prime_get_sg_table(struct drm_gem_object *obj);
 void *msm_gem_prime_vmap(struct drm_gem_object *obj);
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr);
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-		size_t size, struct sg_table *sg);
+		struct dma_buf_attachment *attach, struct sg_table *sg);
 int msm_gem_prime_pin(struct drm_gem_object *obj);
 void msm_gem_prime_unpin(struct drm_gem_object *obj);
 void *msm_gem_vaddr_locked(struct drm_gem_object *obj);
diff --git a/drivers/gpu/drm/msm/msm_gem_prime.c b/drivers/gpu/drm/msm/msm_gem_prime.c
index d48f9fc5129b..b75f9940ee9e 100644
--- a/drivers/gpu/drm/msm/msm_gem_prime.c
+++ b/drivers/gpu/drm/msm/msm_gem_prime.c
@@ -37,9 +37,9 @@ void msm_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 }
 
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
-		size_t size, struct sg_table *sg)
+		struct dma_buf_attachment *attach, struct sg_table *sg)
 {
-	return msm_gem_import(dev, size, sg);
+	return msm_gem_import(dev, attach->dmabuf->size, sg);
 }
 
 int msm_gem_prime_pin(struct drm_gem_object *obj)
diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.h b/drivers/gpu/drm/nouveau/nouveau_gem.h
index ddab762d81fe..e4049faca780 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.h
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.h
@@ -39,7 +39,7 @@ struct reservation_object *nouveau_gem_prime_res_obj(struct drm_gem_object *);
 extern void nouveau_gem_prime_unpin(struct drm_gem_object *);
 extern struct sg_table *nouveau_gem_prime_get_sg_table(struct drm_gem_object *);
 extern struct drm_gem_object *nouveau_gem_prime_import_sg_table(
-	struct drm_device *, size_t size, struct sg_table *);
+	struct drm_device *, struct dma_buf_attachment *, struct sg_table *);
 extern void *nouveau_gem_prime_vmap(struct drm_gem_object *);
 extern void nouveau_gem_prime_vunmap(struct drm_gem_object *, void *);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c
index 1f51008e4d26..2215cdba587d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -23,6 +23,7 @@
  */
 
 #include 
+#include 
 
 #include "nouveau_drm.h"
 #include "nouveau_gem.h"
@@ -56,7 +57,7 @@ void nouveau_gem_prime_vunmap(struct drm_gem_object *obj, void *vaddr)
 }
 
 struct drm_gem_object *nouveau_gem_prime_import_sg_t
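Passing the `struct dma_buf_attachment` instead of a bare `size_t` keeps the old information reachable (the diff replaces `size` with `attach->dmabuf->size`) and also exposes the rest of the dma-buf, which is what the commit message means by importing reservation objects. The sketch below uses hypothetical, heavily trimmed stand-ins for the dma-buf structures; the `resv` field in particular is an assumption made for illustration, and `import_by_size()` / `import_by_attach()` are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the dma-buf structures. */
struct resv_obj_sketch { int id; };
struct dma_buf_sketch {
	size_t size;
	struct resv_obj_sketch *resv;     /* assumed field, for illustration */
};
struct attach_sketch { struct dma_buf_sketch *dmabuf; };

struct gem_obj_sketch {
	size_t size;
	struct resv_obj_sketch *resv;
};

/* Old callback shape: only the size arrives, the exporter's fences do not. */
static void import_by_size(struct gem_obj_sketch *obj, size_t size)
{
	obj->size = size;
	obj->resv = NULL;
}

/* New callback shape: the attachment leads back to the whole dma-buf. */
static void import_by_attach(struct gem_obj_sketch *obj,
			     const struct attach_sketch *attach)
{
	assert(attach && attach->dmabuf);
	obj->size = attach->dmabuf->size; /* what the diff does for the size */
	obj->resv = attach->dmabuf->resv; /* what later patches can build on */
}
```

Drivers that only need the size, such as the CMA helper and msm in the diff, change mechanically; drivers that want the exporter's fences can now reach them from the same argument.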
[PATCH 0/7] cross-dev synchronization in TTM through dma-buf.
So this is finally it. After all the work writing support for fences, cross-dev synchronization is now possible. :-)

The last 2 patches of this series are not needed for cross-dev to work, but without them any waits on cross-device fences will be done synchronously.

I've previously tested this with i915, but the patches for i915 fail to apply again with the execlist stuff, so I haven't tried with the latest drm-next changes.

I would like to have the first 2 patches applied to drm-next, and the radeon/nouveau-specific patches once they have been through their review.
[Bug 75112] Meta Bug for HyperZ issues on r600g and radeonsi
https://bugs.freedesktop.org/show_bug.cgi?id=75112

smoki changed:

What       | Removed | Added
Depends on |         | 83418

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140904/1b7b6bbc/attachment-0001.html>
[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2
https://bugs.freedesktop.org/show_bug.cgi?id=83418

smoki changed:

What   | Removed | Added
Blocks |         | 75112
[Bug 44126] [r300g] 0ad: carpet textures "flash" and get hidden by ground texture.
https://bugs.freedesktop.org/show_bug.cgi?id=44126

--- Comment #4 from Marek Olšák ---
Flickering when 2 primitives exactly intersect each other and their vertex positions are not equal is quite a common programming mistake in games. Drivers cannot do anything about it.
[Bug 83418] EU IV is incorrectly rendered after git1409011930.d571f2
https://bugs.freedesktop.org/show_bug.cgi?id=83418

--- Comment #7 from José Suárez ---
As stated by smoki, reverting that commit indeed fixes the problem.
TTM placement & caching issue/questions
On Wed, 2014-09-03 at 22:07 -0400, Jerome Glisse wrote:
> So in the meantime the attached patch should work, it just silently ignores
> the caching attribute request on non-x86 instead of pretending that things
> are set up as expected and then later the radeon or nouveau hw unsetting
> the snoop bit.
>
> It's not tested but i think it should work.

I'm still getting placements with !CACHED going from bo_memcpy in
ttm_io_prot() though ... I'm looking at filtering the placement
attributes instead.

Ben.
TTM placement & caching issue/questions
On 09/04/2014 11:43 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2014-09-04 at 11:34 +0200, Daniel Vetter wrote:
>> On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote:
>>> Last time I tested, (and it seems like Michel is on the same track),
>>> writing with the CPU to write-combined memory was substantially faster
>>> than writing to cached memory, with the additional side-effect that CPU
>>> caches are left unpolluted.
>>>
>>> Moreover (although only tested on Intel's embedded chipsets), texturing
>>> from cpu-cache-coherent PCI memory was a real GPU performance hog
>>> compared to texturing from non-snooped memory. Hence, whenever a buffer
>>> could be classified as GPU-read-only (or almost at least), it should be
>>> placed in write-combined memory.
>> Just a quick comment since this explicitly refers to intel chips: On
>> desktop/laptop chips with the big shared l3/l4 caches it's the other way
>> round. Cached uploads are substantially faster than wc and not using
>> coherent access is a severe perf hit for texturing. I guess the hw guys
>> worked really hard to hide the snooping costs so that the gpu can benefit
>> from the massive bandwidth these caches can provide.
> This is similar to modern POWER chips as well. We have pretty big L3's
> (though not technically shared they are in a separate quadrant and we
> have a shared L4 in the memory buffer) and our fabric is generally
> optimized for cachable/coherent access performance. In fact, we only
> have so many credits for NC accesses on the bus...
>
Thanks both of you for the update. I haven't dealt with real hardware for a while..

/Thomas
TTM placement & caching issue/questions
On Wed, 2014-09-03 at 21:55 -0400, Jerome Glisse wrote:
> So i think we need to get a platform flag and/or set_pages_array_wc|uc
> needs to fail and this would fall back to cached mapping if the fallback
> code still works. So if your arch properly returns an error for those
> cache changing functions then you should be fine.
>
> This also means that we need to fix ttm_tt_set_placement_caching so that
> when it returns an error it switches to cached mapping. Which will always
> work.

Can't I just filter the mem_type definitions in the mem_type_manager with
something along the lines of this totally untested patch? Or do I *also*
need to make those set_pages_array_* things fail?

--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1308,6 +1308,24 @@ int ttm_bo_evict_mm(struct ttm_bo_device *bdev, unsigned
 }
 EXPORT_SYMBOL(ttm_bo_evict_mm);
 
+static void ttm_bo_filter_mem_type(struct ttm_bo_device *bdev, unsigned type,
+				   struct ttm_mem_type_manager *man)
+{
+	/*
+	 * On some architectures/platforms, we cannot allow non-cachable
+	 * mappings of system memory. This can be a problem with AGP on
+	 * old G5 systems vs. TTM_PL_TT but we don't really have a choice
+	 * at this point on ppc64 at least and the AGP on these never
+	 * worked reliably anyway.
+	 */
+#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
+	if (type == TTM_PL_SYSTEM || type == TTM_PL_TT) {
+		man->available_caching &= TTM_PL_FLAG_CACHED;
+		man->default_caching &= man->available_caching;
+	}
+#endif
+}
+
 int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned type,
 			unsigned long p_size)
 {
@@ -1327,6 +1345,8 @@ int ttm_bo_init_mm(struct ttm_bo_device *bdev, unsigned ty
 		return ret;
 	man->bdev = bdev;
 
+	ttm_bo_filter_mem_type(bdev, type, man);
+
 	ret = 0;
 	if (type != TTM_PL_SYSTEM) {
 		ret = (*man->func->init)(man, p_size);
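The filter in the patch above is plain mask arithmetic on two fields of the memory type manager. The userspace sketch below reproduces that arithmetic with illustrative flag values (`SK_FLAG_*` and `man_sketch` are hypothetical stand-ins, not the TTM definitions) so that both outcomes are visible: a manager that already offers cached mappings keeps only those, while one that offers only uncached or write-combined caching ends up with an empty mask in both fields.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative caching flags standing in for the TTM_PL_FLAG_* bits. */
#define SK_FLAG_CACHED   (1u << 16)
#define SK_FLAG_UNCACHED (1u << 17)
#define SK_FLAG_WC       (1u << 18)

/* Hypothetical stand-in for the two manager fields the patch touches. */
struct man_sketch {
	uint32_t available_caching;
	uint32_t default_caching;
};

/* Same mask arithmetic as ttm_bo_filter_mem_type(): keep only "cached". */
static void filter_caching(struct man_sketch *man)
{
	man->available_caching &= SK_FLAG_CACHED;
	man->default_caching &= man->available_caching;
}
```

The empty-mask outcome is worth keeping in mind when reviewing the patch; its own comment already names AGP on old G5 systems as the configuration that loses out.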
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436

--- Comment #10 from smoki ---
OK, I will bisect this. I now have a pretty clear case here: something like a 3x performance drop in OpenJK. :)
[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process
On Thu, Sep 4, 2014 at 9:03 AM, Gupta, Sourab wrote:
> On Wed, 2014-09-03 at 13:09 +, Daniel Vetter wrote:
>> On Wed, Sep 03, 2014 at 11:49:52AM +, Gupta, Sourab wrote:
>> > On Wed, 2014-09-03 at 10:58 +, Daniel Vetter wrote:
>> > > On Wed, Sep 03, 2014 at 03:39:55PM +0530, sourab.gupta at intel.com
>> > > wrote:
>> > > > From: Sourab Gupta
>> > > >
>> > > > Currently the Graphics Driver provides an interface through which
>> > > > one can get a snapshot of the overall Graphics memory consumption.
>> > > > Also there is an interface available, which provides information
>> > > > about the several memory related attributes of every single Graphics
>> > > > buffer created by the various clients.
>> > > >
>> > > > There is a requirement of a new interface for achieving below
>> > > > functionalities:
>> > > > 1) Need to provide Client based detailed information about the
>> > > > distribution of Graphics memory
>> > > > 2) Need to provide an interface which can provide info about the
>> > > > sharing of Graphics buffers between the clients.
>> > > >
>> > > > The client based interface would also aid in debugging of
>> > > > memory usage/consumption by each client & debug memleak related issues.
>> > > >
>> > > > With this new interface,
>> > > > 1) In case of memleak scenarios, we can easily zero in on the culprit
>> > > > client which is unexpectedly holding on the Graphics buffers for an
>> > > > inordinate amount of time.
>> > > > 2) We can get an estimate of the instantaneous memory footprint of
>> > > > every Graphics client.
>> > > > 3) We can now trace all the processes sharing a particular Graphics
>> > > > buffer.
>> > > >
>> > > > By means of this patch we try to provide a sysfs interface to achieve
>> > > > the mentioned functionalities.
>> > > >
>> > > > There are two files created in sysfs:
>> > > > 'i915_gem_meminfo' will provide summary of the graphics resources used
>> > > > by
>> > > > each graphics client.
>> > > > 'i915_gem_objinfo' will provide detailed view of each object created by
>> > > > individual clients.
>> > > >
>> > > > v2: Changes made for
>> > > > - adding support to report user virtual addresses of mapped buffers
>> > > > - replacing pid based reporting with tgid based one
>> > > > - checkpatch and other misc cleanup
>> > > >
>> > > > Signed-off-by: Sourab Gupta
>> > > > Signed-off-by: Akash Goel
>> > >
>> > > Sorry I didn't spot this the first time around, but I think sysfs is the
>> > > wrong place for this.
>> > >
>> > > Generally sysfs is for setting/reading per-object values, and it has the
>> > > big rule that there should be only _one_ value per file. The error state
>> > > is a bit an exception, but otoh it's also just the full dump as a binary
>> > > file (which for historical reasons is printed as ascii).
>> > >
>> > > The other issue is that imo this should be a generic interface, so that
>> > > we
>> > > can write a gpu_top tool for dumping memory consumers which works on all
>> > > linux platforms.
>> > >
>> > > To avoid delaying for a long time can we just move ahead by putting this
>> > > into debugfs?
>> > >
>> > > Also in debugfs there's already a lot of this stuff around - why is that
>> > > not sufficient and could we extend it somehow with the missing bits?
>> > >
>> > > Thanks, Daniel
>> >
>> > Hi Daniel,
>> >
>> > Thanks for your inputs.
>> > We had originally put the patch in sysfs, as there was a requirement for
>> > this feature to be available in production kernels also.
>> > We can move it to debugfs to move ahead with this. I'll submit the
>> > debugfs version of this patch next time.
>>
>> Yeah sysfs is the only place where we have a stable api, but that also
>> implies that requirements are a _lot_ more stringent. At least we need
>> testcases to make sure the interface actually do what we want them to do,
>> and to make sure we don't break the interface by accident.
>>
>> > Also,
>> > we developed this new interface to overcome the deficiencies of existing
>> > interface. With this new interface, we can provide client based detailed
>> > information about the distribution of Graphics memory. This gives
>> > information about the various states of the graphics objects opened per
>> > process (summarized as well as detailed info)
>> > It also gives information about Graphics buffers shared between the
>> > clients, and gives user mapped virtual address of all the mapped
>> > graphics buffers.
>> > It was not feasible to fit all this info in the existing interface. So
>> > we decided to go ahead with new interface for these functionality.
>>
>> Well the problem is that adding more files like that increases the
>> maintenance burden. So if there's some way to compute the information you
>> want from information already provided in debugfs, then I prefer we do
>> that at first.
>> -Daniel
>
> Hi Daniel,
>
> We went through the existing debugfs interfaces, but we couldn't derive
> the information we need from these inter
[Intel-gfx] [PATCH v2] drm/i915: Sysfs interface to get GFX shmem usage stats per process
On Thu, 2014-09-04 at 10:01 +, Daniel Vetter wrote: > On Thu, Sep 4, 2014 at 9:03 AM, Gupta, Sourab > wrote: > > On Wed, 2014-09-03 at 13:09 +, Daniel Vetter wrote: > >> On Wed, Sep 03, 2014 at 11:49:52AM +, Gupta, Sourab wrote: > >> > On Wed, 2014-09-03 at 10:58 +, Daniel Vetter wrote: > >> > > On Wed, Sep 03, 2014 at 03:39:55PM +0530, sourab.gupta at intel.com > >> > > wrote: > >> > > > From: Sourab Gupta > >> > > > > >> > > > Currently the Graphics Driver provides an interface through which > >> > > > one can get a snapshot of the overall Graphics memory consumption. > >> > > > Also there is an interface available, which provides information > >> > > > about the several memory related attributes of every single Graphics > >> > > > buffer created by the various clients. > >> > > > > >> > > > There is a requirement of a new interface for achieving below > >> > > > functionalities: > >> > > > 1) Need to provide Client based detailed information about the > >> > > > distribution of Graphics memory > >> > > > 2) Need to provide an interface which can provide info about the > >> > > > sharing of Graphics buffers between the clients. > >> > > > > >> > > > The client based interface would also aid in debugging of > >> > > > memory usage/consumption by each client & debug memleak related > >> > > > issues. > >> > > > > >> > > > With this new interface, > >> > > > 1) In case of memleak scenarios, we can easily zero in on the culprit > >> > > > client which is unexpectedly holding on the Graphics buffers for an > >> > > > inordinate amount of time. > >> > > > 2) We can get an estimate of the instantaneous memory footprint of > >> > > > every Graphics client. > >> > > > 3) We can now trace all the processes sharing a particular Graphics > >> > > > buffer. > >> > > > > >> > > > By means of this patch we try to provide a sysfs interface to achieve > >> > > > the mentioned functionalities. 
> >> > > > > >> > > > There are two files created in sysfs: > >> > > > 'i915_gem_meminfo' will provide summary of the graphics resources > >> > > > used by > >> > > > each graphics client. > >> > > > 'i915_gem_objinfo' will provide detailed view of each object created > >> > > > by > >> > > > individual clients. > >> > > > > >> > > > v2: Changes made for > >> > > > - adding support to report user virtual addresses of mapped > >> > > > buffers > >> > > > - replacing pid based reporting with tgid based one > >> > > > - checkpatch and other misc cleanup > >> > > > > >> > > > Signed-off-by: Sourab Gupta > >> > > > Signed-off-by: Akash Goel > >> > > > >> > > Sorry I didn't spot this the first time around, but I think sysfs is > >> > > the > >> > > wrong place for this. > >> > > > >> > > Generally sysfs is for setting/reading per-object values, and it has > >> > > the > >> > > big rule that there should be only _one_ value per file. The error > >> > > state > >> > > is a bit an exception, but otoh it's also just the full dump as a > >> > > binary > >> > > file (which for historical reasons is printed as ascii). > >> > > > >> > > The other issue is that imo this should be a generic interface, so > >> > > that we > >> > > can write a gpu_top tool for dumping memory consumers which works on > >> > > all > >> > > linux platforms. > >> > > > >> > > To avoid delaying for a long time can we just move ahead by putting > >> > > this > >> > > into debugfs? > >> > > > >> > > Also in debugfs there's already a lot of this stuff around - why is > >> > > that > >> > > not sufficient and could we extend it somehow with the missing bits? > >> > > > >> > > Thanks, Daniel > >> > > >> > Hi Daniel, > >> > > >> > Thanks for your inputs. > >> > We had originally put the patch in sysfs, as there was a requirement for > >> > this feature to be available in production kernels also. > >> > We can move it to debugfs to move ahead with this. 
I'll submit the > >> > debugfs version of this patch next time. > >> > >> Yeah sysfs is the only place where we have a stable api, but that also > >> implies that requirements are a _lot_ more stringent. At least we need > >> testcases to make sure the interface actually do what we want them to do, > >> and to make sure we don't break the interface by accident. > >> > >> > Also, > >> > we developed this new interface to overcome the deficiencies of existing > >> > interface. With this new interface, we can provide client based detailed > >> > information about the distribution of Graphics memory. This gives > >> > information about the various states of the graphics objects opened per > >> > process (summarized as well as detailed info) > >> > It also gives information about Graphics buffers shared between the > >> > clients, and gives user mapped virtual address of all the mapped > >> > graphics buffers. > >> > It was not feasible to fit all this info in the existing interface. So > >> > we decided to go ahead with new interface for these functionality. > >> > >> Well the problem is
TTM placement & caching issue/questions
On Thu, Sep 04, 2014 at 09:44:04AM +0200, Thomas Hellstrom wrote: > Last time I tested, (and it seems like Michel is on the same track), > writing with the CPU to write-combined memory was substantially faster > than writing to cached memory, with the additional side-effect that CPU > caches are left unpolluted. > > Moreover (although only tested on Intel's embedded chipsets), texturing > from cpu-cache-coherent PCI memory was a real GPU performance hog > compared to texturing from non-snooped memory. Hence, whenever a buffer > could be classified as GPU-read-only (or almost at least), it should be > placed in write-combined memory. Just a quick comment since this explicitly refers to Intel chips: On desktop/laptop chips with the big shared l3/l4 caches it's the other way round. Cached uploads are substantially faster than wc and not using coherent access is a severe perf hit for texturing. I guess the hw guys worked really hard to hide the snooping costs so that the gpu can benefit from the massive bandwidth these caches can provide. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
TTM placement & caching issue/questions
On 09/04/2014 10:06 AM, Benjamin Herrenschmidt wrote: > On Thu, 2014-09-04 at 09:44 +0200, Thomas Hellstrom wrote: > >>> This will, from what I can tell, try to use the same caching mode as the >>> original object: >>> >>> if ((cur_placement & caching) != 0) >>> result |= (cur_placement & caching); >>> >>> And cur_placement comes from bo->mem.placement which as far as I can >>> tell is based on the placement array which the drivers set up. >> This originates from the fact that when evicting GTT memory, on x86 it's >> unnecessary and undesirable to switch caching mode when going to system. > But that's what I don't quite understand. We have two different mappings > here. The VRAM and the memory object. We wouldn't be "switching"... we > are creating a temporary mapping for the memory object in order to do > the memcpy, but we seem to be doing it by using the caching attributes > of the VRAM object or am I missing something ? I don't see how that > makes sense so I suppose I'm missing something here :-) Well, the intention when TTM was written was that the driver writer should be smart enough that when he wanted a move from uncached VRAM to system, he'd request cached system in the placement flags in the first place. If TTM somehow overrides such a request, that's a bug in TTM. If the move, for example, is a result of an eviction, then the driver evict_flags() function should ideally look at the current placement and decide about a suitable placement based on that: vram-to-system moves should generally request cacheable memory if the next access is expected by the CPU. Probably write-combined otherwise. If the move is the result of a TTM swapout, TTM will automatically select cachable system, and for most other moves, I think the driver writer is in full control.
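Thomas's suggestion for evict_flags() is to decide from the current placement. VRAM evictions request cached system memory when the CPU is expected to touch the buffer next, and write-combined otherwise. That could look roughly like the sketch below. It is a user-space model, not TTM code. The PL_* constants are local stand-ins for the TTM_PL_FLAG_* names with illustrative bit values, and struct bo_hint with its cpu_access_next field is a hypothetical per-BO hint. Keeping the current caching mode for non-VRAM moves follows Thomas's remark about GTT evictions on x86 and is an assumption for other platforms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for TTM_PL_FLAG_*; bit values are illustrative only. */
#define PL_SYSTEM       (1u << 0)
#define PL_TT           (1u << 1)
#define PL_VRAM         (1u << 2)
#define PL_CACHED       (1u << 16)
#define PL_UNCACHED     (1u << 17)
#define PL_WC           (1u << 18)
#define PL_MASK_CACHING (PL_CACHED | PL_UNCACHED | PL_WC)

/* Hypothetical per-BO information a driver might keep. */
struct bo_hint {
	uint32_t cur_placement;  /* memory type and caching bits of the BO now */
	bool cpu_access_next;    /* driver expects the CPU to touch it next */
};

/* Return an explicit system placement rather than advertising every caching mode. */
static uint32_t evict_placement(const struct bo_hint *bo)
{
	uint32_t keep;

	assert(bo != NULL);

	if (bo->cur_placement & PL_VRAM)
		/* VRAM -> system: cached for CPU access, otherwise WC. */
		return PL_SYSTEM | (bo->cpu_access_next ? PL_CACHED : PL_WC);

	/* TT -> system: keep the current mode (assumed x86 GTT behaviour). */
	keep = bo->cur_placement & PL_MASK_CACHING;
	return PL_SYSTEM | (keep ? keep : PL_CACHED);
}
```

For VRAM evictions the sketch requests a single caching mode. This leaves the caching selection step nothing to "preserve" from the VRAM placement.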
> >> Last time I tested, (and it seems like Michel is on the same track), >> writing with the CPU to write-combined memory was substantially faster >> than writing to cached memory, with the additional side-effect that CPU >> caches are left unpolluted. > That's very strange indeed. It's certainly an x86 specific artifact, > even if we were allowed by our hypervisor to map memory non-cachable > (the HW somewhat can), we tend to have a higher throughput by going > cachable, but that could be due to the way the PowerBus works (it's > basically very biased toward cachable transactions). > >> I dislike the approach of rewriting placements. In some cases I think it >> won't even work, because placements are declared 'static const' >> >> What I'd suggest is instead to intercept the driver response from >> init_mem_type() and filter out undesired caching modes from >> available_caching and default_caching, > This was my original intent but Jerome seems to have different ideas > (see his proposed patches). I'm happy to revive mine as well and post it > as an alternative after I've tested it a bit more (tomorrow). > >> perhaps also looking at whether >> the memory type is mappable or not. This should have the additional >> benefit of working everywhere, and if a caching mode is selected that's >> not available on the platform, you'll simply get an error. (I guess?) > You mean that if not mappable we don't bother filtering ? > > The rule is really for me pretty simple: > >- If it's system memory (PL_SYSTEM/PL_TT), it MUST be cachable > >- If it's PCIe memory space (VRAM, registers, ...) it MUST be > non-cachable. Yes, something along these lines. I guess checking for VRAM or TTM_MEMTYPE_FLAG_FIXED would perhaps do the trick /Thomas > > Cheers, > Ben. > >> /Thomas >> >> >>> Cheers, >>> Ben. 
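Thomas proposes filtering the driver's init_mem_type() answer. Combined with the rule Ben states (system/TT memory must be cacheable, PCIe memory space must not be) and with TTM_MEMTYPE_FLAG_FIXED as the discriminator, it might be sketched as follows. This is a hypothetical user-space model. struct mem_type_caps only mirrors the available_caching/default_caching fields that init_mem_type() fills in, and the PL_* bit values are illustrative. Given the x86 performance reports elsewhere in the thread, applying Ben's rule unconditionally is unlikely to be right on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for TTM_PL_FLAG_*; bit values are illustrative only. */
#define PL_CACHED       (1u << 16)
#define PL_UNCACHED     (1u << 17)
#define PL_WC           (1u << 18)
#define PL_MASK_CACHING (PL_CACHED | PL_UNCACHED | PL_WC)

/* Hypothetical mirror of what init_mem_type() reports for one memory type. */
struct mem_type_caps {
	bool fixed;                  /* stands in for TTM_MEMTYPE_FLAG_FIXED */
	uint32_t available_caching;
	uint32_t default_caching;
};

/* System/TT memory: cached only. Fixed (PCIe) memory: never cached. */
static void filter_caching(struct mem_type_caps *man)
{
	uint32_t allowed = man->fixed ? (PL_UNCACHED | PL_WC) : PL_CACHED;

	assert((man->available_caching & ~PL_MASK_CACHING) == 0);

	man->available_caching &= allowed;
	man->default_caching &= allowed;

	/* If the default was filtered out, pick a remaining mode, if any. */
	if (man->default_caching == 0) {
		if (man->available_caching & PL_CACHED)
			man->default_caching = PL_CACHED;
		else if (man->available_caching & PL_WC)
			man->default_caching = PL_WC;
		else
			man->default_caching = man->available_caching & PL_UNCACHED;
	}
}
```

An empty available_caching mask after filtering corresponds to Thomas's "you'll simply get an error" case. The sketch leaves it to the caller to reject it.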
[Bug 83436] Sudden framerate drops in multiple games
https://bugs.freedesktop.org/show_bug.cgi?id=83436 --- Comment #9 from Maciej --- Did an update today, performance decrease is still there, but fps drops are gone. I had no other apps running in the background, so I'm not sure what's up. However fps drops in TF2 are still a thing. As for bisecting, I really have no skills to do that, I'm just a gamer with AMD card :/ -- You are receiving this mail because: You are the assignee for the bug. -- next part -- An HTML attachment was scrubbed... URL: <http://lists.freedesktop.org/archives/dri-devel/attachments/20140904/627f9d1b/attachment.html>
[git pull] drm fixes
Hi Linus,

just i915 and vmwgfx fixes: i915 contains a bunch of fixes for recent regressions in outputs, and vmwgfx fixes a possible infinite loop and a bad return code.

Dave.

The following changes since commit 59753a805499f1ffbca4ac0a24b3dff67bf1:

  Merge tag 'backlight-fixes-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight (2014-08-28 10:47:10 -0700)

are available in the git repository at:

  git://people.freedesktop.org/~airlied/linux drm-fixes

for you to fetch changes up to 3aacfda0ecd9040521fbfb4a2c53cd6bf77ae4ee:

  Merge tag 'drm-intel-fixes-2014-09-03' of git://anongit.freedesktop.org/drm-intel into drm-fixes (2014-09-04 11:20:00 +1000)

Dave Airlie (4):
      Merge tag 'drm-intel-fixes-2014-08-28' of git://anongit.freedesktop.org/drm-intel into drm-fixes
      drm/i915: handle G45/GM45 pulse detection connected state.
      Merge branch 'vmwgfx-fixes-3.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes
      Merge tag 'drm-intel-fixes-2014-09-03' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Mathias Krause (1):
      drm/i915: Remove bogus __init annotation from DMI callbacks

Paulo Zanoni (1):
      drm/i915: fix plane/cursor handling when runtime suspended

Scot Doyle (2):
      drm/i915: Ignore VBT backlight presence check on Acer C720 (4005U)
      drm/i915: don't warn if backlight unexpectedly enabled

Thomas Hellstrom (2):
      drm/vmwgfx: Fix an incorrect OOM return value
      drm/vmwgfx: Fix a potential infinite spin waiting for fifo idle

Ville Syrjälä (2):
      drm/i915: Move intel_ddi_set_vc_payload_alloc(false) to haswell_crtc_disable()
      drm/i915: Fix lock dropping in intel_tv_detect()

 drivers/gpu/drm/i915/intel_bios.c       |  2 +-
 drivers/gpu/drm/i915/intel_crt.c        |  2 +-
 drivers/gpu/drm/i915/intel_display.c    | 34 +---
 drivers/gpu/drm/i915/intel_dp.c         | 55 ++---
 drivers/gpu/drm/i915/intel_lvds.c       |  2 +-
 drivers/gpu/drm/i915/intel_panel.c      |  8 ++---
 drivers/gpu/drm/i915/intel_tv.c         | 10 --
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 11 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c    |  3 +-
 9 files changed, 88 insertions(+), 39 deletions(-)
[Bug 82828] Regression: Crash in 3Dmark2001
https://bugs.freedesktop.org/show_bug.cgi?id=82828 --- Comment #20 from Pavel Ondračka --- Your patch does indeed fix the crashing tests. I still see some piglit regressions, but those should be either bug 82882 or bug 82978. Thanks for the fix.
TTM placement & caching issue/questions
Hi folks ! I've been tracking down some problems with the recent DRI on powerpc and stumbled upon something that doesn't look right, and not necessarily only for us. Now it's possible that I haven't fully understood the code here and I also don't know to what extent some of that behaviour is necessary for some platforms such as Intel GTT bits. What I've observed with a simple/dumb (no DMA) driver like AST (but this probably happens more generally) is that when evicting a BO from VRAM into System memory, the TTM tries to preserve the existing caching attributes of the VRAM object. From what I can tell, we end up going from VRAM to System memory type, and we eventually call ttm_bo_select_caching() to select the caching option for the target. This will, from what I can tell, try to use the same caching mode as the original object: if ((cur_placement & caching) != 0) result |= (cur_placement & caching); And cur_placement comes from bo->mem.placement which as far as I can tell is based on the placement array which the drivers set up. Now they tend to uniformly set up the placement for System memory as TTM_PL_MASK_CACHING which enables all caching modes. So I end up with, for example, my System memory BOs having TTM_PL_FLAG_CACHED not set (though they also don't have TTM_PL_FLAG_UNCACHED) but TTM_PL_FLAG_WC set. We don't seem to use the man->default_caching (which will have TTM_PL_FLAG_CACHED) unless there is no matching bit at all between the proposed placement and the existing caching mode. Now this is a problem for several reasons that I can think of: - On a number of powerpc platforms, such as all our 64-bit server ones for example, it's actually illegal to map system memory non-cached. The system is fully cache coherent for all possible DMA originators (that we care about at least) and mapping memory non-cachable while it's mapped cachable in the linear mapping can cause nasty cache paradoxes which, when detected by HW, can checkstop the system.
- A similar issue exists, afaik, on ARM >= v7, so anything mapped non-cachable must be removed from the linear mapping explicitly since otherwise it can be speculatively prefetched into the cache. - I don't know about x86, but even then, it looks quite sub-optimal to map the memory backing of the BOs and access it using a WC rather than a cachable mapping attribute. Now, some folks on IRC mentioned that there might be reasons for the current behaviour, so as not to change the caching attributes when going in/out of the GTT on Intel. I don't know how that relates and how that works, but maybe that should be enforced by having a different placement mask specifically on those chipsets. Dave, should we change the various PCI drivers for generally coherent devices such that the System memory type doesn't allow placements without the CACHED attribute ? Or at least on coherent platforms ? How do we detect that ? Should we have a TTM helper to establish the default memory placement attributes that "normal PCI" drivers call to set that up so we can have all the necessary arch ifdefs in one single place, at least for "classic PCI/PCIe" stuff (AGP might need additional tweaks) ? Non-PCI and "special" drivers like Intel can use a different set of placement attributes to represent the requirements of those specific platforms (mostly thinking of embedded ARM here which under some circumstances might actually require non-cached mappings). Or am I missing another part of the puzzle ? As it is, things are broken for me even for dumb drivers, and I suspect to a large extent with radeon and nouveau too, though in some cases we might get away with it most of the time ... until the machine locks up for some unexplainable reason... This might even cause problems on existing distros such as RHEL7 with our radeon adapters. Any suggestions on the best approach to fix it ?
I'm happy to produce the patches but I'm not that familiar with the TTM so I would like to make sure I'm on the right track first :-) Cheers, Ben.
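The behaviour Ben describes can be reproduced with a small user-space model of the caching selection step. The model keeps the current caching mode whenever the proposed placement allows it, and falls back to the memory type's default_caching only when there is no overlap. The keep-current branch follows the snippet quoted above. The fallback order after default_caching, the PL_* bit values and the function name are assumptions made for illustration, not TTM's code.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for TTM_PL_FLAG_*; bit values are illustrative only. */
#define PL_SYSTEM       (1u << 0)
#define PL_VRAM         (1u << 2)
#define PL_CACHED       (1u << 16)
#define PL_UNCACHED     (1u << 17)
#define PL_WC           (1u << 18)
#define PL_MASK_CACHING (PL_CACHED | PL_UNCACHED | PL_WC)

/* Model of the selection: prefer the BO's current caching mode. */
static uint32_t select_caching(uint32_t default_caching,
			       uint32_t cur_placement,
			       uint32_t proposed_placement)
{
	uint32_t caching = proposed_placement & PL_MASK_CACHING;
	uint32_t result = proposed_placement & ~PL_MASK_CACHING;

	assert(caching != 0); /* the proposal must allow at least one mode */

	if ((cur_placement & caching) != 0)
		result |= cur_placement & caching;      /* keep current mode */
	else if ((default_caching & caching) != 0)
		result |= default_caching & caching;    /* memory type default */
	else if (caching & PL_CACHED)               /* assumed fallback order */
		result |= PL_CACHED;
	else if (caching & PL_WC)
		result |= PL_WC;
	else
		result |= PL_UNCACHED;

	return result;
}
```

Take a BO currently in VRAM as write-combined and a system placement that advertises every caching mode. The model returns system | WC, which is exactly the mapping Ben objects to. Proposing only the cached bit yields system | cached.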
[PATCH 6/9] drm/ttm: force cached mapping on non x86 platform
On 09/04/2014 09:46 AM, Benjamin Herrenschmidt wrote: > From: Jérôme Glisse > > People interested in providing uncached or write combined mapping > on their architecture need to do the ground work inside their arch > specific code to allow breaking the linear kernel mapping so that > page mapping attributes can be updated, in the meantime force cached > mapping for non x86 architectures. I don't like this patch. Please see my previous email. /Thomas
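The quoted patch description would force cached mappings on every non-x86 architecture until the arch code can break up the linear kernel mapping. That amounts to a compile-time filter. Below is a minimal user-space sketch. It assumes the architecture check can be expressed with the compiler's __i386__/__x86_64__ predefined macros. ARCH_CAN_REMAP_UNCACHED, arch_filter_caching() and the PL_* values are hypothetical names, not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for TTM_PL_FLAG_*; bit values are illustrative only. */
#define PL_SYSTEM       (1u << 0)
#define PL_CACHED       (1u << 16)
#define PL_UNCACHED     (1u << 17)
#define PL_WC           (1u << 18)
#define PL_MASK_CACHING (PL_CACHED | PL_UNCACHED | PL_WC)

/* Assumption: only x86 can change attributes of linearly mapped pages. */
#if defined(__i386__) || defined(__x86_64__)
#define ARCH_CAN_REMAP_UNCACHED 1
#else
#define ARCH_CAN_REMAP_UNCACHED 0
#endif

static uint32_t arch_filter_caching(uint32_t placement)
{
	/* A placement should always name at least one memory type. */
	assert((placement & ~PL_MASK_CACHING) != 0);

	if (ARCH_CAN_REMAP_UNCACHED)
		return placement;

	/* Drop whatever caching modes were requested and force CACHED. */
	return (placement & ~PL_MASK_CACHING) | PL_CACHED;
}
```

This is the behaviour Thomas says he does not like. His alternative in the thread is to filter available_caching and default_caching per memory type rather than rewrite placements.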