[PATCH 1/3 v2] drm/v3d: Take a lock across GPU scheduler job creation and queuing.

2018-06-06 Thread Eric Anholt
Between creation and queueing of a job, you need to prevent any other
job from being created and queued.  Otherwise the scheduler's fences
may be signaled out of seqno order.

v2: move mutex unlock to the error label.

Signed-off-by: Eric Anholt 
Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
---
 drivers/gpu/drm/v3d/v3d_drv.h | 5 +++++
 drivers/gpu/drm/v3d/v3d_gem.c | 4 ++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a043ac3aae98..26005abd9c5d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -85,6 +85,11 @@ struct v3d_dev {
 */
struct mutex reset_lock;
 
+   /* Lock taken when creating and pushing the GPU scheduler
+* jobs, to keep the sched-fence seqnos in order.
+*/
+   struct mutex sched_lock;
+
struct {
u32 num_allocated;
u32 pages_allocated;
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b513f9189caf..269fe16379c0 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -550,6 +550,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail;
 
+   mutex_lock(&v3d->sched_lock);
if (exec->bin.start != exec->bin.end) {
ret = drm_sched_job_init(&exec->bin.base,
 &v3d->queue[V3D_BIN].sched,
@@ -576,6 +577,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
kref_get(&exec->refcount); /* put by scheduler job completion */
drm_sched_entity_push_job(&exec->render.base,
  &v3d_priv->sched_entity[V3D_RENDER]);
+   mutex_unlock(&v3d->sched_lock);
 
v3d_attach_object_fences(exec);
 
@@ -594,6 +596,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
return 0;
 
 fail_unreserve:
+   mutex_unlock(&v3d->sched_lock);
v3d_unlock_bo_reservations(dev, exec, &acquire_ctx);
 fail:
v3d_exec_put(exec);
@@ -615,6 +618,7 @@ v3d_gem_init(struct drm_device *dev)
spin_lock_init(&v3d->job_lock);
mutex_init(&v3d->bo_lock);
mutex_init(&v3d->reset_lock);
+   mutex_init(&v3d->sched_lock);
 
/* Note: We don't allocate address 0.  Various bits of HW
 * treat 0 as special, such as the occlusion query counters
-- 
2.17.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH xf86-video-amdgpu 0/7] Enabling Color Management - Round 3

2018-06-06 Thread Leo Li



On 2018-06-06 01:03 PM, Michel Dänzer wrote:

On 2018-06-06 06:01 PM, Michel Dänzer wrote:

On 2018-06-01 06:03 PM, sunpeng...@amd.com wrote:

From: "Leo (Sunpeng) Li" 

This ended up being different enough from v2 to warrant a new patchset. Per
Michel's suggestions, there have been various optimizations and cleanups.
Here's what's changed:

* Cache DRM color management property IDs at pre-init,
 * instead of querying DRM each time we need to modify color properties.

* Remove drmmode_update_cm_props().
 * Update color properties in drmmode_output_get_property() instead.
 * This also makes old calls to update_cm_props redundant.

* Get rid of fake CRTCs.
 * Previously, we were allocating a fake CRTC to configure color props on
   outputs that don't have a CRTC.
 * Instead, rr_configure_and_change_cm_property() can be easily modified to
   accept NULL CRTCs.

* Drop patches to persist color properties across DPMS events.
 * Kernel driver should be patched instead:
   https://lists.freedesktop.org/archives/amd-gfx/2018-May/022744.html
 * Color props including legacy gamma now persist across crtc dpms.
 * Non-legacy props now persist across output dpms and hotplug, as long
   as the same CRTC remains attached to that output.

And some smaller improvements:

* Change CTM to be 32-bit format instead of 16-bit.
 * This requires clients to ensure that each 32-bit element is padded to be
   long-sized, since libXrandr parses 32-bit format as long-typed.

* Optimized color management init during CRTC init.
 * Query DRM once for the list of properties, instead of twice.


Sounds good. I'll be going through the patches in detail from now on,
but I don't know yet when I'll be able to finish the review.


Meanwhile, heads up on two issues I discovered while smoke-testing the
series (which are sort of related, but occur even without this series):


Running Xorg in depth 30[0] results in completely wrong colours
(everything has a red tint) with current kernels. I think this is
because DC now preserves the gamma LUT values, but xf86-video-amdgpu
never sets them at depth 30, so the hardware is still using values for
24-bit RGB.


Actually, looks like I made a mistake in my testing before; this issue
only occurs as of patch 6 of this series.



Hi Michel,

I'll look into this. Thanks for testing :)

Leo



Re: [PATCH xf86-video-amdgpu 0/7] Enabling Color Management - Round 3

2018-06-06 Thread Michel Dänzer
On 2018-06-06 06:01 PM, Michel Dänzer wrote:
> On 2018-06-01 06:03 PM, sunpeng...@amd.com wrote:
>> From: "Leo (Sunpeng) Li" 
>>
>> This ended up being different enough from v2 to warrant a new patchset. Per
>> Michel's suggestions, there have been various optimizations and cleanups.
>> Here's what's changed:
>>
>> * Cache DRM color management property IDs at pre-init,
>> * instead of querying DRM each time we need to modify color properties.
>>
>> * Remove drmmode_update_cm_props().
>> * Update color properties in drmmode_output_get_property() instead.
>> * This also makes old calls to update_cm_props redundant.
>>
>> * Get rid of fake CRTCs.
>> * Previously, we were allocating a fake CRTC to configure color props on
>>   outputs that don't have a CRTC.
>> * Instead, rr_configure_and_change_cm_property() can be easily modified 
>> to
>>   accept NULL CRTCs.
>>
>> * Drop patches to persist color properties across DPMS events.
>> * Kernel driver should be patched instead:
>>   https://lists.freedesktop.org/archives/amd-gfx/2018-May/022744.html
>> * Color props including legacy gamma now persist across crtc dpms.
>> * Non-legacy props now persist across output dpms and hotplug, as long
>>   as the same CRTC remains attached to that output.
>>
>> And some smaller improvements:
>>
>> * Change CTM to be 32-bit format instead of 16-bit.
>> * This requires clients to ensure that each 32-bit element is padded to 
>> be
>>   long-sized, since libXrandr parses 32-bit format as long-typed.
>>
>> * Optimized color management init during CRTC init.
>> * Query DRM once for the list of properties, instead of twice.
> 
> Sounds good. I'll be going through the patches in detail from now on,
> but I don't know yet when I'll be able to finish the review.
> 
> 
> Meanwhile, heads up on two issues I discovered while smoke-testing the
> series (which are sort of related, but occur even without this series):
> 
> 
> Running Xorg in depth 30[0] results in completely wrong colours
> (everything has a red tint) with current kernels. I think this is
> because DC now preserves the gamma LUT values, but xf86-video-amdgpu
> never sets them at depth 30, so the hardware is still using values for
> 24-bit RGB.

Actually, looks like I made a mistake in my testing before; this issue
only occurs as of patch 6 of this series.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH xf86-video-amdgpu 0/7] Enabling Color Management - Round 3

2018-06-06 Thread Michel Dänzer

Hi Leo,


On 2018-06-01 06:03 PM, sunpeng...@amd.com wrote:
> From: "Leo (Sunpeng) Li" 
> 
> This ended up being different enough from v2 to warrant a new patchset. Per
> Michel's suggestions, there have been various optimizations and cleanups.
> Here's what's changed:
> 
> * Cache DRM color management property IDs at pre-init,
> * instead of querying DRM each time we need to modify color properties.
> 
> * Remove drmmode_update_cm_props().
> * Update color properties in drmmode_output_get_property() instead.
> * This also makes old calls to update_cm_props redundant.
> 
> * Get rid of fake CRTCs.
> * Previously, we were allocating a fake CRTC to configure color props on
>   outputs that don't have a CRTC.
> * Instead, rr_configure_and_change_cm_property() can be easily modified to
>   accept NULL CRTCs.
> 
> * Drop patches to persist color properties across DPMS events.
> * Kernel driver should be patched instead:
>   https://lists.freedesktop.org/archives/amd-gfx/2018-May/022744.html
> * Color props including legacy gamma now persist across crtc dpms.
> * Non-legacy props now persist across output dpms and hotplug, as long
>   as the same CRTC remains attached to that output.
> 
> And some smaller improvements:
> 
> * Change CTM to be 32-bit format instead of 16-bit.
> * This requires clients to ensure that each 32-bit element is padded to be
>   long-sized, since libXrandr parses 32-bit format as long-typed.
> 
> * Optimized color management init during CRTC init.
> * Query DRM once for the list of properties, instead of twice.

Sounds good. I'll be going through the patches in detail from now on,
but I don't know yet when I'll be able to finish the review.


Meanwhile, heads up on two issues I discovered while smoke-testing the
series (which are sort of related, but occur even without this series):


Running Xorg in depth 30[0] results in completely wrong colours
(everything has a red tint) with current kernels. I think this is
because DC now preserves the gamma LUT values, but xf86-video-amdgpu
never sets them at depth 30, so the hardware is still using values for
24-bit RGB.

Can you look into making xf86-video-amdgpu set the LUT values at depth
30 as well? Ideally in a way which doesn't require all patches in this
series, so that it could be backported for an 18.0.2 point release if
necessary. (Similarly for skipping drmmode_cursor_gamma when the kernel
supports the new colour management properties, to fix
https://bugs.freedesktop.org/106578)


Trying to run Xorg in depth 16 or 8[0] results in failure to set any mode:

[56.138] (EE) AMDGPU(0): failed to set mode: Invalid argument
[56.138] (WW) AMDGPU(0): Failed to set mode on CRTC 0
[56.172] (EE) AMDGPU(0): Failed to enable any CRTC

Works fine with amdgpu.dc=0. This has been broken at least since the
4.16 development cycle.


[0] You can change Xorg's colour depth either via -depth on its command
line, or via the xorg.conf screen section:

Section "Screen"
Identifier "Screen0"
DefaultDepth 30 # or 16 or 8
EndSection


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH 1/5] dma_buf: remove device parameter from attach callback

2018-06-06 Thread Christian König

Just a gentle ping.

Daniel, Chris and all the other usual suspects for infrastructure stuff: 
What do you think about that?


The cleanup patches are rather obviously correct, but #3 could result in 
some fallout.


I really think it is the right thing in the long term.

Regards,
Christian.

On 01.06.2018 at 14:00, Christian König wrote:

The device parameter is completely unused because it is available in the
attachment structure as well.

Signed-off-by: Christian König 
---
  drivers/dma-buf/dma-buf.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 3 +--
  drivers/gpu/drm/drm_prime.c   | 3 +--
  drivers/gpu/drm/udl/udl_dmabuf.c  | 1 -
  drivers/gpu/drm/vmwgfx/vmwgfx_prime.c | 1 -
  drivers/media/common/videobuf2/videobuf2-dma-contig.c | 2 +-
  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 2 +-
  drivers/media/common/videobuf2/videobuf2-vmalloc.c| 2 +-
  include/drm/drm_prime.h   | 2 +-
  include/linux/dma-buf.h   | 3 +--
  10 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index d78d5fc173dc..e99a8d19991b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -568,7 +568,7 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf 
*dmabuf,
mutex_lock(&dmabuf->lock);
  
  	if (dmabuf->ops->attach) {

-   ret = dmabuf->ops->attach(dmabuf, dev, attach);
+   ret = dmabuf->ops->attach(dmabuf, attach);
if (ret)
goto err_attach;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 4683626b065f..f1500f1ec0f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -133,7 +133,6 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
  }
  
  static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,

-struct device *target_dev,
 struct dma_buf_attachment *attach)
  {
struct drm_gem_object *obj = dma_buf->priv;
@@ -141,7 +140,7 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
long r;
  
-	r = drm_gem_map_attach(dma_buf, target_dev, attach);

+   r = drm_gem_map_attach(dma_buf, attach);
if (r)
return r;
  
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c

index 7856a9b3f8a8..4a3a232fea67 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -186,7 +186,6 @@ static int drm_prime_lookup_buf_handle(struct 
drm_prime_file_private *prime_fpri
  /**
   * drm_gem_map_attach - dma_buf attach implementation for GEM
   * @dma_buf: buffer to attach device to
- * @target_dev: not used
   * @attach: buffer attachment data
   *
   * Allocates &drm_prime_attachment and calls &drm_driver.gem_prime_pin for
@@ -195,7 +194,7 @@ static int drm_prime_lookup_buf_handle(struct 
drm_prime_file_private *prime_fpri
   *
   * Returns 0 on success, negative error code on failure.
   */
-int drm_gem_map_attach(struct dma_buf *dma_buf, struct device *target_dev,
+int drm_gem_map_attach(struct dma_buf *dma_buf,
   struct dma_buf_attachment *attach)
  {
struct drm_prime_attachment *prime_attach;
diff --git a/drivers/gpu/drm/udl/udl_dmabuf.c b/drivers/gpu/drm/udl/udl_dmabuf.c
index 2867ed155ff6..5fdc8bdc2026 100644
--- a/drivers/gpu/drm/udl/udl_dmabuf.c
+++ b/drivers/gpu/drm/udl/udl_dmabuf.c
@@ -29,7 +29,6 @@ struct udl_drm_dmabuf_attachment {
  };
  
  static int udl_attach_dma_buf(struct dma_buf *dmabuf,

- struct device *dev,
  struct dma_buf_attachment *attach)
  {
struct udl_drm_dmabuf_attachment *udl_attach;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_prime.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_prime.c
index 0d42a46521fc..fbffb37ccf42 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_prime.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_prime.c
@@ -40,7 +40,6 @@
   */
  
  static int vmw_prime_map_attach(struct dma_buf *dma_buf,

-   struct device *target_dev,
struct dma_buf_attachment *attach)
  {
return -ENOSYS;
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c 
b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index f1178f6f434d..12d0072c52c2 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -222,7 +222,7 @@ struct vb2_dc_attachment {
enum dma_data_direction dma_dir;
  };
  
-static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf, struct device *dev,

+static int vb2_dc_dmabuf_ops_attach(struct dma_buf *dbuf,

Re: [PATCH 3/3] drm/amd/amdgpu: Fix NULL pointer OOPS during S3

2018-06-06 Thread Christian König

NAK as well. mem->mm_node can't be NULL on a correctly allocated BO.

You are running into a BO corruption here and trying to work around it by 
mitigating the effect instead of fixing the root problem.


Regards,
Christian.

On 06.06.2018 at 11:25, Pratik Vishwakarma wrote:

Fixes NULL pointer dereference in amdgpu_ttm_copy_mem_to_mem

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: amdgpu_ttm_copy_mem_to_mem+0x85/0x40c
Workqueue: events_unbound async_run_entry_fn
Call Trace:
? _raw_spin_unlock+0xe/0x20
? ttm_check_swapping+0x4e/0x72
? ttm_mem_global_reserve.constprop.4+0xb1/0xc0
amdgpu_move_blit+0x80/0xe2
amdgpu_bo_move+0x114/0x155
ttm_bo_handle_move_mem+0x1f7/0x34a
? ttm_bo_mem_space+0x162/0x38e
? dev_vprintk_emit+0x10a/0x1f2
ttm_bo_evict+0x13e/0x2e9
? do_wait_for_common+0x151/0x187
ttm_mem_evict_first+0x136/0x181
ttm_bo_force_list_clean+0x78/0x10f
amdgpu_device_suspend+0x167/0x210
pci_pm_suspend+0x12a/0x1a5
? pci_dev_driver+0x36/0x36
dpm_run_callback+0x59/0xbf
__device_suspend+0x215/0x33f
async_suspend+0x1f/0x5c
async_run_entry_fn+0x3d/0xd2
process_one_work+0x1b0/0x314
worker_thread+0x1cb/0x2c1
? create_worker+0x1da/0x1da
kthread+0x156/0x15e
? kthread_flush_work+0xea/0xea
ret_from_fork+0x22/0x40

Signed-off-by: Pratik Vishwakarma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57d4da6..f9de429 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -414,12 +414,16 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
return -EINVAL;
}
  
+	if (!src->mem->mm_node)
+		return -EINVAL;
src_mm = amdgpu_find_mm_node(src->mem, &src->offset);
src_node_start = amdgpu_mm_node_addr(src->bo, src_mm, src->mem) +
 src->offset;
src_node_size = (src_mm->size << PAGE_SHIFT) - src->offset;
src_page_offset = src_node_start & (PAGE_SIZE - 1);
  
+	if (!dst->mem->mm_node)
+		return -EINVAL;
dst_mm = amdgpu_find_mm_node(dst->mem, &dst->offset);
dst_node_start = amdgpu_mm_node_addr(dst->bo, dst_mm, dst->mem) +
 dst->offset;




Re: [PATCH 2/3] drm/amd/amdgpu: Fix crash in amdgpu_bo_reserve

2018-06-06 Thread Christian König
NAK: when bo->tbo.resv is NULL, the BO is corrupted (or already 
released).


Please find the root cause of that corruption or freed memory access 
instead of adding such crude workarounds.


Regards,
Christian.

On 06.06.2018 at 11:25, Pratik Vishwakarma wrote:

Fixes null pointer access in ww_mutex_lock
where lock->base is NULL

Crash dump is as follows:
Call Trace:
ww_mutex_lock+0x3a/0x8e
amdgpu_bo_reserve+0x40/0x87
amdgpu_device_suspend+0xf4/0x210
pci_pm_suspend+0x12a/0x1a5
? pci_dev_driver+0x36/0x36
dpm_run_callback+0x59/0xbf
__device_suspend+0x215/0x33f
async_suspend+0x1f/0x5c
async_run_entry_fn+0x3d/0xd2
process_one_work+0x1b0/0x314
worker_thread+0x1cb/0x2c1
? create_worker+0x1da/0x1da
kthread+0x156/0x15e
? kthread_flush_work+0xea/0xea
ret_from_fork+0x22/0x40

Signed-off-by: Pratik Vishwakarma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 7317480..c9df7ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -152,6 +152,8 @@ static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, 
bool no_intr)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
int r;
  
+	if (&(bo->tbo.resv->lock) == NULL)
+		return -EINVAL;
r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)




[PATCH 2/3] drm/amd/amdgpu: Fix crash in amdgpu_bo_reserve

2018-06-06 Thread Pratik Vishwakarma
Fixes null pointer access in ww_mutex_lock
where lock->base is NULL

Crash dump is as follows:
Call Trace:
ww_mutex_lock+0x3a/0x8e
amdgpu_bo_reserve+0x40/0x87
amdgpu_device_suspend+0xf4/0x210
pci_pm_suspend+0x12a/0x1a5
? pci_dev_driver+0x36/0x36
dpm_run_callback+0x59/0xbf
__device_suspend+0x215/0x33f
async_suspend+0x1f/0x5c
async_run_entry_fn+0x3d/0xd2
process_one_work+0x1b0/0x314
worker_thread+0x1cb/0x2c1
? create_worker+0x1da/0x1da
kthread+0x156/0x15e
? kthread_flush_work+0xea/0xea
ret_from_fork+0x22/0x40

Signed-off-by: Pratik Vishwakarma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 7317480..c9df7ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -152,6 +152,8 @@ static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, 
bool no_intr)
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
int r;
 
+   if (&(bo->tbo.resv->lock) == NULL)
+   return -EINVAL;
r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
-- 
1.9.1



[PATCH 3/3] drm/amd/amdgpu: Fix NULL pointer OOPS during S3

2018-06-06 Thread Pratik Vishwakarma
Fixes NULL pointer dereference in amdgpu_ttm_copy_mem_to_mem

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: amdgpu_ttm_copy_mem_to_mem+0x85/0x40c
Workqueue: events_unbound async_run_entry_fn
Call Trace:
? _raw_spin_unlock+0xe/0x20
? ttm_check_swapping+0x4e/0x72
? ttm_mem_global_reserve.constprop.4+0xb1/0xc0
amdgpu_move_blit+0x80/0xe2
amdgpu_bo_move+0x114/0x155
ttm_bo_handle_move_mem+0x1f7/0x34a
? ttm_bo_mem_space+0x162/0x38e
? dev_vprintk_emit+0x10a/0x1f2
ttm_bo_evict+0x13e/0x2e9
? do_wait_for_common+0x151/0x187
ttm_mem_evict_first+0x136/0x181
ttm_bo_force_list_clean+0x78/0x10f
amdgpu_device_suspend+0x167/0x210
pci_pm_suspend+0x12a/0x1a5
? pci_dev_driver+0x36/0x36
dpm_run_callback+0x59/0xbf
__device_suspend+0x215/0x33f
async_suspend+0x1f/0x5c
async_run_entry_fn+0x3d/0xd2
process_one_work+0x1b0/0x314
worker_thread+0x1cb/0x2c1
? create_worker+0x1da/0x1da
kthread+0x156/0x15e
? kthread_flush_work+0xea/0xea
ret_from_fork+0x22/0x40

Signed-off-by: Pratik Vishwakarma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57d4da6..f9de429 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -414,12 +414,16 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
return -EINVAL;
}
 
+   if (!src->mem->mm_node)
+   return -EINVAL;
src_mm = amdgpu_find_mm_node(src->mem, &src->offset);
src_node_start = amdgpu_mm_node_addr(src->bo, src_mm, src->mem) +
 src->offset;
src_node_size = (src_mm->size << PAGE_SHIFT) - src->offset;
src_page_offset = src_node_start & (PAGE_SIZE - 1);
 
+   if (!dst->mem->mm_node)
+   return -EINVAL;
dst_mm = amdgpu_find_mm_node(dst->mem, &dst->offset);
dst_node_start = amdgpu_mm_node_addr(dst->bo, dst_mm, dst->mem) +
 dst->offset;
-- 
1.9.1



[PATCH 0/3] Fix S3 entry crashes

2018-06-06 Thread Pratik Vishwakarma
This patch series resolves crashes observed during S3 entry.
First patch removes the cursor BO hack which causes crashes
mentioned in patches 2 and 3. Patches 2 and 3 add NULL
checks to prevent crashing the system.

Pratik Vishwakarma (3):
  drm/amd/display: Remove cursor hack for S3
  drm/amd/amdgpu: Fix crash in amdgpu_bo_reserve
  drm/amd/amdgpu: Fix NULL pointer OOPS during S3

 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 17 +++++++++++++----
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 -----------
 3 files changed, 15 insertions(+), 15 deletions(-)

-- 
1.9.1



[PATCH 1/3] drm/amd/display: remove cursor hack for S3

2018-06-06 Thread Pratik Vishwakarma
cursor_bo operations cause a crash during S3 entry.

On cursor movement between displays across S3 cycles,
the system crashes on S3 entry.

These crashes are no longer seen with this patch applied.

This also follows the comment just above the removed code:
"IN 4.10 kernel this code should be removed and
amdgpu_device_suspend code touching fram buffers
should be avoided for DC"

Signed-off-by: Pratik Vishwakarma 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index a2009d5..8c31abf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3138,17 +3138,6 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
}
}
 
-   /* It's a hack for s3 since in 4.9 kernel filter out cursor buffer
-* prepare and cleanup in drm_atomic_helper_prepare_planes
-* and drm_atomic_helper_cleanup_planes because fb doens't in s3.
-* IN 4.10 kernel this code should be removed and amdgpu_device_suspend
-* code touching fram buffers should be avoided for DC.
-*/
-   if (plane->type == DRM_PLANE_TYPE_CURSOR) {
-   struct amdgpu_crtc *acrtc = to_amdgpu_crtc(new_state->crtc);
-
-   acrtc->cursor_bo = obj;
-   }
return 0;
 }
 
-- 
1.9.1



Re: [PATCH 1/2] drm/scheduler: Rename cleanup functions.

2018-06-06 Thread Lucas Stach
On Tuesday, 05.06.2018 at 13:02 -0400, Andrey Grodzovsky wrote:
> Everything in the flush code path (i.e. waiting for the SW queue
> to become empty) is named with *_flush(),
> and everything in the release code path is named with *_fini().
> 
> This patch also affects the amdgpu and etnaviv drivers, which
> use those functions.
> 
> > Signed-off-by: Andrey Grodzovsky 
> > Suggested-by: Christian König 
> ---
[...]
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 23e73c2..3dff4d0 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -140,7 +140,7 @@ static void etnaviv_postclose(struct drm_device *dev, 
> struct drm_file *file)
> >     gpu->lastctx = NULL;
> >     mutex_unlock(&gpu->lock);
>  
> > -   drm_sched_entity_fini(&gpu->sched,
> > +   drm_sched_entity_destroy(&gpu->sched,
>     &ctx->sched_entity[i]);

Style nit: this disaligns the second row of parameters to the opening
parenthesis where it was previously aligned. I would prefer if the
second line is also changed to keep the alignment.

Acked-by: Lucas Stach 


Re: [PATCH 1/3] drm/v3d: Take a lock across GPU scheduler job creation and queuing.

2018-06-06 Thread Christian König

On 06.06.2018 at 10:46, Lucas Stach wrote:

On Tuesday, 05.06.2018 at 12:03 -0700, Eric Anholt wrote:

Between creation and queueing of a job, you need to prevent any other
job from being created and queued.  Otherwise the scheduler's fences
may be signaled out of seqno order.


Signed-off-by: Eric Anholt 

Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D 
V3.x+")
---

ccing amd-gfx due to interaction of this series with the scheduler.

  drivers/gpu/drm/v3d/v3d_drv.h |  5 +
  drivers/gpu/drm/v3d/v3d_gem.c | 11 +--
  2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index a043ac3aae98..26005abd9c5d 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -85,6 +85,11 @@ struct v3d_dev {

     */
    struct mutex reset_lock;
  

+   /* Lock taken when creating and pushing the GPU scheduler
+    * jobs, to keep the sched-fence seqnos in order.
+    */
+   struct mutex sched_lock;

+

    struct {
    u32 num_allocated;
    u32 pages_allocated;

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index b513f9189caf..9ea83bdb9a30 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -550,13 +550,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,

    if (ret)
    goto fail;
  

+   mutex_lock(&v3d->sched_lock);
    if (exec->bin.start != exec->bin.end) {
    ret = drm_sched_job_init(&exec->bin.base,
     &v3d->queue[V3D_BIN].sched,
     &v3d_priv->sched_entity[V3D_BIN],
     v3d_priv);
-   if (ret)
+   if (ret) {
+   mutex_unlock(&v3d->sched_lock);

    goto fail_unreserve;

I don't see any path where you would go to fail_unreserve with the
mutex not yet locked, so you could just fold the mutex_unlock into this
error path for a bit less code duplication.

Otherwise this looks fine.


Yeah, agree that could be cleaned up.

I can't judge the correctness of the driver, but at least the scheduler 
handling looks good to me.


Regards,
Christian.



Regards,
Lucas


+   }
  

    exec->bin_done_fence =
    dma_fence_get(&exec->bin.base.s_fence->finished);

@@ -570,12 +573,15 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,

     &v3d->queue[V3D_RENDER].sched,
     &v3d_priv->sched_entity[V3D_RENDER],
     v3d_priv);
-   if (ret)
+   if (ret) {
+   mutex_unlock(&v3d->sched_lock);
    goto fail_unreserve;
+   }
  

    kref_get(&exec->refcount); /* put by scheduler job completion */
    drm_sched_entity_push_job(&exec->render.base,
      &v3d_priv->sched_entity[V3D_RENDER]);
+   mutex_unlock(&v3d->sched_lock);
  

    v3d_attach_object_fences(exec);
  
@@ -615,6 +621,7 @@ v3d_gem_init(struct drm_device *dev)

    spin_lock_init(&v3d->job_lock);
    mutex_init(&v3d->bo_lock);
    mutex_init(&v3d->reset_lock);
+   mutex_init(&v3d->sched_lock);
  

    /* Note: We don't allocate address 0.  Various bits of HW
     * treat 0 as special, such as the occlusion query counters

___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel




Re: [PATCH 1/3] drm/v3d: Take a lock across GPU scheduler job creation and queuing.

2018-06-06 Thread Lucas Stach
On Tuesday, 05.06.2018 at 12:03 -0700, Eric Anholt wrote:
> Between creation and queueing of a job, you need to prevent any other
> job from being created and queued.  Otherwise the scheduler's fences
> may be signaled out of seqno order.
> 
> > Signed-off-by: Eric Anholt 
> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D 
> V3.x+")
> ---
> 
> ccing amd-gfx due to interaction of this series with the scheduler.
> 
>  drivers/gpu/drm/v3d/v3d_drv.h |  5 +
>  drivers/gpu/drm/v3d/v3d_gem.c | 11 +--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
> index a043ac3aae98..26005abd9c5d 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -85,6 +85,11 @@ struct v3d_dev {
> >      */
> >     struct mutex reset_lock;
>  
> > +   /* Lock taken when creating and pushing the GPU scheduler
> > +    * jobs, to keep the sched-fence seqnos in order.
> > +    */
> > +   struct mutex sched_lock;
> +
> >     struct {
> >     u32 num_allocated;
> >     u32 pages_allocated;
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
> index b513f9189caf..9ea83bdb9a30 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -550,13 +550,16 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> >     if (ret)
> >     goto fail;
>  
> > +   mutex_lock(&v3d->sched_lock);
> >     if (exec->bin.start != exec->bin.end) {
> >     ret = drm_sched_job_init(&exec->bin.base,
> >      &v3d->queue[V3D_BIN].sched,
> >      &v3d_priv->sched_entity[V3D_BIN],
> >      v3d_priv);
> > -   if (ret)
> > +   if (ret) {
> > +   mutex_unlock(&v3d->sched_lock);
>   goto fail_unreserve;

I don't see any path where you would go to fail_unreserve with the
mutex not yet locked, so you could just fold the mutex_unlock into this
error path for a bit less code duplication.

Otherwise this looks fine.

Regards,
Lucas

> +		}
>  
> 	exec->bin_done_fence =
> 		dma_fence_get(&exec->bin.base.s_fence->finished);
> @@ -570,12 +573,15 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
> 	ret = drm_sched_job_init(&exec->render.base,
> 				 &v3d->queue[V3D_RENDER].sched,
> 				 &v3d_priv->sched_entity[V3D_RENDER],
> 				 v3d_priv);
> -	if (ret)
> +	if (ret) {
> +		mutex_unlock(&v3d->sched_lock);
> 		goto fail_unreserve;
> +	}
>  
> 	kref_get(&exec->refcount); /* put by scheduler job completion */
> 	drm_sched_entity_push_job(&exec->render.base,
> 				  &v3d_priv->sched_entity[V3D_RENDER]);
> +	mutex_unlock(&v3d->sched_lock);
>  
> 	v3d_attach_object_fences(exec);
>  
> @@ -615,6 +621,7 @@ v3d_gem_init(struct drm_device *dev)
> 	spin_lock_init(&v3d->job_lock);
> 	mutex_init(&v3d->bo_lock);
> 	mutex_init(&v3d->reset_lock);
> +	mutex_init(&v3d->sched_lock);
>  
> 	/* Note: We don't allocate address 0.  Various bits of HW
> 	 * treat 0 as special, such as the occlusion query counters
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/scheduler: Rename cleanup functions.

2018-06-06 Thread Christian König

On 06.06.2018 09:03, Michel Dänzer wrote:

> On 2018-06-05 07:02 PM, Andrey Grodzovsky wrote:
>> Everything in the flush code path (i.e. waiting for the SW queue
>> to become empty) is named *_flush(),
>> and everything in the release code path is named *_fini().
>>
>> This patch also affects the amdgpu and etnaviv drivers, which
>> use those functions.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> Suggested-by: Christian König 
>
> [...]
>
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> index 23e73c2..3dff4d0 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
>> @@ -140,7 +140,7 @@ static void etnaviv_postclose(struct drm_device *dev, struct drm_file *file)
>> 			gpu->lastctx = NULL;
>> 			mutex_unlock(&gpu->lock);
>>
>> -			drm_sched_entity_fini(&gpu->sched,
>> +			drm_sched_entity_destroy(&gpu->sched,
>> 					      &ctx->sched_entity[i]);
>> 		}
>> 	}
>
> The drm-next tree for 4.18 has a new v3d driver, which also uses
> drm_sched_entity_fini. Please either make sure to merge this change via
> a tree which has the v3d driver, and fix it up as well, or don't do the
> fini => destroy rename.


I think we should just wait for the next rebase of amd-staging-drm-next 
before pushing this cleanup.


Alex was already preparing that before his vacation, so that should 
happen shortly after he's back.


Apart from that looks good to me,
Christian.


Re: radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)

2018-06-06 Thread Christian König

On 05.06.2018 16:44, Borislav Petkov wrote:

> Hi guys,
> 
> X just froze here ontop of 4.17-rc7+ tip/master (kernel is from last
> week) with the splat at the end.
> 
> Box is a x470 chipset with Ryzen 2700X.
> 
> GPU gets detected as
> 
> [7.440971] [drm] radeon kernel modesetting enabled.
> [7.441220] [drm] initializing kernel modesetting (RV635 0x1002:0x9598 0x1043:0x01DA 0x00).
> 
> [...]
> 
> in the PCIe slot with two monitors connected to it. radeon firmware is
> 
> Version: 20170823-1
> 
> What practically happened is X froze and got restarted after the GPU
> reset. It seems to be ok now, as I'm typing in it.
> 
> Thoughts?


Well, what did you do to trigger the lockup? It looks like an application
sent something to the hardware that crashed the GFX block.


Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)

2018-06-06 Thread Huang Rui
On Tue, Jun 05, 2018 at 04:44:04PM +0200, Borislav Petkov wrote:
> Hi guys,
> 
> X just froze here ontop of 4.17-rc7+ tip/master (kernel is from last
> week) with the splat at the end.
> 
> Box is a x470 chipset with Ryzen 2700X.
> 
> GPU gets detected as
> 
> [7.440971] [drm] radeon kernel modesetting enabled.
> [7.441220] [drm] initializing kernel modesetting (RV635 0x1002:0x9598 
> 0x1043:0x01DA 0x00).
> [ boot log snipped -- identical to the original report below ]
> in the PCIe slot with two monitors connected to it. radeon firmware is
> 
> Version: 20170823-1
> 
> What practically happened is X froze and got restarted after the GPU
> reset. It seems to be ok now, as I'm typing in it.
> 
> Thoughts?
> 
> [ lockup splat snipped -- identical to the original report below ]

radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)

2018-06-06 Thread Borislav Petkov
Hi guys,

X just froze here ontop of 4.17-rc7+ tip/master (kernel is from last
week) with the splat at the end.

Box is a x470 chipset with Ryzen 2700X.

GPU gets detected as

[7.440971] [drm] radeon kernel modesetting enabled.
[7.441220] [drm] initializing kernel modesetting (RV635 0x1002:0x9598 
0x1043:0x01DA 0x00).
[7.441328] ATOM BIOS: 9598.10.88.0.3.AS05
[7.441395] radeon :1d:00.0: VRAM: 512M 0x - 
0x1FFF (512M used)
[7.441464] radeon :1d:00.0: GTT: 512M 0x2000 - 
0x3FFF
[7.441531] [drm] Detected VRAM RAM=512M, BAR=256M
[7.441588] [drm] RAM width 128bits DDR
[7.441690] [TTM] Zone  kernel: Available graphics memory: 16462214 kiB
[7.441751] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[7.441811] [TTM] Initializing pool allocator
[7.441868] [TTM] Initializing DMA pool allocator
[7.441934] [drm] radeon: 512M of VRAM memory ready
[7.441990] [drm] radeon: 512M of GTT memory ready.
[7.442050] [drm] Loading RV635 Microcode
[7.442865] [drm] Internal thermal controller without fan control
[7.442940] [drm] radeon: power management initialized
[7.443222] [drm] GART: num cpu pages 131072, num gpu pages 131072
[7.443487] [drm] enabling PCIE gen 2 link speeds, disable with 
radeon.pcie_gen2=0
[7.477319] [drm] PCIE GART of 512M enabled (table at 0x00142000).
[7.477400] radeon :1d:00.0: WB enabled
[7.477455] radeon :1d:00.0: fence driver on ring 0 use gpu addr 
0x2c00 and cpu addr 0x(ptrval)
[7.477708] radeon :1d:00.0: fence driver on ring 5 use gpu addr 
0x000521d0 and cpu addr 0x(ptrval)
[7.48] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[7.477836] [drm] Driver supports precise vblank timestamp query.
[7.477896] radeon :1d:00.0: radeon: MSI limited to 32-bit
[7.477990] radeon :1d:00.0: radeon: using MSI.
[7.478062] [drm] radeon: irq initialized.
[7.509056] [drm] ring test on 0 succeeded in 0 usecs
[7.683793] [drm] ring test on 5 succeeded in 1 usecs
[7.683853] [drm] UVD initialized successfully.
[7.684009] [drm] ib test on ring 0 succeeded in 0 usecs
[8.348466] [drm] ib test on ring 5 succeeded
[8.348921] [drm] Radeon Display Connectors
[8.348978] [drm] Connector 0:
[8.349031] [drm]   DVI-I-1
[8.349082] [drm]   HPD1
[8.349135] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 
0x7e5c
[8.349200] [drm]   Encoders:
[8.349252] [drm] DFP1: INTERNAL_UNIPHY
[8.349308] [drm] CRT2: INTERNAL_KLDSCP_DAC2
[8.349364] [drm] Connector 1:
[8.349416] [drm]   DIN-1
[8.349467] [drm]   Encoders:
[8.349520] [drm] TV1: INTERNAL_KLDSCP_DAC2
[8.349576] [drm] Connector 2:
[8.349628] [drm]   DVI-I-2
[8.349680] [drm]   HPD2
[8.349732] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 
0x7e4c
[8.349797] [drm]   Encoders:
[8.349849] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[8.349905] [drm] DFP2: INTERNAL_KLDSCP_LVTMA
[8.430521] [drm] fb mappable at 0xE0243000
[8.430575] [drm] vram apper at 0xE000
[8.431194] [drm] size 9216000
[8.431245] [drm] fb depth is 24
[8.431295] [drm]pitch is 7680
[8.431406] fbcon: radeondrmfb (fb0) is primary device
[8.496928] Console: switching to colour frame buffer device 240x75
[8.501851] radeon :1d:00.0: fb0: radeondrmfb frame buffer device
[8.520179] [drm] Initialized radeon 2.50.0 20080528 for :1d:00.0 on 
minor 0

in the PCIe slot with two monitors connected to it. radeon firmware is

Version: 20170823-1

What practically happened is X froze and got restarted after the GPU
reset. It seems to be ok now, as I'm typing in it.

Thoughts?

[197439.022249] Restarting tasks ... done.
[197439.024043] PM: hibernation exit
[197439.058296] r8169 :18:00.0 eth0: link up
[200941.240184] perf: interrupt took too long (2507 > 2500), lowering 
kernel.perf_event_max_sample_rate to 79750
[221973.686894] radeon 0000:1d:00.0: ring 0 stalled for more than 10176msec
[221973.686900] radeon 0000:1d:00.0: GPU lockup (current fence id 0x00000000010da43f last fence id 0x00000000010da52d on ring 0)
[221973.686929] radeon :1d:00.0: failed to get a new IB (-35)
[221973.686950] [drm:radeon_cs_ioctl [radeon]] *ERROR* Failed to get ib !
[221973.693971] radeon :1d:00.0: Saved 7609 dwords of commands on ring 0.
[221973.693985] radeon :1d:00.0: GPU softreset: 0x0008
[221973.693988] radeon :1d:00.0:   R_008010_GRBM_STATUS  = 0xA0001030
[221973.693990] radeon :1d:00.0:   R_008014_GRBM_STATUS2 = 0x0003
[221973.693992] radeon :1d:00.0:   R_000E50_SRBM_STATUS  = 0x200010C0
[221973.693994] radeon :1d:00.0:   R_008674_CP_STALLED_STAT1 = 0x
[221973.693996] radeon :1d:00.0:   R_008678_CP_STALLED_STAT2 = 0x
[221973.693998] radeon :1d:00.0:   

Re: [PATCH 1/2] drm/scheduler: Rename cleanup functions.

2018-06-06 Thread Michel Dänzer
On 2018-06-05 07:02 PM, Andrey Grodzovsky wrote:
> Everything in the flush code path (i.e. waiting for the SW queue
> to become empty) is named *_flush(),
> and everything in the release code path is named *_fini().
> 
> This patch also affects the amdgpu and etnaviv drivers, which
> use those functions.
> 
> Signed-off-by: Andrey Grodzovsky 
> Suggested-by: Christian König 

[...]

> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> index 23e73c2..3dff4d0 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> @@ -140,7 +140,7 @@ static void etnaviv_postclose(struct drm_device *dev, struct drm_file *file)
>   gpu->lastctx = NULL;
>   mutex_unlock(&gpu->lock);
>  
> - drm_sched_entity_fini(&gpu->sched,
> + drm_sched_entity_destroy(&gpu->sched,
> &ctx->sched_entity[i]);
>   }
>   }

The drm-next tree for 4.18 has a new v3d driver, which also uses
drm_sched_entity_fini. Please either make sure to merge this change via
a tree which has the v3d driver, and fix it up as well, or don't do the
fini => destroy rename.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer