Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Matthew Auld
On Wed, 25 Jan 2023 at 14:20, Christian König
 wrote:
>
> On 25.01.23 at 13:53, Matthew Auld wrote:
> > On Wed, 25 Jan 2023 at 11:35, Christian König
> >  wrote:
> >> On 25.01.23 at 11:21, Matthew Auld wrote:
> >>> On Wed, 25 Jan 2023 at 10:07, Christian König
> >>>  wrote:
> >>>> On 25.01.23 at 10:56, Matthew Auld wrote:
> >>>>> On Tue, 24 Jan 2023 at 17:15, Matthew Auld
> >>>>>  wrote:
> >>>>>> On Tue, 24 Jan 2023 at 13:48, Matthew Auld
> >>>>>>  wrote:
> >>>>>>> On Tue, 24 Jan 2023 at 12:57, Christian König
> >>>>>>>  wrote:
> >>>>>>>> From: Christian König
> >>>>>>>>
> >>>>>>>> Make sure we can at least move and alloc TT objects without backing
> >>>>>>>> store.
> >>>>>>>>
> >>>>>>>> v2: clear the tt object even when no resource is allocated.
> >>>>>>>> v3: add Matthews changes for i915 as well.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Christian König
> >>>>>>> Reviewed-by: Matthew Auld
> >>>>>> Ofc that assumes intel-gfx CI is now happy with the series.
> >>>>> There are still some nasty failures it seems (in the extended test
> >>>>> list). But it looks like the series is already merged. Can we quickly
> >>>>> revert and try again?
> >>>> Ah, crap. I thought everything would be fine after the CI gave its go.
> >>>>
> >>>> Which patch is causing the fallout?
> >>> I'm not sure. I think all of the patches kind of interact with each
> >>> other, but for sure there is an issue with the first patch. There is
> >>> one splat like:
> >> Well I would rather like to revert as little as possible.
> >>
> >> Are you sure that this isn't only on some i915 specific branch with not
> >> yet upstream changes?
> > Yeah, that splat is taken directly from the CI results reported with
> > this series. So it's just your series applied on top of drm-tip.
> >
> > Can you take a look at the first patch here:
> > https://patchwork.freedesktop.org/series/113332/
> >
> > Maybe you have a better idea? For reference the IGTs that we have for
> > verifying userspace object clearing are now failing, so hoping that
> > fixes it. The other two patches I'm hoping will fix the splat.
>
> The TTM change looks like a good idea to me. Feel free to add my rb to
> this one.
>
> I can't say much about the i915 changes.
>
> Maybe we should revert the two TTM patches to not allocate resources for
> now and fix i915 first?

From what I can see, we would need to revert all three TTM patches,
keeping just the i915 one. Reverting for now I think makes sense.

>
> Christian.
>
> >
> >> I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next
> >> nor drm-next.
> >>
> >> Regards,
> >> Christian.
> >>
> >>> <1>[  109.735148] BUG: kernel NULL pointer dereference, address:
> >>> 0010
> >>> <1>[  109.735151] #PF: supervisor read access in kernel mode
> >>> <1>[  109.735152] #PF: error_code(0x) - not-present page
> >>> <6>[  109.735153] PGD 0 P4D 0
> >>> <4>[  109.735155] Oops:  [#1] PREEMPT SMP NOPTI
> >>> <4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted
> >>> 6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
> >>> <4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390
> >>> Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
> >>> <4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
> >>> <4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
> >>> <4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f
> >>> 18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> >>> 66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff
> >>> 0f 00 00
> >>> <4>[  109.735288] RSP: 0018:c9f339a8 EFLAGS: 00010246
> >>> <4>[  109.735289] RAX:  RBX:  RCX:
> >>> 88810cea3a00
> >>> <4>[  109.735290] RDX:  RSI: c9f33af0 RDI:
> >>> 
> >>> <4>[  109.735292] RBP: 88811645d7c0 R08:  R09:
> >>> 888123afa940
> >>> <4>[  109.735292] R10: 0001 R11: 888104b70040 R12:
> >>> 
> >>> <4>[  109.735293] R13:  R14: c9f33b08 R15:
> >>> c9f33af0
> >>> <4>[  109.735294] FS:  ()
> >>> GS:8884ad68() knlGS:
> >>> <4>[  109.735295] CS:  0010 DS:  ES:  CR0: 80050033
> >>> <4>[  109.735296] CR2: 0010 CR3: 00011f9c6003 CR4:
> >>> 003706e0
> >>> <4>[  109.735297] DR0:  DR1:  DR2:
> >>> 
> >>> <4>[  109.735298] DR3:  DR6: fffe0ff0 DR7:
> >>> 0400
> >>> <4>[  109.735299] Call Trace:
> >>> <4>[  109.735300]  
> >>> <4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
> >>> <4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
> >>> <4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
> >>> <4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
> >>> <4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
> >>> <4>[  109.735625]  i915_ttm

Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Christian König

On 25.01.23 at 13:53, Matthew Auld wrote:

On Wed, 25 Jan 2023 at 11:35, Christian König
 wrote:

On 25.01.23 at 11:21, Matthew Auld wrote:

On Wed, 25 Jan 2023 at 10:07, Christian König
 wrote:

On 25.01.23 at 10:56, Matthew Auld wrote:

On Tue, 24 Jan 2023 at 17:15, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 13:48, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 12:57, Christian König
 wrote:

From: Christian König 

Make sure we can at least move and alloc TT objects without backing store.

v2: clear the tt object even when no resource is allocated.
v3: add Matthews changes for i915 as well.

Signed-off-by: Christian König 

Reviewed-by: Matthew Auld 

Ofc that assumes intel-gfx CI is now happy with the series.

There are still some nasty failures it seems (in the extended test
list). But it looks like the series is already merged. Can we quickly
revert and try again?

Ah, crap. I thought everything would be fine after the CI gave its go.

Which patch is causing the fallout?

I'm not sure. I think all of the patches kind of interact with each
other, but for sure there is an issue with the first patch. There is
one splat like:

Well I would rather like to revert as little as possible.

Are you sure that this isn't only on some i915 specific branch with not
yet upstream changes?

Yeah, that splat is taken directly from the CI results reported with
this series. So it's just your series applied on top of drm-tip.

Can you take a look at the first patch here:
https://patchwork.freedesktop.org/series/113332/

Maybe you have a better idea? For reference the IGTs that we have for
verifying userspace object clearing are now failing, so hoping that
fixes it. The other two patches I'm hoping will fix the splat.


The TTM change looks like a good idea to me. Feel free to add my rb to 
this one.


I can't say much about the i915 changes.

Maybe we should revert the two TTM patches to not allocate resources for 
now and fix i915 first?


Christian.




I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next
nor drm-next.

Regards,
Christian.


<1>[  109.735148] BUG: kernel NULL pointer dereference, address:
0010
<1>[  109.735151] #PF: supervisor read access in kernel mode
<1>[  109.735152] #PF: error_code(0x) - not-present page
<6>[  109.735153] PGD 0 P4D 0
<4>[  109.735155] Oops:  [#1] PREEMPT SMP NOPTI
<4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted
6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
<4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390
Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
<4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
<4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f
18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff
0f 00 00
<4>[  109.735288] RSP: 0018:c9f339a8 EFLAGS: 00010246
<4>[  109.735289] RAX:  RBX:  RCX:
88810cea3a00
<4>[  109.735290] RDX:  RSI: c9f33af0 RDI:

<4>[  109.735292] RBP: 88811645d7c0 R08:  R09:
888123afa940
<4>[  109.735292] R10: 0001 R11: 888104b70040 R12:

<4>[  109.735293] R13:  R14: c9f33b08 R15:
c9f33af0
<4>[  109.735294] FS:  ()
GS:8884ad68() knlGS:
<4>[  109.735295] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  109.735296] CR2: 0010 CR3: 00011f9c6003 CR4:
003706e0
<4>[  109.735297] DR0:  DR1:  DR2:

<4>[  109.735298] DR3:  DR6: fffe0ff0 DR7:
0400
<4>[  109.735299] Call Trace:
<4>[  109.735300]  
<4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
<4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
<4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
<4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
<4>[  109.735625]  i915_ttm_restore+0x167/0x250 [i915]
<4>[  109.735759]  i915_gem_process_region+0x27a/0x3b0 [i915]
<4>[  109.735881]  i915_ttm_restore_region+0x4b/0x70 [i915]
<4>[  109.735999]  lmem_restore+0x3a/0x60 [i915]
<4>[  109.736101]  i915_gem_resume+0x4c/0x100 [i915]
<4>[  109.736202]  i915_drm_resume+0xc2/0x170 [i915]

Plus some other less obvious issue(s) with some tests failing.


Christian.




Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Matthew Auld
On Wed, 25 Jan 2023 at 11:35, Christian König
 wrote:
>
> On 25.01.23 at 11:21, Matthew Auld wrote:
> > On Wed, 25 Jan 2023 at 10:07, Christian König
> >  wrote:
> >> On 25.01.23 at 10:56, Matthew Auld wrote:
> >>> On Tue, 24 Jan 2023 at 17:15, Matthew Auld
> >>>  wrote:
> >>>> On Tue, 24 Jan 2023 at 13:48, Matthew Auld
> >>>>  wrote:
> >>>>> On Tue, 24 Jan 2023 at 12:57, Christian König
> >>>>>  wrote:
> >>>>>> From: Christian König
> >>>>>>
> >>>>>> Make sure we can at least move and alloc TT objects without backing
> >>>>>> store.
> >>>>>>
> >>>>>> v2: clear the tt object even when no resource is allocated.
> >>>>>> v3: add Matthews changes for i915 as well.
> >>>>>>
> >>>>>> Signed-off-by: Christian König
> >>>>> Reviewed-by: Matthew Auld
> >>>> Ofc that assumes intel-gfx CI is now happy with the series.
> >>> There are still some nasty failures it seems (in the extended test
> >>> list). But it looks like the series is already merged. Can we quickly
> >>> revert and try again?
> >> Ah, crap. I thought everything would be fine after the CI gave its go.
> >>
> >> Which patch is causing the fallout?
> > I'm not sure. I think all of the patches kind of interact with each
> > other, but for sure there is an issue with the first patch. There is
> > one splat like:
>
> Well I would rather like to revert as little as possible.
>
> Are you sure that this isn't only on some i915 specific branch with not
> yet upstream changes?

Yeah, that splat is taken directly from the CI results reported with
this series. So it's just your series applied on top of drm-tip.

Can you take a look at the first patch here:
https://patchwork.freedesktop.org/series/113332/

Maybe you have a better idea? For reference the IGTs that we have for
verifying userspace object clearing are now failing, so hoping that
fixes it. The other two patches I'm hoping will fix the splat.

>
> I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next
> nor drm-next.
>
> Regards,
> Christian.
>
> >
> > <1>[  109.735148] BUG: kernel NULL pointer dereference, address:
> > 0010
> > <1>[  109.735151] #PF: supervisor read access in kernel mode
> > <1>[  109.735152] #PF: error_code(0x) - not-present page
> > <6>[  109.735153] PGD 0 P4D 0
> > <4>[  109.735155] Oops:  [#1] PREEMPT SMP NOPTI
> > <4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted
> > 6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
> > <4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390
> > Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
> > <4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
> > <4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
> > <4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f
> > 18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> > 66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff
> > 0f 00 00
> > <4>[  109.735288] RSP: 0018:c9f339a8 EFLAGS: 00010246
> > <4>[  109.735289] RAX:  RBX:  RCX:
> > 88810cea3a00
> > <4>[  109.735290] RDX:  RSI: c9f33af0 RDI:
> > 
> > <4>[  109.735292] RBP: 88811645d7c0 R08:  R09:
> > 888123afa940
> > <4>[  109.735292] R10: 0001 R11: 888104b70040 R12:
> > 
> > <4>[  109.735293] R13:  R14: c9f33b08 R15:
> > c9f33af0
> > <4>[  109.735294] FS:  ()
> > GS:8884ad68() knlGS:
> > <4>[  109.735295] CS:  0010 DS:  ES:  CR0: 80050033
> > <4>[  109.735296] CR2: 0010 CR3: 00011f9c6003 CR4:
> > 003706e0
> > <4>[  109.735297] DR0:  DR1:  DR2:
> > 
> > <4>[  109.735298] DR3:  DR6: fffe0ff0 DR7:
> > 0400
> > <4>[  109.735299] Call Trace:
> > <4>[  109.735300]  
> > <4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
> > <4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
> > <4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
> > <4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
> > <4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
> > <4>[  109.735625]  i915_ttm_restore+0x167/0x250 [i915]
> > <4>[  109.735759]  i915_gem_process_region+0x27a/0x3b0 [i915]
> > <4>[  109.735881]  i915_ttm_restore_region+0x4b/0x70 [i915]
> > <4>[  109.735999]  lmem_restore+0x3a/0x60 [i915]
> > <4>[  109.736101]  i915_gem_resume+0x4c/0x100 [i915]
> > <4>[  109.736202]  i915_drm_resume+0xc2/0x170 [i915]
> >
> > Plus some other less obvious issue(s) with some tests failing.
> >
> >> Christian.
>


Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Christian König

On 25.01.23 at 11:21, Matthew Auld wrote:

On Wed, 25 Jan 2023 at 10:07, Christian König
 wrote:

On 25.01.23 at 10:56, Matthew Auld wrote:

On Tue, 24 Jan 2023 at 17:15, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 13:48, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 12:57, Christian König
 wrote:

From: Christian König 

Make sure we can at least move and alloc TT objects without backing store.

v2: clear the tt object even when no resource is allocated.
v3: add Matthews changes for i915 as well.

Signed-off-by: Christian König 

Reviewed-by: Matthew Auld 

Ofc that assumes intel-gfx CI is now happy with the series.

There are still some nasty failures it seems (in the extended test
list). But it looks like the series is already merged. Can we quickly
revert and try again?

Ah, crap. I thought everything would be fine after the CI gave its go.

Which patch is causing the fallout?

I'm not sure. I think all of the patches kind of interact with each
other, but for sure there is an issue with the first patch. There is
one splat like:


Well I would rather like to revert as little as possible.

Are you sure that this isn't only on some i915 specific branch with not 
yet upstream changes?


I can't even find the i915_gem_obj_copy_ttm function in drm-misc-next 
nor drm-next.


Regards,
Christian.



<1>[  109.735148] BUG: kernel NULL pointer dereference, address:
0010
<1>[  109.735151] #PF: supervisor read access in kernel mode
<1>[  109.735152] #PF: error_code(0x) - not-present page
<6>[  109.735153] PGD 0 P4D 0
<4>[  109.735155] Oops:  [#1] PREEMPT SMP NOPTI
<4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted
6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
<4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390
Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
<4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
<4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f
18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff
0f 00 00
<4>[  109.735288] RSP: 0018:c9f339a8 EFLAGS: 00010246
<4>[  109.735289] RAX:  RBX:  RCX:
88810cea3a00
<4>[  109.735290] RDX:  RSI: c9f33af0 RDI:

<4>[  109.735292] RBP: 88811645d7c0 R08:  R09:
888123afa940
<4>[  109.735292] R10: 0001 R11: 888104b70040 R12:

<4>[  109.735293] R13:  R14: c9f33b08 R15:
c9f33af0
<4>[  109.735294] FS:  ()
GS:8884ad68() knlGS:
<4>[  109.735295] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  109.735296] CR2: 0010 CR3: 00011f9c6003 CR4:
003706e0
<4>[  109.735297] DR0:  DR1:  DR2:

<4>[  109.735298] DR3:  DR6: fffe0ff0 DR7:
0400
<4>[  109.735299] Call Trace:
<4>[  109.735300]  
<4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
<4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
<4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
<4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
<4>[  109.735625]  i915_ttm_restore+0x167/0x250 [i915]
<4>[  109.735759]  i915_gem_process_region+0x27a/0x3b0 [i915]
<4>[  109.735881]  i915_ttm_restore_region+0x4b/0x70 [i915]
<4>[  109.735999]  lmem_restore+0x3a/0x60 [i915]
<4>[  109.736101]  i915_gem_resume+0x4c/0x100 [i915]
<4>[  109.736202]  i915_drm_resume+0xc2/0x170 [i915]

Plus some other less obvious issue(s) with some tests failing.


Christian.




Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Matthew Auld
On Wed, 25 Jan 2023 at 10:07, Christian König
 wrote:
>
>
>
> On 25.01.23 at 10:56, Matthew Auld wrote:
> > On Tue, 24 Jan 2023 at 17:15, Matthew Auld
> >  wrote:
> >> On Tue, 24 Jan 2023 at 13:48, Matthew Auld
> >>  wrote:
> >>> On Tue, 24 Jan 2023 at 12:57, Christian König
> >>>  wrote:
> >>>> From: Christian König
> >>>>
> >>>> Make sure we can at least move and alloc TT objects without backing
> >>>> store.
> >>>>
> >>>> v2: clear the tt object even when no resource is allocated.
> >>>> v3: add Matthews changes for i915 as well.
> >>>>
> >>>> Signed-off-by: Christian König
> >>> Reviewed-by: Matthew Auld 
> >> Ofc that assumes intel-gfx CI is now happy with the series.
> > There are still some nasty failures it seems (in the extended test
> > list). But it looks like the series is already merged. Can we quickly
> > revert and try again?
>
> Ah, crap. I thought everything would be fine after the CI gave its go.
>
> Which patch is causing the fallout?

I'm not sure. I think all of the patches kind of interact with each
other, but for sure there is an issue with the first patch. There is
one splat like:

<1>[  109.735148] BUG: kernel NULL pointer dereference, address:
0010
<1>[  109.735151] #PF: supervisor read access in kernel mode
<1>[  109.735152] #PF: error_code(0x) - not-present page
<6>[  109.735153] PGD 0 P4D 0
<4>[  109.735155] Oops:  [#1] PREEMPT SMP NOPTI
<4>[  109.735157] CPU: 1 PID: 92 Comm: kworker/u12:6 Not tainted
6.2.0-rc5-Patchwork_113269v1-gc4d436608c4e+ #1
<4>[  109.735159] Hardware name: Gigabyte Technology Co., Ltd. GB-Z390
Garuda/GB-Z390 Garuda-CF, BIOS IG1c 11/19/2019
<4>[  109.735160] Workqueue: events_unbound async_run_entry_fn
<4>[  109.735163] RIP: 0010:i915_ttm_resource_mappable+0x4/0x30 [i915]
<4>[  109.735286] Code: b8 f9 ff ff ff eb c2 e8 aa 5e 52 e1 e9 4f 0f
18 00 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
66 0f 1f 00 <8b> 57 10 b8 01 00 00 00 85 d2 74 15 48 8b 47 08 48 05 ff
0f 00 00
<4>[  109.735288] RSP: 0018:c9f339a8 EFLAGS: 00010246
<4>[  109.735289] RAX:  RBX:  RCX:
88810cea3a00
<4>[  109.735290] RDX:  RSI: c9f33af0 RDI:

<4>[  109.735292] RBP: 88811645d7c0 R08:  R09:
888123afa940
<4>[  109.735292] R10: 0001 R11: 888104b70040 R12:

<4>[  109.735293] R13:  R14: c9f33b08 R15:
c9f33af0
<4>[  109.735294] FS:  ()
GS:8884ad68() knlGS:
<4>[  109.735295] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  109.735296] CR2: 0010 CR3: 00011f9c6003 CR4:
003706e0
<4>[  109.735297] DR0:  DR1:  DR2:

<4>[  109.735298] DR3:  DR6: fffe0ff0 DR7:
0400
<4>[  109.735299] Call Trace:
<4>[  109.735300]  
<4>[  109.735301]  __i915_ttm_move+0x128/0x940 [i915]
<4>[  109.735408]  ? dma_resv_iter_next+0x91/0xb0
<4>[  109.735412]  ? dma_resv_iter_first+0x42/0xb0
<4>[  109.735414]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4>[  109.735520]  i915_gem_obj_copy_ttm+0x12f/0x250 [i915]
<4>[  109.735625]  i915_ttm_restore+0x167/0x250 [i915]
<4>[  109.735759]  i915_gem_process_region+0x27a/0x3b0 [i915]
<4>[  109.735881]  i915_ttm_restore_region+0x4b/0x70 [i915]
<4>[  109.735999]  lmem_restore+0x3a/0x60 [i915]
<4>[  109.736101]  i915_gem_resume+0x4c/0x100 [i915]
<4>[  109.736202]  i915_drm_resume+0xc2/0x170 [i915]

Plus some other less obvious issue(s) with some tests failing.

>
> Christian.


Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Christian König




On 25.01.23 at 10:56, Matthew Auld wrote:

On Tue, 24 Jan 2023 at 17:15, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 13:48, Matthew Auld
 wrote:

On Tue, 24 Jan 2023 at 12:57, Christian König
 wrote:

From: Christian König 

Make sure we can at least move and alloc TT objects without backing store.

v2: clear the tt object even when no resource is allocated.
v3: add Matthews changes for i915 as well.

Signed-off-by: Christian König 

Reviewed-by: Matthew Auld 

Ofc that assumes intel-gfx CI is now happy with the series.

There are still some nasty failures it seems (in the extended test
list). But it looks like the series is already merged. Can we quickly
revert and try again?


Ah, crap. I thought everything would be fine after the CI gave its go.

Which patch is causing the fallout?

Christian.


Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-25 Thread Matthew Auld
On Tue, 24 Jan 2023 at 17:15, Matthew Auld
 wrote:
>
> On Tue, 24 Jan 2023 at 13:48, Matthew Auld
>  wrote:
> >
> > On Tue, 24 Jan 2023 at 12:57, Christian König
> >  wrote:
> > >
> > > From: Christian König 
> > >
> > > Make sure we can at least move and alloc TT objects without backing store.
> > >
> > > v2: clear the tt object even when no resource is allocated.
> > > v3: add Matthews changes for i915 as well.
> > >
> > > Signed-off-by: Christian König 
> > Reviewed-by: Matthew Auld 
>
> Ofc that assumes intel-gfx CI is now happy with the series.

There are still some nasty failures it seems (in the extended test
list). But it looks like the series is already merged. Can we quickly
revert and try again?


Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-24 Thread Matthew Auld
On Tue, 24 Jan 2023 at 13:48, Matthew Auld
 wrote:
>
> On Tue, 24 Jan 2023 at 12:57, Christian König
>  wrote:
> >
> > From: Christian König 
> >
> > Make sure we can at least move and alloc TT objects without backing store.
> >
> > v2: clear the tt object even when no resource is allocated.
> > v3: add Matthews changes for i915 as well.
> >
> > Signed-off-by: Christian König 
> Reviewed-by: Matthew Auld 

Ofc that assumes intel-gfx CI is now happy with the series.


Re: [Intel-gfx] [PATCH 1/5] drm/i915: audit bo->resource usage v3

2023-01-24 Thread Matthew Auld
On Tue, 24 Jan 2023 at 12:57, Christian König
 wrote:
>
> From: Christian König 
>
> Make sure we can at least move and alloc TT objects without backing store.
>
> v2: clear the tt object even when no resource is allocated.
> v3: add Matthews changes for i915 as well.
>
> Signed-off-by: Christian König 
Reviewed-by: Matthew Auld