Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-20 Thread Christian König

Am 17.11.23 um 20:24 schrieb Alex Deucher:

On Fri, Nov 10, 2023 at 10:22 AM Christian König
 wrote:

Am 10.11.23 um 16:02 schrieb Alex Deucher:

This worked by luck if the GART aperture ended up at 0.  When
we ended up moving GART on some chips, the GART aperture ended
up offsetting the the AGP address since the resource->start is
a GART offset, not an MC address.  Fix this by moving the AGP
address setup into amdgpu_bo_gpu_offset_no_check().

v2: check mem_type before checking agp

Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Reported-by: Jesse Zhang 
Reported-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Cc: christian.koe...@amd.com

Reviewed-by: Christian König 

Mario is getting a segfault with this patch on PHX.  Any ideas?


No idea, what does amdgpu_gmc_agp_addr+0x16 point to?

What could be is that the BO isn't backed by anything, e.g. GART not 
initialized yet or something like that.


In this case we probably need to add some checks to amdgpu_gmc_agp_addr().

Regards,
Christian.


   I
don't see how this could happen off hand.

[   28.980823] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   28.980846] BUG: kernel NULL pointer dereference, address:
000c
[   28.981424] #PF: supervisor read access in kernel mode
[   28.981849] #PF: error_code(0x) - not-present page
[   28.982259] PGD 0 P4D 0
[   28.982469] Oops:  [#1] PREEMPT SMP NOPTI
[   28.982817] CPU: 10 PID: 547 Comm: (udev-worker) Not tainted
6.7.0-rc1-6-ge5e258131973 #175
[   28.984060] RIP: 0010:amdgpu_gmc_agp_addr+0x16/
0x60 [amdgpu]
[   28.984828] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 0f 1f 44 00 00 48 8b 87 88 01 00 00 49 b8 ff ff ff ff ff ff ff
7f <83> 78 0c 01 75 2e 83 78 28 02 74 28 48 8b 40 18 48 8b 97 60 01 00
[   28.984829] RSP: 0018:c9ff7998 EFLAGS: 00010282
[   28.984831] RAX:  RBX: 888103f83800 RCX:

[   28.984832] RDX: 7fff RSI: 888103f83858 RDI:
888103f83858
[   28.984832] RBP: 888102f0f020 R08: 7fff R09:

[   28.984833] R10: 1000 R11: 1000 R12:
888103f83800
[   28.984834] R13: 888102f0fdf8 R14: 888102f0fe00 R15:

[   28.984835] FS:  7fe6aa26a8c0() GS:88844e88()
knlGS:
[   28.984836] CS:  0010 DS:  ES:  CR0: 80050033
[   28.984837] CR2: 000c CR3: 00011556e000 CR4:
00750ef0
[   28.984838] PKRU: 5554
[   28.984839] Call Trace:
[   28.984842]  
[   28.984845]  ? __die+0x20/0x70
[   28.984850]  ? page_fault_oops+0x151/0x4b0
[   28.984854]  ? srso_alias_return_thunk+0x5/0xfbef5
[   28.992663]  ? do_user_addr_fault+0x65/0x6b0
[   28.992672]  ? exc_page_fault+0x74/0x170
[   28.992676]  ? asm_exc_page_fault+0x22/0x30
[   28.993714]  ? amdgpu_gmc_agp_addr+0x16/0x60 [amdgpu]
[   28.994455]  amdgpu_bo_gpu_offset_no_check+0x1a/0x70 [amdgpu]
[   28.995110]  amdgpu_bo_create_reserved.part.0+0x109/0x290 [amdgpu]
[   28.995786]  ? __pfx_amdgpu_bo_destroy+0x10/0x10 [amdgpu]
[   28.996400]  amdgpu_bo_create_kernel+0x3f/0xa0 [amdgpu]
[   28.996992]  amdgpu_device_init+0x15fa/0x2b60 [amdgpu]
[   28.997591]  ? pci_bus_read_config_word+0x46/0x80
[   28.997598]  ? srso_alias_return_thunk+0x5/0xfbef5
[   28.998385]  ? do_pci_enable_device+0xd4/0x100
[   28.998390]  amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
[   28.999194]  amdgpu_pci_probe+0x180/0x570 [amdgpu]
[   28.999781]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.000232]  local_pci_probe+0x43/0xa0
[   29.000552]  pci_device_probe+0xc5/0x270
[   29.000883]  really_probe+0x1b4/0x420
[   29.001194]  __driver_probe_device+0x84/0x170
[   29.001558]  driver_probe_device+0x1e/0xb0
[   29.001901]  __driver_attach+0xe5/0x1f0
[   29.002224]  ? __pfx___driver_attach+0x10/0x10
[   29.002594]  bus_for_each_dev+0x75/0xd0
[   29.002919]  bus_add_driver+0x112/0x220
[   29.003243]  driver_register+0x5c/0x120
[   29.003569]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.004148]  do_one_initcall+0x41/0x300
[   29.004471]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.004876]  ? kmalloc_trace+0x25/0x90
[   29.005197]  do_init_module+0x64/0x250
[   29.005201]  init_module_from_file+0x8b/0xd0
[   29.005207]  idempotent_init_module+0x181/0x240
[   29.006235]  __x64_sys_finit_module+0x5a/0xb0
[   29.006238]  do_syscall_64+0x5c/0xe0
[   29.006868]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.006870]  ? syscall_exit_to_user_mode+0x27/0x40
[   29.006872]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.008000]  ? do_syscall_64+0x6b/0xe0
[   29.008004]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   29.008703] RIP: 0033:0x7fe6aa125c7d
[   29.008991] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 81 0d 00 f7 d8 64 89 01 48
[   29.008993] RSP: 002b:7ffe22be7618 EFLAGS: 0246 ORIG_RAX:
00

Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-17 Thread Alex Deucher
On Fri, Nov 10, 2023 at 10:22 AM Christian König
 wrote:
>
> Am 10.11.23 um 16:02 schrieb Alex Deucher:
> > This worked by luck if the GART aperture ended up at 0.  When
> > we ended up moving GART on some chips, the GART aperture ended
> > up offsetting the the AGP address since the resource->start is
> > a GART offset, not an MC address.  Fix this by moving the AGP
> > address setup into amdgpu_bo_gpu_offset_no_check().
> >
> > v2: check mem_type before checking agp
> >
> > Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
> > Reported-by: Jesse Zhang 
> > Reported-by: Yifan Zhang 
> > Signed-off-by: Alex Deucher 
> > Cc: christian.koe...@amd.com
>
> Reviewed-by: Christian König 

Mario is getting a segfault with this patch on PHX.  Any ideas?  I
don't see how this could happen off hand.

[   28.980823] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   28.980846] BUG: kernel NULL pointer dereference, address:
000c
[   28.981424] #PF: supervisor read access in kernel mode
[   28.981849] #PF: error_code(0x) - not-present page
[   28.982259] PGD 0 P4D 0
[   28.982469] Oops:  [#1] PREEMPT SMP NOPTI
[   28.982817] CPU: 10 PID: 547 Comm: (udev-worker) Not tainted
6.7.0-rc1-6-ge5e258131973 #175
[   28.984060] RIP: 0010:amdgpu_gmc_agp_addr+0x16/
0x60 [amdgpu]
[   28.984828] Code: 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 0f 1f 44 00 00 48 8b 87 88 01 00 00 49 b8 ff ff ff ff ff ff ff
7f <83> 78 0c 01 75 2e 83 78 28 02 74 28 48 8b 40 18 48 8b 97 60 01 00
[   28.984829] RSP: 0018:c9ff7998 EFLAGS: 00010282
[   28.984831] RAX:  RBX: 888103f83800 RCX:

[   28.984832] RDX: 7fff RSI: 888103f83858 RDI:
888103f83858
[   28.984832] RBP: 888102f0f020 R08: 7fff R09:

[   28.984833] R10: 1000 R11: 1000 R12:
888103f83800
[   28.984834] R13: 888102f0fdf8 R14: 888102f0fe00 R15:

[   28.984835] FS:  7fe6aa26a8c0() GS:88844e88()
knlGS:
[   28.984836] CS:  0010 DS:  ES:  CR0: 80050033
[   28.984837] CR2: 000c CR3: 00011556e000 CR4:
00750ef0
[   28.984838] PKRU: 5554
[   28.984839] Call Trace:
[   28.984842]  
[   28.984845]  ? __die+0x20/0x70
[   28.984850]  ? page_fault_oops+0x151/0x4b0
[   28.984854]  ? srso_alias_return_thunk+0x5/0xfbef5
[   28.992663]  ? do_user_addr_fault+0x65/0x6b0
[   28.992672]  ? exc_page_fault+0x74/0x170
[   28.992676]  ? asm_exc_page_fault+0x22/0x30
[   28.993714]  ? amdgpu_gmc_agp_addr+0x16/0x60 [amdgpu]
[   28.994455]  amdgpu_bo_gpu_offset_no_check+0x1a/0x70 [amdgpu]
[   28.995110]  amdgpu_bo_create_reserved.part.0+0x109/0x290 [amdgpu]
[   28.995786]  ? __pfx_amdgpu_bo_destroy+0x10/0x10 [amdgpu]
[   28.996400]  amdgpu_bo_create_kernel+0x3f/0xa0 [amdgpu]
[   28.996992]  amdgpu_device_init+0x15fa/0x2b60 [amdgpu]
[   28.997591]  ? pci_bus_read_config_word+0x46/0x80
[   28.997598]  ? srso_alias_return_thunk+0x5/0xfbef5
[   28.998385]  ? do_pci_enable_device+0xd4/0x100
[   28.998390]  amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
[   28.999194]  amdgpu_pci_probe+0x180/0x570 [amdgpu]
[   28.999781]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.000232]  local_pci_probe+0x43/0xa0
[   29.000552]  pci_device_probe+0xc5/0x270
[   29.000883]  really_probe+0x1b4/0x420
[   29.001194]  __driver_probe_device+0x84/0x170
[   29.001558]  driver_probe_device+0x1e/0xb0
[   29.001901]  __driver_attach+0xe5/0x1f0
[   29.002224]  ? __pfx___driver_attach+0x10/0x10
[   29.002594]  bus_for_each_dev+0x75/0xd0
[   29.002919]  bus_add_driver+0x112/0x220
[   29.003243]  driver_register+0x5c/0x120
[   29.003569]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.004148]  do_one_initcall+0x41/0x300
[   29.004471]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.004876]  ? kmalloc_trace+0x25/0x90
[   29.005197]  do_init_module+0x64/0x250
[   29.005201]  init_module_from_file+0x8b/0xd0
[   29.005207]  idempotent_init_module+0x181/0x240
[   29.006235]  __x64_sys_finit_module+0x5a/0xb0
[   29.006238]  do_syscall_64+0x5c/0xe0
[   29.006868]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.006870]  ? syscall_exit_to_user_mode+0x27/0x40
[   29.006872]  ? srso_alias_return_thunk+0x5/0xfbef5
[   29.008000]  ? do_syscall_64+0x6b/0xe0
[   29.008004]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   29.008703] RIP: 0033:0x7fe6aa125c7d
[   29.008991] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 81 0d 00 f7 d8 64 89 01 48
[   29.008993] RSP: 002b:7ffe22be7618 EFLAGS: 0246 ORIG_RAX:
0139
[   29.010434] RAX: ffda RBX: 55b2f8dee0f0 RCX:
7fe6aa125c7d
[   29.010435] RDX:  RSI: 7fe6aa33544a RDI:
0017
[   29.010436] RBP: 7fe6aa33544a R08: 0040 R09:
fde0
[   2

RE: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-16 Thread Zhang, Yifan
[AMD Official Use Only - General]

Yes, it fixes regressions in KFDTest introduced by this commit ("b93ed51c32ca 
drm/amdgpu: fix AGP init order"),

e.g.  KFDMemoryTest.MemoryRegister failure:

fault addr 0x8084575a6000 is calculated by (gart_start + AGP aperture mc 
addr) wrongly.

[   46.662856] amdgpu :c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 
ring:169 vmid:0 pasid:0, for process  pid 0 thread  pid 0)
[   46.662890] amdgpu :c2:00.0: amdgpu:   in page starting at address 
0x8084575a6000 from client 10
[   46.662909] amdgpu :c2:00.0: amdgpu: 
GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B52
[   46.662923] amdgpu :c2:00.0: amdgpu:  Faulty UTCL2 client ID: CPC 
(0x5)
[   46.662936] amdgpu :c2:00.0: amdgpu:  MORE_FAULTS: 0x0
[   46.662947] amdgpu :c2:00.0: amdgpu:  WALKER_ERROR: 0x1
[   46.662957] amdgpu :c2:00.0: amdgpu:  PERMISSION_FAULTS: 0x5
[   46.662968] amdgpu :c2:00.0: amdgpu:  MAPPING_ERROR: 0x1
[   46.662979] amdgpu :c2:00.0: amdgpu:  RW: 0x1


-Original Message-
From: Alex Deucher 
Sent: Thursday, November 16, 2023 10:26 PM
To: Zhang, Yifan 
Cc: Koenig, Christian ; Deucher, Alexander 
; amd-gfx@lists.freedesktop.org; Zhang, Jesse(Jie) 

Subject: Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

On Thu, Nov 16, 2023 at 4:37 AM Zhang, Yifan  wrote:
>
> [AMD Official Use Only - General]
>
> Ping... this patch seems still not merged.
>

Can you confirm it fixes the AGP issues you saw?

Alex

> Best Regards,
> Yifan
>
> -Original Message-
> From: Alex Deucher 
> Sent: Monday, November 13, 2023 2:13 AM
> To: Koenig, Christian 
> Cc: Deucher, Alexander ;
> amd-gfx@lists.freedesktop.org; Zhang, Yifan ;
> Zhang, Jesse(Jie) 
> Subject: Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not
> at 0
>
> On Sat, Nov 11, 2023 at 2:17 AM Christian König  
> wrote:
> >
> > Am 10.11.23 um 15:47 schrieb Alex Deucher:
> > > This worked by luck if the GART aperture ended up at 0.  When we
> > > ended up moving GART on some chips, the GART aperture ended up
> > > offsetting the the AGP address since the resource->start is a GART
> > > offset, not an MC address.  Fix this by moving the AGP address
> > > setup into amdgpu_bo_gpu_offset_no_check().
> > >
> > > Reported-by: Jesse Zhang 
> > > Reported-by: Yifan Zhang 
> > > Signed-off-by: Alex Deucher 
> > > Cc: christian.koe...@amd.com
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
> > >   2 files changed, 8 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index cef920a93924..1b3e97522838 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
> > >   u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
> > >   {
> > >   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > > - uint64_t offset;
> > > + uint64_t offset, addr;
> > >
> > > - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > > -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> > > + addr = amdgpu_gmc_agp_addr(&bo->tbo);
> >
> > IIRC you must check bo->tbo.resource->mem_type before calling
> > amdgpu_gmc_agp_addr().
>
> Yes, this was fixed in v2.
>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> > > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > > + offset = addr;
> > > + else
> > > + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > > + amdgpu_ttm_domain_start(adev,
> > > + bo->tbo.resource->mem_type);
> > >
> > >   return amdgpu_gmc_sign_extend(offset);
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > index 05991c5c8ddb..ab4a762aed5b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > @@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object 
> > > *bo)
> > >   return 0;
> > >
> > >   addr = amdgpu_gmc_agp_addr(bo);
> > > - if (addr != AMDGPU_BO_INVALID_OFFSET) {
> > > - bo->resource->start = addr >> PAGE_SHIFT;
> > > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > >   return 0;
> > > - }
> > >
> > >   /* allocate GART space */
> > >   placement.num_placement = 1;
> >


Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-16 Thread Alex Deucher
On Thu, Nov 16, 2023 at 4:37 AM Zhang, Yifan  wrote:
>
> [AMD Official Use Only - General]
>
> Ping... this patch seems still not merged.
>

Can you confirm it fixes the AGP issues you saw?

Alex

> Best Regards,
> Yifan
>
> -Original Message-
> From: Alex Deucher 
> Sent: Monday, November 13, 2023 2:13 AM
> To: Koenig, Christian 
> Cc: Deucher, Alexander ; 
> amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Zhang, 
> Jesse(Jie) 
> Subject: Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0
>
> On Sat, Nov 11, 2023 at 2:17 AM Christian König  
> wrote:
> >
> > Am 10.11.23 um 15:47 schrieb Alex Deucher:
> > > This worked by luck if the GART aperture ended up at 0.  When we
> > > ended up moving GART on some chips, the GART aperture ended up
> > > offsetting the the AGP address since the resource->start is a GART
> > > offset, not an MC address.  Fix this by moving the AGP address setup
> > > into amdgpu_bo_gpu_offset_no_check().
> > >
> > > Reported-by: Jesse Zhang 
> > > Reported-by: Yifan Zhang 
> > > Signed-off-by: Alex Deucher 
> > > Cc: christian.koe...@amd.com
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
> > >   2 files changed, 8 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > index cef920a93924..1b3e97522838 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > > @@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
> > >   u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
> > >   {
> > >   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > > - uint64_t offset;
> > > + uint64_t offset, addr;
> > >
> > > - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > > -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> > > + addr = amdgpu_gmc_agp_addr(&bo->tbo);
> >
> > IIRC you must check bo->tbo.resource->mem_type before calling
> > amdgpu_gmc_agp_addr().
>
> Yes, this was fixed in v2.
>
> Alex
>
> >
> > Regards,
> > Christian.
> >
> > > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > > + offset = addr;
> > > + else
> > > + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > > + amdgpu_ttm_domain_start(adev,
> > > + bo->tbo.resource->mem_type);
> > >
> > >   return amdgpu_gmc_sign_extend(offset);
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > index 05991c5c8ddb..ab4a762aed5b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > > @@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object 
> > > *bo)
> > >   return 0;
> > >
> > >   addr = amdgpu_gmc_agp_addr(bo);
> > > - if (addr != AMDGPU_BO_INVALID_OFFSET) {
> > > - bo->resource->start = addr >> PAGE_SHIFT;
> > > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > >   return 0;
> > > - }
> > >
> > >   /* allocate GART space */
> > >   placement.num_placement = 1;
> >


RE: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-16 Thread Zhang, Yifan
[AMD Official Use Only - General]

Ping... this patch seems still not merged.

Best Regards,
Yifan

-Original Message-
From: Alex Deucher 
Sent: Monday, November 13, 2023 2:13 AM
To: Koenig, Christian 
Cc: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org; Zhang, Yifan ; Zhang, 
Jesse(Jie) 
Subject: Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

On Sat, Nov 11, 2023 at 2:17 AM Christian König  
wrote:
>
> Am 10.11.23 um 15:47 schrieb Alex Deucher:
> > This worked by luck if the GART aperture ended up at 0.  When we
> > ended up moving GART on some chips, the GART aperture ended up
> > offsetting the the AGP address since the resource->start is a GART
> > offset, not an MC address.  Fix this by moving the AGP address setup
> > into amdgpu_bo_gpu_offset_no_check().
> >
> > Reported-by: Jesse Zhang 
> > Reported-by: Yifan Zhang 
> > Signed-off-by: Alex Deucher 
> > Cc: christian.koe...@amd.com
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
> >   2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > index cef920a93924..1b3e97522838 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > @@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
> >   u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
> >   {
> >   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > - uint64_t offset;
> > + uint64_t offset, addr;
> >
> > - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> > + addr = amdgpu_gmc_agp_addr(&bo->tbo);
>
> IIRC you must check bo->tbo.resource->mem_type before calling
> amdgpu_gmc_agp_addr().

Yes, this was fixed in v2.

Alex

>
> Regards,
> Christian.
>
> > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > + offset = addr;
> > + else
> > + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > + amdgpu_ttm_domain_start(adev,
> > + bo->tbo.resource->mem_type);
> >
> >   return amdgpu_gmc_sign_extend(offset);
> >   }
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 05991c5c8ddb..ab4a762aed5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >   return 0;
> >
> >   addr = amdgpu_gmc_agp_addr(bo);
> > - if (addr != AMDGPU_BO_INVALID_OFFSET) {
> > - bo->resource->start = addr >> PAGE_SHIFT;
> > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> >   return 0;
> > - }
> >
> >   /* allocate GART space */
> >   placement.num_placement = 1;
>


Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-12 Thread Alex Deucher
On Sat, Nov 11, 2023 at 2:17 AM Christian König
 wrote:
>
> Am 10.11.23 um 15:47 schrieb Alex Deucher:
> > This worked by luck if the GART aperture ended up at 0.  When
> > we ended up moving GART on some chips, the GART aperture ended
> > up offsetting the the AGP address since the resource->start is
> > a GART offset, not an MC address.  Fix this by moving the AGP
> > address setup into amdgpu_bo_gpu_offset_no_check().
> >
> > Reported-by: Jesse Zhang 
> > Reported-by: Yifan Zhang 
> > Signed-off-by: Alex Deucher 
> > Cc: christian.koe...@amd.com
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
> >   2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > index cef920a93924..1b3e97522838 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > @@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
> >   u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
> >   {
> >   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> > - uint64_t offset;
> > + uint64_t offset, addr;
> >
> > - offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > -  amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
> > + addr = amdgpu_gmc_agp_addr(&bo->tbo);
>
> IIRC you must check bo->tbo.resource->mem_type before calling
> amdgpu_gmc_agp_addr().

Yes, this was fixed in v2.

Alex

>
> Regards,
> Christian.
>
> > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> > + offset = addr;
> > + else
> > + offset = (bo->tbo.resource->start << PAGE_SHIFT) +
> > + amdgpu_ttm_domain_start(adev, 
> > bo->tbo.resource->mem_type);
> >
> >   return amdgpu_gmc_sign_extend(offset);
> >   }
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index 05991c5c8ddb..ab4a762aed5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
> >   return 0;
> >
> >   addr = amdgpu_gmc_agp_addr(bo);
> > - if (addr != AMDGPU_BO_INVALID_OFFSET) {
> > - bo->resource->start = addr >> PAGE_SHIFT;
> > + if (addr != AMDGPU_BO_INVALID_OFFSET)
> >   return 0;
> > - }
> >
> >   /* allocate GART space */
> >   placement.num_placement = 1;
>


RE: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-10 Thread Zhang, Yifan
[AMD Official Use Only - General]

Reviewed-by: Yifan Zhang 

-Original Message-
From: Deucher, Alexander 
Sent: Friday, November 10, 2023 11:02 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Zhang, Jesse(Jie) 
; Zhang, Yifan ; Koenig, Christian 

Subject: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

This worked by luck if the GART aperture ended up at 0.  When we ended up 
moving GART on some chips, the GART aperture ended up offsetting the the AGP 
address since the resource->start is a GART offset, not an MC address.  Fix 
this by moving the AGP address setup into amdgpu_bo_gpu_offset_no_check().

v2: check mem_type before checking agp

Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Reported-by: Jesse Zhang 
Reported-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Cc: christian.koe...@amd.com
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..d79b4ca1ecfc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
 u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-   uint64_t offset;
+   uint64_t offset = AMDGPU_BO_INVALID_OFFSET;

-   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
-amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
+   if (bo->tbo.resource->mem_type == TTM_PL_TT)
+   offset = amdgpu_gmc_agp_addr(&bo->tbo);
+
+   if (offset == AMDGPU_BO_INVALID_OFFSET)
+   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
+   amdgpu_ttm_domain_start(adev, 
bo->tbo.resource->mem_type);

return amdgpu_gmc_sign_extend(offset);  } diff --git 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..ab4a762aed5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
return 0;

addr = amdgpu_gmc_agp_addr(bo);
-   if (addr != AMDGPU_BO_INVALID_OFFSET) {
-   bo->resource->start = addr >> PAGE_SHIFT;
+   if (addr != AMDGPU_BO_INVALID_OFFSET)
return 0;
-   }

/* allocate GART space */
placement.num_placement = 1;
--
2.41.0



Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-10 Thread Christian König

Am 10.11.23 um 16:02 schrieb Alex Deucher:

This worked by luck if the GART aperture ended up at 0.  When
we ended up moving GART on some chips, the GART aperture ended
up offsetting the the AGP address since the resource->start is
a GART offset, not an MC address.  Fix this by moving the AGP
address setup into amdgpu_bo_gpu_offset_no_check().

v2: check mem_type before checking agp

Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Reported-by: Jesse Zhang 
Reported-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Cc: christian.koe...@amd.com


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
  2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..d79b4ca1ecfc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
  u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-   uint64_t offset;
+   uint64_t offset = AMDGPU_BO_INVALID_OFFSET;
  
-	offset = (bo->tbo.resource->start << PAGE_SHIFT) +

-amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
+   if (bo->tbo.resource->mem_type == TTM_PL_TT)
+   offset = amdgpu_gmc_agp_addr(&bo->tbo);
+
+   if (offset == AMDGPU_BO_INVALID_OFFSET)
+   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
+   amdgpu_ttm_domain_start(adev, 
bo->tbo.resource->mem_type);
  
  	return amdgpu_gmc_sign_extend(offset);

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..ab4a762aed5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
return 0;
  
  	addr = amdgpu_gmc_agp_addr(bo);

-   if (addr != AMDGPU_BO_INVALID_OFFSET) {
-   bo->resource->start = addr >> PAGE_SHIFT;
+   if (addr != AMDGPU_BO_INVALID_OFFSET)
return 0;
-   }
  
  	/* allocate GART space */

placement.num_placement = 1;




Re: [PATCH] drm/amdgpu: fix AGP addressing when GART is not at 0

2023-11-10 Thread Christian König

Am 10.11.23 um 15:47 schrieb Alex Deucher:

This worked by luck if the GART aperture ended up at 0.  When
we ended up moving GART on some chips, the GART aperture ended
up offsetting the the AGP address since the resource->start is
a GART offset, not an MC address.  Fix this by moving the AGP
address setup into amdgpu_bo_gpu_offset_no_check().

Reported-by: Jesse Zhang 
Reported-by: Yifan Zhang 
Signed-off-by: Alex Deucher 
Cc: christian.koe...@amd.com
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +---
  2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..1b3e97522838 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1527,10 +1527,14 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo)
  u64 amdgpu_bo_gpu_offset_no_check(struct amdgpu_bo *bo)
  {
struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
-   uint64_t offset;
+   uint64_t offset, addr;
  
-	offset = (bo->tbo.resource->start << PAGE_SHIFT) +

-amdgpu_ttm_domain_start(adev, bo->tbo.resource->mem_type);
+   addr = amdgpu_gmc_agp_addr(&bo->tbo);


IIRC you must check bo->tbo.resource->mem_type before calling 
amdgpu_gmc_agp_addr().


Regards,
Christian.


+   if (addr != AMDGPU_BO_INVALID_OFFSET)
+   offset = addr;
+   else
+   offset = (bo->tbo.resource->start << PAGE_SHIFT) +
+   amdgpu_ttm_domain_start(adev, 
bo->tbo.resource->mem_type);
  
  	return amdgpu_gmc_sign_extend(offset);

  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..ab4a762aed5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -959,10 +959,8 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo)
return 0;
  
  	addr = amdgpu_gmc_agp_addr(bo);

-   if (addr != AMDGPU_BO_INVALID_OFFSET) {
-   bo->resource->start = addr >> PAGE_SHIFT;
+   if (addr != AMDGPU_BO_INVALID_OFFSET)
return 0;
-   }
  
  	/* allocate GART space */

placement.num_placement = 1;