Re: [PATCH v3] drm/radeon: Fix EEH during kexec

2019-10-28 Thread Greg KH
On Fri, Oct 25, 2019 at 03:40:50PM -0500, KyleMahlkuch wrote:
> From: Kyle Mahlkuch 
> 
> During kexec some adapters hit an EEH since they are not properly
> shut down in the radeon_pci_shutdown() function. Adding
> radeon_suspend_kms() fixes this issue.
> Enabled only on PPC because this patch causes issues on some other
> boards.
> 
> Signed-off-by: Kyle Mahlkuch 
> ---
>  drivers/gpu/drm/radeon/radeon_drv.c | 14 ++
>  1 file changed, 14 insertions(+)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

2021-06-21 Thread Greg KH
On Mon, Jun 21, 2021 at 02:28:48PM +0200, Daniel Vetter wrote:
> On Fri, Jun 18, 2021 at 2:36 PM Oded Gabbay  wrote:
> > User process might want to share the device memory with another
> > driver/device, and to allow it to access it over PCIe (P2P).
> >
> > To enable this, we utilize the dma-buf mechanism and add a dma-buf
> > exporter support, so the other driver can import the device memory and
> > access it.
> >
> > The device memory is allocated using our existing allocation uAPI,
> > where the user will get a handle that represents the allocation.
> >
> > The user will then need to call the new
> > uAPI (HL_MEM_OP_EXPORT_DMABUF_FD) and give the handle as a parameter.
> >
> > The driver will return a FD that represents the DMA-BUF object that
> > was created to match that allocation.
> >
> > Signed-off-by: Oded Gabbay 
> > Reviewed-by: Tomer Tayar 
> 
> Mission acomplished, we've gone full circle, and the totally-not-a-gpu
> driver is now trying to use gpu infrastructure. And seems to have
> gained vram meanwhile too. Next up is going to be synchronization
> using dma_fence so you can pass buffers back&forth without stalls
> among drivers.

What's wrong with other drivers using dmabufs and even dma_fence?  It's
a common problem when shuffling memory around systems, why is that
somehow only allowed for gpu drivers?

There are many users of these structures in the kernel today that are
not gpu drivers (tee, fastrpc, virtio, xen, IB, etc) as this is a common
thing that drivers want to do (throw chunks of memory around from
userspace to hardware).

I'm not trying to be a pain here, but I really do not understand why
this is a problem.  A kernel api is present, why not use it by other
in-kernel drivers?  We had the problem in the past where subsystems were
trying to create their own interfaces for the same thing, which is why
you all created the dmabuf api to help unify this.

> Also I'm wondering which is the other driver that we share buffers
> with. The gaudi stuff doesn't have real struct pages as backing
> storage, it only fills out the dma_addr_t. That tends to blow up with
> other drivers, and the only place where this is guaranteed to work is
> if you have a dynamic importer which sets the allow_peer2peer flag.
> Adding maintainers from other subsystems who might want to chime in
> here. So even aside of the big question as-is this is broken.

>From what I can tell this driver is sending the buffers to other
instances of the same hardware, as that's what is on the other "end" of
the network connection.  No different from IB's use of RDMA, right?

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-06-22 Thread Greg KH
On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > Track sysfs files in a list so they all can be removed during pci remove
> > since otherwise their removal after that causes crash because parent
> > folder was already removed during pci remove.

Huh?  That should not happen, do you have a backtrace of that crash?

> > Signed-off-by: Andrey Grodzovsky 
> 
> Uh I thought sysfs just gets yanked completely. Please check with Greg KH
> whether hand-rolling all this really is the right solution here ... Feels
> very wrong. I thought this was all supposed to work by adding attributes
> before publishing the sysfs node, and then letting sysfs clean up
> everything. Not by cleaning up manually yourself.

Yes, that is supposed to be the correct thing to do.

> 
> Adding Greg for an authoritative answer.
> -Daniel
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h  | 13 +++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 
> > 
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  |  8 ++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 --
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 13 ++-
> >  drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 10 +---
> >  8 files changed, 99 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 604a681..ba3775f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -726,6 +726,15 @@ struct amd_powerplay {
> >  
> >  #define AMDGPU_RESET_MAGIC_NUM 64
> >  #define AMDGPU_MAX_DF_PERFMONS 4
> > +
> > +struct amdgpu_sysfs_list_node {
> > +   struct list_head head;
> > +   struct device_attribute *attr;
> > +};

You know we have lists of attributes already, called attribute groups,
if you really wanted to do something like this.  But, I don't think so.

Either way, don't hand-roll your own stuff that the driver core has
provided for you for a decade or more, that's just foolish :)

> > +
> > +#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
> > +   struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = 
> > &dev_attr_##_attr}
> > +
> >  struct amdgpu_device {
> > struct device   *dev;
> > struct drm_device   *ddev;
> > @@ -992,6 +1001,10 @@ struct amdgpu_device {
> > charproduct_number[16];
> > charproduct_name[32];
> > charserial[16];
> > +
> > +   struct list_head sysfs_files_list;
> > +   struct mutex sysfs_files_list_lock;
> > +
> >  };
> >  
> >  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device 
> > *bdev)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > index fdd52d8..c1549ee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > @@ -1950,8 +1950,10 @@ static ssize_t 
> > amdgpu_atombios_get_vbios_version(struct device *dev,
> > return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
> >  }
> >  
> > +
> >  static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
> >NULL);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
> >  
> >  /**
> >   * amdgpu_atombios_fini - free the driver info and callbacks for atombios
> > @@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
> > adev->mode_info.atom_context = NULL;
> > kfree(adev->mode_info.atom_card_info);
> > adev->mode_info.atom_card_info = NULL;
> > -   device_remove_file(adev->dev, &dev_attr_vbios_version);
> >  }
> >  
> >  /**
> > @@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
> > return ret;
> > }
> >  
> > +   mutex_lock(&adev->sysfs_files_list_lock);
> > +   list_add_tail(&dev_attr_handle_vbios_version.head, 
> > &adev->sysfs_files_list);
> > +   mutex_unlock(&adev->sysfs_files_list_lock);
> > +
> > return 0;
> >  }
> >  
> > diff --gi

Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-06-22 Thread Greg KH
On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 7:21 AM, Greg KH wrote:
> > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > Track sysfs files in a list so they all can be removed during pci remove
> > > > since otherwise their removal after that causes crash because parent
> > > > folder was already removed during pci remove.
> > Huh?  That should not happen, do you have a backtrace of that crash?
> 
> 
> 2 examples in the attached trace.

Odd, how did you trigger these?


> [  925.738225 <0.188086>] BUG: kernel NULL pointer dereference, address: 
> 0090
> [  925.738232 <0.07>] #PF: supervisor read access in kernel mode
> [  925.738236 <0.04>] #PF: error_code(0x) - not-present page
> [  925.738240 <0.04>] PGD 0 P4D 0 
> [  925.738245 <0.05>] Oops:  [#1] SMP PTI
> [  925.738249 <0.04>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G   
>  W  OE 5.5.0-rc7-dev-kfd+ #50
> [  925.738256 <0.07>] Hardware name: System manufacturer System 
> Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> [  925.738266 <0.10>] RIP: 0010:kernfs_find_ns+0x18/0x110
> [  925.738270 <0.04>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 
> 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 
> 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> [  925.738282 <0.12>] RSP: 0018:ad6d0118fb00 EFLAGS: 00010246
> [  925.738287 <0.05>] RAX:  RBX:  
> RCX: 2098a12076864b7e
> [  925.738292 <0.05>] RDX:  RSI: b6606b31 
> RDI: 
> [  925.738297 <0.05>] RBP: b6606b31 R08: b5379d10 
> R09: 
> [  925.738302 <0.05>] R10: ad6d0118fb38 R11: 9a75f64820a8 
> R12: 
> [  925.738307 <0.05>] R13:  R14: b6606b31 
> R15: 9a7612b06130
> [  925.738313 <0.06>] FS:  7f3eca4e8700() 
> GS:9a763dbc() knlGS:
> [  925.738319 <0.06>] CS:  0010 DS:  ES:  CR0: 
> 80050033
> [  925.738323 <0.04>] CR2: 0090 CR3: 35e5a005 
> CR4: 000606e0
> [  925.738329 <0.06>] Call Trace:
> [  925.738334 <0.05>]  kernfs_find_and_get_ns+0x2e/0x50
> [  925.738339 <0.05>]  sysfs_remove_group+0x25/0x80
> [  925.738344 <0.05>]  sysfs_remove_groups+0x29/0x40
> [  925.738350 <0.06>]  free_msi_irqs+0xf5/0x190
> [  925.738354 <0.04>]  pci_disable_msi+0xe9/0x120

So the PCI core is trying to clean up attributes that it had registered,
which is fine.  But we can't seem to find the attributes?  Were they
already removed somewhere else?

that's odd.

> [  925.738406 <0.52>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
> [  925.738453 <0.47>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
> [  925.738490 <0.37>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
> [  925.738529 <0.39>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
> [  925.738548 <0.19>]  drm_dev_put+0x5b/0x80 [drm]
> [  925.738558 <0.10>]  drm_release+0xc6/0xd0 [drm]
> [  925.738563 <0.05>]  __fput+0xc6/0x260
> [  925.738568 <0.05>]  task_work_run+0x79/0xb0
> [  925.738573 <0.05>]  do_exit+0x3d0/0xc60
> [  925.738578 <0.05>]  do_group_exit+0x47/0xb0
> [  925.738583 <0.05>]  get_signal+0x18b/0xc30
> [  925.738589 <0.06>]  do_signal+0x36/0x6a0
> [  925.738593 <0.04>]  ? force_sig_info_to_task+0xbc/0xd0
> [  925.738597 <0.04>]  ? signal_wake_up_state+0x15/0x30
> [  925.738603 <0.06>]  exit_to_usermode_loop+0x6f/0xc0
> [  925.738608 <0.05>]  prepare_exit_to_usermode+0xc7/0x110
> [  925.738613 <0.05>]  ret_from_intr+0x25/0x35
> [  925.738617 <0.04>] RIP: 0033:0x417369
> [  925.738621 <0.04>] Code: Bad RIP value.
> [  925.738625 <0.04>] RSP: 002b:7ffdd6bf0900 EFLAGS: 00010246
> [  925.738629 <0.04>] RAX: 7f3eca509000 RBX: 001e 
> RCX: 7f3ec95ba260
> [  925.738634 <0.05>] RDX: 7f3ec9889790 RSI: 000a 
> RDI: 
> [  925.738639 <0.05>] RBP: 7ffdd6bf09

Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-06-23 Thread Greg KH
On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 12:45 PM, Greg KH wrote:
> > On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> > > On 6/22/20 7:21 AM, Greg KH wrote:
> > > > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > > > Track sysfs files in a list so they all can be removed during pci 
> > > > > > remove
> > > > > > since otherwise their removal after that causes crash because parent
> > > > > > folder was already removed during pci remove.
> > > > Huh?  That should not happen, do you have a backtrace of that crash?
> > > 
> > > 2 examples in the attached trace.
> > Odd, how did you trigger these?
> 
> 
> By manually triggering PCI remove from sysfs
> 
> cd /sys/bus/pci/devices/\:05\:00.0 && echo 1 > remove

For some reason, I didn't think that video/drm devices could handle
hot-remove like this.  The "old" PCI hotplug specification explicitly
said that video devices were not supported, has that changed?

And this whole issue is probably tied to the larger issue that Daniel
was asking me about, when it came to device lifetimes and the drm layer,
so odds are we need to fix that up first before worrying about trying to
support this crazy request, right?  :)

> > > [  925.738225 <0.188086>] BUG: kernel NULL pointer dereference, 
> > > address: 0090
> > > [  925.738232 <0.07>] #PF: supervisor read access in kernel mode
> > > [  925.738236 <0.04>] #PF: error_code(0x) - not-present page
> > > [  925.738240 <0.04>] PGD 0 P4D 0
> > > [  925.738245 <0.05>] Oops:  [#1] SMP PTI
> > > [  925.738249 <0.04>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: 
> > > GW  OE 5.5.0-rc7-dev-kfd+ #50
> > > [  925.738256 <0.07>] Hardware name: System manufacturer System 
> > > Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> > > [  925.738266 <0.10>] RIP: 0010:kernfs_find_ns+0x18/0x110
> > > [  925.738270 <0.04>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 
> > > 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 
> > > fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 
> > > 83 e5 20 41
> > > [  925.738282 <0.12>] RSP: 0018:ad6d0118fb00 EFLAGS: 00010246
> > > [  925.738287 <0.05>] RAX:  RBX:  
> > > RCX: 2098a12076864b7e
> > > [  925.738292 <0.05>] RDX:  RSI: b6606b31 
> > > RDI: 
> > > [  925.738297 <0.05>] RBP: b6606b31 R08: b5379d10 
> > > R09: 
> > > [  925.738302 <0.05>] R10: ad6d0118fb38 R11: 9a75f64820a8 
> > > R12: 
> > > [  925.738307 <0.05>] R13:  R14: b6606b31 
> > > R15: 9a7612b06130
> > > [  925.738313 <0.06>] FS:  7f3eca4e8700() 
> > > GS:9a763dbc() knlGS:
> > > [  925.738319 <0.06>] CS:  0010 DS:  ES:  CR0: 
> > > 80050033
> > > [  925.738323 <0.04>] CR2: 0090 CR3: 35e5a005 
> > > CR4: 000606e0
> > > [  925.738329 <0.06>] Call Trace:
> > > [  925.738334 <0.05>]  kernfs_find_and_get_ns+0x2e/0x50
> > > [  925.738339 <0.05>]  sysfs_remove_group+0x25/0x80
> > > [  925.738344 <0.05>]  sysfs_remove_groups+0x29/0x40
> > > [  925.738350 <0.06>]  free_msi_irqs+0xf5/0x190
> > > [  925.738354 <0.04>]  pci_disable_msi+0xe9/0x120
> > So the PCI core is trying to clean up attributes that it had registered,
> > which is fine.  But we can't seem to find the attributes?  Were they
> > already removed somewhere else?
> > 
> > that's odd.
> 
> 
> Yes, as i pointed above i am emulating device remove from sysfs and this
> triggers pci device remove sequence and as part of that my specific device
> folder (05:00.0) is removed from the sysfs tree.

But why are things being removed twice?

> > > [  925.738406 <0.52>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
> > > [  925.738453 <0.47>]  tonga_ih_sw_

Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-06-24 Thread Greg KH
On Tue, Jun 23, 2020 at 11:04:30PM -0400, Andrey Grodzovsky wrote:
> 
> On 6/23/20 2:05 AM, Greg KH wrote:
> > On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
> > > On 6/22/20 12:45 PM, Greg KH wrote:
> > > > On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> > > > > On 6/22/20 7:21 AM, Greg KH wrote:
> > > > > > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > > > > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > > > > > Track sysfs files in a list so they all can be removed during 
> > > > > > > > pci remove
> > > > > > > > since otherwise their removal after that causes crash because 
> > > > > > > > parent
> > > > > > > > folder was already removed during pci remove.
> > > > > > Huh?  That should not happen, do you have a backtrace of that crash?
> > > > > 2 examples in the attached trace.
> > > > Odd, how did you trigger these?
> > > 
> > > By manually triggering PCI remove from sysfs
> > > 
> > > cd /sys/bus/pci/devices/\:05\:00.0 && echo 1 > remove
> > For some reason, I didn't think that video/drm devices could handle
> > hot-remove like this.  The "old" PCI hotplug specification explicitly
> > said that video devices were not supported, has that changed?
> > 
> > And this whole issue is probably tied to the larger issue that Daniel
> > was asking me about, when it came to device lifetimes and the drm layer,
> > so odds are we need to fix that up first before worrying about trying to
> > support this crazy request, right?  :)
> > 
> > > > > [  925.738225 <0.188086>] BUG: kernel NULL pointer dereference, 
> > > > > address: 0090
> > > > > [  925.738232 <0.07>] #PF: supervisor read access in kernel 
> > > > > mode
> > > > > [  925.738236 <0.04>] #PF: error_code(0x) - not-present 
> > > > > page
> > > > > [  925.738240 <0.04>] PGD 0 P4D 0
> > > > > [  925.738245 <0.05>] Oops:  [#1] SMP PTI
> > > > > [  925.738249 <0.04>] CPU: 7 PID: 2547 Comm: amdgpu_test 
> > > > > Tainted: GW  OE 5.5.0-rc7-dev-kfd+ #50
> > > > > [  925.738256 <0.07>] Hardware name: System manufacturer 
> > > > > System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> > > > > [  925.738266 <0.10>] RIP: 0010:kernfs_find_ns+0x18/0x110
> > > > > [  925.738270 <0.04>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 
> > > > > 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 
> > > > > 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 
> > > > > 8b 5f 68 66 83 e5 20 41
> > > > > [  925.738282 <0.12>] RSP: 0018:ad6d0118fb00 EFLAGS: 
> > > > > 00010246
> > > > > [  925.738287 <0.05>] RAX:  RBX: 
> > > > >  RCX: 2098a12076864b7e
> > > > > [  925.738292 <0.05>] RDX:  RSI: 
> > > > > b6606b31 RDI: 
> > > > > [  925.738297 <0.05>] RBP: b6606b31 R08: 
> > > > > b5379d10 R09: 
> > > > > [  925.738302 <0.05>] R10: ad6d0118fb38 R11: 
> > > > > 9a75f64820a8 R12: 
> > > > > [  925.738307 <0.05>] R13:  R14: 
> > > > > b6606b31 R15: 9a7612b06130
> > > > > [  925.738313 <0.06>] FS:  7f3eca4e8700() 
> > > > > GS:9a763dbc() knlGS:
> > > > > [  925.738319 <0.06>] CS:  0010 DS:  ES:  CR0: 
> > > > > 80050033
> > > > > [  925.738323 <0.04>] CR2: 0090 CR3: 
> > > > > 35e5a005 CR4: 000606e0
> > > > > [  925.738329 <0.06>] Call Trace:
> > > > > [  925.738334 <0.05>]  kernfs_find_and_get_ns+0x2e/0x50
> > > > > [  925.738339 <0.05>]  sysfs_remove_group+0x25/0x80
> > > > > [  925.738344 <0.05>]  sysfs_remove_groups+0x29/0x40
> > > > > [  925.738350 <0.06>]  free_msi_irqs+0xf5/0x190
> > > > > [  925.738354 <0.04>]  pci_disable_msi+0xe9/0x120
> > > > So the PCI core is trying to clean up attributes that it had registered,
> > > > which is fine.  But we can't seem to find the attributes?  Were they
> > > > already removed somewhere else?
> > > > 
> > > > that's odd.
> > > 
> > > Yes, as i pointed above i am emulating device remove from sysfs and this
> > > triggers pci device remove sequence and as part of that my specific device
> > > folder (05:00.0) is removed from the sysfs tree.
> > But why are things being removed twice?
> 
> 
> Not sure I understand what removed twice ? I remove only once per sysfs 
> attribute.

This code path shows that the kernel is trying to remove a file that is
not present, so someone removed it already...

thanks,

gre k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-12-02 Thread Greg KH
On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/11/20 10:34 AM, Greg KH wrote:
> > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > Hi, back to this after a long context switch for some higher priority 
> > > > > stuff.
> > > > > 
> > > > > So here I was able eventually to drop all this code and this change 
> > > > > here 
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Damd-staging-drm-next-device-unplug%26id%3D61852c8a59b4dd89d637693552c73175b9f2ccd6&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C9fbfecac94a340dfb68408d886571609%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637407055896651058%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ye8HJR1vidppcOBnlOgVu5GwKD2%2Bb5ztHbiI%2BubKKT0%3D&reserved=0
> > > > > was enough for me. Seems like while device_remove_file can handle the 
> > > > > use
> > > > > case where the file and the parent directory already gone,
> > > > > sysfs_remove_group goes down in flames in that case
> > > > > due to kobj->sd being unset on device removal.
> > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > driver core/bus logic should do it for them automatically.
> > > > 
> > > > And whenever a driver calls a sysfs_* call, that's a hint that something
> > > > is not working properly.
> > > 
> > > 
> > > Do you mean that while the driver creates the groups and files explicitly
> > > from it's different subsystems it should not explicitly remove each
> > > one of them because all of them should be removed at once (and
> > > recursively) when the device is being removed ?
> > Individual drivers should never add groups/files in sysfs, the driver
> > core should do it properly for you if you have everything set up
> > properly.  And yes, the driver core will automatically remove them as
> > well.
> > 
> > Please use the default groups attribute for your bus/subsystem and this
> > will happen automagically.
> 
> 
> Hi Greg, I tried your suggestion to hang amdgpu's sysfs
> attributes on default attributes in struct device.groups but turns out it's
> not usable since by the
> time i have access to struct device from amdgpu code it has already been
> initialized by pci core
> (i.e.  past the point where device_add->device_add_attrs->device_add_groups
> with dev->groups is called)
> and so i can't really use it.

That's odd, why can't you just set the groups pointer in your pci_driver
structure?  That's what it is there for, right?

> What I can only think of using is creating my own struct attribute_group **
> array in amdgpu where I aggregate all
> amdgpu sysfs attributes, call device_add_groups in the end of amgpu pci
> probe with that array and on device remove call
> device_remove_groups with the same array.

Horrid, no, see above :)

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-12-02 Thread Greg KH
On Wed, Dec 02, 2020 at 01:02:06PM -0500, Andrey Grodzovsky wrote:
> 
> On 12/2/20 12:34 PM, Greg KH wrote:
> > On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
> > > On 11/11/20 10:34 AM, Greg KH wrote:
> > > > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > > > Hi, back to this after a long context switch for some higher 
> > > > > > > priority stuff.
> > > > > > > 
> > > > > > > So here I was able eventually to drop all this code and this 
> > > > > > > change here 
> > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Damd-staging-drm-next-device-unplug%26id%3D61852c8a59b4dd89d637693552c73175b9f2ccd6&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C29ff7efb89bd47d8488708d896e86e7c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637425272317529134%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Vzc3fVofA6%2BMPSqHmBqcWavQLKWU1%2FXKJFun24irLf0%3D&reserved=0
> > > > > > > was enough for me. Seems like while device_remove_file can handle 
> > > > > > > the use
> > > > > > > case where the file and the parent directory already gone,
> > > > > > > sysfs_remove_group goes down in flames in that case
> > > > > > > due to kobj->sd being unset on device removal.
> > > > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > > > driver core/bus logic should do it for them automatically.
> > > > > > 
> > > > > > And whenever a driver calls a sysfs_* call, that's a hint that 
> > > > > > something
> > > > > > is not working properly.
> > > > > 
> > > > > Do you mean that while the driver creates the groups and files 
> > > > > explicitly
> > > > > from it's different subsystems it should not explicitly remove each
> > > > > one of them because all of them should be removed at once (and
> > > > > recursively) when the device is being removed ?
> > > > Individual drivers should never add groups/files in sysfs, the driver
> > > > core should do it properly for you if you have everything set up
> > > > properly.  And yes, the driver core will automatically remove them as
> > > > well.
> > > > 
> > > > Please use the default groups attribute for your bus/subsystem and this
> > > > will happen automagically.
> > > 
> > > Hi Greg, I tried your suggestion to hang amdgpu's sysfs
> > > attributes on default attributes in struct device.groups but turns out 
> > > it's
> > > not usable since by the
> > > time i have access to struct device from amdgpu code it has already been
> > > initialized by pci core
> > > (i.e.  past the point where 
> > > device_add->device_add_attrs->device_add_groups
> > > with dev->groups is called)
> > > and so i can't really use it.
> > That's odd, why can't you just set the groups pointer in your pci_driver
> > structure?  That's what it is there for, right?
> 
> I am probably missing something but amdgpu sysfs attrs are per device not
> per driver

Oops, you are right, you want the 'dev_groups' field.  Looks like pci
doesn't export that directly, so you can do:
.driver {
.dev_groups = my_device_groups;
},
in your pci_driver structure.

Or I'm sure the PCI driver maintainer would take a patch like
7d9c1d2f7aca ("USB: add support for dev_groups to struct
usb_device_driver") was done for the USB subsystem, as diving into the
"raw" .driver pointer isn't really that clean or nice in my opinion.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 10/14] dmr/amdgpu: Move some sysfs attrs creation to default_attr

2021-01-19 Thread Greg KH
On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
>  static struct pci_driver amdgpu_kms_pci_driver = {
>   .name = DRIVER_NAME,
>   .id_table = pciidlist,
> @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = {
>   .shutdown = amdgpu_pci_shutdown,
>   .driver.pm = &amdgpu_pm_ops,
>   .err_handler = &amdgpu_pci_err_handler,
> + .driver.dev_groups = amdgpu_sysfs_groups,

Shouldn't this just be:
groups - amdgpu_sysfs_groups,

Why go to the "driver root" here?

Other than that tiny thing, looks good to me, nice cleanup!

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 10/14] dmr/amdgpu: Move some sysfs attrs creation to default_attr

2021-01-19 Thread Greg KH
On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
> 
> On 1/19/21 2:34 AM, Greg KH wrote:
> > On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
> > >   static struct pci_driver amdgpu_kms_pci_driver = {
> > >   .name = DRIVER_NAME,
> > >   .id_table = pciidlist,
> > > @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver = {
> > >   .shutdown = amdgpu_pci_shutdown,
> > >   .driver.pm = &amdgpu_pm_ops,
> > >   .err_handler = &amdgpu_pci_err_handler,
> > > + .driver.dev_groups = amdgpu_sysfs_groups,
> > Shouldn't this just be:
> > groups - amdgpu_sysfs_groups,
> > 
> > Why go to the "driver root" here?
> 
> 
> Because I still didn't get to your suggestion to propose a patch to add 
> groups to
> pci_driver, it's located in 'base' driver struct.

You are a pci driver, you should never have to mess with the "base"
driver struct.  Look at commit 92d50fc1602e ("PCI/IB: add support for
pci driver attribute groups") which got merged in 4.14, way back in
2017 :)

driver.pm also looks odd, but I'm just going to ignore that for now...

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v4 10/14] dmr/amdgpu: Move some sysfs attrs creation to default_attr

2021-01-19 Thread Greg KH
On Tue, Jan 19, 2021 at 02:04:48PM -0500, Alex Deucher wrote:
> On Tue, Jan 19, 2021 at 1:26 PM Greg KH  wrote:
> >
> > On Tue, Jan 19, 2021 at 11:36:01AM -0500, Andrey Grodzovsky wrote:
> > >
> > > On 1/19/21 2:34 AM, Greg KH wrote:
> > > > On Mon, Jan 18, 2021 at 04:01:19PM -0500, Andrey Grodzovsky wrote:
> > > > >   static struct pci_driver amdgpu_kms_pci_driver = {
> > > > >   .name = DRIVER_NAME,
> > > > >   .id_table = pciidlist,
> > > > > @@ -1595,6 +1607,7 @@ static struct pci_driver amdgpu_kms_pci_driver 
> > > > > = {
> > > > >   .shutdown = amdgpu_pci_shutdown,
> > > > >   .driver.pm = &amdgpu_pm_ops,
> > > > >   .err_handler = &amdgpu_pci_err_handler,
> > > > > + .driver.dev_groups = amdgpu_sysfs_groups,
> > > > Shouldn't this just be:
> > > > groups - amdgpu_sysfs_groups,
> > > >
> > > > Why go to the "driver root" here?
> > >
> > >
> > > Because I still didn't get to your suggestion to propose a patch to add 
> > > groups to
> > > pci_driver, it's located in 'base' driver struct.
> >
> > You are a pci driver, you should never have to mess with the "base"
> > driver struct.  Look at commit 92d50fc1602e ("PCI/IB: add support for
> > pci driver attribute groups") which got merged in 4.14, way back in
> > 2017 :)
> 
> Per the previous discussion of this patch set:
> https://www.mail-archive.com/amd-gfx@lists.freedesktop.org/msg56019.html

Hey, at least I'm consistent :)
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH -stable] drm/amdgpu/display: drop DCN support for aarch64

2021-01-21 Thread Greg KH
On Thu, Jan 21, 2021 at 10:20:40AM +0100, Ard Biesheuvel wrote:
> From: Alex Deucher 
> 
> commit c241ed2f0ea549c18cff62a3708b43846b84dae3 upstream.
> 
> >From Ard:
> 
> "Simply disabling -mgeneral-regs-only left and right is risky, given that
> the standard AArch64 ABI permits the use of FP/SIMD registers anywhere,
> and GCC is known to use SIMD registers for spilling, and may invent
> other uses of the FP/SIMD register file that have nothing to do with the
> floating point code in question. Note that putting kernel_neon_begin()
> and kernel_neon_end() around the code that does use FP is not sufficient
> here, the problem is in all the other code that may be emitted with
> references to SIMD registers in it.
> 
> So the only way to do this properly is to put all floating point code in
> a separate compilation unit, and only compile that unit with
> -mgeneral-regs-only."
> 
> Disable support until the code can be properly refactored to support this
> properly on aarch64.
> 
> Acked-by: Will Deacon 
> Reported-by: Ard Biesheuvel 
> Signed-off-by: Alex Deucher 
> [ardb: backport to v5.10 by reverting c38d444e44badc55 instead]
> Acked-by: Alex Deucher  # v5.10 backport
> Signed-off-by: Ard Biesheuvel 
> ---
> The original commit does not apply cleanly to v5.10, as it reverts a
> combination of patches, some of which are not present in v5.10.
> 
> This patch is a straight revert of c38d444e44badc55, which is the only
> change that needs to be backed out from v5.10, and amounts to the same
> thing as the original mainline commit.

Now queued up, thanks.

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [regression 5.7-rc1] System does not power off, just halts

2020-04-14 Thread Greg KH
On Mon, Apr 13, 2020 at 01:48:58PM -0400, Alex Deucher wrote:
> On Mon, Apr 13, 2020 at 1:47 PM Paul Menzel  wrote:
> >
> > Dear Prike, dear Alex, dear Linux folks,
> >
> >
> > Am 13.04.20 um 10:44 schrieb Paul Menzel:
> >
> > > A regression between causes a system with the AMD board MSI B350M MORTAR
> > > (MS-7A37) with an AMD Ryzen 3 2200G not to power off any more but just
> > > to halt.
> > >
> > > The regression is introduced in 9ebe5422ad6c..b032227c6293. I am in the
> > > process to bisect this, but maybe somebody already has an idea.
> >
> > I found the Easter egg:
> >
> > > commit 487eca11a321ef33bcf4ca5adb3c0c4954db1b58
> > > Author: Prike Liang 
> > > Date:   Tue Apr 7 20:21:26 2020 +0800
> > >
> > > drm/amdgpu: fix gfx hang during suspend with video playback (v2)
> > >
> > > The system will be hang up during S3 suspend because of SMU is pending
> > > for GC not respose the register CP_HQD_ACTIVE access request.This 
> > > issue
> > > root cause of accessing the GC register under enter GFX CGGPG and can
> > > be fixed by disable GFX CGPG before perform suspend.
> > >
> > > v2: Use disable the GFX CGPG instead of RLC safe mode guard.
> > >
> > > Signed-off-by: Prike Liang 
> > > Tested-by: Mengbing Wang 
> > > Reviewed-by: Huang Rui 
> > > Signed-off-by: Alex Deucher 
> > > Cc: sta...@vger.kernel.org
> >
> > It reverts cleanly on top of 5.7-rc1, and this fixes the issue.
> >
> > Greg, please do not apply this to the stable series. The commit message
> > doesn’t even reference a issue/bug report, and doesn’t give a detailed
> > problem description. What system is it?
> >
> > Dave, Alex, how to proceed? Revert? I created issue 1094 [1].
> 
> Already fixed:
> https://patchwork.freedesktop.org/patch/361195/

Any reason that doesn't have a cc: stable tag on it?

And is it committed to any tree at the moment?

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/3] kernel: better document the use_mm/unuse_mm API contract

2020-04-16 Thread Greg KH
On Thu, Apr 16, 2020 at 07:31:57AM +0200, Christoph Hellwig wrote:
> Switch the function documentation to kerneldoc comments, and add
> WARN_ON_ONCE asserts that the calling thread is a kernel thread and
> does not have ->mm set (or has ->mm set in the case of unuse_mm).
> 
> Also give the functions a kthread_ prefix to better document the
> use case.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Felix Kuehling 

Acked-by: Greg Kroah-Hartman  [usb]
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] kernel: set USER_DS in kthread_use_mm

2020-04-16 Thread Greg KH
On Thu, Apr 16, 2020 at 07:31:58AM +0200, Christoph Hellwig wrote:
> Some architectures like arm64 and s390 require USER_DS to be set for
> kernel threads to access user address space, which is the whole purpose
> of kthread_use_mm, but other like x86 don't.  That has lead to a huge
> mess where some callers are fixed up once they are tested on said
> architectures, while others linger around and yet other like io_uring
> try to do "clever" optimizations for what usually is just a trivial
> asignment to a member in the thread_struct for most architectures.
> 
> Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
> previous value instead.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Michael S. Tsirkin  [vhost]
> ---

Acked-by: Greg Kroah-Hartman  [usb]
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] Revert "drm/radeon: handle PCIe root ports with addressing limitations"

2020-09-16 Thread Greg KH
On Tue, Sep 15, 2020 at 02:46:07PM -0400, Alex Deucher wrote:
> This change breaks tons of systems.

Very vague :(

This commit has also been merged for over a year, why the sudden
problem now?

> This reverts commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713.

You mean "33b3ad3788ab ("drm/radeon: handle PCIe root ports with
addressing limitations")"?

That's the proper way to reference commits in changelogs please.  It's
even documented that way...

> 
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206973
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=206697
> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207763
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1140
> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1287
> Signed-off-by: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Cc: Christoph Hellwig 
> Cc: christian.koe...@amd.com

Fixes: 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing 
limitations")

as well?

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RFC] treewide: cleanup unreachable breaks

2020-10-19 Thread Greg KH
On Sat, Oct 17, 2020 at 09:09:28AM -0700, t...@redhat.com wrote:
> From: Tom Rix 
> 
> This is a upcoming change to clean up a new warning treewide.
> I am wondering if the change could be one mega patch (see below) or
> normal patch per file about 100 patches or somewhere half way by collecting
> early acks.

Please break it up into one-patch-per-subsystem, like normal, and get it
merged that way.

Sending us a patch, without even a diffstat to review, isn't going to
get you very far...

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Outreachy kernel] [PATCH 4/5] gpu: drm: amdgpu: Replace snprintf() with sysfs_emit()

2020-10-22 Thread Greg KH
On Thu, Oct 22, 2020 at 07:17:56PM +0530, Sumera Priyadarsini wrote:
> Using snprintf() for show() methods holds the risk of buffer overrun
> as snprintf() does not know the PAGE_SIZE maximum of the temporary
> buffer used to output sysfs content.
> 
> Modify amdgpu_psp.c to use sysfs_emit() instead which knows the
> size of the temporary buffer.
> 
> Issue found with Coccinelle.
> 
> Signed-off-by: Sumera Priyadarsini 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index d6c38e24f130..4d1d1e1b005d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -2621,7 +2621,7 @@ static ssize_t psp_usbc_pd_fw_sysfs_read(struct device 
> *dev,
>   return ret;
>   }
>  
> - return snprintf(buf, PAGE_SIZE, "%x\n", fw_ver);
> + return sysfs_emit(buf, PAGE_SIZE, "%x\n", fw_ver);

Did you build this code?  I don't think it is correct...

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3 03/11] drm/amd/display: Honor the offset for plane 0.

2020-10-22 Thread Greg KH
On Thu, Oct 22, 2020 at 11:36:12AM -0400, Alex Deucher wrote:
> On Wed, Oct 21, 2020 at 7:31 PM Bas Nieuwenhuizen
>  wrote:
> >
> > With modifiers I'd like to support non-dedicated buffers for
> > images.
> >
> > Signed-off-by: Bas Nieuwenhuizen 
> > Cc: sta...@vger.kernel.org # 5.1.0
> 
> I think you need # 5.1.x- for it to be applied to all stable kernels
> since 5.1 otherwise it will just apply to 5.1.x

Not true, I will try from the number up to the latest version.  So all
should be fine here.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Outreachy kernel] [PATCH] drm/amdgpu: use DEFINE_DEBUGFS_ATTRIBUTE with debugfs_create_file_unsafe()

2020-10-30 Thread Greg KH
On Fri, Oct 30, 2020 at 08:52:45AM +0530, Deepak R Varma wrote:
> Using DEFINE_DEBUGFS_ATTRIBUTE macro with debugfs_create_file_unsafe()
> function in place of the debugfs_create_file() function will make the
> file operation struct "reset" aware of the file's lifetime. Additional
> details here: https://lists.archive.carbon60.com/linux/kernel/2369498
> 
> Issue reported by Coccinelle script:
> scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
> 
> Signed-off-by: Deepak R Varma 
> ---
> Please Note: This is a Outreachy project task patch.
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 2d125b8b15ee..f076b1ba7319 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1551,29 +1551,29 @@ static int amdgpu_debugfs_sclk_set(void *data, u64 
> val)
>   return 0;
>  }
>  
> -DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
> - amdgpu_debugfs_ib_preempt, "%llu\n");
> +DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
> +  amdgpu_debugfs_ib_preempt, "%llu\n");

Are you sure this is ok?  Do these devices need this additional
"protection"?  Do they have the problem that these macros were written
for?

Same for the other patches you just submitted here, I think you need to
somehow "prove" that these changes are necessary, checkpatch isn't able
to determine this all the time.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Outreachy kernel] [PATCH] drm/amdgpu: use DEFINE_DEBUGFS_ATTRIBUTE with debugfs_create_file_unsafe()

2020-10-30 Thread Greg KH
On Fri, Oct 30, 2020 at 01:27:16PM +0530, Deepak R Varma wrote:
> On Fri, Oct 30, 2020 at 08:11:20AM +0100, Greg KH wrote:
> > On Fri, Oct 30, 2020 at 08:52:45AM +0530, Deepak R Varma wrote:
> > > Using DEFINE_DEBUGFS_ATTRIBUTE macro with debugfs_create_file_unsafe()
> > > function in place of the debugfs_create_file() function will make the
> > > file operation struct "reset" aware of the file's lifetime. Additional
> > > details here: https://lists.archive.carbon60.com/linux/kernel/2369498
> > > 
> > > Issue reported by Coccinelle script:
> > > scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
> > > 
> > > Signed-off-by: Deepak R Varma 
> > > ---
> > > Please Note: This is a Outreachy project task patch.
> > > 
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20 ++--
> > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > index 2d125b8b15ee..f076b1ba7319 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > @@ -1551,29 +1551,29 @@ static int amdgpu_debugfs_sclk_set(void *data, 
> > > u64 val)
> > >   return 0;
> > >  }
> > >  
> > > -DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
> > > - amdgpu_debugfs_ib_preempt, "%llu\n");
> > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
> > > +  amdgpu_debugfs_ib_preempt, "%llu\n");
> > 
> > Are you sure this is ok?  Do these devices need this additional
> > "protection"?  Do they have the problem that these macros were written
> > for?
> > 
> > Same for the other patches you just submitted here, I think you need to
> > somehow "prove" that these changes are necessary, checkpatch isn't able
> > to determine this all the time.
> 
> Hi Greg,
> Based on my understanding, the current function debugfs_create_file()
> adds an overhead of lifetime managing proxy for such fop structs. This
> should be applicable to these set of drivers as well. Hence I think this
> change will be useful.

Why do these drivers need these changes?  Are these files dynamically
removed from the system at random times?

There is a reason we didn't just do a global search/replace for this in
the kernel when the new functions were added, so I don't know why
checkpatch is now saying it must be done.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Outreachy kernel] [PATCH] drm/amdgpu: use DEFINE_DEBUGFS_ATTRIBUTE with debugfs_create_file_unsafe()

2020-10-30 Thread Greg KH
On Fri, Oct 30, 2020 at 01:47:05PM +0530, Sumera Priyadarsini wrote:
> On Fri, 30 Oct, 2020, 1:32 PM Greg KH,  wrote:
> 
> > On Fri, Oct 30, 2020 at 01:27:16PM +0530, Deepak R Varma wrote:
> > > On Fri, Oct 30, 2020 at 08:11:20AM +0100, Greg KH wrote:
> > > > On Fri, Oct 30, 2020 at 08:52:45AM +0530, Deepak R Varma wrote:
> > > > > Using DEFINE_DEBUGFS_ATTRIBUTE macro with
> > debugfs_create_file_unsafe()
> > > > > function in place of the debugfs_create_file() function will make the
> > > > > file operation struct "reset" aware of the file's lifetime.
> > Additional
> > > > > details here:
> > https://lists.archive.carbon60.com/linux/kernel/2369498
> > > > >
> > > > > Issue reported by Coccinelle script:
> > > > > scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
> > > > >
> > > > > Signed-off-by: Deepak R Varma 
> > > > > ---
> > > > > Please Note: This is a Outreachy project task patch.
> > > > >
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20
> > ++--
> > > > >  1 file changed, 10 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > > index 2d125b8b15ee..f076b1ba7319 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > > @@ -1551,29 +1551,29 @@ static int amdgpu_debugfs_sclk_set(void
> > *data, u64 val)
> > > > >   return 0;
> > > > >  }
> > > > >
> > > > > -DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
> > > > > - amdgpu_debugfs_ib_preempt, "%llu\n");
> > > > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
> > > > > +  amdgpu_debugfs_ib_preempt, "%llu\n");
> > > >
> > > > Are you sure this is ok?  Do these devices need this additional
> > > > "protection"?  Do they have the problem that these macros were written
> > > > for?
> > > >
> > > > Same for the other patches you just submitted here, I think you need to
> > > > somehow "prove" that these changes are necessary, checkpatch isn't able
> > > > to determine this all the time.
> > >
> > > Hi Greg,
> > > Based on my understanding, the current function debugfs_create_file()
> > > adds an overhead of lifetime managing proxy for such fop structs. This
> > > should be applicable to these set of drivers as well. Hence I think this
> > > change will be useful.
> >
> > Why do these drivers need these changes?  Are these files dynamically
> > removed from the system at random times?
> >
> > There is a reason we didn't just do a global search/replace for this in
> > the kernel when the new functions were added, so I don't know why
> > checkpatch is now saying it must be done.
> >
> 
> Hi,
> 
> Sorry to jump in on the thread this way, but what exactly does a 'lifetime
> managing proxy' for file operations mean? I am trying to understand how
> DEFINE_DEBUGFS_ATTRIBUTE changes things wrt debug_ fs file operations but
> can't find many resources. :(

It means that the debugfs core can handle debugfs files being removed
from the system while they are still open when they were created by a
driver/module that is now unloaded from memory.

This is only an issue for drivers that manage devices that have unknown
lifespans (i.e. they can be yanked out of the system at any time, and
the memory for those debugfs files can be freed).

For the entire DRM/GPU subsystem, I strongly doubt this is the case.

> Please let me know if I should be asking this in a different mailing
> list/irc instead.
> 
> The change seems to be suggested by a coccinelle script.

I know, and I don't think that script knows the nuances behind this, as,
again, we would have just done a global search/replace for this when the
debugfs fixes went into the kernel.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [Outreachy kernel] [PATCH] drm/amdgpu: use DEFINE_DEBUGFS_ATTRIBUTE with debugfs_create_file_unsafe()

2020-10-30 Thread Greg KH
On Fri, Oct 30, 2020 at 09:00:04AM +0100, Christian König wrote:
> Am 30.10.20 um 08:57 schrieb Deepak R Varma:
> > On Fri, Oct 30, 2020 at 08:11:20AM +0100, Greg KH wrote:
> > > On Fri, Oct 30, 2020 at 08:52:45AM +0530, Deepak R Varma wrote:
> > > > Using DEFINE_DEBUGFS_ATTRIBUTE macro with debugfs_create_file_unsafe()
> > > > function in place of the debugfs_create_file() function will make the
> > > > file operation struct "reset" aware of the file's lifetime. Additional
> > > > details here: 
> > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.archive.carbon60.com%2Flinux%2Fkernel%2F2369498&data=04%7C01%7Cchristian.koenig%40amd.com%7Cddd7a6ac8164415a639708d87ca97004%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637396414464384011%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=o6GOHvMxNMuOPlC4nhDyURCHBLqfQZhYQq%2BiIMt3D3s%3D&reserved=0
> > > > 
> > > > Issue reported by Coccinelle script:
> > > > scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci
> > > > 
> > > > Signed-off-by: Deepak R Varma 
> > > > ---
> > > > Please Note: This is a Outreachy project task patch.
> > > > 
> > > >   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 20 ++--
> > > >   1 file changed, 10 insertions(+), 10 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > index 2d125b8b15ee..f076b1ba7319 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> > > > @@ -1551,29 +1551,29 @@ static int amdgpu_debugfs_sclk_set(void *data, 
> > > > u64 val)
> > > > return 0;
> > > >   }
> > > > -DEFINE_SIMPLE_ATTRIBUTE(fops_ib_preempt, NULL,
> > > > -   amdgpu_debugfs_ib_preempt, "%llu\n");
> > > > +DEFINE_DEBUGFS_ATTRIBUTE(fops_ib_preempt, NULL,
> > > > +amdgpu_debugfs_ib_preempt, "%llu\n");
> > > Are you sure this is ok?  Do these devices need this additional
> > > "protection"?  Do they have the problem that these macros were written
> > > for?
> > > 
> > > Same for the other patches you just submitted here, I think you need to
> > > somehow "prove" that these changes are necessary, checkpatch isn't able
> > > to determine this all the time.
> > Hi Greg,
> > Based on my understanding, the current function debugfs_create_file()
> > adds an overhead of lifetime managing proxy for such fop structs. This
> > should be applicable to these set of drivers as well. Hence I think this
> > change will be useful.
> 
> Well since this is only created once per device instance I don't really care
> about this little overhead.
> 
> But what exactly is debugfs doing or not doing here?

It is trying to save drivers from having debugfs files open that point
to memory that can go away at any time.  For graphics devices, I doubt
that is the case.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or NULL

2020-11-02 Thread Greg KH
On Mon, Nov 02, 2020 at 02:43:45PM -0500, Alex Deucher wrote:
> On Mon, Nov 2, 2020 at 1:42 PM Deepak R Varma  wrote:
> >
> > Initializing global variable to 0 or NULL is not necessary and should
> > be avoided. Issue reported by checkpatch script as:
> > ERROR: do not initialise globals to 0 (or NULL).
> 
> I agree that this is technically correct, but a lot of people don't
> seem to know that so we get a lot of comments about this code for the
> variables that are not explicitly set.  Seems less confusing to
> initialize them even if it not necessary.  I don't have a particularly
> strong opinion on it however.

The kernel coding style is to do it this way.  You even save space and
time by doing it as well :)

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or NULL

2020-11-03 Thread Greg KH
On Mon, Nov 02, 2020 at 09:48:25PM +0100, Christian König wrote:
> Am 03.11.20 um 07:53 schrieb Greg KH:
> > On Mon, Nov 02, 2020 at 09:06:21PM +0100, Christian König wrote:
> > > Am 02.11.20 um 20:43 schrieb Alex Deucher:
> > > > On Mon, Nov 2, 2020 at 1:42 PM Deepak R Varma  
> > > > wrote:
> > > > > Initializing global variable to 0 or NULL is not necessary and should
> > > > > be avoided. Issue reported by checkpatch script as:
> > > > > ERROR: do not initialise globals to 0 (or NULL).
> > > > I agree that this is technically correct, but a lot of people don't
> > > > seem to know that so we get a lot of comments about this code for the
> > > > variables that are not explicitly set.  Seems less confusing to
> > > > initialize them even if it not necessary.  I don't have a particularly
> > > > strong opinion on it however.
> > > Agree with Alex.
> > > 
> > > Especially for the module parameters we should have a explicit init value
> > > for documentation purposes, even when it is 0.
> > Why is this one tiny driver somehow special compared to the entire rest
> > of the kernel?  (hint, it isn't...)
> 
> And it certainly shouldn't :)
> 
> > Please follow the normal coding style rules, there's no reason to ignore
> > them unless you like to constantly reject patches like this that get
> > sent to you.
> 
> Yeah, that's a rather good point.
> 
> Not a particular strong opinion on this either, but when something global is
> set to 0 people usually do this to emphases that it is important that it is
> zero.

Again, no, that's not what we have been doing in the kernel for the past
20+ years.  If you do not set it to anything, we all know it is
important for it to be set to 0.  Otherwise we would explicitly set it
to something else.  And if we don't care, then that too doesn't matter
so we let it be 0 by not initializing it, it doesn't matter.

I think this very change is what started the whole "kernel janitor"
movement all those years ago, because it was easily proven that this
simple change saved both time and memory.

This shouldn't even be an argument we are having anymore...

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or NULL

2020-11-03 Thread Greg KH
On Mon, Nov 02, 2020 at 09:06:21PM +0100, Christian König wrote:
> Am 02.11.20 um 20:43 schrieb Alex Deucher:
> > On Mon, Nov 2, 2020 at 1:42 PM Deepak R Varma  wrote:
> > > Initializing global variable to 0 or NULL is not necessary and should
> > > be avoided. Issue reported by checkpatch script as:
> > > ERROR: do not initialise globals to 0 (or NULL).
> > I agree that this is technically correct, but a lot of people don't
> > seem to know that so we get a lot of comments about this code for the
> > variables that are not explicitly set.  Seems less confusing to
> > initialize them even if it not necessary.  I don't have a particularly
> > strong opinion on it however.
> 
> Agree with Alex.
> 
> Especially for the module parameters we should have a explicit init value
> for documentation purposes, even when it is 0.

Why is this one tiny driver somehow special compared to the entire rest
of the kernel?  (hint, it isn't...)

Please follow the normal coding style rules, there's no reason to ignore
them unless you like to constantly reject patches like this that get
sent to you.

thnaks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or NULL

2020-11-03 Thread Greg KH
On Tue, Nov 03, 2020 at 02:50:40PM +, Deucher, Alexander wrote:
> [AMD Public Use]
> 
> > -Original Message-
> > From: Greg KH 
> > Sent: Tuesday, November 3, 2020 1:53 AM
> > To: Koenig, Christian 
> > Cc: Alex Deucher ; Deepak R Varma
> > ; David Airlie ; LKML  > ker...@vger.kernel.org>; Maling list - DRI developers  > de...@lists.freedesktop.org>; Melissa Wen ;
> > amd-gfx list ; Daniel Vetter
> > ; Daniel Vetter ; Deucher,
> > Alexander 
> > Subject: Re: [PATCH] drm/amdgpu: do not initialise global variables to 0 or
> > NULL
> > 
> > On Mon, Nov 02, 2020 at 09:06:21PM +0100, Christian König wrote:
> > > Am 02.11.20 um 20:43 schrieb Alex Deucher:
> > > > On Mon, Nov 2, 2020 at 1:42 PM Deepak R Varma
> >  wrote:
> > > > > Initializing global variable to 0 or NULL is not necessary and
> > > > > should be avoided. Issue reported by checkpatch script as:
> > > > > ERROR: do not initialise globals to 0 (or NULL).
> > > > I agree that this is technically correct, but a lot of people don't
> > > > seem to know that so we get a lot of comments about this code for
> > > > the variables that are not explicitly set.  Seems less confusing to
> > > > initialize them even if it not necessary.  I don't have a
> > > > particularly strong opinion on it however.
> > >
> > > Agree with Alex.
> > >
> > > Especially for the module parameters we should have a explicit init
> > > value for documentation purposes, even when it is 0.
> > 
> > Why is this one tiny driver somehow special compared to the entire rest of
> > the kernel?  (hint, it isn't...)
> > 
> > Please follow the normal coding style rules, there's no reason to ignore 
> > them
> > unless you like to constantly reject patches like this that get sent to you.
> > 
> 
> I'll apply the patch, but as a data point, this is the first time I've
> gotten a patch to make this change.  I get several bug reports or
> patches every year to explicitly set values to global variables
> because users assume they are not initialized.  So it will always be a
> trade off as to which patches you want to NACK.

Are you all not running your patches through checkpatch.pl to tell you
simple things like this?  If no, I suggest you start doing that :)

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-11-10 Thread Greg KH
On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> Hi, back to this after a long context switch for some higher priority stuff.
> 
> So here I was able eventually to drop all this code and this change here 
> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> was enough for me. Seems like while device_remove_file can handle the use
> case where the file and the parent directory already gone,
> sysfs_remove_group goes down in flames in that case
> due to kobj->sd being unset on device removal.

A driver shouldn't ever have to remove individual sysfs groups, the
driver core/bus logic should do it for them automatically.

And whenever a driver calls a sysfs_* call, that's a hint that something
is not working properly.

Also, run your patch above through checkpatch.pl before submitting it :)

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-11-11 Thread Greg KH
On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/10/20 12:59 PM, Greg KH wrote:
> > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > Hi, back to this after a long context switch for some higher priority 
> > > stuff.
> > > 
> > > So here I was able eventually to drop all this code and this change here 
> > > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Damd-staging-drm-next-device-unplug%26id%3D61852c8a59b4dd89d637693552c73175b9f2ccd6&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C7ae9e5798c7648d6dbb908d885a22c58%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637406278875513811%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=aoFIsBxpLC9tBZw3E%2B8IJlNqFSq6uRgEvvciaZ6B1iw%3D&reserved=0
> > > was enough for me. Seems like while device_remove_file can handle the use
> > > case where the file and the parent directory already gone,
> > > sysfs_remove_group goes down in flames in that case
> > > due to kobj->sd being unset on device removal.
> > A driver shouldn't ever have to remove individual sysfs groups, the
> > driver core/bus logic should do it for them automatically.
> > 
> > And whenever a driver calls a sysfs_* call, that's a hint that something
> > is not working properly.
> 
> 
> 
> Do you mean that while the driver creates the groups and files explicitly
> from it's different subsystems it should not explicitly remove each
> one of them because all of them should be removed at once (and
> recursively) when the device is being removed ?

Individual drivers should never add groups/files in sysfs, the driver
core should do it properly for you if you have everything set up
properly.  And yes, the driver core will automatically remove them as
well.

Please use the default groups attribute for your bus/subsystem and this
will happen automagically.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal

2020-11-11 Thread Greg KH
On Wed, Nov 11, 2020 at 10:45:53AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/11/20 10:34 AM, Greg KH wrote:
> > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > Hi, back to this after a long context switch for some higher priority 
> > > > > stuff.
> > > > > 
> > > > > So here I was able eventually to drop all this code and this change 
> > > > > here 
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https:%2F%2Fcgit.freedesktop.org%2F~agrodzov%2Flinux%2Fcommit%2F%3Fh%3Damd-staging-drm-next-device-unplug%26id%3D61852c8a59b4dd89d637693552c73175b9f2ccd6&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C9fbfecac94a340dfb68408d886571609%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637407055896651058%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Ye8HJR1vidppcOBnlOgVu5GwKD2%2Bb5ztHbiI%2BubKKT0%3D&reserved=0
> > > > > was enough for me. Seems like while device_remove_file can handle the 
> > > > > use
> > > > > case where the file and the parent directory already gone,
> > > > > sysfs_remove_group goes down in flames in that case
> > > > > due to kobj->sd being unset on device removal.
> > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > driver core/bus logic should do it for them automatically.
> > > > 
> > > > And whenever a driver calls a sysfs_* call, that's a hint that something
> > > > is not working properly.
> > > 
> > > 
> > > Do you mean that while the driver creates the groups and files explicitly
> > > from it's different subsystems it should not explicitly remove each
> > > one of them because all of them should be removed at once (and
> > > recursively) when the device is being removed ?
> > Individual drivers should never add groups/files in sysfs, the driver
> > core should do it properly for you if you have everything set up
> > properly.  And yes, the driver core will automatically remove them as
> > well.
> > 
> > Please use the default groups attribute for your bus/subsystem and this
> > will happen automagically.
> 
> Googling for default groups attributes i found this - 
> https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-correctly/

Odd, mirror of the original article:

http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/

> Would this be what you suggest for us ? Specifically for our case the struct
> device's  groups  seems the right solution as different devices
> might have slightly diffreent sysfs attributes.

That's what the is_visable() callback in your attribute group is for, to
tell the kernel if an individual sysfs attribute should be created or
not.

thanks,

greg k-h
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH AUTOSEL 4.19 044/167] drm/amdgpu: validate user pitch alignment

2019-09-03 Thread Greg KH
On Tue, Sep 03, 2019 at 06:40:43PM +0200, Michel Dänzer wrote:
> On 2019-09-03 6:23 p.m., Sasha Levin wrote:
> > From: Yu Zhao 
> > 
> > [ Upstream commit 89f23b6efef554766177bf51aa754bce14c3e7da ]
> 
> Hold your horses!
> 
> This commit and c4a32b266da7bb702e60381ca0c35eaddbc89a6c had to be
> reverted, as they caused regressions. See commits
> 25ec429e86bb790e40387a550f0501d0ac55a47c &
> 92b0730eaf2d549fdfb10ecc8b71f34b9f472c12 .
> 
> 
> This isn't bolstering confidence in how these patches are selected...

The patch _itself_ said to be backported to the stable trees from 4.2
and newer.  Why wouldn't we be confident in doing this?

If the patch doesn't want to be backported, then do not add the cc:
stable line to it...

greg k-h


Re: [PATCH v3] drm/radeon: Fix EEH during kexec

2019-10-10 Thread Greg KH
On Thu, Oct 10, 2019 at 02:44:29PM -0500, KyleMahlkuch wrote:
> During kexec some adapters hit an EEH since they are not properly
> shut down in the radeon_pci_shutdown() function. Adding
> radeon_suspend_kms() fixes this issue.
> Enabled only on PPC because this patch causes issues on some other
> boards.
> 
> Signed-off-by: Kyle Mahlkuch 

Real email address please, with a '@' sign.

And your "From:" line did not match up with this :(

> ---
>  drivers/gpu/drm/radeon/radeon_drv.c | 14 ++
>  1 file changed, 14 insertions(+)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH AUTOSEL 5.5 155/542] drm/amdkfd: remove set but not used variable 'top_dev'

2020-02-14 Thread Greg KH
On Fri, Feb 14, 2020 at 10:42:27AM -0500, Sasha Levin wrote:
> From: zhengbin 
> 
> [ Upstream commit d191bd678153307573d615bb42da4fcca19fe477 ]
> 
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/gpu/drm/amd/amdkfd/kfd_iommu.c: In function kfd_iommu_device_init:
> drivers/gpu/drm/amd/amdkfd/kfd_iommu.c:65:30: warning: variable top_dev set 
> but not used [-Wunused-but-set-variable]
> 
> Reported-by: Hulk Robot 
> Fixes: 1ae99eab34f9 ("drm/amdkfd: Initialize HSA_CAP_ATS_PRESENT capability 
> in topology codes")
> Signed-off-by: zhengbin 
> Reviewed-by: Felix Kuehling 
> Signed-off-by: Felix Kuehling 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 3 ---
>  1 file changed, 3 deletions(-)

Unless all of these "unused bt set variable" patches are needed for
"real" fixes, there's no need to add them here as we are NOT building
the kernel with that option enabled any time soon from what I can tell.

So you can drop a ton of these patches from all of these AUTOSEL
branches please.

thanks,

greg kh
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Greg KH
On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:
> This is also causing log spam on 5.15.  It was included in 5.15.128 as
> commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
> found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while researching
> the problem.

Confused, what should we do here?


Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-10-09 Thread Greg KH
On Mon, Oct 09, 2023 at 02:46:40PM +0200, Christian König wrote:
> Am 07.10.23 um 11:50 schrieb Greg KH:
> > On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:
> > > This is also causing log spam on 5.15.  It was included in 5.15.128 as
> > > commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
> > > found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while 
> > > researching
> > > the problem.
> > Confused, what should we do here?
> 
> If this patch was backported to even more older kernels then please revert
> that immediately!

It only went to 5.10 and 5.15 and has been reverted from both of them
now.

thanks,

greg k-h


Re: [PATCH] r8169: fix ASPM-related issues on a number of systems with NIC version from RTL8168h

2023-11-08 Thread Greg KH
On Wed, Nov 08, 2023 at 11:34:00AM +0800, Li Ma wrote:
> From: Heiner Kallweit 
> 
> This effectively reverts 4b5f82f6aaef. On a number of systems ASPM L1
> causes tx timeouts with RTL8168h, see referenced bug report.
> 
> Fixes: 4b5f82f6aaef ("r8169: enable ASPM L1/L1.1 from RTL8168h")
> Cc: sta...@vger.kernel.org
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217814
> Signed-off-by: Heiner Kallweit 
> Signed-off-by: David S. Miller 
> (cherry picked from commit 90ca51e8c654699b672ba61aeaa418dfb3252e5e)
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 4 
>  1 file changed, 4 deletions(-)

Is this a proposed stable tree patch?  If so, what kernel(s) are you
wanting it applied to?

thanks,

greg k-h


Re: [PATCH] drm/amd/pm: replace 1-element arrays with flexible-array members

2023-11-08 Thread Greg KH
On Wed, Nov 08, 2023 at 08:54:35AM +0200, José Pekkarinen wrote:
> The following case seems to be safe to be replaced with a flexible array
> to clean up the added coccinelle warning. This patch will just do it.
> 
> drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h:76:38-63: WARNING use 
> flexible-array member instead 
> (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
> 
> Signed-off-by: José Pekkarinen 
> ---
>  drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h 
> b/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> index c7b61222d258..1ce4087005f0 100644
> --- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> +++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> @@ -73,7 +73,7 @@ struct smu8_register_index_data_pair {
>  
>  struct smu8_ih_meta_data {
>   uint32_t command;
> - struct smu8_register_index_data_pair register_index_value_pair[1];
> + struct smu8_register_index_data_pair register_index_value_pair[];

Did you just change this structure size without any need to change any
code as well?  How was this tested?

thanks,

greg k-h


Re: [PATCH] r8169: fix ASPM-related issues on a number of systems with NIC version from RTL8168h

2023-11-08 Thread Greg KH
On Wed, Nov 08, 2023 at 08:40:48AM +0100, Heiner Kallweit wrote:
> On 08.11.2023 08:05, Greg KH wrote:
> > On Wed, Nov 08, 2023 at 11:34:00AM +0800, Li Ma wrote:
> >> From: Heiner Kallweit 
> >>
> >> This effectively reverts 4b5f82f6aaef. On a number of systems ASPM L1
> >> causes tx timeouts with RTL8168h, see referenced bug report.
> >>
> >> Fixes: 4b5f82f6aaef ("r8169: enable ASPM L1/L1.1 from RTL8168h")
> >> Cc: sta...@vger.kernel.org
> >> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217814
> >> Signed-off-by: Heiner Kallweit 
> >> Signed-off-by: David S. Miller 
> >> (cherry picked from commit 90ca51e8c654699b672ba61aeaa418dfb3252e5e)
> >> ---
> >>  drivers/net/ethernet/realtek/r8169_main.c | 4 
> >>  1 file changed, 4 deletions(-)
> > 
> > Is this a proposed stable tree patch?  If so, what kernel(s) are you
> > wanting it applied to?
> > 
> This should have been sent neither to you nor sta...@vger.kernel.org.
> This patch has been applied to stable already, the mail is something
> AMD-internal it seems.

Then someone needs to seriously fix their scripts as it is very
confusing :(


Re: [PATCH] drm/amd/pm: replace 1-element arrays with flexible-array members

2023-11-09 Thread Greg KH
On Thu, Nov 09, 2023 at 10:43:50AM +0200, José Pekkarinen wrote:
> On 2023-11-08 09:29, Greg KH wrote:
> > On Wed, Nov 08, 2023 at 08:54:35AM +0200, José Pekkarinen wrote:
> > > The following case seems to be safe to be replaced with a flexible
> > > array
> > > to clean up the added coccinelle warning. This patch will just do it.
> > > 
> > > drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h:76:38-63:
> > > WARNING use flexible-array member instead 
> > > (https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays)
> > > 
> > > Signed-off-by: José Pekkarinen 
> > > ---
> > >  drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> > > b/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> > > index c7b61222d258..1ce4087005f0 100644
> > > --- a/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> > > +++ b/drivers/gpu/drm/amd/pm/powerplay/smumgr/smu8_smumgr.h
> > > @@ -73,7 +73,7 @@ struct smu8_register_index_data_pair {
> > > 
> > >  struct smu8_ih_meta_data {
> > >   uint32_t command;
> > > - struct smu8_register_index_data_pair register_index_value_pair[1];
> > > + struct smu8_register_index_data_pair register_index_value_pair[];
> > 
> > Did you just change this structure size without any need to change any
> > code as well?  How was this tested?
> 
> I didn't find any use of that struct member, if I missed
> something here, please let me know and I'll happily address any
> needed further work.

I don't think this is even a variable array.  It's just a one element
one, which is fine, don't be confused by the coccinelle "warning" here,
it's fired many false-positives and you need to verify this properly
with the driver authors first before changing anything.

In short, you just changed the size of this structure, are you _sure_
you can do that?  And yes, it doesn't look like this field is used, but
the structure is, so be careful.

thanks,

greg k-h


Re: Kernel 5.15.150 black screen with AMD Raven/Picasso GPU

2024-05-24 Thread Greg KH
On Thu, May 23, 2024 at 05:59:39PM +0200, Armin Wolf wrote:
> Am 23.05.24 um 15:13 schrieb Barry Kauler:
> 
> > On Wed, May 22, 2024 at 12:58 AM Armin Wolf  wrote:
> > > Am 20.05.24 um 18:22 schrieb Alex Deucher:
> > > 
> > > > On Sat, May 18, 2024 at 8:17 PM Armin Wolf  wrote:
> > > > > Am 17.05.24 um 03:30 schrieb Barry Kauler:
> > > > > 
> > > > > > Armin, Yifan, Prike,
> > > > > > I will top-post, so you don't have to scroll down.
> > > > > > After identifying the commit that causes black screen with my gpu, I
> > > > > > posted the result to you guys, on May 9.
> > > > > > It is now May 17 and no reply.
> > > > > > OK, I have now created a patch that reverts Yifan's commit, compiled
> > > > > > 5.15.158, and my gpu now works.
> > > > > > Note, the radeon module is not loaded, so it is not a factor.
> > > > > > I'm not a kernel developer. I have identified the culprit and it is 
> > > > > > up
> > > > > > to you guys to fix it, Yifan especially, as you are the person who 
> > > > > > has
> > > > > > created the regression.
> > > > > > I will attach my patch.
> > > > > > Regards,
> > > > > > Barry Kauler
> > > > > Hi,
> > > > > 
> > > > > sorry for not responding to your findings. I normally do not work 
> > > > > with GPU drivers,
> > > > > so i hoped one of the amdgpu developers would handle this.
> > > > > 
> > > > > I cceddri-de...@lists.freedesktop.org  and 
> > > > > amd-gfx@lists.freedesktop.org so that other
> > > > > amdgpu developers hear from this issue.
> > > > > 
> > > > > Thanks you for you persistence in finding the offending commit.
> > > > Likely this patch should not have been ported to 5.15 in the first
> > > > place.  The IOMMU requirements have been dropped from the driver for
> > > > the last few kernel versions so it is no longer relevant on newer
> > > > kernels.
> > > > 
> > > > Alex
> > > Barry, can you verify that the latest upstream kernel works on you device?
> > > If yes, then the commit itself is ok and just the backporting itself was 
> > > wrong.
> > > 
> > > Thanks,
> > > Armin Wolf
> > Armin,
> > The unmodified 6.8.1 kernel works ok.
> > I presume that patch was applied long before 6.8.1 got released and
> > only got backported to 5.15.x recently.
> > 
> > Regards,
> > Barry
> > 
> Great to hear, that means we only have to revert commit 56b522f46681 
> ("drm/amdgpu: init iommu after amdkfd device init")
> from the 5.15.y series.
> 
> I CCed the stable mailing list so that they can revert the offending commit.

Please submit the patch/revert that you wish to have applied to the tree
so we can have the correct information in it.  I have no idea what to do
here with this deep response thread as-is, sorry.

thanks,

greg k-h


Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-13 Thread Greg KH
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
> 
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
> 
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h


Re: [PATCH 21/39] drm/amd/display: Make DML2.1 P-State method force per stream

2024-06-21 Thread Greg KH
On Thu, Jun 20, 2024 at 10:11:27AM -0600, Alex Hung wrote:
> From: Dillon Varone 
> 
> [WHY & HOW]
> Currently the force only works for a single display, make it so it can
> be forced per stream.
> 
> Reviewed-by: Alvin Lee 
> Cc: Mario Limonciello 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Acked-by: Alex Hung 
> Signed-off-by: Dillon Varone 

When submitting patches from others, you too have to sign-off on the
patch.  Read the DCO in the documentation for details.

thanks,

greg k-h


Re: [PATCH v5.10] drm/amd/display: Wake DMCUB before executing GPINT commands

2024-04-16 Thread Greg KH
On Tue, Apr 16, 2024 at 02:43:47AM +, Zhu Wang wrote:
> From: Nicholas Kazlauskas 
> 
> stable inclusion
> from stable-v6.7.3
> commit2ef98c6d753a7 ("drm/amd/display: Wake DMCUB before executing 
> GPINT commands")
> category: bugfix
> bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9BV4C
> CVE: CVE-2023-52624
> 
> Reference: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ef98c6d753a744e333b7e34b9cf687040fba57d
> 
> 
> 
> [ Upstream commit e5ffd1263dd5b ("drm/amd/display: Wake DMCUB before 
> executing GPINT commands") ]
> 
> [Why]
> DMCUB can be in idle when we attempt to interface with the HW through
> the GPINT mailbox resulting in a system hang.
> 
> [How]
> Add dc_wake_and_execute_gpint() to wrap the wake, execute, sleep
> sequence.
> 
> If the GPINT executes successfully then DMCUB will be put back into
> sleep after the optional response is returned.
> 
> It functions similar to the inbox command interface.
> 
> Cc: Mario Limonciello 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Reviewed-by: Hansen Dsouza 
> Acked-by: Wayne Lin 
> Signed-off-by: Nicholas Kazlauskas 
> Tested-by: Daniel Wheeler 
> Signed-off-by: Alex Deucher 
> 
> This patch comes from following commits:
> 
>  115c7e7f0501 ("drm/amd/display: Add psr get_state call")
>  1d496907f1c5 ("drm/amd/display: Engage PSR synchronously")
>  34ba432c946d ("drm/amd/display: [FW Promotion] Release 0.0.44")
>  672251b223c1 ("drm/amd/display: [FW Promotion] Release 0.0.40")
>  04f3c88f0955 ("drm/amd/display: Retry getting PSR state if command times 
> out")
>  b30eda8d416c ("drm/amd/display: Add ETW log to dmub_psr_get_state")
>  f59a66c1915e ("drm/amd/display: use do-while-0 for DC_TRACE_LEVEL_MESSAGE()")
>  e97cc04fe0fb ("drm/amd/display: refactor dmub commands into single function")
>  522b9a5d5852 ("drm/amd/display: drain dmub inbox if queue is full")
>  9dce8c2a5f1b ("drm/amd/display: [FW Promotion] Release 0.0.161.0")
>  276641775848 ("drm/amd/display: [FW Promotion] Release 0.0.162.0")
>  8774029f76b9 ("drm/amd/display: Add DCN35 CLK_MGR")
>  65138eb72e1f ("drm/amd/display: Add DCN35 DMUB")
>  dc01c4b79bfe ("drm/amd/display: Update driver and IPS interop")
>  5b7954272ae9 ("drm/amd/display: [FW Promotion] Release 0.0.183.0")
>  da2d16fcdda3 ("drm/amd/display: Fix IPS handshake for idle optimizations")
>  5e8a0d3598b4 ("drm/amd/display: Negate IPS allow and commit bits")
>  820c3870c491 ("drm/amd/display: Refactor DMCUB enter/exit idle interface")
>  2ef98c6d753a ("drm/amd/display: Wake DMCUB before executing GPINT commands")
> 
> Signed-off-by: Zhu Wang 

I'm confused, what are we supposed to do with this?

greg k-h


Re: [PATCH v6.6] drm/amd/display: Wake DMCUB before executing GPINT commands

2024-04-16 Thread Greg KH
On Tue, Apr 16, 2024 at 03:52:40AM +, Zhu Wang wrote:
> From: Nicholas Kazlauskas 
> 
> stable inclusion
> from stable-v6.7.3
> commit 2ef98c6d753a744e333b7e34b9cf687040fba57d
> category: bugfix
> bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9BV4C
> CVE: CVE-2023-52624
> 
> Reference: 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ef98c6d753a744e333b7e34b9cf687040fba57d
> 
> 
> 
> [ Upstream commit e5ffd1263dd5b44929c676171802e7b6af483f21 ]
> 
> [Why]
> DMCUB can be in idle when we attempt to interface with the HW through
> the GPINT mailbox resulting in a system hang.
> 
> [How]
> Add dc_wake_and_execute_gpint() to wrap the wake, execute, sleep
> sequence.
> 
> If the GPINT executes successfully then DMCUB will be put back into
> sleep after the optional response is returned.
> 
> It functions similar to the inbox command interface.
> 
> Cc: Mario Limonciello 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Reviewed-by: Hansen Dsouza 
> Acked-by: Wayne Lin 
> Signed-off-by: Nicholas Kazlauskas 
> Tested-by: Daniel Wheeler 
> Signed-off-by: Alex Deucher 
> 
> This commit comes from following commits:
> 
>  8774029f76b9 ("drm/amd/display: Add DCN35 CLK_MGR")
>  65138eb72e1f ("drm/amd/display: Add DCN35 DMUB")
>  dc01c4b79bfe ("drm/amd/display: Update driver and IPS interop")
>  820c3870c491 ("drm/amd/display: Refactor DMCUB enter/exit idle interface")
>  2ef98c6d753a ("drm/amd/display: Wake DMCUB before executing GPINT commands")

Why are you putting multiple commits together and not just submitting
the individual ones?  And what is this for?

confused,

greg k-h


Re: 答复: [PATCH v6.6] drm/amd/display: Wake DMCUB before executing GPINT commands

2024-04-18 Thread Greg KH
On Thu, Apr 18, 2024 at 01:51:33AM +, wangzhu wrote:
> Hi Greg, thanks for your reply. Since there is no patch to fix CVE-2023-52624 
> in linux-5.10, there is a patch in the linux-6.7 branch, its commit is 
> 2ef98c6d753a744e333b7e34b9cf687040fba57d ("drm/amd/display: Wake DMCUB before 
> executing GPINT commands"). When we apply this patch to linux-5.10, there are 
> lots of conflicts, and we found there are lots of dependent patches, and lots 
> of patches are not proposed to fix the cve, they are presented to add new 
> functions of the kernel.
> 
> My commit comes from nearly 20 patches. For each patch, not all of its 
> content is meant to fix the cve, so I just get the part which is helpful to 
> fix. It is why I don't present the patches one by one instead of merging them 
> into one big patch.
> 




This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.



Also, yes, we want to take the original commits, as you see on the
stable mailing list, submit a series of commits that are in Linus's tree
that resolve the issue (and send them to the proper people, as explained
in the link above.)

thanks,

greg k-h


Re: 答复: [PATCH v5.10] drm/amd/display: Wake DMCUB before executing GPINT commands

2024-04-18 Thread Greg KH
On Thu, Apr 18, 2024 at 03:19:46AM +, wangzhu wrote:
> Hi Greg, thanks for your reply. Since there is no patch to fix CVE-2023-52624 
> in linux-5.10, there is a patch in the linux-6.7 branch to fix it, its commit 
> is 2ef98c6d753a744e333b7e34b9cf687040fba57d ("drm/amd/display: Wake DMCUB 
> before executing GPINT commands"). When we apply this patch to linux-5.10, 
> there are lots of conflicts, and we found there are lots of dependent 
> patches, we do not apply all these patches since some are not meant to fix 
> the cve, so we just get part of them, and for each patch we just get the part 
> which is helpful to fix.

Why do you think that specific CVE is relevent to the 5.10.y kernel?
And if so, what about the ones newer than that, as you know I can not
take patches only for older kernels, that would leave newer branches
vulnerable to the same bug.

So please submit fixes for all branches that you wish to see resolved at
the same time.

thanks,

greg k-h


Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Greg KH
On Fri, May 03, 2024 at 11:30:50AM +0530, Krishna Kurapati PSSNV wrote:
> 
> 
> On 5/3/2024 10:42 AM, Greg KH wrote:
> > Ok, I'm getting tired of seeing these for the USB portion of the tree,
> > so I went to look for:
> > 
> > On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:
> > > |-- arc-randconfig-002-20240503
> > > |   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used
> > 
> > This warning (same for all arches), but can't seem to find it anywhere.
> > 
> > Any hints as to where it would be?
> > 
> 
> Hi Greg,
> 
>  I think the hw_mode was not removed in hs_phy_setup and left unused.
> 
>  Thinh reported the same when there was a merge conflict into linux next
> (that the hw_mode variable was removed in ss_phy_setup and should be removed
> in hs_phy_setup as well):
> 
> https://lore.kernel.org/all/20240426213923.tyeddub4xszyp...@synopsys.com/
> 
>  Perhaps that was missed ?

Must have been.  I need it in a format that it can be applied in (a
2-way diff can't apply...)

thanks,

greg k-h


Re: [linux-next:master] BUILD REGRESSION 9c6ecb3cb6e20c4fd7997047213ba0efcf9ada1a

2024-05-03 Thread Greg KH
Ok, I'm getting tired of seeing these for the USB portion of the tree,
so I went to look for:

On Fri, May 03, 2024 at 04:44:42AM +0800, kernel test robot wrote:
> |-- arc-randconfig-002-20240503
> |   `-- drivers-usb-dwc3-core.c:warning:variable-hw_mode-set-but-not-used

This warning (same for all arches), but can't seem to find it anywhere.

Any hints as to where it would be?

thanks,

greg k-h


Re: [RFC PATCH] drm/amd/display: fix bandwidth validation failure on DCN 2.1

2024-01-03 Thread Greg KH
On Wed, Jan 03, 2024 at 09:44:18AM -0500, Hamza Mahfooz wrote:
> On 12/29/23 11:25, Melissa Wen wrote:
> > IGT `amdgpu/amd_color/crtc-lut-accuracy` fails right at the beginning of
> > the test execution, during atomic check, because DC rejects the
> > bandwidth state for a fb sizing 64x64. The test was previously working
> > with the deprecated dc_commit_state(). Now using
> > dc_validate_with_context() approach, the atomic check needs to perform a
> > full state validation. Therefore, set fast_validation to false in the
> > dc_validate_global_state call for atomic check.
> > 
> > Fixes: b8272241ff9d ("drm/amd/display: Drop dc_commit_state in favor of 
> > dc_commit_streams")
> > Signed-off-by: Melissa Wen 
> > ---
> > 
> > Hi,
> > 
> > It's a long story. I was inspecting this bug report:
> > - https://gitlab.freedesktop.org/drm/amd/-/issues/2016
> > and noticed the IGT test `igt@amdgpu/amd_color@crtc-lut-accuracy`
> > mentioned there wasn't even being executed on a laptop with DCN 2.1
> > (HP HP ENVY x360 Convertible 13-ay1xxx/8929). The test fails right at
> > the beginning due to an atomic check rejection, as below:
> > 
> > Starting subtest: crtc-lut-accuracy
> > (amd_color:14772) igt_kms-CRITICAL: Test assertion failure function 
> > igt_display_commit_atomic, file ../lib/igt_kms.c:4530:
> > (amd_color:14772) igt_kms-CRITICAL: Failed assertion: ret == 0
> > (amd_color:14772) igt_kms-CRITICAL: Last errno: 22, Invalid argument
> > (amd_color:14772) igt_kms-CRITICAL: error: -22 != 0
> > Stack trace:
> >#0 ../lib/igt_core.c:1989 __igt_fail_assert()
> >#1 [igt_display_commit_atomic+0x44]
> >#2 ../tests/amdgpu/amd_color.c:159 __igt_uniquereal_main395()
> >#3 ../tests/amdgpu/amd_color.c:395 main()
> >#4 ../sysdeps/nptl/libc_start_call_main.h:74 __libc_start_call_main()
> >#5 ../csu/libc-start.c:128 __libc_start_main@@GLIBC_2.34()
> >#6 [_start+0x21]
> > Subtest crtc-lut-accuracy failed.
> > 
> > Checking dmesg, we can see that a bandwidth validation failure causes
> > the atomic check rejection:
> > 
> > [  711.147663] amdgpu :04:00.0: [drm] Mode Validation Warning: Unknown 
> > Status failed validation.
> > [  711.147667] [drm:amdgpu_dm_atomic_check [amdgpu]] DC global validation 
> > failure: Bandwidth validation failure (BW and Watermark) (13)
> > [  711.147772] [drm:amdgpu_irq_dispatch [amdgpu]] Unregistered interrupt 
> > src_id: 243 of client_id:8
> > [  711.148033] [drm:amdgpu_dm_atomic_check [amdgpu]] Atomic check failed 
> > with err: -22
> > 
> > I also noticed that the atomic check doesn't fail if I change the fb
> > width and height used in the test from 64 to 66, and I can get the test
> > execution back (and with success). However, I recall that all test cases
> > of IGT `amd_color` were working in the past, so I bisected and found the
> > first bad commit:
> > 
> > b8272241ff9d drm/amd/display: Drop dc_commit_state in favor of 
> > dc_commit_streams
> > 
> > Bringing the `dc_commit_state` machinery back also prevents the
> > bandwidth validation failure, but the commit above says
> > dc_commit_streams validation is more complete than dc_commit_state, so I
> > discarded this approach.
> > 
> > After some debugging and code inspection, I found out that avoiding fast
> > validation on dc_validate_global_state during atomic check solves the
> > issue, but I'm not sure if this change may affect performance. I
> > compared exec time of some IGT tests and didn't see any differences, but
> > I recognize it doesn't provide enough evidence.
> > 
> > What do you think about this change? Should I examine other things? Do
> > you see any potential issue that I should investigate? Could you
> > recommend a better approach to assess any side-effect of not enabling
> > fast validation in the atomic check?
> > 
> > Please, let me know your thoughts.
> 
> We shouldn't be doing fast updates when lock_and_validation_needed is
> true, so this seems to be correct.
> 
> Which is to say, applied, thanks!
> 
> Cc: sta...@vger.kernel.org



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [PATCH] drm/amd/display: Fix sending VSC (+ colorimetry) packets for DP/eDP displays without PSR

2024-01-04 Thread Greg KH
On Thu, Jan 04, 2024 at 08:54:19AM -0500, Hamza Mahfooz wrote:
> On 1/3/24 14:17, Joshua Ashton wrote:
> > Thanks! Is it possible for us to get this backported too?
> 
> Sure thing.
> 
> Cc: sta...@vger.kernel.org



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-21 Thread Greg KH
On Fri, Aug 18, 2023 at 11:26:11AM +0800, Evan Quan wrote:
>  drivers/base/Makefile |   1 +
>  drivers/base/wbrf.c   | 280 ++

Why is a wifi-specific thing going into drivers/base/?

confused,

greg k-h


Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-21 Thread Greg KH
On Fri, Aug 18, 2023 at 05:49:14PM -0500, Limonciello, Mario wrote:
> 
> 
> On 8/18/2023 4:24 PM, Greg KH wrote:
> > On Fri, Aug 18, 2023 at 11:26:11AM +0800, Evan Quan wrote:
> > >   drivers/base/Makefile |   1 +
> > >   drivers/base/wbrf.c   | 280 ++
> > 
> > Why is a wifi-specific thing going into drivers/base/?
> > 
> > confused,
> > 
> > greg k-h
> 
> The original problem statement was at a high level 'there can be
> interference between different devices operating at high frequencies'. The
> original patches introduced some ACPI library code that enabled a mitigated
> for this interference between mac80211 devices and amdgpu devices.
> 
> Andrew Lunn wanted to see something more generic, so the series has morphed
> into base code for things to advertise frequencies in use and other things
> to listen to frequencies in use and react.
> 
> The idea is supposed to be that if the platform knows that these mitigations
> are needed then the producers send the frequencies in use, consumers react
> to them.  The AMD implementation of getting this info from the platform
> plugs into the base code (patch 2).
> 
> If users don't want this behavior they can turn it off on kernel command
> line.
> 
> If the platform doesn't know mitigations are needed but user wants to turn
> them on anyway they can turn it on kernel command line.

That's all fine, I don't object to that at all.  But bus/device-specific
stuff should NOT be in drivers/base/ if at all possible (yes, we do have
some exceptions with hypervisor.c and memory and cpu stuff) but for a
frequency thing like this, why can't it live with the other
wifi/frequency code in drivers/net/wireless/?

In other words, what's the benefit to having me be the maintainer of
this, someone who knows nothing about this subsystem, other than you
passing off that work to me?  :)

thanks,

greg k-h


Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-22 Thread Greg KH
On Mon, Aug 21, 2023 at 10:13:45PM -0500, Limonciello, Mario wrote:
> So I wonder if the right answer is to put it in drivers/net/wireless
> initially and if we come up with a need later for non wifi producers we can
> discuss moving it at that time.

Please do so.

thanks,

greg k-h


Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-23 Thread Greg KH
On Wed, Aug 23, 2023 at 10:53:43AM +0300, Kalle Valo wrote:
> Greg KH  writes:
> 
> > On Mon, Aug 21, 2023 at 10:13:45PM -0500, Limonciello, Mario wrote:
> >> So I wonder if the right answer is to put it in drivers/net/wireless
> >> initially and if we come up with a need later for non wifi producers we can
> >> discuss moving it at that time.
> >
> > Please do so.
> 
> Sorry, I haven't been able to follow the discussion in detail but just a
> quick comment: if there's supposed to be code which is shared with
> different wifi drivers then drivers/net/wireless sounds wrong,
> net/wireless or net/mac80211 would be more approriate location.

That's fine with me as well, just not drivers/core/ please :)

thanks,

greg k-h


Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-08-31 Thread Greg KH
On Thu, Aug 31, 2023 at 12:27:27PM +0200, Christian König wrote:
> Am 30.08.23 um 20:53 schrieb Chia-I Wu:
> > On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin  wrote:
> > > From: Lang Yu 
> > > 
> > > [ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
> > > 
> > > When using cpu to update page tables, vm update fences are unused.
> > > Install stub fence into these fence pointers instead of NULL
> > > to avoid NULL dereference when calling dma_fence_wait() on them.
> > > 
> > > Suggested-by: Christian König 
> > > Signed-off-by: Lang Yu 
> > > Reviewed-by: Christian König 
> > > Signed-off-by: Alex Deucher 
> > > Signed-off-by: Sasha Levin 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > We start getting this warning spew on chromeos
> 
> Yeah because the older kernels still kept track of the last VM fence in the
> syncobj.
> 
> This patch here should probably not have been back ported.
> 
> Why was that done anyway? The upstream commit doesn't have a CC stable and
> this is only a bug fix for a new feature not present on older kernels.

It is part of the AUTOSEL process.



Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-08-31 Thread Greg KH
On Wed, Aug 30, 2023 at 11:53:29AM -0700, Chia-I Wu wrote:
> On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin  wrote:
> >
> > From: Lang Yu 
> >
> > [ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
> >
> > When using cpu to update page tables, vm update fences are unused.
> > Install stub fence into these fence pointers instead of NULL
> > to avoid NULL dereference when calling dma_fence_wait() on them.
> >
> > Suggested-by: Christian König 
> > Signed-off-by: Lang Yu 
> > Reviewed-by: Christian König 
> > Signed-off-by: Alex Deucher 
> > Signed-off-by: Sasha Levin 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> We start getting this warning spew on chromeos, likely from
> dma_fence_is_later because the stub fence is on a different timeline:
> 
> [  273.334767] WARNING: CPU: 1 PID: 13383 at
> include/linux/dma-fence.h:478 amdgpu_sync_keep_later+0x95/0xbd
> [  273.334769] Modules linked in: snd_seq_dummy snd_seq snd_seq_device
> bridge stp llc tun vhost_vsock vhost vhost_iotlb
> vmw_vsock_virtio_transport_common vsock 8021q veth lzo_rle
> lzo_compress zram uinput snd_acp_sof_mach snd_acp_mach snd_soc_dmic
> xt_cgroup rfcomm xt_MASQUERADE cmac algif_hash algif_skcipher af_alg
> btusb btrtl btintel btbcm rtw89_8852ae rtw89_pci rtw89_8852a
> rtw89_core snd_sof_amd_renoir snd_sof_xtensa_dsp snd_sof_amd_acp
> snd_acp_pci snd_acp_config snd_soc_acpi snd_pci_acp3x snd_sof_pci
> snd_sof snd_hda_codec_hdmi snd_sof_utils snd_hda_intel mac80211
> snd_intel_dspcfg snd_hda_codec cros_ec_typec snd_hwdep roles
> snd_hda_core typec snd_soc_rt5682s snd_soc_rt1019 snd_soc_rl6231
> ip6table_nat i2c_piix4 fuse bluetooth ecdh_generic ecc cfg80211
> iio_trig_sysfs cros_ec_lid_angle cros_ec_sensors cros_ec_sensors_core
> industrialio_triggered_buffer kfifo_buf industrialio cros_ec_sensorhub
> r8153_ecm cdc_ether usbnet r8152 mii uvcvideo videobuf2_vmalloc
> videobuf2_memops videobuf2_v4l2
> [  273.334795]  videobuf2_common joydev
> [  273.334799] CPU: 1 PID: 13383 Comm: chrome:cs0 Tainted: GW
>5.10.192-23384-g3d3f0f0c5e4f #1
> fe1e7e3b7510aa7b8e01701478119255f825a36f
> [  273.334800] Hardware name: Google Dewatt/Dewatt, BIOS
> Google_Dewatt.14500.347.0 03/30/2023
> [  273.334802] RIP: 0010:amdgpu_sync_keep_later+0x95/0xbd
> [  273.334804] Code: 00 00 b8 01 00 00 00 f0 0f c1 43 38 85 c0 74 26
> 8d 48 01 09 c1 78 24 49 89 1e 5b 41 5e 5d c3 cc cc cc cc e8 4a 94 ac
> ff eb ce <0f> 0b 49 8b 06 48 85 c0 75 af eb c2 be 02 00 00 00 48 8d 7b
> 38 e8
> [  273.334805] RSP: 0018:b222c1817b50 EFLAGS: 00010293
> [  273.334807] RAX: 89bfc838 RBX: 8aa425e9ed00 RCX: 
> 
> [  273.334808] RDX: 8aa426156a98 RSI: 8aa425e9ed00 RDI: 
> 8aa432518918
> [  273.334810] RBP: b222c1817b60 R08: 8aa43ca6c0a0 R09: 
> 8aa33af3c9a0
> [  273.334811] R10: fcf8c5986600 R11: 87a00fce R12: 
> 0098
> [  273.334812] R13: 005e2a00 R14: 8aa432518918 R15: 
> 
> [  273.334814] FS:  7e70f8694640() GS:8aa4e608()
> knlGS:
> [  273.334816] CS:  0010 DS:  ES:  CR0: 80050033
> [  273.334817] CR2: 7e70ea049020 CR3: 000178e6e000 CR4: 
> 00750ee0
> [  273.334818] PKRU: 5554
> [  273.334819] Call Trace:
> [  273.334822]  ? __warn+0xa3/0x131
> [  273.334824]  ? amdgpu_sync_keep_later+0x95/0xbd
> [  273.334826]  ? report_bug+0x97/0xfa
> [  273.334829]  ? handle_bug+0x41/0x66
> [  273.334832]  ? exc_invalid_op+0x1b/0x72
> [  273.334835]  ? asm_exc_invalid_op+0x12/0x20
> [  273.334837]  ? native_sched_clock+0x9a/0x9a
> [  273.334840]  ? amdgpu_sync_keep_later+0x95/0xbd
> [  273.334843]  amdgpu_sync_vm_fence+0x23/0x39
> [  273.334846]  amdgpu_cs_ioctl+0x1782/0x1e56
> [  273.334851]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
> [  273.334854]  drm_ioctl_kernel+0xdf/0x150
> [  273.334858]  drm_ioctl+0x1f5/0x3d2
> [  273.334928]  ? amdgpu_cs_report_moved_bytes+0x5f/0x5f
> [  273.334932]  amdgpu_drm_ioctl+0x49/0x81
> [  273.334935]  __x64_sys_ioctl+0x7d/0xc8
> [  273.334937]  do_syscall_64+0x42/0x54
> [  273.334939]  entry_SYSCALL_64_after_hwframe+0x4a/0xaf
> [  273.334941] RIP: 0033:0x7e70ff797649
> [  273.334943] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10
> c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00
> 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1d 48 8b 45 c8 64 48 2b 04 25 28
> 00 00
> [  273.334945] RSP: 002b:7e70f8693170 EFLAGS: 0246 ORIG_RAX:
> 0010
> [  273.334947] RAX: ffda RBX:  RCX: 
> 7e70ff797649
> [  273.334948] RDX: 7e70f8693248 RSI: c0186444 RDI: 
> 0013
> [  273.334950] RBP: 7e70f86931c0 R08: 7e70f8693350 R09: 
> 7e70f8693340
> [  273.334951] R10: 000a R11: 0246 R12: 
> c0186444
> [  273.334952] R13: 7e70f8693380 R14: 7e70f8693248 R15: 
> 

Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-08-31 Thread Greg KH
On Thu, Aug 31, 2023 at 03:26:28PM +0200, Christian König wrote:
> Am 31.08.23 um 12:56 schrieb Greg KH:
> > On Thu, Aug 31, 2023 at 12:27:27PM +0200, Christian König wrote:
> > > Am 30.08.23 um 20:53 schrieb Chia-I Wu:
> > > > On Sun, Jul 23, 2023 at 6:24 PM Sasha Levin  wrote:
> > > > > From: Lang Yu 
> > > > > 
> > > > > [ Upstream commit 187916e6ed9d0c3b3abc27429f7a5f8c936bd1f0 ]
> > > > > 
> > > > > When using cpu to update page tables, vm update fences are unused.
> > > > > Install stub fence into these fence pointers instead of NULL
> > > > > to avoid NULL dereference when calling dma_fence_wait() on them.
> > > > > 
> > > > > Suggested-by: Christian König 
> > > > > Signed-off-by: Lang Yu 
> > > > > Reviewed-by: Christian König 
> > > > > Signed-off-by: Alex Deucher 
> > > > > Signed-off-by: Sasha Levin 
> > > > > ---
> > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 --
> > > > >1 file changed, 4 insertions(+), 2 deletions(-)
> > > > We start getting this warning spew on chromeos
> > > Yeah because the older kernels still kept track of the last VM fence in 
> > > the
> > > syncobj.
> > > 
> > > This patch here should probably not have been back ported.
> > > 
> > > Why was that done anyway? The upstream commit doesn't have a CC stable and
> > > this is only a bug fix for a new feature not present on older kernels.
> > It is part of the AUTOSEL process.
> 
> Could we prevent patches from being backported by adding a Fixes: tag?

Yes, that will show exactly where the patch should be backported to.

thanks,

greg k-h


Re: [PATCH AUTOSEL 5.10 13/22] drm/amdgpu: install stub fence into potential unused fence pointers

2023-09-12 Thread Greg KH
On Sun, Sep 10, 2023 at 03:43:01PM -0500, Bryan Jennings wrote:
> This is also causing log spam on 5.15.  It was included in 5.15.128 as
> commit 4921792e04f2125b5eadef9dbe9417a8354c7eff.  I encountered this and
> found https://gitlab.freedesktop.org/drm/amd/-/issues/2820 while researching
> the problem.

Now reverted, thanks.

greg k-h


Re: [REGRESSION] No image on 4k display port displays connected through usb-c dock in kernel 6.10

2024-07-30 Thread Greg KH
On Tue, Jul 30, 2024 at 05:56:42AM +, Lin, Wayne wrote:
> [Public]
> 
> Hi,
> Thanks for the report.
> 
> Patch fa57924c76d995 ("drm/amd/display: Refactor function 
> dm_dp_mst_is_port_support_mode()")
> is kind of correcting problems causing by commit:
> 4df96ba6676034 ("drm/amd/display: Add timing pixel encoding for mst mode 
> validation")
> 
> Sorry if it misses fixes tag and would suggest to backport to fix it. Thanks!

Please submit a backported version to the sta...@vger.kernel.org list
and we will be glad to consider it.

thanks,

greg k-h


Re: [PATCH] drm/amd/display: fix corruption with high refresh rates on DCN 3.0

2024-07-30 Thread Greg KH
On Mon, Jul 29, 2024 at 10:31:38AM +0200, Linux regression tracking (Thorsten 
Leemhuis) wrote:
> On 16.07.24 19:33, Alex Deucher wrote:
> > This reverts commit bc87d666c05a13e6d4ae1ddce41fc43d2567b9a2 and the
> > register changes from commit 6d4279cb99ac4f51d10409501d29969f687ac8dc.
> > 
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3478
> > Cc: mikhail.v.gavri...@gmail.com
> > Cc: Rodrigo Siqueira 
> > Signed-off-by: Alex Deucher 
> > [...]
> 
> Hi Greg, quick note, as I don't see the quoted patch in your 6.10-queue
> yet: you might want to pick up e3615bd198289f ("drm/amd/display: fix
> corruption with high refresh rates on DCN 3.0") [v6.11-rc1, contains
> table tag] rather sooner than later for 6.10.y, as the problem
> apparently hit a few people and even made the news:
> https://lore.kernel.org/all/20240723042420.GA1455@sol.localdomain/
> https://www.reddit.com/r/archlinux/comments/1eaxjzo/linux_610_causes_screen_flicker_on_amd_gpus/

Thanks, now queued up.

greg k-h


Re: [PATCH 6.10] drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()

2024-08-12 Thread Greg KH
On Tue, Jul 30, 2024 at 08:53:39PM +0200, Kevin Holm wrote:
> From: Wayne Lin 
> 
> [ Upstream commit fa57924c76d995e87ca3533ec60d1d5e55769a27 ]
> 
> [Why]
> dm_dp_mst_is_port_support_mode() is a bit not following the original design 
> rule and cause
> light up issue with multiple 4k monitors after mst dsc hub.
> 
> [How]
> Refactor function dm_dp_mst_is_port_support_mode() a bit to solve the light 
> up issue.
> 
> Reviewed-by: Jerry Zuo 
> Acked-by: Zaeem Mohamed 
> Signed-off-by: Wayne Lin 
> Tested-by: Daniel Wheeler 
> Signed-off-by: Alex Deucher 
> [ke...@holm.dev: Resolved merge conflict in .../amdgpu_dm_mst_types.c]
> Fixes: 4df96ba6676034 ("drm/amd/display: Add timing pixel encoding for mst 
> mode validation")
> Link: 
> https://lore.kernel.org/stable/d74a7768e957e6ce88c27a5bece0c64dff132...@holm.dev/T/#u
> Signed-off-by: Kevin Holm 
> ---
> I resolved the merge conflict so that, after this patch is applied to the
> linux-6.10.y branch of the stable git repository, the resulting function
> dm_dp_mst_is_port_support_mode (and also the new function 
> dp_get_link_current_set_bw) is identical to the original commit.
> 
> I've confirmed that it fixes the regression I reported for my use case.

Now queued up, thanks.

greg k-h


AMD drm patch workflow is broken for stable trees

2024-08-13 Thread Greg KH
Hi all,

As some of you have noticed, there's a TON of failure messages being
sent out for AMD gpu driver commits that are tagged for stable
backports.  In short, you all are doing something really wrong with how
you are tagging these.

Please fix it up to NOT have duplicates in multiple branches that end up
in Linus's tree at different times.  Or if you MUST do that, then give
us a chance to figure out that it IS a duplicate.  As-is, it's not
working at all, and I think I need to just drop all patches for this
driver that are tagged for stable going forward and rely on you all to
provide a proper set of backported fixes when you say they are needed.

Again, what you are doing today is NOT ok and is broken.  Please fix.

greg k-h


Re: AMD drm patch workflow is broken for stable trees

2024-08-15 Thread Greg KH
On Wed, Aug 14, 2024 at 04:39:29PM -0400, Felix Kuehling wrote:
> On 2024-08-12 11:00, Greg KH wrote:
> > Hi all,
> > 
> > As some of you have noticed, there's a TON of failure messages being
> > sent out for AMD gpu driver commits that are tagged for stable
> > backports.  In short, you all are doing something really wrong with how
> > you are tagging these.
> Hi Greg,
> 
> I got notifications about one KFD patch failing to apply on six branches
> (6.10, 6.6, 6.1, 5.15, 5.10 and 5.4). The funny thing is, that you already
> applied this patch on two branches back in May. The emails had a suspicious
> looking date in the header (Sep 17, 2001). I wonder if there was some date
> glitch that caused a whole bunch of patches to be re-sent to stable somehow:
> 
>-- original commit in Linus's tree
>-- From 24e82654e98e96cece5d8b919c522054456eeec6 Mon
>Sep 17 00:00:00 2001 From: Alex Deucher
>Date: Sun, 14 Apr 2024 13:06:39 -0400

That's the real date, ignore the 2001 thing, that's from git.


>Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page
>with large pages ...
> 
> On 6.1 and 6.6, the patch was already applied by you in May:
> 
>$ git log --pretty=fuller stable/linux-6.6.y --grep "drm/amdkfd: don't 
> allow mapping the MMIO HDP page with large pages"
>commit 4b4cff994a27ebf7bd3fb9a798a1cdfa8d01b724
>Author: Alex Deucher 
>AuthorDate: Sun Apr 14 13:06:39 2024 -0400
>Commit: Greg Kroah-Hartman 
>CommitDate: Fri May 17 12:02:34 2024 +0200
> 
> drm/amdkfd: don't allow mapping the MMIO HDP page with large pages
>...
> 
> On 6.10 it was already upstream.

Then why did we have it in the tree again with different commit ids?
That's the issue here, and the major confusion as there is no way for us
to determine if this is a commit that has been in the tree already or if
it's just a normal failure.

> On 5.4-5.15 it doesn't apply because of conflicts. I can resolve those and
> send the fixed patches out for you.

If it is needed in those branches, please do so.

thanks,

greg k-h


Re: AMD drm patch workflow is broken for stable trees

2024-08-15 Thread Greg KH
On Wed, Aug 14, 2024 at 05:30:08PM -0400, Alex Deucher wrote:
> On Wed, Aug 14, 2024 at 4:55 PM Felix Kuehling  wrote:
> >
> > On 2024-08-12 11:00, Greg KH wrote:
> > > Hi all,
> > >
> > > As some of you have noticed, there's a TON of failure messages being
> > > sent out for AMD gpu driver commits that are tagged for stable
> > > backports.  In short, you all are doing something really wrong with how
> > > you are tagging these.
> > Hi Greg,
> >
> > I got notifications about one KFD patch failing to apply on six branches
> > (6.10, 6.6, 6.1, 5.15, 5.10 and 5.4). The funny thing is, that you
> > already applied this patch on two branches back in May. The emails had a
> > suspicious looking date in the header (Sep 17, 2001). I wonder if there
> > was some date glitch that caused a whole bunch of patches to be re-sent
> > to stable somehow:
> 
> I think the crux of the problem is that sometimes patches go into
> -next with stable tags and they end getting taken into -fixes as well
> so after the merge window they end up getting picked up for stable
> again.  Going forward, if they land in -next, I'll cherry-pick -x the
> changes into -fixes so there is better traceability.

Please do so, and also work to not have duplicate commits like this in
different branches.  Git can handle merges quite well, please use it.

If this shows up again in the next -rc1 merge window without any
changes, I'll have to just blackhole all amd drm patches going forward
until you all tell me you have fixed your development process.

thanks,

greg k-h


Re: [PATCH 6.10] drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()

2024-08-19 Thread Greg KH
On Sat, Aug 17, 2024 at 10:30:41PM +0200, Kevin Holm wrote:
> 
> 
> On 17.08.24 10:42, Greg KH wrote:
> > On Tue, Jul 30, 2024 at 08:53:39PM +0200, Kevin Holm wrote:
> > > From: Wayne Lin 
> > > 
> > > [ Upstream commit fa57924c76d995e87ca3533ec60d1d5e55769a27 ]
> > > 
> > > [Why]
> > > dm_dp_mst_is_port_support_mode() is a bit not following the original 
> > > design rule and cause
> > > light up issue with multiple 4k monitors after mst dsc hub.
> > > 
> > > [How]
> > > Refactor function dm_dp_mst_is_port_support_mode() a bit to solve the 
> > > light up issue.
> > > 
> > > Reviewed-by: Jerry Zuo 
> > > Acked-by: Zaeem Mohamed 
> > > Signed-off-by: Wayne Lin 
> > > Tested-by: Daniel Wheeler 
> > > Signed-off-by: Alex Deucher 
> > > [ke...@holm.dev: Resolved merge conflict in .../amdgpu_dm_mst_types.c]
> > > Fixes: 4df96ba6676034 ("drm/amd/display: Add timing pixel encoding for 
> > > mst mode validation")
> > > Link: 
> > > https://lore.kernel.org/stable/d74a7768e957e6ce88c27a5bece0c64dff132...@holm.dev/T/#u
> > > Signed-off-by: Kevin Holm 
> > > ---
> > > I resolved the merge conflict so that, after this patch is applied to the
> > > linux-6.10.y branch of the stable git repository, the resulting function
> > > dm_dp_mst_is_port_support_mode (and also the new function
> > > dp_get_link_current_set_bw) is identical to the original commit.
> > > 
> > > I've confirmed that it fixes the regression I reported for my use case.
> > 
> > And it turns out this change breaks the arm and arm64 builds.  I tried
> > to fix it up by applying the fixup afterward for this file, but it's
> > just too much of a mess to unwind this, so I'm going to have to revert
> > this now, sorry.
> That sucks, sorry for the problems my patch caused. :(
> 
> > See:
> > 
> > https://lore.kernel.org/r/b27c5434-f1b1-4697-985b-91bb3e9a2...@roeck-us.net
> > for details.
> I unfortunately don't know the amdgpu driver and kernel code in general 
> enough to help fix
> that. The back-ported patch I send was my first patch to the kernel.
> 
> In the email thread where I reported the problem I send a patch that reverts
> 4df96ba6676034 ("drm/amd/display: Add timing pixel encoding for mst mode 
> validation") to
> fix the problem that way [1]. I've included a copy of that below.
> I've tested that it still applies to 6.10.6-rc3 without conflicts and 
> compiles for me. I
> could not test if the 6.10.6-rc3 with the revert applied fixes the problem as 
> I'm
> traveling and don't have access to my normal setup. I can only say that 
> reverting it on
> v6.10.2 fixed the problem for me.
> 
> I don't know how to compile for other architectures so I did not test that.
> 
> Not sure what would be best, reverting the problem commit so the regression 
> is fixed in
> the 6.10 stable kernel (and maybe breaking something else?) or waiting for 
> someone at AMD
> with better knowledge of the amdgpu driver to back-port the fixing commit in 
> a non-broken
> way.

Yes, this is up to the amd developers now, I suggest you work with them
to get this resolved please.

Or just use 6.11-rc3 and newer :)

thanks,

greg k-h


Re: [PATCH 6.10] drm/amd/display: Refactor function dm_dp_mst_is_port_support_mode()

2024-08-19 Thread Greg KH
On Tue, Jul 30, 2024 at 08:53:39PM +0200, Kevin Holm wrote:
> From: Wayne Lin 
> 
> [ Upstream commit fa57924c76d995e87ca3533ec60d1d5e55769a27 ]
> 
> [Why]
> dm_dp_mst_is_port_support_mode() is a bit not following the original design 
> rule and cause
> light up issue with multiple 4k monitors after mst dsc hub.
> 
> [How]
> Refactor function dm_dp_mst_is_port_support_mode() a bit to solve the light 
> up issue.
> 
> Reviewed-by: Jerry Zuo 
> Acked-by: Zaeem Mohamed 
> Signed-off-by: Wayne Lin 
> Tested-by: Daniel Wheeler 
> Signed-off-by: Alex Deucher 
> [ke...@holm.dev: Resolved merge conflict in .../amdgpu_dm_mst_types.c]
> Fixes: 4df96ba6676034 ("drm/amd/display: Add timing pixel encoding for mst 
> mode validation")
> Link: 
> https://lore.kernel.org/stable/d74a7768e957e6ce88c27a5bece0c64dff132...@holm.dev/T/#u
> Signed-off-by: Kevin Holm 
> ---
> I resolved the merge conflict so that, after this patch is applied to the
> linux-6.10.y branch of the stable git repository, the resulting function
> dm_dp_mst_is_port_support_mode (and also the new function 
> dp_get_link_current_set_bw) is identical to the original commit.
> 
> I've confirmed that it fixes the regression I reported for my use case.

And it turns out this change breaks the arm and arm64 builds.  I tried
to fix it up by applying the fixup afterward for this file, but it's
just too much of a mess to unwind this, so I'm going to have to revert
this now, sorry.

See:

https://lore.kernel.org/r/b27c5434-f1b1-4697-985b-91bb3e9a2...@roeck-us.net
for details.

greg k-h


Re: AMD drm patch workflow is broken for stable trees

2024-08-24 Thread Greg KH
On Fri, Aug 23, 2024 at 05:23:46PM -0400, Alex Deucher wrote:
> On Thu, Aug 15, 2024 at 1:11 AM Greg KH  wrote:
> >
> > On Wed, Aug 14, 2024 at 05:30:08PM -0400, Alex Deucher wrote:
> > > On Wed, Aug 14, 2024 at 4:55 PM Felix Kuehling  
> > > wrote:
> > > >
> > > > On 2024-08-12 11:00, Greg KH wrote:
> > > > > Hi all,
> > > > >
> > > > > As some of you have noticed, there's a TON of failure messages being
> > > > > sent out for AMD gpu driver commits that are tagged for stable
> > > > > backports.  In short, you all are doing something really wrong with 
> > > > > how
> > > > > you are tagging these.
> > > > Hi Greg,
> > > >
> > > > I got notifications about one KFD patch failing to apply on six branches
> > > > (6.10, 6.6, 6.1, 5.15, 5.10 and 5.4). The funny thing is, that you
> > > > already applied this patch on two branches back in May. The emails had a
> > > > suspicious looking date in the header (Sep 17, 2001). I wonder if there
> > > > was some date glitch that caused a whole bunch of patches to be re-sent
> > > > to stable somehow:
> > >
> > > I think the crux of the problem is that sometimes patches go into
> > > -next with stable tags and they end getting taken into -fixes as well
> > > so after the merge window they end up getting picked up for stable
> > > again.  Going forward, if they land in -next, I'll cherry-pick -x the
> > > changes into -fixes so there is better traceability.
> >
> > Please do so, and also work to not have duplicate commits like this in
> > different branches.  Git can handle merges quite well, please use it.
> >
> > If this shows up again in the next -rc1 merge window without any
> > changes, I'll have to just blackhole all amd drm patches going forward
> > until you all tell me you have fixed your development process.
> 
> Just a heads up, you will see some of these when the 6.12 merge window
> due to what is currently in -next and the fixes that went into 6.11,
> but going forward we have updated our process and it should be better.

Can you give me a list of the git ids that I should be ignoring for
6.12-rc1?  Otherwise again, it's a huge waste of time on my side trying
to sift through them and figure out if the rejection is real or not...

thanks,

greg k-h


Re: [PATCH v4.19-v6.1] drm/amdgpu: Using uninitialized value *size when calling

2024-08-30 Thread Greg KH
On Wed, Aug 28, 2024 at 10:15:56AM -0500, Vamsi Krishna Brahmajosyula wrote:
> From: Jesse Zhang 
> 
> [ Upstream commit 88a9a467c548d0b3c7761b4fd54a68e70f9c0944 ]
> 
> Initialize the size before calling amdgpu_vce_cs_reloc, such as case 
> 0x0301.
> V2: To really improve the handling we would actually
>need to have a separate value of 0x.(Christian)
> 
> Signed-off-by: Jesse Zhang 
> Suggested-by: Christian König 
> Reviewed-by: Christian König 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Sasha Levin 
> Signed-off-by: Vamsi Krishna Brahmajosyula 
> 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Now queued up, thanks.

greg k-h


Re: [PATCH] drivers: Fix typo in comment

2022-01-27 Thread Greg KH
On Thu, Jan 27, 2022 at 02:51:56PM +0800, tangmeng wrote:
> Replace disbale with disable and replace unavaibale with unavailable.
> 
> Signed-off-by: tangmeng 
> ---
>  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 2 +-
>  drivers/gpu/drm/tilcdc/tilcdc_crtc.c  | 2 +-
>  drivers/pcmcia/rsrc_nonstatic.c   | 2 +-
>  drivers/usb/chipidea/udc.c| 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)

This needs to be broken up per-subsystem, thanks.

greg k-h


Re: [PATCH] drm/amd/display/dc/calcs/dce_calcs: Fix a memleak in calculate_bandwidth()

2022-01-28 Thread Greg KH
On Tue, Jan 25, 2022 at 12:55:51AM +0800, Zhou Qingyang wrote:
> In calculate_bandwidth(), the tag free_sclk and free_yclk are reversed,
> which could lead to a memory leak of yclk.
> 
> Fix this bug by changing the location of free_sclk and free_yclk.
> 
> This bug was found by a static analyzer.
> 
> Builds with 'make allyesconfig' show no new warnings,
> and our static analyzer no longer warns about this code.
> 
> Fixes: 2be8989d0fc2 ("drm/amd/display/dc/calcs/dce_calcs: Move some large 
> variables from the stack to the heap")
> Signed-off-by: Zhou Qingyang 
> ---
> The analysis employs differential checking to identify inconsistent 
> security operations (e.g., checks or kfrees) between two code paths 
> and confirms that the inconsistent operations are not recovered in the
> current function or the callers, so they constitute bugs. 
> 
> Note that, as a bug found by static analysis, it can be a false
> positive or hard to trigger. Multiple researchers have cross-reviewed
> the bug.
> 
>  drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c 
> b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> index ff5bb152ef49..e6ef36de0825 100644
> --- a/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> +++ b/drivers/gpu/drm/amd/display/dc/calcs/dce_calcs.c
> @@ -2033,10 +2033,10 @@ static void calculate_bandwidth(
>   kfree(surface_type);
>  free_tiling_mode:
>   kfree(tiling_mode);
> -free_yclk:
> - kfree(yclk);
>  free_sclk:
>   kfree(sclk);
> +free_yclk:
> + kfree(yclk);
>  }
>  
>  
> /***
> -- 
> 2.25.1
> 

As stated before, umn.edu is still not allowed to contribute to the
Linux kernel.  Please work with your administration to resolve this
issue.



Re: [PATCH] drm/amd/display: Fix a NULL pointer dereference in amdgpu_dm_connector_add_common_modes()

2022-01-28 Thread Greg KH
On Tue, Jan 25, 2022 at 12:57:29AM +0800, Zhou Qingyang wrote:
> In amdgpu_dm_connector_add_common_modes(), amdgpu_dm_create_common_mode()
> is assigned to mode and is passed to drm_mode_probed_add() directly after
> that. drm_mode_probed_add() passes &mode->head to list_add_tail(), and
> there is a dereference of it in list_add_tail() without recoveries, which
> could lead to NULL pointer dereference on failure of
> amdgpu_dm_create_common_mode().
> 
> Fix this by adding a NULL check of mode.
> 
> This bug was found by a static analyzer.
> 
> Builds with 'make allyesconfig' show no new warnings,
> and our static analyzer no longer warns about this code.
> 
> Fixes: e7b07ceef2a6 ("drm/amd/display: Merge amdgpu_dm_types and amdgpu_dm")
> Signed-off-by: Zhou Qingyang 
> ---
> The analysis employs differential checking to identify inconsistent 
> security operations (e.g., checks or kfrees) between two code paths 
> and confirms that the inconsistent operations are not recovered in the
> current function or the callers, so they constitute bugs. 
> 
> Note that, as a bug found by static analysis, it can be a false
> positive or hard to trigger. Multiple researchers have cross-reviewed
> the bug.
> 
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 7f9773f8dab6..9ad94186b146 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -8143,6 +8143,9 @@ static void amdgpu_dm_connector_add_common_modes(struct 
> drm_encoder *encoder,
>   mode = amdgpu_dm_create_common_mode(encoder,
>   common_modes[i].name, common_modes[i].w,
>   common_modes[i].h);
> + if (!mode)
> + continue;
> +
>   drm_mode_probed_add(connector, mode);
>   amdgpu_dm_connector->num_modes++;
>   }
> -- 
> 2.25.1
> 

As stated before, umn.edu is still not allowed to contribute to the
Linux kernel.  Please work with your administration to resolve this
issue.



Re: [PATCH v5 0/2] Add p2p via dmabuf to habanalabs

2021-07-21 Thread Greg KH
On Sun, Jul 11, 2021 at 05:05:59PM +0300, Oded Gabbay wrote:
> Hi,
> This is v5 of this patch-set following again a long email thread.
> 
> It contains fixes to the implementation according to the review that Jason
> did on v4. Jason, I appreciate your feedback. If you can take another look
> to see I didn't miss anything that would be great.
> 
> The details of the fixes are in the changelog in the commit message of
> the second patch.
> 
> There was one issue with your proposal to set the orig_nents to 0. I did
> that, but I also had to restore it to nents before calling sg_free_table
> because that function uses orig_nents to iterate.

Reviewed-by: Greg Kroah-Hartman 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-02-28 Thread Greg KH
On Mon, Feb 28, 2022 at 12:08:18PM +0100, Jakob Koschel wrote:
> If the list does not contain the expected element, the value of
> list_for_each_entry() iterator will not point to a valid structure.
> To avoid type confusion in such case, the list iterator
> scope will be limited to list_for_each_entry() loop.
> 
> In preparation to limiting scope of a list iterator to the list traversal
> loop, use a dedicated pointer to point to the found element.
> Determining if an element was found is then simply checking if
> the pointer is != NULL.
> 
> Signed-off-by: Jakob Koschel 
> ---
>  arch/x86/kernel/cpu/sgx/encl.c   |  6 +++--
>  drivers/scsi/scsi_transport_sas.c| 17 -
>  drivers/thermal/thermal_core.c   | 38 ++--
>  drivers/usb/gadget/configfs.c| 22 ++--
>  drivers/usb/gadget/udc/max3420_udc.c | 11 +---
>  drivers/usb/gadget/udc/tegra-xudc.c  | 11 +---
>  drivers/usb/mtu3/mtu3_gadget.c   | 11 +---
>  drivers/usb/musb/musb_gadget.c   | 11 +---
>  drivers/vfio/mdev/mdev_core.c| 11 +---
>  9 files changed, 88 insertions(+), 50 deletions(-)

The drivers/usb/ portion of this patch should be in patch 1/X, right?

Also, you will have to split these up per-subsystem so that the
different subsystem maintainers can take these in their trees.

thanks,

greg k-h


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Mon, Feb 28, 2022 at 01:06:57PM +0100, Jakob Koschel wrote:
> 
> 
> > On 28. Feb 2022, at 12:20, Greg KH  wrote:
> > 
> > On Mon, Feb 28, 2022 at 12:08:18PM +0100, Jakob Koschel wrote:
> >> If the list does not contain the expected element, the value of
> >> list_for_each_entry() iterator will not point to a valid structure.
> >> To avoid type confusion in such case, the list iterator
> >> scope will be limited to list_for_each_entry() loop.
> >> 
> >> In preparation to limiting scope of a list iterator to the list traversal
> >> loop, use a dedicated pointer to point to the found element.
> >> Determining if an element was found is then simply checking if
> >> the pointer is != NULL.
> >> 
> >> Signed-off-by: Jakob Koschel 
> >> ---
> >> arch/x86/kernel/cpu/sgx/encl.c   |  6 +++--
> >> drivers/scsi/scsi_transport_sas.c| 17 -
> >> drivers/thermal/thermal_core.c   | 38 ++--
> >> drivers/usb/gadget/configfs.c| 22 ++--
> >> drivers/usb/gadget/udc/max3420_udc.c | 11 +---
> >> drivers/usb/gadget/udc/tegra-xudc.c  | 11 +---
> >> drivers/usb/mtu3/mtu3_gadget.c   | 11 +---
> >> drivers/usb/musb/musb_gadget.c   | 11 +---
> >> drivers/vfio/mdev/mdev_core.c| 11 +---
> >> 9 files changed, 88 insertions(+), 50 deletions(-)
> > 
> > The drivers/usb/ portion of this patch should be in patch 1/X, right?
> 
> I kept them separate since it's a slightly different case.
> Using 'ptr' instead of '&ptr->other_member'. Regardless, I'll split
> this commit up as you mentioned.
> 
> > 
> > Also, you will have to split these up per-subsystem so that the
> > different subsystem maintainers can take these in their trees.
> 
> Thanks for the feedback.
> To clarify I understand you correctly:
> I should do one patch set per-subsystem with every instance/(file?)
> fixed as a separate commit?

Yes, each file needs a different commit as they are usually all written
or maintained by different people.

As I said in my other response, if you need any help with this, just let
me know, I'm glad to help.

thanks,

greg k-h


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
> 
> 
> > On 1. Mar 2022, at 01:41, Linus Torvalds  
> > wrote:
> > 
> > On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  
> > wrote:
> >> 
> >> The goal of this is to get compiler warnings right? This would indeed be 
> >> great.
> > 
> > Yes, so I don't mind having a one-time patch that has been gathered
> > using some automated checker tool, but I don't think that works from a
> > long-term maintenance perspective.
> > 
> > So if we have the basic rule being "don't use the loop iterator after
> > the loop has finished, because it can cause all kinds of subtle
> > issues", then in _addition_ to fixing the existing code paths that
> > have this issue, I really would want to (a) get a compiler warning for
> > future cases and (b) make it not actually _work_ for future cases.
> > 
> > Because otherwise it will just happen again.
> > 
> >> Changing the list_for_each_entry() macro first will break all of those 
> >> cases
> >> (e.g. the ones using 'list_entry_is_head()).
> > 
> > So I have no problems with breaking cases that we basically already
> > have a patch for due to  your automated tool. There were certainly
> > more than a handful, but it didn't look _too_ bad to just make the
> > rule be "don't use the iterator after the loop".
> > 
> > Of course, that's just based on that patch of yours. Maybe there are a
> > ton of other cases that your patch didn't change, because they didn't
> > match your trigger case, so I may just be overly optimistic here.
> 
> Based on the coccinelle script there are ~480 cases that need fixing
> in total. I'll now finish all of them and then split them by
> submodules as Greg suggested and repost a patch set per submodule.
> Sounds good?

Sounds good to me!

If you need help carving these up and maintaining them over time as
different subsystem maintainers accept/ignore them, just let me know.
Doing large patchsets like this can be tough without a lot of
experience.

thanks,

greg k-h


Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Greg KH
On Tue, Mar 01, 2022 at 06:40:04PM +0100, Jakob Koschel wrote:
> 
> 
> > On 1. Mar 2022, at 18:36, Greg KH  wrote:
> > 
> > On Tue, Mar 01, 2022 at 12:28:15PM +0100, Jakob Koschel wrote:
> >> 
> >> 
> >>> On 1. Mar 2022, at 01:41, Linus Torvalds  
> >>> wrote:
> >>> 
> >>> On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  
> >>> wrote:
> >>>> 
> >>>> The goal of this is to get compiler warnings right? This would indeed be 
> >>>> great.
> >>> 
> >>> Yes, so I don't mind having a one-time patch that has been gathered
> >>> using some automated checker tool, but I don't think that works from a
> >>> long-term maintenance perspective.
> >>> 
> >>> So if we have the basic rule being "don't use the loop iterator after
> >>> the loop has finished, because it can cause all kinds of subtle
> >>> issues", then in _addition_ to fixing the existing code paths that
> >>> have this issue, I really would want to (a) get a compiler warning for
> >>> future cases and (b) make it not actually _work_ for future cases.
> >>> 
> >>> Because otherwise it will just happen again.
> >>> 
> >>>> Changing the list_for_each_entry() macro first will break all of those 
> >>>> cases
> >>>> (e.g. the ones using 'list_entry_is_head()).
> >>> 
> >>> So I have no problems with breaking cases that we basically already
> >>> have a patch for due to  your automated tool. There were certainly
> >>> more than a handful, but it didn't look _too_ bad to just make the
> >>> rule be "don't use the iterator after the loop".
> >>> 
> >>> Of course, that's just based on that patch of yours. Maybe there are a
> >>> ton of other cases that your patch didn't change, because they didn't
> >>> match your trigger case, so I may just be overly optimistic here.
> >> 
> >> Based on the coccinelle script there are ~480 cases that need fixing
> >> in total. I'll now finish all of them and then split them by
> >> submodules as Greg suggested and repost a patch set per submodule.
> >> Sounds good?
> > 
> > Sounds good to me!
> > 
> > If you need help carving these up and maintaining them over time as
> > different subsystem maintainers accept/ignore them, just let me know.
> > Doing large patchsets like this can be tough without a lot of
> > experience.
> 
> Very much appreciated!
> 
> There will probably be some cases that do not match one of the pattern
> we already discussed and need separate attention.
> 
> I was planning to start with one subsystem and adjust the coming ones
> according to the feedback gather there instead of posting all of them
> in one go.

That seems wise.  Feel free to use USB as a testing ground for this if
you want to :)

thanks,

greg k-h


Re: AMD Display Core (DC) patches (was: [PATCH 13/16] drm/amd/display: Revert FEC check in validation)

2022-04-12 Thread Greg KH
On Tue, Apr 12, 2022 at 08:52:11AM +0200, Paul Menzel wrote:
> > Reviewed-by: George Shen 
> > Acked-by: Alex Hung 
> > Signed-off-by: Martin Leung 
> 
> Shouldn’t the Signed-off-by line by the author go first?

No, this is the correct order.

thanks,

greg k-h


Re: [PATCH v4 00/41] DYNDBG: opt-in class'd debug for modules, use in drm.

2022-09-07 Thread Greg KH
On Tue, Sep 06, 2022 at 09:18:09PM +0200, Daniel Vetter wrote:
> On Fri, Aug 12, 2022 at 08:03:47AM +0200, Greg KH wrote:
> > On Thu, Aug 11, 2022 at 06:52:40PM +0200, Daniel Vetter wrote:
> > > On Wed, Aug 03, 2022 at 04:13:05PM -0400, Jason Baron wrote:
> > > > 
> > > > 
> > > > On 8/3/22 15:56, jim.cro...@gmail.com wrote:
> > > > > On Wed, Jul 20, 2022 at 9:32 AM Jim Cromie  
> > > > > wrote:
> > > > >>
> > > > > 
> > > > >> Hi Jason, Greg, DRM-folk,
> > > > >>
> > > > >> This adds 'typed' "class FOO" support to dynamic-debug, where 'typed'
> > > > >> means either DISJOINT (like drm debug categories), or VERBOSE (like
> > > > >> nouveau debug-levels).  Use it in DRM modules: core, helpers, and in
> > > > >> drivers i915, amdgpu, nouveau.
> > > > >>
> > > > > 
> > > > > This revision fell over, on a conflict with something in drm-MUMBLE
> > > > > 
> > > > > Error: patch 
> > > > > https://urldefense.com/v3/__https://patchwork.freedesktop.org/api/1.0/series/106427/revisions/2/mbox/__;!!GjvTz_vk!UCPl5Uf32cDVwwysMTfaLwoGLWomargFXuR8HjBA3xsUOjxXHXC5hneAkP4iWK91yc-LjjJxWW89-51Z$
> > > > >  
> > > > > not applied
> > > > > Applying: dyndbg: fix static_branch manipulation
> > > > > Applying: dyndbg: fix module.dyndbg handling
> > > > > Applying: dyndbg: show both old and new in change-info
> > > > > Applying: dyndbg: reverse module walk in cat control
> > > > > Applying: dyndbg: reverse module.callsite walk in cat control
> > > > > Applying: dyndbg: use ESCAPE_SPACE for cat control
> > > > > Applying: dyndbg: let query-modname override actual module name
> > > > > Applying: dyndbg: add test_dynamic_debug module
> > > > > Applying: dyndbg: drop EXPORTed dynamic_debug_exec_queries
> > > > > 
> > > > > Jason,
> > > > > those above are decent maintenance patches, particularly the drop 
> > > > > export.
> > > > > It would be nice to trim this unused api this cycle.
> > > > 
> > > > Hi Jim,
> > > > 
> > > > Agreed - I was thinking the same thing. Feel free to add
> > > > Acked-by: Jason Baron  to those first 9.
> > > 
> > > Does Greg KH usually pick up dyndbg patches or someone else or do I need
> > > to do something? Would be great to get some movement here since -rc1 goes
> > > out and merging will restart next week.
> > 
> > Yes, I can take these into my tree after -rc1 is out.
> 
> [uncovering from an absolutely impressive cascade of holes :-(]
> 
> Did this happen and I can stop worrying here? I'd like to make sure these
> drm debug infra improvements keep moving.

I didn't take these, and I think I saw a 6th series sent:
https://lore.kernel.org/r/20220904214134.408619-1-jim.cro...@gmail.com

If you ack them, I will pick them up.

thanks,

greg k-h


Re: [PATCH v6 21/57] dyndbg: test DECLARE_DYNDBG_CLASSMAP, sysfs nodes

2022-09-07 Thread Greg KH
On Sun, Sep 04, 2022 at 03:40:58PM -0600, Jim Cromie wrote:
> Demonstrate use of DECLARE_DYNDBG_CLASSMAP macro, and expose them as
> sysfs-nodes for testing.

Wait, why sysfs?

sysfs isn't for testing, why not use debugfs?


> 
> For each of the 4 class-map-types:
> 
>   - declare a class-map of that type,
>   - declare the enum corresponding to those class-names
>   - share _base across 0..30 range
>   - add a __pr_debug_cls() call for each class-name
>   - declare 2 sysnodes for each class-map
> for 'p' flag, and future 'T' flag
> 
> These declarations create the following sysfs parameter interface:
> 
>   :#> pwd
>   /sys/module/test_dynamic_debug/parameters
>   :#> ls
>   T_disjoint_bits  T_disjoint_names  T_level_names  T_level_num  do_prints
>   p_disjoint_bits  p_disjoint_names  p_level_names  p_level_num

What is in these files?

For sysfs stuff, you need Documentation/ABI entries so that you can't
abuse sysfs.  With debugfs, you can do anything you want :)

thanks,

greg k-h


Re: [PATCH v6 21/57] dyndbg: test DECLARE_DYNDBG_CLASSMAP, sysfs nodes

2022-09-07 Thread Greg KH
On Wed, Sep 07, 2022 at 04:54:00PM +0200, Greg KH wrote:
> On Sun, Sep 04, 2022 at 03:40:58PM -0600, Jim Cromie wrote:
> > Demonstrate use of DECLARE_DYNDBG_CLASSMAP macro, and expose them as
> > sysfs-nodes for testing.
> 
> Wait, why sysfs?
> 
> sysfs isn't for testing, why not use debugfs?
> 
> 
> > 
> > For each of the 4 class-map-types:
> > 
> >   - declare a class-map of that type,
> >   - declare the enum corresponding to those class-names
> >   - share _base across 0..30 range
> >   - add a __pr_debug_cls() call for each class-name
> >   - declare 2 sysnodes for each class-map
> > for 'p' flag, and future 'T' flag
> > 
> > These declarations create the following sysfs parameter interface:
> > 
> >   :#> pwd
> >   /sys/module/test_dynamic_debug/parameters
> >   :#> ls
> >   T_disjoint_bits  T_disjoint_names  T_level_names  T_level_num  do_prints
> >   p_disjoint_bits  p_disjoint_names  p_level_names  p_level_num
> 
> What is in these files?
> 
> For sysfs stuff, you need Documentation/ABI entries so that you can't
> abuse sysfs.  With debugfs, you can do anything you want :)

Ah, it's just module parameter abuse, not a "normal" sysfs file at all :)

Nevermind, sorry for the noise...

greg k-h


Re: [PATCH v6 00/57] DYNDBG: opt-in class'd debug for modules, use in drm.

2022-09-07 Thread Greg KH
On Sun, Sep 04, 2022 at 03:40:37PM -0600, Jim Cromie wrote:
> hi Greg, Jason, DRM-folk, Steven,
> 
> If Im not too late for linux-next in this cycle, heres V6.  Diffs are minor:
> 
>  - rebased onto e47eb90a0a9a (tag: next-20220901, linux-next/master)
>gets past Kconfig conflict, same for drm-tip.
>  - uint debug_level, not ulong.  to fit nouveau.
>  - -1 on param-read-back, to match prev write val.
>  - added back tracefs parts, missing from -V5
>updated for tracing/events: Add __vstring() and __assign_vstr() helper 
> macros
>no decorations-lite in TP_printk, do it right later.
>  - commit-msg tweaks
> 
> Theres also new RFC stuff with the potential to reduce the size of the
> __dyndbgs section by 20%.  Not ready for prime time, or linux-next,
> but I hope compelling.
> 
> FEATURE DESCRIPTION
> 
> dyndbg provides DECLARE_DYNAMIC_DEBUG_CLASSMAP() which allows module
> authors to declare "good" class-names, of 4 types.
> 
>   DYNAMIC_DEBUG_CLASSMAP(drm_debug_classes,
>   DD_CLASS_TYPE_DISJOINT_BITS, offset,
> "DRM_UT_CORE",
> "DRM_UT_DRIVER",
> "DRM_UT_KMS",
> "DRM_UT_PRIME",
> "DRM_UT_ATOMIC",
> "DRM_UT_VBL",
> "DRM_UT_STATE",
> "DRM_UT_LEASE",
> "DRM_UT_DP",
> "DRM_UT_DRMRES");
> 
> That usage authorizes dyndbg to set class'd pr_debugs accordingly:
> 
>   echo class DRM_UT_CORE +p > /proc/dynamic_debug/control
>   echo class DRM_UT_KMS  +p > /proc/dynamic_debug/control
> 
> Because the DRM modules declare the same classes, they each authorize
> dyndbg with the same classnames, which allows dyndbg to effect changes
> to its selected class'd prdbgs.
> 
> Opting in by using the macro effectively privatizes the limited
> 63-classes available per module; only modules which share classnames
> must coordinate their use of the common range, and they can
> independently use the remaining id-space.
> 
> Other dyndbg filtering pertains too, so single sites can be selected.
> 
> 
> 4 DD_CLASS_TYPE_*_*s determine 2 behaviors:
> 
>   DISJOINTbits are independent, like drm.debug categories
>   LEVELs  3>2, turns on level-2, like nouveau debug-levels
>   NUM/BITSnumeric input, bitmap if disjoint, else 0-32.
>   NAMES   accept proper names, like DRM_UT_CORE
> 
> Dyndbg provides param-callbacks which enforce those behaviors:
> 
>   # DISJOINT_BITS
>   echo 0x03 > /sys/module/drm/parameters/debug
> 
>   # LEVEL_NUM
>   echo 3 > /sys/module/drm/nouveau/debug-mumble*
> 
>   # DISJOINT_NAMES
>   echo +DRM_UT_CORE,+DRM_UT_KMS,-DRM_UT_DRIVER > 
> /sys/module/drm/parameters/debug_categories
> 
>   # LEVEL_NAMES
>   echo NV_TRACE > /sys/module/nouveau/parameters/debug-mumble*
> 
> That design choice is allowed cuz verbosity is always attached to a
> (user visible) interface, and theres no reason not to put the
> implementation there (in the callback).  It also considerably
> simplifies things; ddebug_change can treat class_id's as disjoint,
> period.
> 

I've applied the first 21 patches in this series to my
driver-core-testing branch, and if 0-day doesn't blow up on it, I'll
move it to my -next branch tomorrow and it will get a wider testing
range in the linux-next releases.

Feel free to rebase the rest of the series on top of my tree and
resubmit after addressing Daniel's comments.

thanks,

greg k-h


Re: [PATCH v7 0/9] dyndbg: drm.debug adaptation

2022-09-26 Thread Greg KH
On Sun, Sep 11, 2022 at 11:28:43PM -0600, Jim Cromie wrote:
> hi Greg, Dan, Jason, DRM-folk,
> 
> heres follow-up to V6:
>   rebased on driver-core/driver-core-next for -v6 applied bits (thanks)
>   rework drm_debug_enabled{_raw,_instrumented,} per Dan.
> 
> It excludes:
>   nouveau parts (immature)
>   tracefs parts (I missed --to=Steve on v6)
>   split _ddebug_site and de-duplicate experiment (way unready)
> 
> IOW, its the remaining commits of V6 on which Dan gave his Reviewed-by.
> 
> If these are good to apply, I'll rebase and repost the rest separately.

All now queued up, thanks.

greg k-h


Re: [PATCH] drm/amdgpu: Remove ATC L2 access for MMHUB 2.1.x

2022-10-17 Thread Greg KH
On Tue, Oct 18, 2022 at 10:17:24AM +0530, Lijo Lazar wrote:
> MMHUB 2.1.x versions don't have ATCL2. Remove accesses to ATCL2 registers.
> 
> Since they are non-existing registers, read access will cause a
> 'Completer Abort' and gets reported when AER is enabled with the below patch.
> Tagging with the patch so that this is backported along with it.
> 
> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in 
> get_port_device_capability()")
> 
> Signed-off-by: Lijo Lazar 
> ---
>  drivers/gpu/drm/amd/amdgpu/mmhub_v2_0.c | 28 +++--
>  1 file changed, 8 insertions(+), 20 deletions(-)



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Re: [PATCH 00/11] fix memory leak while kset_register() fails

2022-10-21 Thread Greg KH
On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote:
> On 2022-10-20 22:20, Yang Yingliang wrote:
> > The previous discussion link:
> > https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211...@huawei.com/T/
> 
> The very first discussion on this was here:
> 
> https://www.spinics.net/lists/dri-devel/msg368077.html
> 
> Please use this link, and not the that one up there you which quoted above,
> and whose commit description is taken verbatim from the this link.
> 
> > 
> > kset_register() is currently used in some places without calling
> > kset_put() in error path, because the callers think it should be
> > kset internal thing to do, but the driver core can not know what
> > caller doing with that memory at times. The memory could be freed
> > both in kset_put() and error path of caller, if it is called in
> > kset_register().
> 
> As I explained in the link above, the reason there's
> a memory leak is that one cannot call kset_register() without
> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL,
> in this case, i.e. kset_register() fails with -EINVAL.
> 
> Thus, the most common usage is something like this:
> 
>   kobj_set_name(&kset->kobj, format, ...);
>   kset->kobj.kset = parent_kset;
>   kset->kobj.ktype = ktype;
>   res = kset_register(kset);
> 
> So, what is being leaked, is the memory allocated in kobj_set_name(),
> by the common idiom shown above. This needs to be mentioned in
> the documentation, at least, in case, in the future this is absolved
> in kset_register() redesign, etc.

Based on this, can kset_register() just clean up from itself when an
error happens?  Ideally that would be the case, as the odds of a kset
being embedded in a larger structure is probably slim, but we would have
to search the tree to make sure.

thanks,

greg k-h


Re: [PATCH 00/11] fix memory leak while kset_register() fails

2022-10-21 Thread Greg KH
On Fri, Oct 21, 2022 at 03:55:18AM -0400, Luben Tuikov wrote:
> On 2022-10-21 01:37, Greg KH wrote:
> > On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote:
> >> On 2022-10-20 22:20, Yang Yingliang wrote:
> >>> The previous discussion link:
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F0db486eb-6927-927e-3629-958f8f211194%40huawei.com%2FT%2F&data=05%7C01%7Cluben.tuikov%40amd.com%7C65b33f087ef245a9f23708dab3264840%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019274318153227%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=1ZoieEob62iU9kI8fvpp20qGut9EeHKIHtCAT01t%2Bz8%3D&reserved=0
> >>
> >> The very first discussion on this was here:
> >>
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.spinics.net%2Flists%2Fdri-devel%2Fmsg368077.html&data=05%7C01%7Cluben.tuikov%40amd.com%7C65b33f087ef245a9f23708dab3264840%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638019274318153227%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=9joWxGLUxZZMvrfkxCR8KbkoXifsqoMK0vGR%2FyEG62w%3D&reserved=0
> >>
> >> Please use this link, and not the that one up there you which quoted above,
> >> and whose commit description is taken verbatim from the this link.
> >>
> >>>
> >>> kset_register() is currently used in some places without calling
> >>> kset_put() in error path, because the callers think it should be
> >>> kset internal thing to do, but the driver core can not know what
> >>> caller doing with that memory at times. The memory could be freed
> >>> both in kset_put() and error path of caller, if it is called in
> >>> kset_register().
> >>
> >> As I explained in the link above, the reason there's
> >> a memory leak is that one cannot call kset_register() without
> >> the kset->kobj.name being set--kobj_add_internal() returns -EINVAL,
> >> in this case, i.e. kset_register() fails with -EINVAL.
> >>
> >> Thus, the most common usage is something like this:
> >>
> >>kobj_set_name(&kset->kobj, format, ...);
> >>kset->kobj.kset = parent_kset;
> >>kset->kobj.ktype = ktype;
> >>res = kset_register(kset);
> >>
> >> So, what is being leaked, is the memory allocated in kobj_set_name(),
> >> by the common idiom shown above. This needs to be mentioned in
> >> the documentation, at least, in case, in the future this is absolved
> >> in kset_register() redesign, etc.
> > 
> > Based on this, can kset_register() just clean up from itself when an
> > error happens?  Ideally that would be the case, as the odds of a kset
> > being embedded in a larger structure is probably slim, but we would have
> > to search the tree to make sure.
> 
> Looking at kset_register(), we can add kset_put() in the error path,
> when kobject_add_internal(&kset->kobj) fails.
> 
> See the attached patch. It needs to be tested with the same error injection
> as Yang has been doing.
> 
> Now, struct kset is being embedded in larger structs--see amdgpu_discovery.c
> starting at line 575. If you're on an AMD system, it gets you the tree
> structure you'll see when you run "tree 
> /sys/class/drm/card0/device/ip_discovery/".
> That shouldn't be a problem though.

Yes, that shouldn't be an issue as the kobject embedded in a kset is
ONLY for that kset itself, the kset structure should not be controling
the lifespan of the object it is embedded in, right?

Note, the use of ksets by a device driver like you are doing here in the
amd driver is BROKEN and will cause problems by userspace tools.  Don't
do that please, just use a single subdirectory for an attribute.  Doing
deeper stuff like this is sure to cause problems and be a headache.

thanks,

greg k-h


Re: [PATCH 01/11] kset: fix documentation for kset_register()

2022-10-21 Thread Greg KH
On Fri, Oct 21, 2022 at 04:05:18PM +0800, Yang Yingliang wrote:
> 
> On 2022/10/21 13:34, Luben Tuikov wrote:
> > On 2022-10-20 22:20, Yang Yingliang wrote:
> > > kset_register() is currently used in some places without calling
> > > kset_put() in error path, because the callers think it should be
> > > kset internal thing to do, but the driver core can not know what
> > > caller doing with that memory at times. The memory could be freed
> > > both in kset_put() and error path of caller, if it is called in
> > > kset_register().
> > > 
> > > So make the function documentation more explicit about calling
> > > kset_put() in the error path of caller.
> > > 
> > > Signed-off-by: Yang Yingliang 
> > > ---
> > >   lib/kobject.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/lib/kobject.c b/lib/kobject.c
> > > index a0b2dbfcfa23..6da04353d974 100644
> > > --- a/lib/kobject.c
> > > +++ b/lib/kobject.c
> > > @@ -834,6 +834,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
> > >   /**
> > >* kset_register() - Initialize and add a kset.
> > >* @k: kset.
> > > + *
> > > + * If this function returns an error, kset_put() must be called to
> > > + * properly clean up the memory associated with the object.
> > >*/
> > And I'd continue the sentence, with " ... with the object,
> > for instance the memory for the kset.kobj.name when 
> > kobj_set_name(&kset.kobj, format, ...)
> > was called before calling kset_register()."
> kobject_cleanup() not only frees name, but aslo calls ->release() to free
> another resources.

Yes, but it's the kobject of the kset, which does need to have it's name
cleaned up, but that kobject should NOT be freeing any larger structures
that the kset might be embedded in, right?

> > This makes it clear what we want to make sure is freed, in case of an early 
> > error
> > from kset_register().
> 
> How about like this:
> 
> If this function returns an error, kset_put() must be called to clean up the 
> name of
> kset object and other memory associated with the object.

Again, I think we can fix this up to not be needed.

thanks,

greg k-h


Re: [PATCH 00/11] fix memory leak while kset_register() fails

2022-10-21 Thread Greg KH
On Fri, Oct 21, 2022 at 04:24:23PM +0800, Yang Yingliang wrote:
> 
> On 2022/10/21 13:37, Greg KH wrote:
> > On Fri, Oct 21, 2022 at 01:29:31AM -0400, Luben Tuikov wrote:
> > > On 2022-10-20 22:20, Yang Yingliang wrote:
> > > > The previous discussion link:
> > > > https://lore.kernel.org/lkml/0db486eb-6927-927e-3629-958f8f211...@huawei.com/T/
> > > The very first discussion on this was here:
> > > 
> > > https://www.spinics.net/lists/dri-devel/msg368077.html
> > > 
> > > Please use this link, and not the that one up there you which quoted 
> > > above,
> > > and whose commit description is taken verbatim from the this link.
> > > 
> > > > kset_register() is currently used in some places without calling
> > > > kset_put() in error path, because the callers think it should be
> > > > kset internal thing to do, but the driver core can not know what
> > > > caller doing with that memory at times. The memory could be freed
> > > > both in kset_put() and error path of caller, if it is called in
> > > > kset_register().
> > > As I explained in the link above, the reason there's
> > > a memory leak is that one cannot call kset_register() without
> > > the kset->kobj.name being set--kobj_add_internal() returns -EINVAL,
> > > in this case, i.e. kset_register() fails with -EINVAL.
> > > 
> > > Thus, the most common usage is something like this:
> > > 
> > >   kobj_set_name(&kset->kobj, format, ...);
> > >   kset->kobj.kset = parent_kset;
> > >   kset->kobj.ktype = ktype;
> > >   res = kset_register(kset);
> > > 
> > > So, what is being leaked, is the memory allocated in kobj_set_name(),
> > > by the common idiom shown above. This needs to be mentioned in
> > > the documentation, at least, in case, in the future this is absolved
> > > in kset_register() redesign, etc.
> > Based on this, can kset_register() just clean up from itself when an
> > error happens?  Ideally that would be the case, as the odds of a kset
> > being embedded in a larger structure is probably slim, but we would have
> > to search the tree to make sure.
> I have search the whole tree, the kset used in bus_register() - patch #3,
> kset_create_and_add() - patch #4
> __class_register() - patch #5,  fw_cfg_build_symlink() - patch #6 and
> amdgpu_discovery.c - patch #10
> is embedded in a larger structure. In these cases, we can not call
> kset_put() in error path in kset_register()

Yes you can as the kobject in the kset should NOT be controling the
lifespan of those larger objects.

If it is, please point out the call chain here as I don't think that
should be possible.

Note all of this is a mess because the kobject name stuff was added much
later, after the driver model had been created and running for a while.
We missed this error path when adding the dynamic kobject name logic,
thank for looking into this.

If you could test the patch posted with your error injection systems,
that could make this all much simpler to solve.

thanks,

greg k-h


Re: [PATCH v2] kset: fix memory leak when kset_register() returns error

2022-10-24 Thread Greg KH
On Mon, Oct 24, 2022 at 08:19:10PM +0800, Yang Yingliang wrote:
> Inject fault while loading module, kset_register() may fail.
> If it fails, the name allocated by kobject_set_name() which
> is called before kset_register() is leaked, because refcount
> of kobject is hold in kset_init().
> 
> As a kset may be embedded in a larger structure which needs
> be freed in release() function or error path in callers, we
> can not call kset_put() in kset_register(), or it will cause
> double free, so just call kfree_const() to free the name and
> set it to NULL.
> 
> With this fix, the callers don't need to care about the name
> freeing and call an extra kset_put() if kset_register() fails.
> 
> Suggested-by: Luben Tuikov 
> Signed-off-by: Yang Yingliang 
> ---
> v1 -> v2:
>   Free name inside of kset_register() instead of calling kset_put()
>   in drivers.
> ---
>  lib/kobject.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/kobject.c b/lib/kobject.c
> index a0b2dbfcfa23..3409a89c81e5 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -834,6 +834,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
>  /**
>   * kset_register() - Initialize and add a kset.
>   * @k: kset.
> + *
> + * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
> + * which is called before kset_register() in caller need be freed.

This comment doesn't make any sense anymore.  No caller needs to worry
about this, right?

>   */
>  int kset_register(struct kset *k)
>  {
> @@ -844,8 +847,11 @@ int kset_register(struct kset *k)
>  
>   kset_init(k);
>   err = kobject_add_internal(&k->kobj);
> - if (err)
> + if (err) {
> + kfree_const(k->kobj.name);
> + k->kobj.name = NULL;

Why are you setting the name here to NULL?

thanks,

greg k-h


Re: [PATCH v2] kset: fix memory leak when kset_register() returns error

2022-10-24 Thread Greg KH
On Mon, Oct 24, 2022 at 10:39:44PM +0800, Yang Yingliang wrote:
> 
> On 2022/10/24 21:52, Greg KH wrote:
> > On Mon, Oct 24, 2022 at 08:19:10PM +0800, Yang Yingliang wrote:
> > > Inject fault while loading module, kset_register() may fail.
> > > If it fails, the name allocated by kobject_set_name() which
> > > is called before kset_register() is leaked, because refcount
> > > of kobject is hold in kset_init().
> > > 
> > > As a kset may be embedded in a larger structure which needs
> > > be freed in release() function or error path in callers, we
> > > can not call kset_put() in kset_register(), or it will cause
> > > double free, so just call kfree_const() to free the name and
> > > set it to NULL.
> > > 
> > > With this fix, the callers don't need to care about the name
> > > freeing and call an extra kset_put() if kset_register() fails.
> > > 
> > > Suggested-by: Luben Tuikov 
> > > Signed-off-by: Yang Yingliang 
> > > ---
> > > v1 -> v2:
> > >Free name inside of kset_register() instead of calling kset_put()
> > >in drivers.
> > > ---
> > >   lib/kobject.c | 8 +++-
> > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/lib/kobject.c b/lib/kobject.c
> > > index a0b2dbfcfa23..3409a89c81e5 100644
> > > --- a/lib/kobject.c
> > > +++ b/lib/kobject.c
> > > @@ -834,6 +834,9 @@ EXPORT_SYMBOL_GPL(kobj_sysfs_ops);
> > >   /**
> > >* kset_register() - Initialize and add a kset.
> > >* @k: kset.
> > > + *
> > > + * NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
> > > + * which is called before kset_register() in caller need be freed.
> > This comment doesn't make any sense anymore.  No caller needs to worry
> > about this, right?
> With this fix, the name is freed inside of kset_register(), it can not be
> accessed,

Agreed.

> if it allocated dynamically, but callers don't know this if no comment here,
> they may use it in error path (something like to print error message with
> it),
> so how about comment like this to tell callers not to use the name:
> 
> NOTE: On error, the kset.kobj.name allocated by() kobj_set_name()
> is freed, it can not be used any more.

Sure, that's a better way to word it.

> > >*/
> > >   int kset_register(struct kset *k)
> > >   {
> > > @@ -844,8 +847,11 @@ int kset_register(struct kset *k)
> > >   kset_init(k);
> > >   err = kobject_add_internal(&k->kobj);
> > > - if (err)
> > > + if (err) {
> > > + kfree_const(k->kobj.name);
> > > + k->kobj.name = NULL;
> > Why are you setting the name here to NULL?
> I set it to NULL to avoid accessing bad pointer in callers,
> if callers use it in error path, current callers won't use this
> name pointer in error path, so we can remove this assignment?

Ah, I didn't think about using it on error paths.  Ideally that would
never happen, but that's good to set just to make it obvious.  How about
adding a small comment here saying why you are setting it so we all
remember it in 5 years when we look at the code again.

thanks,

greg k-h


Re: [PATCH v3] kset: fix memory leak when kset_register() returns error

2022-10-25 Thread Greg KH
On Tue, Oct 25, 2022 at 03:15:49PM +0800, Yang Yingliang wrote:
> Inject fault while loading module, kset_register() may fail.
> If it fails, the kset.kobj.name allocated by kobject_set_name()
> which must be called before a call to kset_register() may be
> leaked, since refcount of kobj was set in kset_init().
> 
> To mitigate this, we free the name in kset_register() when an
> error is encountered, i.e. when kset_register() returns an error.
> 
> A kset may be embedded in a larger structure which may be dynamically
> allocated in callers, it needs to be freed in ktype.release() or error
> path in callers, in this case, we can not call kset_put() in kset_register(),
> or it will cause double free, so just call kfree_const() to free the
> name and set it to NULL to avoid accessing bad pointer in callers.
> 
> With this fix, the callers don't need care about freeing the name
> and may call kset_put() if kset_register() fails.
> 
> Suggested-by: Luben Tuikov 
> Signed-off-by: Yang Yingliang 
> ---
> v2 -> v3:
>   Update commit message and comment of kset_register().
> 
> v1 -> v2:
>   Free name inside of kset_register() instead of calling kset_put()
>   in drivers.

Thank you for all of this, it's a much nicer and cleaner fix than
forcing all callers to try to handle it instead.

greg k-h


Re: [linux-next:master] BUILD REGRESSION 088b9c375534d905a4d337c78db3b3bfbb52c4a0

2022-07-07 Thread Greg KH
On Thu, Jul 07, 2022 at 02:56:34PM +0800, kernel test robot wrote:
> tree/branch: 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
> branch HEAD: 088b9c375534d905a4d337c78db3b3bfbb52c4a0  Add linux-next 
> specific files for 20220706
> 
> Error/Warning reports:
> 
> https://lore.kernel.org/linux-doc/202207070644.x48xoovs-...@intel.com
> 
> Error/Warning: (recently discovered and may have been fixed)
> 
> Documentation/arm/google/chromebook-boot-flow.rst: WARNING: document isn't 
> included in any toctree
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1108): undefined reference to 
> `__aeabi_ddiv'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1124): undefined reference to 
> `__aeabi_ui2d'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1164): undefined reference to 
> `__aeabi_dmul'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1170): undefined reference to 
> `__aeabi_dadd'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1180): undefined reference to 
> `__aeabi_dsub'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x1190): undefined reference to 
> `__aeabi_d2uiz'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x162c): undefined reference to 
> `__aeabi_d2iz'
> arm-linux-gnueabi-ld: dc_dmub_srv.c:(.text+0x16b0): undefined reference to 
> `__aeabi_i2d'
> dc_dmub_srv.c:(.text+0x10f8): undefined reference to `__aeabi_ui2d'
> dc_dmub_srv.c:(.text+0x464): undefined reference to `__floatunsidf'
> dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x33c): undefined 
> reference to `__floatunsidf'
> drivers/pci/endpoint/functions/pci-epf-vntb.c:975:5: warning: no previous 
> prototype for 'pci_read' [-Wmissing-prototypes]
> drivers/pci/endpoint/functions/pci-epf-vntb.c:984:5: warning: no previous 
> prototype for 'pci_write' [-Wmissing-prototypes]
> drivers/vfio/vfio_iommu_type1.c:2141:35: warning: cast to smaller integer 
> type 'enum iommu_cap' from 'void *' [-Wvoid-pointer-to-enum-cast]
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x34c): 
> undefined reference to `__floatunsidf'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x378): 
> undefined reference to `__divdf3'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x38c): 
> undefined reference to `__muldf3'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x3a0): 
> undefined reference to `__adddf3'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x3b4): 
> undefined reference to `__subdf3'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x3d4): 
> undefined reference to `__fixunsdfsi'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x750): 
> undefined reference to `__fixdfsi'
> mips-linux-ld: dc_dmub_srv.c:(.text.dc_dmub_setup_subvp_dmub_command+0x7c0): 
> undefined reference to `__floatsidf'
> powerpc-linux-ld: drivers/pci/endpoint/functions/pci-epf-vntb.c:174: 
> undefined reference to `ntb_link_event'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x468): undefined reference to 
> `__divdf3'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x46c): undefined reference to 
> `__muldf3'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x470): undefined reference to 
> `__adddf3'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x474): undefined reference to 
> `__subdf3'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x478): undefined reference to 
> `__fixunsdfsi'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x47c): undefined reference to 
> `__fixdfsi'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x480): undefined reference to 
> `__floatsidf'
> xtensa-linux-ld: dc_dmub_srv.c:(.text+0x60c): undefined reference to 
> `__floatunsidf'
> 
> Unverified Error/Warning (likely false positive, please contact us if 
> interested):
> 
> arch/x86/events/core.c:2114 init_hw_perf_events() warn: missing error code 
> 'err'
> drivers/android/binder.c:1481:19-23: ERROR: from is NULL but dereferenced.
> drivers/android/binder.c:2920:29-33: ERROR: target_thread is NULL but 
> dereferenced.
> drivers/android/binder.c:353:25-35: ERROR: node -> proc is NULL but 
> dereferenced.
> drivers/android/binder.c:4888:16-20: ERROR: t is NULL but dereferenced.
> drivers/base/regmap/regmap.c:1996:1: internal compiler error: in arc_ifcvt, 
> at config/arc/arc.c:9637
> drivers/char/random.c:869:1: internal compiler error: in arc_ifcvt, at 
> config/arc/arc.c:9637
> drivers/firmware/arm_scmi/clock.c:394:1: internal compiler error: in 
> arc_ifcvt, at config/arc/arc.c:9637
> drivers/firmware/arm_scmi/powercap.c:376:1: internal compiler error: in 
> arc_ifcvt, at config/arc/arc.c:9637
> drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/vega10_powertune.c:1214:1: 
> internal compiler error: in arc_ifcvt, at config/arc/arc.c:9637
> drivers/gpu/drm/amd/display/dc/os_types.h: drm/drm_print.h is included more 
> than once.
> drivers/gpu/drm/bridge/ite-it66121.c:1398:1: internal compiler error: in 
> arc_ifcvt, at config/arc/arc.c:9637
> drivers/greybus/ope

Re: [PATCH] drm/amdgpu: fix check in fbdev init

2022-08-10 Thread Greg KH
On Wed, Aug 10, 2022 at 11:28:18AM -0400, Alex Deucher wrote:
> On Tue, Jul 19, 2022 at 2:57 PM Alex Deucher  
> wrote:
> >
> > The new vkms virtual display code is atomic so there is
> > no need to call drm_helper_disable_unused_functions()
> > when it is enabled.  Doing so can result in a segfault.
> > When the driver switched from the old virtual display code
> > to the new atomic virtual display code, it was missed that
> > we enable virtual display unconditionally under SR-IOV
> > so the checks here missed that case.  Add the missing
> > check for SR-IOV.
> >
> > There is no equivalent of this patch for Linus' tree
> > because the relevant code no longer exists.  This patch
> > is only relevant to kernels 5.15 and 5.16.
> >
> > Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
> > Signed-off-by: Alex Deucher 
> > Cc: sta...@vger.kernel.org # 5.15.x
> > Cc: hgof...@amazon.com
> 
> Hi Greg,
> 
> Is there any chance this can get applied?  It fixes a regression on
> 5.15 and 5.16.

Ah, missed this as it was not obvious that this was a not-upstream
commit at all, sorry.

I'll dig it out of lore.kernel.org and queue it up for the next round of
releases, but note, this is our "busy time" with many patches marked for
stable.

Oh and 5.16 is long end-of-life, nothing anyone can do there, and no one
should be using that kernel version anymore, so no issues there.

thanks,

greg k-h


Re: [PATCH v4 00/41] DYNDBG: opt-in class'd debug for modules, use in drm.

2022-08-12 Thread Greg KH
On Thu, Aug 11, 2022 at 06:52:40PM +0200, Daniel Vetter wrote:
> On Wed, Aug 03, 2022 at 04:13:05PM -0400, Jason Baron wrote:
> > 
> > 
> > On 8/3/22 15:56, jim.cro...@gmail.com wrote:
> > > On Wed, Jul 20, 2022 at 9:32 AM Jim Cromie  wrote:
> > >>
> > > 
> > >> Hi Jason, Greg, DRM-folk,
> > >>
> > >> This adds 'typed' "class FOO" support to dynamic-debug, where 'typed'
> > >> means either DISJOINT (like drm debug categories), or VERBOSE (like
> > >> nouveau debug-levels).  Use it in DRM modules: core, helpers, and in
> > >> drivers i915, amdgpu, nouveau.
> > >>
> > > 
> > > This revision fell over, on a conflict with something in drm-MUMBLE
> > > 
> > > Error: patch 
> > > https://urldefense.com/v3/__https://patchwork.freedesktop.org/api/1.0/series/106427/revisions/2/mbox/__;!!GjvTz_vk!UCPl5Uf32cDVwwysMTfaLwoGLWomargFXuR8HjBA3xsUOjxXHXC5hneAkP4iWK91yc-LjjJxWW89-51Z$
> > >  
> > > not applied
> > > Applying: dyndbg: fix static_branch manipulation
> > > Applying: dyndbg: fix module.dyndbg handling
> > > Applying: dyndbg: show both old and new in change-info
> > > Applying: dyndbg: reverse module walk in cat control
> > > Applying: dyndbg: reverse module.callsite walk in cat control
> > > Applying: dyndbg: use ESCAPE_SPACE for cat control
> > > Applying: dyndbg: let query-modname override actual module name
> > > Applying: dyndbg: add test_dynamic_debug module
> > > Applying: dyndbg: drop EXPORTed dynamic_debug_exec_queries
> > > 
> > > Jason,
> > > those above are decent maintenance patches, particularly the drop export.
> > > It would be nice to trim this unused api this cycle.
> > 
> > Hi Jim,
> > 
> > Agreed - I was thinking the same thing. Feel free to add
> > Acked-by: Jason Baron  to those first 9.
> 
> Does Greg KH usually pick up dyndbg patches or someone else or do I need
> to do something? Would be great to get some movement here since -rc1 goes
> out and merging will restart next week.

Yes, I can take these into my tree after -rc1 is out.

thanks,

greg k-h


Re: [PATCH] drm/amdgpu: fix check in fbdev init

2022-08-13 Thread Greg KH
On Wed, Aug 10, 2022 at 11:39:39AM -0400, Alex Deucher wrote:
> On Wed, Aug 10, 2022 at 11:38 AM Greg KH  wrote:
> >
> > On Wed, Aug 10, 2022 at 11:28:18AM -0400, Alex Deucher wrote:
> > > On Tue, Jul 19, 2022 at 2:57 PM Alex Deucher  
> > > wrote:
> > > >
> > > > The new vkms virtual display code is atomic so there is
> > > > no need to call drm_helper_disable_unused_functions()
> > > > when it is enabled.  Doing so can result in a segfault.
> > > > When the driver switched from the old virtual display code
> > > > to the new atomic virtual display code, it was missed that
> > > > we enable virtual display unconditionally under SR-IOV
> > > > so the checks here missed that case.  Add the missing
> > > > check for SR-IOV.
> > > >
> > > > There is no equivalent of this patch for Linus' tree
> > > > because the relevant code no longer exists.  This patch
> > > > is only relevant to kernels 5.15 and 5.16.
> > > >
> > > > Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
> > > > Signed-off-by: Alex Deucher 
> > > > Cc: sta...@vger.kernel.org # 5.15.x
> > > > Cc: hgof...@amazon.com
> > >
> > > Hi Greg,
> > >
> > > Is there any chance this can get applied?  It fixes a regression on
> > > 5.15 and 5.16.
> >
> > Ah, missed this as it was not obvious that this was a not-upstream
> > commit at all, sorry.
> >
> > I'll dig it out of lore.kernel.org and queue it up for the next round of
> > releases, but note, this is our "busy time" with many patches marked for
> > stable.
> >
> > Oh and 5.16 is long end-of-life, nothing anyone can do there, and no one
> > should be using that kernel version anymore, so no issues there.
> 
> Thanks Greg.  Much appreciated.

Sorry for the delay, now queued up.

greg k-h


Re: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in vcn_v4_0_stop

2022-08-15 Thread Greg KH
On Mon, Aug 15, 2022 at 09:11:18PM +0600, Khalid Masum wrote:
> On 8/15/22 20:15, Dong, Ruijing wrote:
> > [AMD Official Use Only - General]
> > 
> > Sorry, which "r" value was overwritten?  I didn't see the point of making 
> > this change.
> > 
> > Thanks
> > Ruijing
> > 
> > -Original Message-
> > From: Khalid Masum 
> > Sent: Monday, August 15, 2022 3:01 AM
> > To: amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; 
> > linux-ker...@vger.kernel.org; linux-kernel-ment...@lists.linuxfoundation.org
> > Cc: Deucher, Alexander ; Koenig, Christian 
> > ; Pan, Xinhui ; David Airlie 
> > ; Daniel Vetter ; Zhu, James 
> > ; Jiang, Sonny ; Dong, Ruijing 
> > ; Wan Jiabing ; Liu, Leo 
> > ; Khalid Masum 
> > Subject: [PATCH linux-next] drm/amdgpu/vcn: Remove unused assignment in 
> > vcn_v4_0_stop
> > 
> > The value assigned from vcn_v4_0_stop_dbg_mode to r is overwritten before 
> > it can be used. Remove this assignment.
> > 
> > Addresses-Coverity: 1504988 ("Unused value")
> > Fixes: 8da1170a16e4 ("drm/amdgpu: add VCN4 ip block support")
> > Signed-off-by: Khalid Masum 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > index ca14c3ef742e..80b8a2c66b36 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> > @@ -1154,7 +1154,7 @@ static int vcn_v4_0_stop(struct amdgpu_device *adev)
> >  fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
> > 
> >  if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
> > -   r = vcn_v4_0_stop_dpg_mode(adev, i);
> > +   vcn_v4_0_stop_dpg_mode(adev, i);
> >  continue;
> >  }
> > 
> > --
> > 2.37.1
> > 
> 
> After value is overwritten soon right after the diff.
> 
> See:
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> 
> static int vcn_v4_0_stop(struct amdgpu_device *adev)
> {
> volatile struct amdgpu_vcn4_fw_shared *fw_shared;
> ...
> 
> for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> fw_shared = adev->vcn.inst[i].fw_shared.cpu_addr;
> fw_shared->sq.queue_mode |= FW_QUEUE_DPG_HOLD_OFF;
> 
> if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) {
> r = vcn_v4_0_stop_dpg_mode(adev, i);
> continue;
> }
> 
> /* wait for vcn idle */
> r = SOC15_WAIT_ON_RREG(VCN, i, regUVD_STATUS,
> UVD_STATUS__IDLE, 0x7);
> 
> Here, any value assigned to r is overwritten before it could
> be used. So the assignment in the true branch of the if statement
> here can be removed.

Why not fix vcn_v4_0_stop_dpg_mode() to not return anything, as it does
not, and then remove this assignment as well, which would fix up
everything at once to be more obvious what is happening and why.

thanks,

greg k-h


Re: [PATCH 1/7] drm: mark drm.debug-on-dyndbg as BROKEN for now

2022-11-17 Thread Greg KH
On Fri, Nov 11, 2022 at 03:17:09PM -0700, Jim Cromie wrote:
> drm.debug-on-dyndbg has a regression, due to a chicken-egg
> initialization problem:
> 
> 1- modprobe i915
>i915 needs drm.ko, which is loaded 1st
> 
> 2- "modprobe drm drm.debug=0x1ff" (virtual/implied)
>drm.debug is set post-initialization, from boot-args etc
> 
> 3- `modprobe i915` finishes
> 
> W/O drm.debug-on-dyndbg that just works, because all drm_dbg*
> callsites use drm_debug_enabled() to check __drm_debug & DEM_UT_
> before printing.
> 
> But the whole point of drm.debug-on-dyndbg is to avoid that runtime
> test, by enabling (at post-modinit) a static-key at each callsite in
> the just-loaded module.
> 
> And since drm.ko is loaded before all dependent modules, none are
> "just-loaded", and no drm.debug callsites are present yet, except
> those in drm.ko itself.
> 
> Signed-off-by: Jim Cromie 
> ---
>  drivers/gpu/drm/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 34f5a092c99e..0d1e59e6bb7e 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -54,6 +54,7 @@ config DRM_DEBUG_MM
>  config DRM_USE_DYNAMIC_DEBUG
>   bool "use dynamic debug to implement drm.debug"
>   default y
> + depends on BROKEN   # chicken-egg initial enable problem
>   depends on DRM
>   depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE
>   depends on JUMP_LABEL
> -- 
> 2.38.1

This should go through the drm tree now.  The rest probably should also
go that way and not through my tree as well.

thanks,

greg k-h


Re: [PATCH 0/7] DYNAMIC_DEBUG fixups for rc

2022-11-17 Thread Greg KH
On Fri, Nov 11, 2022 at 03:17:08PM -0700, Jim Cromie wrote:
> hi Jason, Greg, DRM-folk,
> 
> drm.debug-on-dyndbg has a regression due to a chicken-&-egg problem;
> drm.debug is applied to enable dyndbg callsites before the dependent
> modules' callsites are available to be enabled.
> 
> My "fixes" are unready, so lets just mark it BROKEN for now.
> 
> Meanwhile, heres some other fixes, a comment tweak, a proof of
> non-bug, an internal simplification, and a cleanup/improvement to the
> main macro (API):
> 
> Split DECLARE_DYNDBG_CLASSMAP in 1/2; REFERENCE_DYNDBG_CLASSMAP now
> refers to a classmap DECLARE'd just once.  I think this gives a path
> away from the coordination-by-identical-classmaps "feature" that Jani
> and others thought was "weird" (my term).
> 

Acked-by: Greg Kroah-Hartman 


Re: [PATCH 4.19 0/1] drm/amdkfd: Check for null pointer after calling kmemdup

2023-01-04 Thread Greg KH
On Tue, Jan 03, 2023 at 08:43:07PM +0200, Dragos-Marian Panait wrote:
> The following commit is needed to fix CVE-2022-3108:

That's a funny cve, given that you can not ever trigger it in a system,
right?  Why was a CVE allocated for that?

{sigh}



Re: [PATCH 4.19 1/1] drm/amdkfd: Check for null pointer after calling kmemdup

2023-01-04 Thread Greg KH
On Tue, Jan 03, 2023 at 08:43:08PM +0200, Dragos-Marian Panait wrote:
> From: Jiasheng Jiang 
> 
> [ Upstream commit abfaf0eee97925905e742aa3b0b72e04a918fa9e ]
> 
> As the possible failure of the allocation, kmemdup() may return NULL
> pointer.
> Therefore, it should be better to check the 'props2' in order to prevent
> the dereference of NULL pointer.
> 
> Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
> Signed-off-by: Jiasheng Jiang 
> Reviewed-by: Felix Kuehling 
> Signed-off-by: Felix Kuehling 
> Signed-off-by: Alex Deucher 
> Signed-off-by: Dragos-Marian Panait 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 +++
>  1 file changed, 3 insertions(+)

For obvious reasons, I can't take a patch for 4.19.y and not newer
kernel releases, right?

Please provide backports for all kernels if you really need to see this
merged.  And note, it's not a real bug at all, and given that a CVE was
allocated for it that makes me want to even more reject it to show the
whole folly of that mess.

thanks,

greg k-h


Re: [PATCH 4.19 1/1] drm/amdkfd: Check for null pointer after calling kmemdup

2023-01-04 Thread Greg KH
On Wed, Jan 04, 2023 at 09:35:03AM -0500, Alex Deucher wrote:
> On Wed, Jan 4, 2023 at 8:23 AM Christian König  
> wrote:
> >
> > Am 04.01.23 um 13:41 schrieb Greg KH:
> > > On Tue, Jan 03, 2023 at 08:43:08PM +0200, Dragos-Marian Panait wrote:
> > >> From: Jiasheng Jiang 
> > >>
> > >> [ Upstream commit abfaf0eee97925905e742aa3b0b72e04a918fa9e ]
> > >>
> > >> As the possible failure of the allocation, kmemdup() may return NULL
> > >> pointer.
> > >> Therefore, it should be better to check the 'props2' in order to prevent
> > >> the dereference of NULL pointer.
> > >>
> > >> Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
> > >> Signed-off-by: Jiasheng Jiang 
> > >> Reviewed-by: Felix Kuehling 
> > >> Signed-off-by: Felix Kuehling 
> > >> Signed-off-by: Alex Deucher 
> > >> Signed-off-by: Dragos-Marian Panait 
> > >> ---
> > >>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 +++
> > >>   1 file changed, 3 insertions(+)
> > > For obvious reasons, I can't take a patch for 4.19.y and not newer
> > > kernel releases, right?
> > >
> > > Please provide backports for all kernels if you really need to see this
> > > merged.  And note, it's not a real bug at all, and given that a CVE was
> > > allocated for it that makes me want to even more reject it to show the
> > > whole folly of that mess.
> >
> > Well as far as I can see this is nonsense to back port.
> >
> > The code in question is only used only once during driver load and then
> > never again, that exactly this allocation fails while tons of other are
> > made before and after is extremely unlikely.
> >
> > It's nice to have it fixed in newer kernels, but not worth a backport
> > and certainly not stuff for a CVE.
> 
> It's already fixed in Linus' tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abfaf0eee97925905e742aa3b0b72e04a918fa9e

Yes, that's what the above commit shows...

confused,

greg k-h


  1   2   >