Re: [Intel-gfx] [PATCH 2/2] hda/i915: split the wait for the component binding
Hi,

On Thu, 24 Feb 2022, Ramalingam C wrote:

> Split the wait for component binding from i915 in multiples of
> sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
> thread hung detection given below.

While I understand the problem, I'm not sure whether a simpler option
should be chosen. Maybe just split the wait_for_completion_timeout()
into small 5 sec iterations, without consulting the value of
hung_task_timeout. This would seem aligned with more mainstream use
of wait_for_completion_timeout() in the kernel and still do the job.

I'll loop in Takashi here as well. Basically the 60 sec timeout in
hda/hdac_i915.c is getting caught by the hung_task_detection logic in
builds where the hung_task_timeout is below 60 secs.

I have a patch that tries to avoid hitting the timeout in some of the
more common cases:
"ALSA: hda/i915 - skip acomp init if no matching display"
https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278.html

... but we'll still be stuck with some configurations where the timeout
will be hit. And the above needs careful testing.

One logic comment below as well, but I'll quote the whole patch to give
context to Takashi.

> <3>[ 60.946316] INFO: task kworker/11:1:104 blocked for more than 30 seconds.
> <3>[ 60.946479] Tainted: GW 5.17.0-rc5-CI-CI_DRM_11265+ #1
> <3>[ 60.946580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> <6>[ 60.946688] task:kworker/11:1state:D stack:14192 pid: 104 ppid: 2 flags:0x4000
> <6>[ 60.946713] Workqueue: events azx_probe_work [snd_hda_intel]
> <6>[ 60.946740] Call Trace:
> <6>[ 60.946745]
> <6>[ 60.946763] __schedule+0x42c/0xa80
> <6>[ 60.946797] schedule+0x3f/0xc0
> <6>[ 60.946811] schedule_timeout+0x1be/0x2e0
> <6>[ 60.946829] ? del_timer_sync+0xb0/0xb0
> <6>[ 60.946849] ? 0x8100
> <6>[ 60.946864] ? wait_for_completion_timeout+0x79/0x120
> <6>[ 60.946879] wait_for_completion_timeout+0xab/0x120
> <6>[ 60.946906] snd_hdac_i915_init+0xa5/0xb0 [snd_hda_core]
> <6>[ 60.946943] azx_probe_work+0x71/0x84c [snd_hda_intel]
> <6>[ 60.946974] process_one_work+0x275/0x5c0
> <6>[ 60.947010] worker_thread+0x37/0x370
> <6>[ 60.947028] ? process_one_work+0x5c0/0x5c0
> <6>[ 60.947038] kthread+0xef/0x120
> <6>[ 60.947047] ? kthread_complete_and_exit+0x20/0x20
> <6>[ 60.947065] ret_from_fork+0x22/0x30
> <6>[ 60.947110]
>
> Signed-off-by: Ramalingam C
> cc: Kai Vehmanen
> cc: Lucas De Marchi
> ---
>  sound/hda/hdac_i915.c | 17 ++--
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
> index d20a450a9a15..daaeebc5099e 100644
> --- a/sound/hda/hdac_i915.c
> +++ b/sound/hda/hdac_i915.c
> @@ -6,6 +6,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -163,7 +164,8 @@ static bool dg1_gfx_present(void)
>  int snd_hdac_i915_init(struct hdac_bus *bus)
>  {
>  	struct drm_audio_component *acomp;
> -	int err;
> +	unsigned long timeout, ret = 0;
> +	int err, i, itr_cnt;
>
>  	if (!i915_gfx_present())
>  		return -ENODEV;
> @@ -182,9 +184,18 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
>  	if (!acomp->ops) {
>  		if (!IS_ENABLED(CONFIG_MODULES) ||
>  		    !request_module("i915")) {
> +			if (!sysctl_hung_task_timeout_secs) {
> +				itr_cnt = 1;
> +				timeout = msecs_to_jiffies(60 * 1000);
> +			} else {
> +				itr_cnt = DIV_ROUND_UP(60, sysctl_hung_task_timeout_secs);
> +				timeout = msecs_to_jiffies(sysctl_hung_task_timeout_secs * 1000);
> +			}
> +
>  			/* 60s timeout */
> -			wait_for_completion_timeout(&acomp->master_bind_complete,
> -						    msecs_to_jiffies(60 * 1000));
> +			for (i = 0; i < itr_cnt || !ret; i++)
> +				ret = wait_for_completion_timeout(&acomp->master_bind_complete,
> +								  timeout);

I think that should be 'i < itr_cnt && !ret'. If
wait_for_completion_timeout() returns a positive value, we should stop
waiting, as the completion has been signalled.

>  		}
>  	}
>  	if (!acomp->ops) {
> --

Br, Kai
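
Note on the loop condition discussed above: the following is a minimal user-space sketch, not kernel code. It assumes a hypothetical stand-in (toy_wait_timeout) that mimics the return convention of wait_for_completion_timeout(): 0 on timeout, a positive value once the completion has been signalled. All toy_* names are invented for illustration. With '&&' the loop stops either when the total budget is used up or as soon as the completion arrives; with '||' it would keep iterating while either condition holds, so a completion that never arrives would never end the loop.

```c
#include <assert.h>

#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical model of a completion; not the kernel's struct completion. */
struct toy_completion {
	unsigned int done_after;    /* slice on which it is signalled, 0 = never */
	unsigned int slices_waited; /* number of slices waited so far */
};

/*
 * Stand-in for wait_for_completion_timeout(): returns 0 on timeout and
 * a positive value if the completion was signalled within this slice.
 */
static unsigned long toy_wait_timeout(struct toy_completion *c,
				      unsigned long timeout)
{
	assert(timeout > 0);
	c->slices_waited++;
	if (c->done_after && c->slices_waited >= c->done_after)
		return 1;
	return 0;
}

/*
 * Wait up to total_secs, in slices no longer than hung_task_secs.
 * A hung_task_secs of 0 means the detector is off: use one long wait.
 */
static unsigned long toy_chunked_wait(struct toy_completion *c,
				      unsigned int total_secs,
				      unsigned int hung_task_secs)
{
	unsigned long timeout, ret = 0;
	unsigned int i, itr_cnt;

	if (!hung_task_secs) {
		itr_cnt = 1;
		timeout = total_secs;
	} else {
		itr_cnt = TOY_DIV_ROUND_UP(total_secs, hung_task_secs);
		timeout = hung_task_secs;
	}

	/* Stop on budget exhaustion OR on the first signalled completion. */
	for (i = 0; i < itr_cnt && !ret; i++)
		ret = toy_wait_timeout(c, timeout);

	return ret;
}
```

In this model, a 60 sec budget with a 30 sec hung-task limit waits at most two slices, and a completion signalled during the first slice ends the wait immediately.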
[PATCH v3] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.

This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:

component_master_add_with_match()
[ 126.886032] snd_hda_intel :00:1f.3: DEVRES ADD 323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel :00:1f.3: DEVRES ADD 865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 pcim_iomap_release (48 bytes)

component_del() called at DRM/i915 unbind()
[ 137.579422] i915 :00:02.0: DEVRES REL ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel :00:1f.3: DEVRES REL 865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel :00:1f.3: DEVRES REL 1b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")

Cc: sta...@vger.kernel.org
BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
Signed-off-by: Kai Vehmanen
Acked-by: Imre Deak
Acked-by: Russell King (Oracle)
---
 drivers/base/component.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

V3 changes:
 - address feedback from Greg KH, add a Fixes tag and cc stable
V2 changes:
 - after review from Imre and Russell, removing RFC tag
 - rebased on top of 5.15-rc2 (V1 was on drm-tip)
 - CI test results for V1 show that this patch fixes multiple
   failures in i915 unbind and module reload tests:
   https://patchwork.freedesktop.org/series/94889/

diff --git a/drivers/base/component.c b/drivers/base/component.c
index 5e79299f6c3f..870485cbbb87 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
 		return 0;
 	}
 
-	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+	if (!devres_open_group(master->parent, master, GFP_KERNEL))
 		return -ENOMEM;
 
 	/* Found all components */
@@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
 		return ret;
 	}
 
+	devres_close_group(master->parent, NULL);
 	master->bound = true;
 	return 1;
 }
@@ -282,7 +283,7 @@ static void take_down_master(struct master *master)
 {
 	if (master->bound) {
 		master->ops->unbind(master->parent);
-		devres_release_group(master->parent, NULL);
+		devres_release_group(master->parent, master);
 		master->bound = false;
 	}
 }

base-commit: 9e1ff307c779ce1f0f810c7ecce3d95bbae40896
--
2.33.0
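
Illustration of the failure mode described in this patch: the sketch below is a simplified user-space model, not the kernel devres implementation. It assumes, as the commit message describes, that releasing a group frees everything added between the group's open marker and its close marker, or everything up to the end of the list if the group was never closed. All toy_* names are hypothetical, and nested groups are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 16

enum toy_kind { TOY_GRP_OPEN, TOY_GRP_CLOSE, TOY_RES };

struct toy_node {
	enum toy_kind kind;
	const void *id;   /* group id for markers, NULL for resources */
	bool released;
};

/* Hypothetical per-device list of managed entries, in insertion order. */
struct toy_dev {
	struct toy_node nodes[TOY_MAX];
	size_t count;
};

static void toy_add(struct toy_dev *d, enum toy_kind kind, const void *id)
{
	assert(d->count < TOY_MAX);
	d->nodes[d->count].kind = kind;
	d->nodes[d->count].id = id;
	d->nodes[d->count].released = false;
	d->count++;
}

/*
 * Release the group identified by id: everything after its open marker
 * up to its close marker, or up to the end of the list if the group is
 * still open. Returns the number of resources released.
 */
static size_t toy_release_group(struct toy_dev *d, const void *id)
{
	size_t i, released = 0;
	bool inside = false;

	for (i = 0; i < d->count; i++) {
		struct toy_node *n = &d->nodes[i];

		if (!inside) {
			if (n->kind == TOY_GRP_OPEN && n->id == id)
				inside = true;
			continue;
		}
		if (n->kind == TOY_GRP_CLOSE && n->id == id)
			break;
		n->released = true;
		if (n->kind == TOY_RES)
			released++;
	}
	return released;
}

/* Returns how many resources go away when the master's group is released. */
static size_t toy_scenario(bool close_after_bind)
{
	static const char master_id = 0; /* only the address is used, as an id */
	struct toy_dev d = { .count = 0 };

	toy_add(&d, TOY_GRP_OPEN, &master_id);  /* try_to_bring_up_master() */
	toy_add(&d, TOY_RES, NULL);             /* allocated during bind */
	if (close_after_bind)
		toy_add(&d, TOY_GRP_CLOSE, &master_id);
	toy_add(&d, TOY_RES, NULL);             /* later probe() allocation */

	return toy_release_group(&d, &master_id);
}
```

In this model, toy_scenario(false) releases both resources, including the later probe-time allocation (the pcim_iomap case in the log above), while toy_scenario(true) releases only the resource allocated during bind.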
Re: [PATCH v2] component: do not leave master devres group open after bind
Hi,

On Tue, 5 Oct 2021, Greg KH wrote:

> On Wed, Sep 22, 2021 at 11:54:32AM +0300, Kai Vehmanen wrote:
> > In current code, the devres group for aggregate master is left open
> > after call to component_master_add_*(). This leads to problems when the
> > master does further managed allocations on its own. When any
> > participating driver calls component_del(), this leads to immediate
> > release of resources.
[...]
> > the devres group, and by closing the devres group after
> > the master->ops->bind() call is done. This allows devres allocations
> > done by the driver acting as master to be isolated from the binding state
> > of the aggregate driver. This modifies the logic originally introduced in
> > commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
> >
> > BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
> > Signed-off-by: Kai Vehmanen
> > Acked-by: Imre Deak
> > Acked-by: Russell King (Oracle)
>
> What commit does this "fix:"? And does it need to go to stable
> kernel(s)?

I didn't put a "Fixes" on the original commit 9e1ccb4a7700
("drivers/base: fix devres handling for master device") as it alone
didn't cause problems. It did open the door for possible devres issues
for anybody calling component_master_add_*().

On the audio side, this surfaced with the more recent commit 3fcaf24e5dce
("ALSA: hda: Allocate resources with device-managed APIs"). In theory one
could have hit issues already before, but this made it very easy to hit
on actual systems.

If I had to pick one, it would be 9e1ccb4a7700 ("drivers/base: fix
devres handling for master device"). And yes, given the comments on this
thread, I'd say this needs to go to stable kernels.

Br, Kai
Re: [PATCH v2] component: do not leave master devres group open after bind
Hey,

On Tue, 28 Sep 2021, Takashi Iwai wrote:

> On Wed, 22 Sep 2021 10:54:32 +0200, Kai Vehmanen wrote:
> > --- a/drivers/base/component.c
> > +++ b/drivers/base/component.c
> > @@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
> >  		return 0;
> >  	}
> >
> > -	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
> > +	if (!devres_open_group(master->parent, master, GFP_KERNEL))
> >  		return -ENOMEM;
> >
> >  	/* Found all components */
> > @@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
> >  		return ret;
> >  	}
> >
> > +	devres_close_group(master->parent, NULL);
>
> Just wondering whether we should pass master here instead of NULL,
> too?

I wondered about this as well. Functionally it should be equivalent, as
passing NULL will apply the operation to the latest added group. I noted
the practice of passing NULL has been followed in the existing code when
referring to groups created within the same function. E.g.

	if (!devres_open_group(component->dev, component, GFP_KERNEL)) {
[...]
	ret = component->ops->bind(component->dev, master->parent, data);
	if (!ret) {
		component->bound = true;

		/*
		 * Close the component device's group so that resources
		 * allocated in the binding are encapsulated for removal
		 * at unbind. Remove the group on the DRM device as we
		 * can clean those resources up independently.
		 */
		devres_close_group(component->dev, NULL);

... so I followed this existing practice. I can change and send a V3 if
the explicit parameter is preferred.

Br, Kai
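
Note on the NULL-versus-explicit id question above: the following user-space sketch models the group lookup rule as I understand it from drivers/base/devres.c. That rule is an assumption here, not something quoted from the thread: a NULL id selects the most recently added group that is still open, while a non-NULL id selects the most recently added group with that id. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_GROUPS 8

struct toy_group {
	const void *id;
	bool closed;
};

/* Hypothetical per-device list of groups, oldest first. */
struct toy_groups {
	struct toy_group g[TOY_MAX_GROUPS];
	size_t count;
};

static struct toy_group *toy_open_group(struct toy_groups *s, const void *id)
{
	struct toy_group *grp;

	assert(s->count < TOY_MAX_GROUPS);
	grp = &s->g[s->count++];
	grp->id = id ? id : grp; /* a NULL id makes the group its own id */
	grp->closed = false;
	return grp;
}

/* Search newest to oldest, as described in the lead-in. */
static struct toy_group *toy_find_group(struct toy_groups *s, const void *id)
{
	size_t i;

	for (i = s->count; i-- > 0; ) {
		struct toy_group *grp = &s->g[i];

		if (id) {
			if (grp->id == id)
				return grp;
		} else if (!grp->closed) {
			return grp;
		}
	}
	return NULL;
}

static struct toy_group *toy_close_group(struct toy_groups *s, const void *id)
{
	struct toy_group *grp = toy_find_group(s, id);

	if (grp)
		grp->closed = true;
	return grp;
}
```

Under these assumptions, closing with NULL right after opening the master's group hits the same group as closing with the explicit id. The two would only differ if another group had been opened on the same device in between and left open.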
[PATCH v2] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.

This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:

component_master_add_with_match()
[ 126.886032] snd_hda_intel :00:1f.3: DEVRES ADD 323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel :00:1f.3: DEVRES ADD 865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 pcim_iomap_release (48 bytes)

component_del() called at DRM/i915 unbind()
[ 137.579422] i915 :00:02.0: DEVRES REL ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel :00:1f.3: DEVRES REL 865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel :00:1f.3: DEVRES REL 1b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")

BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Signed-off-by: Kai Vehmanen
Acked-by: Imre Deak
Acked-by: Russell King (Oracle)
---
 drivers/base/component.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

V2 changes:
 - after review from Imre and Russell, removing RFC tag
 - rebased on top of 5.15-rc2 (V1 was on drm-tip)
 - CI test results for V1 show that this patch fixes multiple
   failures in i915 unbind and module reload tests:
   https://patchwork.freedesktop.org/series/94889/

diff --git a/drivers/base/component.c b/drivers/base/component.c
index 5e79299f6c3f..870485cbbb87 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
 		return 0;
 	}
 
-	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+	if (!devres_open_group(master->parent, master, GFP_KERNEL))
 		return -ENOMEM;
 
 	/* Found all components */
@@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
 		return ret;
 	}
 
+	devres_close_group(master->parent, NULL);
 	master->bound = true;
 	return 1;
 }
@@ -282,7 +283,7 @@ static void take_down_master(struct master *master)
 {
 	if (master->bound) {
 		master->ops->unbind(master->parent);
-		devres_release_group(master->parent, NULL);
+		devres_release_group(master->parent, master);
 		master->bound = false;
 	}
 }

base-commit: e4e737bb5c170df6135a127739a9e6148ee3da82
--
2.32.0
[RFC PATCH] component: do not leave master devres group open after bind
In current code, the devres group for aggregate master is left open
after call to component_master_add_*(). This leads to problems when the
master does further managed allocations on its own. When any
participating driver calls component_del(), this leads to immediate
release of resources.

This came up when investigating a page fault occurring with i915 DRM
driver unbind with 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with audio driver
moving to use managed interfaces for more of its allocations, this no
longer works. Devres log shows following to occur:

component_master_add_with_match()
[ 126.886032] snd_hda_intel :00:1f.3: DEVRES ADD 323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel :00:1f.3: DEVRES ADD 865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel :00:1f.3: DEVRES ADD 1b480725 pcim_iomap_release (48 bytes)

component_del() called at DRM/i915 unbind()
[ 137.579422] i915 :00:02.0: DEVRES REL ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel :00:1f.3: DEVRES REL 865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel :00:1f.3: DEVRES REL 1b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after
the master->ops->bind() call is done. This allows devres allocations
done by the driver acting as master to be isolated from the binding state
of the aggregate driver. This modifies the logic originally introduced in
commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device").

BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Signed-off-by: Kai Vehmanen
---
 drivers/base/component.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Hi,
I'm sending this as RFC as I'm not sure what implications not leaving
the devres group open might have for other users of the component
framework.

For audio, the current behaviour seems very problematic. The display
codec is usually just one of many audio codecs attached to the
controller, and unbind of the display codec (and the aggregate driver
created with DRM) should not bring down the whole audio card.

However, now all allocations the audio driver does after the call to
component_master_add_with_match() will be freed when the display driver
calls component_del().

Closing the devres group at the end of component_master_add_*() would
seem the cleanest option. Looking for feedback on whether this approach
is feasible. One alternative would be for the audio driver to close the
"last opened" group after its call to component_master_add(), but this
seems messy (audio would make assumptions on component.c internals).

diff --git a/drivers/base/component.c b/drivers/base/component.c
index 5e79299f6c3f..870485cbbb87 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
 		return 0;
 	}
 
-	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+	if (!devres_open_group(master->parent, master, GFP_KERNEL))
 		return -ENOMEM;
 
 	/* Found all components */
@@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
 		return ret;
 	}
 
+	devres_close_group(master->parent, NULL);
 	master->bound = true;
 	return 1;
 }
@@ -282,7 +283,7 @@ static void take_down_master(struct master *master)
 {
 	if (master->bound) {
 		master->ops->unbind(master->parent);
-		devres_release_group(master->parent, NULL);
+		devres_release_group(master->parent, master);
 		master->bound = false;
 	}
 }

base-commit: 930e99a51fcc8b1254e0a45fbe0cd5a5b8a704a5
--
2.32.0
Re: [PATCH] drm/i915/audio: Use BIOS provided value for RKL HDA link
Hi,

On Mon, 6 Sep 2021, Kai-Heng Feng wrote:

> Commit 989634fb49ad ("drm/i915/audio: set HDA link parameters in
> driver") makes HDMI audio on Lenovo P350 disappear.
>
> So in addition to TGL, extend the logic to RKL to use BIOS provided
> value to fix the regression.

Thanks Kai-Heng! We were not aware of commercial RKL systems following the
old BIOS guidance, but given you just hit one, this definitely is
needed:

Reviewed-by: Kai Vehmanen

Br, Kai
[PATCH] ALSA: hda/i915 - fix list corruption with concurrent probes
From: Takashi Iwai

Current hdac_i915 uses a static completion instance to wait
for i915 driver to complete the component bind.

This design is not safe if multiple HDA controllers are active and
communicating with different i915 instances, and can lead to list
corruption and failed audio driver probe.

Fix the design by moving completion mechanism to common acomp
code and remove the related code from hdac_i915.

Co-developed-by: Kai Vehmanen
Signed-off-by: Kai Vehmanen
Signed-off-by: Takashi Iwai
---
 include/drm/drm_audio_component.h | 4
 sound/hda/hdac_component.c        | 3 +++
 sound/hda/hdac_i915.c             | 23 +++
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/include/drm/drm_audio_component.h b/include/drm/drm_audio_component.h
index a45f93487039..0d36bfd1a4cd 100644
--- a/include/drm/drm_audio_component.h
+++ b/include/drm/drm_audio_component.h
@@ -117,6 +117,10 @@ struct drm_audio_component {
 	 * @audio_ops: Ops implemented by hda driver, called by DRM driver
 	 */
 	const struct drm_audio_component_audio_ops *audio_ops;
+	/**
+	 * @master_bind_complete: completion held during component master binding
+	 */
+	struct completion master_bind_complete;
 };
 
 #endif /* _DRM_AUDIO_COMPONENT_H_ */
diff --git a/sound/hda/hdac_component.c b/sound/hda/hdac_component.c
index 89126c6fd216..bb37e7e0bd79 100644
--- a/sound/hda/hdac_component.c
+++ b/sound/hda/hdac_component.c
@@ -210,12 +210,14 @@ static int hdac_component_master_bind(struct device *dev)
 		goto module_put;
 	}
 
+	complete_all(&acomp->master_bind_complete);
 	return 0;
 
 module_put:
 	module_put(acomp->ops->owner);
 out_unbind:
 	component_unbind_all(dev, acomp);
+	complete_all(&acomp->master_bind_complete);
 
 	return ret;
 }
@@ -296,6 +298,7 @@ int snd_hdac_acomp_init(struct hdac_bus *bus,
 	if (!acomp)
 		return -ENOMEM;
 	acomp->audio_ops = aops;
+	init_completion(&acomp->master_bind_complete);
 	bus->audio_component = acomp;
 	devres_add(dev, acomp);
 
diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
index 5f0a1aa6ad84..454474ac5716 100644
--- a/sound/hda/hdac_i915.c
+++ b/sound/hda/hdac_i915.c
@@ -11,8 +11,6 @@
 #include
 #include
 
-static struct completion bind_complete;
-
 #define IS_HSW_CONTROLLER(pci) (((pci)->device == 0x0a0c) || \
 				((pci)->device == 0x0c0c) || \
 				((pci)->device == 0x0d0c) || \
@@ -130,19 +128,6 @@ static bool i915_gfx_present(void)
 	return pci_dev_present(ids);
 }
 
-static int i915_master_bind(struct device *dev,
-			    struct drm_audio_component *acomp)
-{
-	complete_all(&bind_complete);
-	/* clear audio_ops here as it was needed only for completion call */
-	acomp->audio_ops = NULL;
-	return 0;
-}
-
-static const struct drm_audio_component_audio_ops i915_init_ops = {
-	.master_bind = i915_master_bind
-};
-
 /**
  * snd_hdac_i915_init - Initialize i915 audio component
  * @bus: HDA core bus
@@ -163,9 +148,7 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
 	if (!i915_gfx_present())
 		return -ENODEV;
 
-	init_completion(&bind_complete);
-
-	err = snd_hdac_acomp_init(bus, &i915_init_ops,
+	err = snd_hdac_acomp_init(bus, NULL,
 				  i915_component_master_match,
 				  sizeof(struct i915_audio_component) - sizeof(*acomp));
 	if (err < 0)
@@ -177,8 +160,8 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
 		if (!IS_ENABLED(CONFIG_MODULES) ||
 		    !request_module("i915")) {
 			/* 60s timeout */
-			wait_for_completion_timeout(&bind_complete,
-						    msecs_to_jiffies(60 * 1000));
+			wait_for_completion_timeout(&acomp->master_bind_complete,
+						    msecs_to_jiffies(60 * 1000));
 		}
 	}
 	if (!acomp->ops) {
--
2.28.0