Re: [Intel-gfx] [PATCH 2/2] hda/i915: split the wait for the component binding

2022-02-25 Thread Takashi Iwai
On Thu, 24 Feb 2022 17:34:54 +0100,
Kai Vehmanen wrote:
> 
> Hi,
> 
> On Thu, 24 Feb 2022, Ramalingam C wrote:
> 
> > Split the wait for component binding from i915 in multiples of
> > sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
> > thread hung detection given below.
> 
> while I understand the problem, I'm not sure whether a simpler option
> should be chosen. Maybe just split the wait_for_completion_timeout()
> into small 5sec iterations, without consulting value of hung_task_timeout.
> This would seem unligned with more mainstream use of 
> wait_for_completion_timeout() in kernel and still do the job.
> 
> I'll loop in Takashi here as well. Basicly the 60 sec timeout in 
> hda/hdac_i915.c is getting caught by hung_task_detection logic in builds
> where the hung_task_timeout is below 60secs.
> 
> I have a patch that tries to avoid hitting the timeout in some of the more 
> common cases:
> "ALSA: hda/i915 - skip acomp init if no matching display"
> https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278.html
> ... but we'll still be stuck with some configurations where the timeout 
> will be hit. And above needs careful testing.
> 
> One logic comment below as well, but I'll quote the whole patch to give
> context to Takashi.

I agree with Kai, we can just make the wait_for_completion_timeout()
split in a loop independently from hung_task_timeout.  It's simpler,
less error-prone.


thanks,

Takashi


Re: [Intel-gfx] [PATCH 2/2] hda/i915: split the wait for the component binding

2022-02-24 Thread Kai Vehmanen
Hi,

On Thu, 24 Feb 2022, Ramalingam C wrote:

> Split the wait for component binding from i915 in multiples of
> sysctl_hung_task_timeout_secs. This helps to avoid the possible kworker
> thread hung detection given below.

while I understand the problem, I'm not sure whether a simpler option
should be chosen. Maybe just split the wait_for_completion_timeout()
into small 5sec iterations, without consulting value of hung_task_timeout.
This would seem unligned with more mainstream use of 
wait_for_completion_timeout() in kernel and still do the job.

I'll loop in Takashi here as well. Basicly the 60 sec timeout in 
hda/hdac_i915.c is getting caught by hung_task_detection logic in builds
where the hung_task_timeout is below 60secs.

I have a patch that tries to avoid hitting the timeout in some of the more 
common cases:
"ALSA: hda/i915 - skip acomp init if no matching display"
https://lists.freedesktop.org/archives/intel-gfx-trybot/2022-February/128278.html
... but we'll still be stuck with some configurations where the timeout 
will be hit. And above needs careful testing.

One logic comment below as well, but I'll quote the whole patch to give
context to Takashi.
 
> <3>[   60.946316] INFO: task kworker/11:1:104 blocked for more than 30
> seconds.
> <3>[   60.946479]   Tainted: GW
> 5.17.0-rc5-CI-CI_DRM_11265+ #1
> <3>[   60.946580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> <6>[   60.946688] task:kworker/11:1state:D stack:14192 pid:  104
> ppid: 2 flags:0x4000
> <6>[   60.946713] Workqueue: events azx_probe_work [snd_hda_intel]
> <6>[   60.946740] Call Trace:
> <6>[   60.946745]  
> <6>[   60.946763]  __schedule+0x42c/0xa80
> <6>[   60.946797]  schedule+0x3f/0xc0
> <6>[   60.946811]  schedule_timeout+0x1be/0x2e0
> <6>[   60.946829]  ? del_timer_sync+0xb0/0xb0
> <6>[   60.946849]  ? 0x8100
> <6>[   60.946864]  ? wait_for_completion_timeout+0x79/0x120
> <6>[   60.946879]  wait_for_completion_timeout+0xab/0x120
> <6>[   60.946906]  snd_hdac_i915_init+0xa5/0xb0 [snd_hda_core]
> <6>[   60.946943]  azx_probe_work+0x71/0x84c [snd_hda_intel]
> <6>[   60.946974]  process_one_work+0x275/0x5c0
> <6>[   60.947010]  worker_thread+0x37/0x370
> <6>[   60.947028]  ? process_one_work+0x5c0/0x5c0
> <6>[   60.947038]  kthread+0xef/0x120
> <6>[   60.947047]  ? kthread_complete_and_exit+0x20/0x20
> <6>[   60.947065]  ret_from_fork+0x22/0x30
> <6>[   60.947110]  
> 
> Signed-off-by: Ramalingam C 
> cc: Kai Vehmanen 
> cc: Lucas De Marchi 
> ---
>  sound/hda/hdac_i915.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c
> index d20a450a9a15..daaeebc5099e 100644
> --- a/sound/hda/hdac_i915.c
> +++ b/sound/hda/hdac_i915.c
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -163,7 +164,8 @@ static bool dg1_gfx_present(void)
>  int snd_hdac_i915_init(struct hdac_bus *bus)
>  {
>   struct drm_audio_component *acomp;
> - int err;
> + unsigned long timeout, ret = 0;
> + int err, i, itr_cnt;
>  
>   if (!i915_gfx_present())
>   return -ENODEV;
> @@ -182,9 +184,18 @@ int snd_hdac_i915_init(struct hdac_bus *bus)
>   if (!acomp->ops) {
>   if (!IS_ENABLED(CONFIG_MODULES) ||
>   !request_module("i915")) {
5~> +   if (!sysctl_hung_task_timeout_secs) {
> + itr_cnt = 1;
> + timeout = msecs_to_jiffies(60 * 1000);
> + } else {
> + itr_cnt = DIV_ROUND_UP(60, 
> sysctl_hung_task_timeout_secs);
> + timeout = 
> msecs_to_jiffies(sysctl_hung_task_timeout_secs * 1000);
> + }
> +
>   /* 60s timeout */
> - 
> wait_for_completion_timeout(>master_bind_complete,
> - msecs_to_jiffies(30 * 
> 1000));
> + for (i = 0; i < itr_cnt || !ret; i++)
> + ret = 
> wait_for_completion_timeout(>master_bind_complete,
> +   timeout);

I think that should be 'i < itr_cnt && !ret'. If wait_for_completion
returns with a positive value, we should stop waiting as the completion
has been signalled.

>   }
>   }
>   if (!acomp->ops) {
> -- 

Br, Kai