Re: [PATCH] mmc: meson-gx: remove IRQF_ONESHOT
On 7/10/2020 10:32 pm, Jerome Brunet wrote: With arm64 defconfig on Khadas vim3, no obvious regression. Looks good. Tested-by: Jerome Brunet I did not test with RT. Brad, Could you let us know is Thomas's patch works for you ? Thx There was a merge conflict in applying against v5.9-rc8-rt12 with particular this patch: https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/kernel/irq/manage.c?h=linux-5.9.y-rt&id=18df00ef0b2b1513dc8f1a9ed26b11fff2261c30 I did manage to add the patch after attempting to resolve the conflict which solves the deadlock issue I am seeing with mmc and works fine during testing (a kernel compilation on preempt_rt configured kernel). irq_thread in /kernel/irq/manage.c Looks like this (not 100% sure I should have placed the irq_finalize_oneshot before add_interrupt_randomness). Based on this I can provide Tested-by: Brad Harper --- static int irq_thread(void *data) { struct callback_head on_exit_work; struct irqaction *action = data; struct irq_desc *desc = irq_to_desc(action->irq); irqreturn_t (*handler_fn)(struct irq_desc *desc, struct irqaction *action); if (force_irqthreads && test_bit(IRQTF_FORCED_THREAD, &action->thread_flags)) handler_fn = irq_forced_thread_fn; else handler_fn = irq_thread_fn; init_task_work(&on_exit_work, irq_thread_dtor); task_work_add(current, &on_exit_work, false); irq_thread_check_affinity(desc, action); while (!irq_wait_for_interrupt(action)) { irqreturn_t action_ret; irq_thread_check_affinity(desc, action); action_ret = handler_fn(desc, action); if (action_ret == IRQ_HANDLED) atomic_inc(&desc->threads_handled); if (action_ret == IRQ_WAKE_THREAD) irq_wake_secondary(desc, action); irq_finalize_oneshot(desc, action); if (IS_ENABLED(CONFIG_PREEMPT_RT)) { migrate_disable(); add_interrupt_randomness(action->irq, 0, desc->random_ip ^ (unsigned long) action); migrate_enable(); } wake_threads_waitq(desc); } /* * This is the regular exit path. __free_irq() is stopping the * thread via kthread_stop() after calling * synchronize_hardirq(). So neither IRQTF_RUNTHREAD nor the * oneshot mask bit can be set. */ task_work_cancel(current, irq_thread_dtor); return 0; } ---
Re: [PATCH] mmc: meson-gx: remove IRQF_ONESHOT
I'm happy to test anything on a range of amlogic hardware with standard / rt and multiple mmc devices. Ill test Jerome's patch in next 24 hours to report the results. On 6/10/2020 11:43 pm, Thomas Gleixner wrote: On Mon, Oct 05 2020 at 10:55, Thomas Gleixner wrote: On Mon, Oct 05 2020 at 10:22, Ulf Hansson wrote: On Fri, 2 Oct 2020 at 18:49, Jerome Brunet wrote: IRQF_ONESHOT was added to this driver to make sure the irq was not enabled again until the thread part of the irq had finished doing its job. Doing so upsets RT because, under RT, the hardirq part of the irq handler is not migrated to a thread if the irq is claimed with IRQF_ONESHOT. In this case, it has been reported to eventually trigger a deadlock with the led subsystem. Preventing RT from doing this migration was certainly not the intent, the description of IRQF_ONESHOT does not really reflect this constraint: > IRQF_ONESHOT - Interrupt is not reenabled after the hardirq handler finished. > Used by threaded interrupts which need to keep the > irq line disabled until the threaded handler has been run. This is exactly what this driver was trying to acheive so I'm still a bit confused whether this is a driver or an RT issue. Anyway, this can be solved driver side by manually disabling the IRQs instead of the relying on the IRQF_ONESHOT. IRQF_ONESHOT may then be removed while still making sure the irq won't trigger until the threaded part of the handler is done. Thomas, may I have your opinion on this one. I have no problem to apply $subject patch, but as Jerome also highlights above - this kind of makes me wonder if this is an RT issue, that perhaps deserves to be solved in a generic way. What do you think? Let me stare at the core code. Something smells fishy. The point is that for threaded interrupts (without a primary handler) the core needs to be told that the interrupt line should be masked until the threaded handler finished. That's what IRQF_ONESHOT is for. For interrupts which have both a primary and a threaded handler that's a different story. The primary handler decides whether the thread should be woken and it decides whether to block further interrupt delivery in the device or keep it enabled. When forced interrupt threading is enabled (even independent of RT) then we have the following cases: 1) Regular device interrupt (primary handler only) The primary handler is replaced with the default 'wake up thread' handler and the original primary handler becomes the threaded handler. This enforces IRQF_ONESHOT so that the interupt line (for level interrupts) stays masked until the thread completed handling. 2) Threaded interrupts Interrupts which have been requested as threaded handler (no primary handler) are not changed obvioulsy 3) Interrupts which have both a primary and a thread handler Here IRQF_ONESHOT decides whether the primary handler will be forced threaded or not. That's a bit unfortunate and ill defined and was not intended to be used that way. We rather should make interrupts which need to have their primary handler in hard interrupt context to set IRQF_NO_THREAD. That should at the same time confirm that the primary handler is RT safe. Let me stare at the core code and the actual usage sites some more. Thanks, tglx
Re: [PATCH v2 1/1] mmc: host: meson-gx-mmc: fix possible deadlock condition for preempt_rt
Hi Kevin, I think you are right, I don't have a good enough understanding to make this work so please disregard the patch. I will take on Sebastian's advice and do some testing with 'threadirqs' parameter enabled in standard kernel to see if I can reproduce my original issue there. I'm hoping Jerome might also be able to help with some time to find proper solution. Many Thanks, Brad. On 29/09/2020 10:35 am, Kevin Hilman wrote: Brad Harper writes: --- drivers/mmc/host/meson-gx-mmc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) This patch still needs changelog summarizing the problem and what is being fixed by the patch. Most of what's in the cover letter belongs here. The cover letter can be used to describe the history/background that you don't want in the patch itself. Alternatviely, you could include that information in the a single patch email also because everything after the "---" line does not end up in git history. diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c index 08a3b1c05..3ba8f988d 100644 --- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -146,6 +146,7 @@ struct sd_emmc_desc { }; struct meson_host { + spinlock_t lock; struct device *dev; struct meson_mmc_data *data; struct mmc_host*mmc; @@ -1051,6 +1052,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->mmc = mmc; host->dev = &pdev->dev; dev_set_drvdata(&pdev->dev, host); + spin_lock_init(&host->lock); I'm confused about what this lock is intended to do. You init it here, but it's never used anywhere. /* The G12A SDIO Controller needs an SRAM bounce buffer */ host->dram_access_quirk = device_property_read_bool(&pdev->dev, @@ -1139,7 +1141,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->regs + SD_EMMC_IRQ_EN); ret = request_threaded_irq(host->irq, meson_mmc_irq, - meson_mmc_irq_thread, IRQF_ONESHOT, + meson_mmc_irq_thread, 0, dev_name(&pdev->dev), host); if (ret) goto err_init_clk; Kevin
[PATCH v2 1/1] mmc: host: meson-gx-mmc: fix possible deadlock condition for preempt_rt
--- drivers/mmc/host/meson-gx-mmc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c index 08a3b1c05..3ba8f988d 100644 --- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -146,6 +146,7 @@ struct sd_emmc_desc { }; struct meson_host { + spinlock_t lock; struct device *dev; struct meson_mmc_data *data; struct mmc_host*mmc; @@ -1051,6 +1052,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->mmc = mmc; host->dev = &pdev->dev; dev_set_drvdata(&pdev->dev, host); + spin_lock_init(&host->lock); /* The G12A SDIO Controller needs an SRAM bounce buffer */ host->dram_access_quirk = device_property_read_bool(&pdev->dev, @@ -1139,7 +1141,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->regs + SD_EMMC_IRQ_EN); ret = request_threaded_irq(host->irq, meson_mmc_irq, - meson_mmc_irq_thread, IRQF_ONESHOT, + meson_mmc_irq_thread, 0, dev_name(&pdev->dev), host); if (ret) goto err_init_clk; -- 2.20.1
[PATCH v2 0/1] mmc: host: meson-gx-mmc: fix possible deadlock condition for preempt_rt
This is a updated experiamental patch for review following discussions with Jerome / Sebastian regarding the usage of threadded interupts in meson-gx-mmc. I don't have a complete understanding or am I a kernel developer but this is my best efforts attempt to address this issue. Also thanks to both of of them for opening up the discussions and Kevin for pointing me in the right direction for patch formatting. Force threaded interrupts for meson_mmc_irq to prevent possible deadlock condition during mmc operations when using preempt_rt with 5.9.0-rc3-rt3 patches on arm64. Using meson-gx-mmc with an emmc device on Hardkernel Odroid N2+ configured with preempt_rt resulted in the soc becoming unresponsive. With lock checking enabled the below inconsistent lock state was observed during boot. After some discussions with tglx in IRC #linux-rt a patch was suggested to remove IRQF_ONESHOT from request_threaded_irq. This has been tested and confirmed by me to resolve both the unresponsive soc and the inconsistent lock state warning when using 5.9.0-rc3-rt3 on arm64 Odroid N2+. Further review and testing is required to ensure there are no adverse impacts or concerns and that is the correct method to resolve the problem. I will continue to test on various amlogic devices with both standard mainline low latency kernel and preempt_rt kernel with -rt patches. Changes since v1: - Add spinlock_t lock to meson_host structure - Add spin_lock_init to driver probe for the host lock to ensure the irq will not attempt to fire again if the threaded irq component is not complete [7.858446] [7.858448] WARNING: inconsistent lock state [7.858450] 5.9.0-rc3-rt3+ #33 Not tainted [7.858453] [7.858456] inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage. [7.858459] swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: [7.858465] 80001219f4d8 (&trig->leddev_list_lock){+?.+}-{0:0}, at: led_trigger_set+0x104/0x270 [7.858482] {IN-HARDIRQ-R} state was registered at: [7.858484] lock_acquire+0xec/0x468 [7.858491] rt_read_lock+0xb0/0x108 [7.858497] led_trigger_event+0x34/0x88 [7.858501] mmc_request_done+0x3f0/0x450 [7.858505] meson_mmc_irq+0x284/0x378 [7.858511] __handle_irq_event_percpu+0xcc/0x4a8 [7.858515] handle_irq_event_percpu+0x60/0xb0 [7.858519] handle_irq_event+0x50/0x108 [7.858522] handle_fasteoi_irq+0xd0/0x180 [7.858527] generic_handle_irq+0x38/0x50 [7.858530] __handle_domain_irq+0x6c/0xc8 [7.858533] gic_handle_irq+0x5c/0xb8 [7.858537] el1_irq+0xbc/0x180 [7.858540] arch_cpu_idle+0x28/0x38 [7.858544] default_idle_call+0x90/0x3f0 [7.858547] do_idle+0x250/0x268 [7.858551] cpu_startup_entry+0x2c/0x78 [7.858554] rest_init+0x1b0/0x284 [7.858559] arch_call_rest_init+0x18/0x24 [7.858565] start_kernel+0x550/0x588 [7.858569] irq event stamp: 1925495 [7.858571] hardirqs last enabled at (1925495): [] _raw_spin_unlock_irqrestore+0xa4/0xb0 [7.858576] hardirqs last disabled at (1925494): [] _raw_spin_lock_irqsave+0xa8/0xb8 [7.858580] softirqs last enabled at (1857856): [] bdi_register_va+0x114/0x368 [7.858586] softirqs last disabled at (1857849): [] bdi_register_va+0x114/0x368 [7.858590] other info that might help us debug this: [7.858592] Possible unsafe locking scenario: [7.858594]CPU0 [7.858595] [7.858597] lock(&trig->leddev_list_lock); [7.858600] [7.858602] lock(&trig->leddev_list_lock); [7.858604] *** DEADLOCK *** [7.858606] 3 locks held by swapper/0/1: [7.858609] #0: 80001219eb30 (leds_list_lock){}-{0:0}, at: led_trigger_register+0xf4/0x1c0 [7.858619] #1: b0696a70 (&led_cdev->trigger_lock){+.+.}-{0:0}, at: led_trigger_register+0x134/0x1c0 [7.858629] #2: 800011fb83d0 (rcu_read_lock){}-{1:2}, at: rt_write_lock+0x8/0x108 [7.858637] stack backtrace: [7.858640] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc3-rt3+ #33 [7.858643] Hardware name: Hardkernel ODROID-N2Plus (DT) [7.858646] Call trace: [7.858647] dump_backtrace+0x0/0x1e8 [7.858650] show_stack+0x20/0x30 [7.858653] dump_stack+0xf0/0x164 [7.858659] print_usage_bug+0x2b4/0x2c0 [7.858662] mark_lock+0x2e8/0x360 [7.858665] __lock_acquire+0x238/0x1858 [7.858669] lock_acquire+0xec/0x468 [7.858672] rt_write_lock+0xb0/0x108 [7.858675] led_trigger_set+0x104/0x270 [7.858678] led_trigger_register+0x180/0x1c0 [7.858681] heartbeat_trig_init+0x28/0x5c [7.858686] do_one_initcall+0x90/0x4bc [7.858690] kernel_init_freeable+0x2cc/0x338 [7.858694] kernel_init+0x1c/0x11c [7.858697] ret_from_fork+0x10/0x34 Brad Harper (1): mmc: host: meson-gx-mmc: fix possible d
[PATCH] mmc: host: meson-gx-mmc: fix possible deadlock condition for preempt_rt
Force threaded interrupts for meson_mmc_irq to prevent possible deadlock condition during mmc operations when using preempt_rt with 5.9.0-rc3-rt3 patches on arm64. Using meson-gx-mmc with an emmc device on Hardkernel Odroid N2+ configured with preempt_rt resulted in the soc becoming unresponsive. With lock checking enabled the below inconsistent lock state was observed during boot. After some discussions with tglx in IRC #linux-rt the attached patch was suggested to remove IRQF_ONESHOT from request_threaded_irq. This has been tested and confirmed by me to resolve both the unresponsive soc and the inconsistent lock state warning when using 5.9.0-rc3-rt3 on arm64 Odroid N2+. Further review and testing is required to ensure there are no adverse impacts or concerns and that is the correct method to resolve the problem. I will continue to test on various amlogic devices with both standard mainline low latency kernel and preempt_rt kernel with -rt patches. [ 7.858446] [ 7.858448] WARNING: inconsistent lock state [ 7.858450] 5.9.0-rc3-rt3+ #33 Not tainted [ 7.858453] [ 7.858456] inconsistent {IN-HARDIRQ-R} -> {HARDIRQ-ON-W} usage. [ 7.858459] swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 7.858465] 80001219f4d8 (&trig->leddev_list_lock){+?.+}-{0:0}, at: led_trigger_set+0x104/0x270 [ 7.858482] {IN-HARDIRQ-R} state was registered at: [ 7.858484] lock_acquire+0xec/0x468 [ 7.858491] rt_read_lock+0xb0/0x108 [ 7.858497] led_trigger_event+0x34/0x88 [ 7.858501] mmc_request_done+0x3f0/0x450 [ 7.858505] meson_mmc_irq+0x284/0x378 [ 7.858511] __handle_irq_event_percpu+0xcc/0x4a8 [ 7.858515] handle_irq_event_percpu+0x60/0xb0 [ 7.858519] handle_irq_event+0x50/0x108 [ 7.858522] handle_fasteoi_irq+0xd0/0x180 [ 7.858527] generic_handle_irq+0x38/0x50 [ 7.858530] __handle_domain_irq+0x6c/0xc8 [ 7.858533] gic_handle_irq+0x5c/0xb8 [ 7.858537] el1_irq+0xbc/0x180 [ 7.858540] arch_cpu_idle+0x28/0x38 [ 7.858544] default_idle_call+0x90/0x3f0 [ 7.858547] do_idle+0x250/0x268 [ 7.858551] cpu_startup_entry+0x2c/0x78 [ 7.858554] rest_init+0x1b0/0x284 [ 7.858559] arch_call_rest_init+0x18/0x24 [ 7.858565] start_kernel+0x550/0x588 [ 7.858569] irq event stamp: 1925495 [ 7.858571] hardirqs last enabled at (1925495): [] _raw_spin_unlock_irqrestore+0xa4/0xb0 [ 7.858576] hardirqs last disabled at (1925494): [] _raw_spin_lock_irqsave+0xa8/0xb8 [ 7.858580] softirqs last enabled at (1857856): [] bdi_register_va+0x114/0x368 [ 7.858586] softirqs last disabled at (1857849): [] bdi_register_va+0x114/0x368 [ 7.858590] other info that might help us debug this: [ 7.858592] Possible unsafe locking scenario: [ 7.858594] CPU0 [ 7.858595] [ 7.858597] lock(&trig->leddev_list_lock); [ 7.858600] [ 7.858602] lock(&trig->leddev_list_lock); [ 7.858604] *** DEADLOCK *** [ 7.858606] 3 locks held by swapper/0/1: [ 7.858609] #0: 80001219eb30 (leds_list_lock){}-{0:0}, at: led_trigger_register+0xf4/0x1c0 [ 7.858619] #1: b0696a70 (&led_cdev->trigger_lock){+.+.}-{0:0}, at: led_trigger_register+0x134/0x1c0 [ 7.858629] #2: 800011fb83d0 (rcu_read_lock){}-{1:2}, at: rt_write_lock+0x8/0x108 [ 7.858637] stack backtrace: [ 7.858640] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc3-rt3+ #33 [ 7.858643] Hardware name: Hardkernel ODROID-N2Plus (DT) [ 7.858646] Call trace: [ 7.858647] dump_backtrace+0x0/0x1e8 [ 7.858650] show_stack+0x20/0x30 [ 7.858653] dump_stack+0xf0/0x164 [ 7.858659] print_usage_bug+0x2b4/0x2c0 [ 7.858662] mark_lock+0x2e8/0x360 [ 7.858665] __lock_acquire+0x238/0x1858 [ 7.858669] lock_acquire+0xec/0x468 [ 7.858672] rt_write_lock+0xb0/0x108 [ 7.858675] led_trigger_set+0x104/0x270 [ 7.858678] led_trigger_register+0x180/0x1c0 [ 7.858681] heartbeat_trig_init+0x28/0x5c [ 7.858686] do_one_initcall+0x90/0x4bc [ 7.858690] kernel_init_freeable+0x2cc/0x338 [ 7.858694] kernel_init+0x1c/0x11c [ 7.858697] ret_from_fork+0x10/0x34 Signed-off-by: Brad Harper --- drivers/mmc/host/meson-gx-mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c index 08a3b1c05..130ac134d 100644 --- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -1139,7 +1139,7 @@ static int meson_mmc_probe(struct platform_device *pdev) host->regs + SD_EMMC_IRQ_EN); ret = request_threaded_irq(host->irq, meson_mmc_irq, - meson_mmc_irq_thread, IRQF_ONESHOT, + meson_mmc_irq_thread, 0, dev_name(&pdev->dev), host); if (ret) goto err_init_clk; -- 2.20.1
Re: [PATCH] mmc: meson-gx: fix irq ack
On 24/05/2019 12:59 am, Jerome Brunet wrote: While cleaning the ISR of the meson-gx and acking only raised irqs, the ack of the irq was moved at the very last stage of the function. This was stable during the initial tests but it triggered issues with hs200, under specific loads (like booting android). Acking the irqs after calling the mmc_request_done() causes the problem. Moving the ack back to the original place solves the issue. Since the irq is edge triggered, it does not hurt to ack irq even earlier, so let's do it early in the ISR. Fixes: 9c5fdb07a28d ("mmc: meson-gx: ack only raised irq") Tested-by: Neil Armstrong Tested-by: Kevin Hilman Signed-off-by: Jerome Brunet --- drivers/mmc/host/meson-gx-mmc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c index 6ef465304052..cb3f6811d69a 100644 --- a/drivers/mmc/host/meson-gx-mmc.c +++ b/drivers/mmc/host/meson-gx-mmc.c @@ -873,6 +873,9 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id) if (WARN_ON(!host) || WARN_ON(!host->cmd)) return IRQ_NONE; + /* ack all raised interrupts */ + writel(status, host->regs + SD_EMMC_STATUS); + cmd = host->cmd; data = cmd->data; cmd->error = 0; @@ -919,9 +922,6 @@ static irqreturn_t meson_mmc_irq(int irq, void *dev_id) if (ret == IRQ_HANDLED) meson_mmc_request_done(host->mmc, cmd->mrq); - /* ack all raised interrupts */ - writel(status, host->regs + SD_EMMC_STATUS); - return ret; } This patch resolves issues I was having with hung tasks waiting for emmc & sd cards on the odroid N2. Tested-by: Brad Harper
Re: [PATCH 1/1] usb: dwc2: disable power_down on Amlogic devices
On 10/12/2018 6:01 am, Martin Blumenstingl wrote: Disable power_down by setting the parameter to DWC2_POWER_DOWN_PARAM_NONE. This fixes a problem on various Amlogic Meson SoCs where USB devices are only recognized when plugged in before booting Linux. A hot-plugged USB device was not detected even though the device got power (my USB thumb drive for example has an LED which lit up). A similar fix was implemented for Rockchip SoCs in commit c216765d3a1def ("usb: dwc2: disable power_down on rockchip devices"). That commit suggests that a change in the dwc2 driver is the cause because the default value for the "hibernate" parameter (which then got renamed to "power_down" to support other modes) was changed in the v4.17 merge window with: commit 6d23ee9caa6790 ("Merge tag 'usb-for-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-testing"). Cc: # 4.19 Suggested-by: Christian Hewitt Signed-off-by: Martin Blumenstingl Fixed the issue on the Odroid C2 sbc with amlogic meson gxbb soc, devices are now detected when connected after boot Tested-by: Brad Harper --- drivers/usb/dwc2/params.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c index 7c1b6938f212..38c813b1d203 100644 --- a/drivers/usb/dwc2/params.c +++ b/drivers/usb/dwc2/params.c @@ -111,6 +111,7 @@ static void dwc2_set_amlogic_params(struct dwc2_hsotg *hsotg) p->phy_type = DWC2_PHY_TYPE_PARAM_UTMI; p->ahbcfg = GAHBCFG_HBSTLEN_INCR8 << GAHBCFG_HBSTLEN_SHIFT; + p->power_down = DWC2_POWER_DOWN_PARAM_NONE; } static void dwc2_set_amcc_params(struct dwc2_hsotg *hsotg)