Re: [PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
On 26.03.2024 00:11, Rafał Miłecki wrote: I'll provide some iperf logs from my ThinkPad with 8086:3166 Intel Wireless-AC 3165 working as 2 GHz client of MT7603EN on Netgear R6220. I run seven 1-hour iperf sessions overnight using ThinkPad + Xiaomi Mi Router 4C (MT7628AN Wi-Fi SoC) with buffering DISABLED. Session 1 (1 hour) with 51.7 Mbps avg: All GOOD Session 2 (1 hour) with 40.5 Mbps avg: * 2 seconds traffic stall (STA diconnected with 34=DISASSOC_LOW_ACK) * 718 seconds traffic stall (nothing in STA logs) Session 3 (1 hour) with 51.6 Mbps avg: * 5 seconds traffic stall (STA diconnected with 34=DISASSOC_LOW_ACK) Session 4 (1 hour) with 51.8 Mbps avg: All GOOD Session 5 (1 hour) with 51.8 Mbps avg: All GOOD Session 6 (1 hour) with 51.8 Mbps avg: All GOOD Session 7 (1 hour) with 51.1 Mbps avg: All GOOD I find MT7628AN without frames buffering pretty usable. Now, with frames buffering enabled, things look quite worse. There are multiple traffic stalls and often TCP session dies. See below. pon, 25 mar 2024, 23:23:51 CET Client connecting to 192.168.27.1, TCP port 5001 TCP window size: 85.0 KByte (default) [ 3] local 192.168.1.181 port 58838 connected with 192.168.27.1 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0- 1.0 sec 3.12 MBytes 26.2 Mbits/sec [ 3] 1.0- 2.0 sec 3.25 MBytes 27.3 Mbits/sec [ 3] 2.0- 3.0 sec 3.25 MBytes 27.3 Mbits/sec [ 3] 3.0- 4.0 sec 5.25 MBytes 44.0 Mbits/sec [ 3] 4.0- 5.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 5.0- 6.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 6.0- 7.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 7.0- 8.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 8.0- 9.0 sec 7.12 MBytes 59.8 Mbits/sec [ 3] 9.0-10.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 10.0-11.0 sec 6.50 MBytes 54.5 Mbits/sec [ 3] 11.0-12.0 sec 7.12 MBytes 59.8 Mbits/sec [ 3] 12.0-13.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 13.0-14.0 sec 4.62 MBytes 38.8 Mbits/sec [ 3] 14.0-15.0 sec 5.50 MBytes 46.1 Mbits/sec [ 3] 15.0-16.0 sec 5.38 MBytes 45.1 Mbits/sec [ 3] 16.0-17.0 sec 6.50 MBytes 54.5 Mbits/sec [ 3] 17.0-18.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 18.0-19.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 19.0-20.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 20.0-21.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 21.0-22.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 22.0-23.0 sec 7.00 MBytes 58.7 Mbits/sec [ 3] 23.0-24.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 24.0-25.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 25.0-26.0 sec 4.50 MBytes 37.7 Mbits/sec [ 3] 26.0-27.0 sec 3.62 MBytes 30.4 Mbits/sec [ 3] 27.0-28.0 sec 3.12 MBytes 26.2 Mbits/sec [ 3] 28.0-29.0 sec 2.62 MBytes 22.0 Mbits/sec [ 3] 29.0-30.0 sec 1.75 MBytes 14.7 Mbits/sec [ 3] 30.0-31.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 31.0-32.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 32.0-33.0 sec 3.50 MBytes 29.4 Mbits/sec [ 3] 33.0-34.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 34.0-35.0 sec 5.62 MBytes 47.2 Mbits/sec [ 3] 35.0-36.0 sec 7.12 MBytes 59.8 Mbits/sec [ 3] 36.0-37.0 sec 6.50 MBytes 54.5 Mbits/sec [ 3] 37.0-38.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 38.0-39.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 39.0-40.0 sec 7.12 MBytes 59.8 Mbits/sec [ 3] 40.0-41.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 41.0-42.0 sec 4.50 MBytes 37.7 Mbits/sec [ 3] 42.0-43.0 sec 3.88 MBytes 32.5 Mbits/sec [ 3] 43.0-44.0 sec 5.50 MBytes 46.1 Mbits/sec [ 3] 44.0-45.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 45.0-46.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 46.0-47.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 47.0-48.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 48.0-49.0 sec 7.25 MBytes 60.8 Mbits/sec [ 3] 49.0-50.0 sec 6.25 MBytes 52.4 Mbits/sec [ 3] 50.0-51.0 sec 6.38 MBytes 53.5 Mbits/sec [ 3] 51.0-52.0 sec 7.12 MBytes 59.8 Mbits/sec [ 3] 52.0-53.0 sec 6.50 MBytes 54.5 Mbits/sec [ 3] 53.0-54.0 sec 7.00 MBytes 58.7 Mbits/sec [ 3] 54.0-55.0 sec 5.50 MBytes 46.1 Mbits/sec [ 3] 55.0-56.0 sec 3.75 MBytes 31.5 Mbits/sec [ 3] 56.0-57.0 sec 1.88 MBytes 15.7 Mbits/sec [ 3] 57.0-58.0 sec 2.38 MBytes 19.9 Mbits/sec [ 3] 58.0-59.0 sec 2.88 MBytes 24.1 Mbits/sec [ 3] 59.0-60.0 sec 2.12 MBytes 17.8 Mbits/sec [ 3] 60.0-61.0 sec 1.12 MBytes 9.44 Mbits/sec [ 3] 61.0-62.0 sec 2.12 MBytes 17.8 Mbits/sec [ 3] 62.0-63.0 sec 2.12 MBytes 17.8 Mbits/sec [ 3] 63.0-64.0 sec 2.00 MBytes 16.8 Mbits/sec [ 3] 64.0-65.0 sec 2.25 MBytes 18.9 Mbits/sec [ 3] 65.0-66.0 sec 2.00 MBytes 16.8 Mbits/sec [ 3] 66.0-67.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 67.0-68.0 sec 1.25 MBytes 10.5 Mbits/sec [ 3] 68.0-69.0 sec 1.12 MBytes 9.44 Mbits/sec [ 3] 69.0-70.0 sec 1.12 MBytes 9.44 Mbits/sec [ 3] 70.0-71.0 sec 3.75 MBytes 31.5 Mbits/sec [ 3] 71.0-72.0 sec 1.88 MBytes 15.7 Mbits/sec [ 3] 72.0-73.0 sec 0.00 Bytes 0.00 bits/sec [ 3] 73.0-74.0 sec 0.00 Bytes 0.00 bits/sec [
[PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
From: Rafał Miłecki MT7603EN and MT7628AN were reported by multiple users to be unstable under high traffic. Transmission of packets could stop for seconds often leading to disconnections. Long research & debugging revelaed a close relation between MCU interrupts of type PKT_TYPE_TXS and slowdowns / stalls. That led to questioning frames buffering feature. It turns out that disabling SKBs loopback code makes mt7603 devices much more stable under load. There are still some traffic hiccups but those happen like once every an hour and end up in recovery in most cases. Add a debugfs option for disabling frames buffering so users can give it a try. If this solution yields a success we can make this feature disabled by default. This change was successfully tested using 2 GHz AP interface on: 1. Netgear R6220 - MT7621ST (SoC) + MT7603EN (WiFi) + MT7612EN (WiFi) 2. Xiaomi Mi Router 4C - MT7628AN (Wi-Fi SoC) Link: https://lore.kernel.org/linux-wireless/7c96d5ee-86c1-8068-1b58-40db6087a...@gmail.com/ Closes: https://github.com/openwrt/mt76/issues/865 Fixes: c8846e101502 ("mt76: add driver for MT7603E and MT7628/7688") Signed-off-by: Rafał Miłecki --- drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c | 2 ++ drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 3 +++ drivers/net/wireless/mediatek/mt76/mt7603/init.c| 1 + drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h | 2 ++ 4 files changed, 8 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c index 3967f2f05774..c80baba7a402 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c @@ -115,4 +115,6 @@ void mt7603_init_debugfs(struct mt7603_dev *dev) >sensitivity_limit); debugfs_create_bool("dynamic_sensitivity", 0600, dir, >dynamic_sensitivity); + debugfs_create_bool("frames_buffering", 0600, dir, + >frames_buffering); } diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c index 7a2f5d38562b..f5ab729dec31 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c @@ -27,6 +27,9 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) u32 val; u8 tid = 0; + if (!dev->frames_buffering) + goto free; + if (skb->len < MT_TXD_SIZE + sizeof(struct ieee80211_hdr)) goto free; diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/init.c b/drivers/net/wireless/mediatek/mt76/mt7603/init.c index 6c55c72f28a2..5abc2618ec8b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/init.c @@ -515,6 +515,7 @@ int mt7603_register_device(struct mt7603_dev *dev) dev->slottime = 9; dev->sensitivity_limit = 28; dev->dynamic_sensitivity = true; + dev->frames_buffering = true; ret = mt7603_init_hardware(dev); if (ret) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h index 9e58df7042ad..02c88334cdc0 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h +++ b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h @@ -155,6 +155,8 @@ struct mt7603_dev { u32 reset_test; unsigned int reset_cause[__RESET_CAUSE_MAX]; + + bool frames_buffering; }; extern const struct mt76_driver_ops mt7603_drv_ops; -- 2.35.3 ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
[PATCH] dt-bindings: leds: add FUNCTION defines for per-band WLANs
From: Rafał Miłecki Most wireless routers and access points can operate in multiple bands simultaneously. Vendors often equip their devices with per-band LEDs. Add defines for those very common functions to allow cleaner & clearer bindings. Signed-off-by: Rafał Miłecki --- include/dt-bindings/leds/common.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/dt-bindings/leds/common.h b/include/dt-bindings/leds/common.h index 9a0d33d027ff..c56785bb9c9c 100644 --- a/include/dt-bindings/leds/common.h +++ b/include/dt-bindings/leds/common.h @@ -101,6 +101,9 @@ #define LED_FUNCTION_USB "usb" #define LED_FUNCTION_WAN "wan" #define LED_FUNCTION_WLAN "wlan" +#define LED_FUNCTION_WLAN_2GHZ "wlan-2ghz" +#define LED_FUNCTION_WLAN_5GHZ "wlan-5ghz" +#define LED_FUNCTION_WLAN_6GHZ "wlan-6ghz" #define LED_FUNCTION_WPS "wps" #endif /* __DT_BINDINGS_LEDS_H */ -- 2.35.3 ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
On 2.04.2024 18:54, Shengyu Qu wrote: > Maybe we could disable frames buffering by default until it is fixed? Please check commit description: "If this solution yields a success we can make this feature disabled by default." > Also, maybe we could do more tests on newer models such as mt7986/81 to > make this patch benefit more models? Can you point me to the mt7915 driver buffering code? I clearly missed that. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
On 9.01.2024 20:10, Krzysztof Kozlowski wrote: On 09/01/2024 17:38, Rafał Miłecki wrote: On 9.01.2024 10:02, Krzysztof Kozlowski wrote: On 09/01/2024 09:23, Rafał Miłecki wrote: From: Rafał Miłecki OpenWrt project provides downstream support for thousands of embedded home network devices. Its custom requirement for DT is to provide info about LEDs roles. Currently it does it by using custom non-documented aliases. While formally valid (aliases.yaml doesn't limit names or purposes of aliases) it's quite a loose solution. Document 4 precise "chosen" biding properties with clearly documented OpenWrt usage. This will allow upstreaming tons of DTS files that noone typo: none typo: no one ;) cared about so far as those would need to be patched downstream anyway. From all this description I understand why you want to add it, but I don't understand what are you exactly adding and how it is being used by the users/OS. In OpenWrt we have user space script that reads LED full path: cat /proc/device-tree/aliases/led- And then sets LED to state matching current boot stage: echo 1 > /sys/class/leds//brightness OK, it's nowhere close to a hardware property. You now instruct OS what to do. It's software or software policy. That's the reason I targeted "chosen" node which seemed the best option given it does not describe real hardware. Signed-off-by: Rafał Miłecki --- A few weeks ago I was seeking for a help regarding OpenWrt's need for specifing LEDs roles in DT, see: Describing LEDs roles in device tree? https://lore.kernel.org/linux-devicetree/ee912a89-4fd7-43c3-a79b-16659a035...@gmail.com/T/#u I DON'T think OpenWrt's current solution with aliases is good enough: * It's not clearly documented * It may vary from other projects usa case * It may be refused by random maintainers I think I decided to suggest 4 OpenWrt-prefixed properties for "chosen". I'm hoping this small custom binding is sth we could go with. I'm really looking forward to upstreaming OpenWrt's downstream DTS files so other projects (e.g. Buildroot) can use them. If you have any better fitting solution in mind please let me know. I should be fine with anything that lets me solve this downstream mess situation. dtschema/schemas/chosen.yaml | 9 + 1 file changed, 9 insertions(+) diff --git a/dtschema/schemas/chosen.yaml b/dtschema/schemas/chosen.yaml index 6d5c3f1..96d0db7 100644 --- a/dtschema/schemas/chosen.yaml +++ b/dtschema/schemas/chosen.yaml @@ -264,4 +264,13 @@ properties: patternProperties: "^framebuffer": true + "^openwrt,led-(boot|failsafe|running|upgrade)$": +$ref: types.yaml#/definitions/string +description: + OpenWrt choice of LED for a given role. Neither this explains it. What is "OpenWrt choice"? Choice like what? What are the valid choices? Value must be a full path (encoded + as a string) to a relevant LED node. What do you mean by full path? DT node path? Then no, use phandles. Anyway, we have existing properties for this - LED functions. Just choose LED with given function (from sysfs) and you are done. If function is missing in the header: feel free to go, least for me. In "Describing LEDs roles in device tree?" e-mail I wrote: " Please note that "function" on its own is not sufficient as multiple LEDs my share the same function name but its meaning may vary depending on color. " Let me elaborate here. Values of "function" property match LEDs descriptions (usually it's a physical label). That is OK and makes sense but doesn't allow OpenWrt to automatically pick proper LED to use during given boot stage. Some devices may have multiple color of a the same LED function. OpenWrt developer needs to choose which color to use for failsafe more and which boot fully booted state (and others too). Devices also differ in available LEDs by their functions. Some may have LED_FUNCTION_POWER that OpenWrt needs to use. Some may have LED_FUNCTION_STATUS. Or both. There are some more (less common) functions like LED_FUNCTION_ACTIVITY. We failed at OpenWrt to develop a single generic script to rule all devices and all their LEDs combinations. We ended up with developers *choosing* what LED to use for a given system state. So this all is because you want to write easier OS. That's abuse of Devicetree. You can define which LEDs have different meaning, e.g. physical label or icon attached to them. That's a hardware property. You can also define how pieces of hardware are wired together and create entire system, e.g. connect one LED to disk activity. However what you are proposing here is to dynamically configure one, given OS. I don't think it is suitable. The problem of OS to nicely figure out which LED to blink when, is not a problem of Devicetree. It is a problem of OS and its configuration. I'd say it's a thin line. Or just a grey idea as Geert said. What is a LED "function" after all? How exactly are: LED_FUNCTION_STATUS LED_FUNCTION_ACTIVITY
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
On 9.01.2024 10:02, Krzysztof Kozlowski wrote: On 09/01/2024 09:23, Rafał Miłecki wrote: From: Rafał Miłecki OpenWrt project provides downstream support for thousands of embedded home network devices. Its custom requirement for DT is to provide info about LEDs roles. Currently it does it by using custom non-documented aliases. While formally valid (aliases.yaml doesn't limit names or purposes of aliases) it's quite a loose solution. Document 4 precise "chosen" biding properties with clearly documented OpenWrt usage. This will allow upstreaming tons of DTS files that noone typo: none typo: no one ;) cared about so far as those would need to be patched downstream anyway. From all this description I understand why you want to add it, but I don't understand what are you exactly adding and how it is being used by the users/OS. In OpenWrt we have user space script that reads LED full path: cat /proc/device-tree/aliases/led- And then sets LED to state matching current boot stage: echo 1 > /sys/class/leds//brightness Signed-off-by: Rafał Miłecki --- A few weeks ago I was seeking for a help regarding OpenWrt's need for specifing LEDs roles in DT, see: Describing LEDs roles in device tree? https://lore.kernel.org/linux-devicetree/ee912a89-4fd7-43c3-a79b-16659a035...@gmail.com/T/#u I DON'T think OpenWrt's current solution with aliases is good enough: * It's not clearly documented * It may vary from other projects usa case * It may be refused by random maintainers I think I decided to suggest 4 OpenWrt-prefixed properties for "chosen". I'm hoping this small custom binding is sth we could go with. I'm really looking forward to upstreaming OpenWrt's downstream DTS files so other projects (e.g. Buildroot) can use them. If you have any better fitting solution in mind please let me know. I should be fine with anything that lets me solve this downstream mess situation. dtschema/schemas/chosen.yaml | 9 + 1 file changed, 9 insertions(+) diff --git a/dtschema/schemas/chosen.yaml b/dtschema/schemas/chosen.yaml index 6d5c3f1..96d0db7 100644 --- a/dtschema/schemas/chosen.yaml +++ b/dtschema/schemas/chosen.yaml @@ -264,4 +264,13 @@ properties: patternProperties: "^framebuffer": true + "^openwrt,led-(boot|failsafe|running|upgrade)$": +$ref: types.yaml#/definitions/string +description: + OpenWrt choice of LED for a given role. Neither this explains it. What is "OpenWrt choice"? Choice like what? What are the valid choices? Value must be a full path (encoded + as a string) to a relevant LED node. What do you mean by full path? DT node path? Then no, use phandles. Anyway, we have existing properties for this - LED functions. Just choose LED with given function (from sysfs) and you are done. If function is missing in the header: feel free to go, least for me. In "Describing LEDs roles in device tree?" e-mail I wrote: " Please note that "function" on its own is not sufficient as multiple LEDs my share the same function name but its meaning may vary depending on color. " Let me elaborate here. Values of "function" property match LEDs descriptions (usually it's a physical label). That is OK and makes sense but doesn't allow OpenWrt to automatically pick proper LED to use during given boot stage. Some devices may have multiple color of a the same LED function. OpenWrt developer needs to choose which color to use for failsafe more and which boot fully booted state (and others too). Devices also differ in available LEDs by their functions. Some may have LED_FUNCTION_POWER that OpenWrt needs to use. Some may have LED_FUNCTION_STATUS. Or both. There are some more (less common) functions like LED_FUNCTION_ACTIVITY. We failed at OpenWrt to develop a single generic script to rule all devices and all their LEDs combinations. We ended up with developers *choosing* what LED to use for a given system state. Also: please Cc LED maintainers on all future submissions of this. Included them (apart from linux-leds@ already present) now, thanks. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes
I made a second attempt on debugging some longstanding stability issues affecting BCM53753 SoCs. Those are single CPU core ARM Cortex-A7 boards with a pretty slow arch timer running at 36,8 kHz. After 0 to 20 minutes of close to zero activity I experience hangs and I need to wait a minute for watchdog to kick in and reboot device. First debugging attempt: https://lore.kernel.org/netdev/0f9d0cd6-d344-7915-7bc1-7a090b830...@gmail.com/T/ ("ARM board lockups/hangs triggered by locks and mutexes") After a lot of bisecting, testing & hacking I believe there are 3 types of kernel aspects that affect BCM53573 stability. I'd like to describe them below to document my debugging work. I'm clueless at this point. Maybe someone can come up with an idea of actual issue & ideally a solution. # 1. Locking During my first bisecting attempts I found multiple locking-related commit that regressed stability. Bisected commits: 131287ff833d ("once: add DO_ONCE_SLOW() for sleepable contexts"). and a following group: d0d583484d2e ("locking/refcount: Consolidate implementations of refcount_t") dab787c73f6e ("locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions") 0d3182fbe689 ("locking/refcount: Move saturation warnings out of line") 809554147d60 ("locking/refcount: Improve performance of generic REFCOUNT_FULL code") 9c9269977f03 ("locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the header") 04bff7d7b808 ("locking/refcount: Remove unused refcount_*_checked() variants") 513b19a43bec ("locking/refcount: Ensure integer operands are treated as signed") 68b4ee68e8c8 ("locking/refcount: Define constants for saturation and max refcount values") I don't believe there is actually anything wrong about above changes. Maybe it's some tiny timing thing that my board just doesn't like? # 2. Clock (arm,armv7-timer) While comparing main clock in Broadcom's SDK with upstream one I noticed a tiny difference: mask value. I don't know it it makes any sense but switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in arm_arch_timer.c (to match SDK) increases average uptime (time before a hang/lockup happens) from 4 minutes to 36 minutes. # 3. Random code changes During my bisecting attempts I found one commit that regressed kernel stability but actual changes were meaningless in context of locking. It was commit ad9b10d1eaad ("mtd: core: introduce of support for dynamic partitions"): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad9b10d1eaada169bd764abcab58f08538877e26 I thought that maybe it was all about making add_mtd_device() bigger and changing addresses of a lot of symbols (looking at System.map). So I reverted that mtd commit and developed a dummy change relocating as few symbols (System.map) as possible while still breaking stability: --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -94,6 +94,21 @@ void __cpuidle default_idle_call(void) arch_cpu_idle(); start_critical_timings(); } + + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); } static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev, Above dummy change didn't relocate thousands of symbols but only about 20 of them. They happened to be lock symbols however. Does it make any sense for above diff to regress kernel stability for me and cause hangs/lockups? --- System.map.good +++ System.map.bad @@ -22214,36 +22214,36 @@ c062e7e0 T __cpuidle_text_start c062e7e0 t cpu_idle_poll c062e860 T default_idle_call -c062e884 T __cpuidle_text_end -c062e888 T __lock_text_start -c062e8a0 T _raw_spin_unlock_irqrestore -c062e8c0 T _raw_spin_trylock -c062e900 T _raw_write_unlock_irqrestore -c062e920 T _raw_read_trylock -c062e960 T _raw_write_trylock -c062e9a0 T _raw_spin_lock_bh -c062ea00 T _raw_read_lock_bh -c062ea40 T _raw_write_lock_bh -c062ea80 T _raw_spin_trylock_bh -c062eb00 T _raw_spin_unlock_bh -c062eb40 T _raw_write_unlock_bh -c062eb80 T _raw_read_unlock_bh -c062ebc0 T _raw_read_unlock_irqrestore -c062ec00 T _raw_write_lock -c062ec40 T _raw_write_lock_irq -c062ec80 T _raw_write_lock_irqsave -c062ecc0 T _raw_read_lock -c062ed00 T _raw_spin_lock -c062ed40 T _raw_read_lock_irq -c062ed80 T _raw_spin_lock_irq -c062ede0 T _raw_spin_lock_irqsave -c062ee40 T _raw_read_lock_irqsave -c062ee70 T __hyp_text_end -c062ee70 T __hyp_text_start -c062ee70 T __kprobes_text_end -c062ee70 T __kprobes_text_start -c062ee70 T __lock_text_end -c062ee70 T _etext +c062e954 T
Re: ARM board lockups/hangs triggered by locks and mutexes
On 18.08.2023 22:23, Rafał Miłecki wrote: On 14.08.2023 11:04, Geert Uytterhoeven wrote: Hi Rafal, On Mon, Aug 7, 2023 at 1:11 PM Rafał Miłecki wrote: On 4.08.2023 13:07, Rafał Miłecki wrote: I triple checked that. Dropping a single unused function breaks kernel / device stability on BCM53573! AFAIK the only thing below diff actually affects is location of symbols (I actually verified that by comparing System.map before and after - over 22'000 of relocated symbols). Can some unfortunate location of symbols cause those hangs/lockups? I performed another experiment. First I dropped mtd_check_of_node() to bring kernel back to the stable state. Then I started adding useless code to the mtdchar_unlocked_ioctl(). I ended up adding just enough to make sure all post-mtd symbols in System.map got the same offset as in case of backporting mtd_check_of_node(). I started experiencing lockups/hangs again. I repeated the same test with adding dumb code to the brcm_nvram_probe() and verifying symbols offsets following brcm_nvram_probe one. I believe this confirms that this problem is about offset or alignment of some specific symbol(s). The remaining question is what symbols and how to fix or workaround that. I had similar experiences on other ARM platforms many years ago: bisection lead to something completely bogus, and it turned out adding a single line of innocent code made the system lock-up or crash unexpectedly. It was definitely related to alignment, as adding the right extra amount of innocent code would fix the problem. Until some later change changing alignment again... I never found the real cause, but the problems went away over time. I am not sure I did enable all required errata config options, so I may have missed some... I already experiented some weird performance variations on Broadcom's Northstar platform that was related to symbols layout & cache hit/miss ratio. For that reason I use -falign-functions=32 for that whole OpenWrt's "bcm53xx" target (it covers Northstar and BCM53573). So this aspect should be ruled out already in my case. Relevant OpenWrt commit with some description and links: b54ef39e0b91 ("bcm53xx: use -falign-functions=32 for kernel compilation"): https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=b54ef39e0b910a4b8eaca0497fe9b63e8392262a ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 14.08.2023 11:04, Geert Uytterhoeven wrote: Hi Rafal, On Mon, Aug 7, 2023 at 1:11 PM Rafał Miłecki wrote: On 4.08.2023 13:07, Rafał Miłecki wrote: I triple checked that. Dropping a single unused function breaks kernel / device stability on BCM53573! AFAIK the only thing below diff actually affects is location of symbols (I actually verified that by comparing System.map before and after - over 22'000 of relocated symbols). Can some unfortunate location of symbols cause those hangs/lockups? I performed another experiment. First I dropped mtd_check_of_node() to bring kernel back to the stable state. Then I started adding useless code to the mtdchar_unlocked_ioctl(). I ended up adding just enough to make sure all post-mtd symbols in System.map got the same offset as in case of backporting mtd_check_of_node(). I started experiencing lockups/hangs again. I repeated the same test with adding dumb code to the brcm_nvram_probe() and verifying symbols offsets following brcm_nvram_probe one. I believe this confirms that this problem is about offset or alignment of some specific symbol(s). The remaining question is what symbols and how to fix or workaround that. I had similar experiences on other ARM platforms many years ago: bisection lead to something completely bogus, and it turned out adding a single line of innocent code made the system lock-up or crash unexpectedly. It was definitely related to alignment, as adding the right extra amount of innocent code would fix the problem. Until some later change changing alignment again... I never found the real cause, but the problems went away over time. I am not sure I did enable all required errata config options, so I may have missed some... I already experiented some weird performance variations on Broadcom's Northstar platform that was related to symbols layout & cache hit/miss ratio. For that reason I use -falign-functions=32 for that whole OpenWrt's "bcm53xx" target (it covers Northstar and BCM53573). So this aspect should be ruled out already in my case. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 7.08.2023 20:34, Florian Fainelli wrote: On 8/7/23 04:10, Rafał Miłecki wrote: On 4.08.2023 13:07, Rafał Miłecki wrote: I triple checked that. Dropping a single unused function breaks kernel / device stability on BCM53573! AFAIK the only thing below diff actually affects is location of symbols (I actually verified that by comparing System.map before and after - over 22'000 of relocated symbols). Can some unfortunate location of symbols cause those hangs/lockups? I performed another experiment. First I dropped mtd_check_of_node() to bring kernel back to the stable state. Then I started adding useless code to the mtdchar_unlocked_ioctl(). I ended up adding just enough to make sure all post-mtd symbols in System.map got the same offset as in case of backporting mtd_check_of_node(). I started experiencing lockups/hangs again. I repeated the same test with adding dumb code to the brcm_nvram_probe() and verifying symbols offsets following brcm_nvram_probe one. I believe this confirms that this problem is about offset or alignment of some specific symbol(s). The remaining question is what symbols and how to fix or workaround that. In the config.gz file you attached in your first email, both CONFIG_MTD_* and CONFIG_NVMEM_* so it is not like we are reaching into module space for code and/or data and need veneers or anything, it is part of the kernel image so we can assert the maximum distance between instructions etc. Now is it just that specific mutex that is an issue, or do other mutexes through the system do cause problems as well? If you mean mtd mutex, I'm quite sure it's not the one to blame. It just happened modified function was using a mutex. Could be any other. Do we suspect the toolchain to be possibly problematic? Maybe, I really don't know much such low level stuff. Following dump change brings back lockups/hangs: diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c index ee437af41..0a24dec55 100644 --- a/drivers/mtd/mtdchar.c +++ b/drivers/mtd/mtdchar.c @@ -1028,6 +1028,22 @@ static long mtdchar_unlocked_ioctl(struct file *file, u_int cmd, u_long arg) { int ret; + if (!file) + pr_info("Missing\n"); + WARN_ON(!file); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + mutex_lock(_mutex); ret = mtdchar_ioctl(file, cmd, arg); mutex_unlock(_mutex); ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 4.08.2023 13:07, Rafał Miłecki wrote: I triple checked that. Dropping a single unused function breaks kernel / device stability on BCM53573! AFAIK the only thing below diff actually affects is location of symbols (I actually verified that by comparing System.map before and after - over 22'000 of relocated symbols). Can some unfortunate location of symbols cause those hangs/lockups? I performed another experiment. First I dropped mtd_check_of_node() to bring kernel back to the stable state. Then I started adding useless code to the mtdchar_unlocked_ioctl(). I ended up adding just enough to make sure all post-mtd symbols in System.map got the same offset as in case of backporting mtd_check_of_node(). I started experiencing lockups/hangs again. I repeated the same test with adding dumb code to the brcm_nvram_probe() and verifying symbols offsets following brcm_nvram_probe one. I believe this confirms that this problem is about offset or alignment of some specific symbol(s). The remaining question is what symbols and how to fix or workaround that. Following dump change brings back lockups/hangs: diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c index ee437af41..0a24dec55 100644 --- a/drivers/mtd/mtdchar.c +++ b/drivers/mtd/mtdchar.c @@ -1028,6 +1028,22 @@ static long mtdchar_unlocked_ioctl(struct file *file, u_int cmd, u_long arg) { int ret; + if (!file) + pr_info("Missing\n"); + WARN_ON(!file); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + WARN_ON(cmd == 1234); + WARN_ON(cmd == 5678); + mutex_lock(_mutex); ret = mtdchar_ioctl(file, cmd, arg); mutex_unlock(_mutex); ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 2.08.2023 00:10, Rafał Miłecki wrote: Unfortunately enabling *any* of following options: CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y seems to make locksup/hangs go away. I tried for few hours. I decided to find out why enabling CONFIG_DEBUG_MUTEXES "fixes" kernel / device stability for me. I tried enabling manually code that normally hides behind the #ifdev CONFIG_DEBUG_MUTEXES. Attached to this e-mail is a small patch that is enough to make my kernel stable (mutex-fix-bcm53573.diff). # It's not what's the most interesting thought. What really doesn't make sense anymore is that below diff (on top of attached one) brings back hangs/lockups. I triple checked that. Dropping a single unused function breaks kernel / device stability on BCM53573! AFAIK the only thing below diff actually affects is location of symbols (I actually verified that by comparing System.map before and after - over 22'000 of relocated symbols). Can some unfortunate location of symbols cause those hangs/lockups? diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c index 4fe40910f..c440222a4 100644 --- a/kernel/locking/mutex-debug.c +++ b/kernel/locking/mutex-debug.c @@ -34,6 +34,8 @@ void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter) INIT_LIST_HEAD(>list); } +/* Dropping below function brings back hangs/lockups & reboots */ +#if 0 void debug_mutex_wake_waiter(struct mutex *lock, struct mutex_waiter *waiter) { lockdep_assert_held(>wait_lock); @@ -41,6 +43,7 @@ void debug_mutex_wake_waiter(struct mutex *lock, struct mutex_waiter *waiter) DEBUG_LOCKS_WARN_ON(waiter->magic != waiter); DEBUG_LOCKS_WARN_ON(list_empty(>list)); } +#endif void debug_mutex_free_waiter(struct mutex_waiter *waiter) { diff --git a/include/linux/mutex.h b/include/linux/mutex.h index 479bc96c3..15bd4691b 100644 --- a/include/linux/mutex.h +++ b/include/linux/mutex.h @@ -57,9 +57,7 @@ struct mutex { struct optimistic_spin_queue osq; /* Spinner MCS lock */ #endif struct list_head wait_list; -#ifdef CONFIG_DEBUG_MUTEXES void *magic; -#endif #ifdef CONFIG_DEBUG_LOCK_ALLOC struct lockdep_map dep_map; #endif @@ -73,12 +71,10 @@ struct mutex_waiter { struct list_head list; struct task_struct *task; struct ww_acquire_ctx *ww_ctx; -#ifdef CONFIG_DEBUG_MUTEXES void *magic; -#endif }; -#ifdef CONFIG_DEBUG_MUTEXES +#if 1 //def CONFIG_DEBUG_MUTEXES #define __DEBUG_MUTEX_INITIALIZER(lockname)\ , .magic = diff --git a/include/linux/sched.h b/include/linux/sched.h index d0e639497..8fef4485e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -958,10 +958,8 @@ struct task_struct { struct rt_mutex_waiter *pi_blocked_on; #endif -#ifdef CONFIG_DEBUG_MUTEXES /* Mutex deadlock detection: */ struct mutex_waiter *blocked_on; -#endif #ifdef CONFIG_DEBUG_ATOMIC_SLEEP intnon_block_count; diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile index 45452facf..b22e6ecd8 100644 --- a/kernel/locking/Makefile +++ b/kernel/locking/Makefile @@ -12,7 +12,7 @@ CFLAGS_REMOVE_mutex-debug.o = $(CC_FLAGS_FTRACE) CFLAGS_REMOVE_rtmutex-debug.o = $(CC_FLAGS_FTRACE) endif -obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o +obj-y += mutex-debug.o obj-$(CONFIG_LOCKDEP) += lockdep.o ifeq ($(CONFIG_PROC_FS),y) obj-$(CONFIG_LOCKDEP) += lockdep_proc.o diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index b02fff282..6dc3f80a3 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -946,9 +946,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, might_sleep(); -#ifdef CONFIG_DEBUG_MUTEXES DEBUG_LOCKS_WARN_ON(lock->magic != lock); -#endif ww = container_of(lock, struct ww_mutex, base); if (ww_ctx) { @@ -1417,9 +1415,7 @@ int __sched mutex_trylock(struct mutex *lock) { bool locked; -#ifdef CONFIG_DEBUG_MUTEXES DEBUG_LOCKS_WARN_ON(lock->magic != lock); -#endif locked = __mutex_trylock(lock); if (locked) ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 2.08.2023 00:10, Rafał Miłecki wrote: Reverting that extra commit from v5.4.238 allows me to run Linux for hours again (currently 3 devices x 6 hours and counting). So I need in total 10+1 reverts from 5.4 branch to get a stable kernel. I switched back to OpenWrt's kernel 5.4 and applied all those reverts I found. Nothing. I was still getting hangs / lockups + reboots. After more bisecting and I found out it's because OpenWrt backported commit ad9b10d1eaad ("mtd: core: introduce of support for dynamic partitions"): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ad9b10d1eaada169bd764abcab58f08538877e26 It didn't make any sense to me. That patch does nothing on my device and its code is only executed when booting. It makes even less sense to me. Why such changes that should not affect anything actually break stability for BCM53573? I narrowed above patch even furher. It's actually enough to apply below diff to break kernel stability: diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c index f69c5b94e..f10dd3af1 100644 --- a/drivers/mtd/mtdcore.c +++ b/drivers/mtd/mtdcore.c @@ -590,6 +590,25 @@ static int mtd_nvmem_add(struct mtd_info *mtd) return 0; } +static void mtd_check_of_node(struct mtd_info *mtd) +{ + struct device_node *partitions, *parent_dn; + struct mtd_info *parent; + + /* Check if MTD already has a device node */ + if (dev_of_node(>dev)) + return; + + /* Check if a partitions node exist */ + parent = mtd_get_master(mtd); + parent_dn = dev_of_node(>dev); + pr_info("[%s] mtd->name:%s parent_dn:%pOF\n", __func__, mtd->name, parent_dn); + if (!parent_dn) + return; + + of_node_put(parent_dn); +} + /** * add_mtd_device - register an MTD device * @mtd: pointer to new MTD device info structure @@ -673,6 +692,7 @@ int add_mtd_device(struct mtd_info *mtd) mtd->dev.devt = MTD_DEVT(i); dev_set_name(>dev, "mtd%d", i); dev_set_drvdata(>dev, mtd); + mtd_check_of_node(mtd); of_node_get(mtd_get_of_node(mtd)); error = device_register(>dev); if (error) { ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
ARM board lockups/hangs triggered by locks and mutexes
Hi, Years ago I added support for Broadcom's BCM53573 SoCs. We released firmwares based on Linux 4.4 (and later on 4.14) that worked almost fine. There was one little issue we couldn't debug or fix: random hangs and reboots. They were too rare to deal with (most devices worked fine for weeks or months). Recently I updated my stable kernel 5.4 and I started experiencing stability issues on my own! After some uptime (usually from 0 to 20 minutes of close to zero activity) serial console hangs. I can't type anything and I stop getting any messages. I've to wait about a minute for watchdog to kick in and reboot device. # I took that great chance and decided to track the regression. Linux 5.4 stable branch worked stable up to the release v5.4.197. Starting with v5.4.198 I started experiencing those stability issues. I bisected it down to the commit 4460066eb248 ("ipv6: fix locking issues with loops over idev->addr_list"): https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=4460066eb2480b9e203c73755e12e2efc820a27e With above commit reverted I was able to use stable 5.4 branch up to the release v5.4.207. Starting with v5.4.208 it got unstable again. I bisected it down to: commit d0d583484d2e ("locking/refcount: Consolidate implementations of refcount_t") commit dab787c73f6e ("locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions") commit 0d3182fbe689 ("locking/refcount: Move saturation warnings out of line") commit 809554147d60 ("locking/refcount: Improve performance of generic REFCOUNT_FULL code") commit 9c9269977f03 ("locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the header") commit 04bff7d7b808 ("locking/refcount: Remove unused refcount_*_checked() variants") commit 513b19a43bec ("locking/refcount: Ensure integer operands are treated as signed") commit 68b4ee68e8c8 ("locking/refcount: Define constants for saturation and max refcount values") https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=d0d583484d2ed9f5903edbbfa7e2a68f78b950b0 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=dab787c73f6e38d8e7ed3c1e683385e8f0fe28a2 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=0d3182fbe689e3808c03b6cde6be98237f9e0a4a https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=809554147d609163cfbaf815c443c575b538a7ef https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=9c9269977f03ab9c448c8b71581a951e0eb4fb7b https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=04bff7d7b8081c4bb2e8171be31d33df297eee5b https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=513b19a43becee5f7af6d283bb9d3d241a8a21a8 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=68b4ee68e8c8800cf8d6b61cc74b4031a0742a4c (I didn't actually check above commits individually). Reverting above locking/refcount commits worked fine for few releases: up to the v5.4.219. Starting with v5.4.220 I got hangs again. I bisected that down to the commit 131287ff833d ("once: add DO_ONCE_SLOW() for sleepable contexts"). Reverting that extra commit from v5.4.238 allows me to run Linux for hours again (currently 3 devices x 6 hours and counting). So I need in total 10+1 reverts from 5.4 branch to get a stable kernel. # I'm clueless at this point. Is that possible kernel has some locking bug I can hit only using this specific SoC? BCM53573s have a single ARM Cortex-A7 CPU running at 900 MHz. The only unusual thing about this hw I can think of is a slow arch timer running at 36,8 kHz. I tried compiling kernel with: CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_DETECT_HUNG_TASK=y CONFIG_WQ_WATCHDOG=y but it didn't change or report anything. Unfortunately enabling *any* of following options: CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y seems to make locksup/hangs go away. I tried for few hours. Sadly I don't have access to JTAG or any low level debugging interface. Does looking at commits I reported above give anyone a hint on what may be going on maybe? -- Rafał [0.00] Booting Linux on physical CPU 0x0 [0.00] Linux version 5.4.238 (ubuntu@nat) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r15234+1-d89a7f0120)) #0 SMP Fri Jul 14 12:56:51 2023 [0.00] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d [0.00] CPU: div instructions available: patching division code [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache [0.00] OF: fdt: Machine model: Tenda AC9 [0.00] earlycon: ns16550a0 at MMIO 0x18000300 (options '115200n8') [0.00] printk: bootconsole [ns16550a0] enabled [0.00] Memory policy: Data cache writealloc [0.00] Hit pending asynchronous external abort (FSR=0x0c06) during
Re: ARM board lockups/hangs triggered by locks and mutexes
On 2.08.2023 00:21, Russell King (Oracle) wrote: On Wed, Aug 02, 2023 at 12:10:24AM +0200, Rafał Miłecki wrote: Years ago I added support for Broadcom's BCM53573 SoCs. We released firmwares based on Linux 4.4 (and later on 4.14) that worked almost fine. There was one little issue we couldn't debug or fix: random hangs and reboots. They were too rare to deal with (most devices worked fine for weeks or months). Recently I updated my stable kernel 5.4 and I started experiencing stability issues on my own! After some uptime (usually from 0 to 20 minutes of close to zero activity) serial console hangs. I can't type anything and I stop getting any messages. I've to wait about a minute for watchdog to kick in and reboot device. (...) I'm clueless at this point. Is that possible kernel has some locking bug I can hit only using this specific SoC? BCM53573s have a single ARM Cortex-A7 CPU running at 900 MHz. The only unusual thing about this hw I can think of is a slow arch timer running at 36,8 kHz. I tried compiling kernel with: CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_DETECT_HUNG_TASK=y CONFIG_WQ_WATCHDOG=y but it didn't change or report anything. Unfortunately enabling *any* of following options: CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y seems to make locksup/hangs go away. I tried for few hours. Sadly I don't have access to JTAG or any low level debugging interface. Does looking at commits I reported above give anyone a hint on what may be going on maybe? If you suspect locking issues, make sure you have lockdep enabled which will detect locking errors. You will want CONFIG_PROVE_LOCKING enabled. I will say that I use IPv6, and I run 32-bit kernels here both on real ARMv7 hardware (Armada 388 and iMX6 based stuff) and also in KVM based VMs, and these have run virtually every release of the kernel (not stable kernels though) and I haven't ever seen the behaviour that you describe. If it is specific to stable kernels, then that would be rather disappointing. I wrote above that with any of: CONFIG_DEBUG_RT_MUTEXES, CONFIG_DEBUG_SPINLOCK or CONFIG_DEBUG_MUTEXES enabled I can't reproduce the issue anymore. Right? Well I swear it was true for some random 5.4 release I tested before. With your comment I decided to try CONFIG_PROVE_LOCKING anyway / again and this time on 1 of my BCM53573 devices I got something very interesting on the first boot. FWIW following error: Broadcom B53 (2) bcma_mdio-0-0:1e: failed to register switch: -517 is caused by invalid DT I sent fixes for just recently. Please scroll through the first booting lines for the WARNING: [0.00] Booting Linux on physical CPU 0x0 [0.00] Linux version 5.4.238 (ubuntu@nat) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r15234+1-d89a7f0120)) #0 SMP Fri Jul 14 12:56:51 2023 [0.00] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d [0.00] CPU: div instructions available: patching division code [0.00] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache [0.00] OF: fdt: Machine model: Tenda AC9 [0.00] earlycon: ns16550a0 at MMIO 0x18000300 (options '115200n8') [0.00] printk: bootconsole [ns16550a0] enabled [0.00] Memory policy: Data cache writealloc [0.00] Hit pending asynchronous external abort (FSR=0x0c06) during first unmask, this is most likely caused by a firmware/bootloader bug. [0.00] percpu: Embedded 14 pages/cpu s27944 r8192 d21208 u57344 [0.00] Built 1 zonelists, mobility grouping on. Total pages: 32480 [0.00] Kernel command line: console=ttyS0,115200 earlycon [0.00] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear) [0.00] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear) [0.00] mem auto-init: stack:off, heap alloc:off, heap free:off [0.00] Memory: 118164K/131072K available (5531K kernel code, 201K rwdata, 1960K rodata, 1024K init, 2106K bss, 12908K reserved, 0K cma-reserved, 0K highmem) [0.00] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 [0.00] rcu: Hierarchical RCU implementation. [0.00] rcu: RCU restricting CPUs from NR_CPUS=2 to nr_cpu_ids=1. [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies. [0.00] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1 [0.00] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16 [0.00] arch_timer: cp15 timer(s) running at 0.03MHz (virt). [0.00] clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x10eb00226, max_idle_ns: 56421785894076 ns [0.27] sched_clock: 56 bits at 35kHz, resolution 27918ns, wraps every 70368744165810ns [0.008654] Ignoring delay timer arch_delay_timer, which has insufficient resolution of 27918ns [0.017951] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [0.025936] ...
Re: ARM board lockups/hangs triggered by locks and mutexes
On 2.08.2023 00:25, Florian Fainelli wrote: Hi Rafal, On 8/1/23 15:10, Rafał Miłecki wrote: Hi, Years ago I added support for Broadcom's BCM53573 SoCs. We released firmwares based on Linux 4.4 (and later on 4.14) that worked almost fine. There was one little issue we couldn't debug or fix: random hangs and reboots. They were too rare to deal with (most devices worked fine for weeks or months). Recently I updated my stable kernel 5.4 and I started experiencing stability issues on my own! After some uptime (usually from 0 to 20 minutes of close to zero activity) serial console hangs. I can't type anything and I stop getting any messages. I've to wait about a minute for watchdog to kick in and reboot device. # I took that great chance and decided to track the regression. Linux 5.4 stable branch worked stable up to the release v5.4.197. Starting with v5.4.198 I started experiencing those stability issues. I bisected it down to the commit 4460066eb248 ("ipv6: fix locking issues with loops over idev->addr_list"): https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=4460066eb2480b9e203c73755e12e2efc820a27e With above commit reverted I was able to use stable 5.4 branch up to the release v5.4.207. Starting with v5.4.208 it got unstable again. I bisected it down to: commit d0d583484d2e ("locking/refcount: Consolidate implementations of refcount_t") commit dab787c73f6e ("locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions") commit 0d3182fbe689 ("locking/refcount: Move saturation warnings out of line") commit 809554147d60 ("locking/refcount: Improve performance of generic REFCOUNT_FULL code") commit 9c9269977f03 ("locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the header") commit 04bff7d7b808 ("locking/refcount: Remove unused refcount_*_checked() variants") commit 513b19a43bec ("locking/refcount: Ensure integer operands are treated as signed") commit 68b4ee68e8c8 ("locking/refcount: Define constants for saturation and max refcount values") https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=d0d583484d2ed9f5903edbbfa7e2a68f78b950b0 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=dab787c73f6e38d8e7ed3c1e683385e8f0fe28a2 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=0d3182fbe689e3808c03b6cde6be98237f9e0a4a https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=809554147d609163cfbaf815c443c575b538a7ef https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=9c9269977f03ab9c448c8b71581a951e0eb4fb7b https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=04bff7d7b8081c4bb2e8171be31d33df297eee5b https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=513b19a43becee5f7af6d283bb9d3d241a8a21a8 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y=68b4ee68e8c8800cf8d6b61cc74b4031a0742a4c (I didn't actually check above commits individually). Reverting above locking/refcount commits worked fine for few releases: up to the v5.4.219. Starting with v5.4.220 I got hangs again. I bisected that down to the commit 131287ff833d ("once: add DO_ONCE_SLOW() for sleepable contexts"). Reverting that extra commit from v5.4.238 allows me to run Linux for hours again (currently 3 devices x 6 hours and counting). So I need in total 10+1 reverts from 5.4 branch to get a stable kernel. # I'm clueless at this point. Is that possible kernel has some locking bug I can hit only using this specific SoC? BCM53573s have a single ARM Cortex-A7 CPU running at 900 MHz. The only unusual thing about this hw I can think of is a slow arch timer running at 36,8 kHz. From the look of it, it seems like the CPU might have bugs with atomics? Your log indicates that your Cortex-A7 is r0p5 which is described to be susceptible to ARM_ERRATA_814220, do you have it enabled by any chance, if not, can you enable it and see if makes any difference? I had it disabled. Unfortunately CONFIG_ARM_ERRATA_814220=y doesn't help. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
On 2.08.2023 09:00, Rafał Miłecki wrote: With your comment I decided to try CONFIG_PROVE_LOCKING anyway / again and this time on 1 of my BCM53573 devices I got something very interesting on the first boot. FWIW following error: Broadcom B53 (2) bcma_mdio-0-0:1e: failed to register switch: -517 is caused by invalid DT I sent fixes for just recently. Please scroll through the first booting lines for the WARNING: (...) [ 1.167234] bgmac_bcma bcma0:5: Found PHY addr: 30 (NOREGS) [ 1.173655] [ cut here ] [ 1.178374] WARNING: CPU: 0 PID: 1 at kernel/locking/mutex.c:950 __mutex_lock+0x6b4/0x8a0 [ 1.186721] DEBUG_LOCKS_WARN_ON(lock->magic != lock) Ah, that mutex WARNING comes from my Tenda AC9 device which happens to use a hacky OpenWrt downstream b53 driver. That driver uses wrong API (it behaves as PHY driver instead of MDIO driver). It results in probing against PHY device which isn't properly initialized. Long story short: above WARNING is just a noise. Ignore it please. Sorry for that. Kernel compiled with CONFIG_PROVE_LOCKING still works fine on other devices and on Tenda AC9 after fixing PHY<->MDIO thing. That kernel option hides actual bug whatever it is. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
[PATCH] ath79: replace "mac-address-ascii" with "mac-base"
From: Rafał Miłecki With upstream accepted "mac-base" binding there is no need for a downstream "mac-address-ascii" workaround anymore. Signed-off-by: Rafał Miłecki --- .../ath79/dts/ar7161_dlink_dir-825-b1.dts | 57 +++ .../linux/ath79/dts/ar9331_hiwifi_hc6361.dts | 37 ++-- .../linux/ath79/dts/ar9342_zyxel_nwa11xx.dtsi | 28 + .../linux/ath79/dts/ar9344_dlink_dir-8x5.dtsi | 42 -- .../ath79/dts/qca9558_dlink_dir-629-a1.dts| 32 +++ .../ath79/dts/qca9563_dlink_dir-8x9-a1.dtsi | 31 ++ 6 files changed, 134 insertions(+), 93 deletions(-) diff --git a/target/linux/ath79/dts/ar7161_dlink_dir-825-b1.dts b/target/linux/ath79/dts/ar7161_dlink_dir-825-b1.dts index 0e39be7d0b..bdb678298d 100644 --- a/target/linux/ath79/dts/ar7161_dlink_dir-825-b1.dts +++ b/target/linux/ath79/dts/ar7161_dlink_dir-825-b1.dts @@ -139,8 +139,8 @@ ath9k0: wifi@0,11 { compatible = "pci168c,0029"; reg = <0x8800 0 0 0 0>; - nvmem-cells = <_lan>, <_art_1000>; - nvmem-cell-names = "mac-address-ascii", "calibration"; + nvmem-cells = <_lan 0>, <_art_1000>; + nvmem-cell-names = "mac-address", "calibration"; #gpio-cells = <2>; gpio-controller; }; @@ -148,9 +148,8 @@ ath9k1: wifi@0,12 { compatible = "pci168c,0029"; reg = <0x9000 0 0 0 0>; - nvmem-cells = <_wan>, <_art_5000>; - nvmem-cell-names = "mac-address-ascii", "calibration"; - mac-address-increment = <1>; + nvmem-cells = <_wan 1>, <_art_5000>; + nvmem-cell-names = "mac-address", "calibration"; #gpio-cells = <2>; gpio-controller; }; @@ -191,23 +190,31 @@ label = "caldata"; reg = <0x66 0x01>; read-only; - #address-cells = <1>; - #size-cells = <1>; - cal_art_1000: cal@1000 { - reg = <0x1000 0xeb8>; - }; - - cal_art_5000: cal@5000 { - reg = <0x5000 0xeb8>; - }; - - macaddr_lan: macaddr@ffa0 { - reg = <0xffa0 0x11>; - }; - - macaddr_wan: macaddr@ffb4 { - reg = <0xffb4 0x11>; + nvmem-layout { + compatible = "fixed-layout"; + #address-cells = <1>; + #size-cells = <1>; + + cal_art_1000: cal@1000 { + reg = <0x1000 0xeb8>; + }; + + cal_art_5000: cal@5000 { + reg = <0x5000 0xeb8>; + }; + + macaddr_lan: macaddr@ffa0 { + compatible = "mac-base"; + reg = <0xffa0 0x11>; + #nvmem-cell-cells = <1>; + }; + + macaddr_wan: macaddr@ffb4 { + compatible = "mac-base"; + reg = <0xffb4 0x11>; + #nvmem-cell-cells = <1>; + }; }; }; @@ -224,8 +231,8 @@ pll-data = <0x 0x1099 0x00991099>; - nvmem-cells = <_lan>; - nvmem-cell-names = "mac-address-ascii"; + nvmem-cells = <_lan 0>; + nvmem-cell-names = "mac-address"; fixed-link { speed = <1000>; @@ -238,8 +245,8 @@ pll-data = <0x 0x1099 0x00991099>; - nvmem-cells = <_wan>; - nvmem-cell-names = "mac-address-ascii"; + nvmem-cells = <_wan 0>; + nvmem-cell-names = "mac-address"; phy-handle = <>; }; diff --git a/target/linux/ath79/dts/ar9331_hiwifi_hc6361.dts b/target/linux/ath79/dts/ar9331_hiwifi_hc6361.dts index 05d3f6730e..fa000ab90c 100644 --- a/target/linux/ath79/dts/ar9331_hiwifi_hc6361.dts +++ b/target/linux/ath79/dts/ar9331_hiwifi_hc6361.dts @@ -77,9 +77,22 @@ }; bdinfo: partition@1 { + compatible = "nvmem-cells";
Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes
Hi, it's a late reply but I didn't find enough determination earlier. On 8.09.2023 10:10, Linus Walleij wrote: On Mon, Sep 4, 2023 at 10:34 AM Rafał Miłecki wrote: I'm clueless at this point. Maybe someone can come up with an idea of actual issue & ideally a solution. Damn this is frustrating. 2. Clock (arm,armv7-timer) While comparing main clock in Broadcom's SDK with upstream one I noticed a tiny difference: mask value. I don't know it it makes any sense but switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in arm_arch_timer.c (to match SDK) increases average uptime (time before a hang/lockup happens) from 4 minutes to 36 minutes. This could be related to how often the system goes to idle. + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); + if (cpu_idle_force_poll == 5678) + arch_cpu_idle(); + if (cpu_idle_force_poll == 1234) + arch_cpu_idle(); Idle again. I would have tried to see what arch_cpu_idle() is doing. arm_pm_idle() or cpu_do_idle()? In my case arm_pm_idle is NULL. What happens if you just put return in arch_cpu_idle() so it does nothing? Doesn't help. I also tried putting: udelay(10); and udelay(1000); at the arch_cpu_idle() beginning. None helped. Here comes more interesting experiment though. Putting there: if (!(foo++ % 1)) { pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle); } doesn't seem to help. Putting following however seems to make kernel/device stable: if (!(foo++ % 100)) { pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle); } I think I'm just going to assume those chipsets are simply hw broken. ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH] brcm: add CLM BLOB files for Luxul devices
Hi Josh, I'm OpenWrt developer and those CLM BLOBs have significant meaning for this project. They allow OpenWrt users to use their devices with full WiFi support (not limited to some generic fallback setup level). For that let me speak up regarding this PATCH case. On 12.06.2023 18:57, Josh Boyer wrote: On Mon, Jun 12, 2023 at 11:05 AM Rip Route wrote: The files were compiled from source obtained through Luxul's contract manufacturer of these devices. OK, it's not clear to me whether we can accept these. We have no idea what your contract specified, what sources were used, what the license on them is, or if you had authority to release them at all. I'm not questioning your integrity by any means, but we have very little way to know Broadcom is agreeable to these being included in the linux-firmware repo. If Broadcom wants to submit them or provide a Signed-off-by statement, we might be able to take them then. I fully understand your concerns. Those are binary files and it's hard to tell what they contain at first sight. It'd be perfect to have Broadcom simply approve them but I'm afraid it's unlikely. I see linux-wireless@ and Broadcom lists were cc-ed but we didn't get any feedback. I know some firmware contributions took Broadcom months or years in the past. So I'd try to analyze what we have here and try to review it. First of all: CLM (BLOB) is a binary formatted data. It doesn't contain any actual CPU-executable code. It gets uploaded to FullMAC (closed source) firmware and gets parsed by it. So what we have here is not actual firmware (executable binary) built from proprietary C sources full of logic and possibly some patent. We have binary files containing (literally) numbers and strings. They just use a binary undocumented format. I actually took a moment to reverse engineer those files and write basic tools for them. BLOB is a simple container format with header containing offsets, sizes and CRC32 sums. I developed "bcmblob" tool for parsing BLOBs and extracting from them: https://git.openwrt.org/?p=project/firmware-utils.git;a=commitdiff;h=f730ad2fa0b4728e2bd66771d22cf642909e020e CLM BLOB is a BLOB file containing CLM and chipset version. I also developed absolutely basic tool "bcmclm" for parsing CLM files: https://git.openwrt.org/?p=project/firmware-utils.git;a=commitdiff;h=916633160dc92ccae6a0ad8fd900f2d75b5b5ff0 Given that, in my I-am-not-a-lawyer opinion, those files are safe to accept and distribute. If I were to compare them to something it'd be an XLS stylesheet file. It's a binary undocumented format but what it actually contains are sets of values. I don't think it raises any concerns to use a custom binary format. We're clearly fine sharing .xls, .exe or .rar files. I also don't think you can stop anyone from sharing a list of 2 GHz, 5 GHz or 6 GHz channels. Or their rates. Or flags (like radar etc.). Or allowed TX power levels. Just numbers in general. Do you think that with my basic technical summary and Luxul-provided info you can accept those files? ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
[PATCH] net: gro: respect nf_conntrack_checksum for skipping csum verification
From: Rafał Miłecki Netfilter allows disabling checksum verification of incoming packets by setting nf_conntrack_checksum variable. That feature is very useful for home routers which: 1. Most of the time just /forward/ network traffic 2. Have slow CPU(s) and csum calculation is a challenge Some projects like OpenWrt set nf_conntrack_checksum to 0 by default. It would be nice to allow similar optimization in the GRO code paths. This patch simply reuses nf_conntrack_checksum variable to skip skb_gro_checksum_validate() calls if applicable. Signed-off-by: Rafał Miłecki --- Hi guys, I'm not very familiar with net subsystem, please let me know if there is a better way of implementing such a feature. --- net/ipv4/tcp_offload.c | 3 +++ net/ipv6/tcpv6_offload.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 30abde86db45..734a3c0f3d4a 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -311,6 +311,9 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) { /* Don't bother verifying checksum if we're going to flush anyway. */ if (!NAPI_GRO_CB(skb)->flush && +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + dev_net(skb->dev)->ct.sysctl_checksum && +#endif skb_gro_checksum_validate(skb, IPPROTO_TCP, inet_gro_compute_pseudo)) { NAPI_GRO_CB(skb)->flush = 1; diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 39db5a226855..2144afa56fa3 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -18,6 +18,9 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) { /* Don't bother verifying checksum if we're going to flush anyway. */ if (!NAPI_GRO_CB(skb)->flush && +#if IS_ENABLED(CONFIG_NF_CONNTRACK) + dev_net(skb->dev)->ct.sysctl_checksum && +#endif skb_gro_checksum_validate(skb, IPPROTO_TCP, ip6_gro_compute_pseudo)) { NAPI_GRO_CB(skb)->flush = 1; -- 2.34.1 ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH] wifi: mt76: mt7603: add debugfs attr for disabling frames buffering
Hi Rafal, Maybe we could disable frames buffering by default until it is fixed? Also, maybe we could do more tests on newer models such as mt7986/81 to make this patch benefit more models? Best regards, Shengyu 在 2024/3/26 6:33, Rafał Miłecki 写道: From: Rafał Miłecki MT7603EN and MT7628AN were reported by multiple users to be unstable under high traffic. Transmission of packets could stop for seconds often leading to disconnections. Long research & debugging revelaed a close relation between MCU interrupts of type PKT_TYPE_TXS and slowdowns / stalls. That led to questioning frames buffering feature. It turns out that disabling SKBs loopback code makes mt7603 devices much more stable under load. There are still some traffic hiccups but those happen like once every an hour and end up in recovery in most cases. Add a debugfs option for disabling frames buffering so users can give it a try. If this solution yields a success we can make this feature disabled by default. This change was successfully tested using 2 GHz AP interface on: 1. Netgear R6220 - MT7621ST (SoC) + MT7603EN (WiFi) + MT7612EN (WiFi) 2. Xiaomi Mi Router 4C - MT7628AN (Wi-Fi SoC) Link: https://lore.kernel.org/linux-wireless/7c96d5ee-86c1-8068-1b58-40db6087a...@gmail.com/ Closes: https://github.com/openwrt/mt76/issues/865 Fixes: c8846e101502 ("mt76: add driver for MT7603E and MT7628/7688") Signed-off-by: Rafał Miłecki --- drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c | 2 ++ drivers/net/wireless/mediatek/mt76/mt7603/dma.c | 3 +++ drivers/net/wireless/mediatek/mt76/mt7603/init.c| 1 + drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h | 2 ++ 4 files changed, 8 insertions(+) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c index 3967f2f05774..c80baba7a402 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/debugfs.c @@ -115,4 +115,6 @@ void mt7603_init_debugfs(struct mt7603_dev *dev) >sensitivity_limit); debugfs_create_bool("dynamic_sensitivity", 0600, dir, >dynamic_sensitivity); + debugfs_create_bool("frames_buffering", 0600, dir, + >frames_buffering); } diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c index 7a2f5d38562b..f5ab729dec31 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/dma.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/dma.c @@ -27,6 +27,9 @@ mt7603_rx_loopback_skb(struct mt7603_dev *dev, struct sk_buff *skb) u32 val; u8 tid = 0; + if (!dev->frames_buffering) + goto free; + if (skb->len < MT_TXD_SIZE + sizeof(struct ieee80211_hdr)) goto free; diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/init.c b/drivers/net/wireless/mediatek/mt76/mt7603/init.c index 6c55c72f28a2..5abc2618ec8b 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/init.c +++ b/drivers/net/wireless/mediatek/mt76/mt7603/init.c @@ -515,6 +515,7 @@ int mt7603_register_device(struct mt7603_dev *dev) dev->slottime = 9; dev->sensitivity_limit = 28; dev->dynamic_sensitivity = true; + dev->frames_buffering = true; ret = mt7603_init_hardware(dev); if (ret) diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h index 9e58df7042ad..02c88334cdc0 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h +++ b/drivers/net/wireless/mediatek/mt76/mt7603/mt7603.h @@ -155,6 +155,8 @@ struct mt7603_dev { u32 reset_test; unsigned int reset_cause[__RESET_CAUSE_MAX]; + + bool frames_buffering; }; extern const struct mt76_driver_ops mt7603_drv_ops; OpenPGP_0xE3520CC91929C8E7.asc Description: OpenPGP public key OpenPGP_signature.asc Description: OpenPGP digital signature ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
[PATCH v2] imx: add imx8m support
Add imx8m support: - add a cortexa53 subtarget to imx - move ARCH and KERNELNAME to subtargets No device-specific targets or firmware images are created yet but all imx8m* dtbs will be built. enabling CONFIG_TARGET_ROOTFS_INITRAMFS results in openwrt-imx-cortexa53-imx8m-initramfs-kernel.bin which has been successfully booted on an imx8mm-evk using the following: u-boot=> tftpboot $fdt_addr_r image-imx8mm-evk.dtb && \ tftpboot $kernel_addr_r openwrt-imx-cortexa53-imx8m-initramfs-kernel.bin && \ booti $kernel_addr_r - $fdt_addr_r Signed-off-by: Tim Harvey --- v2: - fix build failure regarding ubifs image generation - remove kernel patches (these all go away when we move to 6.1 and I've already submitted a series to add 6.1 support to imx) - add a generic imx8m device that picks up imx8m dtb's - show example of how I'm booting this in commit log --- target/linux/imx/Makefile | 5 +- target/linux/imx/cortexa53/config-default | 96 +++ target/linux/imx/cortexa53/target.mk | 8 ++ target/linux/imx/cortexa7/target.mk | 2 + target/linux/imx/cortexa9/target.mk | 2 + target/linux/imx/image/cortexa53.mk | 16 6 files changed, 125 insertions(+), 4 deletions(-) create mode 100644 target/linux/imx/cortexa53/config-default create mode 100644 target/linux/imx/cortexa53/target.mk create mode 100644 target/linux/imx/image/cortexa53.mk diff --git a/target/linux/imx/Makefile b/target/linux/imx/Makefile index 5f7f9db589..53c1ccce0c 100644 --- a/target/linux/imx/Makefile +++ b/target/linux/imx/Makefile @@ -4,19 +4,16 @@ include $(TOPDIR)/rules.mk -ARCH:=arm BOARD:=imx BOARDNAME:=NXP i.MX FEATURES:=audio display fpu gpio pcie rtc usb usbgadget squashfs targz nand ubifs boot-part rootfs-part -SUBTARGETS:=cortexa7 cortexa9 +SUBTARGETS:=cortexa7 cortexa9 cortexa53 KERNEL_PATCHVER:=5.15 KERNEL_TESTING_PATCHVER:=6.1 include $(INCLUDE_DIR)/target.mk -KERNELNAME:=zImage dtbs - DEFAULT_PACKAGES += uboot-envtools mkf2fs e2fsprogs blkid $(eval $(call BuildTarget)) diff --git a/target/linux/imx/cortexa53/config-default b/target/linux/imx/cortexa53/config-default new file mode 100644 index 00..de52260489 --- /dev/null +++ b/target/linux/imx/cortexa53/config-default @@ -0,0 +1,96 @@ +CONFIG_64BIT=y +CONFIG_ARM64=y +CONFIG_ARM64_4K_PAGES=y +CONFIG_ARM64_VA_BITS_39=y +CONFIG_ARM64_CRYPTO=y +CONFIG_ARCH_NXP=y +CONFIG_CRYPTO_AES_ARM64=y +CONFIG_CRYPTO_AES_ARM64_CE=y +CONFIG_CRYPTO_AES_ARM64_CE_BLK=y +CONFIG_CRYPTO_AES_ARM64_CE_CCM=y +CONFIG_CRYPTO_BLAKE2S=y +CONFIG_CRYPTO_GHASH_ARM64_CE=y +CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC=y +CONFIG_CRYPTO_SHA1_ARM64_CE=y +CONFIG_CRYPTO_SHA256_ARM64=y +CONFIG_CRYPTO_SHA2_ARM64_CE=y +CONFIG_CRYPTO_SHA512_ARM64=y +CONFIG_CRYPTO_SHA512_ARM64_CE=y +CONFIG_CRYPTO_ZSTD=y +CONFIG_UNMAP_KERNEL_AT_EL0=y +CONFIG_RODATA_FULL_DEFAULT_ENABLED=y +CONFIG_ARM64_TAGGED_ADDR_ABI=y +CONFIG_ARCH_MMAP_RND_BITS=18 +CONFIG_VMAP_STACK=y +CONFIG_MEMORY_ISOLATION=y +CONFIG_CMA=y +CONFIG_CMA_AREAS=7 +# CONFIG_CMA_DEBUG is not set +# CONFIG_CMA_DEBUGFS is not set +# CONFIG_CMA_SYSFS is not set +CONFIG_CONSOLE_TRANSLATIONS=y +CONFIG_CONTIG_ALLOC=y +CONFIG_ZONE_DMA32=y +CONFIG_ARM_IMX_CPUFREQ_DT=y +CONFIG_ARM64_CRYPTO=y +CONFIG_EXTRA_FIRMWARE="imx/sdma/sdma-imx7d.bin" +CONFIG_EXTRA_FIRMWARE_DIR="firmware" +CONFIG_PCI=y +CONFIG_PCIEAER=y +CONFIG_PCIEPORTBUS=y +CONFIG_PCIE_PME=y +CONFIG_PHY_FSL_IMX8M_PCIE=y +CONFIG_PCIEAER=y +CONFIG_PCIEPORTBUS=y +CONFIG_PCIE_DW=y +CONFIG_PCIE_DW_HOST=y +CONFIG_PCIE_PME=y +CONFIG_PCI_DOMAINS=y +CONFIG_PCI_DOMAINS_GENERIC=y +CONFIG_PCI_IMX6=y +CONFIG_PCI_MSI=y +CONFIG_PCI_MSI_IRQ_DOMAIN=y +CONFIG_PINCTRL_IMX=y +CONFIG_DWMAC_DWC_QOS_ETH=y +CONFIG_PINCTRL=y +CONFIG_PINCTRL_SINGLE=y +CONFIG_PINCTRL_IMX=y +CONFIG_PINCTRL_IMX8MM=y +CONFIG_PINCTRL_IMX8MN=y +CONFIG_PINCTRL_IMX8MP=y +CONFIG_PINCTRL_IMX8MQ=y +CONFIG_THERMAL=y +CONFIG_IMX8MM_THERMAL=y +CONFIG_REGULATOR_MP5416=y +CONFIG_REGULATOR_PCA9450=y +CONFIG_USB_CONN_GPIO=y +CONFIG_NOP_USB_XCEIV=y +CONFIG_USB_OTG=y +CONFIG_USB_DWC3=y +CONFIG_USB_DWC3_DUAL_ROLE=y +# CONFIG_USB_DWC3_GADGET is not set +# CONFIG_USB_DWC3_HOST is not set +CONFIG_USB_DWC3_IMX8MP=y +# CONFIG_MMC_SDHCI_PCI is not set +CONFIG_CLK_IMX8MM=y +CONFIG_CLK_IMX8MN=y +CONFIG_CLK_IMX8MP=y +CONFIG_CLK_IMX8MQ=y +CONFIG_CLK_IMX8QXP=y +CONFIG_SOC_IMX8M=y +# CONFIG_IMX_DSP is not set +# CONFIG_IMX_SCU is not set +CONFIG_EXTCON_USB_GPIO=y +CONFIG_PHY_FSL_IMX8MQ_USB=y +# CONFIG_PHY_MIXEL_LVDS_PHY is not set +CONFIG_RESET_IMX7=y +CONFIG_INTERCONNECT=y +CONFIG_INTERCONNECT_IMX=y +CONFIG_INTERCONNECT_IMX8MM=y +CONFIG_INTERCONNECT_IMX8MN=y +CONFIG_INTERCONNECT_IMX8MQ=y +CONFIG_INTERCONNECT_IMX8MP=y +# CONFIG_DMA_CMA is not set +CONFIG_VT=y +CONFIG_VT_CONSOLE=y +CONFIG_VT_HW_CONSOLE_BINDING=y diff --git a/target/linux/imx/cortexa53/target.mk b/target/linux/imx/cortexa53/target.mk new file mode 100644 index 00..b9b32d1829 --- /dev/null +++ b/target/linux/imx/cortexa53/target.mk @@ -0,0
Re: [PATCH] dt-bindings: leds: add FUNCTION defines for per-band WLANs
On Wed, 17 Jan 2024 16:17:36 +0100, Rafał Miłecki wrote: > From: Rafał Miłecki > > Most wireless routers and access points can operate in multiple bands > simultaneously. Vendors often equip their devices with per-band LEDs. > > Add defines for those very common functions to allow cleaner & clearer > bindings. > > Signed-off-by: Rafał Miłecki > --- > include/dt-bindings/leds/common.h | 3 +++ > 1 file changed, 3 insertions(+) > Acked-by: Rob Herring ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: (subset) [PATCH] dt-bindings: leds: add FUNCTION defines for per-band WLANs
On Wed, 17 Jan 2024 16:17:36 +0100, Rafał Miłecki wrote: > Most wireless routers and access points can operate in multiple bands > simultaneously. Vendors often equip their devices with per-band LEDs. > > Add defines for those very common functions to allow cleaner & clearer > bindings. > > > [...] Applied, thanks! [1/1] dt-bindings: leds: add FUNCTION defines for per-band WLANs commit: 89d9d3eedc8804e06a770e3cf1279f9131b785f1 -- Lee Jones [李琼斯] ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
On 09/01/2024 17:38, Rafał Miłecki wrote: > On 9.01.2024 10:02, Krzysztof Kozlowski wrote: >> On 09/01/2024 09:23, Rafał Miłecki wrote: >>> From: Rafał Miłecki >>> >>> OpenWrt project provides downstream support for thousands of embedded >>> home network devices. Its custom requirement for DT is to provide info >>> about LEDs roles. Currently it does it by using custom non-documented >>> aliases. While formally valid (aliases.yaml doesn't limit names or >>> purposes of aliases) it's quite a loose solution. >>> >>> Document 4 precise "chosen" biding properties with clearly documented >>> OpenWrt usage. This will allow upstreaming tons of DTS files that noone >> >> typo: none > > typo: no one ;) > >>> cared about so far as those would need to be patched downstream anyway. >> >> From all this description I understand why you want to add it, but I >> don't understand what are you exactly adding and how it is being used by >> the users/OS. > > In OpenWrt we have user space script that reads LED full path: > cat /proc/device-tree/aliases/led- > > And then sets LED to state matching current boot stage: > echo 1 > /sys/class/leds//brightness OK, it's nowhere close to a hardware property. You now instruct OS what to do. It's software or software policy. > > >>> Signed-off-by: Rafał Miłecki >>> --- >>> A few weeks ago I was seeking for a help regarding OpenWrt's need for >>> specifing LEDs roles in DT, see: >>> >>> Describing LEDs roles in device tree? >>> https://lore.kernel.org/linux-devicetree/ee912a89-4fd7-43c3-a79b-16659a035...@gmail.com/T/#u >>> >>> I DON'T think OpenWrt's current solution with aliases is good enough: >>> * It's not clearly documented >>> * It may vary from other projects usa case >>> * It may be refused by random maintainers I think >>> >>> I decided to suggest 4 OpenWrt-prefixed properties for "chosen". I'm >>> hoping this small custom binding is sth we could go with. I'm really >>> looking forward to upstreaming OpenWrt's downstream DTS files so other >>> projects (e.g. Buildroot) can use them. >>> >>> If you have any better fitting solution in mind please let me know. I >>> should be fine with anything that lets me solve this downstream mess >>> situation. >>> >>> dtschema/schemas/chosen.yaml | 9 + >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/dtschema/schemas/chosen.yaml b/dtschema/schemas/chosen.yaml >>> index 6d5c3f1..96d0db7 100644 >>> --- a/dtschema/schemas/chosen.yaml >>> +++ b/dtschema/schemas/chosen.yaml >>> @@ -264,4 +264,13 @@ properties: >>> patternProperties: >>> "^framebuffer": true >>> >>> + "^openwrt,led-(boot|failsafe|running|upgrade)$": >>> +$ref: types.yaml#/definitions/string >>> +description: >>> + OpenWrt choice of LED for a given role. >> >> Neither this explains it. What is "OpenWrt choice"? Choice like what? >> What are the valid choices? >> >>> Value must be a full path (encoded >>> + as a string) to a relevant LED node. >> >> What do you mean by full path? DT node path? Then no, use phandles. >> >> Anyway, we have existing properties for this - LED functions. Just >> choose LED with given function (from sysfs) and you are done. If >> function is missing in the header: feel free to go, least for me. > > In "Describing LEDs roles in device tree?" e-mail I wrote: > > " > Please note that "function" on its own is not sufficient as multiple > LEDs my share the same function name but its meaning may vary depending > on color. > " > > Let me elaborate here. > > Values of "function" property match LEDs descriptions (usually it's a > physical label). That is OK and makes sense but doesn't allow OpenWrt to > automatically pick proper LED to use during given boot stage. > > Some devices may have multiple color of a the same LED function. OpenWrt > developer needs to choose which color to use for failsafe more and which > boot fully booted state (and others too). > > Devices also differ in available LEDs by their functions. Some may have > LED_FUNCTION_POWER that OpenWrt needs to use. Some may have > LED_FUNCTION_STATUS. Or both. There are some more (less common) > functions like LED_FUNCTION_ACTIVITY. > > We failed at OpenWrt to develop a single generic script to rule all > devices and all their LEDs combinations. We ended up with developers > *choosing* what LED to use for a given system state. So this all is because you want to write easier OS. That's abuse of Devicetree. You can define which LEDs have different meaning, e.g. physical label or icon attached to them. That's a hardware property. You can also define how pieces of hardware are wired together and create entire system, e.g. connect one LED to disk activity. However what you are proposing here is to dynamically configure one, given OS. I don't think it is suitable. The problem of OS to nicely figure out which LED to blink when, is not a problem of Devicetree. It is a problem of OS and its configuration. Best regards, Krzysztof
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
On 09/01/2024 22:08, Geert Uytterhoeven wrote: > Hi Krzysztof, > > On Tue, Jan 9, 2024 at 8:10 PM Krzysztof Kozlowski > wrote: >> On 09/01/2024 17:38, Rafał Miłecki wrote: >>> On 9.01.2024 10:02, Krzysztof Kozlowski wrote: On 09/01/2024 09:23, Rafał Miłecki wrote: > From: Rafał Miłecki > > OpenWrt project provides downstream support for thousands of embedded > home network devices. Its custom requirement for DT is to provide info > about LEDs roles. Currently it does it by using custom non-documented > aliases. While formally valid (aliases.yaml doesn't limit names or > purposes of aliases) it's quite a loose solution. > > Document 4 precise "chosen" biding properties with clearly documented > OpenWrt usage. This will allow upstreaming tons of DTS files that noone typo: none >>> >>> typo: no one ;) >>> > cared about so far as those would need to be patched downstream anyway. From all this description I understand why you want to add it, but I don't understand what are you exactly adding and how it is being used by the users/OS. >>> >>> In OpenWrt we have user space script that reads LED full path: >>> cat /proc/device-tree/aliases/led- >>> >>> And then sets LED to state matching current boot stage: >>> echo 1 > /sys/class/leds//brightness >> >> OK, it's nowhere close to a hardware property. You now instruct OS what >> to do. It's software or software policy. > Anyway, we have existing properties for this - LED functions. Just choose LED with given function (from sysfs) and you are done. If function is missing in the header: feel free to go, least for me. >>> >>> In "Describing LEDs roles in device tree?" e-mail I wrote: >>> >>> " >>> Please note that "function" on its own is not sufficient as multiple >>> LEDs my share the same function name but its meaning may vary depending >>> on color. >>> " >>> >>> Let me elaborate here. >>> >>> Values of "function" property match LEDs descriptions (usually it's a >>> physical label). That is OK and makes sense but doesn't allow OpenWrt to >>> automatically pick proper LED to use during given boot stage. >>> >>> Some devices may have multiple color of a the same LED function. OpenWrt >>> developer needs to choose which color to use for failsafe more and which >>> boot fully booted state (and others too). >>> >>> Devices also differ in available LEDs by their functions. Some may have >>> LED_FUNCTION_POWER that OpenWrt needs to use. Some may have >>> LED_FUNCTION_STATUS. Or both. There are some more (less common) >>> functions like LED_FUNCTION_ACTIVITY. >>> >>> We failed at OpenWrt to develop a single generic script to rule all >>> devices and all their LEDs combinations. We ended up with developers >>> *choosing* what LED to use for a given system state. >> >> So this all is because you want to write easier OS. That's abuse of >> Devicetree. You can define which LEDs have different meaning, e.g. >> physical label or icon attached to them. That's a hardware property. >> >> You can also define how pieces of hardware are wired together and create >> entire system, e.g. connect one LED to disk activity. >> >> However what you are proposing here is to dynamically configure one, >> given OS. I don't think it is suitable. >> >> The problem of OS to nicely figure out which LED to blink when, is not a >> problem of Devicetree. It is a problem of OS and its configuration. > > I'd say it's a grey area... > > What if the colors are printed on the case, next to the LED? > Like these multi-color LEDs indicating port speed on network switches? > Then it suddenly becomes hardware description, just like > "aliases/serial2 = &...;" referring to serial port 2. > > Next, what if the colors are not printed on the case, but documented > in the product's manual? > What if there is no paper product manual, but just the OpenWRT online > documentation? Mapping between colors and logical meanings is subjective. I don't think it is the same case as LED with a radio-signal or power-plug label. Anyway I also agree that this distinction is a bit blurred. Best regards, Krzysztof ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
On 09/01/2024 22:48, Rafał Miłecki wrote: >> >> You can also define how pieces of hardware are wired together and create >> entire system, e.g. connect one LED to disk activity. >> >> However what you are proposing here is to dynamically configure one, >> given OS. I don't think it is suitable. >> >> The problem of OS to nicely figure out which LED to blink when, is not a >> problem of Devicetree. It is a problem of OS and its configuration. > > I'd say it's a thin line. Or just a grey idea as Geert said. > > What is a LED "function" after all? How exactly are: > LED_FUNCTION_STATUS > LED_FUNCTION_ACTIVITY > LED_FUNCTION_BOOT > LED_FUNCTION_HEARTBEAT > different from each other? > > I can imagine OpenWrt seeing a different role for LED_FUNCTION_ACTIVITY > or LED_FUNCTION_BOOT than other projects. ...which is not a problem. The meaning of these, except quite obvious heartbeat, is defined by the OS or system configurators. > > Proposed properties "openwrt,led-" don't exactly describe hardware > per se but are still designed to deal with hardware differences. > > From a practical point of view it's much easier to put such OS > configuration info in DT since it's closely related to LEDs defined > there and it helps a lot with maintenance. If at some point we change I agree, however this is an abuse of DT and therefore it is not an argument to put something into DT. And this was told many, many times on the lists: just because it is easier to instantiate each Linux struct device from DT (with 1-1 mapping between devices and device nodes), does not mean you should do it. Same here. Just because it is easier for OpenWRT, does not mean this is the solution. This is the most frequent argument used in all of such DT abuses. Another example: I want to boot some virtual machine and doing ACPI is too difficult, so I will just use DT as way to pass from host to guest. There were several examples of this. I understand why DT is the easiest for the job... > DT due to previous mistake (e.g. we fix LED color from amber to red) > that would mean breaking user space of Linux system (changing LED name). > Having DT binding for LEDs roles would prevent that. I can argue that LEDs "label" can be un-deprecated and used for that purpose as well. It will provide you stable sysfs entry, regardless of the "color" property. In your case you could also use to solve the actual problem: just label each LED accordingly, e.g. "phase:boot", "phase:upgrade". It might be not the best solution though, because we put one's OS expectations inside DT device node... > I was hoping that vendor prefixed "chosen" properties may somehow get > accepted as a reasonable solution for dealing with hardware differences > even if they don't strictly describe hardware itself. It's actually not the worst idea considering above "OS expectations inside DT device node" when using "label"... > > Is there any other DT solution you think would be better and could be > accepted? > Given my hesitation about "function" meaning would something like > openwrt,function = "(boot|failsafe|running|upgrade)" > be any better? Your problem is not really that specific to OpenWRT - several embedded systems want to do the same, including Android. Some of the LEDs must be active before the user-space comes up, so it is the job for kernel and/or DT. Therefore let's go with generic solutions? I still wonder why we cannot define new LED FUNCTION constants and use them? You need them only for the pre-userspace phase, so do you expect one LED would have two functions? But if you do not have user-space how this aliases are being handled? By how? If you have user-space, then it's not a job for kernel. Best regards, Krzysztof ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH dt-schema] schemas: chosen: Add OpenWrt LEDs properties for system states
Hi Krzysztof, On Tue, Jan 9, 2024 at 8:10 PM Krzysztof Kozlowski wrote: > On 09/01/2024 17:38, Rafał Miłecki wrote: > > On 9.01.2024 10:02, Krzysztof Kozlowski wrote: > >> On 09/01/2024 09:23, Rafał Miłecki wrote: > >>> From: Rafał Miłecki > >>> > >>> OpenWrt project provides downstream support for thousands of embedded > >>> home network devices. Its custom requirement for DT is to provide info > >>> about LEDs roles. Currently it does it by using custom non-documented > >>> aliases. While formally valid (aliases.yaml doesn't limit names or > >>> purposes of aliases) it's quite a loose solution. > >>> > >>> Document 4 precise "chosen" biding properties with clearly documented > >>> OpenWrt usage. This will allow upstreaming tons of DTS files that noone > >> > >> typo: none > > > > typo: no one ;) > > > >>> cared about so far as those would need to be patched downstream anyway. > >> > >> From all this description I understand why you want to add it, but I > >> don't understand what are you exactly adding and how it is being used by > >> the users/OS. > > > > In OpenWrt we have user space script that reads LED full path: > > cat /proc/device-tree/aliases/led- > > > > And then sets LED to state matching current boot stage: > > echo 1 > /sys/class/leds//brightness > > OK, it's nowhere close to a hardware property. You now instruct OS what > to do. It's software or software policy. > >> Anyway, we have existing properties for this - LED functions. Just > >> choose LED with given function (from sysfs) and you are done. If > >> function is missing in the header: feel free to go, least for me. > > > > In "Describing LEDs roles in device tree?" e-mail I wrote: > > > > " > > Please note that "function" on its own is not sufficient as multiple > > LEDs my share the same function name but its meaning may vary depending > > on color. > > " > > > > Let me elaborate here. > > > > Values of "function" property match LEDs descriptions (usually it's a > > physical label). That is OK and makes sense but doesn't allow OpenWrt to > > automatically pick proper LED to use during given boot stage. > > > > Some devices may have multiple color of a the same LED function. OpenWrt > > developer needs to choose which color to use for failsafe more and which > > boot fully booted state (and others too). > > > > Devices also differ in available LEDs by their functions. Some may have > > LED_FUNCTION_POWER that OpenWrt needs to use. Some may have > > LED_FUNCTION_STATUS. Or both. There are some more (less common) > > functions like LED_FUNCTION_ACTIVITY. > > > > We failed at OpenWrt to develop a single generic script to rule all > > devices and all their LEDs combinations. We ended up with developers > > *choosing* what LED to use for a given system state. > > So this all is because you want to write easier OS. That's abuse of > Devicetree. You can define which LEDs have different meaning, e.g. > physical label or icon attached to them. That's a hardware property. > > You can also define how pieces of hardware are wired together and create > entire system, e.g. connect one LED to disk activity. > > However what you are proposing here is to dynamically configure one, > given OS. I don't think it is suitable. > > The problem of OS to nicely figure out which LED to blink when, is not a > problem of Devicetree. It is a problem of OS and its configuration. I'd say it's a grey area... What if the colors are printed on the case, next to the LED? Like these multi-color LEDs indicating port speed on network switches? Then it suddenly becomes hardware description, just like "aliases/serial2 = &...;" referring to serial port 2. Next, what if the colors are not printed on the case, but documented in the product's manual? What if there is no paper product manual, but just the OpenWRT online documentation? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM board lockups/hangs triggered by locks and mutexes
Hi Rafal, On Mon, Aug 7, 2023 at 1:11 PM Rafał Miłecki wrote: > On 4.08.2023 13:07, Rafał Miłecki wrote: > > I triple checked that. Dropping a single unused function breaks kernel / > > device stability on BCM53573! > > > > AFAIK the only thing below diff actually affects is location of symbols > > (I actually verified that by comparing System.map before and after - > > over 22'000 of relocated symbols). > > > > Can some unfortunate location of symbols cause those hangs/lockups? > > I performed another experiment. First I dropped mtd_check_of_node() to > bring kernel back to the stable state. > > Then I started adding useless code to the mtdchar_unlocked_ioctl(). I > ended up adding just enough to make sure all post-mtd symbols in > System.map got the same offset as in case of backporting > mtd_check_of_node(). > > I started experiencing lockups/hangs again. > > I repeated the same test with adding dumb code to the brcm_nvram_probe() > and verifying symbols offsets following brcm_nvram_probe one. > > I believe this confirms that this problem is about offset or alignment > of some specific symbol(s). The remaining question is what symbols and > how to fix or workaround that. I had similar experiences on other ARM platforms many years ago: bisection lead to something completely bogus, and it turned out adding a single line of innocent code made the system lock-up or crash unexpectedly. It was definitely related to alignment, as adding the right extra amount of innocent code would fix the problem. Until some later change changing alignment again... I never found the real cause, but the problems went away over time. I am not sure I did enable all required errata config options, so I may have missed some... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes
Hi Rafał, On Mon, Sep 4, 2023 at 10:35 AM Rafał Miłecki wrote: > 2. Clock (arm,armv7-timer) > > While comparing main clock in Broadcom's SDK with upstream one I noticed > a tiny difference: mask value. I don't know it it makes any sense but > switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in > arm_arch_timer.c (to match SDK) increases average uptime (time before a > hang/lockup happens) from 4 minutes to 36 minutes. That code path is used only for type != ARCH_TIMER_TYPE_CP15, but your kernel log arch_timer: cp15 timer(s) running at 0.03MHz (virt). suggest that type == ARCH_TIMER_TYPE_CP15?!? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel