Hi Samuel, thanks for your insights. Will try to follow them and will report back here.
In the meantime, I have built a kernel with dynamic debug enabled, and I can see that cpu_freq and the associated calls shown in my earlier post must run millions of times. And then, out of the blue, a crash... so hardware flakiness came to my mind, too.

Regards,
Torsten

sam...@sholland.org wrote on Saturday, 16 July 2022 at 06:16:16 UTC+2:
> Hi Torsten,
>
> On 7/13/22 3:18 AM, Torsten Beyer wrote:
> > Hi all,
> >
> > I am trying to debug a bug on an open source air navigation box for
> > gliders called openvario <https://www.openvario.org/doku.php>. It is
> > based on a cubieboard (A20) plus some additional serial connections
> > and an optional sensor board for various flight related pressures.
> >
> > System runs on kernel 5.18.5 generated using Yocto 4.0 kirkstone. The
> > system tends to run for a couple of hours and then freezes/crashes.
> > At the bottom of this post I have pasted a typical kernel debug
> > output once these freezes happen. The crash always happens in the
> > cpu_freq driver. If I set cpu frequency to a fixed frequency (setting
> > min=max frequency) those crashes disappear. This seems to be a
> > workaround at the cost of fixing cpu speed.
> >
> > So it _seems_ the crash is caused by cpu_freq trying to change the
> > cpu frequency (at least at some point in time).
> >
> > To be honest, I am rather clueless about how to go about finding the
> > root of this issue, let alone fixing it. So I thought I'd ask around
> > here whether this bug somehow looks familiar and may have been
> > tackled (or even fixed) previously (didn't find anything, though, via
> > the search function). In other words: I am thankful for any hint
> > people may be able to give me to get nearer to a fix.
>
> I have not seen something like this before. It looks like hardware
> flakiness. Can you provide a disassembly of ccu_div_recalc_rate
> from the kernel this splat came from, to confirm my analysis?
>
> > thanks for any pointers
> > Torsten
> >
> > [26996.004010] Unable to handle kernel paging request at virtual address 08d80050
> > [26996.011337] [08d80050] *pgd=00000000
> > [26996.014952] Internal error: Oops: 5 [#1] SMP ARM
> > [26996.019590] Modules linked in:
> > [26996.022663] CPU: 1 PID: 95 Comm: sugov:0 Not tainted 5.18.5 #1
> > [26996.028509] Hardware name: Allwinner sun7i (A20) Family
> > [26996.033738] PC is at ccu_div_recalc_rate+0x48/0x90
> > [26996.038555] LR is at ccu_mux_helper_apply_prediv+0x18/0x1c
>
> The crash is between the calls to ccu_mux_helper_apply_prediv and
> divider_recalc_rate, so we are loading arguments for the call to
> divider_recalc_rate.
>
> > [26996.044054] pc : [] lr : [] psr: 600b0113
> > [26996.050326] sp : f09e5dc8 ip : 00000000 fp : c1938200
> > [26996.055554] r10: c1867440 r9 : 1f78a400 r8 : c1302d00
> > [26996.060781] r7 : 1312d000 r6 : 1f78a400 r5 : 00000002 r4 : 08d80084
>
> Assuming r4 is "hw", then the faulting address is cd->div.flags.
> This is weird because r5 already contains cd->div.width...
>
> > [26996.067311] r3 : 00000000 r2 : ffffffff r1 : 00000001 r0 : 1f78a400
>
> ...and r3 already contains cd->div.table. So we were already able
> to access parts of the struct both before and after the faulting
> address.
>
> > [26996.073843] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [26996.080985] Control: 10c5387d Table: 41ff006a DAC: 00000051
> > [26996.086733] Register r0 information: non-paged memory
> > [26996.091799] Register r1 information: non-paged memory
> > [26996.096858] Register r2 information: non-paged memory
> > [26996.101915] Register r3 information: NULL pointer
> > [26996.106627] Register r4 information: non-paged memory
> > [26996.111688] Register r5 information: non-paged memory
> > [26996.116746] Register r6 information: non-paged memory
> > [26996.121805] Register r7 information: non-paged memory
> > [26996.126863] Register r8 information: slab kmalloc-128 start c1302d00 pointer offset 0 size 128
> > [26996.135514] Register r9 information: non-paged memory
> > [26996.140574] Register r10 information: slab task_struct start c1867440 pointer offset 0
> > [26996.148517] Register r11 information: slab kmalloc-128 start c1938200 pointer offset 0 size 128
> > [26996.157244] Register r12 information: NULL pointer
> > [26996.162049] Process sugov:0 (pid: 95, stack limit = 0xf4bf205c)
> > [26996.167985] Stack: (0xf09e5dc8 to 0xf09e6000)
> > [26996.172361] 5dc0: c0d81584 c03db530 00000000 1f78a400 c1355700 c03d181c
>
> What I think is happening is that the value in r4 got corrupted from
> 0xc0d81584 (the saved value on the top of the stack) to 0x08d80084.
>
> Can you try increasing the voltage of the lower OPPs by 100 mV? And
> if that doesn't work, try setting all of the OPPs to 1.4 V. That
> should rule out any instability due to an insufficient CPU supply
> voltage, and also due to any delay in slewing the regulator output.
>
> Regards,
> Samuel
>
> > [26996.180547] 5de0: c1355600 c1355700 1f78a400 c03d34ec 00000000 c1355600 1f78a400 39387000
> > [26996.188733] 5e00: c1302d00 1f78a400 c1867440 c03d3554 00000000 c1302d00 016e3600 39387000
> > [26996.196917] 5e20: c1302d00 1f78a400 c1867440 c03d3554 c1355600 00000000 1f78a400 c1867440
> > [26996.205101] 5e40: c1302d00 1f78a400 c1867440 c03d39f0 1f78a400 00000000 ffffffff 1f78a400
> > [26996.213287] 5e60: c0d81bd0 df7bf617 c193a340 1f78a400 1f78a400 c1938300 ef7dc050 1f78a400
> > [26996.221474] 5e80: c1867440 c03d3c28 c18b3b00 c1938500 1f78a400 c1938300 ef7dc050 c06122a4
> > [26996.229659] 5ea0: c1938300 00000001 ffffffff df7bf617 c0d81bd0 c18b3b00 ef7dc050 1f78a400
> > [26996.237844] 5ec0: 00000007 c1867440 c1938500 c0db652c 00080e80 c0612674 00000000 c0db617c
> > [26996.246030] 5ee0: 1f78a400 df7bf617 c1812800 c1812800 00000000 c0dfd944 000ea600 00000000
> > [26996.254214] 5f00: 00000002 c0617054 00000001 c1867440 00000000 00000000 f09e5f5c c1812800
> > [26996.262400] 5f20: 000ea600 00080e80 00000024 df7bf617 00000004 c184ba00 c184ba14 00000000
> > [26996.270585] 5f40: 00080e80 c184ba2c 00000001 c0a34650 00000000 c0159c98 00000000 c184ba28
> > [26996.278770] 5f60: c1867440 c0dea144 c184ba2c c0136954 c193a500 c1867440 c01368e0 c184ba28
> > [26996.286955] 5f80: c13c2100 f0891c44 00000000 c0138194 c193a500 c01380c4 00000000 00000000
> > [26996.295138] 5fa0: 00000000 00000000 00000000 c0100148 00000000 00000000 00000000 00000000
> > [26996.303321] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [26996.311505] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> > [26996.319695] ccu_div_recalc_rate from clk_recalc+0x34/0x78
> > [26996.325215] clk_recalc from clk_change_rate+0xa4/0x29c
> > [26996.330461] clk_change_rate from clk_change_rate+0x10c/0x29c
> > [26996.336226] clk_change_rate from clk_change_rate+0x10c/0x29c
> > [26996.341991] clk_change_rate from clk_core_set_rate_nolock+0x16c/0x234
> > [26996.348539] clk_core_set_rate_nolock from clk_set_rate+0x30/0x154
> > [26996.354741] clk_set_rate from _set_opp+0x268/0x550
> > [26996.359644] _set_opp from dev_pm_opp_set_rate+0xe8/0x20c
> > [26996.365062] dev_pm_opp_set_rate from __cpufreq_driver_target+0x584/0x6e4
> > [26996.371876] __cpufreq_driver_target from sugov_work+0x48/0x54
> > [26996.377741] sugov_work from kthread_worker_fn+0x74/0x1a4
> > [26996.383167] kthread_worker_fn from kthread+0xd0/0xec
> > [26996.388242] kthread from ret_from_fork+0x14/0x2c
> > [26996.392967] Exception stack(0xf09e5fb0 to 0xf09e5ff8)
> > [26996.398032] 5fa0: 00000000 00000000 00000000 00000000
> > [26996.406216] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [26996.414398] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> > [26996.421027] Code: e0055231 e244102c e3e02000 eb0001f3 (e5143034)

-- 
You received this message because you are subscribed to the Google Groups "linux-sunxi" group.
To view this discussion on the web, visit https://groups.google.com/d/msgid/linux-sunxi/73b23b16-d359-43a7-8e96-4a2d891b50d8n%40googlegroups.com.