Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On Thu, Mar 09, 2017 at 03:43:40PM +0100, Tobias Klauser wrote:
> On 2017-03-09 at 14:20:51 +0100, Guenter Roeck wrote:
> > On 03/07/2017 04:46 AM, Tobias Klauser wrote:
> > [ ... ]
> > >
> > >Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1
> > >20170226 (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
> > >bootconsole [early0] enabled
> > >Early console on uart16650 initialized at 0xf8001600
> > >OF: fdt: Error -11 processing FDT
> > >Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
> > >
> > >---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in
> > >devicetree!
> > >
> > >Looks like the in-memory device tree somehow gets corrupted. Not sure
> > >yet why and how this is linked to the Kconfig options selected but at
> > >least we now have a possibility to use debug messages earlier on.
> > >
> >
> > I think I found the problem. In unflatten_and_copy_device_tree(), with added
> > debug information:
> >
> > OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca)
> >
> > ... and then initial_boot_params is copied to dt, which results in corrupted
> > fdt since the memory overlaps. Looks like the initial_boot_params memory
> > is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().
>
> Thanks for the analysis. That certainly explains the issue. The
> following patch solves the issue for me. Though I'm not entirely sure if
> it is correct and that is all that is needed. Do we need to retain the
> memory for initial_boot_params after bootmem is freed?
>

I don't know if it is correct either, but it matches what I came up with,
and it does work for me as well. Feel free to add

Tested-by: Guenter Roeck

when you submit the patch for real.

Thanks,
Guenter

> diff --git a/arch/nios2/kernel/prom.c b/arch/nios2/kernel/prom.c
> index 099f5ce1f3cb..6869fe03f3ff 100644
> --- a/arch/nios2/kernel/prom.c
> +++ b/arch/nios2/kernel/prom.c
> @@ -48,6 +48,13 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
>  	return alloc_bootmem_align(size, align);
>  }
>
> +int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
> +					     phys_addr_t size, bool nomap)
> +{
> +	reserve_bootmem(base, size, BOOTMEM_DEFAULT);
> +	return 0;
> +}
> +
>  void __init early_init_devtree(void *params)
>  {
>  	__be32 *dtb = (u32 *)__dtb_start;
> diff --git a/arch/nios2/kernel/setup.c b/arch/nios2/kernel/setup.c
> index 6e57ffa5db27..6044d9be28b4 100644
> --- a/arch/nios2/kernel/setup.c
> +++ b/arch/nios2/kernel/setup.c
> @@ -201,6 +201,9 @@ void __init setup_arch(char **cmdline_p)
>  	}
>  #endif /* CONFIG_BLK_DEV_INITRD */
>
> +	early_init_fdt_reserve_self();
> +	early_init_fdt_scan_reserved_mem();
> +
>  	unflatten_and_copy_device_tree();
>
>  	setup_cpuinfo();
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 2017-03-09 at 14:20:51 +0100, Guenter Roeck wrote:
> On 03/07/2017 04:46 AM, Tobias Klauser wrote:
> [ ... ]
> >
> >Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1 20170226
> >(experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
> >bootconsole [early0] enabled
> >Early console on uart16650 initialized at 0xf8001600
> >OF: fdt: Error -11 processing FDT
> >Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
> >
> >---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in
> >devicetree!
> >
> >Looks like the in-memory device tree somehow gets corrupted. Not sure
> >yet why and how this is linked to the Kconfig options selected but at
> >least we now have a possibility to use debug messages earlier on.
> >
>
> I think I found the problem. In unflatten_and_copy_device_tree(), with added
> debug information:
>
> OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca)
>
> ... and then initial_boot_params is copied to dt, which results in corrupted
> fdt since the memory overlaps. Looks like the initial_boot_params memory
> is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().

Thanks for the analysis. That certainly explains the issue. The
following patch solves the issue for me. Though I'm not entirely sure if
it is correct and that is all that is needed. Do we need to retain the
memory for initial_boot_params after bootmem is freed?

diff --git a/arch/nios2/kernel/prom.c b/arch/nios2/kernel/prom.c
index 099f5ce1f3cb..6869fe03f3ff 100644
--- a/arch/nios2/kernel/prom.c
+++ b/arch/nios2/kernel/prom.c
@@ -48,6 +48,13 @@ void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
 	return alloc_bootmem_align(size, align);
 }

+int __init early_init_dt_reserve_memory_arch(phys_addr_t base,
+					     phys_addr_t size, bool nomap)
+{
+	reserve_bootmem(base, size, BOOTMEM_DEFAULT);
+	return 0;
+}
+
 void __init early_init_devtree(void *params)
 {
 	__be32 *dtb = (u32 *)__dtb_start;
diff --git a/arch/nios2/kernel/setup.c b/arch/nios2/kernel/setup.c
index 6e57ffa5db27..6044d9be28b4 100644
--- a/arch/nios2/kernel/setup.c
+++ b/arch/nios2/kernel/setup.c
@@ -201,6 +201,9 @@ void __init setup_arch(char **cmdline_p)
 	}
 #endif /* CONFIG_BLK_DEV_INITRD */

+	early_init_fdt_reserve_self();
+	early_init_fdt_scan_reserved_mem();
+
 	unflatten_and_copy_device_tree();

 	setup_cpuinfo();
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 03/07/2017 04:46 AM, Tobias Klauser wrote:
[ ... ]
>
> Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1 20170226
> (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017
> bootconsole [early0] enabled
> Early console on uart16650 initialized at 0xf8001600
> OF: fdt: Error -11 processing FDT
> Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree!
>
> ---[ end Kernel panic - not syncing: setup_cpuinfo: No CPU found in
> devicetree!
>
> Looks like the in-memory device tree somehow gets corrupted. Not sure
> yet why and how this is linked to the Kconfig options selected but at
> least we now have a possibility to use debug messages earlier on.
>

I think I found the problem. In unflatten_and_copy_device_tree(), with added
debug information:

OF: fdt: initial_boot_params=c861e400, dt=c861f000 size=28874 (0x70ca)

... and then initial_boot_params is copied to dt, which results in corrupted
fdt since the memory overlaps. Looks like the initial_boot_params memory
is not reserved and (re-)allocated by early_init_dt_alloc_memory_arch().

Guenter
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 03/07/2017 04:46 AM, Tobias Klauser wrote: On 2017-03-03 at 04:04:41 +0100, Guenter Roeck wrote: On 03/02/2017 08:38 AM, Tobias Klauser wrote: On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote: On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote: Hi Guenter, Tobias and Sandra, thanks for your effort here. On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: On 02/28/2017 08:53 AM, Tobias Klauser wrote: (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils for nios2) On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: Hi Sven, my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: update LZ4 compressor module"). The test hangs early during boot before any console output is seen. Reverting the offending patch as well as the subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 and with it other LZ4 options also fixes it (as does adding "return -EINVAL;" at the top of the LZ4 decompression code). For reference, bisect log is attached. I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 and binutils 2.26.1. Scripts used to run the tests are available at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can get a kernel booting on latest master branch. AFAICT, none of the LZ4_decompress_* functions are called during boot. It seems a bit strange that code which is not actually called causes problems like that. Yes, it is, though it is always possible. The code isn't exactly easy to understand; there may be some hidden caveats such as global variables. 
It may also be that some jump target exceeds its range (though why that would only be seen with the LZ4 code is another question), or that the compiler gets confused by the forced inlines (disabling that didn't make a difference, though, nor did disabling -O3). Please let me know if and how I may help you figure out what's happening, especially regarding the differences between the previous LZ4 and the current implementation. For my part I am all but clueless. Unless someone has an idea, we may to disable LZ4 support for nios2 for the time being. Does anyone have thoughts on that ? Of course, that would not help if the problem also affects recent gcc/binutil versions on other architectures. After some further investigations, I'd say this isn't "caused" by LZ4 specifically but by a more general problem with one of the nios2 arch specific tools involved. I manually enabled random additional CONFIG_* options and in some cases I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return -EINVAL in place) while in others I didn't. So I'd rather suspect this problem to be connected to the size or structure of the generated vmlinux image. Or could this even be a problem with qemu? Did anyone already verify this on the 10m50 devboard? (Unfortunately I don't have any nios2 devboard available right now, otherwise I would have done this...) That is of course always possible. Other than that I'm also becoming all but clueless... One option I thought of was using the QEMU monitor to dump the CPU state after the hang but so far I didn't manage to get it to work (hints appreciated ;) Something like qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \ -dtb arch/nios2/boot/dts/10m50_devboard.dtb \ --append "rdinit=/sbin/init" -initrd busybox-nios2.cpio gives you a qemu monitor window. Use "info registers" to see registers. Looks like it is stuck in init_bootmem_core, or at least that is what it shows for me. Thanks a lot for the hint, this worked perfectly. 
I'm not all that familiar with qemu :-/ Using the qemu gdbserver I can indeed confirm that it seems to be stuck in init_bootmem_core: (gdb) file vmlinux Reading symbols from vmlinux...done. (gdb) target remote localhost:1234 Remote debugging using localhost:1234 link_bootmem (bdata=) at mm/bootmem.c:80 80 if (bdata->node_min_pfn < ent->node_min_pfn) { This looks like a very weird place for it to get stuck... So I followed a different path and implemented early printk support for the 8250/16650 serial console on nios2, so I could get debug outputs earlier on (patch below, I'll also officially submit this later one). That is great; I'll add that to my own tests to get some output. Now I get the following output on boot: Linux version 4.11.0-rc1-dirty (tobiask@ziws08) (gcc version 7.0.1 20170226 (experimental) (GCC) ) #46 Tue Mar 7 13:40:53 CET 2017 bootconsole [early0] enabled Early console on uart16650 initialized at 0xf8001600 OF: fdt: Error -11 processing FDT Kernel panic - not syncing: setup_cpuinfo: No CPU found in devicetree! ---[ en
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 2017-03-03 at 04:04:41 +0100, Guenter Roeck wrote: > On 03/02/2017 08:38 AM, Tobias Klauser wrote: > >On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote: > >>On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote: > >>>Hi Guenter, Tobias and Sandra, > >>> > >>>thanks for your effort here. > >>> > >>>On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: > >On 02/28/2017 08:53 AM, Tobias Klauser wrote: > >>(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils > >>for nios2) > >> > >>On 2017-02-26 at 22:03:38 +0100, Guenter Roeck > >>wrote: > >>>Hi Sven, > >>> > >>>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: > >>>update LZ4 compressor module"). The test hangs early during boot before > >>>any console output is seen. Reverting the offending patch as well as > >>>the > >>>subsequent lz4 related patches fixes the problem. Disabling > >>>CONFIG_RD_LZ4 > >>>and with it other LZ4 options also fixes it (as does adding "return > >>>-EINVAL;" > >>>at the top of the LZ4 decompression code). For reference, bisect log > >>>is attached. > >>> > >>>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > >>>and binutils 2.26.1. Scripts used to run the tests are available at > >>>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > >>>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > >> > >>Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and > >>binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can > >>get a kernel booting on latest master branch. AFAICT, none of the > >>LZ4_decompress_* functions are called during boot. > >> > >>> > >>>It seems a bit strange that code which is not actually called causes > >>>problems like that. > >>> > >>Yes, it is, though it is always possible. 
The code isn't exactly easy to > >>understand; there may be some hidden caveats such as global variables. It > >>may > >>also be that some jump target exceeds its range (though why that would only > >>be seen with the LZ4 code is another question), or that the compiler gets > >>confused by the forced inlines (disabling that didn't make a difference, > >>though, nor did disabling -O3). > >> > >>>Please let me know if and how I may help you figure out what's happening, > >>>especially > >>>regarding the differences between the previous LZ4 and the current > >>>implementation. > >>> > >> > >>For my part I am all but clueless. Unless someone has an idea, we may to > >>disable LZ4 support for nios2 for the time being. Does anyone have thoughts > >>on that ? Of course, that would not help if the problem also affects > >>recent gcc/binutil versions on other architectures. > > > >After some further investigations, I'd say this isn't "caused" by LZ4 > >specifically but by a more general problem with one of the nios2 arch > >specific tools involved. > > > >I manually enabled random additional CONFIG_* options and in some cases > >I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return > >-EINVAL in place) while in others I didn't. So I'd rather suspect this > >problem to be connected to the size or structure of the generated vmlinux > >image. > > > >Or could this even be a problem with qemu? Did anyone already verify > >this on the 10m50 devboard? (Unfortunately I don't have any nios2 > >devboard available right now, otherwise I would have done this...) > > > > That is of course always possible. > > >Other than that I'm also becoming all but clueless... 
One option I > >thought of was using the QEMU monitor to dump the CPU state after the > >hang but so far I didn't manage to get it to work (hints appreciated ;) > > > > Something like > > qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \ > -dtb arch/nios2/boot/dts/10m50_devboard.dtb \ > --append "rdinit=/sbin/init" -initrd busybox-nios2.cpio > > gives you a qemu monitor window. Use "info registers" to see registers. > Looks like it is stuck in init_bootmem_core, or at least that is what it > shows for me. Thanks a lot for the hint, this worked perfectly. I'm not all that familiar with qemu :-/ Using the qemu gdbserver I can indeed confirm that it seems to be stuck in init_bootmem_core: (gdb) file vmlinux Reading symbols from vmlinux...done. (gdb) target remote localhost:1234 Remote debugging using localhost:1234 link_bootmem (bdata=) at mm/bootmem.c:80 80 if (bdata->node_min_pfn < ent->node_min_pfn) { This looks like a very weird place for it to get stuck... So I followed a different path and implemented early printk support for the 8250/16650 serial console on nios2, so I could get debug outputs earlier on (patch below, I'll also officially submit this later one). Now I get the following output on
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 03/02/2017 08:38 AM, Tobias Klauser wrote:
> On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote:
> > On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
> > > Hi Guenter, Tobias and Sandra,
> > >
> > > thanks for your effort here.
> > >
> > > On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
> > > > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
> > > > > On 02/28/2017 08:53 AM, Tobias Klauser wrote:
> > > > > > (adding Sandra Loosemore to Cc due to possible relation to
> > > > > > gcc/binutils for nios2)
> > > > > >
> > > > > > On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> > > > > > > Hi Sven,
> > > > > > >
> > > > > > > my qemu test for nios2 started failing with commit 4e1a33b105dd
> > > > > > > ("lib: update LZ4 compressor module"). The test hangs early
> > > > > > > during boot before any console output is seen. Reverting the
> > > > > > > offending patch as well as the subsequent lz4 related patches
> > > > > > > fixes the problem. Disabling CONFIG_RD_LZ4 and with it other
> > > > > > > LZ4 options also fixes it (as does adding "return -EINVAL;"
> > > > > > > at the top of the LZ4 decompression code). For reference,
> > > > > > > bisect log is attached.
> > > > > > >
> > > > > > > I tried with buildroot toolchains using gcc 6.1.0 as well as
> > > > > > > 6.3.0 and binutils 2.26.1. Scripts used to run the tests are
> > > > > > > available at
> > > > > > > https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> > > > > > > Qemu is from qemu mainline or qemu v2.8 with nios2 patches
> > > > > > > applied.
> > > > > >
> > > > > > Looks like this is somehow related to gcc/binutils. Using GCC
> > > > > > 4.8.3 and binutils 2.24.51 (both from from Sourcery CodeBench
> > > > > > Lite 2014.05) I can get a kernel booting on latest master branch.
> > > > > > AFAICT, none of the LZ4_decompress_* functions are called during
> > > > > > boot.
> > >
> > > It seems a bit strange that code which is not actually called causes
> > > problems like that.
> > >
> > Yes, it is, though it is always possible. The code isn't exactly easy to
> > understand; there may be some hidden caveats such as global variables. It
> > may also be that some jump target exceeds its range (though why that would
> > only be seen with the LZ4 code is another question), or that the compiler
> > gets confused by the forced inlines (disabling that didn't make a
> > difference, though, nor did disabling -O3).
> >
> > > Please let me know if and how I may help you figure out what's
> > > happening, especially regarding the differences between the previous
> > > LZ4 and the current implementation.
> > >
> >
> > For my part I am all but clueless. Unless someone has an idea, we may to
> > disable LZ4 support for nios2 for the time being. Does anyone have thoughts
> > on that ? Of course, that would not help if the problem also affects
> > recent gcc/binutil versions on other architectures.
>
> After some further investigations, I'd say this isn't "caused" by LZ4
> specifically but by a more general problem with one of the nios2 arch
> specific tools involved.
>
> I manually enabled random additional CONFIG_* options and in some cases
> I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
> -EINVAL in place) while in others I didn't. So I'd rather suspect this
> problem to be connected to the size or structure of the generated vmlinux
> image.
>
> Or could this even be a problem with qemu? Did anyone already verify
> this on the 10m50 devboard? (Unfortunately I don't have any nios2
> devboard available right now, otherwise I would have done this...)
>

That is of course always possible.

> Other than that I'm also becoming all but clueless... One option I
> thought of was using the QEMU monitor to dump the CPU state after the
> hang but so far I didn't manage to get it to work (hints appreciated ;)
>

Something like

qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
	-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
	--append "rdinit=/sbin/init" -initrd busybox-nios2.cpio

gives you a qemu monitor window. Use "info registers" to see registers.
Looks like it is stuck in init_bootmem_core, or at least that is what it
shows for me.

Guenter
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 2017-03-01 at 20:45:21 +0100, Guenter Roeck wrote:
> On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
> > Hi Guenter, Tobias and Sandra,
> >
> > thanks for your effort here.
> >
> > On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
> > > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
> > > > On 02/28/2017 08:53 AM, Tobias Klauser wrote:
> > > > >(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
> > > > >for nios2)
> > > > >
> > > > >On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> > > > >>Hi Sven,
> > > > >>
> > > > >>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> > > > >>update LZ4 compressor module"). The test hangs early during boot
> > > > >>before any console output is seen. Reverting the offending patch as
> > > > >>well as the subsequent lz4 related patches fixes the problem.
> > > > >>Disabling CONFIG_RD_LZ4 and with it other LZ4 options also fixes it
> > > > >>(as does adding "return -EINVAL;" at the top of the LZ4 decompression
> > > > >>code). For reference, bisect log is attached.
> > > > >>
> > > > >>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> > > > >>and binutils 2.26.1. Scripts used to run the tests are available at
> > > > >>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> > > > >>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
> > > > >
> > > > >Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
> > > > >binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can
> > > > >get a kernel booting on latest master branch. AFAICT, none of the
> > > > >LZ4_decompress_* functions are called during boot.
> > > > >
> >
> > It seems a bit strange that code which is not actually called causes
> > problems like that.
> >
> Yes, it is, though it is always possible. The code isn't exactly easy to
> understand; there may be some hidden caveats such as global variables. It may
> also be that some jump target exceeds its range (though why that would only
> be seen with the LZ4 code is another question), or that the compiler gets
> confused by the forced inlines (disabling that didn't make a difference,
> though, nor did disabling -O3).
>
> > Please let me know if and how I may help you figure out what's happening,
> > especially regarding the differences between the previous LZ4 and the
> > current implementation.
> >
>
> For my part I am all but clueless. Unless someone has an idea, we may to
> disable LZ4 support for nios2 for the time being. Does anyone have thoughts
> on that ? Of course, that would not help if the problem also affects
> recent gcc/binutil versions on other architectures.

After some further investigations, I'd say this isn't "caused" by LZ4
specifically but by a more general problem with one of the nios2 arch
specific tools involved.

I manually enabled random additional CONFIG_* options and in some cases
I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
-EINVAL in place) while in others I didn't. So I'd rather suspect this
problem to be connected to the size or structure of the generated vmlinux
image.

Or could this even be a problem with qemu? Did anyone already verify
this on the 10m50 devboard? (Unfortunately I don't have any nios2
devboard available right now, otherwise I would have done this...)

Other than that I'm also becoming all but clueless... One option I
thought of was using the QEMU monitor to dump the CPU state after the
hang but so far I didn't manage to get it to work (hints appreciated ;)

Thanks
Tobias
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 2017-03-01 at 23:50:03 +0100, Sandra Loosemore wrote: > On 03/01/2017 11:58 AM, Sven Schmidt wrote: > >Hi Guenter, Tobias and Sandra, > > > >thanks for your effort here. > > > >On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: > >>On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: > >>>On 02/28/2017 08:53 AM, Tobias Klauser wrote: > (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils > for nios2) > > On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: > >Hi Sven, > > > >my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: > >update LZ4 compressor module"). The test hangs early during boot before > >any console output is seen. Reverting the offending patch as well as the > >subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 > >and with it other LZ4 options also fixes it (as does adding "return > >-EINVAL;" > >at the top of the LZ4 decompression code). For reference, bisect log > >is attached. > > > >I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > >and binutils 2.26.1. Scripts used to run the tests are available at > >https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > >Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > > Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and > binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can > get a kernel booting on latest master branch. AFAICT, none of the > LZ4_decompress_* functions are called during boot. > > > > >It seems a bit strange that code which is not actually called causes > >problems like that. > > > >Please let me know if and how I may help you figure out what's happening, > >especially > >regarding the differences between the previous LZ4 and the current > >implementation. 
> > > However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can > reproduce the problem you see using the instructions Guenter provided in > the reply to Sven. > > I'll try to dig a bit deeper from here on. Any suggestions on what to > look out for wrt the differences between the gcc/binutils version are > welcome of course. > >>> > >>>This message doesn't give me enough context to know what is going on, > >>>especially without seeing the rest of the thread. Generally speaking, > >>>Mentor recommends you use one of our stable releases instead of trying to > >>>roll your own from mainline sources. As an upstream binutils and gcc > >>>maintainer I do try my best to look at bug reports for those components, > >>>but > >>>I need a reproducible standalone testcase and specific versions of the > >>>different components involved. > >>> > >>The problem is also seen with Sourcery CodeBench Lite 2016.11-32 (gcc 6.2.0, > >>binutils 2.26.51). I can provide additional details if needed, but we don't > >>have a well enough understanding of the problem to be able to provide a > >>reduced size test case. The test used to reproduce the problem is available > >>at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2, > >>run on the ToT linux kernel. > > Just a suggestion: can you try binutils trunk, too? Alan Modra and > I just tracked down and fixed a bug with the linker creating bad > executables that the kernel's ELF loader couldn't properly map into > memory. IIUC it only affected programs that use dynamic libraries, > but maybe there was more to it than that. In any case it would be > good to know if the problem has already been fixed before > investigating further. Thanks for the suggestion. Just tried it with a kernel compiled with binutils trunk as of today (2.28.51.20170302) and latest gcc snapshot (7.0.1 20170226). Unfortunately, the issue still persists. Tobias
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 03/01/2017 11:58 AM, Sven Schmidt wrote:
> Hi Guenter, Tobias and Sandra,
>
> thanks for your effort here.
>
> On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
> > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
> > > On 02/28/2017 08:53 AM, Tobias Klauser wrote:
> > > > (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
> > > > for nios2)
> > > >
> > > > On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> > > > > Hi Sven,
> > > > >
> > > > > my qemu test for nios2 started failing with commit 4e1a33b105dd
> > > > > ("lib: update LZ4 compressor module"). The test hangs early during
> > > > > boot before any console output is seen. Reverting the offending
> > > > > patch as well as the subsequent lz4 related patches fixes the
> > > > > problem. Disabling CONFIG_RD_LZ4 and with it other LZ4 options also
> > > > > fixes it (as does adding "return -EINVAL;" at the top of the LZ4
> > > > > decompression code). For reference, bisect log is attached.
> > > > >
> > > > > I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> > > > > and binutils 2.26.1. Scripts used to run the tests are available at
> > > > > https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> > > > > Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
> > > >
> > > > Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
> > > > binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can
> > > > get a kernel booting on latest master branch. AFAICT, none of the
> > > > LZ4_decompress_* functions are called during boot.
> > > >
>
> It seems a bit strange that code which is not actually called causes
> problems like that.
>
> Please let me know if and how I may help you figure out what's happening,
> especially regarding the differences between the previous LZ4 and the
> current implementation.
>
> > > > However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can
> > > > reproduce the problem you see using the instructions Guenter provided
> > > > in the reply to Sven.
> > > >
> > > > I'll try to dig a bit deeper from here on. Any suggestions on what to
> > > > look out for wrt the differences between the gcc/binutils version are
> > > > welcome of course.
> > >
> > > This message doesn't give me enough context to know what is going on,
> > > especially without seeing the rest of the thread. Generally speaking,
> > > Mentor recommends you use one of our stable releases instead of trying
> > > to roll your own from mainline sources. As an upstream binutils and gcc
> > > maintainer I do try my best to look at bug reports for those components,
> > > but I need a reproducible standalone testcase and specific versions of
> > > the different components involved.
> > >
> > The problem is also seen with Sourcery CodeBench Lite 2016.11-32 (gcc 6.2.0,
> > binutils 2.26.51). I can provide additional details if needed, but we don't
> > have a well enough understanding of the problem to be able to provide a
> > reduced size test case. The test used to reproduce the problem is available
> > at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2,
> > run on the ToT linux kernel.

Just a suggestion: can you try binutils trunk, too? Alan Modra and
I just tracked down and fixed a bug with the linker creating bad
executables that the kernel's ELF loader couldn't properly map into
memory. IIUC it only affected programs that use dynamic libraries,
but maybe there was more to it than that. In any case it would be
good to know if the problem has already been fixed before
investigating further.

-Sandra
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote: > Hi Guenter, Tobias and Sandra, > > thanks for your effort here. > > On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: > > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: > > > On 02/28/2017 08:53 AM, Tobias Klauser wrote: > > > >(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils > > > >for nios2) > > > > > > > >On 2017-02-26 at 22:03:38 +0100, Guenter Roeck > > > >wrote: > > > >>Hi Sven, > > > >> > > > >>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: > > > >>update LZ4 compressor module"). The test hangs early during boot before > > > >>any console output is seen. Reverting the offending patch as well as the > > > >>subsequent lz4 related patches fixes the problem. Disabling > > > >>CONFIG_RD_LZ4 > > > >>and with it other LZ4 options also fixes it (as does adding "return > > > >>-EINVAL;" > > > >>at the top of the LZ4 decompression code). For reference, bisect log > > > >>is attached. > > > >> > > > >>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > > > >>and binutils 2.26.1. Scripts used to run the tests are available at > > > >>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > > > >>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > > > > > > > >Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and > > > >binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can > > > >get a kernel booting on latest master branch. AFAICT, none of the > > > >LZ4_decompress_* functions are called during boot. > > > > > > It seems a bit strange that code which is not actually called causes problems > like that. > Yes, it is, though it is always possible. The code isn't exactly easy to understand; there may be some hidden caveats such as global variables. 
It may also be that some jump target exceeds its range (though why that would
only be seen with the LZ4 code is another question), or that the compiler
gets confused by the forced inlines (disabling that didn't make a difference,
though, nor did disabling -O3).

> Please let me know if and how I may help you figure out what's happening,
> especially regarding the differences between the previous LZ4 and the
> current implementation.

For my part I am all but clueless. Unless someone has an idea, we may have to
disable LZ4 support for nios2 for the time being. Does anyone have thoughts
on that? Of course, that would not help if the problem also affects recent
gcc/binutils versions on other architectures.

Thanks,
Guenter

> > > >However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can
> > > >reproduce the problem you see using the instructions Guenter provided in
> > > >the reply to Sven.
> > > >
> > > >I'll try to dig a bit deeper from here on. Any suggestions on what to
> > > >look out for wrt the differences between the gcc/binutils version are
> > > >welcome of course.
> > >
> > > This message doesn't give me enough context to know what is going on,
> > > especially without seeing the rest of the thread. Generally speaking,
> > > Mentor recommends you use one of our stable releases instead of trying to
> > > roll your own from mainline sources. As an upstream binutils and gcc
> > > maintainer I do try my best to look at bug reports for those components,
> > > but I need a reproducible standalone testcase and specific versions of the
> > > different components involved.
> >
> > The problem is also seen with Sourcery CodeBench Lite 2016.11-32 (gcc 6.2.0,
> > binutils 2.26.51). I can provide additional details if needed, but we don't
> > have a well enough understanding of the problem to be able to provide a
> > reduced size test case. The test used to reproduce the problem is available
> > at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2,
> > run on the ToT linux kernel.
> >
> > Guenter
>
> Regards,
>
> Sven
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
Hi Guenter, Tobias and Sandra, thanks for your effort here. On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote: > On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: > > On 02/28/2017 08:53 AM, Tobias Klauser wrote: > > >(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils > > >for nios2) > > > > > >On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: > > >>Hi Sven, > > >> > > >>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: > > >>update LZ4 compressor module"). The test hangs early during boot before > > >>any console output is seen. Reverting the offending patch as well as the > > >>subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 > > >>and with it other LZ4 options also fixes it (as does adding "return > > >>-EINVAL;" > > >>at the top of the LZ4 decompression code). For reference, bisect log > > >>is attached. > > >> > > >>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > > >>and binutils 2.26.1. Scripts used to run the tests are available at > > >>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > > >>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > > > > > >Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and > > >binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can > > >get a kernel booting on latest master branch. AFAICT, none of the > > >LZ4_decompress_* functions are called during boot. > > > It seems a bit strange that code which is not actually called causes problems like that. Please let me know if and how I may help you figure out what's happening, especially regarding the differences between the previous LZ4 and the current implementation. > > >However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can > > >reproduce the problem you see using the instructions Guenter provided in > > >the reply to Sven. 
> > > > > >I'll try to dig a bit deeper from here on. Any suggestions on what to > > >look out for wrt the differences between the gcc/binutils version are > > >welcome of course. > > > > This message doesn't give me enough context to know what is going on, > > especially without seeing the rest of the thread. Generally speaking, > > Mentor recommends you use one of our stable releases instead of trying to > > roll your own from mainline sources. As an upstream binutils and gcc > > maintainer I do try my best to look at bug reports for those components, but > > I need a reproducible standalone testcase and specific versions of the > > different components involved. > > > The problem is also seen with Sourcery CodeBench Lite 2016.11-32 (gcc 6.2.0, > binutils 2.26.51). I can provide additional details if needed, but we don't > have a well enough understanding of the problem to be able to provide a > reduced size test case. The test used to reproduce the problem is available > at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2, > run on the ToT linux kernel. > > Guenter Regards, Sven
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On 02/28/2017 08:53 AM, Tobias Klauser wrote: (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils for nios2) On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: Hi Sven, my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: update LZ4 compressor module"). The test hangs early during boot before any console output is seen. Reverting the offending patch as well as the subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 and with it other LZ4 options also fixes it (as does adding "return -EINVAL;" at the top of the LZ4 decompression code). For reference, bisect log is attached. I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 and binutils 2.26.1. Scripts used to run the tests are available at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can get a kernel booting on latest master branch. AFAICT, none of the LZ4_decompress_* functions are called during boot. However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can reproduce the problem you see using the instructions Guenter provided in the reply to Sven. I'll try to dig a bit deeper from here on. Any suggestions on what to look out for wrt the differences between the gcc/binutils version are welcome of course. This message doesn't give me enough context to know what is going on, especially without seeing the rest of the thread. Generally speaking, Mentor recommends you use one of our stable releases instead of trying to roll your own from mainline sources. As an upstream binutils and gcc maintainer I do try my best to look at bug reports for those components, but I need a reproducible standalone testcase and specific versions of the different components involved. -Sandra
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote: > On 02/28/2017 08:53 AM, Tobias Klauser wrote: > >(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils > >for nios2) > > > >On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote: > >>Hi Sven, > >> > >>my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib: > >>update LZ4 compressor module"). The test hangs early during boot before > >>any console output is seen. Reverting the offending patch as well as the > >>subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4 > >>and with it other LZ4 options also fixes it (as does adding "return > >>-EINVAL;" > >>at the top of the LZ4 decompression code). For reference, bisect log > >>is attached. > >> > >>I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > >>and binutils 2.26.1. Scripts used to run the tests are available at > >>https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > >>Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > > > >Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and > >binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can > >get a kernel booting on latest master branch. AFAICT, none of the > >LZ4_decompress_* functions are called during boot. > > > >However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can > >reproduce the problem you see using the instructions Guenter provided in > >the reply to Sven. > > > >I'll try to dig a bit deeper from here on. Any suggestions on what to > >look out for wrt the differences between the gcc/binutils version are > >welcome of course. > > This message doesn't give me enough context to know what is going on, > especially without seeing the rest of the thread. Generally speaking, > Mentor recommends you use one of our stable releases instead of trying to > roll your own from mainline sources. 
As an upstream binutils and gcc > maintainer I do try my best to look at bug reports for those components, but > I need a reproducible standalone testcase and specific versions of the > different components involved. > The problem is also seen with Sourcery CodeBench Lite 2016.11-32 (gcc 6.2.0, binutils 2.26.51). I can provide additional details if needed, but we don't have a well enough understanding of the problem to be able to provide a reduced size test case. The test used to reproduce the problem is available at https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2, run on the ToT linux kernel. Guenter
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On Tue, Feb 28, 2017 at 04:53:31PM +0100, Tobias Klauser wrote:
> (adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
> for nios2)
>
> On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> > Hi Sven,
> >
> > my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> > update LZ4 compressor module"). The test hangs early during boot before
> > any console output is seen. Reverting the offending patch as well as the
> > subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> > and with it other LZ4 options also fixes it (as does adding "return
> > -EINVAL;" at the top of the LZ4 decompression code). For reference,
> > bisect log is attached.
> >
> > I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> > and binutils 2.26.1. Scripts used to run the tests are available at
> > https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> > Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
>
> Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
> binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can
> get a kernel booting on latest master branch. AFAICT, none of the
> LZ4_decompress_* functions are called during boot.
>
> However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can
> reproduce the problem you see using the instructions Guenter provided in
> the reply to Sven.
>
> I'll try to dig a bit deeper from here on. Any suggestions on what to
> look out for wrt the differences between the gcc/binutils version are
> welcome of course.

I tried the following combinations of gcc and binutils:

    gcc 6.1.0; binutils 2.26.1
    gcc 6.3.0; binutils 2.25.1
    gcc 6.3.0; binutils 2.26.1
    gcc 6.3.0; binutils 2.27

I used buildroot 2017.02-rc3 to build the toolchains. All have the problem.

Another data point, confirming what you say above: I am relatively (99%)
sure that the code in question is not actually called. I added reference
counters to make sure that this is the case. Just the existence of the LZ4
decompression code appears to be sufficient to cause the boot failure.

Weird. I just hope this only affects nios2.

Guenter
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
for nios2)

On 2017-02-26 at 22:03:38 +0100, Guenter Roeck wrote:
> Hi Sven,
>
> my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> update LZ4 compressor module"). The test hangs early during boot before
> any console output is seen. Reverting the offending patch as well as the
> subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
> at the top of the LZ4 decompression code). For reference, bisect log
> is attached.
>
> I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> and binutils 2.26.1. Scripts used to run the tests are available at
> https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.

Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
binutils 2.24.51 (both from Sourcery CodeBench Lite 2014.05) I can get a
kernel booting on latest master branch. AFAICT, none of the
LZ4_decompress_* functions are called during boot.

However, using a self-built GCC 7.0 (20161127) and binutils 2.27 I can
reproduce the problem you see using the instructions Guenter provided in
the reply to Sven.

I'll try to dig a bit deeper from here on. Any suggestions on what to look
out for wrt the differences between the gcc/binutils version are welcome of
course.

Thanks
Tobias
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
Hi Guenter,

thanks for your testing!

I must admit, I'm fairly new to kernel development and a little overwhelmed
by all the tools used. So I do not really know how to reproduce your test
using your script. I installed qemu from the master branch and buildroot.
Unfortunately, that's the point where I'm stuck. I would be grateful if you
could give me some pointers on how to continue. Would I make the kernel using
ARCH=nios2 and a defconfig and pass it to qemu? What arguments do I provide
to that script (especially the machine param)?

On Sun, Feb 26, 2017 at 01:03:38PM -0800, Guenter Roeck wrote:
> Hi Sven,
>
> my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> update LZ4 compressor module"). The test hangs early during boot before
> any console output is seen. Reverting the offending patch as well as the
> subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
> at the top of the LZ4 decompression code). For reference, bisect log
> is attached.

So, seems like it's the decompressor? Which decompression code do you mean
exactly? LZ4_decompress_fast/_safe/_generic? Since the decompression
functions worked fine in all previous tests, and this is a problem during
boot, my first guess would be lib/decompress_unlz4.c, which provides the
functions for decompressing an lz4-compressed kernel image. But then it
should only result in problems when the kernel image is compressed, wouldn't
it?

> I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
> and binutils 2.26.1. Scripts used to run the tests are available at
> https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
> Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.
>
> I tried to track down the problem, with no success. Just the presence
> of the LZ4 code seems to be sufficient to cause the problem; I have
> no idea why that would be the case.

Maybe there's someone who has an idea and/or is experiencing similar issues.
Hopefully, we can track this down.

> Please let me know if there is anything I can do to help tracking down
> the problem.
>
> Thanks,
> Guenter
>
> ---
> # bad: [c4f3f22eddc982d247ffe2a6690c3e4a5c46dd09] Merge tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
> # good: [9e314890292c0dd357eadef6a043704fa0b4c157] Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
> git bisect start 'HEAD' '9e31489'
> # bad: [7067739df23ffd641ca99c967830e0ed2ba39eab] Merge branch 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
> git bisect bad 7067739df23ffd641ca99c967830e0ed2ba39eab
> # good: [c5adae9583ef6985875532904160c6bf9f07b453] lib: add CONFIG_TEST_SORT to enable self-test of sort()
> git bisect good c5adae9583ef6985875532904160c6bf9f07b453
> # bad: [edccb59429657b09806146339e2b27594c1d1da0] Merge tag 'fbdev-v4.11' of git://github.com/bzolnier/linux
> git bisect bad edccb59429657b09806146339e2b27594c1d1da0
> # good: [72db33355c1431fefcabb06c9c25705e8226eed5] fbdev: ssd1307fb: Start to use gpiod API for reset gpio
> git bisect good 72db33355c1431fefcabb06c9c25705e8226eed5
> # bad: [95330473636e5e4546f94874c957c3be66bb2140] checkpatch: remove false unbalanced braces warning
> git bisect bad 95330473636e5e4546f94874c957c3be66bb2140
> # bad: [69c78423b8f439b077929410bdf8f88e7031b891] lib/lz4: remove back-compat wrappers
> git bisect bad 69c78423b8f439b077929410bdf8f88e7031b891
> # bad: [e23d54e48346e775be53b3cc25a95d65da960393] lib/decompress_unlz4: change module to work with new LZ4 module version
> git bisect bad e23d54e48346e775be53b3cc25a95d65da960393
> # bad: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module
> git bisect bad 4e1a33b105ddf201f66dcc44490c6086a25eca0b
> # good: [8893f519330bb073a49c5b4676fce4be6f1be15d] lib/test_sort.c: make it explicitly non-modular
> git bisect good 8893f519330bb073a49c5b4676fce4be6f1be15d
> # first bad commit: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module

Thank you,
Sven
Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
On Mon, Feb 27, 2017 at 08:34:55PM +0100, Sven Schmidt wrote:
> Hi Guenter,
>
> thanks for your testing!
>
> I must admit, I'm fairly new to kernel development and a little overwhelmed
> by all that tools used.
> So I do not really know how to reproduce your test using your script. I
> installed qemu from the master branch and buildroot.
> Unfortunately, that's the point I'm stuck. I would be grateful if you provide
> me some lead how to continue.
> Would I make the kernel using ARCH=nios2 and a defconf and pass it to qemu?
> What arguments do I provide to that script
> (especially, the machine param)?

run-qemu-nios2.sh doesn't need any parameters, though you would have to
update PATH_NIOS2 to match your toolchain and QEMU to match the qemu binary
location. Otherwise just run the script from your linux repository.

You can also build a nios2 image using 10m50_defconfig and run qemu directly.
Just remember to enable CONFIG_NIOS2_PASS_CMDLINE=y and
CONFIG_BLK_DEV_INITRD=y. CONFIG_BLK_DEV_INITRD=y enables CONFIG_RD_LZ4 which
triggers the problem.

    path-to-qemu/qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
        -dtb arch/nios2/boot/dts/10m50_devboard.dtb \
        --append "rdinit=/sbin/init" \
        -initrd busybox-nios2.cpio \
        -nographic -monitor none

should do it (assuming you copied the root file system from
https://github.com/groeck/linux-build-test/blob/master/rootfs/nios2/busybox-nios2.cpio).

> On Sun, Feb 26, 2017 at 01:03:38PM -0800, Guenter Roeck wrote:
> > Hi Sven,
> >
> > my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
> > update LZ4 compressor module"). The test hangs early during boot before
> > any console output is seen. Reverting the offending patch as well as the
> > subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
> > and with it other LZ4 options also fixes it (as does adding "return
> > -EINVAL;"
> > at the top of the LZ4 decompression code). For reference, bisect log
> > is attached.
> > > > So, seems like it's the decompressor? Which decompression code do you mean > exactly? LZ4_decompress_fast/_safe/_generic? > Since the decompression functions worked fine in all previous tests, and this > is a problem during boot, my first guess would > be the lib/decompress_unlz4.c, providing the functions for decompressing a > lz4-compressed kernel image. > But then it should only result in problems when the kernel image is > compressed, wouldn't it? > I am booting the uncompressed kernel. Also, if I disable CONFIG_RD_LZ4 in the configuration, everything works just fine. Just the _presence_ of the decompression code seems to trigger the problem. No idea if enabling CONFIG_RD_LZ4 results in some LZ4 compressed code to be generated (I do see usr/initramfs_data.cpio.lz4). I added "return -EINVAL;" to the top of LZ4_decompress_generic(), which also helped. Adding it to the individual decompression functions seemed to be an on/off thing; sometimes it helped, sometimes not. > > I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0 > > and binutils 2.26.1. Scripts used to run the tests are available at > > https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2. > > Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied. > > > > I tried to track down the problem, with no success. Just the presence > > of the LZ4 code seems to be sufficient to cause the problem; I have > > no idea why that would be the case. > > > > Maybe there's someone who has an idea and/or is experiencing similar issues. > Hopefully, we can track this down. > Agreed. For my part I am pretty much out of ideas. I could explicitly disable CONFIG_RD_LZ4 in my tests, but that would really just defeat the purpose. Guenter > > Please let me know if there is anything I can do to help tracking down > > the problem. 
> > > > Thanks, > > Guenter > > > > --- > > # bad: [c4f3f22eddc982d247ffe2a6690c3e4a5c46dd09] Merge tag > > 'linux-kselftest-4.11-rc1' of > > git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest > > # good: [9e314890292c0dd357eadef6a043704fa0b4c157] Merge tag > > 'openrisc-for-linus' of git://github.com/openrisc/linux > > git bisect start 'HEAD' '9e31489' > > # bad: [7067739df23ffd641ca99c967830e0ed2ba39eab] Merge branch > > 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux > > git bisect bad 7067739df23ffd641ca99c967830e0ed2ba39eab > > # good: [c5adae9583ef6985875532904160c6bf9f07b453] lib: add > > CONFIG_TEST_SORT to enable self-test of sort() > > git bisect good c5adae9583ef6985875532904160c6bf9f07b453 > > # bad: [edccb59429657b09806146339e2b27594c1d1da0] Merge tag 'fbdev-v4.11' > > of git://github.com/bzolnier/linux > > git bisect bad edccb59429657b09806146339e2b27594c1d1da0 > > # good: [72db33355c1431fefcabb06c9c25705e8226eed5] fbdev: ssd1307fb: Start > > to use gpiod API for reset gpio > > git bisect good 72db33355c1431fefcabb06c9c25705e8226
nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'
Hi Sven,

my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
update LZ4 compressor module"). The test hangs early during boot before
any console output is seen. Reverting the offending patch as well as the
subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
at the top of the LZ4 decompression code). For reference, bisect log
is attached.

I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
and binutils 2.26.1. Scripts used to run the tests are available at
https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.

I tried to track down the problem, with no success. Just the presence
of the LZ4 code seems to be sufficient to cause the problem; I have
no idea why that would be the case.

Please let me know if there is anything I can do to help tracking down
the problem.

Thanks,
Guenter

---
# bad: [c4f3f22eddc982d247ffe2a6690c3e4a5c46dd09] Merge tag 'linux-kselftest-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
# good: [9e314890292c0dd357eadef6a043704fa0b4c157] Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
git bisect start 'HEAD' '9e31489'
# bad: [7067739df23ffd641ca99c967830e0ed2ba39eab] Merge branch 'i2c/for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect bad 7067739df23ffd641ca99c967830e0ed2ba39eab
# good: [c5adae9583ef6985875532904160c6bf9f07b453] lib: add CONFIG_TEST_SORT to enable self-test of sort()
git bisect good c5adae9583ef6985875532904160c6bf9f07b453
# bad: [edccb59429657b09806146339e2b27594c1d1da0] Merge tag 'fbdev-v4.11' of git://github.com/bzolnier/linux
git bisect bad edccb59429657b09806146339e2b27594c1d1da0
# good: [72db33355c1431fefcabb06c9c25705e8226eed5] fbdev: ssd1307fb: Start to use gpiod API for reset gpio
git bisect good 72db33355c1431fefcabb06c9c25705e8226eed5
# bad: [95330473636e5e4546f94874c957c3be66bb2140] checkpatch: remove false unbalanced braces warning
git bisect bad 95330473636e5e4546f94874c957c3be66bb2140
# bad: [69c78423b8f439b077929410bdf8f88e7031b891] lib/lz4: remove back-compat wrappers
git bisect bad 69c78423b8f439b077929410bdf8f88e7031b891
# bad: [e23d54e48346e775be53b3cc25a95d65da960393] lib/decompress_unlz4: change module to work with new LZ4 module version
git bisect bad e23d54e48346e775be53b3cc25a95d65da960393
# bad: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module
git bisect bad 4e1a33b105ddf201f66dcc44490c6086a25eca0b
# good: [8893f519330bb073a49c5b4676fce4be6f1be15d] lib/test_sort.c: make it explicitly non-modular
git bisect good 8893f519330bb073a49c5b4676fce4be6f1be15d
# first bad commit: [4e1a33b105ddf201f66dcc44490c6086a25eca0b] lib: update LZ4 compressor module