Hi Ming,

On 2019-02-21 11:38, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 11:22:39AM +0100, Marek Szyprowski wrote:
>> On 2019-02-21 11:16, Ming Lei wrote:
>>> On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
>>>> On 2019-02-21 10:57, Ming Lei wrote:
>>>>> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
>>>>>> On 2019-02-15 12:13, Ming Lei wrote:
>>>>>>> This patch pulls the trigger for multi-page bvecs.
>>>>>>>
>>>>>>> Reviewed-by: Omar Sandoval <osan...@fb.com>
>>>>>>> Signed-off-by: Ming Lei <ming....@redhat.com>
>>>>>> Since Linux next-20190218 I've observed problems with the block layer on one
>>>>>> of my test devices (Odroid U3 with EXT4 rootfs on SD card). Bisecting
>>>>>> this issue led me to this change. This is also the first linux-next
>>>>>> release with this change merged. The issue is fully reproducible and can
>>>>>> be observed in the following kernel log:
>>>>>>
>>>>>> sdhci: Secure Digital Host Controller Interface driver
>>>>>> sdhci: Copyright(c) Pierre Ossman
>>>>>> s3c-sdhci 12530000.sdhci: clock source 2: mmc_busclk.2 (100000000 Hz)
>>>>>> s3c-sdhci 12530000.sdhci: Got CD GPIO
>>>>>> mmc0: SDHCI controller on samsung-hsmmc [12530000.sdhci] using ADMA
>>>>>> mmc0: new high speed SDHC card at address aaaa
>>>>>> mmcblk0: mmc0:aaaa SL16G 14.8 GiB
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
>>>>>> EXT4-fs (mmcblk0p2): write access will be enabled during recovery
>>>>>> EXT4-fs (mmcblk0p2): recovery complete
>>>>>> EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
>>>>>> VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
>>>>>> devtmpfs: mounted
>>>>>> Freeing unused kernel memory: 1024K
>>>>>> hub 1-3:1.0: USB hub found
>>>>>> Run /sbin/init as init process
>>>>>> hub 1-3:1.0: 3 ports detected
>>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
>>>>>> CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
>>>>>> Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>>>> [<c01118d0>] (unwind_backtrace) from [<c010d794>] (show_stack+0x10/0x14)
>>>>>> [<c010d794>] (show_stack) from [<c09ff8a4>] (dump_stack+0x90/0xc8)
>>>>>> [<c09ff8a4>] (dump_stack) from [<c0125944>] (panic+0xfc/0x304)
>>>>>> [<c0125944>] (panic) from [<c012bc98>] (do_exit+0xabc/0xc6c)
>>>>>> [<c012bc98>] (do_exit) from [<c012c100>] (do_group_exit+0x3c/0xbc)
>>>>>> [<c012c100>] (do_group_exit) from [<c0138908>] (get_signal+0x130/0xbf4)
>>>>>> [<c0138908>] (get_signal) from [<c010c7a0>] (do_work_pending+0x130/0x618)
>>>>>> [<c010c7a0>] (do_work_pending) from [<c0101034>] (slow_work_pending+0xc/0x20)
>>>>>> Exception stack(0xe88c3fb0 to 0xe88c3ff8)
>>>>>> 3fa0: 00000000 bea7787c 00000005 b6e8d0b8
>>>>>> 3fc0: bea77a18 b6f92010 b6e8d0b8 00000001 b6e8d0c8 00000001 b6e8c000 bea77b60
>>>>>> 3fe0: 00000020 bea77998 ffffffff b6d52368 60000050 ffffffff
>>>>>> CPU3: stopping
>>>>>>
>>>>>> I would like to help debug and fix this issue, but I don't really
>>>>>> have an idea where to start. Here is some more detailed information about
>>>>>> my test system:
>>>>>>
>>>>>> 1. Board: ARM 32bit Samsung Exynos4412-based Odroid U3 (device tree
>>>>>> source: arch/arm/boot/dts/exynos4412-odroidu3.dts)
>>>>>>
>>>>>> 2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
>>>>>> (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
>>>>>> tree)
>>>>>>
>>>>>> 3. Rootfs: Ext4
>>>>>>
>>>>>> 4. Kernel config: arch/arm/configs/exynos_defconfig
>>>>>>
>>>>>> I can gather more logs if needed, just let me know which kernel option to
>>>>>> enable. Reverting this commit on top of next-20190218 as well as current
>>>>>> linux-next (tested with next-20190221) fixes this issue and makes the
>>>>>> system bootable again.
>>>>> Could you test the patch in the following link and see if it can make a
>>>>> difference?
>>>>>
>>>>> https://marc.info/?l=linux-aio&m=155070355614541&w=2
>>>> I've tested that patch, but it doesn't make any difference on the test
>>>> system. In the log I see no warning added by it.
>>> I guess it might be related to memory corruption, could you enable the
>>> following debug options and post the dmesg log?
>>>
>>> CONFIG_DEBUG_STACKOVERFLOW=y
>>> CONFIG_KASAN=y
>> It won't be that easy as none of the above options is available on ARM
>> 32bit. I will try to apply some ARM KASAN patches floating on the net
>> and let you know the result.
> Hi Marek,
>
> Could you test the following patch?

Yes. Sadly, no change observed.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland