Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs
On Thu, Feb 21, 2019 at 11:22:39AM +0100, Marek Szyprowski wrote:
> Hi Ming,
>
> On 2019-02-21 11:16, Ming Lei wrote:
> > On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
> >> [ ... ]
> > I guess it might be related with memory corruption, could you enable the
> > following debug options and post the dmesg log?
> >
> > CONFIG_DEBUG_STACKOVERFLOW=y
> > CONFIG_KASAN=y
>
> It won't be that easy as none of the above options is available on ARM
> 32bit. I will try to apply some ARM KASAN patches floating on the net
> and let you know the result.

Hi Marek,

Could you test the following patch?

diff --git a/block/bounce.c b/block/bounce.c
index add085e28b1d..0c618c0b3cf8 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -295,7 +295,6 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	bool bounce = false;
 	int sectors = 0;
 	bool passthrough = bio_is_passthrough(*bio_orig);
-	struct bvec_iter_all iter_all;
 
 	bio_for_each_segment(from, *bio_orig, iter) {
 		if (i++ < BIO_MAX_PAGES)
@@ -315,7 +314,8 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	bio = bounce_clo
Re: [Cluster-devel] [dm-devel] [PATCH V15 00/18] block: support multi-page bvec
On Sun, 2019-02-17 at 21:11 +0800, Ming Lei wrote:
> The following patch should fix this issue:
>
>
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index bed065904677..066b66430523 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -363,13 +363,15 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
>  	struct bio_vec bv, bvprv = { NULL };
>  	int prev = 0;
>  	unsigned int seg_size, nr_phys_segs;
> -	unsigned front_seg_size = bio->bi_seg_front_size;
> +	unsigned front_seg_size;
>  	struct bio *fbio, *bbio;
>  	struct bvec_iter iter;
>
>  	if (!bio)
>  		return 0;
>
> +	front_seg_size = bio->bi_seg_front_size;
> +
>  	switch (bio_op(bio)) {
>  	case REQ_OP_DISCARD:
>  	case REQ_OP_SECURE_ERASE:

Hi Ming,

With this patch applied, test nvmeof-mp/002 fails as follows:

[ 694.700400] kernel BUG at lib/sg_pool.c:103!
[ 694.705932] invalid opcode: [#1] PREEMPT SMP KASAN
[ 694.708297] CPU: 2 PID: 349 Comm: kworker/2:1H Tainted: GB 5.0.0-rc6-dbg+ #2
[ 694.711730] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 694.715113] Workqueue: kblockd blk_mq_run_work_fn
[ 694.716894] RIP: 0010:sg_alloc_table_chained+0xe5/0xf0
[ 694.758222] Call Trace:
[ 694.759645]  nvme_rdma_queue_rq+0x2aa/0xcc0 [nvme_rdma]
[ 694.764915]  blk_mq_try_issue_directly+0x2a5/0x4b0
[ 694.771779]  blk_insert_cloned_request+0x11e/0x1c0
[ 694.778417]  dm_mq_queue_rq+0x3d1/0x770
[ 694.793400]  blk_mq_dispatch_rq_list+0x5fc/0xb10
[ 694.798386]  blk_mq_sched_dispatch_requests+0x2f7/0x300
[ 694.803180]  __blk_mq_run_hw_queue+0xd6/0x180
[ 694.808933]  blk_mq_run_work_fn+0x27/0x30
[ 694.810315]  process_one_work+0x4f1/0xa40
[ 694.813178]  worker_thread+0x67/0x5b0
[ 694.814487]  kthread+0x1cf/0x1f0
[ 694.819134]  ret_from_fork+0x24/0x30

The code in sg_pool.c that triggers the BUG() statement is as follows:

int sg_alloc_table_chained(struct sg_table *table, int nents,
		struct scatterlist *first_chunk)
{
	int ret;

	BUG_ON(!nents);
[ ... ]

Bart.
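The blk-merge.c hunk quoted above moves the read of `bio->bi_seg_front_size` below the `if (!bio)` test; before the patch, the declaration's initializer dereferenced `bio` before that NULL check could run. A minimal user-space sketch of the corrected ordering, assuming a hypothetical `struct sketch_bio` that carries only the one field involved (it is not the kernel's `struct bio`, and `sketch_front_seg_size()` is illustrative only):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the field the hunk reads. */
struct sketch_bio {
	unsigned int bi_seg_front_size;
};

/*
 * Ordering used after the fix: declare the local without an initializer,
 * return early for a NULL bio, and only then read through the pointer.
 * The pre-patch form,
 *
 *	unsigned front_seg_size = bio->bi_seg_front_size;
 *
 * read through bio before the NULL check was reached.
 */
static unsigned int sketch_front_seg_size(const struct sketch_bio *bio)
{
	unsigned int front_seg_size;

	if (!bio)
		return 0;	/* bio is never dereferenced on this path */

	front_seg_size = bio->bi_seg_front_size;
	return front_seg_size;
}
```

For example, `sketch_front_seg_size(NULL)` returns 0 without touching memory, while a bio whose `bi_seg_front_size` is 4096 yields 4096.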
Re: [Cluster-devel] [dm-devel] [PATCH V15 00/18] block: support multi-page bvec
On 2/19/19 5:17 PM, Ming Lei wrote:
> On Tue, Feb 19, 2019 at 08:28:19AM -0800, Bart Van Assche wrote:
>> With this patch applied test nvmeof-mp/002 fails as follows:
>>
>> [ 694.700400] kernel BUG at lib/sg_pool.c:103!
>> [ ... ]
>
> I can reproduce this issue("kernel BUG at lib/sg_pool.c:103") without
> mp-bvec patches, so looks it isn't the fault of this patchset.

Thanks, Ming, for your feedback.

Jens, I don't see that issue with kernel v5.0-rc6. Does that mean that the
sg_pool BUG() is a regression in your for-next branch that predates Ming's
multi-page bvec patch series?

Thanks,

Bart.
Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs
Hi Ming,

On 2019-02-21 10:57, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 09:42:59AM +0100, Marek Szyprowski wrote:
>> [ ... ]
> Could you test the patch in following link and see if it can make a
> difference?
>
> https://marc.info/?l=linux-aio&m=155070355614541&w=2

I've tested that patch, but it doesn't make any difference on the test
system. In the log I see no warning added by it.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs
Dear All,

On 2019-02-15 12:13, Ming Lei wrote:
> This patch pulls the trigger for multi-page bvecs.
>
> Reviewed-by: Omar Sandoval
> Signed-off-by: Ming Lei

Since Linux next-20190218 I've observed problems with the block layer on one
of my test devices (Odroid U3 with an EXT4 rootfs on an SD card). Bisecting
this issue led me to this change; next-20190218 is also the first linux-next
release with this change merged. The issue is fully reproducible and can be
observed in the following kernel log:

sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
s3c-sdhci 1253.sdhci: clock source 2: mmc_busclk.2 (1 Hz)
s3c-sdhci 1253.sdhci: Got CD GPIO
mmc0: SDHCI controller on samsung-hsmmc [1253.sdhci] using ADMA
mmc0: new high speed SDHC card at address
mmcblk0: mmc0: SL16G 14.8 GiB

...

EXT4-fs (mmcblk0p2): INFO: recovery required on readonly filesystem
EXT4-fs (mmcblk0p2): write access will be enabled during recovery
EXT4-fs (mmcblk0p2): recovery complete
EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext4 filesystem) readonly on device 179:2.
devtmpfs: mounted
Freeing unused kernel memory: 1024K
hub 1-3:1.0: USB hub found
Run /sbin/init as init process
hub 1-3:1.0: 3 ports detected
*** stack smashing detected ***: terminated
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0004
CPU: 1 PID: 1 Comm: init Not tainted 5.0.0-rc6-next-20190218 #1546
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[] (show_stack) from [] (dump_stack+0x90/0xc8)
[] (dump_stack) from [] (panic+0xfc/0x304)
[] (panic) from [] (do_exit+0xabc/0xc6c)
[] (do_exit) from [] (do_group_exit+0x3c/0xbc)
[] (do_group_exit) from [] (get_signal+0x130/0xbf4)
[] (get_signal) from [] (do_work_pending+0x130/0x618)
[] (do_work_pending) from [] (slow_work_pending+0xc/0x20)
Exception stack(0xe88c3fb0 to 0xe88c3ff8)
3fa0: bea7787c 0005 b6e8d0b8
3fc0: bea77a18 b6f92010 b6e8d0b8 0001 b6e8d0c8 0001 b6e8c000 bea77b60
3fe0: 0020 bea77998 b6d52368 6050
CPU3: stopping

I would like to help debug and fix this issue, but I don't really have any
idea where to start. Here is some more detailed information about my test
system:

1. Board: ARM 32-bit Samsung Exynos4412-based Odroid U3 (device tree
   source: arch/arm/boot/dts/exynos4412-odroidu3.dts)

2. Block device: MMC/SDHCI/SDHCI-S3C with SD card
   (drivers/mmc/host/sdhci-s3c.c driver, sdhci_2 device node in the device
   tree)

3. Rootfs: Ext4

4. Kernel config: arch/arm/configs/exynos_defconfig

I can gather more logs if needed; just let me know which kernel options to
enable. Reverting this commit on top of next-20190218, as well as on current
linux-next (tested with next-20190221), fixes this issue and makes the
system bootable again.

> ---
>  block/bio.c         | 22 +++---
>  fs/iomap.c          |  4 ++--
>  fs/xfs/xfs_aops.c   |  4 ++--
>  include/linux/bio.h |  2 +-
>  4 files changed, 20 insertions(+), 12 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index 968b12fea564..83a2dfa417ca 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -753,6 +753,8 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * @page: page to add
>   * @len: length of the data to add
>   * @off: offset of the data in @page
> + * @same_page: if %true only merge if the new data is in the same physical
> + *		page as the last segment of the bio.
>   *
>   * Try to add the data at @page + @off to the last bvec of @bio.  This is a
>   * a useful optimisation for file systems with a block size smaller than the
> @@ -761,19 +763,25 @@ EXPORT_SYMBOL(bio_add_pc_page);
>   * Return %true on success or %false on failure.
>   */
>  bool __bio_try_merge_page(struct bio *bio, struct page *page,
> -		unsigned int len, unsigned int off)
> +		unsigned int len, unsigned int off, bool same_page)
>  {
>  	if (WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED)))
>  		return false;
>
>  	if (bio->bi_vcnt > 0) {
>  		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
> +		phys_addr_t vec_end_addr = page_to_phys(bv->bv_page) +
> +			bv->bv_offset + bv->bv_len - 1;
> +		phys_addr_t page_addr = page_to_phys(page);
>
> -		if (page == bv->bv_page && off == bv->bv_offset + bv->bv_len) {
> -			bv->bv_len += len;
> -			bio->bi_iter.bi_size += len;
> -			return true;
> -		}
> +		if (vec_end_addr + 1 != page_addr + off)
> +			return false;
> +		if (same_page && (vec_end_addr & PAGE_MASK) != page_addr)
> +			return false;
> +
> +		bv->bv_len += len;
> +		bio->bi_iter.bi_size += len;
> +		return true;
>  	}
>  	return false;
>  }
> @@ -819,7 +827,7
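The patched `__bio_try_merge_page()` quoted above no longer requires the new data to sit in the same `struct page` as the last segment: it compares physical addresses, so a segment may grow across a page boundary unless `same_page` is set. A user-space sketch of just those two checks, assuming a 4 KiB page size, page-aligned `page_phys` arguments, and physical addresses modelled as plain `uint64_t` values (`struct sketch_bvec`, `sketch_try_merge()` and the `SKETCH_` constants are hypothetical, not kernel API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint64_t)(SKETCH_PAGE_SIZE - 1))

/* Simplified bio_vec: page-aligned physical address, offset, length. */
struct sketch_bvec {
	uint64_t page_phys;
	unsigned int offset;
	unsigned int len;
};

/*
 * Mirrors the two tests in the patched __bio_try_merge_page():
 *  1. the new data must start at the byte right after the segment's end;
 *  2. with same_page, it must also lie in the page holding that last byte.
 * On success the segment is extended in place.
 */
static bool sketch_try_merge(struct sketch_bvec *bv, uint64_t page_phys,
			     unsigned int off, unsigned int len, bool same_page)
{
	uint64_t vec_end_addr;

	assert(bv->len > 0);	/* only an existing, non-empty segment is extended */
	vec_end_addr = bv->page_phys + bv->offset + bv->len - 1;

	if (vec_end_addr + 1 != page_phys + off)
		return false;	/* not physically contiguous */
	if (same_page && (vec_end_addr & SKETCH_PAGE_MASK) != page_phys)
		return false;	/* contiguous, but in a different page */

	bv->len += len;
	return true;
}
```

With `same_page` false, a full 4 KiB segment at 0x10000 and a page at 0x11000 merge into a single 8 KiB segment; with `same_page` true the same call is refused, while appending 512 bytes inside the page at 0x10000 still succeeds.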
Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs
Hi Ming,

On 2019-02-21 11:16, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 11:08:19AM +0100, Marek Szyprowski wrote:
>> [ ... ]
> I guess it might be related with memory corruption, could you enable the
> following debug options and post the dmesg log?
>
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_KASAN=y

It won't be that easy as none of the above options is available on ARM
32bit. I will try to apply some ARM KASAN patches floating on the net
and let you know the result.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
Re: [Cluster-devel] [PATCH V15 14/18] block: enable multipage bvecs
Hi Ming,

On 2019-02-21 11:38, Ming Lei wrote:
> On Thu, Feb 21, 2019 at 11:22:39AM +0100, Marek Szyprowski wrote:
>> [ ... ]
> Hi Marek,
>
> Could you test the following patch?

Yes. Sadly, no change observed.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland