Re: [PATCH 0/2] ext4: deadlocks after allocation failure in ext4_init_io_end()
On Mon 13-05-13 23:08:11, Alexey Khoroshilov wrote:
> Hi, Ted!
>
> Our tests for ext4 with targeted fault injection were stalled in
> Uninterruptible Sleep State when they simulate a memory allocation failure
> in ext4_init_io_end() while it is called from mpage_da_submit_io() or
> ext4_writepage().
>
> It looks like the problems are that pages left locked after failure handling.
>
> I am not completely sure that my patches take into account all required
> aspects, but the tests are passed if the patches are applied.
>
> Please find below syslog excerpt for the first issue.
>
> Found by Linux File System Verification project (linuxtesting.org/spruce).

Thanks for the patches! As Zheng said, the patch introducing these problems
is reverted from upstream now but we'll push an equivalent version soon so
I'll take care to include your fixes in it as well.

Honza

> --
> Alexey Khoroshilov
> Linux Verification Center, ISPRAS
> web: http://linuxtesting.org
>
> [ 1212.454601] Failure is simulated in the following location
> [ 1212.454675] Call Trace:
> [ 1212.454681] [81637257] dump_stack+0x19/0x1b
> [ 1212.454687] [81044fa0] warn_slowpath_common+0x70/0xa0
> [ 1212.454691] [a02fa869] ? indicator_simulate+0x29/0x1a0 [kedr_fsim_indicator_common]
> [ 1212.454701] [a0314b43] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454705] [81045086] warn_slowpath_fmt+0x46/0x50
> [ 1212.454710] [a020f24e] ? kedr_fsim_point_simulate+0x7e/0xa0 [kedr_fault_simulation]
> [ 1212.454715] [a020f1d5] ? kedr_fsim_point_simulate+0x5/0xa0 [kedr_fault_simulation]
> [ 1212.454719] [a02c181a] kedr_repl_kmem_cache_alloc+0x7a/0xb0 [kedr_fsim_cmm]
> [ 1212.454727] [a0314b43] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454732] [a02b3f23] kedr_intermediate_func_kmem_cache_alloc+0x73/0xd0 [kedr_lc_common_mm]
> [ 1212.454740] [a0314b43] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454746] [a0314b43] ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454753] [a030bf6f] mpage_da_submit_io+0x6f/0x380 [ext4]
> [ 1212.454762] [a033aeeb] ? __ext4_handle_dirty_metadata+0xab/0x140 [ext4]
> [ 1212.454769] [a0280a4b] ? jbd2_journal_get_write_access+0x3b/0x50 [jbd2]
> [ 1212.454777] [a03105c8] ? ext4_mark_iloc_dirty+0x468/0x660 [ext4]
> [ 1212.454784] [a03108fa] ? ext4_mark_inode_dirty+0x9a/0x270 [ext4]
> [ 1212.454790] [a0312a5c] ? mpage_da_map_and_submit+0x1fc/0x430 [ext4]
> [ 1212.454798] [a031296e] mpage_da_map_and_submit+0x10e/0x430 [ext4]
> [ 1212.454804] [a0313567] ? ext4_da_writepages+0x377/0x6a0 [ext4]
> [ 1212.454811] [a03135bc] ext4_da_writepages+0x3cc/0x6a0 [ext4]
> [ 1212.454815] [8115360a] ? __do_fault+0x14a/0x470
> [ 1212.454820] [8113a123] do_writepages+0x23/0x40
> [ 1212.454824] [8112ed29] __filemap_fdatawrite_range+0x59/0x60
> [ 1212.454828] [8112ed6a] filemap_write_and_wait_range+0x3a/0x80
> [ 1212.454835] [a0311f65] ext4_punch_hole+0x1f5/0x5c0 [ext4]
> [ 1212.454839] [81642b88] ? __do_page_fault+0x108/0x550
> [ 1212.454843] [8118a981] ? do_fallocate+0x101/0x190
> [ 1212.454847] [8118a981] ? do_fallocate+0x101/0x190
> [ 1212.454855] [a0339e3c] ext4_fallocate+0x30c/0x5d0 [ext4]
> [ 1212.454858] [8118a981] ? do_fallocate+0x101/0x190
> [ 1212.454862] [8118de0d] ? __fput+0x16d/0x2e0
> [ 1212.454865] [8118a997] do_fallocate+0x117/0x190
> [ 1212.454869] [8118aa67] SyS_fallocate+0x57/0x90
> [ 1212.454877] [81647782] system_call_fastpath+0x16/0x1b
> [ 1212.454880] ---[ end trace 16f55656139fb9de ]---
> [ 1443.148536] INFO: task flush-8:16:11233 blocked for more than 120 seconds.
> [ 1443.148628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1443.148646] flush-8:16 D 8112d160 0 11233 2 0x
> [ 1443.148669] 88002fe5d768 0046 880082f8b4b8
> [ 1443.148692] 8800300b9fb0 88002fe5dfd8 88002fe5dfd8 88002fe5dfd8
> [ 1443.148712] 88002ed75f10 8800300b9fb0 88002fe5d768 880082a144e0
> [ 1443.148733] Call Trace:
> [ 1443.148764] [8112d160] ? __lock_page+0x70/0x70
> [ 1443.148770] [8163d2e9] schedule+0x29/0x70
> [ 1443.148773] [8163d3bf] io_schedule+0x8f/0xd0
> [ 1443.148777] [8112d16e] sleep_on_page+0xe/0x20
> [ 1443.148781] [8163a3aa] __wait_on_bit_lock+0x5a/0xc0
> [ 1443.148785] [8112dd0e] ? find_get_pages_tag+0x2e/0x1e0
> [ 1443.148790] [811c25b1] ? __block_write_full_page+0x211/0x390
> [ 1443.148794] [8112d157] __lock_page+0x67/0x70
> [ 1443.148799] [8106f090] ? autoremove_wake_function+0x50/0x50
> [ 1443.148809] [a030d07e] ext4_num_dirty_pages.isra.53+0x1fe/0x210 [ext4]
> [ 1443.148814] [] ? pagevec_lookup_tag+0x25/0x40
> [ 1443.148818] [] ? write_cache_pages+0x144/0x4e0
> [ 1443.148825] [] ext4_da_writepages+0x63d/0x6a0 [ext4]
> [ 1443.148829] [] ? set_page_dirty_lock+0x70/0x70
> [ 1443.148833] [] ? __writeback_single_inode+0x63/0x310
> [ 1443.148837] [] ? __writeback_single_inode+0x63/0x310
> [ 1443.148841] [] ? writeback_sb_inodes+0x13c/0x550
> [ 1443.148845] [] do_writepages+0x23/0x40
> [ 1443.148849] [] __writeback_single_inode+0x45/0x310
> [ 1443.148853] [] writeback_sb_inodes+0x2c8/0x550
> [ 1443.148856] [] __writeback_inodes_wb+0x9e/0xd0
> [ 1443.148860] [] wb_writeback+0x34b/0x370
> [
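The report describes pages being left locked when the allocation in ext4_init_io_end() fails, after which the flusher thread sleeps forever in __lock_page(). The patches themselves are not quoted in this thread, so the following is only a minimal userspace sketch of that failure pattern, not the ext4 code or the proposed fix. Every name in it (model_page, fail_alloc, model_init_io_end, model_submit, count_locked) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: only the lock bit is modelled. */
struct model_page {
    bool locked;
};

/* When true, the allocation below fails, mimicking targeted fault injection. */
static bool fail_alloc;

/* Hypothetical stand-in for ext4_init_io_end(): may return NULL. */
static void *model_init_io_end(void)
{
    return fail_alloc ? NULL : malloc(16);
}

/*
 * Model of a submit path: lock the pages, allocate the io_end, then
 * "complete" the I/O by unlocking the pages. unlock_on_error selects
 * whether the failure path releases the page locks before returning.
 */
static int model_submit(struct model_page *pages, int n, bool unlock_on_error)
{
    void *io_end;
    int i;

    assert(pages != NULL && n > 0);

    for (i = 0; i < n; i++)
        pages[i].locked = true;

    io_end = model_init_io_end();
    if (!io_end) {
        /* Without this cleanup, later lockers of these pages never wake. */
        if (unlock_on_error)
            for (i = 0; i < n; i++)
                pages[i].locked = false;
        return -ENOMEM;
    }

    /* Success path: completion unlocks every page. */
    for (i = 0; i < n; i++)
        pages[i].locked = false;
    free(io_end);
    return 0;
}

/* Count pages still locked after a submit attempt. */
static int count_locked(const struct model_page *pages, int n)
{
    int i, locked = 0;

    for (i = 0; i < n; i++)
        if (pages[i].locked)
            locked++;
    return locked;
}
```

With fail_alloc set, model_submit(pages, n, false) returns -ENOMEM and count_locked() reports all n pages still locked, which corresponds to the state the hung flush task is waiting on. Passing true for unlock_on_error leaves no page locked on the same failure.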
Re: [PATCH 0/2] ext4: deadlocks after allocation failure in ext4_init_io_end()
On Mon, May 13, 2013 at 11:08:11PM +0400, Alexey Khoroshilov wrote:
> Hi, Ted!
>
> Our tests for ext4 with targeted fault injection were stalled in
> Uninterruptible Sleep State when they simulate a memory allocation failure
> in ext4_init_io_end() while it is called from mpage_da_submit_io() or
> ext4_writepage().
>
> It looks like the problems are that pages left locked after failure handling.
>
> I am not completely sure that my patches take into account all required
> aspects, but the tests are passed if the patches are applied.
>
> Please find below syslog excerpt for the first issue.
>
> Found by Linux File System Verification project (linuxtesting.org/spruce).

Hi Alexey,

Thanks for fixing this. The patch series looks good to me. But the commit
(ext4: use io_end for multiple bios) has been reverted in dev branch of
ext4 tree. I forward the mail to Jan to let him know your fixes.

Regards,
- Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/