Re: [PATCH 0/2] ext4: deadlocks after allocation failure in ext4_init_io_end()

2013-05-14 Thread Jan Kara
On Mon 13-05-13 23:08:11, Alexey Khoroshilov wrote:
> Hi, Ted!
> 
> Our tests for ext4 with targeted fault injection stalled in uninterruptible
> sleep state when they simulated a memory allocation failure in
> ext4_init_io_end() while it is called from mpage_da_submit_io() or
> ext4_writepage().
> 
> It looks like the problem is that pages are left locked after failure handling.
> 
> I am not completely sure that my patches take into account all required
> aspects, but the tests pass if the patches are applied.
> 
> Please find below syslog excerpt for the first issue.
> 
> Found by Linux File System Verification project (linuxtesting.org/spruce). 

Thanks for the patches! As Zheng said, the patch that introduced these
problems has been reverted upstream, but we'll push an equivalent version
soon, so I'll take care to include your fixes in it as well.

Honza
> 
> --
> Alexey Khoroshilov
> Linux Verification Center, ISPRAS
> web: http://linuxtesting.org
> 
> 
> 
> [ 1212.454601] Failure is simulated in the following location
> [ 1212.454675] Call Trace:
> [ 1212.454681]  [] dump_stack+0x19/0x1b
> [ 1212.454687]  [] warn_slowpath_common+0x70/0xa0
> [ 1212.454691]  [] ? indicator_simulate+0x29/0x1a0 [kedr_fsim_indicator_common]
> [ 1212.454701]  [] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454705]  [] warn_slowpath_fmt+0x46/0x50
> [ 1212.454710]  [] ? kedr_fsim_point_simulate+0x7e/0xa0 [kedr_fault_simulation]
> [ 1212.454715]  [] ? kedr_fsim_point_simulate+0x5/0xa0 [kedr_fault_simulation]
> [ 1212.454719]  [] kedr_repl_kmem_cache_alloc+0x7a/0xb0 [kedr_fsim_cmm]
> [ 1212.454727]  [] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454732]  [] kedr_intermediate_func_kmem_cache_alloc+0x73/0xd0 [kedr_lc_common_mm]
> [ 1212.454740]  [] ? ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454746]  [] ext4_init_io_end+0x23/0x50 [ext4]
> [ 1212.454753]  [] mpage_da_submit_io+0x6f/0x380 [ext4]
> [ 1212.454762]  [] ? __ext4_handle_dirty_metadata+0xab/0x140 [ext4]
> [ 1212.454769]  [] ? jbd2_journal_get_write_access+0x3b/0x50 [jbd2]
> [ 1212.454777]  [] ? ext4_mark_iloc_dirty+0x468/0x660 [ext4]
> [ 1212.454784]  [] ? ext4_mark_inode_dirty+0x9a/0x270 [ext4]
> [ 1212.454790]  [] ? mpage_da_map_and_submit+0x1fc/0x430 [ext4]
> [ 1212.454798]  [] mpage_da_map_and_submit+0x10e/0x430 [ext4]
> [ 1212.454804]  [] ? ext4_da_writepages+0x377/0x6a0 [ext4]
> [ 1212.454811]  [] ext4_da_writepages+0x3cc/0x6a0 [ext4]
> [ 1212.454815]  [] ? __do_fault+0x14a/0x470
> [ 1212.454820]  [] do_writepages+0x23/0x40
> [ 1212.454824]  [] __filemap_fdatawrite_range+0x59/0x60
> [ 1212.454828]  [] filemap_write_and_wait_range+0x3a/0x80
> [ 1212.454835]  [] ext4_punch_hole+0x1f5/0x5c0 [ext4]
> [ 1212.454839]  [] ? __do_page_fault+0x108/0x550
> [ 1212.454843]  [] ? do_fallocate+0x101/0x190
> [ 1212.454847]  [] ? do_fallocate+0x101/0x190
> [ 1212.454855]  [] ext4_fallocate+0x30c/0x5d0 [ext4]
> [ 1212.454858]  [] ? do_fallocate+0x101/0x190
> [ 1212.454862]  [] ? __fput+0x16d/0x2e0
> [ 1212.454865]  [] do_fallocate+0x117/0x190
> [ 1212.454869]  [] SyS_fallocate+0x57/0x90
> [ 1212.454877]  [] system_call_fastpath+0x16/0x1b
> [ 1212.454880] ---[ end trace 16f55656139fb9de ]---
> [ 1443.148536] INFO: task flush-8:16:11233 blocked for more than 120 seconds.
> [ 1443.148628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1443.148646] flush-8:16  D 8112d160 0 11233  2 0x
> [ 1443.148669]  88002fe5d768 0046  880082f8b4b8
> [ 1443.148692]  8800300b9fb0 88002fe5dfd8 88002fe5dfd8 88002fe5dfd8
> [ 1443.148712]  88002ed75f10 8800300b9fb0 88002fe5d768 880082a144e0
> [ 1443.148733] Call Trace:
> [ 1443.148764]  [] ? __lock_page+0x70/0x70
> [ 1443.148770]  [] schedule+0x29/0x70
> [ 1443.148773]  [] io_schedule+0x8f/0xd0
> [ 1443.148777]  [] sleep_on_page+0xe/0x20
> [ 1443.148781]  [] __wait_on_bit_lock+0x5a/0xc0
> [ 1443.148785]  [] ? find_get_pages_tag+0x2e/0x1e0
> [ 1443.148790]  [] ? __block_write_full_page+0x211/0x390
> [ 1443.148794]  [] __lock_page+0x67/0x70
> [ 1443.148799]  [] ? autoremove_wake_function+0x50/0x50
> [ 1443.148809]  [] ext4_num_dirty_pages.isra.53+0x1fe/0x210 [ext4]
> [ 1443.148814]  [] ? pagevec_lookup_tag+0x25/0x40
> [ 1443.148818]  [] ? write_cache_pages+0x144/0x4e0
> [ 1443.148825]  [] ext4_da_writepages+0x63d/0x6a0 [ext4]
> [ 1443.148829]  [] ? set_page_dirty_lock+0x70/0x70
> [ 1443.148833]  [] ? __writeback_single_inode+0x63/0x310
> [ 1443.148837]  [] ? __writeback_single_inode+0x63/0x310
> [ 1443.148841]  [] ? writeback_sb_inodes+0x13c/0x550
> [ 1443.148845]  [] do_writepages+0x23/0x40
> [ 1443.148849]  [] __writeback_single_inode+0x45/0x310
> [ 1443.148853]  [] writeback_sb_inodes+0x2c8/0x550
> [ 1443.148856]  [] __writeback_inodes_wb+0x9e/0xd0
> [ 1443.148860]  [] wb_writeback+0x34b/0x370
> [ 

Re: [PATCH 0/2] ext4: deadlocks after allocation failure in ext4_init_io_end()

2013-05-14 Thread Zheng Liu
On Mon, May 13, 2013 at 11:08:11PM +0400, Alexey Khoroshilov wrote:
> Hi, Ted!
> 
> Our tests for ext4 with targeted fault injection stalled in uninterruptible
> sleep state when they simulated a memory allocation failure in
> ext4_init_io_end() while it is called from mpage_da_submit_io() or
> ext4_writepage().
> 
> It looks like the problem is that pages are left locked after failure handling.
> 
> I am not completely sure that my patches take into account all required
> aspects, but the tests pass if the patches are applied.
> 
> Please find below syslog excerpt for the first issue.
> 
> Found by Linux File System Verification project (linuxtesting.org/spruce). 

Hi Alexey,

Thanks for fixing this.  The patch series looks good to me.  But the
commit (ext4: use io_end for multiple bios) has been reverted in the dev
branch of the ext4 tree.  I have forwarded the mail to Jan to let him know
about your fixes.

Regards,
- Zheng
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



