Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
Hi, I'm seeing a machine lockup in xfstests/224, logs attached. Friday's xfstests round with 3.5-rc4 was ok, all tests passed. The 'dd' processes are in D-state with this stacktraces 5597 pts/0D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.8 bs=4k conv=notrunc [a001bb3e] reserve_metadata_bytes+0x33e/0x8f0 [btrfs] [a001cd64] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs] [a001d16b] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs] [a004132b] __btrfs_buffered_write+0x17b/0x380 [btrfs] [a0041783] btrfs_file_aio_write+0x253/0x4e0 [btrfs] [81144892] do_sync_write+0xe2/0x120 [8114519e] vfs_write+0xce/0x190 [811454e4] sys_write+0x54/0xa0 [818b4fa9] system_call_fastpath+0x16/0x1b [] 0x and (not sure if there are more) 5666 pts/0D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.6 bs=4k conv=notrunc [a001bb3e] reserve_metadata_bytes+0x33e/0x8f0 [btrfs] [a001c56a] btrfs_block_rsv_add+0x3a/0x60 [btrfs] [a003155e] start_transaction+0x26e/0x330 [btrfs] [a0031903] btrfs_start_transaction+0x13/0x20 [btrfs] [a003cae0] btrfs_dirty_inode+0xb0/0xe0 [btrfs] [a003cdad] btrfs_update_time+0xcd/0x180 [btrfs] [a00416f8] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs] [81144892] do_sync_write+0xe2/0x120 [8114519e] vfs_write+0xce/0x190 [811454e4] sys_write+0x54/0xa0 [818b4fa9] system_call_fastpath+0x16/0x1b all btrfs kernel threads are idle. Mount options: -o space_cache Mkfs: fresh, default options # btrfs fi df /mnt/a2 System: total=4.00MiB, used=4.00KiB Data+Metadata: total=1020.00MiB, used=987.32MiB [meanwhile] While grabbing lockdep stats the test respawned 224 236s ...[14:57:42] [15:46:56] 2954s but there was no disk activity, I wonder if touching /proc/lockdep or /proc/lock_stat is affecting this. Finishing this report anyway, and will redo the tests again. Looking again into the logs, the first process snapshot (only D-state processes) is much longer than process snapshot of containing all, unfortuntelly I don't have timestamps recorded, but this suggests that it's very slowly going on, so slowly that I considered it stalled looking at the io graphs. david for-linus-hung-224-all.txt.gz Description: Binary data for-linus-hung-224-D.txt.gz Description: Binary data
Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
On Mon, Jul 02, 2012 at 04:10:52PM +0200, David Sterba wrote: Finishing this report anyway, and will redo the tests again. Looking again into the logs, the first process snapshot (only D-state processes) is much longer than process snapshot of containing all, unfortuntelly I don't have timestamps recorded, but this suggests that it's very slowly going on, so slowly that I considered it stalled looking at the io graphs. Fresh build, reboot, and single xfstests/224 run: during first ~20 seconds, there's high write activity, ie. file setup, then it goes to a few tens-to-hundreds of KB every 4 seconds. Cpu is idle, sample output from dstat: total-cpu-usage --dsk/sda9- ---system-- usr sys idl wai hiq siq| read writ| int csw 1 1 99 0 0 0| 0 0 | 923 1856 0 1 98 0 1 0| 0 8192B| 904 2796 0 1 99 0 0 0| 0 0 | 945 1914 1 1 98 0 0 0| 0 0 | 899 1849 1 1 98 0 0 1| 0 0 | 906 1848 0 3 97 0 0 0| 020k| 901 3740 0 0 100 0 0 0| 0 0 | 905 1851 1 1 98 0 0 1| 0 0 | 946 1917 0 1 99 0 0 0| 0 0 | 904 1858 0 1 99 0 0 0| 0 8192B| 907 2805 1 1 98 0 0 1| 0 0 | 891 1836 0 1 99 0 0 0| 0 0 | 900 1847 0 1 99 0 0 0| 0 0 | 940 1905 1 4 95 0 0 0| 032k| 904 5153 1 2 97 0 0 0| 036k| 913 4240 0 1 99 0 0 0| 0 0 | 907 1849 0 1 99 0 0 0| 0 0 | 908 1852 1 1 98 0 0 1| 0 0 | 933 1901 1 2 98 0 0 0| 0 8192B| 916 2808 0 1 99 0 0 0| 0 0 | 917 1843 0 1 99 0 0 1| 0 0 | 908 1844 1 1 99 0 0 0| 0 0 | 905 1860 0 5 95 0 0 0| 036k| 943 7565 1 1 99 0 0 0| 0 0 | 911 1861 0 1 99 0 0 0| 0 0 | 910 1852 1 1 98 0 0 0| 0 0 | 944 1878 1 2 97 0 0 1| 016k| 898 3753 0 9 87 4 0 1| 0 1020k|103511k 0 19 74 7 0 1| 0 2092k|305224k 0 1 99 0 0 0| 0 0 | 909 1851 1 1 98 0 0 1| 0 0 | 915 1856 1 1 99 0 0 0| 0 0 | 896 1847 0 2 98 0 0 0| 0 8192B| 931 2847 0 1 99 0 0 0| 0 0 | 899 1850 1 1 98 0 0 1| 0 0 | 896 1861 0 1 99 0 0 0| 0 0 | 911 1855 1 5 94 0 0 0| 028k| 891 6521 0 9 87 3 0 1| 0 1100k| 96311k 0 1 99 0 0 0| 0 0 | 905 1857 1 1 99 0 0 0| 0 0 | 895 1851 1 1 98 0 0 0| 0 0 | 911 1852 0 7 88 4 0 1| 0 700k| 911 8533 0 1 99 0 0 0| 0 0 | 940 1905 1 1 99 0 0 0| 0 0 | 912 1851 1 1 99 0 0 0| 0 0 | 895 1851 0 10 89 0 0 1| 0 100k| 91213k and repeats more or less the same. Bisection in progress. david -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)
On Mon, Jul 02, 2012 at 04:34:53PM +0200, David Sterba wrote: Bisection in progress. commit cae76522b19735c576803bec273f49062aa418ab Author: Josef Bacik jba...@fusionio.com Date: Thu Jun 21 14:05:49 2012 -0400 Btrfs: flush delayed inodes if we're short on space Those crazy gentoo guys have been complaining about ENOSPC errors on their portage volumes. This is because doing things like untar tends to create lots of new files which will soak up all the reservation space in the delayed inodes. Usually this gets papered over by the fact that we will try and commit the transaction, however if this happens in the wrong spot or we choose not to commit the transaction you will be screwed. So add the ability to expclitly flush delayed inodes to free up space. Please test this out guys to make sure it works since as usual I cannot reproduce. Thanks, -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html