Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

2012-07-02 Thread David Sterba
Hi,

I'm seeing a machine lockup in xfstests/224, logs attached. Friday's
xfstests round with 3.5-rc4 was ok, all tests passed.

The 'dd' processes are in D-state with this stacktraces

 5597 pts/0D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.8 
bs=4k conv=notrunc
[a001bb3e] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[a001cd64] btrfs_delalloc_reserve_metadata+0x134/0x3b0 [btrfs]
[a001d16b] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
[a004132b] __btrfs_buffered_write+0x17b/0x380 [btrfs]
[a0041783] btrfs_file_aio_write+0x253/0x4e0 [btrfs]
[81144892] do_sync_write+0xe2/0x120
[8114519e] vfs_write+0xce/0x190
[811454e4] sys_write+0x54/0xa0
[818b4fa9] system_call_fastpath+0x16/0x1b
[] 0x

and (not sure if there are more)

 5666 pts/0D+ 0:00 dd status=noxfer if=/dev/zero of=/mnt/a2/testfile.6 
bs=4k conv=notrunc
[a001bb3e] reserve_metadata_bytes+0x33e/0x8f0 [btrfs]
[a001c56a] btrfs_block_rsv_add+0x3a/0x60 [btrfs]
[a003155e] start_transaction+0x26e/0x330 [btrfs]
[a0031903] btrfs_start_transaction+0x13/0x20 [btrfs]
[a003cae0] btrfs_dirty_inode+0xb0/0xe0 [btrfs]
[a003cdad] btrfs_update_time+0xcd/0x180 [btrfs]
[a00416f8] btrfs_file_aio_write+0x1c8/0x4e0 [btrfs]
[81144892] do_sync_write+0xe2/0x120
[8114519e] vfs_write+0xce/0x190
[811454e4] sys_write+0x54/0xa0
[818b4fa9] system_call_fastpath+0x16/0x1b

all btrfs kernel threads are idle.

Mount options: -o space_cache
Mkfs: fresh, default options

# btrfs fi df /mnt/a2
System: total=4.00MiB, used=4.00KiB
Data+Metadata: total=1020.00MiB, used=987.32MiB

[meanwhile]

While grabbing lockdep stats the test respawned

224 236s ...[14:57:42] [15:46:56] 2954s

but there was no disk activity, I wonder if touching /proc/lockdep or
/proc/lock_stat is affecting this.

Finishing this report anyway, and will redo the tests again.

Looking again into the logs, the first process snapshot (only D-state
processes) is much longer than process snapshot of containing all,
unfortuntelly I don't have timestamps recorded, but this suggests that it's
very slowly going on, so slowly that I considered it stalled looking at the
io graphs.


david


for-linus-hung-224-all.txt.gz
Description: Binary data


for-linus-hung-224-D.txt.gz
Description: Binary data


Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

2012-07-02 Thread David Sterba
On Mon, Jul 02, 2012 at 04:10:52PM +0200, David Sterba wrote:
 Finishing this report anyway, and will redo the tests again.
 
 Looking again into the logs, the first process snapshot (only D-state
 processes) is much longer than process snapshot of containing all,
 unfortuntelly I don't have timestamps recorded, but this suggests that it's
 very slowly going on, so slowly that I considered it stalled looking at the
 io graphs.

Fresh build, reboot, and single xfstests/224 run:

during first ~20 seconds, there's high write activity, ie. file setup,
then it goes to a few tens-to-hundreds of KB every 4 seconds. Cpu is idle,
sample output from dstat:

total-cpu-usage --dsk/sda9- ---system--
usr sys idl wai hiq siq| read  writ| int   csw
  1   1  99   0   0   0|   0 0 | 923  1856
  0   1  98   0   1   0|   0  8192B| 904  2796
  0   1  99   0   0   0|   0 0 | 945  1914
  1   1  98   0   0   0|   0 0 | 899  1849
  1   1  98   0   0   1|   0 0 | 906  1848
  0   3  97   0   0   0|   020k| 901  3740
  0   0 100   0   0   0|   0 0 | 905  1851
  1   1  98   0   0   1|   0 0 | 946  1917
  0   1  99   0   0   0|   0 0 | 904  1858
  0   1  99   0   0   0|   0  8192B| 907  2805
  1   1  98   0   0   1|   0 0 | 891  1836
  0   1  99   0   0   0|   0 0 | 900  1847
  0   1  99   0   0   0|   0 0 | 940  1905
  1   4  95   0   0   0|   032k| 904  5153
  1   2  97   0   0   0|   036k| 913  4240
  0   1  99   0   0   0|   0 0 | 907  1849
  0   1  99   0   0   0|   0 0 | 908  1852
  1   1  98   0   0   1|   0 0 | 933  1901
  1   2  98   0   0   0|   0  8192B| 916  2808
  0   1  99   0   0   0|   0 0 | 917  1843
  0   1  99   0   0   1|   0 0 | 908  1844
  1   1  99   0   0   0|   0 0 | 905  1860
  0   5  95   0   0   0|   036k| 943  7565
  1   1  99   0   0   0|   0 0 | 911  1861
  0   1  99   0   0   0|   0 0 | 910  1852
  1   1  98   0   0   0|   0 0 | 944  1878
  1   2  97   0   0   1|   016k| 898  3753
  0   9  87   4   0   1|   0  1020k|103511k
  0  19  74   7   0   1|   0  2092k|305224k
  0   1  99   0   0   0|   0 0 | 909  1851
  1   1  98   0   0   1|   0 0 | 915  1856
  1   1  99   0   0   0|   0 0 | 896  1847
  0   2  98   0   0   0|   0  8192B| 931  2847
  0   1  99   0   0   0|   0 0 | 899  1850
  1   1  98   0   0   1|   0 0 | 896  1861
  0   1  99   0   0   0|   0 0 | 911  1855
  1   5  94   0   0   0|   028k| 891  6521
  0   9  87   3   0   1|   0  1100k| 96311k
  0   1  99   0   0   0|   0 0 | 905  1857
  1   1  99   0   0   0|   0 0 | 895  1851
  1   1  98   0   0   0|   0 0 | 911  1852
  0   7  88   4   0   1|   0   700k| 911  8533
  0   1  99   0   0   0|   0 0 | 940  1905
  1   1  99   0   0   0|   0 0 | 912  1851
  1   1  99   0   0   0|   0 0 | 895  1851
  0  10  89   0   0   1|   0   100k| 91213k

and repeats more or less the same.

Bisection in progress.


david
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xfstests/224 lockup/slowdown (was: Please hammer my for-linus branch)

2012-07-02 Thread David Sterba
On Mon, Jul 02, 2012 at 04:34:53PM +0200, David Sterba wrote:
 Bisection in progress.

commit cae76522b19735c576803bec273f49062aa418ab
Author: Josef Bacik jba...@fusionio.com
Date:   Thu Jun 21 14:05:49 2012 -0400

Btrfs: flush delayed inodes if we're short on space

Those crazy gentoo guys have been complaining about ENOSPC errors on their
portage volumes.  This is because doing things like untar tends to create
lots of new files which will soak up all the reservation space in the
delayed inodes.  Usually this gets papered over by the fact that we will try
and commit the transaction, however if this happens in the wrong spot or we
choose not to commit the transaction you will be screwed.  So add the
ability to expclitly flush delayed inodes to free up space.  Please test
this out guys to make sure it works since as usual I cannot reproduce.
Thanks,
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html