atime updates stuck under btrfs_dirty_inode

2011-01-16 Thread Andy Isaacson
Hi all,

I'm seeing processes block (for minutes or longer) in directory access
on btrfs.  I'm running

commit ccda756d13f13a9276edc8d53d05aaecbf3c4baa
Merge: 0c21e3a 65e5341
Author: Andy Isaacson 
Date:   Sun Jan 9 12:37:23 2011 -0800

Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

on a quad core Core i7.

There seem to be a few failure cases.  My most recent ls has been stuck
like this for several hours, on a directory that should have somewhere
between 6 and 50 entries:

% ps u 15028
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
adi  15028  0.0  0.0  27032  1204 pts/8S+   22:18   0:00 ls -l /btr/foo
% cat /proc/15028/stack
[] shrink_delalloc+0x112/0x147 [btrfs]
[] reserve_metadata_bytes+0xf2/0x1a8 [btrfs]
[] btrfs_block_rsv_add+0x28/0x49 [btrfs]
[] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs]
[] start_transaction+0x13e/0x1d8 [btrfs]
[] btrfs_start_transaction+0x13/0x15 [btrfs]
[] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[] __mark_inode_dirty+0x2b/0x1ab
[] touch_atime+0x10e/0x131
[] vfs_readdir+0x79/0xa0
[] sys_getdents+0x81/0xd1
[] tracesys+0xd9/0xde
[] 0x

Sometimes, if I poll fast enough I see this stack instead:

% cat /proc/15028/stack
[] wait_for_commit+0x99/0xe2 [btrfs]
[] btrfs_commit_transaction+0xfe/0x612 [btrfs]
[] start_transaction+0x151/0x1d8 [btrfs]
[] btrfs_start_transaction+0x13/0x15 [btrfs]
[] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[] __mark_inode_dirty+0x2b/0x1ab
[] touch_atime+0x10e/0x131
[] vfs_readdir+0x79/0xa0
[] sys_getdents+0x81/0xd1
[] tracesys+0xd9/0xde
[] 0x

Another ls has been accumulating runtime for several days:

 7406 pts/3R+   2973:56  |   \_ ls -F

the directory it's reading has around 200 files in it, I think.

When I catch it in kernel mode its stack looks like this:

% cat /proc/7406/stack
[] writeback_inodes_sb_nr+0x76/0x7d
[] writeback_inodes_sb_nr_if_idle+0x41/0x55
[] shrink_delalloc+0xaf/0x147 [btrfs]
[] reserve_metadata_bytes+0xf2/0x1a8 [btrfs]
[] btrfs_block_rsv_add+0x28/0x49 [btrfs]
[] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs]
[] start_transaction+0x13e/0x1d8 [btrfs]
[] btrfs_start_transaction+0x13/0x15 [btrfs]
[] btrfs_dirty_inode+0x6c/0xfd [btrfs]
[] __mark_inode_dirty+0x2b/0x1ab
[] touch_atime+0x10e/0x131
[] vfs_readdir+0x79/0xa0
[] sys_getdents+0x81/0xd1
[] system_call_fastpath+0x16/0x1b
[] 0x

vmstat is rather odd:

% vmstat 1
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa
 2  0  0 124348 177388 1127622800   1572430  2 10 87  1
 1  0  0 122108 177388 1127854000   536   112 2870 813309  2 13 83  
2
 2  0  0 119628 177388 1128085200   704 0 2938 887219  2 11 87  0
 2  0  0 117768 177388 1128268400   800 0 2936 883427  2 15 82  
1
 2  0  0 114872 177388 1128546400  1104 0 2819 883498  1  8 90  0
 1  0  0 112704 177388 1128738800   612 0 2961 880957  2 14 84  0
^C

One of the three block devices on this btrfs is reading a few hundred
blocks per second, but is nowhere near 100% busy; the CS count is
extremely high; and we have less than a full CPU spent in system time
(even though there are two ls processes "blocked").

If anyone can point me to a good place to start tracking down the root
cause, I'd appreciate it.

Thanks,
-andy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)

2011-01-10 Thread Andy Isaacson
On Mon, Jan 10, 2011 at 12:08:15AM -0800, Andy Isaacson wrote:
> On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote:
> > > [50010.838804] [ cut here ]
> > > [50010.838931] kernel BUG at fs/btrfs/inode.c:1616!
> > > [50010.839053] invalid opcode:  [#1] PREEMPT SMP
> [snip]
> > > [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1
> > 
> > Could you please pull from the master branch of the btrfs unstable tree.
> > We had a late fix that is related to this.
> 
> I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f, I saw

Just got this again, same BUG and same stack trace.  This machine +
fs ran great for several months under approximately the same workload,
but the FS is slowly filling up; is this a regression in .37 or is the
bug a function of the FS load?

-andy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)

2011-01-10 Thread Andy Isaacson
On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote:
> > [50010.838804] [ cut here ]
> > [50010.838931] kernel BUG at fs/btrfs/inode.c:1616!
> > [50010.839053] invalid opcode:  [#1] PREEMPT SMP
[snip]
> > [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1
> 
> Could you please pull from the master branch of the btrfs unstable tree.
> We had a late fix that is related to this.

I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f, I saw
your message and upgraded to Linus tip (0c21e3a) + btrfs-unstable tip
(65e5341), and I just saw it again.  Including both BUG traces below.

The machine is a Core i7 with 12GB, with btrfs spanning three volumes:

Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
Total devices 3 FS bytes used 735.97GB
devid1 size 18.65GB used 18.64GB path /dev/sda2
devid2 size 512.00GB used 511.88GB path /dev/sdb1
devid3 size 512.00GB used 225.26GB path /dev/sdc1

The primary writer to the filesystem is rtorrent; normally I have ffmpeg
writing to the filesystem at about 100 kbyte/sec as well, but it wasn't
running in this latest crash.

[ 9275.240027] [ cut here ]
[ 9275.249991] kernel BUG at fs/btrfs/inode.c:1616!
[ 9275.259914] invalid opcode:  [#1] SMP 
[ 9275.269794] last sysfs file: 
/sys/devices/pci:00/:00:1a.7/usb1/1-4/1-4:1.0/host8/target8:0:0/8:0:0:0/block/sdd/stat
[ 9275.280066] CPU 0 
[ 9275.280127] Modules linked in: tun ebtable_nat ebtables ipt_MASQUERADE 
iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack 
ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel 
kvm xfs exportfs loop snd_hda_codec_hdmi snd_hda_codec_realtek radeon ttm 
drm_kms_helper drm snd_hda_intel snd_hda_codec i2c_algo_bit snd_usb_audio 
uvcvideo snd_hwdep i2c_i801 snd_usbmidi_lib snd_pcm snd_rawmidi snd_timer 
videodev snd_seq_device snd v4l2_compat_ioctl32 pcspkr i2c_core serio_raw 
soundcore snd_page_alloc processor tpm_tis tpm tpm_bios evdev shpchp button 
thermal_sys ext3 jbd mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c 
usb_storage uas sd_mod crc_t10dif ehci_hcd usbcore ahci libahci libata r8169 
scsi_mod mii nls_base [last unloaded: scsi_wait_scan]
[ 9275.358450] 
[ 9275.369821] Pid: 3654, comm: btrfs-endio-wri Not tainted 
2.6.37-03739-gccda756 #73 MSI X58 Pro-E (MS-7522)/MS-7522
[ 9275.381570] RIP: 0010:[]  [] 
T.1234+0x76/0x201 [btrfs]
[ 9275.393380] RSP: 0018:88025f275c30  EFLAGS: 00010286
[ 9275.405100] RAX: ffe4 RBX: 88032b596b40 RCX: 88032b596c60
[ 9275.416865] RDX:  RSI: ea000b17b8d0 RDI: fff4
[ 9275.428666] RBP: 88025f275cc0 R08: 0005 R09: 88025f2759a0
[ 9275.440522] R10: 88025f275970 R11: dead00100100 R12: 880083a7e888
[ 9275.452374] R13: 06e0c000 R14: 880331d7c800 R15: 8800bb38d880
[ 9275.464146] FS:  () GS:8800bf40() 
knlGS:
[ 9275.475923] CS:  0010 DS:  ES:  CR0: 8005003b
[ 9275.487673] CR2: 7f3eada57000 CR3: 01603000 CR4: 26e0
[ 9275.499547] DR0:  DR1:  DR2: 
[ 9275.511395] DR3:  DR6: 0ff0 DR7: 0400
[ 9275.523192] Process btrfs-endio-wri (pid: 3654, threadinfo 88025f274000, 
task 88032554b020)
[ 9275.535168] Stack:
[ 9275.546852]  06e0c000 1000 00b8c1376000 
1000
[ 9275.558616]  880331d7c800 0001 8800bb38d880 
880331d7c800
[ 9275.570416]  88025f275cb0 a014b53f 88025f275ce0 
8802e21ad7f0
[ 9275.582149] Call Trace:
[ 9275.593731]  [] ? start_transaction+0x1a9/0x1d8 [btrfs]
[ 9275.605513]  [] btrfs_finish_ordered_io+0x1e6/0x2c2 [btrfs]
[ 9275.617426]  [] btrfs_writepage_end_io_hook+0x1a/0x1c 
[btrfs]
[ 9275.629403]  [] end_bio_extent_writepage+0xae/0x159 [btrfs]
[ 9275.641463]  [] bio_endio+0x2d/0x2f
[ 9275.653462]  [] end_workqueue_fn+0x111/0x120 [btrfs]
[ 9275.665484]  [] worker_loop+0x195/0x4c4 [btrfs]
[ 9275.677451]  [] ? worker_loop+0x0/0x4c4 [btrfs]
[ 9275.689317]  [] ? worker_loop+0x0/0x4c4 [btrfs]
[ 9275.701079]  [] kthread+0x82/0x8a
[ 9275.712839]  [] kernel_thread_helper+0x4/0x10
[ 9275.724455]  [] ? kthread+0x0/0x8a
[ 9275.735873]  [] ? kernel_thread_helper+0x0/0x10
[ 9275.747329] Code: 0f 0b eb fe 80 88 88 00 00 00 08 45 31 c9 48 8b 4d 88 4c 
8d 45 c0 4c 01 e9 4c 89 ea 4c 89 e6 4c 89 ff e8 7c 4c 00 00 85 c0 74 04 <0f> 0b 
eb fe 49 8b 84 24 a8 00 00 00 4c 89 6d a9 48 89 45 a0 c6 
[ 9275.771177] RIP  [] T.1234+0x76/0x201 [btrfs]
[ 9275.782973]  RSP 
[ 9276.049117] ---[ end trace 39a68c45b2f4ff68 ]---

The previous BUG also happened at runtime (not umount or anything), with
rtorrent running and ffmpeg writing a video stream from /dev/video0 to
btrfs.

[1719001.040399] [ cut here ]
[1719001.0415

Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-22 Thread Andy Isaacson
On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote:
> On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson  wrote:
> > I have a directory with 1.2M files in it, which makes readdir very slow
> > on btrfs with cold caches (although it's reasonably fast with hot caches
> > as in the first example below):
> 
> Sounds like:
> 
> Bug 21562 - btrfs is dead slow due to fragmentation
> https://bugzilla.kernel.org/show_bug.cgi?id=21562

Hmmm, how do I look at the btree layout for a given inode?

btrfs-image for this filesystem is 1.7GiB .bz2, so I'm afraid it's not
reasonable to publish it.

-andy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs: 21 minutes to read 1.2M file directory

2010-12-20 Thread Andy Isaacson
Sigh, wrong btrfs address on the original.  Apologies for the
double-post.

On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote:
> I have a directory with 1.2M files in it, which makes readdir very slow
> on btrfs with cold caches (although it's reasonably fast with hot caches
> as in the first example below):
> 
> % time find /btr/foo > /btr/foo.list
> find /btr/foo > /btr/foo.list  4.10s user 7.97s system 36% cpu 33.275 total
> % head /btr/foo.list
> /btr/foo
> /btr/foo/1281373625.777.fg.jpg
> /btr/foo/1281373625.777.bg.jpg
> /btr/foo/1281373625.948.fg.jpg
> /btr/foo/1281373625.948.bg.jpg
> /btr/foo/1281373626.096.fg.jpg
> /btr/foo/1281373626.096.bg.jpg
> /btr/foo/1281373626.218.fg.jpg
> /btr/foo/1281373626.218.bg.jpg
> /btr/foo/1281373626.350.fg.jpg
> % wc !$
> wc /btr/foo.list
>  1216  1216 401499940 /btr/foo.list
> % wc -l /btr/foo.list
> 1216 /btr/foo.list
> % sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
> vm.drop_caches = 3
> vm.drop_caches = 0
> % time find /btr/foo > /btr/foo.list.2
> find /btr/foo > /btr/foo.list.2  5.62s user 24.54s system 2% cpu 21:40.90 
> total
> % uname -a
> Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 
> x86_64 GNU/Linux
> 
> Interestingly, while readdir is busy I'm only seeing IO on sdb even
> though the btrfs is on 3 targets:
> 
> Label: btr  uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982
> Total devices 3 FS bytes used 555.13GB
> devid1 size 18.65GB used 18.64GB path /dev/sda2
> devid3 size 512.00GB used 44.13GB path /dev/sdc1
> devid2 size 512.00GB used 511.76GB path /dev/sdb1
> 
> "iostat -k 1 | grep sdb" tells me:
> 
> Device:tpskB_read/skB_wrtn/skB_readkB_wrtn
> 
> sdb 173.00   692.00 0.00692  0
> sdb 185.00   740.00 0.00740  0
> sdb 198.00   792.00 0.00792  0
> sdb 177.00   712.00 0.00712  0
> 
> I updated to a recent git and it's still slow (my test hasn't completed
> yet 19 minutes in):
> 
> Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 
> x86_64 GNU/Linux
> 
> The devices are:
> 
> [1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133
> [1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32)
> [1.835369] ata1.00: configured for UDMA/133
> [1.835776] scsi 0:0:0:0: Direct-Access ATA  INTEL SSDSA2M040 2CV1 
> PQ: 0 ANSI: 5
> ...
> [2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
> [2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [2.947393] ata3.00: configured for UDMA/133
> [2.947850] scsi 2:0:0:0: Direct-Access ATA  ST31500341AS CC1H 
> PQ: 0 ANSI: 5
> ...
> [3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133
> [4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32)
> [4.060838] ata5.00: configured for UDMA/133
> [4.061205] scsi 4:0:0:0: Direct-Access ATA  ST31500341AS CC1H 
> PQ: 0 ANSI: 5
> 
> The host is a "Intel(R) Core(TM) i7 CPU 930 @2.80GHz" with 12GB RAM.
> 
> Thanks,
> -andy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html