atime updates stuck under btrfs_dirty_inode
Hi all, I'm seeing processes block (for minutes or longer) in directory access on btrfs. I'm running commit ccda756d13f13a9276edc8d53d05aaecbf3c4baa Merge: 0c21e3a 65e5341 Author: Andy Isaacson Date: Sun Jan 9 12:37:23 2011 -0800 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable on a quad core Core i7. There seem to be a few failure cases. My most recent ls has been stuck like this for several hours, on a directory that should have somewhere between 6 and 50 entries: % ps u 15028 USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND adi 15028 0.0 0.0 27032 1204 pts/8S+ 22:18 0:00 ls -l /btr/foo % cat /proc/15028/stack [] shrink_delalloc+0x112/0x147 [btrfs] [] reserve_metadata_bytes+0xf2/0x1a8 [btrfs] [] btrfs_block_rsv_add+0x28/0x49 [btrfs] [] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs] [] start_transaction+0x13e/0x1d8 [btrfs] [] btrfs_start_transaction+0x13/0x15 [btrfs] [] btrfs_dirty_inode+0x6c/0xfd [btrfs] [] __mark_inode_dirty+0x2b/0x1ab [] touch_atime+0x10e/0x131 [] vfs_readdir+0x79/0xa0 [] sys_getdents+0x81/0xd1 [] tracesys+0xd9/0xde [] 0x Sometimes, if I poll fast enough I see this stack instead: % cat /proc/15028/stack [] wait_for_commit+0x99/0xe2 [btrfs] [] btrfs_commit_transaction+0xfe/0x612 [btrfs] [] start_transaction+0x151/0x1d8 [btrfs] [] btrfs_start_transaction+0x13/0x15 [btrfs] [] btrfs_dirty_inode+0x6c/0xfd [btrfs] [] __mark_inode_dirty+0x2b/0x1ab [] touch_atime+0x10e/0x131 [] vfs_readdir+0x79/0xa0 [] sys_getdents+0x81/0xd1 [] tracesys+0xd9/0xde [] 0x Another ls has been accumulating runtime for several days: 7406 pts/3R+ 2973:56 | \_ ls -F the directory it's reading has around 200 files in it, I think. When I catch it in kernel mode its stack looks like this: % cat /proc/7406/stack [] writeback_inodes_sb_nr+0x76/0x7d [] writeback_inodes_sb_nr_if_idle+0x41/0x55 [] shrink_delalloc+0xaf/0x147 [btrfs] [] reserve_metadata_bytes+0xf2/0x1a8 [btrfs] [] btrfs_block_rsv_add+0x28/0x49 [btrfs] [] btrfs_trans_reserve_metadata+0x4d/0x75 [btrfs] [] start_transaction+0x13e/0x1d8 [btrfs] [] btrfs_start_transaction+0x13/0x15 [btrfs] [] btrfs_dirty_inode+0x6c/0xfd [btrfs] [] __mark_inode_dirty+0x2b/0x1ab [] touch_atime+0x10e/0x131 [] vfs_readdir+0x79/0xa0 [] sys_getdents+0x81/0xd1 [] system_call_fastpath+0x16/0x1b [] 0x vmstat is rather odd: % vmstat 1 procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 2 0 0 124348 177388 1127622800 1572430 2 10 87 1 1 0 0 122108 177388 1127854000 536 112 2870 813309 2 13 83 2 2 0 0 119628 177388 1128085200 704 0 2938 887219 2 11 87 0 2 0 0 117768 177388 1128268400 800 0 2936 883427 2 15 82 1 2 0 0 114872 177388 1128546400 1104 0 2819 883498 1 8 90 0 1 0 0 112704 177388 1128738800 612 0 2961 880957 2 14 84 0 ^C One of the three block devices on this btrfs is reading a few hundred blocks per second, but is nowhere near 100% busy; the CS count is extremely high; and we have less than a full CPU spent in system time (even though there are two ls processes "blocked"). If anyone can point me to a good place to start tracking down the root cause, I'd appreciate it. Thanks, -andy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)
On Mon, Jan 10, 2011 at 12:08:15AM -0800, Andy Isaacson wrote: > On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote: > > > [50010.838804] [ cut here ] > > > [50010.838931] kernel BUG at fs/btrfs/inode.c:1616! > > > [50010.839053] invalid opcode: [#1] PREEMPT SMP > [snip] > > > [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1 > > > > Could you please pull from the master branch of the btrfs unstable tree. > > We had a late fix that is related to this. > > I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f, I saw Just got this again, same BUG and same stack trace. This machine + fs ran great for several months under approximately the same workload, but the FS is slowly filling up; is this a regression in .37 or is the bug a function of the FS load? -andy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2.6.37 BUG at inode.c:1616 (was Re: 2.6.37: Bug on btrfs while umount)
On Thu, Jan 06, 2011 at 08:29:12PM -0500, Chris Mason wrote: > > [50010.838804] [ cut here ] > > [50010.838931] kernel BUG at fs/btrfs/inode.c:1616! > > [50010.839053] invalid opcode: [#1] PREEMPT SMP [snip] > > [50010.839653] Pid: 1681, comm: btrfs-endio-wri Not tainted 2.6.37 #1 > > Could you please pull from the master branch of the btrfs unstable tree. > We had a late fix that is related to this. I saw BUG at inode.c:1616 while running 2.6.37-rc6-11882-g55ec86f, I saw your message and upgraded to Linus tip (0c21e3a) + btrfs-unstable tip (65e5341), and I just saw it again. Including both BUG traces below. The machine is a Core i7 with 12GB, with btrfs spanning three volumes: Label: btr uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982 Total devices 3 FS bytes used 735.97GB devid1 size 18.65GB used 18.64GB path /dev/sda2 devid2 size 512.00GB used 511.88GB path /dev/sdb1 devid3 size 512.00GB used 225.26GB path /dev/sdc1 The primary writer to the filesystem is rtorrent; normally I have ffmpeg writing to the filesystem at about 100 kbyte/sec as well, but it wasn't running in this latest crash. [ 9275.240027] [ cut here ] [ 9275.249991] kernel BUG at fs/btrfs/inode.c:1616! [ 9275.259914] invalid opcode: [#1] SMP [ 9275.269794] last sysfs file: /sys/devices/pci:00/:00:1a.7/usb1/1-4/1-4:1.0/host8/target8:0:0/8:0:0:0/block/sdd/stat [ 9275.280066] CPU 0 [ 9275.280127] Modules linked in: tun ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc kvm_intel kvm xfs exportfs loop snd_hda_codec_hdmi snd_hda_codec_realtek radeon ttm drm_kms_helper drm snd_hda_intel snd_hda_codec i2c_algo_bit snd_usb_audio uvcvideo snd_hwdep i2c_i801 snd_usbmidi_lib snd_pcm snd_rawmidi snd_timer videodev snd_seq_device snd v4l2_compat_ioctl32 pcspkr i2c_core serio_raw soundcore snd_page_alloc processor tpm_tis tpm tpm_bios evdev shpchp button thermal_sys ext3 jbd mbcache dm_mod btrfs zlib_deflate crc32c libcrc32c usb_storage uas sd_mod crc_t10dif ehci_hcd usbcore ahci libahci libata r8169 scsi_mod mii nls_base [last unloaded: scsi_wait_scan] [ 9275.358450] [ 9275.369821] Pid: 3654, comm: btrfs-endio-wri Not tainted 2.6.37-03739-gccda756 #73 MSI X58 Pro-E (MS-7522)/MS-7522 [ 9275.381570] RIP: 0010:[] [] T.1234+0x76/0x201 [btrfs] [ 9275.393380] RSP: 0018:88025f275c30 EFLAGS: 00010286 [ 9275.405100] RAX: ffe4 RBX: 88032b596b40 RCX: 88032b596c60 [ 9275.416865] RDX: RSI: ea000b17b8d0 RDI: fff4 [ 9275.428666] RBP: 88025f275cc0 R08: 0005 R09: 88025f2759a0 [ 9275.440522] R10: 88025f275970 R11: dead00100100 R12: 880083a7e888 [ 9275.452374] R13: 06e0c000 R14: 880331d7c800 R15: 8800bb38d880 [ 9275.464146] FS: () GS:8800bf40() knlGS: [ 9275.475923] CS: 0010 DS: ES: CR0: 8005003b [ 9275.487673] CR2: 7f3eada57000 CR3: 01603000 CR4: 26e0 [ 9275.499547] DR0: DR1: DR2: [ 9275.511395] DR3: DR6: 0ff0 DR7: 0400 [ 9275.523192] Process btrfs-endio-wri (pid: 3654, threadinfo 88025f274000, task 88032554b020) [ 9275.535168] Stack: [ 9275.546852] 06e0c000 1000 00b8c1376000 1000 [ 9275.558616] 880331d7c800 0001 8800bb38d880 880331d7c800 [ 9275.570416] 88025f275cb0 a014b53f 88025f275ce0 8802e21ad7f0 [ 9275.582149] Call Trace: [ 9275.593731] [] ? start_transaction+0x1a9/0x1d8 [btrfs] [ 9275.605513] [] btrfs_finish_ordered_io+0x1e6/0x2c2 [btrfs] [ 9275.617426] [] btrfs_writepage_end_io_hook+0x1a/0x1c [btrfs] [ 9275.629403] [] end_bio_extent_writepage+0xae/0x159 [btrfs] [ 9275.641463] [] bio_endio+0x2d/0x2f [ 9275.653462] [] end_workqueue_fn+0x111/0x120 [btrfs] [ 9275.665484] [] worker_loop+0x195/0x4c4 [btrfs] [ 9275.677451] [] ? worker_loop+0x0/0x4c4 [btrfs] [ 9275.689317] [] ? worker_loop+0x0/0x4c4 [btrfs] [ 9275.701079] [] kthread+0x82/0x8a [ 9275.712839] [] kernel_thread_helper+0x4/0x10 [ 9275.724455] [] ? kthread+0x0/0x8a [ 9275.735873] [] ? kernel_thread_helper+0x0/0x10 [ 9275.747329] Code: 0f 0b eb fe 80 88 88 00 00 00 08 45 31 c9 48 8b 4d 88 4c 8d 45 c0 4c 01 e9 4c 89 ea 4c 89 e6 4c 89 ff e8 7c 4c 00 00 85 c0 74 04 <0f> 0b eb fe 49 8b 84 24 a8 00 00 00 4c 89 6d a9 48 89 45 a0 c6 [ 9275.771177] RIP [] T.1234+0x76/0x201 [btrfs] [ 9275.782973] RSP [ 9276.049117] ---[ end trace 39a68c45b2f4ff68 ]--- The previous BUG also happened at runtime (not umount or anything), with rtorrent running and ffmpeg writing a video stream from /dev/video0 to btrfs. [1719001.040399] [ cut here ] [1719001.0415
Re: btrfs: 21 minutes to read 1.2M file directory
On Tue, Dec 21, 2010 at 03:07:33AM +0200, Felipe Contreras wrote: > On Tue, Dec 21, 2010 at 12:24 AM, Andy Isaacson wrote: > > I have a directory with 1.2M files in it, which makes readdir very slow > > on btrfs with cold caches (although it's reasonably fast with hot caches > > as in the first example below): > > Sounds like: > > Bug 21562 - btrfs is dead slow due to fragmentation > https://bugzilla.kernel.org/show_bug.cgi?id=21562 Hmmm, how do I look at the btree layout for a given inode? btrfs-image for this filesystem is 1.7GiB .bz2, so I'm afraid it's not reasonable to publish it. -andy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs: 21 minutes to read 1.2M file directory
Sigh, wrong btrfs address on the original. Apologies for the double-post. On Mon, Dec 20, 2010 at 02:24:46PM -0800, Andy Isaacson wrote: > I have a directory with 1.2M files in it, which makes readdir very slow > on btrfs with cold caches (although it's reasonably fast with hot caches > as in the first example below): > > % time find /btr/foo > /btr/foo.list > find /btr/foo > /btr/foo.list 4.10s user 7.97s system 36% cpu 33.275 total > % head /btr/foo.list > /btr/foo > /btr/foo/1281373625.777.fg.jpg > /btr/foo/1281373625.777.bg.jpg > /btr/foo/1281373625.948.fg.jpg > /btr/foo/1281373625.948.bg.jpg > /btr/foo/1281373626.096.fg.jpg > /btr/foo/1281373626.096.bg.jpg > /btr/foo/1281373626.218.fg.jpg > /btr/foo/1281373626.218.bg.jpg > /btr/foo/1281373626.350.fg.jpg > % wc !$ > wc /btr/foo.list > 1216 1216 401499940 /btr/foo.list > % wc -l /btr/foo.list > 1216 /btr/foo.list > % sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0 > vm.drop_caches = 3 > vm.drop_caches = 0 > % time find /btr/foo > /btr/foo.list.2 > find /btr/foo > /btr/foo.list.2 5.62s user 24.54s system 2% cpu 21:40.90 > total > % uname -a > Linux pyron 2.6.36-rc7-00149-g29979aa #71 SMP Wed Oct 13 09:42:57 PDT 2010 > x86_64 GNU/Linux > > Interestingly, while readdir is busy I'm only seeing IO on sdb even > though the btrfs is on 3 targets: > > Label: btr uuid: 1271de53-b3d2-4d68-9d48-b19487e1c982 > Total devices 3 FS bytes used 555.13GB > devid1 size 18.65GB used 18.64GB path /dev/sda2 > devid3 size 512.00GB used 44.13GB path /dev/sdc1 > devid2 size 512.00GB used 511.76GB path /dev/sdb1 > > "iostat -k 1 | grep sdb" tells me: > > Device:tpskB_read/skB_wrtn/skB_readkB_wrtn > > sdb 173.00 692.00 0.00692 0 > sdb 185.00 740.00 0.00740 0 > sdb 198.00 792.00 0.00792 0 > sdb 177.00 712.00 0.00712 0 > > I updated to a recent git and it's still slow (my test hasn't completed > yet 19 minutes in): > > Linux pyron 2.6.37-rc6-11882-g55ec86f #72 SMP Mon Dec 20 13:34:38 PST 2010 > x86_64 GNU/Linux > > The devices are: > > [1.834527] ata1.00: ATA-7: INTEL SSDSA2M040G2GC, 2CV102HD, max UDMA/133 > [1.834816] ata1.00: 78165360 sectors, multi 1: LBA48 NCQ (depth 31/32) > [1.835369] ata1.00: configured for UDMA/133 > [1.835776] scsi 0:0:0:0: Direct-Access ATA INTEL SSDSA2M040 2CV1 > PQ: 0 ANSI: 5 > ... > [2.904919] ata3.00: ATA-8: ST31500341AS, CC1H, max UDMA/133 > [2.905206] ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32) > [2.947393] ata3.00: configured for UDMA/133 > [2.947850] scsi 2:0:0:0: Direct-Access ATA ST31500341AS CC1H > PQ: 0 ANSI: 5 > ... > [3.989664] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) > [4.018524] ata5.00: ATA-8: ST31500341AS, CC1H, max UDMA/133 > [4.018811] ata5.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 31/32) > [4.060838] ata5.00: configured for UDMA/133 > [4.061205] scsi 4:0:0:0: Direct-Access ATA ST31500341AS CC1H > PQ: 0 ANSI: 5 > > The host is a "Intel(R) Core(TM) i7 CPU 930 @2.80GHz" with 12GB RAM. > > Thanks, > -andy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html