lockdep warnings
Running 2.6.37-rc1 with lockdep enabled, I see the following warning.
It occurred while rsync was running a backup.

Nov 14 12:03:31 nehalam kernel: [ 5527.284541] =============================================
Nov 14 12:03:31 nehalam kernel: [ 5527.284544] [ INFO: possible recursive locking detected ]
Nov 14 12:03:31 nehalam kernel: [ 5527.284546] 2.6.37-rc1+ #67
Nov 14 12:03:31 nehalam kernel: [ 5527.284547] ---------------------------------------------
Nov 14 12:03:31 nehalam kernel: [ 5527.284549] rsync/2782 is trying to acquire lock:
Nov 14 12:03:31 nehalam kernel: [ 5527.284551]  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_try_spin_lock+0x53/0xd1 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284567]
Nov 14 12:03:31 nehalam kernel: [ 5527.284567] but task is already holding lock:
Nov 14 12:03:31 nehalam kernel: [ 5527.284569]  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_clear_lock_blocking+0x22/0x2c [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284581]
Nov 14 12:03:31 nehalam kernel: [ 5527.284581] other info that might help us debug this:
Nov 14 12:03:31 nehalam kernel: [ 5527.284583] 2 locks held by rsync/2782:
Nov 14 12:03:31 nehalam kernel: [ 5527.284585]  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [] do_lookup+0x9d/0x10d
Nov 14 12:03:31 nehalam kernel: [ 5527.284592]  #1:  (&(&eb->lock)->rlock){+.+...}, at: [] btrfs_clear_lock_blocking+0x22/0x2c [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284605]
Nov 14 12:03:31 nehalam kernel: [ 5527.284605] stack backtrace:
Nov 14 12:03:31 nehalam kernel: [ 5527.284607] Pid: 2782, comm: rsync Not tainted 2.6.37-rc1+ #67
Nov 14 12:03:31 nehalam kernel: [ 5527.284609] Call Trace:
Nov 14 12:03:31 nehalam kernel: [ 5527.284615]  [] __lock_acquire+0xc7a/0xcf1
Nov 14 12:03:31 nehalam kernel: [ 5527.284619]  [] ? activate_page+0x130/0x13f
Nov 14 12:03:31 nehalam kernel: [ 5527.284622]  [] lock_acquire+0xd1/0xf7
Nov 14 12:03:31 nehalam kernel: [ 5527.284633]  [] ? btrfs_try_spin_lock+0x53/0xd1 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284638]  [] _raw_spin_lock+0x31/0x40
Nov 14 12:03:31 nehalam kernel: [ 5527.284648]  [] ? btrfs_try_spin_lock+0x53/0xd1 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284659]  [] ? btrfs_clear_lock_blocking+0x22/0x2c [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284669]  [] btrfs_try_spin_lock+0x53/0xd1 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284677]  [] btrfs_search_slot+0x3e6/0x513 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284687]  [] btrfs_lookup_inode+0x2f/0x8f [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284698]  [] ? btrfs_init_locked_inode+0x0/0x2e [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284709]  [] btrfs_iget+0xc3/0x415 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284721]  [] btrfs_lookup_dentry+0x105/0x3c4 [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284724]  [] ? trace_hardirqs_on+0xd/0xf
Nov 14 12:03:31 nehalam kernel: [ 5527.284735]  [] btrfs_lookup+0x16/0x2e [btrfs]
Nov 14 12:03:31 nehalam kernel: [ 5527.284738]  [] d_alloc_and_lookup+0x55/0x74
Nov 14 12:03:31 nehalam kernel: [ 5527.284741]  [] do_lookup+0xbb/0x10d
Nov 14 12:03:31 nehalam kernel: [ 5527.284744]  [] link_path_walk+0x2a6/0x3fc
Nov 14 12:03:31 nehalam kernel: [ 5527.284746]  [] path_walk+0x69/0xd9
Nov 14 12:03:31 nehalam kernel: [ 5527.284750]  [] ? strncpy_from_user+0x48/0x76
Nov 14 12:03:31 nehalam kernel: [ 5527.284753]  [] do_path_lookup+0x2a/0x4f
Nov 14 12:03:31 nehalam kernel: [ 5527.284756]  [] user_path_at+0x56/0x9a
Nov 14 12:03:31 nehalam kernel: [ 5527.284760]  [] ? might_fault+0x5c/0xac
Nov 14 12:03:31 nehalam kernel: [ 5527.284764]  [] ? cp_new_stat+0xf7/0x10d
Nov 14 12:03:31 nehalam kernel: [ 5527.284767]  [] vfs_fstatat+0x37/0x62
Nov 14 12:03:31 nehalam kernel: [ 5527.284770]  [] vfs_lstat+0x1e/0x20
Nov 14 12:03:31 nehalam kernel: [ 5527.284772]  [] sys_newlstat+0x1f/0x3d
Nov 14 12:03:31 nehalam kernel: [ 5527.284776]  [] ? trace_hardirqs_on_caller+0x118/0x13c
Nov 14 12:03:31 nehalam kernel: [ 5527.284779]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
Nov 14 12:03:31 nehalam kernel: [ 5527.284783]  [] system_call_fastpath+0x16/0x1b
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
namespace routines that could be static
I got namespace.pl working again, and it showed that the following
routines could be declared static.

fs/btrfs/ctree
  btrfs_clear_path_blocking
  btrfs_insert_some_items
  btrfs_prev_leaf
fs/btrfs/delayed-ref
  btrfs_delayed_ref_pending
fs/btrfs/dir-item
  btrfs_match_dir_item_name
fs/btrfs/disk-io
  btrfs_congested_async
  btrfs_lookup_fs_root
  btrfs_read_fs_root
  write_all_supers
fs/btrfs/extent-tree
  block_rsv_release_bytes
  btrfs_get_block_group
  btrfs_init_new_buffer
fs/btrfs/extent_io
  extent_bmap
  extent_commit_write
  extent_prepare_write
  set_range_dirty
  wait_extent_bit
  wait_on_extent_buffer_writeback
  wait_on_extent_writeback
fs/btrfs/file-item
  btrfs_lookup_csum
fs/btrfs/free-space-cache
  btrfs_block_group_free_space
fs/btrfs/inode
  btrfs_orphan_del
  btrfs_writepages
fs/btrfs/inode-map
  btrfs_find_highest_inode
fs/btrfs/ioctl
  btrfs_ioctl_space_info
fs/btrfs/locking
  btrfs_try_tree_lock
fs/btrfs/print-tree
  btrfs_print_tree
fs/btrfs/root-tree
  btrfs_search_root
fs/btrfs/struct-funcs
  btrfs_device_bandwidth
  btrfs_device_group
  btrfs_device_seek_speed
  btrfs_device_start_offset
  btrfs_dir_transid
  btrfs_disk_block_group_chunk_objectid
  btrfs_disk_block_group_flags
  btrfs_disk_root_generation
  btrfs_disk_root_level
  btrfs_file_extent_generation
  btrfs_inode_transid
  btrfs_set_chunk_io_align
  btrfs_set_chunk_io_width
  btrfs_set_chunk_length
  btrfs_set_chunk_num_stripes
  btrfs_set_chunk_owner
  btrfs_set_chunk_sector_size
  btrfs_set_chunk_stripe_len
  btrfs_set_chunk_sub_stripes
  btrfs_set_chunk_type
  btrfs_set_disk_block_group_chunk_objectid
  btrfs_set_disk_block_group_flags
  btrfs_set_disk_block_group_used
  btrfs_set_disk_root_bytenr
  btrfs_set_disk_root_generation
  btrfs_set_disk_root_level
  btrfs_set_disk_root_refs
  btrfs_set_extent_refs_v0
  btrfs_set_ref_generation_v0
  btrfs_set_ref_objectid_v0
  btrfs_set_ref_root_v0
  btrfs_set_stripe_devid
  btrfs_set_stripe_offset
fs/btrfs/sysfs
  btrfs_sysfs_add_root
  btrfs_sysfs_add_super
  btrfs_sysfs_del_root
  btrfs_sysfs_del_super
fs/btrfs/tree-log
  btrfs_log_inode_parent
fs/btrfs/volumes
  btrfs_add_device
  btrfs_alloc_dev_extent
  btrfs_lock_volumes
  btrfs_read_super_device
  btrfs_unlock_volumes
  btrfs_unplug_page
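For readers unfamiliar with what the report above asks for, here is a minimal sketch of the change in plain C. The function names are hypothetical and not taken from btrfs: a helper that is called only from its own file gets `static`, which gives it internal linkage and removes it from the global symbol namespace.

```c
#include <stddef.h>

/*
 * Hypothetical helper, called only from this file. Marking it 'static'
 * gives it internal linkage, so it no longer shows up as a global symbol.
 */
static size_t count_set_bits(unsigned long word)
{
	size_t n = 0;

	while (word) {
		word &= word - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical exported entry point: the only symbol in this file that
 * other translation units can link against.
 */
size_t popcount_array(const unsigned long *words, size_t len)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < len; i++)
		total += count_set_bits(words[i]);
	return total;
}
```

A routine can only be made static once every caller outside its file is gone and its prototype is removed from shared headers, so each entry in the list still needs to be checked by hand.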
Re: Raid1 with failing drive
On Wed, 29 Oct 2008 14:02:04 -0600
Joe Peterson <[EMAIL PROTECTED]> wrote:

> Chris Mason wrote:
> > On Tue, 2008-10-28 at 16:48 -0700, Stephen Hemminger wrote:
> >> I have a system with a pair of small/fast but unreliable scsi drives.
> >> I tried setting up a raid1 configuration and using it for builds.
> >> Using 2.6.26.7 and btrfs 0.16. When using ext3 (no raid) on same
> >> partition, the driver would recalibrate and log something and keep
> >> going. But with btrfs it doesn't recover and takes drive offline.
> >>
> >
> > Btrfs doesn't really take drives offline. In the future we'll notice
> > that a drive is returning all errors, but for now we'll probably just
> > keep beating on it.
>
> It can also detect when a bad checksum is returned or the drive returns
> an i/o error, right? Would the "all-zero" test be a heuristic in case
> neither of those happened (but I cannot imagine why the zeros would get
> by the checksum check)?
>
> > The IO error handling code in btrfs currently expects it'll be able to
> > find at least one good mirror. You're probably hitting some bad
> > conditions as it fails to clean up.
>
> What happens (or rather, will happen) on a regular/non-mirrored btrfs?
> Would it then return an i/o error to the user and/or mark a block as
> bad? In ZFS, the state of the volume changes, noting an issue (also
> happens on a scrub), and the user can check this. What I don't like
> about ZFS is that the user can clear the condition, and then it appears
> OK again until another scrub.
>
> -Joe

I think my problem was that the metadata was mirrored but not the actual
data. This led to total meltdown when the data got an error.
Re: btrfs_tree_lock & trylock
On Mon, 08 Sep 2008 12:20:32 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:

> On Mon, 2008-09-08 at 12:13 -0400, jim owens wrote:
> > Chris Mason wrote:
> > >> My guess is that the improvement happens mostly from the first
> > >> couple of tries, not from repeated spinning. And since it is a
> > >> mutex, you could even do:
> > >
> > > I started with lower spin counts, I really didn't want to spin at all
> > > but the current values came from trial and error.
> >
> > Exactly the problem Steven is saying about adaptive locking.
> >
> > Using benchmarks (or any test), on a small sample of systems
> > leads you to conclude "this design/tuning combination is better".
> >
> > I've been burned repeatedly by that... ugly things happen
> > as you move away from your design testing center.
> >
> > I'm not saying your code does not work, just that we need
> > a lot more proof with different configurations and loads
> > to see that is "at least no worse".
>
> Oh, I completely agree. This is tuned on just one CPU in a handful of
> workloads. In general, it makes sense to spin for about as long as it
> takes someone to do a btree search in the block, which we could
> benchmark up front at mount time.
>
> I could also get better results from an API where the holder of the lock
> indicates it is going to hold on to things for a while, which might
> happen right before doing an IO.
>
> Over the long term these are important issues, but for today I'm focused
> on the disk format ;)
>
> -chris
>

Not to mention the problem that developers seem to have faster machines
than the average user, but slower than enterprise and future-generation
CPUs. So any tuning value seems to get out of date fast.
Re: btrfs_tree_lock & trylock
On Mon, 8 Sep 2008 17:47:14 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Mon, Sep 08, 2008 at 08:07:51AM -0700, Stephen Hemminger wrote:
> > On Mon, 8 Sep 2008 16:20:52 +0200
> > Andi Kleen <[EMAIL PROTECTED]> wrote:
> >
> > > On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > > > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > > > The idea is to try to spin for a bit to avoid scheduling away,
> > > > > > which is especially important for the high levels. Most
> > > > > > holders of the mutex let it go very quickly.
> > > > >
> > > > > Ok but that surely should be implemented in the general mutex
> > > > > code then or at least in a standard adaptive mutex wrapper?
> > > >
> > > > That depends, am I the only one crazy enough to think it's a good
> > > > idea?
> > >
> > > Adaptive mutexes are classic, a lot of other OS have it.
> >
> > The problem is that they are a nuisance. It is impossible to choose
> > the right trade off between spin and no-spin, also they optimize for
> > a case that doesn't occur often enough to be justified.
>
> At least the numbers done by Gregory et.al. were dramatic improvements.
> Given that was an extreme case in that the rt kernel does everything
> with mutexes, but it was still a very clear win on a wide range
> of workloads.
>
> -Andi

My guess is that the improvement happens mostly from the first couple of
tries, not from repeated spinning. And since it is a mutex, you could
even do:

	/* mutex_trylock() returns 1 when the lock was acquired */
	if (mutex_trylock(&eb->mutex))
		return 0;
	cpu_relax();
	if (mutex_trylock(&eb->mutex))
		return 0;
	yield();
	/* mutex_lock() returns void, so block here and then report success */
	mutex_lock(&eb->mutex);
	return 0;
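The try, relax, try, yield, block sequence suggested above can be tried out in userspace. What follows is only a sketch using POSIX threads, not the btrfs code: `struct demo_buffer`, `demo_cpu_relax()` and `demo_tree_lock()` are hypothetical names, `pthread_mutex_trylock()` stands in for the kernel's `mutex_trylock()` (note that it returns 0 on success, the opposite convention), and `sched_yield()` stands in for `yield()`.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for an extent buffer; only the lock matters here. */
struct demo_buffer {
	pthread_mutex_t mutex;
};

/* Userspace approximation of the kernel's cpu_relax(). */
static inline void demo_cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");	/* compiler barrier only */
#endif
}

/*
 * Two cheap attempts, one voluntary yield, then a blocking acquire.
 * Returns 0 with the mutex held, or an error code from pthread_mutex_lock().
 */
int demo_tree_lock(struct demo_buffer *eb)
{
	if (pthread_mutex_trylock(&eb->mutex) == 0)
		return 0;
	demo_cpu_relax();
	if (pthread_mutex_trylock(&eb->mutex) == 0)
		return 0;
	sched_yield();
	return pthread_mutex_lock(&eb->mutex);
}
```

The point of the shape is that there is no tunable spin count at all: the caller pays for at most two failed attempts before it falls back to sleeping, which sidesteps the question of how long to spin on a given machine.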
Re: btrfs_tree_lock & trylock
On Mon, 8 Sep 2008 16:20:52 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:

> On Mon, Sep 08, 2008 at 10:02:30AM -0400, Chris Mason wrote:
> > On Mon, 2008-09-08 at 15:54 +0200, Andi Kleen wrote:
> > > > The idea is to try to spin for a bit to avoid scheduling away, which is
> > > > especially important for the high levels. Most holders of the mutex
> > > > let it go very quickly.
> > >
> > > Ok but that surely should be implemented in the general mutex code then
> > > or at least in a standard adaptive mutex wrapper?
> >
> > That depends, am I the only one crazy enough to think it's a good idea?
>
> Adaptive mutexes are classic, a lot of other OS have it.

The problem is that they are a nuisance. It is impossible to choose
the right trade-off between spin and no-spin, and they optimize for
a case that doesn't occur often enough to be justified.

People seem to repeatedly come up with adaptive mutexes based on an
intuitive hunch, and never do much analysis before or afterwards.
You need some facts to come up with a useful model:
  * % of time the lock is contended
  * average lock hold time
  * overhead of entry-exit for the lock primitive (spin time)
  * overhead of the adaptive version compared with either pure spin
    or pure mutex

Also, adaptive locks have even worse unfairness than spin locks under
contention.
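To show how the facts listed above could feed a model, here is a deliberately crude sketch. Everything in it is an assumption made for illustration and does not come from the thread: the struct and function names are hypothetical, a spinning waiter is assumed to arrive on average halfway through the hold, and a sleeping waiter is assumed to pay one fixed sleep/wake cost. The adaptive variant is left out because its overhead is exactly the measurement the message says is missing.

```c
/* Measured inputs for one lock; all fields are hypothetical names. */
struct lock_profile {
	double contended_fraction;	/* 0.0 .. 1.0 of acquisitions */
	double avg_hold_ns;		/* average lock hold time */
	double spin_overhead_ns;	/* entry+exit cost of a pure spinlock */
	double mutex_overhead_ns;	/* entry+exit cost of an uncontended mutex */
	double sleep_wake_ns;		/* cost of scheduling away and back */
};

/* Expected cost per acquisition when waiters spin (toy assumption:
 * a waiter arrives, on average, halfway through the current hold). */
double expected_spin_cost_ns(const struct lock_profile *p)
{
	return p->spin_overhead_ns +
	       p->contended_fraction * (p->avg_hold_ns / 2.0);
}

/* Expected cost per acquisition when waiters sleep. */
double expected_sleep_cost_ns(const struct lock_profile *p)
{
	return p->mutex_overhead_ns +
	       p->contended_fraction * p->sleep_wake_ns;
}

/* Returns 1 if pure spinning is cheaper under this toy model, else 0. */
int spinning_is_cheaper(const struct lock_profile *p)
{
	return expected_spin_cost_ns(p) < expected_sleep_cost_ns(p);
}
```

With made-up inputs of 25% contention, an 800 ns hold, 20 ns spin overhead, 40 ns mutex overhead and a 4000 ns sleep/wake cost, the model gives 120 ns for spinning against 1040 ns for sleeping. Lengthen the hold time or raise the contention and the answer flips, which is why the inputs have to be measured rather than guessed.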
Re: btrfs day 1
On Thu, 14 Aug 2008 14:21:22 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:

> On Thu, 2008-08-14 at 11:06 -0700, Stephen Hemminger wrote:
>
> > > So, the question is why the kernel compile workload works for me. What
> > > kind of hardware are you running (ram, cpu, disks?)
> >
> > Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
> > Memory 2G
> > Disk 80G (partition was 20G)
> >
>
> It seems you have the secret to corrupting things. I'll try to
> reproduce with smaller partitions and less ram here.
>
> -chris
>

Actually, the partition that got corrupted was 60G
Re: btrfs day 1
On Thu, 14 Aug 2008 13:26:00 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:

> On Thu, 2008-08-14 at 09:19 -0700, Stephen Hemminger wrote:
> > On Thu, 14 Aug 2008 06:25:14 -0400
> > Chris Mason <[EMAIL PROTECTED]> wrote:
> >
> > > On Thu, 2008-08-14 at 00:11 -0700, Stephen Hemminger wrote:
> > > > Set up new 60G home partition on laptop as a real life test of 0.16.
> > > > Using Ubuntu standard kernel 2.6.24-19-generic on i386
> > > >
> > >
> > > Thanks for giving things a try
> > >
> > > > I notice that during normal (busy time) everything seems fine, but
> > > > after going away for a while and coming back, it seems sluggish.
> > > > Lots of errors in log:
> > > >
> > > > btrfs csum failed ino 139988 off 4583424 csum 3821684403 private 0
> > > > btrfs csum failed ino 139988 off 4579328 csum 3233603900 private 0
> > > > btrfs csum failed ino 139988 off 4575232 csum 306171610 private 0
> > > >
> > > > Maybe it isn't handling spindown properly? or something like that?
> > >
> > > Were these the only errors in the log, or did you have other errors
> > > about not being able to find specific csums?
> > >
> > > What does 'going away for a while and coming back' include?
> >
> > 1. Start kernel build
> > 2. Come back 2+ hrs later
> >
> > (So problem could be in step 1 or 2)
> >
> > All failures are on the same inode
>
> Ok, my guess is that if you find . -inum 139988 you'll get some form of
> vmlinux or .tmp_vmlinux*
>
> So, the question is why the kernel compile workload works for me. What
> kind of hardware are you running (ram, cpu, disks?)

Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz
Memory 2G
Disk 80G (partition was 20G)

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
04:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 14)
0c:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN Network Connection (rev 61)
1c:03.0 CardBus bridge: O2 Micro, Inc. OZ711SP1 Memory CardBus Controller (rev 01)
1c:03.2 SD Host controller: O2 Micro, Inc. Integrated MMC/SD Controller (rev 02)
1c:03.3 Mass storage controller: O2 Micro, Inc. Integrated MS/xD Controller (rev 01)

I tried rsyncing from a usb disk onto a clean filesystem and got more
trouble. This was with the 2.6.26.15 kernel and the btrfs 0.16 module.

[ 136.456741] device fsid 14cc68e1c545e25-7f9a584b5e79ea84 devid 1 transid 12 /dev/sda3
[ 349.390467] bad mapping eb start 50405376 len 4096, wanted 421088093 4
[ 349.390467] ------------[ cut here ]------------
[ 349.390467] WARNING: at /home/shemminger/src/btrfs-0.16/extent_io.c:3180 map_private_extent_buffer+0x86/0xfd [btrfs]()
[ 349.390467] Modules linked in: usb_storage i915 drm rfcomm l2cap ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_stats cpufreq_conservative sbs sbshc wmi container iptable_filter ip_tables x_tables dm_crypt dm_mod tcp_htcp btrfs snd_hda_intel snd_pcm_oss snd_mixer_oss pcmcia snd_pcm snd_page_alloc snd_hwdep snd_seq_dummy snd_seq_oss
Re: btrfs day 1
On Thu, 14 Aug 2008 06:25:14 -0400
Chris Mason <[EMAIL PROTECTED]> wrote:

> On Thu, 2008-08-14 at 00:11 -0700, Stephen Hemminger wrote:
> > Set up new 60G home partition on laptop as a real life test of 0.16.
> > Using Ubuntu standard kernel 2.6.24-19-generic on i386
> >
>
> Thanks for giving things a try
>
> > I notice that during normal (busy time) everything seems fine, but
> > after going away for a while and coming back, it seems sluggish.
> > Lots of errors in log:
> >
> > btrfs csum failed ino 139988 off 4583424 csum 3821684403 private 0
> > btrfs csum failed ino 139988 off 4579328 csum 3233603900 private 0
> > btrfs csum failed ino 139988 off 4575232 csum 306171610 private 0
> >
> > Maybe it isn't handling spindown properly? or something like that?
>
> Were these the only errors in the log, or did you have other errors
> about not being able to find specific csums?
>
> What does 'going away for a while and coming back' include?

1. Start kernel build
2. Come back 2+ hrs later

(So the problem could be in step 1 or 2)

All failures are on the same inode
btrfs day 1
Set up new 60G home partition on laptop as a real life test of 0.16.
Using Ubuntu standard kernel 2.6.24-19-generic on i386

I notice that during normal (busy time) everything seems fine, but
after going away for a while and coming back, it seems sluggish.
Lots of errors in log:

btrfs csum failed ino 139988 off 4583424 csum 3821684403 private 0
btrfs csum failed ino 139988 off 4579328 csum 3233603900 private 0
btrfs csum failed ino 139988 off 4575232 csum 306171610 private 0

Maybe it isn't handling spindown properly? or something like that?