Re: umount oops
Sorry, I made a typo in the testcase (the second mount). Basically, it might be enough to mount two different btrfs filesystems at two different locations, umount one of them, and watch /var/log/kern.log for the oops:

dd if=/dev/zero of=mountme bs=4k count=10
dd if=/dev/zero of=mountme2 bs=4k count=10
mkfs.btrfs mountme
mkfs.btrfs mountme2
mkdir loop loop2
mount -o loop mountme loop
mount -o loop mountme2 loop2
umount loop
# wait a moment

On 7/24/08, Lukas Vacek [EMAIL PROTECTED] wrote:
> Hi,
>
> I tried the very promising btrfs to test it a little and I ran into a small bug in the implementation. I'm not sure where the bug lies, but this reproduces the problem quite reliably:
>
> dd if=/dev/zero of=mountme bs=4k count=10
> dd if=/dev/zero of=mountme2 bs=4k count=10
> mkfs.btrfs mountme
> mkfs.btrfs mountme2
> mkdir loop loop2
> mount -o loop mountme loop
> mount -o loop mountme loop2
> umount loop
> # wait a moment
>
> Maybe an SMP machine is necessary to see the same behaviour.
>
> Thanks for the (otherwise ;-)) great work and have a nice day,
> Lukas V.
>
> The interesting part of the log follows:
>
> Jul 24 22:44:00 minerva kernel: [ 1478.326985] device fsid 5442602040543dd1-d32561672012f7a2 devid 1 transid 12 /dev/loop1
> Jul 24 22:44:54 minerva kernel: [ 1532.882212] Unable to handle kernel paging request at 7effdc171f47 RIP:
> Jul 24 22:44:54 minerva kernel: [ 1532.882256] [wq_per_cpu+0x15/0x20] wq_per_cpu+0x15/0x20
> Jul 24 22:44:54 minerva kernel: [ 1532.882405] PGD 0
> Jul 24 22:44:54 minerva kernel: [ 1532.882476] Oops: [1] SMP
> Jul 24 22:44:54 minerva kernel: [ 1532.882572] CPU 1
> Jul 24 22:44:54 minerva kernel: [ 1532.882641] Modules linked in: loop binfmt_misc af_packet i915 drm rfcomm l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_ondemand cpufreq_stats freq_table cpufreq_userspace cpufreq_conservative cpufreq_powersave video output container sbs sbshc dock battery iptable_filter ip_tables x_tables aes_x86_64 dm_crypt dm_mod ac btrfs libcrc32c lp snd_hda_intel snd_hwdep snd_pcm_oss snd_pcm snd_page_alloc snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer iTCO_wdt iTCO_vendor_support snd_seq_device button snd evdev shpchp pci_hotplug intel_agp soundcore parport_pc parport pcspkr ext3 jbd mbcache usbhid hid sg sd_mod ata_piix pata_acpi floppy ehci_hcd ata_generic libata scsi_mod r8169 uhci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
> Jul 24 22:44:54 minerva kernel: [ 1532.885056] Pid: 3959, comm: btrfs-transacti Not tainted 2.6.24-19-generic #1
> Jul 24 22:44:54 minerva kernel: [ 1532.885117] RIP: 0010:[wq_per_cpu+0x15/0x20] [wq_per_cpu+0x15/0x20] wq_per_cpu+0x15/0x20
> Jul 24 22:44:54 minerva kernel: [ 1532.885224] RSP: 0018:81003e6a1c08 EFLAGS: 00010246
> Jul 24 22:44:54 minerva kernel: [ 1532.885281] RAX: 7effdc171f3f RBX: 81003db0b770 RCX:
> Jul 24 22:44:54 minerva kernel: [ 1532.885342] RDX: RSI: 0001 RDI: 810023e8e940
> Jul 24 22:44:54 minerva kernel: [ 1532.885404] RBP: 81003db0a000 R08: R09: 882c9d80
> Jul 24 22:44:54 minerva kernel: [ 1532.885465] R10: 81003653ade0 R11: R12: 81003653ade0
> Jul 24 22:44:54 minerva kernel: [ 1532.885526] R13: 0001 R14: 810014366680 R15:
> Jul 24 22:44:54 minerva kernel: [ 1532.885588] FS: () GS:81003e401700() knlGS:
> Jul 24 22:44:54 minerva kernel: [ 1532.885671] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> Jul 24 22:44:54 minerva kernel: [ 1532.885728] CR2: 7effdc171f47 CR3: 225cc000 CR4: 06e0
> Jul 24 22:44:54 minerva kernel: [ 1532.885789] DR0: DR1: DR2:
> Jul 24 22:44:54 minerva kernel: [ 1532.885851] DR3: DR6: 0ff0 DR7: 0400
> Jul 24 22:44:54 minerva kernel: [ 1532.885912] Process btrfs-transacti (pid: 3959, threadinfo 81003e6a, task 81003c74c7e0)
> Jul 24 22:44:54 minerva kernel: [ 1532.885996] Stack: 8024f97c 81001e3db480 882ca4ce 882c9d80
> Jul 24 22:44:54 minerva kernel: [ 1532.886193] 01c08fff 810014366680 81003653ad40 1000
> Jul 24 22:44:54 minerva kernel: [ 1532.886364] 81003ee140e8 0001 882e19cf 01c09000
> Jul 24 22:44:54 minerva kernel: [ 1532.886495] Call Trace:
> Jul 24 22:44:54 minerva kernel: [ 1532.886587] [shpchp:queue_work+0x2c/0x50] queue_work+0x2c/0x50
> Jul 24 22:44:54 minerva kernel: [ 1532.886665] [btrfs:btrfs_wq_submit_bio+0xbe/0xf0] :btrfs:btrfs_wq_submit_bio+0xbe/0xf0
> Jul 24 22:44:54 minerva kernel: [ 1532.886743] [btrfs:__btree_submit_bio_hook+0x0/0x60] :btrfs:__btree_submit_bio_hook+0x0/0x60
> Jul 24 22:44:54 minerva kernel: [ 1532.886826] [btrfs:submit_one_bio+0xcf/0x120] :btrfs:submit_one_bio+0xcf/0x120
Re: umount oops
On Fri, 2008-07-25 at 13:14 +0200, Lukas Vacek wrote:
> sorry, I made a typo in the testcase (the second mount)
>
> Basically, it might be enough to mount two different btrfs filesystems to two different locations, umount one of them and watch /var/log/kern.log for the oops

Thanks for this bug report. Which Btrfs version was in use?

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] async-thread: fix possible memory leak
On Fri, 2008-07-25 at 13:34 +0800, Li Zefan wrote:
> When kthread_run() returns failure, this worker hasn't been added to the list, so btrfs_stop_workers() won't free it.

Thanks, I've queued this one up for inclusion.

-chris
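
The leak described above follows a common pattern: an object is allocated, a start step fails, and the object was never linked into the structure that the teardown path walks. A minimal userspace sketch of the fix, assuming hypothetical names (`struct worker`, `struct worker_pool`, `start_thread`, `pool_add_worker`, `pool_stop`) that stand in for the btrfs async-thread code and are not taken from it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the btrfs async-thread structures. */
struct worker {
	struct worker *next;
};

struct worker_pool {
	struct worker *head;	/* workers the pool owns and will free */
	int live_allocs;	/* allocations not yet freed, for checking */
};

/* Simulates a thread start such as kthread_run(): 0 on success, -1 on failure. */
static int start_thread(struct worker *w, int should_fail)
{
	(void)w;
	return should_fail ? -1 : 0;
}

/* Allocate a worker, start it, and only then link it into the pool. */
static int pool_add_worker(struct worker_pool *pool, int should_fail)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return -1;
	pool->live_allocs++;

	if (start_thread(w, should_fail) != 0) {
		/*
		 * The worker is not on the list yet, so pool_stop() would
		 * never see it. Free it here or it leaks.
		 */
		free(w);
		pool->live_allocs--;
		return -1;
	}

	w->next = pool->head;
	pool->head = w;
	return 0;
}

/* Teardown frees only what is reachable from the list. */
static void pool_stop(struct worker_pool *pool)
{
	while (pool->head) {
		struct worker *w = pool->head;

		pool->head = w->next;
		free(w);
		pool->live_allocs--;
	}
}
```

Without the `free(w)` in the error branch, `live_allocs` would stay at 1 after `pool_stop()` whenever a start had failed.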
Re: New data=ordered code pushed out to btrfs-unstable
On Mon, 2008-07-21 at 15:23 -0400, Ric Wheeler wrote:
>> [ lock timeouts and stalls ]
>>
>> Ok, I've made a few changes that should lower overall contention on the allocation mutex. I'm getting better performance on a 3 million file run, please give it a shot.
>
> After an update, clean rebuild, and reboot, the test is running along and has hit about 10 million files. I still see some messages like:
>
> INFO: task pdflush:4051 blocked for more than 120 seconds.

The latest code in btrfs-unstable has everything I can safely do right now :)

Basically the stalls come from someone doing IO with the allocation mutex held. It is surprising that we should be stalling for such a long time; it is probably a mixture of elevator starvation and btrfs fun.

But btrfs-unstable also has code to replace the page lock with a per-tree block mutex, which will allow me to get rid of the big allocation mutex over the long term.

I was able to break up most of the long operations and have them drop/reacquire the allocation mutex to prevent this starvation most of the time.

-chris
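
The drop/reacquire approach mentioned above can be sketched in userspace with a pthread mutex. This is only an illustration under stated assumptions: `alloc_mutex` and `process_in_batches` are hypothetical names, the batch size is arbitrary, and the actual btrfs-unstable changes are not shown in this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for one big lock shared by many threads. */
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Process nr_items units of work under alloc_mutex, releasing the lock
 * after every `batch` items so that waiters are not starved.
 * `batch` must be greater than zero.
 * Returns the number of times the lock was dropped and reacquired.
 */
static int process_in_batches(int nr_items, int batch, int *processed)
{
	int drops = 0;

	pthread_mutex_lock(&alloc_mutex);
	for (int i = 0; i < nr_items; i++) {
		(*processed)++;	/* the real per-item work would go here */

		/* Drop the lock between batches, but not after the last item. */
		if ((i + 1) % batch == 0 && i + 1 < nr_items) {
			pthread_mutex_unlock(&alloc_mutex);
			sched_yield();	/* give waiting threads a chance to run */
			pthread_mutex_lock(&alloc_mutex);
			drops++;
		}
	}
	pthread_mutex_unlock(&alloc_mutex);
	return drops;
}
```

The cost of this pattern is that any state read before the lock is dropped may be stale once it is reacquired, so the caller has to revalidate it before continuing.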
[PATCH] btrfs-progs: add orphan support to print-tree
Hello,

This adds orphan support to print-tree, so when debug_tree hits an orphan item it will print "orphan item" under it so you know what it is.

Thanks,

Josef

diff -r e08f2f90e4f8 ctree.h
--- a/ctree.h	Thu Jul 24 13:52:04 2008 -0400
+++ b/ctree.h	Fri Jul 25 16:18:38 2008 -0400
@@ -54,6 +54,9 @@ struct btrfs_trans_handle;
 
 /* directory objectid inside the root tree */
 #define BTRFS_ROOT_TREE_DIR_OBJECTID 6ULL
+
+/* oprhan objectid for tracking unlinked/truncated files */
+#define BTRFS_ORPHAN_OBJECTID -5ULL
 
 /*
  * All files have objectids higher than this.
@@ -564,6 +567,7 @@ struct btrfs_root {
 #define BTRFS_INODE_ITEM_KEY		1
 #define BTRFS_INODE_REF_KEY		2
 #define BTRFS_XATTR_ITEM_KEY		8
+#define BTRFS_ORPHAN_ITEM_KEY		9
 
 /* reserve 3-15 close to the inode for later flexibility */
 
diff -r e08f2f90e4f8 print-tree.c
--- a/print-tree.c	Thu Jul 24 13:52:04 2008 -0400
+++ b/print-tree.c	Fri Jul 25 16:18:38 2008 -0400
@@ -183,6 +183,9 @@ void btrfs_print_leaf(struct btrfs_root
 			di = btrfs_item_ptr(l, i, struct btrfs_dir_item);
 			print_dir_item(l, item, di);
 			break;
+		case BTRFS_ORPHAN_ITEM_KEY:
+			printf("\t\torphan item\n");
+			break;
 		case BTRFS_ROOT_ITEM_KEY:
 			ri = btrfs_item_ptr(l, i, struct btrfs_root_item);
 			read_extent_buffer(l, &root_item, (unsigned long)ri, sizeof(root_item));
Re: umount oops
On Fri, 2008-07-25 at 15:54 +0200, Lukas Vacek wrote:
> the newest in the mercurial repo
> changeset: 558:9da425337329
> tag: tip

This should be fixed by the unstable tree; the transaction work queues were not properly being torn down.

-chris
Re: [PATCH] initial version of reference cache
I missed two newly created files in the previous patch; please use this one.

Thanks

---
diff -r eb4767aa190e Makefile
--- a/Makefile	Thu Jul 24 12:25:50 2008 -0400
+++ b/Makefile	Sat Jul 26 03:47:26 2008 +0800
@@ -6,7 +6,8 @@ btrfs-y := super.o ctree.o extent-tree.o
 	   hash.o file-item.o inode-item.o inode-map.o disk-io.o \
 	   transaction.o bit-radix.o inode.o file.o tree-defrag.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
-	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o
+	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
+	   ref-cache.o
 
 btrfs-$(CONFIG_FS_POSIX_ACL)	+= acl.o
 else
diff -r eb4767aa190e ctree.c
--- a/ctree.c	Thu Jul 24 12:25:50 2008 -0400
+++ b/ctree.c	Sat Jul 26 03:47:26 2008 +0800
@@ -165,7 +165,7 @@ int btrfs_copy_root(struct btrfs_trans_h
 	btrfs_clear_header_flag(cow, BTRFS_HEADER_FLAG_WRITTEN);
 
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
-	ret = btrfs_inc_ref(trans, new_root, buf);
+	ret = btrfs_inc_ref(trans, new_root, buf, 0);
 	kfree(new_root);
 
 	if (ret)
@@ -232,7 +232,7 @@ int __btrfs_cow_block(struct btrfs_trans
 	WARN_ON(btrfs_header_generation(buf) > trans->transid);
 	if (btrfs_header_generation(buf) != trans->transid) {
 		different_trans = 1;
-		ret = btrfs_inc_ref(trans, root, buf);
+		ret = btrfs_inc_ref(trans, root, buf, 1);
 		if (ret)
 			return ret;
 	} else {
diff -r eb4767aa190e ctree.h
--- a/ctree.h	Thu Jul 24 12:25:50 2008 -0400
+++ b/ctree.h	Sat Jul 26 03:47:26 2008 +0800
@@ -592,6 +592,10 @@ struct btrfs_fs_info {
 	u64 last_alloc;
 	u64 last_data_alloc;
 
+	spinlock_t ref_cache_lock;
+	u64 total_ref_cache_size;
+	u64 running_ref_cache_size;
+
 	u64 avail_data_alloc_bits;
 	u64 avail_metadata_alloc_bits;
 	u64 avail_system_alloc_bits;
@@ -613,6 +617,8 @@ struct btrfs_root {
 	spinlock_t node_lock;
 
 	struct extent_buffer *commit_root;
+	struct btrfs_leaf_ref_tree *ref_tree;
+
 	struct btrfs_root_item root_item;
 	struct btrfs_key root_key;
 	struct btrfs_fs_info *fs_info;
@@ -1430,7 +1436,7 @@ int btrfs_reserve_extent(struct btrfs_tr
 		       u64 search_end, struct btrfs_key *ins, u64 data);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
-		  struct extent_buffer *buf);
+		  struct extent_buffer *buf, int cache_ref);
 int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, u64 bytenr, u64 num_bytes,
 		      u64 root_objectid, u64 ref_generation,
diff -r eb4767aa190e disk-io.c
--- a/disk-io.c	Thu Jul 24 12:25:50 2008 -0400
+++ b/disk-io.c	Sat Jul 26 03:47:26 2008 +0800
@@ -716,6 +716,7 @@ static int __setup_root(u32 nodesize, u3
 	root->node = NULL;
 	root->inode = NULL;
 	root->commit_root = NULL;
+	root->ref_tree = NULL;
 	root->sectorsize = sectorsize;
 	root->nodesize = nodesize;
 	root->leafsize = leafsize;
@@ -1165,12 +1166,19 @@ static int transaction_kthread(void *arg
 		vfs_check_frozen(root->fs_info->sb, SB_FREEZE_WRITE);
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);
 
+		printk("btrfs: total reference cache size %Lu\n",
+			root->fs_info->total_ref_cache_size);
+
 		mutex_lock(&root->fs_info->trans_mutex);
 		cur = root->fs_info->running_transaction;
 		if (!cur) {
 			mutex_unlock(&root->fs_info->trans_mutex);
 			goto sleep;
 		}
+
+		printk("btrfs: running reference cache size %Lu\n",
+			root->fs_info->running_ref_cache_size);
+
 		now = get_seconds();
 		if (now < cur->start_time || now - cur->start_time < 30) {
 			mutex_unlock(&root->fs_info->trans_mutex);
@@ -1233,6 +1241,7 @@ struct btrfs_root *open_ctree(struct sup
 	spin_lock_init(&fs_info->hash_lock);
 	spin_lock_init(&fs_info->delalloc_lock);
 	spin_lock_init(&fs_info->new_trans_lock);
+	spin_lock_init(&fs_info->ref_cache_lock);
 
 	init_completion(&fs_info->kobj_unregister);
 	fs_info->tree_root = tree_root;
@@ -1699,6 +1708,11 @@ int close_ctree(struct btrfs_root *root)
 		printk("btrfs: at unmount delalloc count %Lu\n",
 		       fs_info->delalloc_bytes);
 	}
+	if (fs_info->total_ref_cache_size) {
+		printk("btrfs: at umount reference cache size %Lu\n",
+		       fs_info->total_ref_cache_size);
+	}
+
 	if (fs_info->extent_root->node)
 		free_extent_buffer(fs_info->extent_root->node);
 
diff -r eb4767aa190e extent-tree.c
---
Re: [PATCH] initial version of reference cache
On Fri, 2008-07-25 at 14:29 -0500, Yan Zheng wrote:
> Hello,
>
> This is the initial version of the leaf reference cache. The cache stores leaf nodes' extent references in memory, which can improve the performance of snapshot dropping. The outline of this patch is:
>
> (1) allocate struct dirty_root when starting a transaction
> (2) put the reference cache in struct dirty_root
> (3) cache extent references when tree leaves are cow'ed
> (4) when dropping a snapshot, use cached references directly to avoid reading the tree leaf.
>
> I can only access a notebook currently, so the benchmarking isn't enough. I appreciate any help and comments.

I have modified this locally to always cache leaves, even when they don't have file extents in them. That way, walk_down_tree will find the cache and won't have to read the leaf (that doesn't have any extents).

So far, it is working very well. I did a run with fs_mark to create 58 million files and had very steady numbers. The unmount took 4 seconds. It used to take over an hour.

One question: why not use the block number (byte number) as the key to the rbtree instead of the key?

-chris
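
The idea discussed above (look up a leaf's cached references by its block address, and cache leaves even when they hold no extents) can be sketched as follows. This is a simplified illustration, not the patch's code: the patch uses an rbtree, while this sketch swaps in a small direct-mapped table to stay short, and `struct leaf_ref`, `struct ref_cache`, `cache_insert`, `cache_lookup` and `drop_leaf` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64

/* Hypothetical cached summary of one leaf; not the patch's structure. */
struct leaf_ref {
	uint64_t bytenr;	/* block address of the leaf, used as the key */
	int nr_extents;		/* may be 0: an empty entry still saves a read */
	int valid;
};

struct ref_cache {
	struct leaf_ref slots[CACHE_SLOTS];
	int disk_reads;		/* counts simulated leaf reads */
};

/* Record the references of a leaf, keyed by its block address. */
static void cache_insert(struct ref_cache *c, uint64_t bytenr, int nr_extents)
{
	struct leaf_ref *r = &c->slots[bytenr % CACHE_SLOTS];

	r->bytenr = bytenr;
	r->nr_extents = nr_extents;
	r->valid = 1;
}

/* Return the cached entry for bytenr, or NULL on a miss. */
static struct leaf_ref *cache_lookup(struct ref_cache *c, uint64_t bytenr)
{
	struct leaf_ref *r = &c->slots[bytenr % CACHE_SLOTS];

	return (r->valid && r->bytenr == bytenr) ? r : NULL;
}

/*
 * Return how many extents the leaf references. The leaf is "read"
 * (simulated by counting) only when the cache has no entry for it.
 */
static int drop_leaf(struct ref_cache *c, uint64_t bytenr, int on_disk_extents)
{
	struct leaf_ref *r = cache_lookup(c, bytenr);

	if (r)
		return r->nr_extents;

	c->disk_reads++;	/* cache miss: the leaf has to be read */
	return on_disk_extents;
}
```

Keying on a single 64-bit block address keeps the comparison to one integer check. Whether that suits the real cache depends on details of the patch that are not visible in this thread.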