Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation
On Sun, 27 Mar 2011 14:30:55 +0900, Itaru Kitayama wrote:
> Chris' stress test, stress.sh -n 50 -c /mnt/linux-2.6 /mnt, gave me
> another lockdep splat (see below). I applied your V5 patches on top of
> the next-rc branch.

I got it. It is because the allocation flag of the metadata's page cache,
which is stored in the btree inode's i_mapping, was set to
GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
this lockdep warning will be triggered.

I think this lockdep warning can also be triggered even without my patch,
because btrfs_evict_inode() does operations similar to what I do in
btrfs_destroy_inode().

Task1                           Kswap0 task
open()
  ...
  btrfs_search_slot()
    ...
    btrfs_cow_block()
      ...
      alloc_page()
        wait for reclaiming
                                shrink_slab()
                                  ...
                                  shrink_icache_memory()
                                    ...
                                    btrfs_evict_inode()
                                      ...
                                      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different from a file's page cache: it cannot
allocate pages with the GFP_HIGHUSER_MOVABLE flag. I will make a separate
patch to fix it.

> I haven't triggered it in my actual testing, but do you think we can
> iterate a list of block groups in a lockless manner using RCU?

Maybe we can use it, but AFAIK the write side of sleepable RCU is quite
slow. Though operations on the block group list are few, I think we should
run some tests to check for a performance regression.
Thanks
Miao

> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2164296..f40ff4e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -740,6 +740,7 @@ struct btrfs_space_info {
>      struct list_head block_groups[BTRFS_NR_RAID_TYPES];
>      spinlock_t lock;
>      struct rw_semaphore groups_sem;
> +    struct srcu_struct groups_srcu;
>      atomic_t caching_threads;
>  };
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9e4c9f4..22d6dbb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3003,6 +3003,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
>      for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
>          INIT_LIST_HEAD(&found->block_groups[i]);
>      init_rwsem(&found->groups_sem);
> +    init_srcu_struct(&found->groups_srcu);
>      spin_lock_init(&found->lock);
>      found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
>                              BTRFS_BLOCK_GROUP_SYSTEM |
> @@ -4853,6 +4854,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
>                                       int data)
>  {
>      int ret = 0;
> +    int idx;
>      struct btrfs_root *root = orig_root->fs_info->extent_root;
>      struct btrfs_free_cluster *last_ptr = NULL;
>      struct btrfs_block_group_cache *block_group = NULL;
> @@ -4929,7 +4931,7 @@ ideal_cache:
>          if (block_group && block_group_bits(block_group, data) &&
>              (block_group->cached != BTRFS_CACHE_NO ||
>               search_start == ideal_cache_offset)) {
> -            down_read(&space_info->groups_sem);
> +            idx = srcu_read_lock(&space_info->groups_srcu);
>              if (list_empty(&block_group->list) ||
>                  block_group->ro) {
>                  /*
> @@ -4939,7 +4941,7 @@ ideal_cache:
>                   * valid
>                   */
>                  btrfs_put_block_group(block_group);
> -                up_read(&space_info->groups_sem);
> +                srcu_read_unlock(&space_info->groups_srcu, idx);
>              } else {
>                  index = get_block_group_index(block_group);
>                  goto have_block_group;
> @@ -4949,8 +4951,8 @@ ideal_cache:
>          }
>      }
>  search:
> -    down_read(&space_info->groups_sem);
> -    list_for_each_entry(block_group, &space_info->block_groups[index],
> +    idx = srcu_read_lock(&space_info->groups_srcu);
> +    list_for_each_entry_rcu(block_group, &space_info->block_groups[index],
>                          list) {
>          u64 offset;
>          int cached;
> @@ -5197,8 +5199,8 @@ loop:
>          BUG_ON(index != get_block_group_index(block_group));
>          btrfs_put_block_group(block_group);
>      }
> -    up_read(&space_info->groups_sem);
> -
> +    srcu_read_unlock(&space_info->groups_srcu, idx);
> +
>      if (!ins->objectid && ++index < BTRFS_NR_RAID_TYPES)
>          goto search;
>
> =========================================================
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.36-v5+ #2
[PATCH] btrfs: fix possible deadlock by clearing __GFP_FS flag
Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's pages may
cause a deadlock.

Task1                           Kswap0 task
open()
  ...
  btrfs_search_slot()
    ...
    btrfs_cow_block()
      ...
      alloc_page()
        wait for reclaiming
                                shrink_slab()
                                  ...
                                  shrink_icache_memory()
                                    ...
                                    btrfs_evict_inode()
                                      ...
                                      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different from a file's page cache: it cannot
allocate pages with the GFP_HIGHUSER_MOVABLE flag, and we must clear the
__GFP_FS flag from GFP_HIGHUSER_MOVABLE.

Reported-by: Itaru Kitayama <kitay...@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..cf55fa0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1632,6 +1632,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
         goto fail_bdi;
     }
 
+    fs_info->btree_inode->i_mapping->flags &= ~__GFP_FS;
+
     INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
     INIT_LIST_HEAD(&fs_info->trans_list);
     INIT_LIST_HEAD(&fs_info->dead_roots);
-- 
1.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
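The one-line fix in this patch is a plain bit-clear on the mapping's
allocation mask. The following is a minimal userspace sketch of that
operation, not kernel code: the SKETCH_* bit values, struct sketch_mapping
and both helpers are hypothetical names chosen for illustration, and the
real flag definitions live in the kernel's gfp headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for illustration only; not the kernel's. */
#define SKETCH_GFP_MOVABLE 0x08u
#define SKETCH_GFP_WAIT    0x10u
#define SKETCH_GFP_IO      0x40u
#define SKETCH_GFP_FS      0x80u
#define SKETCH_HIGHUSER_MOVABLE \
    (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS | SKETCH_GFP_MOVABLE)

/* Stand-in for the part of an address_space that holds the gfp mask. */
struct sketch_mapping {
    unsigned int flags;
};

/* Same shape as the patch: flags &= ~__GFP_FS, leaving other bits alone. */
void sketch_clear_fs(struct sketch_mapping *m)
{
    assert(m != NULL);
    m->flags &= ~SKETCH_GFP_FS;
}

/* Nonzero when an allocation with this mask may call back into the fs. */
int sketch_may_enter_fs(const struct sketch_mapping *m)
{
    assert(m != NULL);
    return (m->flags & SKETCH_GFP_FS) != 0;
}
```

Starting from SKETCH_HIGHUSER_MOVABLE, calling sketch_clear_fs() leaves the
wait, I/O and movable bits set and only removes the bit that permits
re-entering the filesystem during reclaim.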
Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation
Hi Miao,

On Sun, 27 Mar 2011 15:00:00 +0800
Miao Xie <mi...@cn.fujitsu.com> wrote:

> I got it. It is because the allocation flag of the metadata's page cache,
> which is stored in the btree inode's i_mapping, was set to
> GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
> this lockdep warning will be triggered.
>
> I think this lockdep warning can also be triggered even without my patch,
> because btrfs_evict_inode() does operations similar to what I do in
> btrfs_destroy_inode().
>
> Task1                           Kswap0 task
> open()
>   ...
>   btrfs_search_slot()
>     ...
>     btrfs_cow_block()
>       ...
>       alloc_page()
>         wait for reclaiming
>                                 shrink_slab()
>                                   ...
>                                   shrink_icache_memory()
>                                     ...
>                                     btrfs_evict_inode()
>                                       ...
>                                       btrfs_search_slot()
>
> If the path is locked by task1, the deadlock happens.

Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
possible for kswapd0 to call prune_icache(), no?

I still see the lockdep warning even with your patch that clears __GFP_FS
in open_ctree().

itaru
Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation
On Sun, 27 Mar 2011 20:09:10 +0900, Itaru Kitayama wrote:
> Hi Miao,
>
> On Sun, 27 Mar 2011 15:00:00 +0800
> Miao Xie <mi...@cn.fujitsu.com> wrote:
>
> > I got it. It is because the allocation flag of the metadata's page
> > cache, which is stored in the btree inode's i_mapping, was set to
> > GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page
> > cache, this lockdep warning will be triggered.
> >
> > I think this lockdep warning can also be triggered even without my
> > patch, because btrfs_evict_inode() does operations similar to what I
> > do in btrfs_destroy_inode().
> >
> > Task1                           Kswap0 task
> > open()
> >   ...
> >   btrfs_search_slot()
> >     ...
> >     btrfs_cow_block()
> >       ...
> >       alloc_page()
> >         wait for reclaiming
> >                                 shrink_slab()
> >                                   ...
> >                                   shrink_icache_memory()
> >                                     ...
> >                                     btrfs_evict_inode()
> >                                       ...
> >                                       btrfs_search_slot()
> >
> > If the path is locked by task1, the deadlock happens.
>
> Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
> possible for kswapd0 to call prune_icache(), no?
>
> I still see the lockdep warning even with your patch that clears
> __GFP_FS in open_ctree().

Sorry for my mistake. The above explanation is wrong; it has nothing to do
with the kswapd thread. The correct explanation is:

Task1
open()
  ...
  btrfs_search_slot()
    ...
    btrfs_cow_block()
      ...
      alloc_page()
        do_try_to_free_pages()
          shrink_slab()
            ...
            shrink_icache_memory()
              ...
              btrfs_evict_inode()
                ...
                btrfs_search_slot()

If the path is locked by task1, the deadlock happens. So balance_pgdat()
cannot trigger the lockdep warning. (The changelog of my patch that clears
__GFP_FS is also wrong.)

I see that, besides the btree's page cache, the free space cache's page
cache is also special and cannot use the __GFP_FS flag.

Thanks
Miao

> itaru
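The corrected call chain describes a single task that re-enters the
filesystem through direct reclaim while it still holds a btree path lock.
The following is a toy userspace model of that chain, written only to make
the control flow concrete. Every name in it (struct sketch_fs, the
sketch_* functions, SKETCH_GFP_FS) is hypothetical, and instead of hanging
it records that the real code would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_GFP_FS 0x1u /* hypothetical stand-in for __GFP_FS */

struct sketch_fs {
    bool path_locked;    /* stands in for the btree path lock */
    bool low_memory;     /* forces the allocator into direct reclaim */
    bool would_deadlock; /* set where the real code would block forever */
};

/* Stand-in for btrfs_evict_inode() -> btrfs_search_slot(). */
void sketch_evict_inode(struct sketch_fs *fs)
{
    if (fs->path_locked)
        fs->would_deadlock = true; /* the lock is held by this same task */
}

/* Stand-in for alloc_page() -> do_try_to_free_pages() -> shrink_slab(). */
void sketch_alloc_page(struct sketch_fs *fs, unsigned int gfp)
{
    /* Reclaim only calls back into the fs when the mask allows it. */
    if (fs->low_memory && (gfp & SKETCH_GFP_FS))
        sketch_evict_inode(fs);
}

/* Stand-in for open() -> btrfs_search_slot() -> btrfs_cow_block(). */
bool sketch_cow_block(struct sketch_fs *fs, unsigned int gfp)
{
    assert(fs != NULL);
    fs->would_deadlock = false;
    fs->path_locked = true;
    sketch_alloc_page(fs, gfp);
    fs->path_locked = false;
    return fs->would_deadlock;
}
```

Under memory pressure, sketch_cow_block() reports the hazard when the mask
contains SKETCH_GFP_FS and does not when the bit is cleared, which is the
effect the __GFP_FS patch aims for.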
Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation
Excerpts from Miao Xie's message of 2011-03-27 07:44:06 -0400:
> On Sun, 27 Mar 2011 20:09:10 +0900, Itaru Kitayama wrote:
> > Hi Miao,
> >
> > On Sun, 27 Mar 2011 15:00:00 +0800
> > Miao Xie <mi...@cn.fujitsu.com> wrote:
> >
> > > I got it. It is because the allocation flag of the metadata's page
> > > cache, which is stored in the btree inode's i_mapping, was set to
> > > GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page
> > > cache, this lockdep warning will be triggered.
> > >
> > > I think this lockdep warning can also be triggered even without my
> > > patch, because btrfs_evict_inode() does operations similar to what
> > > I do in btrfs_destroy_inode().
> > >
> > > Task1                           Kswap0 task
> > > open()
> > >   ...
> > >   btrfs_search_slot()
> > >     ...
> > >     btrfs_cow_block()
> > >       ...
> > >       alloc_page()
> > >         wait for reclaiming
> > >                                 shrink_slab()
> > >                                   ...
> > >                                   shrink_icache_memory()
> > >                                     ...
> > >                                     btrfs_evict_inode()
> > >                                       ...
> > >                                       btrfs_search_slot()
> > >
> > > If the path is locked by task1, the deadlock happens.
> >
> > Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
> > possible for kswapd0 to call prune_icache(), no?
> >
> > I still see the lockdep warning even with your patch that clears
> > __GFP_FS in open_ctree().
>
> Sorry for my mistake. The above explanation is wrong; it has nothing to
> do with the kswapd thread. The correct explanation is:
>
> Task1
> open()
>   ...
>   btrfs_search_slot()
>     ...
>     btrfs_cow_block()
>       ...
>       alloc_page()
>         do_try_to_free_pages()
>           shrink_slab()
>             ...
>             shrink_icache_memory()
>               ...
>               btrfs_evict_inode()
>                 ...
>                 btrfs_search_slot()
>
> If the path is locked by task1, the deadlock happens. So balance_pgdat()
> cannot trigger the lockdep warning. (The changelog of my patch that
> clears __GFP_FS is also wrong.)
>
> I see that, besides the btree's page cache, the free space cache's page
> cache is also special and cannot use the __GFP_FS flag.

Ok, I've got your first patch already, I'll add a hunk for the free space
cache too. Most of the allocations we're doing are explicitly with
GFP_NOFS, so it is just supporting allocations and readahead that should
be causing trouble.

Thanks!

-chris
[PATCH V2] btrfs: fix possible deadlock by clearing __GFP_FS flag
Changelog V1 -> V2:
- modify the explanation of the deadlock.
- clear the __GFP_FS flag in the free space cache's page cache.

Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's pages may
cause a deadlock.

Task1
open()
  ...
  btrfs_search_slot()
    ...
    btrfs_cow_block()
      ...
      alloc_page()
        ...
        do_try_to_free_pages()
          shrink_slab()
            ...
            shrink_icache_memory()
              ...
              btrfs_evict_inode()
                ...
                btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache and the free space cache's page cache are
different from a file's page cache: they cannot allocate pages with the
GFP_HIGHUSER_MOVABLE flag, and we must clear the __GFP_FS flag in their
i_mapping's flags.

Reported-by: Itaru Kitayama <kitay...@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c          |    2 ++
 fs/btrfs/free-space-cache.c |    2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..cf55fa0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1632,6 +1632,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
         goto fail_bdi;
     }
 
+    fs_info->btree_inode->i_mapping->flags &= ~__GFP_FS;
+
     INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
     INIT_LIST_HEAD(&fs_info->trans_list);
     INIT_LIST_HEAD(&fs_info->dead_roots);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..57df380 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -88,6 +88,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
     }
     spin_unlock(&block_group->lock);
 
+    inode->i_mapping->flags &= ~__GFP_FS;
+
     return inode;
 }
-- 
1.7.4
Re: [PATCH V2] btrfs: fix possible deadlock by clearing __GFP_FS flag
Excerpts from Miao Xie's message of 2011-03-27 08:27:30 -0400:
> Changelog V1 -> V2:
> - modify the explanation of the deadlock.
> - clear __GFP_FS flag in the free space's page cache.
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index a039065..57df380 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -88,6 +88,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
>      }
>      spin_unlock(&block_group->lock);
>
> +    inode->i_mapping->flags &= ~__GFP_FS;
> +
>      return inode;
>  }

I did this part slightly differently, in btrfs_read_locked_inode. That way
we know the mask isn't changing while page allocations are taking place.

-chris
Phishing (was: Account Verification)
Hello, Webmail,

You wrote on 27.03.11:

> You have almost exceeded your webmail storage quota.
> To avoid account deletion, please click on the link below
> http://vwebtips.host-ed.net/Session_id_2011.htm

That's a phishing invitation. Who can delete this spammer from the mailing
list?

Best regards!
Helmut
Re: Phishing (was: Account Verification)
On Sun, 2011-03-27 at 18:22 +0200, Helmut Hullen wrote:
> You wrote on 27.03.11:
>
> > You have almost exceeded your webmail storage quota.
> > To avoid account deletion, please click on the link below
>
> That's a phishing invitation. Who can delete this spammer from the
> mailing list?

Like most (but not all) Linux development mailing lists, the linux-btrfs
mailing list is open for anyone to post, even if they're not subscribed.

There are filters on vger.kernel.org which catch a lot of the spam, but
not all of it; the reality is that some will slip through. You just have
to live with it: make sure you run your own spam filters and are careful
about links in mails.

-- 
Calvin Walton <calvin.wal...@kepstin.ca>
Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> Dear list,
> This is V4 of batched discard support, now we will get full mapping of
> the free space on each device for RAID0/1/10/DUP instead of just a
> single stripe length, and tested with xfstests 251, Thanks.

I've pushed this out into the for-linus branch, along with a full merge to
2.6.39 current git. Please take a look and make sure I've merged it
correctly.

Thanks!

-chris

> Changelog V4:
> *make btrfs_map_block() return full mapping.
> Changelog V3:
> *fix style problems.
> *rebase to 2.6.38-rc7.
> Changelog V2:
> *Check if we have devices that support trim before trying to trim the
>  fs, also adjust minlen according to the discard_granularity.
> *Update reserved extent calculations in btrfs_trim_block_group().
> *Call cond_resched() without checking need_resched()
> *Use bitmap_clear_bits() and unlink_free_space() instead of
>  btrfs_remove_free_space(), so we won't search the same extent twice.
> *Try harder in btrfs_discard_extent(), now we won't report errors if
>  it's not a EOPNOTSUPP.
> *make sure the block group is cached before trimming it, or we'll see
>  an empty caching tree if the block group is not cached.
> *Minor return value fix in btrfs_discard_block_group().
I cannot mount two (=all) btrfs volumes after crash while TOI suspending
I used btrfs on / and /var. After the first suspend on 2.6.38.1 yesterday
(the screen was black and there was no disk activity, but it responded to
Magic SysRq [sync, remount-ro, reboot]), the system cannot boot. I copied
the system from a 3-month-old backup to a new disk using only ext4, so
that I could have btrfs in a module, and here are the results of my
investigation:

path->slots[0] is 0, and I observed that normally it should be >= 1 for
the root.

root-tree.c line 97, btrfs_find_last_root():
    if (path->slots[0] == 0) {
        ret = 1;
        goto out;
    }

Then find_and_setup_root() returns an error:

disk-io.c line 1026, find_and_setup_root():
    ret = btrfs_find_last_root(tree_root, objectid,
                               &root->root_item, &root->root_key);
    if (ret > 0)
        return -ENOENT;

...and so does open_ctree():

disk-io.c line 1945, open_ctree():
    ret = find_and_setup_root(tree_root, fs_info,
                              BTRFS_EXTENT_TREE_OBJECTID, extent_root);
    if (ret)
        goto fail_tree_root;

What can I do now to mount my btrfs?

Here is the log after trying to mount:

kernel: device label jroot devid 1 transid 25083 /dev/sdb3
kernel: btrfs: allowing degraded mounts
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: btrfs: open_ctree failed

btrfsck from the next branch of btrfs-progs-unstable:

# ./btrfsck /dev/sdb3
using SB copy 1, bytenr 67108864
parent transid verify failed on 3057614848 wanted 25083 found 25080
parent transid verify failed on 3057614848 wanted 25083 found 25080
parent transid verify failed on 3057614848 wanted 25083 found 25080
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

It is the same with /var, and when using -s 0 or -s 1.

What can I do to restore my data? (I only wanted /etc and /var/log...)
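The "parent transid verify failed" lines report that the block at byte
3057614848 carries generation 25080 while the pointer to it expects 25083.
The following is a small userspace sketch of that kind of comparison,
assuming the check is a simple equality test between the expected and the
found generation. The function name and signature are hypothetical and are
not the kernel's or btrfsck's actual code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: compare the generation the parent expects
 * ("wanted") with the one stored in the block ("found").
 * Returns 0 on a match, 1 on a mismatch, and formats a message in the
 * same shape as the log lines quoted above.
 */
int sketch_verify_parent_transid(uint64_t bytenr, uint64_t wanted,
                                 uint64_t found, char *msg, size_t len)
{
    assert(msg != NULL && len > 0);

    if (wanted == found) {
        msg[0] = '\0';
        return 0;
    }

    snprintf(msg, len,
             "parent transid verify failed on %" PRIu64
             " wanted %" PRIu64 " found %" PRIu64,
             bytenr, wanted, found);
    return 1;
}
```

With the values from the report (wanted 25083, found 25080) the helper
returns 1; a stale block like this is consistent with a write that never
reached the disk before the crash, although the report itself does not
establish the cause.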
Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> > Dear list,
> > This is V4 of batched discard support, now we will get full mapping of
> > the free space on each device for RAID0/1/10/DUP instead of just a
> > single stripe length, and tested with xfstests 251, Thanks.
>
> I've pushed this out into the for-linus branch, along with a full merge
> to 2.6.39 current git. Please take a look and make sure I've merged it
> correctly.

Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
compile at all on 32 bit machines. I've fixed it up and pushed the result
out to for-linus.

Please check the math ;)

-chris

> Thanks!
>
> -chris
>
> > Changelog V4:
> > *make btrfs_map_block() return full mapping.
> > Changelog V3:
> > *fix style problems.
> > *rebase to 2.6.38-rc7.
> > Changelog V2:
> > *Check if we have devices that support trim before trying to trim the
> >  fs, also adjust minlen according to the discard_granularity.
> > *Update reserved extent calculations in btrfs_trim_block_group().
> > *Call cond_resched() without checking need_resched()
> > *Use bitmap_clear_bits() and unlink_free_space() instead of
> >  btrfs_remove_free_space(), so we won't search the same extent twice.
> > *Try harder in btrfs_discard_extent(), now we won't report errors if
> >  it's not a EOPNOTSUPP.
> > *make sure the block group is cached before trimming it, or we'll see
> >  an empty caching tree if the block group is not cached.
> > *Minor return value fix in btrfs_discard_block_group().
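For context on the 32-bit build failure: an open-coded '%' or '/' on a
64-bit value makes gcc emit calls to libgcc helpers that the kernel does
not link against, so kernel code normally goes through do_div(), which
divides a 64-bit value in place by a 32-bit divisor and returns the
remainder. The message does not say how the fix was written, so the
following is only a userspace sketch of the do_div() calling convention.
sketch_do_div() and sketch_stripe_offset() are hypothetical names, and the
stripe example is an assumption about where such a modulo might appear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's do_div() convention:
 * *n is replaced by the quotient and the remainder is returned.
 * (In userspace the plain operators are fine; the point is the shape
 * of the call, not the arithmetic.)
 */
uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
    assert(n != NULL && base != 0);
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* Offset of a byte address inside its stripe, routed through the helper. */
uint32_t sketch_stripe_offset(uint64_t offset, uint32_t stripe_len)
{
    uint64_t tmp = offset; /* do_div-style helpers modify their argument */
    return sketch_do_div(&tmp, stripe_len);
}
```

Because the helper overwrites its first argument with the quotient,
callers that still need the original value copy it into a temporary
first, as sketch_stripe_offset() does.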
Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs
Excerpts from Chris Mason's message of 2011-03-27 21:30:20 -0400:
> Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> > Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> > > Dear list,
> > > This is V4 of batched discard support, now we will get full mapping
> > > of the free space on each device for RAID0/1/10/DUP instead of just
> > > a single stripe length, and tested with xfstests 251, Thanks.
> >
> > I've pushed this out into the for-linus branch, along with a full
> > merge to 2.6.39 current git. Please take a look and make sure I've
> > merged it correctly.
>
> Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
> compile at all on 32 bit machines. I've fixed it up and pushed the
> result out to for-linus.
>
> Please check the math ;)

BTW, I just rebased this so the incremental fix was before merging into
Linus' tree.

-chris

> -chris
>
> > Thanks!
> >
> > -chris
> >
> > > Changelog V4:
> > > *make btrfs_map_block() return full mapping.
> > > Changelog V3:
> > > *fix style problems.
> > > *rebase to 2.6.38-rc7.
> > > Changelog V2:
> > > *Check if we have devices that support trim before trying to trim
> > >  the fs, also adjust minlen according to the discard_granularity.
> > > *Update reserved extent calculations in btrfs_trim_block_group().
> > > *Call cond_resched() without checking need_resched()
> > > *Use bitmap_clear_bits() and unlink_free_space() instead of
> > >  btrfs_remove_free_space(), so we won't search the same extent
> > >  twice.
> > > *Try harder in btrfs_discard_extent(), now we won't report errors if
> > >  it's not a EOPNOTSUPP.
> > > *make sure the block group is cached before trimming it, or we'll
> > >  see an empty caching tree if the block group is not cached.
> > > *Minor return value fix in btrfs_discard_block_group().
[RFC][PATCH] Btrfs: Fix uninitialized root flags for subvolumes
root_item->flags and root_item->byte_limit are not initialized when a
subvolume is created. This bug was not revealed until we added readonly
snapshot support: now you mount a btrfs filesystem and you may find that
the subvolumes in it are readonly.

To work around this problem, we steal a bit from
root_item->inode_item->flags, and use it to indicate whether those fields
have been properly initialized. When we read a tree root from disk, we
check if the bit is set, and if not we set the flag and initialize the
two fields of the root item.

Reported-by: Andreas Philipp <philipp.andr...@gmail.com>
Signed-off-by: Li Zefan <l...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |    4 ++++
 fs/btrfs/disk-io.c     |    4 +++-
 fs/btrfs/ioctl.c       |    4 ++++
 fs/btrfs/root-tree.c   |   18 ++++++++++++++++++
 fs/btrfs/transaction.c |    1 +
 5 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..ff6b991 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1284,6 +1284,8 @@ struct btrfs_root {
 #define BTRFS_INODE_NOATIME             (1 << 9)
 #define BTRFS_INODE_DIRSYNC             (1 << 10)
 
+#define BTRFS_INODE_ROOT_ITEM_INIT      (1 << 31)
+
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
  * one for u8:
@@ -2355,6 +2357,8 @@ int btrfs_find_dead_roots(struct btrfs_root *root, u64 objectid);
 int btrfs_find_orphan_roots(struct btrfs_root *tree_root);
 int btrfs_set_root_node(struct btrfs_root_item *item,
                         struct extent_buffer *node);
+void btrfs_check_and_init_root_item(struct btrfs_root_item *item);
+
 /* dir-item.c */
 int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
                           struct btrfs_root *root, const char *name,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..4f8dafc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1184,8 +1184,10 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct btrfs_root *tree_root,
     root->commit_root = btrfs_root_node(root);
     BUG_ON(!root->node);
 out:
-    if (location->objectid != BTRFS_TREE_LOG_OBJECTID)
+    if (location->objectid != BTRFS_TREE_LOG_OBJECTID) {
         root->ref_cows = 1;
+        btrfs_check_and_init_root_item(&root->root_item);
+    }
 
     return root;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..2ff51e6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -294,6 +294,10 @@ static noinline int create_subvol(struct btrfs_root *root,
     inode_item->nbytes = cpu_to_le64(root->leafsize);
     inode_item->mode = cpu_to_le32(S_IFDIR | 0755);
 
+    root_item.flags = 0;
+    root_item.byte_limit = 0;
+    inode_item->flags = cpu_to_le64(BTRFS_INODE_ROOT_ITEM_INIT);
+
     btrfs_set_root_bytenr(&root_item, leaf->start);
     btrfs_set_root_generation(&root_item, trans->transid);
     btrfs_set_root_level(&root_item, 0);
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 6a1086e..3e45c32 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -471,3 +471,21 @@ again:
     btrfs_free_path(path);
     return 0;
 }
+
+/*
+ * Old btrfs forgets to init root_item->flags and root_item->byte_limit
+ * for subvolumes. To work around this problem, we steal a bit from
+ * root_item->inode_item->flags, and use it to indicate if those fields
+ * have been properly initialized.
+ */
+void btrfs_check_and_init_root_item(struct btrfs_root_item *root_item)
+{
+    u64 inode_flags = le64_to_cpu(root_item->inode.flags);
+
+    if (!(inode_flags & BTRFS_INODE_ROOT_ITEM_INIT)) {
+        inode_flags |= BTRFS_INODE_ROOT_ITEM_INIT;
+        root_item->inode.flags = cpu_to_le64(inode_flags);
+        root_item->flags = 0;
+        root_item->byte_limit = 0;
+    }
+}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3d73c8d..f3d6681 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -970,6 +970,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
     record_root_in_trans(trans, root);
     btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
     memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
+    btrfs_check_and_init_root_item(new_root_item);
 
     root_flags = btrfs_root_flags(new_root_item);
     if (pending->readonly)
-- 
1.7.3.1
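The workaround in this patch can be modelled in userspace to show its two
cases: an old root item with garbage in the uninitialized fields, and a
new one whose fields must be preserved. The following is a simplified
sketch only. struct sketch_root_item and the SKETCH_ macro are
hypothetical, the on-disk little-endian conversion is left out, and the
marker is written as 1ULL << 31 here to avoid shifting into the sign bit
of a plain int, which is a choice made for this sketch and not something
the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker bit; mirrors the role of BTRFS_INODE_ROOT_ITEM_INIT. */
#define SKETCH_INODE_ROOT_ITEM_INIT (1ULL << 31)

/* Simplified stand-in for the fields of a root item that matter here. */
struct sketch_root_item {
    uint64_t inode_flags; /* flags of the embedded inode item */
    uint64_t flags;       /* root flags, e.g. a readonly bit */
    uint64_t byte_limit;
};

/*
 * If the marker bit is missing, the item was written by code that never
 * initialized flags/byte_limit: zero them and set the marker. If the
 * marker is present, the fields are trusted and left untouched.
 */
void sketch_check_and_init_root_item(struct sketch_root_item *item)
{
    assert(item != NULL);

    if (!(item->inode_flags & SKETCH_INODE_ROOT_ITEM_INIT)) {
        item->inode_flags |= SKETCH_INODE_ROOT_ITEM_INIT;
        item->flags = 0;
        item->byte_limit = 0;
    }
}
```

The check is idempotent: running it a second time on the same item changes
nothing, which matters because the patch calls it both when a root is read
from disk and when a snapshot copies a root item.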