Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-27 Thread Miao Xie
On Sun, 27 Mar 2011 14:30:55 +0900, Itaru Kitayama wrote:
> Chris' stress test, stress.sh -n 50 -c /mnt/linux-2.6 /mnt gave me another
> lockdep splat (see below). I applied your V5 patches on top of the next-rc
> branch.

I got it. It is because the allocation flag of the metadata's page cache,
which is stored in the btree inode's i_mapping, was set to
GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
this lockdep warning will be triggered.

I think this lockdep warning can be triggered even without my patch, because
btrfs_evict_inode() does operations similar to what I do in
btrfs_destroy_inode().
  Task1                         Kswap0 task
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
        ...
        alloc_page()
          wait for reclaiming
                                shrink_slab()
                                  ...
                                  shrink_icache_memory()
                                    ...
                                    btrfs_evict_inode()
                                      ...
                                      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different from a file's page cache: it cannot
allocate pages with the GFP_HIGHUSER_MOVABLE flag.

I will make a separate patch to fix it.
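
To make the idea concrete, here is a minimal user-space sketch of the mask
arithmetic only. The SK_* bit values and the sk_mapping type are hypothetical
stand-ins chosen for illustration; the real flag values live in the kernel's
gfp.h and differ between versions. It only shows that clearing the FS bit
leaves the other allocation properties of the mapping untouched.

```c
#include <stdint.h>

/* Hypothetical flag bits for illustration only (not the kernel's values). */
#define SK_GFP_WAIT     0x01u  /* allocation may sleep */
#define SK_GFP_IO       0x02u  /* allocation may start physical I/O */
#define SK_GFP_FS       0x04u  /* allocation may call back into the fs */
#define SK_GFP_HIGHMEM  0x08u
#define SK_GFP_MOVABLE  0x10u

/* Simplified composition; the kernel's mask contains more bits. */
#define SK_GFP_HIGHUSER_MOVABLE \
	(SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS | SK_GFP_HIGHMEM | SK_GFP_MOVABLE)

/* Stand-in for the allocation mask kept in an inode's i_mapping. */
struct sk_mapping {
	uint32_t gfp_mask;
};

/* Clear only the FS bit, keeping every other bit of the mask. */
static void sk_mapping_forbid_fs(struct sk_mapping *m)
{
	m->gfp_mask &= ~SK_GFP_FS;
}

/* Non-zero if allocations for this mapping may re-enter the filesystem. */
static int sk_may_enter_fs(const struct sk_mapping *m)
{
	return (m->gfp_mask & SK_GFP_FS) != 0;
}
```

A mapping that starts with SK_GFP_HIGHUSER_MOVABLE and then goes through
sk_mapping_forbid_fs() can still sleep and do I/O, but no longer advertises
that reclaim may call back into the filesystem.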

> I haven't triggered it in my actual testing, but do you think we can iterate
> a list of block groups in a lockless manner using rcu?

Maybe we can use it, but AFAIK the write side of sleepable RCU is quite slow.
Though operations on the block group list are few, I think we should run some
tests to check for a performance regression.

Thanks
Miao

 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 2164296..f40ff4e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -740,6 +740,7 @@ struct btrfs_space_info {
>  	struct list_head block_groups[BTRFS_NR_RAID_TYPES];
>  	spinlock_t lock;
>  	struct rw_semaphore groups_sem;
> +	struct srcu_struct groups_srcu;
>  	atomic_t caching_threads;
>  };
>  
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 9e4c9f4..22d6dbb 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3003,6 +3003,7 @@ static int update_space_info(struct btrfs_fs_info *info, u64 flags,
>  	for (i = 0; i < BTRFS_NR_RAID_TYPES; i++)
>  		INIT_LIST_HEAD(&found->block_groups[i]);
>  	init_rwsem(&found->groups_sem);
> +	init_srcu_struct(&found->groups_srcu);
>  	spin_lock_init(&found->lock);
>  	found->flags = flags & (BTRFS_BLOCK_GROUP_DATA |
>  				BTRFS_BLOCK_GROUP_SYSTEM |
> @@ -4853,6 +4854,7 @@ static noinline int find_free_extent(struct btrfs_trans_handle *trans,
>  				     int data)
>  {
>  	int ret = 0;
> +	int idx;
>  	struct btrfs_root *root = orig_root->fs_info->extent_root;
>  	struct btrfs_free_cluster *last_ptr = NULL;
>  	struct btrfs_block_group_cache *block_group = NULL;
> @@ -4929,7 +4931,7 @@ ideal_cache:
>  		if (block_group && block_group_bits(block_group, data) &&
>  		    (block_group->cached != BTRFS_CACHE_NO ||
>  		     search_start == ideal_cache_offset)) {
> -			down_read(&space_info->groups_sem);
> +			idx = srcu_read_lock(&space_info->groups_srcu);
>  			if (list_empty(&block_group->list) ||
>  			    block_group->ro) {
>  				/*
> @@ -4939,7 +4941,7 @@ ideal_cache:
>  				 * valid
>  				 */
>  				btrfs_put_block_group(block_group);
> -				up_read(&space_info->groups_sem);
> +				srcu_read_unlock(&space_info->groups_srcu, idx);
>  			} else {
>  				index = get_block_group_index(block_group);
>  				goto have_block_group;
> @@ -4949,8 +4951,8 @@ ideal_cache:
>  		}
>  	}
>  search:
> -	down_read(&space_info->groups_sem);
> -	list_for_each_entry(block_group, &space_info->block_groups[index],
> +	idx = srcu_read_lock(&space_info->groups_srcu);
> +	list_for_each_entry_rcu(block_group, &space_info->block_groups[index],
>  			    list) {
>  		u64 offset;
>  		int cached;
> @@ -5197,8 +5199,8 @@ loop:
>  		BUG_ON(index != get_block_group_index(block_group));
>  		btrfs_put_block_group(block_group);
>  	}
> -	up_read(&space_info->groups_sem);
> -
> +	srcu_read_unlock(&space_info->groups_srcu, idx);
> +
>  	if (!ins->objectid && ++index < BTRFS_NR_RAID_TYPES)
>  		goto search;
>  
>
>
> =
> [ INFO: possible irq lock inversion dependency detected ]
> 2.6.36-v5+ #2

[PATCH] btrfs: fix possible deadlock by clearing __GFP_FS flag

2011-03-27 Thread Miao Xie
Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause
deadlock.
  Task1                         Kswap0 task
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
        ...
        alloc_page()
          wait for reclaiming
                                shrink_slab()
                                  ...
                                  shrink_icache_memory()
                                    ...
                                    btrfs_evict_inode()
                                      ...
                                      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different from a file's page cache: it cannot
allocate pages with the GFP_HIGHUSER_MOVABLE flag, and we must clear the
__GFP_FS flag from that mask.

Reported-by: Itaru Kitayama kitay...@cl.bb4u.ne.jp
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..cf55fa0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1632,6 +1632,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		goto fail_bdi;
 	}
 
+	fs_info->btree_inode->i_mapping->flags &= ~__GFP_FS;
+
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_LIST_HEAD(&fs_info->trans_list);
 	INIT_LIST_HEAD(&fs_info->dead_roots);
-- 
1.7.4
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-27 Thread Itaru Kitayama
Hi Miao,

On Sun, 27 Mar 2011 15:00:00 +0800
Miao Xie mi...@cn.fujitsu.com wrote:

> I got it. It is because the allocation flag of the metadata's page cache,
> which is stored in the btree inode's i_mapping, was set to
> GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
> this lockdep warning will be triggered.
>
> I think this lockdep warning can be triggered even without my patch, because
> btrfs_evict_inode() does operations similar to what I do in
> btrfs_destroy_inode().
>   Task1                         Kswap0 task
>   open()
>     ...
>     btrfs_search_slot()
>       ...
>       btrfs_cow_block()
>         ...
>         alloc_page()
>           wait for reclaiming
>                                 shrink_slab()
>                                   ...
>                                   shrink_icache_memory()
>                                     ...
>                                     btrfs_evict_inode()
>                                       ...
>                                       btrfs_search_slot()
>
> If the path is locked by task1, the deadlock happens.

Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
possible for kswapd0 to call prune_icache(), no? I still see the lockdep
warning even with your patch that clears __GFP_FS in open_ctree().

itaru



Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-27 Thread Miao Xie
On Sun, 27 Mar 2011 20:09:10 +0900, Itaru Kitayama wrote:
> Hi Miao,
>
> On Sun, 27 Mar 2011 15:00:00 +0800
> Miao Xie mi...@cn.fujitsu.com wrote:
>
>> I got it. It is because the allocation flag of the metadata's page cache,
>> which is stored in the btree inode's i_mapping, was set to
>> GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
>> this lockdep warning will be triggered.
>>
>> I think this lockdep warning can be triggered even without my patch,
>> because btrfs_evict_inode() does operations similar to what I do in
>> btrfs_destroy_inode().
>>   Task1                         Kswap0 task
>>   open()
>>     ...
>>     btrfs_search_slot()
>>       ...
>>       btrfs_cow_block()
>>         ...
>>         alloc_page()
>>           wait for reclaiming
>>                                 shrink_slab()
>>                                   ...
>>                                   shrink_icache_memory()
>>                                     ...
>>                                     btrfs_evict_inode()
>>                                       ...
>>                                       btrfs_search_slot()
>>
>> If the path is locked by task1, the deadlock happens.
>
> Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
> possible for kswapd0 to call prune_icache(), no? I still see the lockdep
> warning even with your patch that clears __GFP_FS in open_ctree().

Sorry for my mistake. The above explanation is wrong; it has nothing to do
with the kswapd thread. The correct explanation is:

  Task1
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
        ...
        alloc_page()
          do_try_to_free_pages()
            shrink_slab()
              ...
              shrink_icache_memory()
                ...
                btrfs_evict_inode()
                  ...
                  btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So it is impossible for balance_pgdat() to trigger the lockdep warning.
(The changelog of my patch that clears __GFP_FS is also wrong.)
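
As a rough illustration of the single-task chain above, the following
user-space sketch models one task that takes a path lock, allocates under
memory pressure, and has direct reclaim call back into an eviction routine
that needs the same lock. All sk_* names are hypothetical stand-ins, not
kernel functions, and the model assumes that reclaim skips the filesystem
callbacks when the FS bit is clear in the allocation mask.

```c
#include <errno.h>
#include <stdbool.h>

#define SK_GFP_FS 0x1u  /* hypothetical bit: reclaim may enter the fs */

struct sk_task {
	bool path_locked;  /* true while the task holds a btree path lock */
};

/* Stand-in for btrfs_evict_inode() -> btrfs_search_slot(). */
static int sk_evict_inode(const struct sk_task *t)
{
	if (t->path_locked)
		return -EDEADLK;  /* would wait forever on our own lock */
	return 0;
}

/* Stand-in for direct reclaim running in the allocating task itself. */
static int sk_direct_reclaim(const struct sk_task *t, unsigned int gfp)
{
	if (!(gfp & SK_GFP_FS))
		return 0;  /* filesystem callbacks are skipped */
	return sk_evict_inode(t);
}

/* Stand-in for btrfs_cow_block() allocating a page with the path locked. */
static int sk_cow_block(struct sk_task *t, unsigned int gfp)
{
	int ret;

	t->path_locked = true;
	ret = sk_direct_reclaim(t, gfp);  /* alloc_page() under pressure */
	t->path_locked = false;
	return ret;
}
```

With SK_GFP_FS set, sk_cow_block() reports the self-deadlock; with the bit
cleared, the allocation completes without re-entering the filesystem.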

I see. Besides the btree's page cache, the free space cache's page cache is
also special and cannot use the __GFP_FS flag.

Thanks
Miao

> itaru
 
 



Re: [PATCH V5 2/2] btrfs: implement delayed inode items operation

2011-03-27 Thread Chris Mason
Excerpts from Miao Xie's message of 2011-03-27 07:44:06 -0400:
> On Sun, 27 Mar 2011 20:09:10 +0900, Itaru Kitayama wrote:
>> Hi Miao,
>>
>> On Sun, 27 Mar 2011 15:00:00 +0800
>> Miao Xie mi...@cn.fujitsu.com wrote:
>>
>>> I got it. It is because the allocation flag of the metadata's page cache,
>>> which is stored in the btree inode's i_mapping, was set to
>>> GFP_HIGHUSER_MOVABLE. So if we allocate pages for the btree's page cache,
>>> this lockdep warning will be triggered.
>>>
>>> I think this lockdep warning can be triggered even without my patch,
>>> because btrfs_evict_inode() does operations similar to what I do in
>>> btrfs_destroy_inode().
>>>   Task1                         Kswap0 task
>>>   open()
>>>     ...
>>>     btrfs_search_slot()
>>>       ...
>>>       btrfs_cow_block()
>>>         ...
>>>         alloc_page()
>>>           wait for reclaiming
>>>                                 shrink_slab()
>>>                                   ...
>>>                                   shrink_icache_memory()
>>>                                     ...
>>>                                     btrfs_evict_inode()
>>>                                       ...
>>>                                       btrfs_search_slot()
>>>
>>> If the path is locked by task1, the deadlock happens.
>>
>> Ok. balance_pgdat() calls shrink_slab() with GFP_KERNEL, so it's still
>> possible for kswapd0 to call prune_icache(), no? I still see the lockdep
>> warning even with your patch that clears __GFP_FS in open_ctree().
>
> Sorry for my mistake. The above explanation is wrong; it has nothing to do
> with the kswapd thread. The correct explanation is:
>
>   Task1
>   open()
>     ...
>     btrfs_search_slot()
>       ...
>       btrfs_cow_block()
>         ...
>         alloc_page()
>           do_try_to_free_pages()
>             shrink_slab()
>               ...
>               shrink_icache_memory()
>                 ...
>                 btrfs_evict_inode()
>                   ...
>                   btrfs_search_slot()
>
> If the path is locked by task1, the deadlock happens.
>
> So it is impossible for balance_pgdat() to trigger the lockdep warning.
> (The changelog of my patch that clears __GFP_FS is also wrong.)
>
> I see. Besides the btree's page cache, the free space cache's page cache is
> also special and cannot use the __GFP_FS flag.

Ok, I've got your first patch already, I'll add a hunk for the free
space cache too.  Most of the allocations we're doing are explicitly
with GFP_NOFS, so it is just supporting allocations and readahead that
should be causing trouble.

Thanks!

-chris


[PATCH V2] btrfs: fix possible deadlock by clearing __GFP_FS flag

2011-03-27 Thread Miao Xie
Changelog V1 -> V2:
- modify the explanation of the deadlock.
- clear __GFP_FS flag in the free space's page cache.

Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause
deadlock.
  Task1
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
        ...
        alloc_page()
          ...
          do_try_to_free_pages()
            shrink_slab()
              ...
              shrink_icache_memory()
                ...
                btrfs_evict_inode()
                  ...
                  btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the page caches of the btree and of the free space cache are different
from a file's page cache: they cannot allocate pages with the
GFP_HIGHUSER_MOVABLE flag, and we must clear the __GFP_FS flag in their
i_mapping's flags.

Reported-by: Itaru Kitayama kitay...@cl.bb4u.ne.jp
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c          |    2 ++
 fs/btrfs/free-space-cache.c |    2 ++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..cf55fa0 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1632,6 +1632,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		goto fail_bdi;
 	}
 
+	fs_info->btree_inode->i_mapping->flags &= ~__GFP_FS;
+
 	INIT_RADIX_TREE(&fs_info->fs_roots_radix, GFP_ATOMIC);
 	INIT_LIST_HEAD(&fs_info->trans_list);
 	INIT_LIST_HEAD(&fs_info->dead_roots);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index a039065..57df380 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -88,6 +88,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
 	}
 	spin_unlock(&block_group->lock);
 
+	inode->i_mapping->flags &= ~__GFP_FS;
+
 	return inode;
 }
 
-- 
1.7.4


Re: [PATCH V2] btrfs: fix possible deadlock by clearing __GFP_FS flag

2011-03-27 Thread Chris Mason
Excerpts from Miao Xie's message of 2011-03-27 08:27:30 -0400:
> Changelog V1 -> V2:
> - modify the explanation of the deadlock.
> - clear __GFP_FS flag in the free space's page cache.
>
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index a039065..57df380 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -88,6 +88,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
>  	}
>  	spin_unlock(&block_group->lock);
>  
> +	inode->i_mapping->flags &= ~__GFP_FS;
> +
>  	return inode;
>  }
>  

I did this part slightly differently, in btrfs_read_locked_inode.  That
way we know the mask isn't changing while page allocations are taking
place.

-chris


Phishing (was: Account Verification)

2011-03-27 Thread Helmut Hullen
Hello, Webmail,

You wrote on 27.03.11:

> You have almost exceeded your webmail storage quota. To avoid account
> deletion, please click on the link below
>
> http://vwebtips.host-ed.net/Session_id_2011.htm

That's a phishing invitation.

Who can delete this spammer from the mailing list?

Best regards!
Helmut



Re: Phishing (was: Account Verification)

2011-03-27 Thread Calvin Walton
On Sun, 2011-03-27 at 18:22 +0200, Helmut Hullen wrote:
> You wrote on 27.03.11:
>
>> You have almost exceeded your webmail storage quota. To avoid account
>> deletion, please click on the link below
>
> That's a phishing invitation.
>
> Who can delete this spammer from the mailing list?

Like most (but not all) Linux development mailing lists, the linux-btrfs
mailing list is open for anyone to post, even if they're not subscribed.
There are filters on vger.kernel.org which catch a lot of the spam, but
not all of it; the reality is that some will slip through.

You just have to live with it: make sure you run your own spam filters
and are careful about links in mails.
-- 
Calvin Walton calvin.wal...@kepstin.ca



Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs

2011-03-27 Thread Chris Mason
Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
> Dear list,
> This is V4 of batched discard support, now we will get full mapping of
> the free space on each device for RAID0/1/10/DUP instead of just a single
> stripe length, and tested with xfstests 251, Thanks.

I've pushed this out into the for-linus branch, along with a full merge
to 2.6.39 current git.

Please take a look and make sure I've merged it correctly.

Thanks!

-chris

> Changelog V4:
> *make btrfs_map_block() return full mapping.
> Changelog V3:
> *fix style problems.
> *rebase to 2.6.38-rc7.
> Changelog V2:
> *Check if we have devices support trim before trying to trim the fs, also
>  adjust minlen according to the discard_granularity.
> *Update reserved extent calculations in btrfs_trim_block_group().
> *Call cond_resched() without checking need_resched()
> *Use bitmap_clear_bits() and unlink_free_space() instead of
>  btrfs_remove_free_space(), so we won't search the same extent twice.
> *Try harder in btrfs_discard_extent(), now we won't report errors
>  if it's not a EOPNOTSUPP.
> *make sure the block group is cached before trimming it, or we'll see an
>  empty caching tree if the block group is not cached.
> *Minor return value fix in btrfs_discard_block_group().


I cannot mount two (=all) btrfs volumes after crash while TOI suspending

2011-03-27 Thread jasiu

I used btrfs on / and /var.
After the first suspend on 2.6.38.1 yesterday (the screen was black and there
was no disk activity, but it responded to Magic SysRq [sync, remount-ro,
reboot]), the system cannot boot.

I copied the system from a 3-month-old backup to a new disk using only ext4,
so that I could have btrfs as a module, and here are the results of my
investigation:


path->slots[0] is 0,
and I observed that normally it should be >= 1 for root.

root-tree.c line 97, btrfs_find_last_root()
	if (path->slots[0] == 0) {
		ret = 1;
		goto out;
	}

Then find_and_setup_root() returns error:

disk-io.c   line 1026, find_and_setup_root()
	ret = btrfs_find_last_root(tree_root, objectid,
				   &root->root_item, &root->root_key);
	if (ret > 0)
		return -ENOENT;

...and so does open_ctree()

disk-io.c   line 1945, open_ctree()
	ret = find_and_setup_root(tree_root, fs_info,
				  BTRFS_EXTENT_TREE_OBJECTID, extent_root);
	if (ret)
		goto fail_tree_root;

What can I do now to mount my btrfs?
Here is log after trying to mount:

kernel: device label jroot devid 1 transid 25083 /dev/sdb3
kernel: btrfs: allowing degraded mounts
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: parent transid verify failed on 3057614848 wanted 25083 found 25080
kernel: btrfs: open_ctree failed

btrfsck from next btrfs-progs-unstable:
# ./btrfsck /dev/sdb3
using SB copy 1, bytenr 67108864
parent transid verify failed on 3057614848 wanted 25083 found 25080
parent transid verify failed on 3057614848 wanted 25083 found 25080
parent transid verify failed on 3057614848 wanted 25083 found 25080
btrfsck: disk-io.c:739: open_ctree_fd: Assertion `!(!tree_root->node)' failed.
Aborted

Same with /var, and using -s 0 -s 1.

What can I do to restore my data?
(I only wanted /etc and /var/log...)



Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs

2011-03-27 Thread Chris Mason
Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
> Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
>> Dear list,
>> This is V4 of batched discard support, now we will get full mapping of
>> the free space on each device for RAID0/1/10/DUP instead of just a single
>> stripe length, and tested with xfstests 251, Thanks.
>
> I've pushed this out into the for-linus branch, along with a full merge
> to 2.6.39 current git.
>
> Please take a look and make sure I've merged it correctly.

Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
compile at all on 32 bit machines.  I've fixed it up and pushed the
result out to for-linus.  Please check the math ;)
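
For anyone hitting the same build failure: on 32-bit kernels a plain `%` or
`/` on a 64-bit value makes gcc emit a call to a libgcc helper (such as
__umoddi3) that the kernel does not link, which is why kernel code goes
through do_div() and friends instead. The following is only a user-space
sketch of what such a helper has to compute, using shift-and-subtract long
division; sk_div64_32 is a hypothetical name, base is assumed to be non-zero,
and this is not the fix that was pushed to for-linus.

```c
#include <stdint.h>

/*
 * Divide *n by base in place and return the remainder, without using
 * the 64-bit '/' or '%' operators. Mirrors the calling convention of
 * do_div(): the quotient replaces *n, the remainder is returned.
 */
static uint32_t sk_div64_32(uint64_t *n, uint32_t base)
{
	uint64_t quot = 0;
	uint64_t rem = 0;
	int bit;

	for (bit = 63; bit >= 0; bit--) {
		/* Bring down the next bit of the dividend. */
		rem = (rem << 1) | ((*n >> bit) & 1);
		if (rem >= base) {
			rem -= base;
			quot |= (uint64_t)1 << bit;
		}
	}
	*n = quot;
	return (uint32_t)rem;
}
```

A stripe calculation that needs both "offset / stripe_len" and
"offset % stripe_len" can get them from one such call, which is the usual
reason the remainder is the return value.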

-chris

 
> Thanks!
>
> -chris
>
>> Changelog V4:
>> *make btrfs_map_block() return full mapping.
>> Changelog V3:
>> *fix style problems.
>> *rebase to 2.6.38-rc7.
>> Changelog V2:
>> *Check if we have devices support trim before trying to trim the fs, also
>>  adjust minlen according to the discard_granularity.
>> *Update reserved extent calculations in btrfs_trim_block_group().
>> *Call cond_resched() without checking need_resched()
>> *Use bitmap_clear_bits() and unlink_free_space() instead of
>>  btrfs_remove_free_space(), so we won't search the same extent twice.
>> *Try harder in btrfs_discard_extent(), now we won't report errors
>>  if it's not a EOPNOTSUPP.
>> *make sure the block group is cached before trimming it, or we'll see an
>>  empty caching tree if the block group is not cached.
>> *Minor return value fix in btrfs_discard_block_group().


Re: [PATCH V4 0/4] Btrfs: batched discard support for btrfs

2011-03-27 Thread Chris Mason
Excerpts from Chris Mason's message of 2011-03-27 21:30:20 -0400:
> Excerpts from Chris Mason's message of 2011-03-27 14:10:46 -0400:
>> Excerpts from Li Dongyang's message of 2011-03-24 06:24:24 -0400:
>>> Dear list,
>>> This is V4 of batched discard support, now we will get full mapping of
>>> the free space on each device for RAID0/1/10/DUP instead of just a single
>>> stripe length, and tested with xfstests 251, Thanks.
>>
>> I've pushed this out into the for-linus branch, along with a full merge
>> to 2.6.39 current git.
>>
>> Please take a look and make sure I've merged it correctly.
>
> Hmmm, this was doing mod operations on 64 bit numbers, so it didn't
> compile at all on 32 bit machines.  I've fixed it up and pushed the
> result out to for-linus.  Please check the math ;)

BTW, I just rebased this so the incremental fix was before merging into
Linus' tree.

-chris

 
> -chris
>
>> Thanks!
>>
>> -chris
>>
>>> Changelog V4:
>>> *make btrfs_map_block() return full mapping.
>>> Changelog V3:
>>> *fix style problems.
>>> *rebase to 2.6.38-rc7.
>>> Changelog V2:
>>> *Check if we have devices support trim before trying to trim the fs, also
>>>  adjust minlen according to the discard_granularity.
>>> *Update reserved extent calculations in btrfs_trim_block_group().
>>> *Call cond_resched() without checking need_resched()
>>> *Use bitmap_clear_bits() and unlink_free_space() instead of
>>>  btrfs_remove_free_space(), so we won't search the same extent twice.
>>> *Try harder in btrfs_discard_extent(), now we won't report errors
>>>  if it's not a EOPNOTSUPP.
>>> *make sure the block group is cached before trimming it, or we'll see an
>>>  empty caching tree if the block group is not cached.
>>> *Minor return value fix in btrfs_discard_block_group().


[RFC][PATCH] Btrfs: Fix uninitialized root flags for subvolumes

2011-03-27 Thread Li Zefan
root_item->flags and root_item->byte_limit are not initialized when
a subvolume is created. This bug was not revealed until we added
readonly snapshot support - now you mount a btrfs filesystem and you
may find the subvolumes in it are readonly.

To work around this problem, we steal a bit from root_item->inode_item->flags,
and use it to indicate if those fields have been properly initialized.
When we read a tree root from disk, we check if the bit is set, and if
not we'll set the flag and initialize the two fields of the root item.
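
The check-and-initialize step described above can be tried out in user space
with the sketch below. It is a simplified, host-endian model: sk_root_item
and SK_ROOT_ITEM_INIT are hypothetical stand-ins for the on-disk root item
and for the stolen inode flag bit, and the return value is added only to make
the behaviour observable.

```c
#include <stdint.h>

/* Hypothetical marker bit, standing in for the stolen inode flag. */
#define SK_ROOT_ITEM_INIT (UINT64_C(1) << 31)

/* Simplified stand-in for the root item (no endian conversion). */
struct sk_root_item {
	uint64_t inode_flags;  /* models root_item->inode.flags */
	uint64_t flags;        /* may hold garbage on old filesystems */
	uint64_t byte_limit;   /* may hold garbage on old filesystems */
};

/* Returns 1 if the fields had to be reset, 0 if they were already valid. */
static int sk_check_and_init_root_item(struct sk_root_item *item)
{
	if (item->inode_flags & SK_ROOT_ITEM_INIT)
		return 0;  /* written by a fixed kernel: trust the fields */

	item->inode_flags |= SK_ROOT_ITEM_INIT;
	item->flags = 0;
	item->byte_limit = 0;
	return 1;
}
```

Because the marker is set the first time the item is seen, a later readonly
flag written by a fixed kernel survives subsequent reads.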

Reported-by: Andreas Philipp philipp.andr...@gmail.com
Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/ctree.h       |    4 ++++
 fs/btrfs/disk-io.c     |    4 +++-
 fs/btrfs/ioctl.c       |    4 ++++
 fs/btrfs/root-tree.c   |   18 ++++++++++++++++++
 fs/btrfs/transaction.c |    1 +
 5 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..ff6b991 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1284,6 +1284,8 @@ struct btrfs_root {
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
 
+#define BTRFS_INODE_ROOT_ITEM_INIT	(1 << 31)
+
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
  * one for u8:
@@ -2355,6 +2357,8 @@ int btrfs_find_dead_roots(struct btrfs_root *root, u64 objectid);
 int btrfs_find_orphan_roots(struct btrfs_root *tree_root);
 int btrfs_set_root_node(struct btrfs_root_item *item,
 			struct extent_buffer *node);
+void btrfs_check_and_init_root_item(struct btrfs_root_item *item);
+
 /* dir-item.c */
 int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root, const char *name,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..4f8dafc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1184,8 +1184,10 @@ struct btrfs_root *btrfs_read_fs_root_no_radix(struct btrfs_root *tree_root,
 	root->commit_root = btrfs_root_node(root);
 	BUG_ON(!root->node);
 out:
-	if (location->objectid != BTRFS_TREE_LOG_OBJECTID)
+	if (location->objectid != BTRFS_TREE_LOG_OBJECTID) {
 		root->ref_cows = 1;
+		btrfs_check_and_init_root_item(&root->root_item);
+	}
 
 	return root;
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5fdb2ab..2ff51e6 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -294,6 +294,10 @@ static noinline int create_subvol(struct btrfs_root *root,
 	inode_item->nbytes = cpu_to_le64(root->leafsize);
 	inode_item->mode = cpu_to_le32(S_IFDIR | 0755);
 
+	root_item.flags = 0;
+	root_item.byte_limit = 0;
+	inode_item->flags = cpu_to_le64(BTRFS_INODE_ROOT_ITEM_INIT);
+
 	btrfs_set_root_bytenr(&root_item, leaf->start);
 	btrfs_set_root_generation(&root_item, trans->transid);
 	btrfs_set_root_level(&root_item, 0);
diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
index 6a1086e..3e45c32 100644
--- a/fs/btrfs/root-tree.c
+++ b/fs/btrfs/root-tree.c
@@ -471,3 +471,21 @@ again:
 	btrfs_free_path(path);
 	return 0;
 }
+
+/*
+ * Old btrfs forgets to init root_item->flags and root_item->byte_limit
+ * for subvolumes. To work around this problem, we steal a bit from
+ * root_item->inode_item->flags, and use it to indicate if those fields
+ * have been properly initialized.
+ */
+void btrfs_check_and_init_root_item(struct btrfs_root_item *root_item)
+{
+	u64 inode_flags = le64_to_cpu(root_item->inode.flags);
+
+	if (!(inode_flags & BTRFS_INODE_ROOT_ITEM_INIT)) {
+		inode_flags |= BTRFS_INODE_ROOT_ITEM_INIT;
+		root_item->inode.flags = cpu_to_le64(inode_flags);
+		root_item->flags = 0;
+		root_item->byte_limit = 0;
+	}
+}
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3d73c8d..f3d6681 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -970,6 +970,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
 	memcpy(new_root_item, &root->root_item, sizeof(*new_root_item));
+	btrfs_check_and_init_root_item(new_root_item);
 
 	root_flags = btrfs_root_flags(new_root_item);
 	if (pending->readonly)
-- 
1.7.3.1