Re: [PATCH] Btrfs: use do_div to avoid compile errors on 32bit box

2011-08-19 Thread liubo
On 08/19/2011 09:22 PM, Josef Bacik wrote:
 On Fri, Aug 19, 2011 at 05:48:44PM +0800, Liu Bo wrote:
 When dividing a u64 value, we need to be careful and use do_div
 to avoid a build error on 32-bit boxes:

 ERROR: __udivdi3 [fs/btrfs/btrfs.ko] undefined!
 make[1]: *** [__modpost] Error 1

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 
 Chris just left for vacation, can you send this to Linus/lkml so it gets pulled
 in.  Thanks,
 

Already done.

thanks,
liubo

 Josef
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 



Re: [PATCH] Btrfs: use do_div to avoid compile errors on 32bit box

2011-08-19 Thread liubo
On 08/20/2011 09:34 AM, Liu Bo wrote:
 When dividing a u64 value, we need to be careful and use do_div
 to avoid a build error on 32-bit boxes:
 
 ERROR: __udivdi3 [fs/btrfs/btrfs.ko] undefined!
 make[1]: *** [__modpost] Error 1
 

Sorry, guys, I just sent the wrong version.

Please ignore this one.  I'm sorry.

thanks,
liubo

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/extent-tree.c |    4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 80d6148..9b495ce 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -6796,14 +6796,14 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
   index = get_block_group_index(block_group);
   if (index == 0) {
   dev_min = 4;
 - min_free /= 2;
 + do_div(min_free, 2);
   } else if (index == 1) {
   dev_min = 2;
   } else if (index == 2) {
   min_free *= 2;
   } else if (index == 3) {
   dev_min = fs_devices->rw_devices;
 - min_free /= dev_min;
 + do_div(min_free, dev_min);
   }
  
   mutex_lock(&root->fs_info->chunk_mutex);
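
For readers unfamiliar with the error quoted above: on 32-bit builds gcc turns a plain 64-bit division into a call to the libgcc helper __udivdi3, which the kernel does not link against, hence the modpost failure. The kernel's do_div() divides a 64-bit lvalue in place by a 32-bit divisor and evaluates to the remainder. The following is only a userspace sketch of that calling convention; do_div_sketch is a hypothetical stand-in, not the kernel macro, and it takes a pointer where the real macro takes the lvalue itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the do_div() convention:
 * the 64-bit dividend is updated in place with the quotient and
 * the 32-bit remainder is returned.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem;

    assert(base != 0);              /* dividing by zero is a caller bug */
    rem = (uint32_t)(*n % base);    /* remainder fits in 32 bits */
    *n /= base;                     /* quotient replaces the dividend */
    return rem;
}
```

The point to notice is that the result is not the return value: a caller that writes `x = do_div(x, 2)` would store the remainder, which is why the patch calls do_div() as a statement.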



Re: [PATCH] Btrfs: fix an oops of log replay

2011-08-16 Thread liubo
On 08/08/2011 11:13 PM, Andy Lutomirski wrote:
 On 08/06/2011 04:35 AM, Liu Bo wrote:
 When btrfs recovers from a crash, it may hit the oops below:

 [ cut here ]
 kernel BUG at fs/btrfs/inode.c:4580!
 [...]
 RIP: 0010:[a03df251]  [a03df251]
 btrfs_add_link+0x161/0x1c0 [btrfs]
 [...]
 Call Trace:
   [a03e7b31] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
   [a04054e9] add_inode_ref+0x319/0x3f0 [btrfs]
   [a0407087] replay_one_buffer+0x2c7/0x390 [btrfs]
   [a040444a] walk_down_log_tree+0x32a/0x480 [btrfs]
   [a0404695] walk_log_tree+0xf5/0x240 [btrfs]
   [a0406cc0] btrfs_recover_log_trees+0x250/0x350 [btrfs]
   [a0406dc0] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
   [a03d18b2] open_ctree+0x1442/0x17d0 [btrfs]
 [...]

 This happens because, while replaying an inode ref item, we forget to
 check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree,
 so we run into conflicts that trigger the BUG_ON().

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
   fs/btrfs/tree-log.c |   28 
   1 files changed, 24 insertions(+), 4 deletions(-)
 
 This fixes the oops for me.  The bug was a regression in 2.6.39, I believe.
 
 Tested-by: Andy Lutomirski l...@mit.edu
 

Thanks a lot for testing!

thanks,
liubo

 --Andy
 



Re: [PATCH] Btrfs: skip looking for delalloc if we don't have ->fill_delalloc

2011-08-01 Thread liubo
On 08/02/2011 12:11 AM, Josef Bacik wrote:
 We always look for delalloc bytes in our io_tree so we can fill in delalloc.
 This is fine in most cases, but if we're writing out the btree_inode this is
 just a superfluous tree search on the io_tree, and if we have a lot of metadata
 dirty this could be an expensive check.  So instead check to see if our io_tree
 has a ->fill_delalloc op, and if not don't even bother doing the lookup.
 Thanks,
 
 Signed-off-by: Josef Bacik jo...@redhat.com
 ---

With the patch,

mkfs.btrfs /dev/sda15
mount /dev/sda15 /mnt/btrfs
dd if=/dev/zero of=/mnt/btrfs/tmp bs=1G

I then hit the following bug:

Btrfs loaded
device fsid 91d23288-d352-4346-979f-d6f93cac04a3 devid 1 transid 7 /dev/sda15
[ cut here ]
kernel BUG at fs/btrfs/inode.c:1583!
...
Call Trace:
 [a05b00d8] worker_loop+0x138/0x510 [btrfs]
 [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 [81074f06] kthread+0x96/0xa0
 [81467bf4] kernel_thread_helper+0x4/0x10
 [81074e70] ? kthread_worker_fn+0x1a0/0x1a0
 [81467bf0] ? gs_change+0xb/0xb
Code: e0 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 7d b8 48 8d 4d c8 
41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 19 cf 01 00 eb bd 0f 0b eb fe 48 89 df 
e8 1b 48 b6 e0 eb 9d 66 0f 1f 84 00 00 00 
RIP  [a0587f59] btrfs_writepage_fixup_worker+0x139/0x150 [btrfs]
 RSP 88000887bdd0
---[ end trace 5089b598ce74fcfc ]---

thanks,
liubo
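
The check described in the quoted patch can be illustrated with a small userspace sketch. Everything here is hypothetical (the struct names, the lookup counter, and example_fill_delalloc are invented for illustration and are not the btrfs definitions); it only shows the idea of skipping the tree search when the ops table has no fill_delalloc hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for an io_tree and its ops table. */
struct io_tree_ops_sketch {
    int (*fill_delalloc)(unsigned long start, unsigned long end);
};

struct io_tree_sketch {
    const struct io_tree_ops_sketch *ops;
    unsigned long lookups;      /* counts delalloc searches actually done */
};

/* Placeholder for the potentially expensive delalloc range search. */
static bool find_delalloc_range_sketch(struct io_tree_sketch *tree)
{
    tree->lookups++;
    return false;               /* pretend nothing is pending */
}

/* Search only when there is a hook that could consume the result. */
static bool maybe_find_delalloc(struct io_tree_sketch *tree)
{
    assert(tree != NULL);
    if (tree->ops == NULL || tree->ops->fill_delalloc == NULL)
        return false;           /* e.g. a metadata tree: skip the lookup */
    return find_delalloc_range_sketch(tree);
}

/* Example hook so a data-inode-like tree can be constructed. */
static int example_fill_delalloc(unsigned long start, unsigned long end)
{
    (void)start;
    (void)end;
    return 0;
}
```

A tree whose ops table leaves fill_delalloc NULL never reaches the search, while one that sets it (for instance to example_fill_delalloc) does.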



Re: [PATCH] Btrfs: skip looking for delalloc if we don't have ->fill_delalloc

2011-08-01 Thread liubo
On 08/02/2011 09:32 AM, liubo wrote:
 On 08/02/2011 12:11 AM, Josef Bacik wrote:
 We always look for delalloc bytes in our io_tree so we can fill in delalloc.
 This is fine in most cases, but if we're writing out the btree_inode this is
 just a superfluous tree search on the io_tree, and if we have a lot of metadata
 dirty this could be an expensive check.  So instead check to see if our io_tree
 has a ->fill_delalloc op, and if not don't even bother doing the lookup.
 Thanks,

 Signed-off-by: Josef Bacik jo...@redhat.com
 ---
 

sorry, I mixed the patch with others...

The patch is ok.

 With the patch,
 
 mkfs.btrfs /dev/sda15
 mount /dev/sda15 /mnt/btrfs
 dd if=/dev/zero of=/mnt/btrfs/tmp bs=1G
 
 I then hit the following bug:
 
 Btrfs loaded
 device fsid 91d23288-d352-4346-979f-d6f93cac04a3 devid 1 transid 7 /dev/sda15
 [ cut here ]
 kernel BUG at fs/btrfs/inode.c:1583!
 ...
 Call Trace:
  [a05b00d8] worker_loop+0x138/0x510 [btrfs]
  [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
  [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
  [81074f06] kthread+0x96/0xa0
  [81467bf4] kernel_thread_helper+0x4/0x10
  [81074e70] ? kthread_worker_fn+0x1a0/0x1a0
  [81467bf0] ? gs_change+0xb/0xb
 Code: e0 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 7d b8 48 8d 4d c8 
 41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 19 cf 01 00 eb bd 0f 0b eb fe 48 89 
 df e8 1b 48 b6 e0 eb 9d 66 0f 1f 84 00 00 00 
 RIP  [a0587f59] btrfs_writepage_fixup_worker+0x139/0x150 [btrfs]
  RSP 88000887bdd0
 ---[ end trace 5089b598ce74fcfc ]---
 
 thanks,
 liubo
 
 



Re: [PATCH] Btrfs: don't be as agressive with delalloc metadata reservations

2011-07-17 Thread liubo
On 07/16/2011 02:29 AM, Josef Bacik wrote:
 Currently we reserve enough space to COW an entirely full btree for every extent
 we have reserved for an inode.  This _sucks_, because you only need to COW once,
 and then everybody else is ok.  Unfortunately we don't know we'll all be able to
 get into the same transaction so that's what we have had to do.  But the global
 reserve holds a reservation large enough to cover a large percentage of all the
 metadata currently in the fs.  So all we really need to account for is any new
 blocks that we may allocate.  So fix this by
 
 1) Passing to btrfs_alloc_free_block() whether this is a new block or a COW
 block.  If it is a COW block we use the global reserve, if not we use the
 trans->block_rsv.
 2) Reduce the amount of space we reserve.  Since we don't need to account for
 cow'ing the tree we can just keep track of new blocks to reserve, which greatly
 reduces the reservation amount.
 
 This makes my basic random write test go from 3 mb/s to 75 mb/s.  I've tested
 this with my horrible ENOSPC test and it seems to work out fine.  Thanks,
 

Hi, Josef,

After applying this patch and running tar xf source.tar, I got lots of warnings.

Could you look into this?

[ cut here ]
WARNING: at fs/btrfs/extent-tree.c:5695 btrfs_alloc_free_block+0x178/0x340 
[btrfs]()
Hardware name: QiTianM7150
Modules linked in: btrfs iptable_nat nf_nat zlib_deflate libcrc32c ebtable_nat 
ebtables bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf be2iscsi 
iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio iscsi_tcp 
libiscsi_tcp libiscsi scsi_transport_iscsi ext3 jbd dm_mirror dm_region_hash 
dm_log dm_mod sg ppdev serio_raw pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support 
sky2 parport_pc parport ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi 
ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last 
unloaded: btrfs]
Pid: 16008, comm: umount Tainted: GW   2.6.39+ #9
Call Trace:
 [81053baf] warn_slowpath_common+0x7f/0xc0
 [81053c0a] warn_slowpath_null+0x1a/0x20
 [a04d37d8] btrfs_alloc_free_block+0x178/0x340 [btrfs]
 [a0501768] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [a04be625] __btrfs_cow_block+0x155/0x5f0 [btrfs]
 [a04bebcb] btrfs_cow_block+0x10b/0x240 [btrfs]
 [a04c4c8e] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [a04d2399] btrfs_write_dirty_block_groups+0x1a9/0x4d0 [btrfs]
 [a0512e20] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [a04df845] commit_cowonly_roots+0x105/0x1e0 [btrfs]
 [a04e0708] btrfs_commit_transaction+0x428/0x850 [btrfs]
 [a04df9b8] ? wait_current_trans+0x28/0x100 [btrfs]
 [a04e0c25] ? join_transaction+0x25/0x250 [btrfs]
 [81075590] ? wake_up_bit+0x40/0x40
 [a04bb187] btrfs_sync_fs+0x67/0xd0 [btrfs]
 [8116c27e] __sync_filesystem+0x5e/0x90
 [8116c38b] sync_filesystem+0x4b/0x70
 [811441c4] generic_shutdown_super+0x34/0xf0
 [81144316] kill_anon_super+0x16/0x60
 [81144a25] deactivate_locked_super+0x45/0x70
 [8114568a] deactivate_super+0x4a/0x70
 [8115efdc] mntput_no_expire+0x13c/0x1c0
 [8115f7bb] sys_umount+0x7b/0x3a0
 [81466b2b] system_call_fastpath+0x16/0x1b
---[ end trace 9a65800674b03b84 ]---


thanks,
liubo


 Signed-off-by: Josef Bacik jo...@redhat.com
 ---
  fs/btrfs/ctree.c       |   10 +-
  fs/btrfs/ctree.h       |    5 ++---
  fs/btrfs/disk-io.c     |    3 ++-
  fs/btrfs/extent-tree.c |   20 +++-
  fs/btrfs/ioctl.c       |    2 +-
  5 files changed, 25 insertions(+), 15 deletions(-)
 
 diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
 index 2e66786..fbd48e9 100644
 --- a/fs/btrfs/ctree.c
 +++ b/fs/btrfs/ctree.c
 @@ -206,7 +206,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
  
   cow = btrfs_alloc_free_block(trans, root, buf->len, 0,
        new_root_objectid, &disk_key, level,
 -      buf->start, 0);
 +      buf->start, 0, 1);
   if (IS_ERR(cow))
   return PTR_ERR(cow);
  
 @@ -412,7 +412,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
  
   cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
        root->root_key.objectid, &disk_key,
 -      level, search_start, empty_size);
 +      level, search_start, empty_size, 0);
   if (IS_ERR(cow))
   return PTR_ERR(cow);
  
 @@ -1985,7 +1985,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
  
   c = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
        root->root_key.objectid, &lower_key,
 -      level, root->node->start, 0);
 +      level, root->node->start, 0, 1

Re: [GIT PULL v4] Btrfs: improve write ahead log with sub transaction

2011-06-30 Thread liubo
On 06/30/2011 03:36 PM, Liu Bo wrote:
 I've been working to try to improve the write-ahead log's performance,
 and I found that the bottleneck lies in the checksum items,
 especially when we want to make a random write on a large file, e.g. a 4G file.
 
 An idea for this, suggested by Chris, is to use sub transaction ids and to log
 only the part of the inode that has changed since either the last log commit or
 the last transaction commit.  And as we also push the sub transid into the btree
 blocks, we'll get much faster tree walks.  As a result, we abandon the original
 brute force approach, which is to delete all items of the inode in the log
 to make sure we get the most up-to-date copies of everything, and instead
 we find and merge, i.e. we find extents in the log tree and merge
 in the new extents from the file.
 
 This patchset puts the above idea into code, and although the code is now more
 complex, it brings us a great deal of performance improvement:
 

This is also available in

git://repo.or.cz/linux-btrfs-devel.git sub-trans


thanks,
liubo

 in my sysbench write + fsync test:
 
 451.01Kb/sec -> 4.3621Mb/sec
 
 Also, I've run the synctest, and it works well with both directory and file.
 
 v1->v2, rebase.
 v2->v3, thanks to Chris, we worked together to solve 2 bugs, and after that it
 worked as expected.
 v3->v4, thanks to Josef, we simplified several pieces of code.
  
 Liu Bo (12):
   Btrfs: introduce sub transaction stuff
   Btrfs: update block generation if should_cow_block fails
   Btrfs: modify btrfs_drop_extents API
   Btrfs: introduce first sub trans
   Btrfs: still update inode trans stuff when size remains unchanged
   Btrfs: improve log with sub transaction
   Btrfs: add checksum check for log
   Btrfs: fix a bug of log check
   Btrfs: kick off useless code
   Btrfs: use the right generation number to read log_root_tree
   Btrfs: do not iput inode when inode is still in log
   Revert "Btrfs: do not flush csum items of unchanged file data during treelog"
 
  fs/btrfs/btrfs_inode.h |   12 ++-
  fs/btrfs/ctree.c       |   69 +++
  fs/btrfs/ctree.h       |    5 +-
  fs/btrfs/disk-io.c     |   12 ++--
  fs/btrfs/extent-tree.c |   10 ++-
  fs/btrfs/file.c        |   22 ++---
  fs/btrfs/inode.c       |   39 ++---
  fs/btrfs/ioctl.c       |    6 +-
  fs/btrfs/relocation.c  |    6 +-
  fs/btrfs/transaction.c |   13 ++-
  fs/btrfs/transaction.h |   19 -
  fs/btrfs/tree-defrag.c |    2 +-
  fs/btrfs/tree-log.c    |  225 
  13 files changed, 293 insertions(+), 147 deletions(-)
 
 



Re: [PATCH 1/2 v2] Btrfs: kill location key of in-memory inode

2011-06-27 Thread liubo
ping?

On 06/20/2011 10:59 AM, Liu Bo wrote:
 In btrfs's in-memory inode, there is a btrfs_key which has the structure:
 - key.objectid
 - key.type
 - key.offset
 
 However, we only ever use key.objectid (to search, to check, and so on),
 so to reduce the in-memory inode size I keep just what is valuable.
 
 v1->v2: update a more proper typo for inode number (thanks to David).
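
The idea in the description above, storing only the inode number and rebuilding the full key at the few places that need one, can be sketched in userspace like this. The names key_sketch, INODE_ITEM_KEY_SKETCH and inode_location_from_ino are hypothetical; the fields mirror the three key members listed above, and the rebuilt values (objectid = ino, offset = 0, type = inode item) follow what the btrfs_read_locked_inode hunk of the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the inode-item key type constant (hypothetical name). */
#define INODE_ITEM_KEY_SKETCH 1

/* Simplified three-field key, mirroring objectid/type/offset. */
struct key_sketch {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Rebuild the lookup key on demand from the stored inode number. */
static struct key_sketch inode_location_from_ino(uint64_t ino)
{
    struct key_sketch location;

    location.objectid = ino;
    location.type = INODE_ITEM_KEY_SKETCH;
    location.offset = 0;
    return location;
}

/* Usage example: the rebuilt key carries the values a stored key would. */
static void inode_location_example(void)
{
    struct key_sketch k = inode_location_from_ino(257);

    assert(k.objectid == 257);
    assert(k.type == INODE_ITEM_KEY_SKETCH);
    assert(k.offset == 0);
}
```

The trade-off is a few extra assignments at lookup time in exchange for carrying a single u64 per in-memory inode instead of a whole key.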
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/btrfs_inode.h |   10 --
  fs/btrfs/disk-io.c     |    3 +--
  fs/btrfs/export.c      |    2 +-
  fs/btrfs/extent-tree.c |    2 +-
  fs/btrfs/inode.c       |   48 +---
  5 files changed, 36 insertions(+), 29 deletions(-)
 
 diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
 index 52d7eca..9f1bbf2 100644
 --- a/fs/btrfs/btrfs_inode.h
 +++ b/fs/btrfs/btrfs_inode.h
 @@ -29,11 +29,6 @@ struct btrfs_inode {
   /* which subvolume this inode belongs to */
   struct btrfs_root *root;
  
 - /* key used to find this inode on disk.  This is used by the code
 -  * to read in roots of subvolumes
 -  */
 - struct btrfs_key location;
 -
   /* the extent_tree has caches of all the extent mappings to disk */
   struct extent_map_tree extent_tree;
  
 @@ -72,6 +67,9 @@ struct btrfs_inode {
   /* the space_info for where this inode's data allocations are done */
   struct btrfs_space_info *space_info;
  
 + /* full 64 bit inode number */
 + u64 ino;
 +
   /* full 64 bit generation number, struct vfs_inode doesn't have a big
* enough field for this.
*/
 @@ -171,7 +169,7 @@ static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
  
  static inline u64 btrfs_ino(struct inode *inode)
  {
 - u64 ino = BTRFS_I(inode)->location.objectid;
 + u64 ino = BTRFS_I(inode)->ino;
  
   if (ino <= BTRFS_FIRST_FREE_OBJECTID)
   ino = inode->i_ino;
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index a203d36..06c9b18 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1693,9 +1693,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
  
   BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
  
 + BTRFS_I(fs_info->btree_inode)->ino = BTRFS_BTREE_INODE_OBJECTID;
   BTRFS_I(fs_info->btree_inode)->root = tree_root;
 - memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
 -        sizeof(struct btrfs_key));
   BTRFS_I(fs_info->btree_inode)->dummy_inode = 1;
   insert_inode_hash(fs_info->btree_inode);
  
 diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
 index 1b8dc33..b60c118 100644
 --- a/fs/btrfs/export.c
 +++ b/fs/btrfs/export.c
 @@ -43,7 +43,7 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
   spin_lock(&dentry->d_lock);
  
   parent = dentry->d_parent->d_inode;
 - fid->parent_objectid = BTRFS_I(parent)->location.objectid;
 + fid->parent_objectid = BTRFS_I(parent)->ino;
   fid->parent_gen = parent->i_generation;
   parent_root_id = BTRFS_I(parent)->root->objectid;
  
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 5b9b6b6..f3d1230 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -3037,7 +3037,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
   bytes = (bytes + root->sectorsize - 1) & ~((u64)root->sectorsize - 1);
  
   if (root == root->fs_info->tree_root ||
 - BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID) {
 + BTRFS_I(inode)->ino == BTRFS_FREE_INO_OBJECTID) {
   alloc_chunk = 0;
   committed = 1;
   }
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 02ff4a1..bbe4cdc 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -754,7 +754,7 @@ static inline bool is_free_space_inode(struct btrfs_root *root,
  struct inode *inode)
  {
   if (root == root->fs_info->tree_root ||
 - BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID)
 + BTRFS_I(inode)->ino == BTRFS_FREE_INO_OBJECTID)
   return true;
   return false;
  }
 @@ -2513,7 +2513,10 @@ static void btrfs_read_locked_inode(struct inode *inode)
   path = btrfs_alloc_path();
   BUG_ON(!path);
   path->leave_spinning = 1;
 - memcpy(&location, &BTRFS_I(inode)->location, sizeof(location));
 +
 + location.objectid = BTRFS_I(inode)->ino;
 + location.offset = 0;
 + btrfs_set_key_type(&location, BTRFS_INODE_ITEM_KEY);
  
   ret = btrfs_lookup_inode(NULL, root, path, &location, 0);
   if (ret)
 @@ -2667,6 +2670,7 @@ noinline int btrfs_update_inode(struct 
 btrfs_trans_handle *trans,
   struct btrfs_inode_item *inode_item;
   struct btrfs_path *path;
   struct extent_buffer *leaf;
 + struct btrfs_key location;
   int ret;
  
   /*
 @@ -2687,8 +2691,12 @@ noinline int btrfs_update_inode(struct 
 

Re: [PATCH 10/12 v3] Btrfs: deal with EEXIST after iput

2011-06-21 Thread liubo
On 06/21/2011 10:00 PM, Josef Bacik wrote:
 On 06/21/2011 04:49 AM, Liu Bo wrote:
 There are two cases when BTRFS_I(inode)->logged_trans is zero:
 a) an inode is just allocated;
 b) iput an inode and reread it.

 However, in b), if btrfs has not committed yet, the inode _may_ still
 remain in the log tree.

 So we need to check the log tree to give logged_trans the right value
 in case it hits an EEXIST while logging.
 
 Instead of doing this why not just check and see if the inode has been
 logged but the transaction has not yet been committed in
 btrfs_drop_inode?  That way the inode doesn't get evicted from cache
 until after we know it's ok and that way we don't have to waste a tree
 lookup.  Thanks,
 

Good idea, I'll follow it.

thanks,
liubo

 Josef
 



Re: [PATCH 1/2] Btrfs: kill location key of in-memory inode

2011-06-19 Thread liubo
, BTRFS_INODE_ITEM_KEY);
 -
  btrfs_inherit_iflags(inode, dir);
  
  if ((mode & S_IFREG)) {
 @@ -7029,7 +7039,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
  new_inode->i_ctime = CURRENT_TIME;
  if (unlikely(btrfs_ino(new_inode) ==
   BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
 -root_objectid = BTRFS_I(new_inode)->location.objectid;
 +root_objectid = BTRFS_I(new_inode)->inode_id;
 
 direct assignment, no btrfs_ino as in the first hunk

This is a special case, where new_inode->i_ino is
BTRFS_EMPTY_SUBVOL_DIR_OBJECTID,
while BTRFS_I(new_inode)->location.objectid is 256.

Thanks for the reviewing!

liubo
thanks,

 
  ret = btrfs_unlink_subvol(trans, dest, new_dir,
  root_objectid,
  new_dentry->d_name.name,
 -- 
 
 
 david
 



Re: [PATCH 00/11 v2] Btrfs: improve write ahead log with sub transaction

2011-06-09 Thread liubo
On 06/10/2011 08:40 AM, David Sterba wrote:
 Hi,
 
 is it possible to refresh this patchset and resend? I'd like to enroll
 it and give it some review and testing. So far I have seen notions and
 use of trans_mutex, which has been removed.
 

Sure, thanks for the interest.

Yes, I've noticed the trans_mutex thing, but I'm afraid I have to put this off
until next week, because there is a btrfs fi bal bug still on my schedule.

thanks,
liubo 

 
 thanks,
 david
 



[PATCH] Btrfs: check root_key's offset instead

2011-06-08 Thread liubo
When we use a reloc root to COW or copy a tree block, we do not set the block's
owner; instead we set BTRFS_HEADER_FLAG_RELOC in its header flags.

So here we should check root_key's offset instead.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5b9b6b6..0bda273 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6160,7 +6160,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 if (wc->flags[level + 1] & BTRFS_BLOCK_FLAG_FULL_BACKREF)
 parent = path->nodes[level + 1]->start;
 else
-   BUG_ON(root->root_key.objectid !=
+   BUG_ON(root->root_key.offset !=
   btrfs_header_owner(path->nodes[level + 1]));
 }
 
-- 
1.6.5.2




Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-07 Thread liubo
On 06/07/2011 04:24 PM, Tsutomu Itoh wrote:
 (2011/06/07 15:17), Tsutomu Itoh wrote:
 (2011/06/07 14:59), Tsutomu Itoh wrote:
 Hi liubo,

 (2011/06/07 14:31), liubo wrote:
 On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
 Hi,

 I encountered following panic using 'btrfs-unstable + for-linus'
 kernel.

 I ran btrfs fi bal /test5 command, and mount option of /test5
 is as follows:

  /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)

 So, just a btrfs fi bal would lead to the bug?
 I think so.

 I've figured out the warnings, but not reproduced the bug yet...
 I used 'btrfs-unstable + for-linus whose top commit is

 commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
 Author: David Sterba dste...@suse.cz
 Date:   Fri Jun 3 16:29:08 2011 +0200

 btrfs: fix uninitialized variable warning
 It's the same as my environment.

 
 and tried on 1) a single disk, 2) 2 disks and 3) 4 disks respectively,
 but none of them led to the below bug.
 The test script and the volume composition that I am using are the
 same as in the following mail.

   http://marc.info/?l=linux-btrfs&m=130680171426371&w=2

 and, in my environment, a panic occurs within about 30 minutes when
 the test script is executed.
 
 I forgot to write.
 I am adding '-o inode_cache' to the mount option in my test script.
 

Yep, I've added this and reproduced it.
Seems that there are several bugs.

Anyway, thanks for the report.  I'm trying to work it out. :)

thanks,
liubo

 Another panic occurred when I executed it again.

 
 I rebuilt the kernel with 3.0-rc2, but the same problem occurred.
 
 
 4[  131.708325] WARNING: at fs/btrfs/transaction.c:213 
 start_transaction+0x74/0x259 [btrfs]()
 4[  131.708329] Hardware name: PRIMERGY
 4[  131.708330] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
 cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
 libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
 parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support 
 tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
 crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
 scsi_mod floppy [last unloaded: microcode]
 4[  131.708378] Pid: 3041, comm: btrfs Not tainted 3.0.0-rc2test #1
 4[  131.708381] Call Trace:
 4[  131.708388]  [8104514a] warn_slowpath_common+0x85/0x9d
 4[  131.708392]  [8104517c] warn_slowpath_null+0x1a/0x1c
 4[  131.708410]  [a02a6f8b] start_transaction+0x74/0x259 [btrfs]
 4[  131.708430]  [a02bf965] ? btrfs_wait_ordered_range+0xf9/0x11d 
 [btrfs]
 4[  131.708448]  [a02a73ed] btrfs_start_transaction+0x13/0x15 
 [btrfs]
 4[  131.708467]  [a02aec08] btrfs_evict_inode+0x113/0x22d [btrfs]
 4[  131.708471]  [81123a98] evict+0x77/0x118
 4[  131.708475]  [81123ec1] iput+0x13d/0x146
 4[  131.708489]  [a02939c9] btrfs_remove_block_group+0x14d/0x35b 
 [btrfs]
 4[  131.708508]  [a02c6ff7] btrfs_relocate_chunk+0x464/0x50d 
 [btrfs]
 4[  131.708527]  [a02c54ce] ? btrfs_item_key_to_cpu+0x2a/0x46 
 [btrfs]
 4[  131.708545]  [a02c7672] btrfs_balance+0x1ca/0x219 [btrfs]
 4[  131.708563]  [a02cfbfd] btrfs_ioctl+0x890/0xb87 [btrfs]
 4[  131.708567]  [810e87c8] ? handle_mm_fault+0x233/0x24a
 4[  131.708572]  [813a6e25] ? do_page_fault+0x340/0x3b2
 4[  131.708577]  [8111d6f8] do_vfs_ioctl+0x474/0x4c3
 4[  131.708581]  [810ffd25] ? virt_to_head_page+0xe/0x31
 4[  131.708585]  [81100fcc] ? kmem_cache_free+0x20/0xae
 4[  131.708588]  [8111d79d] sys_ioctl+0x56/0x79
 4[  131.708592]  [813aa542] system_call_fastpath+0x16/0x1b
 4[  131.708595] ---[ end trace 5f962f46d3ba5425 ]---
 6[  131.708777] btrfs: relocating block group 29360128 flags 20
 6[  132.385682] btrfs: found 85 extents
 0[  132.798892] [ cut here ]
 2[  132.799014] kernel BUG at fs/btrfs/extent-tree.c:1424!
 0[  132.799014] invalid opcode:  [#1] SMP
 4[  132.799014] CPU 0
 4[  132.799014] Modules linked in: autofs4 sunrpc 8021q garp stp llc 
 cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c 
 libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev 
 parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support 
 tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod 
 crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata 
 scsi_mod floppy [last unloaded: microcode]
 4[  132.799014]
 4[  132.799014] Pid: 3041, comm: btrfs Tainted: GW   3.0.0-rc2test 
 #1 FUJITSU-SV  PRIMERGY/D2399
 4[  132.799014] RIP: 0010:[a0296c86]  [a0296c86] 
 lookup_inline_extent_backref+0xe3/0x3a9 [btrfs]
 4[  132.799014] RSP: 0018:880193aa5808  EFLAGS: 00010202
 4[  132.799014] RAX: 0001 RBX: 880192fac000 RCX: 
 0002
 4[  132.799014] RDX

Re: kernel BUG at fs/btrfs/extent-tree.c:6164!

2011-06-06 Thread liubo
On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
 Hi,
 
 I encountered following panic using 'btrfs-unstable + for-linus'
 kernel.
 
 I ran btrfs fi bal /test5 command, and mount option of /test5
 is as follows:
 
  /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)
 

So, just a btrfs fi bal would lead to the bug?

I've figured out the warnings, but not reproduced the bug yet...
I used 'btrfs-unstable + for-linus whose top commit is

commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
Author: David Sterba dste...@suse.cz
Date:   Fri Jun 3 16:29:08 2011 +0200

btrfs: fix uninitialized variable warning

and tried on 1) a single disk, 2) 2 disks and 3) 4 disks respectively,
but none of them led to the below bug.

Maybe I am missing something needed to reproduce it?

thanks,
liubo

 Thanks,
 Tsutomu
 
 =
 
 btrfs: relocating block group 23383244800 flags 20
 btrfs: found 2959 extents
 [ cut here ]
 WARNING: at fs/btrfs/transaction.c:213 start_transaction+0x2a7/0x2b0 [btrfs]()
 Hardware name: PRIMERGY
 Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand 
 acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 
 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc 
 parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp 
 pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif 
 sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod 
 floppy [last unloaded: microcode]
 Pid: 23781, comm: btrfs Tainted: GW   2.6.39btrfs-test+ #4
 Call Trace:
  [8106004f] warn_slowpath_common+0x7f/0xc0
  [810600aa] warn_slowpath_null+0x1a/0x20
  [a0337047] start_transaction+0x2a7/0x2b0 [btrfs]
  [a035498d] ? btrfs_wait_ordered_range+0x10d/0x160 [btrfs]
  [a0337323] btrfs_start_transaction+0x13/0x20 [btrfs]
  [a033bbca] btrfs_evict_inode+0x11a/0x260 [btrfs]
  [811687f8] evict+0x78/0x170
  [81168c92] iput+0xe2/0x1a0
  [a031f171] btrfs_remove_block_group+0x141/0x3c0 [btrfs]
  [a035e6ea] btrfs_relocate_chunk+0x54a/0x670 [btrfs]
  [a0357668] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
  [a031be51] ? btrfs_previous_item+0xb1/0x150 [btrfs]
  [a035f43a] btrfs_balance+0x21a/0x2b0 [btrfs]
  [8115dc41] ? path_openat+0x101/0x3d0
  [a03685bc] btrfs_ioctl+0x51c/0xc40 [btrfs]
  [8111e358] ? handle_mm_fault+0x148/0x270
  [814809e8] ? do_page_fault+0x1d8/0x4b0
  [81160d6a] do_vfs_ioctl+0x9a/0x540
  [811612b1] sys_ioctl+0xa1/0xb0
  [81484ec2] system_call_fastpath+0x16/0x1b
 ---[ end trace e5c5cb2e98a3cd1a ]---
 btrfs: relocating block group 20971520 flags 18
 btrfs: relocating block group 34925969408 flags 18
 btrfs: found 1 extents
 [ cut here ]
 kernel BUG at fs/btrfs/extent-tree.c:6164!
 invalid opcode:  [#1] SMP
 last sysfs file: /sys/kernel/mm/ksm/run
 CPU 0
 Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand 
 acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 
 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc 
 parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp 
 pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif 
 sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod 
 floppy [last unloaded: microcode]
 
 Pid: 4109, comm: btrfs Tainted: GW   2.6.39btrfs-test+ #4 FUJITSU-SV  
 PRIMERGY/D2399
 RIP: 0010:[a0325b95]  [a0325b95] walk_up_proc+0x375/0x420 
 [btrfs]
 RSP: 0018:8801801eb9c8  EFLAGS: 00010286
 RAX: 0005 RBX: 880167a70140 RCX: fff8
 RDX: 8801801ea000 RSI: 8800 RDI: 880194909fa8
 RBP: 8801801eba18 R08:  R09: 0005
 R10: 0001 R11: 880194909fa8 R12: 
 R13: 88013973d000 R14: 88015ad4d9a0 R15: 880042203920
 FS:  7fa86bcb9740() GS:88019fc0() knlGS:
 CS:  0010 DS:  ES:  CR0: 8005003b
 CR2: 0033cf60b0c0 CR3: 000181cf7000 CR4: 06f0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Process btrfs (pid: 4109, threadinfo 8801801ea000, task 88011a4914a0)
 Stack:
  8801801eba18 880194909fa8 8801 a03280e8
  8801801eba58 88015ad4d9a0  
  8801801ea000 880167a70140 8801801eba78 a0325d71
 Call Trace:
  [a03280e8] ? btrfs_run_delayed_refs+0xc8/0x210 [btrfs]
  [a0325d71] walk_up_tree+0x131/0x1b0 [btrfs]
  [a03260b0] btrfs_drop_snapshot+0x2c0/0x5c0 [btrfs]
  [a03328b0

Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!

2011-06-01 Thread liubo
On 05/31/2011 08:27 AM, Tsutomu Itoh wrote:
 The panic occurred when 'btrfs fi bal /test5' was executed.
 
 /test5 is as follows:
 # mount -o space_cache,compress=lzo /dev/sdc3 /test5
 #
 # btrfs fi sh /dev/sdc3
 Label: none  uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64
 Total devices 5 FS bytes used 7.87MB
 devid1 size 10.00GB used 2.02GB path /dev/sdc3
 devid2 size 15.01GB used 3.00GB path /dev/sdc5
 devid3 size 15.01GB used 3.00GB path /dev/sdc6
 devid4 size 20.01GB used 2.01GB path /dev/sdc7
 devid5 size 10.00GB used 2.01GB path /dev/sdc8
 
 Btrfs v0.19-50-ge6bd18d
 # btrfs fi df /test5
 Data, RAID0: total=10.00GB, used=3.52MB
 Data: total=8.00MB, used=1.60MB
 System, RAID1: total=8.00MB, used=4.00KB
 System: total=4.00MB, used=0.00
 Metadata, RAID1: total=1.00GB, used=216.00KB
 Metadata: total=8.00MB, used=0.00
 

Hi, Itoh san, 

I've come up with a patch aiming to fix this bug.
The problem is that the inode allocator stores one inode cache per root,
which is wrong at least for the relocation tree, because we only allocate
new inode numbers from the fs tree or a file tree (subvol/snapshot).

I've tested with your run.sh and it works well on my box, so you can try this:

===
based on 3.0, commit d6c0cb379c5198487e4ac124728cbb2346d63b1f
===
diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 0009705..ebc2a7b 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -372,6 +372,10 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
int prealloc;
bool retry = false;
 
+   if (root-root_key.objectid != BTRFS_FS_TREE_OBJECTID 
+   root-root_key.objectid  BTRFS_FIRST_FREE_OBJECTID)
+   return 0;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;



thanks,
liubo

 ---
 Tsutomu
 
 
 
 6device fsid 25424ba6b248ec38-64dc2480b05ec68c devid 5 transid 4 /dev/sdc8
 6device fsid 25424ba6b248ec38-64dc2480b05ec68c devid 1 transid 7 /dev/sdc3
 6btrfs: enabling disk space caching
 6btrfs: use lzo compression
 6device fsid 69423c117ae771dd-c275f966f982cf84 devid 1 transid 7 /dev/sdd4
 6btrfs: disk space caching is enabled
 6btrfs: relocating block group 1103101952 flags 9
 6btrfs: found 318 extents
 0[ cut here ]
 2kernel BUG at fs/btrfs/relocation.c:4285!
 0invalid opcode:  [#1] SMP
 4CPU 1
 4Modules linked in: btrfs autofs4 sunrpc 8021q garp stp llc
 cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 zlib_deflate libcrc32c
 ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc
 parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp
 i3000_edac edac_core ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom
 megaraid_sas pata_acpi ata_generic ata_piix floppy [last unloaded: btrfs]
 4Pid: 6173, comm: btrfs Not tainted 3.0.0-rc1btrfs-test #1 FUJITSU-SV  
 PRIMERGY/D2399
 4RIP: 0010:[a049308c]  [a049308c] 
 btrfs_reloc_cow_block+0x22c/0x270 [btrfs]
 4RSP: 0018:8801514236a8  EFLAGS: 00010246
 4RAX: 8801930dc000 RBX: 8801936f5800 RCX: 880163241d60
 4RDX: 88016325dd18 RSI: 8801931a3000 RDI: 8801632fb3e0
 4RBP: 880151423708 R08: 880151423784 R09: 0100
 4R10:  R11: 880163224d58 R12: 8801931a3000
 4R13: 88016325dd18 R14: 8801632fb3e0 R15: 
 4FS:  7f41577ce740() GS:88019fd0() 
 knlGS:
 4CS:  0010 DS:  ES:  CR0: 8005003b
 4CR2: 010afb80 CR3: 00015142e000 CR4: 06e0
 4DR0:  DR1:  DR2: 
 4DR3:  DR6: 0ff0 DR7: 0400
 4Process btrfs (pid: 6173, threadinfo 880151422000, task 
 880151997580)
 0Stack:
 4 88016325dd18 8801632fb3e0 880151423708 a042b2ed
 4  0001 880151423708 8801931a3000
 4 880163241d60 88016325dd18 8801632fb3e0 
 0Call Trace:
 4 [a042b2ed] ? update_ref_for_cow+0x22d/0x330 [btrfs]
 4 [a042b841] __btrfs_cow_block+0x451/0x5e0 [btrfs]
 4 [a042badb] btrfs_cow_block+0x10b/0x250 [btrfs]
 4 [a0431c67] btrfs_search_slot+0x557/0x870 [btrfs]
 4 [a042a252] ? generic_bin_search+0x1f2/0x210 [btrfs]
 4 [a04447bf] btrfs_lookup_inode+0x2f/0xa0 [btrfs]
 4 [a04557c2] btrfs_update_inode+0xc2/0x140 [btrfs]
 4 [a0444fbc] btrfs_save_ino_cache+0x7c/0x200 [btrfs]
 4 [a044c5ad] commit_fs_roots+0xad/0x180 [btrfs]
 4 [a044d555] btrfs_commit_transaction+0x385/0x7d0 [btrfs]
 4 [81081e00] ? wake_up_bit+0x40/0x40
 4 [a048f4bf] prepare_to_relocate+0xdf/0xf0 [btrfs]
 4 [a0496121] relocate_block_group+0x41/0x600 [btrfs]
 4 [814baa6e] ? mutex_lock+0x1e/0x50
 4 [a044bc59

Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!

2011-06-01 Thread liubo
On 06/01/2011 03:44 PM, liubo wrote:
 On 05/31/2011 08:27 AM, Tsutomu Itoh wrote:
  The panic occurred when 'btrfs fi bal /test5' was executed.
  
  /test5 is as follows:
  # mount -o space_cache,compress=lzo /dev/sdc3 /test5
  #
  # btrfs fi sh /dev/sdc3
  Label: none  uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64
  Total devices 5 FS bytes used 7.87MB
  devid1 size 10.00GB used 2.02GB path /dev/sdc3
  devid2 size 15.01GB used 3.00GB path /dev/sdc5
  devid3 size 15.01GB used 3.00GB path /dev/sdc6
  devid4 size 20.01GB used 2.01GB path /dev/sdc7
  devid5 size 10.00GB used 2.01GB path /dev/sdc8
  
  Btrfs v0.19-50-ge6bd18d
  # btrfs fi df /test5
  Data, RAID0: total=10.00GB, used=3.52MB
  Data: total=8.00MB, used=1.60MB
  System, RAID1: total=8.00MB, used=4.00KB
  System: total=4.00MB, used=0.00
  Metadata, RAID1: total=1.00GB, used=216.00KB
  Metadata: total=8.00MB, used=0.00
  
 
 Hi, Itoh san, 
 
 I've come up with a patch aiming to fix this bug.
 The problems is that the inode allocator stores one inode cache per root,
 which is at least not good for relocation tree, cause we only find
 new inode number from fs tree or file tree (subvol/snapshot).
 
 I've tested with your run.sh and it works well on my box, so you can try this:
 

Sorry, I mixed up BTRFS_FIRST_FREE_OBJECTID and BTRFS_LAST_FREE_OBJECTID,
please ignore this patch.

 ===
 based on 3.0, commit d6c0cb379c5198487e4ac124728cbb2346d63b1f
 ===
 diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
 index 0009705..ebc2a7b 100644
 --- a/fs/btrfs/inode-map.c
 +++ b/fs/btrfs/inode-map.c
 @@ -372,6 +372,10 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
   int prealloc;
   bool retry = false;
  
 + if (root-root_key.objectid != BTRFS_FS_TREE_OBJECTID 
 + root-root_key.objectid  BTRFS_FIRST_FREE_OBJECTID)
 + return 0;
 +
   path = btrfs_alloc_path();
   if (!path)
   return -ENOMEM;
 
 
 
 thanks,
 liubo
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!

2011-06-01 Thread liubo
On 06/01/2011 04:12 PM, liubo wrote:
 On 06/01/2011 03:44 PM, liubo wrote:
  On 05/31/2011 08:27 AM, Tsutomu Itoh wrote:
   The panic occurred when 'btrfs fi bal /test5' was executed.
   
   /test5 is as follows:
   # mount -o space_cache,compress=lzo /dev/sdc3 /test5
   #
   # btrfs fi sh /dev/sdc3
   Label: none  uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64
   Total devices 5 FS bytes used 7.87MB
   devid1 size 10.00GB used 2.02GB path /dev/sdc3
   devid2 size 15.01GB used 3.00GB path /dev/sdc5
   devid3 size 15.01GB used 3.00GB path /dev/sdc6
   devid4 size 20.01GB used 2.01GB path /dev/sdc7
   devid5 size 10.00GB used 2.01GB path /dev/sdc8
   
   Btrfs v0.19-50-ge6bd18d
   # btrfs fi df /test5
   Data, RAID0: total=10.00GB, used=3.52MB
   Data: total=8.00MB, used=1.60MB
   System, RAID1: total=8.00MB, used=4.00KB
   System: total=4.00MB, used=0.00
   Metadata, RAID1: total=1.00GB, used=216.00KB
   Metadata: total=8.00MB, used=0.00
   
  
  Hi, Itoh san, 
  
  I've come up with a patch aiming to fix this bug.
  The problems is that the inode allocator stores one inode cache per root,
  which is at least not good for relocation tree, cause we only find
  new inode number from fs tree or file tree (subvol/snapshot).
  
  I've tested with your run.sh and it works well on my box, so you can try 
  this:
  

I've tested the following patch for about 1.5 hours and hit no problems.
Would you please test it as well?

thanks,

From: Liu Bo liubo2...@cn.fujitsu.com

[PATCH] Btrfs: fix save ino cache bug

We only allocate new inode numbers from the fs root or a subvol/snap root,
so only the fs/subvol/snap roots' inode caches need to be saved to disk.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/inode-map.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 0009705..8c0c25b 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -372,6 +372,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
int prealloc;
bool retry = false;
 
+   /* only fs tree and subvol/snap needs ino cache */
+   if (root->root_key.objectid != BTRFS_FS_TREE_OBJECTID &&
+       (root->root_key.objectid < BTRFS_FIRST_FREE_OBJECTID ||
+        root->root_key.objectid > BTRFS_LAST_FREE_OBJECTID))
+       return 0;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-- 
1.6.5.2
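
For readers following along, the check added by this patch can be read as a small predicate on the root's objectid. The following is only a user-space sketch: the helper name root_needs_ino_cache() is hypothetical, and the constant values are assumed to mirror fs/btrfs/ctree.h (5 for the fs tree, 256 and -256ULL for the first/last free objectid, -8ULL for the relocation tree).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring fs/btrfs/ctree.h; treat them as illustrative. */
#define BTRFS_FS_TREE_OBJECTID     5ULL
#define BTRFS_FIRST_FREE_OBJECTID  256ULL
#define BTRFS_LAST_FREE_OBJECTID   (-256ULL)
#define BTRFS_TREE_RELOC_OBJECTID  (-8ULL)

/* Hypothetical helper: true only for the fs tree and subvol/snapshot roots. */
static bool root_needs_ino_cache(uint64_t objectid)
{
	if (objectid == BTRFS_FS_TREE_OBJECTID)
		return true;
	/* Subvolumes and snapshots live in the "free" objectid range. */
	return objectid >= BTRFS_FIRST_FREE_OBJECTID &&
	       objectid <= BTRFS_LAST_FREE_OBJECTID;
}
```

With these assumed values the relocation tree (-8ULL) lies above BTRFS_LAST_FREE_OBJECTID, so the predicate is false for it and btrfs_save_ino_cache() would return early.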



Re: btrfs error after using kernel 3.0-rc1

2011-06-01 Thread liubo
On 06/01/2011 08:22 PM, Fajar A. Nugraha wrote:
 On Wed, Jun 1, 2011 at 6:06 AM, Fajar A. Nugraha l...@fajar.net wrote:
 While using btrfs as root on kernel 3.0-rc1, there was some errors (I
 wasn't able to capture the error) that forced me to do hard reset.

 Now during startup system drops to busybox shell because it's unable
 to mount root partition.
 Is there a way to recover the data, as at least grub2 was still happy
 enough to load kernel and initrd (both of which located on the same
 btrfs partition)?

 This is what dmesg says

 [4.536798] device label SSD-ROOT devid 1 transid 38245 /dev/sda2
 [9.552086] device label SSD-ROOT devid 1 transid 38245
 /dev/disk/by-label/SSD-ROOT
 [9.554563] btrfs: disk space caching is enabled
 [9.564301] parent transid verify failed on 44040192 wanted 38240 found 32526
 [9.564535] parent transid verify failed on 44040192 wanted 38240 found 32526
 [9.564778] parent transid verify failed on 44040192 wanted 38240 found 32526
 [9.575679] parent transid verify failed on 44052480 wanted 38240 found 31547
 [9.575904] parent transid verify failed on 44052480 wanted 38240 found 31547
 [9.576176] parent transid verify failed on 44052480 wanted 38240 found 31547
 [9.586121] parent transid verify failed on 44064768 wanted 38240 found 34145
 [9.586319] parent transid verify failed on 44064768 wanted 38240 found 34145
 [9.586515] parent transid verify failed on 44064768 wanted 38240 found 34145
 [9.587027] parent transid verify failed on 44068864 wanted 38240 found 34476
 [9.589732] Btrfs detected SSD devices, enabling SSD mode
 [9.592923] block group 29360128 has an wrong amount of free space
 [9.592959] btrfs: failed to load free space cache for block group 29360128
 
 
 For anyone who got the same problem,
 
 I was finally able to mount the fs using Ubuntu Natty's
 2.6.38-8-generic (the one on live CD).
 Previously I tried using 2.6.38-9-generic and 3.0-rc1; neither works.
 Now I'm copying the files somewhere else before reinstalling this
 system.
 
 On another note, does anybody know how btrfs allocates ID for subvols?
 It doesn't seem to reuse deleted subvol's ID. What happens when the
 last subvol ID is 999?
 

Yes, there is no reuse.

A new subvol will get ID 1000, one larger than 999.
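
To illustrate the "no reuse" behaviour, here is a toy model that assumes IDs come from a counter that only ever grows. The struct and function names are hypothetical and this is not the kernel's allocator, just a sketch of the observable behaviour described above.

```c
#include <stdint.h>

/* Hypothetical model: the allocator only remembers the highest id ever used. */
struct subvol_id_alloc {
	uint64_t highest_id;
};

/* Hand out the next id; ids of deleted subvolumes are never returned again. */
static uint64_t subvol_id_next(struct subvol_id_alloc *a)
{
	return ++a->highest_id;
}

/* Deleting a subvolume does not touch the counter, so its id is not reused. */
static void subvol_id_delete(struct subvol_id_alloc *a, uint64_t id)
{
	(void)a;
	(void)id;
}
```

In this model, deleting subvol 999 and then creating a new one yields 1000, matching the answer above.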

thanks,
liubo


Re: [PATCH 00/11 v2] Btrfs: improve write ahead log with sub transaction

2011-05-26 Thread liubo

This includes the two patches that we've discussed before.

I sent this as a whole just in case you have to patch the code by yourself. :)

thanks,
liubo

On 05/26/2011 04:19 PM, Liu Bo wrote:
 I've been working to improve the write-ahead log's performance,
 and I found that the bottleneck lies in the checksum items,
 especially when we do random writes on a large file, e.g. a 4G file.
 
 An idea for this, suggested by Chris, is to use sub transaction ids and to
 log only the part of the inode that has changed since either the last log
 commit or the last transaction commit.  And as we also push the sub transid
 into the btree blocks, we'll get much faster tree walks.  As a result, we
 abandon the original brute force approach, which is to delete all items of
 the inode in the log to make sure we get the most up-to-date copies of
 everything, and instead we find and merge, i.e. we find extents in the log
 tree and merge in the new extents from the file.
 
 This patchset puts the above idea into code, and although the code is now more
 complex, it brings a great deal of performance improvement.
 
 Besides the log improvement, patch 8 fixes a small but critical bug in the
 log code with sub transactions.
 
 Here are some test results; I use sysbench to do random write + fsync.
 
 ===
 sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K 
 --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync 
 --file-extra-flags=  [prepare, run]
 ===
 
 Sysbench args:
   - Number of threads: 1
   - Extra file open flags: 0
   - 2 files, 4Gb each
   - Block size 4Kb
   - Number of random requests for random IO: 10000
   - Read/Write ratio for combined random IO test: 1.50
   - Periodic FSYNC enabled, calling fsync() each 100 requests.
   - Calling fsync() at the end of test, Enabled.
   - Using synchronous I/O mode
   - Doing random write test
 
 Sysbench results:
 ===
Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b  Written 39.062Mb  Total transferred 39.062Mb
 ===
 a) without patch:  (*SPEED* : 451.01Kb/sec)
112.75 Requests/sec executed
 
 b) with patch: (*SPEED* : 4.3621Mb/sec)
1116.71 Requests/sec executed
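
As a sanity check on the numbers above, the reported speeds follow from the request rates and the 4Kb block size. The helper names below are hypothetical; this is only arithmetic on the figures quoted in this mail, not part of sysbench.

```c
/* Throughput in Kb/sec for a given request rate and block size in Kb. */
static double throughput_kb(double requests_per_sec, double block_kb)
{
	return requests_per_sec * block_kb;
}

/* Ratio between the patched and unpatched request rates. */
static double speedup(double with_patch, double without_patch)
{
	return with_patch / without_patch;
}
```

112.75 requests/sec at 4Kb each is about 451Kb/sec, and 1116.71 requests/sec is about 4466.84Kb/sec, i.e. roughly 4.36Mb/sec, which is close to a tenfold improvement.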
 
 v1->v2: fix an EEXIST caused by logged_trans and a mismatch caused by the log root generation
 
 Liu Bo (11):
   Btrfs: introduce sub transaction stuff
   Btrfs: update block generation if should_cow_block fails
   Btrfs: modify btrfs_drop_extents API
   Btrfs: introduce first sub trans
   Btrfs: still update inode trans stuff when size remains unchanged
   Btrfs: improve log with sub transaction
   Btrfs: add checksum check for log
   Btrfs: fix a bug of log check
   Btrfs: kick off useless code
   Btrfs: deal with EEXIST after iput
   Btrfs: use the right generation number to read log_root_tree
 
  fs/btrfs/btrfs_inode.h |   12 ++-
  fs/btrfs/ctree.c   |   69 +
  fs/btrfs/ctree.h   |5 +-
  fs/btrfs/disk-io.c |   12 +-
  fs/btrfs/extent-tree.c |   10 +-
  fs/btrfs/file.c|   22 ++---
  fs/btrfs/inode.c   |   33 ---
  fs/btrfs/ioctl.c   |6 +-
  fs/btrfs/relocation.c  |6 +-
  fs/btrfs/transaction.c |   13 ++-
  fs/btrfs/transaction.h |   19 +++-
  fs/btrfs/tree-defrag.c |2 +-
  fs/btrfs/tree-log.c|  267 
 +++-
  13 files changed, 330 insertions(+), 146 deletions(-)
 
 



Re: Backref walking utilities

2011-05-25 Thread liubo
On 05/25/2011 11:08 PM, Jan Schmidt wrote:
 On 05/23/2011 12:02 PM, Arne Jansen wrote:
 Hi liubo,

 On 23.05.2011 11:53, liubo wrote:
 As one of my plans, I'm going to take this project over unless someone has 
 been working on it.
 Jan Schmidt has a patch for scrub nearly ready, that does some
 ref-walking to report affected files to the user. While this is
 kernel code and you're planning to add user-space code, it might
 still be possible to share some of it. Maybe the efforts can be
 coordinated.
 
 The patches are ready and should be flexible enough to use for your
 purpose. However I use them in context of the scrub code, thus I'm
 planning to send them out as soon as the current version of scrub is
 included in Chris' master.
 
 If anybody wants to test the patches before that (apply well against
 Arnes scrub branch), drop me an email.
 

I'd like to take a look at them in advance.  Could you please point me to
where these patches are?

thanks,
liubo

 -Jan
 



Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-24 Thread liubo
On 05/24/2011 11:56 PM, liubo wrote:
  The problems I hit:
  
  When an inode is dropped from cache (just via iput) and then read in
  again, the BTRFS_I(inode)->logged_trans goes back to zero.  When this
  happens the logging code assumes the inode isn't in the log and hits
  -EEXIST if it finds inode items.
  
 
 OK, I've just found where the problem lies.  This is because I added
 a check between logged_trans and transaction_id, which is meant to
 filter out inodes that are being logged for the first time, and I'm sorry
 for not taking the 'iput' case into consideration.  It's easy to fix: we
 can drop this check and put another check in while searching for
 'BTRFS_INODE_ITEM_KEY', since if we cannot find an inode item in a tree,
 it proves that this inode is definitely not in the tree.
 
 So I'd like to make some changes like this patch (_UNTESTED_):

I've thought about this problem some more and came up with a _better and more
efficient_ patch.
It always sets BTRFS_I(inode)->logged_trans to the correct value.

But I'm still working out how to test it... :P

Here it is:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40f6f8f..d22b3bf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1769,12 +1769,9 @@ static int btrfs_finish_ordered_io(struct inode *inode, 
u64 start, u64 end)
add_pending_csums(trans, inode, ordered_extent->file_offset,
  &ordered_extent->list);
 
-   ret = btrfs_ordered_update_i_size(inode, 0, ordered_extent);
-   if (!ret) {
-   ret = btrfs_update_inode(trans, root, inode);
-   BUG_ON(ret);
-   } else
-   btrfs_set_inode_last_trans(trans, inode);
+   btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+   ret = btrfs_update_inode(trans, root, inode);
+   BUG_ON(ret);
ret = 0;
 out:
if (nolock) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..92fe5dd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3032,6 +3032,37 @@ out:
return ret;
 }
 
+static int check_logged_trans(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root, struct inode *inode)
+{
+   struct btrfs_inode_item *inode_item;
+   struct btrfs_path *path;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   ret = btrfs_search_slot(trans, root,
+                           &BTRFS_I(inode)->location, path, 0, 0);
+   if (ret) {
+       if (ret > 0)
+           ret = 0;
+       goto out;
+   }
+
+   btrfs_unlock_up_safe(path, 1);
+   inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+                               struct btrfs_inode_item);
+
+   BTRFS_I(inode)->logged_trans = btrfs_inode_transid(path->nodes[0],
+                                                      inode_item);
+out:
+   btrfs_free_path(path);
+   return ret;
+}
+
+
 static int inode_in_log(struct btrfs_trans_handle *trans,
 struct inode *inode)
 {
@@ -3084,6 +3115,18 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle 
*trans,
if (ret)
goto end_no_trans;
 
+   /*
+* After we iput a inode and reread it from disk, logged_trans is 0.
+* However, this inode _may_ still remain in log tree and not be
+* committed yet.
+* So we need to check the log tree to get logged_trans a right value.
+*/
+   if (!BTRFS_I(inode)->logged_trans && root->log_root) {
+   ret = check_logged_trans(trans, root->log_root, inode);
+   if (ret)
+   goto end_no_trans;
+   }
+
if (inode_in_log(trans, inode)) {
ret = BTRFS_NO_LOG_SYNC;
goto end_no_trans;


thanks,
liubo


Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-24 Thread liubo
On 05/24/2011 11:56 PM, liubo wrote:
  
  Second, we use the generation number of the super to read in the log
  tree root after a crash.  This doesn't always match the sub trans id and
  so it doesn't always match the transid stored in the btree blocks.
  
  There are a few solutions to this, we can use some of the reserved
  fields in the super for the generation numbers of the roots the super
  points to, and use whichever one is bigger when we read things in.
  
 
 All right, I'm going to dig it more.
 

I've resolved this via the 'log_root_transid' field of 'struct btrfs_super_block',
and it looks good both syntactically and functionally. :)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ac8d2ac..1006898 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2103,6 +2103,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
if (btrfs_super_log_root(disk_super) != 0 &&
!(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
u64 bytenr = btrfs_super_log_root(disk_super);
+   u64 log_root_transid = btrfs_super_log_root_transid(disk_super);
 
if (fs_devices->rw_devices == 0) {
printk(KERN_WARNING "Btrfs log replay required "
@@ -2125,7 +2126,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
log_tree_root->node = read_tree_block(tree_root, bytenr,
  blocksize,
- generation + 1);
+ log_root_transid);
ret = btrfs_recover_log_trees(log_tree_root);
BUG_ON(ret);
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..b304ec1 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2089,6 +2089,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
log_root_tree->node->start);
btrfs_set_super_log_root_level(&root->fs_info->super_for_commit,
btrfs_header_level(log_root_tree->node));
+   btrfs_set_super_log_root_transid(&root->fs_info->super_for_commit,
+                                    trans->transid);
 
log_root_tree->log_batch = 0;
log_root_tree->log_transid++;
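
To show why 'generation + 1' stops working once sub transactions exist, here is a toy model of the transid check done when a block is read. The struct and function names are hypothetical, and the real check in the kernel is more involved; the sketch only assumes that the block carries the (sub) transid it was written in and that the reader must supply the same value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the generation stamped into a tree block header. */
struct toy_block {
	uint64_t generation;
};

/* Model of the "parent transid verify" check done when a block is read. */
static bool toy_verify_transid(const struct toy_block *b, uint64_t expected)
{
	return b->generation == expected;
}
```

If the super block's generation is 100 and the log root was written in sub transaction 103, a read expecting 101 fails in this model, while a read using the stored log_root_transid (103) succeeds.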

thanks,
liubo


Re: [PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine

2011-05-24 Thread liubo
On 05/01/2011 11:35 AM, Steven Rostedt wrote:
 On Fri, 2011-04-29 at 18:01 +0800, liubo wrote:
 ping?
 
 Sorry, I've been trying to get the new ftrace function tracer features
 out ASAP. I plan on looking at this when I'm done.
 
 Thanks,
 

Hi, Steven,

I've seen your latest git-pull, but these 2 patches are not included yet.
Is there any problem with them?

If there is, I'd be glad to help. :)

thanks,
liubo

 -- Steve
 
 On 04/19/2011 09:35 AM, liubo wrote:
 Filesystems like Btrfs have some ULL macros, and when these macros are
 passed to a tracepoint's __print_symbolic(), there are 64->32 truncation
 WARNINGS when compiling on a 32bit box.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  include/linux/ftrace_event.h |   12 
  include/trace/ftrace.h   |   13 +
  kernel/trace/trace_output.c  |   27 +++
  3 files changed, 52 insertions(+), 0 deletions(-)

 diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
 index 47e3997..efb2330 100644
 --- a/include/linux/ftrace_event.h
 +++ b/include/linux/ftrace_event.h
 @@ -16,6 +16,11 @@ struct trace_print_flags {
 const char  *name;
  };
  
 +struct trace_print_flags_u64 {
 +   unsigned long long  mask;
 +   const char  *name;
 +};
 +
  const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
unsigned long flags,
const struct trace_print_flags *flag_array);
 @@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, 
 const char *delim,
  const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long 
 val,
  const struct trace_print_flags 
 *symbol_array);
  
 +#if BITS_PER_LONG == 32
 +const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
 +unsigned long long val,
 +const struct trace_print_flags_u64
 +*symbol_array);
 +#endif
 +
  const char *ftrace_print_hex_seq(struct trace_seq *p,
  const unsigned char *buf, int len);
  
 diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
 index 3e68366..533c49f 100644
 --- a/include/trace/ftrace.h
 +++ b/include/trace/ftrace.h
 @@ -205,6 +205,19 @@
 ftrace_print_symbols_seq(p, value, symbols);\
 })
  
 +#undef __print_symbolic_u64
 +#if BITS_PER_LONG == 32
 +#define __print_symbolic_u64(value, symbol_array...)   
 \
 +   ({  \
 +   static const struct trace_print_flags_u64 symbols[] =   \
 +   { symbol_array, { -1, NULL } }; \
 +   ftrace_print_symbols_seq_u64(p, value, symbols);\
 +   })
 +#else
 +#define __print_symbolic_u64(value, symbol_array...)   
 \
 +   __print_symbolic(value, symbol_array)
 +#endif
 +
  #undef __print_hex
  #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
  
 diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
 index 02272ba..b783504 100644
 --- a/kernel/trace/trace_output.c
 +++ b/kernel/trace/trace_output.c
 @@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned 
 long val,
  }
  EXPORT_SYMBOL(ftrace_print_symbols_seq);
  
 +#if BITS_PER_LONG == 32
 +const char *
 +ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
 +const struct trace_print_flags_u64 *symbol_array)
 +{
 +   int i;
 +   const char *ret = p->buffer + p->len;
 +
 +   for (i = 0;  symbol_array[i].name; i++) {
 +
 +   if (val != symbol_array[i].mask)
 +   continue;
 +
 +   trace_seq_puts(p, symbol_array[i].name);
 +   break;
 +   }
 +
 +   if (!p->len)
 +   trace_seq_printf(p, "0x%llx", val);
 +
 +   trace_seq_putc(p, 0);
 +
 +   return ret;
 +}
 +EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
 +#endif
 +
  const char *
  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int 
 buf_len)
  {
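
The truncation this patch works around can be reproduced in user space. The sketch below is an illustration only: struct sym_u64 mirrors the shape of struct trace_print_flags_u64 from the patch, while lookup_symbol_u64() and truncate_to_32() are hypothetical helpers, and uint32_t stands in for a 32-bit unsigned long.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the shape of struct trace_print_flags_u64 from the patch. */
struct sym_u64 {
	unsigned long long mask;
	const char *name;
};

/* Look up val in a table terminated by a NULL name; NULL if not found. */
static const char *lookup_symbol_u64(unsigned long long val,
				     const struct sym_u64 *tbl)
{
	size_t i;

	for (i = 0; tbl[i].name; i++)
		if (tbl[i].mask == val)
			return tbl[i].name;
	return NULL;
}

/* What a 32-bit unsigned long would keep of a 64-bit constant. */
static uint32_t truncate_to_32(unsigned long long val)
{
	return (uint32_t)val;
}
```

A ULL constant such as -8ULL no longer compares equal to itself after passing through a 32-bit slot, which is why the table entries and the lookup both need a 64-bit mask on a 32bit box.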
 
 
 



Backref walking utilities

2011-05-23 Thread liubo
Hi,

I'm planning to take on this project, unless someone is already working on it.

From wiki, quote:
Backref walking utilities

Given a block number on a disk, the Btrfs metadata can find all the files 
and directories
that use or care about that block.  Some utilities to walk these back refs 
and print the
results would help debug corruptions.

Given an inode, the Btrfs metadata can find all the directories that point 
to the inode.
We should have utils to walk these back refs as well. 
end quote.

And I have some thoughts to share with you:

- Clearly, this is going to be another command.  Just like the command
  btrfs-debug-tree, btrfs-walk-backref also needs to be able to walk btrfs's
  metadata in
  a) the offline situation (in an unmounted state), or
  b) the corrupted situation.

- For a block number, the main goal is to find the related extent backrefs.
  When it comes to shared blocks, things may get more complex.

- For an inode, the main goal is to find the related inode refs.  And we
  should be cautious about
  a) an inode with hard links, b) snapshots.

Did I miss or misunderstand something?  Any comments are welcome. :)

thanks,
liubo


Re: [PATCH 0/9] Btrfs: improve write ahead log with sub transaction

2011-05-23 Thread liubo
On 05/23/2011 12:43 PM, Josef Bacik wrote:
 On 05/19/2011 04:11 AM, Liu Bo wrote:
 I've been working to improve the write-ahead log's performance,
 and I found that the bottleneck lies in the checksum items,
 especially when we do random writes on a large file, e.g. a 4G file.

 An idea for this, suggested by Chris, is to use sub transaction ids and to
 log only the part of the inode that has changed since either the last log
 commit or the last transaction commit.  And as we also push the sub transid
 into the btree blocks, we'll get much faster tree walks.  As a result, we
 abandon the original brute force approach, which is to delete all items of
 the inode in the log to make sure we get the most up-to-date copies of
 everything, and instead we find and merge, i.e. we find extents in the log
 tree and merge in the new extents from the file.

 This patchset puts the above idea into code, and although the code is now
 more complex, it brings a great deal of performance improvement.

 Besides the log improvement, patch 8 fixes a small but critical bug in the
 log code with sub transactions.

 Here are some test results; I use sysbench to do random write + fsync.

 ===
 sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K 
 --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync 
 --file-extra-flags=  [prepare, run]
 ===

 Sysbench args:
   - Number of threads: 1
   - Extra file open flags: 0
   - 2 files, 4Gb each
   - Block size 4Kb
   - Number of random requests for random IO: 10000
   - Read/Write ratio for combined random IO test: 1.50
   - Periodic FSYNC enabled, calling fsync() each 100 requests.
   - Calling fsync() at the end of test, Enabled.
   - Using synchronous I/O mode
   - Doing random write test

 Sysbench results:
 ===
Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b  Written 39.062Mb  Total transferred 39.062Mb
 ===
 a) without patch:  (*SPEED* : 451.01Kb/sec)
112.75 Requests/sec executed

 b) with patch: (*SPEED* : 4.3621Mb/sec)
1116.71 Requests/sec executed

 
 Have you run powerfail tests with this?  I'd like to make sure you
 haven't inadvertently messed something up.  Thanks,
 

Yes, I've done this before, and found nothing serious apart from a few
'parent transid verify failed' messages, the same as Chris mentioned in the
thread.

thanks,
liubo

 Josef
 



Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-23 Thread liubo
On 05/23/2011 10:40 AM, Chris Mason wrote:
 Excerpts from Chris Mason's message of 2011-05-19 20:23:29 -0400:
 Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
 Introduce a new concept sub transaction,
 the relation between transaction and sub transaction is

 transaction A   --- transid = x
sub trans a(1)   --- sub_transid = x+1
sub trans a(2)   --- sub_transid = x+2
  ... ...
sub trans a(n-1) --- sub_transid = x+n-1
sub trans a(n)   --- sub_transid = x+n
 transaction B   --- transid = x+n+1
  ... ...

 The most important points are:
 a) a trans handle's transid now takes its value from the sub transid instead
    of the transid.
 b) when a transaction commits, transid may not be incremented by 1; it depends
    on the biggest sub_transid of the previous transaction, i.e.
      B->transid = a(n)->transid + 1,
      (B->transid - A->transid) >= 1
 c) we start a new sub transaction after an fsync.

 We also convert some 'trans->transid' to 'trans->transaction->transid' to
 ensure btrfs works well and to get rid of WARNings.

 These are used for the new log code.
 This is exactly what I had in mind.  I need to read it harder and make
 sure it interacts well with the directory logging code, but I love it.
 
 Ok, I hit a few problems with this, and since the transids are used
 everywhere for various reasons, I think we need to wait until 2.6.41.
 This code is really very close to right, but we have the delayed inode
 work, scrub, and the new inode number allocator all at once.  I'd like
 to limit the size of the changes.
 

I agree with this.  In fact, I'm a little worried, because the transids play
such an important role in btrfs that changing them is dangerous, so this
deserves more testing.

 The problems I hit:
 
 When an inode is dropped from cache (just via iput) and then read in
 again, the BTRFS_I(inode)->logged_trans goes back to zero.  When this
 happens the logging code assumes the inode isn't in the log and hits
 -EEXIST if it finds inode items.
 

OK, I've found where the problem lies.  It is because I put a check between
logged_trans and the transaction id, which is meant to filter out inodes
that are being logged for the first time, and I'm sorry for not taking the
'iput' case into consideration.  It's easy to fix: we can drop this check
and add another one while searching for 'BTRFS_INODE_ITEM_KEY', since if we
cannot find an inode item in the tree, the inode is definitely not in that
tree.

So I'd like to make some changes like this patch (_UNTESTED_):

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..69ddbbd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2569,10 +2569,6 @@ static int prepare_for_merge_items(struct btrfs_trans_handle *trans,
int i;
int ret;
 
-   /* There are no relative items of the inode in log. */
-   if (BTRFS_I(inode)->logged_trans < trans->transaction->transid)
-   return 0;
-
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
@@ -2622,6 +2618,11 @@ static int prepare_for_merge_items(struct btrfs_trans_handle *trans,
 
if (ret > 0) {
btrfs_release_path(log, path);
+
+   /* There are no relative items of the inode in log. */
+   if (key.type == BTRFS_INODE_ITEM_KEY)
+   break;
+
continue;
}
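
The problem Chris hit can be pictured with a toy model.  This is only a
sketch under assumptions: the struct and function names are made up for
illustration and are not btrfs code.  It shows why a check on an in-memory
field can give the wrong answer after the inode is dropped and re-read, while
a check based on what the log tree search finds does not.

```c
#include <stdint.h>

/* Hypothetical in-memory inode state; lost when the inode is evicted. */
struct inode_model {
	uint64_t logged_trans;
};

/* Hypothetical view of the log tree: does it hold an inode item? */
struct log_tree_model {
	int has_inode_item;
};

/* Models iput + re-read: the in-memory field goes back to zero. */
static void evict_and_reload(struct inode_model *inode)
{
	inode->logged_trans = 0;
}

/* Old style check: trust the in-memory field. */
static int not_in_log_by_field(const struct inode_model *inode,
			       uint64_t transid)
{
	return inode->logged_trans < transid;
}

/* Proposed style check: trust what the search of the log tree finds. */
static int not_in_log_by_search(const struct log_tree_model *log)
{
	return !log->has_inode_item;
}
```

After `evict_and_reload()` the field-based check claims the inode is not in
the log even though the log tree still holds its inode item, which is the
situation that ends in -EEXIST.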


 I patched it to just delete away all the logged items if the logged
 transid wasn't set, which is probably safest given that we can now reuse
 inode numbers.
 
 Second, we use the generation number of the super to read in the log
 tree root after a crash.  This doesn't always match the sub trans id and
 so it doesn't always match the transid stored in the btree blocks.
 
 There are a few solutions to this, we can use some of the reserved
 fields in the super for the generation numbers of the roots the super
 points to, and use whichever one is bigger when we read things in.
 

All right, I'll dig into it more.

 Liubo, since we'll leave this one for .41, I'll take your smaller patch
 that just skips the csum items.
 

OK, I see.  Thanks a lot for the review. :)

thanks,
liubo

 -chris
 



Re: [PATCH v1 2/5] btrfs: state information for readahead

2011-05-23 Thread liubo
On 05/23/2011 08:59 AM, Arne Jansen wrote:
 Add state information for readahead to btrfs_fs_info and btrfs_device
 
 Signed-off-by: Arne Jansen sensi...@gmx.net
 ---
  fs/btrfs/ctree.h   |4 
  fs/btrfs/disk-io.c |4 
  fs/btrfs/volumes.c |8 
  fs/btrfs/volumes.h |8 
  4 files changed, 24 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 2e61fe1..4a33e30 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1079,6 +1079,10 @@ struct btrfs_fs_info {
  
   /* filesystem state */
   u64 fs_state;
 +
 + /* readahead tree */
 + spinlock_t reada_lock;
 + struct radix_tree_root reada_tree;
  };
  
  /*
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 7753eb9..3d4f9c5 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1803,6 +1803,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
   fs_info->max_inline = 8192 * 1024;
   fs_info->metadata_ratio = 0;
  
 + /* readahead state */
 + INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS);
 + spin_lock_init(&fs_info->reada_lock);
 +
   fs_info->thread_pool_size = min_t(unsigned long,
 num_online_cpus() + 2, 8);
  
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 8b9fb8c..800e670 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -396,6 +396,14 @@ static noinline int device_list_add(const char *path,
   }
   INIT_LIST_HEAD(&device->dev_alloc_list);
  
 + /* init readahead state */
 + spin_lock_init(&device->reada_lock);
 + device->reada_curr_zone = NULL;
 + atomic_set(&device->reada_in_flight, 0);
 + device->reada_next = 0;
 + INIT_RADIX_TREE(&device->reada_zones, GFP_NOFS);
 + INIT_RADIX_TREE(&device->reada_extents, GFP_NOFS);
 +
   mutex_lock(&fs_devices->device_list_mutex);
   list_add(&device->dev_list, &fs_devices->devices);
   mutex_unlock(&fs_devices->device_list_mutex);
 diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
 index cc2eada..33acd4e 100644
 --- a/fs/btrfs/volumes.h
 +++ b/fs/btrfs/volumes.h
 @@ -86,6 +86,14 @@ struct btrfs_device {
   u8 uuid[BTRFS_UUID_SIZE];
  
   struct btrfs_work work;
 +
 + /* readahead state */
 + spinlock_t reada_lock;
 + atomic_t reada_in_flight;
 + u64 reada_next;
 + struct reada_zone *reada_curr_zone;

struct reada_zone has not been defined yet...

thanks,
liubo

 + struct radix_tree_root reada_zones;
 + struct radix_tree_root reada_extents;
  };
  
  struct btrfs_fs_devices {
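
For context on the remark about struct reada_zone: in C, a struct member may
be a pointer to a struct type that has not been defined yet, so a hunk like
this can compile on its own as long as nothing dereferences the pointer
before the full definition is visible.  Whether the remark is about patch
ordering or about a build problem is not clear from the thread.  A minimal
sketch, with made-up names (device_model, zone_model) standing in for the
real structures:

```c
#include <stddef.h>

struct device_model {
	struct zone_model *curr_zone; /* struct zone_model is incomplete here */
	int in_flight;
};

/* The full definition may come later, before the first dereference. */
struct zone_model {
	unsigned long start;
	unsigned long end;
};

/* Dereferencing is fine here because the type is complete by now. */
static unsigned long curr_zone_len(const struct device_model *dev)
{
	if (dev->curr_zone == NULL)
		return 0;
	return dev->curr_zone->end - dev->curr_zone->start;
}
```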



Re: [PATCH 0/9] Btrfs: improve write ahead log with sub transaction

2011-05-19 Thread liubo
On 05/19/2011 04:11 PM, Liu Bo wrote:
 I've been working to improve the write-ahead log's performance, and I found
 that the bottleneck lies in the checksum items, especially when we make
 random writes to a large file, e.g. a 4G file.
 
 An idea for this, suggested by Chris, is to use sub transaction ids and to
 log just the part of the inode that has changed since either the last log
 commit or the last transaction commit.  And as we also push the sub transid
 into the btree blocks, we'll get much faster tree walks.  As a result, we
 abandon the original brute force approach, which is to delete all items of
 the inode in the log to make sure we get the most up-to-date copies of
 everything, and instead we find and merge, i.e. find extents in the log tree
 and merge in the new extents from the file.
 
 This patchset puts the above idea into code, and although the code is now more
 complex, it brings us a great deal of performance improvement.
 
 Besides the log improvement, patch 8 fixes a small but critical bug in the
 log code with sub transactions.
 
 Here are some test results.  I used sysbench to do random write + fsync.
 
 ===
 sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K 
 --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync 
 --file-extra-flags=  [prepare, run]
 ===
 
 Sysbench args:
   - Number of threads: 1
   - Extra file open flags: 0
   - 2 files, 4Gb each
   - Block size 4Kb
   - Number of random requests for random IO: 10000
   - Read/Write ratio for combined random IO test: 1.50
   - Periodic FSYNC enabled, calling fsync() each 100 requests.
   - Calling fsync() at the end of test, Enabled.
   - Using synchronous I/O mode
   - Doing random write test
 
 Sysbench results:
 ===
Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b  Written 39.062Mb  Total transferred 39.062Mb
 ===
 a) without patch:  (*SPEED* : 451.01Kb/sec)
112.75 Requests/sec executed
 
 b) with patch: (*SPEED* : 4.3621Mb/sec)
1116.71 Requests/sec executed
 
 
 Liu Bo (10):
   Btrfs: introduce sub transaction stuff
   Btrfs: modify should_cow_block to update block's generation
   Btrfs: modify btrfs_drop_extents API
   Btrfs: introduce first sub trans
   Btrfs: still update inode transid when size remains unchanged
   Btrfs: main log stuff
   Btrfs: add checksum check for log
   Btrfs: fix a bug of log check
   Btrfs: kick off useless code
   Btrfs: ship trans->transid to trans->transaction->transid
 
  fs/btrfs/btrfs_inode.h |   12 ++-
  fs/btrfs/ctree.c   |   71 ++-
  fs/btrfs/ctree.h   |5 +-
  fs/btrfs/disk-io.c |9 +-
  fs/btrfs/extent-tree.c |   10 ++-
  fs/btrfs/file.c|   22 ++---
  fs/btrfs/inode.c   |   28 --
  fs/btrfs/ioctl.c   |6 +-
  fs/btrfs/relocation.c  |6 +-
  fs/btrfs/transaction.c |   13 ++-
  fs/btrfs/transaction.h |   19 -
  fs/btrfs/tree-defrag.c |2 +-
  fs/btrfs/tree-log.c|  222 ---
  13 files changed, 279 insertions(+), 146 deletions(-)
 
 

Sorry for the wrong summary info, here is the right one:

Liu Bo (9):
  Btrfs: introduce sub transaction stuff
  Btrfs: update block generation if should_cow_block fails
  Btrfs: modify btrfs_drop_extents API
  Btrfs: introduce first sub trans
  Btrfs: still update inode trans stuff when size remains unchanged
  Btrfs: improve log with sub transaction
  Btrfs: add checksum check for log
  Btrfs: fix a bug of log check
  Btrfs: kick off useless code

 fs/btrfs/btrfs_inode.h |   12 ++-
 fs/btrfs/ctree.c   |   69 +++
 fs/btrfs/ctree.h   |5 +-
 fs/btrfs/disk-io.c |9 +-
 fs/btrfs/extent-tree.c |   10 ++-
 fs/btrfs/file.c|   22 ++---
 fs/btrfs/inode.c   |   28 --
 fs/btrfs/ioctl.c   |6 +-
 fs/btrfs/relocation.c  |6 +-
 fs/btrfs/transaction.c |   13 ++-
 fs/btrfs/transaction.h |   19 -
 fs/btrfs/tree-defrag.c |2 +-
 fs/btrfs/tree-log.c|  222 ---
 13 files changed, 282 insertions(+), 141 deletions(-)




Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff

2011-05-19 Thread liubo
On 05/20/2011 08:23 AM, Chris Mason wrote:
 Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
 Introduce a new concept sub transaction,
 the relation between transaction and sub transaction is

 transaction A   --- transid = x
sub trans a(1)   --- sub_transid = x+1
sub trans a(2)   --- sub_transid = x+2
  ... ...
sub trans a(n-1) --- sub_transid = x+n-1
sub trans a(n)   --- sub_transid = x+n
 transaction B   --- transid = x+n+1
  ... ...

 The most important points are:
 a) a trans handle's transid now takes its value from the sub transid instead
    of the transid.
 b) when a transaction commits, transid may not be incremented by 1; it depends
    on the biggest sub_transid of the previous transaction, i.e.
      B->transid = a(n)->transid + 1,
      (B->transid - A->transid) >= 1
 c) we start a new sub transaction after an fsync.

 We also convert some 'trans->transid' to 'trans->transaction->transid' to
 ensure btrfs works well and to get rid of WARNings.

 These are used for the new log code.
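
The numbering described above can be sketched as a small model.  This is an
illustration under assumptions, not the patch itself: the struct and function
names are made up, and it assumes that each fsync opens exactly one new sub
transaction and that a commit picks the id after the biggest sub transid, so
two consecutive transids differ by at least 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the numbering only; not the btrfs structures. */
struct txn_model {
	uint64_t transid;     /* id of the running transaction */
	uint64_t sub_transid; /* biggest sub transaction id handed out so far */
};

/* Start transaction "A" with transid = x; no sub transaction exists yet. */
static void txn_start(struct txn_model *t, uint64_t x)
{
	t->transid = x;
	t->sub_transid = x;
}

/* Rule c): an fsync starts a new sub transaction with the next id. */
static uint64_t txn_fsync(struct txn_model *t)
{
	return ++t->sub_transid;
}

/* Rule b): the next transaction's id follows the biggest sub transid. */
static uint64_t txn_commit(struct txn_model *t)
{
	uint64_t prev = t->transid;
	uint64_t next = t->sub_transid + 1;

	assert(next - prev >= 1);
	t->transid = next;
	t->sub_transid = next;
	return next;
}
```

With x = 100 and two fsyncs, the sub transactions get 101 and 102 and the
next transaction gets 103, matching the x, x+1, ..., x+n, x+n+1 layout in the
diagram.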
 
 This is exactly what I had in mind.  I need to read it harder and make
 sure it interacts well with the directory logging code, but I love it.
 
 Thanks!
 

It's so great that you like it.  :)

But I must NOTE again:
   Due to the bug that patch 8 fixes, the performance statistics I posted
   some time ago, like (*SPEED* : 4.7+ Mb/sec), are worthless and cannot be
   used as a basis any more...

I hope more people can get the patchset tested.

thanks,
liubo

 -chris
 



Re: crash in btrfsck, btrfs-debug-tree, etc

2011-05-16 Thread liubo
On 05/04/2010 05:28 AM, Vladimir G. Ivanovic wrote:
 No help, eh? At the minimum, it would be nice if btrfsck were fixed...
 

Not sure whether the following patch will help you dump the metadata, but you
can give it a try and then go on using btrfs-debug-tree.

diff --git a/disk-io.c b/disk-io.c
index a6e1000..90f2831 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -204,12 +204,8 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
eb->dev_bytenr = multi->stripes[0].physical;
kfree(multi);
ret = read_extent_from_disk(eb);
-   if (ret == 0 && check_tree_block(root, eb) == 0 &&
-   csum_tree_block(root, eb, 1) == 0 &&
-   verify_parent_transid(eb->tree, eb, parent_transid) == 0) {
-   btrfs_set_buffer_uptodate(eb);
+   if (ret == 0)
return eb;
-   }
num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
  eb->start, eb->len);
if (num_copies == 1) {


thanks,
liubo.

 Unfortunately, now btrfs will NOT mount the drive, so I am now
 completely without data. The mount error is:
 
 kernel: device fsid c64b56bd1c869bb3-e85f95a29c7dd3ad devid 1
 transid 21547 /dev/sdc1
 kernel: btrfs bad tree block start 14052438117991321731 20971520
 kernel: btrfs bad tree block start 14052438117991321731 20971520
 kernel: btrfs bad tree block start 8532476744452893537 20971520
 kernel: btrfs: failed to read chunk root on sdc1
 kernel: btrfs: open_ctree failed
 
 --- Vladimir
 
 Vladimir G. Ivanovic  http://www.leonora.org
 +1 650 450 4101   vladi...@acm.org
 
 
 on 04/28/2010 01:03 PM Vladimir G. Ivanovic said the following:
 I overwrote some part of the first 195641856 bytes of a 1TB (nominal)
 btrfs volume (I CTRL-C'd out before dd finished.) OK, OK, you may stop
 laughing now. Surely something similar has happened to you. No? Then it
 will, someday.

 First things first: A huge congratulations to the btrfs team because the
 btrfs volume is still usable. I do get many errors similar to:

 kernel: btrfs bad tree block start 3050544144921548175 12056985

 but for many of my files, I don't get errors.

 Now, onto my problems. My first thought was to btrfsck the unmounted
 volume, but btrfsck crashes:

 # btrfsck /dev/sdc1
 btrfsck: disk-io.c:723: open_ctree_fd: Assertion `!(!chunk_root->node)' failed.
 Aborted (core dumped)

 So does btrfs-debug-tree, and I suspect other utilities will as well. I
 tried the latest utilities from btrfs-progs-unstable, but they too crash
 with the same error. (I'm on an Athlon64-powered netbook running Fedora 12.
 btrfs's version is 0.19.) In particular, so does btrfs-image, so I can't
 share the volume's metadata.

 So, until the utilities are fixed, what are my options?

 * Can I create a snapshot of the root volume? Would I end up with
   everything that could be read in the snapshot, or would it also have
   errors? If this is a good idea, would these commands

   btrfsctl -s snapshot_of_root /mnt/chopin1
   mount.btrfs -o subvol=snapshot_of_root /dev/sdc1 /mnt/snap

   do the trick, assuming that btrfsctl doesn't also crash? Then what?
   Copy the snapshot to another disk? Somehow make the new snapshot the new
   root, allowing me to delete the old root?

 * Should I just try and copy the data to another disk and reformat my
   current volume?

 * Is there a way of testing whether a particular file is good other than
   (slowly) going through each and every file while watching syslog? cat,
   for example, doesn't return an error when the file is bad, so I don't
   think I can write a shell script to copy good files to another volume.

 Are there other options that I haven't considered?

 Thanks for all help.

 --- Vladimir

   
 



Re: [PATCH] btrfs progs: fix extra metadata chunk allocation in --mixed case

2011-05-05 Thread liubo
On 05/05/2011 10:16 PM, Arne Jansen wrote:
 When creating a mixed fs with mkfs, an extra metadata chunk got allocated.
 This is because btrfs_reserve_extent calls do_chunk_alloc for METADATA,
 which in turn wasn't able to find the proper space_info, as __find_space_info
 did a hard compare of the flags. It is now sufficient for the space_info to
 include the proper flag. This reflects the change done to the kernel code
 to support mixed chunks.
 Also for a subsequent chunk allocation (which should not be hit in the mkfs
 case), the chunk is now created with the flags from the space_info instead
 of the requested flags. A better solution would be to pull the full changeset
 for the mixed case from the kernel into the user mode (or, even better, share
 the code)
 
 The additional chunk probably confused block_rsv calculation, which in turn
 led to several ENOSPC Oopses.
 

Good catch!

Reviewed-by: Liu Bo liubo2...@cn.fujitsu.com

 Signed-off-by: Arne Jansen sensi...@gmx.net
 ---
  extent-tree.c |7 ---
  1 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/extent-tree.c b/extent-tree.c
 index b2f9bb2..c6c77c6 100644
 --- a/extent-tree.c
 +++ b/extent-tree.c
 @@ -1735,7 +1735,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
   struct btrfs_space_info *found;
   list_for_each(cur, head) {
   found = list_entry(cur, struct btrfs_space_info, list);
 - if (found->flags == flags)
 + if (found->flags & flags)
   return found;
   }
   return NULL;
 @@ -1812,7 +1812,8 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
   thresh)
   return 0;
  
 - ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
 + ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes,
 + space_info->flags);
   if (ret == -ENOSPC) {
   space_info->full = 1;
   return 0;
 @@ -1820,7 +1821,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
  
   BUG_ON(ret);
  
 - ret = btrfs_make_block_group(trans, extent_root, 0, flags,
 + ret = btrfs_make_block_group(trans, extent_root, 0, space_info->flags,
BTRFS_FIRST_CHUNK_TREE_OBJECTID, start, num_bytes);
   BUG_ON(ret);
   return 0;
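
The difference between the hard compare and the "includes the proper flag"
test can be sketched as follows.  The flag values and names here are
hypothetical and chosen only for illustration; they are not the btrfs
definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the real btrfs values. */
#define GROUP_DATA     (1ULL << 0)
#define GROUP_METADATA (1ULL << 1)

struct space_info_model {
	uint64_t flags;
};

/* Old behaviour: only an identical flag word matches. */
static const struct space_info_model *
find_exact(const struct space_info_model *infos, size_t n, uint64_t flags)
{
	for (size_t i = 0; i < n; i++)
		if (infos[i].flags == flags)
			return &infos[i];
	return NULL;
}

/* New behaviour: it is enough that the entry includes the requested bit. */
static const struct space_info_model *
find_overlap(const struct space_info_model *infos, size_t n, uint64_t flags)
{
	for (size_t i = 0; i < n; i++)
		if (infos[i].flags & flags)
			return &infos[i];
	return NULL;
}
```

With a single mixed entry (DATA | METADATA), a lookup for METADATA fails under
the exact compare, which is the case that led to the extra chunk allocation,
and succeeds under the overlap test.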



[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog

2011-05-05 Thread liubo

The current code relogs the entire inode on every fsync, which suits small
files much better than large ones.

During my performance tests, the fsync performance of large files was poor,
and we can ascribe this to the tremendous number of csum items that large
files have, because we have to flush all of these csum items into the log
tree even when there is only _one_ change in the whole file data.  So to
optimize fsync, we need a filter that skips the unnecessary csum items, that
is, those whose corresponding file data has not changed before this fsync.
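
A minimal sketch of such a filter, under the assumption that an extent counts
as unchanged when its generation is older than the transid doing the logging.
The struct and function names are made up for illustration and are not the
btrfs ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file extent item. */
struct extent_model {
	uint64_t offset;
	uint64_t generation; /* transid that last wrote this extent */
};

/* Count the extents whose csums would still be copied into the log. */
static size_t extents_to_log(const struct extent_model *ext, size_t n,
			     uint64_t transid)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].generation < transid)
			continue; /* data unchanged in this transaction: skip */
		count++;
	}
	return count;
}
```

For a large file where only a couple of extents were rewritten in the current
transaction, the count stays small no matter how many extents the file has,
which is where the fsync saving comes from.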

Here are some test results.  I used sysbench to do random write + fsync.

===
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K 
--file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync 
--file-extra-flags=  [prepare, run]
===

Sysbench args:
  - Number of threads: 1
  - Extra file open flags: 0
  - 2 files, 4Gb each
  - Block size 4Kb
  - Number of random requests for random IO: 10000
  - Read/Write ratio for combined random IO test: 1.50
  - Periodic FSYNC enabled, calling fsync() each 100 requests.
  - Calling fsync() at the end of test, Enabled.
  - Using synchronous I/O mode
  - Doing random write test

Sysbench results:
===
   Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
   Read 0b  Written 39.062Mb  Total transferred 39.062Mb
===
a) without patch:  (*SPEED* : 451.01Kb/sec)
   112.75 Requests/sec executed

b) with patch: (*SPEED* : 4.7533Mb/sec)
   1216.84 Requests/sec executed


PS: I've also made a _sub transid_ patch, but it does not perform as well as
this patch.  I'm wondering where the problem is and trying to improve it.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/tree-log.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..b934a36 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2662,6 +2662,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
extent = btrfs_item_ptr(src, start_slot + i,
struct btrfs_file_extent_item);
 
+   if (btrfs_file_extent_generation(src, extent) < trans->transid)
+   continue;
+
found_type = btrfs_file_extent_type(src, extent);
if (found_type == BTRFS_FILE_EXTENT_REG ||
found_type == BTRFS_FILE_EXTENT_PREALLOC) {
-- 
1.6.5.2


Re: [PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine

2011-04-29 Thread liubo
ping?

On 04/19/2011 09:35 AM, liubo wrote:
 Filesystems like Btrfs have some ULL macros, and when these macros are
 passed to tracepoints' __print_symbolic(), there are 64->32 truncation
 WARNINGS when compiling on a 32bit box.
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  include/linux/ftrace_event.h |   12 
  include/trace/ftrace.h   |   13 +
  kernel/trace/trace_output.c  |   27 +++
  3 files changed, 52 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
 index 47e3997..efb2330 100644
 --- a/include/linux/ftrace_event.h
 +++ b/include/linux/ftrace_event.h
 @@ -16,6 +16,11 @@ struct trace_print_flags {
   const char  *name;
  };
  
 +struct trace_print_flags_u64 {
 + unsigned long long  mask;
 + const char  *name;
 +};
 +
  const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
  unsigned long flags,
  const struct trace_print_flags *flag_array);
 @@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
  const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 const struct trace_print_flags *symbol_array);
  
 +#if BITS_PER_LONG == 32
 +const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
 +  unsigned long long val,
 +  const struct trace_print_flags_u64
 +  *symbol_array);
 +#endif
 +
  const char *ftrace_print_hex_seq(struct trace_seq *p,
const unsigned char *buf, int len);
  
 diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
 index 3e68366..533c49f 100644
 --- a/include/trace/ftrace.h
 +++ b/include/trace/ftrace.h
 @@ -205,6 +205,19 @@
   ftrace_print_symbols_seq(p, value, symbols);\
   })
  
 +#undef __print_symbolic_u64
 +#if BITS_PER_LONG == 32
 +#define __print_symbolic_u64(value, symbol_array...) \
 + ({  \
 + static const struct trace_print_flags_u64 symbols[] =   \
 + { symbol_array, { -1, NULL } }; \
 + ftrace_print_symbols_seq_u64(p, value, symbols);\
 + })
 +#else
 +#define __print_symbolic_u64(value, symbol_array...) \
 + __print_symbolic(value, symbol_array)
 +#endif
 +
  #undef __print_hex
  #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
  
 diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
 index 02272ba..b783504 100644
 --- a/kernel/trace/trace_output.c
 +++ b/kernel/trace/trace_output.c
 @@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
  }
  EXPORT_SYMBOL(ftrace_print_symbols_seq);
  
 +#if BITS_PER_LONG == 32
 +const char *
 +ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
 +  const struct trace_print_flags_u64 *symbol_array)
 +{
 + int i;
 + const char *ret = p->buffer + p->len;
 +
 + for (i = 0;  symbol_array[i].name; i++) {
 +
 + if (val != symbol_array[i].mask)
 + continue;
 +
 + trace_seq_puts(p, symbol_array[i].name);
 + break;
 + }
 +
 + if (!p->len)
 + trace_seq_printf(p, "0x%llx", val);
 +
 + trace_seq_putc(p, 0);
 +
 + return ret;
 +}
 +EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
 +#endif
 +
  const char *
  ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
  {
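
Why a 64-bit symbol table is needed on a 32bit box can be shown with a small
stand-alone sketch.  It is an illustration only: uint32_t stands in for a
32-bit unsigned long, and the ids and names (ID_FOO, ID_BAR, lookup32,
lookup64) are made up, not taken from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit ids that differ only in their upper 32 bits. */
#define ID_FOO 0x100000001ULL
#define ID_BAR 0x200000001ULL

struct sym32 { uint32_t mask; const char *name; }; /* 32-bit unsigned long */
struct sym64 { uint64_t mask; const char *name; };

/* Both masks truncate to 1, so the two entries become indistinguishable. */
static const struct sym32 table32[] = {
	{ (uint32_t)ID_FOO, "FOO" },
	{ (uint32_t)ID_BAR, "BAR" },
	{ 0, NULL },
};

static const struct sym64 table64[] = {
	{ ID_FOO, "FOO" },
	{ ID_BAR, "BAR" },
	{ 0, NULL },
};

/* Return the first name whose mask equals val, or NULL if none does. */
static const char *lookup32(uint32_t val)
{
	for (size_t i = 0; table32[i].name; i++)
		if (table32[i].mask == val)
			return table32[i].name;
	return NULL;
}

static const char *lookup64(uint64_t val)
{
	for (size_t i = 0; table64[i].name; i++)
		if (table64[i].mask == val)
			return table64[i].name;
	return NULL;
}
```

Looking up ID_BAR through the truncated table yields "FOO", while the 64-bit
table yields "BAR".  The compiler warnings mentioned in the patch description
point at exactly this kind of silent truncation.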



Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog

2011-04-25 Thread liubo
On 04/22/2011 09:28 AM, Chris Mason wrote:
 Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
 Chris Mason wrote:
 Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
 The current code relogs the entire inode on every fsync, which suits small
 files much better than large ones.

 During my performance tests, the fsync performance of large files was poor,
 and we can ascribe this to the tremendous number of csum items that large
 files have, because we have to flush all of these csum items into the log
 tree even when there is only _one_ change in the whole file data.  So to
 optimize fsync, we need a filter that skips the unnecessary csum items, that
 is, those whose corresponding file data has not changed before this fsync.

 Here are some test results.  I used sysbench to do random write + fsync.

 Sysbench args:
   - Number of threads: 1
   - Extra file open flags: 0
   - 2 files, 4Gb each
   - Block size 4Kb
   - Number of random requests for random IO: 10000
   - Read/Write ratio for combined random IO test: 1.50
   - Periodic FSYNC enabled, calling fsync() each 100 requests.
   - Calling fsync() at the end of test, Enabled.
   - Using synchronous I/O mode
   - Doing random write test

 Sysbench results:
 ===
Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b  Written 39.062Mb  Total transferred 39.062Mb
 ===
 a) without patch:  (*SPEED* : 451.01Kb/sec)
112.75 Requests/sec executed

 b) with patch: (*SPEED* : 5.1537Mb/sec)
1319.34 Requests/sec executed
 Really nice results! Especially considering the small size of the patch.

 But, I'd really like to look at using sub transaction ids for this, and
 then logging just the part of the inode that had changed since the last
 log commit.  It's more complex, but will also help reduce tree searches
 for the file items.

 And this patch forgot to mention that it has a compatibility issue.
 
 Right, at the very least we want to just use one bit of that field
 instead of all 8.  But keeping a sub-transid and putting that in the
 generation field of the file extent instead can get us the same benefits
 without stealing the bits.
 

Nice.  This is the first step of my plan.

 As we push the sub transid into the btree blocks as well, we'll get much
 faster tree walks too.  The penalty is in complexity in the logging
 code, since it will have to deal with finding extents in the log tree
 and merging in the new extents from the file.

I've been thinking of this extent buffer with sub transid stuff for a while,
and will give it a try. :)

thanks,
liubo.

 
 -chris
 



[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog

2011-04-21 Thread liubo

The current code relogs the entire inode on every fsync, which suits small
files much better than large ones.

During my performance tests, the fsync performance of large files was poor,
and we can ascribe this to the tremendous number of csum items that large
files have, because we have to flush all of these csum items into the log
tree even when there is only _one_ change in the whole file data.  So to
optimize fsync, we need a filter that skips the unnecessary csum items, that
is, those whose corresponding file data has not changed before this fsync.

Here are some test results.  I used sysbench to do random write + fsync.

Sysbench args:
  - Number of threads: 1
  - Extra file open flags: 0
  - 2 files, 4Gb each
  - Block size 4Kb
  - Number of random requests for random IO: 10000
  - Read/Write ratio for combined random IO test: 1.50
  - Periodic FSYNC enabled, calling fsync() each 100 requests.
  - Calling fsync() at the end of test, Enabled.
  - Using synchronous I/O mode
  - Doing random write test

Sysbench results:
===
   Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
   Read 0b  Written 39.062Mb  Total transferred 39.062Mb
===
a) without patch:  (*SPEED* : 451.01Kb/sec)
   112.75 Requests/sec executed

b) with patch: (*SPEED* : 5.1537Mb/sec)
   1319.34 Requests/sec executed

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h|   14 --
 fs/btrfs/inode.c|1 +
 fs/btrfs/tree-log.c |   31 +--
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2e61fe1..300bea0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -642,6 +642,12 @@ struct btrfs_root_ref {
 #define BTRFS_FILE_EXTENT_REG 1
 #define BTRFS_FILE_EXTENT_PREALLOC 2
 
+/*
+ * used to indicate that this file extent has just been changed and
+ * its csums need to be updated when fsync tries to log this inode.
+ */
+#define BTRFS_FILE_EXTENT_CSUM_UPTODATE (1 << 0)
+
 struct btrfs_file_extent_item {
/*
 * transaction id that created this extent
@@ -665,7 +671,9 @@ struct btrfs_file_extent_item {
 */
u8 compression;
u8 encryption;
-   __le16 other_encoding; /* spare for later use */
+   u8 other_encoding; /* spare for later use */
+
+   u8 flag;
 
/* are we inline data or a real extent? */
u8 type;
@@ -2026,7 +2034,9 @@ BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
 BTRFS_SETGET_FUNCS(file_extent_encryption, struct btrfs_file_extent_item,
   encryption, 8);
 BTRFS_SETGET_FUNCS(file_extent_other_encoding, struct btrfs_file_extent_item,
-  other_encoding, 16);
+  other_encoding, 8);
+BTRFS_SETGET_FUNCS(file_extent_flag, struct btrfs_file_extent_item,
+  flag, 8);
 
 /* this returns the number of file bytes represented by the inline item.
  * If an item is compressed, this is the uncompressed size
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a4157cf..ed4e318 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1660,6 +1660,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
btrfs_set_file_extent_compression(leaf, fi, compression);
btrfs_set_file_extent_encryption(leaf, fi, encryption);
btrfs_set_file_extent_other_encoding(leaf, fi, other_encoding);
+   btrfs_set_file_extent_flag(leaf, fi, BTRFS_FILE_EXTENT_CSUM_UPTODATE);
 
btrfs_unlock_up_safe(path, 1);
btrfs_set_lock_blocking(leaf);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..baa4a0a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2591,11 +2591,24 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
return ret;
 }
 
+static inline int need_csum(struct extent_buffer *src,
+   struct btrfs_file_extent_item *fi,
+   u64 gen, int csum)
+{
+   if (csum &&
+   (btrfs_file_extent_generation(src, fi) == gen) &&
+   (btrfs_file_extent_flag(src, fi) & BTRFS_FILE_EXTENT_CSUM_UPTODATE))
+   return 1;
+
+   return 0;
+}
+
+
 static noinline int copy_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *log,
   struct btrfs_path *dst_path,
   struct extent_buffer *src,
-  int start_slot, int nr, int inode_only)
+  int start_slot, int nr, int inode_only, int csum)
 {
unsigned long src_offset;
unsigned long dst_offset;
@@ -2653,6 +2666,7 @@ static noinline int copy_items(struct btrfs_trans_handle 
*trans,
btrfs_set_inode_generation(dst_path->nodes[0],
   inode_item, 0);
}
+
/* 

Re: [PATCH 1/1] btrfs: add missing spin_unlock to a rare exit path

2011-04-20 Thread liubo
Good catch!

thanks,
liubo

On 04/20/2011 08:34 PM, David Sterba wrote:
 Signed-off-by: David Sterba dste...@suse.cz
 ---
  fs/btrfs/disk-io.c |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 5e5d07c..25e4b8f 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -2825,6 +2825,7 @@ static int btrfs_destroy_delayed_refs(struct 
 btrfs_transaction *trans,
  
   spin_lock(&delayed_refs->lock);
   if (delayed_refs->num_entries == 0) {
 + spin_unlock(&delayed_refs->lock);
   printk(KERN_INFO "delayed_refs has NO entry\n");
   return ret;
   }

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Add a new file op for fsync to give fs's more control

2011-04-18 Thread liubo
On 04/16/2011 03:32 AM, Josef Bacik wrote:
 On 04/15/2011 03:24 PM, Christoph Hellwig wrote:
 Sorry, but this is too ugly to live.  If the reason for this really is
 good enough we'll just need to push the filemap_write_and_wait_range
 and i_mutex locking into every ->fsync instance.

 
 So part of what makes small fsyncs slow in btrfs is all of our random
 threads to make checksumming not suck.  So we submit IO which spreads it
 out to helper threads to do the checksumming, and then when it returns
 it gets handed off to endio threads that run the endio stuff.  This
 works awesome with doing big writes and such, but if, say, we're an RPM
 database and write a couple of kilobytes, this tends to suck because we
 keep handing work off to other threads and waiting, so the scheduling
 latencies really hurt.
 
 So we'd like to be able to say hey this is a small amount of io, lets
 just do the checksumming in the current thread, and the same with
 handling the endio stuff.  We can't do that currently because
 filemap_write_and_wait_range is called before we get to fsync.  We'd
 like to be able to control this so we can do the appropriate magic to do
 the submission within the fsyncing thread's context in order to speed
 things up a bit.
 
 That plus the stuff I said about i_mutex.  Is that a good enough reason
 to just push this down into all the filesystems?  Thanks,
 

Fine with the i_mutex.

I'm wondering whether it is worth doing, though.

I've tested your patch with sysbench, and I see essentially no improvement. :(

Sysbench args:
sysbench --test=fileio --num-threads=1 --file-num=10240 --file-block-size=1K 
--file-total-size=20M --file-test-mode=rndwr --file-io-mode=sync 
--file-extra-flags=  run


10240 files, 2Kb each
===
fsync_nolock (patch):
Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.152Kb/sec)
   35.15 Requests/sec executed

fsync (orig):
Operations performed:  0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b  Written 9.7656Mb  Total transferred 9.7656Mb  (35.287Kb/sec)
   35.29 Requests/sec executed
===

It seems the gain from avoiding the thread handoffs is not enough to show up
here.

BTW, I'm also working on fsync performance, but mainly for large files (4G).
I found that a large file has a tremendous number of csum items that need to
be flushed into the tree log during fsync().  Btrfs currently uses a brute-force
approach to make sure it gets the most up-to-date copies of everything, and this
results in bad performance.  How to replace the brute-force approach is bugging
me a lot...

thanks,
liubo

 Josef


[PATCH 1/2] E2fsprogs: use the generic inode flags

2011-04-18 Thread liubo

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 debugfs/htree.c|2 +-
 e2fsck/pass1.c |   22 +++---
 e2fsck/pass2.c |2 +-
 e2fsck/pass4.c |2 +-
 e2fsck/rehash.c|4 ++--
 ext2ed/inode_com.c |   14 +++---
 lib/e2p/fgetflags.c|6 +++---
 lib/e2p/fsetflags.c|6 +++---
 lib/e2p/getflags.c |6 +++---
 lib/e2p/pf.c   |   34 +-
 lib/e2p/setflags.c |6 +++---
 lib/ext2fs/ext2_fs.h   |   44 ++--
 lib/ext2fs/link.c  |4 ++--
 lib/ext2fs/mkjournal.c |2 +-
 misc/chattr.c  |   26 +-
 misc/tune2fs.c |2 +-
 16 files changed, 91 insertions(+), 91 deletions(-)

diff --git a/debugfs/htree.c b/debugfs/htree.c
index 08f9749..cc9f0fb 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -243,7 +243,7 @@ void do_htree_dump(int argc, char *argv[])
goto errout;
}
 
-   if ((inode.i_flags & EXT2_BTREE_FL) == 0) {
+   if ((inode.i_flags & FS_BTREE_FL) == 0) {
com_err(argv[0], 0, "Not a hash-indexed directory");
goto errout;
}
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 67dd986..5ba93ca 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -138,7 +138,7 @@ int e2fsck_pass1_check_device_inode(ext2_filsys fs 
EXT2FS_ATTR((unused)),
 * If the index flag is set, then this is a bogus
 * device/fifo/socket
 */
-   if (inode->i_flags & EXT2_INDEX_FL)
+   if (inode->i_flags & FS_INDEX_FL)
return 0;
 
/*
@@ -152,7 +152,7 @@ int e2fsck_pass1_check_device_inode(ext2_filsys fs 
EXT2FS_ATTR((unused)),
 * you can't set or clear immutable flags for devices.)  Once
 * the kernel has been fixed we can change this...
 */
-   if (inode->i_flags & (EXT2_IMMUTABLE_FL | EXT2_APPEND_FL)) {
+   if (inode->i_flags & (FS_IMMUTABLE_FL | FS_APPEND_FL)) {
for (i=4; i < EXT2_N_BLOCKS; i++)
if (inode->i_block[i])
return 0;
@@ -175,7 +175,7 @@ int e2fsck_pass1_check_symlink(ext2_filsys fs, ext2_ino_t ino,
struct ext2fs_extent extent;
 
if ((inode->i_size_high || inode->i_size == 0) ||
-   (inode->i_flags & EXT2_INDEX_FL))
+   (inode->i_flags & FS_INDEX_FL))
return 0;
 
if (inode->i_flags & EXT4_EXTENTS_FL) {
@@ -235,7 +235,7 @@ int e2fsck_pass1_check_symlink(ext2_filsys fs, ext2_ino_t 
ino,
  * If the immutable (or append-only) flag is set on the inode, offer
  * to clear it.
  */
-#define BAD_SPECIAL_FLAGS (EXT2_IMMUTABLE_FL | EXT2_APPEND_FL)
+#define BAD_SPECIAL_FLAGS (FS_IMMUTABLE_FL | FS_APPEND_FL)
 static void check_immutable(e2fsck_t ctx, struct problem_context *pctx)
 {
if (!(pctx->inode->i_flags & BAD_SPECIAL_FLAGS))
@@ -989,7 +989,7 @@ void e2fsck_pass1(e2fsck_t ctx)
  EXT4_FEATURE_RO_COMPAT_HUGE_FILE) &&
(inode->osd2.linux2.l_i_blocks_hi != 0))
mark_inode_bad(ctx, ino);
-   if (inode->i_flags & EXT2_IMAGIC_FL) {
+   if (inode->i_flags & FS_IMAGIC_FL) {
if (imagic_fs) {
if (!ctx->inode_imagic_map)
alloc_imagic_map(ctx);
@@ -997,7 +997,7 @@ void e2fsck_pass1(e2fsck_t ctx)
 ino);
} else {
if (fix_problem(ctx, PR_1_SET_IMAGIC, pctx)) {
-   inode->i_flags &= ~EXT2_IMAGIC_FL;
+   inode->i_flags &= ~FS_IMAGIC_FL;
e2fsck_write_inode(ctx, ino,
   inode, "pass1");
}
@@ -1893,13 +1893,13 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
extent_fs = (ctx->fs->super->s_feature_incompat &
  EXT3_FEATURE_INCOMPAT_EXTENTS);
 
-   if (inode->i_flags & EXT2_COMPRBLK_FL) {
+   if (inode->i_flags & FS_COMPRBLK_FL) {
if (fs->super->s_feature_incompat &
EXT2_FEATURE_INCOMPAT_COMPRESSION)
pb.compressed = 1;
else {
if (fix_problem(ctx, PR_1_COMPR_SET, pctx)) {
-   inode->i_flags &= ~EXT2_COMPRBLK_FL;
+   inode->i_flags &= ~FS_COMPRBLK_FL;
dirty_inode++;
}
}
@@ -1940,9 +1940,9 @@ static void check_blocks(e2fsck_t ctx, struct 
problem_context *pctx,
return;
}
 
-   if (inode->i_flags & EXT2_INDEX_FL) {
+   if (inode->i_flags & FS_INDEX_FL) {
if (handle_htree(ctx, pctx, ino, 

[PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr

2011-04-18 Thread liubo
Modify command 'chattr' and 'lsattr' to support compress and cow.
- use 'C' to indicate NOCOW attribute.
- still use 'c' to indicate compress attribute.

Also update the man doc.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 lib/e2p/pf.c |1 +
 lib/ext2fs/ext2_fs.h |1 +
 misc/chattr.1.in |   15 +++
 misc/chattr.c|   15 ++-
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
index cc50896..c9385dd 100644
--- a/lib/e2p/pf.c
+++ b/lib/e2p/pf.c
@@ -48,6 +48,7 @@ static struct flags_name flags_array[] = {
{ FS_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
{ EXT4_EXTENTS_FL, "e", "Extents" },
{ EXT4_HUGE_FILE_FL, "h", "Huge_file" },
+   { FS_NOCOW_FL, "C", "NOCOW" },
{ 0, NULL, NULL }
 };
 
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 858c103..776be92 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -276,6 +276,7 @@ struct ext2_dx_countlimit {
 #define EXT4_EXTENTS_FL            0x00080000 /* Inode uses extents */
 #define EXT4_EA_INODE_FL           0x00200000 /* Inode used for large EA */
 #define EXT4_EOFBLOCKS_FL          0x00400000 /* Blocks allocated beyond EOF */
+#define FS_NOCOW_FL                0x00800000 /* Do not cow file */
 #define EXT4_SNAPFILE_FL           0x01000000  /* Inode is a snapshot */
 #define EXT4_SNAPFILE_DELETED_FL   0x04000000  /* Snapshot is being deleted */
 #define EXT4_SNAPFILE_SHRUNK_FL    0x08000000  /* Snapshot shrink has completed */
diff --git a/misc/chattr.1.in b/misc/chattr.1.in
index 92f6d70..434eb04 100644
--- a/misc/chattr.1.in
+++ b/misc/chattr.1.in
@@ -19,17 +19,18 @@ chattr \- change file attributes on a Linux file system
 .B chattr
 changes the file attributes on a Linux file system.
 .PP
-The format of a symbolic mode is +-=[acdeijstuADST].
+The format of a symbolic mode is +-=[acdeijstuACDST].
 .PP
 The operator `+' causes the selected attributes to be added to the
 existing attributes of the files; `-' causes them to be removed; and
 `=' causes them to be the only attributes that the files have.
 .PP
-The letters `acdeijstuADST' select the new attributes for the files:
+The letters `acdeijstuACDST' select the new attributes for the files:
 append only (a), compressed (c), no dump (d), extent format (e), immutable (i),
 data journalling (j), secure deletion (s), no tail-merging (t), 
-undeletable (u), no atime updates (A), synchronous directory updates (D), 
-synchronous updates (S), and top of directory hierarchy (T).
+undeletable (u), no atime updates (A), no copy on write (C),
+synchronous directory updates (D), synchronous updates (S),
+and top of directory hierarchy (T).
 .PP
 The following attributes are read-only, and may be listed by
 .BR lsattr (1)
@@ -64,6 +65,10 @@ this file compresses data before storing them on the disk.  
Note: please
 make sure to read the bugs and limitations section at the end of this
 document.
 .PP
+A file with the `C' attribute set is marked without COW (copy on write).  Note:
+please make sure to read the bugs and limitations section at the end of this
+document.
+.PP
 When a directory with the `D' attribute set is modified,
 the changes are written synchronously on the disk; this is equivalent to
 the `dirsync' mount option applied to a subset of the files.
@@ -161,6 +166,8 @@ The `c', 's',  and `u' attributes are not honored
 by the ext2 and ext3 filesystems as implemented in the current mainline
 Linux kernels.These attributes may be implemented
 in future versions of the ext2 and ext3 filesystems.
+The `C' attribute is only used in btrfs filesystem in the current mainline
+Linux kernels.
 .PP
 The `j' option is only useful if the filesystem is mounted as ext3.
 .PP
diff --git a/misc/chattr.c b/misc/chattr.c
index 78e3736..8c8231e 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -82,7 +82,7 @@ static unsigned long sf;
 static void usage(void)
 {
fprintf(stderr,
-   _("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+   _("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
program_name);
exit(1);
 }
@@ -106,6 +106,7 @@ static const struct flags_char flags_array[] = {
{ FS_UNRM_FL, 'u' },
{ FS_NOTAIL_FL, 't' },
{ FS_TOPDIR_FL, 'T' },
+   { FS_NOCOW_FL, 'C' },
{ 0, 0 }
 };
 
@@ -159,6 +160,12 @@ static int decode_arg (int * i, int argc, char ** argv)
}
if ((fl = get_flag(*p)) == 0)
usage();
+
+   if (fl == FS_COMPR_FL) {
+   af |= FS_NOCOMPR_FL;
+   add = 1;
+   }
+
rf |= fl;
rem = 1;
}
@@ -168,6 +175,12 @@ static int decode_arg (int * i, int argc, char ** argv)
for (p 

Re: [PATCH 1/2] E2fsprogs: use the generic inode flags

2011-04-18 Thread liubo
On 04/18/2011 04:41 PM, Coly Li wrote:
 On 2011-04-18 15:37, liubo wrote:
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  debugfs/htree.c|2 +-
  e2fsck/pass1.c |   22 +++---
  e2fsck/pass2.c |2 +-
  e2fsck/pass4.c |2 +-
  e2fsck/rehash.c|4 ++--
  ext2ed/inode_com.c |   14 +++---
  lib/e2p/fgetflags.c|6 +++---
  lib/e2p/fsetflags.c|6 +++---
  lib/e2p/getflags.c |6 +++---
  lib/e2p/pf.c   |   34 +-
  lib/e2p/setflags.c |6 +++---
  lib/ext2fs/ext2_fs.h   |   44 ++--
  lib/ext2fs/link.c  |4 ++--
  lib/ext2fs/mkjournal.c |2 +-
  misc/chattr.c  |   26 +-
  misc/tune2fs.c |2 +-
  16 files changed, 91 insertions(+), 91 deletions(-)
 [snip]
 
 Hi Bo,
 
 Could you please explain the motivation of this patch set a little bit
 more? Thanks.
 

Hi Li,

We want to control the COW and compression attributes on a per-file or
per-directory basis, and the generic chattr command turns out to be the right
tool for that.

Currently only btrfs supports both, of course.

With these patches, we can do the following:

c: compress
C: nocow

set compress & nocow:

# ./misc/chattr -V +c +C /mnt/btrfs/dir/
chattr 1.41.14 (22-Dec-2010)
Flags of /mnt/btrfs/dir/ set as c--C

# ./misc/lsattr -d /mnt/btrfs/dir/
c--C /mnt/btrfs/dir/
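
For readers who want to do the same thing from a program rather than from the
chattr binary, here is a minimal sketch, assuming a Linux system whose
<linux/fs.h> provides FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.  The helper names
apply_attr_change() and change_attrs() are made up for illustration and are
not part of e2fsprogs; the FS_NOCOW_FL fallback value is the one proposed in
patch 2/2.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_COMPR_FL */

/* Older kernel headers may lack FS_NOCOW_FL; this is the value the patch adds. */
#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000
#endif

/* Pure helper: the flag word that results from setting and clearing bits. */
int apply_attr_change(int flags, int add, int clear)
{
    return (flags | add) & ~clear;
}

/*
 * Hypothetical helper (not an e2fsprogs function): read the inode flags of
 * `path`, apply the change and write them back.
 * Returns 0 on success, -1 on failure.
 */
int change_attrs(const char *path, int add, int clear)
{
    int flags = 0;
    int ret = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags = apply_attr_change(flags, add, clear);
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
            ret = 0;
    }
    close(fd);
    return ret;
}
```

Under these assumptions, change_attrs("/mnt/btrfs/dir/", FS_COMPR_FL | FS_NOCOW_FL, 0)
would correspond to the `chattr +c +C` call above; whether the filesystem
honours the flags is up to the filesystem.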

thanks,
liubo


Re: [PATCH] Trace: add __print_symbolic_u64 to avoid warnings on 32bit machine

2011-04-18 Thread liubo
On 04/19/2011 02:11 AM, Steven Rostedt wrote:
 On Wed, 2011-04-06 at 17:18 +0800, liubo wrote:
 Btrfs has some ULL macros, and when these macros are passed to tracepoints'
 __print_symbolic(), there will be 64->32 truncate WARNINGS during compiling
 on 32bit box.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  include/linux/ftrace_event.h |   12 
  include/trace/events/btrfs.h |4 ++--
  include/trace/ftrace.h   |   13 +
  kernel/trace/trace_output.c  |   27 +++
  4 files changed, 54 insertions(+), 2 deletions(-)
 
 Could you break this up into two patches. One that touches the ftrace
 core, and one that updates btrfs.
 

Sure, I'll break it and resend soon.  Thanks for the reply.

thanks,
liubo

 Thanks,
 
 -- Steve
 
 


[PATCH 2/2] tracing: update btrfs's tracepoints to use u64 interface

2011-04-18 Thread liubo

To avoid 64->32 truncating WARNING, update btrfs's tracepoints.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/trace/events/btrfs.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index f445cff..4114129 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -28,7 +28,7 @@ struct extent_buffer;
{ BTRFS_SHARED_DATA_REF_KEY, "SHARED_DATA_REF" })
 
 #define __show_root_type(obj)  \
-   __print_symbolic(obj,   \
+   __print_symbolic_u64(obj,   \
{ BTRFS_ROOT_TREE_OBJECTID,     "ROOT_TREE"     },  \
{ BTRFS_EXTENT_TREE_OBJECTID,   "EXTENT_TREE"   },  \
{ BTRFS_CHUNK_TREE_OBJECTID,    "CHUNK_TREE"    },  \
@@ -125,7 +125,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 );
 
 #define __show_map_type(type)  \
-   __print_symbolic(type,  \
+   __print_symbolic_u64(type,  \
{ EXTENT_MAP_LAST_BYTE, "LAST_BYTE" },  \
{ EXTENT_MAP_HOLE,      "HOLE"      },  \
{ EXTENT_MAP_INLINE,    "INLINE"    },  \
-- 
1.6.5.2


[PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine

2011-04-18 Thread liubo

Filesystems such as Btrfs define some ULL macros.  When these macros are passed
to the tracepoint helper __print_symbolic(), 64->32 truncation WARNINGS are
emitted when compiling on a 32bit box.
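
To make the truncation concrete, here is a small userspace sketch (not kernel
code).  It assumes that `unsigned long` is 32 bits wide on the affected boxes
and models that with uint32_t; struct sym_u64, truncate_to_32() and
lookup_symbol_u64() are illustrative names only, loosely mirroring the
trace_print_flags_u64 table walk added below.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 64-bit symbol table entry; the table ends with name == NULL. */
struct sym_u64 {
    uint64_t mask;
    const char *name;
};

/* What a 64-bit value turns into once squeezed through a 32-bit long. */
uint32_t truncate_to_32(uint64_t val)
{
    return (uint32_t)val;
}

/* Compare on the full 64 bits; returns the name, or NULL if nothing matches. */
const char *lookup_symbol_u64(uint64_t val, const struct sym_u64 *table)
{
    for (size_t i = 0; table[i].name; i++) {
        if (table[i].mask == val)
            return table[i].name;
    }
    return NULL;
}
```

With a made-up value such as 0x100000001ULL, truncate_to_32() yields 1, so a
32-bit comparison could no longer tell it apart from a genuine 1; the 64-bit
lookup keeps the two distinct.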

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12 
 include/trace/ftrace.h   |   13 +
 kernel/trace/trace_output.c  |   27 +++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 47e3997..efb2330 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -16,6 +16,11 @@ struct trace_print_flags {
const char  *name;
 };
 
+struct trace_print_flags_u64 {
+   unsigned long long  mask;
+   const char  *name;
+};
+
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
   unsigned long flags,
   const struct trace_print_flags *flag_array);
@@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, 
const char *delim,
 const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 const struct trace_print_flags 
*symbol_array);
 
+#if BITS_PER_LONG == 32
+const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
+unsigned long long val,
+const struct trace_print_flags_u64
+*symbol_array);
+#endif
+
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 const unsigned char *buf, int len);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 3e68366..533c49f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -205,6 +205,19 @@
ftrace_print_symbols_seq(p, value, symbols);\
})
 
+#undef __print_symbolic_u64
+#if BITS_PER_LONG == 32
+#define __print_symbolic_u64(value, symbol_array...)   \
+   ({  \
+   static const struct trace_print_flags_u64 symbols[] =   \
+   { symbol_array, { -1, NULL } }; \
+   ftrace_print_symbols_seq_u64(p, value, symbols);\
+   })
+#else
+#define __print_symbolic_u64(value, symbol_array...)   \
+   __print_symbolic(value, symbol_array)
+#endif
+
 #undef __print_hex
 #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 02272ba..b783504 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned 
long val,
 }
 EXPORT_SYMBOL(ftrace_print_symbols_seq);
 
+#if BITS_PER_LONG == 32
+const char *
+ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
+const struct trace_print_flags_u64 *symbol_array)
+{
+   int i;
+   const char *ret = p->buffer + p->len;
+
+   for (i = 0;  symbol_array[i].name; i++) {
+
+   if (val != symbol_array[i].mask)
+   continue;
+
+   trace_seq_puts(p, symbol_array[i].name);
+   break;
+   }
+
+   if (!p->len)
+   trace_seq_printf(p, "0x%llx", val);
+
+   trace_seq_putc(p, 0);
+
+   return ret;
+}
+EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
+#endif
+
 const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int 
buf_len)
 {
-- 
1.6.5.2


Re: [PATCH] Btrfs: fix easily get into ENOSPC in mixed case

2011-04-11 Thread liubo
On 04/09/2011 05:55 AM, Sergei Trofimovich wrote:
 [  100.500011] Call Trace:
 [  100.500011]  [810ed3a0] vfs_unlink+0x80/0xf0
 [  100.500011]  [810ef6f3] do_unlinkat+0x173/0x1b0
 [  100.500011]  [8111727b] ? fsnotify_find_inode_mark+0x3b/0x50
 [  100.500011]  [810dff91] ? filp_close+0x61/0x90
 [  100.500011]  [810f0c0d] sys_unlinkat+0x1d/0x40
 [  100.500011]  [81574c3b] system_call_fastpath+0x16/0x1b
 [  100.500011] Code: 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d 
 f8 c9 c3 0f 1f 40 00 4c 89 fe 4c 89 ef e8 05 d0 ff ff 85 c0 74 bb 0f 0b 0f 
 0b 89 c3 eb cd 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 
 [  100.500011] RIP  [a024a011] btrfs_unlink+0xd1/0xe0 [btrfs]
 [  100.500011]  RSP 880070b55e28
 [  100.525672] ---[ end trace 7e63b9144b7307fe ]---
 
 Looks like I won't be able to test your patch until this thing will go away 
 first.

Thanks a lot for testing, though.

I guess something messed up your btrfs metadata, because when btrfs_unlink()
wanted to remove A, it found that A was simply missing...

thanks,
liubo


Re: [PATCH] Btrfs-progs: add support for mixed data+metadata block groups

2011-04-08 Thread liubo
On 12/10/2010 02:31 AM, Josef Bacik wrote:
 So a lot of crazy people (I'm looking at you Meego) want to use btrfs on phones
 and such with small devices.  Unfortunately the way we split out metadata/data
 chunks it makes space usage inefficient for volumes that are smaller than
 1gigabyte.  So add a -M option for mixing metadata+data, and default to this
 mixed mode if the filesystem is less than or equal to 1 gigabyte.  I've tested
 this with xfstests on a 100mb filesystem and everything is a-ok.
 

Hi, Josef,

While using this mixed metadata+data option, I noticed the following in
btrfs-debug-tree's output:

===
chunk tree
leaf 143360 items 4 free space 3557 generation 4 owner 3
fs uuid 77d78a87-a886-4bfa-be3b-0dd052213a17
chunk uuid e64148d6-8267-4ff1-aafd-4266f74afbb2
item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
dev item devid 1 total_bytes 4999610368 bytes used 20971520
item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
chunk length 4194304 owner 2 type 2 num_stripes 1
stripe 0 devid 1 offset 0
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
chunk length 8388608 owner 2 type 5 num_stripes 1
stripe 0 devid 1 offset 4194304
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80   <== THIS ONE
chunk length 8388608 owner 2 type 4 num_stripes 1   <==
stripe 0 devid 1 offset 12582912   <==
===

As you can see, there is an additional pure metadata chunk (type 4) after
mkfs.btrfs -M /dev/xxx.
So I was wondering: _IS_ this chunk what we want, or is it a spare one?
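
For reference, the `type` numbers in the dump are bit masks.  A minimal
decoding sketch follows; the three bit values match the BTRFS_BLOCK_GROUP_*
definitions (DATA is bit 0, SYSTEM is bit 1, METADATA is bit 2), while the
shortened macro names and chunk_type_name() are made up for this example, and
RAID/DUP profile bits are deliberately ignored.

```c
#include <stdint.h>

/* Same bit positions as BTRFS_BLOCK_GROUP_DATA/SYSTEM/METADATA. */
#define BLOCK_GROUP_DATA     (1ULL << 0)
#define BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BLOCK_GROUP_METADATA (1ULL << 2)

/* Illustrative helper: turn the chunk type from the dump into a label. */
const char *chunk_type_name(uint64_t type)
{
    int data = (type & BLOCK_GROUP_DATA) != 0;
    int meta = (type & BLOCK_GROUP_METADATA) != 0;

    if (data && meta)
        return "Data+Metadata";
    if (data)
        return "Data";
    if (meta)
        return "Metadata";
    if (type & BLOCK_GROUP_SYSTEM)
        return "System";
    return "Unknown";
}
```

Read this way, item 1 (type 2) is a system chunk, item 2 (type 5) is the mixed
data+metadata chunk, and item 3 (type 4) is the pure metadata chunk in
question.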

thanks,
liubo

 Signed-off-by: Josef Bacik jo...@redhat.com
 ---
  btrfs-vol.c  |4 +-
  btrfs_cmds.c |   13 +-
  ctree.h  |   10 +++--
  mkfs.c   |  122 
 +-
  utils.c  |   10 ++--
  utils.h  |2 +-
  6 files changed, 112 insertions(+), 49 deletions(-)
 
 diff --git a/btrfs-vol.c b/btrfs-vol.c
 index 8069778..7200bbc 100644
 --- a/btrfs-vol.c
 +++ b/btrfs-vol.c
 @@ -129,7 +129,9 @@ int main(int ac, char **av)
   exit(1);
   }
   if (cmd == BTRFS_IOC_ADD_DEV) {
 - ret = btrfs_prepare_device(devfd, device, 1, dev_block_count);
 + int mixed = 0;
 +
 + ret = btrfs_prepare_device(devfd, device, 1, dev_block_count, 
 mixed);
   if (ret) {
   fprintf(stderr, "Unable to init %s\n", device);
   exit(1);
 diff --git a/btrfs_cmds.c b/btrfs_cmds.c
 index 8031c58..683aec0 100644
 --- a/btrfs_cmds.c
 +++ b/btrfs_cmds.c
 @@ -705,6 +705,7 @@ int do_add_volume(int nargs, char **args)
   int devfd, res;
   u64 dev_block_count = 0;
   struct stat st;
 + int mixed = 0;
  
   devfd = open(args[i], O_RDWR);
   if (!devfd) {
 @@ -727,7 +728,7 @@ int do_add_volume(int nargs, char **args)
   continue;
   }
  
 - res = btrfs_prepare_device(devfd, args[i], 1, dev_block_count);
 + res = btrfs_prepare_device(devfd, args[i], 1, dev_block_count, 
 mixed);
   if (res) {
   fprintf(stderr, "ERROR: Unable to init '%s'\n", args[i]);
   close(devfd);
 @@ -889,8 +890,14 @@ int do_df_filesystem(int nargs, char **argv)
   memset(description, 0, 80);
  
   if (flags & BTRFS_BLOCK_GROUP_DATA) {
 - snprintf(description, 5, "%s", "Data");
 - written += 4;
 + if (flags & BTRFS_BLOCK_GROUP_METADATA) {
 + snprintf(description, 15, "%s",
 +  "Data+Metadata");
 + written += 14;
 + } else {
 + snprintf(description, 5, "%s", "Data");
 + written += 4;
 + }
   } else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) {
   snprintf(description, 7, "%s", "System");
   written += 6;
 diff --git a/ctree.h b/ctree.h
 index 962c510..ed83d02 100644
 --- a/ctree.h
 +++ b/ctree.h
 @@ -352,13 +352,15 @@ struct btrfs_super_block {
   * ones specified below then we will fail to mount
   */
  #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0)
 -#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (2ULL << 0)
 +#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
 +#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS  (1ULL << 2)
  
  #define BTRFS_FEATURE_COMPAT_SUPP 0ULL
  #define BTRFS_FEATURE_COMPAT_RO_SUPP 0ULL
 -#define BTRFS_FEATURE_INCOMPAT_SUPP  \
 - (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF | \
 -  BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)
 +#define

[PATCH] Btrfs: fix easily get into ENOSPC in mixed case

2011-04-08 Thread liubo

When a btrfs disk is created with the mixed data & metadata option, it has no
pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920
(Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at
the very beginning.  The problem is that this initialization does not take the
mixed case into account, so btrfs easily runs into ENOSPC in the mixed case.
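
The intended selection can be sketched in userspace as follows.  This is only
an illustration of which space infos should exist in each mode, not the kernel
code; initial_space_info_flags() and the shortened macro names are invented
for the example, with the bit values taken from BTRFS_BLOCK_GROUP_*.

```c
#include <stddef.h>
#include <stdint.h>

/* Same bit positions as BTRFS_BLOCK_GROUP_DATA/SYSTEM/METADATA. */
#define BLOCK_GROUP_DATA     (1ULL << 0)
#define BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BLOCK_GROUP_METADATA (1ULL << 2)

/*
 * Illustrative helper: fill `out` with the flag sets for which a space info
 * should be created at mount time and return how many were written (2 or 3).
 */
size_t initial_space_info_flags(int mixed, uint64_t out[3])
{
    size_t n = 0;

    out[n++] = BLOCK_GROUP_SYSTEM;
    if (mixed) {
        /* One combined space info instead of two pure ones. */
        out[n++] = BLOCK_GROUP_METADATA | BLOCK_GROUP_DATA;
    } else {
        out[n++] = BLOCK_GROUP_METADATA;
        out[n++] = BLOCK_GROUP_DATA;
    }
    return n;
}
```

In the mixed case only SYSTEM and DATA|METADATA are set up, so no empty pure
data or pure metadata space info is left around.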

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..1b47ae4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8781,23 +8781,38 @@ out:
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 {
struct btrfs_space_info *space_info;
+   struct btrfs_super_block *disk_super;
+   u64 features;
+   u64 flags;
+   int mixed = 0;
int ret;
 
-   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
-                           &space_info);
-   if (ret)
-   return ret;
+   disk_super = &fs_info->super_copy;
+   if (!btrfs_super_root(disk_super))
+   return 1;
 
-   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
-                           &space_info);
-   if (ret)
-   return ret;
+   features = btrfs_super_incompat_flags(disk_super);
+   if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
+   mixed = 1;
 
-   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
-                           &space_info);
+   flags = BTRFS_BLOCK_GROUP_SYSTEM;
+   ret = update_space_info(fs_info, flags, 0, 0, &space_info);
if (ret)
-   return ret;
+   goto out;
 
+   if (mixed) {
+   flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
+   ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+   } else {
+   flags = BTRFS_BLOCK_GROUP_METADATA;
+   ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+   if (ret)
+   goto out;
+
+   flags = BTRFS_BLOCK_GROUP_DATA;
+   ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+   }
+out:
return ret;
 }
 
-- 
1.6.5.2


[PATCH] Trace: add __print_symbolic_u64 to avoid warnings on 32bit machine

2011-04-06 Thread liubo

Btrfs has some ULL macros, and when these macros are passed to tracepoints'
__print_symbolic(), there will be 64->32 truncate WARNINGS during compiling
on 32bit box.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12 
 include/trace/events/btrfs.h |4 ++--
 include/trace/ftrace.h   |   13 +
 kernel/trace/trace_output.c  |   27 +++
 4 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 22b32af..6b2e245 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -16,6 +16,11 @@ struct trace_print_flags {
const char  *name;
 };
 
+struct trace_print_flags_u64 {
+   unsigned long long  mask;
+   const char  *name;
+};
+
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
   unsigned long flags,
   const struct trace_print_flags *flag_array);
@@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, 
const char *delim,
 const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 const struct trace_print_flags 
*symbol_array);
 
+#if BITS_PER_LONG == 32
+const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
+unsigned long long val,
+const struct trace_print_flags_u64
+*symbol_array);
+#endif
+
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 const unsigned char *buf, int len);
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index f445cff..4114129 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -28,7 +28,7 @@ struct extent_buffer;
{ BTRFS_SHARED_DATA_REF_KEY, "SHARED_DATA_REF" })
 
 #define __show_root_type(obj)  \
-   __print_symbolic(obj,   \
+   __print_symbolic_u64(obj,   \
{ BTRFS_ROOT_TREE_OBJECTID,     "ROOT_TREE"     },  \
{ BTRFS_EXTENT_TREE_OBJECTID,   "EXTENT_TREE"   },  \
{ BTRFS_CHUNK_TREE_OBJECTID,    "CHUNK_TREE"    },  \
@@ -125,7 +125,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 );
 
 #define __show_map_type(type)  \
-   __print_symbolic(type,  \
+   __print_symbolic_u64(type,  \
{ EXTENT_MAP_LAST_BYTE, "LAST_BYTE" },  \
{ EXTENT_MAP_HOLE,      "HOLE"      },  \
{ EXTENT_MAP_INLINE,    "INLINE"    },  \
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 3e68366..533c49f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -205,6 +205,19 @@
ftrace_print_symbols_seq(p, value, symbols);\
})
 
+#undef __print_symbolic_u64
+#if BITS_PER_LONG == 32
+#define __print_symbolic_u64(value, symbol_array...)   \
+   ({  \
+   static const struct trace_print_flags_u64 symbols[] =   \
+   { symbol_array, { -1, NULL } }; \
+   ftrace_print_symbols_seq_u64(p, value, symbols);\
+   })
+#else
+#define __print_symbolic_u64(value, symbol_array...)   \
+   __print_symbolic(value, symbol_array)
+#endif
+
 #undef __print_hex
 #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 456be90..47aafa9 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned 
long val,
 }
 EXPORT_SYMBOL(ftrace_print_symbols_seq);
 
+#if BITS_PER_LONG == 32
+const char *
+ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
+const struct trace_print_flags_u64 *symbol_array)
+{
+   int i;
+   const char *ret = p->buffer + p->len;
+
+   for (i = 0;  symbol_array[i].name; i++) {
+
+   if (val != symbol_array[i].mask)
+   continue;
+
+   trace_seq_puts(p, symbol_array[i].name);
+   break;
+   }
+
+   if (!p->len)
+   trace_seq_printf(p, "0x%llx", val);
+
+   trace_seq_putc(p, 0);
+
+   return ret;
+}
+EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
+#endif
+
 const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, 

Re: [PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression

2011-04-05 Thread liubo
On 04/04/2011 05:31 PM, Konstantinos Skarlatos wrote:
 Hello,
 I would like to ask about the status of this feature/patch, is it
 accepted into btrfs code, and how can I use it?
 

Yes, it is now in the latest 2.6.39-rc1.

 I am interested in enabling compression in a specific
 folder(force-compress would be ideal) of a large btrfs volume, and
 disabling it for the rest.
 

Hmm, I'm working on the patch for the userspace tool; it will come soon. :)
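
Until such a tool exists, the generic ioctl can be driven directly from user space. The following is a minimal sketch, not the promised tool: the helper names are hypothetical, and it assumes the relevant bits are FS_COMPR_FL and FS_NOCOW_FL from <linux/fs.h> (the quoted patch text does not show which flag bits it maps).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_COMPR_FL, FS_NOCOW_FL */

/*
 * Hypothetical helper: read the inode flags of `path`, set the bits in
 * `set_mask`, clear the bits in `clear_mask`, and write the result back.
 * Returns 0 on success, -1 on failure with errno set.
 */
int update_inode_flags(const char *path, int set_mask, int clear_mask)
{
	int flags = 0;
	int saved;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		goto fail;

	flags = (flags | set_mask) & ~clear_mask;

	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		goto fail;

	close(fd);
	return 0;

fail:
	saved = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved;
	return -1;
}

/* Hypothetical wrappers; the flag choice is an assumption, see above. */
int enable_compression(const char *path)
{
	return update_inode_flags(path, FS_COMPR_FL, 0);
}

int disable_cow(const char *path)
{
	return update_inode_flags(path, FS_NOCOW_FL, 0);
}
```

Calling enable_compression() on a directory would, if the kernel side behaves as the patch description says, mark that directory so that it is compressed independently of the mount options. On filesystems that do not support these flags the ioctl simply fails and the helper returns -1.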

 
 On 21/3/2011 10:57 πμ, liubo wrote:
 Data compression and data cow are controlled across the entire FS by
 mount
 options right now.  ioctls are needed to set this on a per file or per
 directory basis.  This has been proposed previously, but VFS developers
 wanted us to use generic ioctls rather than btrfs-specific ones.

 According to Chris's comment, there should be just one true compression
 method (probably LZO) stored in the super block.  Before that can happen,
 we need to wait until that method is stable enough to be adopted into the
 super block, so I list it as a long-term goal and just store it in RAM today.

 After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to
 control the datacow and compression attributes of files and directories.

 NOTE:
   - The compression type is selected by these rules:
 If btrfs is mounted with a compress option (i.e. zlib or lzo), that type
 is used.
 Otherwise, the default compress type is used (zlib today).

 v1-v2:
 Rebase the patch with the latest btrfs.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
   fs/btrfs/ctree.h   |1 +
   fs/btrfs/disk-io.c |6 ++
   fs/btrfs/inode.c   |   32 
   fs/btrfs/ioctl.c   |   41 +
   4 files changed, 72 insertions(+), 8 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 8b4b9d1..b77d1a5 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1283,6 +1283,7 @@ struct btrfs_root {
   #define BTRFS_INODE_NODUMP (1 << 8)
   #define BTRFS_INODE_NOATIME (1 << 9)
   #define BTRFS_INODE_DIRSYNC (1 << 10)
 +#define BTRFS_INODE_COMPRESS (1 << 11)

   /* some macros to generate set/get funcs for the struct fields.  This
* assumes there is a lefoo_to_cpu for every type, so lets make a
 simple
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 3e1ea3e..a894c12 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct
 super_block *sb,

   btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);

 +/*
 + * In the long term, we'll store the compression type in the super
 + * block, and it'll be used for per file compression control.
 + */
 +fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
 +
   ret = btrfs_parse_options(tree_root, options);
   if (ret) {
   err = ret;
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index db67821..e687bb9 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -381,7 +381,8 @@ again:
*/
   if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
   (btrfs_test_opt(root, COMPRESS) ||
 - (BTRFS_I(inode)->force_compress))) {
 + (BTRFS_I(inode)->force_compress) ||
 + (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
   WARN_ON(pages);
   pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);

 @@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode
 *inode, struct page *locked_page,
   ret = run_delalloc_nocow(inode, locked_page, start, end,
page_started, 0, nr_written);
   else if (!btrfs_test_opt(root, COMPRESS) &&
 - !(BTRFS_I(inode)->force_compress))
 + !(BTRFS_I(inode)->force_compress) &&
 + !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
   ret = cow_file_range(inode, locked_page, start, end,
 page_started, nr_written, 1);
   else
 @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct
 btrfs_trans_handle *trans,
   location->offset = 0;
   btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY);

 -btrfs_inherit_iflags(inode, dir);
 -
   if ((mode & S_IFREG)) {
   if (btrfs_test_opt(root, NODATASUM))
   BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
 @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct
 btrfs_trans_handle *trans,
   BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
   }

 +btrfs_inherit_iflags(inode, dir);
 +
   insert_inode_hash(inode);
   inode_tree_add(inode);
   return inode;
 @@ -6803,6 +6805,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
   return 0;
   }

 +/*
 + * If a file is moved, it will inherit the cow and compression flags
 of the new
 + * directory.
 + */
 +static void fixup_inode_flags(struct inode *dir, struct inode *inode)
 +{
 +struct btrfs_inode *b_dir = BTRFS_I(dir);
 +struct btrfs_inode *b_inode = BTRFS_I(inode);
 +
 +if (b_dir->flags

Re: 2.6.39-rc1: kernel BUG at fs/btrfs/extent-tree.c:5479!

2011-04-02 Thread liubo
On 04/02/2011 06:41 PM, Sergei Trofimovich wrote:
 On Sat, 02 Apr 2011 17:37:58 +0800
 liubo liubo2...@cn.fujitsu.com wrote:
 
 On 04/02/2011 05:19 PM, Sergei Trofimovich wrote:
 The partition is a physical ~5GB --mixed lzo compressed partition.

 The kernel 2.6.39-rc1 + reverted commit 
 c59021f846881a957ac5afe456d0f59d6a517b61.
 (see http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09083.html)

 Hi, Sergei,

 I'm digging this...

 Can you show me the steps to reproduce this?
 
 I use the filesystem as a storage of large CVS tree and
 as temp storage for large compilations, so I can roughly
 describe what I did and when it failed.
 
 I've formatted a 5G btrfs partition as --mixed and mounted it with lzo
 compression on kernel version 'v2.6.38-4148-g054cfaa', then checked out a
 large CVS tree there (~170K files, 177MB), copied the linux source there
 (not built) and copied my '/var/'. I ran compiles there and started to get
 -ENOSPC OOpses when 'df -h' reported 3.5G free.
 
 As Linus pulled josef's changes, I updated to v2.6.38-6555-ga44f99c
 and the kernel started to OOps right after mount (an added assert started to
 trigger earlier).
 I've reported it to this ML (link above). josef and sensille helped me to
 debug what's going wrong [both CCed]. sensille pointed to the commit that is
 guilty of miscomputing available space. As I understood, they know what
 exactly screwed up.
 

Great thanks for these details.

I did not consider the mixed case when making the guilty patch, sorry.
Frankly, I'm still trying to reproduce your first bug; on my box mixed +
lzo does not trigger it...

It seems that you are using openSUSE's kernel.

 The second case (this one):
 I still use the same filesystem (didn't reformat, so it might carry some 
 corruption
 after debugging patches).
 I've reverted your change c59021f846881a957ac5afe456d0f59d6a517b61
 and made sure it stops OOpsing for me, then updated to 2.6.39-rc1
 and reverted only this commit. Filesystem became usable until I've decided
 to run large compile on it (clang debug source).
 
 I think at the time of OOps the following things did happen simultaneously:
 
 1. one process was splitting debug symbols of some binary:
   - opened original binary for read
   - write to new file (stripped binary)
   - write debug symbols to separate file
 
 2. another process logged that action to log file
 
 3. the filesystem filled-up and OOpsed. At the time of OOps
'df -h' showed 200M free.
 
 I'm trying to reproduce this second case ATM (the build takes
 more than an hour).
 

All right, thanks for the work.

thanks,
liubo


[RFC PATCH] Trace: use unsigned long long in trace print frames

2011-04-01 Thread liubo

While adding tracepoints for btrfs, I ran into a problem:

btrfs uses some macros of ULL type, but the tracepoint macros
__print_[flags,symbolic]() only take unsigned long, so on a 32-bit box
there will be 64->32 truncation WARNINGs when compiling.

Here I'm inclined to switch them to unsigned long long to clear those WARNINGs.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |7 ---
 kernel/trace/trace_output.c  |   10 +-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 22b32af..b52f2c5 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -12,15 +12,16 @@ struct tracer;
 struct dentry;
 
 struct trace_print_flags {
-   unsigned long   mask;
+   unsigned long long  mask;
const char  *name;
 };
 
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
-  unsigned long flags,
+  unsigned long long flags,
   const struct trace_print_flags *flag_array);
 
-const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
+const char *ftrace_print_symbols_seq(struct trace_seq *p,
+unsigned long long val,
 const struct trace_print_flags *symbol_array);
 
 const char *ftrace_print_hex_seq(struct trace_seq *p,
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 456be90..97ba902 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -294,10 +294,10 @@ int trace_seq_path(struct trace_seq *s, struct path *path)
 
 const char *
 ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
-  unsigned long flags,
+  unsigned long long flags,
   const struct trace_print_flags *flag_array)
 {
-   unsigned long mask;
+   unsigned long long mask;
const char *str;
const char *ret = p->buffer + p->len;
int i;
@@ -319,7 +319,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
if (flags) {
if (p->len && delim)
trace_seq_puts(p, delim);
-   trace_seq_printf(p, "0x%lx", flags);
+   trace_seq_printf(p, "0x%llx", flags);
}
 
trace_seq_putc(p, 0);
@@ -329,7 +329,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 EXPORT_SYMBOL(ftrace_print_flags_seq);
 
 const char *
-ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
+ftrace_print_symbols_seq(struct trace_seq *p, unsigned long long val,
 const struct trace_print_flags *symbol_array)
 {
int i;
@@ -345,7 +345,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
}
 
if (!p->len)
-   trace_seq_printf(p, "0x%lx", val);
+   trace_seq_printf(p, "0x%llx", val);

trace_seq_putc(p, 0);
 
-- 
1.6.5.2



Re: [RFC PATCH] Trace: use unsigned long long in trace print frames

2011-04-01 Thread liubo
On 04/01/2011 09:49 PM, Steven Rostedt wrote:
 On Fri, 2011-04-01 at 14:42 +0800, liubo wrote:
 While adding tracepoints for btrfs, I ran into a problem:

 btrfs uses some macros of ULL type, but the tracepoint macros
 __print_[flags,symbolic]() only take unsigned long, so on a 32-bit box
 there will be 64->32 truncation WARNINGs when compiling.

 Here I'm inclined to switch them to unsigned long long to clear those WARNINGs.
 
 Hmm, I don't like this. unsigned long is a natural word for
 architectures, I don't want to have 32 bit suffer because one user is
 doing something with ULL.
 
 A better solution is to add a trace_print_flags_u64 or something, that
 can be used for cases where u64 is needed. For archs where sizeof(long) ==
 sizeof(u64) we can have the two macros/structs be the same.
 

All right, a u64-specific one is what I had in mind as well. :)

thanks,
liubo
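
The u64-specific lookup suggested above can be modelled in plain user-space C. This is only a sketch of the idea: struct sym_u64 and print_symbol_u64() are hypothetical names standing in for the kernel's trace_print_flags_u64 and its print helper, and the output goes to a caller-supplied buffer rather than a trace_seq.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit symbol table entry. */
struct sym_u64 {
	uint64_t mask;
	const char *name;	/* NULL name terminates the table */
};

/*
 * Look up `val` in `syms` using a full 64-bit comparison. Writes the
 * symbol name into `buf`, or the value in hex if nothing matches.
 * Returns `buf`.
 */
const char *print_symbol_u64(char *buf, size_t len, uint64_t val,
			     const struct sym_u64 *syms)
{
	size_t i;

	for (i = 0; syms[i].name; i++) {
		if (val == syms[i].mask) {
			snprintf(buf, len, "%s", syms[i].name);
			return buf;
		}
	}

	snprintf(buf, len, "0x%llx", (unsigned long long)val);
	return buf;
}
```

Because the table stores uint64_t on every architecture, a constant such as (u64)-2 keeps its upper 32 bits on a 32-bit box, while the existing unsigned long variant stays untouched for all other users.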

 -- Steve
 
 
 



[PATCH] btrfs: clear __GFP_FS flag in the space cache inode

2011-03-31 Thread liubo
From: Miao Xie mi...@cn.fujitsu.com

The object id of the space cache inode's key is allocated from the relative
root, just like a regular file's. So we can't identify the space cache inode by
checking the object id of the inode's key, and we have to clear the __GFP_FS flag
at the time we look up the space cache inode.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/free-space-cache.c |2 ++
 fs/btrfs/inode.c|2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0037427..13575de 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -81,6 +81,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
return ERR_PTR(-ENOENT);
}
 
+   inode->i_mapping->flags &= ~__GFP_FS;
+
spin_lock(&block_group->lock);
if (!root->fs_info->closing) {
block_group->inode = igrab(inode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 93c28a1..c103fdc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2537,8 +2537,6 @@ static void btrfs_read_locked_inode(struct inode *inode)
BTRFS_I(inode)->flags = btrfs_inode_flags(leaf, inode_item);
 
alloc_group_block = btrfs_inode_block_group(leaf, inode_item);
-   if (location.objectid == BTRFS_FREE_SPACE_OBJECTID)
-   inode->i_mapping->flags &= ~__GFP_FS;
 
/*
 * try to precache a NULL acl entry for files that don't have
-- 
1.7.4



Re: [PATCH] Btrfs: fix compile warning from __btrfs_map_block

2011-03-31 Thread liubo
On 03/31/2011 08:10 PM, Chris Mason wrote:
 Excerpts from liubo's message of 2011-03-31 05:45:20 -0400:
 While compiling the btrfs module on a 32-bit box, I encountered the following:

 WARNING: __umoddi3 [fs/btrfs/btrfs.ko] undefined!

 The WARNING arises because __btrfs_map_block does not use do_div() for the
 relevant operations. This causes problems on a 32-bit box, where values
 of u64 type should use do_div() instead of a direct %.
 
 Which kernel tree was this against?  I had rebased the for-linus and
 for-linus-unmerged branch to get rid of it.
 
 Sorry for the confusion.

Ah, it was my fault for not mentioning the version. I found this warning while
compiling the latest for-linus tree (top commit:
c1e1f82c56af1a286fd747e809c94628c2ca15fb).

thanks,
liubo
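
The quoted patch below relies on one property of do_div() that is easy to miss: it divides its first argument in place and returns the remainder. A minimal user-space model of that contract follows. do_div_model() and last_stripe_index() are hypothetical names; the real do_div() is a kernel macro that takes an lvalue rather than a pointer, and user space does not need it because libgcc supplies __umoddi3 there.

```c
#include <stdint.h>

/*
 * User-space stand-in for the kernel's do_div(): replaces *n with the
 * quotient *n / base and returns the remainder *n % base.
 */
static inline uint32_t do_div_model(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Mirrors the RAID0 branch of the patch: the index of the last stripe is
 * (stripe_nr_end - 1) % num_stripes, computed through the do_div contract.
 */
uint32_t last_stripe_index(uint64_t stripe_nr_end, uint32_t num_stripes)
{
	/* do_div clobbers its first argument, so work on a copy. */
	uint64_t tmp = stripe_nr_end - 1;

	return do_div_model(&tmp, num_stripes);
}
```

This is why the patch reassigns `stripes` before every do_div() call inside the loops: after each call the variable holds the quotient, not the original value.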

 
 -chris
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/volumes.c |   23 +++
  1 files changed, 15 insertions(+), 8 deletions(-)

 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 41afd50..7b23d0f 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -3076,16 +3076,19 @@ again:
  multi->stripes[i].dev = map->stripes[stripe_index].dev;
  
  if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
 -u64 stripes;
 -int last_stripe = (stripe_nr_end - 1) %
 -map->num_stripes;
 +u64 stripes = stripe_nr_end - 1;
 +int last_stripe = do_div(stripes,
 +map->num_stripes);
  int j;
  
  for (j = 0; j < map->num_stripes; j++) {
 -if ((stripe_nr_end - 1 - j) %
 -  map->num_stripes == stripe_index)
 +stripes = stripe_nr_end - 1 - j;
 +
 +if (do_div(stripes, map->num_stripes) ==
 +stripe_index)
  break;
  }
 +
  stripes = stripe_nr_end - 1 - j;
  do_div(stripes, map->num_stripes);
  multi->stripes[i].length = map->stripe_len *
 @@ -3100,18 +3103,22 @@ again:
  multi->stripes[i].length -=
  stripe_end_offset;
  } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
 -u64 stripes;
 +u64 stripes = stripe_nr_end - 1;
  int j;
  int factor = map->num_stripes /
   map->sub_stripes;
 -int last_stripe = (stripe_nr_end - 1) % factor;
 +int last_stripe = do_div(stripes, factor);
 +
  last_stripe *= map->sub_stripes;
  
  for (j = 0; j < factor; j++) {
 -if ((stripe_nr_end - 1 - j) % factor ==
 +stripes = stripe_nr_end - 1 - j;
 +
 +if (do_div(stripes, factor) ==
  stripe_index / map->sub_stripes)
  break;
  }
 +
  stripes = stripe_nr_end - 1 - j;
  do_div(stripes, factor);
  multi->stripes[i].length = map->stripe_len *
 



Re: [PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance

2011-03-31 Thread liubo
On 03/30/2011 07:58 PM, Arne Jansen wrote:
 Am 10.03.2011 13:28, schrieb Chris Mason:
 Excerpts from liubo's message of 2011-03-10 03:50:27 -0500:
 On 03/07/2011 10:13 AM, liubo wrote:
 btrfs will remove unused block groups after balance.
 When an empty filesystem is balanced, the block group with tag DATA may be
 dropped, and after umount and mount again, it will not find the DATA
 space_info, which leads to an OOPS.
 So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid
 the OOPS.

 
 this patch breaks mixed block groups. If the space_infos get added
 upfront, later on all mixed block groups will be added to the data
 space_info, leaving the metadata space_info completely empty.
 No mixed space_info will ever get created.

Hi, Arne,

Sorry for the late reply.

 As a fix it might be enough to call btrfs_init_space_info after
 btrfs_read_block_groups, not before, but I haven't tested it.
 

That seems impossible: the original bug occurs in btrfs_read_block_groups() itself...

 This was the cause of the BUG reported by Sergei Trofimovich in the
 thread v2.6.38-6555-ga44f99c: null pointer dereference on -ENOSPC.
 

Thanks for pointing this out.
Anyway, I will dig into it more.

thanks,
liubo
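
The failure mode described above can be illustrated with a toy model. It assumes that a block group is accounted to the first existing space_info sharing any type bit with it, and that the DATA entry is found first; both points are my reading of the explanation in this thread, not something the quoted code shows. All names below are hypothetical and only model the behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type bits for the model. */
#define TYPE_DATA     (1ULL << 0)
#define TYPE_SYSTEM   (1ULL << 1)
#define TYPE_METADATA (1ULL << 2)
#define MAX_INFOS     8

struct space_info_model {
	uint64_t flags;
	unsigned int groups;	/* block groups accounted to this entry */
};

struct registry {
	struct space_info_model infos[MAX_INFOS];
	size_t count;
};

/* Assumed lookup rule: the first entry sharing any type bit wins. */
static struct space_info_model *find_info(struct registry *r, uint64_t flags)
{
	size_t i;

	for (i = 0; i < r->count; i++)
		if (r->infos[i].flags & flags)
			return &r->infos[i];
	return NULL;
}

/* Create an empty entry; returns NULL when the registry is full. */
static struct space_info_model *create_info(struct registry *r, uint64_t flags)
{
	struct space_info_model *info;

	if (r->count == MAX_INFOS)
		return NULL;
	info = &r->infos[r->count++];
	info->flags = flags;
	info->groups = 0;
	return info;
}

/*
 * Account one block group. Returns the flags of the entry that received
 * it, or 0 if no entry could be found or created.
 */
uint64_t add_block_group(struct registry *r, uint64_t flags)
{
	struct space_info_model *info = find_info(r, flags);

	if (!info)
		info = create_info(r, flags);
	if (!info)
		return 0;
	info->groups++;
	return info->flags;
}
```

With an empty registry, a DATA|METADATA block group gets its own mixed entry. If DATA, SYSTEM and METADATA entries are created up front, the same block group lands in the DATA entry and the METADATA entry stays empty, which matches the symptom reported for mixed filesystems.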

 -Arne
 
 Hi, Chris,

 These two fixes are for critical problems (one OOPS and one memory leak), so
 would you please take some time to review them and check whether they are
 ready for the next git pull?

 It seems that you have been very busy these days. ;)
 Hi Liubo,

 I'm looking at both of these.  There are no more rc's for 2.6.38, only
 the final release, so the bar is very high for a commit that goes in.

 -chris

 thanks,
 liubo

 Reported-by: Daniel J Blueman daniel.blue...@gmail.com
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   |1 +
  fs/btrfs/disk-io.c |6 ++
  fs/btrfs/extent-tree.c |   23 +++
  3 files changed, 30 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 28188a7..49c50e5 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root 
 *root, u64 bytenr,
 u64 num_bytes);
  int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, u64 type);
 +int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
  
  /* ctree.c */
  int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 3e1ea3e..8bcdc62 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block 
 *sb,
  fs_info->metadata_alloc_profile = (u64)-1;
  fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
  
 +ret = btrfs_init_space_info(fs_info);
 +if (ret) {
 +printk(KERN_ERR "Failed to initial space info: %d\n", ret);
 +goto fail_block_groups;
 +}
 +
  ret = btrfs_read_block_groups(extent_root);
  if (ret) {
  printk(KERN_ERR "Failed to read block groups: %d\n", ret);
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 100e409..08525ee 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -8714,6 +8714,29 @@ out:
  return ret;
  }
  
 +int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 +{
 +struct btrfs_space_info *space_info;
 +int ret;
 +
 +ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
 + &space_info);
 +if (ret)
 +return ret;
 +
 +ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
 + &space_info);
 +if (ret)
 +return ret;
 +
 +ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
 + &space_info);
 +if (ret)
 +return ret;
 +
 +return ret;
 +}
 +
  int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, 
 u64 end)
  {
  return unpin_extent_range(root, start, end);
 



Re: [PATCH] Btrfs: add initial tracepoint support for btrfs

2011-03-29 Thread liubo
On 03/29/2011 09:16 AM, liubo wrote:
 On 03/28/2011 08:59 AM, Chris Mason wrote:
 Excerpts from Chris Mason's message of 2011-03-26 08:12:04 -0400:
 Excerpts from liubo's message of 2011-03-24 07:18:59 -0400:
 Tracepoints can provide insight into why btrfs hits bugs and be greatly
 helpful for debugging, e.g
 This is really neat, I've queued it up.
 Whoops, it has a lot of warnings when compiled on 32 bit machines.
 Please take a look:

 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:144:1: warning: large integer implicitly 
 truncated to unsigned type

 
 Ahh, I've figured it out.
 I will send a new version to clear the warnings.
 

Here is the patch to clear warnings.

From: Liu Bo liubo2...@cn.fujitsu.com

[PATCH] Btrfs: fix compile warnings of btrfs tracepoint on 32bit box

include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated 
to unsigned type
include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated 
to unsigned type
include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated 
to unsigned type

btrfs defines some macros whose values have ULL type, and when btrfs tracepoints
use these macros on a 32-bit box, values like -1ULL are truncated.
This is where those warnings come from.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/trace/events/btrfs.h |   19 +++
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index f445cff..27e67fd 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -36,9 +36,12 @@ struct extent_buffer;
{ BTRFS_FS_TREE_OBJECTID,   "FS_TREE"   },  \
{ BTRFS_ROOT_TREE_DIR_OBJECTID, "ROOT_TREE_DIR" },  \
{ BTRFS_CSUM_TREE_OBJECTID, "CSUM_TREE" },  \
-   { BTRFS_TREE_LOG_OBJECTID,  "TREE_LOG"  },  \
-   { BTRFS_TREE_RELOC_OBJECTID,"TREE_RELOC"},  \
-   { BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
+   { (unsigned long)BTRFS_TREE_LOG_OBJECTID,   \
+   "TREE_LOG"  },  \
+   { (unsigned long)BTRFS_TREE_RELOC_OBJECTID, \
+   "TREE_RELOC"},  \
+   { (unsigned long)BTRFS_DATA_RELOC_TREE_OBJECTID,\
+   "DATA_RELOC_TREE" })
 
 #define show_root_type(obj)\
obj, ((obj >= BTRFS_DATA_RELOC_TREE_OBJECTID) ||\
@@ -126,13 +129,13 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 
 #define __show_map_type(type)  \
__print_symbolic(type,  \
-   { EXTENT_MAP_LAST_BYTE, "LAST_BYTE" },  \
-   { EXTENT_MAP_HOLE,  "HOLE"  },  \
-   { EXTENT_MAP_INLINE,"INLINE"},  \
-   { EXTENT_MAP_DELALLOC,  "DELALLOC"  })
+   { (unsigned long)EXTENT_MAP_LAST_BYTE,  "LAST_BYTE" },  \
+   { (unsigned long)EXTENT_MAP_HOLE,   "HOLE"  },  \
+   { (unsigned long)EXTENT_MAP_INLINE, "INLINE"},  \
+   { (unsigned long)EXTENT_MAP_DELALLOC,   "DELALLOC"  })
 
 #define show_map_type(type)\
-   type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" :  __show_map_type(type)
+   type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" : __show_map_type(type)
 
 #define show_map_flags(flag)   \
__print_flags(flag, "|",\
-- 
1.6.5.2


 Thanks,
 liubo
 
 -chris

 



Re: [PATCH] Btrfs: add initial tracepoint support for btrfs

2011-03-29 Thread liubo
Please ignore this patch...

I just found that we had better revise the tracepoint side instead of the btrfs
side; I will dig into it more.

thanks,
liubo
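
One property of the cast approach in the patch quoted below is worth spelling out: on a 32-bit box, (unsigned long) keeps only the low 32 bits, so the warning goes away but two distinct 64-bit values can end up looking identical. Whether this was the reason for withdrawing the patch is not stated in the thread. The sketch uses hypothetical helper names and models the 32-bit cast with uint32_t so it behaves the same on any host.

```c
#include <stdint.h>

/*
 * Models what a cast to unsigned long does on a 32-bit box:
 * the upper 32 bits are dropped.
 */
static uint32_t as_long32(uint64_t v)
{
	return (uint32_t)v;
}

/*
 * Returns 1 when two distinct 64-bit values become indistinguishable
 * after the modelled cast, 0 otherwise.
 */
int collide_after_cast(uint64_t a, uint64_t b)
{
	return a != b && as_long32(a) == as_long32(b);
}
```

For example, a constant defined as (u64)-4 and the plain value 0xfffffffc collide under this model, so a symbol table built from cast constants cannot tell them apart.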

 From: Liu Bo liubo2...@cn.fujitsu.com
 
 [PATCH] Btrfs: fix compile warnings of btrfs tracepoint on 32bit box
 
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 
 btrfs defines some macros whose values have ULL type, and when btrfs
 tracepoints use these macros on a 32-bit box, values like -1ULL are truncated.
 This is where those warnings come from.
 
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  include/trace/events/btrfs.h |   19 +++
  1 files changed, 11 insertions(+), 8 deletions(-)
 
 diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
 index f445cff..27e67fd 100644
 --- a/include/trace/events/btrfs.h
 +++ b/include/trace/events/btrfs.h
 @@ -36,9 +36,12 @@ struct extent_buffer;
   { BTRFS_FS_TREE_OBJECTID,   "FS_TREE"   },  \
   { BTRFS_ROOT_TREE_DIR_OBJECTID, "ROOT_TREE_DIR" },  \
   { BTRFS_CSUM_TREE_OBJECTID, "CSUM_TREE" },  \
 - { BTRFS_TREE_LOG_OBJECTID,  "TREE_LOG"  },  \
 - { BTRFS_TREE_RELOC_OBJECTID,"TREE_RELOC"},  \
 - { BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
 + { (unsigned long)BTRFS_TREE_LOG_OBJECTID,   \
 + "TREE_LOG"  },  \
 + { (unsigned long)BTRFS_TREE_RELOC_OBJECTID, \
 + "TREE_RELOC"},  \
 + { (unsigned long)BTRFS_DATA_RELOC_TREE_OBJECTID,\
 + "DATA_RELOC_TREE" })
  
  #define show_root_type(obj)  \
   obj, ((obj >= BTRFS_DATA_RELOC_TREE_OBJECTID) ||\
 @@ -126,13 +129,13 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
  
  #define __show_map_type(type)  \
   __print_symbolic(type,  \
 - { EXTENT_MAP_LAST_BYTE, "LAST_BYTE" },  \
 - { EXTENT_MAP_HOLE,  "HOLE"  },  \
 - { EXTENT_MAP_INLINE,"INLINE"},  \
 - { EXTENT_MAP_DELALLOC,  "DELALLOC"  })
 + { (unsigned long)EXTENT_MAP_LAST_BYTE,  "LAST_BYTE" },  \
 + { (unsigned long)EXTENT_MAP_HOLE,   "HOLE"  },  \
 + { (unsigned long)EXTENT_MAP_INLINE, "INLINE"},  \
 + { (unsigned long)EXTENT_MAP_DELALLOC,   "DELALLOC"  })
  
  #define show_map_type(type)  \
 - type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" :  __show_map_type(type)
 + type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" : __show_map_type(type)
  
  #define show_map_flags(flag) \
   __print_flags(flag, "|",\



Re: [PATCH] Btrfs: add initial tracepoint support for btrfs

2011-03-28 Thread liubo
On 03/28/2011 08:59 AM, Chris Mason wrote:
 Excerpts from Chris Mason's message of 2011-03-26 08:12:04 -0400:
 Excerpts from liubo's message of 2011-03-24 07:18:59 -0400:
 Tracepoints can provide insight into why btrfs hits bugs and be greatly
 helpful for debugging, e.g
 This is really neat, I've queued it up.
 
 Whoops, it has a lot of warnings when compiled on 32 bit machines.
 Please take a look:
 
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:47:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:68:1: warning: large integer implicitly 
 truncated to unsigned type
 include/trace/events/btrfs.h:144:1: warning: large integer implicitly 
 truncated to unsigned type
 

Ahh, I've figured it out.
I will send a new version to clear the warnings.

Thanks,
liubo

 -chris
 



[PATCH] Btrfs: add initial tracepoint support for btrfs

2011-03-24 Thread liubo

Tracepoints can provide insight into why btrfs hits bugs and be greatly
helpful for debugging, e.g
  dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
  dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
 btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
   flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
   flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
   flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
   flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)

Here is what I have added:

1) ordered_extent:
btrfs_ordered_extent_add
btrfs_ordered_extent_remove
btrfs_ordered_extent_start
btrfs_ordered_extent_put

These provide critical information to understand how ordered_extents are
updated.

2) extent_map:
btrfs_get_extent

extent_map is used in both read and write cases, and it is useful for tracking
how btrfs specific IO is running.

3) writepage:
__extent_writepage
btrfs_writepage_end_io_hook

Pages are critical resources and produce a lot of corner cases during writeback,
so it is valuable to know how a page is written to disk.

4) inode:
btrfs_inode_new
btrfs_inode_request
btrfs_inode_evict

These can show where and when an inode is created and when an inode is evicted.

5) sync:
btrfs_sync_file
btrfs_sync_fs

These show sync arguments.

6) transaction:
btrfs_transaction_commit

In a transaction-based filesystem, it is useful to know the generation and
who does the commit.

7) back reference and cow:
btrfs_delayed_tree_ref
btrfs_delayed_data_ref
btrfs_delayed_ref_head
btrfs_cow_block

Btrfs natively supports back references, and these tracepoints are helpful for
understanding btrfs's COW mechanism.

8) chunk:
btrfs_chunk_alloc
btrfs_chunk_free

A chunk is a link between physical offset and logical offset, and stands for
space information in btrfs; these tracepoints are helpful for tracing space usage.

9) reserved_extent:
btrfs_reserved_extent_alloc
btrfs_reserved_extent_free

These can show how btrfs uses its space.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.c |3 +
 fs/btrfs/ctree.h |1 +
 fs/btrfs/delayed-ref.c   |6 +
 fs/btrfs/extent-tree.c   |4 +
 fs/btrfs/extent_io.c |2 +
 fs/btrfs/file.c  |1 +
 fs/btrfs/inode.c |   12 +
 fs/btrfs/ordered-data.c  |8 +
 fs/btrfs/super.c |5 +
 fs/btrfs/transaction.c   |2 +
 fs/btrfs/volumes.c   |   16 +-
 fs/btrfs/volumes.h   |   11 +
 include/trace/events/btrfs.h |  667 ++
 13 files changed, 727 insertions(+), 11 deletions(-)
 create mode 100644 include/trace/events/btrfs.h

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b5baff0..351515d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -542,6 +542,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle 
*trans,
 
ret = __btrfs_cow_block(trans, root, buf, parent,
 parent_slot, cow_ret, search_start, 0);
+
+   trace_btrfs_cow_block(root, buf, *cow_ret);
+
return ret;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 28188a7..cd6906e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -28,6 +28,7 @@
 #include <linux/wait.h>
 #include <linux/slab.h>
 #include <linux/kobject.h>
+#include <trace/events/btrfs.h>
 #include <asm/kmap_types.h>
 #include "extent_io.h"
 #include "extent_map.h"
diff --git a/fs/btrfs/delayed-ref.c 

[PATCH 2/2 v3] Btrfs: Per file/directory controls for COW and compression

2011-03-22 Thread liubo

From: Liu Bo liubo2...@cn.fujitsu.com

Subject: [PATCH 2/2 v3] Btrfs: Per file/directory controls for COW and 
compression

Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method (probably LZO) stored in the super.  However, before doing that, we
should wait until that one method is stable enough to be adopted into the
super.  So I list it as a long-term goal, and just store it in RAM today.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to
control a file's or directory's datacow and compression attributes.
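
From userspace, the generic ioctl is used as a read-modify-write on the attribute word. The following is a minimal sketch, not part of the patch: the helper names (apply_flag, set_file_flag) are made up, and the fallback FS_NOCOW_FL definition is an assumption for headers that do not carry the flag.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_NOCOW_FL
/* Assumption: value used by later mainline headers. */
#define FS_NOCOW_FL 0x00800000
#endif

/* Pure helper: return `flags` with `flag` set or cleared. */
static int apply_flag(int flags, int flag, int enable)
{
    return enable ? (flags | flag) : (flags & ~flag);
}

/* Read-modify-write one attribute flag on `path`; 0 on success, -1 on error. */
static int set_file_flag(const char *path, int flag, int enable)
{
    int flags = 0;
    int ret;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Fetch the current flags so unrelated bits are preserved. */
    ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
    if (ret == 0) {
        flags = apply_flag(flags, flag, enable);
        ret = ioctl(fd, FS_IOC_SETFLAGS, &flags);
    }

    close(fd);
    return ret == 0 ? 0 : -1;
}
```

A call such as set_file_flag("/mnt/btrfs/vm.img", FS_NOCOW_FL, 1) would request NOCOW for that (hypothetical) file; whether the filesystem honours the flag depends on the kernel in use.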

NOTE:
 - The compression type is selected by the following rules:
   If we mount btrfs with a compress option, i.e. zlib or lzo, that type is
   used.  Otherwise, we use the default compress type (zlib today).

v1-v2:
- rebase to the latest btrfs.
v2-v3:
- fix a problem: when a file is set NOCOW via mount option, this NOCOW
  would be overwritten by inheritance from the parent directory.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |6 ++
 fs/btrfs/inode.c   |   31 ---
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..b77d1a5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1283,6 +1283,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP     (1 << 8)
 #define BTRFS_INODE_NOATIME    (1 << 9)
 #define BTRFS_INODE_DIRSYNC    (1 << 10)
+#define BTRFS_INODE_COMPRESS   (1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..a894c12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
    btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 
+   /*
+    * In the long term, we'll store the compression type in the super
+    * block, and it'll be used for per file compression control.
+    */
+   fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+
    ret = btrfs_parse_options(tree_root, options);
    if (ret) {
        err = ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67821..2d9910d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -381,7 +381,8 @@ again:
 */
    if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
        (btrfs_test_opt(root, COMPRESS) ||
-        (BTRFS_I(inode)->force_compress))) {
+        (BTRFS_I(inode)->force_compress) ||
+        (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
        WARN_ON(pages);
        pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
 
@@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
        ret = run_delalloc_nocow(inode, locked_page, start, end,
                                 page_started, 0, nr_written);
    else if (!btrfs_test_opt(root, COMPRESS) &&
-            !(BTRFS_I(inode)->force_compress))
+            !(BTRFS_I(inode)->force_compress) &&
+            !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
        ret = cow_file_range(inode, locked_page, start, end,
                             page_started, nr_written, 1);
    else
@@ -4586,7 +4588,8 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
    if ((mode & S_IFREG)) {
        if (btrfs_test_opt(root, NODATASUM))
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
-       if (btrfs_test_opt(root, NODATACOW))
+       if (btrfs_test_opt(root, NODATACOW) ||
+           (BTRFS_I(dir)->flags & BTRFS_INODE_NODATACOW))
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
    }
 
@@ -6803,6 +6806,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
return 0;
 }
 
+/*
+ * If a file is moved, it will inherit the cow and compression flags of the new
+ * directory.
+ */
+static void fixup_inode_flags(struct inode *dir, struct inode *inode)
+{
+   struct btrfs_inode *b_dir = BTRFS_I(dir);
+   struct btrfs_inode *b_inode = BTRFS_I(inode);
+
+   if (b_dir->flags & BTRFS_INODE_NODATACOW)
+       b_inode->flags |= BTRFS_INODE_NODATACOW;
+   else
+       b_inode->flags &= ~BTRFS_INODE_NODATACOW;
+
+   if (b_dir->flags & BTRFS_INODE_COMPRESS)
+       b_inode->flags |= BTRFS_INODE_COMPRESS;
+   else
+       b_inode->flags &= ~BTRFS_INODE_COMPRESS;
+}
+
 

[PATCH 1/2 v2] Btrfs: add datacow flag in inode flag

2011-03-21 Thread liubo

For datacow control, the corresponding inode flags are needed.
This is for btrfs use.

v1-v2:
Change FS_COW_FL to another bit due to conflict with the upstream e2fsprogs

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/fs.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63d069b..dbcb47e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -353,6 +353,8 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL           0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL           0x00080000 /* Extents */
 #define FS_DIRECTIO_FL         0x00100000 /* Use direct i/o */
+#define FS_NOCOW_FL            0x00800000 /* Do not cow file */
+#define FS_COW_FL              0x02000000 /* Cow file */
 #define FS_RESERVED_FL         0x80000000 /* reserved for ext2 lib */
 
 #define FS_FL_USER_VISIBLE     0x0003DFFF /* User visible flags */
-- 
1.6.5.2


[PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression

2011-03-21 Thread liubo

Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method (probably LZO) stored in the super.  However, before doing that, we
should wait until that one method is stable enough to be adopted into the
super.  So I list it as a long-term goal, and just store it in RAM today.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to
control a file's or directory's datacow and compression attributes.

NOTE:
 - The compression type is selected by the following rules:
   If we mount btrfs with a compress option, i.e. zlib or lzo, that type is
   used.  Otherwise, we use the default compress type (zlib today).

v1-v2:
Rebase the patch with the latest btrfs.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |6 ++
 fs/btrfs/inode.c   |   32 
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..b77d1a5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1283,6 +1283,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP     (1 << 8)
 #define BTRFS_INODE_NOATIME    (1 << 9)
 #define BTRFS_INODE_DIRSYNC    (1 << 10)
+#define BTRFS_INODE_COMPRESS   (1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..a894c12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
    btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 
+   /*
+    * In the long term, we'll store the compression type in the super
+    * block, and it'll be used for per file compression control.
+    */
+   fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+
    ret = btrfs_parse_options(tree_root, options);
    if (ret) {
        err = ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67821..e687bb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -381,7 +381,8 @@ again:
 */
    if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
        (btrfs_test_opt(root, COMPRESS) ||
-        (BTRFS_I(inode)->force_compress))) {
+        (BTRFS_I(inode)->force_compress) ||
+        (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
        WARN_ON(pages);
        pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
 
@@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
        ret = run_delalloc_nocow(inode, locked_page, start, end,
                                 page_started, 0, nr_written);
    else if (!btrfs_test_opt(root, COMPRESS) &&
-            !(BTRFS_I(inode)->force_compress))
+            !(BTRFS_I(inode)->force_compress) &&
+            !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
        ret = cow_file_range(inode, locked_page, start, end,
                             page_started, nr_written, 1);
    else
@@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
    location->offset = 0;
    btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY);
 
-   btrfs_inherit_iflags(inode, dir);
-
    if ((mode & S_IFREG)) {
        if (btrfs_test_opt(root, NODATASUM))
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
@@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
    }
 
+   btrfs_inherit_iflags(inode, dir);
+
insert_inode_hash(inode);
inode_tree_add(inode);
return inode;
@@ -6803,6 +6805,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
return 0;
 }
 
+/*
+ * If a file is moved, it will inherit the cow and compression flags of the new
+ * directory.
+ */
+static void fixup_inode_flags(struct inode *dir, struct inode *inode)
+{
+   struct btrfs_inode *b_dir = BTRFS_I(dir);
+   struct btrfs_inode *b_inode = BTRFS_I(inode);
+
+   if (b_dir->flags & BTRFS_INODE_NODATACOW)
+       b_inode->flags |= BTRFS_INODE_NODATACOW;
+   else
+       b_inode->flags &= ~BTRFS_INODE_NODATACOW;
+
+   if (b_dir->flags & BTRFS_INODE_COMPRESS)
+       b_inode->flags |= BTRFS_INODE_COMPRESS;
+   else
+       b_inode->flags &= ~BTRFS_INODE_COMPRESS;
+}
+
 static int btrfs_rename(struct inode *old_dir, struct dentry 

Re: [PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression

2011-03-21 Thread liubo
On 03/22/2011 01:43 AM, Johann Lombardi wrote:
 On Mon, Mar 21, 2011 at 04:57:13PM +0800, liubo wrote:
 @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct 
 btrfs_trans_handle *trans,
  location->offset = 0;
  btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY);
  
 -btrfs_inherit_iflags(inode, dir);
 -
  if ((mode & S_IFREG)) {
  if (btrfs_test_opt(root, NODATASUM))
  BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
 @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
  BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
  }
  
 +btrfs_inherit_iflags(inode, dir);
 
 The problem is that btrfs_inherit_iflags() overwrites BTRFS_I(inode)->flags
 with the parent's flags, so you lose BTRFS_INODE_NODATA{SUM|COW}.
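
The ordering problem described above can be reproduced with a small userspace model. This is only a sketch: the flag names are shortened stand-ins, inherit_overwrite() models an inherit step that replaces the child's flags wholesale (as the review comment describes), and apply_mount_opts() models the nodatasum/nodatacow mount-option step.

```c
#include <assert.h>

/* Illustrative bit values; names mirror the btrfs inode flags quoted above. */
#define INODE_NODATASUM (1u << 0)
#define INODE_NODATACOW (1u << 1)
#define INODE_NOATIME   (1u << 9)

/* Models an inherit step that replaces the child's flags outright. */
static unsigned int inherit_overwrite(unsigned int child, unsigned int parent)
{
    (void)child; /* the child's previous flags are discarded */
    return parent;
}

/* Models the mount-option step: OR in nodatasum/nodatacow when requested. */
static unsigned int apply_mount_opts(unsigned int flags,
                                     int nodatasum, int nodatacow)
{
    if (nodatasum)
        flags |= INODE_NODATASUM;
    if (nodatacow)
        flags |= INODE_NODATACOW;
    return flags;
}
```

Calling inherit_overwrite() first and apply_mount_opts() second keeps both sets of bits; reversing the order leaves only the parent's flags, which is the loss pointed out in the review.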
 

Thanks for pointing this out, I will fix it.

thanks,
liubo

 Johann
 



Re: [PATCH 1/2] Btrfs: add datacow flag in inode flag

2011-03-16 Thread liubo
On 03/16/2011 05:06 PM, Amir Goldstein wrote:
 On Wed, Mar 16, 2011 at 1:35 AM, Chris Mason chris.ma...@oracle.com wrote:
 Excerpts from Andreas Dilger's message of 2011-03-15 18:06:49 -0400:
 On 2011-03-15, at 2:57 PM, Christoph Hellwig wrote:
 On Tue, Mar 15, 2011 at 04:26:50PM -0400, Chris Mason wrote:
  #define FS_EXTENT_FL     0x00080000 /* Extents */
  #define FS_DIRECTIO_FL   0x00100000 /* Use direct i/o */
 +#define FS_NOCOW_FL      0x00800000 /* Do not cow file */
 +#define FS_COW_FL        0x01000000 /* Cow file */
  #define FS_RESERVED_FL   0x80000000 /* reserved for ext2 lib */
 I'm fine with it.  I'll defer the check for conflicts with extN-specific 
 flags
 to Ted, though.
 Looking at the upstream e2fsprogs I see in that range:

 #define EXT4_EXTENTS_FL           0x00080000 /* Inode uses extents */
 #define EXT4_EA_INODE_FL          0x00200000 /* Inode used for large EA */
 #define EXT4_EOFBLOCKS_FL         0x00400000 /* Blocks allocated beyond EOF */
 #define EXT4_SNAPFILE_FL          0x01000000 /* Inode is a snapshot */
 #define EXT4_SNAPFILE_DELETED_FL  0x04000000 /* Snapshot is being deleted */
 #define EXT4_SNAPFILE_SHRUNK_FL   0x08000000 /* Snapshot shrink has completed */
 #define EXT2_RESERVED_FL          0x80000000 /* reserved for ext2 lib */

 #define EXT2_FL_USER_VISIBLE      0x004BDFFF /* User visible flags */
 so there is a conflict with FS_COW_FL and EXT4_SNAPFILE_FL.  I don't know 
 the semantics of those two flags enough to say for sure whether it is 
 reasonable that they alias to each other, but at first glance COW and 
 SNAPSHOT don't seem completely unrelated.
 
 EXT4_SNAPFILE_FL indicates a special system snapshot file, so it has
 no equivalence relation with FS_COW_FL.
 Please use 0x02000000 for FS_COW_FL.
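
The overlap discussed above can be checked mechanically. The sketch below assumes the full 32-bit flag values from the e2fsprogs headers for the ext4 flags in this range; the table and the find_conflict() helper are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flag_def {
    const char *name;
    uint32_t value;
};

/* Assumed values: the ext4 flags in this range, as in e2fsprogs. */
static const struct flag_def ext4_flags[] = {
    { "EXT4_EXTENTS_FL",          0x00080000u },
    { "EXT4_EA_INODE_FL",         0x00200000u },
    { "EXT4_EOFBLOCKS_FL",        0x00400000u },
    { "EXT4_SNAPFILE_FL",         0x01000000u },
    { "EXT4_SNAPFILE_DELETED_FL", 0x04000000u },
    { "EXT4_SNAPFILE_SHRUNK_FL",  0x08000000u },
    { "EXT2_RESERVED_FL",         0x80000000u },
};

/* Return the name of the first flag sharing a bit with `candidate`, or NULL. */
static const char *find_conflict(uint32_t candidate)
{
    size_t i;

    for (i = 0; i < sizeof(ext4_flags) / sizeof(ext4_flags[0]); i++) {
        if (ext4_flags[i].value & candidate)
            return ext4_flags[i].name;
    }
    return NULL;
}
```

With these values, a candidate of 0x01000000 collides with EXT4_SNAPFILE_FL, while 0x00800000 and 0x02000000 do not overlap any flag in the table.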

Fine with that, but it's up to Chris. :)

thanks,
liubo

 
 EXT4_SNAPFILE_DELETED_FL is a persistent state of a snapshot file,
 which is no longer
 available as a mountable device, but cannot be unlinked because it
 holds changed data sets
 needed by older snapshots.
 
 EXT4_SNAPFILE_SHRUNK_FL is a persistent state of a (deleted) snapshot
 file, which has
 undergone a shrink process to free all change sets not needed by
 older snapshots.
 The persistence of the flag is needed to avoid tedious shrinking when
 it is not needed.
 
 
 In the btrfs case FS_COW_FL means to do COW even when there are no
 snapshots.  FS_NOCOW_FL means to do cow only when there are snapshots.

 
 I am interested in FS_NOCOW_FL as well, but for my implementation it would 
 mean
 do not do COW on rewrites even when there are snapshots, so a user can
 create a pre-allocated
 island of blocks, which are pinned to a physical location, for raw
 VM image for example.
 
 
 Thanks,
 Amir.
 



Re: [PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance

2011-03-10 Thread liubo
On 03/07/2011 10:13 AM, liubo wrote:
 btrfs will remove unused block groups after balance.
 When an empty filesystem is balanced, the block group with tag DATA may be
 dropped, and after umount and mount again, it will not find the DATA space_info,
 which leads to an OOPS.
 So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid the OOPS.
 

Hi, Chris,

These two fixes are for critical problems (one OOPS and one memory leak), so
would you please take some time to review them and check whether they are
ready for the next git pull?

It seems that you have been quite busy these days. ;)

thanks,
liubo

 Reported-by: Daniel J Blueman daniel.blue...@gmail.com
 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/ctree.h   |1 +
  fs/btrfs/disk-io.c |6 ++
  fs/btrfs/extent-tree.c |   23 +++
  3 files changed, 30 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 28188a7..49c50e5 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, 
 u64 bytenr,
  u64 num_bytes);
  int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, u64 type);
 +int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
  
  /* ctree.c */
  int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 3e1ea3e..8bcdc62 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
   fs_info->metadata_alloc_profile = (u64)-1;
   fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
  
 + ret = btrfs_init_space_info(fs_info);
 + if (ret) {
 + printk(KERN_ERR "Failed to initial space info: %d\n", ret);
 + goto fail_block_groups;
 + }
 +
   ret = btrfs_read_block_groups(extent_root);
   if (ret) {
   printk(KERN_ERR "Failed to read block groups: %d\n", ret);
 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 100e409..08525ee 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -8714,6 +8714,29 @@ out:
   return ret;
  }
  
 +int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 +{
 + struct btrfs_space_info *space_info;
 + int ret;
 +
 + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
 +  &space_info);
 + if (ret)
 + return ret;
 +
 + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
 +  &space_info);
 + if (ret)
 + return ret;
 +
 + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
 +  &space_info);
 + if (ret)
 + return ret;
 +
 + return ret;
 +}
 +
  int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 
 end)
  {
   return unpin_extent_range(root, start, end);



[PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance

2011-03-06 Thread liubo

btrfs will remove unused block groups after balance.
When an empty filesystem is balanced, the block group with tag DATA may be
dropped, and after umount and mount again, it will not find the DATA space_info,
which leads to an OOPS.
So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid
the OOPS.
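
The idea can be illustrated with a small userspace model. This is a sketch only: fs_model, find_space_info(), update_space_info_model() and init_space_info_model() are made-up names that loosely mirror the kernel helpers, and the list handling is deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Block group type bits, mirroring BTRFS_BLOCK_GROUP_{DATA,SYSTEM,METADATA}. */
enum {
    GROUP_DATA     = 1 << 0,
    GROUP_SYSTEM   = 1 << 1,
    GROUP_METADATA = 1 << 2,
};

#define MAX_INFOS 8

struct space_info {
    int type;
    unsigned long long total_bytes;
};

struct fs_model {
    struct space_info infos[MAX_INFOS];
    size_t count;
};

/* Look up the space_info for `type`; NULL means no block group created it. */
static struct space_info *find_space_info(struct fs_model *fs, int type)
{
    size_t i;

    for (i = 0; i < fs->count; i++) {
        if (fs->infos[i].type == type)
            return &fs->infos[i];
    }
    return NULL;
}

/* Create the entry if it is missing, then account `bytes` to it. */
static int update_space_info_model(struct fs_model *fs, int type,
                                   unsigned long long bytes)
{
    struct space_info *si = find_space_info(fs, type);

    if (!si) {
        if (fs->count == MAX_INFOS)
            return -1;
        si = &fs->infos[fs->count++];
        si->type = type;
        si->total_bytes = 0;
    }
    si->total_bytes += bytes;
    return 0;
}

/* Pre-create all three entries with zero bytes so later lookups succeed. */
static int init_space_info_model(struct fs_model *fs)
{
    static const int types[] = { GROUP_SYSTEM, GROUP_METADATA, GROUP_DATA };
    size_t i;

    for (i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
        if (update_space_info_model(fs, types[i], 0))
            return -1;
    }
    return 0;
}
```

In this model, a filesystem whose DATA block group was dropped has no DATA entry until init_space_info_model() runs, after which a lookup returns a valid, empty entry instead of NULL.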

Reported-by: Daniel J Blueman daniel.blue...@gmail.com
Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |6 ++
 fs/btrfs/extent-tree.c |   23 +++
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 28188a7..49c50e5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, 
u64 bytenr,
   u64 num_bytes);
 int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
struct btrfs_root *root, u64 type);
+int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
 
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..8bcdc62 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
    fs_info->metadata_alloc_profile = (u64)-1;
    fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
 
+   ret = btrfs_init_space_info(fs_info);
+   if (ret) {
+       printk(KERN_ERR "Failed to initial space info: %d\n", ret);
+       goto fail_block_groups;
+   }
+
    ret = btrfs_read_block_groups(extent_root);
    if (ret) {
        printk(KERN_ERR "Failed to read block groups: %d\n", ret);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 100e409..08525ee 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8714,6 +8714,29 @@ out:
return ret;
 }
 
+int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_space_info *space_info;
+   int ret;
+
+   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
+                           &space_info);
+   if (ret)
+       return ret;
+
+   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
+                           &space_info);
+   if (ret)
+       return ret;
+
+   ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
+                           &space_info);
+   if (ret)
+   return ret;
+
+   return ret;
+}
+
 int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 end)
 {
return unpin_extent_range(root, start, end);
-- 
1.6.5.2



[PATCH 2/2] Btrfs: fix memory leak of empty filesystem after balance

2011-03-06 Thread liubo

After Josef's patch (commit 3c14874acc71180553fb5aba528e3cf57c5b958b),
btrfs excludes super bytes when reading block groups (by marking an extent
state UPTODATE).  However, these bytes do not get freed when balance removes
unused block groups, and we do not process the removed ones any more, so when
we umount and unload the btrfs module, btrfs hits a memory leak.

This patch adds the missing free operation.

Reproduce steps:
$ mkfs.btrfs disk
$ mount disk /mnt/btrfs -o loop
$ btrfs filesystem balance /mnt/btrfs
$ umount /mnt/btrfs
$ rmmod btrfs

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 08525ee..a1af67a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8611,6 +8611,12 @@ int btrfs_remove_block_group(struct btrfs_trans_handle 
*trans,
    BUG_ON(!block_group);
    BUG_ON(!block_group->ro);
 
+   /*
+* Free the reserved super bytes from this block group before
+* remove it.
+*/
+   free_excluded_extents(root, block_group);
+
    memcpy(&key, &block_group->key, sizeof(key));
    if (block_group->flags & (BTRFS_BLOCK_GROUP_DUP |
                              BTRFS_BLOCK_GROUP_RAID1 |
-- 
1.6.5.2



[PATCH 1/2] Btrfs: add datacow flag in inode flag

2011-03-03 Thread liubo

For datacow control, the corresponding inode flags are needed.
This is for the following patch.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/fs.h |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63d069b..bef47ff 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -353,6 +353,8 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL           0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL           0x00080000 /* Extents */
 #define FS_DIRECTIO_FL         0x00100000 /* Use direct i/o */
+#define FS_NOCOW_FL            0x00800000 /* Do not cow file */
+#define FS_COW_FL              0x01000000 /* Cow file */
 #define FS_RESERVED_FL         0x80000000 /* reserved for ext2 lib */
 
 #define FS_FL_USER_VISIBLE     0x0003DFFF /* User visible flags */
-- 
1.6.5.2


[PATCH 2/2] Btrfs: Per file/directory controls for COW and compression

2011-03-03 Thread liubo

Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method (probably LZO) stored in the super.  However, before doing that, we
should wait until that one method is stable enough to be adopted into the
super.  So I list it as a long-term goal, and just store it in RAM today.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to
control a file's or directory's datacow and compression attributes.

NOTE:
 - The compression type is selected by the following rules:
   If we mount btrfs with a compress option, i.e. zlib or lzo, that type is
   used.  Otherwise, we use the default compress type (zlib today).

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |6 ++
 fs/btrfs/inode.c   |   32 
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 28188a7..2639107 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1274,6 +1274,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP     (1 << 8)
 #define BTRFS_INODE_NOATIME    (1 << 9)
 #define BTRFS_INODE_DIRSYNC    (1 << 10)
+#define BTRFS_INODE_COMPRESS   (1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..a894c12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
    btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 
+   /*
+    * In the long term, we'll store the compression type in the super
+    * block, and it'll be used for per file compression control.
+    */
+   fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+
    ret = btrfs_parse_options(tree_root, options);
    if (ret) {
        err = ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 44b9266..82ca86f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -381,7 +381,8 @@ again:
 */
    if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
        (btrfs_test_opt(root, COMPRESS) ||
-        (BTRFS_I(inode)->force_compress))) {
+        (BTRFS_I(inode)->force_compress) ||
+        (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
        WARN_ON(pages);
        pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
 
@@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
        ret = run_delalloc_nocow(inode, locked_page, start, end,
                                 page_started, 0, nr_written);
    else if (!btrfs_test_opt(root, COMPRESS) &&
-            !(BTRFS_I(inode)->force_compress))
+            !(BTRFS_I(inode)->force_compress) &&
+            !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
        ret = cow_file_range(inode, locked_page, start, end,
                             page_started, nr_written, 1);
    else
@@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct 
btrfs_trans_handle *trans,
    location->offset = 0;
    btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY);
 
-   btrfs_inherit_iflags(inode, dir);
-
    if ((mode & S_IFREG)) {
        if (btrfs_test_opt(root, NODATASUM))
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
@@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
            BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
    }
 
+   btrfs_inherit_iflags(inode, dir);
+
insert_inode_hash(inode);
inode_tree_add(inode);
return inode;
@@ -6801,6 +6803,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
return 0;
 }
 
+/*
+ * If a file is moved, it will inherit the cow and compression flags of the new
+ * directory.
+ */
+static void fixup_inode_flags(struct inode *dir, struct inode *inode)
+{
+   struct btrfs_inode *b_dir = BTRFS_I(dir);
+   struct btrfs_inode *b_inode = BTRFS_I(inode);
+
+   if (b_dir->flags & BTRFS_INODE_NODATACOW)
+       b_inode->flags |= BTRFS_INODE_NODATACOW;
+   else
+       b_inode->flags &= ~BTRFS_INODE_NODATACOW;
+
+   if (b_dir->flags & BTRFS_INODE_COMPRESS)
+       b_inode->flags |= BTRFS_INODE_COMPRESS;
+   else
+       b_inode->flags &= ~BTRFS_INODE_COMPRESS;
+}
+
 static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
   struct inode 

Re: [RFC PATCH] Btrfs: add ioctl to set compress or cow per file/dir

2011-02-24 Thread liubo
On 02/24/2011 10:54 PM, Chris Mason wrote:
 Excerpts from liubo's message of 2011-02-24 04:40:55 -0500:
 Data compression and data cow are controlled across the entire FS by mount
 options right now.  ioctls are needed to set this on a per file or per
 directory basis.  This has been proposed previously, but VFS developers
 wanted us to use generic ioctls rather than btrfs-specific ones.

 We need to fit these into the existing per-inode flags, and to use the
 generic FS_IOC_SETFLAGS ioctl.  For data compression, there are the existing
 compression flags of vfs inode, while for datacow, there is no flag to
 indicate it, which we need to add.
 So, what we will do is to add datacow flag in vfs inode flags and then to
 set or to unset btrfs compress/cow flag on the corresponding btrfs inode's 
 flag
 per file or per directory.  Moreover, we also add a compression type ioctl to
 make this feature more flexible.

 I would really appreciate advice and comments on the following:

 - In this patch, I made a special ioctl to set compress type, and to record
   the compress_type per inode on disk, I've consumed some reserved space of
   btrfs_inode_item, so is this acceptable?
 
 I don't expect people to mix compression types on the disk.  There
 really should just be one true compression method (probably LZO once it
 has been established for a while).  So, I'd prefer that we store this in
 the super, and just have flags in the inode for enabling or disabling
 compression.

It sounds nice and will make the code neater. :)
So, all files & directories will share the same compress type stored in the
super.

 
   Meanwhile, I got another idea from my colleague: could we just leave the whole
   compress type thing to new proper mount options, i.e.,
   mount xxx xxx -o compress=a,inode_compress=b?
   It seems that this makes mount more flexible.
 
 It does make it more flexible, but I think sometimes extra flexibility
 leads to more QA time and isn't often used by the actual users ;)

ok.

 
 - When we are inclined to set an inode's compression type, should it be a
   force mode?
   This is much like the difference between mounting with compress and
   mounting with compress-force.
 
 I'd store this as flags in the super too.

ok.

 
 - For directory basis, after compress/cow ioctl on it, any files that are
   created or renamed in it, or moved into it, will inherit the directory's
   compress and datacow attribute.
   This raises some questions: is it right that renamed and moved files
   also inherit the parent directory's compress & datacow attributes?
   And if what we are dealing with is directory, should this behaviour be
   recursive or not?
   I'm inclined to leave these recursive things to btrfs-progs if this is
   necessary.
 
 I'd say that if we rename a file into a directory it does inherit, but
 not make it recursive.
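
The agreed rename behaviour (inherit from the destination directory, no recursion) amounts to copying only the inheritable bits from the directory onto the one inode being moved. The following is a minimal sketch under that reading; INHERITED_MASK and flags_after_rename() are illustrative names, and the bit positions are stand-ins for the btrfs inode flags.

```c
#include <assert.h>

/* Illustrative bit values standing in for the btrfs inode flags. */
#define INODE_NODATACOW (1u << 1)
#define INODE_NOATIME   (1u << 9)
#define INODE_COMPRESS  (1u << 11)

/* Only these bits follow the destination directory on rename. */
#define INHERITED_MASK  (INODE_NODATACOW | INODE_COMPRESS)

/*
 * New flags for a single inode renamed into a directory: inheritable bits
 * are taken from the directory (set or cleared), all other bits are kept.
 * Nothing below the inode is touched, so the operation is not recursive.
 */
static unsigned int flags_after_rename(unsigned int file_flags,
                                       unsigned int dir_flags)
{
    return (file_flags & ~INHERITED_MASK) | (dir_flags & INHERITED_MASK);
}
```

A file carrying the compress bit that is moved into a nodatacow directory ends up with nodatacow set and compress cleared, while unrelated bits such as noatime are preserved.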


ok, got it.
I will send a new version based on this thread.

Thanks a lot for reviewing!

thanks,
liubo
 
 -chris
 



Re: [RFC PATCH] Btrfs: add ioctl to set compress or cow per file/dir

2011-02-24 Thread liubo
On 02/25/2011 02:39 AM, Chris Mason wrote:
 Excerpts from Andreas Dilger's message of 2011-02-24 13:37:52 -0500:
 On 2011-02-24, at 2:40 AM, liubo wrote:
  #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */
 +#define FS_NOCOW_FL    0x00200000 /* Do not cow file */
 +#define FS_COW_FL      0x00100000 /* Cow file */
  #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */
 I'm assuming that FS_COW_FL should not be the same as FS_DIRECTIO_FL?
 
 No, we can do DIRECTIO with COW.
 

Sorry for my fault, thanks for pointing it out.
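
One way to catch this kind of clash at build time is a compile-time check that
independent flags do not share bits. The sketch below assumes a C11 compiler;
the EX_* names and values are invented for illustration and are not the real
include/linux/fs.h definitions.

```c
/* Illustrative values only; not the real include/linux/fs.h ones. */
#define EX_DIRECTIO_FL 0x1u
#define EX_NOCOW_FL    0x2u
#define EX_COW_FL      0x4u

/* C11: fail the build if two independent flags share a bit. */
_Static_assert((EX_DIRECTIO_FL & EX_COW_FL) == 0,
	       "COW flag overlaps DIRECTIO flag");
_Static_assert((EX_NOCOW_FL & EX_COW_FL) == 0,
	       "COW flag overlaps NOCOW flag");

/* Returns nonzero when every bit of 'flag' is set in 'flags'. */
static int flag_is_set(unsigned int flags, unsigned int flag)
{
	return (flags & flag) == flag;
}
```

With distinct bits, direct I/O and COW can be set on the same file and tested
independently.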

thanks,
liubo

 -chris


Re: [PATCH] btrfs: make inode ref log recovery faster

2011-02-22 Thread liubo
On 02/22/2011 10:32 PM, David Sterba wrote:
 Hi,
 
 no deeper analysis done, but the double free error was obvious :)
 
 On Tue, Feb 22, 2011 at 07:42:25PM +0800, liubo wrote:
 When we recover from crash via write-ahead log tree and process
 the inode refs, for each btrfs_inode_ref item, we will
 1) check if we already have a perfect match in fs/file tree, if
we have, then we're done.
 2) search the corresponding back reference in fs/file tree, and
check all the names in this back reference to see if they are
also in the log to avoid conflict corners.
 3) recover the logged inode refs to fs/file tree.

 In current btrfs, however,
 - for 2)'s check, once is enough, since the checked back references
   will remain unchanged after processing all the inode refs belonged
   to the key.
 - it has no need to do another 1) between 2) and 3).

 This patch focuses on the above problems, and
 I've made a small test to show how it improves things:

 $dd if=/dev/zero of=foobar bs=4K count=1
 $sync
 $make 100 hard links continuously, like ln foobar link_i
 $fsync foobar
 $echo b > /proc/sysrq-trigger
 after reboot
 $time mount DEV PATH

 without patch:
 real 0m0.285s
 user 0m0.001s
 sys  0m0.009s

 with patch:
 real 0m0.123s
 user 0m0.000s
 sys  0m0.010s

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/tree-log.c |   33 +++--
  1 files changed, 11 insertions(+), 22 deletions(-)

 diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
 index a4bbb85..8f2a9f3 100644
 --- a/fs/btrfs/tree-log.c
 +++ b/fs/btrfs/tree-log.c
 @@ -799,12 +799,12 @@ static noinline int add_inode_ref(struct 
 btrfs_trans_handle *trans,
  struct inode *dir;
  int ret;
  struct btrfs_inode_ref *ref;
 -struct btrfs_dir_item *di;
  struct inode *inode;
  char *name;
  int namelen;
  unsigned long ref_ptr;
  unsigned long ref_end;
 +int search_done = 0;
  
  /*
   * it is possible that we didn't log all the parent directories
 @@ -845,7 +845,10 @@ again:
   * existing back reference, and we don't want to create
   * dangling pointers in the directory.
   */
 -conflict_again:
 +
 +if (search_done)
 +goto insert;
 +
  ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
  if (ret == 0) {
  char *victim_name;
 @@ -888,35 +891,21 @@ conflict_again:
   victim_name_len);
  kfree(victim_name);
 ^^^
  btrfs_release_path(root, path);
 -goto conflict_again;
  }
  kfree(victim_name);
 ^^^
 double free

Thanks for reviewing, but the first one is followed by a goto statement, so IMO it
is ok.

 
  ptr = (unsigned long)(victim_ref + 1) + victim_name_len;
  }
  BUG_ON(ret);
 -}
 -btrfs_release_path(root, path);
  
 -/* look for a conflicting sequence number */
 -di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino,
 - btrfs_inode_ref_index(eb, ref),
 - name, namelen, 0);
 -if (di && !IS_ERR(di)) {
 -ret = drop_one_dir_item(trans, root, path, dir, di);
 -BUG_ON(ret);
 -}
 -btrfs_release_path(root, path);
 -
 -
 -/* look for a conflicting name */
 -di = btrfs_lookup_dir_item(trans, root, path, dir->i_ino,
 -   name, namelen, 0);
 -if (di && !IS_ERR(di)) {
 -ret = drop_one_dir_item(trans, root, path, dir, di);
 -BUG_ON(ret);
 +/*
 + * NOTE: we have searched root tree and checked the
 + * corresponding ref, it does not need to check again.
 + */
 +search_done = 1;
  }
  btrfs_release_path(root, path);
  
 +insert:
  /* insert our name */
  ret = btrfs_add_link(trans, dir, inode, name, namelen, 0,
   btrfs_inode_ref_index(eb, ref));
 -- 
 
 d/


Re: [PATCH] btrfs: make inode ref log recovery faster

2011-02-22 Thread liubo
On 02/23/2011 09:30 AM, Josef Bacik wrote:
 On Wed, Feb 23, 2011 at 09:12:36AM +0800, liubo wrote:
 On 02/22/2011 10:32 PM, David Sterba wrote:
 Hi,

 no deeper analysis done, but the double free error was obvious :)

 On Tue, Feb 22, 2011 at 07:42:25PM +0800, liubo wrote:
 When we recover from crash via write-ahead log tree and process
 the inode refs, for each btrfs_inode_ref item, we will
 1) check if we already have a perfect match in fs/file tree, if
we have, then we're done.
 2) search the corresponding back reference in fs/file tree, and
check all the names in this back reference to see if they are
also in the log to avoid conflict corners.
 3) recover the logged inode refs to fs/file tree.

 In current btrfs, however,
 - for 2)'s check, once is enough, since the checked back references
   will remain unchanged after processing all the inode refs belonged
   to the key.
 - it has no need to do another 1) between 2) and 3).

 This patch focuses on the above problems, and
 I've made a small test to show how it improves things:

 $dd if=/dev/zero of=foobar bs=4K count=1
 $sync
 $make 100 hard links continuously, like ln foobar link_i
 $fsync foobar
 $echo b > /proc/sysrq-trigger
 after reboot
 $time mount DEV PATH

 without patch:
 real   0m0.285s
 user   0m0.001s
 sys0m0.009s

 with patch:
 real   0m0.123s
 user   0m0.000s
 sys0m0.010s

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/tree-log.c |   33 +++--
  1 files changed, 11 insertions(+), 22 deletions(-)

 diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
 index a4bbb85..8f2a9f3 100644
 --- a/fs/btrfs/tree-log.c
 +++ b/fs/btrfs/tree-log.c
 @@ -799,12 +799,12 @@ static noinline int add_inode_ref(struct 
 btrfs_trans_handle *trans,
struct inode *dir;
int ret;
struct btrfs_inode_ref *ref;
 -  struct btrfs_dir_item *di;
struct inode *inode;
char *name;
int namelen;
unsigned long ref_ptr;
unsigned long ref_end;
 +  int search_done = 0;
  
/*
 * it is possible that we didn't log all the parent directories
 @@ -845,7 +845,10 @@ again:
 * existing back reference, and we don't want to create
 * dangling pointers in the directory.
 */
 -conflict_again:
 +
 +  if (search_done)
 +  goto insert;
 +
ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
if (ret == 0) {
char *victim_name;
 @@ -888,35 +891,21 @@ conflict_again:
 victim_name_len);
kfree(victim_name);
 ^^^
btrfs_release_path(root, path);
 -  goto conflict_again;
}
kfree(victim_name);
 ^^^
 double free
 Thanks for reviewing, but the first one is followed by a goto statement, so IMO
 it is ok.

 
 Your patch removes that goto, so it's not ok.  Thanks,

ahh, my fault.
I'll fix it, thanks a lot, :)
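
To illustrate the point: once the goto is gone, the branch falls through to the
kfree() after it, so the branch must not free the name itself. The user-space
sketch below shows one possible shape with a single free per allocation. The
names (process_victim, counted_free, in_log) are hypothetical, and this is not
necessarily how the follow-up patch fixes it.

```c
#include <stdlib.h>
#include <string.h>

static int free_calls;   /* counts frees so the sketch can be checked */

static void counted_free(void *p)
{
	free_calls++;
	free(p);
}

/*
 * Mirrors the shape of the loop body after the patch: the "victim found"
 * branch no longer jumps away, so it must not free victim_name itself;
 * the single free after the branch covers both paths.
 */
static int process_victim(const char *name, int in_log)
{
	char *victim_name = malloc(strlen(name) + 1);

	if (!victim_name)
		return -1;
	strcpy(victim_name, name);

	if (!in_log) {
		/* ... unlink the stale name here ... */
		/* no counted_free(victim_name) here: control falls through */
	}
	counted_free(victim_name);   /* exactly one free per allocation */
	return 0;
}
```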

liubo

 
 Josef 


Re: Building btrfs as a dkms module on Debian

2011-02-15 Thread liubo
On 02/15/2011 11:35 PM, Yuri D'Elia wrote:
 Hi everyone. I was trying to test a more recent version of btrfs on my 
 current kernel (2.6.37) using dkms, without success.
 
 I followed these instructions:
 
 https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories
 
 - cloned the repo
 - symlinked to /usr/src/btrfs-git
 - patched version.sh:
 
 Please note version.sh requires bash (better to change the shebang or fix the 
 script).
 Even with the patch, version.sh run on a shallow repository generates a 
 -dirty version. I assume this is OK, even though there are no local changes.
 
 - run version.sh
 - dkms add -m btrfs -v git
 - dkms build -m btrfs -v git fails with:
 
 /var/lib/dkms/btrfs/git/build/extent-tree.c: In function 
 ‘btrfs_issue_discard’:
 /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: ‘BLKDEV_IFL_WAIT’ 
 undeclared (first use in this function)
 /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: (Each undeclared 
 identifier is reported only once
 /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: for each function it 
 appears in.)
 /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: ‘BLKDEV_IFL_BARRIER’ 
 undeclared (first use in this function)
 
 I assume BLKDEV_IFL_WAIT/BARRIER was added in later kernels?
 Is there a way to make it build btrfs for 2.6.37?

in commit fbd9b09a177a481eda256447c881f014f29034fe:
include/linux/blkdev.h:

#define BLKDEV_IFL_WAIT (1 << BLKDEV_WAIT)
#define BLKDEV_IFL_BARRIER  (1 << BLKDEV_BARRIER)
#define BLKDEV_IFL_SECURE   (1 << BLKDEV_SECURE)

Maybe this is helpful.:)

thanks,
liubo

 
 Thanks
 


[PATCH 3/3] btrfs: fix missing break in switch phrase

2011-01-25 Thread liubo

There is a missing break in switch, fix it.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/print-tree.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index 0d126be..fb2605d 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -260,6 +260,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct 
extent_buffer *l)
 #else
BUG();
 #endif
+   break;
case BTRFS_BLOCK_GROUP_ITEM_KEY:
bi = btrfs_item_ptr(l, i,
struct btrfs_block_group_item);
-- 
1.6.5.2
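
For readers unfamiliar with the failure mode, here is a small stand-alone
sketch of what a missing break does in a switch. The enum and handle_item()
are invented for illustration and do not come from print-tree.c.

```c
/* Hypothetical item types, for illustration only. */
enum item_type { ITEM_FIRST, ITEM_BLOCK_GROUP };

/*
 * Returns a bitmask of the case bodies that ran. Without the break
 * after the first case, ITEM_FIRST would also run the
 * ITEM_BLOCK_GROUP body and return 3 instead of 1.
 */
static int handle_item(enum item_type type)
{
	int ran = 0;

	switch (type) {
	case ITEM_FIRST:
		ran |= 1;
		break;          /* the kind of statement the patch adds */
	case ITEM_BLOCK_GROUP:
		ran |= 2;
		break;
	}
	return ran;
}
```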


Re: [PATCH] btrfs: fix return value check of btrfs_start_transaction()

2011-01-20 Thread liubo
On 01/21/2011 12:09 AM, Josef Bacik wrote:
 On Thu, Jan 20, 2011 at 03:19:37PM +0900, Tsutomu Itoh wrote:
 The error check of btrfs_start_transaction() is added, and the mistake
 of the error check on several places is corrected. 

 
 I'd rather we go through and have these things return an error than do a
 BUG_ON().  We're moving towards a more stable BTRFS, not one that panics more
 often :).  Thanks,

Great, seems that we all feel it is the time to focus on this. :)

 
 Josef


Re: [PATCH] Btrfs: forced readonly mounts on errors

2011-01-17 Thread liubo
On 01/18/2011 03:56 AM, Chris Mason wrote:
 Excerpts from liubo's message of 2011-01-06 06:30:25 -0500:
 This patch comes from "Forced readonly mounts on errors" ideas.

 As we know, this is the first step in being more fault tolerant of disk
 corruptions instead of just using BUG() statements.

 The major content:
 - add a framework for generating errors that should result in filesystems
   going readonly.
 - keep FS state in disk super block.
 - make sure that all of resource will be freed and released at umount time.
 - make sure that after FS is forced readonly on error, there will be no more
   disk change before FS is corrected. For this, we should stop write 
 operation.

 After this patch is applied, the conversion from BUG() to such a framework 
 can
 happen incrementally.
 
 I think this is a good overall framework and it will meet our needs
 nicely as we scale up the error handling in the filesystem.
 
 One concern I have is where we save the error state to disk:
 
 +static void __save_error_info(struct btrfs_fs_info *fs_info)
 +{
 +struct btrfs_super_block *disk_super = &fs_info->super_copy;
 +
 +fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
 +disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR);
 +
 +mutex_lock(&fs_info->trans_mutex);
 +memcpy(&fs_info->super_for_commit, disk_super,
 +   sizeof(fs_info->super_for_commit));
 +mutex_unlock(&fs_info->trans_mutex);
 
 The super_for_commit isn't changed until we have a fully consistent set
 of fields in the super block.  The super_copy is changed as the
 transaction progresses.
 
 So, this memcpy isn't quite safe.  We should simply set the flag on the
 super_for_commit and the super_copy individually.
 

Got it, thanks for pointing it out.
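
A rough user-space sketch of the suggestion above, i.e. setting the error flag
on each copy of the super block separately rather than copying one over the
other. The ex_* types are simplified stand-ins, and locking and endianness
conversion are left out, so this is only an illustration of the idea and not
the change that was actually merged.

```c
#include <stdint.h>

#define EX_SUPER_FLAG_ERROR (1ULL << 2)  /* same bit position as in the patch */

/* Stand-in for the on-disk super block; only the fields the sketch needs. */
struct ex_super {
	uint64_t flags;
	uint64_t generation;
};

struct ex_fs_info {
	struct ex_super super_copy;        /* changes as the transaction runs */
	struct ex_super super_for_commit;  /* last fully consistent copy */
	uint64_t fs_state;
};

/*
 * Set the error bit on both copies separately instead of memcpy()ing
 * super_copy over super_for_commit, so half-updated fields in
 * super_copy never leak into the copy that gets written out.
 */
static void ex_save_error_info(struct ex_fs_info *fs_info)
{
	fs_info->fs_state |= EX_SUPER_FLAG_ERROR;
	fs_info->super_copy.flags |= EX_SUPER_FLAG_ERROR;
	fs_info->super_for_commit.flags |= EX_SUPER_FLAG_ERROR;
}
```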

 I'll make this change and pull it in.  We can build from here.
 

Great!

thanks,
Liubo

 -chris


[PATCH] Btrfs: forced readonly mounts on errors

2011-01-06 Thread liubo

This patch comes from "Forced readonly mounts on errors" ideas.

As we know, this is the first step in being more fault tolerant of disk
corruptions instead of just using BUG() statements.

The major content:
- add a framework for generating errors that should result in filesystems
  going readonly.
- keep FS state in disk super block.
- make sure that all resources will be freed and released at umount time.
- make sure that after the FS is forced readonly on error, there will be no more
  disk changes before the FS is corrected. For this, we should stop write operations.

After this patch is applied, the conversion from BUG() to such a framework can
happen incrementally.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |   24 +++
 fs/btrfs/disk-io.c |  389 +++-
 fs/btrfs/disk-io.h |1 +
 fs/btrfs/extent-tree.c |   11 ++
 fs/btrfs/file.c|   11 ++
 fs/btrfs/super.c   |   88 +++
 fs/btrfs/transaction.c |3 +
 7 files changed, 525 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index af52f6d..63c35f8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int 
num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN  (1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR (1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING   (1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP  (1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
unsigned metadata_ratio;
 
void *bdev_holder;
+
+   /* filesystem state */
+   u64 fs_state;
 };
 
 /*
@@ -2188,6 +2199,11 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
 int btrfs_set_block_group_rw(struct btrfs_root *root,
 struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
+int btrfs_error_unpin_extent_range(struct btrfs_root *root,
+  u64 start, u64 end);
+int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
+  u64 num_bytes);
+
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 int level, int *slot);
@@ -2541,6 +2557,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char 
*buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)\
+do {   \
+   if ((errno))\
+   __btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a5d2249..4f70256 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -44,6 +44,20 @@
 static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+   int read_only);
+static int btrfs_destroy_ordered_operations(struct btrfs_root *root);
+static int btrfs_destroy_ordered_extents(struct btrfs_root *root);
+static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+ struct btrfs_root *root);
+static int btrfs_destroy_pending_snapshots(struct btrfs_transaction *t);
+static int btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
+static int btrfs_destroy_marked_extents(struct btrfs_root *root,
+   struct extent_io_tree *dirty_pages,
+   int mark);
+static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
+  struct extent_io_tree *pinned_extents);
+static int btrfs_cleanup_transaction(struct btrfs_root *root);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1727,6 +1741,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
if (!btrfs_super_root(disk_super))
goto fail_iput;
 
+   /* check FS state, whether FS is broken. */
+   fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+   btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
ret = btrfs_parse_options(tree_root, options);
if (ret) {
err = ret;
@@ -1957,7 +1976,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,

Re: [RFC PATCH 0/5 v3] Btrfs: Add readonly support to replace BUG_ON phrase

2010-12-15 Thread liubo
Hi, chris,

Are there any comments on this "Forced readonly mounts on errors" patchset?

thanks,
Liu Bo

On 12/03/2010 04:15 PM, liubo wrote:
 Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic.
 Meanwhile, they are very ugly and should be handled more appropriately.
 
 There are mainly two ways to deal with these BUG_ON()s.
 
 1. For those errors which can be handled well by callers, we just return their
 error number to callers.
 
 2. For others, we can force the filesystem readonly when it hits errors, which
  is what this patchset does. With BUG_ON() replaced by the interface provided
  in this patchset, we get error information via dmesg. Since btrfs is then
 readonly, we can save our data safely and umount it; running btrfsck afterwards
 is recommended.
 
 In these ways, we can protect our filesystem from panics caused by those
 BUG_ONs.
 
 We still need a incompat flag to make old kernels happy.
 
 This patchset needs more test.
 
 v2->v3:
 - since btrfs may do log replay after a crash even when it is mounted readonly,
   and we have added a readonly check at start-transaction time, it needs to save
   and restore the readonly flag around log replay.
 
 v1->v2:
 - in order to avoid deadlock thing, move write super stuff from error handle
   path to unmount time.
 - remove BTRFS_SUPER_FLAG_VALID, just use BTRFS_SUPER_FLAG_ERROR to make it
   simple.
 - add MS_RDONLY check at start of a transaction instead of commit transaction.
 ---
  fs/btrfs/ctree.h   |   19 ++
  fs/btrfs/disk-io.c |   56 +-
  fs/btrfs/super.c   |   88 
 
  fs/btrfs/transaction.c |3 ++
  4 files changed, 164 insertions(+), 2 deletions(-)


Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly

2010-12-15 Thread liubo
On 12/15/2010 04:45 PM, Yan, Zheng wrote:
 On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote:
 When the filesystem is readonly, avoid transaction stuff by checking 
 MS_RDONLY
 at start transaction time.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/transaction.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 1fffbc0..14a597d 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -181,6 +181,9 @@ static struct btrfs_trans_handle 
 *start_transaction(struct btrfs_root *root,
struct btrfs_trans_handle *h;
struct btrfs_transaction *cur_trans;
int ret;
 +
 +   if (root->fs_info->sb->s_flags & MS_RDONLY)
 +   return ERR_PTR(-EROFS);
  again:
h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
if (!h)
 
 There are cases that we need to start transaction when MS_RDONLY flag is set.
 For example, remount FS into read-only mode and log replay.

However, isn't it weird to make changes to disk while the fs is in readonly state?
IMO, btrfs needs to limit these cases of changing the disk while readonly,
as that is not what readonly means.

Since they are already there, we can bypass readonly in those cases (as I did in the
5th patch):

...
flags = sb->s_flags;
if (sb->s_flags & MS_RDONLY)
sb->s_flags &= ~MS_RDONLY;

remount()

sb-s_flags = flags;
...

thanks,
Liu Bo



Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly

2010-12-15 Thread liubo
On 12/16/2010 12:03 AM, Chris Mason wrote:
 Excerpts from liubo's message of 2010-12-15 04:12:14 -0500:
 On 12/15/2010 04:45 PM, Yan, Zheng wrote:
 On Fri, Dec 3, 2010 at 4:16 PM, liubo liubo2...@cn.fujitsu.com wrote:
 When the filesystem is readonly, avoid transaction stuff by checking 
 MS_RDONLY
 at start transaction time.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/transaction.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 1fffbc0..14a597d 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -181,6 +181,9 @@ static struct btrfs_trans_handle 
 *start_transaction(struct btrfs_root *root,
struct btrfs_trans_handle *h;
struct btrfs_transaction *cur_trans;
int ret;
 +
 +   if (root->fs_info->sb->s_flags & MS_RDONLY)
 +   return ERR_PTR(-EROFS);
  again:
h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
if (!h)
 There are cases that we need to start transaction when MS_RDONLY flag is 
 set.
 For example, remount FS into read-only mode and log replay.
 However, is it weird to make changes to disk as fs is in readonly state?
 IMO, btrfs needs to limit the use of these disk-change while readonly 
 cases,
 as it is not what readonly means.
 
 reiserfs and ext3 at least have always done this.  Log replay is
 required even when the FS is readonly.
 

My concern is:
now that we have a forced-readonly FS, which is already broken, if we still write
something to disk, would it become more broken?

 Since it has been here, we can bypass readonly in those cases(as I did in 
 the 5th patch):

 ...
 flags = sb->s_flags;
 if (sb->s_flags & MS_RDONLY)
 sb->s_flags &= ~MS_RDONLY;
 
 I think we should have a dedicated flag to reflect a filesystem that is
 forced readonly, and check that flag instead.

OK, we did have fs_state, a dedicated flag.
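
A minimal sketch of what checking a dedicated state flag (rather than
MS_RDONLY) at transaction start might look like. Everything here is a
simplified user-space stand-in: the demo_* names are hypothetical and the real
start_transaction() returns a handle or ERR_PTR(), not an int.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_FS_STATE_ERROR (1ULL << 2)  /* illustrative "forced readonly" bit */

struct demo_fs_info {
	uint64_t fs_state;
	int mounted_rdonly;  /* stand-in for MS_RDONLY in sb->s_flags */
};

/*
 * Refuse new transactions only when the filesystem has been forced
 * readonly by an error. A plain readonly mount is not rejected here,
 * so log replay and remount can still start transactions.
 * Returns 0 on success or a negative errno.
 */
static int demo_start_transaction(const struct demo_fs_info *fs_info)
{
	if (fs_info->fs_state & DEMO_FS_STATE_ERROR)
		return -EROFS;
	return 0;
}
```

This keeps the two meanings apart: "mounted readonly by the user" still allows
log replay, while "broken and forced readonly" blocks all further writes.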

thanks,
Liu Bo

 
 -chris


[PATCH] Btrfs: fix compile warning in fs/btrfs/inode.c

2010-12-08 Thread liubo
While compiling btrfs, I got the following:

  CC [M]  fs/btrfs/inode.o
fs/btrfs/inode.c: In function ‘btrfs_end_dio_bio’:
fs/btrfs/inode.c:5720: warning: format ‘%lu’ expects type ‘long unsigned int’, 
but argument 4 has type ‘sector_t’
  LD [M]  fs/btrfs/btrfs.o
  Building modules, stage 2.
  MODPOST 1 modules
  LD [M]  fs/btrfs/btrfs.ko

This fixes the compile warning.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/inode.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0f34cae..eff5aef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5713,8 +5713,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
if (err) {
printk(KERN_ERR "btrfs direct IO failed ino %lu rw %lu "
  "disk_bytenr %lu len %u err no %d\n",
- dip->inode->i_ino, bio->bi_rw, bio->bi_sector,
- bio->bi_size, err);
+ dip->inode->i_ino, bio->bi_rw,
+ (unsigned long)bio->bi_sector, bio->bi_size, err);
dip->errors = 1;
 
/*
-- 
1.7.1


Re: [PATCH] Btrfs: fix compile warning in fs/btrfs/inode.c

2010-12-08 Thread liubo
On 12/08/2010 06:01 PM, liubo wrote:
 While compiling btrfs, I got the following:
 
   CC [M]  fs/btrfs/inode.o
 fs/btrfs/inode.c: In function ‘btrfs_end_dio_bio’:
 fs/btrfs/inode.c:5720: warning: format ‘%lu’ expects type ‘long unsigned 
 int’, but argument 4 has type ‘sector_t’
   LD [M]  fs/btrfs/btrfs.o
   Building modules, stage 2.
   MODPOST 1 modules
   LD [M]  fs/btrfs/btrfs.ko
 
 This fixes the compile warning.
 

Sorry, please ignore this.

Someone has already posted a patch to fix this.
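
As a side note on this class of warning: the width of sector_t depends on the
kernel configuration, so a cast to unsigned long can truncate on a 32-bit box.
A common idiom is to cast to unsigned long long and print with %llu. The
sketch below shows that idiom in user space; ex_sector_t and format_sector()
are stand-ins, and this is not necessarily what the other posted patch does.

```c
#include <stdio.h>
#include <stdint.h>

typedef uint64_t ex_sector_t;  /* stand-in: sector_t may be 64-bit on a 32-bit box */

/*
 * Format a sector number portably: cast to unsigned long long and use
 * %llu, so the value is neither truncated nor mismatched with the
 * format on 32-bit builds.
 */
static int format_sector(char *buf, size_t len, ex_sector_t sector)
{
	return snprintf(buf, len, "disk_bytenr %llu",
			(unsigned long long)sector);
}
```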

thanks,
Liu Bo

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/inode.c |4 ++--
  1 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 0f34cae..eff5aef 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -5713,8 +5713,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
   if (err) {
   printk(KERN_ERR "btrfs direct IO failed ino %lu rw %lu "
 "disk_bytenr %lu len %u err no %d\n",
 -   dip->inode->i_ino, bio->bi_rw, bio->bi_sector,
 -   bio->bi_size, err);
 +   dip->inode->i_ino, bio->bi_rw,
 +   (unsigned long)bio->bi_sector, bio->bi_size, err);
   dip->errors = 1;
  
   /*



Re: [PATCH] Btrfs: create a unique inode number for all subvol entries

2010-12-06 Thread liubo
On 12/07/2010 04:48 AM, Josef Bacik wrote:
 Currently BTRFS has a problem where any subvol's will have the same inode
 numbers as other files in the parent subvol.  This can cause problems with
 userspace apps that depend on inode numbers being unique across a volume.  So 
 in
 order to solve this problem we need to do the following
 
 1) Create an empty key with the fake inode number for the subvol.  This is a
 place holder, since we determine which inode number to start with by searching
 for the largest objectid in the subvolume, we need to make sure our fake inode
 number isn't reused by somebody else.
 
 2) Save our fake inode number in our dir item.  We can already store data in 
 dir
 items, so just store the inode number.  This is future proof since I 
 explicitly
 check for data_len == sizeof(u64), that way if we change what data gets put in
 the dir item in the future, older kernels will be able to deal with it 
 properly.
 Also if an older kernel mounts with this change it will be ok.
 
 Since subvols have their own st_dev it is ok for them to continue to have an
 inode number of 256, but the inode returned by readdir needs to be unique to 
 the
 subvolume, so our fake inode number will be used for d_ino with readdir.  I
 tested this with a program that Bruce Fields wrote to spit out the actual 
 inode
 numbers and the inode number returned by readdir
 
 [r...@test1244 ~]# touch /mnt/btrfs-test/foo
 [r...@test1244 ~]# touch /mnt/btrfs-test/bar
 [r...@test1244 ~]# touch /mnt/btrfs-test/baz
 [r...@test1244 ~]# ./btrfs-progs-unstable/btrfs subvol create
 /mnt/btrfs-test/subvol
 Create subvolume '/mnt/btrfs-test/subvol'
 [r...@test1244 ~]# ./readdir-test /mnt/btrfs-test/
 . 256 256
 .. 256 139265
 foo 257 257
 bar 258 258
 baz 259 259
 subvol 260 256
 
 Thanks,

Hi, Josef,

The patch looks nice.

Since the insert-dir code is mostly the same, what about changing the
btrfs_insert_dir_item interface
so that the new function can be written like this:

int btrfs_insert_subvol_dir_item(...)
{
return btrfs_insert_dir_item(...);
}

Would that make the code simpler?

Thanks,
Liu Bo

 
 Signed-off-by: Josef Bacik jo...@redhat.com
 ---
  fs/btrfs/ctree.h   |6 +++
  fs/btrfs/dir-item.c|  113 
 
  fs/btrfs/inode.c   |   58 -
  fs/btrfs/ioctl.c   |   13 -
  fs/btrfs/transaction.c |   22 +++--
  5 files changed, 202 insertions(+), 10 deletions(-)
 
 diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
 index 54e4252..ea0662e 100644
 --- a/fs/btrfs/ctree.h
 +++ b/fs/btrfs/ctree.h
 @@ -1161,6 +1161,8 @@ struct btrfs_root {
  #define BTRFS_DIR_LOG_INDEX_KEY 72
  #define BTRFS_DIR_ITEM_KEY   84
  #define BTRFS_DIR_INDEX_KEY  96
 +#define BTRFS_DIR_SUBVOL_KEY 97
 +
  /*
   * extent data is for file data
   */
 @@ -2320,6 +2322,10 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle 
 *trans,
 struct btrfs_root *root, const char *name,
 int name_len, u64 dir,
 struct btrfs_key *location, u8 type, u64 index);
 +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
 +  struct btrfs_root *root, const char *name,
 +  int name_len, u64 dir, u64 ino,
 +  struct btrfs_key *location, u64 index);
  struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle 
 *trans,
struct btrfs_root *root,
struct btrfs_path *path, u64 dir,
 diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
 index f0cad5a..95d498f 100644
 --- a/fs/btrfs/dir-item.c
 +++ b/fs/btrfs/dir-item.c
 @@ -116,6 +116,119 @@ int btrfs_insert_xattr_item(struct btrfs_trans_handle 
 *trans,
   return ret;
  }
  
 +/**
 + * btrfs_insert_subvol_dir_item - setup the dir items for a subvol
 + *
 + * @trans: transaction handle
 + * @root: the root of the parent subvol
 + * @name: name of the subvol
 + * @name_len: the length of the name
 + * @dir: the objectid of the parent directory
 + * @ino: the unique inode number for the parent directory
 + * @key: the key that the items will point to
 + * @index: the dir index for readdir purposes
 + *
 + * Creates the dir item/dir index pair for the directory containing the 
 subvol.
 + * This also creates a blank key to hold the made up inode number for the 
 subvol
 + * in order to give us a unique to the parent subvol inode number.
 + */
 +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
 +  struct btrfs_root *root, const char *name,
 +  int name_len, u64 dir, u64 ino,
 +  struct btrfs_key *location, u64 index)
 +{
 + int ret = 0;
 + int ret2 = 0;
 + struct btrfs_path *path;
 + struct btrfs_dir_item *dir_item;
 + struct extent_buffer *leaf;
 + unsigned long name_ptr;
 + unsigned long 

[RFC PATCH 0/5 v3] Btrfs: Add readonly support to replace BUG_ON phrase

2010-12-03 Thread liubo
Btrfs has a number of BUG_ON()s, which can lead btrfs to an unpleasant panic.
They are also very ugly and should be handled more appropriately.

There are mainly two ways to deal with these BUG_ON()s.

1. For errors that callers can handle well, we just return the error number
to the caller.

2. For the others, we can force the filesystem readonly when it hits an error,
which is what this patchset does. With BUG_ON() replaced by the interface
provided in this patchset, we get the error information via dmesg. Since btrfs
is then readonly, we can save our data safely and umount it; running btrfsck
afterwards is recommended.

In these ways, we can protect our filesystem from panics caused by those
BUG_ONs.
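
The two strategies above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct demo_fs`, the `DEMO_*` constants and the `demo_*` functions are hypothetical stand-ins, not the kernel structures or the interface added by this patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not the real btrfs structures. */
#define DEMO_MS_RDONLY  (1UL << 0)
#define DEMO_FLAG_ERROR (1ULL << 2)

struct demo_fs {
	unsigned long s_flags;       /* mount flags, in the role of sb->s_flags */
	unsigned long long fs_state; /* runtime state, in the role of fs_state */
};

/* Strategy 1: a failure the caller can handle is simply returned. */
int demo_alloc_item(int simulate_enomem)
{
	return simulate_enomem ? -ENOMEM : 0;
}

/* Strategy 2: report the error, record it, and force the fs read-only. */
void demo_std_error(struct demo_fs *fs, const char *func, int err)
{
	assert(fs != NULL);
	if (!err)
		return;
	fprintf(stderr, "demo fs error in %s: error %d\n", func, -err);
	fs->fs_state |= DEMO_FLAG_ERROR;
	fs->s_flags |= DEMO_MS_RDONLY;
}

/* A call site that would previously have been BUG_ON(io_result). */
int demo_update(struct demo_fs *fs, int io_result)
{
	if (io_result) {
		demo_std_error(fs, __func__, io_result);
		return io_result;
	}
	return 0;
}
```

In this sketch, `demo_update(&fs, -EIO)` leaves the process alive, sets the error flag and flips the mock filesystem read-only, which is the behaviour the cover letter describes in place of a panic.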

We still need an incompat flag to make old kernels happy.

This patchset needs more testing.

v2->v3:
- btrfs may do log replay after a crash even when it is mounted readonly, and
  we have added a readonly check at start-transaction time, so the readonly
  flag needs to be cleared and restored around log replay.

v1->v2:
- to avoid a deadlock, move the write-super work from the error handling
  path to unmount time.
- remove BTRFS_SUPER_FLAG_VALID and use only BTRFS_SUPER_FLAG_ERROR to keep
  it simple.
- add the MS_RDONLY check at the start of a transaction instead of at commit
  time.
---
 fs/btrfs/ctree.h   |   19 ++
 fs/btrfs/disk-io.c |   56 +-
 fs/btrfs/super.c   |   88 
 fs/btrfs/transaction.c |3 ++
 4 files changed, 164 insertions(+), 2 deletions(-)


[RFC PATCH 3/5 v3] Btrfs: add readonly support for error handle

2010-12-03 Thread liubo
This patch provides a new error handling interface for the errors that are
currently handled by BUG_ONs.

To protect btrfs from panicking, when one of those BUG_ON errors occurs the
interface forces btrfs readonly and saves the FS state to disk. The filesystem
can then be umounted, although maybe with some warnings in the kernel dmesg.
After that, btrfsck is helpful for recovering btrfs.

v1->v2:
Move the write-super work from the error handling path to unmount in order to
avoid a deadlock.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h |8 +
 fs/btrfs/super.c |   88 ++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 92b5ca2..fc9b6a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2552,6 +2552,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char 
*buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)\
+do {   \
+   if ((errno))\
+   __btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 718b10d..07c58f9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -54,6 +54,94 @@
 
 static const struct super_operations btrfs_super_ops;
 
+static const char *btrfs_decode_error(struct btrfs_fs_info *fs_info, int errno,
+				      char nbuf[16])
+{
+	char *errstr = NULL;
+
+	switch (errno) {
+	case -EIO:
+		errstr = "IO failure";
+		break;
+	case -ENOMEM:
+		errstr = "Out of memory";
+		break;
+	case -EROFS:
+		errstr = "Readonly filesystem";
+		break;
+	default:
+		if (nbuf) {
+			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
+				errstr = nbuf;
+		}
+		break;
+	}
+
+	return errstr;
+}
+
+static void __save_error_info(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = &fs_info->super_copy;
+
+	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
+	disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR);
+
+	mutex_lock(&fs_info->trans_mutex);
+	memcpy(&fs_info->super_for_commit, disk_super,
+	       sizeof(fs_info->super_for_commit));
+	mutex_unlock(&fs_info->trans_mutex);
+}
+
+/* NOTE:
+ * We move write_super stuff at umount in order to avoid deadlock
+ * for umount hold all lock.
+ */
+static void save_error_info(struct btrfs_fs_info *fs_info)
+{
+   __save_error_info(fs_info);
+}
+
+/* btrfs handle error by forcing the filesystem readonly */
+static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
+{
+	struct super_block *sb = fs_info->sb;
+
+	if (sb->s_flags & MS_RDONLY)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		sb->s_flags |= MS_RDONLY;
+		printk(KERN_INFO "btrfs is forced readonly\n");
+	}
+}
+
+/*
+ * __btrfs_std_error decodes expected errors from the caller and
+ * invokes the approciate error response.
+ */
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		       unsigned int line, int errno)
+{
+	struct super_block *sb = fs_info->sb;
+	char nbuf[16];
+	const char *errstr;
+
+	/*
+	 * Special case: if the error is EROFS, and we're already
+	 * under MS_RDONLY, then it is safe here.
+	 */
+	if (errno == -EROFS && (sb->s_flags & MS_RDONLY))
+		return;
+
+	errstr = btrfs_decode_error(fs_info, errno, nbuf);
+	printk(KERN_CRIT "BTRFS error (device %s) in %s:%d: %s\n",
+		sb->s_id, function, line, errstr);
+	save_error_info(fs_info);
+
+	btrfs_handle_error(fs_info);
+}
+
 static void btrfs_put_super(struct super_block *sb)
 {
struct btrfs_root *root = btrfs_sb(sb);
-- 
1.7.1


[RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly

2010-12-03 Thread liubo
When the filesystem is readonly, avoid starting transactions by checking
MS_RDONLY at start-transaction time.
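
Returning an error from the start of a transaction only helps if callers check the result. The sketch below imitates the kernel's ERR_PTR/IS_ERR/PTR_ERR convention in userspace C. All `demo_*` names and `DEMO_*` constants are hypothetical, and the helpers are simplified imitations written for this example, not the definitions from the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095
#define DEMO_MS_RDONLY (1UL << 0)

/* Userspace imitations of the kernel's error-pointer helpers. */
void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
int demo_is_err(const void *p)
{
	/* Error codes live in the top DEMO_MAX_ERRNO values of the address space. */
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_trans { int id; };

/* Refuse to start a transaction on a read-only filesystem. */
struct demo_trans *demo_start_transaction(unsigned long s_flags)
{
	struct demo_trans *h;

	if (s_flags & DEMO_MS_RDONLY)
		return demo_err_ptr(-EROFS);
	h = malloc(sizeof(*h));
	if (!h)
		return demo_err_ptr(-ENOMEM);
	h->id = 1;
	return h;
}

/* A caller that checks the handle instead of assuming success. */
int demo_commit(unsigned long s_flags)
{
	struct demo_trans *trans = demo_start_transaction(s_flags);

	assert(trans != NULL); /* the helper returns a handle or an error pointer */
	if (demo_is_err(trans))
		return (int)demo_ptr_err(trans);
	free(trans);
	return 0;
}
```

With the read-only bit set, `demo_commit()` returns -EROFS instead of dereferencing an invalid handle, which is the kind of check every real caller would need once this patch is applied.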

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/transaction.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1fffbc0..14a597d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct 
btrfs_root *root,
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	int ret;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return ERR_PTR(-EROFS);
 again:
 	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
 	if (!h)
-- 
1.7.1


[RFC PATCH 1/5 v3] Btrfs: add filesystem state for error handle

2010-12-03 Thread liubo
Add a filesystem state and a flag to tell whether the filesystem is
valid or insane now.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8db9234..92b5ca2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int 
num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING	(1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
unsigned metadata_ratio;
 
void *bdev_holder;
+
+   /* filesystem state */
+   u64 fs_state;
 };
 
 /*
-- 
1.7.1


[RFC PATCH 4/5 v3] Btrfs: deal with filesystem state at mount, umount

2010-12-03 Thread liubo
Since there is a filesystem state, we should deal with it carefully at mount,
umount and remount.

- At mount, the FS state should be checked to see whether the FS has errors.
  If it does, btrfsck is recommended.
- At umount, the FS state should be saved to disk for consistency.
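
The umount-time decision can be sketched as a small pure function in userspace C. This is a simplified model under stated assumptions: `demo_umount_actions`, `enum demo_action` and the `DEMO_*` constants are hypothetical names for this example, and `commit_hits_error` merely simulates the commit path detecting an error and setting the error flag.

```c
#include <assert.h>

#define DEMO_MS_RDONLY  (1UL << 0)
#define DEMO_FLAG_ERROR (1ULL << 2)

enum demo_action {
	DEMO_NONE         = 0,
	DEMO_COMMIT_SUPER = 1 << 0, /* normal commit path */
	DEMO_WRITE_SUPER  = 1 << 1, /* direct superblock write */
};

/*
 * Decide what to do at umount:
 *  - not read-only: try the normal commit;
 *  - error flag set (before or during the commit): write the superblock
 *    directly so the error state reaches the disk.
 */
int demo_umount_actions(unsigned long s_flags, unsigned long long fs_state,
			int commit_hits_error)
{
	int actions = DEMO_NONE;

	assert(commit_hits_error == 0 || commit_hits_error == 1);

	if (!(s_flags & DEMO_MS_RDONLY)) {
		actions |= DEMO_COMMIT_SUPER;
		if (commit_hits_error)
			fs_state |= DEMO_FLAG_ERROR;
	}

	if (fs_state & DEMO_FLAG_ERROR)
		actions |= DEMO_WRITE_SUPER;

	return actions;
}
```

A healthy read-write filesystem only takes the commit path, a filesystem already forced read-only by an error only takes the direct write, and an error raised during the commit itself results in both.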

v2->v3:
Do the write-super work at umount time.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |   47 ++-
 1 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b40dfe4..15d795a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -43,6 +43,8 @@
 static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+   int read_only);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1700,6 +1702,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_iput;
 
+	/* check filesystem state */
+	fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
@@ -2405,10 +2412,17 @@ int btrfs_commit_super(struct btrfs_root *root)
 	up_write(&root->fs_info->cleanup_work_sem);
 
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	ret = btrfs_commit_transaction(trans, root);
 	BUG_ON(ret);
+
 	/* run commit again to drop the original snapshot */
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	btrfs_commit_transaction(trans, root);
 	ret = btrfs_write_and_wait_transaction(NULL, root);
 	BUG_ON(ret);
@@ -2426,8 +2440,28 @@ int close_ctree(struct btrfs_root *root)
 	smp_mb();
 
 	btrfs_put_block_group_cache(fs_info);
+
+	/*
+	 * Here come 2 situations when btrfs flips readonly:
+	 *
+	 * 1. when btrfs flips readonly somewhere else before
+	 * btrfs_commit_super, sb->s_flags has MS_RDONLY flag,
+	 * and btrfs will skip to write sb directly to keep
+	 * ERROR state on disk.
+	 *
+	 * 2. when btrfs flips readonly just in btrfs_commit_super,
+	 * and in such case, btrfs cannnot write sb via btrfs_commit_super,
+	 * and since fs_state has been set BTRFS_SUPER_FLAG_ERROR flag,
+	 * btrfs will directly write sb.
+	 */
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
-		ret =  btrfs_commit_super(root);
+		ret = btrfs_commit_super(root);
+		if (ret)
+			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
+	}
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		ret = write_ctree_super(NULL, root, 0);
 		if (ret)
 			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 	}
@@ -2603,6 +2637,17 @@ out:
 	return 0;
 }
 
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only)
+{
+	if (read_only)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+		printk(KERN_WARNING "warning: mount fs with errors, "
+		       "running btrfsck is recommended\n");
+}
+
 static struct extent_io_ops btree_extent_io_ops = {
 	.write_cache_pages_lock_hook = btree_lock_page_hook,
 	.readpage_end_io_hook = btree_readpage_end_io_hook,
-- 
1.7.1


[RFC PATCH 5/5 v3] Btrfs: avoid log replay when btrfs is insane

2010-12-03 Thread liubo
Btrfs may do log replay even when it is mounted readonly. Since we have added
a readonly check at start-transaction time, the readonly flag needs to be
cleared before log replay and restored afterwards in order to keep the
original mount attribute.
However, we do not permit log replay when btrfs is insane; log replay can
start only once btrfs is mounted in a good state.
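
The save/clear/restore pattern around log replay can be sketched in userspace C as follows. `struct demo_sb`, `demo_replay` and `demo_mount_replay` are hypothetical names for this illustration; the replay routine only models the fact that it needs to start transactions, which the new check refuses on a read-only filesystem.

```c
#include <assert.h>

#define DEMO_MS_RDONLY  (1UL << 0)
#define DEMO_FLAG_ERROR (1ULL << 2)

struct demo_sb {
	unsigned long s_flags; /* mount flags, in the role of sb->s_flags */
	int replayed;          /* set once the mock replay has run */
};

/* Mock replay: fails if the filesystem is still marked read-only. */
int demo_replay(struct demo_sb *sb)
{
	if (sb->s_flags & DEMO_MS_RDONLY)
		return -1; /* stands in for -EROFS from a transaction start */
	sb->replayed = 1;
	return 0;
}

/* Skip replay on an errored fs; otherwise lift read-only just for the replay. */
int demo_mount_replay(struct demo_sb *sb, unsigned long long fs_state,
		      int has_log)
{
	unsigned long saved;
	int ret;

	assert(sb != 0);
	if (!has_log || (fs_state & DEMO_FLAG_ERROR))
		return 0;

	saved = sb->s_flags;             /* remember the original flags */
	sb->s_flags &= ~DEMO_MS_RDONLY;  /* allow transactions during replay */
	ret = demo_replay(sb);
	sb->s_flags = saved;             /* restore the original flags */
	return ret;
}
```

After `demo_mount_replay()` returns, a filesystem mounted read-only is read-only again, and a filesystem carrying the error flag is never replayed.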

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 15d795a..727e156 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1937,9 +1937,14 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		btrfs_set_opt(fs_info->mount_opt, SSD);
 	}
 
-	if (btrfs_super_log_root(disk_super) != 0) {
+	if (btrfs_super_log_root(disk_super) != 0 &&
+	    !(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
 		u64 bytenr = btrfs_super_log_root(disk_super);
 
+		unsigned int s_flags = sb->s_flags;
+		if (s_flags & MS_RDONLY)
+			sb->s_flags &= ~MS_RDONLY;
+
 		if (fs_devices->rw_devices == 0) {
 			printk(KERN_WARNING "Btrfs log replay required "
 			       "on RO media\n");
@@ -1969,6 +1974,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 			ret =  btrfs_commit_super(tree_root);
 			BUG_ON(ret);
 		}
+
+		sb->s_flags = s_flags;
 	}
 
ret = btrfs_find_orphan_roots(tree_root);
-- 
1.7.1


[RFC PATCH 0/4 v2] Btrfs: Add readonly support to replace BUG_ON phrase

2010-12-01 Thread liubo
Btrfs has a number of BUG_ON()s, which can lead btrfs to an unpleasant panic.
They are also very ugly and should be handled more appropriately.

There are mainly two ways to deal with these BUG_ON()s.

1. For errors that callers can handle well, we just return the error number
to the caller.

2. For the others, we can force the filesystem readonly when it hits an error,
which is what this patchset does. With BUG_ON() replaced by the interface
provided in this patchset, we get the error information via dmesg. Since btrfs
is then readonly, we can save our data safely and umount it; running btrfsck
afterwards is recommended.

In these ways, we can protect our filesystem from panics caused by those
BUG_ONs.

We still need an incompat flag to make old kernels happy.

v1->v2:
- to avoid a deadlock, move the write-super work from the error handling
  path to umount time.
- remove BTRFS_SUPER_FLAG_VALID and use only BTRFS_SUPER_FLAG_ERROR to keep
  it simple.
- add the MS_RDONLY check at the start of a transaction instead of at commit
  time.

---
 fs/btrfs/ctree.h   |   19 ++
 fs/btrfs/disk-io.c |   47 +-
 fs/btrfs/super.c   |   88 
 fs/btrfs/transaction.c |3 ++
 4 files changed, 156 insertions(+), 1 deletions(-)


[RFC PATCH 1/4 v2] Btrfs: add filesystem state for error handle

2010-12-01 Thread liubo
Add a filesystem state and a flag to tell whether the filesystem is
valid or insane now.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8db9234..92b5ca2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int 
num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING	(1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
unsigned metadata_ratio;
 
void *bdev_holder;
+
+   /* filesystem state */
+   u64 fs_state;
 };
 
 /*
-- 
1.7.1



[RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly

2010-12-01 Thread liubo
When the filesystem is readonly, avoid starting transactions by checking
MS_RDONLY at start-transaction time.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/transaction.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1fffbc0..14a597d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct 
btrfs_root *root,
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	int ret;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return ERR_PTR(-EROFS);
 again:
 	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
 	if (!h)
-- 
1.7.1


[RFC PATCH 3/4 v2] Btrfs: add readonly support for error handle

2010-12-01 Thread liubo
This patch provides a new error handling interface for the errors that are
currently handled by BUG_ONs.

To protect btrfs from panicking, when one of those BUG_ON errors occurs the
interface forces btrfs readonly and saves the FS state to disk. The filesystem
can then be umounted, although maybe with some warnings in the kernel dmesg.
After that, btrfsck is helpful for recovering btrfs.

v1->v2:
Move the write-super work from the error handling path to unmount in order to
avoid a deadlock.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h |8 +
 fs/btrfs/super.c |   88 ++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 92b5ca2..fc9b6a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2552,6 +2552,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char 
*buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)\
+do {   \
+   if ((errno))\
+   __btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 718b10d..07c58f9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -54,6 +54,94 @@
 
 static const struct super_operations btrfs_super_ops;
 
+static const char *btrfs_decode_error(struct btrfs_fs_info *fs_info, int errno,
+				      char nbuf[16])
+{
+	char *errstr = NULL;
+
+	switch (errno) {
+	case -EIO:
+		errstr = "IO failure";
+		break;
+	case -ENOMEM:
+		errstr = "Out of memory";
+		break;
+	case -EROFS:
+		errstr = "Readonly filesystem";
+		break;
+	default:
+		if (nbuf) {
+			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
+				errstr = nbuf;
+		}
+		break;
+	}
+
+	return errstr;
+}
+
+static void __save_error_info(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = &fs_info->super_copy;
+
+	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
+	disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR);
+
+	mutex_lock(&fs_info->trans_mutex);
+	memcpy(&fs_info->super_for_commit, disk_super,
+	       sizeof(fs_info->super_for_commit));
+	mutex_unlock(&fs_info->trans_mutex);
+}
+
+/* NOTE:
+ * We move write_super stuff at umount in order to avoid deadlock
+ * for umount hold all lock.
+ */
+static void save_error_info(struct btrfs_fs_info *fs_info)
+{
+   __save_error_info(fs_info);
+}
+
+/* btrfs handle error by forcing the filesystem readonly */
+static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
+{
+	struct super_block *sb = fs_info->sb;
+
+	if (sb->s_flags & MS_RDONLY)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		sb->s_flags |= MS_RDONLY;
+		printk(KERN_INFO "btrfs is forced readonly\n");
+	}
+}
+
+/*
+ * __btrfs_std_error decodes expected errors from the caller and
+ * invokes the approciate error response.
+ */
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		       unsigned int line, int errno)
+{
+	struct super_block *sb = fs_info->sb;
+	char nbuf[16];
+	const char *errstr;
+
+	/*
+	 * Special case: if the error is EROFS, and we're already
+	 * under MS_RDONLY, then it is safe here.
+	 */
+	if (errno == -EROFS && (sb->s_flags & MS_RDONLY))
+		return;
+
+	errstr = btrfs_decode_error(fs_info, errno, nbuf);
+	printk(KERN_CRIT "BTRFS error (device %s) in %s:%d: %s\n",
+		sb->s_id, function, line, errstr);
+	save_error_info(fs_info);
+
+	btrfs_handle_error(fs_info);
+}
+
 static void btrfs_put_super(struct super_block *sb)
 {
struct btrfs_root *root = btrfs_sb(sb);
-- 
1.7.1


[RFC PATCH 4/4 v2] Btrfs: deal with filesystem state at mount, umount

2010-12-01 Thread liubo
Since there is a filesystem state, we should deal with it carefully at mount,
umount and remount.

- At mount, the FS state should be checked to see whether the FS has errors.
  If it does, btrfsck is recommended.
- At umount, the FS state should be saved to disk for consistency.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c |   47 ++-
 1 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b40dfe4..663d360 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -43,6 +43,8 @@
 static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+int read_only);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1700,6 +1702,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_iput;
 
+	/* check filesystem state */
+	fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
@@ -2405,10 +2412,17 @@ int btrfs_commit_super(struct btrfs_root *root)
 	up_write(&root->fs_info->cleanup_work_sem);
 
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	ret = btrfs_commit_transaction(trans, root);
 	BUG_ON(ret);
+
 	/* run commit again to drop the original snapshot */
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	btrfs_commit_transaction(trans, root);
 	ret = btrfs_write_and_wait_transaction(NULL, root);
 	BUG_ON(ret);
@@ -2426,8 +2440,28 @@ int close_ctree(struct btrfs_root *root)
 	smp_mb();
 
 	btrfs_put_block_group_cache(fs_info);
+
+	/*
+	 * Here come 2 situations when btrfs flips readonly:
+	 *
+	 * 1. when btrfs flips readonly somewhere else before
+	 * btrfs_commit_super, sb->s_flags has MS_RDONLY flag,
+	 * and btrfs will skip to write sb directly to keep
+	 * ERROR state on disk.
+	 *
+	 * 2. when btrfs flips readonly just in btrfs_commit_super,
+	 * and in such case, btrfs cannnot write sb via btrfs_commit_super,
+	 * and since fs_state has been set BTRFS_SUPER_FLAG_ERROR flag,
+	 * btrfs will directly write sb.
+	 */
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
-		ret =  btrfs_commit_super(root);
+		ret = btrfs_commit_super(root);
+		if (ret)
+			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
+	}
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		ret = write_ctree_super(NULL, root, 0);
 		if (ret)
 			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 	}
@@ -2603,6 +2637,17 @@ out:
 	return 0;
 }
 
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only)
+{
+	if (read_only)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+		printk(KERN_WARNING "warning: mount fs with errors, "
+		       "running btfsck is recommended\n");
+}
+
 static struct extent_io_ops btree_extent_io_ops = {
 	.write_cache_pages_lock_hook = btree_lock_page_hook,
 	.readpage_end_io_hook = btree_readpage_end_io_hook,
-- 
1.7.1


Re: [RFC PATCH 4/4 v2] Btrfs: deal with filesystem state at mount, umount

2010-12-01 Thread liubo
On 12/02/2010 10:29 AM, Tsutomu Itoh wrote:
 Hi,
 
 I found 1 typo.
 
 (2010/12/01 19:21), liubo wrote:
 Since there is a filesystem state, we should deal with it carefully at mount,
  umount and remount.

 - At mount, the FS state should be checked if there is error on these FS.
   If it does have, btrfsck is recommended.
 - At umount, the FS state should be saved into disk for consistency.

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/disk-io.c |   47 ++-
  1 files changed, 46 insertions(+), 1 deletions(-)

 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index b40dfe4..663d360 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -43,6 +43,8 @@
  static struct extent_io_ops btree_extent_io_ops;
  static void end_workqueue_fn(struct btrfs_work *work);
  static void free_fs_root(struct btrfs_root *root);
 +static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 + int read_only);
  
  /*
   * end_io_wq structs are used to do processing in task context when an IO is
 @@ -1700,6 +1702,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
  if (!btrfs_super_root(disk_super))
  goto fail_iput;
  
 +/* check filesystem state */
 +fs_info-fs_state |= btrfs_super_flags(disk_super);
 +
 +btrfs_check_super_valid(fs_info, sb-s_flags  MS_RDONLY);
 +
  ret = btrfs_parse_options(tree_root, options);
  if (ret) {
  err = ret;
 @@ -2405,10 +2412,17 @@ int btrfs_commit_super(struct btrfs_root *root)
  up_write(root-fs_info-cleanup_work_sem);
  
  trans = btrfs_join_transaction(root, 1);
 +if (IS_ERR(trans))
 +return PTR_ERR(trans);
 +
  ret = btrfs_commit_transaction(trans, root);
  BUG_ON(ret);
 +
  /* run commit again to drop the original snapshot */
  trans = btrfs_join_transaction(root, 1);
 +if (IS_ERR(trans))
 +return PTR_ERR(trans);
 +
  btrfs_commit_transaction(trans, root);
  ret = btrfs_write_and_wait_transaction(NULL, root);
  BUG_ON(ret);
 @@ -2426,8 +2440,28 @@ int close_ctree(struct btrfs_root *root)
  smp_mb();
  
  btrfs_put_block_group_cache(fs_info);
 +
 +/*
 + * Here come 2 situations when btrfs flips readonly:
 + *
 + * 1. when btrfs flips readonly somewhere else before
 + * btrfs_commit_super, sb-s_flags has MS_RDONLY flag,
 + * and btrfs will skip to write sb directly to keep
 + * ERROR state on disk.
 + *
 + * 2. when btrfs flips readonly just in btrfs_commit_super,
 + * and in such case, btrfs cannnot write sb via btrfs_commit_super,
 + * and since fs_state has been set BTRFS_SUPER_FLAG_ERROR flag,
 + * btrfs will directly write sb.
 + */
  if (!(fs_info-sb-s_flags  MS_RDONLY)) {
 -ret =  btrfs_commit_super(root);
 +ret = btrfs_commit_super(root);
 +if (ret)
 +printk(KERN_ERR btrfs: commit super ret %d\n, ret);
 +}
 +
 +if (fs_info-fs_state  BTRFS_SUPER_FLAG_ERROR) {
 +ret = write_ctree_super(NULL, root, 0);
  if (ret)
  printk(KERN_ERR btrfs: commit super ret %d\n, ret);
  }
 @@ -2603,6 +2637,17 @@ out:
  return 0;
  }
  
 +static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 +  int read_only)
 +{
 +if (read_only)
 +return;
 +
 +if (fs_info-fs_state  BTRFS_SUPER_FLAG_ERROR)
 +printk(KERN_WARNING warning: mount fs with errors, 
 +   running btfsck is recommended\n);
 
 btfsck -> btrfsck

ahh, my fault, sorry for my carelessness.

Thanks a lot for reviewing.

thanks,
Liu Bo

 
 +}
 +
  static struct extent_io_ops btree_extent_io_ops = {
  .write_cache_pages_lock_hook = btree_lock_page_hook,
  .readpage_end_io_hook = btree_readpage_end_io_hook,
 
 Regards,
 Itoh
 
 



Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly

2010-12-01 Thread liubo
On 12/01/2010 06:20 PM, liubo wrote:
 When the filesystem is readonly, avoid transaction stuff by checking 
 MS_RDONLY at 
 start transaction time.
 

This patch may cause btrfs to panic.

Since btrfs allows transactions under a readonly fs state, which is a bit
weird, btrfs does not even check the transaction returned from
start_transaction, although it may return -ENOMEM.

With this patch, if btrfs flips readonly or is mounted readonly, starting a
transaction will return -EROFS. So we need to check the returned transaction
more carefully, rather than just leaving it alone.

thanks,
Liu Bo

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
  fs/btrfs/transaction.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 1fffbc0..14a597d 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -181,6 +181,9 @@ static struct btrfs_trans_handle 
 *start_transaction(struct btrfs_root *root,
   struct btrfs_trans_handle *h;
   struct btrfs_transaction *cur_trans;
   int ret;
 +
 + if (root-fs_info-sb-s_flags  MS_RDONLY)
 + return ERR_PTR(-EROFS);
  again:
   h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
   if (!h)



Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly

2010-12-01 Thread liubo
On 12/02/2010 12:28 PM, Yan, Zheng wrote:
 On Thu, Dec 2, 2010 at 11:42 AM, liubo liubo2...@cn.fujitsu.com wrote:
 On 12/01/2010 06:20 PM, liubo wrote:
 When the filesystem is readonly, avoid transaction stuff by checking 
 MS_RDONLY at
 start transaction time.

 This patch may lead btrfs panic.

 Since btrfs allows transaction under readonly fs state, which is a bit 
 weird, btrfs
 does not even check the returned transaction from start_transaction, 
 although it may
 return -ENOMEM.
 
 btrfs may do log replay even mount as readonly.

Yeah, that is right.

Log replay may take place when btrfs is mounted readonly, but after the FS is
broken, is btrfs still willing to do log replay in that case?

thanks,
Liu Bo

 



Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly

2010-12-01 Thread liubo
On 12/02/2010 01:41 PM, Mike Fedyk wrote:
 On Wed, Dec 1, 2010 at 8:28 PM, Yan, Zheng yanzh...@21cn.com wrote:
 On Thu, Dec 2, 2010 at 11:42 AM, liubo liubo2...@cn.fujitsu.com wrote:
 On 12/01/2010 06:20 PM, liubo wrote:
 When the filesystem is readonly, avoid transaction stuff by checking 
 MS_RDONLY at
 start transaction time.

 This patch may lead btrfs panic.

 Since btrfs allows transaction under readonly fs state, which is a bit 
 weird, btrfs
 does not even check the returned transaction from start_transaction, 
 although it may
 return -ENOMEM.
 btrfs may do log replay even mount as readonly.

 
 What part is logged besides tree roots and/or superblocks?

The log tree is used for log replay after a crash and for fast fsync and
O_SYNC; it logs inodes.

 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase

2010-11-29 Thread liubo
On 11/30/2010 04:10 AM, Josef Bacik wrote:
 On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote:
 Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic.
 Meanwhile, they are very ugly and should be handled more appropriately.

 There are mainly two ways to deal with these BUG_ON()s.

 1. For those errors which can be handled well by callers, we just return
 their error number to the callers.

 2. For others, we can force the filesystem readonly when it hits errors,
 which is what this patchset has done. With BUG_ON() replaced by the
 interface provided in this patchset, we will get error information via
 dmesg. Since btrfs is now readonly, we can save our data safely and umount
 it, then a btrfsck is recommended.

 In these ways, we can protect our filesystem from panics caused by those
 BUG_ONs.

 ---
  fs/btrfs/ctree.h       |   21 ++
  fs/btrfs/disk-io.c     |   23 +++
  fs/btrfs/super.c       |  100 ++-
  fs/btrfs/transaction.c |    7 +++
  4 files changed, 148 insertions(+), 3 deletions(-)

 
 Overall seems sane, but what about kernels that don't make these checks?
 I'm ok with "well, sucks for them" as an answer, just want to make sure
 we've at least thought about it.

You mean the code that does not check return values?

IMO, if the code really needs a return-value check, we should handle it
seriously, or else just leave it alone. And this is a step-by-step job.

 
 Also I'm not sure marking the fs as broken is the right move here.  Ext3/4
 don't do this, they just mount read-only, as long as you can still unmount
 the filesystem everything comes out ok.  Think of the case where we just get
 a spurious EIO, the fs should be fine the next time around, there's no
 reason to force the user to run fsck in this case.
 

Yes, I agree on this.
For a spurious EIO, it mainly depends on the coder; returning the errno to the
caller may be enough to avoid fsck.

While I'm working on this readonly stuff, I have found it difficult to solve
the potential deadlock when we write the super block to disk.
Since btrfs supports multiple devices, we must take the device lock
device_list_mutex before writing the super block, and this has puzzled me a
lot.

BTW, I've tried another way to avoid the deadlock. I moved the write-super
step into umount, which frees us from the deadlock. However, while testing
this, it seems that umount cannot work due to an ext3/4 jbd oops. I'm digging
into this oops...

So, any ideas about how to avoid the deadlock?

 Thanks,
 
 Josef
 



Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase

2010-11-29 Thread liubo
On 11/30/2010 10:30 AM, Josef Bacik wrote:
 On Tue, Nov 30, 2010 at 10:03:58AM +0800, liubo wrote:
 On 11/30/2010 04:10 AM, Josef Bacik wrote:
 On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote:
 Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic.
 Meanwhile, they are very ugly and should be handled more appropriately.

 There are mainly two ways to deal with these BUG_ON()s.

 1. For those errors which can be handled well by callers, we just return
 their error number to the callers.

 2. For others, we can force the filesystem readonly when it hits errors,
 which is what this patchset has done. With BUG_ON() replaced by the
 interface provided in this patchset, we will get error information via
 dmesg. Since btrfs is now readonly, we can save our data safely and umount
 it, then a btrfsck is recommended.

 In these ways, we can protect our filesystem from panics caused by those
 BUG_ONs.

 ---
  fs/btrfs/ctree.h       |   21 ++
  fs/btrfs/disk-io.c     |   23 +++
  fs/btrfs/super.c       |  100 ++-
  fs/btrfs/transaction.c |    7 +++
  4 files changed, 148 insertions(+), 3 deletions(-)

 Overall seems sane, but what about kernels that don't make these checks?
 I'm ok with "well, sucks for them" as an answer, just want to make sure
 we've at least thought about it.
 You mean the code that does not check return values?

 IMO, if the code really needs a return-value check, we should handle it
 seriously, or else just leave it alone. And this is a step-by-step job.

 
 Sorry, I mean for older kernels that don't know about these "hey, your fs is
 screwed" flags.  It seems like they'll just get ignored, are we sure that's
 what we want to happen?  I'm fine with that, but if we don't want that to
 happen it may be good to have an incompat flag.
 

Ohh, got it, thanks for pointing it out. Will do it later.
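
The incompat-flag suggestion above can be sketched as follows: a kernel refuses
to mount when the superblock carries an incompat bit it does not know, so an
older kernel cannot silently ignore a new flag. This is a minimal model under
stated assumptions: the bit values, the FEAT_INCOMPAT_* names and
check_incompat() are hypothetical, and only the masking idea is taken from how
incompat feature bits are generally checked at mount time.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define FEAT_INCOMPAT_KNOWN_A   (1ULL << 0)
#define FEAT_INCOMPAT_FS_ERROR  (1ULL << 1)	/* assumed new "fs hit an error" bit */

/* What an older kernel would understand: it predates the new bit. */
#define OLD_KERNEL_SUPP  (FEAT_INCOMPAT_KNOWN_A)
/* What a patched kernel would understand. */
#define NEW_KERNEL_SUPP  (FEAT_INCOMPAT_KNOWN_A | FEAT_INCOMPAT_FS_ERROR)

/* Refuse the mount if the superblock carries any incompat bit we do not know. */
static int check_incompat(uint64_t sb_incompat_flags, uint64_t supported)
{
	uint64_t unknown = sb_incompat_flags & ~supported;

	return unknown ? -EINVAL : 0;
}
```

With such a bit set, check_incompat() fails for the old mask and succeeds for
the new one, which is the behaviour being asked for.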

 Also I'm not sure marking the fs as broken is the right move here.  Ext3/4
 don't do this, they just mount read-only, as long as you can still unmount
 the filesystem everything comes out ok.  Think of the case where we just get
 a spurious EIO, the fs should be fine the next time around, there's no
 reason to force the user to run fsck in this case.

 Yes, I agree on this.
 For a spurious EIO, it mainly depends on the coder; returning the errno to
 the caller may be enough to avoid fsck.

 
 Right, I'm worried about the flipping read only stuff being kicked by EIO,
 which happens with ext* and could happen with btrfs in the right cases.  I'm
 not saying that's wrong, it's what should happen, I'm just saying we need to
 be able to unmount the filesystem and mount it back up without needing to run
 an fsck in between.
 

Hm, this really makes sense.

Since it is difficult to tell whether a corruption is real or fake, what about
just implementing the readonly stuff like this for now and making it more
friendly to EIO in the future?

 While I'm working on this readonly stuff, I have found it difficult to solve
 the potential deadlock when we write the super block to disk.
 Since btrfs supports multiple devices, we must take the device lock
 device_list_mutex before writing the super block, and this has puzzled me a
 lot.

 BTW, I've tried another way to avoid the deadlock. I moved the write-super
 step into umount, which frees us from the deadlock. However, while testing
 this, it seems that umount cannot work due to an ext3/4 jbd oops. I'm digging
 into this oops...

 So, any ideas about how to avoid the deadlock?

 
 None :).  The best thing I can think of is do like we're doing with the read
 only stuff and only write out the super right before we flip read only, and
 then make umount make sure that if we're mounted read only to not do
 anything.
 
 Truth be told I hate this "mark the fs as broken" idea.  We don't know if the
 error we got means the filesystem is broken (for example the EIO case).  If we
 do hit actual corruption maybe it would be good, and in that case we should
 write out the super at the point we find that corruption and then flip read
 only and have that be the only time we have to worry about writing out the
 super.
 
 So I guess that's 2 options
 
 1) Ditch the "the fs is broken" flag.  This makes things nice and simple since
 on-disk is already consistent, all we have to do is drop anything that's dirty
 and we're home free.
 
 2) Keep the flag, but only worry about writing it out on a case by case basis.
 So we have a btrfs_corrupt_fs() function that writes out the super with the
 appropriate flag, and then flips the fs read only.  Then we don't have to do
 anything special in the common paths, just the normal "hey, is this fs read
 only?" things, so for all other cases we can just flip the fs read only and
 everything works.
 

Option 2) is what I have just done. :)
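
Option 2) as described in the quoted text can be sketched like this: only real
corruption writes the super block (with a flag) before flipping read-only,
while any other error just flips read-only and leaves the disk untouched. This
is a userspace model, not the patchset's code. SUPER_FLAG_ERROR, super_model,
fs_state and the three functions are hypothetical names, and the struct
assignment merely stands in for the real super block write.

```c
#include <stdint.h>

/* Hypothetical flag bit and types; the real patchset may differ. */
#define SUPER_FLAG_ERROR (1ULL << 0)

struct super_model { uint64_t flags; };
struct fs_state {
	struct super_model on_disk;	/* what has reached the disk */
	struct super_model in_memory;	/* the copy being modified */
	int read_only;
};

/* Plain error such as a spurious EIO: flip read-only, leave the disk alone. */
static void fs_error_readonly(struct fs_state *fs)
{
	fs->read_only = 1;
}

/* Real corruption: record the flag, write the super once, then flip read-only. */
static void fs_corrupt(struct fs_state *fs)
{
	if (fs->read_only)
		return;			/* never write once read-only */
	fs->in_memory.flags |= SUPER_FLAG_ERROR;
	fs->on_disk = fs->in_memory;	/* stands in for the super block write */
	fs->read_only = 1;
}

/* Unmount does no super block write when the fs is already read-only. */
static int fs_umount_writes_super(const struct fs_state *fs)
{
	return !fs->read_only;
}
```

In this model the common paths only ever test read_only, and the super block
write happens in exactly one place, which is the point of option 2).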

 I hope that makes sense, if not feel free to ignore me and just keep doing
 what you've been doing :).  Thanks,
 

These suggestions are very helpful.

Thanks,
Liu Bo

 Josef