Re: [PATCH] Btrfs: use do_div to avoid compile errors on 32bit box
On 08/19/2011 09:22 PM, Josef Bacik wrote:
> On Fri, Aug 19, 2011 at 05:48:44PM +0800, Liu Bo wrote:
>> When dividing a u64 value, we need to be careful and use do_div() to avoid a compile error on 32-bit boxes:
>>
>> ERROR: __udivdi3 [fs/btrfs/btrfs.ko] undefined!
>> make[1]: *** [__modpost] Error 1
>>
>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>
> Chris just left for vacation, can you send this to Linus/lkml so it gets pulled in. Thanks,

Already done.

thanks,
liubo

> Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use do_div to avoid compile errors on 32bit box
On 08/20/2011 09:34 AM, Liu Bo wrote:
> When dividing a u64 value, we need to be careful and use do_div() to avoid a compile error on 32-bit boxes:
>
> ERROR: __udivdi3 [fs/btrfs/btrfs.ko] undefined!
> make[1]: *** [__modpost] Error 1

Sorry, guys, I just sent the wrong version. Please ignore this one. I'm sorry.

thanks,
liubo

> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
> ---
>  fs/btrfs/extent-tree.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 80d6148..9b495ce 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -6796,14 +6796,14 @@ int btrfs_can_relocate(struct btrfs_root *root, u64 bytenr)
>  	index = get_block_group_index(block_group);
>  	if (index == 0) {
>  		dev_min = 4;
> -		min_free /= 2;
> +		do_div(min_free, 2);
>  	} else if (index == 1) {
>  		dev_min = 2;
>  	} else if (index == 2) {
>  		min_free *= 2;
>  	} else if (index == 3) {
>  		dev_min = fs_devices->rw_devices;
> -		min_free /= dev_min;
> +		do_div(min_free, dev_min);
>  	}
>  
>  	mutex_lock(&root->fs_info->chunk_mutex);
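On 32-bit targets gcc lowers a `u64 / u32` division to a call to the libgcc helper `__udivdi3`, which the kernel does not link against; that is the undefined symbol modpost reports above. `do_div(n, base)` instead divides the 64-bit lvalue `n` in place by a 32-bit `base` and evaluates to the remainder. The sketch below mimics that contract in userspace with a hypothetical `demo_do_div()` (a plain function taking a pointer, not the kernel macro) and replays the branches the patch touches. In userspace the compiler may still call libgcc, so this only illustrates the semantics, not the link-time fix.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's do_div(n, base).
 * The real do_div() is an arch-specific macro that updates its u64
 * lvalue in place; passing a pointer gives the same effect in plain C.
 * Returns the remainder and leaves the quotient in *n.
 */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/*
 * Mirrors the branches of btrfs_can_relocate() changed by the patch;
 * index and rw_devices are plain parameters here, not btrfs state.
 */
static uint64_t demo_min_free(uint64_t min_free, int index,
			      uint32_t rw_devices)
{
	if (index == 0)
		demo_do_div(&min_free, 2);          /* was: min_free /= 2 */
	else if (index == 2)
		min_free *= 2;
	else if (index == 3)
		demo_do_div(&min_free, rw_devices); /* was: min_free /= dev_min */
	return min_free;
}
```

Note that the quotient comes back through the lvalue, not the return value: writing `min_free = do_div(min_free, 2)` would store the remainder, a common mistake when converting plain divisions.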
Re: [PATCH] Btrfs: fix an oops of log replay
On 08/08/2011 11:13 PM, Andy Lutomirski wrote:
> On 08/06/2011 04:35 AM, Liu Bo wrote:
>> When btrfs recovers from a crash, it may hit the oops below:
>>
>> [ cut here ]
>> kernel BUG at fs/btrfs/inode.c:4580!
>> [...]
>> RIP: 0010:[a03df251] [a03df251] btrfs_add_link+0x161/0x1c0 [btrfs]
>> [...]
>> Call Trace:
>>  [a03e7b31] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
>>  [a04054e9] add_inode_ref+0x319/0x3f0 [btrfs]
>>  [a0407087] replay_one_buffer+0x2c7/0x390 [btrfs]
>>  [a040444a] walk_down_log_tree+0x32a/0x480 [btrfs]
>>  [a0404695] walk_log_tree+0xf5/0x240 [btrfs]
>>  [a0406cc0] btrfs_recover_log_trees+0x250/0x350 [btrfs]
>>  [a0406dc0] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
>>  [a03d18b2] open_ctree+0x1442/0x17d0 [btrfs]
>> [...]
>>
>> This happens because, while replaying an inode ref item, we forget to check for old conflicting DIR_ITEM and DIR_INDEX items in the fs/file tree, and then we run into conflicting corner cases that lead to BUG_ON().
>>
>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>> ---
>>  fs/btrfs/tree-log.c |   28
>>  1 files changed, 24 insertions(+), 4 deletions(-)
>
> This fixes the oops for me. The bug was a regression in 2.6.39, I believe.
>
> Tested-by: Andy Lutomirski <l...@mit.edu>

Thanks a lot for testing!

thanks,
liubo

> --Andy
Re: [PATCH] Btrfs: skip looking for delalloc if we don't have ->fill_delalloc
On 08/02/2011 12:11 AM, Josef Bacik wrote:
> We always look for delalloc bytes in our io_tree so we can fill in delalloc. This is fine in most cases, but if we're writing out the btree_inode this is just a superfluous tree search on the io_tree, and if we have a lot of metadata dirty this could be an expensive check. So instead check to see if our io_tree has a ->fill_delalloc op, and if not don't even bother doing the lookup. Thanks,
>
> Signed-off-by: Josef Bacik <jo...@redhat.com>
> ---

With the patch applied, running

mkfs.btrfs /dev/sda15
mount /dev/sda15 /mnt/btrfs
dd if=/dev/zero of=/mnt/btrfs/tmp bs=1G

triggers the following bug:

Btrfs loaded
device fsid 91d23288-d352-4346-979f-d6f93cac04a3 devid 1 transid 7 /dev/sda15
[ cut here ]
kernel BUG at fs/btrfs/inode.c:1583!
...
Call Trace:
 [a05b00d8] worker_loop+0x138/0x510 [btrfs]
 [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 [a05affa0] ? btrfs_queue_worker+0x2d0/0x2d0 [btrfs]
 [81074f06] kthread+0x96/0xa0
 [81467bf4] kernel_thread_helper+0x4/0x10
 [81074e70] ? kthread_worker_fn+0x1a0/0x1a0
 [81467bf0] ? gs_change+0xb/0xb
Code: e0 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8b 7d b8 48 8d 4d c8 41 b8 50 00 00 00 4c 89 fa 4c 89 e6 e8 19 cf 01 00 eb bd 0f 0b eb fe 48 89 df e8 1b 48 b6 e0 eb 9d 66 0f 1f 84 00 00 00
RIP  [a0587f59] btrfs_writepage_fixup_worker+0x139/0x150 [btrfs]
 RSP 88000887bdd0
---[ end trace 5089b598ce74fcfc ]---

thanks,
liubo
Re: [PATCH] Btrfs: skip looking for delalloc if we don't have ->fill_delalloc
On 08/02/2011 09:32 AM, liubo wrote:
> On 08/02/2011 12:11 AM, Josef Bacik wrote:
>> We always look for delalloc bytes in our io_tree so we can fill in delalloc. [...] So instead check to see if our io_tree has a ->fill_delalloc op, and if not don't even bother doing the lookup. Thanks,
>>
>> Signed-off-by: Josef Bacik <jo...@redhat.com>
>> ---

Sorry, I mixed this patch up with others... The patch is OK.

> With the patch, [...] it triggers the following bug:
> [ cut here ]
> kernel BUG at fs/btrfs/inode.c:1583!
> [...]
> ---[ end trace 5089b598ce74fcfc ]---
>
> thanks,
> liubo
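Josef's change follows a common kernel idiom: test an optional operation pointer before doing work whose only consumer is that operation. A minimal sketch with hypothetical structures and names (`demo_io_tree`, `demo_writepage()` and friends), not the actual extent_io.c code, showing that a tree without `->fill_delalloc` never pays for the search:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_io_tree;

/* Hypothetical ops table: only trees backing file data set fill_delalloc. */
struct demo_io_tree_ops {
	int (*fill_delalloc)(struct demo_io_tree *tree);
};

struct demo_io_tree {
	const struct demo_io_tree_ops *ops;
	int lookups;    /* how many (expensive) delalloc searches ran */
};

/* Stand-in for the costly search of the io_tree for delalloc ranges. */
static bool demo_find_delalloc(struct demo_io_tree *tree)
{
	tree->lookups++;
	return true;    /* pretend a delalloc range was found */
}

static int demo_fill_ok(struct demo_io_tree *tree)
{
	(void)tree;
	return 0;
}

static const struct demo_io_tree_ops demo_data_ops = {
	.fill_delalloc = demo_fill_ok,
};

/* Returns 1 if a delalloc range was filled, 0 otherwise. */
static int demo_writepage(struct demo_io_tree *tree)
{
	assert(tree != NULL);
	/* No ->fill_delalloc (e.g. the btree inode): skip the search entirely. */
	if (!tree->ops || !tree->ops->fill_delalloc)
		return 0;
	if (!demo_find_delalloc(tree))
		return 0;
	return tree->ops->fill_delalloc(tree) == 0;
}
```

A tree with `ops == NULL` (standing in for the btree inode) returns immediately with `lookups` still 0, while a data tree performs exactly one search.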
Re: [PATCH] Btrfs: don't be as agressive with delalloc metadata reservations
On 07/16/2011 02:29 AM, Josef Bacik wrote:
> Currently we reserve enough space to COW an entirely full btree for every extent we have reserved for an inode. This _sucks_, because you only need to COW once, and then everybody else is ok. Unfortunately we don't know we'll all be able to get into the same transaction so that's what we have had to do. But the global reserve holds a reservation large enough to cover a large percentage of all the metadata currently in the fs. So all we really need to account for is any new blocks that we may allocate. So fix this by
>
> 1) Passing to btrfs_alloc_free_block() whether this is a new block or a COW block. If it is a COW block we use the global reserve, if not we use the trans->block_rsv.
> 2) Reduce the amount of space we reserve. Since we don't need to account for cow'ing the tree we can just keep track of new blocks to reserve, which greatly reduces the reservation amount.
>
> This makes my basic random write test go from 3 mb/s to 75 mb/s. I've tested this with my horrible ENOSPC test and it seems to work out fine. Thanks,

Hi, Josef,

After applying this patch and running "tar xf source.tar", I got lots of warnings. Would you like to look into this?
[ cut here ]
WARNING: at fs/btrfs/extent-tree.c:5695 btrfs_alloc_free_block+0x178/0x340 [btrfs]()
Hardware name: QiTianM7150
Modules linked in: btrfs iptable_nat nf_nat zlib_deflate libcrc32c ebtable_nat ebtables bridge stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ext3 jbd dm_mirror dm_region_hash dm_log dm_mod sg ppdev serio_raw pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support sky2 parport_pc parport ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: btrfs]
Pid: 16008, comm: umount Tainted: G W 2.6.39+ #9
Call Trace:
 [81053baf] warn_slowpath_common+0x7f/0xc0
 [81053c0a] warn_slowpath_null+0x1a/0x20
 [a04d37d8] btrfs_alloc_free_block+0x178/0x340 [btrfs]
 [a0501768] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
 [a04be625] __btrfs_cow_block+0x155/0x5f0 [btrfs]
 [a04bebcb] btrfs_cow_block+0x10b/0x240 [btrfs]
 [a04c4c8e] btrfs_search_slot+0x49e/0x7a0 [btrfs]
 [a04d2399] btrfs_write_dirty_block_groups+0x1a9/0x4d0 [btrfs]
 [a0512e20] ? btrfs_tree_unlock+0x50/0x50 [btrfs]
 [a04df845] commit_cowonly_roots+0x105/0x1e0 [btrfs]
 [a04e0708] btrfs_commit_transaction+0x428/0x850 [btrfs]
 [a04df9b8] ? wait_current_trans+0x28/0x100 [btrfs]
 [a04e0c25] ? join_transaction+0x25/0x250 [btrfs]
 [81075590] ? wake_up_bit+0x40/0x40
 [a04bb187] btrfs_sync_fs+0x67/0xd0 [btrfs]
 [8116c27e] __sync_filesystem+0x5e/0x90
 [8116c38b] sync_filesystem+0x4b/0x70
 [811441c4] generic_shutdown_super+0x34/0xf0
 [81144316] kill_anon_super+0x16/0x60
 [81144a25] deactivate_locked_super+0x45/0x70
 [8114568a] deactivate_super+0x4a/0x70
 [8115efdc] mntput_no_expire+0x13c/0x1c0
 [8115f7bb] sys_umount+0x7b/0x3a0
 [81466b2b] system_call_fastpath+0x16/0x1b
---[ end trace 9a65800674b03b84 ]---

thanks,
liubo

> Signed-off-by: Josef Bacik <jo...@redhat.com>
> ---
>  fs/btrfs/ctree.c       |   10 +-
>  fs/btrfs/ctree.h       |    5 ++---
>  fs/btrfs/disk-io.c     |    3 ++-
>  fs/btrfs/extent-tree.c |   20 +++-
>  fs/btrfs/ioctl.c       |    2 +-
>  5 files changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index 2e66786..fbd48e9 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -206,7 +206,7 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
>  	cow = btrfs_alloc_free_block(trans, root, buf->len, 0,
>  				     new_root_objectid, &disk_key, level,
> -				     buf->start, 0);
> +				     buf->start, 0, 1);
>  	if (IS_ERR(cow))
>  		return PTR_ERR(cow);
> @@ -412,7 +412,7 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
>  	cow = btrfs_alloc_free_block(trans, root, buf->len, parent_start,
>  				     root->root_key.objectid, &disk_key,
> -				     level, search_start, empty_size);
> +				     level, search_start, empty_size, 0);
>  	if (IS_ERR(cow))
>  		return PTR_ERR(cow);
> @@ -1985,7 +1985,7 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
>  	c = btrfs_alloc_free_block(trans, root, root->nodesize, 0,
>  				   root->root_key.objectid, &lower_key,
> -				   level, root->node->start, 0);
> +				   level, root->node->start, 0, 1
Re: [GIT PULL v4] Btrfs: improve write ahead log with sub transaction
On 06/30/2011 03:36 PM, Liu Bo wrote:
> I've been working to improve the write-ahead log's performance, and I found that the bottleneck lies in the checksum items, especially when we do random writes to a large file, e.g. a 4G file.
>
> An idea for this, suggested by Chris, is to use sub transaction ids and log only the part of the inode that has changed since either the last log commit or the last transaction commit. And as we also push the sub transid into the btree blocks, we get much faster tree walks. As a result, we abandon the original brute-force approach, which deletes all items of the inode in the log to make sure we get the most up-to-date copies of everything; instead we find and merge, i.e. we find extents in the log tree and merge in the new extents from the file.
>
> This patchset puts the above idea into code, and although the code is now more complex, it brings us a great deal of performance improvement:

This is also available in

git://repo.or.cz/linux-btrfs-devel.git sub-trans

thanks,
liubo

> in my sysbench write + fsync test:
> 451.01Kb/sec -> 4.3621Mb/sec
>
> Also, I've run the synctest, and it works well with both directories and files.
>
> v1->v2, rebase.
> v2->v3, thanks to Chris, we worked together to solve 2 bugs, and after that it worked as expected.
> v3->v4, thanks to Josef, we simplified several pieces of code.
>
> Liu Bo (12):
>   Btrfs: introduce sub transaction stuff
>   Btrfs: update block generation if should_cow_block fails
>   Btrfs: modify btrfs_drop_extents API
>   Btrfs: introduce first sub trans
>   Btrfs: still update inode trans stuff when size remains unchanged
>   Btrfs: improve log with sub transaction
>   Btrfs: add checksum check for log
>   Btrfs: fix a bug of log check
>   Btrfs: kick off useless code
>   Btrfs: use the right generation number to read log_root_tree
>   Btrfs: do not iput inode when inode is still in log
>   Revert "Btrfs: do not flush csum items of unchanged file data during treelog"
>
>  fs/btrfs/btrfs_inode.h |   12 ++-
>  fs/btrfs/ctree.c       |   69 +++
>  fs/btrfs/ctree.h       |    5 +-
>  fs/btrfs/disk-io.c     |   12 ++--
>  fs/btrfs/extent-tree.c |   10 ++-
>  fs/btrfs/file.c        |   22 ++---
>  fs/btrfs/inode.c       |   39 ++---
>  fs/btrfs/ioctl.c       |    6 +-
>  fs/btrfs/relocation.c  |    6 +-
>  fs/btrfs/transaction.c |   13 ++-
>  fs/btrfs/transaction.h |   19 -
>  fs/btrfs/tree-defrag.c |    2 +-
>  fs/btrfs/tree-log.c    |  225
>  13 files changed, 293 insertions(+), 147 deletions(-)
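The core of "log only what changed since the last log commit" can be illustrated as a filter on modification stamps: tag each extent with the sub-transid that last touched it and copy to the log only those newer than the inode's last logged sub-transid. A hedged sketch; `demo_extent` and `demo_select_changed()` are hypothetical, and the real patchset works on btree items and merges them into the log tree rather than copying an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent record. */
struct demo_extent {
	uint64_t start;
	uint64_t len;
	uint64_t sub_transid;   /* sub-transaction that last modified it */
};

/*
 * Copy into @out only the extents modified after @last_logged, i.e.
 * since the previous log commit. The brute-force alternative would
 * drop and re-log every extent of the inode. Returns the count copied.
 */
static size_t demo_select_changed(const struct demo_extent *ext, size_t n,
				  uint64_t last_logged,
				  struct demo_extent *out)
{
	size_t i, k = 0;

	assert(n == 0 || (ext != NULL && out != NULL));
	for (i = 0; i < n; i++)
		if (ext[i].sub_transid > last_logged)
			out[k++] = ext[i];
	return k;
}
```

For a large file with one small random write, only the one touched extent passes the filter, which is where the fsync speedup reported above would come from under this model.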
Re: [PATCH 1/2 v2] Btrfs: kill location key of in-memory inode
ping?

On 06/20/2011 10:59 AM, Liu Bo wrote:
> In btrfs's in-memory inode, there is a btrfs_key which has the structure:
> - key.objectid
> - key.type
> - key.offset
>
> However, we only use key.objectid to search, to check, and so on, so to reduce the in-memory inode size I just keep what is valuable.
>
> v1->v2: use a more proper name for the inode number (thanks to David).
>
> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
> ---
>  fs/btrfs/btrfs_inode.h |   10 --
>  fs/btrfs/disk-io.c     |    3 +--
>  fs/btrfs/export.c      |    2 +-
>  fs/btrfs/extent-tree.c |    2 +-
>  fs/btrfs/inode.c       |   48 +---
>  5 files changed, 36 insertions(+), 29 deletions(-)
>
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 52d7eca..9f1bbf2 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -29,11 +29,6 @@ struct btrfs_inode {
>  	/* which subvolume this inode belongs to */
>  	struct btrfs_root *root;
>  
> -	/* key used to find this inode on disk.  This is used by the code
> -	 * to read in roots of subvolumes
> -	 */
> -	struct btrfs_key location;
> -
>  	/* the extent_tree has caches of all the extent mappings to disk */
>  	struct extent_map_tree extent_tree;
>  
> @@ -72,6 +67,9 @@ struct btrfs_inode {
>  	/* the space_info for where this inode's data allocations are done */
>  	struct btrfs_space_info *space_info;
>  
> +	/* full 64 bit inode number */
> +	u64 ino;
> +
>  	/* full 64 bit generation number, struct vfs_inode doesn't have a big
>  	 * enough field for this.
>  	 */
> @@ -171,7 +169,7 @@ static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
>  
>  static inline u64 btrfs_ino(struct inode *inode)
>  {
> -	u64 ino = BTRFS_I(inode)->location.objectid;
> +	u64 ino = BTRFS_I(inode)->ino;
>  
>  	if (ino <= BTRFS_FIRST_FREE_OBJECTID)
>  		ino = inode->i_ino;
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index a203d36..06c9b18 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1693,9 +1693,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>  	BTRFS_I(fs_info->btree_inode)->io_tree.ops = &btree_extent_io_ops;
> +	BTRFS_I(fs_info->btree_inode)->ino = BTRFS_BTREE_INODE_OBJECTID;
>  	BTRFS_I(fs_info->btree_inode)->root = tree_root;
> -	memset(&BTRFS_I(fs_info->btree_inode)->location, 0,
> -	       sizeof(struct btrfs_key));
>  	BTRFS_I(fs_info->btree_inode)->dummy_inode = 1;
>  	insert_inode_hash(fs_info->btree_inode);
> diff --git a/fs/btrfs/export.c b/fs/btrfs/export.c
> index 1b8dc33..b60c118 100644
> --- a/fs/btrfs/export.c
> +++ b/fs/btrfs/export.c
> @@ -43,7 +43,7 @@ static int btrfs_encode_fh(struct dentry *dentry, u32 *fh, int *max_len,
>  		spin_lock(&dentry->d_lock);
>  		parent = dentry->d_parent->d_inode;
> -		fid->parent_objectid = BTRFS_I(parent)->location.objectid;
> +		fid->parent_objectid = BTRFS_I(parent)->ino;
>  		fid->parent_gen = parent->i_generation;
>  		parent_root_id = BTRFS_I(parent)->root->objectid;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 5b9b6b6..f3d1230 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3037,7 +3037,7 @@ int btrfs_check_data_free_space(struct inode *inode, u64 bytes)
>  	bytes = (bytes + root->sectorsize - 1) & ~((u64)root->sectorsize - 1);
>  
>  	if (root == root->fs_info->tree_root ||
> -	    BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID) {
> +	    BTRFS_I(inode)->ino == BTRFS_FREE_INO_OBJECTID) {
>  		alloc_chunk = 0;
>  		committed = 1;
>  	}
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 02ff4a1..bbe4cdc 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -754,7 +754,7 @@ static inline bool is_free_space_inode(struct btrfs_root *root,
>  				       struct inode *inode)
>  {
>  	if (root == root->fs_info->tree_root ||
> -	    BTRFS_I(inode)->location.objectid == BTRFS_FREE_INO_OBJECTID)
> +	    BTRFS_I(inode)->ino == BTRFS_FREE_INO_OBJECTID)
>  		return true;
>  	return false;
>  }
> @@ -2513,7 +2513,10 @@ static void btrfs_read_locked_inode(struct inode *inode)
>  	path = btrfs_alloc_path();
>  	BUG_ON(!path);
>  	path->leave_spinning = 1;
> -	memcpy(&location, &BTRFS_I(inode)->location, sizeof(location));
> +
> +	location.objectid = BTRFS_I(inode)->ino;
> +	location.offset = 0;
> +	btrfs_set_key_type(&location, BTRFS_INODE_ITEM_KEY);
>  
>  	ret = btrfs_lookup_inode(NULL, root, path, &location, 0);
>  	if (ret)
> @@ -2667,6 +2670,7 @@ noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
>  	struct btrfs_inode_item *inode_item;
>  	struct btrfs_path *path;
>  	struct extent_buffer *leaf;
> +	struct btrfs_key location;
>  	int ret;
>  
>  	/*
> @@ -2687,8 +2691,12 @@ noinline int btrfs_update_inode(struct
Re: [PATCH 10/12 v3] Btrfs: deal with EEXIST after iput
On 06/21/2011 10:00 PM, Josef Bacik wrote:
> On 06/21/2011 04:49 AM, Liu Bo wrote:
>> There are two cases when BTRFS_I(inode)->logged_trans is zero:
>> a) an inode is just allocated;
>> b) an inode is iput and then reread.
>>
>> However, in case b), if btrfs has not committed yet, this inode _may_ still remain in the log tree.
>>
>> So we need to check the log tree to give logged_trans the right value, in case we hit EEXIST while logging.
>
> Instead of doing this why not just check and see if the inode has been logged but the transaction has not yet been committed in btrfs_drop_inode? That way the inode doesn't get evicted from cache until after we know it's ok and that way we don't have to waste a tree lookup. Thanks,

Good idea, I'll follow it.

thanks,
liubo

> Josef
Re: [PATCH 1/2] Btrfs: kill location key of in-memory inode
>> , BTRFS_INODE_ITEM_KEY);
>> - btrfs_inherit_iflags(inode, dir);
>>  	if ((mode & S_IFREG)) {
>> @@ -7029,7 +7039,7 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
>>  		new_inode->i_ctime = CURRENT_TIME;
>>  		if (unlikely(btrfs_ino(new_inode) ==
>>  			     BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)) {
>> -			root_objectid = BTRFS_I(new_inode)->location.objectid;
>> +			root_objectid = BTRFS_I(new_inode)->inode_id;
>
> direct assignment, no btrfs_ino as in the first hunk

This is a special case, where new_inode->i_ino is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID, while BTRFS_I(new_inode)->location.objectid is 256.

Thanks for reviewing!

liubo

> thanks,
>>  			ret = btrfs_unlink_subvol(trans, dest, new_dir,
>>  						  root_objectid,
>>  						  new_dentry->d_name.name,
>
> --
> david
Re: [PATCH 00/11 v2] Btrfs: improve write ahead log with sub transaction
On 06/10/2011 08:40 AM, David Sterba wrote:
> Hi,
>
> is it possible to refresh this patchset and resend? I'd like to enroll it and give it some review and testing. So far I have seen notions and use of trans_mutex, which has been removed.

Sure, thanks for your interest.

Yes, I've noticed the trans_mutex thing, but I'm afraid I'll have to put this off until next week, because there is a "btrfs fi bal" bug still ongoing on my schedule.

thanks,
liubo

> thanks,
> david
[PATCH] Btrfs: check root_key's offset instead
When we use the reloc root to COW or copy a tree block, we do not set the block's owner; instead we set the BTRFS_HEADER_FLAG_RELOC flag in its header. So here we should check the root_key's offset instead.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/extent-tree.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 5b9b6b6..0bda273 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6160,7 +6160,7 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
 		if (wc->flags[level + 1] & BTRFS_BLOCK_FLAG_FULL_BACKREF)
 			parent = path->nodes[level + 1]->start;
 		else
-			BUG_ON(root->root_key.objectid !=
+			BUG_ON(root->root_key.offset !=
 			       btrfs_header_owner(path->nodes[level + 1]));
 	}
 
-- 
1.6.5.2
Re: kernel BUG at fs/btrfs/extent-tree.c:6164!
On 06/07/2011 04:24 PM, Tsutomu Itoh wrote:
> (2011/06/07 15:17), Tsutomu Itoh wrote:
>> (2011/06/07 14:59), Tsutomu Itoh wrote:
>>> Hi liubo,
>>>
>>> (2011/06/07 14:31), liubo wrote:
>>>> On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
>>>>> Hi,
>>>>>
>>>>> I encountered the following panic using the 'btrfs-unstable + for-linus' kernel.
>>>>>
>>>>> I ran the "btrfs fi bal /test5" command, and the mount options of /test5 are as follows:
>>>>>
>>>>> /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)
>>>>
>>>> So, just a "btrfs fi bal" would lead to the bug?
>>>
>>> I think so.
>>>
>>>> I've figured out the warnings, but have not reproduced the bug yet...
>>>>
>>>> I used 'btrfs-unstable + for-linus' whose top commit is
>>>>
>>>> commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
>>>> Author: David Sterba <dste...@suse.cz>
>>>> Date:   Fri Jun 3 16:29:08 2011 +0200
>>>>
>>>>     btrfs: fix uninitialized variable warning
>>>
>>> It's the same as my environment.
>>>
>>>> and tried on 1) a single disk, 2) 2 disks and 3) 4 disks respectively, but none of them led to the bug below.
>>>
>>> The test script and the volume composition that I am using are the same as in the following mail:
>>>
>>> http://marc.info/?l=linux-btrfs&m=130680171426371&w=2
>>>
>>> And, in my environment, the panic happens within about 30 minutes of starting the test script.
>>
>> I forgot to mention: I am adding '-o inode_cache' to the mount options in my test script.

Yep, I've added this and reproduced it. It seems that there are several bugs.

Anyway, thanks for the report. I'm trying to work it out. :)

thanks,
liubo

> Another panic occurred when I ran it again.
>
> I rebuilt the kernel with 3.0-rc2, but the same problem occurred.
4[ 131.708325] WARNING: at fs/btrfs/transaction.c:213 start_transaction+0x74/0x259 [btrfs]() 4[ 131.708329] Hardware name: PRIMERGY 4[ 131.708330] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode] 4[ 131.708378] Pid: 3041, comm: btrfs Not tainted 3.0.0-rc2test #1 4[ 131.708381] Call Trace: 4[ 131.708388] [8104514a] warn_slowpath_common+0x85/0x9d 4[ 131.708392] [8104517c] warn_slowpath_null+0x1a/0x1c 4[ 131.708410] [a02a6f8b] start_transaction+0x74/0x259 [btrfs] 4[ 131.708430] [a02bf965] ? btrfs_wait_ordered_range+0xf9/0x11d [btrfs] 4[ 131.708448] [a02a73ed] btrfs_start_transaction+0x13/0x15 [btrfs] 4[ 131.708467] [a02aec08] btrfs_evict_inode+0x113/0x22d [btrfs] 4[ 131.708471] [81123a98] evict+0x77/0x118 4[ 131.708475] [81123ec1] iput+0x13d/0x146 4[ 131.708489] [a02939c9] btrfs_remove_block_group+0x14d/0x35b [btrfs] 4[ 131.708508] [a02c6ff7] btrfs_relocate_chunk+0x464/0x50d [btrfs] 4[ 131.708527] [a02c54ce] ? btrfs_item_key_to_cpu+0x2a/0x46 [btrfs] 4[ 131.708545] [a02c7672] btrfs_balance+0x1ca/0x219 [btrfs] 4[ 131.708563] [a02cfbfd] btrfs_ioctl+0x890/0xb87 [btrfs] 4[ 131.708567] [810e87c8] ? handle_mm_fault+0x233/0x24a 4[ 131.708572] [813a6e25] ? do_page_fault+0x340/0x3b2 4[ 131.708577] [8111d6f8] do_vfs_ioctl+0x474/0x4c3 4[ 131.708581] [810ffd25] ? virt_to_head_page+0xe/0x31 4[ 131.708585] [81100fcc] ? 
kmem_cache_free+0x20/0xae 4[ 131.708588] [8111d79d] sys_ioctl+0x56/0x79 4[ 131.708592] [813aa542] system_call_fastpath+0x16/0x1b 4[ 131.708595] ---[ end trace 5f962f46d3ba5425 ]--- 6[ 131.708777] btrfs: relocating block group 29360128 flags 20 6[ 132.385682] btrfs: found 85 extents 0[ 132.798892] [ cut here ] 2[ 132.799014] kernel BUG at fs/btrfs/extent-tree.c:1424! 0[ 132.799014] invalid opcode: [#1] SMP 4[ 132.799014] CPU 0 4[ 132.799014] Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode] 4[ 132.799014] 4[ 132.799014] Pid: 3041, comm: btrfs Tainted: GW 3.0.0-rc2test #1 FUJITSU-SV PRIMERGY/D2399 4[ 132.799014] RIP: 0010:[a0296c86] [a0296c86] lookup_inline_extent_backref+0xe3/0x3a9 [btrfs] 4[ 132.799014] RSP: 0018:880193aa5808 EFLAGS: 00010202 4[ 132.799014] RAX: 0001 RBX: 880192fac000 RCX: 0002 4[ 132.799014] RDX
Re: kernel BUG at fs/btrfs/extent-tree.c:6164!
On 06/06/2011 04:33 PM, Tsutomu Itoh wrote:
> Hi,
>
> I encountered the following panic using the 'btrfs-unstable + for-linus' kernel.
>
> I ran the "btrfs fi bal /test5" command, and the mount options of /test5 are as follows:
>
> /dev/sdc3 on /test5 type btrfs (rw,space_cache,compress=lzo,inode_cache)

So, just a "btrfs fi bal" would lead to the bug?

I've figured out the warnings, but have not reproduced the bug yet...

I used 'btrfs-unstable + for-linus' whose top commit is

commit aa0467d8d2a00e75b2bb6a56a4ee6d70c5d1928f
Author: David Sterba <dste...@suse.cz>
Date:   Fri Jun 3 16:29:08 2011 +0200

    btrfs: fix uninitialized variable warning

and tried on 1) a single disk, 2) 2 disks and 3) 4 disks respectively, but none of them led to the bug below.

I guess maybe I'm missing something needed to reproduce it?

thanks,
liubo

> Thanks,
> Tsutomu
>
> =
> btrfs: relocating block group 23383244800 flags 20
> btrfs: found 2959 extents
> [ cut here ]
> WARNING: at fs/btrfs/transaction.c:213 start_transaction+0x2a7/0x2b0 [btrfs]()
> Hardware name: PRIMERGY
> Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> Pid: 23781, comm: btrfs Tainted: G W 2.6.39btrfs-test+ #4
> Call Trace:
>  [8106004f] warn_slowpath_common+0x7f/0xc0
>  [810600aa] warn_slowpath_null+0x1a/0x20
>  [a0337047] start_transaction+0x2a7/0x2b0 [btrfs]
>  [a035498d] ? btrfs_wait_ordered_range+0x10d/0x160 [btrfs]
>  [a0337323] btrfs_start_transaction+0x13/0x20 [btrfs]
>  [a033bbca] btrfs_evict_inode+0x11a/0x260 [btrfs]
>  [811687f8] evict+0x78/0x170
>  [81168c92] iput+0xe2/0x1a0
>  [a031f171] btrfs_remove_block_group+0x141/0x3c0 [btrfs]
>  [a035e6ea] btrfs_relocate_chunk+0x54a/0x670 [btrfs]
>  [a0357668] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
>  [a031be51] ? btrfs_previous_item+0xb1/0x150 [btrfs]
>  [a035f43a] btrfs_balance+0x21a/0x2b0 [btrfs]
>  [8115dc41] ? path_openat+0x101/0x3d0
>  [a03685bc] btrfs_ioctl+0x51c/0xc40 [btrfs]
>  [8111e358] ? handle_mm_fault+0x148/0x270
>  [814809e8] ? do_page_fault+0x1d8/0x4b0
>  [81160d6a] do_vfs_ioctl+0x9a/0x540
>  [811612b1] sys_ioctl+0xa1/0xb0
>  [81484ec2] system_call_fastpath+0x16/0x1b
> ---[ end trace e5c5cb2e98a3cd1a ]---
> btrfs: relocating block group 20971520 flags 18
> btrfs: relocating block group 34925969408 flags 18
> btrfs: found 1 extents
> [ cut here ]
> kernel BUG at fs/btrfs/extent-tree.c:6164!
> invalid opcode: [#1] SMP
> last sysfs file: /sys/kernel/mm/ksm/run
> CPU 0
> Modules linked in: autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf ipv6 btrfs zlib_deflate crc32c libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parport_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp pci_hotplug i3000_edac edac_core ext4 mbcache jbd2 crc16 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix libata scsi_mod floppy [last unloaded: microcode]
> Pid: 4109, comm: btrfs Tainted: G W 2.6.39btrfs-test+ #4 FUJITSU-SV PRIMERGY/D2399
> RIP: 0010:[a0325b95] [a0325b95] walk_up_proc+0x375/0x420 [btrfs]
> RSP: 0018:8801801eb9c8 EFLAGS: 00010286
> RAX: 0005 RBX: 880167a70140 RCX: fff8
> RDX: 8801801ea000 RSI: 8800 RDI: 880194909fa8
> RBP: 8801801eba18 R08: R09: 0005
> R10: 0001 R11: 880194909fa8 R12:
> R13: 88013973d000 R14: 88015ad4d9a0 R15: 880042203920
> FS: 7fa86bcb9740() GS:88019fc0() knlGS:
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 0033cf60b0c0 CR3: 000181cf7000 CR4: 06f0
> DR0:
DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process btrfs (pid: 4109, threadinfo 8801801ea000, task 88011a4914a0) Stack: 8801801eba18 880194909fa8 8801 a03280e8 8801801eba58 88015ad4d9a0 8801801ea000 880167a70140 8801801eba78 a0325d71 Call Trace: [a03280e8] ? btrfs_run_delayed_refs+0xc8/0x210 [btrfs] [a0325d71] walk_up_tree+0x131/0x1b0 [btrfs] [a03260b0] btrfs_drop_snapshot+0x2c0/0x5c0 [btrfs] [a03328b0
Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!
On 05/31/2011 08:27 AM, Tsutomu Itoh wrote:
> The panic occurred when 'btrfs fi bal /test5' was executed.
>
> /test5 is as follows:
> # mount -o space_cache,compress=lzo /dev/sdc3 /test5
> #
> # btrfs fi sh /dev/sdc3
> Label: none  uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64
> 	Total devices 5 FS bytes used 7.87MB
> 	devid    1 size 10.00GB used 2.02GB path /dev/sdc3
> 	devid    2 size 15.01GB used 3.00GB path /dev/sdc5
> 	devid    3 size 15.01GB used 3.00GB path /dev/sdc6
> 	devid    4 size 20.01GB used 2.01GB path /dev/sdc7
> 	devid    5 size 10.00GB used 2.01GB path /dev/sdc8
>
> Btrfs v0.19-50-ge6bd18d
> # btrfs fi df /test5
> Data, RAID0: total=10.00GB, used=3.52MB
> Data: total=8.00MB, used=1.60MB
> System, RAID1: total=8.00MB, used=4.00KB
> System: total=4.00MB, used=0.00
> Metadata, RAID1: total=1.00GB, used=216.00KB
> Metadata: total=8.00MB, used=0.00

Hi, Itoh san,

I've come up with a patch aiming to fix this bug.

The problem is that the inode allocator stores one inode cache per root, which is not appropriate for the relocation tree, because we only need to find new inode numbers in the fs tree or in file trees (subvolumes/snapshots).

I've tested it with your run.sh and it works well on my box, so you can try this:

=== based on 3.0, commit d6c0cb379c5198487e4ac124728cbb2346d63b1f ===

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 0009705..ebc2a7b 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -372,6 +372,10 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	int prealloc;
 	bool retry = false;
 
+	if (root->root_key.objectid != BTRFS_FS_TREE_OBJECTID &&
+	    root->root_key.objectid < BTRFS_FIRST_FREE_OBJECTID)
+		return 0;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;

thanks,
liubo

> ---
> Tsutomu
>
> <6>device fsid 25424ba6b248ec38-64dc2480b05ec68c devid 5 transid 4 /dev/sdc8
> <6>device fsid 25424ba6b248ec38-64dc2480b05ec68c devid 1 transid 7 /dev/sdc3
> <6>btrfs: enabling disk space caching
> <6>btrfs: use lzo compression
> <6>device fsid 69423c117ae771dd-c275f966f982cf84 devid 1 transid 7 /dev/sdd4
> <6>btrfs: disk space caching is enabled
> <6>btrfs: relocating block group 1103101952 flags 9
> <6>btrfs: found 318 extents
> <0>[ cut here ]
> <2>kernel BUG at fs/btrfs/relocation.c:4285!
0invalid opcode: [#1] SMP 4CPU 1 4Modules linked in: btrfs autofs4 sunrpc 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table m perf ipv6 zlib_deflate libcrc32c ext3 jbd dm_mirror dm_region_hash dm_log dm_mod kvm uinput ppdev parpor t_pc parport sg pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support tg3 shpchp i3000_edac edac_core ex t4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom megaraid_sas pata_acpi ata_generic ata_piix floppy [last unloaded: btrfs] 4Pid: 6173, comm: btrfs Not tainted 3.0.0-rc1btrfs-test #1 FUJITSU-SV PRIMERGY/D2399 4RIP: 0010:[a049308c] [a049308c] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] 4RSP: 0018:8801514236a8 EFLAGS: 00010246 4RAX: 8801930dc000 RBX: 8801936f5800 RCX: 880163241d60 4RDX: 88016325dd18 RSI: 8801931a3000 RDI: 8801632fb3e0 4RBP: 880151423708 R08: 880151423784 R09: 0100 4R10: R11: 880163224d58 R12: 8801931a3000 4R13: 88016325dd18 R14: 8801632fb3e0 R15: 4FS: 7f41577ce740() GS:88019fd0() knlGS: 4CS: 0010 DS: ES: CR0: 8005003b 4CR2: 010afb80 CR3: 00015142e000 CR4: 06e0 4DR0: DR1: DR2: 4DR3: DR6: 0ff0 DR7: 0400 4Process btrfs (pid: 6173, threadinfo 880151422000, task 880151997580) 0Stack: 4 88016325dd18 8801632fb3e0 880151423708 a042b2ed 4 0001 880151423708 8801931a3000 4 880163241d60 88016325dd18 8801632fb3e0 0Call Trace: 4 [a042b2ed] ? update_ref_for_cow+0x22d/0x330 [btrfs] 4 [a042b841] __btrfs_cow_block+0x451/0x5e0 [btrfs] 4 [a042badb] btrfs_cow_block+0x10b/0x250 [btrfs] 4 [a0431c67] btrfs_search_slot+0x557/0x870 [btrfs] 4 [a042a252] ? generic_bin_search+0x1f2/0x210 [btrfs] 4 [a04447bf] btrfs_lookup_inode+0x2f/0xa0 [btrfs] 4 [a04557c2] btrfs_update_inode+0xc2/0x140 [btrfs] 4 [a0444fbc] btrfs_save_ino_cache+0x7c/0x200 [btrfs] 4 [a044c5ad] commit_fs_roots+0xad/0x180 [btrfs] 4 [a044d555] btrfs_commit_transaction+0x385/0x7d0 [btrfs] 4 [81081e00] ? wake_up_bit+0x40/0x40 4 [a048f4bf] prepare_to_relocate+0xdf/0xf0 [btrfs] 4 [a0496121] relocate_block_group+0x41/0x600 [btrfs] 4 [814baa6e] ? mutex_lock+0x1e/0x50 4 [a044bc59
Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!
On 06/01/2011 03:44 PM, liubo wrote: On 05/31/2011 08:27 AM, Tsutomu Itoh wrote: The panic occurred when 'btrfs fi bal /test5' was executed. /test5 is as follows: # mount -o space_cache,compress=lzo /dev/sdc3 /test5 # # btrfs fi sh /dev/sdc3 Label: none uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64 Total devices 5 FS bytes used 7.87MB devid1 size 10.00GB used 2.02GB path /dev/sdc3 devid2 size 15.01GB used 3.00GB path /dev/sdc5 devid3 size 15.01GB used 3.00GB path /dev/sdc6 devid4 size 20.01GB used 2.01GB path /dev/sdc7 devid5 size 10.00GB used 2.01GB path /dev/sdc8 Btrfs v0.19-50-ge6bd18d # btrfs fi df /test5 Data, RAID0: total=10.00GB, used=3.52MB Data: total=8.00MB, used=1.60MB System, RAID1: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=1.00GB, used=216.00KB Metadata: total=8.00MB, used=0.00 Hi, Itoh san, I've come up with a patch aiming to fix this bug. The problems is that the inode allocator stores one inode cache per root, which is at least not good for relocation tree, cause we only find new inode number from fs tree or file tree (subvol/snapshot). I've tested with your run.sh and it works well on my box, so you can try this: Sorry, I messed up BTRFS_FIRST_FREE_OBJECTID and BTRFS_LAST_FREE_OBJECTID, plz ignore this. === based on 3.0, commit d6c0cb379c5198487e4ac124728cbb2346d63b1f === diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c index 0009705..ebc2a7b 100644 --- a/fs/btrfs/inode-map.c +++ b/fs/btrfs/inode-map.c @@ -372,6 +372,10 @@ int btrfs_save_ino_cache(struct btrfs_root *root, int prealloc; bool retry = false; + if (root-root_key.objectid != BTRFS_FS_TREE_OBJECTID + root-root_key.objectid BTRFS_FIRST_FREE_OBJECTID) + return 0; + path = btrfs_alloc_path(); if (!path) return -ENOMEM; thanks, liubo -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3.0-rc1] kernel BUG at fs/btrfs/relocation.c:4285!
On 06/01/2011 04:12 PM, liubo wrote: On 06/01/2011 03:44 PM, liubo wrote: On 05/31/2011 08:27 AM, Tsutomu Itoh wrote: The panic occurred when 'btrfs fi bal /test5' was executed. /test5 is as follows: # mount -o space_cache,compress=lzo /dev/sdc3 /test5 # # btrfs fi sh /dev/sdc3 Label: none uuid: 38ec48b2-a64b-4225-8cc6-5eb08024dc64 Total devices 5 FS bytes used 7.87MB devid1 size 10.00GB used 2.02GB path /dev/sdc3 devid2 size 15.01GB used 3.00GB path /dev/sdc5 devid3 size 15.01GB used 3.00GB path /dev/sdc6 devid4 size 20.01GB used 2.01GB path /dev/sdc7 devid5 size 10.00GB used 2.01GB path /dev/sdc8 Btrfs v0.19-50-ge6bd18d # btrfs fi df /test5 Data, RAID0: total=10.00GB, used=3.52MB Data: total=8.00MB, used=1.60MB System, RAID1: total=8.00MB, used=4.00KB System: total=4.00MB, used=0.00 Metadata, RAID1: total=1.00GB, used=216.00KB Metadata: total=8.00MB, used=0.00 Hi, Itoh san, I've come up with a patch aiming to fix this bug. The problems is that the inode allocator stores one inode cache per root, which is at least not good for relocation tree, cause we only find new inode number from fs tree or file tree (subvol/snapshot). I've tested with your run.sh and it works well on my box, so you can try this:

I've tested the following patch for about 1.5 hours, and nothing went wrong. Would you please test this patch?

thanks,

From: Liu Bo liubo2...@cn.fujitsu.com
[PATCH] Btrfs: fix save ino cache bug

We only allocate new inode numbers from the fs root or a subvol/snapshot root, so only those roots need their inode cache saved to disk.
Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/inode-map.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/inode-map.c b/fs/btrfs/inode-map.c
index 0009705..8c0c25b 100644
--- a/fs/btrfs/inode-map.c
+++ b/fs/btrfs/inode-map.c
@@ -372,6 +372,12 @@ int btrfs_save_ino_cache(struct btrfs_root *root,
 	int prealloc;
 	bool retry = false;
 
+	/* only fs tree and subvol/snap needs ino cache */
+	if (root->root_key.objectid != BTRFS_FS_TREE_OBJECTID &&
+	    (root->root_key.objectid < BTRFS_FIRST_FREE_OBJECTID ||
+	     root->root_key.objectid > BTRFS_LAST_FREE_OBJECTID))
+		return 0;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
-- 
1.6.5.2
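To make the final condition easier to check, here is a small userspace sketch of the same predicate, written as "does this root need its inode cache saved?" (the negation of the patch's early return). The objectid values are copied from fs/btrfs/ctree.h as we understand it for kernels of this period and should be verified against your tree; `root_needs_ino_cache()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Objectid values as found in fs/btrfs/ctree.h of this era (verify locally). */
#define BTRFS_ROOT_TREE_OBJECTID   1ULL
#define BTRFS_FS_TREE_OBJECTID     5ULL
#define BTRFS_FIRST_FREE_OBJECTID  256ULL
#define BTRFS_LAST_FREE_OBJECTID   -256ULL
#define BTRFS_TREE_RELOC_OBJECTID  -8ULL

/* The relocation tree sits above the subvolume range, so the upper
 * bound of the range check is what excludes it. */
_Static_assert(BTRFS_TREE_RELOC_OBJECTID > BTRFS_LAST_FREE_OBJECTID,
	       "reloc tree objectid lies outside the subvolume range");

/*
 * Userspace mirror of the check in "Btrfs: fix save ino cache bug":
 * only the default fs tree and subvolume/snapshot roots allocate inode
 * numbers, so only they need their inode cache written out.
 */
static bool root_needs_ino_cache(uint64_t objectid)
{
	if (objectid == BTRFS_FS_TREE_OBJECTID)
		return true;
	return objectid >= BTRFS_FIRST_FREE_OBJECTID &&
	       objectid <= BTRFS_LAST_FREE_OBJECTID;
}
```

Per liubo's analysis the trouble came from the relocation tree; with the two-sided range check it is skipped, while the fs tree and subvolumes such as 257 still get their cache saved.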
Re: btrfs error after using kernel 3.0-rc1
On 06/01/2011 08:22 PM, Fajar A. Nugraha wrote: On Wed, Jun 1, 2011 at 6:06 AM, Fajar A. Nugraha l...@fajar.net wrote: While using btrfs as root on kernel 3.0-rc1, there were some errors (I wasn't able to capture them) that forced me to do a hard reset. Now during startup the system drops to a busybox shell because it's unable to mount the root partition. Is there a way to recover the data? At least grub2 was still happy enough to load the kernel and initrd (both of which are located on the same btrfs partition). This is what dmesg says:
[4.536798] device label SSD-ROOT devid 1 transid 38245 /dev/sda2
[9.552086] device label SSD-ROOT devid 1 transid 38245 /dev/disk/by-label/SSD-ROOT
[9.554563] btrfs: disk space caching is enabled
[9.564301] parent transid verify failed on 44040192 wanted 38240 found 32526
[9.564535] parent transid verify failed on 44040192 wanted 38240 found 32526
[9.564778] parent transid verify failed on 44040192 wanted 38240 found 32526
[9.575679] parent transid verify failed on 44052480 wanted 38240 found 31547
[9.575904] parent transid verify failed on 44052480 wanted 38240 found 31547
[9.576176] parent transid verify failed on 44052480 wanted 38240 found 31547
[9.586121] parent transid verify failed on 44064768 wanted 38240 found 34145
[9.586319] parent transid verify failed on 44064768 wanted 38240 found 34145
[9.586515] parent transid verify failed on 44064768 wanted 38240 found 34145
[9.587027] parent transid verify failed on 44068864 wanted 38240 found 34476
[9.589732] Btrfs detected SSD devices, enabling SSD mode
[9.592923] block group 29360128 has an wrong amount of free space
[9.592959] btrfs: failed to load free space cache for block group 29360128

For anyone who hits the same problem: I was finally able to mount the fs using Ubuntu Natty's 2.6.38-8-generic kernel (the one on the live CD). Previously I tried 2.6.38-9-generic and 3.0-rc1, and neither worked. Now I'm copying the files somewhere else before reinstalling this system.
On another note, does anybody know how btrfs allocates IDs for subvols? It doesn't seem to reuse a deleted subvol's ID. What happens when the last subvol ID is 999?

Yes, there is no reuse. A new subvol will get 1000, one larger than 999.

thanks,
liubo
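liubo's answer describes a policy where only the highest subvolume id ever handed out matters. A minimal sketch of that policy, with a hypothetical struct and an assumed starting id of 256; it does not model how btrfs actually finds the highest id on disk:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: subvolume ids start at 256. */
#define FIRST_SUBVOL_ID 256ULL

/*
 * Hypothetical "no reuse" allocator: it only remembers the highest id
 * ever handed out, so deleting a subvolume never frees its id.
 */
struct subvol_id_alloc {
	uint64_t highest; /* 0 means nothing allocated yet */
};

static uint64_t subvol_id_next(struct subvol_id_alloc *a)
{
	if (a->highest < FIRST_SUBVOL_ID)
		a->highest = FIRST_SUBVOL_ID - 1;
	return ++a->highest;
}
```

With `highest` at 999 the next call returns 1000, matching the answer above, regardless of how many lower ids have been deleted.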
Re: [PATCH 00/11 v2] Btrfs: improve write ahead log with sub transaction
This includes the two patches that we've discussed before. I'm sending it as a whole just in case you have to patch the code yourself. :)

thanks,
liubo

On 05/26/2011 04:19 PM, Liu Bo wrote: I've been working on improving the write-ahead log's performance, and I found that the bottleneck lies in the checksum items, especially when we do random writes to a large file, e.g. a 4G file. An idea suggested by Chris is to use sub transaction ids and log only the part of the inode that has changed since either the last log commit or the last transaction commit. And since we also push the sub transid into the btree blocks, tree walks become much faster. As a result, we abandon the original brute-force approach, which deletes all of the inode's items in the log to make sure we get the most up-to-date copies of everything; instead we find and merge, i.e. we find the extents in the log tree and merge in the new extents from the file. This patchset puts the above idea into code, and although the code is now more complex, it brings a large performance improvement. Besides the log improvement, patch 8 fixes a small but critical bug in the log code with sub transactions. Here are some test results; I use sysbench to do random write + fsync.
===
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run]
===
Sysbench args:
- Number of threads: 1
- Extra file open flags: 0
- 2 files, 4Gb each
- Block size 4Kb
- Number of random requests for random IO: 10000
- Read/Write ratio for combined random IO test: 1.50
- Periodic FSYNC enabled, calling fsync() each 100 requests.
- Calling fsync() at the end of test, Enabled.
- Using synchronous I/O mode
- Doing random write test
Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 4.3621Mb/sec) 1116.71 Requests/sec executed

v1->v2: fix an EEXIST related to logged_trans and a log root generation mismatch

Liu Bo (11):
 Btrfs: introduce sub transaction stuff
 Btrfs: update block generation if should_cow_block fails
 Btrfs: modify btrfs_drop_extents API
 Btrfs: introduce first sub trans
 Btrfs: still update inode trans stuff when size remains unchanged
 Btrfs: improve log with sub transaction
 Btrfs: add checksum check for log
 Btrfs: fix a bug of log check
 Btrfs: kick off useless code
 Btrfs: deal with EEXIST after iput
 Btrfs: use the right generation number to read log_root_tree

 fs/btrfs/btrfs_inode.h | 12 ++-
 fs/btrfs/ctree.c | 69 +
 fs/btrfs/ctree.h | 5 +-
 fs/btrfs/disk-io.c | 12 +-
 fs/btrfs/extent-tree.c | 10 +-
 fs/btrfs/file.c | 22 ++---
 fs/btrfs/inode.c | 33 ---
 fs/btrfs/ioctl.c | 6 +-
 fs/btrfs/relocation.c | 6 +-
 fs/btrfs/transaction.c | 13 ++-
 fs/btrfs/transaction.h | 19 +++-
 fs/btrfs/tree-defrag.c | 2 +-
 fs/btrfs/tree-log.c | 267 +++-
 13 files changed, 330 insertions(+), 146 deletions(-)
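The core of the idea, logging only what changed since the last log commit, can be modeled with a per-item sub transid. This is a hypothetical flat-array sketch (`struct item` and `collect_changed_items()` are illustrative names); the real patchset works on btree items and merges extents into the log tree, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an inode's items (e.g. csum items). */
struct item {
	uint64_t key;
	uint64_t transid; /* sub transid of the last modification */
};

/*
 * Copy into out[] only the items changed after the last log commit,
 * instead of re-logging every item of the inode. out[] must have room
 * for n entries. Returns the number of items copied.
 */
static size_t collect_changed_items(const struct item *items, size_t n,
				    uint64_t last_log_transid,
				    struct item *out)
{
	size_t copied = 0;

	assert(n == 0 || (items != NULL && out != NULL));
	for (size_t i = 0; i < n; i++)
		if (items[i].transid > last_log_transid)
			out[copied++] = items[i];
	return copied;
}
```

With `last_log_transid` set to 11, only items stamped 12 or later are copied, so the work per fsync scales with what changed rather than with the file size. For scale, the reported sysbench numbers correspond to roughly a 9.9x gain (1116.71 vs 112.75 requests/sec).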
Re: Backref walking utilities
On 05/25/2011 11:08 PM, Jan Schmidt wrote: On 05/23/2011 12:02 PM, Arne Jansen wrote: Hi liubo, On 23.05.2011 11:53, liubo wrote: As one of my plans, I'm going to take this project over unless someone has been working on it. Jan Schmidt has a patch for scrub nearly ready, that does some ref-walking to report affected files to the user. While this is kernel code and you're planning to add user-space code, it might still be possible to share some of it. Maybe the efforts can be coordinated. The patches are ready and should be flexible enough to use for your purpose. However, I use them in the context of the scrub code, so I'm planning to send them out as soon as the current version of scrub is included in Chris' master. If anybody wants to test the patches before that (they apply cleanly against Arne's scrub branch), drop me an email.

I'd like to take an early look. Could you please tell me where these patches are?

thanks,
liubo

-Jan
Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff
On 05/24/2011 11:56 PM, liubo wrote: The problems I hit: When an inode is dropped from cache (just via iput) and then read in again, the BTRFS_I(inode)->logged_trans goes back to zero. When this happens the logging code assumes the inode isn't in the log and hits -EEXIST if it finds inode items. OK, I've found where the problem lies. It's because I put a check comparing logged_trans with the transaction id, intended to filter out inodes that are being logged for the first time, and I'm sorry for not taking the iput case into consideration. It's easy to fix: we can just drop this check and add another one while searching for 'BTRFS_INODE_ITEM_KEY', because if we cannot find an inode item in a tree, this inode is definitely not in that tree. So I'd like to make some changes like this patch (_UNTESTED_):

I've thought about this problem more and come up with a _better and more efficient_ patch. It always gives BTRFS_I(inode)->logged_trans the correct value. But I'm still trying to test it...
:P

Here it is:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 40f6f8f..d22b3bf 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1769,12 +1769,9 @@ static int btrfs_finish_ordered_io(struct inode *inode, u64 start, u64 end)
 	add_pending_csums(trans, inode, ordered_extent->file_offset,
 			  &ordered_extent->list);
 
-	ret = btrfs_ordered_update_i_size(inode, 0, ordered_extent);
-	if (!ret) {
-		ret = btrfs_update_inode(trans, root, inode);
-		BUG_ON(ret);
-	} else
-		btrfs_set_inode_last_trans(trans, inode);
+	btrfs_ordered_update_i_size(inode, 0, ordered_extent);
+	ret = btrfs_update_inode(trans, root, inode);
+	BUG_ON(ret);
 	ret = 0;
 out:
 	if (nolock) {
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..92fe5dd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3032,6 +3032,37 @@ out:
 	return ret;
 }
 
+static int check_logged_trans(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root, struct inode *inode)
+{
+	struct btrfs_inode_item *inode_item;
+	struct btrfs_path *path;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, root,
+				&BTRFS_I(inode)->location, path, 0, 0);
+	if (ret) {
+		if (ret > 0)
+			ret = 0;
+		goto out;
+	}
+
+	btrfs_unlock_up_safe(path, 1);
+	inode_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+				    struct btrfs_inode_item);
+
+	BTRFS_I(inode)->logged_trans = btrfs_inode_transid(path->nodes[0],
+							    inode_item);
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+
 static int inode_in_log(struct btrfs_trans_handle *trans,
 		 struct inode *inode)
 {
@@ -3084,6 +3115,18 @@ int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto end_no_trans;
 
+	/*
+	 * After we iput a inode and reread it from disk, logged_trans is 0.
+	 * However, this inode _may_ still remain in log tree and not be
+	 * committed yet.
+	 * So we need to check the log tree to get logged_trans a right value.
+	 */
+	if (!BTRFS_I(inode)->logged_trans && root->log_root) {
+		ret = check_logged_trans(trans, root->log_root, inode);
+		if (ret)
+			goto end_no_trans;
+	}
+
 	if (inode_in_log(trans, inode)) {
 		ret = BTRFS_NO_LOG_SYNC;
 		goto end_no_trans;

thanks,
liubo
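The idea behind check_logged_trans() is that the log tree, not the in-memory inode, is the durable record of whether an inode was logged. A toy userspace model of that recovery step, with hypothetical types standing in for the inode and the log tree (a linear scan replaces btrfs_search_slot()):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode: logged_trans is lost on iput() + re-read. */
struct mem_inode {
	uint64_t ino;
	uint64_t logged_trans; /* 0 after the inode was evicted and re-read */
};

/* Hypothetical log tree entry: an inode item still sitting in the log. */
struct log_inode_item {
	uint64_t ino;
	uint64_t transid;
};

/*
 * Analogue of the proposed check: if logged_trans is 0 and a log exists,
 * look the inode up in the log and restore the transid from its item.
 * Returns 1 if the inode was found in the log, 0 otherwise.
 */
static int restore_logged_trans(struct mem_inode *inode,
				const struct log_inode_item *log_items,
				size_t n)
{
	if (inode->logged_trans != 0 || log_items == NULL)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (log_items[i].ino == inode->ino) {
			inode->logged_trans = log_items[i].transid;
			return 1;
		}
	}
	return 0;
}
```

An inode that is not found keeps logged_trans at 0, which corresponds to the patch treating a "not found" search result (ret > 0) as success with nothing to restore.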
Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff
On 05/24/2011 11:56 PM, liubo wrote: Second, we use the generation number of the super to read in the log tree root after a crash. This doesn't always match the sub trans id and so it doesn't always match the transid stored in the btree blocks. There are a few solutions to this, we can use some of the reserved fields in the super for the generation numbers of the roots the super points to, and use whichever one is bigger when we read things in. All right, I'm going to dig into it more.

I've got this resolved via the 'log_root_transid' field of 'struct btrfs_super_block', and it looks good both syntactically and functionally. :)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ac8d2ac..1006898 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2103,6 +2103,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (btrfs_super_log_root(disk_super) != 0 &&
 	    !(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
 		u64 bytenr = btrfs_super_log_root(disk_super);
+		u64 log_root_transid = btrfs_super_log_root_transid(disk_super);
 
 		if (fs_devices->rw_devices == 0) {
 			printk(KERN_WARNING "Btrfs log replay required "
@@ -2125,7 +2126,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
 		log_tree_root->node = read_tree_block(tree_root, bytenr,
 						      blocksize,
-						      generation + 1);
+						      log_root_transid);
 		ret = btrfs_recover_log_trees(log_tree_root);
 		BUG_ON(ret);
 
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..b304ec1 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2089,6 +2089,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 				log_root_tree->node->start);
 	btrfs_set_super_log_root_level(root->fs_info->super_for_commit,
 				btrfs_header_level(log_root_tree->node));
+	btrfs_set_super_log_root_transid(root->fs_info->super_for_commit,
+					 trans->transid);
 
 	log_root_tree->log_batch = 0;
 	log_root_tree->log_transid++;

thanks,
liubo
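Why `generation + 1` stops being a safe guess: once sub transids advance inside a transaction, the log root block can carry a transid larger than the last committed generation plus one. A toy model of the expected-transid check, with hypothetical names and simplified equality semantics, comparing the old and patched choices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields involved. */
struct super_sketch {
	uint64_t generation;       /* last fully committed transaction */
	uint64_t log_root_transid; /* transid saved by btrfs_sync_log() */
};

/* Simplified analogue of the parent transid check on a block read. */
static bool transid_matches(uint64_t header_gen, uint64_t expected)
{
	return header_gen == expected;
}

/* Old behaviour: assume the log root was written in generation + 1. */
static uint64_t log_root_expected_old(const struct super_sketch *sb)
{
	return sb->generation + 1;
}

/* Patched behaviour: use the transid saved next to the log root. */
static uint64_t log_root_expected_new(const struct super_sketch *sb)
{
	return sb->log_root_transid;
}
```

If the log was synced during sub transid 103 after generation 100 was committed, the old guess expects 101 and the check fails, while the saved value matches.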
Re: [PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine
On 05/01/2011 11:35 AM, Steven Rostedt wrote: On Fri, 2011-04-29 at 18:01 +0800, liubo wrote: ping? Sorry, I've been trying to get the new ftrace function tracer features out ASAP. I plan on looking at this when I'm done. Thanks,

Hi, Steven,

I've seen your latest git pull, but these 2 patches are not included yet. Is there any problem with them? If so, I'd be glad to help. :)

thanks,
liubo

-- Steve

On 04/19/2011 09:35 AM, liubo wrote: Filesystems like Btrfs have some ULL macros, and when these macros are passed to a tracepoint's __print_symbolic(), there are 64->32 truncation warnings when compiling on a 32-bit box.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12 ++++++++++++
 include/trace/ftrace.h       |   13 +++++++++++++
 kernel/trace/trace_output.c  |   27 +++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 47e3997..efb2330 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -16,6 +16,11 @@ struct trace_print_flags {
 	const char		*name;
 };
 
+struct trace_print_flags_u64 {
+	unsigned long long	mask;
+	const char		*name;
+};
+
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 				   unsigned long flags,
 				   const struct trace_print_flags *flag_array);
@@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 				     const struct trace_print_flags *symbol_array);
 
+#if BITS_PER_LONG == 32
+const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
+					 unsigned long long val,
+					 const struct trace_print_flags_u64
+					 *symbol_array);
+#endif
+
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 				 const unsigned char *buf, int len);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 3e68366..533c49f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -205,6 +205,19 @@
 		ftrace_print_symbols_seq(p, value, symbols);		\
 	})
 
+#undef __print_symbolic_u64
+#if BITS_PER_LONG == 32
+#define __print_symbolic_u64(value, symbol_array...)			\
+	({								\
+		static const struct trace_print_flags_u64 symbols[] =	\
+			{ symbol_array, { -1, NULL } };			\
+		ftrace_print_symbols_seq_u64(p, value, symbols);	\
+	})
+#else
+#define __print_symbolic_u64(value, symbol_array...)			\
+			__print_symbolic(value, symbol_array)
+#endif
+
 #undef __print_hex
 #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 02272ba..b783504 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 }
 EXPORT_SYMBOL(ftrace_print_symbols_seq);
 
+#if BITS_PER_LONG == 32
+const char *
+ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
+			 const struct trace_print_flags_u64 *symbol_array)
+{
+	int i;
+	const char *ret = p->buffer + p->len;
+
+	for (i = 0;  symbol_array[i].name; i++) {
+
+		if (val != symbol_array[i].mask)
+			continue;
+
+		trace_seq_puts(p, symbol_array[i].name);
+		break;
+	}
+
+	if (!p->len)
+		trace_seq_printf(p, "0x%llx", val);
+
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
+#endif
+
 const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
 {
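Stripped of the trace_seq plumbing, the u64 variant is a table lookup keyed on `unsigned long long` with a hex fallback, so 64-bit constants such as btrfs objectids survive on 32-bit builds where `unsigned long` is 32 bits. A userspace sketch with hypothetical names, using snprintf() in place of trace_seq_puts()/trace_seq_printf():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for struct trace_print_flags_u64 from the patch. */
struct print_flags_u64 {
	unsigned long long mask;
	const char *name;
};

/*
 * Look up a 64-bit value in a NULL-name-terminated table and write its
 * symbolic name into buf; fall back to hex when nothing matches.
 */
static const char *print_symbolic_u64(char *buf, size_t len,
				      unsigned long long val,
				      const struct print_flags_u64 *syms)
{
	for (size_t i = 0; syms[i].name; i++) {
		if (syms[i].mask == val) {
			snprintf(buf, len, "%s", syms[i].name);
			return buf;
		}
	}
	snprintf(buf, len, "0x%llx", val);
	return buf;
}
```

The sketch decides on the fallback by whether a name matched. The patch instead tests `!p->len`, i.e. whether the whole trace_seq is still empty, which may be worth double-checking for events that print other fields before this one.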
Backref walking utilities
Hi,

As one of my plans, I'm going to take over this project unless someone is already working on it.

From the wiki, quote:
Backref walking utilities
Given a block number on a disk, the Btrfs metadata can find all the files and directories that use or care about that block. Some utilities to walk these back refs and print the results would help debug corruptions.
Given an inode, the Btrfs metadata can find all the directories that point to the inode. We should have utils to walk these back refs as well.
end quote.

And I have some thoughts to share with you:
- Clearly, this is going to be another command. Just like btrfs-debug-tree, btrfs-walk-backref also needs to be able to track btrfs's metadata in a) the offline situation (unmounted), or b) the corrupted situation.
- For a block number, the main goal is to find the related extent backrefs. Shared blocks will probably make things more complex.
- For an inode, the main goal is to find the related inode refs. We should be careful about a) an inode with hard links and b) snapshots.

Did I miss or misunderstand something? Any comments are welcome. :)

thanks,
liubo
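The inode half of the plan boils down to collecting every (parent directory, name) reference that points at an inode, and the hard-link caveat means there can be several. A toy in-memory model with hypothetical names; it does not follow the on-disk INODE_REF format or any btrfs-progs API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy inode reference: (inode, parent directory, name). Not on-disk format. */
struct inode_ref {
	uint64_t ino;
	uint64_t parent_dir;
	const char *name;
};

/*
 * Collect every directory that links to ino into dirs[] (at most max).
 * An inode with hard links has several refs, so all are returned,
 * not just the first. Returns the number of directories found.
 */
static size_t find_parent_dirs(const struct inode_ref *refs, size_t n,
			       uint64_t ino, uint64_t *dirs, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n && found < max; i++)
		if (refs[i].ino == ino)
			dirs[found++] = refs[i].parent_dir;
	return found;
}
```

Snapshots, the second caveat, would add another dimension (the same inode number in several roots), which this single-table sketch deliberately leaves out.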
Re: [PATCH 0/9] Btrfs: improve write ahead log with sub transaction
On 05/23/2011 12:43 PM, Josef Bacik wrote: On 05/19/2011 04:11 AM, Liu Bo wrote: I've been working to try to improve the write-ahead log's performance, and I found that the bottleneck addresses in the checksum items, especially when we want to make a random write on a large file, e.g a 4G file. Then a idea for this suggested by Chris is to use sub transaction ids and just to log the part of inode that had changed since either the last log commit or the last transaction commit. And as we also push the sub transid into the btree blocks, we'll get much faster tree walks. As a result, we abandon the original brute force approach, which is to delete all items of the inode in log, to making sure we get the most uptodate copies of everything, and instead we manage to find and merge, i.e. finding extents in the log tree and merging in the new extents from the file. This patchset puts the above idea into code, and although the code is now more complex, it brings us a great deal of performance improvement. Beside the improvement of log, patch 8 fixes a small but critical bug of log code with sub transaction. Here I have some test results to show, I use sysbench to do random write + fsync. === sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run] === Sysbench args: - Number of threads: 1 - Extra file open flags: 0 - 2 files, 4Gb each - Block size 4Kb - Number of random requests for random IO: 1 - Read/Write ratio for combined random IO test: 1.50 - Periodic FSYNC enabled, calling fsync() each 100 requests. - Calling fsync() at the end of test, Enabled. 
- Using synchronous I/O mode
- Doing random write test
Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 4.3621Mb/sec) 1116.71 Requests/sec executed

Have you run powerfail tests with this? I'd like to make sure you haven't inadvertently messed something up. Thanks,

Yes, I've done this before. Nothing serious showed up apart from a few "parent transid verify failed" messages, the same as Chris mentioned in the thread.

thanks,
liubo

Josef
Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff
On 05/23/2011 10:40 AM, Chris Mason wrote: Excerpts from Chris Mason's message of 2011-05-19 20:23:29 -0400: Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400: Introduce a new concept, the sub transaction. The relation between transaction and sub transaction is:

transaction A      ---> transid = x
  sub trans a(1)   ---> sub_transid = x+1
  sub trans a(2)   ---> sub_transid = x+2
  ... ...
  sub trans a(n-1) ---> sub_transid = x+n-1
  sub trans a(n)   ---> sub_transid = x+n
transaction B      ---> transid = x+n+1
... ...

And the most important points are:
a) a trans handle's transid now gets its value from the sub transid instead of the transid.
b) when a transaction commits, the transid is not necessarily increased by 1; it depends on the biggest sub transaction of the previous transaction, i.e. B->transid = a(n)->transid + 1, so (B->transid - A->transid) >= 1.
c) we start a new sub transaction after an fsync.

We also switch some uses of 'trans->transid' to 'trans->transaction->transid' to ensure btrfs works well and to get rid of WARNings. These are used for the new log code.

This is exactly what I had in mind. I need to read it harder and make sure it interacts well with the directory logging code, but I love it.

Ok, I hit a few problems with this, and since the transids are used everywhere for various reasons, I think we need to wait until 2.6.41. This code is really very close to right, but we have the delayed inode work, scrub, and the new inode number allocator all at once. I'd like to limit the size of the changes.

I agree. In fact, I'm a little worried, because transids play such an important role in btrfs that changing them is dangerous, so this deserves more testing.

The problems I hit: When an inode is dropped from cache (just via iput) and then read in again, the BTRFS_I(inode)->logged_trans goes back to zero. When this happens the logging code assumes the inode isn't in the log and hits -EEXIST if it finds inode items.

OK, I've found where the problem lies. It's because I put a check comparing logged_trans with the transaction id, intended to filter out inodes that are being logged for the first time, and I'm sorry for not taking the iput case into consideration. It's easy to fix: we can just drop this check and add another one while searching for 'BTRFS_INODE_ITEM_KEY', because if we cannot find an inode item in a tree, this inode is definitely not in that tree. So I'd like to make some changes like this patch (_UNTESTED_):

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 912397c..69ddbbd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2569,10 +2569,6 @@ static int prepare_for_merge_items(struct btrfs_trans_handle *trans,
 	int i;
 	int ret;
 
-	/* There are no relative items of the inode in log. */
-	if (BTRFS_I(inode)->logged_trans < trans->transaction->transid)
-		return 0;
-
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
@@ -2622,6 +2618,11 @@ static int prepare_for_merge_items(struct btrfs_trans_handle *trans,
 
 		if (ret > 0) {
 			btrfs_release_path(log, path);
+
+			/* There are no relative items of the inode in log. */
+			if (key.type == BTRFS_INODE_ITEM_KEY)
+				break;
+
 			continue;
 		}
 

I patched it to just delete away all the logged items if the logged transid wasn't set, which is probably safest given that we can now reuse inode numbers.

Second, we use the generation number of the super to read in the log tree root after a crash. This doesn't always match the sub trans id and so it doesn't always match the transid stored in the btree blocks. There are a few solutions to this, we can use some of the reserved fields in the super for the generation numbers of the roots the super points to, and use whichever one is bigger when we read things in.

All right, I'm going to dig into it more.

Liubo, since we'll leave this one for .41, I'll take your smaller patch that just skips the csum items.

OK, I see. Thanks a lot for the review. :)

thanks,
liubo

-chris
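The numbering scheme in the "sub transaction" patch description can be captured with two counters. A minimal sketch with hypothetical names, following the table above: sub transactions count up from the transaction id, and a commit jumps to one past the last sub transid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters following the numbering described in the patch. */
struct trans_counter {
	uint64_t transid;     /* id of the running transaction */
	uint64_t sub_transid; /* latest sub transaction id handed out */
};

/* Start a new sub transaction (e.g. after an fsync): next id in sequence. */
static uint64_t start_sub_trans(struct trans_counter *c)
{
	return ++c->sub_transid;
}

/* Commit: the next transaction id is one past the largest sub transid. */
static uint64_t commit_trans(struct trans_counter *c)
{
	c->transid = c->sub_transid + 1;
	c->sub_transid = c->transid;
	return c->transid;
}
```

Starting from transaction A with x = 10, two sub transactions get 11 and 12, and transaction B gets 13 = x + n + 1 with n = 2, so B's id exceeds A's by 3 rather than 1, which is exactly why code that assumes "generation + 1" needed attention.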
Re: [PATCH v1 2/5] btrfs: state information for readahead
On 05/23/2011 08:59 AM, Arne Jansen wrote:
Add state information for readahead to btrfs_fs_info and btrfs_device

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/ctree.h   |    4
 fs/btrfs/disk-io.c |    4
 fs/btrfs/volumes.c |    8
 fs/btrfs/volumes.h |    8
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2e61fe1..4a33e30 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1079,6 +1079,10 @@ struct btrfs_fs_info {
 
 	/* filesystem state */
 	u64 fs_state;
+
+	/* readahead tree */
+	spinlock_t reada_lock;
+	struct radix_tree_root reada_tree;
 };
 
 /*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7753eb9..3d4f9c5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1803,6 +1803,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	fs_info->max_inline = 8192 * 1024;
 	fs_info->metadata_ratio = 0;
 
+	/* readahead state */
+	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS);
+	spin_lock_init(&fs_info->reada_lock);
+
 	fs_info->thread_pool_size = min_t(unsigned long,
 					  num_online_cpus() + 2, 8);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8b9fb8c..800e670 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -396,6 +396,14 @@ static noinline int device_list_add(const char *path,
 		}
 		INIT_LIST_HEAD(&device->dev_alloc_list);
 
+		/* init readahead state */
+		spin_lock_init(&device->reada_lock);
+		device->reada_curr_zone = NULL;
+		atomic_set(&device->reada_in_flight, 0);
+		device->reada_next = 0;
+		INIT_RADIX_TREE(&device->reada_zones, GFP_NOFS);
+		INIT_RADIX_TREE(&device->reada_extents, GFP_NOFS);
+
 		mutex_lock(&fs_devices->device_list_mutex);
 		list_add(&device->dev_list, &fs_devices->devices);
 		mutex_unlock(&fs_devices->device_list_mutex);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index cc2eada..33acd4e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -86,6 +86,14 @@ struct btrfs_device {
 	u8 uuid[BTRFS_UUID_SIZE];
 
 	struct btrfs_work work;
+
+	/* readahead state */
+	spinlock_t reada_lock;
+	atomic_t reada_in_flight;
+	u64 reada_next;
+	struct reada_zone *reada_curr_zone;

struct reada_zone has not been defined yet...

thanks,
liubo

+	struct radix_tree_root reada_zones;
+	struct radix_tree_root reada_extents;
 };
 
 struct btrfs_fs_devices {
Re: [PATCH 0/9] Btrfs: improve write ahead log with sub transaction
On 05/19/2011 04:11 PM, Liu Bo wrote:
I've been working on improving the write-ahead log's performance, and I found that the bottleneck lies in the checksum items, especially for random writes to a large file, e.g. a 4G file. An idea Chris suggested for this is to use sub transaction ids and log only the part of the inode that has changed since either the last log commit or the last transaction commit. And since we also push the sub transid into the btree blocks, we get much faster tree walks.

As a result, we abandon the original brute-force approach, which deletes all items of the inode in the log to make sure we get the most up-to-date copies of everything; instead we find and merge, i.e. find extents in the log tree and merge in the new extents from the file.

This patchset puts the above idea into code, and although the code is now more complex, it brings a great deal of performance improvement. Besides the log improvement, patch 8 fixes a small but critical bug in the log code with sub transactions.

Here are some test results; I used sysbench to do random write + fsync.

===
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run]
===
Sysbench args:
- Number of threads: 1
- Extra file open flags: 0
- 2 files, 4Gb each
- Block size 4Kb
- Number of random requests for random IO: 10000
- Read/Write ratio for combined random IO test: 1.50
- Periodic FSYNC enabled, calling fsync() each 100 requests.
- Calling fsync() at the end of test, Enabled.
- Using synchronous I/O mode
- Doing random write test

Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 4.3621Mb/sec) 1116.71 Requests/sec executed

Liu Bo (10):
  Btrfs: introduce sub transaction stuff
  Btrfs: modify should_cow_block to update block's generation
  Btrfs: modify btrfs_drop_extents API
  Btrfs: introduce first sub trans
  Btrfs: still update inode transid when size remains unchanged
  Btrfs: main log stuff
  Btrfs: add checksum check for log
  Btrfs: fix a bug of log check
  Btrfs: kick off useless code
  Btrfs: ship trans->transid to trans->transaction->transid

 fs/btrfs/btrfs_inode.h |   12 ++-
 fs/btrfs/ctree.c       |   71 ++-
 fs/btrfs/ctree.h       |    5 +-
 fs/btrfs/disk-io.c     |    9 +-
 fs/btrfs/extent-tree.c |   10 ++-
 fs/btrfs/file.c        |   22 ++---
 fs/btrfs/inode.c       |   28 --
 fs/btrfs/ioctl.c       |    6 +-
 fs/btrfs/relocation.c  |    6 +-
 fs/btrfs/transaction.c |   13 ++-
 fs/btrfs/transaction.h |   19 -
 fs/btrfs/tree-defrag.c |    2 +-
 fs/btrfs/tree-log.c    |  222 ---
 13 files changed, 279 insertions(+), 146 deletions(-)

Sorry, that summary was wrong; here is the right one:

Liu Bo (9):
  Btrfs: introduce sub transaction stuff
  Btrfs: update block generation if should_cow_block fails
  Btrfs: modify btrfs_drop_extents API
  Btrfs: introduce first sub trans
  Btrfs: still update inode trans stuff when size remains unchanged
  Btrfs: improve log with sub transaction
  Btrfs: add checksum check for log
  Btrfs: fix a bug of log check
  Btrfs: kick off useless code

 fs/btrfs/btrfs_inode.h |   12 ++-
 fs/btrfs/ctree.c       |   69 +++
 fs/btrfs/ctree.h       |    5 +-
 fs/btrfs/disk-io.c     |    9 +-
 fs/btrfs/extent-tree.c |   10 ++-
 fs/btrfs/file.c        |   22 ++---
 fs/btrfs/inode.c       |   28 --
 fs/btrfs/ioctl.c       |    6 +-
 fs/btrfs/relocation.c  |    6 +-
 fs/btrfs/transaction.c |   13 ++-
 fs/btrfs/transaction.h |   19 -
 fs/btrfs/tree-defrag.c |    2 +-
 fs/btrfs/tree-log.c    |  222 ---
 13 files changed, 282 insertions(+), 141 deletions(-)
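As a quick consistency check on the sysbench numbers above, assuming sysbench's "Kb"/"Mb" mean KiB/MiB: the reported speed is the request rate times the 4 KiB block size, the total written corresponds to 10000 write requests, and the speedup is roughly tenfold:

```latex
\begin{aligned}
112.75\ \text{req/s} \times 4\ \text{KiB} &= 451.0\ \text{KiB/s} \\
1116.71\ \text{req/s} \times 4\ \text{KiB} &= 4466.84\ \text{KiB/s} \approx 4.3621\ \text{MiB/s} \\
\frac{1116.71}{112.75} &\approx 9.90 \\
\frac{39.0625 \times 1024\ \text{KiB}}{4\ \text{KiB}} &= 10000\ \text{writes}
\end{aligned}
```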
Re: [PATCH 1/9] Btrfs: introduce sub transaction stuff
On 05/20/2011 08:23 AM, Chris Mason wrote:
Excerpts from Liu Bo's message of 2011-05-19 04:11:24 -0400:
Introduce a new concept, the sub transaction. The relation between a transaction and its sub transactions is:

transaction A      --- transid = x
  sub trans a(1)   --- sub_transid = x+1
  sub trans a(2)   --- sub_transid = x+2
  ...
  sub trans a(n-1) --- sub_transid = x+n-1
  sub trans a(n)   --- sub_transid = x+n
transaction B      --- transid = x+n+1
  ...

The most important points are:
a) a trans handle's transid now takes its value from the sub transid instead of the transid;
b) when a transaction commits, the transid is not necessarily incremented by 1; it depends on the biggest sub transid of the previous transaction, i.e. B->transid = a(n)->transid + 1, so (B->transid - A->transid) >= 1;
c) we start a new sub transaction after an fsync.

We also change some 'trans->transid' uses to 'trans->transaction->transid' to ensure btrfs works well and to get rid of WARNings.

These are used for the new log code.

This is exactly what I had in mind. I need to read it harder and make sure it interacts well with the directory logging code, but I love it. Thanks!

It's so great that you like it. :)

But I must NOTE again: due to the bug that patch 8 fixed, the performance statistics I posted some time ago, like (*SPEED* : 4.7+ Mb/sec), are invalid and can no longer be used as a basis...

I hope more people can test the patchset.

thanks,
liubo

-chris
Re: crash in btrfsck, btrfs-debug-tree, etc
On 05/04/2010 05:28 AM, Vladimir G. Ivanovic wrote:
No help, eh? At the minimum, it would be nice if btrfsck were fixed...

Not sure whether the following will help you view the metadata, but you can give it a try and keep using btrfs-debug-tree.

diff --git a/disk-io.c b/disk-io.c
index a6e1000..90f2831 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -204,12 +204,8 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr,
 		eb->dev_bytenr = multi->stripes[0].physical;
 		kfree(multi);
 		ret = read_extent_from_disk(eb);
-		if (ret == 0 && check_tree_block(root, eb) == 0 &&
-		    csum_tree_block(root, eb, 1) == 0 &&
-		    verify_parent_transid(eb->tree, eb, parent_transid) == 0) {
-			btrfs_set_buffer_uptodate(eb);
+		if (ret == 0)
 			return eb;
-		}
 		num_copies = btrfs_num_copies(&root->fs_info->mapping_tree,
 					      eb->start, eb->len);
 		if (num_copies == 1) {

thanks,
liubo.

Unfortunately, now btrfs will NOT mount the drive, so I am now completely without data. The mount error is:

kernel: device fsid c64b56bd1c869bb3-e85f95a29c7dd3ad devid 1 transid 21547 /dev/sdc1
kernel: btrfs bad tree block start 14052438117991321731 20971520
kernel: btrfs bad tree block start 14052438117991321731 20971520
kernel: btrfs bad tree block start 8532476744452893537 20971520
kernel: btrfs: failed to read chunk root on sdc1
kernel: btrfs: open_ctree failed

---
Vladimir

Vladimir G. Ivanovic    http://www.leonora.org
+1 650 450 4101         vladi...@acm.org

on 04/28/2010 01:03 PM Vladimir G. Ivanovic said the following:
I overwrote some part of the first 195641856 bytes of a 1TB (nominal) btrfs volume. (I CTRL-C'd out before dd finished.) OK, OK, you may stop laughing now. Surely something similar has happened to you. No? Then it will, someday.

First things first: a huge congratulations to the btrfs team, because the btrfs volume is still usable. I do get many errors similar to:

kernel: btrfs bad tree block start 3050544144921548175 12056985

but for many of my files, I don't get errors.

Now, onto my problems.

My first thought was to btrfsck the unmounted volume, but btrfsck crashes:

# btrfsck /dev/sdc1
btrfsck: disk-io.c:723: open_ctree_fd: Assertion `!(!chunk_root->node)' failed.
Aborted (core dumped)

So does btrfs-debug-tree, and I suspect other utilities will as well. I tried the latest utilities from btrfs-progs-unstable, but they too crash with the same error. (I'm on an Athlon64-powered netbook running Fedora 12. btrfs's version is 0.19.) In particular, so does btrfs-image, so I can't share the volume's metadata.

So, until the utilities are fixed, what are my options?

* Can I create a snapshot of the root volume? Would I end up with everything that could be read in the snapshot, or would it also have errors? If this is a good idea, would these commands do the trick, assuming that btrfsctl doesn't also crash?

    btrfsctl -s snapshot_of_root /mnt/chopin1
    mount.btrfs -o subvol=snapshot_of_root /dev/sdc1 /mnt/snap

  Then what? Copy the snapshot to another disk? Somehow make the new snapshot the new root, allowing me to delete the old root?

* Should I just try to copy the data to another disk and reformat my current volume?

* Is there a way of testing whether a particular file is good other than (slowly) going through each and every file while watching syslog? cat, for example, doesn't return an error when the file is bad, so I don't think I can write a shell script to copy good files to another volume.

Are there other options that I haven't considered?

Thanks for any help.

---
Vladimir
Re: [PATCH] btrfs progs: fix extra metadata chunk allocation in --mixed case
On 05/05/2011 10:16 PM, Arne Jansen wrote:
When creating a mixed fs with mkfs, an extra metadata chunk got allocated. This is because btrfs_reserve_extent calls do_chunk_alloc for METADATA, which in turn wasn't able to find the proper space_info, as __find_space_info did a hard compare of the flags. It is now sufficient for the space_info to include the proper flag. This reflects the change done to the kernel code to support mixed chunks.
Also, for a subsequent chunk allocation (which should not be hit in the mkfs case), the chunk is now created with the flags from the space_info instead of the requested flags. A better solution would be to pull the full changeset for the mixed case from the kernel into user mode (or, even better, share the code).
The additional chunk probably confused the block_rsv calculation, which in turn led to several ENOSPC Oopses.

Good catch!

Reviewed-by: Liu Bo liubo2...@cn.fujitsu.com

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 extent-tree.c |    7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index b2f9bb2..c6c77c6 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1735,7 +1735,7 @@ static struct btrfs_space_info *__find_space_info(struct btrfs_fs_info *info,
 	struct btrfs_space_info *found;
 	list_for_each(cur, head) {
 		found = list_entry(cur, struct btrfs_space_info, list);
-		if (found->flags == flags)
+		if (found->flags & flags)
 			return found;
 	}
 	return NULL;
@@ -1812,7 +1812,8 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	    thresh)
 		return 0;
 
-	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes, flags);
+	ret = btrfs_alloc_chunk(trans, extent_root, &start, &num_bytes,
+				space_info->flags);
 	if (ret == -ENOSPC) {
 		space_info->full = 1;
 		return 0;
@@ -1820,7 +1821,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 
 	BUG_ON(ret);
 
-	ret = btrfs_make_block_group(trans, extent_root, 0, flags,
+	ret = btrfs_make_block_group(trans, extent_root, 0, space_info->flags,
 				     BTRFS_FIRST_CHUNK_TREE_OBJECTID, start, num_bytes);
 	BUG_ON(ret);
 	return 0;
[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
The current code relogs the entire inode on every fsync, which suits small files much better than large ones. During my performance tests, fsync performance on large files was poor, and we can attribute this to the huge number of csum items large files carry: we have to flush all of them into the log tree even when there is only _one_ change in the whole file.

So, to optimize fsync, we need a filter that skips the unnecessary csum items, i.e. those whose file data has not changed before this fsync.

Here are some test results; I used sysbench to do random write + fsync.

===
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= [prepare, run]
===
Sysbench args:
- Number of threads: 1
- Extra file open flags: 0
- 2 files, 4Gb each
- Block size 4Kb
- Number of random requests for random IO: 10000
- Read/Write ratio for combined random IO test: 1.50
- Periodic FSYNC enabled, calling fsync() each 100 requests.
- Calling fsync() at the end of test, Enabled.
- Using synchronous I/O mode
- Doing random write test

Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 4.7533Mb/sec) 1216.84 Requests/sec executed

PS: I've also written a _sub transid_ patch, but it does not perform as well as this one; I'm wondering where the problem is and trying to improve it.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/tree-log.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..b934a36 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2662,6 +2662,9 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 		extent = btrfs_item_ptr(src, start_slot + i,
 					struct btrfs_file_extent_item);
 
+		if (btrfs_file_extent_generation(src, extent) < trans->transid)
+			continue;
+
 		found_type = btrfs_file_extent_type(src, extent);
 		if (found_type == BTRFS_FILE_EXTENT_REG ||
 		    found_type == BTRFS_FILE_EXTENT_PREALLOC) {
-- 
1.6.5.2
Re: [PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine
ping?

On 04/19/2011 09:35 AM, liubo wrote:
Filesystems like Btrfs have some ULL macros, and when these macros are passed to tracepoints' __print_symbolic(), there will be 64->32 truncation WARNINGS when compiling on a 32bit box.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12
 include/trace/ftrace.h       |   13 +
 kernel/trace/trace_output.c  |   27 +++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 47e3997..efb2330 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -16,6 +16,11 @@ struct trace_print_flags {
 	const char		*name;
 };
 
+struct trace_print_flags_u64 {
+	unsigned long long	mask;
+	const char		*name;
+};
+
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 				   unsigned long flags,
 				   const struct trace_print_flags *flag_array);
@@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 				     const struct trace_print_flags *symbol_array);
 
+#if BITS_PER_LONG == 32
+const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
+					 unsigned long long val,
+					 const struct trace_print_flags_u64
+					 *symbol_array);
+#endif
+
 const char *ftrace_print_hex_seq(struct trace_seq *p,
 				 const unsigned char *buf, int len);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 3e68366..533c49f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -205,6 +205,19 @@
 		ftrace_print_symbols_seq(p, value, symbols);		\
 	})
 
+#undef __print_symbolic_u64
+#if BITS_PER_LONG == 32
+#define __print_symbolic_u64(value, symbol_array...)			\
+	({								\
+		static const struct trace_print_flags_u64 symbols[] =	\
+			{ symbol_array, { -1, NULL } };			\
+		ftrace_print_symbols_seq_u64(p, value, symbols);	\
+	})
+#else
+#define __print_symbolic_u64(value, symbol_array...)			\
+			__print_symbolic(value, symbol_array)
+#endif
+
 #undef __print_hex
 #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 02272ba..b783504 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 }
 EXPORT_SYMBOL(ftrace_print_symbols_seq);
 
+#if BITS_PER_LONG == 32
+const char *
+ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
+			 const struct trace_print_flags_u64 *symbol_array)
+{
+	int i;
+	const char *ret = p->buffer + p->len;
+
+	for (i = 0;  symbol_array[i].name; i++) {
+
+		if (val != symbol_array[i].mask)
+			continue;
+
+		trace_seq_puts(p, symbol_array[i].name);
+		break;
+	}
+
+	if (!p->len)
+		trace_seq_printf(p, "0x%llx", val);
+
+	trace_seq_putc(p, 0);
+
+	return ret;
+}
+EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
+#endif
+
 const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
 {
Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
On 04/22/2011 09:28 AM, Chris Mason wrote:
Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
Chris Mason wrote:
Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
The current code relogs the entire inode on every fsync, which suits small files much better than large ones. During my performance tests, fsync performance on large files was poor, and we can attribute this to the huge number of csum items large files carry: we have to flush all of them into the log tree even when there is only _one_ change in the whole file.

So, to optimize fsync, we need a filter that skips the unnecessary csum items, i.e. those whose file data has not changed before this fsync.

Here are some test results; I used sysbench to do random write + fsync.

Sysbench args:
- Number of threads: 1
- Extra file open flags: 0
- 2 files, 4Gb each
- Block size 4Kb
- Number of random requests for random IO: 10000
- Read/Write ratio for combined random IO test: 1.50
- Periodic FSYNC enabled, calling fsync() each 100 requests.
- Calling fsync() at the end of test, Enabled.
- Using synchronous I/O mode
- Doing random write test

Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 5.1537Mb/sec) 1319.34 Requests/sec executed

Really nice results! Especially considering the small size of the patch. But I'd really like to look at using sub transaction ids for this, and then logging just the part of the inode that had changed since the last log commit. It's more complex, but it will also help reduce tree searches for the file items.

And this patch forgot to mention that it has a compatibility issue.

Right, at the very least we want to use just one bit of that field instead of all 8. But keeping a sub-transid and putting that in the generation field of the file extent instead can get us the same benefits without stealing the bits.

Nice. This is the first step of my plan.

As we push the sub transid into the btree blocks as well, we'll get much faster tree walks too. The penalty is in complexity in the logging code, since it will have to deal with finding extents in the log tree and merging in the new extents from the file.

I've been thinking about this extent buffer with sub transid approach for a while, and will give it a try. :)

thanks,
liubo.

-chris
[RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
The current code relogs the entire inode on every fsync, which suits small files much better than large ones. During my performance tests, fsync performance on large files was poor, and we can attribute this to the huge number of csum items large files carry: we have to flush all of them into the log tree even when there is only _one_ change in the whole file.

So, to optimize fsync, we need a filter that skips the unnecessary csum items, i.e. those whose file data has not changed before this fsync.

Here are some test results; I used sysbench to do random write + fsync.

Sysbench args:
- Number of threads: 1
- Extra file open flags: 0
- 2 files, 4Gb each
- Block size 4Kb
- Number of random requests for random IO: 10000
- Read/Write ratio for combined random IO test: 1.50
- Periodic FSYNC enabled, calling fsync() each 100 requests.
- Calling fsync() at the end of test, Enabled.
- Using synchronous I/O mode
- Doing random write test

Sysbench results:
===
Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
Read 0b Written 39.062Mb Total transferred 39.062Mb
===
a) without patch: (*SPEED* : 451.01Kb/sec) 112.75 Requests/sec executed
b) with patch: (*SPEED* : 5.1537Mb/sec) 1319.34 Requests/sec executed

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h    |   14 --
 fs/btrfs/inode.c    |    1 +
 fs/btrfs/tree-log.c |   31 +--
 3 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2e61fe1..300bea0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -642,6 +642,12 @@ struct btrfs_root_ref {
 #define BTRFS_FILE_EXTENT_REG 1
 #define BTRFS_FILE_EXTENT_PREALLOC 2
 
+/*
+ * used to indicate that this file extent has just been changed and
+ * its csums need to be updated when fsync tries to log this inode.
+ */
+#define BTRFS_FILE_EXTENT_CSUM_UPTODATE	(1 << 0)
+
 struct btrfs_file_extent_item {
 	/*
 	 * transaction id that created this extent
@@ -665,7 +671,9 @@ struct btrfs_file_extent_item {
 	 */
 	u8 compression;
 	u8 encryption;
-	__le16 other_encoding; /* spare for later use */
+	u8 other_encoding; /* spare for later use */
+
+	u8 flag;
 
 	/* are we inline data or a real extent? */
 	u8 type;
@@ -2026,7 +2034,9 @@ BTRFS_SETGET_FUNCS(file_extent_compression, struct btrfs_file_extent_item,
 BTRFS_SETGET_FUNCS(file_extent_encryption, struct btrfs_file_extent_item,
 		   encryption, 8);
 BTRFS_SETGET_FUNCS(file_extent_other_encoding, struct btrfs_file_extent_item,
-		   other_encoding, 16);
+		   other_encoding, 8);
+BTRFS_SETGET_FUNCS(file_extent_flag, struct btrfs_file_extent_item,
+		   flag, 8);
 
 /* this returns the number of file bytes represented by the inline item.
  * If an item is compressed, this is the uncompressed size
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a4157cf..ed4e318 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1660,6 +1660,7 @@ static int insert_reserved_file_extent(struct btrfs_trans_handle *trans,
 	btrfs_set_file_extent_compression(leaf, fi, compression);
 	btrfs_set_file_extent_encryption(leaf, fi, encryption);
 	btrfs_set_file_extent_other_encoding(leaf, fi, other_encoding);
+	btrfs_set_file_extent_flag(leaf, fi, BTRFS_FILE_EXTENT_CSUM_UPTODATE);
 
 	btrfs_unlock_up_safe(path, 1);
 	btrfs_set_lock_blocking(leaf);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c50271a..baa4a0a 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2591,11 +2591,24 @@ static int drop_objectid_items(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
+static inline int need_csum(struct extent_buffer *src,
+			    struct btrfs_file_extent_item *fi,
+			    u64 gen, int csum)
+{
+	if (csum &&
+	    (btrfs_file_extent_generation(src, fi) == gen) &&
+	    (btrfs_file_extent_flag(src, fi) & BTRFS_FILE_EXTENT_CSUM_UPTODATE))
+		return 1;
+
+	return 0;
+}
+
+
 static noinline int copy_items(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *log,
 			       struct btrfs_path *dst_path,
 			       struct extent_buffer *src,
-			       int start_slot, int nr, int inode_only)
+			       int start_slot, int nr, int inode_only, int csum)
 {
 	unsigned long src_offset;
 	unsigned long dst_offset;
@@ -2653,6 +2666,7 @@ static noinline int copy_items(struct btrfs_trans_handle *trans,
 			btrfs_set_inode_generation(dst_path->nodes[0], inode_item, 0);
 		}
+		/*
Re: [PATCH 1/1] btrfs: add missing spin_unlock to a rare exit path
Good catch!

thanks,
liubo

On 04/20/2011 08:34 PM, David Sterba wrote:
Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5e5d07c..25e4b8f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2825,6 +2825,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 
 	spin_lock(&delayed_refs->lock);
 	if (delayed_refs->num_entries == 0) {
+		spin_unlock(&delayed_refs->lock);
 		printk(KERN_INFO "delayed_refs has NO entry\n");
 		return ret;
 	}
Re: [RFC] Add a new file op for fsync to give fs's more control
On 04/16/2011 03:32 AM, Josef Bacik wrote:
On 04/15/2011 03:24 PM, Christoph Hellwig wrote:
Sorry, but this is too ugly to live. If the reason for this really is good enough we'll just need to push the filemap_write_and_wait_range and i_mutex locking into every ->fsync instance.

So part of what makes small fsyncs slow in btrfs is all of our random threads to make checksumming not suck. We submit IO, which spreads it out to helper threads to do the checksumming, and then when it returns it gets handed off to endio threads that run the endio stuff. This works awesome for big writes and such, but if, say, we're an RPM database and write a couple of kilobytes, this tends to suck because we keep handing work off to other threads and waiting, so the scheduling latencies really hurt. So we'd like to be able to say "hey, this is a small amount of IO, let's just do the checksumming in the current thread", and the same with handling the endio stuff. We can't do that currently because filemap_write_and_wait_range is called before we get to ->fsync. We'd like to be able to control this so we can do the appropriate magic to do the submission within the fsyncing thread's context in order to speed things up a bit. That plus the stuff I said about i_mutex. Is that a good enough reason to just push this down into all the filesystems? Thanks,

I'm fine with the i_mutex part. But I'm wondering whether this is worth doing: I've tested your patch with sysbench, and there is little improvement. :(

Sysbench args:
sysbench --test=fileio --num-threads=1 --file-num=10240 --file-block-size=1K --file-total-size=20M --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags= run
10240 files, 2Kb each

===
fsync_nolock (patch):
Operations performed: 0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b Written 9.7656Mb Total transferred 9.7656Mb (35.152Kb/sec)
35.15 Requests/sec executed

fsync (orig):
Operations performed: 0 Read, 10000 Write, 1024000 Other = 1034000 Total
Read 0b Written 9.7656Mb Total transferred 9.7656Mb (35.287Kb/sec)
35.29 Requests/sec executed
===

It seems the gain from avoiding thread hand-offs is not enough.

BTW, I'm trying to improve fsync performance too, but mainly for large files (4G). I found that a large file has a huge number of csum items that need to be flushed into the tree log during fsync(). Btrfs currently uses a brute-force approach to make sure it gets the most up-to-date copies of everything, and this results in bad performance. Changing the brute-force approach is bugging me a lot...

thanks,
liubo

Josef
[PATCH 1/2] E2fsprogs: use the generic inode flags
Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 debugfs/htree.c        |    2 +-
 e2fsck/pass1.c         |   22 +++---
 e2fsck/pass2.c         |    2 +-
 e2fsck/pass4.c         |    2 +-
 e2fsck/rehash.c        |    4 ++--
 ext2ed/inode_com.c     |   14 +++---
 lib/e2p/fgetflags.c    |    6 +++---
 lib/e2p/fsetflags.c    |    6 +++---
 lib/e2p/getflags.c     |    6 +++---
 lib/e2p/pf.c           |   34 +-
 lib/e2p/setflags.c     |    6 +++---
 lib/ext2fs/ext2_fs.h   |   44 ++--
 lib/ext2fs/link.c      |    4 ++--
 lib/ext2fs/mkjournal.c |    2 +-
 misc/chattr.c          |   26 +-
 misc/tune2fs.c         |    2 +-
 16 files changed, 91 insertions(+), 91 deletions(-)

diff --git a/debugfs/htree.c b/debugfs/htree.c
index 08f9749..cc9f0fb 100644
--- a/debugfs/htree.c
+++ b/debugfs/htree.c
@@ -243,7 +243,7 @@ void do_htree_dump(int argc, char *argv[])
 		goto errout;
 	}
 
-	if ((inode.i_flags & EXT2_BTREE_FL) == 0) {
+	if ((inode.i_flags & FS_BTREE_FL) == 0) {
 		com_err(argv[0], 0, "Not a hash-indexed directory");
 		goto errout;
 	}
diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
index 67dd986..5ba93ca 100644
--- a/e2fsck/pass1.c
+++ b/e2fsck/pass1.c
@@ -138,7 +138,7 @@ int e2fsck_pass1_check_device_inode(ext2_filsys fs EXT2FS_ATTR((unused)),
 	 * If the index flag is set, then this is a bogus
 	 * device/fifo/socket
 	 */
-	if (inode->i_flags & EXT2_INDEX_FL)
+	if (inode->i_flags & FS_INDEX_FL)
 		return 0;
 
 	/*
@@ -152,7 +152,7 @@ int e2fsck_pass1_check_device_inode(ext2_filsys fs EXT2FS_ATTR((unused)),
 	 * you can't set or clear immutable flags for devices.)  Once
 	 * the kernel has been fixed we can change this...
 	 */
-	if (inode->i_flags & (EXT2_IMMUTABLE_FL | EXT2_APPEND_FL)) {
+	if (inode->i_flags & (FS_IMMUTABLE_FL | FS_APPEND_FL)) {
 		for (i=4; i < EXT2_N_BLOCKS; i++)
 			if (inode->i_block[i])
 				return 0;
@@ -175,7 +175,7 @@ int e2fsck_pass1_check_symlink(ext2_filsys fs, ext2_ino_t ino,
 	struct ext2fs_extent	extent;
 
 	if ((inode->i_size_high || inode->i_size == 0) ||
-	    (inode->i_flags & EXT2_INDEX_FL))
+	    (inode->i_flags & FS_INDEX_FL))
 		return 0;
 
 	if (inode->i_flags & EXT4_EXTENTS_FL) {
@@ -235,7 +235,7 @@ int e2fsck_pass1_check_symlink(ext2_filsys fs, ext2_ino_t ino,
  * If the immutable (or append-only) flag is set on the inode, offer
  * to clear it.
  */
-#define BAD_SPECIAL_FLAGS (EXT2_IMMUTABLE_FL | EXT2_APPEND_FL)
+#define BAD_SPECIAL_FLAGS (FS_IMMUTABLE_FL | FS_APPEND_FL)
 static void check_immutable(e2fsck_t ctx, struct problem_context *pctx)
 {
 	if (!(pctx->inode->i_flags & BAD_SPECIAL_FLAGS))
@@ -989,7 +989,7 @@ void e2fsck_pass1(e2fsck_t ctx)
 		     EXT4_FEATURE_RO_COMPAT_HUGE_FILE) &&
 		    (inode->osd2.linux2.l_i_blocks_hi != 0))
 			mark_inode_bad(ctx, ino);
-		if (inode->i_flags & EXT2_IMAGIC_FL) {
+		if (inode->i_flags & FS_IMAGIC_FL) {
 			if (imagic_fs) {
 				if (!ctx->inode_imagic_map)
 					alloc_imagic_map(ctx);
@@ -997,7 +997,7 @@ void e2fsck_pass1(e2fsck_t ctx)
 							 ino);
 			} else {
 				if (fix_problem(ctx, PR_1_SET_IMAGIC, &pctx)) {
-					inode->i_flags &= ~EXT2_IMAGIC_FL;
+					inode->i_flags &= ~FS_IMAGIC_FL;
 					e2fsck_write_inode(ctx, ino,
 							   inode, "pass1");
 				}
@@ -1893,13 +1893,13 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 	extent_fs = (ctx->fs->super->s_feature_incompat &
 		     EXT3_FEATURE_INCOMPAT_EXTENTS);
 
-	if (inode->i_flags & EXT2_COMPRBLK_FL) {
+	if (inode->i_flags & FS_COMPRBLK_FL) {
 		if (fs->super->s_feature_incompat &
 		    EXT2_FEATURE_INCOMPAT_COMPRESSION)
 			pb.compressed = 1;
 		else {
 			if (fix_problem(ctx, PR_1_COMPR_SET, pctx)) {
-				inode->i_flags &= ~EXT2_COMPRBLK_FL;
+				inode->i_flags &= ~FS_COMPRBLK_FL;
 				dirty_inode++;
 			}
 		}
@@ -1940,9 +1940,9 @@ static void check_blocks(e2fsck_t ctx, struct problem_context *pctx,
 		return;
 	}
 
-	if (inode->i_flags & EXT2_INDEX_FL)
{ + if (inode-i_flags FS_INDEX_FL) { if (handle_htree(ctx, pctx, ino,
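The patch only swaps names: each EXT2_*_FL it replaces occupies the same bit as the generic FS_*_FL. A minimal C sketch of that assumption follows; the values are copied by hand from the kernel's <linux/fs.h> and e2fsprogs' ext2_fs.h (check them against your own headers), and the two helpers are hypothetical, mirroring the checks in debugfs/htree.c and e2fsck/pass1.c.

```c
#include <stdint.h>

/* Generic values as in the kernel's <linux/fs.h> (copied so the sketch
 * builds without kernel headers). */
#define FS_IMMUTABLE_FL 0x00000010u
#define FS_APPEND_FL    0x00000020u
#define FS_COMPRBLK_FL  0x00000200u
#define FS_BTREE_FL     0x00001000u
#define FS_INDEX_FL     0x00001000u /* deliberately the same bit as FS_BTREE_FL */
#define FS_IMAGIC_FL    0x00002000u

/* Legacy names as in e2fsprogs' lib/ext2fs/ext2_fs.h, before the rename. */
#define EXT2_IMMUTABLE_FL 0x00000010u
#define EXT2_APPEND_FL    0x00000020u
#define EXT2_COMPRBLK_FL  0x00000200u
#define EXT2_BTREE_FL     0x00001000u
#define EXT2_INDEX_FL     0x00001000u
#define EXT2_IMAGIC_FL    0x00002000u

/* If any of these failed, the rename would change on-disk meaning. */
_Static_assert(EXT2_IMMUTABLE_FL == FS_IMMUTABLE_FL, "immutable bit differs");
_Static_assert(EXT2_APPEND_FL == FS_APPEND_FL, "append bit differs");
_Static_assert(EXT2_COMPRBLK_FL == FS_COMPRBLK_FL, "comprblk bit differs");
_Static_assert(EXT2_BTREE_FL == FS_BTREE_FL, "btree bit differs");
_Static_assert(EXT2_INDEX_FL == FS_INDEX_FL, "index bit differs");
_Static_assert(EXT2_IMAGIC_FL == FS_IMAGIC_FL, "imagic bit differs");

/* Hypothetical helper mirroring the test in do_htree_dump(). */
static int is_hash_indexed(uint32_t i_flags)
{
	return (i_flags & FS_BTREE_FL) != 0;
}

/* Hypothetical helper mirroring check_immutable() in e2fsck/pass1.c. */
#define BAD_SPECIAL_FLAGS (FS_IMMUTABLE_FL | FS_APPEND_FL)
static int has_bad_special_flags(uint32_t i_flags)
{
	return (i_flags & BAD_SPECIAL_FLAGS) != 0;
}
```

Because FS_BTREE_FL and FS_INDEX_FL share a bit, is_hash_indexed(FS_INDEX_FL) is true, which is why htree.c can test either name.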
[PATCH 2/2] E2fsprogs: add compress and cow support in chattr, lsattr
Modify the 'chattr' and 'lsattr' commands to support compression and COW:
- use 'C' to indicate the NOCOW attribute;
- still use 'c' to indicate the compress attribute.

Also update the man page.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 lib/e2p/pf.c         |    1 +
 lib/ext2fs/ext2_fs.h |    1 +
 misc/chattr.1.in     |   15 +++++++++++----
 misc/chattr.c        |   15 ++++++++++++++-
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/lib/e2p/pf.c b/lib/e2p/pf.c
index cc50896..c9385dd 100644
--- a/lib/e2p/pf.c
+++ b/lib/e2p/pf.c
@@ -48,6 +48,7 @@ static struct flags_name flags_array[] = {
         { FS_TOPDIR_FL, "T", "Top_of_Directory_Hierarchies" },
         { EXT4_EXTENTS_FL, "e", "Extents" },
         { EXT4_HUGE_FILE_FL, "h", "Huge_file" },
+        { FS_NOCOW_FL, "C", "NOCOW" },
         { 0, NULL, NULL }
 };
 
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 858c103..776be92 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -276,6 +276,7 @@ struct ext2_dx_countlimit {
 #define EXT4_EXTENTS_FL                 0x00080000 /* Inode uses extents */
 #define EXT4_EA_INODE_FL                0x00200000 /* Inode used for large EA */
 #define EXT4_EOFBLOCKS_FL               0x00400000 /* Blocks allocated beyond EOF */
+#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
 #define EXT4_SNAPFILE_FL                0x01000000 /* Inode is a snapshot */
 #define EXT4_SNAPFILE_DELETED_FL        0x04000000 /* Snapshot is being deleted */
 #define EXT4_SNAPFILE_SHRUNK_FL         0x08000000 /* Snapshot shrink has completed */
diff --git a/misc/chattr.1.in b/misc/chattr.1.in
index 92f6d70..434eb04 100644
--- a/misc/chattr.1.in
+++ b/misc/chattr.1.in
@@ -19,17 +19,18 @@ chattr \- change file attributes on a Linux file system
 .B chattr
 changes the file attributes on a Linux file system.
 .PP
-The format of a symbolic mode is +-=[acdeijstuADST].
+The format of a symbolic mode is +-=[acdeijstuACDST].
 .PP
 The operator `+' causes the selected attributes to be added to the
 existing attributes of the files; `-' causes them to be removed; and
 `=' causes them to be the only attributes that the files have.
 .PP
-The letters `acdeijstuADST' select the new attributes for the files:
+The letters `acdeijstuACDST' select the new attributes for the files:
 append only (a), compressed (c), no dump (d), extent format (e), immutable (i),
 data journalling (j), secure deletion (s), no tail-merging (t),
-undeletable (u), no atime updates (A), synchronous directory updates (D),
-synchronous updates (S), and top of directory hierarchy (T).
+undeletable (u), no atime updates (A), no copy on write (C),
+synchronous directory updates (D), synchronous updates (S),
+and top of directory hierarchy (T).
 .PP
 The following attributes are read-only, and may be listed by
 .BR lsattr (1)
@@ -64,6 +65,10 @@ this file compresses data before storing them on the disk.  Note: please
 make sure to read the bugs and limitations section at the end of this
 document.
 .PP
+A file with the `C' attribute set is marked without COW (copy on write).  Note:
+please make sure to read the bugs and limitations section at the end of this
+document.
+.PP
 When a directory with the `D' attribute set is modified,
 the changes are written synchronously on the disk; this is equivalent to
 the `dirsync' mount option applied to a subset of the files.
@@ -161,6 +166,8 @@ The `c', 's', and `u' attributes are not honored by the ext2 and ext3
 filesystems as implemented in the current mainline Linux kernels.  These
 attributes may be implemented in future versions of the ext2 and ext3
 filesystems.
+The `C' attribute is only used in btrfs filesystem in the current mainline
+Linux kernels.
 .PP
 The `j' option is only useful if the filesystem is mounted as ext3.
 .PP
diff --git a/misc/chattr.c b/misc/chattr.c
index 78e3736..8c8231e 100644
--- a/misc/chattr.c
+++ b/misc/chattr.c
@@ -82,7 +82,7 @@ static unsigned long sf;
 static void usage(void)
 {
         fprintf(stderr,
-                _("Usage: %s [-RVf] [-+=AacDdeijsSu] [-v version] files...\n"),
+                _("Usage: %s [-RVf] [-+=AacDdeijsSuC] [-v version] files...\n"),
                 program_name);
         exit(1);
 }
@@ -106,6 +106,7 @@ static const struct flags_char flags_array[] = {
         { FS_UNRM_FL, 'u' },
         { FS_NOTAIL_FL, 't' },
         { FS_TOPDIR_FL, 'T' },
+        { FS_NOCOW_FL, 'C' },
         { 0, 0 }
 };
 
@@ -159,6 +160,12 @@ static int decode_arg (int * i, int argc, char ** argv)
                         }
                         if ((fl = get_flag(*p)) == 0)
                                 usage();
+
+                        if (fl == FS_COMPR_FL) {
+                                af |= FS_NOCOMPR_FL;
+                                add = 1;
+                        }
+
                         rf |= fl;
                         rem = 1;
                 }
@@ -168,6 +175,12 @@ static int decode_arg (int * i, int argc, char ** argv)
                 for (p
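The decode_arg() change means `-c` does more than clear the compress bit: it also asks for the no-compress bit. A self-contained C sketch of that +/- parsing, assuming only the letters c and C and hand-copied flag values; FS_NOCOMPR_FL follows the patch's spelling and is assumed here to be the 0x00000400 bit that the kernel's <linux/fs.h> calls FS_NOCOMP_FL. flag_for(), parse_mode() and apply_mode() are hypothetical helpers, not chattr's real functions.

```c
/* Flag bits, copied by hand; values as in the kernel's <linux/fs.h>. */
#define FS_COMPR_FL   0x00000004UL /* 'c': compress */
#define FS_NOCOMPR_FL 0x00000400UL /* assumed: the "don't compress" bit */
#define FS_NOCOW_FL   0x00800000UL /* 'C': no copy on write */

/* Hypothetical helper: map one attribute letter to its flag bit (0 if unknown). */
static unsigned long flag_for(char c)
{
	switch (c) {
	case 'c': return FS_COMPR_FL;
	case 'C': return FS_NOCOW_FL;
	default:  return 0;
	}
}

/* Hypothetical helper: fold one "+..." or "-..." argument into add/remove
 * masks. Like the patched decode_arg(), removing 'c' also adds the
 * no-compress bit. Returns 0 on success, -1 on a bad argument. */
static int parse_mode(const char *arg, unsigned long *add, unsigned long *rem)
{
	int adding;

	if (arg[0] != '+' && arg[0] != '-')
		return -1;
	adding = (arg[0] == '+');

	for (const char *p = arg + 1; *p; p++) {
		unsigned long fl = flag_for(*p);

		if (fl == 0)
			return -1;      /* chattr would call usage() here */
		if (adding) {
			*add |= fl;
		} else {
			if (fl == FS_COMPR_FL)
				*add |= FS_NOCOMPR_FL;
			*rem |= fl;
		}
	}
	return 0;
}

/* Hypothetical helper: apply the masks to the current flags
 * (this sketch clears removed bits first, then sets added ones). */
static unsigned long apply_mode(unsigned long cur, unsigned long add,
				unsigned long rem)
{
	return (cur & ~rem) | add;
}
```

For example, parse_mode("+cC", &add, &rem) should leave add == 0x00800004 and rem == 0, the same masks `chattr +c +C` would build, while "-c" yields rem == FS_COMPR_FL plus add == FS_NOCOMPR_FL.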
Re: [PATCH 1/2] E2fsprogs: use the generic inode flags
On 04/18/2011 04:41 PM, Coly Li wrote:
On 2011-04-18 15:37, liubo wrote:
Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 debugfs/htree.c        |    2 +-
 e2fsck/pass1.c         |   22 +++---
 e2fsck/pass2.c         |    2 +-
 e2fsck/pass4.c         |    2 +-
 e2fsck/rehash.c        |    4 ++--
 ext2ed/inode_com.c     |   14 +++---
 lib/e2p/fgetflags.c    |    6 +++---
 lib/e2p/fsetflags.c    |    6 +++---
 lib/e2p/getflags.c     |    6 +++---
 lib/e2p/pf.c           |   34 +-
 lib/e2p/setflags.c     |    6 +++---
 lib/ext2fs/ext2_fs.h   |   44 ++--
 lib/ext2fs/link.c      |    4 ++--
 lib/ext2fs/mkjournal.c |    2 +-
 misc/chattr.c          |   26 +-
 misc/tune2fs.c         |    2 +-
 16 files changed, 91 insertions(+), 91 deletions(-)
[snip]

Hi Bo,
Could you please explain the motivation for this patch set a little more? Thanks.

Hi Li,

We want to control the COW and compression attributes on a per-file or per-directory basis, and the generic chattr command is the natural tool for that. Currently only btrfs supports both, of course.

With these patches, we can do the following:
c: compress
C: nocow

set compress and nocow:
# ./misc/chattr -V +c +C /mnt/btrfs/dir/
chattr 1.41.14 (22-Dec-2010)
Flags of /mnt/btrfs/dir/ set as c--C
# ./misc/lsattr -d /mnt/btrfs/dir/
c--C /mnt/btrfs/dir/

thanks,
liubo
Re: [PATCH] Trace: add __print_symbolic_u64 to avoid warnings on 32bit machine
On 04/19/2011 02:11 AM, Steven Rostedt wrote:
On Wed, 2011-04-06 at 17:18 +0800, liubo wrote:
Btrfs has some ULL macros, and when these macros are passed to a tracepoint's __print_symbolic(), compiling on a 32-bit box produces 64-to-32-bit truncation warnings.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12 ++++++++++++
 include/trace/events/btrfs.h |    4 ++--
 include/trace/ftrace.h       |   13 +++++++++++++
 kernel/trace/trace_output.c  |   27 +++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 2 deletions(-)

Could you break this up into two patches? One that touches the ftrace core, and one that updates btrfs.

Sure, I'll split it up and resend soon. Thanks for the reply.

thanks,
liubo

Thanks,

-- Steve
[PATCH 2/2] tracing: update btrfs's tracepoints to use u64 interface
To avoid 64-to-32-bit truncation warnings, update btrfs's tracepoints.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/trace/events/btrfs.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index f445cff..4114129 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -28,7 +28,7 @@ struct extent_buffer;
                 { BTRFS_SHARED_DATA_REF_KEY,    "SHARED_DATA_REF" })
 
 #define __show_root_type(obj)                                          \
-        __print_symbolic(obj,                                          \
+        __print_symbolic_u64(obj,                                      \
                 { BTRFS_ROOT_TREE_OBJECTID,     "ROOT_TREE"     },      \
                 { BTRFS_EXTENT_TREE_OBJECTID,   "EXTENT_TREE"   },      \
                 { BTRFS_CHUNK_TREE_OBJECTID,    "CHUNK_TREE"    },      \
@@ -125,7 +125,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 );
 
 #define __show_map_type(type)                                          \
-        __print_symbolic(type,                                         \
+        __print_symbolic_u64(type,                                     \
                 { EXTENT_MAP_LAST_BYTE, "LAST_BYTE"     },              \
                 { EXTENT_MAP_HOLE,      "HOLE"          },              \
                 { EXTENT_MAP_INLINE,    "INLINE"        },              \
-- 
1.6.5.2
[PATCH 1/2] tracing: add __print_symbolic_u64 to avoid warnings on 32bit machine
Filesystems such as Btrfs have some ULL macros, and when these macros are passed to a tracepoint's __print_symbolic(), compiling on a 32-bit box produces 64-to-32-bit truncation warnings.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 include/linux/ftrace_event.h |   12 ++++++++++++
 include/trace/ftrace.h       |   13 +++++++++++++
 kernel/trace/trace_output.c  |   27 +++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 47e3997..efb2330 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -16,6 +16,11 @@ struct trace_print_flags {
         const char              *name;
 };
 
+struct trace_print_flags_u64 {
+        unsigned long long      mask;
+        const char              *name;
+};
+
 const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
                                    unsigned long flags,
                                    const struct trace_print_flags *flag_array);
@@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
                                      const struct trace_print_flags *symbol_array);
 
+#if BITS_PER_LONG == 32
+const char *ftrace_print_symbols_seq_u64(struct trace_seq *p,
+                                         unsigned long long val,
+                                         const struct trace_print_flags_u64
+                                                                 *symbol_array);
+#endif
+
 const char *ftrace_print_hex_seq(struct trace_seq *p,
                                  const unsigned char *buf, int len);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 3e68366..533c49f 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -205,6 +205,19 @@
                 ftrace_print_symbols_seq(p, value, symbols);            \
         })
 
+#undef __print_symbolic_u64
+#if BITS_PER_LONG == 32
+#define __print_symbolic_u64(value, symbol_array...)                   \
+        ({                                                              \
+                static const struct trace_print_flags_u64 symbols[] =  \
+                        { symbol_array, { -1, NULL } };                 \
+                ftrace_print_symbols_seq_u64(p, value, symbols);        \
+        })
+#else
+#define __print_symbolic_u64(value, symbol_array...)                   \
+                        __print_symbolic(value, symbol_array)
+#endif
+
 #undef __print_hex
 #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
 
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 02272ba..b783504 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 }
 EXPORT_SYMBOL(ftrace_print_symbols_seq);
 
+#if BITS_PER_LONG == 32
+const char *
+ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val,
+                         const struct trace_print_flags_u64 *symbol_array)
+{
+        int i;
+        const char *ret = p->buffer + p->len;
+
+        for (i = 0;  symbol_array[i].name; i++) {
+
+                if (val != symbol_array[i].mask)
+                        continue;
+
+                trace_seq_puts(p, symbol_array[i].name);
+                break;
+        }
+
+        if (!p->len)
+                trace_seq_printf(p, "0x%llx", val);
+
+        trace_seq_putc(p, 0);
+
+        return ret;
+}
+EXPORT_SYMBOL(ftrace_print_symbols_seq_u64);
+#endif
+
 const char *
 ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
 {
-- 
1.6.5.2
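On 32-bit builds, __print_symbolic_u64() routes through ftrace_print_symbols_seq_u64(), which walks a table of unsigned long long masks and falls back to hex. A userspace sketch of that lookup, assuming a plain char buffer in place of struct trace_seq; the struct, the MAP_* constants and print_symbolic_u64() are hypothetical stand-ins (the constants only mimic the top-of-range style of btrfs's EXTENT_MAP_* values), not kernel API.

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for struct trace_print_flags_u64. */
struct print_flags_u64 {
	unsigned long long mask;
	const char *name;
};

/* Hypothetical constants near the top of the u64 range; a 32-bit
 * unsigned long cannot hold them. */
#define MAP_LAST_BYTE ((uint64_t)-4)
#define MAP_HOLE      ((uint64_t)-3)
#define MAP_INLINE    ((uint64_t)-2)

static const struct print_flags_u64 map_types[] = {
	{ MAP_LAST_BYTE, "LAST_BYTE" },
	{ MAP_HOLE,      "HOLE" },
	{ MAP_INLINE,    "INLINE" },
	{ -1, NULL },   /* terminator, as __print_symbolic_u64 appends */
};

/* Write the symbolic name for val into buf, or "0x..." if nothing matches.
 * The table is scanned until the entry whose name is NULL. */
static const char *print_symbolic_u64(char *buf, size_t len,
				      unsigned long long val,
				      const struct print_flags_u64 *tbl)
{
	for (int i = 0; tbl[i].name; i++) {
		if (tbl[i].mask == val) {
			snprintf(buf, len, "%s", tbl[i].name);
			return buf;
		}
	}
	snprintf(buf, len, "0x%llx", val);
	return buf;
}
```

The sketch returns as soon as it finds a match instead of testing p->len as the patch does, so its hex fallback does not depend on the output buffer being empty on entry. Feeding it MAP_HOLE narrowed to 32 bits prints "0xfffffffd" rather than "HOLE", which is the mismatch the u64 table avoids.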
Re: [PATCH] Btrfs: fix easily get into ENOSPC in mixed case
On 04/09/2011 05:55 AM, Sergei Trofimovich wrote:
[ 100.500011] Call Trace:
[ 100.500011] [810ed3a0] vfs_unlink+0x80/0xf0
[ 100.500011] [810ef6f3] do_unlinkat+0x173/0x1b0
[ 100.500011] [8111727b] ? fsnotify_find_inode_mark+0x3b/0x50
[ 100.500011] [810dff91] ? filp_close+0x61/0x90
[ 100.500011] [810f0c0d] sys_unlinkat+0x1d/0x40
[ 100.500011] [81574c3b] system_call_fastpath+0x16/0x1b
[ 100.500011] Code: 4c 8b 65 e0 48 8b 5d d8 4c 8b 6d e8 4c 8b 75 f0 4c 8b 7d f8 c9 c3 0f 1f 40 00 4c 89 fe 4c 89 ef e8 05 d0 ff ff 85 c0 74 bb 0f 0b 0f 0b 89 c3 eb cd 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[ 100.500011] RIP [a024a011] btrfs_unlink+0xd1/0xe0 [btrfs]
[ 100.500011] RSP 880070b55e28
[ 100.525672] ---[ end trace 7e63b9144b7307fe ]---

Looks like I won't be able to test your patch until this thing goes away.

Thanks a lot for testing, though.

I guess something corrupted your btrfs metadata, because when btrfs_unlink() tried to remove A, it found that A was simply missing...

thanks,
liubo
Re: [PATCH] Btrfs-progs: add support for mixed data+metadata block groups
On 12/10/2010 02:31 AM, Josef Bacik wrote:
So a lot of crazy people (I'm looking at you, MeeGo) want to use btrfs on phones and other small devices. Unfortunately, the way we split out metadata/data chunks makes space usage inefficient for volumes smaller than 1 gigabyte. So add a -M option for mixing metadata+data, and default to this mixed mode if the filesystem is less than or equal to 1 gigabyte. I've tested this with xfstests on a 100MB filesystem and everything is a-ok.

Hi Josef,

While using this mixed metadata+data option, I noticed the following in btrfs-debug-tree's output:
===
chunk tree
leaf 143360 items 4 free space 3557 generation 4 owner 3
fs uuid 77d78a87-a886-4bfa-be3b-0dd052213a17
chunk uuid e64148d6-8267-4ff1-aafd-4266f74afbb2
        item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
                dev item devid 1 total_bytes 4999610368 bytes used 20971520
        item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
                chunk length 4194304 owner 2 type 2 num_stripes 1
                        stripe 0 devid 1 offset 0
        item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
                chunk length 8388608 owner 2 type 5 num_stripes 1
                        stripe 0 devid 1 offset 4194304
        item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80  <== THIS ONE
                chunk length 8388608 owner 2 type 4 num_stripes 1                  <==
                        stripe 0 devid 1 offset 12582912                           <==
===
As you can see, there is another metadata chunk (type 4) after mkfs.btrfs -M /dev/xxx. So I was wondering: _IS_ this chunk what we want, or is it a spare one?
thanks, liubo Signed-off-by: Josef Bacik jo...@redhat.com --- btrfs-vol.c |4 +- btrfs_cmds.c | 13 +- ctree.h | 10 +++-- mkfs.c | 122 +- utils.c | 10 ++-- utils.h |2 +- 6 files changed, 112 insertions(+), 49 deletions(-) diff --git a/btrfs-vol.c b/btrfs-vol.c index 8069778..7200bbc 100644 --- a/btrfs-vol.c +++ b/btrfs-vol.c @@ -129,7 +129,9 @@ int main(int ac, char **av) exit(1); } if (cmd == BTRFS_IOC_ADD_DEV) { - ret = btrfs_prepare_device(devfd, device, 1, dev_block_count); + int mixed = 0; + + ret = btrfs_prepare_device(devfd, device, 1, dev_block_count, mixed); if (ret) { fprintf(stderr, Unable to init %s\n, device); exit(1); diff --git a/btrfs_cmds.c b/btrfs_cmds.c index 8031c58..683aec0 100644 --- a/btrfs_cmds.c +++ b/btrfs_cmds.c @@ -705,6 +705,7 @@ int do_add_volume(int nargs, char **args) int devfd, res; u64 dev_block_count = 0; struct stat st; + int mixed = 0; devfd = open(args[i], O_RDWR); if (!devfd) { @@ -727,7 +728,7 @@ int do_add_volume(int nargs, char **args) continue; } - res = btrfs_prepare_device(devfd, args[i], 1, dev_block_count); + res = btrfs_prepare_device(devfd, args[i], 1, dev_block_count, mixed); if (res) { fprintf(stderr, ERROR: Unable to init '%s'\n, args[i]); close(devfd); @@ -889,8 +890,14 @@ int do_df_filesystem(int nargs, char **argv) memset(description, 0, 80); if (flags BTRFS_BLOCK_GROUP_DATA) { - snprintf(description, 5, %s, Data); - written += 4; + if (flags BTRFS_BLOCK_GROUP_METADATA) { + snprintf(description, 15, %s, + Data+Metadata); + written += 14; + } else { + snprintf(description, 5, %s, Data); + written += 4; + } } else if (flags BTRFS_BLOCK_GROUP_SYSTEM) { snprintf(description, 7, %s, System); written += 6; diff --git a/ctree.h b/ctree.h index 962c510..ed83d02 100644 --- a/ctree.h +++ b/ctree.h @@ -352,13 +352,15 @@ struct btrfs_super_block { * ones specified below then we will fail to mount */ #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL 0) -#define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL(2ULL 0) +#define 
BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL   (1ULL << 1)
+#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS    (1ULL << 2)
 #define BTRFS_FEATURE_COMPAT_SUPP              0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SUPP           0ULL
-#define BTRFS_FEATURE_INCOMPAT_SUPP                    \
-        (BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |        \
-         BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL)
+#define
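The chunk types in the dump above are block-group flag bits: 2 is SYSTEM, 4 is METADATA, and 5 is DATA|METADATA, the mixed chunk that the patch's df change labels "Data+Metadata". A small C sketch decoding them, assuming the BTRFS_BLOCK_GROUP_* values from btrfs's ctree.h (DATA = 1, SYSTEM = 2, METADATA = 4); chunk_type_name() is a hypothetical helper that checks the bits in the same order as the patched do_df_filesystem() branch, as far as the patch shows it.

```c
#include <stdint.h>

/* Block group type bits, as in btrfs's ctree.h. */
#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)

/* Hypothetical helper: label a chunk "type" printed by btrfs-debug-tree. */
static const char *chunk_type_name(uint64_t type)
{
	if (type & BTRFS_BLOCK_GROUP_DATA) {
		/* mixed mode puts data and metadata in one chunk */
		if (type & BTRFS_BLOCK_GROUP_METADATA)
			return "Data+Metadata";
		return "Data";
	}
	if (type & BTRFS_BLOCK_GROUP_SYSTEM)
		return "System";
	if (type & BTRFS_BLOCK_GROUP_METADATA)
		return "Metadata";
	return "unknown";
}
```

Applied to the dump, item 2 decodes as Data+Metadata and item 3, the chunk in question, as plain Metadata.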
[PATCH] Btrfs: fix easily get into ENOSPC in mixed case
When a btrfs filesystem is created with the mixed data+metadata option, it has no pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920 (Btrfs: fix OOPS of empty filesystem after balance) initializes the space infos at the very beginning. The problem is that this initialization does not take the mixed case into account, which makes btrfs easily run into ENOSPC in the mixed case.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/extent-tree.c |   37 ++++++++++++++++++++++++++-----------
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f619c3c..1b47ae4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8781,23 +8781,38 @@ out:
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
 {
         struct btrfs_space_info *space_info;
+        struct btrfs_super_block *disk_super;
+        u64 features;
+        u64 flags;
+        int mixed = 0;
         int ret;
 
-        ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
-                                &space_info);
-        if (ret)
-                return ret;
+        disk_super = &fs_info->super_copy;
+        if (!btrfs_super_root(disk_super))
+                return 1;
 
-        ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
-                                &space_info);
-        if (ret)
-                return ret;
+        features = btrfs_super_incompat_flags(disk_super);
+        if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS)
+                mixed = 1;
 
-        ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
-                                &space_info);
+        flags = BTRFS_BLOCK_GROUP_SYSTEM;
+        ret = update_space_info(fs_info, flags, 0, 0, &space_info);
         if (ret)
-                return ret;
+                goto out;
 
+        if (mixed) {
+                flags = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
+                ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+        } else {
+                flags = BTRFS_BLOCK_GROUP_METADATA;
+                ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+                if (ret)
+                        goto out;
+
+                flags = BTRFS_BLOCK_GROUP_DATA;
+                ret = update_space_info(fs_info, flags, 0, 0, &space_info);
+        }
+out:
         return ret;
 }
 
-- 
1.6.5.2
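The fix boils down to choosing which space_info flag combinations to create at mount time. A minimal C sketch of that decision, assuming the BTRFS_BLOCK_GROUP_* values from btrfs's ctree.h (DATA = 1 << 0, SYSTEM = 1 << 1, METADATA = 1 << 2) and the incompat bit BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS = 1 << 2 from the btrfs-progs patch; initial_space_infos() is a hypothetical helper that only mirrors the control flow of btrfs_init_space_info(), without update_space_info() or error handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Block group type bits, as in btrfs's ctree.h. */
#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)

/* Incompat feature bit set by mkfs.btrfs -M. */
#define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2)

/* Hypothetical helper: fill out[] with the space_info flags created at
 * mount time and return how many there are (2 or 3). */
static size_t initial_space_infos(uint64_t incompat, uint64_t out[3])
{
	size_t n = 0;

	out[n++] = BTRFS_BLOCK_GROUP_SYSTEM;
	if (incompat & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) {
		/* one shared space_info for mixed data+metadata chunks */
		out[n++] = BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA;
	} else {
		out[n++] = BTRFS_BLOCK_GROUP_METADATA;
		out[n++] = BTRFS_BLOCK_GROUP_DATA;
	}
	return n;
}
```

With the mixed bit set, the sketch yields a single DATA|METADATA (5) entry next to SYSTEM, instead of the separate pure-data and pure-metadata space infos that, per the patch description, lead to early ENOSPC.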
[PATCH] Trace: add __print_symbolic_u64 to avoid warnings on 32bit machine
Btrfs has some ULL macros, and when these macros are passed to tracepoints' __print_symbolic(), there will be 64-32 truncate WARNINGS during compiling on 32bit box. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- include/linux/ftrace_event.h | 12 include/trace/events/btrfs.h |4 ++-- include/trace/ftrace.h | 13 + kernel/trace/trace_output.c | 27 +++ 4 files changed, 54 insertions(+), 2 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 22b32af..6b2e245 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -16,6 +16,11 @@ struct trace_print_flags { const char *name; }; +struct trace_print_flags_u64 { + unsigned long long mask; + const char *name; +}; + const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim, unsigned long flags, const struct trace_print_flags *flag_array); @@ -23,6 +28,13 @@ const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim, const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val, const struct trace_print_flags *symbol_array); +#if BITS_PER_LONG == 32 +const char *ftrace_print_symbols_seq_u64(struct trace_seq *p, +unsigned long long val, +const struct trace_print_flags_u64 +*symbol_array); +#endif + const char *ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int len); diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index f445cff..4114129 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -28,7 +28,7 @@ struct extent_buffer; { BTRFS_SHARED_DATA_REF_KEY,SHARED_DATA_REF }) #define __show_root_type(obj) \ - __print_symbolic(obj, \ + __print_symbolic_u64(obj, \ { BTRFS_ROOT_TREE_OBJECTID, ROOT_TREE }, \ { BTRFS_EXTENT_TREE_OBJECTID, EXTENT_TREE }, \ { BTRFS_CHUNK_TREE_OBJECTID,CHUNK_TREE}, \ @@ -125,7 +125,7 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict, ); #define __show_map_type(type) \ - __print_symbolic(type, \ + __print_symbolic_u64(type, \ { 
EXTENT_MAP_LAST_BYTE, LAST_BYTE }, \ { EXTENT_MAP_HOLE, HOLE }, \ { EXTENT_MAP_INLINE,INLINE}, \ diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h index 3e68366..533c49f 100644 --- a/include/trace/ftrace.h +++ b/include/trace/ftrace.h @@ -205,6 +205,19 @@ ftrace_print_symbols_seq(p, value, symbols);\ }) +#undef __print_symbolic_u64 +#if BITS_PER_LONG == 32 +#define __print_symbolic_u64(value, symbol_array...) \ + ({ \ + static const struct trace_print_flags_u64 symbols[] = \ + { symbol_array, { -1, NULL } }; \ + ftrace_print_symbols_seq_u64(p, value, symbols);\ + }) +#else +#define __print_symbolic_u64(value, symbol_array...) \ + __print_symbolic(value, symbol_array) +#endif + #undef __print_hex #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len) diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 456be90..47aafa9 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -353,6 +353,33 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val, } EXPORT_SYMBOL(ftrace_print_symbols_seq); +#if BITS_PER_LONG == 32 +const char * +ftrace_print_symbols_seq_u64(struct trace_seq *p, unsigned long long val, +const struct trace_print_flags_u64 *symbol_array) +{ + int i; + const char *ret = p-buffer + p-len; + + for (i = 0; symbol_array[i].name; i++) { + + if (val != symbol_array[i].mask) + continue; + + trace_seq_puts(p, symbol_array[i].name); + break; + } + + if (!p-len) + trace_seq_printf(p, 0x%llx, val); + + trace_seq_putc(p, 0); + + return ret; +} +EXPORT_SYMBOL(ftrace_print_symbols_seq_u64); +#endif + const char * ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf,
Re: [PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression
On 04/04/2011 05:31 PM, Konstantinos Skarlatos wrote:
Hello, I would like to ask about the status of this feature/patch: has it been accepted into the btrfs code, and how can I use it?

Yes, it is now in the latest 2.6.39-rc1.

I am interested in enabling compression in a specific folder (force-compress would be ideal) of a large btrfs volume, and disabling it for the rest.

Hmm, I'm working on the patch for the userspace tool; it will come soon. :)

On 21/3/2011 10:57 AM, liubo wrote:
Data compression and data COW are currently controlled across the entire FS by mount options. ioctls are needed to set them on a per-file or per-directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression method (probably LZO) stored in the super. However, before that, we would wait until that method is stable enough to be adopted into the super. So I list it as a long-term goal, and just store it in RAM for now.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to control the datacow and compression attributes of files and directories.

NOTE:
- The compression type is selected by these rules: if btrfs is mounted with a compress option, i.e. zlib/lzo, that type is used. Otherwise, the default compression type (zlib today) is used.

v1->v2: Rebase the patch onto the latest btrfs.

Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/disk-io.c |    6 ++
 fs/btrfs/inode.c   |   32
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..b77d1a5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1283,6 +1283,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP              (1 << 8)
 #define BTRFS_INODE_NOATIME             (1 << 9)
 #define BTRFS_INODE_DIRSYNC             (1 << 10)
+#define BTRFS_INODE_COMPRESS            (1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.
This * assumes there is a lefoo_to_cpu for every type, so lets make a simple diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3e1ea3e..a894c12 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb, btrfs_check_super_valid(fs_info, sb-s_flags MS_RDONLY); +/* + * In the long term, we'll store the compression type in the super + * block, and it'll be used for per file compression control. + */ +fs_info-compress_type = BTRFS_COMPRESS_ZLIB; + ret = btrfs_parse_options(tree_root, options); if (ret) { err = ret; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index db67821..e687bb9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -381,7 +381,8 @@ again: */ if (!(BTRFS_I(inode)-flags BTRFS_INODE_NOCOMPRESS) (btrfs_test_opt(root, COMPRESS) || - (BTRFS_I(inode)-force_compress))) { + (BTRFS_I(inode)-force_compress) || + (BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS))) { WARN_ON(pages); pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS); @@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); else if (!btrfs_test_opt(root, COMPRESS) - !(BTRFS_I(inode)-force_compress)) + !(BTRFS_I(inode)-force_compress) + !(BTRFS_I(inode)-flags BTRFS_INODE_COMPRESS)) ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); else @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, location-offset = 0; btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY); -btrfs_inherit_iflags(inode, dir); - if ((mode S_IFREG)) { if (btrfs_test_opt(root, NODATASUM)) BTRFS_I(inode)-flags |= BTRFS_INODE_NODATASUM; @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)-flags |= BTRFS_INODE_NODATACOW; } +btrfs_inherit_iflags(inode, dir); + insert_inode_hash(inode); 
inode_tree_add(inode); return inode; @@ -6803,6 +6805,26 @@ static int btrfs_getattr(struct vfsmount *mnt, return 0; } +/* + * If a file is moved, it will inherit the cow and compression flags of the new + * directory. + */ +static void fixup_inode_flags(struct inode *dir, struct inode *inode) +{ +struct btrfs_inode *b_dir = BTRFS_I(dir); +struct btrfs_inode *b_inode = BTRFS_I(inode); + +if (b_dir-flags
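With this patch, per-file control goes through the generic FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls rather than a btrfs-specific one. A minimal sketch of that read-modify-write from userspace, assuming Linux UAPI headers; the two helper names are hypothetical, and FS_NOCOW_FL only gets a fallback value for headers that predate it. This approximates what `chattr +c +C` does; it is not chattr's actual code.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000 /* fallback for older headers */
#endif

/* Hypothetical helper (pure): set or clear the compress and nocow bits. */
static int want_compress_nocow(int flags, int compress, int nocow)
{
	if (compress)
		flags |= FS_COMPR_FL;
	else
		flags &= ~FS_COMPR_FL;
	if (nocow)
		flags |= FS_NOCOW_FL;
	else
		flags &= ~FS_NOCOW_FL;
	return flags;
}

/* Hypothetical helper: read-modify-write the inode flags of an open file
 * or directory. Returns 0 on success, -errno on failure. */
static int set_compress_nocow(int fd, int compress, int nocow)
{
	/* The ioctl numbers say 'long', but filesystems copy an int in
	 * practice, as e2fsprogs does. */
	int flags;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
		return -errno;
	flags = want_compress_nocow(flags, compress, nocow);
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		return -errno;
	return 0;
}
```

For a directory, open it read-only and pass that descriptor; per the fixup_inode_flags() comment in the patch, files moved into the directory then pick up its cow and compression flags.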
Re: 2.6.39-rc1: kernel BUG at fs/btrfs/extent-tree.c:5479!
On 04/02/2011 06:41 PM, Sergei Trofimovich wrote:
On Sat, 02 Apr 2011 17:37:58 +0800 liubo liubo2...@cn.fujitsu.com wrote:
On 04/02/2011 05:19 PM, Sergei Trofimovich wrote:
The partition is a physical ~5GB --mixed, lzo-compressed partition. The kernel is 2.6.39-rc1 + reverted commit c59021f846881a957ac5afe456d0f59d6a517b61 (see http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09083.html).

Hi Sergei,

I'm digging into this... Can you show me the steps to reproduce it?

I use the filesystem as storage for a large CVS tree and as temporary storage for large compilations, so I can roughly describe what I did and when it failed.

I formatted a 5G btrfs partition as --mixed and mounted it with lzo compression on kernel version 'v2.6.38-4148-g054cfaa', then checked out a large CVS tree there (~170K files, weighing 177MB), copied the Linux source (not built) there, and copied my '/var/'. I ran compiles there and started to get -ENOSPC Oopses when 'df -h' reported 3.5G free.

When Linus pulled Josef's changes, I updated to v2.6.38-6555-ga44f99c and the kernel started to Oops right after mount (the added assert started to trigger earlier). I reported it to this ML (link above). Josef and sensille helped me debug what was going wrong [both CCed]. sensille pointed to the commit that is guilty of miscomputing the available space. As I understand it, they know exactly what got screwed up.

Many thanks for these details. I did not consider the mixed case when writing the guilty patch, sorry.

Frankly, I'm still trying to reproduce your first bug, and on my box mixed + lzo does not trigger it... It seems that you are using openSUSE's kernel.

The second case (this one): I still use the same filesystem (I didn't reformat it, so it might carry some corruption from the debugging patches). I reverted your change c59021f846881a957ac5afe456d0f59d6a517b61 and made sure it stopped Oopsing for me, then updated to 2.6.39-rc1 and reverted only this commit. The filesystem was usable until I decided to run a large compile on it (clang debug source). I think at the time of the Oops the following things happened simultaneously:
1. one process was splitting the debug symbols out of some binary:
   - opened the original binary for reading
   - wrote a new file (the stripped binary)
   - wrote the debug symbols to a separate file
2. another process logged that action to a log file
3. the filesystem filled up and Oopsed.
At the time of the Oops, 'df -h' showed 200M free.

I'm trying to reproduce this second case at the moment (the build takes more than an hour).

All right, thanks for the work.

thanks,
liubo
[RFC PATCH] Trace: use unsigned long long in trace print frames
While adding tracepoint for btrfs, I got a problem: btrfs uses some macros with ULL type, but tracepoint's macros, __print_[flags,symbols](), only have unsigned long, so on 32bit box there will be 64-32 truncate WARNINGs when compiling. Here I'm inclined to make the replacement to clear those WARNINGs. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- include/linux/ftrace_event.h |7 --- kernel/trace/trace_output.c | 10 +- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h index 22b32af..b52f2c5 100644 --- a/include/linux/ftrace_event.h +++ b/include/linux/ftrace_event.h @@ -12,15 +12,16 @@ struct tracer; struct dentry; struct trace_print_flags { - unsigned long mask; + unsigned long long mask; const char *name; }; const char *ftrace_print_flags_seq(struct trace_seq *p, const char *delim, - unsigned long flags, + unsigned long long flags, const struct trace_print_flags *flag_array); -const char *ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val, +const char *ftrace_print_symbols_seq(struct trace_seq *p, +unsigned long long val, const struct trace_print_flags *symbol_array); const char *ftrace_print_hex_seq(struct trace_seq *p, diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c index 456be90..97ba902 100644 --- a/kernel/trace/trace_output.c +++ b/kernel/trace/trace_output.c @@ -294,10 +294,10 @@ int trace_seq_path(struct trace_seq *s, struct path *path) const char * ftrace_print_flags_seq(struct trace_seq *p, const char *delim, - unsigned long flags, + unsigned long long flags, const struct trace_print_flags *flag_array) { - unsigned long mask; + unsigned long long mask; const char *str; const char *ret = p-buffer + p-len; int i; @@ -319,7 +319,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim, if (flags) { if (p-len delim) trace_seq_puts(p, delim); - trace_seq_printf(p, 0x%lx, flags); + trace_seq_printf(p, 0x%llx, flags); } trace_seq_putc(p, 0); 
@@ -329,7 +329,7 @@ ftrace_print_flags_seq(struct trace_seq *p, const char *delim,
 EXPORT_SYMBOL(ftrace_print_flags_seq);
 
 const char *
-ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
+ftrace_print_symbols_seq(struct trace_seq *p, unsigned long long val,
 			 const struct trace_print_flags *symbol_array)
 {
 	int i;
@@ -345,7 +345,7 @@ ftrace_print_symbols_seq(struct trace_seq *p, unsigned long val,
 	}
 
 	if (!p->len)
-		trace_seq_printf(p, "0x%lx", val);
+		trace_seq_printf(p, "0x%llx", val);
 
 	trace_seq_putc(p, 0);
 
-- 
1.6.5.2
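To make the effect of the widening concrete, here is a userspace sketch (not kernel code) of the flag-decoding loop in ftrace_print_flags_seq() using the patch's unsigned long long types. The struct print_flags_ull type and the print_flags_ull() function are illustrative names, not kernel API; the point is that a flag at bit 40 can be decoded by name, which a 32-bit unsigned long mask could not even hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for struct trace_print_flags with the widened mask. */
struct print_flags_ull {
	unsigned long long mask;
	const char *name;
};

/*
 * Decode a flag word roughly the way ftrace_print_flags_seq() does: append
 * the name of every mask that is fully set in flags, separated by delim,
 * then print any leftover bits as "0x%llx". The table ends with a NULL name.
 */
static void print_flags_ull(char *buf, size_t len, const char *delim,
			    unsigned long long flags,
			    const struct print_flags_ull *tbl)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; tbl[i].name && flags; i++) {
		unsigned long long mask = tbl[i].mask;

		if ((flags & mask) != mask)
			continue;
		flags &= ~mask;
		if (buf[0] && delim)
			strncat(buf, delim, len - strlen(buf) - 1);
		strncat(buf, tbl[i].name, len - strlen(buf) - 1);
	}
	if (flags) {
		char tmp[24];	/* "0x" + 16 hex digits + NUL fits */

		if (buf[0] && delim)
			strncat(buf, delim, len - strlen(buf) - 1);
		snprintf(tmp, sizeof(tmp), "0x%llx", flags);
		strncat(buf, tmp, len - strlen(buf) - 1);
	}
}
```

With a table { 1ULL << 0, "A" }, { 1ULL << 40, "HIGH" } and flags (1ULL << 40) | 8 | 1, the buffer ends up as A|HIGH|0x8.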
Re: [RFC PATCH] Trace: use unsigned long long in trace print frames
On 04/01/2011 09:49 PM, Steven Rostedt wrote:
> On Fri, 2011-04-01 at 14:42 +0800, liubo wrote:
>> While adding tracepoint for btrfs, I got a problem:
>>
>> btrfs uses some macros with ULL type, but tracepoint's macros,
>> __print_[flags,symbols](), only have unsigned long, so on 32bit box
>> there will be 64->32 truncate WARNINGs when compiling.
>>
>> Here I'm inclined to make the replacement to clear those WARNINGs.
>
> Hmm, I don't like this. unsigned long is a natural word for
> architectures, I don't want to have 32 bit suffer because one user is
> doing something with ULL.
>
> A better solution is to add a trace_print_flags_u64 or something, that
> can be used for cases that u64 is needed. For archs where
> sizeof(long) == sizeof(u64) we can have the two macros/structs be the
> same.

All right, a u64 specific one is also in my mind. :)

thanks,
liubo

>
> -- Steve
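Steven's alternative keeps unsigned long for the common case and adds a separate u64 table type. A minimal userspace sketch of what such a lookup could look like, assuming a hypothetical struct trace_print_flags_u64 (the name comes from his suggestion; the layout, lookup_symbol_u64() and the demo table are assumptions, not the API that was eventually merged):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical u64 counterpart of struct trace_print_flags. */
struct trace_print_flags_u64 {
	uint64_t mask;
	const char *name;
};

/*
 * Return the symbolic name for val, or NULL if it is not in the table.
 * Roughly the lookup loop of ftrace_print_symbols_seq(), but with a 64-bit
 * key, so a value such as -6ULL is compared without truncation.
 */
static const char *lookup_symbol_u64(uint64_t val,
				     const struct trace_print_flags_u64 *tbl)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; tbl[i].name; i++)
		if (tbl[i].mask == val)
			return tbl[i].name;
	return NULL;
}

/* Demo table terminated by a NULL name; the ids mirror btrfs objectids
 * but are assumptions for this sketch. */
static const struct trace_print_flags_u64 demo_roots[] = {
	{ 5, "FS_TREE" },
	{ (uint64_t)-6, "TREE_LOG" },
	{ (uint64_t)-8, "TREE_RELOC" },
	{ 0, NULL },
};
```

Because the key stays 64 bits wide, (uint64_t)-6 finds its entry while 0xfffffffa, its low 32 bits, does not. On architectures where long is already 64 bits, the two table types could simply be the same, as Steven notes.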
[PATCH] btrfs: clear __GFP_FS flag in the space cache inode
From: Miao Xie <mi...@cn.fujitsu.com>

The object id of the space cache inode's key is allocated from the relevant root, just like a regular file. So we can't identify the space cache inode by checking the object id of the inode's key, and we have to clear the __GFP_FS flag at the time we look up the space cache inode.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/free-space-cache.c |    2 ++
 fs/btrfs/inode.c            |    2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 0037427..13575de 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -81,6 +81,8 @@ struct inode *lookup_free_space_inode(struct btrfs_root *root,
 		return ERR_PTR(-ENOENT);
 	}
 
+	inode->i_mapping->flags &= ~__GFP_FS;
+
 	spin_lock(&block_group->lock);
 	if (!root->fs_info->closing) {
 		block_group->inode = igrab(inode);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 93c28a1..c103fdc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2537,8 +2537,6 @@ static void btrfs_read_locked_inode(struct inode *inode)
 	BTRFS_I(inode)->flags = btrfs_inode_flags(leaf, inode_item);
 
 	alloc_group_block = btrfs_inode_block_group(leaf, inode_item);
-	if (location.objectid == BTRFS_FREE_SPACE_OBJECTID)
-		inode->i_mapping->flags &= ~__GFP_FS;
 
 	/*
 	 * try to precache a NULL acl entry for files that don't have
-- 
1.7.4
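__GFP_FS is the allocation bit that allows memory reclaim to call back into filesystem code; the patch clears just that one bit in the mapping's flags and leaves the rest alone. The bit operation on its own, as a sketch with made-up constant values (DEMO_GFP_IO and DEMO_GFP_FS are not the kernel's definitions, and clear_flag_bit() is an illustrative helper):

```c
#include <assert.h>

/*
 * Hypothetical bit values: the real __GFP_* values come from the kernel's
 * gfp header and have changed between kernel versions.
 */
#define DEMO_GFP_IO	0x40u
#define DEMO_GFP_FS	0x80u

/* Clear exactly one flag bit and keep every other bit, in the same way as
 * "inode->i_mapping->flags &= ~__GFP_FS". */
static unsigned int clear_flag_bit(unsigned int flags, unsigned int bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0);	/* a single bit */
	return flags & ~bit;
}
```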
Re: [PATCH] Btrfs: fix compile warning from __btrfs_map_block
On 03/31/2011 08:10 PM, Chris Mason wrote:
> Excerpts from liubo's message of 2011-03-31 05:45:20 -0400:
>> While compile btrfs modules on 32bit box, I encounter the following:
>>
>> WARNING: "__umoddi3" [fs/btrfs/btrfs.ko] undefined!
>>
>> The WARNING comes from that __btrfs_map_block does not use do_div() for
>> relative operations, this will cause problems on 32bit box, for values
>> with u64 type should use do_div() instead of a direct %.
>
> Which kernel tree was this against? I had rebased the for-linus and
> for-linus-unmerged branch to get rid of it. Sorry for the confusion.

Ah, it is my fault to neglect the version, I found this warning while compiling the latest for-linus tree (top commit: c1e1f82c56af1a286fd747e809c94628c2ca15fb).

thanks,
liubo

> -chris
>
>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>> ---
>>  fs/btrfs/volumes.c |   23 +++++++++++++++--------
>>  1 files changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 41afd50..7b23d0f 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -3076,16 +3076,19 @@ again:
>>  			multi->stripes[i].dev = map->stripes[stripe_index].dev;
>>  
>>  			if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
>> -				u64 stripes;
>> -				int last_stripe = (stripe_nr_end - 1) %
>> -					map->num_stripes;
>> +				u64 stripes = stripe_nr_end - 1;
>> +				int last_stripe = do_div(stripes,
>> +						map->num_stripes);
>>  				int j;
>>  
>>  				for (j = 0; j < map->num_stripes; j++) {
>> -					if ((stripe_nr_end - 1 - j) %
>> -					      map->num_stripes == stripe_index)
>> +					stripes = stripe_nr_end - 1 - j;
>> +
>> +					if (do_div(stripes, map->num_stripes) ==
>> +					    stripe_index)
>>  						break;
>>  				}
>> +
>>  				stripes = stripe_nr_end - 1 - j;
>>  				do_div(stripes, map->num_stripes);
>>  				multi->stripes[i].length = map->stripe_len *
>> @@ -3100,18 +3103,22 @@ again:
>>  				multi->stripes[i].length -= stripe_end_offset;
>>  			} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
>> -				u64 stripes;
>> +				u64 stripes = stripe_nr_end - 1;
>>  				int j;
>>  				int factor = map->num_stripes / map->sub_stripes;
>> -				int last_stripe = (stripe_nr_end - 1) % factor;
>> +				int last_stripe = do_div(stripes, factor);
>> +
>>  				last_stripe *= map->sub_stripes;
>>  
>>  				for (j = 0; j < factor; j++) {
>> -					if ((stripe_nr_end - 1 - j) % factor ==
>> +					stripes = stripe_nr_end - 1 - j;
>> +
>> +					if (do_div(stripes, factor) ==
>>  					    stripe_index / map->sub_stripes)
>>  						break;
>>  				}
>> +
>>  				stripes = stripe_nr_end - 1 - j;
>>  				do_div(stripes, factor);
>>  				multi->stripes[i].length = map->stripe_len *
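do_div(n, base) divides the 64-bit lvalue n in place and returns the 32-bit remainder, which is why the patch copies stripe_nr_end - 1 into stripes before each call. The semantics can be sketched in userspace as follows. do_div_sketch() and raid0_last_stripe() are illustrative helpers: the first takes a pointer instead of being a macro, and it is only a model, because on a 32-bit host the % and / inside it are still compiled into libgcc helpers such as __umoddi3, the very symbol modpost reported as undefined for the module.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the kernel's do_div(n, base): divide *n in place by a 32-bit
 * base and return the remainder.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* The RAID0 case from the patch: which stripe holds the last stripe
 * number, written with the remainder returned by do_div. */
static int raid0_last_stripe(uint64_t stripe_nr_end, uint32_t num_stripes)
{
	uint64_t stripes = stripe_nr_end - 1;

	return (int)do_div_sketch(&stripes, num_stripes);
}
```

For example, raid0_last_stripe(10, 4) returns 1, matching (10 - 1) % 4 in the original code.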
Re: [PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance
On 03/30/2011 07:58 PM, Arne Jansen wrote:
> On 10.03.2011 13:28, Chris Mason wrote:
>> Excerpts from liubo's message of 2011-03-10 03:50:27 -0500:
>>> On 03/07/2011 10:13 AM, liubo wrote:
>>>> btrfs will remove unused block groups after balance. When an empty
>>>> filesystem is balanced, the block group with tag DATA may be dropped,
>>>> and after umount and mount again, it will not find the DATA space_info
>>>> and this leads to an OOPS.
>>>>
>>>> So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to
>>>> avoid the OOPS.
>
> This patch breaks mixed block groups. If the space_infos get added
> upfront, later on all mixed block groups will be added to the data
> space_info, leaving the metadata space_info completely empty. No mixed
> space_info will ever get created.

Hi, Arne,

Sorry for the late reply.

> As a fix it might be enough to call btrfs_init_space_info after
> btrfs_read_block_groups, not before, but I haven't tested it.

That seems impossible, the original bug occurs in btrfs_read_block_groups() itself...

> This was the cause of the BUG reported by Sergei Trofimovich in the
> thread "v2.6.38-6555-ga44f99c: null pointer dereference on -ENOSPC".

Thanks for pointing this out. Anyway, I will dig into it more.

thanks,
liubo

> -Arne
>
>>> Hi, Chris,
>>>
>>> These two fixes are for critical problems (one OOPS and one memory
>>> leak), so would you please take some time to review them and check if
>>> they are ready for the next git pull?
>>>
>>> It seems that you have been very busy these days. ;)
>>
>> Hi Liubo,
>>
>> I'm looking at both of these. There are no more rc's for 2.6.38, only
>> the final release, so the bar is very high for a commit that goes in.
>>
>> -chris
>>
>>> thanks,
>>> liubo
>>>
>>>> Reported-by: Daniel J Blueman <daniel.blue...@gmail.com>
>>>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>>>> ---
>>>>  fs/btrfs/ctree.h       |    1 +
>>>>  fs/btrfs/disk-io.c     |    6 ++++++
>>>>  fs/btrfs/extent-tree.c |   23 +++++++++++++++++++++++
>>>>  3 files changed, 30 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>>> index 28188a7..49c50e5 100644
>>>> --- a/fs/btrfs/ctree.h
>>>> +++ b/fs/btrfs/ctree.h
>>>> @@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
>>>>  			       u64 num_bytes);
>>>>  int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
>>>>  			    struct btrfs_root *root, u64 type);
>>>> +int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
>>>>  
>>>>  /* ctree.c */
>>>>  int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 3e1ea3e..8bcdc62 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>>>  	fs_info->metadata_alloc_profile = (u64)-1;
>>>>  	fs_info->system_alloc_profile = fs_info->metadata_alloc_profile;
>>>>  
>>>> +	ret = btrfs_init_space_info(fs_info);
>>>> +	if (ret) {
>>>> +		printk(KERN_ERR "Failed to initial space info: %d\n", ret);
>>>> +		goto fail_block_groups;
>>>> +	}
>>>> +
>>>>  	ret = btrfs_read_block_groups(extent_root);
>>>>  	if (ret) {
>>>>  		printk(KERN_ERR "Failed to read block groups: %d\n", ret);
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 100e409..08525ee 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -8714,6 +8714,29 @@ out:
>>>>  	return ret;
>>>>  }
>>>>  
>>>> +int btrfs_init_space_info(struct btrfs_fs_info *fs_info)
>>>> +{
>>>> +	struct btrfs_space_info *space_info;
>>>> +	int ret;
>>>> +
>>>> +	ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0,
>>>> +				&space_info);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0,
>>>> +				&space_info);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0,
>>>> +				&space_info);
>>>> +	if (ret)
>>>> +		return ret;
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>>  int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 end)
>>>>  {
>>>>  	return unpin_extent_range(root, start, end);
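Arne's objection is easier to see with a toy model of the space_info list. The sketch below pre-creates SYSTEM, METADATA and DATA entries in the same order as btrfs_init_space_info() and then adds a mixed DATA|METADATA group. The lookup rule (newest entry first, match on any shared type bit) is an assumption chosen to reproduce the behaviour Arne describes; all names and bit values are illustrative and not btrfs's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type bits, not btrfs's BTRFS_BLOCK_GROUP_* values. */
#define DEMO_BG_DATA		(1u << 0)
#define DEMO_BG_SYSTEM		(1u << 1)
#define DEMO_BG_METADATA	(1u << 2)
#define DEMO_MAX_INFOS		8

struct demo_space_info {
	unsigned int flags;
	uint64_t total_bytes;
};

struct demo_fs_info {
	struct demo_space_info infos[DEMO_MAX_INFOS];
	size_t nr;
};

/* Assumed lookup rule: newest entry first, match on any shared type bit. */
static struct demo_space_info *find_space_info(struct demo_fs_info *fs,
					       unsigned int flags)
{
	size_t i;

	for (i = fs->nr; i > 0; i--)
		if (fs->infos[i - 1].flags & flags)
			return &fs->infos[i - 1];
	return NULL;
}

/* Get-or-create, loosely modelled on update_space_info(). Returns 0 on
 * success, -1 when the fixed-size table is full. */
static int update_space_info_sketch(struct demo_fs_info *fs,
				    unsigned int flags, uint64_t bytes)
{
	struct demo_space_info *si = find_space_info(fs, flags);

	if (!si) {
		if (fs->nr == DEMO_MAX_INFOS)
			return -1;
		si = &fs->infos[fs->nr++];
		si->flags = flags;
		si->total_bytes = 0;
	}
	si->total_bytes += bytes;
	return 0;
}

/* Same order as btrfs_init_space_info(): SYSTEM, METADATA, DATA, size 0. */
static int init_space_info_sketch(struct demo_fs_info *fs)
{
	static const unsigned int kinds[] = {
		DEMO_BG_SYSTEM, DEMO_BG_METADATA, DEMO_BG_DATA,
	};
	size_t i;

	assert(fs != NULL);
	for (i = 0; i < sizeof(kinds) / sizeof(kinds[0]); i++) {
		int ret = update_space_info_sketch(fs, kinds[i], 0);

		if (ret)
			return ret;
	}
	return 0;
}
```

After init_space_info_sketch(), adding 100 bytes with DEMO_BG_DATA | DEMO_BG_METADATA lands in the DATA entry and no fourth, mixed entry is created, while the lookup for an empty table returns NULL, which is the situation the original OOPS came from.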
Re: [PATCH] Btrfs: add initial tracepoint support for btrfs
On 03/29/2011 09:16 AM, liubo wrote:
> On 03/28/2011 08:59 AM, Chris Mason wrote:
>> Excerpts from Chris Mason's message of 2011-03-26 08:12:04 -0400:
>>> Excerpts from liubo's message of 2011-03-24 07:18:59 -0400:
>>>> Tracepoints can provide insight into why btrfs hits bugs and be greatly
>>>> helpful for debugging, e.g
>>>
>>> This is really neat, I've queued it up.
>>
>> Whoops, it has a lot of warnings when compiled on 32 bit machines.
>> Please take a look:
>>
>> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
>> include/trace/events/btrfs.h:144:1: warning: large integer implicitly truncated to unsigned type
>
> Ahh, I figured it out. Will send a new version to clear the warnings.

Here is the patch to clear the warnings.

From: Liu Bo <liubo2...@cn.fujitsu.com>

[PATCH] Btrfs: fix compile warnings of btrfs tracepoint on 32bit box

include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type

btrfs defines some macros whose values have ULL type, and when btrfs tracepoints use these macros on a 32bit box, values like -1ULL are truncated. This is where those warnings come from.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 include/trace/events/btrfs.h |   19 +++++++++++--------
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index f445cff..27e67fd 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -36,9 +36,12 @@ struct extent_buffer;
 		{ BTRFS_FS_TREE_OBJECTID,	"FS_TREE"	},	\
 		{ BTRFS_ROOT_TREE_DIR_OBJECTID,	"ROOT_TREE_DIR"	},	\
 		{ BTRFS_CSUM_TREE_OBJECTID,	"CSUM_TREE"	},	\
-		{ BTRFS_TREE_LOG_OBJECTID,	"TREE_LOG"	},	\
-		{ BTRFS_TREE_RELOC_OBJECTID,	"TREE_RELOC"	},	\
-		{ BTRFS_DATA_RELOC_TREE_OBJECTID, "DATA_RELOC_TREE" })
+		{ (unsigned long)BTRFS_TREE_LOG_OBJECTID,		\
+						"TREE_LOG"	},	\
+		{ (unsigned long)BTRFS_TREE_RELOC_OBJECTID,		\
+						"TREE_RELOC"	},	\
+		{ (unsigned long)BTRFS_DATA_RELOC_TREE_OBJECTID,	\
+						"DATA_RELOC_TREE" })
 
 #define show_root_type(obj)						\
 	obj, ((obj >= BTRFS_DATA_RELOC_TREE_OBJECTID) ||		\
@@ -126,13 +129,13 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 
 #define __show_map_type(type)						\
 	__print_symbolic(type,						\
-		{ EXTENT_MAP_LAST_BYTE, "LAST_BYTE" 	},		\
-		{ EXTENT_MAP_HOLE, 	"HOLE" 		},		\
-		{ EXTENT_MAP_INLINE,	"INLINE" 	},		\
-		{ EXTENT_MAP_DELALLOC,	"DELALLOC" 	})
+		{ (unsigned long)EXTENT_MAP_LAST_BYTE, "LAST_BYTE" },	\
+		{ (unsigned long)EXTENT_MAP_HOLE,      "HOLE"      },	\
+		{ (unsigned long)EXTENT_MAP_INLINE,    "INLINE"    },	\
+		{ (unsigned long)EXTENT_MAP_DELALLOC,  "DELALLOC"  })
 
 #define show_map_type(type)			\
-	type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" :  __show_map_type(type)
+	type, (type >= EXTENT_MAP_LAST_BYTE) ? "-" : __show_map_type(type)
 
 #define show_map_flags(flag)						\
 	__print_flags(flag, "|",					\
-- 
1.6.5.2

> Thanks,
> liubo
>
>> -chris
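The explicit (unsigned long) casts silence the warnings, but on a 32-bit box the symbol lookup then compares only the low 32 bits. A userspace sketch of that effect, using uint32_t to model a 32-bit unsigned long and assuming the traced value is narrowed the same way when it reaches the unsigned long parameter of ftrace_print_symbols_seq(). DEMO_TREE_LOG_OBJECTID stands in for BTRFS_TREE_LOG_OBJECTID (-6ULL), and the helper names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t models "unsigned long" on a 32-bit box. */
typedef uint32_t ulong32;

/* Assumed to match btrfs's BTRFS_TREE_LOG_OBJECTID (-6ULL). */
#define DEMO_TREE_LOG_OBJECTID	((uint64_t)-6)

/* What "{ (unsigned long)BTRFS_TREE_LOG_OBJECTID, ... }" stores on 32-bit:
 * the low 32 bits, now without a compiler warning. */
static ulong32 table_key_32(uint64_t v)
{
	return (ulong32)v;
}

/* The traced value is narrowed the same way before the lookup, so the
 * comparison only sees 32 bits. */
static int matches_32(uint64_t traced, uint64_t key)
{
	return (ulong32)traced == table_key_32(key);
}

static void demo_matches_32(void)
{
	/* The special id still finds its own entry ... */
	assert(matches_32(DEMO_TREE_LOG_OBJECTID, DEMO_TREE_LOG_OBJECTID));
	/* ... but a different 64-bit value, 0xfffffffa, aliases it. */
	assert(matches_32(0xfffffffaULL, DEMO_TREE_LOG_OBJECTID));
}
```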
Re: [PATCH] Btrfs: add initial tracepoint support for btrfs
Please ignore this patch... I just found that we'd better revise the tracepoint side instead of the btrfs side, will dig into it more.

thanks,
liubo

> From: Liu Bo <liubo2...@cn.fujitsu.com>
>
> [PATCH] Btrfs: fix compile warnings of btrfs tracepoint on 32bit box
Re: [PATCH] Btrfs: add initial tracepoint support for btrfs
On 03/28/2011 08:59 AM, Chris Mason wrote:
> Excerpts from Chris Mason's message of 2011-03-26 08:12:04 -0400:
>> Excerpts from liubo's message of 2011-03-24 07:18:59 -0400:
>>> Tracepoints can provide insight into why btrfs hits bugs and be greatly
>>> helpful for debugging, e.g
>>
>> This is really neat, I've queued it up.
>
> Whoops, it has a lot of warnings when compiled on 32 bit machines.
> Please take a look:
>
> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:47:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:68:1: warning: large integer implicitly truncated to unsigned type
> include/trace/events/btrfs.h:144:1: warning: large integer implicitly truncated to unsigned type

Ahh, I figured it out. Will send a new version to clear the warnings.

Thanks,
liubo

> -chris
[PATCH] Btrfs: add initial tracepoint support for btrfs
Tracepoints can provide insight into why btrfs hits bugs and be greatly helpful for debugging, e.g.

dd-7822 [000] 2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
dd-7822 [000] 2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
btrfs-transacti-7804 [001] 2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
btrfs-transacti-7804 [001] 2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
flush-btrfs-2-7821 [001] 2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
flush-btrfs-2-7821 [001] 2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
flush-btrfs-2-7821 [001] 2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
flush-btrfs-2-7821 [000] 2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
btrfs-endio-wri-7800 [001] 2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)

Here is what I have added:

1) ordered_extent:
btrfs_ordered_extent_add
btrfs_ordered_extent_remove
btrfs_ordered_extent_start
btrfs_ordered_extent_put

These provide critical information to understand how ordered_extents are updated.

2) extent_map:
btrfs_get_extent

extent_map is used in both read and write cases, and it is useful for tracking how btrfs-specific IO is running.

3) writepage:
__extent_writepage
btrfs_writepage_end_io_hook

Pages are critical resources and produce a lot of corner cases during writeback, so it is valuable to know how a page is written to disk.

4) inode:
btrfs_inode_new
btrfs_inode_request
btrfs_inode_evict

These can show where and when an inode is created, and when an inode is evicted.

5) sync:
btrfs_sync_file
btrfs_sync_fs

These show sync arguments.

6) transaction:
btrfs_transaction_commit

In a transaction-based filesystem, it is useful to know the generation and who does the commit.

7) back reference and cow:
btrfs_delayed_tree_ref
btrfs_delayed_data_ref
btrfs_delayed_ref_head
btrfs_cow_block

Btrfs natively supports back references; these tracepoints are helpful for understanding btrfs's COW mechanism.

8) chunk:
btrfs_chunk_alloc
btrfs_chunk_free

A chunk is a link between a physical offset and a logical offset, and stands for space information in btrfs; these are helpful for tracing space usage.

9) reserved_extent:
btrfs_reserved_extent_alloc
btrfs_reserved_extent_free

These can show how btrfs uses its space.
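The root = 5(FS_TREE) fields in the sample output pair a numeric root id with a symbolic name. A userspace sketch of that formatting, using only the id/name pairs visible in the sample output and falling back to "-" for any other id; format_root() and the table are illustrative, loosely in the spirit of the tracepoint's symbolic printing rather than a copy of it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct root_name {
	uint64_t id;
	const char *name;
};

/* Id/name pairs taken from the sample trace output. */
static const struct root_name root_names[] = {
	{ 1, "ROOT_TREE" }, { 2, "EXTENT_TREE" }, { 3, "CHUNK_TREE" },
	{ 4, "DEV_TREE" },  { 5, "FS_TREE" },     { 7, "CSUM_TREE" },
};

/* Format "root = <id>(<NAME>)", using "-" when the id is not in the table.
 * Returns the snprintf() result. */
static int format_root(char *buf, size_t len, uint64_t id)
{
	const char *name = "-";
	size_t i;

	for (i = 0; i < sizeof(root_names) / sizeof(root_names[0]); i++)
		if (root_names[i].id == id)
			name = root_names[i].name;
	return snprintf(buf, len, "root = %llu(%s)",
			(unsigned long long)id, name);
}

static void demo_format_root(void)
{
	char buf[64];

	format_root(buf, sizeof(buf), 5);
	assert(strcmp(buf, "root = 5(FS_TREE)") == 0);
	format_root(buf, sizeof(buf), 256);
	assert(strcmp(buf, "root = 256(-)") == 0);
}
```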
Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.c             |    3 +
 fs/btrfs/ctree.h             |    1 +
 fs/btrfs/delayed-ref.c       |    6 +
 fs/btrfs/extent-tree.c       |    4 +
 fs/btrfs/extent_io.c         |    2 +
 fs/btrfs/file.c              |    1 +
 fs/btrfs/inode.c             |   12 +
 fs/btrfs/ordered-data.c      |    8 +
 fs/btrfs/super.c             |    5 +
 fs/btrfs/transaction.c       |    2 +
 fs/btrfs/volumes.c           |   16 +-
 fs/btrfs/volumes.h           |   11 +
 include/trace/events/btrfs.h |  667 ++
 13 files changed, 727 insertions(+), 11 deletions(-)
 create mode 100644 include/trace/events/btrfs.h

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index b5baff0..351515d 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -542,6 +542,9 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 
 	ret = __btrfs_cow_block(trans, root, buf, parent,
 				 parent_slot, cow_ret, search_start, 0);
+
+	trace_btrfs_cow_block(root, buf, *cow_ret);
+
 	return ret;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 28188a7..cd6906e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -28,6 +28,7 @@
 #include <linux/wait.h>
 #include <linux/slab.h>
 #include <linux/kobject.h>
+#include <trace/events/btrfs.h>
 #include <asm/kmap_types.h>
 #include "extent_io.h"
 #include "extent_map.h"
diff --git a/fs/btrfs/delayed-ref.c
[PATCH 2/2 v3] Btrfs: Per file/directory controls for COW and compression
From: Liu Bo <liubo2...@cn.fujitsu.com>
Subject: [PATCH 2/2 v3] Btrfs: Per file/directory controls for COW and compression

Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression method (probably LZO) stored in the super. However, before that, we would wait for that one method to be stable enough to be adopted into the super. So I list it as a long term goal, and just store it in RAM today.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to control a file's or directory's datacow and compression attributes.

NOTE:
- The compression type is selected by these rules: if we mount btrfs with a compress option, i.e. zlib/lzo, that type is used. Otherwise, we use the default compress type (zlib today).

v1->v2:
- rebase to the latest btrfs.
v2->v3:
- fix a problem: when a file is set NOCOW via the mount option, this NOCOW gets clobbered by inheritance from the parent directory.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/disk-io.c |    6 ++++++
 fs/btrfs/inode.c   |   31 ---
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..b77d1a5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1283,6 +1283,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP		(1 << 8)
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
+#define BTRFS_INODE_COMPRESS		(1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..a894c12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
 	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 
+	/*
+	 * In the long term, we'll store the compression type in the super
+	 * block, and it'll be used for per file compression control.
+	 */
+	fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67821..2d9910d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -381,7 +381,8 @@ again:
 	 */
 	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
 	    (btrfs_test_opt(root, COMPRESS) ||
-	     (BTRFS_I(inode)->force_compress))) {
+	     (BTRFS_I(inode)->force_compress) ||
+	     (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
 		WARN_ON(pages);
 		pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
 
@@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
 	else if (!btrfs_test_opt(root, COMPRESS) &&
-		 !(BTRFS_I(inode)->force_compress))
+		 !(BTRFS_I(inode)->force_compress) &&
+		 !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
 		ret = cow_file_range(inode, locked_page, start, end,
 				      page_started, nr_written, 1);
 	else
@@ -4586,7 +4588,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	if ((mode & S_IFREG)) {
 		if (btrfs_test_opt(root, NODATASUM))
 			BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
-		if (btrfs_test_opt(root, NODATACOW))
+		if (btrfs_test_opt(root, NODATACOW) ||
+		    (BTRFS_I(dir)->flags & BTRFS_INODE_NODATACOW))
 			BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
 	}
 
@@ -6803,6 +6806,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
 	return 0;
 }
 
+/*
+ * If a file is moved, it will inherit the cow and compression flags of the new
+ * directory.
+ */
+static void fixup_inode_flags(struct inode *dir, struct inode *inode)
+{
+	struct btrfs_inode *b_dir = BTRFS_I(dir);
+	struct btrfs_inode *b_inode = BTRFS_I(inode);
+
+	if (b_dir->flags & BTRFS_INODE_NODATACOW)
+		b_inode->flags |= BTRFS_INODE_NODATACOW;
+	else
+		b_inode->flags &= ~BTRFS_INODE_NODATACOW;
+
+	if (b_dir->flags & BTRFS_INODE_COMPRESS)
+		b_inode->flags |= BTRFS_INODE_COMPRESS;
+	else
+		b_inode->flags &= ~BTRFS_INODE_COMPRESS;
+}
+
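The flag logic in this patch reduces to two small bit operations. A sketch, assuming the BTRFS_INODE_NODATACOW and BTRFS_INODE_NOCOMPRESS bit positions (only BTRFS_INODE_COMPRESS = 1 << 11 comes from the patch): fixup_flags() is a branch-free equivalent of fixup_inode_flags(), and should_compress() restates the patched condition in the first inode.c hunk. Both names are illustrative.

```c
#include <assert.h>

/* BTRFS_INODE_COMPRESS is (1 << 11) in the patch; the other two bit
 * positions are assumptions for this sketch. */
#define DEMO_INODE_NODATACOW	(1u << 1)
#define DEMO_INODE_NOCOMPRESS	(1u << 3)
#define DEMO_INODE_COMPRESS	(1u << 11)

/* Equivalent to fixup_inode_flags(): take the NODATACOW and COMPRESS bits
 * from the new parent directory and keep every other inode bit. */
static unsigned int fixup_flags(unsigned int dir_flags, unsigned int inode_flags)
{
	const unsigned int inherited = DEMO_INODE_NODATACOW | DEMO_INODE_COMPRESS;

	return (inode_flags & ~inherited) | (dir_flags & inherited);
}

/* The patched compression test: NOCOMPRESS wins; otherwise the mount
 * option, force_compress or the per-inode COMPRESS flag enables it. */
static int should_compress(unsigned int inode_flags, int mount_compress,
			   int force_compress)
{
	if (inode_flags & DEMO_INODE_NOCOMPRESS)
		return 0;
	return mount_compress || force_compress ||
	       (inode_flags & DEMO_INODE_COMPRESS) != 0;
}

static void demo_fixup(void)
{
	/* Moving a NODATACOW file into a COMPRESS directory. */
	unsigned int f = fixup_flags(DEMO_INODE_COMPRESS, DEMO_INODE_NODATACOW);

	assert(f == DEMO_INODE_COMPRESS);
	assert(should_compress(f, 0, 0));
}
```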
[PATCH 1/2 v2] Btrfs: add datacow flag in inode flag
For datacow control, the corresponding inode flags are needed. This is for btrfs use.

v1->v2:
Change FS_COW_FL to another bit due to conflict with the upstream e2fsprogs

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 include/linux/fs.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63d069b..dbcb47e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -353,6 +353,8 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL			0x00080000 /* Extents */
 #define FS_DIRECTIO_FL			0x00100000 /* Use direct i/o */
+#define FS_NOCOW_FL			0x00800000 /* Do not cow file */
+#define FS_COW_FL			0x02000000 /* Cow file */
 #define FS_RESERVED_FL			0x80000000 /* reserved for ext2 lib */
 
 #define FS_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */
-- 
1.6.5.2
[PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression
Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression method (probably LZO) stored in the super. However, before that, we would wait for that one method to be stable enough to be adopted into the super. So I list it as a long term goal, and just store it in RAM today.

After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to control a file's or directory's datacow and compression attributes.

NOTE:
- The compression type is selected by these rules: if we mount btrfs with a compress option, i.e. zlib/lzo, that type is used. Otherwise, we use the default compress type (zlib today).

v1->v2:
Rebase the patch with the latest btrfs.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/disk-io.c |    6 ++++++
 fs/btrfs/inode.c   |   32
 fs/btrfs/ioctl.c   |   41 +
 4 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8b4b9d1..b77d1a5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1283,6 +1283,7 @@ struct btrfs_root {
 #define BTRFS_INODE_NODUMP		(1 << 8)
 #define BTRFS_INODE_NOATIME		(1 << 9)
 #define BTRFS_INODE_DIRSYNC		(1 << 10)
+#define BTRFS_INODE_COMPRESS		(1 << 11)
 
 /* some macros to generate set/get funcs for the struct fields.  This
  * assumes there is a lefoo_to_cpu for every type, so lets make a simple
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3e1ea3e..a894c12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
 	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 
+	/*
+	 * In the long term, we'll store the compression type in the super
+	 * block, and it'll be used for per file compression control.
+	 */
+	fs_info->compress_type = BTRFS_COMPRESS_ZLIB;
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index db67821..e687bb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -381,7 +381,8 @@ again:
 	 */
 	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) &&
 	    (btrfs_test_opt(root, COMPRESS) ||
-	     (BTRFS_I(inode)->force_compress))) {
+	     (BTRFS_I(inode)->force_compress) ||
+	     (BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) {
 		WARN_ON(pages);
 		pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS);
 
@@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
 	else if (!btrfs_test_opt(root, COMPRESS) &&
-		 !(BTRFS_I(inode)->force_compress))
+		 !(BTRFS_I(inode)->force_compress) &&
+		 !(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))
 		ret = cow_file_range(inode, locked_page, start, end,
 				      page_started, nr_written, 1);
 	else
@@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 	location->offset = 0;
 	btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY);
 
-	btrfs_inherit_iflags(inode, dir);
-
 	if ((mode & S_IFREG)) {
 		if (btrfs_test_opt(root, NODATASUM))
 			BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
@@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 			BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW;
 	}
 
+	btrfs_inherit_iflags(inode, dir);
+
 	insert_inode_hash(inode);
 	inode_tree_add(inode);
 	return inode;
@@ -6803,6 +6805,26 @@ static int btrfs_getattr(struct vfsmount *mnt,
 	return 0;
 }
 
+/*
+ * If a file is moved, it will inherit the cow and compression flags of the new
+ * directory.
+ */
+static void fixup_inode_flags(struct inode *dir, struct inode *inode)
+{
+	struct btrfs_inode *b_dir = BTRFS_I(dir);
+	struct btrfs_inode *b_inode = BTRFS_I(inode);
+
+	if (b_dir->flags & BTRFS_INODE_NODATACOW)
+		b_inode->flags |= BTRFS_INODE_NODATACOW;
+	else
+		b_inode->flags &= ~BTRFS_INODE_NODATACOW;
+
+	if (b_dir->flags & BTRFS_INODE_COMPRESS)
+		b_inode->flags |= BTRFS_INODE_COMPRESS;
+	else
+		b_inode->flags &= ~BTRFS_INODE_COMPRESS;
+}
+
 static int btrfs_rename(struct inode *old_dir, struct dentry
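From userspace, the generic interface the commit message refers to is a read-modify-write through FS_IOC_GETFLAGS and FS_IOC_SETFLAGS. A sketch, with the caveat that which bits a filesystem accepts is up to that filesystem, and that the FS_COW_FL bit proposed in this series is not assumed to exist in your headers; update_flags() and change_inode_flags() are illustrative helpers, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Apply set/clear masks to an inode flag word. */
static int update_flags(int flags, int set, int clear)
{
	assert((set & clear) == 0);	/* a bit cannot be both set and cleared */
	return (flags | set) & ~clear;
}

/*
 * Read-modify-write of the inode flags via the generic ioctls. The flag
 * word is passed as an int, which is what the in-kernel handlers copy.
 * Returns 0 on success or a negative errno value.
 */
static int change_inode_flags(const char *path, int set, int clear)
{
	int fd, flags, ret = 0;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		ret = -errno;
		goto out;
	}
	flags = update_flags(flags, set, clear);
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

For example, change_inode_flags(path, FS_NOCOW_FL, 0) would request NOCOW on a directory, using the FS_NOCOW_FL definition that later kernels' <linux/fs.h> provide; on a filesystem that does not support the bit, the SETFLAGS call may fail.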
Re: [PATCH 2/2 v2] Btrfs: Per file/directory controls for COW and compression
On 03/22/2011 01:43 AM, Johann Lombardi wrote: On Mon, Mar 21, 2011 at 04:57:13PM +0800, liubo wrote: @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, location->offset = 0; btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY); -btrfs_inherit_iflags(inode, dir); - if ((mode & S_IFREG)) { if (btrfs_test_opt(root, NODATASUM)) BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM; @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW; } +btrfs_inherit_iflags(inode, dir); The problem is that btrfs_inherit_iflags() overwrites BTRFS_I(inode)->flags with the parent's flags, so you lose BTRFS_INODE_NODATA{SUM|COW}. Thanks for pointing this out, will fix it. thanks, liubo Johann
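Johann's objection is an ordering problem: if the inherit step assigns the parent's flags wholesale, bits already set from the `nodatasum`/`nodatacow` mount options are lost. The sketch below contrasts an overwriting inherit with one that ORs in only the inheritable bits. Both helpers, the `DEMO_` values and the choice of inheritable bits are hypothetical; the real `btrfs_inherit_iflags()` may copy a different set of flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values standing in for the btrfs inode flags. */
#define DEMO_NODATASUM (1ULL << 0)
#define DEMO_NODATACOW (1ULL << 1)
#define DEMO_COMPRESS  (1ULL << 11)

/* Assumed set of bits a new inode may take from its parent directory. */
#define DEMO_INHERITABLE (DEMO_NODATACOW | DEMO_COMPRESS)

/* Overwriting inherit: whatever was set from mount options is discarded. */
static uint64_t inherit_overwrite(uint64_t inode_flags, uint64_t dir_flags)
{
	(void)inode_flags;
	return dir_flags;
}

/* Merging inherit: keep existing bits, add only the inheritable ones. */
static uint64_t inherit_merge(uint64_t inode_flags, uint64_t dir_flags)
{
	return inode_flags | (dir_flags & DEMO_INHERITABLE);
}
```

With `NODATASUM | NODATACOW` already set and a parent carrying only `COMPRESS`, the overwriting version returns just `COMPRESS`, while the merging version keeps all three bits.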
Re: [PATCH 1/2] Btrfs: add datacow flag in inode flag
On 03/16/2011 05:06 PM, Amir Goldstein wrote: On Wed, Mar 16, 2011 at 1:35 AM, Chris Mason chris.ma...@oracle.com wrote: Excerpts from Andreas Dilger's message of 2011-03-15 18:06:49 -0400: On 2011-03-15, at 2:57 PM, Christoph Hellwig wrote: On Tue, Mar 15, 2011 at 04:26:50PM -0400, Chris Mason wrote: #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */ +#define FS_COW_FL 0x01000000 /* Cow file */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ I'm fine with it. I'll defer the check for conflicts with extN-specific flags to Ted, though. Looking at the upstream e2fsprogs I see in that range: #define EXT4_EXTENTS_FL 0x00080000 /* Inode uses extents */ #define EXT4_EA_INODE_FL 0x00200000 /* Inode used for large EA */ #define EXT4_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ #define EXT4_SNAPFILE_FL 0x01000000 /* Inode is a snapshot */ #define EXT4_SNAPFILE_DELETED_FL 0x04000000 /* Snapshot is being deleted */ #define EXT4_SNAPFILE_SHRUNK_FL 0x08000000 /* Snapshot shrink has completed */ #define EXT2_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ #define EXT2_FL_USER_VISIBLE 0x004BDFFF /* User visible flags */ so there is a conflict with FS_COW_FL and EXT4_SNAPFILE_FL. I don't know the semantics of those two flags enough to say for sure whether it is reasonable that they alias to each other, but at first glance COW and SNAPSHOT don't seem completely unrelated. EXT4_SNAPFILE_FL indicates a special system snapshot file, so it has no equivalence relation with FS_COW_FL. Please use 0x02000000 for FS_COW_FL. Fine with that, but it's up to Chris. :) thanks, liubo EXT4_SNAPFILE_DELETED_FL is a persistent state of a snapshot file, which is no longer available as a mountable device, but cannot be unlinked because it holds changed data sets needed by older snapshots.
EXT4_SNAPFILE_SHRUNK_FL is a persistent state of a (deleted) snapshot file, which has undergone a shrink process to free all change sets not needed by older snapshots. The persistence of the flag is needed to avoid tedious shrinking when it is not needed. In the btrfs case FS_COW_FL means to do COW even when there are no snapshots. FS_NOCOW_FL means to do COW only when there are snapshots. I am interested in FS_NOCOW_FL as well, but for my implementation it would mean: do not COW on rewrites, even when there are snapshots, so a user can create a pre-allocated island of blocks pinned to a physical location, for a raw VM image, for example. Thanks, Amir.
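Andreas's check amounts to testing whether a proposed flag shares any bit with a flag another filesystem already uses. A minimal sketch of that test follows. `find_conflict()` and the table are hypothetical, and the table values are assumed to be the full 32-bit forms of the ext4/e2fsprogs constants quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct named_flag {
	const char *name;
	uint32_t value;
};

/* ext4/e2fsprogs flags quoted in the thread (full 32-bit values assumed). */
static const struct named_flag existing_flags[] = {
	{ "EXT4_EXTENTS_FL",          0x00080000 },
	{ "EXT4_EA_INODE_FL",         0x00200000 },
	{ "EXT4_EOFBLOCKS_FL",        0x00400000 },
	{ "EXT4_SNAPFILE_FL",         0x01000000 },
	{ "EXT4_SNAPFILE_DELETED_FL", 0x04000000 },
	{ "EXT4_SNAPFILE_SHRUNK_FL",  0x08000000 },
	{ "EXT2_RESERVED_FL",         0x80000000 },
};

/* Return the name of the first existing flag sharing a bit with `proposed`, or NULL. */
static const char *find_conflict(uint32_t proposed)
{
	size_t n = sizeof(existing_flags) / sizeof(existing_flags[0]);

	for (size_t i = 0; i < n; i++)
		if (existing_flags[i].value & proposed)
			return existing_flags[i].name;
	return NULL;
}
```

With this table, `find_conflict(0x01000000)` (the proposed `FS_COW_FL`) reports `EXT4_SNAPFILE_FL`, while `0x02000000`, the value Amir suggests, reports no conflict.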
Re: [PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance
On 03/07/2011 10:13 AM, liubo wrote: btrfs will remove unused block groups after balance. When an empty filesystem is balanced, the block group with tag DATA may be dropped, and after umount and mount again, it will not find DATA space_info and lead to OOPS. So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid OOPS. Hi, Chris, These two fixes are for critical problems (one OOPS and one memory leak), so would you please take some time to review them and check if they are ready for the next git pull? It seems you have been very busy these days. ;) thanks, liubo Reported-by: Daniel J Blueman daniel.blue...@gmail.com Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/ctree.h |1 + fs/btrfs/disk-io.c |6 ++ fs/btrfs/extent-tree.c | 23 +++ 3 files changed, 30 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 28188a7..49c50e5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr, u64 num_bytes); int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 type); +int btrfs_init_space_info(struct btrfs_fs_info *fs_info); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3e1ea3e..8bcdc62 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block *sb, fs_info->metadata_alloc_profile = (u64)-1; fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; + ret = btrfs_init_space_info(fs_info); + if (ret) { + printk(KERN_ERR "Failed to initial space info: %d\n", ret); + goto fail_block_groups; + } + ret = btrfs_read_block_groups(extent_root); if (ret) { printk(KERN_ERR "Failed to read block groups: %d\n", ret); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 100e409..08525ee 100644 --- a/fs/btrfs/extent-tree.c +++ 
b/fs/btrfs/extent-tree.c @@ -8714,6 +8714,29 @@ out: return ret; } +int btrfs_init_space_info(struct btrfs_fs_info *fs_info) +{ + struct btrfs_space_info *space_info; + int ret; + + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0, + &space_info); + if (ret) + return ret; + + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0, + &space_info); + if (ret) + return ret; + + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0, + &space_info); + if (ret) + return ret; + + return ret; +} + int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 end) { return unpin_extent_range(root, start, end);
[PATCH 1/2] Btrfs: fix OOPS of empty filesystem after balance
btrfs will remove unused block groups after balance. When an empty filesystem is balanced, the block group with tag DATA may be dropped, and after umount and mount again, it will not find DATA space_info and lead to OOPS. So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid OOPS. Reported-by: Daniel J Blueman daniel.blue...@gmail.com Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/ctree.h |1 + fs/btrfs/disk-io.c |6 ++ fs/btrfs/extent-tree.c | 23 +++ 3 files changed, 30 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 28188a7..49c50e5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2221,6 +2221,7 @@ int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr, u64 num_bytes); int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 type); +int btrfs_init_space_info(struct btrfs_fs_info *fs_info); /* ctree.c */ int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3e1ea3e..8bcdc62 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1967,6 +1967,12 @@ struct btrfs_root *open_ctree(struct super_block *sb, fs_info->metadata_alloc_profile = (u64)-1; fs_info->system_alloc_profile = fs_info->metadata_alloc_profile; + ret = btrfs_init_space_info(fs_info); + if (ret) { + printk(KERN_ERR "Failed to initial space info: %d\n", ret); + goto fail_block_groups; + } + ret = btrfs_read_block_groups(extent_root); if (ret) { printk(KERN_ERR "Failed to read block groups: %d\n", ret); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 100e409..08525ee 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8714,6 +8714,29 @@ out: return ret; } +int btrfs_init_space_info(struct btrfs_fs_info *fs_info) +{ + struct btrfs_space_info *space_info; + int ret; + + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_SYSTEM, 0, 0, +&space_info); + if (ret) + return ret; + + ret = 
update_space_info(fs_info, BTRFS_BLOCK_GROUP_METADATA, 0, 0, +&space_info); + if (ret) + return ret; + + ret = update_space_info(fs_info, BTRFS_BLOCK_GROUP_DATA, 0, 0, +&space_info); + if (ret) + return ret; + + return ret; +} + int btrfs_error_unpin_extent_range(struct btrfs_root *root, u64 start, u64 end) { return unpin_extent_range(root, start, end); -- 1.6.5.2
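The fix registers SYSTEM, METADATA and DATA space_info entries at mount time, before any block group is read. A filesystem whose last DATA block group was dropped by balance therefore still finds a DATA entry. The userspace model below shows that find-or-create idea. Every name in it is hypothetical and it is not the real `update_space_info()`, although, like the kernel call, it returns the entry through a pointer-to-pointer argument (`&s`). Looping over the types also avoids the redundant `if (ret) return ret; return ret;` tail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the three block group type bits. */
#define DEMO_BG_DATA     (1ULL << 0)
#define DEMO_BG_SYSTEM   (1ULL << 1)
#define DEMO_BG_METADATA (1ULL << 2)

struct demo_space_info {
	uint64_t flags;
	uint64_t total_bytes;
	struct demo_space_info *next;
};

struct demo_fs_info {
	struct demo_space_info *space_info;	/* singly linked list */
};

static struct demo_space_info *find_space_info(struct demo_fs_info *fs,
					       uint64_t flags)
{
	for (struct demo_space_info *s = fs->space_info; s; s = s->next)
		if (s->flags == flags)
			return s;
	return NULL;
}

/* Find or create the entry for `flags`, then account `bytes` to it. */
static int update_space(struct demo_fs_info *fs, uint64_t flags,
			uint64_t bytes, struct demo_space_info **out)
{
	struct demo_space_info *s = find_space_info(fs, flags);

	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return -1;	/* the kernel would return -ENOMEM */
		s->flags = flags;
		s->next = fs->space_info;
		fs->space_info = s;
	}
	s->total_bytes += bytes;
	*out = s;
	return 0;
}

/* Create all three entries with zero bytes so later lookups cannot fail. */
static int init_space_info(struct demo_fs_info *fs)
{
	static const uint64_t types[] = {
		DEMO_BG_SYSTEM, DEMO_BG_METADATA, DEMO_BG_DATA
	};
	struct demo_space_info *s;

	for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
		int ret = update_space(fs, types[i], 0, &s);

		if (ret)
			return ret;
	}
	return 0;
}

static void free_space_info(struct demo_fs_info *fs)
{
	while (fs->space_info) {
		struct demo_space_info *s = fs->space_info;

		fs->space_info = s->next;
		free(s);
	}
}
```

After `init_space_info()`, a lookup for the DATA type succeeds even though no bytes were ever accounted to it.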
[PATCH 2/2] Btrfs: fix memory leak of empty filesystem after balance
After Josef's patch (commit 3c14874acc71180553fb5aba528e3cf57c5b958b), btrfs excludes the super block bytes when reading block groups (by marking an extent state UPTODATE). However, these bytes are not freed when balance removes unused block groups, and the removed groups are never processed again, so when we umount and unload the btrfs module, btrfs hits a memory leak. This patch adds the missing free operation. Reproduce steps: $ mkfs.btrfs disk $ mount disk /mnt/btrfs -o loop $ btrfs filesystem balance /mnt/btrfs $ umount /mnt/btrfs $ rmmod btrfs Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/extent-tree.c |6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 08525ee..a1af67a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -8611,6 +8611,12 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, BUG_ON(!block_group); BUG_ON(!block_group->ro); + /* +* Free the reserved super bytes from this block group before +* removing it. +*/ + free_excluded_extents(root, block_group); + memcpy(&key, &block_group->key, sizeof(key)); if (block_group->flags & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 | -- 1.6.5.2
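The leak is an ownership problem. The excluded super block ranges belong to a block group, and once balance removes the group nothing visits it again, so the removal path itself has to release them. A minimal userspace model follows. `exclude_range()`, `free_excluded()` and `demo_remove_block_group()` are hypothetical stand-ins, the last one playing the role of `btrfs_remove_block_group()` calling `free_excluded_extents()`, and a counter makes a leak observable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts outstanding range allocations so a leak is observable. */
static int live_ranges;

struct demo_range {
	uint64_t start, len;
	struct demo_range *next;
};

struct demo_block_group {
	struct demo_range *excluded;	/* e.g. super block copies */
};

static int exclude_range(struct demo_block_group *bg, uint64_t start,
			 uint64_t len)
{
	struct demo_range *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->start = start;
	r->len = len;
	r->next = bg->excluded;
	bg->excluded = r;
	live_ranges++;
	return 0;
}

/* Release every excluded range owned by the group. */
static void free_excluded(struct demo_block_group *bg)
{
	while (bg->excluded) {
		struct demo_range *r = bg->excluded;

		bg->excluded = r->next;
		free(r);
		live_ranges--;
	}
}

/* Free the group's ranges first: nothing will visit this group again. */
static void demo_remove_block_group(struct demo_block_group *bg)
{
	free_excluded(bg);
	free(bg);
}
```

Dropping the `free_excluded()` call from `demo_remove_block_group()` would leave `live_ranges` above zero after removal, which is the userspace analogue of the leak seen at `rmmod` time.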
[PATCH 1/2] Btrfs: add datacow flag in inode flag
For datacow control, the corresponding inode flags are needed. This is for the following patch. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- include/linux/fs.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 63d069b..bef47ff 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -353,6 +353,8 @@ struct inodes_stat_t { #define FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ #define FS_EXTENT_FL 0x00080000 /* Extents */ #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_NOCOW_FL 0x00800000 /* Do not cow file */ +#define FS_COW_FL 0x01000000 /* Cow file */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ #define FS_FL_USER_VISIBLE 0x0003DFFF /* User visible flags */ -- 1.6.5.2
[PATCH 2/2] Btrfs: Per file/directory controls for COW and compression
Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. According to Chris's comment, there should be just one true compression method (probably LZO) stored in the super. However, before that, we will wait until that one method is stable enough to be adopted into the super. So I list it as a long term goal, and just store it in RAM today. After applying this patch, we can use the generic FS_IOC_SETFLAGS ioctl to control the datacow and compression attributes of files and directories. NOTE: - The compression type is selected by the following rule: if btrfs is mounted with a compress option, i.e. zlib/lzo, that type is used; otherwise, the default compress type (zlib today) is used. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/ctree.h |1 + fs/btrfs/disk-io.c |6 ++ fs/btrfs/inode.c | 32 fs/btrfs/ioctl.c | 41 + 4 files changed, 72 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 28188a7..2639107 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1274,6 +1274,7 @@ struct btrfs_root { #define BTRFS_INODE_NODUMP (1 << 8) #define BTRFS_INODE_NOATIME (1 << 9) #define BTRFS_INODE_DIRSYNC (1 << 10) +#define BTRFS_INODE_COMPRESS (1 << 11) /* some macros to generate set/get funcs for the struct fields. This * assumes there is a lefoo_to_cpu for every type, so lets make a simple diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 3e1ea3e..a894c12 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1762,6 +1762,12 @@ struct btrfs_root *open_ctree(struct super_block *sb, btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY); + /* +* In the long term, we'll store the compression type in the super +* block, and it'll be used for per file compression control.
+*/ + fs_info->compress_type = BTRFS_COMPRESS_ZLIB; + ret = btrfs_parse_options(tree_root, options); if (ret) { err = ret; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 44b9266..82ca86f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -381,7 +381,8 @@ again: */ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS) && (btrfs_test_opt(root, COMPRESS) || -(BTRFS_I(inode)->force_compress))) { +(BTRFS_I(inode)->force_compress) || +(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS))) { WARN_ON(pages); pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_NOFS); @@ -1253,7 +1254,8 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page, ret = run_delalloc_nocow(inode, locked_page, start, end, page_started, 0, nr_written); else if (!btrfs_test_opt(root, COMPRESS) && -!(BTRFS_I(inode)->force_compress)) +!(BTRFS_I(inode)->force_compress) && +!(BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS)) ret = cow_file_range(inode, locked_page, start, end, page_started, nr_written, 1); else @@ -4581,8 +4583,6 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, location->offset = 0; btrfs_set_key_type(location, BTRFS_INODE_ITEM_KEY); - btrfs_inherit_iflags(inode, dir); - if ((mode & S_IFREG)) { if (btrfs_test_opt(root, NODATASUM)) BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM; @@ -4590,6 +4590,8 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans, BTRFS_I(inode)->flags |= BTRFS_INODE_NODATACOW; } + btrfs_inherit_iflags(inode, dir); + insert_inode_hash(inode); inode_tree_add(inode); return inode; @@ -6801,6 +6803,26 @@ static int btrfs_getattr(struct vfsmount *mnt, return 0; } +/* + * If a file is moved, it will inherit the cow and compression flags of the new + * directory.
+ */ +static void fixup_inode_flags(struct inode *dir, struct inode *inode) +{ + struct btrfs_inode *b_dir = BTRFS_I(dir); + struct btrfs_inode *b_inode = BTRFS_I(inode); + + if (b_dir->flags & BTRFS_INODE_NODATACOW) + b_inode->flags |= BTRFS_INODE_NODATACOW; + else + b_inode->flags &= ~BTRFS_INODE_NODATACOW; + + if (b_dir->flags & BTRFS_INODE_COMPRESS) + b_inode->flags |= BTRFS_INODE_COMPRESS; + else + b_inode->flags &= ~BTRFS_INODE_COMPRESS; +} + static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode
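From userspace, this patch means the bits are toggled with the generic `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` ioctls, the interface behind `lsattr` and `chattr`, instead of a btrfs-specific call. The read-modify-write is sketched below, assuming a Linux `<linux/fs.h>` that provides the ioctl numbers and `FS_COMPR_FL`. `change_file_attrs()` and `update_attr_bits()` are hypothetical helpers. The `FS_NOCOW_FL` fallback of `0x00800000` is the value later kernels' headers use. Whether a given kernel honours each bit on a given file is not checked here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef FS_NOCOW_FL
#define FS_NOCOW_FL 0x00800000	/* assumed fallback for older headers */
#endif

/* Pure helper: clear the `clear` bits, then set the `set` bits. */
static int update_attr_bits(int flags, int set, int clear)
{
	return (flags & ~clear) | set;
}

/*
 * Read-modify-write the inode attribute flags of `path`.
 * Returns 0 on success or a negative errno value.
 */
static int change_file_attrs(const char *path, int set, int clear)
{
	int flags;	/* ioctl_iflags(2): the argument is an int * */
	int ret = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		ret = -errno;
		goto out;
	}
	flags = update_attr_bits(flags, set, clear);
	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
		ret = -errno;
out:
	close(fd);
	return ret;
}
```

For instance, `change_file_attrs("/mnt/btrfs/dir", FS_NOCOW_FL, FS_COMPR_FL)` would ask for no-COW and no compression on that directory; the path is only an example.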
Re: [RFC PATCH] Btrfs: add ioctl to set compress or cow per file/dir
On 02/24/2011 10:54 PM, Chris Mason wrote: Excerpts from liubo's message of 2011-02-24 04:40:55 -0500: Data compression and data cow are controlled across the entire FS by mount options right now. ioctls are needed to set this on a per file or per directory basis. This has been proposed previously, but VFS developers wanted us to use generic ioctls rather than btrfs-specific ones. We need to fit these into the existing per-inode flags, and to use the generic FS_IOC_SETFLAGS ioctl. For data compression, there are existing compression flags in the vfs inode flags, while for datacow there is no flag yet, so we need to add one. So, what we will do is add a datacow flag to the vfs inode flags, and then set or unset the btrfs compress/cow flag in the corresponding btrfs inode's flags per file or per directory. Moreover, we also add a compression type ioctl to make this feature more flexible. I would really appreciate advice and comments on the following: - In this patch, I made a special ioctl to set the compress type, and to record the compress_type per inode on disk I've consumed some reserved space of btrfs_inode_item. Is this acceptable? I don't expect people to mix compression types on the disk. There really should just be one true compression method (probably LZO once it has been established for a while). So, I'd prefer that we store this in the super, and just have flags in the inode for enabling or disabling compression. It sounds nice and will make the code neater. :) So, all files and directories will share the same compress type stored in the super. Meanwhile, I got another idea from my colleague: could we just leave the whole compress type thing to new, dedicated mount options, i.e. mount xxx xxx -o compress=a,inode_compress=b? Seems that this makes mount more flexible. It does make it more flexible, but I think sometimes extra flexibility leads to more QA time and isn't often used by the actual users ;) ok.
- When we set an inode's compression type, should it be a force mode? This is much like the difference between mounting with compress and mounting with compress-force. I'd store this as flags in the super too. ok. - On a directory basis, after the compress/cow ioctl is applied to it, any files that are created or renamed in it, or moved into it, will inherit the directory's compress and datacow attributes. This raises some questions: is it right that renamed and moved files also inherit the parent directory's compress and datacow attributes? And if what we are dealing with is a directory, should this behaviour be recursive or not? I'm inclined to leave these recursive things to btrfs-progs if this is necessary. I'd say that if we rename a file into a directory it does inherit, but not make it recursive. ok, got it. I will send a new version based on this thread. Thanks a lot for reviewing! thanks, liubo -chris
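Chris's direction (one filesystem-wide compression type, with per-inode flags only switching compression on or off) and the NOTE in the PATCH 2/2 commit message above reduce to two small decisions. The sketch below models them. The names and `DEMO_` bit values are hypothetical. `should_compress()` mirrors the condition the patch adds to inode.c, where an inode marked NOCOMPRESS is never compressed. `pick_type()` encodes "the mount option's type if one was given, otherwise zlib".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the btrfs inode flag bits involved. */
#define DEMO_INODE_NOCOMPRESS (1ULL << 3)
#define DEMO_INODE_COMPRESS   (1ULL << 11)

enum demo_compress_type {
	DEMO_COMPRESS_NONE,
	DEMO_COMPRESS_ZLIB,
	DEMO_COMPRESS_LZO
};

/* NOCOMPRESS always wins; otherwise any one of the three sources enables it. */
static bool should_compress(uint64_t inode_flags, bool mount_compress,
			    bool force_compress)
{
	if (inode_flags & DEMO_INODE_NOCOMPRESS)
		return false;
	return mount_compress || force_compress ||
	       (inode_flags & DEMO_INODE_COMPRESS);
}

/* One filesystem-wide type: the mount option's type if given, else zlib. */
static enum demo_compress_type pick_type(enum demo_compress_type mount_type)
{
	return mount_type != DEMO_COMPRESS_NONE ? mount_type : DEMO_COMPRESS_ZLIB;
}
```

Keeping the type out of the inode means a per-file flag only answers "compress or not", which is the simplification Chris argues reduces QA surface.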
Re: [RFC PATCH] Btrfs: add ioctl to set compress or cow per file/dir
On 02/25/2011 02:39 AM, Chris Mason wrote: Excerpts from Andreas Dilger's message of 2011-02-24 13:37:52 -0500: On 2011-02-24, at 2:40 AM, liubo wrote: #define FS_DIRECTIO_FL 0x00100000 /* Use direct i/o */ +#define FS_NOCOW_FL 0x00200000 /* Do not cow file */ +#define FS_COW_FL 0x00100000 /* Cow file */ #define FS_RESERVED_FL 0x80000000 /* reserved for ext2 lib */ I'm assuming that FS_COW_FL should not be the same as FS_DIRECTIO_FL? No, we can do DIRECTIO with COW. Sorry, my mistake; thanks for pointing it out. thanks, liubo -chris
Re: [PATCH] btrfs: make inode ref log recovery faster
On 02/22/2011 10:32 PM, David Sterba wrote: Hi, no deeper analysis done, but the double free error was obvious :) On Tue, Feb 22, 2011 at 07:42:25PM +0800, liubo wrote: When we recover from a crash via the write-ahead log tree and process the inode refs, for each btrfs_inode_ref item, we will 1) check if we already have a perfect match in the fs/file tree; if we have, then we're done. 2) search the corresponding back reference in the fs/file tree, and check all the names in this back reference to see if they are also in the log, to avoid conflicting corner cases. 3) recover the logged inode refs to the fs/file tree. In current btrfs, however, - for 2)'s check, once is enough, since the checked back references remain unchanged while all the inode refs belonging to the key are processed. - there is no need to do another 1) between 2) and 3). This patch focuses on the above problems, and I've made a small test to show how it improves: $ dd if=/dev/zero of=foobar bs=4K count=1 $ sync (make 100 hard links continuously, like ln foobar link_i) $ fsync foobar $ echo b > /proc/sysrq-trigger after reboot $ time mount DEV PATH without patch: real 0m0.285s user 0m0.001s sys 0m0.009s with patch: real 0m0.123s user 0m0.000s sys 0m0.010s Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/tree-log.c | 33 +++-- 1 files changed, 11 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index a4bbb85..8f2a9f3 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -799,12 +799,12 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, struct inode *dir; int ret; struct btrfs_inode_ref *ref; -struct btrfs_dir_item *di; struct inode *inode; char *name; int namelen; unsigned long ref_ptr; unsigned long ref_end; +int search_done = 0; /* * it is possible that we didn't log all the parent directories @@ -845,7 +845,10 @@ again: * existing back reference, and we don't want to create * dangling pointers in the directory.
*/ -conflict_again: + +if (search_done) +goto insert; + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); if (ret == 0) { char *victim_name; @@ -888,35 +891,21 @@ conflict_again: victim_name_len); kfree(victim_name); ^^^ btrfs_release_path(root, path); -goto conflict_again; } kfree(victim_name); ^^^ double free thanks for reviewing, but the first one is followed by a goto statement, so IMO it is ok. ptr = (unsigned long)(victim_ref + 1) + victim_name_len; } BUG_ON(ret); -} -btrfs_release_path(root, path); -/* look for a conflicting sequence number */ -di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino, - btrfs_inode_ref_index(eb, ref), - name, namelen, 0); -if (di && !IS_ERR(di)) { -ret = drop_one_dir_item(trans, root, path, dir, di); -BUG_ON(ret); -} -btrfs_release_path(root, path); - - -/* look for a conflicting name */ -di = btrfs_lookup_dir_item(trans, root, path, dir->i_ino, - name, namelen, 0); -if (di && !IS_ERR(di)) { -ret = drop_one_dir_item(trans, root, path, dir, di); -BUG_ON(ret); +/* + * NOTE: we have searched root tree and checked the + * corresponding ref, it does not need to check again. + */ +search_done = 1; } btrfs_release_path(root, path); +insert: /* insert our name */ ret = btrfs_add_link(trans, dir, inode, name, namelen, 0, btrfs_inode_ref_index(eb, ref)); -- d/
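The speed-up rests on one observation: the existing back references for a key do not change while that key's logged names are replayed, so the conflict search only needs to run once. The counter-based model below shows the effect of the `search_done` flag. `scan_back_refs()` and `replay_names()` are hypothetical, and the model mirrors only the control flow, not the btrfs data structures. As in the patch, the flag is set only after a back reference was actually found.

```c
#include <assert.h>

/* Number of times the (expensive) back-reference search ran. */
static int scans;

/* Hypothetical stand-in for btrfs_search_slot() plus the victim checks. */
static int scan_back_refs(int backref_exists)
{
	scans++;
	return backref_exists;
}

/* Replay `nr_names` logged names for one key. */
static int replay_names(int nr_names, int backref_exists)
{
	int search_done = 0;
	int inserted = 0;

	for (int i = 0; i < nr_names; i++) {
		/* Once a back reference has been found and checked, skip the search. */
		if (!search_done && scan_back_refs(backref_exists))
			search_done = 1;
		inserted++;	/* stands in for btrfs_add_link() */
	}
	return inserted;
}
```

With 100 hard links and an existing back reference, the search runs once instead of 100 times, which is consistent in direction with the mount-time numbers in the commit message.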
Re: [PATCH] btrfs: make inode ref log recovery faster
On 02/23/2011 09:30 AM, Josef Bacik wrote: On Wed, Feb 23, 2011 at 09:12:36AM +0800, liubo wrote: On 02/22/2011 10:32 PM, David Sterba wrote: Hi, no deeper analysis done, but the double free error was obvious :) On Tue, Feb 22, 2011 at 07:42:25PM +0800, liubo wrote: When we recover from a crash via the write-ahead log tree and process the inode refs, for each btrfs_inode_ref item, we will 1) check if we already have a perfect match in the fs/file tree; if we have, then we're done. 2) search the corresponding back reference in the fs/file tree, and check all the names in this back reference to see if they are also in the log, to avoid conflicting corner cases. 3) recover the logged inode refs to the fs/file tree. In current btrfs, however, - for 2)'s check, once is enough, since the checked back references remain unchanged while all the inode refs belonging to the key are processed. - there is no need to do another 1) between 2) and 3). This patch focuses on the above problems, and I've made a small test to show how it improves: $ dd if=/dev/zero of=foobar bs=4K count=1 $ sync (make 100 hard links continuously, like ln foobar link_i) $ fsync foobar $ echo b > /proc/sysrq-trigger after reboot $ time mount DEV PATH without patch: real 0m0.285s user 0m0.001s sys 0m0.009s with patch: real 0m0.123s user 0m0.000s sys 0m0.010s Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/tree-log.c | 33 +++-- 1 files changed, 11 insertions(+), 22 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index a4bbb85..8f2a9f3 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -799,12 +799,12 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans, struct inode *dir; int ret; struct btrfs_inode_ref *ref; - struct btrfs_dir_item *di; struct inode *inode; char *name; int namelen; unsigned long ref_ptr; unsigned long ref_end; + int search_done = 0; /* * it is possible that we didn't log all the parent directories @@ -845,7 +845,10 @@ again: * existing back reference, and 
we don't want to create * dangling pointers in the directory. */ -conflict_again: + + if (search_done) + goto insert; + ret = btrfs_search_slot(NULL, root, key, path, 0, 0); if (ret == 0) { char *victim_name; @@ -888,35 +891,21 @@ conflict_again: victim_name_len); kfree(victim_name); ^^^ btrfs_release_path(root, path); - goto conflict_again; } kfree(victim_name); ^^^ double free thanks for reviewing, but the first one is followed by a goto statement, so IMO it is ok. Your patch removes that goto, so it's not ok. Thanks, ahh, my fault. I'll fix it, thanks a lot :) liubo Josef
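Josef's point is about control flow: with `goto conflict_again` gone, the path that already called `kfree(victim_name)` inside the branch falls through to the second `kfree(victim_name)` after it. One safe shape is a single free at the end of each iteration, sketched below in userspace. `check_victims()` and `counted_free()` are hypothetical, and a plain `strcmp()` against one logged name stands in for the real back-reference-in-log lookup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts frees of non-NULL pointers so a double free in the logic shows up. */
static int frees;

static void counted_free(void *p)
{
	if (p)
		frees++;
	free(p);
}

/* Returns how many victim names were not in the log (would be unlinked). */
static int check_victims(const char **names, int n, const char *logged)
{
	int conflicts = 0;

	for (int i = 0; i < n; i++) {
		char *victim_name = malloc(strlen(names[i]) + 1);

		if (!victim_name)
			return -1;
		strcpy(victim_name, names[i]);

		if (strcmp(victim_name, logged) != 0) {
			/* Not in the log: the real code unlinks the name here. */
			conflicts++;
			/* No free here: without the goto, the free below would run too. */
		}
		counted_free(victim_name);	/* exactly one free per iteration */
	}
	return conflicts;
}
```

Each iteration allocates once and frees once, so the free count always equals the number of names examined.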
Re: Building btrfs as a dkms module on Debian
On 02/15/2011 11:35 PM, Yuri D'Elia wrote: Hi everyone. I was trying to test a more recent version of btrfs on my current kernel (2.6.37) using dkms, without success. I followed these instructions: https://btrfs.wiki.kernel.org/index.php/Btrfs_source_repositories - cloned the repo - symlinked to /usr/src/btrfs-git - patched version.sh: Please note version.sh requires bash (better to change the shebang or fix the script). Even with the patch, version.sh run on a shallow repository generates a -dirty version. I assume this is OK, even though there are no local changes. - run version.sh - dkms add -m btrfs -v git - dkms build -m btrfs -v git fails with: /var/lib/dkms/btrfs/git/build/extent-tree.c: In function ‘btrfs_issue_discard’: /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: ‘BLKDEV_IFL_WAIT’ undeclared (first use in this function) /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: (Each undeclared identifier is reported only once /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: for each function it appears in.) /var/lib/dkms/btrfs/git/build/extent-tree.c:1747: error: ‘BLKDEV_IFL_BARRIER’ undeclared (first use in this function) I assume BLKDEV_IFL_WAIT/BARRIER was added in later kernels? Is there a way to build btrfs for 2.6.37? in commit fbd9b09a177a481eda256447c881f014f29034fe: include/linux/blkdev.h: #define BLKDEV_IFL_WAIT (1 << BLKDEV_WAIT) #define BLKDEV_IFL_BARRIER (1 << BLKDEV_BARRIER) #define BLKDEV_IFL_SECURE (1 << BLKDEV_SECURE) Maybe this is helpful. :) thanks, liubo Thanks
[PATCH 3/3] btrfs: fix missing break in switch phrase
There is a missing break in switch, fix it. Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com --- fs/btrfs/print-tree.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c index 0d126be..fb2605d 100644 --- a/fs/btrfs/print-tree.c +++ b/fs/btrfs/print-tree.c @@ -260,6 +260,7 @@ void btrfs_print_leaf(struct btrfs_root *root, struct extent_buffer *l) #else BUG(); #endif + break; case BTRFS_BLOCK_GROUP_ITEM_KEY: bi = btrfs_item_ptr(l, i, struct btrfs_block_group_item); -- 1.6.5.2
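In the hunk, the case ends in an `#if`/`#else` block whose `#else` branch calls `BUG()`; in the other branch, a missing `break` lets control fall straight into the `BTRFS_BLOCK_GROUP_ITEM_KEY` case. The userspace sketch below shows the corrected shape. The enum and counters are hypothetical, and `DEMO_PREV_KEY` stands for whichever key type the fixed case handles.

```c
#include <assert.h>

/* Hypothetical stand-ins for the key types around the fixed case. */
enum demo_key_type {
	DEMO_PREV_KEY,			/* the case the patch adds a break to */
	DEMO_BLOCK_GROUP_ITEM_KEY,
	DEMO_OTHER_KEY,
	DEMO_NR_KEYS
};

/* How many times each branch ran. */
static int handled[DEMO_NR_KEYS];

static void print_item(enum demo_key_type type)
{
	switch (type) {
	case DEMO_PREV_KEY:
		handled[DEMO_PREV_KEY]++;
		break;	/* without this, control falls into the next case */
	case DEMO_BLOCK_GROUP_ITEM_KEY:
		handled[DEMO_BLOCK_GROUP_ITEM_KEY]++;
		break;
	default:
		handled[DEMO_OTHER_KEY]++;
		break;
	}
}
```

Deleting the first `break` would make `print_item(DEMO_PREV_KEY)` bump both counters, which is the userspace analogue of printing one item as two different item types.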
Re: [PATCH] btrfs: fix return value check of btrfs_start_transaction()
On 01/21/2011 12:09 AM, Josef Bacik wrote: On Thu, Jan 20, 2011 at 03:19:37PM +0900, Tsutomu Itoh wrote: The error check of btrfs_start_transaction() is added, and the mistake of the error check on several places is corrected. I'd rather we go through and have these things return an error than do a BUG_ON(). We're moving towards a more stable BTRFS, not one that panics more often :). Thanks, Great, seems that we all feel it is the time to focus on this. :) Josef
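Josef's preference is for callers to propagate a failed transaction start rather than `BUG_ON()` it. The patch's subject implies that `btrfs_start_transaction()` reports failure through an error-encoded pointer, checked with `IS_ERR()` and decoded with `PTR_ERR()`. The sketch below re-creates that convention in userspace to show the propagation pattern. `err_ptr()`, `is_err()`, `ptr_err()` and the `demo_` transaction functions are hypothetical stand-ins, not the kernel helpers themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR()/IS_ERR()/PTR_ERR() convention. */
#define DEMO_MAX_ERRNO 4095

static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	/* The top 4095 addresses are assumed never to be valid pointers. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_trans {
	int blocks;
};

/* Hypothetical start_transaction(): reports failure instead of crashing. */
static struct demo_trans *demo_start_transaction(int free_blocks, int needed)
{
	struct demo_trans *t;

	if (needed > free_blocks)
		return err_ptr(-ENOSPC);
	t = malloc(sizeof(*t));
	if (!t)
		return err_ptr(-ENOMEM);
	t->blocks = needed;
	return t;
}

/* The caller propagates the error rather than doing BUG_ON(IS_ERR(trans)). */
static int demo_update(int free_blocks, int needed)
{
	struct demo_trans *trans = demo_start_transaction(free_blocks, needed);

	if (is_err(trans))
		return (int)ptr_err(trans);
	/* ... modify the tree, then end the transaction ... */
	free(trans);
	return 0;
}
```

A caller of `demo_update()` can then turn `-ENOSPC` into a failed syscall instead of a panic.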
Re: [PATCH] Btrfs: forced readonly mounts on errors
On 01/18/2011 03:56 AM, Chris Mason wrote: Excerpts from liubo's message of 2011-01-06 06:30:25 -0500: This patch comes from the "Forced readonly mounts on errors" ideas. As we know, this is the first step in being more fault tolerant of disk corruptions instead of just using BUG() statements. The major content: - add a framework for generating errors that should result in filesystems going readonly. - keep FS state in disk super block. - make sure that all resources will be freed and released at umount time. - make sure that after the FS is forced readonly on error, there will be no more disk changes before the FS is corrected. For this, we should stop write operations. After this patch is applied, the conversion from BUG() to such a framework can happen incrementally. I think this is a good overall framework and it will meet our needs nicely as we scale up the error handling in the filesystem. One concern I have is where we save the error state to disk: +static void __save_error_info(struct btrfs_fs_info *fs_info) +{ +struct btrfs_super_block *disk_super = &fs_info->super_copy; + +fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR; +disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR); + +mutex_lock(&fs_info->trans_mutex); +memcpy(&fs_info->super_for_commit, disk_super, + sizeof(fs_info->super_for_commit)); +mutex_unlock(&fs_info->trans_mutex); The super_for_commit isn't changed until we have a fully consistent set of fields in the super block. The super_copy is changed as the transaction progresses. So, this memcpy isn't quite safe. We should simply set the flag on the super_for_commit and the super_copy individually. Got it, thanks for pointing it out. I'll make this change and pull it in. We can build from here. Great!
thanks, Liubo -chris -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
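Chris's suggestion can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the code that was merged: the `sketch_*` structs are hypothetical stand-ins for `btrfs_fs_info` and `btrfs_super_block`, endianness conversion (`cpu_to_le64`) and locking are left out, and `generation` merely represents "the other super block fields". The point it shows is that the error bit is set on each copy separately, so the half-updated `super_copy` never overwrites the consistent `super_for_commit`.

```c
#include <stdint.h>

/* Mirrors the value of BTRFS_SUPER_FLAG_ERROR from the patch. */
#define SKETCH_SUPER_FLAG_ERROR (1ULL << 2)

/* Hypothetical, simplified stand-in for struct btrfs_super_block. */
struct sketch_super_block {
	uint64_t flags;
	uint64_t generation;  /* represents all the other super block fields */
};

/* Hypothetical, simplified stand-in for struct btrfs_fs_info. */
struct sketch_fs_info {
	uint64_t fs_state;
	struct sketch_super_block super_copy;       /* changes as a transaction runs */
	struct sketch_super_block super_for_commit; /* only ever a consistent snapshot */
};

/*
 * Set the error bit on each copy individually instead of memcpy()ing the
 * in-progress super_copy over super_for_commit. (The kernel version would
 * still need to serialize the super_for_commit update against a commit.)
 */
static void sketch_save_error_info(struct sketch_fs_info *fs_info)
{
	fs_info->fs_state |= SKETCH_SUPER_FLAG_ERROR;
	fs_info->super_copy.flags |= SKETCH_SUPER_FLAG_ERROR;
	fs_info->super_for_commit.flags |= SKETCH_SUPER_FLAG_ERROR;
}
```

After the call, `super_for_commit` keeps its last consistent `generation` but carries the error bit.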
[PATCH] Btrfs: forced readonly mounts on errors
This patch comes from "Forced readonly mounts on errors" ideas.

As we know, this is the first step in being more fault tolerant of disk
corruptions instead of just using BUG() statements.

The major content:
- add a framework for generating errors that should result in filesystems
  going readonly.
- keep FS state in disk super block.
- make sure that all resources will be freed and released at umount time.
- make sure that after FS is forced readonly on error, there will be no more
  disk changes before FS is corrected. For this, we should stop write
  operations.

After this patch is applied, the conversion from BUG() to such a framework can
happen incrementally.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h       |   24 +++
 fs/btrfs/disk-io.c     |  389 +++-
 fs/btrfs/disk-io.h     |    1 +
 fs/btrfs/extent-tree.c |   11 ++
 fs/btrfs/file.c        |   11 ++
 fs/btrfs/super.c       |   88 +++
 fs/btrfs/transaction.c |    3 +
 7 files changed, 525 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index af52f6d..63c35f8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING	(1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
 	unsigned metadata_ratio;
 
 	void *bdev_holder;
+
+	/* filesystem state */
+	u64 fs_state;
 };
 
 /*
@@ -2188,6 +2199,11 @@ int btrfs_set_block_group_ro(struct btrfs_root *root,
 int btrfs_set_block_group_rw(struct btrfs_root *root,
 			     struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
+int btrfs_error_unpin_extent_range(struct btrfs_root *root,
+				   u64 start, u64 end);
+int btrfs_error_discard_extent(struct btrfs_root *root, u64 bytenr,
+			       u64 num_bytes);
+
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 		     int level, int *slot);
@@ -2541,6 +2557,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		     unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)				\
+do {								\
+	if ((errno))						\
+		__btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a5d2249..4f70256 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -44,6 +44,20 @@ static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only);
+static int btrfs_destroy_ordered_operations(struct btrfs_root *root);
+static int btrfs_destroy_ordered_extents(struct btrfs_root *root);
+static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+				      struct btrfs_root *root);
+static int btrfs_destroy_pending_snapshots(struct btrfs_transaction *t);
+static int btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
+static int btrfs_destroy_marked_extents(struct btrfs_root *root,
+					struct extent_io_tree *dirty_pages,
+					int mark);
+static int btrfs_destroy_pinned_extent(struct btrfs_root *root,
+				       struct extent_io_tree *pinned_extents);
+static int btrfs_cleanup_transaction(struct btrfs_root *root);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1727,6 +1741,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_iput;
 
+	/* check FS state, whether FS is broken. */
+	fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
@@ -1957,7 +1976,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
Re: [RFC PATCH 0/5 v3] Btrfs: Add readonly support to replace BUG_ON phrase
Hi, Chris,

Is there any comment on this "Forced readonly mounts on errors" patchset?

thanks,
Liu Bo

On 12/03/2010 04:15 PM, liubo wrote:
> Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant
> panic. Meanwhile, they are very ugly and should be handled more properly.
>
> There are mainly two ways to deal with these BUG_ON()s.
>
> 1. For those errors which can be handled well by callers, we just return
>    their error number to callers.
>
> 2. For others, we can force the filesystem readonly when it hits errors,
>    which is what this patchset does. With BUG_ON() replaced by the
>    interface provided in this patchset, we will get error information via
>    dmesg. Since btrfs is now readonly, we can save our data safely and
>    umount it; then running btrfsck is recommended.
>
> In these ways, we can protect our filesystem from panics caused by those
> BUG_ONs.
>
> We still need an incompat flag to make old kernels happy.
>
> This patchset needs more testing.
>
> v2->v3:
> - since btrfs may do log replay after a crash, even when it is mounted
>   readonly, and we have added a readonly check at start transaction time,
>   it needs to set and restore the readonly flag around log replay.
>
> v1->v2:
> - in order to avoid a deadlock, move the write-super work from the error
>   handling path to unmount time.
> - remove BTRFS_SUPER_FLAG_VALID, just use BTRFS_SUPER_FLAG_ERROR to make
>   it simple.
> - add MS_RDONLY check at start of a transaction instead of commit
>   transaction.
>
> ---
>  fs/btrfs/ctree.h       |   19 ++
>  fs/btrfs/disk-io.c     |   56 +-
>  fs/btrfs/super.c       |   88
>  fs/btrfs/transaction.c |    3 ++
>  4 files changed, 164 insertions(+), 2 deletions(-)
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On 12/15/2010 04:45 PM, Yan, Zheng wrote:
> On Fri, Dec 3, 2010 at 4:16 PM, liubo <liubo2...@cn.fujitsu.com> wrote:
>> When the filesystem is readonly, avoid transaction stuff by checking
>> MS_RDONLY at start transaction time.
>>
>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>> ---
>>  fs/btrfs/transaction.c |    3 +++
>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 1fffbc0..14a597d 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
>>  	struct btrfs_trans_handle *h;
>>  	struct btrfs_transaction *cur_trans;
>>  	int ret;
>> +
>> +	if (root->fs_info->sb->s_flags & MS_RDONLY)
>> +		return ERR_PTR(-EROFS);
>>  again:
>>  	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
>>  	if (!h)
>
> There are cases that we need to start transaction when MS_RDONLY flag is
> set. For example, remount FS into read-only mode and log replay.

However, isn't it weird to make changes to the disk while the fs is in a
readonly state?

IMO, btrfs needs to limit these disk changes while readonly, as that is not
what readonly means.

Since this is already the case, we can bypass readonly in those cases (as I
did in the 5th patch):

...
	flags = sb->s_flags;
	if (sb->s_flags & MS_RDONLY)
		sb->s_flags &= ~MS_RDONLY;
	remount();
	sb->s_flags = flags;
...

thanks,
Liu Bo
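The patch under discussion makes start_transaction() return `ERR_PTR(-EROFS)` instead of a handle, so callers have to test the returned pointer rather than a separate status code. Below is a user-space sketch of that idiom, assuming re-created helpers: `sketch_err_ptr()`, `sketch_is_err()` and `sketch_ptr_err()` imitate the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` (small negative errno values live in the top 4095 addresses, which no valid allocation occupies), and `SKETCH_MS_RDONLY` is a hypothetical stand-in for `MS_RDONLY`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095
#define SKETCH_MS_RDONLY 1UL /* hypothetical stand-in for MS_RDONLY */

/* Encode a negative errno in a pointer, like the kernel's ERR_PTR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode it again, like PTR_ERR(). */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer lies in the reserved errno range, like IS_ERR(). */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_trans_handle {
	int dummy;
};

/* Refuse to start a transaction on a readonly filesystem. */
static struct sketch_trans_handle *sketch_start_transaction(unsigned long s_flags)
{
	struct sketch_trans_handle *h;

	if (s_flags & SKETCH_MS_RDONLY)
		return sketch_err_ptr(-EROFS);

	h = calloc(1, sizeof(*h));
	if (!h)
		return sketch_err_ptr(-ENOMEM);
	return h;
}
```

A caller therefore writes `if (sketch_is_err(t)) return sketch_ptr_err(t);`, which is the same shape as the `IS_ERR(trans)` checks added to btrfs_commit_super() later in this series.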
Re: [RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
On 12/16/2010 12:03 AM, Chris Mason wrote:
> Excerpts from liubo's message of 2010-12-15 04:12:14 -0500:
>> On 12/15/2010 04:45 PM, Yan, Zheng wrote:
>>> On Fri, Dec 3, 2010 at 4:16 PM, liubo <liubo2...@cn.fujitsu.com> wrote:
>>>> When the filesystem is readonly, avoid transaction stuff by checking
>>>> MS_RDONLY at start transaction time.
>>>>
>>>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>>>> ---
>>>>  fs/btrfs/transaction.c |    3 +++
>>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>>>> index 1fffbc0..14a597d 100644
>>>> --- a/fs/btrfs/transaction.c
>>>> +++ b/fs/btrfs/transaction.c
>>>> @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
>>>>  	struct btrfs_trans_handle *h;
>>>>  	struct btrfs_transaction *cur_trans;
>>>>  	int ret;
>>>> +
>>>> +	if (root->fs_info->sb->s_flags & MS_RDONLY)
>>>> +		return ERR_PTR(-EROFS);
>>>>  again:
>>>>  	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
>>>>  	if (!h)
>>>
>>> There are cases that we need to start transaction when MS_RDONLY flag is
>>> set. For example, remount FS into read-only mode and log replay.
>>
>> However, isn't it weird to make changes to the disk while the fs is in a
>> readonly state?
>>
>> IMO, btrfs needs to limit these disk changes while readonly, as that is
>> not what readonly means.
>
> reiserfs and ext3 at least have always done this. Log replay is required
> even when the FS is readonly.

My concern is: now we have a forced readonly FS, which is already broken. If
we still write something to disk, would it become more broken?

>> Since this is already the case, we can bypass readonly in those cases (as
>> I did in the 5th patch):
>>
>> ...
>> 	flags = sb->s_flags;
>> 	if (sb->s_flags & MS_RDONLY)
>> 		sb->s_flags &= ~MS_RDONLY;
>
> I think we should have a dedicated flag to reflect a filesystem that is
> forced readonly, and check that flag instead.

OK, we do have fs_state, a dedicated flag.

thanks,
Liu Bo

> -chris
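One way to read Chris's suggestion is to gate transaction start on the dedicated error state rather than on `MS_RDONLY`, so legitimate internal writers such as log replay on a clean filesystem mounted readonly are not blocked and nobody has to clear and restore `MS_RDONLY` around them. The sketch below is only an illustration of that idea. `sketch_fs_info` and `sketch_may_start_transaction()` are hypothetical names, and it assumes that ordinary user writes to a readonly mount are already rejected before they reach the filesystem (the VFS does this for readonly mounts), so this check only has to catch the forced-readonly case.

```c
#include <errno.h>
#include <stdint.h>

/* Mirrors the value of BTRFS_SUPER_FLAG_ERROR from the patch. */
#define SKETCH_SUPER_FLAG_ERROR (1ULL << 2)

/* Hypothetical stand-in: only the dedicated state word is modeled. */
struct sketch_fs_info {
	uint64_t fs_state; /* set only when an error forced the fs readonly */
};

/*
 * Refuse new transactions only when an error has forced the filesystem
 * readonly; a plain "-o ro" mount leaves fs_state clear, so internal
 * writers such as log replay can still proceed.
 */
static int sketch_may_start_transaction(const struct sketch_fs_info *fs_info)
{
	if (fs_info->fs_state & SKETCH_SUPER_FLAG_ERROR)
		return -EROFS;
	return 0;
}
```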
[PATCH] Btrfs: fix compile warning in fs/btrfs/inode.c
While compiling btrfs, I got the following:

  CC [M]  fs/btrfs/inode.o
fs/btrfs/inode.c: In function ‘btrfs_end_dio_bio’:
fs/btrfs/inode.c:5720: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘sector_t’
  LD [M]  fs/btrfs/btrfs.o
  Building modules, stage 2.
  MODPOST 1 modules
  LD [M]  fs/btrfs/btrfs.ko

This fixes the compile warning.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/inode.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0f34cae..eff5aef 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5713,8 +5713,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
 	if (err) {
 		printk(KERN_ERR "btrfs direct IO failed ino %lu rw %lu disk_bytenr %lu len %u err no %d\n",
-		      dip->inode->i_ino, bio->bi_rw, bio->bi_sector,
-		      bio->bi_size, err);
+		      dip->inode->i_ino, bio->bi_rw,
+		      (unsigned long)bio->bi_sector, bio->bi_size, err);
 		dip->errors = 1;
 
 		/*
-- 
1.7.1
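Casting to `unsigned long` silences the warning, but on a 32-bit build where `sector_t` is 64 bits wide it also truncates large sector numbers. The usual portable pattern is to cast to `unsigned long long` and print with `%llu`. The user-space sketch below shows that pattern; `sketch_sector_t` and `sketch_format_sector()` are hypothetical names, and `uint64_t` is assumed as the widest case of `sector_t`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in: sector_t can be 64 bits even on 32-bit builds. */
typedef uint64_t sketch_sector_t;

/*
 * Format a sector number without depending on the width of the type:
 * unsigned long long is at least 64 bits everywhere, while a cast to
 * unsigned long would cut values above 2^32 - 1 on 32-bit builds.
 */
static int sketch_format_sector(char *buf, size_t len, sketch_sector_t sector)
{
	return snprintf(buf, len, "disk_bytenr %llu", (unsigned long long)sector);
}
```

For example, formatting sector 5000000000 produces `disk_bytenr 5000000000` regardless of the host word size.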
Re: [PATCH] Btrfs: fix compile warning in fs/btrfs/inode.c
On 12/08/2010 06:01 PM, liubo wrote:
> While compiling btrfs, I got the following:
>
>   CC [M]  fs/btrfs/inode.o
> fs/btrfs/inode.c: In function ‘btrfs_end_dio_bio’:
> fs/btrfs/inode.c:5720: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 4 has type ‘sector_t’
>   LD [M]  fs/btrfs/btrfs.o
>   Building modules, stage 2.
>   MODPOST 1 modules
>   LD [M]  fs/btrfs/btrfs.ko
>
> This fixes the compile warning.

Sorry, please ignore this. I have seen someone post a patch to fix this.

thanks,
Liu Bo

> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
> ---
>  fs/btrfs/inode.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0f34cae..eff5aef 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5713,8 +5713,8 @@ static void btrfs_end_dio_bio(struct bio *bio, int err)
>  	if (err) {
>  		printk(KERN_ERR "btrfs direct IO failed ino %lu rw %lu disk_bytenr %lu len %u err no %d\n",
> -		      dip->inode->i_ino, bio->bi_rw, bio->bi_sector,
> -		      bio->bi_size, err);
> +		      dip->inode->i_ino, bio->bi_rw,
> +		      (unsigned long)bio->bi_sector, bio->bi_size, err);
>  		dip->errors = 1;
>  
>  		/*
Re: [PATCH] Btrfs: create a unique inode number for all subvol entries
On 12/07/2010 04:48 AM, Josef Bacik wrote:
> Currently BTRFS has a problem where any subvols will have the same inode
> numbers as other files in the parent subvol. This can cause problems with
> userspace apps that depend on inode numbers being unique across a volume.
> So in order to solve this problem we need to do the following
>
> 1) Create an empty key with the fake inode number for the subvol. This is a
> placeholder: since we determine which inode number to start with by
> searching for the largest objectid in the subvolume, we need to make sure
> our fake inode number isn't reused by somebody else.
>
> 2) Save our fake inode number in our dir item. We can already store data in
> dir items, so just store the inode number. This is future proof since I
> explicitly check for data_len == sizeof(u64), that way if we change what
> data gets put in the dir item in the future, older kernels will be able to
> deal with it properly. Also if an older kernel mounts with this change it
> will be ok.
>
> Since subvols have their own st_dev it is ok for them to continue to have an
> inode number of 256, but the inode returned by readdir needs to be unique to
> the subvolume, so our fake inode number will be used for d_ino with readdir.
> I tested this with a program that Bruce Fields wrote to spit out the actual
> inode numbers and the inode number returned by readdir
>
> [r...@test1244 ~]# touch /mnt/btrfs-test/foo
> [r...@test1244 ~]# touch /mnt/btrfs-test/bar
> [r...@test1244 ~]# touch /mnt/btrfs-test/baz
> [r...@test1244 ~]# ./btrfs-progs-unstable/btrfs subvol create /mnt/btrfs-test/subvol
> Create subvolume '/mnt/btrfs-test/subvol'
> [r...@test1244 ~]# ./readdir-test /mnt/btrfs-test/
> . 256 256
> .. 256 139265
> foo 257 257
> bar 258 258
> baz 259 259
> subvol 260 256
>
> Thanks,

Hi, Josef,

The patch looks nice.

Since the dir insertion code is mostly the same, what about changing the
btrfs_insert_dir_item ABI so that the new helper becomes:

int btrfs_insert_subvol_dir_item(...)
{
	return btrfs_insert_dir_item(...);
}

Would that make the code simpler?

Thanks,
Liu Bo

> Signed-off-by: Josef Bacik <jo...@redhat.com>
> ---
>  fs/btrfs/ctree.h       |    6 +++
>  fs/btrfs/dir-item.c    |  113
>  fs/btrfs/inode.c       |   58 -
>  fs/btrfs/ioctl.c       |   13 -
>  fs/btrfs/transaction.c |   22 +++--
>  5 files changed, 202 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 54e4252..ea0662e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1161,6 +1161,8 @@ struct btrfs_root {
>  #define BTRFS_DIR_LOG_INDEX_KEY	72
>  #define BTRFS_DIR_ITEM_KEY	84
>  #define BTRFS_DIR_INDEX_KEY	96
> +#define BTRFS_DIR_SUBVOL_KEY	97
> +
>  /*
>   * extent data is for file data
>   */
> @@ -2320,6 +2322,10 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
>  			  struct btrfs_root *root, const char *name,
>  			  int name_len, u64 dir, struct btrfs_key *location,
>  			  u8 type, u64 index);
> +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
> +				 struct btrfs_root *root, const char *name,
> +				 int name_len, u64 dir, u64 ino,
> +				 struct btrfs_key *location, u64 index);
>  struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
>  					     struct btrfs_root *root,
>  					     struct btrfs_path *path, u64 dir,
> diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
> index f0cad5a..95d498f 100644
> --- a/fs/btrfs/dir-item.c
> +++ b/fs/btrfs/dir-item.c
> @@ -116,6 +116,119 @@ int btrfs_insert_xattr_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +/**
> + * btrfs_insert_subvol_dir_item - setup the dir items for a subvol
> + *
> + * @trans: transaction handle
> + * @root: the root of the parent subvol
> + * @name: name of the subvol
> + * @name_len: the length of the name
> + * @dir: the objectid of the parent directory
> + * @ino: the unique inode number for the parent directory
> + * @key: the key that the items will point to
> + * @index: the dir index for readdir purposes
> + *
> + * Creates the dir item/dir index pair for the directory containing the subvol.
> + * This also creates a blank key to hold the made up inode number for the subvol
> + * in order to give us a unique to the parent subvol inode number.
> + */
> +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
> +				 struct btrfs_root *root, const char *name,
> +				 int name_len, u64 dir, u64 ino,
> +				 struct btrfs_key *location, u64 index)
> +{
> +	int ret = 0;
> +	int ret2 = 0;
> +	struct btrfs_path *path;
> +	struct btrfs_dir_item *dir_item;
> +	struct extent_buffer *leaf;
> +	unsigned long name_ptr;
> +	unsigned long
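Josef's compatibility argument rests on one rule: only interpret the dir item payload as the fake inode number when it is exactly one u64 long, and otherwise fall back to the normal location. The sketch below models that rule in user space. Everything named `sketch_*` is hypothetical: the real dir item stores its data after the name inside a leaf, on-disk values are little-endian (host order is assumed here), and the numbers 260 and 256 are taken from the `subvol` line of the readdir-test output above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a dir item and its optional payload. */
struct sketch_dir_item {
	uint16_t data_len;             /* length of the payload in bytes */
	const unsigned char *data;     /* payload (the fake inode number) */
	uint64_t location_objectid;    /* inode the item points to */
};

/*
 * Pick the d_ino that readdir should report. The payload is trusted only
 * when it is exactly sizeof(u64) bytes; any other length, such as a future
 * format, falls back to the location's objectid.
 */
static uint64_t sketch_readdir_ino(const struct sketch_dir_item *di)
{
	uint64_t ino;

	if (di->data_len == sizeof(uint64_t)) {
		memcpy(&ino, di->data, sizeof(ino)); /* host byte order assumed */
		return ino;
	}
	return di->location_objectid;
}
```

With a payload holding 260 and a location objectid of 256, readdir reports 260 while stat() on the subvolume root still sees 256.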
[RFC PATCH 0/5 v3] Btrfs: Add readonly support to replace BUG_ON phrase
Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic.
Meanwhile, they are very ugly and should be handled more properly.

There are mainly two ways to deal with these BUG_ON()s.

1. For those errors which can be handled well by callers, we just return their
   error number to callers.

2. For others, we can force the filesystem readonly when it hits errors, which
   is what this patchset does. With BUG_ON() replaced by the interface provided
   in this patchset, we will get error information via dmesg. Since btrfs is
   now readonly, we can save our data safely and umount it; then running
   btrfsck is recommended.

In these ways, we can protect our filesystem from panics caused by those
BUG_ONs.

We still need an incompat flag to make old kernels happy.

This patchset needs more testing.

v2->v3:
- since btrfs may do log replay after a crash, even when it is mounted
  readonly, and we have added a readonly check at start transaction time, it
  needs to set and restore the readonly flag around log replay.

v1->v2:
- in order to avoid a deadlock, move the write-super work from the error
  handling path to unmount time.
- remove BTRFS_SUPER_FLAG_VALID, just use BTRFS_SUPER_FLAG_ERROR to make it
  simple.
- add MS_RDONLY check at start of a transaction instead of commit transaction.

---
 fs/btrfs/ctree.h       |   19 ++
 fs/btrfs/disk-io.c     |   56 +-
 fs/btrfs/super.c       |   88
 fs/btrfs/transaction.c |    3 ++
 4 files changed, 164 insertions(+), 2 deletions(-)
[RFC PATCH 3/5 v3] Btrfs: add readonly support for error handle
This patch provides a new error handling interface for those errors that are
currently handled by BUG_ONs.

In order to protect btrfs from panics, when it comes to those BUG_ON errors,
the interface forces btrfs readonly and saves the FS state to disk. The
filesystem can then be umounted, although maybe with some warnings in the
kernel dmesg. Then btrfsck is helpful to recover btrfs.

v1->v2:
  move the write-super work from the error handling path to unmount in order
  to avoid a deadlock.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |    8 +
 fs/btrfs/super.c |   88 ++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 92b5ca2..fc9b6a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2552,6 +2552,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		     unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)				\
+do {								\
+	if ((errno))						\
+		__btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 718b10d..07c58f9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -54,6 +54,94 @@ static const struct super_operations btrfs_super_ops;
 
+static const char *btrfs_decode_error(struct btrfs_fs_info *fs_info, int errno,
+				      char nbuf[16])
+{
+	char *errstr = NULL;
+
+	switch (errno) {
+	case -EIO:
+		errstr = "IO failure";
+		break;
+	case -ENOMEM:
+		errstr = "Out of memory";
+		break;
+	case -EROFS:
+		errstr = "Readonly filesystem";
+		break;
+	default:
+		if (nbuf) {
+			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
+				errstr = nbuf;
+		}
+		break;
+	}
+
+	return errstr;
+}
+
+static void __save_error_info(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = &fs_info->super_copy;
+
+	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
+	disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR);
+
+	mutex_lock(&fs_info->trans_mutex);
+	memcpy(&fs_info->super_for_commit, disk_super,
+	       sizeof(fs_info->super_for_commit));
+	mutex_unlock(&fs_info->trans_mutex);
+}
+
+/* NOTE:
+ *   We move write_super stuff at umount in order to avoid deadlock
+ *   for umount hold all lock.
+ */
+static void save_error_info(struct btrfs_fs_info *fs_info)
+{
+	__save_error_info(fs_info);
+}
+
+/* btrfs handle error by forcing the filesystem readonly */
+static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
+{
+	struct super_block *sb = fs_info->sb;
+
+	if (sb->s_flags & MS_RDONLY)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		sb->s_flags |= MS_RDONLY;
+		printk(KERN_INFO "btrfs is forced readonly\n");
+	}
+}
+
+/*
+ * __btrfs_std_error decodes expected errors from the caller and
+ * invokes the approciate error response.
+ */
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		       unsigned int line, int errno)
+{
+	struct super_block *sb = fs_info->sb;
+	char nbuf[16];
+	const char *errstr;
+
+	/*
+	 * Special case: if the error is EROFS, and we're already
+	 * under MS_RDONLY, then it is safe here.
+	 */
+	if (errno == -EROFS && (sb->s_flags & MS_RDONLY))
+		return;
+
+	errstr = btrfs_decode_error(fs_info, errno, nbuf);
+	printk(KERN_CRIT "BTRFS error (device %s) in %s:%d: %s\n",
+		sb->s_id, function, line, errstr);
+	save_error_info(fs_info);
+
+	btrfs_handle_error(fs_info);
+}
+
 static void btrfs_put_super(struct super_block *sb)
 {
 	struct btrfs_root *root = btrfs_sb(sb);
-- 
1.7.1
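The two pieces of the new interface, an errno-to-string decoder and a `do { } while (0)` macro that records `__func__` and `__LINE__` at the call site, can be exercised outside the kernel. This is a user-space analogue under assumptions: `sketch_decode_error()`, `sketch_std_error()` and the `sketch_error_calls` counter are hypothetical names, `fprintf(stderr, ...)` replaces `printk`, and the force-readonly step is reduced to a counter.

```c
#include <errno.h>
#include <stdio.h>

/* User-space analogue of btrfs_decode_error(): known errnos get a name,
 * anything else is formatted as "error N" into the caller's buffer. */
static const char *sketch_decode_error(int err, char nbuf[16])
{
	switch (err) {
	case -EIO:
		return "IO failure";
	case -ENOMEM:
		return "Out of memory";
	case -EROFS:
		return "Readonly filesystem";
	default:
		if (nbuf && snprintf(nbuf, 16, "error %d", -err) >= 0)
			return nbuf;
		return "unknown error";
	}
}

static int sketch_error_calls; /* stands in for forcing the fs readonly */

/* Stand-in for __btrfs_std_error(): report where the error happened. */
static void sketch_std_error_impl(const char *function, unsigned int line, int err)
{
	char nbuf[16];

	sketch_error_calls++;
	fprintf(stderr, "error in %s:%u: %s\n", function, line,
		sketch_decode_error(err, nbuf));
}

/*
 * do { } while (0) lets the macro be used as one statement (e.g. in an
 * unbraced if/else), __func__/__LINE__ capture the caller's location,
 * and an errno of 0 is ignored.
 */
#define sketch_std_error(err)						\
do {									\
	if ((err))							\
		sketch_std_error_impl(__func__, __LINE__, (err));	\
} while (0)
```

A call such as `sketch_std_error(ret);` after a failing step therefore costs nothing when `ret` is 0 and otherwise logs the function and line that detected the error.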
[RFC PATCH 2/5 v3] Btrfs: avoid transaction stuff when btrfs is readonly
When the filesystem is readonly, avoid transaction stuff by checking
MS_RDONLY at start transaction time.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1fffbc0..14a597d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	int ret;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return ERR_PTR(-EROFS);
 again:
 	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
 	if (!h)
-- 
1.7.1
[RFC PATCH 1/5 v3] Btrfs: add filesystem state for error handle
Add a filesystem state and a flag to tell whether the filesystem is valid or
insane now.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8db9234..92b5ca2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING	(1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
 	unsigned metadata_ratio;
 
 	void *bdev_holder;
+
+	/* filesystem state */
+	u64 fs_state;
 };
 
 /*
-- 
1.7.1
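The new state is a single bit in the 64-bit super block flags word, so it can share that word with the existing SEEDING and METADUMP bits as long as the positions do not overlap. The short check below copies the super block flag values from the hunk above. `sketch_super_has_error()` is a hypothetical helper; in the kernel the on-disk word is little-endian and is read through the `btrfs_super_flags()` accessor, which this user-space sketch does not model. The `1ULL` suffix matters: `1 << 32` on a plain `int` would be undefined.

```c
#include <stdint.h>

/* Super block flag values as defined in the ctree.h hunk above. */
#define BTRFS_SUPER_FLAG_ERROR    (1ULL << 2)
#define BTRFS_SUPER_FLAG_SEEDING  (1ULL << 32)
#define BTRFS_SUPER_FLAG_METADUMP (1ULL << 33)

/* Nonzero when the super block flags record that errors were detected. */
static int sketch_super_has_error(uint64_t super_flags)
{
	return (super_flags & BTRFS_SUPER_FLAG_ERROR) != 0;
}
```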
[RFC PATCH 4/5 v3] Btrfs: deal with filesystem state at mount, umount
Since there is a filesystem state, we should deal with it carefully at mount,
umount and remount.

- At mount, the FS state should be checked for errors. If there are any,
  running btrfsck is recommended.
- At umount, the FS state should be saved to disk for consistency.

v2->v3: do the write-super work at umount time.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |   47 ++-
 1 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b40dfe4..15d795a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -43,6 +43,8 @@ static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1700,6 +1702,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_iput;
 
+	/* check filesystem state */
+	fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
@@ -2405,10 +2412,17 @@ int btrfs_commit_super(struct btrfs_root *root)
 	up_write(&root->fs_info->cleanup_work_sem);
 
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	ret = btrfs_commit_transaction(trans, root);
 	BUG_ON(ret);
+
 	/* run commit again to drop the original snapshot */
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	btrfs_commit_transaction(trans, root);
 	ret = btrfs_write_and_wait_transaction(NULL, root);
 	BUG_ON(ret);
@@ -2426,8 +2440,28 @@ int close_ctree(struct btrfs_root *root)
 	smp_mb();
 
 	btrfs_put_block_group_cache(fs_info);
+
+	/*
+	 * Here come 2 situations when btrfs flips readonly:
+	 *
+	 * 1. when btrfs flips readonly somewhere else before
+	 * btrfs_commit_super, sb->s_flags has MS_RDONLY flag,
+	 * and btrfs will skip to write sb directly to keep
+	 * ERROR state on disk.
+	 *
+	 * 2. when btrfs flips readonly just in btrfs_commit_super,
+	 * and in such case, btrfs cannnot write sb via btrfs_commit_super,
+	 * and since fs_state has been set BTRFS_SUPER_FLAG_ERROR flag,
+	 * btrfs will directly write sb.
+	 */
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
-		ret = btrfs_commit_super(root);
+		ret = btrfs_commit_super(root);
+		if (ret)
+			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
+	}
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		ret = write_ctree_super(NULL, root, 0);
 		if (ret)
 			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 	}
@@ -2603,6 +2637,17 @@ out:
 	return 0;
 }
 
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only)
+{
+	if (read_only)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+		printk(KERN_WARNING "warning: mount fs with errors, "
+		       "running btrfsck is recommended\n");
+}
+
 static struct extent_io_ops btree_extent_io_ops = {
 	.write_cache_pages_lock_hook = btree_lock_page_hook,
 	.readpage_end_io_hook = btree_readpage_end_io_hook,
-- 
1.7.1
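The close_ctree() comment above describes two cases, and the resulting control flow is easy to lose in the diff. The following user-space model is a sketch, not the patch itself: the `sketch_*` names are hypothetical, the commit step is a callback so that case 2 (the filesystem flips readonly inside the commit) can be simulated, and the direct super block write is reduced to a flag the caller can inspect.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MS_RDONLY        1UL         /* stand-in for MS_RDONLY */
#define SKETCH_SUPER_FLAG_ERROR (1ULL << 2) /* mirrors BTRFS_SUPER_FLAG_ERROR */

/* A commit step that may itself hit an error and set the ERROR state. */
typedef int (*sketch_commit_fn)(uint64_t *fs_state);

/* Example callbacks: one commit succeeds, one fails part way through. */
static int sketch_commit_ok(uint64_t *fs_state)
{
	(void)fs_state;
	return 0;
}

static int sketch_commit_hits_error(uint64_t *fs_state)
{
	*fs_state |= SKETCH_SUPER_FLAG_ERROR;
	return -EIO;
}

/*
 * Case 1: the fs is already readonly, so the normal commit is skipped.
 * Case 2: the commit itself fails and sets the error flag.
 * In both cases a set error flag leads to a direct super block write,
 * so the ERROR state reaches the disk.
 */
static int sketch_close_ctree(unsigned long s_flags, uint64_t *fs_state,
			      sketch_commit_fn commit, int *wrote_super_directly)
{
	int ret = 0;

	*wrote_super_directly = 0;
	if (!(s_flags & SKETCH_MS_RDONLY))
		ret = commit(fs_state);

	if (*fs_state & SKETCH_SUPER_FLAG_ERROR)
		*wrote_super_directly = 1;
	return ret;
}
```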
[RFC PATCH 5/5 v3] Btrfs: avoid log replay when btrfs is insane
btrfs may do log replay even when mounted readonly. Since we have added a
readonly check at start transaction time, it needs to clear and then restore
the readonly flag around log replay in order to keep the original mount
attribute.

However, we do not permit log replay when btrfs is insane; log replay can
start once btrfs is mounted in a good state.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 15d795a..727e156 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1937,9 +1937,14 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 			btrfs_set_opt(fs_info->mount_opt, SSD);
 	}
 
-	if (btrfs_super_log_root(disk_super) != 0) {
+	if (btrfs_super_log_root(disk_super) != 0 &&
+	    !(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
 		u64 bytenr = btrfs_super_log_root(disk_super);
+		unsigned int s_flags = sb->s_flags;
 
+		if (s_flags & MS_RDONLY)
+			sb->s_flags &= ~MS_RDONLY;
+
 		if (fs_devices->rw_devices == 0) {
 			printk(KERN_WARNING "Btrfs log replay required on RO media\n");
@@ -1969,6 +1974,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 			ret = btrfs_commit_super(tree_root);
 			BUG_ON(ret);
 		}
+
+		sb->s_flags = s_flags;
 	}
 
 	ret = btrfs_find_orphan_roots(tree_root);
-- 
1.7.1
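The save/clear/restore pattern in this patch can be modeled in a few lines. The sketch below is an illustration under assumptions: `sketch_sb`, `sketch_replay_log()` and `sketch_mount_replay()` are hypothetical, the replay step simply refuses to run while the readonly bit is set (standing in for the new check in start_transaction()), and the `rw_devices` check and the commit after replay are left out.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MS_RDONLY        1UL         /* stand-in for MS_RDONLY */
#define SKETCH_SUPER_FLAG_ERROR (1ULL << 2) /* mirrors BTRFS_SUPER_FLAG_ERROR */

struct sketch_sb {
	unsigned long s_flags;
};

static int sketch_replays; /* counts completed replays, for illustration */

/* Replay needs transactions, which are refused while the fs is readonly. */
static int sketch_replay_log(const struct sketch_sb *sb)
{
	if (sb->s_flags & SKETCH_MS_RDONLY)
		return -EROFS;
	sketch_replays++;
	return 0;
}

static int sketch_mount_replay(struct sketch_sb *sb, uint64_t log_root,
			       uint64_t fs_state)
{
	unsigned long saved;
	int ret;

	/* No log to replay, or the fs is marked broken: leave it alone. */
	if (log_root == 0 || (fs_state & SKETCH_SUPER_FLAG_ERROR))
		return 0;

	saved = sb->s_flags;
	sb->s_flags &= ~SKETCH_MS_RDONLY; /* temporarily allow writes */
	ret = sketch_replay_log(sb);
	sb->s_flags = saved;              /* restore the requested mount mode */
	return ret;
}
```

The design cost Yan Zheng and Chris point out is visible here: for the duration of the replay the superblock claims to be writable, which is why a dedicated forced-readonly state was suggested instead.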
[RFC PATCH 0/4 v2] Btrfs: Add readonly support to replace BUG_ON phrase
Btrfs has a number of BUG_ON()s, which may lead btrfs to an unpleasant panic.
Meanwhile, they are very ugly and should be handled more properly.

There are mainly two ways to deal with these BUG_ON()s.

1. For those errors which can be handled well by callers, we just return their
   error number to callers.

2. For others, we can force the filesystem readonly when it hits errors, which
   is what this patchset does. With BUG_ON() replaced by the interface provided
   in this patchset, we will get error information via dmesg. Since btrfs is
   now readonly, we can save our data safely and umount it; then running
   btrfsck is recommended.

In these ways, we can protect our filesystem from panics caused by those
BUG_ONs.

We still need an incompat flag to make old kernels happy.

v1->v2:
- in order to avoid a deadlock, move the write-super work from the error
  handling path to umount time.
- remove BTRFS_SUPER_FLAG_VALID, just use BTRFS_SUPER_FLAG_ERROR to make it
  simple.
- add MS_RDONLY check at start of a transaction instead of commit transaction.

---
 fs/btrfs/ctree.h       |   19 ++
 fs/btrfs/disk-io.c     |   47 +-
 fs/btrfs/super.c       |   88
 fs/btrfs/transaction.c |    3 ++
 4 files changed, 156 insertions(+), 1 deletions(-)
[RFC PATCH 1/4 v2] Btrfs: add filesystem state for error handle
Add a filesystem state and a flag to tell whether the filesystem is valid or
insane now.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8db9234..92b5ca2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -294,6 +294,14 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_HEADER_FLAG_WRITTEN	(1ULL << 0)
 #define BTRFS_HEADER_FLAG_RELOC		(1ULL << 1)
+
+/*
+ * File system states
+ */
+
+/* Errors detected */
+#define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
+
 #define BTRFS_SUPER_FLAG_SEEDING	(1ULL << 32)
 #define BTRFS_SUPER_FLAG_METADUMP	(1ULL << 33)
 
@@ -1050,6 +1058,9 @@ struct btrfs_fs_info {
 	unsigned metadata_ratio;
 
 	void *bdev_holder;
+
+	/* filesystem state */
+	u64 fs_state;
 };
 
 /*
-- 
1.7.1
[RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly
When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY at start transaction time.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/transaction.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1fffbc0..14a597d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	int ret;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return ERR_PTR(-EROFS);
 again:
 	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
 	if (!h)
-- 
1.7.1
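The early return relies on the kernel's ERR_PTR() convention, where one pointer-sized return value carries either a valid handle or a small negative errno. The sketch below re-creates that convention in userspace C to show both sides: a hypothetical `start_transaction_model()` shaped like the patched function, and a caller that checks the result with `is_err()` instead of assuming success. The lowercase helpers, `MODEL_MS_RDONLY` and `struct trans_handle` are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the ERR_PTR() idea: errno values in
 * [-MAX_ERRNO, -1] map to the top page of the address space, where no
 * valid allocation lives. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define MODEL_MS_RDONLY 1UL            /* hypothetical stand-in for MS_RDONLY */

struct trans_handle { int dummy; };   /* hypothetical handle type */

/* Shaped like the patched start_transaction(): refuse early on a read-only
 * superblock, otherwise allocate a handle (allocation failure -> -ENOMEM). */
static struct trans_handle *start_transaction_model(unsigned long s_flags)
{
    struct trans_handle *h;

    if (s_flags & MODEL_MS_RDONLY)
        return err_ptr(-EROFS);

    h = malloc(sizeof(*h));
    if (!h)
        return err_ptr(-ENOMEM);
    return h;
}

/* A caller that checks the returned handle before using it. */
static int do_some_work(unsigned long s_flags)
{
    struct trans_handle *trans = start_transaction_model(s_flags);

    if (is_err(trans))
        return (int)ptr_err(trans);
    /* ... modify metadata under the transaction ... */
    free(trans);
    return 0;
}
```

With the read-only flag set, `do_some_work(MODEL_MS_RDONLY)` returns `-EROFS` without allocating anything. A caller that skipped the `is_err()` check would go on to dereference an encoded error value, which is why every caller of start_transaction() has to check the result once this patch is applied.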
[RFC PATCH 3/4 v2] Btrfs: add readonly support for error handle
This patch provides a new error handling interface for the errors that are currently handled by BUG_ON()s.

To protect btrfs from panicking on those BUG_ON() errors, the interface forces btrfs readonly and saves the FS state to disk. The filesystem can then be umounted, although maybe with some warnings in dmesg, and btrfsck can help to recover it.

v1->v2: move the write-super work from the error handling path to umount in order to avoid a deadlock.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/ctree.h |    8 +
 fs/btrfs/super.c |   88 ++
 2 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 92b5ca2..fc9b6a0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2552,6 +2552,14 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
 /* super.c */
 int btrfs_parse_options(struct btrfs_root *root, char *options);
 int btrfs_sync_fs(struct super_block *sb, int wait);
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		       unsigned int line, int errno);
+
+#define btrfs_std_error(fs_info, errno)				\
+do {								\
+	if ((errno))						\
+		__btrfs_std_error((fs_info), __func__, __LINE__, (errno));\
+} while (0)
 
 /* acl.c */
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 718b10d..07c58f9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -54,6 +54,94 @@ static const struct super_operations btrfs_super_ops;
 
 
 
+static const char *btrfs_decode_error(struct btrfs_fs_info *fs_info, int errno,
+				      char nbuf[16])
+{
+	char *errstr = NULL;
+
+	switch (errno) {
+	case -EIO:
+		errstr = "IO failure";
+		break;
+	case -ENOMEM:
+		errstr = "Out of memory";
+		break;
+	case -EROFS:
+		errstr = "Readonly filesystem";
+		break;
+	default:
+		if (nbuf) {
+			if (snprintf(nbuf, 16, "error %d", -errno) >= 0)
+				errstr = nbuf;
+		}
+		break;
+	}
+
+	return errstr;
+}
+
+static void __save_error_info(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_super_block *disk_super = &fs_info->super_copy;
+
+	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
+	disk_super->flags |= cpu_to_le64(BTRFS_SUPER_FLAG_ERROR);
+
+	mutex_lock(&fs_info->trans_mutex);
+	memcpy(&fs_info->super_for_commit, disk_super,
+	       sizeof(fs_info->super_for_commit));
+	mutex_unlock(&fs_info->trans_mutex);
+}
+
+/* NOTE:
+ * We move write_super stuff at umount in order to avoid deadlock
+ * for umount hold all lock.
+ */
+static void save_error_info(struct btrfs_fs_info *fs_info)
+{
+	__save_error_info(fs_info);
+}
+
+/* btrfs handle error by forcing the filesystem readonly */
+static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
+{
+	struct super_block *sb = fs_info->sb;
+
+	if (sb->s_flags & MS_RDONLY)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		sb->s_flags |= MS_RDONLY;
+		printk(KERN_INFO "btrfs is forced readonly\n");
+	}
+}
+
+/*
+ * __btrfs_std_error decodes expected errors from the caller and
+ * invokes the appropriate error response.
+ */
+void __btrfs_std_error(struct btrfs_fs_info *fs_info, const char *function,
+		       unsigned int line, int errno)
+{
+	struct super_block *sb = fs_info->sb;
+	char nbuf[16];
+	const char *errstr;
+
+	/*
+	 * Special case: if the error is EROFS, and we're already
+	 * under MS_RDONLY, then it is safe here.
+	 */
+	if (errno == -EROFS && (sb->s_flags & MS_RDONLY))
+		return;
+
+	errstr = btrfs_decode_error(fs_info, errno, nbuf);
+	printk(KERN_CRIT "BTRFS error (device %s) in %s:%d: %s\n",
+		sb->s_id, function, line, errstr);
+	save_error_info(fs_info);
+
+	btrfs_handle_error(fs_info);
+}
+
 static void btrfs_put_super(struct super_block *sb)
 {
 	struct btrfs_root *root = btrfs_sb(sb);
-- 
1.7.1
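The error-reporting half of this interface can be tried out in userspace. The following C sketch mirrors `btrfs_decode_error()` and the `btrfs_std_error()` wrapper macro under a few assumptions: the buffer length is passed explicitly; an unknown error with no buffer falls back to a fixed string (the patch's version can return NULL when called without a buffer, although `__btrfs_std_error()` always passes one); the macro parameter is renamed from `errno` to `err`, because `errno` is itself a macro in userspace `<errno.h>`; and the hypothetical `std_error_impl()` only records a message where the kernel version also saves the error state and forces the filesystem read-only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same mapping as btrfs_decode_error() in the patch; unknown codes are
 * formatted into the caller's buffer, and a fixed fallback is returned
 * when no buffer is given, so the result is never NULL. */
static const char *decode_error(int err, char *nbuf, size_t nbuf_len)
{
    switch (err) {
    case -EIO:
        return "IO failure";
    case -ENOMEM:
        return "Out of memory";
    case -EROFS:
        return "Readonly filesystem";
    default:
        if (nbuf && snprintf(nbuf, nbuf_len, "error %d", -err) >= 0)
            return nbuf;
        return "unknown error";
    }
}

/* Hypothetical sink: records the last report instead of calling printk()
 * and forcing the filesystem read-only. */
static char last_report[128];
static int report_count;

static void std_error_impl(const char *function, unsigned int line, int err)
{
    char nbuf[16];

    snprintf(last_report, sizeof(last_report), "error in %s:%u: %s",
             function, line, decode_error(err, nbuf, sizeof(nbuf)));
    report_count++;
}

/* Like btrfs_std_error(): ignore 0, otherwise report with the caller's
 * function name and line number. */
#define std_error(err)                                     \
    do {                                                   \
        if ((err))                                         \
            std_error_impl(__func__, __LINE__, (err));     \
    } while (0)
```

The do { } while (0) wrapper lets `std_error(ret);` be used as a single statement, even as the body of an unbraced if/else, and `__func__`/`__LINE__` expand at the call site, so the report names the function that hit the error rather than the helper.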
[RFC PATCH 4/4 v2] Btrfs: deal with filesystem state at mount, umount
Since there is now a filesystem state, we should deal with it carefully at mount, umount and remount.

- At mount, the FS state should be checked for errors on this FS. If there are any, running btrfsck is recommended.
- At umount, the FS state should be saved to disk for consistency.

Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c |   47 ++-
 1 files changed, 46 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b40dfe4..663d360 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -43,6 +43,8 @@ static struct extent_io_ops btree_extent_io_ops;
 static void end_workqueue_fn(struct btrfs_work *work);
 static void free_fs_root(struct btrfs_root *root);
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only);
 
 /*
  * end_io_wq structs are used to do processing in task context when an IO is
@@ -1700,6 +1702,11 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	if (!btrfs_super_root(disk_super))
 		goto fail_iput;
 
+	/* check filesystem state */
+	fs_info->fs_state |= btrfs_super_flags(disk_super);
+
+	btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
+
 	ret = btrfs_parse_options(tree_root, options);
 	if (ret) {
 		err = ret;
@@ -2405,10 +2412,17 @@ int btrfs_commit_super(struct btrfs_root *root)
 	up_write(&root->fs_info->cleanup_work_sem);
 
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	ret = btrfs_commit_transaction(trans, root);
 	BUG_ON(ret);
+
 	/* run commit again to drop the original snapshot */
 	trans = btrfs_join_transaction(root, 1);
+	if (IS_ERR(trans))
+		return PTR_ERR(trans);
+
 	btrfs_commit_transaction(trans, root);
 	ret = btrfs_write_and_wait_transaction(NULL, root);
 	BUG_ON(ret);
@@ -2426,8 +2440,28 @@ int close_ctree(struct btrfs_root *root)
 	smp_mb();
 
 	btrfs_put_block_group_cache(fs_info);
+
+	/*
+	 * Here come 2 situations when btrfs flips readonly:
+	 *
+	 * 1. when btrfs flips readonly somewhere else before
+	 * btrfs_commit_super, sb->s_flags has MS_RDONLY flag,
+	 * and btrfs will skip to write sb directly to keep
+	 * ERROR state on disk.
+	 *
+	 * 2. when btrfs flips readonly just in btrfs_commit_super,
+	 * and in such case, btrfs cannot write sb via btrfs_commit_super,
+	 * and since fs_state has been set BTRFS_SUPER_FLAG_ERROR flag,
+	 * btrfs will directly write sb.
+	 */
 	if (!(fs_info->sb->s_flags & MS_RDONLY)) {
-		ret = btrfs_commit_super(root);
+		ret = btrfs_commit_super(root);
+		if (ret)
+			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
+	}
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+		ret = write_ctree_super(NULL, root, 0);
 		if (ret)
 			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 	}
@@ -2603,6 +2637,17 @@ out:
 	return 0;
 }
 
+static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
+				    int read_only)
+{
+	if (read_only)
+		return;
+
+	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+		printk(KERN_WARNING "warning: mount fs with errors, "
+		       "running btfsck is recommended\n");
+}
+
 static struct extent_io_ops btree_extent_io_ops = {
 	.write_cache_pages_lock_hook = btree_lock_page_hook,
 	.readpage_end_io_hook = btree_readpage_end_io_hook,
-- 
1.7.1
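The mount and umount rules in this patch reduce to two small decisions. The sketch below models them as pure C functions so that the cases in the close_ctree() comment can be checked one by one; `mount_check()`, `umount_plan()` and `struct umount_actions` are hypothetical names, and `rdonly_now` stands for the MS_RDONLY test at umount time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPER_FLAG_ERROR (1ULL << 2)   /* value from patch 1/4 */

/* Hypothetical summary of what close_ctree() does in patch 4/4. */
struct umount_actions {
    bool commit_super;   /* normal btrfs_commit_super() path */
    bool write_super;    /* direct write_ctree_super() of the ERROR state */
};

/* Mount: merge the on-disk flags into the in-memory state and report
 * whether the "run btrfsck" warning would be printed (skipped on read-only
 * mounts, as in btrfs_check_super_valid()). */
static bool mount_check(uint64_t *fs_state, uint64_t disk_flags, bool read_only)
{
    *fs_state |= disk_flags;
    return !read_only && (*fs_state & SUPER_FLAG_ERROR);
}

/* Umount: commit only while still read-write; then, independently, write
 * the super block directly if an error was recorded at any point. */
static struct umount_actions umount_plan(uint64_t fs_state, bool rdonly_now)
{
    struct umount_actions a = { false, false };

    a.commit_super = !rdonly_now;
    a.write_super = (fs_state & SUPER_FLAG_ERROR) != 0;
    return a;
}
```

In case 1 (forced read-only before umount) only the direct super-block write runs; in case 2 (forced read-only inside btrfs_commit_super()) the commit is attempted first and the direct write still follows, because it depends only on the ERROR bit in `fs_state`.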
Re: [RFC PATCH 4/4 v2] Btrfs: deal with filesystem state at mount, umount
On 12/02/2010 10:29 AM, Tsutomu Itoh wrote:
> Hi,
>
> I found 1 typo.
>
> (2010/12/01 19:21), liubo wrote:
>> Since there is a filesystem state, we should deal with it carefully at
>> mount, umount and remount.
>>
>> - At mount, the FS state should be checked if there is error on these FS.
>>   If it does have, btrfsck is recommended.
>> - At umount, the FS state should be saved into disk for consistency.
>>
>> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
>> ---
>> [snip]
>> @@ -2603,6 +2637,17 @@ out:
>>  	return 0;
>>  }
>>
>> +static void btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
>> +				    int read_only)
>> +{
>> +	if (read_only)
>> +		return;
>> +
>> +	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
>> +		printk(KERN_WARNING "warning: mount fs with errors, "
>> +		       "running btfsck is recommended\n");
>
> btfsck -> btrfsck

ahh, my fault, sorry for my carelessness.
Thanks a lot for reviewing.

thanks,
Liu Bo

>> +}
>> +
>>  static struct extent_io_ops btree_extent_io_ops = {
>>  	.write_cache_pages_lock_hook = btree_lock_page_hook,
>>  	.readpage_end_io_hook = btree_readpage_end_io_hook,
>
> Regards,
> Itoh
Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly
On 12/01/2010 06:20 PM, liubo wrote:
> When the filesystem is readonly, avoid transaction stuff by checking MS_RDONLY
> at start transaction time.

This patch may lead to a btrfs panic.

Since btrfs allows transactions under a readonly fs state, which is a bit weird, btrfs does not even check the transaction returned from start_transaction, although it may return -ENOMEM.

With this patch, if btrfs flips readonly or is mounted readonly, starting a transaction will get -EROFS. So we need to check the returned transaction more carefully, rather than just leave it alone.

thanks,
Liu Bo

> Signed-off-by: Liu Bo <liubo2...@cn.fujitsu.com>
> ---
>  fs/btrfs/transaction.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 1fffbc0..14a597d 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -181,6 +181,9 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
>  	struct btrfs_trans_handle *h;
>  	struct btrfs_transaction *cur_trans;
>  	int ret;
> +
> +	if (root->fs_info->sb->s_flags & MS_RDONLY)
> +		return ERR_PTR(-EROFS);
>  again:
>  	h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
>  	if (!h)
Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly
On 12/02/2010 12:28 PM, Yan, Zheng wrote:
> On Thu, Dec 2, 2010 at 11:42 AM, liubo <liubo2...@cn.fujitsu.com> wrote:
>> On 12/01/2010 06:20 PM, liubo wrote:
>>> When the filesystem is readonly, avoid transaction stuff by checking
>>> MS_RDONLY at start transaction time.
>>
>> This patch may lead btrfs panic.
>>
>> Since btrfs allows transaction under readonly fs state, which is a bit
>> weird, btrfs does not even check the returned transaction from
>> start_transaction, although it may return -ENOMEM.
>
> btrfs may do log replay even mount as readonly.

Yeah, that's right. Log replay may take place when btrfs is mounted readonly, but after the FS is broken, should btrfs still do log replay in such a case?

thanks,
Liu Bo
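One way to reconcile log replay with this patch would be to gate transactions on the recorded error state rather than on MS_RDONLY alone. The C sketch below only illustrates that idea and is not something proposed in the patchset: `may_start_transaction()` and its `log_replay` argument are hypothetical, and the policy of refusing log replay once ERROR is set is just one possible answer to the question above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPER_FLAG_ERROR (1ULL << 2)   /* value from patch 1/4 */

/* Hypothetical gate: sb_rdonly is the MS_RDONLY state, log_replay marks
 * the mount-time tree-log replay, fs_state is the in-memory state. */
static int may_start_transaction(bool sb_rdonly, bool log_replay,
                                 uint64_t fs_state)
{
    /* Once an error has been recorded, refuse everything, including log
     * replay, so nothing more is written to a possibly broken fs. */
    if (fs_state & SUPER_FLAG_ERROR)
        return -EROFS;
    /* A clean read-only mount may still need to replay the log. */
    if (sb_rdonly && !log_replay)
        return -EROFS;
    return 0;
}
```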
Re: [RFC PATCH 2/4 v2] Btrfs: avoid transaction stuff when readonly
On 12/02/2010 01:41 PM, Mike Fedyk wrote:
> On Wed, Dec 1, 2010 at 8:28 PM, Yan, Zheng <yanzh...@21cn.com> wrote:
>> On Thu, Dec 2, 2010 at 11:42 AM, liubo <liubo2...@cn.fujitsu.com> wrote:
>>> On 12/01/2010 06:20 PM, liubo wrote:
>>>> When the filesystem is readonly, avoid transaction stuff by checking
>>>> MS_RDONLY at start transaction time.
>>>
>>> This patch may lead btrfs panic.
>>>
>>> Since btrfs allows transaction under readonly fs state, which is a bit
>>> weird, btrfs does not even check the returned transaction from
>>> start_transaction, although it may return -ENOMEM.
>>
>> btrfs may do log replay even mount as readonly.
>
> What part is logged besides tree roots and/or superblocks?

The log tree is used for log replay after a crash, and for fast fsync and O_SYNC; it logs inodes.
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On 11/30/2010 04:10 AM, Josef Bacik wrote:
> On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote:
>> Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic.
>> Meanwhile, they are very ugly and should be handled more properly.
>>
>> There are mainly two ways to deal with these BUG_ON()s.
>>
>> 1. For those errors which can be handled well by callers, we just return
>>    their error number to callers.
>>
>> 2. For others, we can force the filesystem readonly when it hits errors,
>>    which is what this patchset has done. Replaced BUG_ON() with the
>>    interface provided in this patchset, we will get error information via
>>    dmesg. Since btrfs is now readonly, we can save our data safely and
>>    umount it, then a btrfsck is recommended.
>>
>> By these ways, we can protect our filesystem from panic caused by those
>> BUG_ONs.
>>
>> ---
>>  fs/btrfs/ctree.h       |   21 ++
>>  fs/btrfs/disk-io.c     |   23 +++
>>  fs/btrfs/super.c       |  100 ++-
>>  fs/btrfs/transaction.c |    7 +++
>>  4 files changed, 148 insertions(+), 3 deletions(-)
>
> Overall seems sane, but what about kernels that don't make these checks?
> I'm ok with "well sucks for them" as an answer, just want to make sure
> we've at least thought about it.

You mean the code that doesn't check return values?
IMO, if the code really needs a return-value check, we should handle it seriously, or else just leave it alone. And this is a step-by-step job.

> Also I'm not sure marking the fs as broken is the right move here. Ext3/4
> don't do this, they just mount read-only, as long as you can still unmount
> the filesystem everything comes out ok. Think of the case where we just get
> a spurious EIO, the fs should be fine the next time around, there's no
> reason to force the user to run fsck in this case.

Yes, I agree on this. For a spurious EIO, it mainly depends on how the callers are coded: returning the errno to the caller may let us bypass fsck.

While I'm working on this readonly stuff, it is difficult to solve the potential deadlock when we write the super block to disk. Since btrfs supports multiple devices, we must take the device lock, device_list_mutex, before writing the super block, and this has puzzled me a lot.

BTW, I've tried another way to bypass the deadlock: I moved the write-super work into umount, which frees us from the deadlock. However, while testing this, it seems that umount cannot work due to an ext3/4 jbd oops; I'm digging into this oops...

So, any ideas about avoiding the deadlock?

> Thanks,
>
> Josef
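The deadlock described here is a self-deadlock: an error path that needs device_list_mutex runs while the same mutex is already held. The userspace C sketch below makes that visible with a POSIX error-checking mutex, which returns EDEADLK on a recursive lock instead of hanging, and contrasts it with deferring the super-block write to umount. All names are hypothetical stand-ins; a kernel mutex has no such check and would block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for device_list_mutex. ERRORCHECK makes relocking
 * by the owning thread fail with EDEADLK instead of hanging. */
static pthread_mutex_t device_list_mutex;
static bool super_write_pending;

static void init_device_list_mutex(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&device_list_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Writing the super block needs the device list lock. */
static int write_super_model(void)
{
    int ret = pthread_mutex_lock(&device_list_mutex);

    if (ret)
        return ret;             /* EDEADLK if this thread already holds it */
    /* ... write the super block to every device ... */
    pthread_mutex_unlock(&device_list_mutex);
    return 0;
}

/* Naive error path: write the super block immediately. */
static int error_path_write_now(void)
{
    return write_super_model();
}

/* Deferred error path (the v2 approach): only remember the error. */
static int error_path_defer(void)
{
    super_write_pending = true;
    return 0;
}

/* An operation that hits an error while holding device_list_mutex. */
static int op_holding_device_lock(int (*on_error)(void))
{
    int ret;

    pthread_mutex_lock(&device_list_mutex);
    ret = on_error();
    pthread_mutex_unlock(&device_list_mutex);
    return ret;
}

/* Umount: no locks are held here, so the deferred write is safe. */
static int umount_model(void)
{
    if (!super_write_pending)
        return 0;
    super_write_pending = false;
    return write_super_model();
}
```

Deferring trades the deadlock for a window in which the error state exists only in memory: if the machine crashes before umount, the ERROR flag never reaches the disk.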
Re: [RFC PATCH 0/4] Add readonly support to replace BUG_ON phrase
On 11/30/2010 10:30 AM, Josef Bacik wrote:
> On Tue, Nov 30, 2010 at 10:03:58AM +0800, liubo wrote:
>> On 11/30/2010 04:10 AM, Josef Bacik wrote:
>>> On Thu, Nov 25, 2010 at 05:52:47PM +0800, Miao Xie wrote:
>>>> Btrfs has a number of BUG_ON()s, which may lead btrfs to unpleasant panic.
>>>> [snip]
>>>
>>> Overall seems sane, but what about kernels that don't make these checks?
>>> I'm ok with "well sucks for them" as an answer, just want to make sure
>>> we've at least thought about it.
>>
>> You mean those code that does nothing on ret-checks?
>> IMO, if the code really needs ret-check, we should deal with them
>> seriously, or just leave it alone. And this is a step-by-step job.
>
> Sorry, I mean for older kernels that don't know about these "hey your fs is
> screwed" flags. It seems like they'll just get ignored, are we sure that's
> what we want to happen? I'm fine with that, but if we don't want that to
> happen it may be good to have an incompat flag.

Ohh, got it, thanks for pointing it out. Will do it later.

>>> Also I'm not sure marking the fs as broken is the right move here.
>>> Ext3/4 don't do this, they just mount read-only, as long as you can
>>> still unmount the filesystem everything comes out ok. Think of the case
>>> where we just get a spurious EIO, the fs should be fine the next time
>>> around, there's no reason to force the user to run fsck in this case.
>>
>> Yes, I agree on this. For spurious EIO, it mainly depends on coders,
>> returning the errno to caller may work on bypassing fsck.
>
> Right, I'm worried about the flipping read only stuff being kicked by EIO,
> which happens with ext* and could happen with btrfs in the right cases. I'm
> not saying that's wrong, it's what should happen, I'm just saying we need to
> be able to unmount the filesystem and mount it back up without needing to
> run an fsck in between.

Hm, this really makes sense. Since it is difficult to tell whether a corruption is a fake one, what about just implementing the readonly stuff like this for now and making it friendlier to EIO in the future?

>> While I'm working on this readonly stuff, it is difficult to solve the
>> potential deadlock when we write the super block to disk. Since btrfs
>> supports multi-device, before write-super, we must get the device lock
>> device_list_mutex first, and this has puzzled me a lot.
>>
>> BTW, I've tried another way to bypass deadlock. I made the write-super
>> stuff into umount, which can make us free from deadlock, however, while
>> testing this, it seems that umount cannot work due to a ext3/4 jbd oops,
>> I'm digging on this oops...
>>
>> So, any ideas about free from deadlock?
>
> None :). The best thing I can think of is do like we're doing with the read
> only stuff and only write out the super right before we flip read only, and
> then make umount make sure that if we're mounted read only to not do
> anything. Truth be told I hate this "mark the fs as broken" idea. We don't
> know if the error we got means the filesystem is broken (for example the EIO
> case). If we do hit actual corruption maybe it would be good, and in that
> case we should write out the super at the point we find that corruption and
> then flip read only and have that be the only time we have to worry about
> writing out the super. So I guess that's 2 options
>
> 1) Ditch the "the fs is broken" flag. This makes things nice and simple
> since on-disk is already consistent, all we have to do is drop anything
> that's dirty and we're home free.
>
> 2) Keep the flag, but only worry about writing it out on a case by case
> basis. So we have a btrfs_corrupt_fs() function that writes out the super
> with the appropriate flag, and then flips the fs read only. Then we don't
> have to do anything special in the common paths, just the normal "hey is
> this fs read only?" things, so for all other cases we can just flip the fs
> read only and everything works.

Option 2) is what I have just done. :)

> I hope that makes sense, if not feel free to ignore me and just keep doing
> what you've been doing :).
>
> Thanks,

They are very helpful.

Thanks,
Liu Bo

> Josef
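Option 2) separates a possibly transient error from detected corruption. A minimal C model of that split follows; `struct fs_model`, `fs_mark_corrupt()` (standing in for the proposed btrfs_corrupt_fs()) and the other helpers are hypothetical, and "writing out the super" is reduced to setting a bit in `disk_flags`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUPER_FLAG_ERROR (1ULL << 2)   /* value from patch 1/4 */

/* Hypothetical model of the state both options touch. */
struct fs_model {
    bool rdonly;            /* in-memory MS_RDONLY */
    uint64_t disk_flags;    /* flags persisted in the on-disk super block */
};

/* Ordinary errors (e.g. a possibly spurious -EIO): just flip read-only.
 * Nothing is persisted, so the next mount needs no fsck. */
static void fs_force_readonly(struct fs_model *fs)
{
    fs->rdonly = true;
}

/* Sketch of the proposed btrfs_corrupt_fs(): persist the ERROR flag while
 * the fs is still writable, then flip read-only. */
static void fs_mark_corrupt(struct fs_model *fs)
{
    if (!fs->rdonly)
        fs->disk_flags |= SUPER_FLAG_ERROR;   /* "write out the super" */
    fs_force_readonly(fs);
}

/* Route an error: only detected corruption marks the fs as broken. */
static void fs_handle_error(struct fs_model *fs, bool corruption)
{
    if (corruption)
        fs_mark_corrupt(fs);
    else
        fs_force_readonly(fs);
}

/* Next mount: fsck is suggested only if the flag reached the disk. */
static bool fsck_recommended(const struct fs_model *fs)
{
    return (fs->disk_flags & SUPER_FLAG_ERROR) != 0;
}
```

With this split, an EIO-triggered read-only flip leaves nothing on disk, so the filesystem can be unmounted and mounted again without an fsck in between, which is the behaviour Josef asks for in the thread.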