[PATCH] don't rename file into dummy directory
A recent change enforces that there is only one access point to each
subvolume. The first directory entry (the one added when the
subvolume/snapshot was created) is treated as the valid access point; all
other subvolume links are linked to dummy empty directories. The dummy
directories are temporary inodes that exist only in memory, so we cannot
rename files into them.

Signed-off-by: Yan Zheng zheng@oracle.com

---
diff -urp 1/fs/btrfs/inode.c 2/fs/btrfs/inode.c
--- 1/fs/btrfs/inode.c	2009-09-23 05:49:42.007477065 +0800
+++ 2/fs/btrfs/inode.c	2009-09-23 09:46:22.451357089 +0800
@@ -5057,6 +5057,9 @@ static int btrfs_rename(struct inode *ol
 	u64 root_objectid;
 	int ret;
 
+	if (new_dir->i_ino == BTRFS_EMPTY_SUBVOL_DIR_OBJECTID)
+		return -EPERM;
+
 	/* we only allow rename subvolume link between subvolumes */
 	if (old_inode->i_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest)
 		return -EXDEV;
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] check size of inode backref before adding hardlink
For every hardlink in btrfs, there is a corresponding inode back
reference. All inode back references for hardlinks in a given directory
are stored in a single b-tree item. The size of a b-tree item is limited
by the size of a b-tree leaf, so we can only create a limited number of
hardlinks to a given file in a directory. The original code lacks this
check and oopses if the number of hardlinks goes over the limit. This
patch fixes the issue by adding the check to btrfs_link and btrfs_rename.

Signed-off-by: Yan Zheng zheng@oracle.com

---
diff -urp 1/fs/btrfs/ctree.c 2/fs/btrfs/ctree.c
--- 1/fs/btrfs/ctree.c	2009-07-29 10:03:04.150859426 +0800
+++ 2/fs/btrfs/ctree.c	2009-09-23 09:55:39.366357021 +0800
@@ -2853,6 +2853,12 @@ static noinline int split_leaf(struct bt
 	int split;
 	int num_doubles = 0;
 
+	l = path->nodes[0];
+	slot = path->slots[0];
+	if (extend && data_size + btrfs_item_size_nr(l, slot) +
+	    sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root))
+		return -EOVERFLOW;
+
 	/* first try to make some room by pushing left and right */
 	if (data_size && ins_key->type != BTRFS_DIR_ITEM_KEY) {
 		wret = push_leaf_right(trans, root, path, data_size, 0);
diff -urp 1/fs/btrfs/inode.c 2/fs/btrfs/inode.c
--- 1/fs/btrfs/inode.c	2009-09-23 05:49:42.007477065 +0800
+++ 2/fs/btrfs/inode.c	2009-09-23 10:20:17.006357143 +0800
@@ -4133,18 +4133,16 @@ static int btrfs_link(struct dentry *old
 
 	err = btrfs_add_nondir(trans, dentry, inode, 1, index);
 
-	if (err)
-		drop_inode = 1;
-
-	btrfs_update_inode_block_group(trans, dir);
-	err = btrfs_update_inode(trans, root, inode);
-
-	if (err)
+	if (err) {
 		drop_inode = 1;
+	} else {
+		btrfs_update_inode_block_group(trans, dir);
+		err = btrfs_update_inode(trans, root, inode);
+		BUG_ON(err);
+		btrfs_log_new_name(trans, inode, NULL, dentry->d_parent);
+	}
 
 	nr = trans->blocks_used;
-
-	btrfs_log_new_name(trans, inode, NULL, dentry->d_parent);
 	btrfs_end_transaction_throttle(trans, root);
 fail:
 	if (drop_inode) {
@@ -5087,23 +5085,26 @@ static int btrfs_rename(struct inode *ol
 		down_read(&root->fs_info->subvol_sem);
 
 	trans = btrfs_start_transaction(root, 1);
+	btrfs_set_trans_block_group(trans, new_dir);
 
 	if (dest != root)
 		btrfs_record_root_in_trans(trans, dest);
 
-	/*
-	 * make sure the inode gets flushed if it is replacing
-	 * something.
-	 */
-	if (new_inode && new_inode->i_size &&
-	    old_inode && S_ISREG(old_inode->i_mode)) {
-		btrfs_add_ordered_operation(trans, root, old_inode);
-	}
+	ret = btrfs_set_inode_index(new_dir, &index);
+	if (ret)
+		goto out_fail;
 
-	if (old_inode->i_ino == BTRFS_FIRST_FREE_OBJECTID) {
+	if (unlikely(old_inode->i_ino == BTRFS_FIRST_FREE_OBJECTID)) {
 		/* force full log commit if subvolume involved. */
 		root->fs_info->last_trans_log_full_commit = trans->transid;
 	} else {
+		ret = btrfs_insert_inode_ref(trans, dest,
+					     new_dentry->d_name.name,
+					     new_dentry->d_name.len,
+					     old_inode->i_ino,
+					     new_dir->i_ino, index);
+		if (ret)
+			goto out_fail;
 		/*
 		 * this is an ugly little race, but the rename is required
 		 * to make sure that if we crash, the inode is either at the
@@ -5113,8 +5114,14 @@ static int btrfs_rename(struct inode *ol
 		 */
 		btrfs_pin_log_trans(root);
 	}
-
-	btrfs_set_trans_block_group(trans, new_dir);
+	/*
+	 * make sure the inode gets flushed if it is replacing
+	 * something.
+	 */
+	if (new_inode && new_inode->i_size &&
+	    old_inode && S_ISREG(old_inode->i_mode)) {
+		btrfs_add_ordered_operation(trans, root, old_inode);
+	}
 
 	old_dir->i_ctime = old_dir->i_mtime = ctime;
 	new_dir->i_ctime = new_dir->i_mtime = ctime;
@@ -5159,12 +5166,10 @@ static int btrfs_rename(struct inode *ol
 			BUG_ON(ret);
 		}
 	}
-	ret = btrfs_set_inode_index(new_dir, &index);
-	BUG_ON(ret);
 
 	ret = btrfs_add_link(trans, new_dir, old_inode,
 			     new_dentry->d_name.name,
-			     new_dentry->d_name.len, 1, index);
+			     new_dentry->d_name.len, 0, index);
 	BUG_ON(ret);
 
 	if (old_inode->i_ino != BTRFS_FIRST_FREE_OBJECTID) {
@@ -5172,7 +5177,7 @@ static int btrfs_rename(struct inode *ol
 				   new_dentry->d_parent);
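The size check added to split_leaf above can be illustrated outside the kernel. The following is only a user-space sketch, not the kernel code: every sketch_* name is hypothetical, and the leaf data size, item header size, and back reference header size are assumed values chosen for illustration. It shows why the number of hardlinks to one file in one directory is bounded: each new link grows the same item, and the growth is refused with -EOVERFLOW once the item would no longer fit in a leaf.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones depend on the leaf size. */
#define SKETCH_LEAF_DATA_SIZE   3995u /* usable bytes in one leaf (assumption) */
#define SKETCH_ITEM_HEADER_SIZE 25u   /* per-item header bytes (assumption) */
#define SKETCH_REF_HEADER_SIZE  10u   /* per-back-reference header bytes (assumption) */

/*
 * Return 0 if an item of cur_item_size bytes can grow by data_size bytes
 * and still fit into a single leaf, -EOVERFLOW otherwise.
 */
static int sketch_check_item_extend(uint32_t cur_item_size, uint32_t data_size)
{
	uint64_t needed = (uint64_t)cur_item_size + data_size +
			  SKETCH_ITEM_HEADER_SIZE;

	if (needed > SKETCH_LEAF_DATA_SIZE)
		return -EOVERFLOW;
	return 0;
}

/* Bytes one hardlink adds to the item: a fixed header plus the link name. */
static uint32_t sketch_inode_ref_size(uint32_t name_len)
{
	assert(name_len <= 255);
	return SKETCH_REF_HEADER_SIZE + name_len;
}

/* Count how many links with name_len-byte names fit before the check fails. */
static unsigned sketch_max_links(uint32_t name_len)
{
	uint32_t ref_size = sketch_inode_ref_size(name_len);
	uint32_t item_size = 0;
	unsigned links = 0;

	while (sketch_check_item_extend(item_size, ref_size) == 0) {
		item_size += ref_size;
		links++;
	}
	return links;
}
```

Under these assumed sizes, shorter names allow more links per directory, and the caller sees an error return instead of an oops once the limit is reached.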
Re: [PATCH 4/4] add snapshot/subvolume destroy ioctl
On Thu, Sep 24, 2009 at 04:27:57PM +0800, john wrote:
> After a git bisect, I think this patch may introduce a performance
> regression (about 15% slower) in postmark benchmark. Sometimes (last a
> few seconds) in the test, CPU usage is 100% wait but NO IO is
> performing, it's not IO-wait. This didn't happen for earlier versions.
>
> test environment:
>   hard disk: INTEL X25-E SSD 64G
>   mkfs options: mkfs.btrfs -m single /dev/xxx
>   mount options: -o ssd,nodatasum,nodatacow

Thanks, I'll give this a shot here.

-chris
Re: [PATCH 4/4] add snapshot/subvolume destroy ioctl
On Thu, Sep 24, 2009 at 08:43:22AM -0400, Chris Mason wrote:
> On Thu, Sep 24, 2009 at 04:27:57PM +0800, john wrote:
> > After a git bisect, I think this patch may introduce a performance
> > regression (about 15% slower) in postmark benchmark. Sometimes (last a
> > few seconds) in the test, CPU usage is 100% wait but NO IO is
> > performing, it's not IO-wait. This didn't happen for earlier versions.
> >
> > test environment:
> >   hard disk: INTEL X25-E SSD 64G
> >   mkfs options: mkfs.btrfs -m single /dev/xxx
> >   mount options: -o ssd,nodatasum,nodatacow
>
> Thanks, I'll give this a shot here.

Looks like Yan Zheng already tracked it down. I've pushed his fix out to
the master branch.

Thanks,
Chris
[GIT PULL] Btrfs patches for 2.6.32-rc
Hello everyone,

The for-linus branch of the btrfs unstable repo is updated for merging
with mainline:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable.git for-linus

There was a trivial conflict against fs/btrfs/super.c, so the for-linus
branch has things fixed up. The master branch has the pre-merge btrfs
changes against 2.6.31.

The most significant part of this merge is Yan Zheng's work on snapshot
and subvolume deletion. If you pull the latest from btrfs-progs you'll be
able to delete snapshots and subvolumes without having to resort to
rm -rf. This is much faster because it does the deletion via btree
walking. It's also now possible to rename snapshots and subvols.

Most of my patches are around improving write performance. Streaming
writes on very fast hardware got CPU bound at around 400MB/s, and btrfs
can now push over 1GB/s while using the same CPU as XFS (if you factor
out crcs). There are also fixes for the btrfs write_cache_pages in there
that do a better job of writing large portions of an extent.

The first part of Josef's ENOSPC work is included, but the patch that
starts enforcing space reservations was held back for now.

Another visible change is that the btrfs worker threads are more dynamic,
and they die if they have been idle for a while.

Chris Mason (26) commits (+759/-393):
    Btrfs: use a cached state for extent state operations during delalloc (+40/-24)
    Btrfs: fix releasepage to avoid unlocking extents we haven't locked (+7/-2)
    Btrfs: Use PagePrivate2 to track pages in the data=ordered code. (+62/-55)
    Btrfs: fix errors handling cached state in set/clear_extent_bit (+8/-8)
    Btrfs: search for an allocation hint while filling file COW (+59/-1)
    Btrfs: don't lock bits in the extent tree during writepage (+0/-21)
    Btrfs: reduce worker thread spin_lock_irq hold times (+60/-14)
    Btrfs: keep irqs on more often in the worker threads (+16/-10)
    Btrfs: fix btrfs page_mkwrite to return locked page (+3/-0)
    Btrfs: reduce CPU usage in the extent_state tree (+28/-68)
    Btrfs: Fix test_range_bit for whole file extents (+4/-0)
    Btrfs: properly honor wbc->nr_to_write changes (+27/-11)
    Btrfs: use larger nr_to_write for larger extents (+9/-5)
    Btrfs: Allow worker threads to exit when idle (+132/-32)
    Btrfs: zero page past end of inline file items (+5/-0)
    Btrfs: fix worker thread double spin_lock_irq (+2/-2)
    Btrfs: cache values for locking extents (+100/-36)
    Btrfs: fix early enospc during balancing (+7/-13)
    Btrfs: Fix new state initialization order (+2/-2)
    Btrfs: switch extent_map to a rw lock (+57/-60)
    Btrfs: Fix async thread shutdown race (+10/-6)
    Btrfs: fix async worker startup race (+11/-3)
    Btrfs: Fix extent replacment race (+80/-13)
    Btrfs: deal with NULL space info (+16/-2)
    Btrfs: tweak congestion backoff (+1/-1)
    Btrfs: optimize set extent bit (+13/-4)

Zheng Yan (7) commits (+1490/-679):
    Btrfs: check size of inode backref before adding hardlink (+37/-24)
    Btrfs: do not reuse objectid of deleted snapshot/subvol (+31/-116)
    Btrfs: add snapshot/subvolume destroy ioctl (+605/-233)
    Btrfs: change how subvolumes are organized (+459/-168)
    Btrfs: don't rename file into dummy directory (+3/-0)
    Btrfs: relocate file extents in clusters (+148/-89)
    Btrfs: speed up snapshot dropping (+207/-49)

Josef Bacik (6) commits (+207/-748):
    Btrfs: don't keep retrying a block group if we fail to allocate a cluster (+17/-8)
    Btrfs: make balance code choose more wisely when relocating (+148/-18)
    Btrfs: account for space used by the super mirrors (+20/-2)
    Btrfs: fix extent entry threshold calculation (+21/-14)
    Btrfs: fix bitmap size tracking (+1/-0)
    Btrfs: remove dead code (+0/-706)

Yan Zheng (2) commits (+383/-259):
    Btrfs: hash the btree inode during fill_super (+1/-0)
    Btrfs: improve async block group caching (+382/-259)

Sage Weil (1) commits (+1/-2):
    Btrfs: fix arithmetic error in clone ioctl

Total: (42) commits

 fs/btrfs/async-thread.c     |  264 +-
 fs/btrfs/async-thread.h     |   12
 fs/btrfs/btrfs_inode.h      |    1
 fs/btrfs/compression.c      |    8
 fs/btrfs/ctree.c            |    6
 fs/btrfs/ctree.h            |   78 +
 fs/btrfs/dir-item.c         |   47 +
 fs/btrfs/disk-io.c          |  235 +++--
 fs/btrfs/export.c           |  133 ++-
 fs/btrfs/extent-tree.c      | 1740 ++--
 fs/btrfs/extent_io.c        |  404 +-
 fs/btrfs/extent_io.h        |   18
 fs/btrfs/extent_map.c       |  103 ++
 fs/btrfs/extent_map.h       |    5
 fs/btrfs/file.c             |   37
 fs/btrfs/free-space-cache.c |   36
 fs/btrfs/inode-item.c       |    4
 fs/btrfs/inode-map.c        |   93 --
 fs/btrfs/inode.c            |  687 -
 fs/btrfs/ioctl.c            |  341
 fs/btrfs/ioctl.h            |    3
 fs/btrfs/ordered-data.c     |   37
 fs/btrfs/ordered-data.h     |    3
Re: grub-0.97: btrfs multidevice support [PATCH]
Hi Edward,

I'm sorry but GRUB Legacy is not maintained. At least not by us; we've
deprecated it in favour of GRUB 2. It is also being abandoned by
distributors, so I wouldn't recommend that you put any effort in
developing for it.

--
Robert Millan

The DRM opt-in fallacy: Your data belongs to us. We will decide when (and
how) you may access your data; but nobody's threatening your freedom: we
still allow you to remove your data and not access it at all.
[PATCH] Btrfs: fix deadlock with free space handling and user transactions
Hi,

If an ioctl-initiated transaction is open, we can't force a commit during
the free space checks in order to free up pinned extents or else we
deadlock. Just ENOSPC instead.

A more satisfying solution that reserves space for the entire user
transaction up front is forthcoming...

Signed-off-by: Sage Weil s...@newdream.net
---
 fs/btrfs/extent-tree.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 90d314e..63d86ae 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2813,7 +2813,7 @@ alloc:
 	}
 	spin_unlock(&meta_sinfo->lock);
 
-	if (!committed) {
+	if (!committed && !root->fs_info->open_ioctl_trans) {
 		committed = 1;
 		trans = btrfs_join_transaction(root, 1);
 		if (!trans)
@@ -2887,7 +2887,7 @@ alloc:
 		spin_unlock(&data_sinfo->lock);
 
 		/* commit the current transaction and try again */
-		if (!committed) {
+		if (!committed && !root->fs_info->open_ioctl_trans) {
 			committed = 1;
 			trans = btrfs_join_transaction(root, 1);
 			if (!trans)
--
1.5.6.5
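The control flow this patch changes can be sketched in isolation. This is a user-space illustration, not the kernel function: the sketch_* names and the space_after_commit flag are hypothetical stand-ins, and the real code joins and commits a transaction where the sketch only flips a flag. It shows the intended rule: retry after one forced commit, unless a user (ioctl) transaction is open, in which case fail with -ENOSPC rather than risk the deadlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the space check looks at. */
struct sketch_fs_state {
	int open_ioctl_trans;    /* number of open ioctl-initiated transactions */
	bool space_after_commit; /* would a commit release enough pinned space? */
};

/* Return 0 when the reservation succeeds, -ENOSPC otherwise. */
static int sketch_reserve_space(const struct sketch_fs_state *fs, bool have_space)
{
	bool committed = false;

again:
	if (have_space)
		return 0;

	/* Force at most one commit, and never while a user transaction is open. */
	if (!committed && !fs->open_ioctl_trans) {
		committed = true;
		/* Stand-in for joining and committing the running transaction. */
		have_space = fs->space_after_commit;
		goto again;
	}

	return -ENOSPC;
}
```

With an open user transaction the function returns -ENOSPC immediately, even when a commit would have freed space; that is the trade-off the commit message describes until up-front reservation is available.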
[patch 0/2] grub-0.97: btrfs support
Hello everyone.

The following patches are for Fedora 10(**). The distro-independent
package will be put on kernel.org a bit later.

I. Loading kernels from btrfs volumes

Now you can load kernels and initrds from btrfs volumes composed of many
devices.

WARNING!!! Make sure that all components of your loading btrfs volume(*)
are visible to grub. Otherwise, you'll end up with an unbootable system.
The list of available grub devices can be obtained, for example, using
tab completion in the grub shell.

The number of components of a loading volume is not restricted; however,
if it is larger than 128, the boot process will be slowed down because of
expensive translations (btrfs-device-id -> grub-device-id), which issue a
large number of IOs. We cache only 128 such translations in grub-0.97
because of high memory pressure.

II. Installing grub from btrfs volumes

You can install grub from a btrfs image volume(*) composed of many
devices (see above about restrictions). You can also set up any component
of a btrfs boot(*) volume as the grub root device.

NOTE!!! Make sure that all components of the image and boot volumes(*)
are visible to grub, otherwise the grub installer will return an error.

TECHNICAL NOTE (for grub developers):
The unpleasant surprise was that the grub installer overwrites (by
default!) the file (stage2), bypassing the file system driver. I cannot
understand this: it looks like stepping into clean water with dirty
shoes. I hope that grub2 won't allow such things.

In order to install grub from a btrfs image volume, use the special
option (--stage2). This option makes the grub installer rewrite the file
with the help of the OS's file system (i.e., via write(2)). Any attempt
to install without this option will fail with an error (wrong argument).

An example of a possible installation scenario.

Suppose image volume = root volume = loading volume is composed of
devices (hd0,4), (hd0,5), (hd1,5), (hd1,7) and is not the OS's root. We
want to set up (hd0,4) as the grub root device and install grub to the
mbr of (hd0).

. build and install grub with btrfs support;
. mount your 3-in-1 btrfs volume to /mnt;
. create a directory /mnt/grub;
. put the built files stage1, stage2, btrfs_stage1_5, grub.conf, etc.
  to /mnt/grub;
. run the grub shell;
. grub> root (hd0,4)
. grub> setup --stage2=/mnt/grub/stage2 (hd0)
. have fun.

Use info(1) grub for more details.

(*) Glossary:

. loading volume: a btrfs volume that contains the kernel image and
  initrd;
. image volume: a btrfs volume that contains the stage1, stage2,
  btrfs_stage1_5, and grub.conf files needed by the grub installer;
. boot volume: a btrfs volume where grub will look for the stage2 and
  grub.conf files at boot time.

(**) Link to Fedora's grub package:
http://ucho.ignum.cz/fedora/linux/releases/10/Fedora/source/SRPMS/grub-0.97-38.fc10.src.rpm

All comments, bug reports, etc. are welcome as usual.

Thanks,
Edward.
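The 128-entry limit on cached btrfs-device-id to grub-device-id translations mentioned in section I can be pictured with a small fixed-size cache. This is a sketch under assumptions, not the code from the patch: all sketch_* names are hypothetical, the slow path is a stub standing in for the expensive device scan, and the fill-then-stop policy is just one simple choice. It shows why volumes with more than 128 components pay for a slow lookup on every access to the extra devices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_CACHED_DEVICES 128

/* Hypothetical translation record: btrfs device id -> grub drive/partition. */
struct sketch_device {
	uint64_t devid;
	unsigned long drive;
	unsigned long part;
};

struct sketch_dev_cache {
	struct sketch_device slots[SKETCH_NUM_CACHED_DEVICES];
	size_t used;                /* number of filled slots */
	unsigned long slow_lookups; /* how often the slow path was taken */
};

/* Stub for the expensive path (scanning devices and reading superblocks). */
static struct sketch_device sketch_scan_for_device(uint64_t devid)
{
	struct sketch_device dev;

	dev.devid = devid;
	dev.drive = 0x80 + (unsigned long)(devid % 2); /* made-up mapping */
	dev.part = (unsigned long)devid;               /* made-up mapping */
	return dev;
}

/* Translate devid, using the cache when possible. */
static struct sketch_device sketch_lookup(struct sketch_dev_cache *c,
					  uint64_t devid)
{
	struct sketch_device dev;
	size_t i;

	for (i = 0; i < c->used; i++)
		if (c->slots[i].devid == devid)
			return c->slots[i];

	c->slow_lookups++;
	dev = sketch_scan_for_device(devid);

	/* Once the cache is full, further devices are never cached. */
	if (c->used < SKETCH_NUM_CACHED_DEVICES)
		c->slots[c->used++] = dev;
	return dev;
}
```

In this sketch the first 128 distinct devices are each scanned once, while device 129 and beyond are scanned on every lookup, which matches the slowdown described for large volumes.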
[patch 2/2] grub-0.97: btrfs multidevice configuration support
Signed-off-by: Edward Shishkin edw...@redhat.com
---
 grub/Makefile.am    |    2
 stage2/btrfs.h      |   55 ++--
 stage2/builtins.c   |   10
 stage2/disk_io.c    |    2
 stage2/filesys.h    |    4
 stage2/fsys_btrfs.c |  693 +---
 6 files changed, 595 insertions(+), 171 deletions(-)

--- grub-0.97.orig/stage2/btrfs.h
+++ grub-0.97/stage2/btrfs.h
@@ -124,6 +124,7 @@ static int btrfs_csum_sizes[] = { 4, 0 }
 #define BTRFS_DEFAULT_NUM_DEVICES 1
 #define BTRFS_DEFAULT_NODE_SIZE 4096
 #define BTRFS_DEFAULT_LEAF_SIZE 4096
+#define BTRFS_NUM_CACHED_DEVICES 128
 
 #define WARN_ON(c)
 #define cassert(cond) ({ switch (-1) { case (cond): case 0: break; } })
@@ -315,13 +316,22 @@ struct btrfs_node {
 	struct btrfs_key_ptr ptrs[];
 } __attribute__ ((__packed__));
 
+struct btrfs_device {
+	/* the internal btrfs device id */
+	u64 devid;
+	/* the internal grub device representation */
+	unsigned long drive;
+	unsigned long part;
+	unsigned long length;
+};
+
 struct extent_buffer {
 	/* metadata */
+	struct btrfs_device dev;
 	u64 start;
 	u64 dev_bytenr;
 	u32 len;
-	int refs;
-	int flags;
+	/* data */
 	char *data;
 };
 
@@ -555,12 +565,8 @@ struct btrfs_block_group_item {
 struct btrfs_root {
 	struct extent_buffer node;
 	char data[4096];
-	struct extent_buffer *commit_root;
 	struct btrfs_root_item root_item;
-	struct btrfs_key root_key;
-	struct btrfs_fs_info *fs_info;
 	u64 objectid;
-	u64 last_trans;
 
 	/* data allocations are done in sectorsize units */
 	u32 sectorsize;
@@ -573,42 +579,31 @@ struct btrfs_root {
 
 	/* leaf allocations are done in leafsize units */
 	u32 stripesize;
+};
 
-	int ref_cows;
-	int track_dirty;
-
-
-	u32 type;
-	u64 highest_inode;
-	u64 last_inode_alloc;
+struct btrfs_file_info {
+	struct btrfs_key key;
 };
 
 struct btrfs_root;
 struct btrfs_fs_devices;
 struct btrfs_fs_info {
 	u8 fsid[BTRFS_FSID_SIZE];
-	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
 	struct btrfs_root fs_root;
 	struct btrfs_root tree_root;
 	struct btrfs_root chunk_root;
 
-	struct btrfs_key file_info; /* currently opened file */
+	struct btrfs_file_info file_info; /* currently opened file */
 	struct btrfs_path paths [LAST_LOOKUP_POOL];
 
-	u64 generation;
-	u64 last_trans_committed;
+	char mbr[SECTOR_SIZE];
 
-	u64 system_alloc_profile;
-	u64 alloc_start;
+	int sb_mirror;
+	u64 sb_transid;
+	struct btrfs_device sb_dev;
+	struct btrfs_super_block sb_copy;
 
-	struct btrfs_super_block super_temp;
-	struct btrfs_super_block super_copy;
-
-	u64 super_bytenr;
-	u64 total_pinned;
-
-	int system_allocs;
-	int readonly;
+	struct btrfs_device devices[BTRFS_NUM_CACHED_DEVICES + 1];
 };
 
 /*
@@ -1129,6 +1124,11 @@ static inline void btrfs_set_key_type(st
 	key->type = val;
 }
 
+static inline u64 btrfs_super_devid(struct btrfs_super_block *disk_super)
+{
+	return le64_to_cpu(disk_super->dev_item.devid);
+}
+
 /* struct btrfs_header */
 BTRFS_SETGET_HEADER_FUNCS(header_bytenr, struct btrfs_header, bytenr, 64);
 BTRFS_SETGET_HEADER_FUNCS(header_generation, struct btrfs_header,
@@ -1317,6 +1317,7 @@ struct btrfs_fs_devices {
 };
 
 struct btrfs_bio_stripe {
+	struct btrfs_device dev;
 	u64 physical;
 };
 
--- grub-0.97.orig/stage2/fsys_btrfs.c
+++ grub-0.97/stage2/fsys_btrfs.c
@@ -31,15 +31,21 @@
 #define BTRFS_FS_INFO \
 	((struct btrfs_fs_info *)((unsigned long)FSYS_BUF + \
 	LOOKUP_CACHE_SIZE))
-#define BTRFS_CACHE_SIZE (sizeof(struct btrfs_fs_info) + \
-	LOOKUP_CACHE_SIZE)
-#define BTRFS_FILE_INFO (BTRFS_FS_INFO->file_info)
-#define BTRFS_TREE_ROOT (BTRFS_FS_INFO->tree_root)
-#define BTRFS_CHUNK_ROOT (BTRFS_FS_INFO->chunk_root)
-#define BTRFS_FS_ROOT (BTRFS_FS_INFO->fs_root)
-#define BTRFS_SUPER (BTRFS_FS_INFO->super_copy)
-#define LOOKUP_CACHE_BUF(id) ((char *)((unsigned long)FSYS_BUF + \
-	id * LOOKUP_CACHE_BUF_SIZE))
+#define BTRFS_CACHE_SIZE (sizeof(struct btrfs_fs_info) + \
+	LOOKUP_CACHE_SIZE)
+#define BTRFS_TREE_ROOT (BTRFS_FS_INFO->tree_root)
+#define BTRFS_CHUNK_ROOT (BTRFS_FS_INFO->chunk_root)
+#define BTRFS_FS_ROOT (BTRFS_FS_INFO->fs_root)
+#define BTRFS_SUPER (BTRFS_FS_INFO->sb_copy)
+#define BTRFS_DEVICES (BTRFS_FS_INFO->devices[0])
+#define BTRFS_FILE_INFO (BTRFS_FS_INFO->file_info)
+#define BTRFS_FILE_INFO_KEY
Re: [PATCH 4/4] add snapshot/subvolume destroy ioctl
My test result confirms that it has been fixed. Thanks for your time.

2009/9/24 Chris Mason chris.ma...@oracle.com:
> On Thu, Sep 24, 2009 at 08:43:22AM -0400, Chris Mason wrote:
> > On Thu, Sep 24, 2009 at 04:27:57PM +0800, john wrote:
> > > After a git bisect, I think this patch may introduce a performance
> > > regression (about 15% slower) in postmark benchmark. Sometimes (last
> > > a few seconds) in the test, CPU usage is 100% wait but NO IO is
> > > performing, it's not IO-wait. This didn't happen for earlier
> > > versions.
> > >
> > > test environment:
> > >   hard disk: INTEL X25-E SSD 64G
> > >   mkfs options: mkfs.btrfs -m single /dev/xxx
> > >   mount options: -o ssd,nodatasum,nodatacow
> >
> > Thanks, I'll give this a shot here.
>
> Looks like Yan Zheng already tracked it down. I've pushed his fix out
> to the master branch.
>
> Thanks,
> Chris

--
Zhang Jingwang
National Research Centre for High Performance Computers
Institute of Computing Technology, Chinese Academy of Sciences
No. 6, South Kexueyuan Road, Haidian District
Beijing, China