Re: backref for an extent not found in send_root (!backref_ctx->found_itself)
Hi Alex,

On Mon, January 28, 2013 at 17:11 (+0100), Alex Lyakas wrote:
> Hi Jan,
> I have a set of unit tests (part of the larger system) for the
> send-receive functionality, with which I am able to hit this error:
>
> Jan 28 18:01:00 687-dev kernel: [16968.451358] btrfs: ERROR did not
> find backref in send_root. inode=259, offset=139264, disk_byte=4263936
> found extent=4263936
>
> As the code states, this could indicate a bug in backref walking. This
> reproduces with "for-linus" branch.
>
> Typically this happens when a snapshot is deleted, immediately a new
> snap with the same name is created, and then "btrfs send" is issued
> without parent (i.e., full-send) on this snap.
>
> To debug this further, we can do one of two things:
> # I can apply patches/debug prints & reproduce
> # I can work to isolate the unit test into a bash script and send you
> a script that reproduces

I'd prefer #2 of the above. You can also send me the unit tests you've got
if I can get them running without multiple days of setup.

I'm guessing that this is more likely going to end up in send.c than in
backref.c, perhaps Alexander would like to trace this one down. But anyway,
send me a reproducer (in private, if you don't want to publish it) and we'll
see what's going on.

Thanks,
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V2 01/10] Btrfs: use atomic for btrfs_fs_info->generation
fs_info->generation is a 64-bit variable that can be accessed by multiple
tasks. If there is no lock or other method to protect it, we might read a
wrong number, especially on 32-bit machines.

For example, assuming ->generation is 0x at the beginning, then we increase
it by 1, ->generation will be 0x 0001 , but it is in the registers, then we
store it into the memory. If some task accesses it at this time, just like
this:

    Task0                       Task1
    set low 32 bits
                                load low 32 bits
                                load high 32 bits
    set high 32 bits

The task will get 0, which is a wrong number.

We fix this problem by using atomic operations.

Signed-off-by: Zhao Lei
Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog to make it clearer and more rigorous.
---
 fs/btrfs/ctree.c | 7 ---
 fs/btrfs/ctree.h | 2 +-
 fs/btrfs/disk-io.c | 8
 fs/btrfs/file.c | 6 --
 fs/btrfs/inode.c | 5 +++--
 fs/btrfs/qgroup.c| 2 +-
 fs/btrfs/transaction.c | 4 ++--
 include/trace/events/btrfs.h | 3 ++-
 8 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index eea5da7..4a36c03 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1365,10 +1365,11 @@ noinline int btrfs_cow_block(struct btrfs_trans_handle *trans,
 		       (unsigned long long)
 		       root->fs_info->running_transaction->transid);
 
-	if (trans->transid != root->fs_info->generation)
+	if (trans->transid != atomic64_read(&root->fs_info->generation))
 		WARN(1, KERN_CRIT "trans %llu running %llu\n",
 		       (unsigned long long)trans->transid,
-		       (unsigned long long)root->fs_info->generation);
+		       (unsigned long long)atomic64_read(
+					&root->fs_info->generation));
 
 	if (!should_cow_block(trans, root, buf)) {
 		*cow_ret = buf;
@@ -1465,7 +1466,7 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
 		return 0;
 
 	WARN_ON(trans->transaction != root->fs_info->running_transaction);
-	WARN_ON(trans->transid != root->fs_info->generation);
+	WARN_ON(trans->transid != atomic64_read(&root->fs_info->generation));
 
 	parent_nritems = btrfs_header_nritems(parent);
 	blocksize = btrfs_level_size(root, parent_level - 1);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 547b7b0..c3edb22 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1278,7 +1278,7 @@ struct btrfs_fs_info {
 
 	struct btrfs_block_rsv empty_block_rsv;
 
-	u64 generation;
+	atomic64_t generation;
 	u64 last_trans_committed;
 
 	/*
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 65f0367..f03aebc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1200,7 +1200,7 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	memset(&root->root_item, 0, sizeof(root->root_item));
 	memset(&root->defrag_progress, 0, sizeof(root->defrag_progress));
 	memset(&root->root_kobj, 0, sizeof(root->root_kobj));
-	root->defrag_trans_start = fs_info->generation;
+	root->defrag_trans_start = atomic64_read(&fs_info->generation);
 	init_completion(&root->kobj_unregister);
 	root->defrag_running = 0;
 	root->root_key.objectid = objectid;
@@ -2501,7 +2501,7 @@ retry_root_backup:
 		fs_info->pending_quota_state = 1;
 	}
 
-	fs_info->generation = generation;
+	atomic64_set(&fs_info->generation, generation);
 	fs_info->last_trans_committed = generation;
 
 	ret = btrfs_recover_balance(fs_info);
@@ -3436,12 +3436,12 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
 	int was_dirty;
 
 	btrfs_assert_tree_locked(buf);
-	if (transid != root->fs_info->generation)
+	if (transid != atomic64_read(&root->fs_info->generation))
 		WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, "
 		       "found %llu running %llu\n",
 			(unsigned long long)buf->start,
 			(unsigned long long)transid,
-			(unsigned long long)root->fs_info->generation);
+			(u64)atomic64_read(&root->fs_info->generation));
 	was_dirty = set_extent_buffer_dirty(buf);
 	if (!was_dirty) {
 		spin_lock(&root->fs_info->delalloc_lock);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 841cfe3..02409b6 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1588,7 +1588,8 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	 * otherwise subsequent syncs to a file that's been synced in this
 	 * transaction will appear to have already occured.
 	 */
-	BTRFS_I(inode)->last_trans = root->fs_
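
The interleaving described in the changelog above can be replayed in user space. What follows is only a sketch, not code from the patch: the struct and function names are hypothetical, the 64-bit value is modelled as two explicit 32-bit words, and the "tasks" are replayed in one thread in the order the changelog lists them. The starting value used in the usage note is an assumption chosen for illustration, since the archived changelog lost its hex digits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit value stored as two 32-bit words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t join_words(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Replay the changelog's interleaving in a single thread:
 * the writer (Task0) stores new_val one word at a time while the
 * reader (Task1) loads both words in between. Returns what the
 * reader observed.
 */
static uint64_t torn_read_demo(uint64_t old_val, uint64_t new_val)
{
	struct split_u64 v = {
		.lo = (uint32_t)old_val,
		.hi = (uint32_t)(old_val >> 32),
	};
	uint32_t seen_lo, seen_hi;

	v.lo = (uint32_t)new_val;		/* Task0: set low 32 bits   */
	seen_lo = v.lo;				/* Task1: load low 32 bits  */
	seen_hi = v.hi;				/* Task1: load high 32 bits */
	v.hi = (uint32_t)(new_val >> 32);	/* Task0: set high 32 bits  */

	/* The writer itself always ends up with the intended value. */
	assert(join_words(v.hi, v.lo) == new_val);

	return join_words(seen_hi, seen_lo);
}
```

With an assumed old value of 0x00000000ffffffff and a new value of 0x0000000100000000, torn_read_demo() returns 0: the reader sees the new low word and the old high word, a value that was never actually stored.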
[PATCH V2 02/10] Btrfs: use atomic for fs_info->last_trans_committed
fs_info->last_trans_committed is a 64-bit variable that can be accessed by
multiple tasks. If there is no lock or other method to protect it, we might
read a wrong number, especially on 32-bit machines. (Even on 64-bit machines,
it is possible that the compiler splits a 64-bit operation into two 32-bit
operations.)

For example, assuming ->last_trans_committed is 0x at the beginning, then we
want to set it to 0x0001.

    Task0                       Task1
    set low 32 bits
                                load low 32 bits
                                load high 32 bits
    set high 32 bits

The task will get 0, which is a wrong number.

We fix this problem by using atomic operations.

Signed-off-by: Zhao Lei
Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog to make it clearer and more rigorous.
---
 fs/btrfs/ctree.h| 2 +-
 fs/btrfs/disk-io.c | 2 +-
 fs/btrfs/file.c | 2 +-
 fs/btrfs/ioctl.c| 2 +-
 fs/btrfs/ordered-data.c | 2 +-
 fs/btrfs/scrub.c| 2 +-
 fs/btrfs/transaction.c | 5 +++--
 fs/btrfs/tree-log.c | 16 +---
 8 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c3edb22..34a60a8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1279,7 +1279,7 @@ struct btrfs_fs_info {
 	struct btrfs_block_rsv empty_block_rsv;
 
 	atomic64_t generation;
-	u64 last_trans_committed;
+	atomic64_t last_trans_committed;
 
 	/*
 	 * this is updated to the current trans every time a full commit
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f03aebc..87ed05a 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2502,7 +2502,7 @@ retry_root_backup:
 	}
 
 	atomic64_set(&fs_info->generation, generation);
-	fs_info->last_trans_committed = generation;
+	atomic64_set(&fs_info->last_trans_committed, generation);
 
 	ret = btrfs_recover_balance(fs_info);
 	if (ret) {
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 02409b6..910ea99 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1683,7 +1683,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	if (btrfs_inode_in_log(inode,
 	    atomic64_read(&root->fs_info->generation)) ||
 	    BTRFS_I(inode)->last_trans <=
-	    root->fs_info->last_trans_committed) {
+	    atomic64_read(&root->fs_info->last_trans_committed)) {
 		BTRFS_I(inode)->last_trans = 0;
 
 		/*
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index afbf3ac..3b6c339 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3114,7 +3114,7 @@ static noinline long btrfs_ioctl_start_sync(struct btrfs_root *root,
 			return PTR_ERR(trans);
 
 		/* No running transaction, don't bother */
-		transid = root->fs_info->last_trans_committed;
+		transid = atomic64_read(&root->fs_info->last_trans_committed);
 		goto out;
 	}
 	transid = trans->transid;
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index f107312..f376621 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -975,7 +975,7 @@ void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
 	 * if this file hasn't been changed since the last transaction
 	 * commit, we can safely return without doing anything
 	 */
-	if (last_mod < root->fs_info->last_trans_committed)
+	if (last_mod < atomic64_read(&root->fs_info->last_trans_committed))
 		return;
 
 	spin_lock(&root->fs_info->ordered_extent_lock);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index bdbb94f..af0b566 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2703,7 +2703,7 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
 		return -EIO;
 
-	gen = root->fs_info->last_trans_committed;
+	gen = atomic64_read(&root->fs_info->last_trans_committed);
 
 	for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
 		bytenr = btrfs_sb_offset(i);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 105d642..29fdf1c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -459,7 +459,8 @@ int btrfs_wait_for_commit(struct btrfs_root *root, u64 transid)
 	int ret = 0;
 
 	if (transid) {
-		if (transid <= root->fs_info->last_trans_committed)
+		if (transid <=
+		    atomic64_read(&root->fs_info->last_trans_committed))
 			goto out;
 
 		ret = -EINVAL;
@@ -1730,7 +1731,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	cur_trans->commit_done = 1;
 
-	root->fs_info->last_trans_committed = cur_trans->transid;
+	atomic64_set(&root->fs_info->las
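
For readers without a kernel tree at hand, the conversion above can be approximated in user space with C11 atomics. This is a rough analogue under stated assumptions, not the kernel API: atomic_store()/atomic_load() on an _Atomic uint64_t stand in for atomic64_set()/atomic64_read(), and the struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the one fs_info field touched by the patch. */
struct fs_info_model {
	_Atomic uint64_t last_trans_committed;
};

/* Analogue of atomic64_set(): the whole 64-bit value is stored at once. */
static void model_set_committed(struct fs_info_model *info, uint64_t transid)
{
	assert(info);
	atomic_store(&info->last_trans_committed, transid);
}

/* Analogue of atomic64_read(): the whole 64-bit value is loaded at once. */
static uint64_t model_get_committed(struct fs_info_model *info)
{
	assert(info);
	return atomic_load(&info->last_trans_committed);
}

/*
 * Mirrors the shape of the check in btrfs_wait_for_commit(): a
 * transaction id at or below the last committed one needs no waiting.
 */
static int model_needs_wait(struct fs_info_model *info, uint64_t transid)
{
	return transid > model_get_committed(info);
}
```

A reader calling model_get_committed() concurrently with model_set_committed() sees either the old or the new value, never a mix of the two halves. Note that on some 32-bit targets a C11 implementation may fall back to a lock for 64-bit atomics, which still gives the same guarantee.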
[PATCH V2 03/10] Btrfs: use atomic for fs_info->last_trans_log_full_commit
fs_info->last_trans_log_full_commit is a 64-bit variable that can be accessed
by multiple tasks. If there is no lock or other method to protect it, we might
read a wrong number, especially on 32-bit machines. (Even on 64-bit machines,
it is possible that the compiler splits a 64-bit operation into two 32-bit
operations.)

For example, assuming ->last_trans_log_full_commit is 0x at the beginning,
then we want to set it to 0x0001.

    Task0                       Task1
    set low 32 bits
                                load low 32 bits
                                load high 32 bits
    set high 32 bits

The task will get 0, which is a wrong number.

We fix this problem by using atomic operations.

Signed-off-by: Zhao Lei
Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog to make it clearer and more rigorous.
---
 fs/btrfs/ctree.h | 2 +-
 fs/btrfs/extent-tree.c | 3 ++-
 fs/btrfs/inode.c | 3 ++-
 fs/btrfs/tree-log.c| 32 +++-
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 34a60a8..745e7ad 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1285,7 +1285,7 @@ struct btrfs_fs_info {
 	 * this is updated to the current trans every time a full commit
 	 * is required instead of the faster short fsync log commits
 	 */
-	u64 last_trans_log_full_commit;
+	atomic64_t last_trans_log_full_commit;
 	unsigned long mount_opt;
 	unsigned long compress_type:4;
 	u64 max_inline;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 85b8454..ef61a4a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7868,7 +7868,8 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
 
 	extent_root = root->fs_info->extent_root;
 
-	root->fs_info->last_trans_log_full_commit = trans->transid;
+	atomic64_set(&root->fs_info->last_trans_log_full_commit,
+		     trans->transid);
 
 	cache = kzalloc(sizeof(*cache), GFP_NOFS);
 	if (!cache)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 35c4dda..803be87 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7433,7 +7433,8 @@ static int btrfs_rename(struct inode *old_dir, struct dentry *old_dentry,
 
 	if (unlikely(old_ino == BTRFS_FIRST_FREE_OBJECTID)) {
 		/* force full log commit if subvolume involved. */
-		root->fs_info->last_trans_log_full_commit = trans->transid;
+		atomic64_set(&root->fs_info->last_trans_log_full_commit,
+			     trans->transid);
 	} else {
 		ret = btrfs_insert_inode_ref(trans, dest,
 					     new_dentry->d_name.name,
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7f42a53..bb7c01b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2227,14 +2227,14 @@ static int wait_log_commit(struct btrfs_trans_handle *trans,
 				&wait, TASK_UNINTERRUPTIBLE);
 		mutex_unlock(&root->log_mutex);
 
-		if (root->fs_info->last_trans_log_full_commit !=
+		if (atomic64_read(&root->fs_info->last_trans_log_full_commit) !=
 		    trans->transid && root->log_transid < transid + 2 &&
 		    atomic_read(&root->log_commit[index]))
 			schedule();
 
 		finish_wait(&root->log_commit_wait[index], &wait);
 		mutex_lock(&root->log_mutex);
-	} while (root->fs_info->last_trans_log_full_commit !=
+	} while (atomic64_read(&root->fs_info->last_trans_log_full_commit) !=
 		 trans->transid && root->log_transid < transid + 2 &&
 		 atomic_read(&root->log_commit[index]));
 	return 0;
@@ -2244,12 +2244,12 @@ static void wait_for_writer(struct btrfs_trans_handle *trans,
 			    struct btrfs_root *root)
 {
 	DEFINE_WAIT(wait);
-	while (root->fs_info->last_trans_log_full_commit !=
+	while (atomic64_read(&root->fs_info->last_trans_log_full_commit) !=
 	       trans->transid && atomic_read(&root->log_writers)) {
 		prepare_to_wait(&root->log_writer_wait,
 				&wait, TASK_UNINTERRUPTIBLE);
 		mutex_unlock(&root->log_mutex);
-		if (root->fs_info->last_trans_log_full_commit !=
+		if (atomic64_read(&root->fs_info->last_trans_log_full_commit) !=
 		    trans->transid && atomic_read(&root->log_writers))
 			schedule();
 		mutex_lock(&root->log_mutex);
@@ -2306,7 +2306,8 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	}
 
 	/* bail out if we need to do a full commit */
-	if (root->fs_info->last_trans_log_full_commit == trans->transid) {
+	if (atomic64_read(&root->fs_info->last_trans_log_full_commit) ==
+
[PATCH V2 04/10] Btrfs: add a comment for fs_info->max_inline
Though ->max_inline is a 64-bit variable and may be accessed by multiple
tasks, it is just a suggestive number, so we needn't add anything to protect
fs_info->max_inline. Just add a comment to explain why we don't use a lock to
protect it.

Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog and make it more clear.
---
 fs/btrfs/ctree.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 745e7ad..3e672916 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1288,6 +1288,12 @@ struct btrfs_fs_info {
 	atomic64_t last_trans_log_full_commit;
 	unsigned long mount_opt;
 	unsigned long compress_type:4;
+	/*
+	 * It is a suggestive number, the read side is safe even it gets a
+	 * wrong number because we will write out the data into a regular
+	 * extent. The write side(mount/remount) is under ->s_umount lock,
+	 * so it is also safe.
+	 */
 	u64 max_inline;
 	u64 alloc_start;
 	struct btrfs_transaction *running_transaction;
-- 
1.7.11.7
[PATCH V2 05/10] Btrfs: protect fs_info->alloc_start
fs_info->alloc_start is a 64-bit variable that can be accessed by multiple
tasks, but it is not strictly protected, so it can be changed while we are
accessing it. On 32-bit machines, we will get a wrong value because we access
it with two instructions. (In fact, the same problem can also happen on 64-bit
machines, because the compiler may split the 64-bit operation into two 32-bit
operations.)

For example, assuming ->alloc_start is 0x 0001 at the beginning, then we
remount and set ->alloc_start to 0x 0100 .

    Task0                       Task1
                                load high 32 bits
    set high 32 bits
    set low 32 bits
                                load low 32 bits

Task1 will get 0.

This patch fixes this problem by using two locks to protect it:
	fs_info->chunk_mutex
	sb->s_umount
On the read side, we just need to take one of these two locks, and on the
write side, we must take both of them.

Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog to make it clearer and more rigorous.
---
 fs/btrfs/ctree.h | 10 ++
 fs/btrfs/super.c | 4
 2 files changed, 14 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 3e672916..201be7d 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1295,6 +1295,16 @@ struct btrfs_fs_info {
 	 * so it is also safe.
 	 */
 	u64 max_inline;
+	/*
+	 * Protected by ->chunk_mutex and sb->s_umount.
+	 *
+	 * The reason that we use two lock to protect it is because only
+	 * remount and mount operations can change it and these two operations
+	 * are under sb->s_umount, but the read side (chunk allocation) can not
+	 * acquire sb->s_umount or the deadlock would happen. So we use two
+	 * locks to protect it. On the write side, we must acquire two locks,
+	 * and on the read side, we just need acquire one of them.
+	 */
 	u64 alloc_start;
 	struct btrfs_transaction *running_transaction;
 	wait_queue_head_t transaction_throttle;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d8982e9..c96f132 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -519,7 +519,9 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_alloc_start:
 			num = match_strdup(&args[0]);
 			if (num) {
+				mutex_lock(&info->chunk_mutex);
 				info->alloc_start = memparse(num, NULL);
+				mutex_unlock(&info->chunk_mutex);
 				kfree(num);
 				printk(KERN_INFO
 					"btrfs: allocations start at %llu\n",
@@ -1289,7 +1291,9 @@ restore:
 	fs_info->mount_opt = old_opts;
 	fs_info->compress_type = old_compress_type;
 	fs_info->max_inline = old_max_inline;
+	mutex_lock(&fs_info->chunk_mutex);
 	fs_info->alloc_start = old_alloc_start;
+	mutex_unlock(&fs_info->chunk_mutex);
 	btrfs_resize_thread_pool(fs_info,
 		old_thread_pool_size, fs_info->thread_pool_size);
 	fs_info->metadata_ratio = old_metadata_ratio;
-- 
1.7.11.7
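
The "writers take both locks, readers take either one" rule from this patch can be sketched in user space as follows. This is an illustration only, with assumptions worth stating: both locks are modelled as pthread mutexes (in the kernel, sb->s_umount is a read-write semaphore and is already held by the mount/remount path before the option parser runs), and all names below are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model of the two locks and the value they guard. */
struct alloc_model {
	pthread_mutex_t umount_lock;	/* stands in for sb->s_umount */
	pthread_mutex_t chunk_mutex;	/* stands in for fs_info->chunk_mutex */
	uint64_t alloc_start;
};

/* Write side (mount/remount in the patch): hold BOTH locks. */
static void model_set_alloc_start(struct alloc_model *m, uint64_t start)
{
	assert(m);
	pthread_mutex_lock(&m->umount_lock);
	pthread_mutex_lock(&m->chunk_mutex);
	m->alloc_start = start;
	pthread_mutex_unlock(&m->chunk_mutex);
	pthread_mutex_unlock(&m->umount_lock);
}

/*
 * Read side (chunk allocation in the patch): one lock is enough,
 * because any writer must also hold it. The allocator uses
 * chunk_mutex since it cannot take the umount lock.
 */
static uint64_t model_get_alloc_start(struct alloc_model *m)
{
	uint64_t start;

	assert(m);
	pthread_mutex_lock(&m->chunk_mutex);
	start = m->alloc_start;
	pthread_mutex_unlock(&m->chunk_mutex);
	return start;
}
```

The design choice is that a writer excludes every reader regardless of which lock that reader picked, while readers on the allocation path never need the lock that would deadlock them.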
[PATCH V2 06/10] Btrfs: use percpu counter for dirty metadata count
->dirty_metadata_bytes is accessed very frequently, so use percpu counter instead of the u64 variant to reduce the contention of the lock. This patch also fixed the problem that we access it without lock protection in __btrfs_btree_balance_dirty(), which may cause we skip the dirty pages flush. Signed-off-by: Miao Xie --- Changelog v1 -> v2: - modify the changelog and make it more clear and stringency. --- fs/btrfs/ctree.h | 9 fs/btrfs/disk-io.c | 64 fs/btrfs/extent_io.c | 9 +++- 3 files changed, 42 insertions(+), 40 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 201be7d..1dcbbfd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -191,6 +191,8 @@ static int btrfs_csum_sizes[] = { 4, 0 }; /* ioprio of readahead is set to idle */ #define BTRFS_IOPRIO_READA (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) +#define BTRFS_DIRTY_METADATA_THRESH(32 * 1024 * 1024) + /* * The key defines the order in the tree, and so it also defines (optimal) * block layout. @@ -1439,10 +1441,9 @@ struct btrfs_fs_info { u64 total_pinned; - /* protected by the delalloc lock, used to keep from writing -* metadata until there is a nice batch -*/ - u64 dirty_metadata_bytes; + /* used to keep from writing metadata until there is a nice batch */ + struct percpu_counter dirty_metadata_bytes; + s32 dirty_metadata_batch; struct list_head dirty_cowonly_roots; struct btrfs_fs_devices *fs_devices; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 87ed05a..961ac58 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -946,18 +946,20 @@ static int btree_writepages(struct address_space *mapping, struct writeback_control *wbc) { struct extent_io_tree *tree; + struct btrfs_fs_info *fs_info; + int ret; + tree = &BTRFS_I(mapping->host)->io_tree; if (wbc->sync_mode == WB_SYNC_NONE) { - struct btrfs_root *root = BTRFS_I(mapping->host)->root; - u64 num_dirty; - unsigned long thresh = 32 * 1024 * 1024; if (wbc->for_kupdate) return 0; + fs_info = 
BTRFS_I(mapping->host)->root->fs_info; /* this is a bit racy, but that's ok */ - num_dirty = root->fs_info->dirty_metadata_bytes; - if (num_dirty < thresh) + ret = percpu_counter_compare(&fs_info->dirty_metadata_bytes, +BTRFS_DIRTY_METADATA_THRESH); + if (ret < 0) return 0; } return btree_write_cache_pages(mapping, wbc); @@ -1125,24 +1127,16 @@ struct extent_buffer *read_tree_block(struct btrfs_root *root, u64 bytenr, void clean_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *buf) { + struct btrfs_fs_info *fs_info = root->fs_info; + if (btrfs_header_generation(buf) == - root->fs_info->running_transaction->transid) { + fs_info->running_transaction->transid) { btrfs_assert_tree_locked(buf); if (test_and_clear_bit(EXTENT_BUFFER_DIRTY, &buf->bflags)) { - spin_lock(&root->fs_info->delalloc_lock); - if (root->fs_info->dirty_metadata_bytes >= buf->len) - root->fs_info->dirty_metadata_bytes -= buf->len; - else { - spin_unlock(&root->fs_info->delalloc_lock); - btrfs_panic(root->fs_info, -EOVERFLOW, - "Can't clear %lu bytes from " - " dirty_mdatadata_bytes (%llu)", - buf->len, - root->fs_info->dirty_metadata_bytes); - } - spin_unlock(&root->fs_info->delalloc_lock); - + __percpu_counter_add(&fs_info->dirty_metadata_bytes, +-buf->len, +fs_info->dirty_metadata_batch); /* ugh, clear_extent_buffer_dirty needs to lock the page */ btrfs_set_lock_blocking(buf); clear_extent_buffer_dirty(buf); @@ -2004,10 +1998,18 @@ int open_ctree(struct super_block *sb, goto fail_srcu; } + ret = percpu_counter_init(&fs_info->dirty_metadata_bytes, 0); + if (ret) { + err = ret; + goto fail_bdi; + } + fs_info->dirty_metadata_batch = PAGE_CACHE_SIZE * + (1 + ilog2(nr_cpu_ids)); + fs_info->btree_inode = new_inode(sb); if (!fs_info->btree_
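
The batching idea behind a per-CPU counter, as used for ->dirty_metadata_bytes above, can be modelled in plain C. The sketch below is a single-threaded toy under stated assumptions: the names and the fixed MODEL_NR_CPUS are hypothetical, there is no real per-CPU storage or locking, and only the fold-when-the-local-delta-reaches-the-batch behaviour is shown.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4	/* hypothetical fixed CPU count for the model */

struct counter_model {
	int64_t count;			/* shared, approximate total */
	int32_t local[MODEL_NR_CPUS];	/* per-CPU deltas not yet folded in */
	int32_t batch;			/* fold threshold */
};

/*
 * Add to the calling CPU's local delta; touch the shared total only
 * when the delta reaches +/- batch. In the kernel the fold takes the
 * counter's lock, so most updates stay lock-free.
 */
static void model_add(struct counter_model *c, int cpu, int32_t amount)
{
	int64_t pending;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	pending = (int64_t)c->local[cpu] + amount;
	if (pending >= c->batch || pending <= -c->batch) {
		c->count += pending;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = (int32_t)pending;
	}
}

/* Cheap read: may lag behind by up to batch per CPU. */
static int64_t model_read_fast(const struct counter_model *c)
{
	return c->count;
}

/* Exact read: also walks every per-CPU delta. */
static int64_t model_sum(const struct counter_model *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

This is why the patch derives the batch from the CPU count: the cheap read can be off by roughly batch times the number of CPUs, so threshold checks need either a comparison that falls back to the exact sum when the values are close, or the exact sum itself.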
[PATCH V2 07/10] Btrfs: use percpu counter for fs_info->delalloc_bytes
fs_info->delalloc_bytes is accessed very frequently, so use percpu counter instead of the u64 variant for it to reduce the lock contention. This patch also fixed the problem that we access the variant without the lock protection.At worst, we would not flush the delalloc inodes, and just return ENOSPC error when we still have some free space in the fs. Signed-off-by: Miao Xie --- Changelog v1 -> v2: - modify the changelog and make it more clear and stringency. --- fs/btrfs/ctree.h | 7 --- fs/btrfs/disk-io.c | 18 ++ fs/btrfs/extent-tree.c | 6 -- fs/btrfs/inode.c | 6 -- 4 files changed, 26 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 1dcbbfd..51515a3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1383,6 +1383,7 @@ struct btrfs_fs_info { */ struct list_head ordered_extents; + spinlock_t delalloc_lock; /* * all of the inodes that have delalloc bytes. It is possible for * this list to be empty even when there is still dirty data=ordered @@ -1443,7 +1444,10 @@ struct btrfs_fs_info { /* used to keep from writing metadata until there is a nice batch */ struct percpu_counter dirty_metadata_bytes; + struct percpu_counter delalloc_bytes; s32 dirty_metadata_batch; + s32 delalloc_batch; + struct list_head dirty_cowonly_roots; struct btrfs_fs_devices *fs_devices; @@ -1459,9 +1463,6 @@ struct btrfs_fs_info { struct reloc_control *reloc_ctl; - spinlock_t delalloc_lock; - u64 delalloc_bytes; - /* data_alloc_cluster is only used in ssd mode */ struct btrfs_free_cluster data_alloc_cluster; diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 961ac58..29d52af 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2006,10 +2006,16 @@ int open_ctree(struct super_block *sb, fs_info->dirty_metadata_batch = PAGE_CACHE_SIZE * (1 + ilog2(nr_cpu_ids)); + ret = percpu_counter_init(&fs_info->delalloc_bytes, 0); + if (ret) { + err = ret; + goto fail_dirty_metadata_bytes; + } + fs_info->btree_inode = new_inode(sb); if 
(!fs_info->btree_inode) { err = -ENOMEM; - goto fail_dirty_metadata_bytes; + goto fail_delalloc_bytes; } mapping_set_gfp_mask(fs_info->btree_inode->i_mapping, GFP_NOFS); @@ -2264,6 +2270,7 @@ int open_ctree(struct super_block *sb, sectorsize = btrfs_super_sectorsize(disk_super); stripesize = btrfs_super_stripesize(disk_super); fs_info->dirty_metadata_batch = leafsize * (1 + ilog2(nr_cpu_ids)); + fs_info->delalloc_batch = sectorsize * 512 * (1 + ilog2(nr_cpu_ids)); /* * mixed block groups end up with duplicate but slightly offset @@ -2726,6 +2733,8 @@ fail_iput: invalidate_inode_pages2(fs_info->btree_inode->i_mapping); iput(fs_info->btree_inode); +fail_delalloc_bytes: + percpu_counter_destroy(&fs_info->delalloc_bytes); fail_dirty_metadata_bytes: percpu_counter_destroy(&fs_info->dirty_metadata_bytes); fail_bdi: @@ -3357,9 +3366,9 @@ int close_ctree(struct btrfs_root *root) btrfs_free_qgroup_config(root->fs_info); - if (fs_info->delalloc_bytes) { - printk(KERN_INFO "btrfs: at unmount delalloc count %llu\n", - (unsigned long long)fs_info->delalloc_bytes); + if (percpu_counter_sum(&fs_info->delalloc_bytes)) { + printk(KERN_INFO "btrfs: at unmount delalloc count %lld\n", + percpu_counter_sum(&fs_info->delalloc_bytes)); } free_extent_buffer(fs_info->extent_root->node); @@ -3407,6 +3416,7 @@ int close_ctree(struct btrfs_root *root) btrfs_mapping_tree_free(&fs_info->mapping_tree); percpu_counter_destroy(&fs_info->dirty_metadata_bytes); + percpu_counter_destroy(&fs_info->delalloc_bytes); bdi_destroy(&fs_info->bdi); cleanup_srcu_struct(&fs_info->subvol_srcu); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ef61a4a..f4f0b1e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3724,7 +3724,8 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, space_info = block_rsv->space_info; smp_mb(); - delalloc_bytes = root->fs_info->delalloc_bytes; + delalloc_bytes = percpu_counter_sum_positive( + 
&root->fs_info->delalloc_bytes); if (delalloc_bytes == 0) { if (trans) return; @@ -3766,7 +3767,8 @@ static void shrink_delalloc(struct btrfs_root *root, u64 to_reclaim, u64 orig, break; } smp_mb(); - delalloc_bytes = root->f
[PATCH V2 08/10] Btrfs: use the inode own lock to protect its delalloc_bytes
We need not use a global lock to protect the delalloc_bytes of the inode, just use its own lock. In this way, we can reduce the lock contention and ->delalloc_lock will just protect delalloc inode list. Signed-off-by: Miao Xie --- Changelog v1 -> v2: - none. --- fs/btrfs/btrfs_inode.h | 1 + fs/btrfs/disk-io.c | 2 ++ fs/btrfs/inode.c | 47 ++- 3 files changed, 37 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 2a8c242..c935a77 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -40,6 +40,7 @@ #define BTRFS_INODE_HAS_ASYNC_EXTENT 6 #define BTRFS_INODE_NEEDS_FULL_SYNC7 #define BTRFS_INODE_COPY_EVERYTHING8 +#define BTRFS_INODE_IN_DELALLOC_LIST 9 /* in memory btrfs inode */ struct btrfs_inode { diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 29d52af..abf1f10 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3685,6 +3685,8 @@ static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root) delalloc_inodes); list_del_init(&btrfs_inode->delalloc_inodes); + clear_bit(BTRFS_INODE_IN_DELALLOC_LIST, + &btrfs_inode->runtime_flags); btrfs_invalidate_inodes(btrfs_inode->root); } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d98b508..4f92b35 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1508,15 +1508,22 @@ static void btrfs_set_bit_hook(struct inode *inode, spin_unlock(&BTRFS_I(inode)->lock); } - spin_lock(&root->fs_info->delalloc_lock); - BTRFS_I(inode)->delalloc_bytes += len; __percpu_counter_add(&root->fs_info->delalloc_bytes, len, root->fs_info->delalloc_batch); - if (do_list && list_empty(&BTRFS_I(inode)->delalloc_inodes)) { - list_add_tail(&BTRFS_I(inode)->delalloc_inodes, - &root->fs_info->delalloc_inodes); + spin_lock(&BTRFS_I(inode)->lock); + BTRFS_I(inode)->delalloc_bytes += len; + if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST, +&BTRFS_I(inode)->runtime_flags)) { + spin_lock(&root->fs_info->delalloc_lock); + if 
(list_empty(&BTRFS_I(inode)->delalloc_inodes)) { + list_add_tail(&BTRFS_I(inode)->delalloc_inodes, + &root->fs_info->delalloc_inodes); + set_bit(BTRFS_INODE_IN_DELALLOC_LIST, + &BTRFS_I(inode)->runtime_flags); + } + spin_unlock(&root->fs_info->delalloc_lock); } - spin_unlock(&root->fs_info->delalloc_lock); + spin_unlock(&BTRFS_I(inode)->lock); } } @@ -1551,16 +1558,22 @@ static void btrfs_clear_bit_hook(struct inode *inode, && do_list) btrfs_free_reserved_data_space(inode, len); - spin_lock(&root->fs_info->delalloc_lock); __percpu_counter_add(&root->fs_info->delalloc_bytes, -len, root->fs_info->delalloc_batch); + spin_lock(&BTRFS_I(inode)->lock); BTRFS_I(inode)->delalloc_bytes -= len; - if (do_list && BTRFS_I(inode)->delalloc_bytes == 0 && - !list_empty(&BTRFS_I(inode)->delalloc_inodes)) { - list_del_init(&BTRFS_I(inode)->delalloc_inodes); + test_bit(BTRFS_INODE_IN_DELALLOC_LIST, +&BTRFS_I(inode)->runtime_flags)) { + spin_lock(&root->fs_info->delalloc_lock); + if (!list_empty(&BTRFS_I(inode)->delalloc_inodes)) { + list_del_init(&BTRFS_I(inode)->delalloc_inodes); + clear_bit(BTRFS_INODE_IN_DELALLOC_LIST, + &BTRFS_I(inode)->runtime_flags); + } + spin_unlock(&root->fs_info->delalloc_lock); } - spin_unlock(&root->fs_info->delalloc_lock); + spin_unlock(&BTRFS_I(inode)->lock); } } @@ -7316,14 +7329,19 @@ fail: static int btrfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat) { + u64 delalloc_bytes; struct inode *inode = dentry->d_inode; u32 blocksize = inode->i_sb->s_blocksize; generic_fillattr(inode, stat); stat->dev = BTRFS_I(inode)->root->anon_dev; stat->blksize = PAGE_CACHE_SIZE; + + spin_lock(&BTRFS_I(inode)->lock); + delall
[PATCH V2 09/10] Btrfs: use seqlock to protect fs_info->avail_{data, metadata, system}_alloc_bits
There is no lock to protect fs_info->avail_{data, metadata, system}_alloc_bits, it may introduce some problem, such as the wrong profile information, so we add a seqlock to protect them. Signed-off-by: Zhao Lei Signed-off-by: Miao Xie --- Changelog v1 -> v2: - none. --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/disk-io.c | 1 + fs/btrfs/extent-tree.c | 22 ++-- fs/btrfs/volumes.c | 56 +++--- 4 files changed, 49 insertions(+), 32 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 51515a3..c95b539 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1474,6 +1474,8 @@ struct btrfs_fs_info { struct rb_root defrag_inodes; atomic_t defrag_running; + /* Used to protect avail_{data, metadata, system}_alloc_bits */ + seqlock_t profiles_lock; /* * these three are in extended format (availability of single * chunks is denoted by BTRFS_AVAIL_ALLOC_BIT_SINGLE bit, other diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index abf1f10..a7797ed 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2036,6 +2036,7 @@ int open_ctree(struct super_block *sb, spin_lock_init(&fs_info->tree_mod_seq_lock); rwlock_init(&fs_info->tree_mod_log_lock); mutex_init(&fs_info->reloc_mutex); + seqlock_init(&fs_info->profiles_lock); init_completion(&fs_info->kobj_unregister); INIT_LIST_HEAD(&fs_info->dirty_cowonly_roots); diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index f4f0b1e..bbbfa72 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3223,12 +3223,14 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) u64 extra_flags = chunk_to_extended(flags) & BTRFS_EXTENDED_PROFILE_MASK; + write_seqlock(&fs_info->profiles_lock); if (flags & BTRFS_BLOCK_GROUP_DATA) fs_info->avail_data_alloc_bits |= extra_flags; if (flags & BTRFS_BLOCK_GROUP_METADATA) fs_info->avail_metadata_alloc_bits |= extra_flags; if (flags & BTRFS_BLOCK_GROUP_SYSTEM) fs_info->avail_system_alloc_bits |= extra_flags; + write_sequnlock(&fs_info->profiles_lock); } 
/* @@ -3320,12 +3322,18 @@ u64 btrfs_reduce_alloc_profile(struct btrfs_root *root, u64 flags) static u64 get_alloc_profile(struct btrfs_root *root, u64 flags) { - if (flags & BTRFS_BLOCK_GROUP_DATA) - flags |= root->fs_info->avail_data_alloc_bits; - else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) - flags |= root->fs_info->avail_system_alloc_bits; - else if (flags & BTRFS_BLOCK_GROUP_METADATA) - flags |= root->fs_info->avail_metadata_alloc_bits; + unsigned seq; + + do { + seq = read_seqbegin(&root->fs_info->profiles_lock); + + if (flags & BTRFS_BLOCK_GROUP_DATA) + flags |= root->fs_info->avail_data_alloc_bits; + else if (flags & BTRFS_BLOCK_GROUP_SYSTEM) + flags |= root->fs_info->avail_system_alloc_bits; + else if (flags & BTRFS_BLOCK_GROUP_METADATA) + flags |= root->fs_info->avail_metadata_alloc_bits; + } while (read_seqretry(&root->fs_info->profiles_lock, seq)); return btrfs_reduce_alloc_profile(root, flags); } @@ -7937,12 +7945,14 @@ static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags) u64 extra_flags = chunk_to_extended(flags) & BTRFS_EXTENDED_PROFILE_MASK; + write_seqlock(&fs_info->profiles_lock); if (flags & BTRFS_BLOCK_GROUP_DATA) fs_info->avail_data_alloc_bits &= ~extra_flags; if (flags & BTRFS_BLOCK_GROUP_METADATA) fs_info->avail_metadata_alloc_bits &= ~extra_flags; if (flags & BTRFS_BLOCK_GROUP_SYSTEM) fs_info->avail_system_alloc_bits &= ~extra_flags; + write_sequnlock(&fs_info->profiles_lock); } int btrfs_remove_block_group(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 15f6efd..65f22c2 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1372,14 +1372,19 @@ int btrfs_rm_device(struct btrfs_root *root, char *device_path) u64 devid; u64 num_devices; u8 *dev_uuid; + unsigned seq; int ret = 0; bool clear_super = false; mutex_lock(&uuid_mutex); - all_avail = root->fs_info->avail_data_alloc_bits | - root->fs_info->avail_system_alloc_bits | - root->fs_info->avail_metadata_alloc_bits; + do 
{ + seq = read_seqbegin(&root->fs_info->profiles_lock); + + all_avail = root->fs_info->avail_data_alloc_bits | + root->fs_info->avail_system_alloc_bits | +
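The read-retry pattern that this patch wraps around the avail_*_alloc_bits fields can be illustrated outside the kernel. What follows is only a minimal userspace sketch using C11 atomics, not the kernel's seqlock_t implementation: the struct and the helpers `profiles_set()` and `profiles_all_avail()` are hypothetical names, and a single writer is assumed (the kernel's write_seqlock() additionally serializes writers with a spinlock).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the three avail_*_alloc_bits fields. */
struct profiles {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	_Atomic uint64_t data_bits;
	_Atomic uint64_t metadata_bits;
	_Atomic uint64_t system_bits;
};

/* Writer side, analogous to set_avail_alloc_bits(); single writer assumed. */
static void profiles_set(struct profiles *p, uint64_t data,
			 uint64_t metadata, uint64_t system)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	/* Make the sequence odd so that readers know a write is in flight. */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_fetch_or_explicit(&p->data_bits, data, memory_order_relaxed);
	atomic_fetch_or_explicit(&p->metadata_bits, metadata, memory_order_relaxed);
	atomic_fetch_or_explicit(&p->system_bits, system, memory_order_relaxed);

	/* Make the sequence even again; this publishes the update. */
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader side, analogous to the read_seqbegin()/read_seqretry() loop. */
static uint64_t profiles_all_avail(struct profiles *p)
{
	unsigned int before, after;
	uint64_t all;

	do {
		before = atomic_load_explicit(&p->seq, memory_order_acquire);
		all = atomic_load_explicit(&p->data_bits, memory_order_relaxed) |
		      atomic_load_explicit(&p->metadata_bits, memory_order_relaxed) |
		      atomic_load_explicit(&p->system_bits, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&p->seq, memory_order_relaxed);
		/* Retry if a write was in progress or completed meanwhile. */
	} while ((before & 1) || before != after);

	return all;
}
```

Readers never block the writer here; they simply repeat the cheap read when they detect a concurrent update, which suits data that is read often and written rarely.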
[PATCH V2 10/10] Btrfs: use bit operation for ->fs_state
There is no lock to protect fs_info->fs_state, which can cause problems: when
several tasks modify it at the same time, the value written by one task may be
overwritten by another. For example:

	Task0 - CPU0			Task1 - CPU1
	mov %fs_state rax
	or $0x1 rax
					mov %fs_state rax
					or $0x2 rax
	mov rax %fs_state
					mov rax %fs_state

The expected value is 3, but in fact it is 2.

Although this problem cannot happen now (because there is currently only one
flag), the code is error prone: if we add other flags, the above problem is
certain to happen.

Now we use bit operations on it to fix the above problem. This makes the code
more robust and makes it easy to add new flags.

Signed-off-by: Miao Xie
---
Changelog v1 -> v2:
- modify the changelog to make it clearer and more rigorous.
---
 fs/btrfs/ctree.h       | 4 +++-
 fs/btrfs/disk-io.c     | 5 +++--
 fs/btrfs/file.c        | 2 +-
 fs/btrfs/scrub.c       | 2 +-
 fs/btrfs/super.c       | 4 ++--
 fs/btrfs/transaction.c | 9 -
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c95b539..c34e36e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -338,7 +338,9 @@ static inline unsigned long btrfs_chunk_item_size(int num_stripes)
 /*
  * File system states
  */
+#define BTRFS_FS_STATE_ERROR	0
 
+/* Super block flags */
 /* Errors detected */
 #define BTRFS_SUPER_FLAG_ERROR		(1ULL << 2)
 
@@ -1540,7 +1542,7 @@ struct btrfs_fs_info {
 	u64 qgroup_seq;
 
 	/* filesystem state */
-	u64 fs_state;
+	unsigned long fs_state;
 
 	struct btrfs_delayed_root *delayed_root;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a7797ed..caf329c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2196,7 +2196,8 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 
 	/* check FS state, whether FS is broken. */
-	fs_info->fs_state |= btrfs_super_flags(disk_super);
+	if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_ERROR)
+		set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
 
 	ret = btrfs_check_super_valid(fs_info, sb->s_flags & MS_RDONLY);
 	if (ret) {
@@ -3354,7 +3355,7 @@ int close_ctree(struct btrfs_root *root)
 			printk(KERN_ERR "btrfs: commit super ret %d\n", ret);
 	}
 
-	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
 		btrfs_error_commit_super(root);
 
 	btrfs_put_block_group_cache(fs_info);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 910ea99..796fd79 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1531,7 +1531,7 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
 	 * although we have opened a file as writable, we have
 	 * to stop this write operation to ensure FS consistency.
 	 */
-	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state)) {
 		mutex_unlock(&inode->i_mutex);
 		err = -EROFS;
 		goto out;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index af0b566..2e91b56 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2700,7 +2700,7 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx,
 	int ret;
 	struct btrfs_root *root = sctx->dev_root;
 
-	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
+	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
 		return -EIO;
 
 	gen = atomic64_read(&root->fs_info->last_trans_committed);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index c96f132..6528482 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -98,7 +98,7 @@ static void __save_error_info(struct btrfs_fs_info *fs_info)
 	 * today we only save the error info into ram. Long term we'll
 	 * also send it down to the disk
 	 */
-	fs_info->fs_state = BTRFS_SUPER_FLAG_ERROR;
+	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
 }
 
 static void save_error_info(struct btrfs_fs_info *fs_info)
@@ -114,7 +114,7 @@ static void btrfs_handle_error(struct btrfs_fs_info *fs_info)
 	if (sb->s_flags & MS_RDONLY)
 		return;
 
-	if (fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR) {
+	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
 		sb->s_flags |= MS_RDONLY;
 		printk(KERN_INFO "btrfs is forced readonly\n");
 		/*
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 29fdf1c..50437b4 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -62,7 +62,7 @@ static noinline int join_transaction(struct btrfs_root *root, int type)
 	spin_lock(&fs_info->trans_lock);
 loop:
 	/*
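The lost update described in the fs_state changelog, and the reason an atomic bit operation avoids it, can be replayed in a small userspace sketch. This is an analogy only: C11 `atomic_fetch_or()` stands in for the kernel's set_bit(), the helper names are hypothetical, and `STATE_OTHER` is a made-up second flag that does not exist in the patch.

```c
#include <stdatomic.h>

/* Bit numbers, in the style of BTRFS_FS_STATE_ERROR. STATE_OTHER is a
 * made-up second flag, used here only to show the race. */
#define STATE_ERROR 0
#define STATE_OTHER 1

/* Userspace stand-in for set_bit(): one atomic read-modify-write. */
static void state_set_bit(int nr, atomic_ulong *state)
{
	atomic_fetch_or(state, 1UL << nr);
}

/* Userspace stand-in for test_bit(). */
static int state_test_bit(int nr, atomic_ulong *state)
{
	return (int)((atomic_load(state) >> nr) & 1UL);
}

/* Replays the interleaving from the changelog with plain loads and stores. */
static unsigned long replay_lost_update(void)
{
	unsigned long fs_state = 0;
	unsigned long rax0, rax1;

	rax0 = fs_state;	/* Task0: mov %fs_state rax */
	rax0 |= 0x1;		/* Task0: or $0x1 rax */
	rax1 = fs_state;	/* Task1: mov %fs_state rax */
	rax1 |= 0x2;		/* Task1: or $0x2 rax */
	fs_state = rax0;	/* Task0: mov rax %fs_state */
	fs_state = rax1;	/* Task1: mov rax %fs_state */

	return fs_state;	/* Task0's bit has been lost */
}
```

With the plain sequence the final value is 2 rather than 3. With `state_set_bit()` each update is a single indivisible operation, so both bits survive whatever the interleaving.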
Re: [PATCH 2/2] btrfs-progs: remove btrfslabel.[c|h]
On Tue, 29 Jan 2013 14:24:13 +0800, Jeff Liu wrote: > Clean btrfslabel.[c|h] out of the source tree and move those related > functions to utils.[c|h]. > > Signed-off-by: Jie Liu > CC: David Sterba > CC: Gene Czarcinski > --- > Makefile |4 +- > btrfslabel.c | 178 > - > btrfslabel.h |5 -- > cmds-filesystem.c |1 - > utils.c | 129 ++ > utils.h |2 + > 6 files changed, 133 insertions(+), 186 deletions(-) > delete mode 100644 btrfslabel.c > delete mode 100644 btrfslabel.h > > diff --git a/Makefile b/Makefile > index 4894903..e54b21e 100644 > --- a/Makefile > +++ b/Makefile > @@ -4,8 +4,8 @@ CFLAGS = -g -O1 > objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ > root-tree.o dir-item.o file-item.o inode-item.o \ > inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \ > - volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \ > - send-stream.o send-utils.o qgroup.o > + volumes.o utils.o btrfs-list.o repair.o send-stream.o \ > + send-utils.o qgroup.o > cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o > \ > cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ > cmds-quota.o cmds-qgroup.o > diff --git a/btrfslabel.c b/btrfslabel.c > deleted file mode 100644 > index 2826050..000 > --- a/btrfslabel.c > +++ /dev/null > @@ -1,178 +0,0 @@ > -/* > - * Copyright (C) 2008 Morey Roof. All rights reserved. > - * > - * This program is free software; you can redistribute it and/or > - * modify it under the terms of the GNU General Public > - * License v2 as published by the Free Software Foundation. > - * > - * This program is distributed in the hope that it will be useful, > - * but WITHOUT ANY WARRANTY; without even the implied warranty of > - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > - * General Public License for more details. 
> - * > - * You should have received a copy of the GNU General Public > - * License along with this program; if not, write to the > - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, > - * Boston, MA 021110-1307, USA. > - */ > - > -#define _GNU_SOURCE > - > -#ifndef __CHECKER__ > -#include > -#include > -#include "ioctl.h" > -#endif /* __CHECKER__ */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include "kerncompat.h" > -#include "ctree.h" > -#include "utils.h" > -#include "version.h" > -#include "disk-io.h" > -#include "transaction.h" > - > -#define MOUNTED1 > -#define UNMOUNTED 2 > -#define GET_LABEL 3 > -#define SET_LABEL 4 > - > -static int set_label_unmounted(const char *dev, const char *label) > -{ > - struct btrfs_trans_handle *trans; > - struct btrfs_root *root; > - int ret; > - > - ret = check_mounted(dev); > - if (ret < 0) { > -fprintf(stderr, "FATAL: error checking %s mount status\n", dev); > -return -1; > - } > - if (ret > 0) { > - fprintf(stderr, "ERROR: dev %s is mounted, use mount point\n", > - dev); > - return -1; > - } > - > - if (strlen(label) > BTRFS_LABEL_SIZE - 1) { > - fprintf(stderr, "ERROR: Label %s is too long (max %d)\n", > - label, BTRFS_LABEL_SIZE - 1); > - return -1; > - } > - > - /* Open the super_block at the default location > - * and as read-write. > - */ > - root = open_ctree(dev, 0, 1); > - if (!root) /* errors are printed by open_ctree() */ > - return -1; > - > - trans = btrfs_start_transaction(root, 1); > - snprintf(root->fs_info->super_copy.label, BTRFS_LABEL_SIZE, "%s", > - label); > - btrfs_commit_transaction(trans, root); > - > - /* Now we close it since we are done. 
*/ > - close_ctree(root); > - return 0; > -} > - > -static int set_label_mounted(const char *mount_path, const char *label) > -{ > - int fd; > - > - fd = open(mount_path, O_RDONLY | O_NOATIME); > - if (fd < 0) { > - fprintf(stderr, "ERROR: unable access to '%s'\n", mount_path); > - return -1; > - } > - > - if (ioctl(fd, BTRFS_IOC_SET_FSLABEL, label) < 0) { > - fprintf(stderr, "ERROR: unable to set label %s\n", > - strerror(errno)); > - close(fd); > - return -1; > - } > - > - return 0; > -} > - > -static int get_label_unmounted(const char *dev) > -{ > - struct btrfs_root *root; > - int ret; > - > - ret = check_mounted(dev); > - if (ret < 0) { > -fprintf(stderr, "FATAL: error checking %s mount status\n", dev); > -return -1; > - }
Re: [PATCH 2/2] btrfs-progs: remove btrfslabel.[c|h]
On 01/29/2013 06:26 PM, Stefan Behrens wrote: > On Tue, 29 Jan 2013 14:24:13 +0800, Jeff Liu wrote: >> Clean btrfslabel.[c|h] out of the source tree and move those related >> functions to utils.[c|h]. >> >> Signed-off-by: Jie Liu >> CC: David Sterba >> CC: Gene Czarcinski >> --- >> Makefile |4 +- >> btrfslabel.c | 178 >> - >> btrfslabel.h |5 -- >> cmds-filesystem.c |1 - >> utils.c | 129 ++ >> utils.h |2 + >> 6 files changed, 133 insertions(+), 186 deletions(-) >> delete mode 100644 btrfslabel.c >> delete mode 100644 btrfslabel.h >> >> diff --git a/Makefile b/Makefile >> index 4894903..e54b21e 100644 >> --- a/Makefile >> +++ b/Makefile >> @@ -4,8 +4,8 @@ CFLAGS = -g -O1 >> objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \ >>root-tree.o dir-item.o file-item.o inode-item.o \ >>inode-map.o crc32c.o rbtree.o extent-cache.o extent_io.o \ >> - volumes.o utils.o btrfs-list.o btrfslabel.o repair.o \ >> - send-stream.o send-utils.o qgroup.o >> + volumes.o utils.o btrfs-list.o repair.o send-stream.o \ >> + send-utils.o qgroup.o >> cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o >> cmds-scrub.o \ >> cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \ >> cmds-quota.o cmds-qgroup.o >> diff --git a/btrfslabel.c b/btrfslabel.c >> deleted file mode 100644 >> index 2826050..000 >> --- a/btrfslabel.c >> +++ /dev/null >> @@ -1,178 +0,0 @@ >> -/* >> - * Copyright (C) 2008 Morey Roof. All rights reserved. >> - * >> - * This program is free software; you can redistribute it and/or >> - * modify it under the terms of the GNU General Public >> - * License v2 as published by the Free Software Foundation. >> - * >> - * This program is distributed in the hope that it will be useful, >> - * but WITHOUT ANY WARRANTY; without even the implied warranty of >> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU >> - * General Public License for more details. 
>> - * >> - * You should have received a copy of the GNU General Public >> - * License along with this program; if not, write to the >> - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, >> - * Boston, MA 021110-1307, USA. >> - */ >> - >> -#define _GNU_SOURCE >> - >> -#ifndef __CHECKER__ >> -#include >> -#include >> -#include "ioctl.h" >> -#endif /* __CHECKER__ */ >> - >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include >> -#include "kerncompat.h" >> -#include "ctree.h" >> -#include "utils.h" >> -#include "version.h" >> -#include "disk-io.h" >> -#include "transaction.h" >> - >> -#define MOUNTED1 >> -#define UNMOUNTED 2 >> -#define GET_LABEL 3 >> -#define SET_LABEL 4 >> - >> -static int set_label_unmounted(const char *dev, const char *label) >> -{ >> -struct btrfs_trans_handle *trans; >> -struct btrfs_root *root; >> -int ret; >> - >> -ret = check_mounted(dev); >> -if (ret < 0) { >> - fprintf(stderr, "FATAL: error checking %s mount status\n", dev); >> - return -1; >> -} >> -if (ret > 0) { >> -fprintf(stderr, "ERROR: dev %s is mounted, use mount point\n", >> -dev); >> -return -1; >> -} >> - >> -if (strlen(label) > BTRFS_LABEL_SIZE - 1) { >> -fprintf(stderr, "ERROR: Label %s is too long (max %d)\n", >> -label, BTRFS_LABEL_SIZE - 1); >> -return -1; >> -} >> - >> -/* Open the super_block at the default location >> - * and as read-write. >> - */ >> -root = open_ctree(dev, 0, 1); >> -if (!root) /* errors are printed by open_ctree() */ >> -return -1; >> - >> -trans = btrfs_start_transaction(root, 1); >> -snprintf(root->fs_info->super_copy.label, BTRFS_LABEL_SIZE, "%s", >> - label); >> -btrfs_commit_transaction(trans, root); >> - >> -/* Now we close it since we are done. 
*/ >> -close_ctree(root); >> -return 0; >> -} >> - >> -static int set_label_mounted(const char *mount_path, const char *label) >> -{ >> -int fd; >> - >> -fd = open(mount_path, O_RDONLY | O_NOATIME); >> -if (fd < 0) { >> -fprintf(stderr, "ERROR: unable access to '%s'\n", mount_path); >> -return -1; >> -} >> - >> -if (ioctl(fd, BTRFS_IOC_SET_FSLABEL, label) < 0) { >> -fprintf(stderr, "ERROR: unable to set label %s\n", >> -strerror(errno)); >> -close(fd); >> -return -1; >> -} >> - >> -return 0; >> -} >> - >> -static int get_label_unmounted(const char *dev) >> -{ >> -struct btrfs_root *root; >> -int ret; >> - >> -ret = check_mounted(
About Chunk Tree recover
Hi, everyone.

About 1 year ago, we implemented the chunk tree recovery function, but it has
not been applied so far because that implementation needs to change the disk
format.
(http://marc.info/?l=linux-btrfs&m=129914269932543&w=2
 http://marc.info/?l=linux-btrfs&m=130976668006281&w=2
 http://marc.info/?l=linux-btrfs&m=129914269932543&w=2)

Recently, I reconsidered the implementation of this function and found a new
approach that does not need to change the disk format: an external chunk tree
backup, just like the external journal device of ext4. The basic idea is:
- Specify an external file or device at mount time, which is used to back up
  the chunk tree.
- At mount time, compare the super block in the external file/device with the
  super block of the btrfs filesystem. If the checksum of the super block in
  the external file/device is right, and its FS UUID and generation are the
  same as those of the fs, the chunk tree in the external file/device is
  valid and need not be rebuilt. Otherwise, we rebuild the chunk tree in the
  external file/device according to the chunk tree of the fs.
- When we allocate a new chunk, we log the new chunk information into the
  external file/device.
- Sync the external file/device when committing the transaction.
- If the chunk tree of the fs is corrupted, we use the information in the
  external file/device to recover it.

This way, we need not change the disk format, and we also avoid a block
device scan, which takes a lot of time and makes it very hard to find the
start address and length of a chunk.

Any comments on this idea?

Thanks
Miao
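The mount-time validity check from the proposal above could look roughly like the following. This is a sketch under stated assumptions, not code from any posted patch: `struct sb_view`, `check_chunk_backup()` and the `csum_ok` field are hypothetical and greatly simplified, and the checksum is assumed to have been verified elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FSID_SIZE 16

/* Hypothetical, simplified view of a super block; this is NOT the
 * on-disk btrfs super block layout. */
struct sb_view {
	uint8_t fsid[FSID_SIZE];
	uint64_t generation;
	bool csum_ok;		/* checksum already verified by the caller */
};

enum backup_action {
	BACKUP_REUSE,		/* external chunk tree is valid, keep it */
	BACKUP_REBUILD,		/* rebuild it from the chunk tree of the fs */
};

/* The backup is trusted only if its super block checksum is right and
 * its FS UUID and generation both match the filesystem's super block. */
static enum backup_action check_chunk_backup(const struct sb_view *backup,
					     const struct sb_view *fs)
{
	if (backup->csum_ok &&
	    memcmp(backup->fsid, fs->fsid, FSID_SIZE) == 0 &&
	    backup->generation == fs->generation)
		return BACKUP_REUSE;

	return BACKUP_REBUILD;
}
```

Comparing the generation is what catches a stale backup: if the filesystem was mounted without the external device in between, the generations differ and the backup is rebuilt rather than trusted.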
Re: [PATCH 02/12] Btrfs-progs: move printing subvol list outside of btrfs_list_subvols
Hi,

> To improve the code reuse it's better to have btrfs_list_subvols
> just return list of subvols without printing
>
> Signed-off-by: Anand Jain
> ---
>  btrfs-list.c     | 28 ++--
>  btrfs-list.h     |  2 +-
>  cmds-subvolume.c |  4 ++--
>  3 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/btrfs-list.c b/btrfs-list.c
> index cb42fbc..b404e1d 100644
> --- a/btrfs-list.c
> +++ b/btrfs-list.c
> @@ -1439,15 +1439,11 @@ static void print_all_volume_info(struct root_lookup *sorted_tree,
>  	}
>  }
>
> -int btrfs_list_subvols(int fd, struct btrfs_list_filter_set *filter_set,
> -		       struct btrfs_list_comparer_set *comp_set,
> -		       int is_tab_result)
> +int btrfs_list_subvols(int fd, struct root_lookup *root_lookup)
>  {
> -	struct root_lookup root_lookup;
> -	struct root_lookup root_sort;
>  	int ret;
>
> -	ret = __list_subvol_search(fd, &root_lookup);
> +	ret = __list_subvol_search(fd, root_lookup);
>  	if (ret) {
>  		fprintf(stderr, "ERROR: can't perform the search - %s\n",
>  			strerror(errno));
> @@ -1458,16 +1454,28 @@ int btrfs_list_subvols(int fd, struct btrfs_list_filter_set *filter_set,
>  	 * now we have an rbtree full of root_info objects, but we need to fill
>  	 * in their path names within the subvol that is referencing each one.
>  	 */
> -	ret = __list_subvol_fill_paths(fd, &root_lookup);
> -	if (ret < 0)
> -		return ret;
> +	ret = __list_subvol_fill_paths(fd, root_lookup);
> +	return ret;
> +}
>
> +int btrfs_list_subvols_print(int fd, struct btrfs_list_filter_set *filter_set,
> +		       struct btrfs_list_comparer_set *comp_set,
> +		       int is_tab_result)
> +{
> +	struct root_lookup root_lookup;
> +	struct root_lookup root_sort;
> +	int ret;
> +
> +	ret = btrfs_list_subvols(fd, &root_lookup);
> +	if (ret)
> +		return ret;
>  	__filter_and_sort_subvol(&root_lookup, &root_sort, filter_set,
>  				 comp_set, fd);
>
>  	print_all_volume_info(&root_sort, is_tab_result);
>  	__free_all_subvolumn(&root_lookup);

Here we forgot to free filter_set and comp_set before. I hope you can add
that to your patchset; maybe it could be patch 13:

	if (filter_set)
		btrfs_list_free_filter_set(filter_set);
	if (comp_set)
		btrfs_list_free_comparer_set(comp_set);

Thanks,
Wang

> -	return ret;
> +
> +	return 0;
>  }
>
>  static int print_one_extent(int fd, struct btrfs_ioctl_search_header *sh,
> diff --git a/btrfs-list.h b/btrfs-list.h
> index cde4b3c..71fe0f3 100644
> --- a/btrfs-list.h
> +++ b/btrfs-list.h
> @@ -98,7 +98,7 @@ int btrfs_list_setup_comparer(struct btrfs_list_comparer_set **comp_set,
>  			      enum btrfs_list_comp_enum comparer,
>  			      int is_descending);
>
> -int btrfs_list_subvols(int fd, struct btrfs_list_filter_set *filter_set,
> +int btrfs_list_subvols_print(int fd, struct btrfs_list_filter_set *filter_set,
>  		       struct btrfs_list_comparer_set *comp_set,
>  		       int is_tab_result);
>  int btrfs_list_find_updated_files(int fd, u64 root_id, u64 oldest_gen);
> diff --git a/cmds-subvolume.c b/cmds-subvolume.c
> index e3cdb1e..c35dff7 100644
> --- a/cmds-subvolume.c
> +++ b/cmds-subvolume.c
> @@ -406,7 +406,7 @@ static int cmd_subvol_list(int argc, char **argv)
>  					BTRFS_LIST_FILTER_TOPID_EQUAL,
>  					top_id);
>
> -	ret = btrfs_list_subvols(fd, filter_set, comparer_set,
> +	ret = btrfs_list_subvols_print(fd, filter_set, comparer_set,
>  				is_tab_result);
>  	if (ret)
>  		return 19;
> @@ -613,7 +613,7 @@ static int cmd_subvol_get_default(int argc, char **argv)
>  	btrfs_list_setup_filter(&filter_set, BTRFS_LIST_FILTER_ROOTID,
>  				default_id);
>
> -	ret = btrfs_list_subvols(fd, filter_set, NULL, 0);
> +	ret = btrfs_list_subvols_print(fd, filter_set, NULL, 0);
>  	if (ret)
>  		return 19;
>  	return 0;
Re: [PATCH, RFC] mkfs: collapse redundant logic in custom_alloc_extent()
Hi Eric,

On Fri, Jan 25, 2013 at 5:57 PM, Eric Sandeen wrote:
> It looks to me like the logic in these two if statements are
> overlapping.
>
> The test for flags & BTRFS_BLOCK_GROUP_SYSTEM in the 2nd case
> should never get triggered, because it would have triggered
> on the first case, right?
>
> And since the actions are identical in both cases, this can be
> collapsed into one.
>
> Signed-off-by: Eric Sandeen
> ---
>
> p.s.
>
> Having done that, I now look at the nearly identical
> custom_alloc_extent() copy in convert.c, and wonder if it's
> intentional that the convert copy does not care about
> BTRFS_BLOCK_GROUP_METADATA, but mkfs does? I'm not quite
> sure what's going on there.

When I was digging in the conversion code, I saw that
btrfs_make_block_groups() uses some heuristics to define some block
groups as DATA and some as METADATA. But later, the conversion code,
as you noticed, doesn't care about this, and there are data
EXTENT_ITEMs that land in METADATA block groups. I am not sure if this
is a problem or not, I asked here:
http://www.spinics.net/lists/linux-btrfs/msg19894.html, but got no
answers:(

Since then I tried to develop my own version of convert that lays out
block groups more properly, but it makes some assumptions on the free
space on the block device being converted.

Alex.

>
> Thanks,
> -Eric
>
> diff --git a/mkfs.c b/mkfs.c
> index ca850d9..5d77428 100644
> --- a/mkfs.c
> +++ b/mkfs.c
> @@ -635,14 +635,10 @@ static int custom_alloc_extent(struct btrfs_root *root, u64 num_bytes,
>
>  		cache = btrfs_lookup_block_group(root->fs_info, start);
>  		BUG_ON(!cache);
> -		if (cache->flags & BTRFS_BLOCK_GROUP_SYSTEM ||
> -		    last > cache->key.objectid + cache->key.offset) {
> -			last = cache->key.objectid + cache->key.offset;
> -			continue;
> -		}
>
>  		if (cache->flags & (BTRFS_BLOCK_GROUP_SYSTEM |
> -				    BTRFS_BLOCK_GROUP_METADATA)) {
> +				    BTRFS_BLOCK_GROUP_METADATA) ||
> +		    last > cache->key.objectid + cache->key.offset) {
>  			last = cache->key.objectid + cache->key.offset;
>  			continue;
>  		}
>
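Eric's claim that the two tests in mkfs.c can be collapsed is easy to check exhaustively. The sketch below is an illustration, not part of the patch: `skip_original()`, `skip_collapsed()` and `count_mismatches()` are hypothetical helpers, `past_end` stands for the condition `last > cache->key.objectid + cache->key.offset`, and the flag values are the usual btrfs block group bits (their exact values do not matter for the argument).

```c
#include <stdbool.h>
#include <stdint.h>

#define BTRFS_BLOCK_GROUP_DATA		(1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM	(1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA	(1ULL << 2)

/* Logic before the patch: two consecutive tests with identical actions. */
static bool skip_original(uint64_t flags, bool past_end)
{
	if ((flags & BTRFS_BLOCK_GROUP_SYSTEM) || past_end)
		return true;
	if (flags & (BTRFS_BLOCK_GROUP_SYSTEM | BTRFS_BLOCK_GROUP_METADATA))
		return true;
	return false;
}

/* Logic after the patch: a single combined test. */
static bool skip_collapsed(uint64_t flags, bool past_end)
{
	return (flags & (BTRFS_BLOCK_GROUP_SYSTEM |
			 BTRFS_BLOCK_GROUP_METADATA)) || past_end;
}

/* Counts the inputs on which the two versions disagree. */
static int count_mismatches(void)
{
	int mismatches = 0;

	/* All combinations of the three type bits and the range test. */
	for (uint64_t flags = 0; flags < 8; flags++)
		for (int past_end = 0; past_end <= 1; past_end++)
			if (skip_original(flags, past_end) !=
			    skip_collapsed(flags, past_end))
				mismatches++;

	return mismatches;
}
```

Since both branches perform the same action (`last = ...; continue;`), agreement on the skip decision for every input is enough to show that the collapsed form behaves identically.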
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On Mon, Jan 28, 2013 at 07:30:09PM -0700, Liu Bo wrote:
> On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
> > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > > Hi Josef,
> > >
> > > Thanks for the patch - sorry for the long delay in testing...
> > >
> >
> > Jim,
> >
> > I've been trying to reason out how this happens, could you do a btrfs fi df on
> > the filesystem that's giving you trouble so I can see if what I think is
> > happening is what's actually happening. Thanks,
>
> Josef,
>
> A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib
>

251 [not run] FSTRIM is not supported

Are you sure it's 251? Thanks,

Josef
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
> On Mon, Jan 28, 2013 at 07:30:09PM -0700, Liu Bo wrote:
> > On Mon, Jan 28, 2013 at 04:23:31PM -0500, Josef Bacik wrote:
> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > > > Hi Josef,
> > > >
> > > > Thanks for the patch - sorry for the long delay in testing...
> > > >
> > >
> > > Jim,
> > >
> > > I've been trying to reason out how this happens, could you do a btrfs fi df on
> > > the filesystem that's giving you trouble so I can see if what I think is
> > > happening is what's actually happening. Thanks,
> >
> > Josef,
> >
> > A quick reproducer here: running xfstests 251 with autodefrag,compress=zlib
> >
> >
> 251 [not run] FSTRIM is not supported
>
> Are you sure it's 251? Thanks,

Sorry it's early, I need a device that does trim.

/me waits for his fusion card to get back from the shop,

Josef
Re: [PATCH] [RFC] Add static compile target
On Tue, Jan 29, 2013 at 12:31:53AM +0100, Ian Kumlien wrote:
> This means that dists are stripping binaries...
> In which case it would be no problem to have them build the static target,
> perhaps we could try to verify if they are available and build btrfs.static
> and btrfsck.static if possible

I like that, keeping the .static versions alongside the dynamic ones, just
for rescue purposes. (And to reduce the total file size even further, merge
the fsck functionality into 'btrfs', but this is not a primary goal now.)

david
Re: corrupted file size on inline extent conversion?
On Mon, Jan 28, 2013 at 05:12:12PM -0700, Sage Weil wrote:
> A ceph user observed an incorrect i_size on btrfs. The pattern looks like
> this:
>
> - some writes at low file offsets
> - a write to 4185600 len 8704 (i_size should be 4MB)
> - more writes to low offsets
> - a write to 4181504 len 4096 (abuts the write above)
> - a bit of time goes by...
> - stat returns 4186112 (4MB - 8192)
> - that's a few bytes to the right of the top write above.
>
> There are some logs showing the full read/write activity to the file at
>
> 	http://tracker.newdream.net/attachments/658/object_log.txt
>
> on issue
>
> 	http://tracker.newdream.net/issues/3810
>
> The kernel was 3.7.0-030700-generic (and probably also observed on 3.7.1).
>
> Is this a known bug?

Not known but I took a long hard look at our ordered i size updating and I
think I spotted the bug. Could you run this patch and see if you get the
printk? If you do then that was the problem and you should be good to go. It
definitely needs to be fixed, hopefully it's also your bug. Thanks,

Josef

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index cbd4838..dbd4905 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -895,8 +895,14 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 	 * if the disk i_size is already at the inode->i_size, or
 	 * this ordered extent is inside the disk i_size, we're done
 	 */
-	if (disk_i_size == i_size || offset <= disk_i_size) {
+	if (disk_i_size == i_size)
 		goto out;
+
+	if (offset <= disk_i_size) {
+		if (ordered && ordered->outstanding_isize > disk_i_size)
+			printk(KERN_ERR "this would have bitten us in the ass\n");
+		else
+			goto out;
 	}
 
 	/*
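The offsets in Sage's report can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration, not btrfs code: it computes the i_size one would expect after a series of writes to an initially empty file, namely the highest end offset of any write.

```c
#include <stddef.h>
#include <stdint.h>

struct write_op {
	uint64_t offset;
	uint64_t len;
};

/* Expected i_size after the given writes: the highest end offset. */
static uint64_t expected_i_size(const struct write_op *writes, size_t n)
{
	uint64_t size = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = writes[i].offset + writes[i].len;

		if (end > size)
			size = end;
	}

	return size;
}
```

For the two high writes in the report, 4185600 + 8704 = 4194304 (4MB), and 4181504 + 4096 = 4185600, so the second write ends exactly where the first begins. The reported size of 4186112 is 8192 bytes short of 4MB and 512 bytes past the start of the first write.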
Re: [PATCH 1/2] btrfs-progs: refactor check_label()
On Tue, Jan 29, 2013 at 02:24:12PM +0800, Jeff Liu wrote:
> --- a/utils.c
> +++ b/utils.c
> @@ -1122,17 +1122,21 @@ char *pretty_sizes(u64 size)
>    -1 if the label is too long
>    -2 if the label contains an invalid character
>   */
> -int check_label(char *input)
> +static int check_label(char *input)
>  {
>  	int i;
>  	int len = strlen(input);
>
> -	if (len > BTRFS_LABEL_SIZE) {
> +	if (len > BTRFS_LABEL_SIZE - 1) {
> +		fprintf(stderr, "ERROR: Label %s is too long (max %d)\n",
> +			input, BTRFS_LABEL_SIZE - 1);
>  		return -1;
>  	}
>
>  	for (i = 0; i < len; i++) {
>  		if (input[i] == '/' || input[i] == '\\') {
> +			fprintf(stderr, "ERROR: Label %s contains invalid "
> +				"characters\n", input);
>  			return -2;
>  		}

Please drop this check, see

http://repo.or.cz/w/btrfs-progs-unstable/devel.git/commit/79e0e445fc2365e47fc7f060d5a4445d37e184b8

(also the function comment and maybe the callers)

"btrfs-progs: kill check for /'s in labels

This patch kills a check in mkfs's label stuff which doesn't allow
labels that have /'s in them. This causes problems for Anaconda
which try to label volumes with their mountpoints."

(mkfs.c)

thanks,
david
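With the slash check dropped as David asks, check_label() reduces to a length test. The following is only a guess at the resulting shape, not the code that was eventually committed; BTRFS_LABEL_SIZE is defined locally as 256 on the assumption that it matches the btrfs headers, and the parameter is made const for the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Assumed to match the btrfs headers: label buffer size, including the
 * terminating NUL. */
#define BTRFS_LABEL_SIZE 256

/*
 * Checks the label length only; '/' and '\\' are accepted.
 * Returns 0 if the label is valid, -1 if it is too long.
 */
static int check_label(const char *input)
{
	size_t len = strlen(input);

	if (len > BTRFS_LABEL_SIZE - 1) {
		fprintf(stderr, "ERROR: Label %s is too long (max %d)\n",
			input, BTRFS_LABEL_SIZE - 1);
		return -1;
	}

	return 0;
}
```

A label such as "/mnt/data" now passes, which is the Anaconda use case mentioned in the quoted commit message, while a label of 256 or more characters is still rejected.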
Re: About Chunk Tree recover
On Tue, Jan 29, 2013 at 04:13:47AM -0700, Miao Xie wrote:
> Hi, everyone.
>
> About 1 year ago, we implemented the chunk tree recovery function,
> but it has not been applied so far because that implementation
> needs to change the disk format.
> (http://marc.info/?l=linux-btrfs&m=129914269932543&w=2
>  http://marc.info/?l=linux-btrfs&m=130976668006281&w=2
>  http://marc.info/?l=linux-btrfs&m=129914269932543&w=2)
>
> Recently, I reconsidered the implementation of this function and
> found a new approach that does not need to change the disk format:
> an external chunk tree backup, just like the external journal device
> of ext4. The basic idea is:

I do like the idea of a dedicated chunk backup area, outside the
filesystem. But, I think we need to be able to fall back to the
scanning operation.

The chunk tree backup actually fits well into the log area I'm setting
up for raid5/6. The log area is really just a dedicated chunk where I'm
stuffing blocks to avoid read/modify/write and to make sub stripe writes
power cut safe.

-chris
Fwd: btrfsck and ctree version
I know that the proposed ctree.c file is from a kernel source tree while
btrfsck is user space only. Since btrfs-next is newer than btrfs-progs, I
was hoping for a commit of this change to the user-space version.

Since this filesystem was created prior to kernel 3.2, there is no tree root
backup. I was hoping to use btrfsck to regenerate the csums which are failing
at mount time (Input/output error).

/var/log/messages:
btrfs csum failed ino 1048522 off 5124096 csum 1219517398 private 836806197

I didn't find any way to deactivate the csum check with a mount option. Or,
as Chris mentioned, is there a way to regenerate the cache on the block
device?

Is there a solution?

Thanks for your responses,
olivier

2013/1/29 Chris Mason
>
> On Mon, Jan 28, 2013 at 03:03:08PM -0700, David Sterba wrote:
> > On Mon, Jan 28, 2013 at 03:07:13PM +0100, polack christian wrote:
> > > i did use btrfsck to recover it
> > > i got the tool from
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git
> > >
> > > and i got this error message:
> > > ...
> > > Check tree block failed, want=294555648, have=0
> > > Check tree block failed, want=294559744, have=0
> > > Check tree block failed, want=294559744, have=0
> > > btrfsck: ctree.c:1690: leaf_space_used: Assertion `!(data_len < 0)' failed.
> > > Aborted (core dumped)
> > >
> > > looking at
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git
> >
> > but this is a kernel source repository, not progs, I wonder
> >
> > > this error in ctree.c have been corrected by this commit
> > >
> > > http://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=41be1f3b40b87de33cd2e7463dce88596dbdccc4
> >
> > how this could happen. I have looked at the whether it does not silently
> > fix a bug, nothing wrong I can see now. How did you verify that the
> > patch fixes the fsck problem?
>
> It sounds much more like the reboot or remount cleared the cache on the
> block device.
>
> -chris
[PATCH V6 23/30] btrfs: add support for read_iter and write_iter
btrfs can use generic_file_read_iter(). Base btrfs_file_write_iter() on btrfs_file_aio_write(), then have the latter call the former. Signed-off-by: Dave Kleikamp Cc: Zach Brown Cc: Chris Mason Cc: linux-btrfs@vger.kernel.org --- fs/btrfs/file.c | 42 ++ 1 file changed, 14 insertions(+), 28 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f76b1fd..f23e24b 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -437,7 +437,7 @@ static noinline int btrfs_copy_from_user(loff_t pos, int num_pages, write_bytes -= copied; total_copied += copied; - /* Return to btrfs_file_aio_write to fault page */ + /* Return to btrfs_file_write_iter to fault page */ if (unlikely(copied == 0)) break; @@ -1426,27 +1426,23 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file, } static ssize_t __btrfs_direct_write(struct kiocb *iocb, - const struct iovec *iov, - unsigned long nr_segs, loff_t pos, - loff_t *ppos, size_t count, size_t ocount) +struct iov_iter *iter, loff_t pos, + loff_t *ppos, size_t count) { struct file *file = iocb->ki_filp; - struct iov_iter i; ssize_t written; ssize_t written_buffered; loff_t endbyte; int err; - written = generic_file_direct_write(iocb, iov, &nr_segs, pos, ppos, - count, ocount); + written = generic_file_direct_write_iter(iocb, iter, pos, ppos, count); if (written < 0 || written == count) return written; pos += written; count -= written; - iov_iter_init(&i, iov, nr_segs, count, written); - written_buffered = __btrfs_buffered_write(file, &i, pos); + written_buffered = __btrfs_buffered_write(file, iter, pos); if (written_buffered < 0) { err = written_buffered; goto out; @@ -1481,9 +1477,8 @@ static void update_time_for_write(struct inode *inode) inode_inc_iversion(inode); } -static ssize_t btrfs_file_aio_write(struct kiocb *iocb, - const struct iovec *iov, - unsigned long nr_segs, loff_t pos) +static ssize_t btrfs_file_write_iter(struct kiocb *iocb, +struct iov_iter *iter, loff_t pos) { struct file *file = iocb->ki_filp; struct 
inode *inode = fdentry(file)->d_inode; @@ -1492,19 +1487,14 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, u64 start_pos; ssize_t num_written = 0; ssize_t err = 0; - size_t count, ocount; + size_t count; bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host); sb_start_write(inode->i_sb); mutex_lock(&inode->i_mutex); - err = generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ); - if (err) { - mutex_unlock(&inode->i_mutex); - goto out; - } - count = ocount; + count = iov_iter_count(iter); current->backing_dev_info = inode->i_mapping->backing_dev_info; err = generic_write_checks(file, &pos, &count, S_ISBLK(inode->i_mode)); @@ -1557,14 +1547,10 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb, atomic_inc(&BTRFS_I(inode)->sync_writers); if (unlikely(file->f_flags & O_DIRECT)) { - num_written = __btrfs_direct_write(iocb, iov, nr_segs, - pos, ppos, count, ocount); + num_written = __btrfs_direct_write(iocb, iter, pos, ppos, + count); } else { - struct iov_iter i; - - iov_iter_init(&i, iov, nr_segs, count, num_written); - - num_written = __btrfs_buffered_write(file, &i, pos); + num_written = __btrfs_buffered_write(file, iter, pos); if (num_written > 0) *ppos = pos + num_written; } @@ -2387,9 +2373,9 @@ const struct file_operations btrfs_file_operations = { .llseek = btrfs_file_llseek, .read = do_sync_read, .write = do_sync_write, - .aio_read = generic_file_aio_read, .splice_read= generic_file_splice_read, - .aio_write = btrfs_file_aio_write, + .read_iter = generic_file_read_iter, + .write_iter = btrfs_file_write_iter, .mmap = btrfs_file_mmap, .open = generic_file_open, .release= btrfs_release_file, -- 1.8.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 08:47:30AM -0500, Josef Bacik wrote:
> > 251 [not run] FSTRIM is not supported
> >
> > Are you sure its 251? Thanks,
>
> Sorry it's early, I need a device that does trim. /me waits for his fusion card
> to get back from the shop,

You can use scsi_debug device with

parm: lbpu:enable LBP, support UNMAP command (def=0) (int)

david
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On Tue, Jan 29, 2013 at 05:43:31PM +0100, David Sterba wrote:
> On Tue, Jan 29, 2013 at 08:50:34AM -0500, Josef Bacik wrote:
> You can use scsi_debug device with
>
> parm: lbpu:enable LBP, support UNMAP command (def=0) (int)

Also, a loop device whose backing file is on a filesystem with hole punch support understands TRIM.

david
Re: [PATCH 1/2] btrfs-progs: refactor check_label()
On 01/29/2013 11:19 PM, David Sterba wrote:
> On Tue, Jan 29, 2013 at 02:24:12PM +0800, Jeff Liu wrote:
>> --- a/utils.c
>> +++ b/utils.c
>> @@ -1122,17 +1122,21 @@ char *pretty_sizes(u64 size)
>>    -1 if the label is too long
>>    -2 if the label contains an invalid character
>>  */
>> -int check_label(char *input)
>> +static int check_label(char *input)
>>  {
>>  	int i;
>>  	int len = strlen(input);
>>
>> -	if (len > BTRFS_LABEL_SIZE) {
>> +	if (len > BTRFS_LABEL_SIZE - 1) {
>> +		fprintf(stderr, "ERROR: Label %s is too long (max %d)\n",
>> +			input, BTRFS_LABEL_SIZE - 1);
>>  		return -1;
>>  	}
>>
>>  	for (i = 0; i < len; i++) {
>>  		if (input[i] == '/' || input[i] == '\\') {
>> +			fprintf(stderr, "ERROR: Label %s contains invalid "
>> +				"characters\n", input);
>>  			return -2;
>>  		}
>
> Please drop this check, see
> http://repo.or.cz/w/btrfs-progs-unstable/devel.git/commit/79e0e445fc2365e47fc7f060d5a4445d37e184b8
> (also function comment and maybe the callers)
>
> "btrfs-progs: kill check for /'s in labels
>
> This patch kills a check in mkfs's label stuff which doesn't allow
> labels that have /'s in them. This causes problems for Anaconda which
> try to label volumes with their mountpoints."
> (mkfs.c)

OK, so it looks like we can safely remove this routine from the code base, since no other users call it, if I'm not missing anything.

Thanks,
-Jeff
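For readers following along, here is a minimal standalone sketch of what is left of the label check once the '/' test is dropped as David asks: only the length rule from Jeff's hunk (at most BTRFS_LABEL_SIZE - 1 bytes, leaving room for the terminating NUL). The value 256 for BTRFS_LABEL_SIZE and the helper name check_label_len() are assumptions made for this illustration; this is not the btrfs-progs code.

```c
#include <stdio.h>
#include <string.h>

/* Assumption for this sketch: size of the on-disk label field. */
#define BTRFS_LABEL_SIZE 256

/*
 * Hypothetical helper, not the btrfs-progs function.
 * Returns 0 if the label fits together with its terminating NUL,
 * -1 if it is too long. '/' and '\\' are deliberately accepted.
 */
static int check_label_len(const char *input)
{
	size_t len = strlen(input);

	if (len > BTRFS_LABEL_SIZE - 1) {
		fprintf(stderr, "ERROR: Label is too long (max %d)\n",
			BTRFS_LABEL_SIZE - 1);
		return -1;
	}
	return 0;
}
```

With this rule a mountpoint-style label such as "/mnt/data" is accepted, which is the Anaconda case mentioned in the quoted commit message.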
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On 01/28/2013 02:23 PM, Josef Bacik wrote:
> On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>> Hi Josef,
>>
>> Thanks for the patch - sorry for the long delay in testing...
>>
>
> Jim,
>
> I've been trying to reason out how this happens, could you do a btrfs fi df on
> the filesystem thats giving you trouble so I can see if what I think is
> happening is what's actually happening. Thanks,

Here's an example, using a slightly different kernel than my previous report. It's your btrfs-next master branch (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state") with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).

Here I'm finding the file system in question:

# ls -l /dev/mapper | grep dm-93
lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93

# df -h | grep -A 1 cs53s19p2
/dev/mapper/cs53s19p2
                      896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522

Here's the info you asked for:

# btrfs fi df /ram/mnt/ceph/data.osd.522
Data: total=2.01GB, used=1.00GB
System: total=4.00MB, used=64.00KB
Metadata: total=8.00MB, used=7.56MB

And here's the backtrace that had trouble on dm-93.
It's a little different to my previous report: [ 705.496463] [ cut here ] [ 705.501123] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]() [ 705.509751] Hardware name: X8DTH-i/6/iF/6F [ 705.513862] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000 [ 705.580232] Pid: 33025, comm: ceph-osd Not tainted 3.7.0-00269-gd9acbfd #492 [ 705.587488] Call Trace: [ 705.589957] [] warn_slowpath_common+0x94/0xc0 [ 705.596108] [] ? btrfs_free_path+0x2a/0x40 [btrfs] [ 705.602685] [] warn_slowpath_fmt+0x46/0x50 [ 705.608563] [] __btrfs_abort_transaction+0x60/0x110 [btrfs] [ 705.615994] [] __btrfs_alloc_chunk+0x678/0x710 [btrfs] [ 705.622945] [] btrfs_alloc_chunk+0x5e/0x90 [btrfs] [ 705.629635] [] ? check_system_chunk+0x71/0x130 [btrfs] [ 705.637079] [] do_chunk_alloc+0x2ec/0x370 [btrfs] [ 705.643451] [] ? btrfs_reduce_alloc_profile+0xa9/0x120 [btrfs] [ 705.650951] [] btrfs_check_data_free_space+0x13c/0x2b0 [btrfs] [ 705.658446] [] btrfs_delalloc_reserve_space+0x20/0x60 [btrfs] [ 705.665882] [] __btrfs_buffered_write+0x15e/0x340 [btrfs] [ 705.672952] [] btrfs_file_aio_write+0x309/0x450 [btrfs] [ 705.679889] [] ? __btrfs_direct_write+0x130/0x130 [btrfs] [ 705.686934] [] do_sync_readv_writev+0x94/0xe0 [ 705.692942] [] do_readv_writev+0xe3/0x1e0 [ 705.698604] [] ? 
fget_light+0x122/0x170 [ 705.704093] [] vfs_writev+0x46/0x60 [ 705.709239] [] sys_writev+0x5f/0xc0 [ 705.714388] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 705.720827] [] system_call_fastpath+0x16/0x1b [ 705.726829] ---[ end trace 6e889d6d939ca116 ]--- [ 705.731459] BTRFS warning (device dm-93): __btrfs_alloc_chunk:3787: Aborting unused transaction(error 28). [ 705.741187] btrfs: mapping failed logical 1099431936 bio len 524288 len 65536 [ 705.741192] BTRFS warning (device dm-93): find_free_extent:5948: Aborting unused transaction(Object already exists). [ 705.759185] [ cut here ] [ 705.763929] kernel BUG at fs/btrfs/volumes.c:4891! [ 705.768990] invalid opcode: [#1] SMP [ 705.773561] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput sg joydev sd_mod hid_generic iTCO_wdt iTCO_vendor_support coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod cxgb4 i2c_i801 i2c_core button lpc_ich mfd_core ehci_hcd uhci_hcd i7core_edac edac_core dm_mod ioatdma nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000 [ 705.845121] CPU 22 [ 705.847114] Pid: 21317, comm: btrfs-worker-1 Tainted: GW 3.7.0-00269-gd9acbfd #492 Supermicro X8DTH-i/6/iF/6F/X8DTH [ 705.858886] RIP: 0010:[] [] btrfs_map_bio+0x8d/0x300 [btrfs] [ 705.867928] RSP: 0018:880610ce7c58 EFLAGS: 00010296 [ 705.873363] RAX: 0041 RBX: 88061c
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> On 01/28/2013 02:23 PM, Josef Bacik wrote:
> > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> >> Hi Josef,
> >>
> >> Thanks for the patch - sorry for the long delay in testing...
> >>
> >
> > Jim,
> >
> > I've been trying to reason out how this happens, could you do a btrfs fi df on
> > the filesystem thats giving you trouble so I can see if what I think is
> > happening is what's actually happening. Thanks,
>
> Here's an example, using a slightly different kernel than
> my previous report. It's your btrfs-next master branch
> (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
>
>
> Here I'm finding the file system in question:
>
> # ls -l /dev/mapper | grep dm-93
> lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93
>
> # df -h | grep -A 1 cs53s19p2
> /dev/mapper/cs53s19p2
>                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
>
>
> Here's the info you asked for:
>
> # btrfs fi df /ram/mnt/ceph/data.osd.522
> Data: total=2.01GB, used=1.00GB
> System: total=4.00MB, used=64.00KB
> Metadata: total=8.00MB, used=7.56MB
>

How big is the disk you are using, and what mount options? I have a patch to keep the panic from happening and hopefully the abort, could you try this? I still want to keep the underlying error from happening because it shouldn't be, but no reason I can't fix the error case while you can easily reproduce it :). Thanks,

Josef

From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
From: Josef Bacik
Date: Tue, 29 Jan 2013 15:03:37 -0500
Subject: [PATCH] Btrfs: fix chunk allocation error handling

If we error out allocating a dev extent we will have already created the block group and such which will cause problems since the allocator may have tried to allocate out of the block group that no longer exists. This will cause BUG_ON()'s in the bio submission path. This also makes a failure to allocate a dev extent a non-abort error, we will just clean up the dev extents we did allocate and exit. Now if we fail to delete the dev extents we will abort since we can't have half of the dev extents hanging around, but this will make us much less likely to abort. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/volumes.c | 32 ++--
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4f8c281..2ba5b84 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3766,12 +3766,6 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto error;
 
-	ret = btrfs_make_block_group(trans, extent_root, 0, type,
-				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-				     start, num_bytes);
-	if (ret)
-		goto error;
-
 	for (i = 0; i < map->num_stripes; ++i) {
 		struct btrfs_device *device;
 		u64 dev_offset;
@@ -3783,15 +3777,33 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 				info->chunk_root->root_key.objectid,
 				BTRFS_FIRST_CHUNK_TREE_OBJECTID,
 				start, dev_offset, stripe_size);
-		if (ret) {
-			btrfs_abort_transaction(trans, extent_root, ret);
-			goto error;
-		}
+		if (ret)
+			goto error_dev_extent;
+	}
+
+	ret = btrfs_make_block_group(trans, extent_root, 0, type,
+				     BTRFS_FIRST_CHUNK_TREE_OBJECTID,
+				     start, num_bytes);
+	if (ret) {
+		i = map->num_stripes - 1;
+		goto error_dev_extent;
 	}
 
 	kfree(devices_info);
 	return 0;
 
+error_dev_extent:
+	for (; i >= 0; i--) {
+		struct btrfs_device *device;
+		int err;
+
+		device = map->stripes[i].dev;
+		err = btrfs_free_dev_extent(trans, device, start);
+		if (err) {
+			btrfs_abort_transaction(trans, extent_root, err);
+			break;
+		}
+	}
 error:
 	kfree(map);
 	kfree(devices_info);
-- 
1.7.7.6
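The error handling described in the commit message is the usual goto-based unwind: acquire per-stripe resources in a loop, and on failure release what was acquired, newest first, before returning. Below is a small userspace sketch of that pattern only. Everything in it (struct stripe, alloc_stripe(), free_stripe(), alloc_all_stripes(), the fail_at knob) is hypothetical and stands in for the kernel objects; it is not the volumes.c code. In this sketch the rollback starts at the last entry that was actually acquired.

```c
/* Hypothetical stand-in for a per-stripe resource; not a kernel type. */
struct stripe {
	int allocated;
};

/* Pretend allocation: fails for the stripe at index fail_at. */
static int alloc_stripe(struct stripe *s, int index, int fail_at)
{
	if (index == fail_at)
		return -1;
	s->allocated = 1;
	return 0;
}

static void free_stripe(struct stripe *s)
{
	s->allocated = 0;
}

/*
 * Allocate all n stripes or none. On failure, release only the
 * stripes that were actually allocated, newest first, and return
 * the error from the failed allocation.
 */
static int alloc_all_stripes(struct stripe *stripes, int n, int fail_at)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		ret = alloc_stripe(&stripes[i], i, fail_at);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* stripes[i] itself was never allocated, so start at i - 1 */
	while (--i >= 0)
		free_stripe(&stripes[i]);
	return ret;
}
```

Calling alloc_all_stripes(s, 4, 2) on a zeroed array returns -1 and leaves every entry unallocated, while a fail_at of -1 lets all four allocations succeed.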
[PATCH 1/2] btrfs-progs: fix mkfs.btrfs -r option
Commit 605e806166847872bb91831b397d58f95027975a broke the mkfs.btrfs -r option, because it calls make_btrfs without ever setting dev_block_count in the -r case, so we tell it to make a filesystem of size 0. Then we wander into ENOSPC land and segfault.

As a quick one-line fix, just set dev_block_count to the size of the destination image file.

Signed-off-by: Eric Sandeen
---

diff --git a/mkfs.c b/mkfs.c
index fbf8319..940702d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1337,6 +1337,8 @@ int main(int ac, char **av)
 			fprintf(stderr, "unable to zero the output file\n");
 			exit(1);
 		}
+		/* our "device" is the new image file */
+		dev_block_count = block_count;
 	}
 	if (mixed) {
 		if (metadata_profile != data_profile) {
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On 01/29/2013 01:04 PM, Josef Bacik wrote:
> On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
>> On 01/28/2013 02:23 PM, Josef Bacik wrote:
>>> On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
>>>> Hi Josef,
>>>>
>>>> Thanks for the patch - sorry for the long delay in testing...
>>>>
>>>
>>> Jim,
>>>
>>> I've been trying to reason out how this happens, could you do a btrfs fi df on
>>> the filesystem thats giving you trouble so I can see if what I think is
>>> happening is what's actually happening. Thanks,
>>
>> Here's an example, using a slightly different kernel than
>> my previous report. It's your btrfs-next master branch
>> (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
>> with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
>>
>>
>> Here I'm finding the file system in question:
>>
>> # ls -l /dev/mapper | grep dm-93
>> lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93
>>
>> # df -h | grep -A 1 cs53s19p2
>> /dev/mapper/cs53s19p2
>>                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
>>
>>
>> Here's the info you asked for:
>>
>> # btrfs fi df /ram/mnt/ceph/data.osd.522
>> Data: total=2.01GB, used=1.00GB
>> System: total=4.00MB, used=64.00KB
>> Metadata: total=8.00MB, used=7.56MB
>>
>
> How big is the disk you are using, and what mount options?

The partition is ~900 GiB, and the mount options according to /proc/mounts are:

rw,noatime,nospace_cache

Also, in case it matters, I build the file systems with -l 65536 -n 65536.

> I have a patch to
> keep the panic from happening and hopefully the abort, could you try this? I
> still want to keep the underlying error from happening because it shouldn't be,
> but no reason I can't fix the error case while you can easily reproduce it :).

I'm happy to try it - but I probably won't have results for you until tomorrow, due to other time pressures.

Thanks for taking a look.

-- Jim

> Thanks,
>
> Josef
>
[PATCH 2/2, RFC] btrfs-progs: overhaul mkfs.btrfs -r option
The manpage for the "-r" option simply says that it will copy the path specified to -r into the newly made filesystem. There's not a lot of reason to treat that option as differently as it is now - today it ignores discard, fs size, and mixed options, for example. It also failed to check whether the target device was mounted before proceeding. Etc... Rework things so that we really follow the same paths whether or not -r is specified, but with one special case for -r: * If the device does not exist, it will be created as a regular file of the minimum size to hold the -r path, or of size specified by the -b option. This also changes a little behavior; it does not pre-fill the new file with zeros, but allows it to be sparse, and does not truncate an existing device file. If you want to start with an empty file, just don't point it at an existing file... Signed-off-by: Eric Sandeen --- Lightly tested . . diff --git a/man/mkfs.btrfs.8.in b/man/mkfs.btrfs.8.in index 72025ed..c9f9e4f 100644 --- a/man/mkfs.btrfs.8.in +++ b/man/mkfs.btrfs.8.in @@ -63,6 +63,12 @@ Specify the sectorsize, the minimum block allocation. .TP \fB\-r\fR, \fB\-\-rootdir \fIrootdir\fR Specify a directory to copy into the newly created fs. +This option is limited to a single device. As a special +case for this option, if the device does not exist, +it will be created as a regular file of either the minimum +required size, or the size specified by the +\fB\-b\fR +option. .TP \fB\-K\fR, \fB\-\-nodiscard \fR Do not perform whole device TRIM operation by default. 
diff --git a/mkfs.c b/mkfs.c index 940702d..129fae8 100644 --- a/mkfs.c +++ b/mkfs.c @@ -1020,15 +1020,6 @@ fail_no_files: return -1; } -static int open_target(char *output_name) -{ - int output_fd; - output_fd = open(output_name, O_CREAT | O_RDWR | O_TRUNC, -S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH); - - return output_fd; -} - static int create_chunks(struct btrfs_trans_handle *trans, struct btrfs_root *root, u64 num_of_meta_chunks, u64 size_of_data) @@ -1150,28 +1141,6 @@ static u64 size_sourcedir(char *dir_name, u64 sectorsize, return total_size; } -static int zero_output_file(int out_fd, u64 size, u32 sectorsize) -{ - int len = sectorsize; - int loop_num = size / sectorsize; - u64 location = 0; - char *buf = malloc(len); - int ret = 0, i; - ssize_t written; - - if (!buf) - return -ENOMEM; - memset(buf, 0, len); - for (i = 0; i < loop_num; i++) { - written = pwrite64(out_fd, buf, len, location); - if (written != len) - ret = -EIO; - location += sectorsize; - } - free(buf); - return ret; -} - static int check_leaf_or_node_size(u32 size, u32 sectorsize) { if (size < sectorsize) { @@ -1291,55 +1260,74 @@ int main(int ac, char **av) if (ac == 0) print_usage(); + if (source_dir && ac > 1) { + fprintf(stderr, + "The -r option is limited to a single device\n"); + exit(1); + } + printf("\nWARNING! - %s IS EXPERIMENTAL\n", BTRFS_BUILD_VERSION); printf("WARNING! 
- see http://btrfs.wiki.kernel.org before using\n\n"); - if (source_dir == 0) { - file = av[optind++]; - ret = check_mounted(file); - if (ret < 0) { - fprintf(stderr, "error checking %s mount status\n", file); - exit(1); - } - if (ret == 1) { - fprintf(stderr, "%s is mounted\n", file); - exit(1); - } - ac--; - fd = open(file, O_RDWR); - if (fd < 0) { - fprintf(stderr, "unable to open %s\n", file); - exit(1); - } - first_file = file; - ret = btrfs_prepare_device(fd, file, zero_end, &dev_block_count, - block_count, &mixed, nodiscard); - if (block_count && block_count > dev_block_count) { - fprintf(stderr, "%s is smaller than requested size\n", file); - exit(1); - } - } else { - ac = 0; - file = av[optind++]; - fd = open_target(file); - if (fd < 0) { - fprintf(stderr, "unable to open the %s\n", file); - exit(1); - } + file = av[optind++]; + ac--; /* used that arg */ - first_file = file; + ret = check_mounted(file); + if (ret < 0) { + fprintf(stderr, "error checking %s mount status\n", file); + exit(1); + } + if (ret == 1) { +
Re: [PATCH] Btrfs: fix a deadlock on chunk mutex
On 01/29/2013 01:04 PM, Josef Bacik wrote: > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote: >> > On 01/28/2013 02:23 PM, Josef Bacik wrote: >>> > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote: > >> Hi Josef, > >> > >> Thanks for the patch - sorry for the long delay in testing... > >> >>> > > >>> > > Jim, >>> > > >>> > > I've been trying to reason out how this happens, could you do a btrfs >>> > > fi df on >>> > > the filesystem thats giving you trouble so I can see if what I think is >>> > > happening is what's actually happening. Thanks, >> > >> > Here's an example, using a slightly different kernel than >> > my previous report. It's your btrfs-next master branch >> > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state") >> > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree). >> > >> > >> > Here I'm finding the file system in question: >> > >> > # ls -l /dev/mapper | grep dm-93 >> > lrwxrwxrwx 1 root root 8 Jan 29 11:13 cs53s19p2 -> ../dm-93 >> > >> > # df -h | grep -A 1 cs53s19p2 >> > /dev/mapper/cs53s19p2 >> > 896G 1.1G 896G 1% /ram/mnt/ceph/data.osd.522 >> > >> > >> > Here's the info you asked for: >> > >> > # btrfs fi df /ram/mnt/ceph/data.osd.522 >> > Data: total=2.01GB, used=1.00GB >> > System: total=4.00MB, used=64.00KB >> > Metadata: total=8.00MB, used=7.56MB >> > > How big is the disk you are using, and what mount options? I have a patch to > keep the panic from happening and hopefully the abort, could you try this? I > still want to keep the underlying error from happening because it shouldn't > be, > but no reason I can't fix the error case while you can easily reproduce it :). 
> Thanks, > > Josef > >>From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001 > From: Josef Bacik > Date: Tue, 29 Jan 2013 15:03:37 -0500 > Subject: [PATCH] Btrfs: fix chunk allocation error handling > > If we error out allocating a dev extent we will have already created the > block group and such which will cause problems since the allocator may have > tried to allocate out of the block group that no longer exists. This will > cause BUG_ON()'s in the bio submission path. This also makes a failure to > allocate a dev extent a non-abort error, we will just clean up the dev > extents we did allocate and exit. Now if we fail to delete the dev extents > we will abort since we can't have half of the dev extents hanging around, > but this will make us much less likely to abort. Thanks, > > Signed-off-by: Josef Bacik > --- Interesting - with your patch applied I triggered the following, just bringing up a fresh Ceph filesystem - I didn't even get a chance to mount it on my Ceph clients: [ 6419.450179] BTRFS error (device dm-73) in btrfs_free_dev_extent:1115: error 28 (Slot search failed) [ 6419.459223] btrfs is forced readonly [ 6419.462805] [ cut here ] [ 6419.467440] WARNING: at fs/btrfs/super.c:256 __btrfs_abort_transaction+0x60/0x110 [btrfs]() [ 6419.475809] Hardware name: X8DTH-i/6/iF/6F [ 6419.479914] Modules linked in: btrfs zlib_deflate ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath scsi_dh vhost_net macvtap macvlan tun uinput sg joydev sd_mod iTCO_wdt iTCO_vendor_support hid_generic coretemp kvm crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul microcode button ata_piix libata mpt2sas scsi_transport_sas raid_class scsi_mod serio_raw pcspkr mlx4_ib ib_sa ib_mad ib_core mlx4_en mlx4_core cxgb4 i2c_i801 i2c_core lpc_ich mfd_core uhci_hcd ehci_hcd i7core_edac edac_core ioatdma dm_mod nfsv4 auth_rpcgss nfsv3 nfs_acl nfsv2 
nfs lockd sunrpc fscache broadcom tg3 hwmon bnx2 igb dca e1000 [ 6419.546095] Pid: 107593, comm: ceph-osd Not tainted 3.7.0-00270-g8353482 #494 [ 6419.553227] Call Trace: [ 6419.555697] [] warn_slowpath_common+0x94/0xc0 [ 6419.561708] [] warn_slowpath_fmt+0x46/0x50 [ 6419.567491] [] __btrfs_abort_transaction+0x60/0x110 [btrfs] [ 6419.574746] [] __btrfs_alloc_chunk+0x6e6/0x770 [btrfs] [ 6419.581553] [] btrfs_alloc_chunk+0x5e/0x90 [btrfs] [ 6419.588017] [] ? check_system_chunk+0x71/0x130 [btrfs] [ 6419.594824] [] do_chunk_alloc+0x2ec/0x370 [btrfs] [ 6419.601188] [] find_free_extent+0xaac/0xbe0 [btrfs] [ 6419.607733] [] btrfs_reserve_extent+0x82/0x190 [btrfs] [ 6419.614545] [] btrfs_alloc_free_block+0x85/0x230 [btrfs] [ 6419.621530] [] ? check_buffer_tree_ref+0x25/0x50 [btrfs] [ 6419.628512] [] __btrfs_cow_block+0x14a/0x4b0 [btrfs] [ 6419.635155] [] ? btrfs_try_tree_write_lock+0x3c/0xa0 [btrfs] [ 6419.642475] [] ? btrfs_set_lock_blocking_rw+0xe3/0x160 [btrfs] [ 6419.649970] [] btrfs_cow_block+0x161/0x200 [btrfs] [ 6419.656424] [] btrfs_search_slot+0x399/0x760 [btrfs] [ 6419.663050] [] btrfs_truncate_inode_items+0x179/0x710 [btrfs] [ 6419.670458] [] ? btrfs_add_ordered_operation+0x5
[RFC] Move btrfsck in to the btrfs command
NOTE: in order to apply this patch you should first run:

git mv btrfsck.c cmds-fsck.c

This patch moves btrfsck into "btrfs fsck". It also adds support for symlinks to the btrfs binary to retain compatibility. =)

I think something should be done about the help description, but I'm not sure what...

Anyway, feedback is welcome.
[PATCH] [RFC] include btrfsck in btrfs - including "name check"
This patch includes fsck as a subcommand of btrfs, but if you rename the binary to btrfsck (or, preferably, use a symlink) it will act like the old btrfsck command. It will also handle fsck.btrfs, which currently is a no-op.
---
 Makefile    |  4 ++--
 btrfs.c     | 68 +
 cmds-fsck.c | 38 +++---
 commands.h  |  3 +++
 4 files changed, 77 insertions(+), 36 deletions(-)

diff --git a/Makefile b/Makefile
index 4894903..8467530 100644
--- a/Makefile
+++ b/Makefile
@@ -8,7 +8,7 @@ objects = ctree.o disk-io.o radix-tree.o extent-tree.o print-tree.o \
 	  send-stream.o send-utils.o qgroup.o
 cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
 	       cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
-	       cmds-quota.o cmds-qgroup.o
+	       cmds-quota.o cmds-qgroup.o cmds-fsck.o
 
 CHECKFLAGS= -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ -Wbitwise \
 	    -Wuninitialized -Wshadow -Wundef
@@ -20,7 +20,7 @@ bindir = $(prefix)/bin
 LIBS=-luuid -lm
 RESTORE_LIBS=-lz
 
-progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol btrfsck \
+progs = btrfsctl mkfs.btrfs btrfs-debug-tree btrfs-show btrfs-vol \
 	btrfs btrfs-map-logical btrfs-image btrfs-zero-log btrfs-convert \
 	btrfs-find-root btrfs-restore btrfstune
 
diff --git a/btrfs.c b/btrfs.c
index 687acec..5c1220e 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -48,8 +48,13 @@ int prefixcmp(const char *str, const char *prefix)
 			return (unsigned char)*prefix - (unsigned char)*str;
 }
 
-static int parse_one_token(const char *arg, const struct cmd_group *grp,
-			   const struct cmd_struct **cmd_ret)
+#define parse_one_token(arg, grp, cmd_ret) \
+	_parse_one_token((arg), (grp), (cmd_ret), 0)
+#define parse_one_exact_token(arg, grp, cmd_ret) \
+	_parse_one_token((arg), (grp), (cmd_ret), 1)
+
+static int _parse_one_token(const char *arg, const struct cmd_group *grp,
+			    const struct cmd_struct **cmd_ret, int exact)
 {
 	const struct cmd_struct *cmd = grp->commands;
 	const struct cmd_struct *abbrev_cmd = NULL, *ambiguous_cmd = NULL;
@@ -80,12 +85,15 @@ static int parse_one_token(const char *arg, const struct cmd_group *grp,
 		return 0;
 	}
 
-	if (ambiguous_cmd)
-		return -2;
+	if (!exact)
+	{
+		if (ambiguous_cmd)
+			return -2;
 
-	if (abbrev_cmd) {
-		*cmd_ret = abbrev_cmd;
-		return 0;
+		if (abbrev_cmd) {
+			*cmd_ret = abbrev_cmd;
+			return 0;
+		}
 	}
 
 	return -1;
@@ -246,6 +254,7 @@ const struct cmd_group btrfs_cmd_group = {
 		{ "balance", cmd_balance, NULL, &balance_cmd_group, 0 },
 		{ "device", cmd_device, NULL, &device_cmd_group, 0 },
 		{ "scrub", cmd_scrub, NULL, &scrub_cmd_group, 0 },
+		{ "fsck", cmd_fsck, cmd_fsck_usage, NULL, 0 },
 		{ "inspect-internal", cmd_inspect, NULL, &inspect_cmd_group, 0 },
 		{ "send", cmd_send, NULL, &send_cmd_group, 0 },
 		{ "receive", cmd_receive, NULL, &receive_cmd_group, 0 },
@@ -257,24 +266,47 @@ const struct cmd_group btrfs_cmd_group = {
 	},
 };
 
+static int cmd_dummy(int argc, char **argv)
+{
+	return 0;
+}
+
+/* change behaviour depending on what we're called */
+const struct cmd_group function_cmd_group = {
+	NULL, NULL,
+	{
+		{ "btrfsck", cmd_fsck, NULL, NULL, 0 },
+		{ "fsck.btrfs", cmd_dummy, NULL, NULL, 0 },
+		{ 0, 0, 0, 0, 0 }
+	},
+};
+
 int main(int argc, char **argv)
 {
 	const struct cmd_struct *cmd;
+	char *func = strrchr(argv[0], '/');
+	if (func)
+		argv[0] = ++func;
 
 	crc32c_optimization_init();
 
-	argc--;
-	argv++;
-	handle_options(&argc, &argv);
-	if (argc > 0) {
-		if (!prefixcmp(argv[0], "--"))
-			argv[0] += 2;
-	} else {
-		usage_command_group(&btrfs_cmd_group, 0, 0);
-		exit(1);
-	}
+	/* if we have cmd, we're started as a sub command */
+	if (parse_one_exact_token(argv[0], &function_cmd_group, &cmd) < 0)
+	{
+		argc--;
+		argv++;
 
-	cmd = parse_command_token(argv[0], &btrfs_cmd_group);
+		handle_options(&argc, &argv);
+		if (argc > 0) {
+			if (!prefixcmp(argv[0], "--"))
+				argv[0] += 2;
+		} else {
+			usage_command_group(&btrfs_cmd_group, 0, 0);
+			exit(1);
+		}
+
+		cmd = parse_command_token(argv[0], &btrfs_cmd_group);
+	}
 
 	handle
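The "name check" in this patch is the classic multi-call binary trick: take the basename of argv[0] and, on an exact match against a known alias, skip the normal subcommand parsing. A minimal standalone sketch of that idea follows. The enum and the function names program_name() and pick_personality() are hypothetical and exist only for this illustration; the patch itself does the lookup through its command tables.

```c
#include <string.h>

/* Hypothetical personalities for a multi-call binary. */
enum personality {
	RUN_AS_BTRFS,		/* normal "btrfs <subcommand>" parsing */
	RUN_AS_BTRFSCK,		/* invoked as "btrfsck" */
	RUN_AS_FSCK_BTRFS	/* invoked as "fsck.btrfs" (a no-op) */
};

/* Strip any leading directory part from argv[0]. */
static const char *program_name(const char *argv0)
{
	const char *slash = strrchr(argv0, '/');

	return slash ? slash + 1 : argv0;
}

/*
 * Exact match only: an abbreviation such as "btrfsc" must not be
 * treated as "btrfsck", so no prefix matching is done here.
 */
static enum personality pick_personality(const char *argv0)
{
	const char *name = program_name(argv0);

	if (strcmp(name, "btrfsck") == 0)
		return RUN_AS_BTRFSCK;
	if (strcmp(name, "fsck.btrfs") == 0)
		return RUN_AS_FSCK_BTRFS;
	return RUN_AS_BTRFS;
}
```

A symlink such as /sbin/btrfsck pointing at the btrfs binary would then select RUN_AS_BTRFSCK, while any other name falls through to the regular subcommand path.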
Integration branch of btrfs-progs 2013-01-30
Hi,

a few build warning fixes, unaligned access fix #2 and finally support for the 'device stats' and device 'replace' commands!

Please test, worked for me here, but not tested extensively. If everything goes well I'll send a pull request with this branch in a few days.

git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130130

(top commit 78b35a43988163dbf71d9)

I'll continue collecting patches and patchsets in the meantime. Existing patches and bugfixes have a slight precedence over patchsets.

thanks,
david
Re: Integration branch of btrfs-progs 2013-01-30
On Wed, Jan 30, 2013 at 01:24:28AM +0100, David Sterba wrote:
> git://repo.or.cz/btrfs-progs-unstable/devel.git integration-20130130
>
> (top commit 78b35a43988163dbf71d9)

Shortlog:

Anand Jain (1):
      Btrfs-progs: move open_file_or_dir() to utils.c

Ben Peddell (1):
      btrfs-progs: fix unaligned accesses v2

David Sterba (1):
      btrfs-progs: fix build warnings in btrfslabel.c

Gene Czarcinski (1):
      Btrfs-progs: Fix trival compiler error in cmds-qgroup.c

Stefan Behrens (3):
      Btrfs-progs: make two utility functions globally available
      Btrfs-progs: add command to get/reset device stats via ioctl
      Btrfs-progs: add support for device replace procedure