Re: btrfs/125 deadlock using nospace_cache or space_cache=v2
State updated: The deadlock seems to be caused by 2 bugs: 1) Bad error handling in run_delalloc_nocow() The direct cause is, btrfs_reloc_clone_csums() fails and return -EIO. Then error handler will call extent_clear_unlock_delalloc() to clear dirty flag and end writeback of the resting pages in the extent. However this makes the ordered extent not happy, as it just skips IO of the remaining pages, which ordered extent relies on to finish. This is quite easy to reproduce using the following modification: diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1e861a0..b9d0bcb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1497,8 +1497,11 @@ static noinline int run_delalloc_nocow(struct inode *inode, if (root->root_key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID) { + ret = -EIO; + /* ret = btrfs_reloc_clone_csums(inode, cur_offset, num_bytes); + */ if (ret) { if (!nolock && nocow) btrfs_end_write_no_snapshoting(root); Then any balance will cause btrfs to wait ordered extent which will never finish, just like what we encountered. 2) RAID5/6 recover not working in some tree operation. In fact, btrfs succeeded to mount the fs, so RAID5/6 recover code is working, at least for some trees. And btrfs succeeded in recovering all the data with correct checksum, if using normal read(cat works here) before balance. However it fails to read csum tree and causes run_delalloc_nocow() to return -EIO, which leads to above bug. So there is something related to RAID5/6 code, maybe readahead? which contributes to this bug. I'll continue digging and keep the state updated if anyone is interested in this bug. Thanks, Qu At 02/07/2017 04:02 PM, Anand Jain wrote: Hi Qu, I don't think I have seen this before, I don't know the reason why I wrote this, may be to test encryption, however it was all with default options. But now I could reproduce and, looks like balance fails to start with IO error though the mount is successful. -- # tail -f ./results/btrfs/125.full intense and takes potentially very long. It is recommended to use the balance filters to narrow down the balanced data. Use 'btrfs balance start --full-balance' option to skip this warning. The operation will start in 10 seconds. Use Ctrl-C to stop it. 10 9 8 7 6 5 4 3 2 1ERROR: error during balancing '/scratch': Input/output error There may be more info in syslog - try dmesg | tail Starting balance without any filters. failed: '/root/bin/btrfs balance start /scratch' This must be fixed. For debugging if I add a sync before previous unmount, the problem isn't reproduced. just fyi. Strange. --- diff --git a/tests/btrfs/125 b/tests/btrfs/125 index 91aa8d8c3f4d..4d4316ca9f6e 100755 --- a/tests/btrfs/125 +++ b/tests/btrfs/125 @@ -133,6 +133,7 @@ echo "-Mount normal-" >> $seqres.full echo echo "Mount normal and balance" +_run_btrfs_util_prog filesystem sync $SCRATCH_MNT _scratch_unmount _run_btrfs_util_prog device scan _scratch_mount >> $seqres.full 2>&1 -- HTH. Thanks, Anand On 02/07/17 14:09, Qu Wenruo wrote: Hi Anand, I found that btrfs/125 test case can only pass if we enabled space cache. If using nospace_cache or space_cache=v2 mount option, it will get blocked forever with the following callstack(the only blocked process): [11382.046978] btrfs D11128 6705 6057 0x [11382.047356] Call Trace: [11382.047668] __schedule+0x2d4/0xae0 [11382.047956] schedule+0x3d/0x90 [11382.048283] btrfs_start_ordered_extent+0x160/0x200 [btrfs] [11382.048630] ? wake_atomic_t_function+0x60/0x60 [11382.048958] btrfs_wait_ordered_range+0x113/0x210 [btrfs] [11382.049360] btrfs_relocate_block_group+0x260/0x2b0 [btrfs] [11382.049703] btrfs_relocate_chunk+0x51/0xf0 [btrfs] [11382.050073] btrfs_balance+0xaa9/0x1610 [btrfs] [11382.050404] ? btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs] [11382.050739] btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs] [11382.051109] btrfs_ioctl+0xbe7/0x27f0 [btrfs] [11382.051430] ? trace_hardirqs_on+0xd/0x10 [11382.051747] ? free_object+0x74/0xa0 [11382.052084] ? debug_object_free+0xf2/0x130 [11382.052413] do_vfs_ioctl+0x94/0x710 [11382.052750] ? enqueue_hrtimer+0x160/0x160 [11382.053090] ? do_nanosleep+0x71/0x130 [11382.053431] SyS_ioctl+0x79/0x90 [11382.053735] entry_SYSCALL_64_fastpath+0x18/0xad [11382.054570] RIP: 0033:0x7f397d7a6787 I also found in the test case, we only have 3 continuous data extents, whose sizes are 1M, 68.5M and 31.5M respectively. Original data block group: 0 1M 64M69.5M 101M 128M | Ext A | Extent B(68.5M) |
Re: [PATCH v2] Btrfs: fix btrfs_decompress_buf2page()
On 02/11/17 07:03, Omar Sandoval wrote: From: Omar SandovalIf btrfs_decompress_buf2page() is handed a bio with its page in the middle of the working buffer, then we adjust the offset into the working buffer. After we copy into the bio, we advance the iterator by the number of bytes we copied. Then, we have some logic to handle the case of discontiguous pages and adjust the offset into the working buffer again. However, if we didn't advance the bio to a new page, we may enter this case in error, essentially repeating the adjustment that we already made when we entered the function. The end result is bogus data in the bio. Previously, we only checked for this case when we advanced to a new page, but the conversion to bio iterators changed that. This restores the old, correct behavior. A case I saw when testing with zlib was: buf_start = 42769 total_out = 46865 working_bytes = total_out - buf_start = 4096 start_byte = 45056 The condition (total_out > start_byte && buf_start < start_byte) is true, so we adjust the offset: buf_offset = start_byte - buf_start = 2287 working_bytes -= buf_offset = 1809 current_buf_start = buf_start = 42769 Then, we copy bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809 buf_offset += bytes = 4096 working_bytes -= bytes = 0 current_buf_start += bytes = 44578 After bio_advance(), we are still in the same page, so start_byte is the same. Then, we check (total_out > start_byte && current_buf_start < start_byte), which is true! So, we adjust the values again: buf_offset = start_byte - buf_start = 2287 working_bytes = total_out - start_byte = 1809 current_buf_start = buf_start + buf_offset = 45056 But note that working_bytes was already zero before this, so we should have stopped copying. Thanks Omar for nailing this down. How about a test case for this ? -Anand -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans
Just as Filipe pointed out, the most time consuming parts of qgroup are btrfs_qgroup_account_extents() and btrfs_qgroup_prepare_account_extents(). Which both call btrfs_find_all_roots() to get old_roots and new_roots ulist. What makes things worse is, we're calling that expensive btrfs_find_all_roots() at transaction committing time with TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction. Such behavior is necessary for @new_roots search as current btrfs_find_all_roots() can't do it correctly so we do call it just before switch commit roots. However for @old_roots search, it's not necessary as such search is based on commit_root, so it will always be correct and we can move it out of transaction committing. This patch moves the @old_roots search part out of commit_transaction(), so in theory we can half the time qgroup time consumption at commit_transaction(). But please note that, this won't speedup qgroup overall, the total time consumption is still the same, just reduce the performance stall. Cc: Filipe MananaSigned-off-by: Qu Wenruo --- v2: Update commit message to make it more clear. Don't call btrfs_find_all_roots() before insert qgroup extent record, so we can avoid wasting CPU for already inserted qgroup extent record. PS: If and only if we have fixed and proved btrfs_find_all_roots() can get correct result with current root, then we can move all expensive btrfs_find_all_roots() out of transaction committing. However I strongly doubt if it's possible with current delayed ref, and without such prove, move btrfs_find_all_roots() for @new_roots out of transaction committing will just screw up qgroup accounting. --- fs/btrfs/delayed-ref.c | 20 fs/btrfs/qgroup.c | 33 + fs/btrfs/qgroup.h | 33 ++--- 3 files changed, 75 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index ef724a5fc30e..0ee927ef5a71 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *ref, struct btrfs_qgroup_extent_record *qrecord, u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved, -int action, int is_data) +int action, int is_data, int *qrecord_inserted_ret) { struct btrfs_delayed_ref_head *existing; struct btrfs_delayed_ref_head *head_ref = NULL; struct btrfs_delayed_ref_root *delayed_refs; int count_mod = 1; int must_insert_reserved = 0; + int qrecord_inserted = 0; /* If reserved is provided, it must be a data extent. */ BUG_ON(!is_data && reserved); @@ -623,6 +624,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, if(btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord)) kfree(qrecord); + else + qrecord_inserted = 1; } spin_lock_init(_ref->lock); @@ -650,6 +653,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, atomic_inc(_refs->num_entries); trans->delayed_ref_updates++; } + if (qrecord_inserted_ret) + *qrecord_inserted_ret = qrecord_inserted; return head_ref; } @@ -779,6 +784,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_head *head_ref; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_qgroup_extent_record *record = NULL; + int qrecord_inserted; BUG_ON(extent_op && extent_op->is_data); ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS); @@ -806,12 +812,15 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info, * the spin lock */ head_ref = add_delayed_ref_head(fs_info, trans, _ref->node, record, - bytenr, num_bytes, 0, 0, action, 0); + bytenr, num_bytes, 0, 0, action, 0, + _inserted); add_delayed_tree_ref(fs_info, trans, head_ref, >node, bytenr, num_bytes, parent, ref_root, level, action); spin_unlock(_refs->lock); + if (qrecord_inserted) + return btrfs_qgroup_trace_extent_post(fs_info, record); return 0; free_head_ref: @@ -836,6 +845,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_head *head_ref; struct btrfs_delayed_ref_root *delayed_refs; struct btrfs_qgroup_extent_record *record = NULL; + int qrecord_inserted; BUG_ON(extent_op && !extent_op->is_data); ref =
[PATCH v3] btrfs: relocation: Enhance kernel error output for relocation
When balance(relocation) fails, btrfs-progs will report like: ERROR: error during balancing '/mnt/scratch': Input/output error There may be more info in syslog - try dmesg | tail However kernel can't provide may useful info in many cases to locate the problem. This patch will add error messages in relocation to help user and developer to locate the problem. Signed-off-by: Qu Wenruo--- v2: Fix typo where 'err' and 'ret' are used wrong. v3: Add space between error number and parenthesis of error string Fix gramma errors. --- fs/btrfs/relocation.c | 61 +++ 1 file changed, 57 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 379711048fb0..6de5800e17bc 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4011,6 +4011,8 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) rc->block_rsv, rc->block_rsv->size, BTRFS_RESERVE_FLUSH_ALL); if (ret) { + btrfs_err(fs_info, "failed to reserve space: %d (%s)", + ret, btrfs_decode_error(ret)); err = ret; break; } @@ -4019,6 +4021,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) if (IS_ERR(trans)) { err = PTR_ERR(trans); trans = NULL; + btrfs_err(fs_info, + "failed to start transaction: %d (%s)", + err, btrfs_decode_error(err)); break; } restart: @@ -4028,8 +4033,12 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) } ret = find_next_extent(rc, path, ); - if (ret < 0) + if (ret < 0) { + btrfs_err(fs_info, + "failed to find next extent: %d (%s)", + ret, btrfs_decode_error(ret)); err = ret; + } if (ret != 0) break; @@ -4081,9 +4090,17 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { ret = add_tree_block(rc, , path, ); + if (ret < 0) + btrfs_err(fs_info, + "failed to record tree block: %d (%s)", + ret, btrfs_decode_error(ret)); } else if (rc->stage == UPDATE_DATA_PTRS && (flags & BTRFS_EXTENT_FLAG_DATA)) { ret = add_data_references(rc, , path, ); + if (ret < 0) + btrfs_err(fs_info, + "failed to record data extent: %d (%s)", + ret, btrfs_decode_error(ret)); } else { btrfs_release_path(path); ret = 0; @@ -4103,6 +4120,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) rc->backref_cache.last_trans = trans->transid - 1; if (ret != -EAGAIN) { + btrfs_err(fs_info, + "failed to relocate tree blocks: %d (%s)", + ret, btrfs_decode_error(ret)); err = ret; break; } @@ -4121,6 +4141,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) ret = relocate_data_extent(rc->data_inode, , >cluster); if (ret < 0) { + btrfs_err(fs_info, + "failed to relocate data extent: %d (%s)", + ret, btrfs_decode_error(ret)); err = ret; break; } @@ -4147,8 +4170,12 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) if (!err) { ret = relocate_file_extent_cluster(rc->data_inode, >cluster); - if (ret < 0) + if (ret < 0) { + btrfs_err(fs_info, + "failed to relocate file extent cluster: %d (%s)", + ret, btrfs_decode_error(ret));
Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation
At 02/15/2017 04:34 AM, Filipe Manana wrote: On Tue, Feb 14, 2017 at 12:57 AM, Qu Wenruowrote: When balance(relocation) fails, btrfs-progs will report like: ERROR: error during balancing '/mnt/scratch': Input/output error There may be more info in syslog - try dmesg | tail However kernel can't provide may useful info in many cases to locate the problem. This patch will add error messages in relocation to help user and developer to locate the problem. Signed-off-by: Qu Wenruo --- v2: Fix typo where 'err' and 'ret' are used wrong. --- fs/btrfs/relocation.c | 60 +++ 1 file changed, 56 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 379711048fb0..d26809807c1b 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4011,6 +4011,8 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) rc->block_rsv, rc->block_rsv->size, BTRFS_RESERVE_FLUSH_ALL); if (ret) { + btrfs_err(fs_info, "failed to reserve space: %d(%s)", Please leave a space between %d and the opening parenthesis (a common style you follow but it's not correct in most western languages afaik, and makes things harder to the eye). Thanks for pointing out my bad habit. I'll avoid such usage in later patches. Thanks, Qu + ret, btrfs_decode_error(ret)); err = ret; break; } @@ -4019,6 +4021,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) if (IS_ERR(trans)) { err = PTR_ERR(trans); trans = NULL; + btrfs_err(fs_info, + "failed to start transaction: %d(%s)", Same, space before parenthesis missing. + err, btrfs_decode_error(err)); break; } restart: @@ -4028,8 +4033,11 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) } ret = find_next_extent(rc, path, ); - if (ret < 0) + if (ret < 0) { + btrfs_err(fs_info, "failed to find next extent: %d(%s)", Same, space before parenthesis missing. + ret, btrfs_decode_error(ret)); err = ret; + } if (ret != 0) break; @@ -4081,9 +4089,17 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { ret = add_tree_block(rc, , path, ); + if (ret < 0) + btrfs_err(fs_info, + "failed to record tree block: %d(%s)", + ret, btrfs_decode_error(ret)); } else if (rc->stage == UPDATE_DATA_PTRS && (flags & BTRFS_EXTENT_FLAG_DATA)) { ret = add_data_references(rc, , path, ); + if (ret < 0) + btrfs_err(fs_info, + "failed to record data extent: %d(%s)", Same, space before parenthesis missing. + ret, btrfs_decode_error(ret)); } else { btrfs_release_path(path); ret = 0; @@ -4103,6 +4119,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) rc->backref_cache.last_trans = trans->transid - 1; if (ret != -EAGAIN) { + btrfs_err(fs_info, + "faild to relocate tree blocks: %d(%s)", faild -> failed Same, space before parenthesis missing. + ret, btrfs_decode_error(ret)); err = ret; break; } @@ -4121,6 +4140,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc) ret = relocate_data_extent(rc->data_inode, , >cluster); if (ret < 0) { + btrfs_err(fs_info, + "failed to relocate data extent: %d(%s)", Same, space before parenthesis missing. + ret, btrfs_decode_error(ret)); err = ret; break;
Re: [PATCH] Btrfs: fix data loss after truncate when using the no-holes feature
On Tue, Feb 14, 2017 at 08:35:26PM +, fdman...@kernel.org wrote: > From: Filipe Manana> > If we have a file with an implicit hole (NO_HOLES feature enabled) that > has an extent following the hole, delayed writes against regions of the > file behind the hole happened before but were not yet flushed and then > we truncate the file to a smaller size that lies inside the hole, we > end up persisting a wrong disk_i_size value for our inode that leads to > data loss after umounting and mounting again the filesystem or after > the inode is evicted and loaded again. > > This happens because at inode.c:btrfs_truncate_inode_items() we end up > setting last_size to the offset of the extent that we deleted and that > followed the hole. We then pass that value to btrfs_ordered_update_i_size() > which updates the inode's disk_i_size to a value smaller then the offset > of the buffered (delayed) writes. > > Example reproducer: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo > $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo > $ xfs_io -c "truncate 60K" /mnt/foo >--> inode's disk_i_size updated to 0 > > $ md5sum /mnt/foo > 3c5ca3c3ab42f4b04d7e7eb0b0d4d806 /mnt/foo > > $ umount /dev/sdb > $ mount /dev/sdb /mnt > > $ md5sum /mnt/foo > d41d8cd98f00b204e9800998ecf8427e /mnt/foo >--> Empty file, all data lost! > > Cc: # 3.14+ > Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole > extents") > Signed-off-by: Filipe Manana > --- > fs/btrfs/inode.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 8eb6703..a0a38ce 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4703,8 +4703,12 @@ int btrfs_truncate_inode_items(struct > btrfs_trans_handle *trans, > btrfs_abort_transaction(trans, ret); > } > error: > - if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) > + if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { > + if (last_size > new_size && > + btrfs_fs_incompat(fs_info, NO_HOLES)) > + last_size = new_size; > btrfs_ordered_update_i_size(inode, last_size, NULL); > + } > Looks good to me. Reviewed-by: Liu Bo We can also remove the code that sets last_size = new_size in the case of NO_HOLES inside the while loop, as I think this patch covers that, too. Because we've held the inode mutex and wait for the dirty pages beyond the current i_size, in fact, I couldn't think of a case that with NO_HOLE last_size < new_size, maybe we could use new_size directly and add a ASSERT just in case? Thanks, -liubo > btrfs_free_path(path); > > -- > 2.7.0.rc3 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: misc-tests: Primary Superblock corruption and recovery using backup Superblock.
Signed-off-by: Lakshmipathi.G--- .../019-fix-corrupted-superblock/test.sh | 37 ++ 1 file changed, 37 insertions(+) create mode 100755 tests/misc-tests/019-fix-corrupted-superblock/test.sh diff --git a/tests/misc-tests/019-fix-corrupted-superblock/test.sh b/tests/misc-tests/019-fix-corrupted-superblock/test.sh new file mode 100755 index 000..806c775 --- /dev/null +++ b/tests/misc-tests/019-fix-corrupted-superblock/test.sh @@ -0,0 +1,37 @@ +#!/bin/bash +# +# Corrupt primary superblock and restore it using backup superblock. +# + +source $TOP/tests/common + +check_prereq btrfs-select-super + +setup_root_helper +prepare_test_dev 512M + +superblock_offset=65536 + +test_superblock_restore() +{ + run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV + + # Corrupt superblock checksum +dd if=/dev/zero of=$TEST_DEV seek=$superblock_offset bs=1 \ +count=4 conv=notrunc &> /dev/null + run_check_stdout $SUDO_HELPER mount $TEST_DEV $TEST_MNT | \ + grep -q 'wrong fs type' +if [ $? -ne 0 ]; then + _fail "Failed to corrupt superblock." +fi + + # Copy backup superblock to primary + run_check $TOP/btrfs-select-super -s 1 $TEST_DEV + run_check $SUDO_HELPER mount $TEST_DEV $TEST_MNT +if [ $? -ne 0 ]; then + _fail "Failed to fix superblock." +fi + run_check_umount_test_dev +} + +test_superblock_restore -- 2.7.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fstests: add test for truncate after a mix of dio and buffered writes
From: Filipe MananaTest that if we have a file with a hole, do a mix of direct IO and buffered writes to it and truncate the file to a size that lies in the middle of the hole, after unmounting and mounting again the filesystem, the file has a correct size and no data loss happened. This test is motivated by a bug found in btrfs when used with the no-holes feature (i.e. MKFS_OPTIONS="-O no-holes") which is fixed by the following patch for the linux kernel: Btrfs: fix data loss after truncate when using the no-holes feature Signed-off-by: Filipe Manana --- tests/generic/409 | 75 +++ tests/generic/409.out | 9 +++ tests/generic/group | 1 + 3 files changed, 85 insertions(+) create mode 100755 tests/generic/409 create mode 100644 tests/generic/409.out diff --git a/tests/generic/409 b/tests/generic/409 new file mode 100755 index 000..8d3fd33 --- /dev/null +++ b/tests/generic/409 @@ -0,0 +1,75 @@ +#! /bin/bash +# FSQA Test No. 409 +# +# Test that if we have a file with a hole, do a mix of direct IO and buffered +# writes to it and truncate the file to a size that lies in the middle of the +# hole, after unmounting and mounting again the filesystem, the file has a +# correct size and no data loss happened. +# +#--- +# +# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_scratch + +rm -f $seqres.full + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +# Create out test file with two extents and a hole between those extents. +# The extent that lies beyond the hole must be written using direct IO and later +# we truncate the file to a size that lies within the hole's range. +$XFS_IO_PROG -f -c "pwrite -S 0x01 0K 32K" $SCRATCH_MNT/foo | _filter_xfs_io +$XFS_IO_PROG -d -c "pwrite -S 0x02 -b 32K 64K 32K" $SCRATCH_MNT/foo \ + | _filter_xfs_io + +# Now truncate our file to a smaller size that lies behind the offset used by +# the previous direct IO write and that lies in a file hole. +$XFS_IO_PROG -c "truncate 60K" $SCRATCH_MNT/foo + +# Now get file digests before unmounting the filesystem and after mounting it +# again. The digests should match (same file data and size in both cases). +echo "File digest before unmounting the filesystem:" +md5sum $SCRATCH_MNT/foo | _filter_scratch +_scratch_cycle_mount +echo "File digest after mounting again the filesystem:" +md5sum $SCRATCH_MNT/foo | _filter_scratch + +status=0 +exit diff --git a/tests/generic/409.out b/tests/generic/409.out new file mode 100644 index 000..02a7dc5 --- /dev/null +++ b/tests/generic/409.out @@ -0,0 +1,9 @@ +QA output created by 409 +wrote 32768/32768 bytes at offset 0 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +wrote 32768/32768 bytes at offset 65536 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +File digest before unmounting the filesystem: +3c5ca3c3ab42f4b04d7e7eb0b0d4d806 SCRATCH_MNT/foo +File digest after mounting again the filesystem: +3c5ca3c3ab42f4b04d7e7eb0b0d4d806 SCRATCH_MNT/foo diff --git a/tests/generic/group b/tests/generic/group index d0bc47d..074b494 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -411,3 +411,4 @@ 406 auto quick dangerous 407 auto quick clone metadata 408 auto quick clone dedupe metadata +409 auto quick metadata -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] fstests: add test for btrfs send/receive with sparse files
From: Filipe MananaTest that both a full and incremental btrfs send operation preserves file holes. This used to fail when the filesystem had the NO_HOLES feature enabled, that is, when the test is run with MKFS_OPTIONS="-O no-holes". This is fixed by the following patch for the linux kernel: "Btrfs: incremental send, fix unnecessary hole writes for sparse files" Signed-off-by: Filipe Manana --- tests/btrfs/137 | 141 tests/btrfs/137.out | 63 +++ tests/btrfs/group | 1 + 3 files changed, 205 insertions(+) create mode 100755 tests/btrfs/137 create mode 100644 tests/btrfs/137.out diff --git a/tests/btrfs/137 b/tests/btrfs/137 new file mode 100755 index 000..3ff2c6b --- /dev/null +++ b/tests/btrfs/137 @@ -0,0 +1,141 @@ +#! /bin/bash +# FS QA Test No. btrfs/137 +# +# Test that both incremental and full send operations preserve file holes. +# +#--- +# +# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -fr $send_files_dir + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/punch + +# real QA test starts here +_supported_fs btrfs +_supported_os Linux +_require_test +_require_scratch +_require_xfs_io_command "fiemap" + +send_files_dir=$TEST_DIR/btrfs-test-$seq + +rm -f $seqres.full +rm -fr $send_files_dir +mkdir $send_files_dir + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount + +# Create the first test file. +$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 4K" $SCRATCH_MNT/foo | _filter_xfs_io + +# Create a second test file with a 1Mb hole. +$XFS_IO_PROG -f \ + -c "pwrite -S 0xaa 0 4K" \ + -c "pwrite -S 0xbb 1028K 4K" \ + $SCRATCH_MNT/bar | _filter_xfs_io + +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ + $SCRATCH_MNT/snap1 >/dev/null + +# Now add one new extent to our first test file, increasing its size and leaving +# a 1Mb hole between the first extent and this new extent. +$XFS_IO_PROG -c "pwrite -S 0xbb 1028K 4K" $SCRATCH_MNT/foo | _filter_xfs_io + +# Now overwrite the last extent of our second test file. +$XFS_IO_PROG -c "pwrite -S 0xcc 1028K 4K" $SCRATCH_MNT/bar | _filter_xfs_io + +$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \ +$SCRATCH_MNT/snap2 >/dev/null + +echo +echo "File digests in the original filesystem:" +md5sum $SCRATCH_MNT/snap1/foo | _filter_scratch +md5sum $SCRATCH_MNT/snap1/bar | _filter_scratch +md5sum $SCRATCH_MNT/snap2/foo | _filter_scratch +md5sum $SCRATCH_MNT/snap2/bar | _filter_scratch + +echo +echo "File snap1/foo fiemap results in the original filesystem:" +$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap1/foo | _filter_fiemap +echo +echo "File snap1/bar fiemap results in the original filesystem:" +$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap1/bar | _filter_fiemap +echo +echo "File snap2/foo fiemap results in the original filesystem:" +$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap2/foo | _filter_fiemap +echo +echo "File snap2/bar fiemap results in the original filesystem:" +$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap2/bar | _filter_fiemap +echo + +# Create the send streams to apply later on a new filesystem. +$BTRFS_UTIL_PROG send $SCRATCH_MNT/snap1 -f $send_files_dir/1.snap 2>&1 \ + | _filter_scratch +$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/snap1 $SCRATCH_MNT/snap2 \ + -f $send_files_dir/2.snap 2>&1 | _filter_scratch + +# Create a new filesystem, receive the send streams and verify that the file +# contents are the same as in the original filesystem and that the file holes +# exists in both snapshots. +_scratch_unmount +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount +$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $send_files_dir/1.snap >/dev/null +$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $send_files_dir/2.snap >/dev/null + +echo +echo "File digests in the new
[PATCH] Btrfs: incremental send, fix unnecessary hole writes for sparse files
From: Filipe MananaWhen using the NO_HOLES feature, during an incremental send we often issue write operations for holes when we should not, because that range is already a hole in the destination snapshot. While that does not change the contents of the file at the receiver, it avoids preservation of file holes, leading to wasted disk space and extra IO during send/receive. A couple examples where the holes are not preserved follows. $ mkfs.btrfs -O no-holes -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xaa 0 4K" /mnt/foo $ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "pwrite -S 0xbb 1028K 4K" /mnt/bar $ btrfs subvolume snapshot -r /mnt /mnt/snap1 # Now add one new extent to our first test file, increasing its size and # leaving a 1Mb hole between the first extent and this new extent. $ xfs_io -c "pwrite -S 0xbb 1028K 4K" /mnt/foo # Now overwrite the last extent of our second test file. $ xfs_io -c "pwrite -S 0xcc 1028K 4K" /mnt/bar $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ xfs_io -r -c "fiemap -v" /mnt/snap2/foo /mnt/snap2/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 25088..25095 8 0x2000 1: [8..2055]: hole 2048 2: [2056..2063]:24576..24583 8 0x2001 $ xfs_io -r -c "fiemap -v" /mnt/snap2/bar /mnt/snap2/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 25096..25103 8 0x2000 1: [8..2055]: hole 2048 2: [2056..2063]:24584..24591 8 0x2001 $ btrfs send /mnt/snap1 -f /tmp/1.snap $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.snap $ umount /mnt # It's not relevant to enable no-holes in the new filesystem. $ mkfs.btrfs -O no-holes -f /dev/sdc $ mount /dev/sdc /mnt $ btrfs receive /mnt -f /tmp/1.snap $ btrfs receive /mnt -f /tmp/2.snap $ xfs_io -r -c "fiemap -v" /mnt/snap2/foo /mnt/snap2/foo: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 24576..24583 8 0x2000 1: [8..2063]: 25624..27679 2056 0x1 $ xfs_io -r -c "fiemap -v" /mnt/snap2/bar /mnt/snap2/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 24584..24591 8 0x2000 1: [8..2063]: 27680..29735 2056 0x1 The holes do not exist in the second filesystem and they were replaced with extents filled with the byte 0x00, making each file take 1032Kb of space instead of 8Kb. So fix this by not issuing the write operations consisting of buffers filled with the byte 0x00 when the destination snapshot already has a hole for the respective range. A test case for fstests will follow soon. Signed-off-by: Filipe Manana --- fs/btrfs/send.c | 88 +++-- 1 file changed, 86 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 712922e..456c890 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -5306,6 +5306,81 @@ static int get_last_extent(struct send_ctx *sctx, u64 offset) return ret; } +static int range_is_hole_in_parent(struct send_ctx *sctx, + const u64 start, + const u64 end) +{ + struct btrfs_path *path; + struct btrfs_key key; + struct btrfs_root *root = sctx->parent_root; + u64 search_start = start; + int ret; + + path = alloc_path_for_send(); + if (!path) + return -ENOMEM; + + key.objectid = sctx->cur_ino; + key.type = BTRFS_EXTENT_DATA_KEY; + key.offset = search_start; + ret = btrfs_search_slot(NULL, root, , path, 0, 0); + if (ret < 0) + goto out; + if (ret > 0 && path->slots[0] > 0) + path->slots[0]--; + + while (search_start < end) { + struct extent_buffer *leaf = path->nodes[0]; + int slot = path->slots[0]; + struct btrfs_file_extent_item *fi; + u64 extent_end; + + if (slot >= btrfs_header_nritems(leaf)) { + ret = btrfs_next_leaf(root, path); + if (ret < 0) + goto out; + else if (ret > 0) + break; + continue; + } + + btrfs_item_key_to_cpu(leaf, , slot); + if (key.objectid < sctx->cur_ino || + key.type < BTRFS_EXTENT_DATA_KEY) + goto next; + if (key.objectid > sctx->cur_ino || + key.type > BTRFS_EXTENT_DATA_KEY || + key.offset >= end) + break; + + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); + if (btrfs_file_extent_type(leaf, fi) == + BTRFS_FILE_EXTENT_INLINE) { + u64
[PATCH] Btrfs: fix data loss after truncate when using the no-holes feature
From: Filipe MananaIf we have a file with an implicit hole (NO_HOLES feature enabled) that has an extent following the hole, delayed writes against regions of the file behind the hole happened before but were not yet flushed and then we truncate the file to a smaller size that lies inside the hole, we end up persisting a wrong disk_i_size value for our inode that leads to data loss after umounting and mounting again the filesystem or after the inode is evicted and loaded again. This happens because at inode.c:btrfs_truncate_inode_items() we end up setting last_size to the offset of the extent that we deleted and that followed the hole. We then pass that value to btrfs_ordered_update_i_size() which updates the inode's disk_i_size to a value smaller then the offset of the buffered (delayed) writes. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo $ xfs_io -c "truncate 60K" /mnt/foo --> inode's disk_i_size updated to 0 $ md5sum /mnt/foo 3c5ca3c3ab42f4b04d7e7eb0b0d4d806 /mnt/foo $ umount /dev/sdb $ mount /dev/sdb /mnt $ md5sum /mnt/foo d41d8cd98f00b204e9800998ecf8427e /mnt/foo --> Empty file, all data lost! Cc: # 3.14+ Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents") Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8eb6703..a0a38ce 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4703,8 +4703,12 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans, btrfs_abort_transaction(trans, ret); } error: - if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) + if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) { + if (last_size > new_size && + btrfs_fs_incompat(fs_info, NO_HOLES)) + last_size = new_size; btrfs_ordered_update_i_size(inode, last_size, NULL); + } btrfs_free_path(path); -- 2.7.0.rc3 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation
On Tue, Feb 14, 2017 at 12:57 AM, Qu Wenruowrote: > When balance(relocation) fails, btrfs-progs will report like: > > ERROR: error during balancing '/mnt/scratch': Input/output error > There may be more info in syslog - try dmesg | tail > > However kernel can't provide may useful info in many cases to locate the > problem. > > This patch will add error messages in relocation to help user and > developer to locate the problem. > > Signed-off-by: Qu Wenruo > --- > v2: > Fix typo where 'err' and 'ret' are used wrong. > --- > fs/btrfs/relocation.c | 60 > +++ > 1 file changed, 56 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 379711048fb0..d26809807c1b 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -4011,6 +4011,8 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > rc->block_rsv, rc->block_rsv->size, > BTRFS_RESERVE_FLUSH_ALL); > if (ret) { > + btrfs_err(fs_info, "failed to reserve space: %d(%s)", Please leave a space between %d and the opening parenthesis (a common style you follow but it's not correct in most western languages afaik, and makes things harder to the eye). > + ret, btrfs_decode_error(ret)); > err = ret; > break; > } > @@ -4019,6 +4021,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > if (IS_ERR(trans)) { > err = PTR_ERR(trans); > trans = NULL; > + btrfs_err(fs_info, > + "failed to start transaction: %d(%s)", Same, space before parenthesis missing. > + err, btrfs_decode_error(err)); > break; > } > restart: > @@ -4028,8 +4033,11 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > } > > ret = find_next_extent(rc, path, ); > - if (ret < 0) > + if (ret < 0) { > + btrfs_err(fs_info, "failed to find next extent: > %d(%s)", Same, space before parenthesis missing. > + ret, btrfs_decode_error(ret)); > err = ret; > + } > if (ret != 0) > break; > > @@ -4081,9 +4089,17 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > > if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { > ret = add_tree_block(rc, , path, ); > + if (ret < 0) > + btrfs_err(fs_info, > + "failed to record tree block: > %d(%s)", > + ret, btrfs_decode_error(ret)); > } else if (rc->stage == UPDATE_DATA_PTRS && >(flags & BTRFS_EXTENT_FLAG_DATA)) { > ret = add_data_references(rc, , path, ); > + if (ret < 0) > + btrfs_err(fs_info, > + "failed to record data extent: > %d(%s)", Same, space before parenthesis missing. > + ret, btrfs_decode_error(ret)); > } else { > btrfs_release_path(path); > ret = 0; > @@ -4103,6 +4119,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > rc->backref_cache.last_trans = trans->transid > - 1; > > if (ret != -EAGAIN) { > + btrfs_err(fs_info, > + "faild to relocate tree blocks: > %d(%s)", faild -> failed Same, space before parenthesis missing. > + ret, > btrfs_decode_error(ret)); > err = ret; > break; > } > @@ -4121,6 +4140,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > ret = relocate_data_extent(rc->data_inode, >, >cluster); > if (ret < 0) { > + btrfs_err(fs_info, > + "failed to relocate data extent: %d(%s)", Same, space before parenthesis missing. > + ret, btrfs_decode_error(ret)); >
Re: [PATCH] btrfs: make space cache inode readahead failure nonfatal
On Mon, Feb 13, 2017 at 10:53:12AM +0100, David Sterba wrote: > We do a readahead of the free space cache inode to speed things up but > the failure is not fatal, like in other readahead cases. Proper reads > would need to happen anyway and any errors would be caught there. > Reviewed-by: Liu BoThanks, -liubo > Signed-off-by: David Sterba > --- > fs/btrfs/free-space-cache.c | 10 +++--- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index 437580e84d9d..e96e981ef01e 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -286,14 +286,14 @@ int btrfs_truncate_free_space_cache(struct btrfs_root > *root, > return ret; > } > > -static int readahead_cache(struct inode *inode) > +static void readahead_cache(struct inode *inode) > { > struct file_ra_state *ra; > unsigned long last_index; > > ra = kzalloc(sizeof(*ra), GFP_NOFS); > if (!ra) > - return -ENOMEM; > + return; > > file_ra_state_init(ra, inode->i_mapping); > last_index = (i_size_read(inode) - 1) >> PAGE_SHIFT; > @@ -301,8 +301,6 @@ static int readahead_cache(struct inode *inode) > page_cache_sync_readahead(inode->i_mapping, ra, NULL, 0, last_index); > > kfree(ra); > - > - return 0; > } > > static int io_ctl_init(struct btrfs_io_ctl *io_ctl, struct inode *inode, > @@ -730,9 +728,7 @@ static int __load_free_space_cache(struct btrfs_root > *root, struct inode *inode, > if (ret) > return ret; > > - ret = readahead_cache(inode); > - if (ret) > - goto out; > + readahead_cache(inode); > > ret = io_ctl_prepare_pages(_ctl, inode, 1); > if (ret) > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation
On Tue, Feb 14, 2017 at 08:57:50AM +0800, Qu Wenruo wrote: > When balance(relocation) fails, btrfs-progs will report like: > > ERROR: error during balancing '/mnt/scratch': Input/output error > There may be more info in syslog - try dmesg | tail > > However kernel can't provide may useful info in many cases to locate the > problem. > > This patch will add error messages in relocation to help user and > developer to locate the problem. Reviewed-by: Liu BoThanks, -liubo > > Signed-off-by: Qu Wenruo > --- > v2: > Fix typo where 'err' and 'ret' are used wrong. > --- > fs/btrfs/relocation.c | 60 > +++ > 1 file changed, 56 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 379711048fb0..d26809807c1b 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -4011,6 +4011,8 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > rc->block_rsv, rc->block_rsv->size, > BTRFS_RESERVE_FLUSH_ALL); > if (ret) { > + btrfs_err(fs_info, "failed to reserve space: %d(%s)", > + ret, btrfs_decode_error(ret)); > err = ret; > break; > } > @@ -4019,6 +4021,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > if (IS_ERR(trans)) { > err = PTR_ERR(trans); > trans = NULL; > + btrfs_err(fs_info, > + "failed to start transaction: %d(%s)", > + err, btrfs_decode_error(err)); > break; > } > restart: > @@ -4028,8 +4033,11 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > } > > ret = find_next_extent(rc, path, ); > - if (ret < 0) > + if (ret < 0) { > + btrfs_err(fs_info, "failed to find next extent: %d(%s)", > + ret, btrfs_decode_error(ret)); > err = ret; > + } > if (ret != 0) > break; > > @@ -4081,9 +4089,17 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > > if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { > ret = add_tree_block(rc, , path, ); > + if (ret < 0) > + btrfs_err(fs_info, > + "failed to record tree block: %d(%s)", > + ret, btrfs_decode_error(ret)); > } else if (rc->stage == UPDATE_DATA_PTRS && > (flags & BTRFS_EXTENT_FLAG_DATA)) { > ret = add_data_references(rc, , path, ); > + if (ret < 0) > + btrfs_err(fs_info, > + "failed to record data extent: > %d(%s)", > + ret, btrfs_decode_error(ret)); > } else { > btrfs_release_path(path); > ret = 0; > @@ -4103,6 +4119,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > rc->backref_cache.last_trans = trans->transid - > 1; > > if (ret != -EAGAIN) { > + btrfs_err(fs_info, > + "faild to relocate tree blocks: %d(%s)", > + ret, btrfs_decode_error(ret)); > err = ret; > break; > } > @@ -4121,6 +4140,9 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > ret = relocate_data_extent(rc->data_inode, > , >cluster); > if (ret < 0) { > + btrfs_err(fs_info, > + "failed to relocate data extent: %d(%s)", > + ret, btrfs_decode_error(ret)); > err = ret; > break; > } > @@ -4147,8 +4169,12 @@ static noinline_for_stack int > relocate_block_group(struct reloc_control *rc) > if (!err) { > ret = relocate_file_extent_cluster(rc->data_inode, > >cluster); > - if (ret < 0) > + if (ret < 0) { > + btrfs_err(fs_info, > + "failed to relocate
Re: [PATCH 28/29] btrfs: remove unused parameters from __btrfs_write_out_cache
On Mon, Feb 13, 2017 at 10:34:47AM +0100, David Sterba wrote: > Both unused after the call to update_cache_item has been moved to > __btrfs_wait_cache_io. > Reviewed-by: Liu BoThanks, -liubo > Signed-off-by: David Sterba > --- > fs/btrfs/free-space-cache.c | 11 +++ > 1 file changed, 3 insertions(+), 8 deletions(-) > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c > index 0d02599d22bc..c169acef2a23 100644 > --- a/fs/btrfs/free-space-cache.c > +++ b/fs/btrfs/free-space-cache.c > @@ -1224,8 +1224,6 @@ int btrfs_wait_cache_io(struct btrfs_trans_handle > *trans, > * @ctl - the free space cache we are going to write out > * @block_group - the block_group for this cache if it belongs to a > block_group > * @trans - the trans handle > - * @path - the path to use > - * @offset - the offset for the key we'll insert > * > * This function writes out a free space cache struct to disk for quick > recovery > * on mount. This will return 0 if it was successful in writing the cache > out, > @@ -1235,8 +1233,7 @@ static int __btrfs_write_out_cache(struct btrfs_root > *root, struct inode *inode, > struct btrfs_free_space_ctl *ctl, > struct btrfs_block_group_cache *block_group, > struct btrfs_io_ctl *io_ctl, > -struct btrfs_trans_handle *trans, > -struct btrfs_path *path, u64 offset) > +struct btrfs_trans_handle *trans) > { > struct btrfs_fs_info *fs_info = root->fs_info; > struct extent_state *cached_state = NULL; > @@ -1394,8 +1391,7 @@ int btrfs_write_out_cache(struct btrfs_fs_info *fs_info, > return 0; > > ret = __btrfs_write_out_cache(root, inode, ctl, block_group, > - _group->io_ctl, trans, > - path, block_group->key.objectid); > + _group->io_ctl, trans); > if (ret) { > #ifdef DEBUG > btrfs_err(fs_info, > @@ -3542,8 +3538,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root, > return 0; > > memset(_ctl, 0, sizeof(io_ctl)); > - ret = __btrfs_write_out_cache(root, inode, ctl, NULL, _ctl, > - trans, path, 0); > + ret = __btrfs_write_out_cache(root, inode, ctl, NULL, _ctl, trans); > if (!ret) { > /* >* At this point writepages() didn't error out, so our metadata > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to dump/find parity of RAID-5 file?
On Mon, Feb 06, 2017 at 09:40:47PM +0100, Goffredo Baroncelli wrote: > > IIRC, the parity is spread across the disk stripes of the chunk. > > So first you have to find the logical-offset [LO] where the the file begins. > Then you have to map this offset to the chunk which holds the data. The chunk > has the following info: > - chunk start [CS], chunk length [CL] > - for each stripe: > where the stripe starts > > If you subtract the chunk-start from the logical-offset [ CO == LO-CS], you > will find the offset where the data belongs in the chunk. > > As stated above, the PARITY is spread across the chunk stripes. So (supposing > that the stripe size is 64K, the raid level is 5, the disks are three), > > - the first 64k of stripe 0, is data [0..64K) > - the first 64k of stripe 1, is data [64..128K) > - the first 64k of stripe 2 is parity, > > - the 2nd 64k of stripe 0 is parity, > - the 2nd 64k of stripe 1, is data [128..196K) > - the 2nd 64k of stripe 2, is data [192..256K) > > - the 3rd 64k of stripe 0, is data [256..320K) > - the 3rd 64k of stripe 1 is parity, > - the 3rd 64k of stripe 2, is data [320..384K) > and so on, > > To find the data, You have to compare the CO to the data [...) range. > > If you look to an my old patch (unfinished :-( ), you can find some example > to dump the different stripe > > [BTRFS-PROGS][PATCH][V2] Add two new commands: 'btrfs insp physical-find' and > 'btrfs insp physical-dump' > > Sorry for the delay, I was offline. Thanks for the details. I can understood "partiy spread across the chunk stripes" part. But unable to figure-out the first part regarding calculations. Raid5 With 3-devices each 512MB. Create single 128KB file("print 'Ab'+'a'*65534+'aB'+'b'*65533"). 'debug-tree' shows chunk tree as: item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 145096704) itemoff 15557 itemsize 144 length 134217728 owner 2 stripe_len 65536 type DATA|RAID5 io_align 65536 io_width 65536 sector_size 4096 num_stripes 3 sub_stripes 0 stripe 0 devid 3 offset 6368 dev_uuid 9a2a18f1-6193-44b9-aafc-23d161d66110 stripe 1 devid 2 offset 6368 dev_uuid e45ab907-c3a8-4dff-af9f-2ae5fd38ffd6 stripe 2 devid 1 offset 83034112 dev_uuid 428c04d9-37da-454a-b7b2-f6fe88580de2 and fs-tree shows: item 13 key (145227776 EXTENT_ITEM 131072) itemoff 15788 itemsize 53 extent refs 1 gen 7 flags DATA extent data backref root 5 objectid 257 offset 0 count 1 >From above, I assume: LO=145227776 CS=145096704 and CL=134217728 CO=145227776 - 145096704 => CO = 131072 Quite confused from here :s I'll look into your patches to understand more. I hope sometime in future we will have your finished patches :) 'physical-find' and 'physical-find' commands will be really useful for debugging/testing and learning purposes. thanks. Cheers. Lakshmipathi.G -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How to dump/find parity of RAID-5 file?
On Mon, Feb 06, 2017 at 09:36:58AM +0800, Qu Wenruo wrote: > > Please note the following things when calculating RAID5/6 data and P/Q > location: > > 1) Basic layout > The chunk stripe only shows the *first* full stripe layout. > And it always follows the sequence of "Data0, Data1, ... Data N, Parity" > > And one full stripe is consistent of N * 64K (fixed yet). > > > 2) Device rotation > RAID5/6 do device rotation. > So the *second* full stripe will have the following layout: > "Data 1, Data2, ... Data N, Parity, Data 0" > > > You can refer to my offline-scrub patchset to see how it assemble the > RAID5/6 mapping, which I believe it is easier than current btrfs_map_block() > implementation. > [PATCH v2 19/19] btrfs-progs: fsck: Introduce offline scrub function > > Thanks, > Qu Sorry for the delay in response, I was offline. Thanks for the details. I think now I have better idea about (1)Basic layout and (2)Device rotation. Will look into offline-scrub patches to explore more about exact mapping and on-disk layouts. I'm trying to understand layout for simple Raid-5 with 3-drives with 128KB file-size. Will try to understand Raid-6 little later. thanks. Cheers. Lakshmipathi.G -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 24/29] btrfs: remove unused parameters from btrfs_cmp_data
On Mon, Feb 13, 2017 at 10:34:35AM +0100, David Sterba wrote: > After the page locking has been reworked, we get all pages prepared via > cmp_pages. > Reviewed-by: Liu BoThanks, -liubo > Signed-off-by: David Sterba > --- > fs/btrfs/ioctl.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 91188a2ac5a1..1f13f8416d29 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -3053,8 +3053,7 @@ static int btrfs_cmp_data_prepare(struct inode *src, > u64 loff, > return 0; > } > > -static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst, > - u64 dst_loff, u64 len, struct cmp_pages *cmp) > +static int btrfs_cmp_data(u64 len, struct cmp_pages *cmp) > { > int ret = 0; > int i; > @@ -3221,7 +3220,7 @@ static int btrfs_extent_same(struct inode *src, u64 > loff, u64 olen, > } > > /* pass original length for comparison so we stay within i_size */ > - ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, ); > + ret = btrfs_cmp_data(olen, ); > if (ret == 0) > ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1); > > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FS gives kernel UPS on attempt to create snapshot and after running balance it's unmountable.
[root@server ~]# btrfs-show-super -af /dev/sdc superblock: bytenr=65536, device=/dev/sdc - csum_type 0 (crc32c) csum_size 4 csum0x17d56ce0 [match] bytenr 65536 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid0576d577-8954-4a60-a02b-9492b3c29318 label main_pool generation 150682 root5223857717248 sys_array_size 321 chunk_root_generation 150678 root_level 1 chunk_root 8669488005120 chunk_root_level1 log_root0 log_root_transid0 log_root_level 0 total_bytes 16003191472128 bytes_used 6411278503936 sectorsize 4096 nodesize16384 leafsize16384 stripesize 4096 root_dir6 num_devices 8 compat_flags0x0 compat_ro_flags 0x0 incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) cache_generation150682 uuid_tree_generation150679 dev_item.uuid 46abffa8-7afe-451f-93c6-abb8e589c4e8 dev_item.fsid 0576d577-8954-4a60-a02b-9492b3c29318 [match] dev_item.type 0 dev_item.total_bytes2000398934016 dev_item.bytes_used 1647136735232 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size4096 dev_item.devid 1 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 sys_chunk_array[2048]: item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 8669487824896) length 67108864 owner 2 stripe_len 65536 type SYSTEM|RAID10 io_align 65536 io_width 65536 sector_size 4096 num_stripes 8 sub_stripes 2 stripe 0 devid 7 offset 1083674984448 dev_uuid 566fb8a3-d6de-4230-8b70-a5fda0a120f6 stripe 1 devid 8 offset 1083674984448 dev_uuid 845aefb2-e0a6-479a-957b-a82fb7207d6c stripe 2 devid 1 offset 1365901312 dev_uuid 46abffa8-7afe-451f-93c6-abb8e589c4e8 stripe 3 devid 3 offset 1345978368 dev_uuid 95921633-2fc1-479f-a3ba-e6e5a1989755 stripe 4 devid 4 offset 1345978368 dev_uuid 20828f0e-4661-4987-ac11-72814c1e423a stripe 5 devid 5 offset 1345978368 dev_uuid 2c3cd71f-5178-48e7-8032-6b6eec023197 stripe 6 devid 6 offset 1345978368 dev_uuid 806a47e5-cac4-41c9-abb9-5c49506459e1 stripe 7 devid 2 offset 1345978368 dev_uuid e1358e0e-edaf-4505-9c71-ed0862c45841 backup_roots[4]: backup 0: backup_tree_root: 5223857717248 gen: 150680 level: 1 backup_chunk_root: 8669488005120 gen: 150678 level: 1 backup_extent_root: 5223867383808 gen: 150680 level: 2 backup_fs_root: 0 gen: 0 level: 0 backup_dev_root:5224791523328 gen: 150680 level: 1 backup_csum_root: 5224802140160 gen: 150680 level: 3 backup_total_bytes: 16003191472128 backup_bytes_used: 6411278503936 backup_num_devices: 8 backup 1: backup_tree_root: 5224155807744 gen: 150681 level: 1 backup_chunk_root: 8669488005120 gen: 150678 level: 1 backup_extent_root: 5224156233728 gen: 150681 level: 2 backup_fs_root: 0 gen: 0 level: 0 backup_dev_root:5224633155584 gen: 150681 level: 1 backup_csum_root: 5224634941440 gen: 150681 level: 3 backup_total_bytes: 16003191472128 backup_bytes_used: 6411278503936 backup_num_devices: 8 backup 2: backup_tree_root: 5223857717248 gen: 150682 level: 1 backup_chunk_root: 8669488005120 gen: 150678 level: 1 backup_extent_root: 5223867383808 gen: 150682 level: 2 backup_fs_root: 0 gen: 0 level: 0 backup_dev_root:5224622358528 gen: 150682 level: 1 backup_csum_root: 5224675344384 gen: 150682 level: 3 backup_total_bytes: 16003191472128 backup_bytes_used: 6411278503936
Re: [PATCH 21/29] btrfs: remove unused parameter from create_snapshot
On Mon, Feb 13, 2017 at 10:34:26AM +0100, David Sterba wrote: > The name parameters have never been used, as the name is passed via the > dentry. > Reviewed-by: Liu BoThanks, -liubo > Signed-off-by: David Sterba > --- > fs/btrfs/ioctl.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index e8c7c313c113..91188a2ac5a1 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -656,7 +656,7 @@ static void btrfs_wait_for_no_snapshoting_writes(struct > btrfs_root *root) > } > > static int create_snapshot(struct btrfs_root *root, struct inode *dir, > -struct dentry *dentry, char *name, int namelen, > +struct dentry *dentry, > u64 *async_transid, bool readonly, > struct btrfs_qgroup_inherit *inherit) > { > @@ -871,7 +871,7 @@ static noinline int btrfs_mksubvol(const struct path > *parent, > goto out_up_read; > > if (snap_src) { > - error = create_snapshot(snap_src, dir, dentry, name, namelen, > + error = create_snapshot(snap_src, dir, dentry, > async_transid, readonly, inherit); > } else { > error = create_subvol(dir, dentry, name, namelen, > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 17/29] btrfs: remove unused parameter from extent_write_cache_pages
On Mon, Feb 13, 2017 at 10:34:13AM +0100, David Sterba wrote: > The 'tree' was used to call locking hook that does not exist anymore. > > Signed-off-by: David Sterba> --- > fs/btrfs/extent_io.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index f683fa5a4b91..22a2f2fa62c7 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -3916,8 +3916,7 @@ int btree_write_cache_pages(struct address_space > *mapping, > * WB_SYNC_ALL then we were called for data integrity and we must wait for > * existing IO to complete. > */ > -static int extent_write_cache_pages(struct extent_io_tree *tree, > - struct address_space *mapping, > +static int extent_write_cache_pages(struct address_space *mapping, >struct writeback_control *wbc, >writepage_t writepage, void *data, >void (*flush_fn)(void *)) > @@ -4158,8 +4157,7 @@ int extent_writepages(struct extent_io_tree *tree, > .bio_flags = 0, > }; > > - ret = extent_write_cache_pages(tree, mapping, wbc, > -__extent_writepage, , > + ret = extent_write_cache_pages(mapping, wbc, __extent_writepage, , > flush_write_bio); Are we going to leave {btrfs,extent}_{read,write}pages untouched? Thanks, -liubo > flush_epd_write_bio(); > return ret; > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/29] btrfs: remove unused parameter from submit_extent_page
On Mon, Feb 13, 2017 at 10:34:04AM +0100, David Sterba wrote: > This used to hold number of maximum pages to allocate, but this is now > limited by BIO_MAX_PAGES. > > Signed-off-by: David Sterba> --- > fs/btrfs/extent_io.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c > index 9df6ed30de00..00f3be5c4f2d 100644 > --- a/fs/btrfs/extent_io.c > +++ b/fs/btrfs/extent_io.c > @@ -2759,7 +2759,6 @@ static int submit_extent_page(int op, int op_flags, > struct extent_io_tree *tree, > size_t size, unsigned long offset, > struct block_device *bdev, > struct bio **bio_ret, > - unsigned long max_pages, > bio_end_io_t end_io_func, > int mirror_num, > unsigned long prev_bio_flags, > @@ -3063,7 +3062,7 @@ static int __do_readpage(struct extent_io_tree *tree, > pnr -= page->index; > ret = submit_extent_page(REQ_OP_READ, read_flags, tree, NULL, >page, sector, disk_io_size, pg_offset, > - bdev, bio, pnr, > + bdev, bio, >end_bio_extent_readpage, mirror_num, >*bio_flags, >this_bio_flag, @pnr is only used here, it can be removed, too. > @@ -3434,7 +3433,7 @@ static noinline_for_stack int > __extent_writepage_io(struct inode *inode, > > ret = submit_extent_page(REQ_OP_WRITE, write_flags, tree, wbc, >page, sector, iosize, pg_offset, > - bdev, >bio, max_nr, > + bdev, >bio, >end_bio_extent_writepage, >0, 0, 0, false); > if (ret) { So does @max_nr. With removing those two, Reviewed-by: Liu Bo Thanks, -liubo > @@ -3751,7 +3750,7 @@ static noinline_for_stack int write_one_eb(struct > extent_buffer *eb, > set_page_writeback(p); > ret = submit_extent_page(REQ_OP_WRITE, write_flags, tree, wbc, >p, offset >> 9, PAGE_SIZE, 0, bdev, > - >bio, -1, > + >bio, >end_bio_extent_buffer_writepage, >0, epd->bio_flags, bio_flags, false); > epd->bio_flags = bio_flags; > -- > 2.10.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use helper to get used bytes of space_info
On Tue, Feb 14, 2017 at 04:56:06PM +0100, David Sterba wrote: > On Mon, Feb 13, 2017 at 03:42:21PM -0800, Liu Bo wrote: > > This uses a helper instead of open code around used byte of space_info > > everywhere. > > > > Signed-off-by: Liu Bo> > Reviewed-by: David Sterba > > > --- > > fs/btrfs/extent-tree.c | 42 -- > > 1 file changed, 20 insertions(+), 22 deletions(-) > > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > > index 097fa4a..b2b0b82 100644 > > --- a/fs/btrfs/extent-tree.c > > +++ b/fs/btrfs/extent-tree.c > > @@ -4118,6 +4118,15 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, > > int data) > > return ret; > > } > > > > +static u64 btrfs_space_info_used(struct btrfs_space_info *s_info, > > +bool may_used_included) > > +{ > > + ASSERT(s_info); > > + return s_info->bytes_used + s_info->bytes_reserved + > > + s_info->bytes_pinned + s_info->bytes_readonly + > > + ((may_used_included) ? s_info->bytes_may_use : 0); > > I'd rename may_used_included to 'may_use_included' so it matches > bytes_may_use. I agree, it sounds better. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/29] btrfs: remove unused parameter from btrfs_check_super_valid
On Tue, Feb 14, 2017 at 12:42:07PM +0100, David Sterba wrote: > On Mon, Feb 13, 2017 at 06:11:56PM -0800, Liu Bo wrote: > > On Mon, Feb 13, 2017 at 10:33:55AM +0100, David Sterba wrote: > > > None of the checks need to know the RO/RW status. > > > > > > > OK...there was a readonly check, which lets us skip all checks, > > it was removed by the below commit, should we add the check back? > > All the current checks do not modify the superblock, so it's read-only > and we can keep it as-is. The superblock is available from inside the > function anyway so we don't need to check sb_flags externally. I'll > update the changelog. Yes, this makes sense. Thanks, -liubo -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unexpected behavior involving file attributes and snapshots.
On 2017-02-14 11:46, Austin S. Hemmelgarn wrote: On 2017-02-14 11:07, Chris Murphy wrote: On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarnwrote: I was just experimenting with snapshots on 4.9.0, and came across some unexpected behavior. The simple explanation is that if you snapshot a subvolume, any files in the subvolume that have the NOCOW attribute will not have that attribute in the snapshot. Some further testing indicates that this is the only file attribute that isn't preserved (I checked all the chattr flags that BTRFS supports). Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets journal files with chattr +C, and I do manual snapshots of rootfs periodically, and those snapshots have journal files that have +C still set. Just tested on a different filesystem, and I'm not seeing it there either, I'll take a closer look at the FS I saw this on and see if I can figure out what's up. After poking around a bit further, the system crashed, and it looks like there was some memory corruption scattered throughout the kernel from one of the other modules I had loaded (now I get to spend a day or more figuring out which one and reporting that bug). Given the state of the kernel crash dump though, I'm actually somewhat surprised that things weren't misbehaving more spectacularly than this. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unexpected behavior involving file attributes and snapshots.
On 2017-02-14 11:07, Chris Murphy wrote: On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarnwrote: I was just experimenting with snapshots on 4.9.0, and came across some unexpected behavior. The simple explanation is that if you snapshot a subvolume, any files in the subvolume that have the NOCOW attribute will not have that attribute in the snapshot. Some further testing indicates that this is the only file attribute that isn't preserved (I checked all the chattr flags that BTRFS supports). Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets journal files with chattr +C, and I do manual snapshots of rootfs periodically, and those snapshots have journal files that have +C still set. Just tested on a different filesystem, and I'm not seeing it there either, I'll take a closer look at the FS I saw this on and see if I can figure out what's up. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: specify a new ordered extent type for create_io_em
On Mon, Feb 13, 2017 at 03:35:09PM -0800, Liu Bo wrote: > As 0 refers to an existing type BTRFS_ORDERED_IO_DONE, this specifies a > new type 'REGULAR' for regular IO. > > Signed-off-by: Liu BoReviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unexpected behavior involving file attributes and snapshots.
On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarnwrote: > I was just experimenting with snapshots on 4.9.0, and came across some > unexpected behavior. > > The simple explanation is that if you snapshot a subvolume, any files in the > subvolume that have the NOCOW attribute will not have that attribute in the > snapshot. Some further testing indicates that this is the only file > attribute that isn't preserved (I checked all the chattr flags that BTRFS > supports). Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets journal files with chattr +C, and I do manual snapshots of rootfs periodically, and those snapshots have journal files that have +C still set. -- Chris Murphy -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use helper to get used bytes of space_info
On Mon, Feb 13, 2017 at 03:42:21PM -0800, Liu Bo wrote: > This uses a helper instead of open code around used byte of space_info > everywhere. > > Signed-off-by: Liu BoReviewed-by: David Sterba > --- > fs/btrfs/extent-tree.c | 42 -- > 1 file changed, 20 insertions(+), 22 deletions(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 097fa4a..b2b0b82 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -4118,6 +4118,15 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, > int data) > return ret; > } > > +static u64 btrfs_space_info_used(struct btrfs_space_info *s_info, > + bool may_used_included) > +{ > + ASSERT(s_info); > + return s_info->bytes_used + s_info->bytes_reserved + > + s_info->bytes_pinned + s_info->bytes_readonly + > + ((may_used_included) ? s_info->bytes_may_use : 0); I'd rename may_used_included to 'may_use_included' so it matches bytes_may_use. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Unexpected behavior involving file attributes and snapshots.
On Tue, 14 Feb 2017 10:30:43 -0500 "Austin S. Hemmelgarn"wrote: > I was just experimenting with snapshots on 4.9.0, and came across some > unexpected behavior. > > The simple explanation is that if you snapshot a subvolume, any files in > the subvolume that have the NOCOW attribute will not have that attribute > in the snapshot. Some further testing indicates that this is the only > file attribute that isn't preserved (I checked all the chattr flags that > BTRFS supports). > > I'm kind of curious whether: > 1. This is actually documented somewhere, as it's somewhat unexpected > given that everything else is preserved when snapshotting. > 2. This is intended behavior, or just happens to be a side effect of the > implementation. I don't seem to get this on 4.4.45 and 4.4.47. $ btrfs sub create test Create subvolume './test' $ touch test/abc $ chattr +C test/abc $ echo def > test/abc $ ls -la test/abc -rw-r--r-- 1 rm rm 4 Feb 14 20:52 test/abc $ lsattr test/abc ---C test/abc $ btrfs sub snap test test2 Create a snapshot of 'test' in './test2' $ ls -la test2/abc -rw-r--r-- 1 rm rm 4 Feb 14 20:52 test2/abc $ lsattr test2/abc ---C test2/abc -- With respect, Roman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: try to avoid acquiring free space ctl's lock
On Mon, Feb 13, 2017 at 03:42:30PM -0800, Liu Bo wrote: > We don't need to take the lock if the block group has not been cached. > > Signed-off-by: Liu BoReviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
problem: can't mount superblock
dear all, two days ago I got an old 1TB external USB HDD drive. I scanned it using smartctl, long scan, no errors. I created one partition /dev/sde1, formatted it using BTRFS, and created two subvolumes. One subvolume was intended for backups. This morning I started copying some 200GB data on this subvolume, but something went haywire. I use Ubuntu xenial + backports, linux kernel 4.8.0-34-generic btrfs-progs 4.7.3-1 I now see that in the kernel log this line was printed many times > BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr **, rd 0, flush 0, > corrupt 0, gen 0 where ** progressed from 0 to 56; after that this line appeared > BTRFS: error (device sdd1) in btrfs_commit_transaction:2232: errno=-5 > IO failure (Error while writing out transaction) > BTRFS info (device sdd1): forced readonly then there was a traceback, and a lot more errors (see attached log). I then umounted the volume, but I could not remount it. (Yes I tried to turn it off and on again). Here are some command and their outputs. > # mount -t btrfs -o ro,recover /dev/sde1 /mnt > mount: /dev/sde1: can't read superblock when I run the above, The kernel log reports > BTRFS error (device sdd1): bdev /dev/sde1 errs: wr 73, rd 44, flush 0, > corrupt 0, gen 0 where the "rd" part increases each time Another command. > # btrfs check --repair /dev/sde1 > enabling repair mode > Checking filesystem on /dev/sde1 > UUID: 491f1bb6-a477-4f3d-a066-19cabaa9 > checking extents > Fixed 0 roots. > checking free space cache > cache and super generation don't match, space cache will be invalidated > checking fs roots > checking csums > checking root refs > found 90068570112 bytes used err is 0 > total csum bytes: 87728132 > total tree bytes: 212680704 > total fs tree bytes: 110411776 > total extent tree bytes: 5750784 > btree space waste bytes: 2363 > file data blocks allocated: 89855889408 > referenced 89855889408 Note that smartctl reports no errors in the HDD whatsoever... ( whatever error occurred, it was maybe in the USB transport). Maybe the next step may be "btrfs rescue super-recover" but I would like your advice. Thanks, a. ps: please reply to me too, I did not subscribe to this list Feb 14 09:36:34 localhost kernel: [431222.636024] usb 1-3: new high-speed USB device number 3 using ehci-pci Feb 14 09:36:34 localhost kernel: [431222.785672] usb 1-3: New USB device found, idVendor=0bc2, idProduct=2321 Feb 14 09:36:34 localhost kernel: [431222.785675] usb 1-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1 Feb 14 09:36:34 localhost kernel: [431222.785676] usb 1-3: Product: Expansion Feb 14 09:36:34 localhost kernel: [431222.785678] usb 1-3: Manufacturer: Seagate Feb 14 09:36:34 localhost kernel: [431222.785679] usb 1-3: SerialNumber: NA4BS1FY Feb 14 09:36:34 localhost kernel: [431222.787890] scsi host7: uas Feb 14 09:36:34 localhost kernel: [431222.824806] sd 7:0:0:0: Attached scsi generic sg5 type 0 Feb 14 09:36:34 localhost kernel: [431222.825556] sd 7:0:0:0: [sde] Spinning up disk... Feb 14 09:36:37 localhost kernel: [431223.840036] ...ready Feb 14 09:36:37 localhost kernel: [431225.888521] sd 7:0:0:0: [sde] 3907029167 512-byte logical blocks: (2.00 TB/1.82 TiB) Feb 14 09:36:37 localhost kernel: [431225.888524] sd 7:0:0:0: [sde] 2048-byte physical blocks Feb 14 09:36:37 localhost kernel: [431226.167318] sd 7:0:0:0: [sde] Attached SCSI disk Feb 14 09:38:19 localhost kernel: [431327.955353] EXT4-fs (sde1): mounted filesystem with ordered data mode. Opts: (null) Feb 14 10:19:00 localhost kernel: [433769.032880] floppy: error -5 while reading block 0 Feb 14 10:19:21 localhost kernel: [433789.768849] blk_update_request: I/O error, dev fd0, sector 0 Feb 14 10:19:21 localhost kernel: [433789.768873] floppy: error -5 while reading block 0 Feb 14 10:19:27 localhost kernel: [433796.434449] blk_update_request: I/O error, dev sdd, sector 65435512 Feb 14 10:19:28 localhost kernel: [433796.774734] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK Feb 14 10:19:28 localhost kernel: [433796.774740] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7a 30 00 00 f0 00 Feb 14 10:19:28 localhost kernel: [433796.774743] blk_update_request: I/O error, dev sdd, sector 65436208 Feb 14 10:19:28 localhost kernel: [433797.007909] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK Feb 14 10:19:28 localhost kernel: [433797.007916] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7c 10 00 00 f0 00 Feb 14 10:19:28 localhost kernel: [433797.007919] blk_update_request: I/O error, dev sdd, sector 65436688 Feb 14 10:19:28 localhost kernel: [433797.230799] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK Feb 14 10:19:28 localhost kernel: [433797.230805] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7d f0 00 00 f0 00 Feb 14 10:19:28 localhost kernel: [433797.230808] blk_update_request: I/O error, dev sdd, sector 65437168 Feb 14 10:19:28
Unexpected behavior involving file attributes and snapshots.
I was just experimenting with snapshots on 4.9.0, and came across some unexpected behavior. The simple explanation is that if you snapshot a subvolume, any files in the subvolume that have the NOCOW attribute will not have that attribute in the snapshot. Some further testing indicates that this is the only file attribute that isn't preserved (I checked all the chattr flags that BTRFS supports). I'm kind of curious whether: 1. This is actually documented somewhere, as it's somewhat unexpected given that everything else is preserved when snapshotting. 2. This is intended behavior, or just happens to be a side effect of the implementation. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists
Hi again, so, it seems that I've solved the problem: After having to umount/mount the FS several times to get btrfs-cleaner to finish, I thought of the "failed to load free space [...], rebuilding it now" type errors and decided to try the clear_cache mount option. Since then, my home server has been running its backups regularly again. Furthermore, I was able back up my desktop again via send/recv (the rsync based backup is still running, but I expect it to succeed). The kernel log has also stayed clean. Kai, I'd be curious whether clear_cache will help in your case, too, if you haven't tried it already. Greetings -- Marc Joliet -- "People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup signature.asc Description: This is a digitally signed message part.
"btrfs receive" throws "No space left on device" on empty and large enough fs
Hi all I recently submitted a bug report to launchpad ( https://bugs.launchpad.net/ubuntu/+source/btrfs-tools/+bug/1664013 ) but I now found out about this mailing list, which may be a better place to post. Basically, whenever many files are created at high throughput in my empty btrfs file system, it throws an ENOSPC error.I have had this happen with both "btrfs receive" and with "rsync" but not when creating a single large file. Both "rsync" and "btrfs receive" fail not when creating a new file but when renaming/moving it (see link). I hope the description of the error in launchpad is helpful (please let me know if I need to re-post it here). I am happy to provide additional information or run other tests if this can help. Thanks! Luca -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] btrfs: ulist: make the finalization function public
On Tue, Feb 14, 2017 at 05:29:14PM +0800, Qu Wenruo wrote: > > > At 02/13/2017 10:30 PM, David Sterba wrote: > > Make ulist_fini externally visible so the ulist API is complete. > > Looks good to me. > > While I'm not pretty sure if the name "ulist_fini" is good enough. > > Maybe "ulist_finish" or "ulist_release" (just like release_path and > free_path)? Yeah, I don't like the name either, ulist_release sounds ok to me. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 11/29] btrfs: remove unused parameter from btrfs_check_super_valid
On Mon, Feb 13, 2017 at 06:11:56PM -0800, Liu Bo wrote: > On Mon, Feb 13, 2017 at 10:33:55AM +0100, David Sterba wrote: > > None of the checks need to know the RO/RW status. > > > > OK...there was a readonly check, which lets us skip all checks, > it was removed by the below commit, should we add the check back? All the current checks do not modify the superblock, so it's read-only and we can keep it as-is. The superblock is available from inside the function anyway so we don't need to check sb_flags externally. I'll update the changelog. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Root volume (ID 5) in deleting state
It looks you're right! On a different machine: # btrfs sub list / | grep -v lxc ID 327 gen 1959587 top level 5 path mnt/reaver ID 498 gen 593655 top level 5 path var/lib/machines # btrfs sub list / -d | wc -l 0 Ok, apparently it's a regression in one of the latest versions then. But, it seems quite harmless. I'm glad my data are safe :) # uname -a Linux interceptor 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017 x86_64 GNU/Linux # btrfs fi show / Label: none uuid: 859dec5c-850c-4660-ad99-bc87456aa309 Total devices 1 FS bytes used 132.89GiB devid1 size 200.00GiB used 200.00GiB path /dev/mapper/vg0-btrfsroot As a side note, all of your disk space is allocated (200GiB of 200GiB). Even while there's still 70GiB of free space scattered around inside, this might lead to out-of-space issues, depending on how badly fragmented that free space is. I have not noticed this at all! # btrfs fi show / Label: none uuid: 859dec5c-850c-4660-ad99-bc87456aa309 Total devices 1 FS bytes used 134.23GiB devid1 size 200.00GiB used 200.00GiB path /dev/mapper/vg0-btrfsroot # btrfs fi df / Data, single: total=195.96GiB, used=131.58GiB System, single: total=3.00MiB, used=48.00KiB Metadata, single: total=4.03GiB, used=2.64GiB GlobalReserve, single: total=512.00MiB, used=0.00B After btrfs defrag there is no difference. btrfs fi show says still 200/200. I'll try to play with it. [ ... ] So, to get the numbers of total raw disk space allocation down, you need to defragment free space (compact the data), not defrag used space. You can even create pictures of space utilization in your btrfs filesystem, which might help understanding what it looks like right now: \o/ https://github.com/knorrie/btrfs-heatmap/ I've run into your tool yesterday while googling around this - thanks, it's really nice tool. Now rebalance is running and it seems to work well. Thank you for excellent responses and help! -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/7] btrfs: ulist: make the finalization function public
At 02/13/2017 10:30 PM, David Sterba wrote: Make ulist_fini externally visible so the ulist API is complete. Looks good to me. While I'm not pretty sure if the name "ulist_fini" is good enough. Maybe "ulist_finish" or "ulist_release" (just like release_path and free_path)? Thanks, Qu Signed-off-by: David Sterba--- fs/btrfs/ulist.c | 2 +- fs/btrfs/ulist.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c index b1434bb57e36..5deee56434fc 100644 --- a/fs/btrfs/ulist.c +++ b/fs/btrfs/ulist.c @@ -58,7 +58,7 @@ void ulist_init(struct ulist *ulist) * This is useful in cases where the base 'struct ulist' has been statically * allocated. */ -static void ulist_fini(struct ulist *ulist) +void ulist_fini(struct ulist *ulist) { struct ulist_node *node; struct ulist_node *next; diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h index 007b22fff3f9..1a4130443d7e 100644 --- a/fs/btrfs/ulist.h +++ b/fs/btrfs/ulist.h @@ -44,6 +44,7 @@ struct ulist { }; void ulist_init(struct ulist *ulist); +void ulist_fini(struct ulist *ulist); void ulist_reinit(struct ulist *ulist); struct ulist *ulist_alloc(gfp_t gfp_mask); void ulist_free(struct ulist *ulist); -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/7] btrfs: embed extent_changeset::range_changed to the structure
At 02/13/2017 10:30 PM, David Sterba wrote: We can embed range_changed to the extent changeset to address following problems: - no need to allocate ulist dynamically, we also get rid of the GFP_NOFS for free - fix lack of allocation failure checking in btrfs_qgroup_reserve_data Nice catch. The stack consuption where extent_changeset is used slightly increases: before: 16 after: 16 - 8 (for pointer) + 32 (sizeof ulist) = 40 Which is bearable. Signed-off-by: David SterbaReviewed-by: Qu Wenruo Thanks, Qu --- fs/btrfs/extent_io.c | 2 +- fs/btrfs/extent_io.h | 2 +- fs/btrfs/qgroup.c| 24 +--- 3 files changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9df6ed30de00..9140847bfb0c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -144,7 +144,7 @@ static void add_extent_changeset(struct extent_state *state, unsigned bits, if (!set && (state->state & bits) == 0) return; changeset->bytes_changed += state->end - state->start + 1; - ret = ulist_add(changeset->range_changed, state->start, state->end, + ret = ulist_add(>range_changed, state->start, state->end, GFP_ATOMIC); /* ENOMEM */ BUG_ON(ret < 0); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 4551a5b4b8f5..270d03be290e 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -193,7 +193,7 @@ struct extent_changeset { u64 bytes_changed; /* Changed ranges */ - struct ulist *range_changed; + struct ulist range_changed; }; static inline void extent_set_compress_type(unsigned long *bio_flags, diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 971701328229..7cc2e2221f55 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2828,7 +2828,7 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len) return 0; changeset.bytes_changed = 0; - changeset.range_changed = ulist_alloc(GFP_NOFS); + ulist_init(_changed); ret = set_record_extent_bits(_I(inode)->io_tree, start, start + len -1, EXTENT_QGROUP_RESERVED, ); trace_btrfs_qgroup_reserve_data(inode, start, len, @@ -2840,17 +2840,17 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 start, u64 len) if (ret < 0) goto cleanup; - ulist_free(changeset.range_changed); + ulist_fini(_changed); return ret; cleanup: /* cleanup already reserved ranges */ ULIST_ITER_INIT(); - while ((unode = ulist_next(changeset.range_changed, ))) + while ((unode = ulist_next(_changed, ))) clear_extent_bit(_I(inode)->io_tree, unode->val, unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL, GFP_NOFS); - ulist_free(changeset.range_changed); + ulist_fini(_changed); return ret; } @@ -2862,10 +2862,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len, int ret; changeset.bytes_changed = 0; - changeset.range_changed = ulist_alloc(GFP_NOFS); - if (!changeset.range_changed) - return -ENOMEM; - + ulist_init(_changed); ret = clear_record_extent_bits(_I(inode)->io_tree, start, start + len -1, EXTENT_QGROUP_RESERVED, ); if (ret < 0) @@ -2878,7 +2875,7 @@ static int __btrfs_qgroup_release_data(struct inode *inode, u64 start, u64 len, trace_btrfs_qgroup_release_data(inode, start, len, changeset.bytes_changed, trace_op); out: - ulist_free(changeset.range_changed); + ulist_fini(_changed); return ret; } @@ -2976,22 +2973,19 @@ void btrfs_qgroup_check_reserved_leak(struct inode *inode) int ret; changeset.bytes_changed = 0; - changeset.range_changed = ulist_alloc(GFP_NOFS); - if (WARN_ON(!changeset.range_changed)) - return; - + ulist_init(_changed); ret = clear_record_extent_bits(_I(inode)->io_tree, 0, (u64)-1, EXTENT_QGROUP_RESERVED, ); WARN_ON(ret < 0); if (WARN_ON(changeset.bytes_changed)) { ULIST_ITER_INIT(); - while ((unode = ulist_next(changeset.range_changed, ))) { + while ((unode = ulist_next(_changed, ))) { btrfs_warn(BTRFS_I(inode)->root->fs_info, "leaking qgroup reserved space, ino: %lu, start: %llu, end: %llu", inode->i_ino, unode->val, unode->aux); } qgroup_free(BTRFS_I(inode)->root, changeset.bytes_changed); } - ulist_free(changeset.range_changed); + ulist_fini(_changed); } -- To unsubscribe from this list:
Re: [PATCH 7/7] nonblocking aio: btrfs
On 14.02.2017 04:46, Goldwyn Rodrigues wrote: > From: Goldwyn Rodrigues> > Return EAGAIN if any of the following checks fail > + i_rwsem is lockable > + NODATACOW or PREALLOC is set perhaps you wanted to say "NODATACOW or PREALLOC is _not_ set"? See below > + Can nocow at the desired location > + Writing beyond end of file > > Signed-off-by: Goldwyn Rodrigues > --- > fs/btrfs/file.c | 25 - > fs/btrfs/inode.c | 3 +++ > 2 files changed, 23 insertions(+), 5 deletions(-) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 3a14c87..f4dcc7a 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -1804,12 +1804,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb > *iocb, > ssize_t num_written = 0; > bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host); > ssize_t err; > - loff_t pos; > - size_t count; > + loff_t pos = iocb->ki_pos; > + size_t count = iov_iter_count(from); > loff_t oldsize; > int clean_page = 0; > > - inode_lock(inode); > + if ((iocb->ki_flags & IOCB_NONBLOCKING) && > + (iocb->ki_flags & IOCB_DIRECT)) { > + /* Don't sleep on inode rwsem */ > + if (!inode_trylock(inode)) > + return -EAGAIN; > + /* > + * We will allocate space in case nodatacow is not set, > + * so bail > + */ > + if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | > + BTRFS_INODE_PREALLOC)) || The first part of the condition will evaluate to true only when both btrfs_inode_nodatacow and btrfs_inodeprealloc are not set. The code contradicts your statement in the change log. > + check_can_nocow(inode, pos, ) <= 0) { > + inode_unlock(inode); > + return -EAGAIN; > + } > + } else > + inode_lock(inode); > + > err = generic_write_checks(iocb, from); > if (err <= 0) { > inode_unlock(inode); > @@ -1843,8 +1860,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >*/ > update_time_for_write(inode); > > - pos = iocb->ki_pos; > - count = iov_iter_count(from); > start_pos = round_down(pos, root->sectorsize); > oldsize = i_size_read(inode); > if (start_pos > oldsize) { > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index be4da91..9911ebd 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -8690,6 +8690,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, > struct iov_iter *iter) > if (offset + count <= inode->i_size) { > inode_unlock(inode); > relock = true; > + } else if (iocb->ki_flags & IOCB_NONBLOCKING) { > + ret = -EAGAIN; > + goto out; > } > ret = btrfs_delalloc_reserve_space(inode, offset, count); > if (ret) > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html