Re: btrfs/125 deadlock using nospace_cache or space_cache=v2

2017-02-14 Thread Qu Wenruo

State updated:

The deadlock seems to be caused by 2 bugs:

1) Bad error handling in run_delalloc_nocow()
   The direct cause is, btrfs_reloc_clone_csums() fails and return -EIO.
   Then error handler will call extent_clear_unlock_delalloc() to
   clear dirty flag and end writeback of the resting pages in the
   extent.

   However this makes the ordered extent not happy, as it just skips IO
   of the remaining pages, which ordered extent relies on to finish.

   This is quite easy to reproduce using the following modification:
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1e861a0..b9d0bcb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1497,8 +1497,11 @@ static noinline int run_delalloc_nocow(struct 
inode *inode,


if (root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
+   ret = -EIO;
+   /*
ret = btrfs_reloc_clone_csums(inode, cur_offset,
  num_bytes);
+ */
if (ret) {
if (!nolock && nocow)

btrfs_end_write_no_snapshoting(root);


   Then any balance will cause btrfs to wait ordered extent which will
   never finish, just like what we encountered.

2) RAID5/6 recover not working in some tree operation.
   In fact, btrfs succeeded to mount the fs, so RAID5/6 recover code is
   working, at least for some trees.

   And btrfs succeeded in recovering all the data with correct checksum,
   if using normal read(cat works here) before balance.

   However it fails to read csum tree and causes run_delalloc_nocow() to
   return -EIO, which leads to above bug.

   So there is something related to RAID5/6 code, maybe readahead? which
   contributes to this bug.

I'll continue digging and keep the state updated if anyone is interested 
in this bug.


Thanks,
Qu


At 02/07/2017 04:02 PM, Anand Jain wrote:


Hi Qu,

 I don't think I have seen this before, I don't know the reason
 why I wrote this, may be to test encryption, however it was all
 with default options.

 But now I could reproduce and, looks like balance fails to
 start with IO error though the mount is successful.
--
# tail -f ./results/btrfs/125.full
intense and takes potentially very long. It is recommended to
use the balance filters to narrow down the balanced data.
Use 'btrfs balance start --full-balance' option to skip this
warning. The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1ERROR: error during balancing '/scratch':
Input/output error
There may be more info in syslog - try dmesg | tail

Starting balance without any filters.
failed: '/root/bin/btrfs balance start /scratch'


 This must be fixed. For debugging if I add a sync before previous
 unmount, the problem isn't reproduced. just fyi. Strange.

---
diff --git a/tests/btrfs/125 b/tests/btrfs/125
index 91aa8d8c3f4d..4d4316ca9f6e 100755
--- a/tests/btrfs/125
+++ b/tests/btrfs/125
@@ -133,6 +133,7 @@ echo "-Mount normal-" >> $seqres.full
 echo
 echo "Mount normal and balance"

+_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
 _scratch_unmount
 _run_btrfs_util_prog device scan
 _scratch_mount >> $seqres.full 2>&1
--

 HTH.

Thanks, Anand


On 02/07/17 14:09, Qu Wenruo wrote:

Hi Anand,

I found that btrfs/125 test case can only pass if we enabled space cache.

If using nospace_cache or space_cache=v2 mount option, it will get
blocked forever with the following callstack(the only blocked process):

[11382.046978] btrfs   D11128  6705   6057 0x
[11382.047356] Call Trace:
[11382.047668]  __schedule+0x2d4/0xae0
[11382.047956]  schedule+0x3d/0x90
[11382.048283]  btrfs_start_ordered_extent+0x160/0x200 [btrfs]
[11382.048630]  ? wake_atomic_t_function+0x60/0x60
[11382.048958]  btrfs_wait_ordered_range+0x113/0x210 [btrfs]
[11382.049360]  btrfs_relocate_block_group+0x260/0x2b0 [btrfs]
[11382.049703]  btrfs_relocate_chunk+0x51/0xf0 [btrfs]
[11382.050073]  btrfs_balance+0xaa9/0x1610 [btrfs]
[11382.050404]  ? btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
[11382.050739]  btrfs_ioctl_balance+0x3a0/0x3b0 [btrfs]
[11382.051109]  btrfs_ioctl+0xbe7/0x27f0 [btrfs]
[11382.051430]  ? trace_hardirqs_on+0xd/0x10
[11382.051747]  ? free_object+0x74/0xa0
[11382.052084]  ? debug_object_free+0xf2/0x130
[11382.052413]  do_vfs_ioctl+0x94/0x710
[11382.052750]  ? enqueue_hrtimer+0x160/0x160
[11382.053090]  ? do_nanosleep+0x71/0x130
[11382.053431]  SyS_ioctl+0x79/0x90
[11382.053735]  entry_SYSCALL_64_fastpath+0x18/0xad
[11382.054570] RIP: 0033:0x7f397d7a6787

I also found in the test case, we only have 3 continuous data extents,
whose sizes are 1M, 68.5M and 31.5M respectively.

Original data block group:
0   1M 64M69.5M  101M   128M
| Ext A | Extent B(68.5M) |

Re: [PATCH v2] Btrfs: fix btrfs_decompress_buf2page()

2017-02-14 Thread Anand Jain




On 02/11/17 07:03, Omar Sandoval wrote:

From: Omar Sandoval 

If btrfs_decompress_buf2page() is handed a bio with its page in the
middle of the working buffer, then we adjust the offset into the working
buffer. After we copy into the bio, we advance the iterator by the
number of bytes we copied. Then, we have some logic to handle the case
of discontiguous pages and adjust the offset into the working buffer
again. However, if we didn't advance the bio to a new page, we may enter
this case in error, essentially repeating the adjustment that we already
made when we entered the function. The end result is bogus data in the
bio.

Previously, we only checked for this case when we advanced to a new
page, but the conversion to bio iterators changed that. This restores
the old, correct behavior.

A case I saw when testing with zlib was:

buf_start = 42769
total_out = 46865
working_bytes = total_out - buf_start = 4096
start_byte = 45056

The condition (total_out > start_byte && buf_start < start_byte) is
true, so we adjust the offset:

buf_offset = start_byte - buf_start = 2287
working_bytes -= buf_offset = 1809
current_buf_start = buf_start = 42769

Then, we copy

bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
buf_offset += bytes = 4096
working_bytes -= bytes = 0
current_buf_start += bytes = 44578

After bio_advance(), we are still in the same page, so start_byte is the
same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
which is true! So, we adjust the values again:

buf_offset = start_byte - buf_start = 2287
working_bytes = total_out - start_byte = 1809
current_buf_start = buf_start + buf_offset = 45056

But note that working_bytes was already zero before this, so we should
have stopped copying.



 Thanks Omar for nailing this down. How about a test case for this ?

-Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-14 Thread Qu Wenruo
Just as Filipe pointed out, the most time consuming parts of qgroup are
btrfs_qgroup_account_extents() and
btrfs_qgroup_prepare_account_extents().
Which both call btrfs_find_all_roots() to get old_roots and new_roots
ulist.

What makes things worse is, we're calling that expensive
btrfs_find_all_roots() at transaction committing time with
TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.

Such behavior is necessary for @new_roots search as current
btrfs_find_all_roots() can't do it correctly so we do call it just
before switch commit roots.

However for @old_roots search, it's not necessary as such search is
based on commit_root, so it will always be correct and we can move it
out of transaction committing.

This patch moves the @old_roots search part out of
commit_transaction(), so in theory we can half the time qgroup time
consumption at commit_transaction().

But please note that, this won't speedup qgroup overall, the total time
consumption is still the same, just reduce the performance stall.

Cc: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Update commit message to make it more clear.
  Don't call btrfs_find_all_roots() before insert qgroup extent record,
  so we can avoid wasting CPU for already inserted qgroup extent record.

PS:
If and only if we have fixed and proved btrfs_find_all_roots() can
get correct result with current root, then we can move all
expensive btrfs_find_all_roots() out of transaction committing.

However I strongly doubt if it's possible with current delayed ref,
and without such prove, move btrfs_find_all_roots() for @new_roots
out of transaction committing will just screw up qgroup accounting.

---
 fs/btrfs/delayed-ref.c | 20 
 fs/btrfs/qgroup.c  | 33 +
 fs/btrfs/qgroup.h  | 33 ++---
 3 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ef724a5fc30e..0ee927ef5a71 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_node *ref,
 struct btrfs_qgroup_extent_record *qrecord,
 u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
-int action, int is_data)
+int action, int is_data, int *qrecord_inserted_ret)
 {
struct btrfs_delayed_ref_head *existing;
struct btrfs_delayed_ref_head *head_ref = NULL;
struct btrfs_delayed_ref_root *delayed_refs;
int count_mod = 1;
int must_insert_reserved = 0;
+   int qrecord_inserted = 0;
 
/* If reserved is provided, it must be a data extent. */
BUG_ON(!is_data && reserved);
@@ -623,6 +624,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
if(btrfs_qgroup_trace_extent_nolock(fs_info,
delayed_refs, qrecord))
kfree(qrecord);
+   else
+   qrecord_inserted = 1;
}
 
spin_lock_init(_ref->lock);
@@ -650,6 +653,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
atomic_inc(_refs->num_entries);
trans->delayed_ref_updates++;
}
+   if (qrecord_inserted_ret)
+   *qrecord_inserted_ret = qrecord_inserted;
return head_ref;
 }
 
@@ -779,6 +784,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
+   int qrecord_inserted;
 
BUG_ON(extent_op && extent_op->is_data);
ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
@@ -806,12 +812,15 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
 * the spin lock
 */
head_ref = add_delayed_ref_head(fs_info, trans, _ref->node, record,
-   bytenr, num_bytes, 0, 0, action, 0);
+   bytenr, num_bytes, 0, 0, action, 0,
+   _inserted);
 
add_delayed_tree_ref(fs_info, trans, head_ref, >node, bytenr,
 num_bytes, parent, ref_root, level, action);
spin_unlock(_refs->lock);
 
+   if (qrecord_inserted)
+   return btrfs_qgroup_trace_extent_post(fs_info, record);
return 0;
 
 free_head_ref:
@@ -836,6 +845,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
+   int qrecord_inserted;
 
BUG_ON(extent_op && !extent_op->is_data);
ref = 

[PATCH v3] btrfs: relocation: Enhance kernel error output for relocation

2017-02-14 Thread Qu Wenruo
When balance(relocation) fails, btrfs-progs will report like:

ERROR: error during balancing '/mnt/scratch': Input/output error
There may be more info in syslog - try dmesg | tail

However kernel can't provide may useful info in many cases to locate the
problem.

This patch will add error messages in relocation to help user and
developer to locate the problem.

Signed-off-by: Qu Wenruo 
---
v2:
  Fix typo where 'err' and 'ret' are used wrong.
v3:
  Add space between error number and parenthesis of error string
  Fix gramma errors.
---
 fs/btrfs/relocation.c | 61 +++
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 379711048fb0..6de5800e17bc 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4011,6 +4011,8 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->block_rsv, rc->block_rsv->size,
BTRFS_RESERVE_FLUSH_ALL);
if (ret) {
+   btrfs_err(fs_info, "failed to reserve space: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4019,6 +4021,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
trans = NULL;
+   btrfs_err(fs_info,
+   "failed to start transaction: %d (%s)",
+ err, btrfs_decode_error(err));
break;
}
 restart:
@@ -4028,8 +4033,12 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
}
 
ret = find_next_extent(rc, path, );
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to find next extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
+   }
if (ret != 0)
break;
 
@@ -4081,9 +4090,17 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
 
if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
ret = add_tree_block(rc, , path, );
+   if (ret < 0)
+   btrfs_err(fs_info,
+   "failed to record tree block: %d (%s)",
+ ret, btrfs_decode_error(ret));
} else if (rc->stage == UPDATE_DATA_PTRS &&
   (flags & BTRFS_EXTENT_FLAG_DATA)) {
ret = add_data_references(rc, , path, );
+   if (ret < 0)
+   btrfs_err(fs_info,
+   "failed to record data extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
} else {
btrfs_release_path(path);
ret = 0;
@@ -4103,6 +4120,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->backref_cache.last_trans = trans->transid - 
1;
 
if (ret != -EAGAIN) {
+   btrfs_err(fs_info,
+   "failed to relocate tree blocks: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4121,6 +4141,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
ret = relocate_data_extent(rc->data_inode,
   , >cluster);
if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to relocate data extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4147,8 +4170,12 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
if (!err) {
ret = relocate_file_extent_cluster(rc->data_inode,
   >cluster);
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to relocate file extent cluster: %d (%s)",
+ ret, btrfs_decode_error(ret));
   

Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation

2017-02-14 Thread Qu Wenruo



At 02/15/2017 04:34 AM, Filipe Manana wrote:

On Tue, Feb 14, 2017 at 12:57 AM, Qu Wenruo  wrote:

When balance(relocation) fails, btrfs-progs will report like:

ERROR: error during balancing '/mnt/scratch': Input/output error
There may be more info in syslog - try dmesg | tail

However kernel can't provide may useful info in many cases to locate the
problem.

This patch will add error messages in relocation to help user and
developer to locate the problem.

Signed-off-by: Qu Wenruo 
---
v2:
  Fix typo where 'err' and 'ret' are used wrong.
---
 fs/btrfs/relocation.c | 60 +++
 1 file changed, 56 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 379711048fb0..d26809807c1b 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4011,6 +4011,8 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->block_rsv, rc->block_rsv->size,
BTRFS_RESERVE_FLUSH_ALL);
if (ret) {
+   btrfs_err(fs_info, "failed to reserve space: %d(%s)",


Please leave a space between %d and the opening parenthesis (a common
style you follow but it's not correct in most western languages afaik,
and makes things harder to the eye).



Thanks for pointing out my bad habit.

I'll avoid such usage in later patches.

Thanks,
Qu

+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4019,6 +4021,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
trans = NULL;
+   btrfs_err(fs_info,
+ "failed to start transaction: %d(%s)",


Same, space before parenthesis missing.


+ err, btrfs_decode_error(err));
break;
}
 restart:
@@ -4028,8 +4033,11 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
}

ret = find_next_extent(rc, path, );
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_err(fs_info, "failed to find next extent: %d(%s)",


Same, space before parenthesis missing.


+ ret, btrfs_decode_error(ret));
err = ret;
+   }
if (ret != 0)
break;

@@ -4081,9 +4089,17 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)

if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
ret = add_tree_block(rc, , path, );
+   if (ret < 0)
+   btrfs_err(fs_info,
+ "failed to record tree block: %d(%s)",
+ ret, btrfs_decode_error(ret));
} else if (rc->stage == UPDATE_DATA_PTRS &&
   (flags & BTRFS_EXTENT_FLAG_DATA)) {
ret = add_data_references(rc, , path, );
+   if (ret < 0)
+   btrfs_err(fs_info,
+ "failed to record data extent: 
%d(%s)",


Same, space before parenthesis missing.


+ ret, btrfs_decode_error(ret));
} else {
btrfs_release_path(path);
ret = 0;
@@ -4103,6 +4119,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->backref_cache.last_trans = trans->transid - 
1;

if (ret != -EAGAIN) {
+   btrfs_err(fs_info,
+   "faild to relocate tree blocks: %d(%s)",


faild -> failed

Same, space before parenthesis missing.


+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4121,6 +4140,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
ret = relocate_data_extent(rc->data_inode,
   , >cluster);
if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to relocate data extent: %d(%s)",


Same, space before parenthesis missing.


+ ret, btrfs_decode_error(ret));
err = ret;
break;
  

Re: [PATCH] Btrfs: fix data loss after truncate when using the no-holes feature

2017-02-14 Thread Liu Bo
On Tue, Feb 14, 2017 at 08:35:26PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we have a file with an implicit hole (NO_HOLES feature enabled) that
> has an extent following the hole, delayed writes against regions of the
> file behind the hole happened before but were not yet flushed and then
> we truncate the file to a smaller size that lies inside the hole, we
> end up persisting a wrong disk_i_size value for our inode that leads to
> data loss after umounting and mounting again the filesystem or after
> the inode is evicted and loaded again.
> 
> This happens because at inode.c:btrfs_truncate_inode_items() we end up
> setting last_size to the offset of the extent that we deleted and that
> followed the hole. We then pass that value to btrfs_ordered_update_i_size()
> which updates the inode's disk_i_size to a value smaller then the offset
> of the buffered (delayed) writes.
> 
> Example reproducer:
> 
>  $ mkfs.btrfs -f /dev/sdb
>  $ mount /dev/sdb /mnt
> 
>  $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo
>  $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo
>  $ xfs_io -c "truncate 60K" /mnt/foo
>--> inode's disk_i_size updated to 0
> 
>  $ md5sum /mnt/foo
>  3c5ca3c3ab42f4b04d7e7eb0b0d4d806  /mnt/foo
> 
>  $ umount /dev/sdb
>  $ mount /dev/sdb /mnt
> 
>  $ md5sum /mnt/foo
>  d41d8cd98f00b204e9800998ecf8427e  /mnt/foo
>--> Empty file, all data lost!
> 
> Cc:   # 3.14+
> Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole 
> extents")
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/inode.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8eb6703..a0a38ce 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4703,8 +4703,12 @@ int btrfs_truncate_inode_items(struct 
> btrfs_trans_handle *trans,
>   btrfs_abort_transaction(trans, ret);
>   }
>  error:
> - if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
> + if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
> + if (last_size > new_size &&
> + btrfs_fs_incompat(fs_info, NO_HOLES))
> + last_size = new_size;
>   btrfs_ordered_update_i_size(inode, last_size, NULL);
> + }
>

Looks good to me.

Reviewed-by: Liu Bo 

We can also remove the code that sets last_size = new_size in the case of
NO_HOLES inside the while loop, as I think this patch covers that, too.

Because we've held the inode mutex and wait for the dirty pages beyond the
current i_size, in fact, I couldn't think of a case that with NO_HOLE last_size
< new_size, maybe we could use new_size directly and add a ASSERT just in case?

Thanks,

-liubo
>   btrfs_free_path(path);
>  
> -- 
> 2.7.0.rc3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: misc-tests: Primary Superblock corruption and recovery using backup Superblock.

2017-02-14 Thread Lakshmipathi.G
Signed-off-by: Lakshmipathi.G 
---
 .../019-fix-corrupted-superblock/test.sh   | 37 ++
 1 file changed, 37 insertions(+)
 create mode 100755 tests/misc-tests/019-fix-corrupted-superblock/test.sh

diff --git a/tests/misc-tests/019-fix-corrupted-superblock/test.sh 
b/tests/misc-tests/019-fix-corrupted-superblock/test.sh
new file mode 100755
index 000..806c775
--- /dev/null
+++ b/tests/misc-tests/019-fix-corrupted-superblock/test.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+#
+# Corrupt primary superblock and restore it using backup superblock.
+#
+
+source $TOP/tests/common
+
+check_prereq btrfs-select-super
+
+setup_root_helper
+prepare_test_dev 512M
+
+superblock_offset=65536
+
+test_superblock_restore()
+{
+   run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
+
+   # Corrupt superblock checksum
+dd if=/dev/zero of=$TEST_DEV seek=$superblock_offset bs=1 \
+count=4  conv=notrunc &> /dev/null
+   run_check_stdout $SUDO_HELPER mount $TEST_DEV $TEST_MNT | \
+   grep -q 'wrong fs type'
+if [ $? -ne 0 ]; then
+   _fail "Failed to corrupt superblock."
+fi 
+
+   # Copy backup superblock to primary
+   run_check $TOP/btrfs-select-super -s 1 $TEST_DEV
+   run_check $SUDO_HELPER mount $TEST_DEV $TEST_MNT
+if [ $? -ne 0 ]; then
+   _fail "Failed to fix superblock."
+fi 
+   run_check_umount_test_dev
+}
+
+test_superblock_restore
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: add test for truncate after a mix of dio and buffered writes

2017-02-14 Thread fdmanana
From: Filipe Manana 

Test that if we have a file with a hole, do a mix of direct IO and
buffered writes to it and truncate the file to a size that lies in the
middle of the hole, after unmounting and mounting again the filesystem,
the file has a correct size and no data loss happened.

This test is motivated by a bug found in btrfs when used with the
no-holes feature (i.e. MKFS_OPTIONS="-O no-holes") which is fixed by
the following patch for the linux kernel:

  Btrfs: fix data loss after truncate when using the no-holes feature

Signed-off-by: Filipe Manana 
---
 tests/generic/409 | 75 +++
 tests/generic/409.out |  9 +++
 tests/generic/group   |  1 +
 3 files changed, 85 insertions(+)
 create mode 100755 tests/generic/409
 create mode 100644 tests/generic/409.out

diff --git a/tests/generic/409 b/tests/generic/409
new file mode 100755
index 000..8d3fd33
--- /dev/null
+++ b/tests/generic/409
@@ -0,0 +1,75 @@
+#! /bin/bash
+# FSQA Test No. 409
+#
+# Test that if we have a file with a hole, do a mix of direct IO and buffered
+# writes to it and truncate the file to a size that lies in the middle of the
+# hole, after unmounting and mounting again the filesystem, the file has a
+# correct size and no data loss happened.
+#
+#---
+#
+# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create out test file with two extents and a hole between those extents.
+# The extent that lies beyond the hole must be written using direct IO and 
later
+# we truncate the file to a size that lies within the hole's range.
+$XFS_IO_PROG -f -c "pwrite -S 0x01 0K 32K" $SCRATCH_MNT/foo | _filter_xfs_io
+$XFS_IO_PROG -d -c "pwrite -S 0x02 -b 32K 64K 32K" $SCRATCH_MNT/foo \
+   | _filter_xfs_io
+
+# Now truncate our file to a smaller size that lies behind the offset used by
+# the previous direct IO write and that lies in a file hole.
+$XFS_IO_PROG -c "truncate 60K" $SCRATCH_MNT/foo
+
+# Now get file digests before unmounting the filesystem and after mounting it
+# again. The digests should match (same file data and size in both cases).
+echo "File digest before unmounting the filesystem:"
+md5sum $SCRATCH_MNT/foo | _filter_scratch
+_scratch_cycle_mount
+echo "File digest after mounting again the filesystem:"
+md5sum $SCRATCH_MNT/foo | _filter_scratch
+
+status=0
+exit
diff --git a/tests/generic/409.out b/tests/generic/409.out
new file mode 100644
index 000..02a7dc5
--- /dev/null
+++ b/tests/generic/409.out
@@ -0,0 +1,9 @@
+QA output created by 409
+wrote 32768/32768 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 65536
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+File digest before unmounting the filesystem:
+3c5ca3c3ab42f4b04d7e7eb0b0d4d806  SCRATCH_MNT/foo
+File digest after mounting again the filesystem:
+3c5ca3c3ab42f4b04d7e7eb0b0d4d806  SCRATCH_MNT/foo
diff --git a/tests/generic/group b/tests/generic/group
index d0bc47d..074b494 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -411,3 +411,4 @@
 406 auto quick dangerous
 407 auto quick clone metadata
 408 auto quick clone dedupe metadata
+409 auto quick metadata
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fstests: add test for btrfs send/receive with sparse files

2017-02-14 Thread fdmanana
From: Filipe Manana 

Test that both a full and incremental btrfs send operation preserves file
holes.

This used to fail when the filesystem had the NO_HOLES feature enabled,
that is, when the test is run with MKFS_OPTIONS="-O no-holes".

This is fixed by the following patch for the linux kernel:

  "Btrfs: incremental send, fix unnecessary hole writes for sparse files"

Signed-off-by: Filipe Manana 
---
 tests/btrfs/137 | 141 
 tests/btrfs/137.out |  63 +++
 tests/btrfs/group   |   1 +
 3 files changed, 205 insertions(+)
 create mode 100755 tests/btrfs/137
 create mode 100644 tests/btrfs/137.out

diff --git a/tests/btrfs/137 b/tests/btrfs/137
new file mode 100755
index 000..3ff2c6b
--- /dev/null
+++ b/tests/btrfs/137
@@ -0,0 +1,141 @@
+#! /bin/bash
+# FS QA Test No. btrfs/137
+#
+# Test that both incremental and full send operations preserve file holes.
+#
+#---
+#
+# Copyright (C) 2017 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -fr $send_files_dir
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/punch
+
+# real QA test starts here
+_supported_fs btrfs
+_supported_os Linux
+_require_test
+_require_scratch
+_require_xfs_io_command "fiemap"
+
+send_files_dir=$TEST_DIR/btrfs-test-$seq
+
+rm -f $seqres.full
+rm -fr $send_files_dir
+mkdir $send_files_dir
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create the first test file.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 0 4K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# Create a second test file with a 1Mb hole.
+$XFS_IO_PROG -f \
+ -c "pwrite -S 0xaa 0 4K" \
+ -c "pwrite -S 0xbb 1028K 4K" \
+ $SCRATCH_MNT/bar | _filter_xfs_io
+
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+   $SCRATCH_MNT/snap1 >/dev/null
+
+# Now add one new extent to our first test file, increasing its size and 
leaving
+# a 1Mb hole between the first extent and this new extent.
+$XFS_IO_PROG -c "pwrite -S 0xbb 1028K 4K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# Now overwrite the last extent of our second test file.
+$XFS_IO_PROG -c "pwrite -S 0xcc 1028K 4K" $SCRATCH_MNT/bar | _filter_xfs_io
+
+$BTRFS_UTIL_PROG subvolume snapshot -r $SCRATCH_MNT \
+$SCRATCH_MNT/snap2 >/dev/null
+
+echo
+echo "File digests in the original filesystem:"
+md5sum $SCRATCH_MNT/snap1/foo | _filter_scratch
+md5sum $SCRATCH_MNT/snap1/bar | _filter_scratch
+md5sum $SCRATCH_MNT/snap2/foo | _filter_scratch
+md5sum $SCRATCH_MNT/snap2/bar | _filter_scratch
+
+echo
+echo "File snap1/foo fiemap results in the original filesystem:"
+$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap1/foo | _filter_fiemap
+echo
+echo "File snap1/bar fiemap results in the original filesystem:"
+$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap1/bar | _filter_fiemap
+echo
+echo "File snap2/foo fiemap results in the original filesystem:"
+$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap2/foo | _filter_fiemap
+echo
+echo "File snap2/bar fiemap results in the original filesystem:"
+$XFS_IO_PROG -r -c "fiemap -v" $SCRATCH_MNT/snap2/bar | _filter_fiemap
+echo
+
+# Create the send streams to apply later on a new filesystem.
+$BTRFS_UTIL_PROG send $SCRATCH_MNT/snap1 -f $send_files_dir/1.snap 2>&1 \
+   | _filter_scratch
+$BTRFS_UTIL_PROG send -p $SCRATCH_MNT/snap1 $SCRATCH_MNT/snap2 \
+   -f $send_files_dir/2.snap 2>&1 | _filter_scratch
+
+# Create a new filesystem, receive the send streams and verify that the file
+# contents are the same as in the original filesystem and that the file holes
+# exists in both snapshots.
+_scratch_unmount
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $send_files_dir/1.snap >/dev/null
+$BTRFS_UTIL_PROG receive $SCRATCH_MNT -f $send_files_dir/2.snap >/dev/null
+
+echo
+echo "File digests in the new 

[PATCH] Btrfs: incremental send, fix unnecessary hole writes for sparse files

2017-02-14 Thread fdmanana
From: Filipe Manana 

When using the NO_HOLES feature, during an incremental send we often issue
write operations for holes when we should not, because that range is already
a hole in the destination snapshot. While that does not change the contents
of the file at the receiver, it avoids preservation of file holes, leading
to wasted disk space and extra IO during send/receive.

A couple examples where the holes are not preserved follows.

 $ mkfs.btrfs -O no-holes -f /dev/sdb
 $ mount /dev/sdb /mnt
 $ xfs_io -f -c "pwrite -S 0xaa 0 4K" /mnt/foo
 $ xfs_io -f -c "pwrite -S 0xaa 0 4K" -c "pwrite -S 0xbb 1028K 4K" /mnt/bar
 $ btrfs subvolume snapshot -r /mnt /mnt/snap1

 # Now add one new extent to our first test file, increasing its size and
 # leaving a 1Mb hole between the first extent and this new extent.
 $ xfs_io -c "pwrite -S 0xbb 1028K 4K" /mnt/foo

 # Now overwrite the last extent of our second test file.
 $ xfs_io -c "pwrite -S 0xcc 1028K 4K" /mnt/bar

 $ btrfs subvolume snapshot -r /mnt /mnt/snap2

 $ xfs_io -r -c "fiemap -v" /mnt/snap2/foo
 /mnt/snap2/foo:
 EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
   0: [0..7]:  25088..25095 8 0x2000
   1: [8..2055]:   hole  2048
   2: [2056..2063]:24576..24583 8 0x2001

 $ xfs_io -r -c "fiemap -v" /mnt/snap2/bar
 /mnt/snap2/bar:
 EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
   0: [0..7]:  25096..25103 8 0x2000
   1: [8..2055]:   hole  2048
   2: [2056..2063]:24584..24591 8 0x2001

  $ btrfs send /mnt/snap1 -f /tmp/1.snap
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.snap

  $ umount /mnt
  # It's not relevant to enable no-holes in the new filesystem.
  $ mkfs.btrfs -O no-holes -f /dev/sdc
  $ mount /dev/sdc /mnt
  $ btrfs receive /mnt -f /tmp/1.snap
  $ btrfs receive /mnt -f /tmp/2.snap

  $ xfs_io -r -c "fiemap -v" /mnt/snap2/foo
  /mnt/snap2/foo:
  EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
0: [0..7]:  24576..24583 8 0x2000
1: [8..2063]:   25624..27679  2056   0x1

  $ xfs_io -r -c "fiemap -v" /mnt/snap2/bar
  /mnt/snap2/bar:
  EXT: FILE-OFFSET  BLOCK-RANGE  TOTAL FLAGS
0: [0..7]:  24584..24591 8 0x2000
1: [8..2063]:   27680..29735  2056   0x1

The holes do not exist in the second filesystem and they were replaced
with extents filled with the byte 0x00, making each file take 1032Kb of
space instead of 8Kb.

So fix this by not issuing the write operations consisting of buffers
filled with the byte 0x00 when the destination snapshot already has a
hole for the respective range.

A test case for fstests will follow soon.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 88 +++--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 712922e..456c890 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5306,6 +5306,81 @@ static int get_last_extent(struct send_ctx *sctx, u64 
offset)
return ret;
 }
 
+static int range_is_hole_in_parent(struct send_ctx *sctx,
+  const u64 start,
+  const u64 end)
+{
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct btrfs_root *root = sctx->parent_root;
+   u64 search_start = start;
+   int ret;
+
+   path = alloc_path_for_send();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = sctx->cur_ino;
+   key.type = BTRFS_EXTENT_DATA_KEY;
+   key.offset = search_start;
+   ret = btrfs_search_slot(NULL, root, , path, 0, 0);
+   if (ret < 0)
+   goto out;
+   if (ret > 0 && path->slots[0] > 0)
+   path->slots[0]--;
+
+   while (search_start < end) {
+   struct extent_buffer *leaf = path->nodes[0];
+   int slot = path->slots[0];
+   struct btrfs_file_extent_item *fi;
+   u64 extent_end;
+
+   if (slot >= btrfs_header_nritems(leaf)) {
+   ret = btrfs_next_leaf(root, path);
+   if (ret < 0)
+   goto out;
+   else if (ret > 0)
+   break;
+   continue;
+   }
+
+   btrfs_item_key_to_cpu(leaf, , slot);
+   if (key.objectid < sctx->cur_ino ||
+   key.type < BTRFS_EXTENT_DATA_KEY)
+   goto next;
+   if (key.objectid > sctx->cur_ino ||
+   key.type > BTRFS_EXTENT_DATA_KEY ||
+   key.offset >= end)
+   break;
+
+   fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(leaf, fi) ==
+   BTRFS_FILE_EXTENT_INLINE) {
+   u64 

[PATCH] Btrfs: fix data loss after truncate when using the no-holes feature

2017-02-14 Thread fdmanana
From: Filipe Manana 

If we have a file with an implicit hole (NO_HOLES feature enabled) that
has an extent following the hole, delayed writes against regions of the
file behind the hole happened before but were not yet flushed and then
we truncate the file to a smaller size that lies inside the hole, we
end up persisting a wrong disk_i_size value for our inode that leads to
data loss after umounting and mounting again the filesystem or after
the inode is evicted and loaded again.

This happens because at inode.c:btrfs_truncate_inode_items() we end up
setting last_size to the offset of the extent that we deleted and that
followed the hole. We then pass that value to btrfs_ordered_update_i_size()
which updates the inode's disk_i_size to a value smaller then the offset
of the buffered (delayed) writes.

Example reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt

 $ xfs_io -f -c "pwrite -S 0x01 0K 32K" /mnt/foo
 $ xfs_io -d -c "pwrite -S 0x02 -b 32K 64K 32K" /mnt/foo
 $ xfs_io -c "truncate 60K" /mnt/foo
   --> inode's disk_i_size updated to 0

 $ md5sum /mnt/foo
 3c5ca3c3ab42f4b04d7e7eb0b0d4d806  /mnt/foo

 $ umount /dev/sdb
 $ mount /dev/sdb /mnt

 $ md5sum /mnt/foo
 d41d8cd98f00b204e9800998ecf8427e  /mnt/foo
   --> Empty file, all data lost!

Cc:   # 3.14+
Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole extents")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/inode.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8eb6703..a0a38ce 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4703,8 +4703,12 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle 
*trans,
btrfs_abort_transaction(trans, ret);
}
 error:
-   if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID)
+   if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
+   if (last_size > new_size &&
+   btrfs_fs_incompat(fs_info, NO_HOLES))
+   last_size = new_size;
btrfs_ordered_update_i_size(inode, last_size, NULL);
+   }
 
btrfs_free_path(path);
 
-- 
2.7.0.rc3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation

2017-02-14 Thread Filipe Manana
On Tue, Feb 14, 2017 at 12:57 AM, Qu Wenruo  wrote:
> When balance(relocation) fails, btrfs-progs will report like:
>
> ERROR: error during balancing '/mnt/scratch': Input/output error
> There may be more info in syslog - try dmesg | tail
>
> However kernel can't provide may useful info in many cases to locate the
> problem.
>
> This patch will add error messages in relocation to help user and
> developer to locate the problem.
>
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Fix typo where 'err' and 'ret' are used wrong.
> ---
>  fs/btrfs/relocation.c | 60 
> +++
>  1 file changed, 56 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 379711048fb0..d26809807c1b 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4011,6 +4011,8 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
> rc->block_rsv, rc->block_rsv->size,
> BTRFS_RESERVE_FLUSH_ALL);
> if (ret) {
> +   btrfs_err(fs_info, "failed to reserve space: %d(%s)",

Please leave a space between %d and the opening parenthesis (a common
style you follow but it's not correct in most western languages afaik,
and makes things harder to the eye).

> + ret, btrfs_decode_error(ret));
> err = ret;
> break;
> }
> @@ -4019,6 +4021,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
> if (IS_ERR(trans)) {
> err = PTR_ERR(trans);
> trans = NULL;
> +   btrfs_err(fs_info,
> + "failed to start transaction: %d(%s)",

Same, space before parenthesis missing.

> + err, btrfs_decode_error(err));
> break;
> }
>  restart:
> @@ -4028,8 +4033,11 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
> }
>
> ret = find_next_extent(rc, path, );
> -   if (ret < 0)
> +   if (ret < 0) {
> +   btrfs_err(fs_info, "failed to find next extent: 
> %d(%s)",

Same, space before parenthesis missing.

> + ret, btrfs_decode_error(ret));
> err = ret;
> +   }
> if (ret != 0)
> break;
>
> @@ -4081,9 +4089,17 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>
> if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
> ret = add_tree_block(rc, , path, );
> +   if (ret < 0)
> +   btrfs_err(fs_info,
> + "failed to record tree block: 
> %d(%s)",
> + ret, btrfs_decode_error(ret));
> } else if (rc->stage == UPDATE_DATA_PTRS &&
>(flags & BTRFS_EXTENT_FLAG_DATA)) {
> ret = add_data_references(rc, , path, );
> +   if (ret < 0)
> +   btrfs_err(fs_info,
> + "failed to record data extent: 
> %d(%s)",

Same, space before parenthesis missing.

> + ret, btrfs_decode_error(ret));
> } else {
> btrfs_release_path(path);
> ret = 0;
> @@ -4103,6 +4119,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
> rc->backref_cache.last_trans = trans->transid 
> - 1;
>
> if (ret != -EAGAIN) {
> +   btrfs_err(fs_info,
> +   "faild to relocate tree blocks: 
> %d(%s)",

faild -> failed

Same, space before parenthesis missing.

> + ret, 
> btrfs_decode_error(ret));
> err = ret;
> break;
> }
> @@ -4121,6 +4140,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
> ret = relocate_data_extent(rc->data_inode,
>, >cluster);
> if (ret < 0) {
> +   btrfs_err(fs_info,
> +   "failed to relocate data extent: %d(%s)",

Same, space before parenthesis missing.

> + ret, btrfs_decode_error(ret));
> 

Re: [PATCH] btrfs: make space cache inode readahead failure nonfatal

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:53:12AM +0100, David Sterba wrote:
> We do a readahead of the free space cache inode to speed things up but
> the failure is not fatal, like in other readahead cases. Proper reads
> would need to happen anyway and any errors would be caught there.
>

Reviewed-by: Liu Bo 

Thanks,

-liubo
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/free-space-cache.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 437580e84d9d..e96e981ef01e 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -286,14 +286,14 @@ int btrfs_truncate_free_space_cache(struct btrfs_root 
> *root,
>   return ret;
>  }
>  
> -static int readahead_cache(struct inode *inode)
> +static void readahead_cache(struct inode *inode)
>  {
>   struct file_ra_state *ra;
>   unsigned long last_index;
>  
>   ra = kzalloc(sizeof(*ra), GFP_NOFS);
>   if (!ra)
> - return -ENOMEM;
> + return;
>  
>   file_ra_state_init(ra, inode->i_mapping);
>   last_index = (i_size_read(inode) - 1) >> PAGE_SHIFT;
> @@ -301,8 +301,6 @@ static int readahead_cache(struct inode *inode)
>   page_cache_sync_readahead(inode->i_mapping, ra, NULL, 0, last_index);
>  
>   kfree(ra);
> -
> - return 0;
>  }
>  
>  static int io_ctl_init(struct btrfs_io_ctl *io_ctl, struct inode *inode,
> @@ -730,9 +728,7 @@ static int __load_free_space_cache(struct btrfs_root 
> *root, struct inode *inode,
>   if (ret)
>   return ret;
>  
> - ret = readahead_cache(inode);
> - if (ret)
> - goto out;
> + readahead_cache(inode);
>  
>   ret = io_ctl_prepare_pages(_ctl, inode, 1);
>   if (ret)
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: relocation: Enhance kernel error output for relocation

2017-02-14 Thread Liu Bo
On Tue, Feb 14, 2017 at 08:57:50AM +0800, Qu Wenruo wrote:
> When balance(relocation) fails, btrfs-progs will report like:
> 
> ERROR: error during balancing '/mnt/scratch': Input/output error
> There may be more info in syslog - try dmesg | tail
> 
> However kernel can't provide may useful info in many cases to locate the
> problem.
> 
> This patch will add error messages in relocation to help user and
> developer to locate the problem.

Reviewed-by: Liu Bo 

Thanks,

-liubo
> 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Fix typo where 'err' and 'ret' are used wrong.
> ---
>  fs/btrfs/relocation.c | 60 
> +++
>  1 file changed, 56 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 379711048fb0..d26809807c1b 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4011,6 +4011,8 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   rc->block_rsv, rc->block_rsv->size,
>   BTRFS_RESERVE_FLUSH_ALL);
>   if (ret) {
> + btrfs_err(fs_info, "failed to reserve space: %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   err = ret;
>   break;
>   }
> @@ -4019,6 +4021,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   if (IS_ERR(trans)) {
>   err = PTR_ERR(trans);
>   trans = NULL;
> + btrfs_err(fs_info,
> +   "failed to start transaction: %d(%s)",
> +   err, btrfs_decode_error(err));
>   break;
>   }
>  restart:
> @@ -4028,8 +4033,11 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   }
>  
>   ret = find_next_extent(rc, path, );
> - if (ret < 0)
> + if (ret < 0) {
> + btrfs_err(fs_info, "failed to find next extent: %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   err = ret;
> + }
>   if (ret != 0)
>   break;
>  
> @@ -4081,9 +4089,17 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>  
>   if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
>   ret = add_tree_block(rc, , path, );
> + if (ret < 0)
> + btrfs_err(fs_info,
> +   "failed to record tree block: %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   } else if (rc->stage == UPDATE_DATA_PTRS &&
>  (flags & BTRFS_EXTENT_FLAG_DATA)) {
>   ret = add_data_references(rc, , path, );
> + if (ret < 0)
> + btrfs_err(fs_info,
> +   "failed to record data extent: 
> %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   } else {
>   btrfs_release_path(path);
>   ret = 0;
> @@ -4103,6 +4119,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   rc->backref_cache.last_trans = trans->transid - 
> 1;
>  
>   if (ret != -EAGAIN) {
> + btrfs_err(fs_info,
> + "faild to relocate tree blocks: %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   err = ret;
>   break;
>   }
> @@ -4121,6 +4140,9 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   ret = relocate_data_extent(rc->data_inode,
>  , >cluster);
>   if (ret < 0) {
> + btrfs_err(fs_info,
> + "failed to relocate data extent: %d(%s)",
> +   ret, btrfs_decode_error(ret));
>   err = ret;
>   break;
>   }
> @@ -4147,8 +4169,12 @@ static noinline_for_stack int 
> relocate_block_group(struct reloc_control *rc)
>   if (!err) {
>   ret = relocate_file_extent_cluster(rc->data_inode,
>  >cluster);
> - if (ret < 0)
> + if (ret < 0) {
> + btrfs_err(fs_info,
> + "failed to relocate 

Re: [PATCH 28/29] btrfs: remove unused parameters from __btrfs_write_out_cache

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:34:47AM +0100, David Sterba wrote:
> Both unused after the call to update_cache_item has been moved to
> __btrfs_wait_cache_io.
> 

Reviewed-by: Liu Bo 

Thanks,

-liubo
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/free-space-cache.c | 11 +++
>  1 file changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> index 0d02599d22bc..c169acef2a23 100644
> --- a/fs/btrfs/free-space-cache.c
> +++ b/fs/btrfs/free-space-cache.c
> @@ -1224,8 +1224,6 @@ int btrfs_wait_cache_io(struct btrfs_trans_handle 
> *trans,
>   * @ctl - the free space cache we are going to write out
>   * @block_group - the block_group for this cache if it belongs to a 
> block_group
>   * @trans - the trans handle
> - * @path - the path to use
> - * @offset - the offset for the key we'll insert
>   *
>   * This function writes out a free space cache struct to disk for quick 
> recovery
>   * on mount.  This will return 0 if it was successful in writing the cache 
> out,
> @@ -1235,8 +1233,7 @@ static int __btrfs_write_out_cache(struct btrfs_root 
> *root, struct inode *inode,
>  struct btrfs_free_space_ctl *ctl,
>  struct btrfs_block_group_cache *block_group,
>  struct btrfs_io_ctl *io_ctl,
> -struct btrfs_trans_handle *trans,
> -struct btrfs_path *path, u64 offset)
> +struct btrfs_trans_handle *trans)
>  {
>   struct btrfs_fs_info *fs_info = root->fs_info;
>   struct extent_state *cached_state = NULL;
> @@ -1394,8 +1391,7 @@ int btrfs_write_out_cache(struct btrfs_fs_info *fs_info,
>   return 0;
>  
>   ret = __btrfs_write_out_cache(root, inode, ctl, block_group,
> -   _group->io_ctl, trans,
> -   path, block_group->key.objectid);
> +   _group->io_ctl, trans);
>   if (ret) {
>  #ifdef DEBUG
>   btrfs_err(fs_info,
> @@ -3542,8 +3538,7 @@ int btrfs_write_out_ino_cache(struct btrfs_root *root,
>   return 0;
>  
>   memset(_ctl, 0, sizeof(io_ctl));
> - ret = __btrfs_write_out_cache(root, inode, ctl, NULL, _ctl,
> -   trans, path, 0);
> + ret = __btrfs_write_out_cache(root, inode, ctl, NULL, _ctl, trans);
>   if (!ret) {
>   /*
>* At this point writepages() didn't error out, so our metadata
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to dump/find parity of RAID-5 file?

2017-02-14 Thread Lakshmipathi.G
On Mon, Feb 06, 2017 at 09:40:47PM +0100, Goffredo Baroncelli wrote:
> 
> IIRC, the parity is spread across the disk stripes of the chunk.
> 
> So first you have to find the logical-offset [LO] where the the file begins. 
> Then you have to map this offset to the chunk which holds the data. The chunk 
> has the following info:
> - chunk start [CS], chunk length [CL]
> - for each stripe:
>   where the stripe starts
> 
> If you subtract the chunk-start from the logical-offset [ CO == LO-CS], you 
> will find the offset where the data belongs in the chunk.
> 
> As stated above, the PARITY is spread across the chunk stripes. So (supposing 
> that the stripe size is 64K, the raid level is 5, the disks are three), 
> 
> - the first 64k of stripe 0, is data [0..64K)
> - the first 64k of stripe 1, is data [64..128K)
> - the first 64k of stripe 2 is parity, 
> 
> - the 2nd 64k of stripe 0 is parity, 
> - the 2nd 64k of stripe 1, is data [128..196K)
> - the 2nd 64k of stripe 2, is data [192..256K)
> 
> - the 3rd 64k of stripe 0, is data [256..320K)
> - the 3rd 64k of stripe 1 is parity, 
> - the 3rd 64k of stripe 2, is data [320..384K)
> and so on,
> 
> To find the data, You have to compare the CO to the data [...) range.
> 
> If you look to an my old patch (unfinished :-( ), you can find some example 
> to dump the different stripe
> 
> [BTRFS-PROGS][PATCH][V2] Add two new commands: 'btrfs insp physical-find' and 
> 'btrfs insp physical-dump'
> 
> 
Sorry for the delay, I was offline. Thanks for the details. I can understood 
"partiy spread across the chunk stripes" part.
But unable to figure-out the first part regarding calculations.

Raid5 With 3-devices each 512MB. Create single 128KB file("print 
'Ab'+'a'*65534+'aB'+'b'*65533"). 'debug-tree' shows chunk tree as:

item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 145096704) itemoff 15557 
itemsize 144
length 134217728 owner 2 stripe_len 65536 type DATA|RAID5
io_align 65536 io_width 65536 sector_size 4096
num_stripes 3 sub_stripes 0
stripe 0 devid 3 offset 6368
dev_uuid 9a2a18f1-6193-44b9-aafc-23d161d66110
stripe 1 devid 2 offset 6368
dev_uuid e45ab907-c3a8-4dff-af9f-2ae5fd38ffd6
stripe 2 devid 1 offset 83034112
dev_uuid 428c04d9-37da-454a-b7b2-f6fe88580de2
and fs-tree shows:
item 13 key (145227776 EXTENT_ITEM 131072) itemoff 15788 itemsize 53
extent refs 1 gen 7 flags DATA
extent data backref root 5 objectid 257 offset 0 count 1

>From above, I assume: 
LO=145227776  CS=145096704 and CL=134217728
CO=145227776 - 145096704  => CO = 131072

Quite confused from here :s  I'll look into your patches to understand more. I 
hope sometime in future we will 
have your finished patches :) 'physical-find' and 'physical-find' commands will 
be really useful for debugging/testing and 
learning purposes. thanks.

Cheers.
Lakshmipathi.G
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How to dump/find parity of RAID-5 file?

2017-02-14 Thread Lakshmipathi.G
On Mon, Feb 06, 2017 at 09:36:58AM +0800, Qu Wenruo wrote:
> 
> Please note the following things when calculating RAID5/6 data and P/Q
> location:
> 
> 1) Basic layout
> The chunk stripe only shows the *first* full stripe layout.
> And it always follows the sequence of "Data0, Data1, ... Data N, Parity"
> 
> And one full stripe is consistent of N * 64K (fixed yet).
> 
> 
> 2) Device rotation
> RAID5/6 do device rotation.
> So the *second* full stripe will have the following layout:
> "Data 1, Data2, ... Data N, Parity, Data 0"
> 
> 
> You can refer to my offline-scrub patchset to see how it assemble the
> RAID5/6 mapping, which I believe it is easier than current btrfs_map_block()
> implementation.
> [PATCH v2 19/19] btrfs-progs: fsck: Introduce offline scrub function
> 
> Thanks,
> Qu
Sorry for the delay in response, I was offline. Thanks for the details. I think 
now I have 
better idea about (1)Basic layout  and (2)Device rotation. Will look into 
offline-scrub patches 
to explore more about exact mapping and on-disk layouts. I'm trying to 
understand layout for
simple Raid-5 with 3-drives with 128KB file-size. Will try to understand Raid-6 
little later. 
thanks.

Cheers.
Lakshmipathi.G
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 24/29] btrfs: remove unused parameters from btrfs_cmp_data

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:34:35AM +0100, David Sterba wrote:
> After the page locking has been reworked, we get all pages prepared via
> cmp_pages.
>

Reviewed-by: Liu Bo 

Thanks,

-liubo
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/ioctl.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 91188a2ac5a1..1f13f8416d29 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3053,8 +3053,7 @@ static int btrfs_cmp_data_prepare(struct inode *src, 
> u64 loff,
>   return 0;
>  }
>  
> -static int btrfs_cmp_data(struct inode *src, u64 loff, struct inode *dst,
> -   u64 dst_loff, u64 len, struct cmp_pages *cmp)
> +static int btrfs_cmp_data(u64 len, struct cmp_pages *cmp)
>  {
>   int ret = 0;
>   int i;
> @@ -3221,7 +3220,7 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>   }
>  
>   /* pass original length for comparison so we stay within i_size */
> - ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, );
> + ret = btrfs_cmp_data(olen, );
>   if (ret == 0)
>   ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
>  
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: FS gives kernel UPS on attempt to create snapshot and after running balance it's unmountable.

2017-02-14 Thread Tomasz Kusmierz
[root@server ~]#  btrfs-show-super -af /dev/sdc
superblock: bytenr=65536, device=/dev/sdc
-
csum_type   0 (crc32c)
csum_size   4
csum0x17d56ce0 [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
fsid0576d577-8954-4a60-a02b-9492b3c29318
label   main_pool
generation  150682
root5223857717248
sys_array_size  321
chunk_root_generation   150678
root_level  1
chunk_root  8669488005120
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
total_bytes 16003191472128
bytes_used  6411278503936
sectorsize  4096
nodesize16384
leafsize16384
stripesize  4096
root_dir6
num_devices 8
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x161
( MIXED_BACKREF |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA )
cache_generation150682
uuid_tree_generation150679
dev_item.uuid   46abffa8-7afe-451f-93c6-abb8e589c4e8
dev_item.fsid   0576d577-8954-4a60-a02b-9492b3c29318 [match]
dev_item.type   0
dev_item.total_bytes2000398934016
dev_item.bytes_used 1647136735232
dev_item.io_align   4096
dev_item.io_width   4096
dev_item.sector_size4096
dev_item.devid  1
dev_item.dev_group  0
dev_item.seek_speed 0
dev_item.bandwidth  0
dev_item.generation 0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 8669487824896)
length 67108864 owner 2 stripe_len 65536 type SYSTEM|RAID10
io_align 65536 io_width 65536 sector_size 4096
num_stripes 8 sub_stripes 2
stripe 0 devid 7 offset 1083674984448
dev_uuid 566fb8a3-d6de-4230-8b70-a5fda0a120f6
stripe 1 devid 8 offset 1083674984448
dev_uuid 845aefb2-e0a6-479a-957b-a82fb7207d6c
stripe 2 devid 1 offset 1365901312
dev_uuid 46abffa8-7afe-451f-93c6-abb8e589c4e8
stripe 3 devid 3 offset 1345978368
dev_uuid 95921633-2fc1-479f-a3ba-e6e5a1989755
stripe 4 devid 4 offset 1345978368
dev_uuid 20828f0e-4661-4987-ac11-72814c1e423a
stripe 5 devid 5 offset 1345978368
dev_uuid 2c3cd71f-5178-48e7-8032-6b6eec023197
stripe 6 devid 6 offset 1345978368
dev_uuid 806a47e5-cac4-41c9-abb9-5c49506459e1
stripe 7 devid 2 offset 1345978368
dev_uuid e1358e0e-edaf-4505-9c71-ed0862c45841
backup_roots[4]:
backup 0:
backup_tree_root:   5223857717248   gen: 150680 level: 1
backup_chunk_root:  8669488005120   gen: 150678 level: 1
backup_extent_root: 5223867383808   gen: 150680 level: 2
backup_fs_root: 0   gen: 0  level: 0
backup_dev_root:5224791523328   gen: 150680 level: 1
backup_csum_root:   5224802140160   gen: 150680 level: 3
backup_total_bytes: 16003191472128
backup_bytes_used:  6411278503936
backup_num_devices: 8

backup 1:
backup_tree_root:   5224155807744   gen: 150681 level: 1
backup_chunk_root:  8669488005120   gen: 150678 level: 1
backup_extent_root: 5224156233728   gen: 150681 level: 2
backup_fs_root: 0   gen: 0  level: 0
backup_dev_root:5224633155584   gen: 150681 level: 1
backup_csum_root:   5224634941440   gen: 150681 level: 3
backup_total_bytes: 16003191472128
backup_bytes_used:  6411278503936
backup_num_devices: 8

backup 2:
backup_tree_root:   5223857717248   gen: 150682 level: 1
backup_chunk_root:  8669488005120   gen: 150678 level: 1
backup_extent_root: 5223867383808   gen: 150682 level: 2
backup_fs_root: 0   gen: 0  level: 0
backup_dev_root:5224622358528   gen: 150682 level: 1
backup_csum_root:   5224675344384   gen: 150682 level: 3
backup_total_bytes: 16003191472128
backup_bytes_used:  6411278503936
   

Re: [PATCH 21/29] btrfs: remove unused parameter from create_snapshot

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:34:26AM +0100, David Sterba wrote:
> The name parameters have never been used, as the name is passed via the
> dentry.
>

Reviewed-by: Liu Bo 

Thanks,

-liubo
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/ioctl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index e8c7c313c113..91188a2ac5a1 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -656,7 +656,7 @@ static void btrfs_wait_for_no_snapshoting_writes(struct 
> btrfs_root *root)
>  }
>  
>  static int create_snapshot(struct btrfs_root *root, struct inode *dir,
> -struct dentry *dentry, char *name, int namelen,
> +struct dentry *dentry,
>  u64 *async_transid, bool readonly,
>  struct btrfs_qgroup_inherit *inherit)
>  {
> @@ -871,7 +871,7 @@ static noinline int btrfs_mksubvol(const struct path 
> *parent,
>   goto out_up_read;
>  
>   if (snap_src) {
> - error = create_snapshot(snap_src, dir, dentry, name, namelen,
> + error = create_snapshot(snap_src, dir, dentry,
>   async_transid, readonly, inherit);
>   } else {
>   error = create_subvol(dir, dentry, name, namelen,
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/29] btrfs: remove unused parameter from extent_write_cache_pages

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:34:13AM +0100, David Sterba wrote:
> The 'tree' was used to call locking hook that does not exist anymore.
> 
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/extent_io.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f683fa5a4b91..22a2f2fa62c7 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3916,8 +3916,7 @@ int btree_write_cache_pages(struct address_space 
> *mapping,
>   * WB_SYNC_ALL then we were called for data integrity and we must wait for
>   * existing IO to complete.
>   */
> -static int extent_write_cache_pages(struct extent_io_tree *tree,
> -  struct address_space *mapping,
> +static int extent_write_cache_pages(struct address_space *mapping,
>struct writeback_control *wbc,
>writepage_t writepage, void *data,
>void (*flush_fn)(void *))
> @@ -4158,8 +4157,7 @@ int extent_writepages(struct extent_io_tree *tree,
>   .bio_flags = 0,
>   };
>  
> - ret = extent_write_cache_pages(tree, mapping, wbc,
> -__extent_writepage, ,
> + ret = extent_write_cache_pages(mapping, wbc, __extent_writepage, ,
>  flush_write_bio);

Are we going to leave {btrfs,extent}_{read,write}pages untouched?

Thanks,

-liubo

>   flush_epd_write_bio();
>   return ret;
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/29] btrfs: remove unused parameter from submit_extent_page

2017-02-14 Thread Liu Bo
On Mon, Feb 13, 2017 at 10:34:04AM +0100, David Sterba wrote:
> This used to hold number of maximum pages to allocate, but this is now
> limited by BIO_MAX_PAGES.
> 
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/extent_io.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 9df6ed30de00..00f3be5c4f2d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2759,7 +2759,6 @@ static int submit_extent_page(int op, int op_flags, 
> struct extent_io_tree *tree,
> size_t size, unsigned long offset,
> struct block_device *bdev,
> struct bio **bio_ret,
> -   unsigned long max_pages,
> bio_end_io_t end_io_func,
> int mirror_num,
> unsigned long prev_bio_flags,
> @@ -3063,7 +3062,7 @@ static int __do_readpage(struct extent_io_tree *tree,
>   pnr -= page->index;
>   ret = submit_extent_page(REQ_OP_READ, read_flags, tree, NULL,
>page, sector, disk_io_size, pg_offset,
> -  bdev, bio, pnr,
> +  bdev, bio,
>end_bio_extent_readpage, mirror_num,
>*bio_flags,
>this_bio_flag,

@pnr is only used here, it can be removed, too.

> @@ -3434,7 +3433,7 @@ static noinline_for_stack int 
> __extent_writepage_io(struct inode *inode,
>  
>   ret = submit_extent_page(REQ_OP_WRITE, write_flags, tree, wbc,
>page, sector, iosize, pg_offset,
> -  bdev, >bio, max_nr,
> +  bdev, >bio,
>end_bio_extent_writepage,
>0, 0, 0, false);
>   if (ret) {

So does @max_nr.

With removing those two,

Reviewed-by: Liu Bo 

Thanks,

-liubo
> @@ -3751,7 +3750,7 @@ static noinline_for_stack int write_one_eb(struct 
> extent_buffer *eb,
>   set_page_writeback(p);
>   ret = submit_extent_page(REQ_OP_WRITE, write_flags, tree, wbc,
>p, offset >> 9, PAGE_SIZE, 0, bdev,
> -  >bio, -1,
> +  >bio,
>end_bio_extent_buffer_writepage,
>0, epd->bio_flags, bio_flags, false);
>   epd->bio_flags = bio_flags;
> -- 
> 2.10.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use helper to get used bytes of space_info

2017-02-14 Thread Liu Bo
On Tue, Feb 14, 2017 at 04:56:06PM +0100, David Sterba wrote:
> On Mon, Feb 13, 2017 at 03:42:21PM -0800, Liu Bo wrote:
> > This uses a helper instead of open code around used byte of space_info
> > everywhere.
> > 
> > Signed-off-by: Liu Bo 
> 
> Reviewed-by: David Sterba 
> 
> > ---
> >  fs/btrfs/extent-tree.c | 42 --
> >  1 file changed, 20 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 097fa4a..b2b0b82 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4118,6 +4118,15 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, 
> > int data)
> > return ret;
> >  }
> >  
> > +static u64 btrfs_space_info_used(struct btrfs_space_info *s_info,
> > +bool may_used_included)
> > +{
> > +   ASSERT(s_info);
> > +   return s_info->bytes_used + s_info->bytes_reserved +
> > +   s_info->bytes_pinned + s_info->bytes_readonly +
> > +   ((may_used_included) ? s_info->bytes_may_use : 0);
> 
> I'd rename may_used_included to 'may_use_included' so it matches 
> bytes_may_use.

I agree, it sounds better.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/29] btrfs: remove unused parameter from btrfs_check_super_valid

2017-02-14 Thread Liu Bo
On Tue, Feb 14, 2017 at 12:42:07PM +0100, David Sterba wrote:
> On Mon, Feb 13, 2017 at 06:11:56PM -0800, Liu Bo wrote:
> > On Mon, Feb 13, 2017 at 10:33:55AM +0100, David Sterba wrote:
> > > None of the checks need to know the RO/RW status.
> > >
> > 
> > OK...there was a readonly check, which lets us skip all checks,
> > it was removed by the below commit, should we add the check back?
> 
> All the current checks do not modify the superblock, so it's read-only
> and we can keep it as-is. The superblock is available from inside the
> function anyway so we don't need to check sb_flags externally. I'll
> update the changelog.
Yes, this makes sense.

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unexpected behavior involving file attributes and snapshots.

2017-02-14 Thread Austin S. Hemmelgarn

On 2017-02-14 11:46, Austin S. Hemmelgarn wrote:

On 2017-02-14 11:07, Chris Murphy wrote:

On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarn
 wrote:

I was just experimenting with snapshots on 4.9.0, and came across some
unexpected behavior.

The simple explanation is that if you snapshot a subvolume, any files
in the
subvolume that have the NOCOW attribute will not have that attribute
in the
snapshot.  Some further testing indicates that this is the only file
attribute that isn't preserved (I checked all the chattr flags that
BTRFS
supports).


Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets
journal files with chattr +C, and I do manual snapshots of rootfs
periodically, and those snapshots have journal files that have +C
still set.


Just tested on a different filesystem, and I'm not seeing it there
either, I'll take a closer look at the FS I saw this on and see if I can
figure out what's up.

After poking around a bit further, the system crashed, and it looks like 
there was some memory corruption scattered throughout the kernel from 
one of the other modules I had loaded (now I get to spend a day or more 
figuring out which one and reporting that bug).  Given the state of the 
kernel crash dump though, I'm actually somewhat surprised that things 
weren't misbehaving more spectacularly than this.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unexpected behavior involving file attributes and snapshots.

2017-02-14 Thread Austin S. Hemmelgarn

On 2017-02-14 11:07, Chris Murphy wrote:

On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarn
 wrote:

I was just experimenting with snapshots on 4.9.0, and came across some
unexpected behavior.

The simple explanation is that if you snapshot a subvolume, any files in the
subvolume that have the NOCOW attribute will not have that attribute in the
snapshot.  Some further testing indicates that this is the only file
attribute that isn't preserved (I checked all the chattr flags that BTRFS
supports).


Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets
journal files with chattr +C, and I do manual snapshots of rootfs
periodically, and those snapshots have journal files that have +C
still set.

Just tested on a different filesystem, and I'm not seeing it there 
either, I'll take a closer look at the FS I saw this on and see if I can 
figure out what's up.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: specify a new ordered extent type for create_io_em

2017-02-14 Thread David Sterba
On Mon, Feb 13, 2017 at 03:35:09PM -0800, Liu Bo wrote:
> As 0 refers to an existing type BTRFS_ORDERED_IO_DONE, this specifies a
> new type 'REGULAR' for regular IO.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unexpected behavior involving file attributes and snapshots.

2017-02-14 Thread Chris Murphy
On Tue, Feb 14, 2017 at 8:30 AM, Austin S. Hemmelgarn
 wrote:
> I was just experimenting with snapshots on 4.9.0, and came across some
> unexpected behavior.
>
> The simple explanation is that if you snapshot a subvolume, any files in the
> subvolume that have the NOCOW attribute will not have that attribute in the
> snapshot.  Some further testing indicates that this is the only file
> attribute that isn't preserved (I checked all the chattr flags that BTRFS
> supports).

Huh, I can't reproduce this with 4.9.8 or 4.10rc7. systemd sets
journal files with chattr +C, and I do manual snapshots of rootfs
periodically, and those snapshots have journal files that have +C
still set.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use helper to get used bytes of space_info

2017-02-14 Thread David Sterba
On Mon, Feb 13, 2017 at 03:42:21PM -0800, Liu Bo wrote:
> This uses a helper instead of open code around used byte of space_info
> everywhere.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 

> ---
>  fs/btrfs/extent-tree.c | 42 --
>  1 file changed, 20 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 097fa4a..b2b0b82 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -4118,6 +4118,15 @@ u64 btrfs_get_alloc_profile(struct btrfs_root *root, 
> int data)
>   return ret;
>  }
>  
> +static u64 btrfs_space_info_used(struct btrfs_space_info *s_info,
> +  bool may_used_included)
> +{
> + ASSERT(s_info);
> + return s_info->bytes_used + s_info->bytes_reserved +
> + s_info->bytes_pinned + s_info->bytes_readonly +
> + ((may_used_included) ? s_info->bytes_may_use : 0);

I'd rename may_used_included to 'may_use_included' so it matches bytes_may_use.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unexpected behavior involving file attributes and snapshots.

2017-02-14 Thread Roman Mamedov
On Tue, 14 Feb 2017 10:30:43 -0500
"Austin S. Hemmelgarn"  wrote:

> I was just experimenting with snapshots on 4.9.0, and came across some 
> unexpected behavior.
> 
> The simple explanation is that if you snapshot a subvolume, any files in 
> the subvolume that have the NOCOW attribute will not have that attribute 
> in the snapshot.  Some further testing indicates that this is the only 
> file attribute that isn't preserved (I checked all the chattr flags that 
> BTRFS supports).
> 
> I'm kind of curious whether:
> 1. This is actually documented somewhere, as it's somewhat unexpected 
> given that everything else is preserved when snapshotting.
> 2. This is intended behavior, or just happens to be a side effect of the 
> implementation.

I don't seem to get this on 4.4.45 and 4.4.47.

$ btrfs sub create test
Create subvolume './test'
$ touch test/abc
$ chattr +C test/abc 
$ echo def > test/abc
$ ls -la test/abc 
-rw-r--r-- 1 rm rm 4 Feb 14 20:52 test/abc
$ lsattr test/abc 
---C test/abc
$ btrfs sub snap test test2
Create a snapshot of 'test' in './test2'
$ ls -la test2/abc 
-rw-r--r-- 1 rm rm 4 Feb 14 20:52 test2/abc
$ lsattr test2/abc 
---C test2/abc

-- 
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: try to avoid acquiring free space ctl's lock

2017-02-14 Thread David Sterba
On Mon, Feb 13, 2017 at 03:42:30PM -0800, Liu Bo wrote:
> We don't need to take the lock if the block group has not been cached.
> 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


problem: can't mount superblock

2017-02-14 Thread A Mennucc
dear all,

two days ago I got an old 1TB external USB HDD drive.  I scanned it
using smartctl, long scan, no errors.  I created one partition
/dev/sde1, formatted it using BTRFS, and created two subvolumes. One
subvolume was intended for backups.  This morning I started copying some
200GB data on this subvolume, but something went haywire.

I use Ubuntu xenial + backports,
linux kernel 4.8.0-34-generic
btrfs-progs   4.7.3-1

I now see that in the kernel log this line was printed many times

> BTRFS error (device sdd1): bdev /dev/sdd1 errs: wr **, rd 0, flush 0,
> corrupt 0, gen 0
where ** progressed from 0 to 56; after that this line appeared

> BTRFS: error (device sdd1) in btrfs_commit_transaction:2232: errno=-5
> IO failure (Error while writing out transaction)
> BTRFS info (device sdd1): forced readonly
then there was a traceback, and a lot more errors (see attached log).

I then umounted the volume, but I could not remount it. (Yes I tried to
turn it off and on again).

Here are some command and their outputs.

> # mount -t btrfs -o ro,recover /dev/sde1 /mnt
> mount: /dev/sde1: can't read superblock
when I run the above, The kernel log reports
> BTRFS error (device sdd1): bdev /dev/sde1 errs: wr 73, rd 44, flush 0,
> corrupt 0, gen 0
where the "rd" part increases each time

Another command.
> # btrfs check --repair /dev/sde1
> enabling repair mode
> Checking filesystem on /dev/sde1
> UUID: 491f1bb6-a477-4f3d-a066-19cabaa9
> checking extents
> Fixed 0 roots.
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> checking csums
> checking root refs
> found 90068570112 bytes used err is 0
> total csum bytes: 87728132
> total tree bytes: 212680704
> total fs tree bytes: 110411776
> total extent tree bytes: 5750784
> btree space waste bytes: 2363
> file data blocks allocated: 89855889408
>  referenced 89855889408

Note that smartctl reports no errors in the HDD  whatsoever... (
whatever error occurred, it was maybe in the USB transport).

Maybe the next step may be  "btrfs rescue super-recover" but I would
like your advice.

Thanks, a.

ps: please reply to me too, I did not subscribe to this list

Feb 14 09:36:34 localhost kernel: [431222.636024] usb 1-3: new high-speed USB device number 3 using ehci-pci
Feb 14 09:36:34 localhost kernel: [431222.785672] usb 1-3: New USB device found, idVendor=0bc2, idProduct=2321
Feb 14 09:36:34 localhost kernel: [431222.785675] usb 1-3: New USB device strings: Mfr=2, Product=3, SerialNumber=1
Feb 14 09:36:34 localhost kernel: [431222.785676] usb 1-3: Product: Expansion
Feb 14 09:36:34 localhost kernel: [431222.785678] usb 1-3: Manufacturer: Seagate
Feb 14 09:36:34 localhost kernel: [431222.785679] usb 1-3: SerialNumber: NA4BS1FY
Feb 14 09:36:34 localhost kernel: [431222.787890] scsi host7: uas
Feb 14 09:36:34 localhost kernel: [431222.824806] sd 7:0:0:0: Attached scsi generic sg5 type 0
Feb 14 09:36:34 localhost kernel: [431222.825556] sd 7:0:0:0: [sde] Spinning up disk...
Feb 14 09:36:37 localhost kernel: [431223.840036] ...ready
Feb 14 09:36:37 localhost kernel: [431225.888521] sd 7:0:0:0: [sde] 3907029167 512-byte logical blocks: (2.00 TB/1.82 TiB)
Feb 14 09:36:37 localhost kernel: [431225.888524] sd 7:0:0:0: [sde] 2048-byte physical blocks
Feb 14 09:36:37 localhost kernel: [431226.167318] sd 7:0:0:0: [sde] Attached SCSI disk
Feb 14 09:38:19 localhost kernel: [431327.955353] EXT4-fs (sde1): mounted filesystem with ordered data mode. Opts: (null)
Feb 14 10:19:00 localhost kernel: [433769.032880] floppy: error -5 while reading block 0
Feb 14 10:19:21 localhost kernel: [433789.768849] blk_update_request: I/O error, dev fd0, sector 0
Feb 14 10:19:21 localhost kernel: [433789.768873] floppy: error -5 while reading block 0
Feb 14 10:19:27 localhost kernel: [433796.434449] blk_update_request: I/O error, dev sdd, sector 65435512
Feb 14 10:19:28 localhost kernel: [433796.774734] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 14 10:19:28 localhost kernel: [433796.774740] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7a 30 00 00 f0 00
Feb 14 10:19:28 localhost kernel: [433796.774743] blk_update_request: I/O error, dev sdd, sector 65436208
Feb 14 10:19:28 localhost kernel: [433797.007909] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 14 10:19:28 localhost kernel: [433797.007916] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7c 10 00 00 f0 00
Feb 14 10:19:28 localhost kernel: [433797.007919] blk_update_request: I/O error, dev sdd, sector 65436688
Feb 14 10:19:28 localhost kernel: [433797.230799] sd 6:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 14 10:19:28 localhost kernel: [433797.230805] sd 6:0:0:0: [sdd] tag#0 CDB: Write(10) 2a 00 03 e6 7d f0 00 00 f0 00
Feb 14 10:19:28 localhost kernel: [433797.230808] blk_update_request: I/O error, dev sdd, sector 65437168
Feb 14 10:19:28 

Unexpected behavior involving file attributes and snapshots.

2017-02-14 Thread Austin S. Hemmelgarn
I was just experimenting with snapshots on 4.9.0, and came across some 
unexpected behavior.


The simple explanation is that if you snapshot a subvolume, any files in 
the subvolume that have the NOCOW attribute will not have that attribute 
in the snapshot.  Some further testing indicates that this is the only 
file attribute that isn't preserved (I checked all the chattr flags that 
BTRFS supports).


I'm kind of curious whether:
1. This is actually documented somewhere, as it's somewhat unexpected 
given that everything else is preserved when snapshotting.
2. This is intended behavior, or just happens to be a side effect of the 
implementation.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-02-14 Thread Marc Joliet
Hi again,

so, it seems that I've solved the problem:  After having to umount/mount the 
FS several times to get btrfs-cleaner to finish, I thought of the "failed to 
load free space [...], rebuilding it now" type errors and decided to try the 
clear_cache mount option.  Since then, my home server has been running its 
backups regularly again.  Furthermore, I was able back up my desktop again via 
send/recv (the rsync based backup is still running, but I expect it to 
succeed).  The kernel log has also stayed clean.

Kai, I'd be curious whether clear_cache will help in your case, too, if you 
haven't tried it already.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


"btrfs receive" throws "No space left on device" on empty and large enough fs

2017-02-14 Thread Luca Citi

Hi all

I recently submitted a bug report to launchpad ( 
https://bugs.launchpad.net/ubuntu/+source/btrfs-tools/+bug/1664013 ) but 
I now found out about this mailing list, which may be a better place to 
post.


Basically, whenever many files are created at high throughput in my 
empty btrfs file system, it throws an ENOSPC error.I have had this 
happen with both "btrfs receive" and with "rsync" but not when creating 
a single large file. Both "rsync" and "btrfs receive" fail not when 
creating a new file but when renaming/moving it (see link).


I hope the description of the error in launchpad is helpful (please let 
me know if I need to re-post it here).
I am happy to provide additional information or run other tests if this 
can help.


Thanks!
Luca
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] btrfs: ulist: make the finalization function public

2017-02-14 Thread David Sterba
On Tue, Feb 14, 2017 at 05:29:14PM +0800, Qu Wenruo wrote:
> 
> 
> At 02/13/2017 10:30 PM, David Sterba wrote:
> > Make ulist_fini externally visible so the ulist API is complete.
> 
> Looks good to me.
> 
> While I'm not pretty sure if the name "ulist_fini" is good enough.

> 
> Maybe "ulist_finish" or "ulist_release" (just like release_path and 
> free_path)?

Yeah, I don't like the name either, ulist_release sounds ok to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/29] btrfs: remove unused parameter from btrfs_check_super_valid

2017-02-14 Thread David Sterba
On Mon, Feb 13, 2017 at 06:11:56PM -0800, Liu Bo wrote:
> On Mon, Feb 13, 2017 at 10:33:55AM +0100, David Sterba wrote:
> > None of the checks need to know the RO/RW status.
> >
> 
> OK...there was a readonly check, which lets us skip all checks,
> it was removed by the below commit, should we add the check back?

All the current checks do not modify the superblock, so it's read-only
and we can keep it as-is. The superblock is available from inside the
function anyway so we don't need to check sb_flags externally. I'll
update the changelog.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Root volume (ID 5) in deleting state

2017-02-14 Thread Martin Mlynář



It looks you're right!

On a different machine:

# btrfs sub list / | grep -v lxc
ID 327 gen 1959587 top level 5 path mnt/reaver
ID 498 gen 593655 top level 5 path var/lib/machines

# btrfs sub list / -d | wc -l
0

Ok, apparently it's a regression in one of the latest versions then.
But, it seems quite harmless.

I'm glad my data are safe :)






# uname -a
Linux interceptor 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET
2017 x86_64 GNU/Linux

# btrfs fi show  /
Label: none  uuid: 859dec5c-850c-4660-ad99-bc87456aa309
  Total devices 1 FS bytes used 132.89GiB
  devid1 size 200.00GiB used 200.00GiB path
/dev/mapper/vg0-btrfsroot

As a side note, all of your disk space is allocated (200GiB of 200GiB).

Even while there's still 70GiB of free space scattered around inside,
this might lead to out-of-space issues, depending on how badly
fragmented that free space is.

I have not noticed this at all!

# btrfs fi show /
Label: none  uuid: 859dec5c-850c-4660-ad99-bc87456aa309
 Total devices 1 FS bytes used 134.23GiB
 devid1 size 200.00GiB used 200.00GiB path /dev/mapper/vg0-btrfsroot

# btrfs fi df /
Data, single: total=195.96GiB, used=131.58GiB
System, single: total=3.00MiB, used=48.00KiB
Metadata, single: total=4.03GiB, used=2.64GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

After btrfs defrag there is no difference. btrfs fi show says still
200/200. I'll try to play with it.


[ ... ]

So, to get the numbers of total raw disk space allocation down, you need
to defragment free space (compact the data), not defrag used space.

You can even create pictures of space utilization in your btrfs
filesystem, which might help understanding what it looks like right now: \o/

https://github.com/knorrie/btrfs-heatmap/
I've run into your tool yesterday while googling around this - thanks, 
it's really nice tool. Now rebalance is running and it seems to work well.


Thank you for excellent responses and help!



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] btrfs: ulist: make the finalization function public

2017-02-14 Thread Qu Wenruo



At 02/13/2017 10:30 PM, David Sterba wrote:

Make ulist_fini externally visible so the ulist API is complete.


Looks good to me.

While I'm not pretty sure if the name "ulist_fini" is good enough.

Maybe "ulist_finish" or "ulist_release" (just like release_path and 
free_path)?


Thanks,
Qu


Signed-off-by: David Sterba 
---
 fs/btrfs/ulist.c | 2 +-
 fs/btrfs/ulist.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
index b1434bb57e36..5deee56434fc 100644
--- a/fs/btrfs/ulist.c
+++ b/fs/btrfs/ulist.c
@@ -58,7 +58,7 @@ void ulist_init(struct ulist *ulist)
  * This is useful in cases where the base 'struct ulist' has been statically
  * allocated.
  */
-static void ulist_fini(struct ulist *ulist)
+void ulist_fini(struct ulist *ulist)
 {
struct ulist_node *node;
struct ulist_node *next;
diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h
index 007b22fff3f9..1a4130443d7e 100644
--- a/fs/btrfs/ulist.h
+++ b/fs/btrfs/ulist.h
@@ -44,6 +44,7 @@ struct ulist {
 };

 void ulist_init(struct ulist *ulist);
+void ulist_fini(struct ulist *ulist);
 void ulist_reinit(struct ulist *ulist);
 struct ulist *ulist_alloc(gfp_t gfp_mask);
 void ulist_free(struct ulist *ulist);




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/7] btrfs: embed extent_changeset::range_changed to the structure

2017-02-14 Thread Qu Wenruo



At 02/13/2017 10:30 PM, David Sterba wrote:

We can embed range_changed to the extent changeset to address following
problems:

- no need to allocate ulist dynamically, we also get rid of the GFP_NOFS
  for free
- fix lack of allocation failure checking in btrfs_qgroup_reserve_data


Nice catch.



The stack consuption where extent_changeset is used slightly increases:

before: 16
after: 16 - 8 (for pointer) + 32 (sizeof ulist) = 40

Which is bearable.

Signed-off-by: David Sterba 


Reviewed-by: Qu Wenruo 

Thanks,
Qu


---
 fs/btrfs/extent_io.c |  2 +-
 fs/btrfs/extent_io.h |  2 +-
 fs/btrfs/qgroup.c| 24 +---
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 9df6ed30de00..9140847bfb0c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -144,7 +144,7 @@ static void add_extent_changeset(struct extent_state 
*state, unsigned bits,
if (!set && (state->state & bits) == 0)
return;
changeset->bytes_changed += state->end - state->start + 1;
-   ret = ulist_add(changeset->range_changed, state->start, state->end,
+   ret = ulist_add(>range_changed, state->start, state->end,
GFP_ATOMIC);
/* ENOMEM */
BUG_ON(ret < 0);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4551a5b4b8f5..270d03be290e 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -193,7 +193,7 @@ struct extent_changeset {
u64 bytes_changed;

/* Changed ranges */
-   struct ulist *range_changed;
+   struct ulist range_changed;
 };

 static inline void extent_set_compress_type(unsigned long *bio_flags,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 971701328229..7cc2e2221f55 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2828,7 +2828,7 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 
start, u64 len)
return 0;

changeset.bytes_changed = 0;
-   changeset.range_changed = ulist_alloc(GFP_NOFS);
+   ulist_init(_changed);
ret = set_record_extent_bits(_I(inode)->io_tree, start,
start + len -1, EXTENT_QGROUP_RESERVED, );
trace_btrfs_qgroup_reserve_data(inode, start, len,
@@ -2840,17 +2840,17 @@ int btrfs_qgroup_reserve_data(struct inode *inode, u64 
start, u64 len)
if (ret < 0)
goto cleanup;

-   ulist_free(changeset.range_changed);
+   ulist_fini(_changed);
return ret;

 cleanup:
/* cleanup already reserved ranges */
ULIST_ITER_INIT();
-   while ((unode = ulist_next(changeset.range_changed, )))
+   while ((unode = ulist_next(_changed, )))
clear_extent_bit(_I(inode)->io_tree, unode->val,
 unode->aux, EXTENT_QGROUP_RESERVED, 0, 0, NULL,
 GFP_NOFS);
-   ulist_free(changeset.range_changed);
+   ulist_fini(_changed);
return ret;
 }

@@ -2862,10 +2862,7 @@ static int __btrfs_qgroup_release_data(struct inode 
*inode, u64 start, u64 len,
int ret;

changeset.bytes_changed = 0;
-   changeset.range_changed = ulist_alloc(GFP_NOFS);
-   if (!changeset.range_changed)
-   return -ENOMEM;
-
+   ulist_init(_changed);
ret = clear_record_extent_bits(_I(inode)->io_tree, start,
start + len -1, EXTENT_QGROUP_RESERVED, );
if (ret < 0)
@@ -2878,7 +2875,7 @@ static int __btrfs_qgroup_release_data(struct inode 
*inode, u64 start, u64 len,
trace_btrfs_qgroup_release_data(inode, start, len,
changeset.bytes_changed, trace_op);
 out:
-   ulist_free(changeset.range_changed);
+   ulist_fini(_changed);
return ret;
 }

@@ -2976,22 +2973,19 @@ void btrfs_qgroup_check_reserved_leak(struct inode 
*inode)
int ret;

changeset.bytes_changed = 0;
-   changeset.range_changed = ulist_alloc(GFP_NOFS);
-   if (WARN_ON(!changeset.range_changed))
-   return;
-
+   ulist_init(_changed);
ret = clear_record_extent_bits(_I(inode)->io_tree, 0, (u64)-1,
EXTENT_QGROUP_RESERVED, );

WARN_ON(ret < 0);
if (WARN_ON(changeset.bytes_changed)) {
ULIST_ITER_INIT();
-   while ((unode = ulist_next(changeset.range_changed, ))) {
+   while ((unode = ulist_next(_changed, ))) {
btrfs_warn(BTRFS_I(inode)->root->fs_info,
"leaking qgroup reserved space, ino: %lu, start: 
%llu, end: %llu",
inode->i_ino, unode->val, unode->aux);
}
qgroup_free(BTRFS_I(inode)->root, changeset.bytes_changed);
}
-   ulist_free(changeset.range_changed);
+   ulist_fini(_changed);
 }




--
To unsubscribe from this list: 

Re: [PATCH 7/7] nonblocking aio: btrfs

2017-02-14 Thread Nikolay Borisov


On 14.02.2017 04:46, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> Return EAGAIN if any of the following checks fail
>  + i_rwsem is lockable
>  + NODATACOW or PREALLOC is set

perhaps you wanted to say "NODATACOW or PREALLOC is _not_ set"? See below



>  + Can nocow at the desired location
>  + Writing beyond end of file
> 
> Signed-off-by: Goldwyn Rodrigues 
> ---
>  fs/btrfs/file.c  | 25 -
>  fs/btrfs/inode.c |  3 +++
>  2 files changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 3a14c87..f4dcc7a 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1804,12 +1804,29 @@ static ssize_t btrfs_file_write_iter(struct kiocb 
> *iocb,
>   ssize_t num_written = 0;
>   bool sync = (file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host);
>   ssize_t err;
> - loff_t pos;
> - size_t count;
> + loff_t pos = iocb->ki_pos;
> + size_t count = iov_iter_count(from);
>   loff_t oldsize;
>   int clean_page = 0;
>  
> - inode_lock(inode);
> + if ((iocb->ki_flags & IOCB_NONBLOCKING) &&
> + (iocb->ki_flags & IOCB_DIRECT)) {
> + /* Don't sleep on inode rwsem */
> + if (!inode_trylock(inode))
> + return -EAGAIN;
> + /*
> +  * We will allocate space in case nodatacow is not set,
> +  * so bail
> +  */
> + if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
> +   BTRFS_INODE_PREALLOC)) ||

The first part of the condition will evaluate to true only when both
btrfs_inode_nodatacow and btrfs_inodeprealloc are not set. The code
contradicts your statement in the change log.

> + check_can_nocow(inode, pos, ) <= 0) {
> + inode_unlock(inode);
> + return -EAGAIN;
> + }
> + } else
> + inode_lock(inode);
> +
>   err = generic_write_checks(iocb, from);
>   if (err <= 0) {
>   inode_unlock(inode);
> @@ -1843,8 +1860,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
>*/
>   update_time_for_write(inode);
>  
> - pos = iocb->ki_pos;
> - count = iov_iter_count(from);
>   start_pos = round_down(pos, root->sectorsize);
>   oldsize = i_size_read(inode);
>   if (start_pos > oldsize) {
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index be4da91..9911ebd 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8690,6 +8690,9 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, 
> struct iov_iter *iter)
>   if (offset + count <= inode->i_size) {
>   inode_unlock(inode);
>   relock = true;
> + } else if (iocb->ki_flags & IOCB_NONBLOCKING) {
> + ret = -EAGAIN;
> + goto out;
>   }
>   ret = btrfs_delalloc_reserve_space(inode, offset, count);
>   if (ret)
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html