Re: [PATCH v2] Fstest: btrfs/151: test if device delete ends up with losing raid profile

2017-10-13 Thread Eryu Guan
On Fri, Oct 13, 2017 at 01:40:05PM -0600, Liu Bo wrote:
> Currently running 'btrfs device delete' can end up with losing data
> raid profile (if any), this test is to reproduce the problem.
> 
> The fix is
>  "Btrfs: avoid losing data raid profile when deleting a device"
> 
> Signed-off-by: Liu Bo 
> ---
>  tests/btrfs/151 | 73 
> +
>  tests/btrfs/151.out |  2 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 76 insertions(+)
>  create mode 100755 tests/btrfs/151
>  create mode 100644 tests/btrfs/151.out
> 
> diff --git a/tests/btrfs/151 b/tests/btrfs/151
> new file mode 100755
> index 000..1866cb6
> --- /dev/null
> +++ b/tests/btrfs/151
> @@ -0,0 +1,73 @@
> +#! /bin/bash
> +# FS QA Test 151
> +#
> +# Test if it's losing data chunk's raid profile after 'btrfs device
> +# remove'.
> +#
> +# The fix is
> +#Btrfs: avoid losing data raid profile when deleting a device
> +#
> +#---
> +# Copyright (c) 2017 Oracle.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch
> +_require_scratch_dev_pool 3
> +
> +# create raid1 for data
> +_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
> +
> +# we need an empty data chunk, so nospace_cache is required.
> +_scratch_mount -onospace_cache
> +
> +# if data chunk is empty, 'btrfs device remove' can change raid1 to
> +# single.
> +$BTRFS_UTIL_PROG device delete 2 $SCRATCH_MNT >> $seqres.full
> +
> +df_ret=`$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT`
> +echo $df_ret | $AWK_PROG -F ':' '/Data,/ {print $1}'

Forgot to commit the changes before sending? It looks like the same as v1.

Thanks,
Eryu

> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/151.out b/tests/btrfs/151.out
> new file mode 100644
> index 000..0a1de06
> --- /dev/null
> +++ b/tests/btrfs/151.out
> @@ -0,0 +1,2 @@
> +QA output created by 151
> +Data, RAID1
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index e73bb1b..a7ff7b0 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -153,3 +153,4 @@
>  148 auto quick rw
>  149 auto quick send compress
>  150 auto quick dangerous
> +151 auto quick
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: use BLK_STS defines where needed

2017-10-13 Thread Anand Jain
At few places we could use BLK_STS_OK and BLK_STS_NOSUPP.

Signed-off-by: Anand Jain 
---
 fs/btrfs/compression.c | 3 ++-
 fs/btrfs/inode.c   | 4 ++--
 fs/btrfs/volumes.c | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index b51d23f5cafa..43d4b51556ec 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -239,7 +239,8 @@ static void end_compressed_bio_write(struct bio *bio)
 cb->start,
 cb->start + cb->len - 1,
 NULL,
-bio->bi_status ? 0 : 1);
+bio->bi_status ?
+BLK_STS_OK : BLK_STS_NOTSUPP);
cb->compressed_pages[0]->mapping = NULL;
 
end_compressed_writeback(inode, cb);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 128f3e58634f..9aa03c89ad86 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8360,7 +8360,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) {
err = btrfs_subio_endio_read(inode, io_bio, err);
if (!err)
-   bio->bi_status = 0;
+   bio->bi_status = BLK_STS_OK;
}
 
unlock_extent(_I(inode)->io_tree, dip->logical_offset,
@@ -8478,7 +8478,7 @@ static void btrfs_end_dio_bio(struct bio *bio)
if (dip->errors) {
bio_io_error(dip->orig_bio);
} else {
-   dip->dio_bio->bi_status = 0;
+   dip->dio_bio->bi_status = BLK_STS_OK;
bio_endio(dip->orig_bio);
}
 out:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a61405c2cd31..15e017af756c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6020,7 +6020,7 @@ static void btrfs_end_bio(struct bio *bio)
 * this bio is actually up to date, we didn't
 * go over the max number of errors
 */
-   bio->bi_status = 0;
+   bio->bi_status = BLK_STS_OK;
}
 
btrfs_end_bbio(bbio, bio);
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix false EIO for missing device

2017-10-13 Thread Anand Jain



-   bio->bi_status = BLK_STS_IOERR;
+   if (atomic_read(>error) > bbio->max_errors)
+   bio->bi_status = BLK_STS_IOERR;
+   else
+   bio->bi_status = 0;


Thanks for the fix, I'd prefer BLK_STS_OK rather than 0.

With that,

Reviewed-by: Liu Bo 


 Thanks for the review will fix it.

-Anand


-liubo

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: fix false EIO for missing device

2017-10-13 Thread Anand Jain
When one of the device is missing, bbio_error() takes care
of setting the error status. And if its only IO that is
pending in that stripe, it fails to check the status of the
other IO at %bbio_error before setting the error %bi_status
for the %orig_bio. Fix this by checking if %bbio->error is
has crossed the %bbio->max_errors. Thxs.

Reproducer as below fdatasync error is seen intermittently.

 mount -o degraded /dev/sdc /btrfs
 dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 
conv=fdatasync

 dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error

 The reason for the intermittences of the problem is because..
 following condition has to be met, which depends on timely
 coordination.
 In btrfs_map_bio()
  . The RAID1 the missing device has to be at %dev_nr = 1
 In bbio_error()
  . Before bbio_error() is called the bio of the not-missing
device at %dev_nr=0 must be completed so that the below
condition is true
 if (atomic_dec_and_test(>stripes_pending)) {

Signed-off-by: Anand Jain 
Reviewed-by: Liu Bo 
---
v2: Use BLK_STS_OK instead of 0.

 fs/btrfs/volumes.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 41c02a3ffc78..15e017af756c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct 
bio *bio, u64 logical)
 
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
bio->bi_iter.bi_sector = logical >> 9;
-   bio->bi_status = BLK_STS_IOERR;
+   if (atomic_read(>error) > bbio->max_errors)
+   bio->bi_status = BLK_STS_IOERR;
+   else
+   bio->bi_status = BLK_STS_OK;
btrfs_end_bbio(bbio, bio);
}
 }
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] Btrfs: clear_dirty only on pages only in compression range

2017-10-13 Thread Timofey Titovets
2017-10-10 19:22 GMT+03:00 David Sterba :
> On Tue, Oct 03, 2017 at 06:06:02PM +0300, Timofey Titovets wrote:
>> We need to call extent_range_clear_dirty_for_io()
>> on compression range to prevent application from changing
>> page content, while pages compressing.
>>
>> but "(end - start)" can be much (up to 1024 times) bigger
>> then compression range (BTRFS_MAX_UNCOMPRESSED), so optimize that
>> by calculating compression range for
>> that loop iteration, and flip bits only on that range
>
> I'm not sure this is safe to do. Compress_file_range gets the whole
> range [start,end] from async_cow_start, and other parts of code might
> rely on the status of the whole range, ie. not partially redirtied.

I check some kernel code, io path are complex =\.
I see 3 approaches:
1. That used to signal upper level, that we fail to write that pages
and on other sync() call, kernel can send that data again to writing.
2. We lock pages from any changes while processing data.
3. Serialize writes, i.e. if i understood correctly, that allow to
not send down pages again for several sync requests.

My code above will handle ok first and second case, and in theory will
cause some issues with 3, but doesn't matter.

The main design problem from my point of view for now, that we call
that function many times in the loop, example:
compress_file_range get: 0 - 1048576
extent_range_clear_dirty_for_io(), will get a call for:
0 - 1048576
131072 - 1048576
262144 - 1048576
... & etc

So, we can beat that in different way.
I first think about move extent_range_clear_dirty_for_io() out of
loop, but i think that a better approach.

Whats about:
if (!redirty) {
extent_range_clear_dirty_for_io(inode, start, end);
redirty = 1;
}

That will also cover all above cases, because that lock whole range, as before.
But we only call that once, and that will fix design issue.

That you think?

Thanks.
-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: add write_flags for compression bio

2017-10-13 Thread Liu Bo
Compression code path has only flaged bios with REQ_OP_WRITE no matter
where the bios come from.  This breaks the rule that sync writes and
writeback writes need to be differentiated from each other.

This passes writeback_control to compression write path so that it can
send bios with proper flags to block layer.

Signed-off-by: Liu Bo 
---
 fs/btrfs/compression.c |  7 ---
 fs/btrfs/compression.h |  3 ++-
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/extent_io.h   |  3 ++-
 fs/btrfs/inode.c   | 15 +++
 5 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 280384b..3dae2f5 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -292,7 +292,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
 unsigned long len, u64 disk_start,
 unsigned long compressed_len,
 struct page **compressed_pages,
-unsigned long nr_pages)
+unsigned long nr_pages,
+unsigned int write_flags)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct bio *bio = NULL;
@@ -324,7 +325,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
bdev = fs_info->fs_devices->latest_bdev;
 
bio = btrfs_bio_alloc(bdev, first_byte);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+   bio->bi_opf = REQ_OP_WRITE | write_flags;
bio->bi_private = cb;
bio->bi_end_io = end_compressed_bio_write;
refcount_set(>pending_bios, 1);
@@ -371,7 +372,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
bio_put(bio);
 
bio = btrfs_bio_alloc(bdev, first_byte);
-   bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
+   bio->bi_opf = REQ_OP_WRITE | write_flags;
bio->bi_private = cb;
bio->bi_end_io = end_compressed_bio_write;
bio_add_page(bio, page, PAGE_SIZE, 0);
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index d2781ff..dc45b94 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -91,7 +91,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode 
*inode, u64 start,
  unsigned long len, u64 disk_start,
  unsigned long compressed_len,
  struct page **compressed_pages,
- unsigned long nr_pages);
+ unsigned long nr_pages,
+ unsigned int write_flags);
 blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 int mirror_num, unsigned long bio_flags);
 
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3e5bb0c..ea64ad0 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3252,7 +3252,7 @@ static noinline_for_stack int writepage_delalloc(struct 
inode *inode,
   delalloc_start,
   delalloc_end,
   _started,
-  nr_written);
+  nr_written, wbc);
/* File system has been set read-only */
if (ret) {
SetPageError(page);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index faffa28..a92fd98 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -116,7 +116,8 @@ struct extent_io_ops {
 */
int (*fill_delalloc)(void *private_data, struct page *locked_page,
 u64 start, u64 end, int *page_started,
-unsigned long *nr_written);
+unsigned long *nr_written,
+struct writeback_control *wbc);
 
int (*writepage_start_hook)(struct page *page, u64 start, u64 end);
void (*writepage_end_io_hook)(struct page *page, u64 start, u64 end,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 128f3e5..ee67773 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -367,6 +367,7 @@ struct async_cow {
struct page *locked_page;
u64 start;
u64 end;
+   unsigned int write_flags;
struct list_head extents;
struct btrfs_work work;
 };
@@ -846,7 +847,8 @@ static noinline void submit_compressed_extents(struct inode 
*inode,
async_extent->ram_size,
ins.objectid,
ins.offset, async_extent->pages,
-   

Re: Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Juan Orti Alcaine
2017-10-13 20:15 GMT+02:00 Chris Murphy :
> On Fri, Oct 13, 2017 at 12:02 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Those warnings aren't anything to be /too/ worried about.  They are
>> triggered when a btrfs device size isn't a multiple of the btrfs
>> sectorsize (currently 4 KiB on amd64 aka x86_64).  You can manually
>> shrink your btrfs devices the fraction to an exact 4 KiB multiple, or
>> wait a bit, and a new release of btrfs-tools should have a command to do
>> it for you.
>
>
> So it's an open question now why it was created with a device size
> that isn't divisible by 4KiB. Seems like a bad idea to make it easy to
> create virtual block devices based on 512 byte divisions, as it makes
> it easy to end up with unaligned sectors at some point if one of the
> underlying devices is an AF 512e device. Was it manually created? Or
> was it created via the OS installer? Some time ago there was a feature
> proposal for the Fedora installer being able to create bcache and
> dmcache devices, but I haven't seen anything yet.


I'm sure I created all bcache devices with a block size of 4k.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Btrfs: avoid losing data raid profile when deleting a device

2017-10-13 Thread Liu Bo
On Wed, Oct 11, 2017 at 10:38:50AM +0300, Nikolay Borisov wrote:
> 
> 
> On 10.10.2017 20:53, Liu Bo wrote:
> > We've avoided data losing raid profile when doing balance, but it
> > turns out that deleting a device could also result in the same
> > problem
> > 
> > This fixes the problem by creating an empty data chunk before
> > relocating the data chunk.
> 
> Why is this needed - copy the metadata of the to-be-relocated chunk into
> the newly created empty chunk? I don't entirely understand that code but
> doesn't this seem a bit like a hack in order to stash some information?
> Perhaps you could elaborate the logic a bit more in the changelog?
> 
> > 
> > Metadata/System chunk are supposed to have non-zero bytes all the time
> > so their raid profile is persistent.
> 
> I think this changelog is a bit scarce on detail as to the culprit of
> the problem. Could you perhaps put a sentence or two what the underlying
> logic which deletes the raid profile if a chunk is empty ?
>

Fair enough.

The problem is as same as what commit 2c9fe8355258 ("btrfs: Fix
lost-data-profile caused by balance bg") had fixed.

Similar to doing balance, deleting a device can also move all chunks
on this disk to other available disks, after 'move' successfully,
it'll remove those chunks.

If our last data chunk is empty and part of it happens to be on this
disk, then there is no data chunk in this btrfs after deleting the
device successfully, any following write will try to create a new data
chunk which ends up with a single data chunk because the only
available data raid profile is 'single'.

thanks,
-liubo

> > 
> > Reported-by: James Alandt 
> > Signed-off-by: Liu Bo 
> > ---
> > 
> > v2: - return the correct error.
> > - move helper ahead of __btrfs_balance().
> > 
> >  fs/btrfs/volumes.c | 84 
> > ++
> >  1 file changed, 65 insertions(+), 19 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 4a72c45..a74396d 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3018,6 +3018,48 @@ static int btrfs_relocate_sys_chunks(struct 
> > btrfs_fs_info *fs_info)
> > return ret;
> >  }
> >  
> > +/*
> > + * return 1 : allocate a data chunk successfully,
> > + * return <0: errors during allocating a data chunk,
> > + * return 0 : no need to allocate a data chunk.
> > + */
> > +static int btrfs_may_alloc_data_chunk(struct btrfs_fs_info *fs_info,
> > + u64 chunk_offset)
> > +{
> > +   struct btrfs_block_group_cache *cache;
> > +   u64 bytes_used;
> > +   u64 chunk_type;
> > +
> > +   cache = btrfs_lookup_block_group(fs_info, chunk_offset);
> > +   ASSERT(cache);
> > +   chunk_type = cache->flags;
> > +   btrfs_put_block_group(cache);
> > +
> > +   if (chunk_type & BTRFS_BLOCK_GROUP_DATA) {
> > +   spin_lock(_info->data_sinfo->lock);
> > +   bytes_used = fs_info->data_sinfo->bytes_used;
> > +   spin_unlock(_info->data_sinfo->lock);
> > +
> > +   if (!bytes_used) {
> > +   struct btrfs_trans_handle *trans;
> > +   int ret;
> > +
> > +   trans = btrfs_join_transaction(fs_info->tree_root);
> > +   if (IS_ERR(trans))
> > +   return PTR_ERR(trans);
> > +
> > +   ret = btrfs_force_chunk_alloc(trans, fs_info,
> > + BTRFS_BLOCK_GROUP_DATA);
> > +   btrfs_end_transaction(trans);
> > +   if (ret < 0)
> > +   return ret;
> > +
> > +   return 1;
> > +   }
> > +   }
> > +   return 0;
> > +}
> > +
> >  static int insert_balance_item(struct btrfs_fs_info *fs_info,
> >struct btrfs_balance_control *bctl)
> >  {
> > @@ -3476,7 +3518,6 @@ static int __btrfs_balance(struct btrfs_fs_info 
> > *fs_info)
> > u32 count_meta = 0;
> > u32 count_sys = 0;
> > int chunk_reserved = 0;
> > -   u64 bytes_used = 0;
> >  
> > /* step one make some room on all the devices */
> > devices = _info->fs_devices->devices;
> > @@ -3635,28 +3676,21 @@ static int __btrfs_balance(struct btrfs_fs_info 
> > *fs_info)
> > goto loop;
> > }
> >  
> > -   ASSERT(fs_info->data_sinfo);
> > -   spin_lock(_info->data_sinfo->lock);
> > -   bytes_used = fs_info->data_sinfo->bytes_used;
> > -   spin_unlock(_info->data_sinfo->lock);
> > -
> > -   if ((chunk_type & BTRFS_BLOCK_GROUP_DATA) &&
> > -   !chunk_reserved && !bytes_used) {
> > -   trans = btrfs_start_transaction(chunk_root, 0);
> > -   if (IS_ERR(trans)) {
> > -   mutex_unlock(_info->delete_unused_bgs_mutex);
> > -   ret = PTR_ERR(trans);
> > -   goto error;
> > 

[PATCH v2] Fstest: btrfs/151: test if device delete ends up with losing raid profile

2017-10-13 Thread Liu Bo
Currently running 'btrfs device delete' can end up with losing data
raid profile (if any), this test is to reproduce the problem.

The fix is
 "Btrfs: avoid losing data raid profile when deleting a device"

Signed-off-by: Liu Bo 
---
 tests/btrfs/151 | 73 +
 tests/btrfs/151.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 76 insertions(+)
 create mode 100755 tests/btrfs/151
 create mode 100644 tests/btrfs/151.out

diff --git a/tests/btrfs/151 b/tests/btrfs/151
new file mode 100755
index 000..1866cb6
--- /dev/null
+++ b/tests/btrfs/151
@@ -0,0 +1,73 @@
+#! /bin/bash
+# FS QA Test 151
+#
+# Test if it's losing data chunk's raid profile after 'btrfs device
+# remove'.
+#
+# The fix is
+#  Btrfs: avoid losing data raid profile when deleting a device
+#
+#---
+# Copyright (c) 2017 Oracle.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+# Modify as appropriate.
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_scratch_dev_pool 3
+
+# create raid1 for data
+_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
+
+# we need an empty data chunk, so nospace_cache is required.
+_scratch_mount -onospace_cache
+
+# if data chunk is empty, 'btrfs device remove' can change raid1 to
+# single.
+$BTRFS_UTIL_PROG device delete 2 $SCRATCH_MNT >> $seqres.full
+
+df_ret=`$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT`
+echo $df_ret | $AWK_PROG -F ':' '/Data,/ {print $1}'
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/151.out b/tests/btrfs/151.out
new file mode 100644
index 000..0a1de06
--- /dev/null
+++ b/tests/btrfs/151.out
@@ -0,0 +1,2 @@
+QA output created by 151
+Data, RAID1
diff --git a/tests/btrfs/group b/tests/btrfs/group
index e73bb1b..a7ff7b0 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -153,3 +153,4 @@
 148 auto quick rw
 149 auto quick send compress
 150 auto quick dangerous
+151 auto quick
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/2] btrfs: introduce device dynamic state transition to failed

2017-10-13 Thread Liu Bo
On Tue, Oct 03, 2017 at 11:59:19PM +0800, Anand Jain wrote:
> From: Anand Jain 
> 
> This patch provides helper functions to force a device to failed,
> and we need it for the following reasons,
> 1) a. It can be reported that device has failed when it does and
>b. Close the device when it goes offline so that blocklayer can
>   cleanup
> 2) Identify the candidate for the auto replace
> 3) Stop further RW to the failing device and
> 4) A device in the multi device btrfs may fail, but as of now in
>some system config whole of btrfs gets unmounted.
> 
> Signed-off-by: Anand Jain 
> Tested-by: Austin S. Hemmelgarn 
> ---
> V8: General misc cleanup. Based on v4.14-rc2
> 
>  fs/btrfs/volumes.c | 104 
> +
>  fs/btrfs/volumes.h |  15 +++-
>  2 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 0e8f16c305df..06e7cf4cef81 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7255,3 +7255,107 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info 
> *fs_info)
>   fs_devices = fs_devices->seed;
>   }
>  }
> +
> +static void do_close_device(struct work_struct *work)
> +{
> + struct btrfs_device *device;
> +
> + device = container_of(work, struct btrfs_device, rcu_work);
> +
> + if (device->closing_bdev)
> + blkdev_put(device->closing_bdev, device->mode);
> +
> + device->closing_bdev = NULL;
> +}
> +
> +static void btrfs_close_one_device(struct rcu_head *head)
> +{
> + struct btrfs_device *device;
> +
> + device = container_of(head, struct btrfs_device, rcu);
> +
> + INIT_WORK(>rcu_work, do_close_device);
> + schedule_work(>rcu_work);
> +}

No need to schedule the job to a worker.

thanks,

-liubo

> +
> +void btrfs_force_device_close(struct btrfs_device *device)
> +{
> + struct btrfs_fs_info *fs_info;
> + struct btrfs_fs_devices *fs_devices;
> +
> + fs_devices = device->fs_devices;
> + fs_info = fs_devices->fs_info;
> +
> + btrfs_sysfs_rm_device_link(fs_devices, device);
> +
> + mutex_lock(_devices->device_list_mutex);
> + mutex_lock(_devices->fs_info->chunk_mutex);
> +
> + btrfs_assign_next_active_device(fs_devices->fs_info, device, NULL);
> +
> + if (device->bdev)
> + fs_devices->open_devices--;
> +
> + if (device->writeable) {
> + list_del_init(>dev_alloc_list);
> + fs_devices->rw_devices--;
> + }
> + device->writeable = 0;
> +
> + /*
> +  * Todo: We have miss-used missing flag all around, and here
> +  * too for now. (In the long run I want to keep missing to only
> +  * indicate that it was not present when RAID was assembled.)
> +  */
> + device->missing = 1;
> + fs_devices->missing_devices++;
> + device->closing_bdev = device->bdev;
> + device->bdev = NULL;
> +
> + call_rcu(>rcu, btrfs_close_one_device);
> +
> + mutex_unlock(_devices->fs_info->chunk_mutex);
> + mutex_unlock(_devices->device_list_mutex);
> +
> + rcu_barrier();
> +
> + btrfs_warn_in_rcu(fs_info, "device %s failed",
> + rcu_str_deref(device->name));
> +
> + /*
> +  * We lost one/more disk, which means its not as it
> +  * was configured by the user. Show mount should show
> +  * degraded.
> +  */
> + btrfs_set_opt(fs_info->mount_opt, DEGRADED);
> +
> + /*
> +  * Now having lost one of the device, check if chunk stripe
> +  * is incomplete and handle fatal error if needed.
> +  */
> + if (!btrfs_check_rw_degradable(fs_info))
> + btrfs_handle_fs_error(fs_info, -EIO,
> + "devices below critical level");
> +}
> +
> +void btrfs_mark_device_failed(struct btrfs_device *dev)
> +{
> + struct btrfs_fs_devices *fs_devices = dev->fs_devices;
> +
> + /* This shouldn't be called if device is already missing */
> + if (dev->missing || !dev->bdev)
> + return;
> + if (dev->failed)
> + return;
> + dev->failed = 1;
> +
> + /* Last RW device is requested to force close let FS handle it. */
> + if (fs_devices->rw_devices == 1) {
> + btrfs_handle_fs_error(fs_devices->fs_info, -EIO,
> + "Last RW device failed");
> + return;
> + }
> +
> + /* Point of no return start here. */
> + btrfs_force_device_close(dev);
> +}
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 6108fdfec67f..05b150c03995 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -65,13 +65,26 @@ struct btrfs_device {
>   struct btrfs_pending_bios pending_sync_bios;
>  
>   struct block_device *bdev;
> + struct block_device *closing_bdev;
>  
>   /* the mode sent to blkdev_get */
>   fmode_t mode;
>  
>   int writeable;
> 

Re: [PATCH v8 2/2] btrfs: check device for critical errors and mark failed

2017-10-13 Thread Liu Bo
On Sun, Oct 08, 2017 at 10:23:58PM +0800, Anand Jain wrote:
> 
> 
> On 10/07/2017 07:56 AM, Liu Bo wrote:
> > On Thu, Oct 05, 2017 at 09:56:59PM +0800, Anand Jain wrote:
> > > 
> > > 
> > > On 10/05/2017 04:11 AM, Liu Bo wrote:
> > > > On Tue, Oct 03, 2017 at 11:59:20PM +0800, Anand Jain wrote:
> > > > > From: Anand Jain 
> > > > > 
> > > > > Write and flush errors are critical errors, upon which the device fd
> > > > > must be closed and marked as failed.
> > > > > 
> > > > 
> > > > Can we defer the job of closing device to umount?
> > > 
> > >   Originally I think it was designed to handle the bad disk
> > >   same as missing device. as below [1]. This patch just does that.
> 
> > >   But I am curious to know if there is any issue to close failed
> > >   device ?
> 
>Anything on this ?

No big deal, it's just different with md's behavior, but surely it
doesn't make a big difference whether we close device fd here or not.

> 
> 
> > > [1]
> > > -
> > > static int read_one_dev(struct btrfs_fs_info *fs_info,
> > >  struct extent_buffer *leaf,
> > >  struct btrfs_dev_item *dev_item)
> > > ::
> > >  /*
> > >   * this happens when a device that was properly setup
> > >   * in the device info lists suddenly goes bad.
> > >   * device->bdev is NULL, and so we have to set
> > >   * device->missing to one here
> > >   */
> > > --
> > > 
> > >   So here I misuse the device missing scenario (device->bdev = NULL)
> > >   to the failed device scenario. (code comment mentioned it).
> > > 
> > 
> > This is OK to me, we can unify these stuff later, for now, ->missing
> > is to bypass this failed disk for r/w.
> > 
> > >   Secondly as in the patch 1/2 commit log and also Austin mentioned it,
> > >   if failed device close is deferred it will also defer the cleanup of
> > >   the failed device at the block layer.
> > > 
> > >   But yes block layer can still clean up when btrfs closes the
> > >   device at the time of replace, also where in the long run, pulling
> > >   out of the failed disk would (planned) trigger a udev notification
> > >   into the btrfs sysfs interface and it can close the device.
> > > 
> > >   Anything that I have missed ?
> > > 
> > 
> > I'm more inclined to do cleanup by making udev call 'device remove',
> > which is supposed to do all the validation checks you need to done
> > here.
> 
>Validation check like what ?

OK, just realize that I was talking about what we should do upon
device getting pulled out, in that case, validation check about raid
profile is needed, such as can we keep the raid profile or do we need
to degrade raid proflie, and 'device delete' can do that.

Although I agree with the fact that write/flush errors are critical
ones, it's still too strict if it got only a few write/flush errors,
given cable issues or timeout from some layers may also lead to those
errors.  A compromised way is to have a threshold to give drives
another chance.

thanks,

-liubo

> 
> Thanks, Anand
> 
> > >   Further, jfyi closing of failed device isn't related to the
> > >   looping for device failure though.
> > > 
> > > > We can go mark the device failed and skip it while doing read/write,
> > > > and umount can do the cleanup work.
> > > 
> > >   In servers a need of umount is considered as downtime, things
> > >   shouldn't wait for umount to cleanup.
> > > 
> > 
> > OK.
> > 
> > > > That way we don't need a dedicated thread looping around to detect a
> > > > rare situation.
> > > 
> > >   I think its better to make it event based instead of thread looping,
> > >   pls take a look..
> > > 
> > > [PATCH v8.1 2/2] btrfs: mark device failed for write and flush errors
> > > 
> > 
> > Event based is good, for anything about failing disk, we would have to
> > follow what md is doing eventually, i.e. let udev handle it.
> > 
> > thanks,
> > -liubo
> > 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: USB upgrade fun

2017-10-13 Thread Chris Murphy
On Fri, Oct 13, 2017 at 12:22 PM, Austin S. Hemmelgarn
 wrote:

> From a practical perspective, you're almost certainly better off creating a
> copy for cold storage without involving BTRFS.

Yeah if you want to hedge your bets, and keep it simple, rsync to XFS.
With some added risk (double) you can use mdadm linear or LVM to
concat multiple devices if necessary. In theory you're not going to
get a double failure. But really, the more backups you add, it
increasingly doesn't matter what the strategy used.


>As an alternative, I would
> suggest one of the following approaches:
>
> 1. Take a snapshot of the filesystem, and use send/receive to transfer that
> to another device which you then remove and store somewhere.

This is what I do. Primary, and two separate backup (separate volume
UUIDs), all three are Btrfs. The primary uses send/receive to the two
backups. But my organization plan allows me to have only three
subvolumes on the primary being snapshot and thus send/receive. So
it's easy. Fourth copy, a subset of important data, is in the cloud.


> 2. Use rsync to copy things to another device which you then remove and
> store somewhere.

A fifth copy, also a subset of important data is done doing this to
XFS and a big drive kept offsite. It just accumulates. I don't
consider it a backup, it's an archive, nothing is deleted. I treat it
as WORM. Maybe one day I redo it based on dm-verity so I can add error
detection and correction, the cryptographic aspects are a distant
second for me.


> Both these options are safer, less likely to screw up your existing
> filesystem, and produce copies that can safely be connected to the original
> system at the same time as the original filesystem.

Keeping it simple is the only way you use it with any regularity, and
can understand it well enough if (really when) you have to do a
restore.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Chris Murphy
On Fri, Oct 13, 2017 at 12:02 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Juan Orti Alcaine posted on Fri, 13 Oct 2017 10:40:20 +0200 as excerpted:
>
>> Hi,
>>
>> I've upgraded my system to Fedora 27 and now I see many btrfs warnings,
>> although the system seems to be working fine. Is this something I should
>> worry about?
>
>> # uname -a Linux xenon 4.13.5-300.fc27.x86_64 #1 SMP Thu Oct 5 16:57:11
>> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>>
>> # btrfs --version btrfs-progs v4.13.2
>
>> [  337.813416] [ cut here ]
>> [  337.813453] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
>> btrfs_update_device+0x1be/0x1d0 [btrfs]
>
>> [  337.814256] [ cut here ]
>> [  337.814281] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
>> btrfs_update_device+0x1be/0x1d0 [btrfs]
>
>> [  337.822161] [ cut here ]
>> [  337.822185] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
>> btrfs_update_device+0x1be/0x1d0 [btrfs]
>
>> [  337.822868] [ cut here ]
>> [  337.822890] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
>> btrfs_update_device+0x1be/0x1d0 [btrfs]
>
>
> Those warnings aren't anything to be /too/ worried about.  They are
> triggered when a btrfs device size isn't a multiple of the btrfs
> sectorsize (currently 4 KiB on amd64 aka x86_64).  You can manually
> shrink your btrfs devices the fraction to an exact 4 KiB multiple, or
> wait a bit, and a new release of btrfs-tools should have a command to do
> it for you.


So it's an open question now why it was created with a device size
that isn't divisible by 4KiB. Seems like a bad idea to make it easy to
create virtual block devices based on 512 byte divisions, as it makes
it easy to end up with unaligned sectors at some point if one of the
underlying devices is an AF 512e device. Was it manually created? Or
was it created via the OS installer? Some time ago there was a feature
proposal for the Fedora installer being able to create bcache and
dmcache devices, but I haven't seen anything yet.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fstest: btrfs/151: test if device delete ends up with losing raid profile

2017-10-13 Thread Liu Bo
On Thu, Oct 12, 2017 at 03:06:57PM +0800, Eryu Guan wrote:
> On Mon, Oct 09, 2017 at 11:39:21AM -0600, Liu Bo wrote:
> > Currently running 'btrfs device delete' can end up with losing data raid
> > profile (if any), this test is to reproduce the problem.
> > 
> > The fix is
> >  "Btrfs: avoid losing data raid profile when deleting a device"
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  tests/btrfs/151 | 73 
> > +
> >  tests/btrfs/151.out |  2 ++
> >  tests/btrfs/group   |  1 +
> >  3 files changed, 76 insertions(+)
> >  create mode 100755 tests/btrfs/151
> >  create mode 100644 tests/btrfs/151.out
> > 
> > diff --git a/tests/btrfs/151 b/tests/btrfs/151
> > new file mode 100755
> > index 000..1866cb6
> > --- /dev/null
> > +++ b/tests/btrfs/151
> > @@ -0,0 +1,73 @@
> > +#! /bin/bash
> > +# FS QA Test 151
> > +#
> > +# Test if it's losing data chunk's raid profile after 'btrfs device
> > +# remove'.
> > +#
> > +# The fix is
> > +#  Btrfs: avoid losing data raid profile when deleting a device
> > +#
> > +#---
> > +# Copyright (c) 2017 Oracle.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#---
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +   cd /
> > +   rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +
> > +# Modify as appropriate.
> > +_supported_fs btrfs
> > +_supported_os Linux
> > +_require_scratch
> 
> Better to have _require_btrfs_dev_del_by_devid too.
>

OK.

> > +_require_scratch_dev_pool 3
> 
> Hmm, this only reproduces for me if I have exact three devices in
> SCRATCH_DEV_POOL, large dev pool passed the test. If this can only be
> reproduced with a three-device setup, I think we need to limit the dev
> pool size to at most 3, e.g.
> 
> # 
> _scratch_dev_pool_get 3
> 
> _scratch_dev_pool_put
> status=0
> exit

Oops, you're totally right.

> 
> > +
> > +# create raid1 for data
> > +_scratch_pool_mkfs "-d raid1 -b 1G" >> $seqres.full 2>&1
> > +
> > +# we need an empty data chunk, so nospace_cache is required.
> > +_scratch_mount -onospace_cache
> > +
> > +# if data chunk is empty, 'btrfs device remove' can change raid1 to
> > +# single.
> > +$BTRFS_UTIL_PROG device delete 2 $SCRATCH_MNT >> $seqres.full
> > +
> > +df_ret=`$BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT`
> > +echo $df_ret | $AWK_PROG -F ':' '/Data,/ {print $1}'
> 
> I'd like to avoid saving outputs in a variable, it's easy to cause
> subtle problems without proper quoting, as Dave found out in this thread
> 
> [whacky issue] xfs/277 endlessly looping in _require_xfs_io_command
> http://www.spinics.net/lists/fstests/msg07501.html
>
> How about:
> 
> # save btrfs filesystem df output for debug purpose
> $BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT >>$seqres.full 2>&1
> $BTRFS_UTIL_PROG filesystem df $SCRATCH_MNT | \
>   $AWK_PROG -F ':' '/Data,/ {print $1}'

That makes sense to me, will update it.

Thanks for the comments.

thanks,

-liubo

> 
> Thanks,
> Eryu
> 
> > +
> > +# success, all done
> > +status=0
> > +exit
> > diff --git a/tests/btrfs/151.out b/tests/btrfs/151.out
> > new file mode 100644
> > index 000..0a1de06
> > --- /dev/null
> > +++ b/tests/btrfs/151.out
> > @@ -0,0 +1,2 @@
> > +QA output created by 151
> > +Data, RAID1
> > diff --git a/tests/btrfs/group b/tests/btrfs/group
> > index e73bb1b..a7ff7b0 100644
> > --- a/tests/btrfs/group
> > +++ b/tests/btrfs/group
> > @@ -153,3 +153,4 @@
> >  148 auto quick rw
> >  149 auto quick send compress
> >  150 auto quick dangerous
> > +151 auto quick
> > -- 
> > 2.5.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe fstests" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from 

Re: [PATCH] btrfs: fix false EIO for missing device

2017-10-13 Thread Liu Bo
On Fri, Oct 13, 2017 at 09:42:18PM +0800, Anand Jain wrote:
> When one of the device is missing, bbio_error() takes care
> of setting the error status. And if its only IO that is
> pending in that stripe, it fails to check the status of the
> other IO at %bbio_error before setting the error %bi_status
> for the %orig_bio. Fix this by checking if %bbio->error is
> has crossed the %bbio->max_errors. Thxs.
> 
> Reproducer as below fdatasync error is seen intermittently.
> 
>  mount -o degraded /dev/sdc /btrfs
>  dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 
> conv=fdatasync
> 
>  dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
> 
>  The reason for the intermittences of the problem is because..
>  following condition has to be met, which depends on timely
>  coordination.
>  In btrfs_map_bio()
>   . The RAID1 the missing device has to be at %dev_nr = 1
>  In bbio_error()
>   . Before bbio_error() is called the bio of the not-missing
> device at %dev_nr=0 must be completed so that the below
> condition is true
>  if (atomic_dec_and_test(>stripes_pending)) {
>


> Signed-off-by: Anand Jain 
> ---
>  fs/btrfs/volumes.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9af633dcf015..efd502176915 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct 
> bio *bio, u64 logical)
>  
>   btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
>   bio->bi_iter.bi_sector = logical >> 9;
> - bio->bi_status = BLK_STS_IOERR;
> + if (atomic_read(>error) > bbio->max_errors)
> + bio->bi_status = BLK_STS_IOERR;
> + else
> + bio->bi_status = 0;

Thanks for the fix, I'd prefer BLK_STS_OK rather than 0.

With that,

Reviewed-by: Liu Bo 

-liubo
>   btrfs_end_bbio(bbio, bio);
>   }
>  }
> -- 
> 2.13.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs errors over NFS

2017-10-13 Thread Peter Grandi
>> TL;DR: ran into some btrfs errors and weird behaviour, but
>> things generally seem to work. Just posting some details in
>> case it helps devs or other users. [ ... ] I've run into a
>> btrfs error trying to do a -j8 build of android on a btrfs
>> filesystem exported over NFSv3. [ ... ]

I have an NFS server from Btrfs filesystem, and it is mostly
read-only and low-use, unlike a massive build, but so far it has
worked for me. The issue that was reported a while ago was that
the kernel NFS server does not report as errors to clients
checksum validation failures, just prints a warning, so for that
reason and a few others I switch to the Ganesha NFS server.

>From your stack traces I noticed that some go pretty deep so
maybe there is an issue with that (but on 'amd64' the kernel
stack is much bigger than it used to be on 'i386'). Another
possibility is that the volume got somewhat damaged for other
reasons (bugs, media errors, ...) and this is have further
consequences.

BTW 'errno 17' is "File exists", so perhaps there is a race
condition over NFS. The bogus files with mode 0 seem to me to be
bogus directory entries with no files linked to them, which
could be again the result of race conditions. The problems with
chunks allocation reported as "WARNING" are unfamiliar to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/21] btrfs: remove type argument from comp_tree_refs

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:58PM -0400, Josef Bacik wrote:
> We can get this from the ref we've passed in.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/21] btrfs: remove delayed_ref_node from ref_head

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:57PM -0400, Josef Bacik wrote:
> This is just excessive information in the ref_head, and makes the code
> complicated.  It is a relic from when we had the heads and the refs in
> the same tree, which is no longer the case.  With this removal I've
> cleaned up a bunch of the cruft around this old assumption as well.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 

> @@ -91,8 +82,8 @@ struct btrfs_delayed_extent_op {
>   * reference count modifications we've queued up.
>   */
>  struct btrfs_delayed_ref_head {
> - struct btrfs_delayed_ref_node node;
> -
> + u64 bytenr, num_bytes;

In structures, one declaration per line is recommended. Fixed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/21] btrfs: move all ref head cleanup to the helper function

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:56PM -0400, Josef Bacik wrote:
> We do a couple different cleanup operations on the ref head.  We adjust
> counters, we'll free any reserved space if we didn't end up using the
> ref, and we clear the pending csum bytes.  Move all these disparate
> things into cleanup_ref_head and clean up the logic in
> __btrfs_run_delayed_refs so that it handles the !ref case a lot cleaner,
> as well as making run_one_delayed_ref() only deal with real refs and not
> the ref head.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 

This one was tough.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/21] btrfs: move ref_mod modification into the if (ref) logic

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:55PM -0400, Josef Bacik wrote:
> We only use this logic if our ref isn't a ref_head, so move it up into
> the if (ref) case since we know that this is a normal ref and not a
> delayed ref head.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/21] btrfs: breakout empty head cleanup to a helper

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:54PM -0400, Josef Bacik wrote:
> Move this code out to a helper function to further simplivy
> __btrfs_run_delayed_refs.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/21] btrfs: move extent_op cleanup to a helper

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:53PM -0400, Josef Bacik wrote:
> Move the extent_op cleanup for an empty head ref to a helper function to
> help simplify __btrfs_run_delayed_refs.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/21] btrfs: add a helper to return a head ref

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:52PM -0400, Josef Bacik wrote:
> Simplify the error handling in __btrfs_run_delayed_refs by breaking out
> the code used to return a head back to the delayed_refs tree for
> processing into a helper function.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/21] Btrfs: add a extent ref verify tool

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:50PM -0400, Josef Bacik wrote:
> We were having corruption issues that were tied back to problems with the 
> extent
> tree.  In order to track them down I built this tool to try and find the
> culprit, which was pretty successful.  If you compile with this tool on it 
> will
> live verify every ref update that the fs makes and make sure it is consistent
> and valid.  I've run this through with xfstests and haven't gotten any false
> positives.  Thanks,
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 

I've fixed the error messages, they should not start with an uppercase
letter and must not end with \n, and can be un-indented so they fit as
much as possible under 80.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/21] btrfs: pass root to various extent ref mod functions

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:49PM -0400, Josef Bacik wrote:
> We need the actual root for the ref verifier tool to work, so change
> these functions to pass the root around instead.  This will be used in
> a subsequent patch.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/21] btrfs: add ref-verify mount option

2017-10-13 Thread David Sterba
On Fri, Oct 13, 2017 at 03:53:36PM +0200, David Sterba wrote:
> On Fri, Sep 29, 2017 at 03:43:48PM -0400, Josef Bacik wrote:
> > This adds the infrastructure for turning ref verify on and off for a
> > mount, to be used by a later patch.
> > 
> > Signed-off-by: Josef Bacik 
> 
> Reviewed-by: David Sterba 

I've added this hunk so we get the config options recorded at module load time.

@@ -2332,6 +2332,9 @@ static void btrfs_print_mod_info(void)
 #endif
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
", integrity-checker=on"
+#endif
+#ifdef CONFIG_BTRFS_FS_REF_VERIFY
+   ", ref-verify=on"
 #endif
"\n",
btrfs_crc32c_impl());
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Nikolay Borisov

> 
> The outstanding_extents accounting is consistent with only the items needed to
> handle the outstanding extent items.  However since changing the inode 
> requires
> updating the inode item as well we have to keep this floating reservation for
> the inode item until we have 0 outstanding extents.  The way we do this is 
> with
> the BTRFS_INODE_DELALLOC_META_RESERVED flag.  So if it isn't set we will
> allocate nr_exntents + 1 in btrfs_delalloc_reserve_metadata() and then set our
> bit.  If we ever steal this reservation we make sure to clear the flag so we
> know we don't have to clean it up when outstanding_extents goes to 0.  It's 
> not
> super intuitive but needs to be done under the BTRFS_I(inode)->lock so this 
> was
> the best place to put it.  I suppose we could move the logic out of here and 
> put
> it somewhere else to make it more clear.

I think defining this logic in its own, discrete block of code would be
best w.r.t readibility. It's not super obvious.


I'm slowly going through your patchkit so expect more question but
otherwise the delalloc stuff after this and patch 03 really start
looking a lot more obvious !


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/21] btrfs: add ref-verify mount option

2017-10-13 Thread David Sterba
On Fri, Sep 29, 2017 at 03:43:48PM -0400, Josef Bacik wrote:
> This adds the infrastructure for turning ref verify on and off for a
> mount, to be used by a later patch.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix false EIO for missing device

2017-10-13 Thread Anand Jain
When one of the device is missing, bbio_error() takes care
of setting the error status. And if its only IO that is
pending in that stripe, it fails to check the status of the
other IO at %bbio_error before setting the error %bi_status
for the %orig_bio. Fix this by checking if %bbio->error is
has crossed the %bbio->max_errors. Thxs.

Reproducer as below fdatasync error is seen intermittently.

 mount -o degraded /dev/sdc /btrfs
 dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 
conv=fdatasync

 dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error

 The reason for the intermittences of the problem is because..
 following condition has to be met, which depends on timely
 coordination.
 In btrfs_map_bio()
  . The RAID1 the missing device has to be at %dev_nr = 1
 In bbio_error()
  . Before bbio_error() is called the bio of the not-missing
device at %dev_nr=0 must be completed so that the below
condition is true
 if (atomic_dec_and_test(>stripes_pending)) {

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9af633dcf015..efd502176915 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct 
bio *bio, u64 logical)
 
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
bio->bi_iter.bi_sector = logical >> 9;
-   bio->bi_status = BLK_STS_IOERR;
+   if (atomic_read(>error) > bbio->max_errors)
+   bio->bi_status = BLK_STS_IOERR;
+   else
+   bio->bi_status = 0;
btrfs_end_bbio(bbio, bio);
}
 }
-- 
2.13.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread David Sterba
On Fri, Oct 13, 2017 at 09:10:52AM -0400, Josef Bacik wrote:
> On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote:
> > 
> > 
> > On 29.09.2017 22:43, Josef Bacik wrote:
> > >  
> > > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode 
> > > *inode,
> > > +  int mod)
> > > +{
> > > + ASSERT(spin_is_locked(>lock));
> > > + inode->outstanding_extents += mod;
> > > + if (btrfs_is_free_space_inode(inode))
> > > + return;
> > > +}
> > > +
> > > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> > > +   int mod)
> > > +{
> > > + ASSERT(spin_is_locked(>lock));
> > 
> > lockdep_assert_held(>lock); for both functions. I've spoken with
> > Peterz and he said any other way of checking whether a lock is held, I
> > quote, "must die"
> > 
> 
> Blah I continually forget that's what we're supposed to use now.

I try to keep such information on
https://btrfs.wiki.kernel.org/index.php/Development_notes#BCP

Just started the section with best practices, feel free to add more.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: cleanup extent locking sequence

2017-10-13 Thread David Sterba
On Tue, Oct 10, 2017 at 09:26:41PM -0500, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> Code cleanup for better understanding:
> needs_unlock to be called extent_locked to show state as opposed to
> action.
> Changed the variable to int, to reduce code in the critical path(code usually
> executed).
> 
> Signed-off-by: Goldwyn Rodrigues 
> ---
>  fs/btrfs/file.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index f6c6754cf52d..a3d006d14683 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1590,7 +1590,7 @@ static noinline ssize_t __btrfs_buffered_write(struct 
> file *file,
>   int ret = 0;
>   bool only_release_metadata = false;
>   bool force_page_uptodate = false;
> - bool need_unlock;
> + int extents_locked;

The variable can be moved to the while() { } block

>  
>   nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
>   PAGE_SIZE / (sizeof(struct page *)));
> @@ -1670,7 +1670,6 @@ static noinline ssize_t __btrfs_buffered_write(struct 
> file *file,
>   }
>  
>   release_bytes = reserve_bytes;
> - need_unlock = false;
>  again:
>   /*
>* This is going to setup the pages array with the number of
> @@ -1683,16 +1682,15 @@ static noinline ssize_t __btrfs_buffered_write(struct 
> file *file,
>   if (ret)
>   break;
>  
> - ret = lock_and_cleanup_extent_if_need(BTRFS_I(inode), pages,
> + extents_locked = lock_and_cleanup_extent_if_need(
> + BTRFS_I(inode), pages,
>   num_pages, pos, write_bytes, ,
>   , _state);
> - if (ret < 0) {
> + if (extents_locked < 0) {
>   if (ret == -EAGAIN)

either

if (extents_locked == -EAGAIN)

or

move 'ret = extents_locked' above the if.

>   goto again;
> + ret = extents_locked;
>   break;
> - } else if (ret > 0) {
> - need_unlock = true;
> - ret = 0;
>   }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/21] btrfs: make the delalloc block rsv per inode

2017-10-13 Thread Josef Bacik
On Fri, Oct 13, 2017 at 02:47:32PM +0300, Nikolay Borisov wrote:
> 
> 
> > @@ -8848,7 +8849,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, 
> > struct iov_iter *iter)
> > if (iov_iter_rw(iter) == WRITE) {
> > up_read(_I(inode)->dio_sem);
> > current->journal_info = NULL;
> > -   btrfs_delalloc_release_extents(BTRFS_I(inode), count);
> > if (ret < 0 && ret != -EIOCBQUEUED) {
> > if (dio_data.reserve)
> > btrfs_delalloc_release_space(inode, 
> > data_reserved,
> > @@ -8869,6 +8869,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, 
> > struct iov_iter *iter)
> > } else if (ret >= 0 && (size_t)ret < count)
> > btrfs_delalloc_release_space(inode, data_reserved,
> > offset, count - (size_t)ret);
> 
> In case we didn't manage to write everything we are releasing the extra
> stuff that wasn't written.
> 
> > +   btrfs_delalloc_release_extents(BTRFS_I(inode), count);
> In case btrfs_delalloc_release_space triggered wouldn't freeing metadata
> here cause some sort of an underflow? Shouldn't we adjust count in case
> we have already freed anything beforehand?
> 

No, btrfs_delalloc_release_extents() is only for modifying the
outstanding_extents and adjusting the reservation,
btrfs_delalloc_release_space() takes care of the outstanding csum_bytes and
adjusts the reservation.  Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Josef Bacik
On Fri, Oct 13, 2017 at 11:39:15AM +0300, Nikolay Borisov wrote:
> 
> 
> On 29.09.2017 22:43, Josef Bacik wrote:
> >  
> > +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
> > +int mod)
> > +{
> > +   ASSERT(spin_is_locked(>lock));
> > +   inode->outstanding_extents += mod;
> > +   if (btrfs_is_free_space_inode(inode))
> > +   return;
> > +}
> > +
> > +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> > + int mod)
> > +{
> > +   ASSERT(spin_is_locked(>lock));
> 
> lockdep_assert_held(>lock); for both functions. I've spoken with
> Peterz and he said any other way of checking whether a lock is held, I
> quote, "must die"
> 

Blah I continually forget that's what we're supposed to use now.

> > +   inode->reserved_extents += mod;
> > +   if (btrfs_is_free_space_inode(inode))
> > +   return;
> > +}
> > +
> >  static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 
> > generation)
> >  {
> > int ret = 0;
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index a7f68c304b4c..1262612fbf78 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct 
> > btrfs_root *root,
> >  u64 *qgroup_reserved, bool use_global_rsv);
> >  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
> >   struct btrfs_block_rsv *rsv);
> > +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 
> > num_bytes);
> > +
> >  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 
> > num_bytes);
> >  void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 
> > num_bytes);
> >  int btrfs_delalloc_reserve_space(struct inode *inode,
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 1a6aced00a19..aa0f5c8953b0 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct 
> > btrfs_fs_info *fs_info,
> >  }
> >  
> >  /**
> > - * drop_outstanding_extent - drop an outstanding extent
> > + * drop_over_reserved_extents - drop our extra extent reservations
> >   * @inode: the inode we're dropping the extent for
> > - * @num_bytes: the number of bytes we're releasing.
> >   *
> > - * This is called when we are freeing up an outstanding extent, either 
> > called
> > - * after an error or after an extent is written.  This will return the 
> > number of
> > - * reserved extents that need to be freed.  This must be called with
> > - * BTRFS_I(inode)->lock held.
> > + * We reserve extents we may use, but they may have been merged with other
> > + * extents and we may not need the extra reservation.
> > + *
> > + * We also call this when we've completed io to an extent or had an error 
> > and
> > + * cleared the outstanding extent, in either case we no longer need our
> > + * reservation and can drop the excess.
> >   */
> > -static unsigned drop_outstanding_extent(struct btrfs_inode *inode,
> > -   u64 num_bytes)
> > +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode)
> >  {
> > -   unsigned drop_inode_space = 0;
> > -   unsigned dropped_extents = 0;
> > -   unsigned num_extents;
> > +   unsigned num_extents = 0;
> >  
> > -   num_extents = count_max_extents(num_bytes);
> > -   ASSERT(num_extents);
> > -   ASSERT(inode->outstanding_extents >= num_extents);
> > -   inode->outstanding_extents -= num_extents;
> > +   if (inode->reserved_extents > inode->outstanding_extents) {
> > +   num_extents = inode->reserved_extents -
> > +   inode->outstanding_extents;
> > +   btrfs_mod_reserved_extents(inode, -num_extents);
> > +   }
> >  
> > if (inode->outstanding_extents == 0 &&
> > test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
> >>runtime_flags))
> > -   drop_inode_space = 1;
> > -
> > -   /*
> > -* If we have more or the same amount of outstanding extents than we 
> > have
> > -* reserved then we need to leave the reserved extents count alone.
> > -*/
> > -   if (inode->outstanding_extents >= inode->reserved_extents)
> > -   return drop_inode_space;
> > -
> > -   dropped_extents = inode->reserved_extents - inode->outstanding_extents;
> > -   inode->reserved_extents -= dropped_extents;
> > -   return dropped_extents + drop_inode_space;
> > +   num_extents++;
> > +   return num_extents;
> 
> Something bugs me around the handling of this. In
> btrfs_delalloc_reserve_metadata we do add the additional bytes necessary
> for updating the inode. However outstanding_extents is modified with the
> number of extent items necessary to cover the requires byte range but we
> don't account the extra inode item as being an extent. Then why do we
> have to actually increment 

Re: [PATCH][v2] btrfs: fix send ioctl on 32bit with 64bit kernel

2017-10-13 Thread David Sterba
On Wed, Sep 27, 2017 at 10:43:13AM -0400, Josef Bacik wrote:
> We pass in a pointer in our send arg struct, this means the struct size 
> doesn't
> match with 32bit user space and 64bit kernel space.  Fix this by adding a 
> compat
> mode and doing the appropriate conversion.
> 
> Signed-off-by: Josef Bacik 

Reviewed-by: David Sterba 

I've moved the structure declaration next to the one for receive on 32/64,
the comment applies to both.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] btrfs: add_missing_dev() should return the actual error

2017-10-13 Thread David Sterba
On Mon, Oct 09, 2017 at 11:07:43AM +0800, Anand Jain wrote:
> add_missing_dev() can return device pointer so that IS_ERR/
> PTR_ERR can be used to check for the actual error occurred
> in the function.
> 
> Signed-off-by: Anand Jain 
> ---

1-4 added, with some tweaks. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/4] btrfs: fix EIO misuse to report missing degraded option

2017-10-13 Thread David Sterba
On Mon, Oct 09, 2017 at 11:07:44AM +0800, Anand Jain wrote:
> EIO is only for the IO failure to the device, avoid it.
> 
> Signed-off-by: Anand Jain 

Updated changelog.

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/4] btrfs: add_missing_dev() should return the actual error

2017-10-13 Thread David Sterba
On Wed, Oct 11, 2017 at 12:46:18PM +0800, Anand Jain wrote:
> add_missing_dev() can return device pointer so that IS_ERR/
> PTR_ERR can be used to check for the actual error occurred
> in the function.
> 
> Signed-off-by: Anand Jain 
> Reviewed-by: Liu Bo 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: Clean up unused variables in free-space-tree.c

2017-10-13 Thread David Sterba
On Thu, Oct 12, 2017 at 02:43:27PM -0700, Omar Sandoval wrote:
> On Thu, Oct 12, 2017 at 10:40:42PM +0100, Christos Gkekas wrote:
> > Remove variables 'start' and 'end', which are set but never used.
> 
> Oops.
> 
> Reviewed-by: Omar Sandoval 

Thanks, added to the queue.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-10-13 Thread David Sterba
On Fri, Oct 13, 2017 at 11:27:35AM +0200, Arnd Bergmann wrote:
> We now get a harmless compile-time on 32-bit architectures:
> 
> fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
> fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 
> 'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=]
> 
> This changes the format string to use %zu instead of %lu for size_t.
> 
> Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for 
> check_extent_data_item")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Juan Orti Alcaine
2017-10-13 13:02 GMT+02:00 Duncan <1i5t5.dun...@cox.net>:
> Those warnings aren't anything to be /too/ worried about.  They are
> triggered when a btrfs device size isn't a multiple of the btrfs
> sectorsize (currently 4 KiB on amd64 aka x86_64).  You can manually
> shrink your btrfs devices the fraction to an exact 4 KiB multiple, or
> wait a bit, and a new release of btrfs-tools should have a command to do
> it for you.


Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/21] btrfs: make the delalloc block rsv per inode

2017-10-13 Thread Nikolay Borisov


On 29.09.2017 22:43, Josef Bacik wrote:
> The way we handle delalloc metadata reservations has gotten
> progressively more complicated over the years.  There is so much cruft
> and weirdness around keeping the reserved count and outstanding counters
> consistent and handling the error cases that it's impossible to
> understand.
> 
> Fix this by making the delalloc block rsv per-inode.  This way we can
> calculate the actual size of the outstanding metadata reservations every
> time we make a change, and then reserve the delta based on that amount.
> This greatly simplifies the code everywhere, and makes the error
> handling in btrfs_delalloc_reserve_metadata far less terrifying.
> 
> Signed-off-by: Josef Bacik 
> ---
>  fs/btrfs/btrfs_inode.h   |  27 ++--
>  fs/btrfs/ctree.h |   5 +-
>  fs/btrfs/delayed-inode.c |  46 +--
>  fs/btrfs/disk-io.c   |  18 ++-
>  fs/btrfs/extent-tree.c   | 320 
> ---
>  fs/btrfs/inode.c |  18 +--
>  6 files changed, 141 insertions(+), 293 deletions(-)
> 
> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
> index 22daa79e77b8..f9c6887a8b6c 100644
> --- a/fs/btrfs/btrfs_inode.h
> +++ b/fs/btrfs/btrfs_inode.h
> @@ -36,14 +36,13 @@
>  #define BTRFS_INODE_ORPHAN_META_RESERVED 1
>  #define BTRFS_INODE_DUMMY2
>  #define BTRFS_INODE_IN_DEFRAG3
> -#define BTRFS_INODE_DELALLOC_META_RESERVED   4
> -#define BTRFS_INODE_HAS_ORPHAN_ITEM  5
> -#define BTRFS_INODE_HAS_ASYNC_EXTENT 6
> -#define BTRFS_INODE_NEEDS_FULL_SYNC  7
> -#define BTRFS_INODE_COPY_EVERYTHING  8
> -#define BTRFS_INODE_IN_DELALLOC_LIST 9
> -#define BTRFS_INODE_READDIO_NEED_LOCK10
> -#define BTRFS_INODE_HAS_PROPS11
> +#define BTRFS_INODE_HAS_ORPHAN_ITEM  4
> +#define BTRFS_INODE_HAS_ASYNC_EXTENT 5
> +#define BTRFS_INODE_NEEDS_FULL_SYNC  6
> +#define BTRFS_INODE_COPY_EVERYTHING  7
> +#define BTRFS_INODE_IN_DELALLOC_LIST 8
> +#define BTRFS_INODE_READDIO_NEED_LOCK9
> +#define BTRFS_INODE_HAS_PROPS10
>  
>  /* in memory btrfs inode */
>  struct btrfs_inode {
> @@ -176,7 +175,8 @@ struct btrfs_inode {
>* of extent items we've reserved metadata for.
>*/
>   unsigned outstanding_extents;
> - unsigned reserved_extents;
> +
> + struct btrfs_block_rsv block_rsv;
>  
>   /*
>* Cached values of inode properties
> @@ -278,15 +278,6 @@ static inline void btrfs_mod_outstanding_extents(struct 
> btrfs_inode *inode,
> mod);
>  }
>  
> -static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> -   int mod)
> -{
> - ASSERT(spin_is_locked(>lock));
> - inode->reserved_extents += mod;
> - if (btrfs_is_free_space_inode(inode))
> - return;
> -}
> -
>  static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 
> generation)
>  {
>   int ret = 0;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 1262612fbf78..93e767e16a43 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -763,8 +763,6 @@ struct btrfs_fs_info {
>* delayed dir index item
>*/
>   struct btrfs_block_rsv global_block_rsv;
> - /* block reservation for delay allocation */
> - struct btrfs_block_rsv delalloc_block_rsv;
>   /* block reservation for metadata operations */
>   struct btrfs_block_rsv trans_block_rsv;
>   /* block reservation for chunk tree */
> @@ -2751,6 +2749,9 @@ int btrfs_delalloc_reserve_space(struct inode *inode,
>  void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
>  struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
> unsigned short type);
> +void btrfs_init_metadata_block_rsv(struct btrfs_fs_info *fs_info,
> +struct btrfs_block_rsv *rsv,
> +unsigned short type);
>  void btrfs_free_block_rsv(struct btrfs_fs_info *fs_info,
> struct btrfs_block_rsv *rsv);
>  void __btrfs_free_block_rsv(struct btrfs_block_rsv *rsv);
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 19e4ad2f3f2e..5d73f79ded8b 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -581,7 +581,6 @@ static int btrfs_delayed_inode_reserve_metadata(
>   struct btrfs_block_rsv *dst_rsv;
>   u64 num_bytes;
>   int ret;
> - bool release = false;
>  
>   src_rsv = trans->block_rsv;
>   dst_rsv = _info->delayed_block_rsv;
> @@ -589,36 +588,13 @@ static int btrfs_delayed_inode_reserve_metadata(
>   num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
>  
>   /*
> -  * If our block_rsv is the delalloc block reserve 

Re: USB upgrade fun

2017-10-13 Thread Austin S. Hemmelgarn

On 2017-10-12 21:42, Kai Hendry wrote:

Thank you Austin & Chris for your replies!

On Fri, 13 Oct 2017, at 01:19 AM, Austin S. Hemmelgarn wrote:

Same here on a pair of 3 year old NUC's.  Based on the traces and the
other information, I'd be willing to bet this is probably the root cause
of the issues.


It probably is... since when I remove my new 4TB USB disk from the
front, I am at least able to mount my two 2x2TB in degraded mode and see
my data!
Given this, I think I know exactly what's wrong (although confirmation 
from a developer that what I think is going on can actually happen would 
be nice).  Based on what you're saying, the metadata on the two 2TB 
drives says they're the only ones in the array, while the metadata on 
the 4TB drive says all three are in the array, but is missing almost all 
other data and is out of sync with the 2TB drives.


So I am not quite sure what to do now.
I don't trust USB hubs.
Yeah, I don't trust USB in general for permanently attached storage. 
Not just because of power problems like this, but because all kinds of 
things can cause random disconnects, which in turn cause issues with any 
filesystem (BTRFS just makes it easier to notice them).


On a different NUC I've noticed I can't charge my iPhone anymore!
https://mail-archive.com/linux-usb@vger.kernel.org/msg95231.html So...
is there any end in sight for the "USB power" problem? Does USB-C /
Thunderbolt address this issue? :(
In theory, yes, but I'm not sure if the NUC's that include it properly 
support the USB Power Delivery specification (if not, then they can't 
safely source more than 500mA, which is too little for a traditional 
hard drive).


Try return my new 4TB to Amazon and find an externally powered one


merely the Btrfs signature is wiped from the deleted device(s). So you
could restore that signature and the device would be valid again;


Wonder how would you do that, in order to have a working snapshot that I
can put in cold storage?
In practice, it's too much work to be practical.  It requires rebuilding 
the metadata tree from scratch, which is pretty much impossible with the 
current tools (and even then you likely wouldn't have all the data, 
because some may have been overwritten during the device removal).


Nonetheless I hope the btrfs developers can make it possible to remove a
RAID1 drive, to put in cold storage use case, without any pfaffing.
From a practical perspective, you're almost certainly better off 
creating a copy for cold storage without involving BTRFS.  As an 
alternative, I would suggest one of the following approaches:


1. Take a snapshot of the filesystem, and use send/receive to transfer 
that to another device which you then remove and store somewhere.
2. Use rsync to copy things to another device which you then remove and 
store somewhere.


Both these options are safer, less likely to screw up your existing 
filesystem, and produce copies that can safely be connected to the 
original system at the same time as the original filesystem.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Duncan
Juan Orti Alcaine posted on Fri, 13 Oct 2017 10:40:20 +0200 as excerpted:

> Hi,
> 
> I've upgraded my system to Fedora 27 and now I see many btrfs warnings,
> although the system seems to be working fine. Is this something I should
> worry about?

> # uname -a Linux xenon 4.13.5-300.fc27.x86_64 #1 SMP Thu Oct 5 16:57:11
> UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> 
> # btrfs --version btrfs-progs v4.13.2

> [  337.813416] [ cut here ]
> [  337.813453] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
> btrfs_update_device+0x1be/0x1d0 [btrfs]

> [  337.814256] [ cut here ]
> [  337.814281] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
> btrfs_update_device+0x1be/0x1d0 [btrfs]

> [  337.822161] [ cut here ]
> [  337.822185] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
> btrfs_update_device+0x1be/0x1d0 [btrfs]

> [  337.822868] [ cut here ]
> [  337.822890] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
> btrfs_update_device+0x1be/0x1d0 [btrfs]


Those warnings aren't anything to be /too/ worried about.  They are 
triggered when a btrfs device size isn't a multiple of the btrfs 
sectorsize (currently 4 KiB on amd64 aka x86_64).  You can manually 
shrink your btrfs devices the fraction to an exact 4 KiB multiple, or 
wait a bit, and a new release of btrfs-tools should have a command to do 
it for you. 

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: tree-checker: use %zu format string for size_t

2017-10-13 Thread Arnd Bergmann
We now get a harmless compile-time on 32-bit architectures:

fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
fs/btrfs/tree-checker.c:189:70: error: format '%lu' expects argument of type 
'long unsigned int', but argument 6 has type 'unsigned int' [-Werror=format=]

This changes the format string to use %zu instead of %lu for size_t.

Fixes: c1f6520bf360 ("btrfs: tree-checker: Enhance output for 
check_extent_data_item")
Signed-off-by: Arnd Bergmann 
---
 fs/btrfs/tree-checker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index c7a09cc2a17c..388fb6553aa5 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -186,7 +186,7 @@ static int check_extent_data_item(struct btrfs_root *root,
/* Regular or preallocated extent has fixed item size */
if (item_size != sizeof(*fi)) {
file_extent_err(root, leaf, slot,
-   "invalid item size for reg/prealloc file extent, have 
%u expect %lu",
+   "invalid item size for reg/prealloc file extent, have 
%u expect %zu",
item_size, sizeof(*fi));
return -EUCLEAN;
}
-- 
2.9.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: subvol list: don't print deleted subvol as TOPLEVEL

2017-10-13 Thread Lu Fengqi
We should use entry->root_id instead of top_id to determine whether it is
the toplevel subvolume.

Signed-off-by: Lu Fengqi 
---
 btrfs-list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/btrfs-list.c b/btrfs-list.c
index 6f2d8774..4ea8edfd 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -1279,7 +1279,7 @@ static void filter_and_sort_subvol(struct root_lookup 
*all_subvols,
 
ret = resolve_root(all_subvols, entry, top_id);
if (ret == -ENOENT) {
-   if (top_id != BTRFS_FS_TREE_OBJECTID) {
+   if (entry->root_id != BTRFS_FS_TREE_OBJECTID) {
entry->full_path = strdup("DELETED");
entry->deleted = 1;
} else {
-- 
2.14.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs errors over NFS

2017-10-13 Thread Nikolay Borisov


On 12.10.2017 20:44, Steve Leung wrote:
> Hi list,
> 
>   TL;DR: ran into some btrfs errors and weird behaviour, but things generally 
> seem 
>   to work.  Just posting some details in case it helps devs or other users.
> 
> I've run into a btrfs error trying to do a -j8 build of android on a btrfs 
> filesystem exported over NFSv3.  That in itself might be unwise, but should 
> be 
> legal.  :)  This is on 64-bit Arch, kernel 4.13.3, and the filesystem is 
> RAID1 on 4 
> devices, with nearly 4TiB unallocated.
> 
> The build itself failed when part of the build process noticed that a file 
> got 
> created with mode 000.  i.e. in "ls -l" the mode column looked like 
> ?-.
> 
> That much happened without any errors in dmesg, though the NFS client 
> complained 
> about some inodes having a mode of 0.
> 
> Then when I tried to delete those files, I got the error below:
> 
> [21476.546060] BTRFS error (device sdc1): err add delayed dir index 
> item(index: 17) into the deletion tree of the delayed node(root id: 21780, 
> inode id: 15668343, errno: -17)
> [21476.546194] [ cut here ]
> [21476.546196] kernel BUG at fs/btrfs/delayed-inode.c:1552!
> [21476.546231] invalid opcode:  [#1] PREEMPT SMP
> [21476.546258] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
> libcrc32c iptable_filter tun coretemp kvm_intel kvm iTCO_wdt ppdev gpio_ich 
> iTCO_vendor_support evdev i915 input_leds psmouse pcspkr irqbypass i2c_i801 
> snd_hda_codec_idt snd_hda_codec_generic mac_hid lpc_ich tpm_tis tpm_tis_core 
> parport_pc tpm parport video drm_kms_helper drm syscopyarea winbond_cir 
> sysfillrect sysimgblt rc_core led_class snd_hda_intel snd_hda_codec 
> snd_hda_core snd_hwdep snd_pcm snd_timer shpchp snd button e1000e soundcore 
> intel_agp intel_gtt acpi_cpufreq mei_me mei fb_sys_fops i2c_algo_bit ptp 
> pps_core sch_fq_codel nfsd auth_rpcgss oid_registry nfs_acl lockd grace 
> sunrpc ip_tables x_tables crc32c_generic btrfs xor raid6_pq sr_mod cdrom
> [21476.546623]  sd_mod hid_generic usbhid hid serio_raw atkbd libps2 uhci_hcd 
> pata_it821x ahci libahci libata firewire_ohci scsi_mod firewire_core 
> crc_itu_t i8042 serio ehci_pci ehci_hcd usbcore usb_common
> [21476.546725] CPU: 0 PID: 984 Comm: rm Not tainted 4.13.3-1-ARCH #1
> [21476.546756] Hardware name:  /DG33TL, BIOS 
> DPP3510J.86A.0572.2009.0715.2346 07/15/2009
> [21476.546801] task: 943668131b80 task.stack: aa3840da4000
> [21476.546854] RIP: 0010:btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs]
> [21476.546889] RSP: 0018:aa3840da7d18 EFLAGS: 00010286
> [21476.546918] RAX:  RBX: 94367c074b90 RCX: 
> 
> [21476.546953] RDX:  RSI: 9436abc0dc78 RDI: 
> 9436abc0dc78
> [21476.546989] RBP: aa3840da7d88 R08: 0323 R09: 
> 
> [21476.547025] R10: aa3840da7bd8 R11:  R12: 
> 943583503100
> [21476.547060] R13: 94367c074bd8 R14: 94367c074bd0 R15: 
> 943583503218
> [21476.547096] FS:  7f57da0f6500() GS:9436abc0() 
> knlGS:
> [21476.547138] CS:  0010 DS:  ES:  CR0: 80050033
> [21476.547167] CR2: 7f57d9c809c0 CR3: 5e39b000 CR4: 
> 06f0
> [21476.547203] Call Trace:
> [21476.547235]  __btrfs_unlink_inode+0x1bc/0x550 [btrfs]
> [21476.547278]  btrfs_unlink_inode+0x1c/0x50 [btrfs]
> [21476.547317]  btrfs_unlink+0x88/0xe0 [btrfs]
> [21476.547343]  vfs_unlink+0x111/0x1c0
> [21476.547365]  ? __lookup_hash+0x22/0xa0
> [21476.547389]  do_unlinkat+0x2b1/0x310
> [21476.547412]  SyS_unlinkat+0x1b/0x30
> [21476.547434]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [21476.547460] RIP: 0033:0x7f57d9c303d7
> [21476.547481] RSP: 002b:7fff4bc1a6b8 EFLAGS: 0246 ORIG_RAX: 
> 0107
> [21476.547520] RAX: ffda RBX: 5644fee69540 RCX: 
> 7f57d9c303d7
> [21476.547556] RDX:  RSI: 5644fee68310 RDI: 
> ff9c
> [21476.547591] RBP: 5644fee69648 R08: 0003 R09: 
> 
> [21476.547627] R10: 0100 R11: 0246 R12: 
> 5644fee68280
> [21476.547662] R13: 7fff4bc1a7f0 R14: 5644fd37094f R15: 
> 
> [21476.547700] Code: ff ff ff 48 8b 43 10 4c 8b 03 41 b9 ef ff ff ff 48 8b 55 
> b0 48 8b 7d a8 48 c7 c6 b0 99 3f c0 48 8b 88 38 03 00 00 e8 f3 0d f7 ff <0f> 
> 0b 48 8b 4d a8 49 8b 7e 38 8b 81 70 10 00 00 48 8d b1 c8 01 
> [21476.547840] RIP: btrfs_delete_delayed_dir_index+0x24d/0x330 [btrfs] RSP: 
> aa3840da7d18
> [21476.549333] ---[ end trace 41039f5467d2 ]---
> 

Well, the main thing here is that you hit the BUG() in 
btrfs_delete_delayed_dir_index. 
So this happened because you tried to delete a file, which was already staged 
for deletion. 

So which directory corresponds to inode 15668343, presumably the android build 

Re: Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Juan Orti Alcaine
2017-10-13 10:40 GMT+02:00 Juan Orti Alcaine :
> Hi,
>
> I've upgraded my system to Fedora 27 and now I see many btrfs
> warnings, although the system seems to be working fine. Is this
> something I should worry about?


I'm getting more warnings:

[  337.822868] [ cut here ]
[  337.822890] WARNING: CPU: 5 PID: 459 at fs/btrfs/ctree.h:1559
btrfs_update_device+0x1be/0x1d0 [btrfs]
[  337.822891] Modules linked in: rfcomm fuse vhost_net vhost tap
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat
ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c iptable_mangle iptable_raw iptable_security bridge stp llc
ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep vfat fat
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic
snd_usb_audio snd_hda_codec_hdmi snd_hda_intel snd_hda_codec
snd_hda_core snd_usbmidi_lib eeepc_wmi
[  337.822928]  btusb btrtl uvcvideo snd_hwdep crct10dif_pclmul
crc32_pclmul videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_seq
videobuf2_core ghash_clmulni_intel videodev intel_cstate asus_wmi
snd_rawmidi intel_uncore btbcm snd_seq_device btintel bluetooth
intel_rapl_perf sparse_keymap media ecdh_generic joydev wmi_bmof
snd_pcm rfkill i2c_i801 video iTCO_wdt iTCO_vendor_support mei_me mei
snd_timer wmi snd soundcore shpchp lpc_ich nfsd auth_rpcgss nfs_acl
lockd grace sunrpc binfmt_misc btrfs xor raid6_pq amdkfd amd_iommu_v2
radeon bcache i2c_algo_bit drm_kms_helper crc32c_intel ttm r8169
serio_raw drm mii
[  337.822963] CPU: 5 PID: 459 Comm: btrfs-cleaner Tainted: GW
  4.13.5-300.fc27.x86_64 #1
[  337.822964] Hardware name: System manufacturer System Product
Name/P8Z68-V LE, BIOS 4102 09/09/2013
[  337.822965] task: 9e98479ecc80 task.stack: b3a4c229
[  337.822983] RIP: 0010:btrfs_update_device+0x1be/0x1d0 [btrfs]
[  337.822984] RSP: 0018:b3a4c2293d28 EFLAGS: 00010206
[  337.822986] RAX: 0fff RBX: 9e9844f3ad20 RCX: 01d1c100fe00
[  337.822987] RDX: 0004 RSI: 3efa RDI: 9e9821e5e000
[  337.822988] RBP: b3a4c2293d70 R08: 3efe R09: b3a4c2293ce0
[  337.822989] R10: 1000 R11: 0003 R12: 9e9844999c00
[  337.822990] R13:  R14: 3eda R15: 9e9821e5e000
[  337.822994] FS:  () GS:9e985ed4()
knlGS:
[  337.822995] CS:  0010 DS:  ES:  CR0: 80050033
[  337.822996] CR2: 1f89b1c4d000 CR3: 00037ce09000 CR4: 000426e0
[  337.822997] Call Trace:
[  337.823017]  btrfs_remove_chunk+0x365/0x870 [btrfs]
[  337.823036]  btrfs_delete_unused_bgs+0x30d/0x3d0 [btrfs]
[  337.823054]  cleaner_kthread+0x165/0x180 [btrfs]
[  337.823057]  kthread+0x125/0x140
[  337.823074]  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
[  337.823078]  ? kthread_park+0x60/0x60
[  337.823081]  ret_from_fork+0x25/0x30
[  337.823083] Code: 4c 89 ff 45 31 c0 ba 10 00 00 00 4c 89 f6 e8 fa
20 ff ff 4c 89 ff e8 f2 ee fc ff e9 d3 fe ff ff 41 bd f4 ff ff ff e9
d0 fe ff ff <0f> ff eb b6 e8 29 bd ab ca 66 0f 1f 84 00 00 00 00 00 66
66 66
[  337.823118] ---[ end trace 3609a7da677d8847 ]---
[  805.824111] FS-Cache: Loaded
[  805.835188] FS-Cache: Netfs 'nfs' registered for caching
[  805.861120] Key type dns_resolver registered
[  805.891629] NFS: Registering the id_resolver key type
[  805.891635] Key type id_resolver registered
[  805.891635] Key type id_legacy registered
[  807.427618] nf_conntrack: default automatic helper assignment has
been turned off for security reasons and CT-based  firewall rule not
found. Use the iptables CT target to attach helpers instead.
[ 2512.070835] [ cut here ]
[ 2512.070873] WARNING: CPU: 4 PID: 14051 at fs/btrfs/ctree.h:1559
btrfs_update_device+0x1be/0x1d0 [btrfs]
[ 2512.070874] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver
nfs fscache rfcomm fuse vhost_net vhost tap xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c
iptable_mangle iptable_raw iptable_security bridge stp llc
ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep vfat fat
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel
kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic
snd_usb_audio snd_hda_codec_hdmi 

Btrfs warnings in kernel 4.13.5

2017-10-13 Thread Juan Orti Alcaine
Hi,

I've upgraded my system to Fedora 27 and now I see many btrfs
warnings, although the system seems to be working fine. Is this
something I should worry about?

I have a scrub running with no errors so far:

# btrfs scrub status /mnt/btrfs
scrub status for 038b2b48-fd2d-4565-b2b1-d07847ecca8c
scrub started at Thu Oct 12 19:35:55 2017, running for 03:28:41
total bytes scrubbed: 3.35TiB with 0 errors

# uname -a
Linux xenon 4.13.5-300.fc27.x86_64 #1 SMP Thu Oct 5 16:57:11 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.13.2

# btrfs fi show
Label: 'btrfs_raid1'  uuid: 038b2b48-fd2d-4565-b2b1-d07847ecca8c
Total devices 4 FS bytes used 1.81TiB
devid4 size 1.82TiB used 1.19TiB path /dev/bcache32
devid5 size 1.82TiB used 1.19TiB path /dev/bcache16
devid6 size 1.82TiB used 1.19TiB path /dev/bcache0
devid7 size 1.79TiB used 158.00GiB path /dev/bcache48

# btrfs fi usage /mnt/btrfs
Overall:
Device size:   7.24TiB
Device allocated:  3.72TiB
Device unallocated:3.53TiB
Device missing:  0.00B
Used:  3.62TiB
Free (estimated):  1.81TiB  (min: 1.81TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:1.85TiB, Used:1.80TiB
   /dev/bcache01.18TiB
   /dev/bcache16   1.18TiB
   /dev/bcache32   1.18TiB
   /dev/bcache48 157.00GiB

Metadata,RAID1: Size:7.00GiB, Used:5.78GiB
   /dev/bcache05.00GiB
   /dev/bcache16   4.00GiB
   /dev/bcache32   4.00GiB
   /dev/bcache48   1.00GiB

System,RAID1: Size:32.00MiB, Used:320.00KiB
   /dev/bcache16  32.00MiB
   /dev/bcache32  32.00MiB

Unallocated:
   /dev/bcache0  647.02GiB
   /dev/bcache16 647.98GiB
   /dev/bcache32 645.98GiB
   /dev/bcache48   1.63TiB

All my bcache devices are in writethrough mode:

# bcache-status -s
--- bcache ---
UUIDd82cf90a-756c-4b8a-b871-72714cdae024
Block Size  4.00 KiB
Bucket Size 2.0 MiB
Congested?  False
Read Congestion 2.0ms
Write Congestion20.0ms
Total Cache Size95 GiB
Total Cache Used95 GiB(100%)
Total Cache Unused  0 B(0%)
Evictable Cache 95 GiB(100%)
Replacement Policy  [lru] fifo random
Cache Mode  [writethrough] writeback writearound none
Total Hits  150430(62%)
Total Misses90622
Total Bypass Hits   1036307(100%)
Total Bypass Misses 0
Total Bypassed  2 GiB
--- Backing Device ---
  Device File   /dev/sde4 (8:68)
  bcache Device File/dev/bcache48 (252:48)
  Size  2 TiB
  Cache Mode[writethrough] writeback writearound none
  Readahead 0.0k
  Sequential Cutoff 4.0 MiB
  Merge sequential? False
  State clean
  Writeback?True
  Dirty Data0 B
  Total Hits14066(57%)
  Total Misses  10363
  Total Bypass Hits 40454(100%)
  Total Bypass Misses   0
  Total Bypassed134.3 MiB
--- Cache Device ---
  Device File   /dev/sda4 (8:4)
  Size  95 GiB
  Block Size4.00 KiB
  Bucket Size   2.0 MiB
  Replacement Policy[lru] fifo random
  Discard?  False
  I/O Errors0
  Metadata Written  99.1 MiB
  Data Written  6 GiB
  Buckets   48432
  Cache Used95 GiB(100%)
  Cache Unused  0 B(0%)
--- Backing Device ---
  Device File   /dev/sdb1 (8:17)
  bcache Device File/dev/bcache32 (252:32)
  Size  2 TiB
  Cache Mode[writethrough] writeback writearound none
  Readahead 0.0k
  Sequential Cutoff 4.0 MiB
  Merge sequential? False
  State clean
  Writeback?True
  Dirty Data0 B
  Total Hits28167(40%)
  Total Misses  42227
  Total Bypass Hits 632707(100%)
  Total Bypass Misses   0
  Total Bypassed2 GiB
--- Backing Device ---
  Device File   /dev/sdc1 (8:33)
  bcache Device File/dev/bcache16 (252:16)
  Size  2 TiB
  Cache Mode[writethrough] writeback writearound none
  Readahead 0.0k
  Sequential Cutoff 4.0 MiB
  Merge sequential? False
  State clean
  Writeback?True
  Dirty Data0 B
  Total Hits67948(74%)
  Total 

Re: [PATCH 01/21] Btrfs: rework outstanding_extents

2017-10-13 Thread Nikolay Borisov


On 29.09.2017 22:43, Josef Bacik wrote:
>  
> +static inline void btrfs_mod_outstanding_extents(struct btrfs_inode *inode,
> +  int mod)
> +{
> + ASSERT(spin_is_locked(>lock));
> + inode->outstanding_extents += mod;
> + if (btrfs_is_free_space_inode(inode))
> + return;
> +}
> +
> +static inline void btrfs_mod_reserved_extents(struct btrfs_inode *inode,
> +   int mod)
> +{
> + ASSERT(spin_is_locked(>lock));

lockdep_assert_held(>lock); for both functions. I've spoken with
Peterz and he said any other way of checking whether a lock is held, I
quote, "must die"

> + inode->reserved_extents += mod;
> + if (btrfs_is_free_space_inode(inode))
> + return;
> +}
> +
>  static inline int btrfs_inode_in_log(struct btrfs_inode *inode, u64 
> generation)
>  {
>   int ret = 0;
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index a7f68c304b4c..1262612fbf78 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -2742,6 +2742,8 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root 
> *root,
>u64 *qgroup_reserved, bool use_global_rsv);
>  void btrfs_subvolume_release_metadata(struct btrfs_fs_info *fs_info,
> struct btrfs_block_rsv *rsv);
> +void btrfs_delalloc_release_extents(struct btrfs_inode *inode, u64 
> num_bytes);
> +
>  int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 
> num_bytes);
>  void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 
> num_bytes);
>  int btrfs_delalloc_reserve_space(struct inode *inode,
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 1a6aced00a19..aa0f5c8953b0 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5971,42 +5971,31 @@ void btrfs_subvolume_release_metadata(struct 
> btrfs_fs_info *fs_info,
>  }
>  
>  /**
> - * drop_outstanding_extent - drop an outstanding extent
> + * drop_over_reserved_extents - drop our extra extent reservations
>   * @inode: the inode we're dropping the extent for
> - * @num_bytes: the number of bytes we're releasing.
>   *
> - * This is called when we are freeing up an outstanding extent, either called
> - * after an error or after an extent is written.  This will return the 
> number of
> - * reserved extents that need to be freed.  This must be called with
> - * BTRFS_I(inode)->lock held.
> + * We reserve extents we may use, but they may have been merged with other
> + * extents and we may not need the extra reservation.
> + *
> + * We also call this when we've completed io to an extent or had an error and
> + * cleared the outstanding extent, in either case we no longer need our
> + * reservation and can drop the excess.
>   */
> -static unsigned drop_outstanding_extent(struct btrfs_inode *inode,
> - u64 num_bytes)
> +static unsigned drop_over_reserved_extents(struct btrfs_inode *inode)
>  {
> - unsigned drop_inode_space = 0;
> - unsigned dropped_extents = 0;
> - unsigned num_extents;
> + unsigned num_extents = 0;
>  
> - num_extents = count_max_extents(num_bytes);
> - ASSERT(num_extents);
> - ASSERT(inode->outstanding_extents >= num_extents);
> - inode->outstanding_extents -= num_extents;
> + if (inode->reserved_extents > inode->outstanding_extents) {
> + num_extents = inode->reserved_extents -
> + inode->outstanding_extents;
> + btrfs_mod_reserved_extents(inode, -num_extents);
> + }
>  
>   if (inode->outstanding_extents == 0 &&
>   test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
>  >runtime_flags))
> - drop_inode_space = 1;
> -
> - /*
> -  * If we have more or the same amount of outstanding extents than we 
> have
> -  * reserved then we need to leave the reserved extents count alone.
> -  */
> - if (inode->outstanding_extents >= inode->reserved_extents)
> - return drop_inode_space;
> -
> - dropped_extents = inode->reserved_extents - inode->outstanding_extents;
> - inode->reserved_extents -= dropped_extents;
> - return dropped_extents + drop_inode_space;
> + num_extents++;
> + return num_extents;

Something bugs me around the handling of this. In
btrfs_delalloc_reserve_metadata we do add the additional bytes necessary
for updating the inode. However outstanding_extents is modified with the
number of extent items necessary to cover the requires byte range but we
don't account the extra inode item as being an extent. Then why do we
have to actually increment the num_extents here? Doesn't this lead to
underflow?

To illustrate: A write comes for 1 mb, we count this as 1 outstanding
extent in delalloc_reserve_metadata:

nr_extents = count_max_extents(num_bytes);
inode->outstanding_extents += nr_extents;

We do account the extra inode item but only 

Re: [PATCH] btrfs: set include path relatively

2017-10-13 Thread Naohiro Aota
2017-10-12 21:38 GMT+09:00 David Sterba :
> On Thu, Oct 12, 2017 at 11:22:24AM +0900, Naohiro Aota wrote:
>> Currently, gcc is passed the include directory with full path. As a result,
>> dependency files (*.o.d) also record the full path at the build time. Such
>> full path dependency is annoying for sharing the source between multiple
>> machines, containers, or anything the path differ.
>>
>> And this is the same way what other program using autotools e.g. e2fsprogs
>> is doing:
>>
>> $ grep top_builddir Makefile
>> top_builddir = .
>> CPPFLAGS = -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib
>> BUILD_CFLAGS = -g -O2  -I. -I$(top_builddir)/lib -I$(top_srcdir)/lib 
>> -DHAVE_CONFIG_H
>> 
>
> Makes sense for the header files. TOPDIR is also used for linking, can
> it cause similar problems, where the library search path is set as 
> -L$(TOPDIR) ?

Since both "TOPDIR := $(shell pwd)" and -L$(TOPDIR) is evaluated at
the build time,
it never cause problem on linking itself. What the problem with
dependency files is persisting
the full path into the dep files. We generate the dep files only for
objects (%.o.d rule)

Well, it's ok to use "-L." here, and it's the same way with autotools does.

>
> I wonder if we should do the same as in the example above and set
> default value of TOPDIR to '.', instead of `pwd`.

TOPDIR is also used by library-test{,.static}. They run gcc in a
temporary directory.
Thus, TOPDIR here should be absolute.

>
>
>> Signed-off-by: Naohiro Aota 
>> ---
>>  Makefile | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index b1f3388..69b94fb 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -70,8 +70,8 @@ CFLAGS = $(SUBST_CFLAGS) \
>>-D_XOPEN_SOURCE=700  \
>>-fno-strict-aliasing \
>>-fPIC \
>> -  -I$(TOPDIR) \
>> -  -I$(TOPDIR)/kernel-lib \
>> +  -I. \
>> +  -I./kernel-lib \
>>$(EXTRAWARN_CFLAGS) \
>>$(DEBUG_CFLAGS_INTERNAL) \
>>$(EXTRA_CFLAGS)
>> --
>> 2.14.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html