Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-05 Thread Qu Wenruo



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now that the new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.
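The conversion pattern itself is mechanical. A minimal sketch, using an
invented struct rather than one of the converted btrfs structures:

#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
	refcount_t refs;			/* was: atomic_t refs; */
};

static struct foo *foo_alloc(void)
{
	struct foo *f = kzalloc(sizeof(*f), GFP_KERNEL);

	if (f)
		refcount_set(&f->refs, 1);	/* was: atomic_set(&f->refs, 1) */
	return f;
}

static void foo_get(struct foo *f)
{
	refcount_inc(&f->refs);			/* was: atomic_inc(); refcount_inc()
						 * saturates and WARNs instead of
						 * silently overflowing */
}

static void foo_put(struct foo *f)
{
	if (refcount_dec_and_test(&f->refs))	/* was: atomic_dec_and_test() */
		kfree(f);
}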

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, which also warns on refcount errors. No output
related to refcount errors was produced. However, the following errors
occurred during the run:
 * tests btrfs/078, btrfs/114, btrfs/115: no errors anywhere in dmesg, but the
 process hangs. They all seem to be around qgroup; sometimes an error such as
 "qgroup scan failed -4" is visible before it blocks, but not always.


How reproducible is the hang?

I also see the -EINTR output, but that seems to be by design for 
btrfs/11[45].


btrfs/078 is unrelated to qgroup, and all three of these tests pass in my 
test environment, which is v4.11-rc1 with your patches applied.


I ran these 3 tests in a row with the default and space_cache=v2 mount 
options, 5 times for each mount option, and saw no hang at all.


It would help a lot if more info could be provided, from the blocked 
process backtrace to the test mount options and the base commit.
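(If sysrq is enabled, something like the following dumps all blocked
D-state tasks to dmesg, which is usually the most useful starting point;
<pid> is a placeholder:)

  # echo w > /proc/sysrq-trigger
  # dmesg | tail -n 100

  # cat /proc/<pid>/stack        (stack of one specific hung process)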


Thanks,
Qu


 * test btrfs/104: dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096
 I tried looking at the code to see what causes the failure, but could not
 figure it out. It doesn't seem to be related to any refcount changes, at
 least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to be related to the refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to
refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to
refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c   |  8 
 fs/btrfs/delayed-ref.h   |  8 +---
 fs/btrfs/disk-io.c   |  6 +++---
 fs/btrfs/disk-io.h   |  4 ++--
 fs/btrfs/extent-tree.c   | 20 +--
 fs/btrfs/extent_io.c | 18 -
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/extent_map.c| 10 +-
 fs/btrfs/extent_map.h|  3 ++-
 fs/btrfs/ordered-data.c  | 20 +--
 fs/btrfs/ordered-data.h  |  2 +-
 fs/btrfs/raid56.c| 19 +-
 fs/btrfs/scrub.c | 42 
 fs/btrfs/transaction.c   | 20 +--
 fs/btrfs/transaction.h   |  3 ++-
 fs/btrfs/tree-log.c  |  2 +-
 fs/btrfs/volumes.c   | 10 +-
 fs/btrfs/volumes.h   |  2 +-
 include/trace/events/btrfs.h |  4 ++--
 24 files changed, 143 insertions(+), 137 deletions(-)






Re: [PATCH v3] btrfs: relocation: Enhance kernel error output for relocation

2017-03-05 Thread Qu Wenruo

Any comment on this patch?

Thanks,
Qu

At 02/15/2017 09:39 AM, Qu Wenruo wrote:

When balance (relocation) fails, btrfs-progs will report something like:

ERROR: error during balancing '/mnt/scratch': Input/output error
There may be more info in syslog - try dmesg | tail

However, in many cases the kernel can't provide much useful info to
locate the problem.

This patch adds error messages to relocation to help users and
developers locate the problem.

Signed-off-by: Qu Wenruo 
---
v2:
  Fix typo where 'err' and 'ret' were used incorrectly.
v3:
  Add space between the error number and the parenthesized error string.
  Fix grammar errors.
---
 fs/btrfs/relocation.c | 61 +++
 1 file changed, 57 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 379711048fb0..6de5800e17bc 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4011,6 +4011,8 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->block_rsv, rc->block_rsv->size,
BTRFS_RESERVE_FLUSH_ALL);
if (ret) {
+   btrfs_err(fs_info, "failed to reserve space: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4019,6 +4021,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
if (IS_ERR(trans)) {
err = PTR_ERR(trans);
trans = NULL;
+   btrfs_err(fs_info,
+   "failed to start transaction: %d (%s)",
+ err, btrfs_decode_error(err));
break;
}
 restart:
@@ -4028,8 +4033,12 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
}

ret = find_next_extent(rc, path, &key);
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to find next extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
+   }
if (ret != 0)
break;

@@ -4081,9 +4090,17 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)

if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
ret = add_tree_block(rc, &key, path, &blocks);
+   if (ret < 0)
+   btrfs_err(fs_info,
+   "failed to record tree block: %d (%s)",
+ ret, btrfs_decode_error(ret));
} else if (rc->stage == UPDATE_DATA_PTRS &&
   (flags & BTRFS_EXTENT_FLAG_DATA)) {
ret = add_data_references(rc, &key, path, &blocks);
+   if (ret < 0)
+   btrfs_err(fs_info,
+   "failed to record data extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
} else {
btrfs_release_path(path);
ret = 0;
@@ -4103,6 +4120,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
rc->backref_cache.last_trans = trans->transid - 1;

if (ret != -EAGAIN) {
+   btrfs_err(fs_info,
+   "failed to relocate tree blocks: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4121,6 +4141,9 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
ret = relocate_data_extent(rc->data_inode,
   &key, &rc->cluster);
if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to relocate data extent: %d (%s)",
+ ret, btrfs_decode_error(ret));
err = ret;
break;
}
@@ -4147,8 +4170,12 @@ static noinline_for_stack int 
relocate_block_group(struct reloc_control *rc)
if (!err) {
ret = relocate_file_extent_cluster(rc->data_inode,
   &rc->cluster);
-   if (ret < 0)
+   if (ret < 0) {
+   btrfs_err(fs_info,
+   "failed to relocate file extent 

[PATCH v6 2/2] btrfs: Handle delalloc error correctly to avoid ordered extent hang

2017-03-05 Thread Qu Wenruo
[BUG]
If run_delalloc_range() returns an error and some ordered extents have
already been created, btrfs will hang with the following backtrace:

Call Trace:
 __schedule+0x2d4/0xae0
 schedule+0x3d/0x90
 btrfs_start_ordered_extent+0x160/0x200 [btrfs]
 ? wake_atomic_t_function+0x60/0x60
 btrfs_run_ordered_extent_work+0x25/0x40 [btrfs]
 btrfs_scrubparity_helper+0x1c1/0x620 [btrfs]
 btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
 process_one_work+0x2af/0x720
 ? process_one_work+0x22b/0x720
 worker_thread+0x4b/0x4f0
 kthread+0x10f/0x150
 ? process_one_work+0x720/0x720
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x2e/0x40

[CAUSE]

|<-- delalloc range --->|
| OE 1 | OE 2 | ... | OE n |
|<>|   |<-- cleanup range ->|
 ||
 \_=> First page handled by end_extent_writepage() in __extent_writepage()

The problem is caused by the error handler of run_delalloc_range(), which
doesn't clean up the ordered extents it has already created, leaving them
waiting for a btrfs_finish_ordered_io() call that never comes.

However, after run_delalloc_range() returns an error, __extent_writepage()
won't submit a bio, so btrfs_writepage_end_io_hook() won't be triggered
except for the first page, and btrfs_finish_ordered_io() won't be triggered
for the created ordered extents either.

So OE 2~n will hang forever, and if OE 1 is larger than one page, it
will also hang.

[FIX]
Introduce a btrfs_cleanup_ordered_extents() function to clean up the
created ordered extents and finish them manually.

The function is based on the existing
btrfs_endio_direct_write_update_ordered() function, modified to act just
like btrfs_writepage_end_io_hook() but to handle a specified range rather
than a single page.

After the fix, a delalloc error will be handled like:

|<-- delalloc range --->|
| OE 1 | OE 2 | ... | OE n |
|<>|<-- cleanup_ordered_extents() -->|<-- old error handler ->|
 ||  ||
 ||  \_=> Cleaned up by cleanup_ordered_extents()
 \_=> First page handled by end_extent_writepage() in __extent_writepage()

Signed-off-by: Qu Wenruo 
Signed-off-by: Filipe Manana 
---
changelog:
v2:
  Add BTRFS_ORDERED_SKIP_METADATA flag to avoid double reducing
  outstanding extents, which is already done by
  extent_clear_unlock_delalloc() with EXTENT_DO_ACCOUNT control bit
v3:
  Skip the first page to avoid underflowing ordered->bytes_left.
  Fix the range passed to cow_file_range(), which didn't cover the whole
  extent.
  Expand the extent_clear_unlock_delalloc() range to allow it to handle
  metadata release.
v4:
  Don't use an extra bit to skip metadata freeing for the ordered extent;
  instead, only handle the btrfs_reloc_clone_csums() error just before
  processing the next extent.
  This makes error handling much easier for run_delalloc_nocow().
v5:
  Various grammar and comment fixes suggested by Filipe Manana.
  Enhanced the commit message to focus on the generic error handler bug,
  pointed out by Filipe Manana.
  Added the missing wq/func user in __endio_write_update_ordered(), pointed
  out by Filipe Manana.
  Enhanced the commit message with ascii art to explain the bug more easily.
  Fixed a bug which can lead to corrupted extent allocation, exposed by
  Filipe Manana.
v6:
  Split the metadata underflow fix to another patch.
  Use better comment and btrfs_cleanup_ordered_extent() implementation
  from Filipe Manana.
---
 fs/btrfs/inode.c | 62 ++--
 1 file changed, 47 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1d83d504f2e5..9b03eb9b27d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -115,6 +115,30 @@ static struct extent_map *create_io_em(struct inode 
*inode, u64 start, u64 len,
   u64 ram_bytes, int compress_type,
   int type);
 
+static void __endio_write_update_ordered(struct inode *inode,
+u64 offset, u64 bytes,
+bool uptodate);
+
+/*
+ * Clean up all submitted ordered extents in the specified range to handle
+ * errors from the fill_delalloc() callback.
+ *
+ * NOTE: the caller must ensure that when an error happens, it cannot call
+ * extent_clear_unlock_delalloc() to clear both the bits EXTENT_DO_ACCOUNTING
+ * and EXTENT_DELALLOC simultaneously, because that causes the reserved metadata
+ * to be released, which we want to happen only when finishing the ordered
+ * extent (btrfs_finish_ordered_io()). Also note that the caller of the
+ * fill_delalloc() callback already does proper cleanup for the first page of
+ * the range, that is, it invokes the callback writepage_end_io_hook() for the
+ * range of the first page.
+ */
+static inline void btrfs_cleanup_ordered_extents(struct inode *inode,
+u64 offset, u64 bytes)
+{
+   return __endio_write_update_ordered(inode, offset + PAGE_SIZE,
+ 

[PATCH v6 1/2] btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error

2017-03-05 Thread Qu Wenruo
[BUG]
When btrfs_reloc_clone_csums() reports an error, it can underflow metadata
and lead to a kernel assertion on outstanding extents in
run_delalloc_nocow() and cow_file_range().

 BTRFS info (device vdb5): relocating block group 12582912 flags data
 BTRFS info (device vdb5): found 1 extents
 assertion failed: inode->outstanding_extents >= num_extents, file: 
fs/btrfs//extent-tree.c, line: 5858

Currently, due to another bug blocking ordered extents, the bug is only
reproducible under certain block group layouts and using error injection.

a) Create one data block group with one 4K extent in it.
   To avoid the bug that hangs btrfs due to ordered extent which never
   finishes
b) Make btrfs_reloc_clone_csums() always fail
c) Relocate that block group

[CAUSE]
run_delalloc_nocow() and cow_file_range() handle errors from
btrfs_reloc_clone_csums() incorrectly:

(The ascii chart shows a more generic case of this bug than the one
mentioned above)

|<-- delalloc range --->|
| OE 1 | OE 2 | ... | OE n |
|<--- cleanup range --->|
|<---  --->|
 \/
 btrfs_finish_ordered_io() range

So the error handler, which calls extent_clear_unlock_delalloc() with the
EXTENT_DELALLOC and EXTENT_DO_ACCOUNTING bits, and btrfs_finish_ordered_io()
will both cover OE n and free its metadata, causing a metadata underflow.

[FIX]
Ensure that after btrfs_add_ordered_extent() has been called, the error
handler is only invoked after the iteration offset has been increased, so
that the cleanup range won't cover any created ordered extent.

|<-- delalloc range --->|
| OE 1 | OE 2 | ... | OE n |
|<---  --->|<-- cleanup range ->|
 \/
 btrfs_finish_ordered_io() range

Signed-off-by: Qu Wenruo 
---
changelog:
v6:
  New, split from v5 patch, as this is a separate bug.
---
 fs/btrfs/inode.c | 51 +++
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b2bc07aad1ae..1d83d504f2e5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -998,15 +998,24 @@ static noinline int cow_file_range(struct inode *inode,
BTRFS_DATA_RELOC_TREE_OBJECTID) {
ret = btrfs_reloc_clone_csums(inode, start,
  cur_alloc_size);
+   /*
+* Only drop cache here, and process as normal.
+*
+* We must not allow extent_clear_unlock_delalloc()
+* at out_unlock label to free meta of this ordered
+* extent, as its meta should be freed by
+* btrfs_finish_ordered_io().
+*
+* So we must continue until @start is increased to
+* skip current ordered extent.
+*/
if (ret)
-   goto out_drop_extent_cache;
+   btrfs_drop_extent_cache(BTRFS_I(inode), start,
+   start + ram_size - 1, 0);
}
 
btrfs_dec_block_group_reservations(fs_info, ins.objectid);
 
-   if (disk_num_bytes < cur_alloc_size)
-   break;
-
/* we're not doing compressed IO, don't unlock the first
 * page (which the caller expects to stay locked), don't
 * clear any dirty bits and don't set any writeback bits
@@ -1022,10 +1031,21 @@ static noinline int cow_file_range(struct inode *inode,
 delalloc_end, locked_page,
 EXTENT_LOCKED | EXTENT_DELALLOC,
 op);
-   disk_num_bytes -= cur_alloc_size;
+   if (disk_num_bytes > cur_alloc_size)
+   disk_num_bytes = 0;
+   else
+   disk_num_bytes -= cur_alloc_size;
num_bytes -= cur_alloc_size;
alloc_hint = ins.objectid + ins.offset;
start += cur_alloc_size;
+
+   /*
+* In case of a btrfs_reloc_clone_csums() error: since start has been
+* increased, extent_clear_unlock_delalloc() at the out_unlock label
+* won't free metadata of the current ordered extent, so we're OK to exit.
+*/
+   if (ret)
+   goto out_unlock;
}
 out:
return ret;
@@ -1414,15 +1434,14 @@ static noinline int run_delalloc_nocow(struct inode 
*inode,
BUG_ON(ret); /* -ENOMEM */
 
if (root->root_key.objectid ==
-   BTRFS_DATA_RELOC_TREE_OBJECTID) {
+   

Re: Why do BTRFS (still) forgets what device to write to?

2017-03-05 Thread Duncan
waxhead posted on Sun, 05 Mar 2017 17:26:36 +0100 as excerpted:

> I am doing some test on BTRFS with both data and metadata in raid1.
> 
> uname -a Linux daffy 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28)
> x86_64 GNU/Linux
> 
> btrfs --version
> btrfs-progs v4.7.3
> 
> 
> 01. mkfs.btrfs /dev/sd[fgh]1
> 02. mount /dev/sdf1 /btrfs_test/
> 03. btrfs balance start -dconvert=raid1 /btrfs_test/
> 04. copied a lot of 3-4MB files to it (about 40GB)...
> 05. Started to compress some of the files to create one larger file...
> 06. Pulled the (sata) plug on one of the drives... (sdf1)
> 07. dmesg shows that the kernel is rejecting I/O to offline device +
> [sdf] killing request]
> 08. BTRFS error (device sdf1) bdev /dev/sdf1 errs: wr 0, rd 1, flush 0,
> corrupt 0, gen 0
> 09. the previous line repeats - increasing rd count
> 10. Reconnecting the sdf1 drive again makes it show up as sdi1
> 11. btrfs fi sh /btrfs_test shows sdi1 as the correct device id (1).
> 12. Yet dmesg shows tons of errors like this: BTRFS error (device sdf1)
> : bdev /dev/sdi1 errs wr 37182, rd 39851, flush 1, corrupt 0, gen 0
> 13. and the above line repeats increasing wr, and rd errors.
> 14. BTRFS never seems to "get in tune again" while the filesystem is
> mounted.
> 
> The conclusion appears to be that the device ID is back again in the
> btrfs pool so why does btrfs still try to write to the wrong device (or
> does it?!).

The base problem is that btrfs doesn't (yet) have any concept of a device 
disconnecting and reconnecting "live", only after unmount/remount.

When a device drops out, btrfs will continue to attempt to write to it.  
Things will continue normally on all other devices, and only after some 
time will btrfs actually finally give up on the device.  (I /believe/ 
it's after the level of dirty memory exceeds some safety threshold, with 
the unwritten writes taking up a larger and larger part of dirty memory 
until something gives.  However, I'm not a dev, just a user and list 
regular, and this is just my supposition filling in the blanks, so don't 
take it for gospel unless you get confirmation either directly from the 
code or from an actual dev.)

If the outage is short enough for the kernel to bring back the device as 
the same device node, great, btrfs can and does resume writing to it.  
However, once the outage is long enough that the kernel brings back the 
physical device as a different device node, yes, btrfs filesystem show 
will show the device back as its normal ID, but that information isn't 
properly communicated to the "live" still-mounted filesystem, and it 
continues to attempt writing to the old device node.

There are plans for, and even patches introducing limited support for, 
live detection and automatic (re)integration of a new or reintroduced 
device, but those patches belong to a long-term development project, and 
the last I read they no longer applied cleanly to current kernels, as 
they've not been kept current and have gone stale.

Of course it should be kept in mind that btrfs is still under heavy 
development, and while stabilizing, isn't considered -- certainly not by 
its devs -- to be anywhere near feature complete and stabilized.  At times 
such as this that holds even for features generally considered as 
reasonably stable and mature as btrfs itself is -- that is, still 
stabilizING, not yet fully stable and mature.  Keep backups and be 
prepared to use them if you value your data, because you may indeed be 
calling on them!

In that state, it's only to be expected that there will still be some 
incomplete features such as this, where manual intervention may be 
required that wouldn't be in more complete/stable/mature solutions.

Basically, it comes with the territory.

> The good thing here is that BTRFS does still work fine after a unmount
> and mount again. Running a scrub on the filesystem cleans up tons of
> errors , but no uncorrectable errors.

Correct.  An unmount will leave all that data unwritten to the device it 
still considers missing, so of course those checksums aren't going to 
match.  On remount, btrfs sees the device again, and should and AFAIK 
consistently does note the difference in commit generations, pulling from 
the updated device where they differ.  A scrub can then be used to bring 
the outdated device back in sync.
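In practice, following the device names from the report above, the
recovery sequence would be something like:

  # umount /btrfs_test
  # mount /dev/sdi1 /btrfs_test          (re-opens all members at their
                                          current device nodes)
  # btrfs scrub start -B /btrfs_test     (-B stays in the foreground and
                                          prints the final stats)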

But be sure to do that scrub as soon as possible.  Should further 
instability continue to drop out devices, or further not entirely 
graceful unmounts/shutdowns occur, the damage may get worse and not be 
entirely repairable, certainly not with only a simple scrub.

> However it says total bytes scrubbed 94.21GB with 75 errors ... and
> further down it says corrected errors: 72, uncorrectable errors: 0 ,
> unverified errors: 0
> 
> Why 75 vs 72 errors?! did it correct all or not?

From my own experience (and I actually deliberately ran btrfs raid1 
with a failing device for a while to test this sort of thing, btrfs' 
checksumming 

Re: [PATCH v2] btrfs-progs: report I/O errors when closing the filesystem

2017-03-05 Thread Qu Wenruo



At 03/04/2017 01:02 AM, Omar Sandoval wrote:

From: Omar Sandoval 

If the final fsync() on the Btrfs device fails, we just swallow the
error and don't alert the user in any way. This was uncovered by xfstest
generic/405, which checks that mkfs fails when it encounters EIO.

Signed-off-by: Omar Sandoval 


Please ignore my comment on the v1 patch; I didn't see the v2 one.

Looks good to me.

Reviewed-by: Qu Wenruo 

Thanks,
Qu

---
Changes from v1: return -errno instead of -1

 disk-io.c | 4 ++--
 volumes.c | 9 +++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 2a94d4fc..48fb3c6b 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1817,10 +1817,10 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
free_fs_roots_tree(&fs_info->fs_root_tree);

btrfs_release_all_roots(fs_info);
-   btrfs_close_devices(fs_info->fs_devices);
+   ret = btrfs_close_devices(fs_info->fs_devices);
btrfs_cleanup_all_caches(fs_info);
btrfs_free_fs_info(fs_info);
-   return 0;
+   return ret;
 }

 int clean_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/volumes.c b/volumes.c
index a0a85edd..f7d82d51 100644
--- a/volumes.c
+++ b/volumes.c
@@ -160,6 +160,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
struct btrfs_fs_devices *seed_devices;
struct btrfs_device *device;
+   int ret = 0;

 again:
if (!fs_devices)
@@ -168,7 +169,11 @@ again:
device = list_entry(fs_devices->devices.next,
struct btrfs_device, dev_list);
if (device->fd != -1) {
-   fsync(device->fd);
+   if (fsync(device->fd) == -1) {
+   warning("fsync on device %llu failed: %s",
+   device->devid, strerror(errno));
+   ret = -errno;
+   }
		if (posix_fadvise(device->fd, 0, 0, POSIX_FADV_DONTNEED))
			fprintf(stderr, "Warning, could not drop caches\n");
close(device->fd);
@@ -197,7 +202,7 @@ again:
free(fs_devices);
}

-   return 0;
+   return ret;
 }

 void btrfs_close_all_devices(void)






Re: [PATCH] btrfs-progs: report I/O errors when closing the filesystem

2017-03-05 Thread Qu Wenruo



At 03/04/2017 12:59 AM, Omar Sandoval wrote:

From: Omar Sandoval 

If the final fsync() on the Btrfs device fails, we just swallow the
error and don't alert the user in any way. This was uncovered by xfstest
generic/405, which checks that mkfs fails when it encounters EIO.

Signed-off-by: Omar Sandoval 
---
 disk-io.c | 4 ++--
 volumes.c | 9 +++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 2a94d4fc..48fb3c6b 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1817,10 +1817,10 @@ int close_ctree_fs_info(struct btrfs_fs_info *fs_info)
free_fs_roots_tree(&fs_info->fs_root_tree);

btrfs_release_all_roots(fs_info);
-   btrfs_close_devices(fs_info->fs_devices);
+   ret = btrfs_close_devices(fs_info->fs_devices);
btrfs_cleanup_all_caches(fs_info);
btrfs_free_fs_info(fs_info);
-   return 0;
+   return ret;
 }

 int clean_tree_block(struct btrfs_trans_handle *trans, struct btrfs_root *root,
diff --git a/volumes.c b/volumes.c
index a0a85edd..89335d35 100644
--- a/volumes.c
+++ b/volumes.c
@@ -160,6 +160,7 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
 {
struct btrfs_fs_devices *seed_devices;
struct btrfs_device *device;
+   int ret = 0;

 again:
if (!fs_devices)
@@ -168,7 +169,11 @@ again:
device = list_entry(fs_devices->devices.next,
struct btrfs_device, dev_list);
if (device->fd != -1) {
-   fsync(device->fd);
+   if (fsync(device->fd) == -1) {
+   warning("fsync on device %llu failed: %s",
+   device->devid, strerror(errno));
+   ret = -1;


-1 is -EPERM, better to use -errno here.
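(For illustration only -- the usual pattern is to capture errno before any
other call can clobber it, e.g.:)

	if (fsync(device->fd) == -1) {
		ret = -errno;	/* save errno first; warning() may clobber it */
		warning("fsync on device %llu failed: %s",
			device->devid, strerror(-ret));
	}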

Apart from that, looks good.

Thanks,
Qu

+   }
		if (posix_fadvise(device->fd, 0, 0, POSIX_FADV_DONTNEED))
			fprintf(stderr, "Warning, could not drop caches\n");
close(device->fd);
@@ -197,7 +202,7 @@ again:
free(fs_devices);
}

-   return 0;
+   return ret;
 }

 void btrfs_close_all_devices(void)






Re: [PATCH 00/17] fs, btrfs refcount conversions

2017-03-05 Thread Qu Wenruo



At 03/03/2017 04:55 PM, Elena Reshetova wrote:

Now that the new refcount_t type and API are finally merged
(see include/linux/refcount.h), the following
patches convert various refcounters in the btrfs filesystem from atomic_t
to refcount_t. By doing this we prevent intentional or accidental
underflows or overflows that can lead to use-after-free vulnerabilities.

The below patches are fully independent and can be cherry-picked separately.
Since we convert all kernel subsystems in the same fashion, resulting
in about 300 patches, we have to group them for sending at least in some
fashion to be manageable. Please excuse the long cc list.

These patches have been tested with xfstests by running btrfs-related tests.
btrfs debug was enabled, which also warns on refcount errors. No output
related to refcount errors was produced. However, the following errors
occurred during the run:
 * tests btrfs/078, btrfs/114, btrfs/115: no errors anywhere in dmesg, but the
 process hangs. They all seem to be around qgroup; sometimes an error such as
 "qgroup scan failed -4" is visible before it blocks, but not always.


-EINTR? That's strange.

Any blocked process backtrace?


 * test btrfs/104: dmesg has additional error output:
 BTRFS warning (device vdc): qgroup 258 reserved space underflow, have: 0,
 to free: 4096


A known one; fixes have already been sent to the mailing list but are not merged yet:
https://patchwork.kernel.org/patch/9592765/

Thanks,
Qu


 I tried looking at the code to see what causes the failure, but could not
 figure it out. It doesn't seem to be related to any refcount changes, at
 least IMO.

The above test failures are hard for me to understand and interpret, but
they don't seem to be related to the refcount conversions.

Elena Reshetova (17):
  fs, btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_transaction.use_count from atomic_t to
refcount_t
  fs, btrfs: convert extent_map.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_ordered_extent.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_caching_control.count from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to
refcount_t
  fs, btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_root.refs from atomic_t to refcount_t
  fs, btrfs: convert extent_state.refs from atomic_t to refcount_t
  fs, btrfs: convert compressed_bio.pending_bios from atomic_t to
refcount_t
  fs, btrfs: convert scrub_recover.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_page.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_block.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_parity.refs from atomic_t to refcount_t
  fs, btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
  fs, btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

 fs/btrfs/backref.c   |  2 +-
 fs/btrfs/compression.c   | 18 -
 fs/btrfs/ctree.h |  5 +++--
 fs/btrfs/delayed-inode.c | 46 ++--
 fs/btrfs/delayed-inode.h |  5 +++--
 fs/btrfs/delayed-ref.c   |  8 
 fs/btrfs/delayed-ref.h   |  8 +---
 fs/btrfs/disk-io.c   |  6 +++---
 fs/btrfs/disk-io.h   |  4 ++--
 fs/btrfs/extent-tree.c   | 20 +--
 fs/btrfs/extent_io.c | 18 -
 fs/btrfs/extent_io.h |  3 ++-
 fs/btrfs/extent_map.c| 10 +-
 fs/btrfs/extent_map.h|  3 ++-
 fs/btrfs/ordered-data.c  | 20 +--
 fs/btrfs/ordered-data.h  |  2 +-
 fs/btrfs/raid56.c| 19 +-
 fs/btrfs/scrub.c | 42 
 fs/btrfs/transaction.c   | 20 +--
 fs/btrfs/transaction.h   |  3 ++-
 fs/btrfs/tree-log.c  |  2 +-
 fs/btrfs/volumes.c   | 10 +-
 fs/btrfs/volumes.h   |  2 +-
 include/trace/events/btrfs.h |  4 ++--
 24 files changed, 143 insertions(+), 137 deletions(-)






Re: [4.7.2] btrfs_run_delayed_refs:2963: errno=-17 Object already exists

2017-03-05 Thread Marc Joliet
On Wednesday, 1 March 2017 19:14:07 CET you wrote:
>  In any
> case, I started btrfs-check on the device itself.

*Sigh*, I had to restart it, because I forgot to redirect to a file and quite 
frankly wasn't expecting this flood of output, but here's a summary of the 
output after about 2 days:

% wc -l btrfs_check_output_20170303.log 
1360252 btrfs_check_output_20170303.log
% grep -v "backref lost" btrfs_check_output_20170303.log | grep -v "check \(leaf\|node\) failed" | grep -v "lost its parent" | grep -v "referencer count"
checking extents
ERROR: block group[3879328546816 1073741824] used 1072840704 but extent items used 1129164800
ERROR: block group[4163870130176 1073741824] used 1072259072 but extent items used 0
ERROR: block group[4223999672320 1073741824] used 1073664000 but extent items used 1074188288
ERROR: block group[4278760505344 1073741824] used 1073377280 but extent items used 1073901568
ERROR: block group[4406535782400 1073741824] used 1073627136 but extent items used 0
ERROR: extent [3830028140544 4096] referencer bytenr mismatch, wanted: 3830028140544, have: 3826183847936
% tail btrfs_check_output_20170303.log 
ERROR: data extent[3924538445824 4096] backref lost
ERROR: data extent[3933464903680 4096] backref lost
ERROR: data extent[3924538531840 4096] backref lost
ERROR: data extent[3839131480064 4096] backref lost
ERROR: data extent[3834701750272 4096] backref lost
ERROR: data extent[3873087918080 4096] backref lost
ERROR: data extent[3873072283648 4096] backref lost
ERROR: data extent[3873088090112 8192] backref lost
ERROR: data extent[3873072287744 4096] backref lost
ERROR: data extent[3856294449152 4096] backref lost

That's right, slowly approaching 1.5 million lines of btrfs-check output!  
That's *way* more than the last time I ran it, when this error happened a 
few weeks ago.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup




Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only

2017-03-05 Thread Lukas Tribus

Hello Hans,


On 24.02.2017 at 01:26, Hans van Kranenburg wrote:



Once that is done, I would like to go over the "btrfs recovery" thread
and see if it can be applied to my case as well. I will certainly need
your help when that time comes...

We can take a stab at it.



I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs 
inspect-internal dump-tree.

But I cannot find anything about 5242107641856 in the dump-tree output.

What does that mean?
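(For reference, a targeted lookup of a single block can be done with
dump-tree's -b option, which reads the given block directly instead of
reaching it through a tree walk; the device name is a placeholder:)

  # btrfs inspect-internal dump-tree -b 5242107641856 /dev/sdX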




Thanks,
Lukas



Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-05 Thread Peter Grandi
[ ... on the difference between number of devices and length of
a chunk-stripe ... ]

> Note: possibilities get even more interesting with a 4-device
> volume with 'raid1' profile chunks, and similar case involving
> other profiles than 'raid1'.

Consider for example a 4-device volume with 2 devices abruptly
missing: if 2-length 'raid1' chunk-stripes have been uniformly
laid across devices, then some chunk-stripes will be completely
missing (where both chunks in the stripe were on the 2 missing
devices), some will be 1-length, and some will be 2-length.

What to do when devices are missing?

One possibility is to simply require mount with the 'degraded'
option, by default read-only, but allowing read-write, simply as
a way to ensure the sysadm knows that some metadata/data *may* not
be redundant or *may* even be unavailable (if the chunk-stripe
length is less than the minimum to reconstruct the data).
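For reference, that is roughly how the existing option already behaves,
except that read-write is not rejected by default (device names below are
hypothetical):

  # mount -o degraded,ro /dev/sdb1 /mnt    (inspect without writing)
  # mount -o degraded /dev/sdb1 /mnt       (read-write, at the sysadm's risk)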

Then attempts to read unavailable metadata or data would return
an error like a checksum violation without redundancy,
dynamically (when the application or 'balance' or 'scrub'
attempt to read the unavailable data).


Re: raid1 degraded mount still produce single chunks, writeable mount not allowed

2017-03-05 Thread Peter Grandi
>> What makes me think that "unmirrored" 'raid1' profile chunks
>> are "not a thing" is that it is impossible to remove
>> explicitly a member device from a 'raid1' profile volume:
>> first one has to 'convert' to 'single', and then the 'remove'
>> copies back to the remaining devices the 'single' chunks that
>> are on the explicitly 'remove'd device. Which to me seems
>> absurd.

> It is, there should be a way to do this as a single operation.
> [ ... ] The reason this is currently the case though is a
> simple one, 'btrfs device delete' is just a special instance
> of balance [ ... ]  does no profile conversion, but having
> that as an option would actually be _very_ useful from a data
> safety perspective.

That seems to me an even more "confused" opinion, because
removing a device to make it "missing" and removing it
permanently should be very different operations.

Consider the common case of a 3-member volume with a 'raid1'
target profile: if the sysadm thinks that a drive should be
replaced, the goal is to take it out *without* converting every
chunk to 'single', because with 2-out-of-3 devices half of the
chunks will still be fully mirrored.

Also, removing the device to be replaced should really not be
the same thing as balancing the chunks, if there is space, to be
'raid1' across remaining drives, because that's a completely
different operation.

>> Going further in my speculation, I suspect that at the core of
>> the Btrfs multidevice design there is a persistent "confusion"
>> (to use en euphemism) between volumes having a profile, and
>> merely chunks have a profile.

> There generally is.  The profile is entirely a property of the
> chunks (each chunk literally has a bit of metadata that says
> what profile it is), not the volume.  There's some metadata in
> the volume somewhere that says what profile to use for new
> chunks of each type (I think),

That's the "target" profile for the volume.

> but that doesn't dictate what chunk profiles there are on the
> volume. [ ... ]

But as that's the case, the current Btrfs logic for
determining whether a volume is degraded or not is quite
"confused" indeed.

Because suppose there is again the simple case of a 3-device
volume, where all existing chunks have 'raid1' profile and the
volume's target profile is also 'raid1' and one device has gone
offline: the volume cannot be said to be "degraded", unless a
full examination of all chunks is made. Because it can well
happen that in fact *none* of the chunks was mirrored to that
device, for example, however unlikely. And vice versa. Even with
3 devices some chunks may be temporarily "unmirrored" (even if
for brief times hopefully).

The average case is that half of the chunks will be fully
mirrored across the two remaining devices and half will be
"unmirrored".

Now consider re-adding the third device: at that point the
volume has got back all 3 devices, so it is not "degraded", but
50% of the chunks in the volume will still be "unmirrored", even
if eventually they will be mirrored on the newly added device.

Note: possibilities get even more interesting with a 4-device
volume with 'raid1' profile chunks, and similar case involving
other profiles than 'raid1'.

Therefore the current Btrfs logic for deciding whether a volume
is "degraded" seems simply "confused" to me, because whether
there are missing devices and some chunks are "unmirrored" is
not quite the same thing.

The same applies to the current logic whereby, in a 2-device volume
with a device missing, new chunks are created with the 'single' profile
instead of the "unmirrored" 'raid1' profile: another example of
"confusion" between number of devices and chunk profile.

Note: the best that can be said is that a volume has both a
"target chunk profile" (one per data, metadata, system chunks)
and a target number of member devices, and that a volume with a
number of devices below the target *might* be degraded, and that
whether a volume is in fact degraded is not either/or, but given
by the percentage of chunks or stripes that are degraded. This
is especially made clear by the 'raid1' case where the chunk
stripe length is always 2, but the number of target devices can
be greater than 2. Management of devices and management of
stripes are in Btrfs, unlike conventional RAID like Linux MD,
rather different operations needing rather different, if
related, logic.
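A sketch of that distinction in pseudo-C (hypothetical, deliberately not
actual btrfs code or data structures):

#include <stdbool.h>

struct chunk_stripe {
	int copies_required;	/* stripe length demanded by the chunk profile */
	int copies_present;	/* copies on currently-present devices */
};

static bool volume_degraded(const struct chunk_stripe *c, int nr)
{
	for (int i = 0; i < nr; i++)
		if (c[i].copies_present < c[i].copies_required)
			return true;	/* some chunk has lost redundancy */
	/* All devices may be present while some chunks are still
	 * "unmirrored", or a device may be missing while every chunk is
	 * still fully replicated: the device count alone decides neither. */
	return false;
}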

My impression is that because of "confusion" between number of
devices in a volume and status of chunk profile there are some
"surprising" behaviors in Btrfs, and that will take quite a bit
to fix, most importantly for the Btrfs developer team to clear
among themselves the semantics attaching to both. After 10 years
of development that seems the right thing to do :-).


assertion failed: page_ops & PAGE_LOCK

2017-03-05 Thread Dave Jones
After commenting out the assertion that Liu Bo pointed out was bogus,
my trinity runs last a little longer. This is a new one, I think.

assertion failed: page_ops & PAGE_LOCK, file: fs/btrfs/extent_io.c, line: 1716
[ cut here ]
kernel BUG at fs/btrfs/ctree.h:3423!
invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 32625 Comm: trinity-c40 Not tainted 4.10.0-think+ #3 
task: 8804c6b95440 task.stack: c9ba8000
RIP: 0010:assfail.constprop.68+0x1c/0x1e
RSP: 0018:c9bab6c0 EFLAGS: 00010282
RAX: 004e RBX: 00a3 RCX: 0001
RDX:  RSI: 81ee593a RDI: 
RBP: c9bab6c0 R08:  R09: 0001
R10: 0001 R11:  R12: c9bab790
R13:  R14: 01f8 R15: ea000b2026c0
FS:  7f102a0e4b40() GS:880507a0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f1029f21000 CR3: 0004c6bde000 CR4: 001406e0
DR0: 7f695d948000 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Call Trace:
 __process_pages_contig+0x4b4/0x4d0
 __unlock_for_delalloc.isra.43+0x32/0x50
 find_lock_delalloc_range+0x15e/0x210
 writepage_delalloc.isra.51+0x91/0x1a0
 __extent_writepage+0x10d/0x470
 extent_write_cache_pages.constprop.57+0x2d3/0x430
 extent_writepages+0x5d/0x90
 ? btrfs_releasepage+0x20/0x20
 btrfs_writepages+0x28/0x30
 do_writepages+0x21/0x30
 __filemap_fdatawrite_range+0xc6/0x100
 filemap_fdatawrite_range+0x13/0x20
 btrfs_fdatawrite_range+0x20/0x50
 start_ordered_ops+0x19/0x30
 btrfs_sync_file+0x99/0x540
 ? debug_smp_processor_id+0x17/0x20
 ? get_lock_stats+0x19/0x50
 ? debug_smp_processor_id+0x17/0x20
 ? get_lock_stats+0x19/0x50
 vfs_fsync_range+0x4b/0xb0
 btrfs_file_write_iter+0x412/0x570
 ? rcu_sync_lockdep_assert+0x2f/0x60
 __do_readv_writev+0x2ea/0x380
 do_readv_writev+0x7c/0xc0
 ? debug_smp_processor_id+0x17/0x20
 ? get_lock_stats+0x19/0x50
 ? __context_tracking_exit.part.5+0x82/0x1e0
 vfs_writev+0x3f/0x50
 do_pwritev+0xb5/0xd0
 SyS_pwritev2+0x17/0x30
 do_syscall_64+0x66/0x1d0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f1029a0e0f9
RSP: 002b:7ffe59e72c48 EFLAGS: 0246 ORIG_RAX: 0148
RAX: ffda RBX: 0148 RCX: 7f1029a0e0f9
RDX: 00f2 RSI: 562b5ef7de50 RDI: 0185
RBP: 7f1029fc5000 R08: 08002180528a R09: 0007
R10: baba R11: 0246 R12: 0002
R13: 7f1029fc5048 R14: 7f102a0e4ad8 R15: 7f1029fc5000



Why do BTRFS (still) forgets what device to write to?

2017-03-05 Thread waxhead

I am doing some test on BTRFS with both data and metadata in raid1.

uname -a
Linux daffy 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 
GNU/Linux


btrfs --version
btrfs-progs v4.7.3


01. mkfs.btrfs /dev/sd[fgh]1
02. mount /dev/sdf1 /btrfs_test/
03. btrfs balance start -dconvert=raid1 /btrfs_test/
04. copied a lot of 3-4MB files to it (about 40GB)...
05. Started to compress some of the files to create one larger file...
06. Pulled the (sata) plug on one of the drives... (sdf1)
07. dmesg shows that the kernel is rejecting I/O to offline device + 
[sdf] killing request]
08. BTRFS error (device sdf1) bdev /dev/sdf1 errs: wr 0, rd 1, flush 0, 
corrupt 0, gen 0

09. the previous line repeats - increasing rd count
10. Reconnecting the sdf1 drive again makes it show up as sdi1
11. btrfs fi sh /btrfs_test shows sdi1 as the correct device id (1).
12. Yet dmesg shows tons of errors like this: BTRFS error (device sdf1) 
: bdev /dev/sdi1 errs wr 37182, rd 39851, flush 1, corrupt 0, gen 0

13. and the above line repeats increasing wr, and rd errors.
14. BTRFS never seems to "get in tune again" while the filesystem is 
mounted.


The conclusion appears to be that the device ID is back again in the 
btrfs pool so why does btrfs still try to write to the wrong device (or 
does it?!).


The good thing here is that BTRFS does still work fine after a unmount 
and mount again. Running a scrub on the filesystem cleans up tons of 
errors , but no uncorrectable errors.


However it says total bytes scrubbed 94.21GB with 75 errors ... and 
further down it says corrected errors: 72, uncorrectable errors: 0 , 
unverified errors: 0


Why 75 vs 72 errors?! did it correct all or not?

I have recently lost 1x 5-device BTRFS filesystem as well as 2x 3-device 
BTRFS filesystems set up in RAID1 (both data and metadata) by toying 
around with them. The 2x filesystems I lost were using all bad disks (all 
3 of them) but the one mentioned here uses good (but old) 400GB drives, 
just for the record.


By lost I mean that mount does not recognize the filesystem, but BTRFS 
fi sh does show that all devices are present. I did not make notes for 
those filesystems , but it appears that RAID1 is a bit fragile.


I don't need to recover anything. This is just a "toy system" for 
playing around with btrfs and doing some tests.



Re: [PATCH 0/8 v2] Non-blocking AIO

2017-03-05 Thread Avi Kivity



On 03/01/2017 01:36 AM, Goldwyn Rodrigues wrote:

This series adds a nonblocking feature to asynchronous I/O writes.
io_submit() can be delayed because of a number of reasons:
  - Block allocation for files
  - Data writebacks for direct I/O
  - Sleeping because of waiting to acquire i_rwsem
  - Congested block device


We've been hit by a few of these so this change is very welcome.



The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
any of these conditions are met. This way userspace can push most
of the write()s to the kernel to the best of its ability to complete them,
and if the kernel returns -EAGAIN, defer them to another thread.



Is it not possible to push the iocb to a workqueue?  This will allow 
existing userspace to work with the new functionality, unchanged. Any 
userspace implementation would have to do the same thing, so it's not 
like we're saving anything by pushing it there.



In order to enable this, IOCB_FLAG_NOWAIT is introduced in
uapi/linux/aio_abi.h, which translates to IOCB_NOWAIT for struct kiocb,
BIO_NOWAIT for bio, and IOMAP_NOWAIT for iomap.
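A hypothetical usage sketch against the proposed ABI; it assumes
IOCB_FLAG_NOWAIT is set in iocb.aio_flags, next to the existing
IOCB_FLAG_RESFD (this sketch is not part of the posted patches):

#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>

static long submit_write_nowait(aio_context_t ctx, int fd, void *buf,
				size_t len, off_t off)
{
	struct iocb cb;
	struct iocb *cbs[1] = { &cb };
	long ret;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;			/* file opened with O_DIRECT */
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_buf = (uint64_t)(uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = off;
	cb.aio_flags = IOCB_FLAG_NOWAIT;	/* fail fast instead of blocking */

	ret = syscall(SYS_io_submit, ctx, 1, cbs);
	/* ret < 0 && errno == EAGAIN means the write would block; the
	 * caller should hand it off to a worker thread instead */
	return ret;
}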

This feature is provided for direct I/O of asynchronous I/O only. I have
tested it against xfs, ext4, and btrfs.

Changes since v1:
  + Forward-ported from 4.9.10
  + changed name from _NONBLOCKING to *_NOWAIT
  + filemap_range_has_page() call moved closer to (just before) calling
filemap_write_and_wait_range().
  + BIO_NOWAIT limited to get_request()
  + XFS fixes
- included reflink
- use of xfs_ilock_nowait() instead of a XFS_IOLOCK_NONBLOCKING flag
- Translate the flag through IOMAP_NOWAIT (iomap) to check for
  block allocation for the file.
  + ext4 coding style

