Re: [PATCH] btrfs-progs: send operates on ro snapshots only

2017-06-05 Thread Duncan
Hans van Kranenburg posted on Tue, 06 Jun 2017 01:32:18 +0200 as
excerpted:

> While talking to another btrfs user on IRC today, it became clear that a
> major point of confusion in the btrfs send manual is that it's not
> telling the user soon enough that send/receive solely operates on
> subvolume snapshots instead of the original (read/write) subvolumes.
> 
> So, change the first few alineas to explicitely mention snapshots

... lines to explicitly ...

> instead. Technically, snapshots are also just subvolumes, but requiring
> this level of technical detailed knowledge doesn't help the user which
> is just trying out things.

[actual patch omitted]

LGTM (other than the commit description spelling as above, neither my 
read-thru nor the spellchecker flagged anything in the actual patch).

The new wording is indeed much clearer.  Thanks. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: skip commit transaction if we don't have enough pinned bytes

2017-06-05 Thread Liu Bo
On Fri, Jun 02, 2017 at 11:14:13AM -0700, Omar Sandoval wrote:
> On Fri, May 19, 2017 at 11:39:15AM -0600, Liu Bo wrote:
> > We commit transaction in order to reclaim space from pinned bytes because
> > it could process delayed refs, and in may_commit_transaction(), we check
> > first if pinned bytes are enough for the required space, we then check if
> > that plus bytes reserved for delayed insert are enough for the required
> > space.
> > 
> > This changes the code to the above logic.
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  fs/btrfs/extent-tree.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index e390451c72e6..bded1ddd1bb6 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4837,7 +4837,7 @@ static int may_commit_transaction(struct 
> > btrfs_fs_info *fs_info,
> >  
> > spin_lock(_rsv->lock);
> > if (percpu_counter_compare(_info->total_bytes_pinned,
> > -  bytes - delayed_rsv->size) >= 0) {
> > +  bytes - delayed_rsv->size) < 0) {
> > spin_unlock(_rsv->lock);
> > return -ENOSPC;
> > }
> 
> I found this bug in my latest enospc investigation, too. However, the
> total_bytes_pinned counter is pretty broken. E.g., on my laptop:
> 
> $ sync; grep -H '' /sys/fs/btrfs/*/allocation/*/total_bytes_pinned
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/data/total_bytes_pinned:48693501952
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/metadata/total_bytes_pinned:-258146304
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/system/total_bytes_pinned:0
> 
> I have a patch to fix it that I haven't cleaned up yet, below. Without
> it, Bo's fix will probably cause early ENOSPCs. Dave, should we pull
> Bo's patch out of for-next? In any case, I'll get my fix sent out.
>

When data delayed refs are not merged, __btrfs_free_extent() will dec
the pinned as expected, but when they got merged, %pinned bytes only
got inc'd but not dec'd because the delayed ref has been deleted due
to ref_count == 0.

This doesn't happen to tree block ref since tree block ref doesn't
have more than one ref, so ref_mod is either 1 or -1, and tree block
ref gets removed via btrfs_free_tree_block(), which also needs to be
changed.

Based on that, I think we can solve the problem by touching
btrfs_free_tree_block() and data delayed ref instead of head ref.

thanks,
-liubo

> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index be70d90dfee5..93ffa898df6d 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle 
> *trans,
>  static noinline void
>  update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
>struct btrfs_delayed_ref_node *existing,
> -  struct btrfs_delayed_ref_node *update)
> +  struct btrfs_delayed_ref_node *update,
> +  int *old_ref_mod_ret)
>  {
>   struct btrfs_delayed_ref_head *existing_ref;
>   struct btrfs_delayed_ref_head *ref;
> @@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root 
> *delayed_refs,
>* currently, for refs we just added we know we're a-ok.
>*/
>   old_ref_mod = existing_ref->total_ref_mod;
> + if (old_ref_mod_ret)
> + *old_ref_mod_ret = old_ref_mod;
>   existing->ref_mod += update->ref_mod;
>   existing_ref->total_ref_mod += update->ref_mod;
>  
> @@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>struct btrfs_delayed_ref_node *ref,
>struct btrfs_qgroup_extent_record *qrecord,
>u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
> -  int action, int is_data, int *qrecord_inserted_ret)
> +  int action, int is_data, int *qrecord_inserted_ret,
> +  int *old_ref_mod, int *new_ref_mod)
>  {
>   struct btrfs_delayed_ref_head *existing;
>   struct btrfs_delayed_ref_head *head_ref = NULL;
> @@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   if (existing) {
>   WARN_ON(ref_root && reserved && existing->qgroup_ref_root
>   && existing->qgroup_reserved);
> - update_existing_head_ref(delayed_refs, >node, ref);
> + update_existing_head_ref(delayed_refs, >node, ref,
> +  old_ref_mod);
>   /*
>* we've updated the existing ref, free the newly
>* allocated ref
> @@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
>   head_ref = existing;
>   } else {
> + if 

device-mapper btrfs Warning Spam

2017-06-05 Thread Steven Bell
Hello all,


I have a btrfs raid-0 array (initially a raid-6 but long ago converted
due to warnings not to use raid56, not that I think this impacts my
situation).

One of the disks in the array started to go bad. Some bad sectors
meant one of my video files was unable to be read. This issue was
discovered during a btrfs scrub.

By deleting the offending file occupying the bad sector, I was able to
successfully remove the offending disk from the array. Replacement
disk is in and no issues, restored the file from backup. Great.


My problem now is the following warning message repeatedly spamming my
system logs every minute (even after fresh reboot and everything with
the disk management settled).

"kernel: device-mapper: core: btrfs: sending ioctl 5331 to DM device
without required privilege."

I have never seen the error before the disk issue, and I have
confirmed it only started after a reboot after the disk had started
going bad. I've not been able to figure out any other factors that
maybe contributing.

Everything appears functional and without issue, and I see no
consequences of these messages in system performance or behaviour.

I'm running fully updated Fedora 25 which is using btrfs-progs version
4.6.1-1.fc25 on kernel 4.11.3-200.fc25.x86_64.


Has anyone ever seen this issue? Any help or advice would be greatly
appreciated, thank you.


Regards,

Steven
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: send operates on ro snapshots only

2017-06-05 Thread Hans van Kranenburg
While talking to another btrfs user on IRC today, it became clear that a
major point of confusion in the btrfs send manual is that it's not
telling the user soon enough that send/receive solely operates on
subvolume snapshots instead of the original (read/write) subvolumes.

So, change the first few alineas to explicitely mention snapshots
instead. Technically, snapshots are also just subvolumes, but requiring
this level of technical detailed knowledge doesn't help the user which
is just trying out things.

Signed-off-by: Hans van Kranenburg 
---
 Documentation/btrfs-send.asciidoc | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/Documentation/btrfs-send.asciidoc 
b/Documentation/btrfs-send.asciidoc
index 1c5ccbe6..ef345f68 100644
--- a/Documentation/btrfs-send.asciidoc
+++ b/Documentation/btrfs-send.asciidoc
@@ -3,7 +3,7 @@ btrfs-send(8)
 
 NAME
 
-btrfs-send - generate a stream of changes between two subvolumes
+btrfs-send - generate a stream of changes between two subvolume snapshots
 
 SYNOPSIS
 
@@ -13,20 +13,21 @@ DESCRIPTION
 ---
 
 This command will generate a stream of instructions that describe changes
-between two subvolumes. The stream can be consumed by the *btrfs receive*
-command to replicate the sent subvolume on a different filesystem.
+between two subvolume snapshots. The stream can be consumed by the *btrfs
+receive* command to replicate the sent snapshot on a different filesystem.
 The command operates in two modes: full and incremental.
 
-All subvolumes involved in one send command must be read-only (ie. the
-read-only snapshots and this status cannot be changed if there's a running send
-operation that uses the subvolume).
+All snapshots involved in one send command must be read-only, and this status
+cannot be changed as long as there's a running send operation that uses the
+snapshot.
 
-In the full mode, the entire subvolume data and metadata will end up in the
+In the full mode, the entire snapshot data and metadata will end up in the
 stream.
 
-In the incremental mode (options '-p' and '-c'), there can be one or more
-parent subvolumes that will establish the base for determining the changes.
-The final stream will be smaller compared to the full mode.
+In the incremental mode (options '-p' and '-c'), previously sent snapshots that
+are available on both the sending and receiving side can be used to reduce the
+amount of information that has to be sent to reconstruct the sent snapshot on a
+different filesystem.
 
 It is allowed to omit the '-p ' option when '-c ' options
 are given, in which case *btrfs send* will determine a suitable parent among 
the
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: btrfs_ioctl_search_key documentation

2017-06-05 Thread Hans van Kranenburg
A programmer who is trying to implement calling the btrfs SEARCH
or SEARCH_V2 ioctl will probably soon end up reading this struct
definition.

Properly document the input fields to prevent common misconceptions:
 1. The search space is linear, not 3 dimensional. The invidual min/max
 values for objectid, type and offset cannot be used to filter the
 result, they only define the endpoints of an interval.
 2. The transaction id (a.k.a. generation) filter applies only on
 transaction id of the last COW operation on a whole metadata page, not
 on individual items.

Ad 1. The first misunderstanding was helped by the previous misleading
comments on min/max type and offset:
  "keys returned will be >= min and <= max".

Ad 2. For example, running btrfs balance will happily cause rewriting of
metadata pages that contain a filesystem tree of a read only subvolume,
causing transids to be increased.

Also, improve descriptions of tree_id and nr_items and add in/out
annotations.

Signed-off-by: Hans van Kranenburg 
---

Most interesting changes since v1:
 - mention the special tree_id input value 0
 - rewrite the part about min_key and max_key, trying to be more concise

Less interesting changes since v1:
 - the first line of the commit message was 51 characters long
 - a > ended up at the beginning of the line in the commit message, messing up
   the >= notation in some mail programs

 include/uapi/linux/btrfs.h | 62 +++---
 1 file changed, 42 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a456e5309238..88ae3d096a21 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -426,31 +426,53 @@ struct btrfs_ioctl_ino_lookup_args {
char name[BTRFS_INO_LOOKUP_PATH_MAX];
 };
 
+/* Search criteria for the btrfs SEARCH ioctl family. */
 struct btrfs_ioctl_search_key {
-   /* which root are we searching.  0 is the tree of tree roots */
-   __u64 tree_id;
-
-   /* keys returned will be >= min and <= max */
-   __u64 min_objectid;
-   __u64 max_objectid;
-
-   /* keys returned will be >= min and <= max */
-   __u64 min_offset;
-   __u64 max_offset;
-
-   /* max and min transids to search for */
-   __u64 min_transid;
-   __u64 max_transid;
+   /*
+* The tree we're searching in. 1 is the tree of tree roots, 2 is the
+* extent tree, etc...
+*
+* A special tree_id value of 0 will cause a search in the subvolume 
tree
+* that the inode which is passed to the ioctl is part of.
+*/
+   __u64 tree_id;  /* in */
 
-   /* keys returned will be >= min and <= max */
-   __u32 min_type;
-   __u32 max_type;
+   /*
+* When doing a tree search, we're actually taking a slice from a linear
+* search space of 136-bit keys.
+*
+* A full 136-bit tree key is composed as:
+*   (objectid << 72) + (type << 64) + offset
+*
+* The individual min and max values for objectid, type and offset 
define
+* the min_key and max_key values for the search range. All metadata 
items
+* with a key in the interval [min_key, max_key] will be returned.
+*
+* Additionally, we can filter the items returned on transaction id of 
the
+* metadata block they're stored in by specifying a transid range.  Be
+* aware that this transaction id only denotes when the metadata page 
that
+* currently contains the item got written the last time as result of a 
COW
+* operation.  The number does not have any meaning related to the
+* transaction in which an individual item that is being returned was
+* created or changed.
+*/
+   __u64 min_objectid; /* in */
+   __u64 max_objectid; /* in */
+   __u64 min_offset;   /* in */
+   __u64 max_offset;   /* in */
+   __u64 min_transid;  /* in */
+   __u64 max_transid;  /* in */
+   __u32 min_type; /* in */
+   __u32 max_type; /* in */
 
/*
-* how many items did userland ask for, and how many are we
-* returning
+* input: The maximum amount of results desired.
+* output: The actual amount of items returned, restricted by any of:
+*  - reaching the upper bound of the search range
+*  - reaching the input nr_items amount of items
+*  - completely filling the supplied memory buffer
 */
-   __u32 nr_items;
+   __u32 nr_items; /* in/out */
 
/* align to 64 bits */
__u32 unused;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation

2017-06-05 Thread Hans van Kranenburg
On 06/05/2017 09:00 PM, Goffredo Baroncelli wrote:
> On 2017-06-05 17:27, Hans van Kranenburg wrote:
>> + * When doing a tree search, we're actually taking a slice from a linear
>> + * search space of 136-bit keys:
>> + *
>> + * Key of the first possible item to be returned:
>> + *   (min_objectid << 72) + (min_type << 64) + min_offset
>> + * Key of the last possible item to be returned:
>> + *   (max_objectid << 72) + (max_type << 64) + max_offset
>> + *
> As non English people, I prefer a less verbose [...]

Yeah, it's a bit meh... I started to change the text again and ended up
rewriting it in a different way for patch V2 (sending in a minute).

> [...] and more programmatic form, like:
> 
> +  * When doing a tree search, we're actually taking a slice from a linear
> +  * search space of 136-bit keys:
> +*
> +  * A key is returned if 
> +  *   ((min_objectid << 72) + (min_type << 64) + min_offset  <=
> +*(objectid << 72) + (type << 64) + offset))  &&
> +  *   ((max_objectid << 72) + (max_type << 64) + max_offset >= 
> +*(objectid << 72) + (type << 64) + offset))
> +*

TBH, these lines mostly have an effect of dancing around before my eyes.

The point is, the search starts somewhere, end it ends somewhere. All
intermediate objects are returned. The min/max values are not applied as
a check to every key found in that range again. This way of explaining
("is returned if") adds to that wrong idea again imho.

>> + * [...] In other
>> + * words, they are not used to filter the type or offset of intermediate
>> + * keys encountered.
> 
> Even this is correct, I still find a bit complicate to fully understand the 
> meaning.
> 
> I would prefer to replace "not used" with "not usable"... But as stated above 
> I am not a native English people :-)

I'm dutch. ;) But for the user, using usable instead of used is nice
indeed, because it provides something that can be acted on, instead of
having something somewhere that "uses" it and apparently makes decisions
about what it does for some reason. Anyway, in the rewrite of that part
above, it's gone.

-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: skip commit transaction if we don't have enough pinned bytes

2017-06-05 Thread Liu Bo
On Fri, Jun 02, 2017 at 11:14:13AM -0700, Omar Sandoval wrote:
> On Fri, May 19, 2017 at 11:39:15AM -0600, Liu Bo wrote:
> > We commit transaction in order to reclaim space from pinned bytes because
> > it could process delayed refs, and in may_commit_transaction(), we check
> > first if pinned bytes are enough for the required space, we then check if
> > that plus bytes reserved for delayed insert are enough for the required
> > space.
> > 
> > This changes the code to the above logic.
> > 
> > Signed-off-by: Liu Bo 
> > ---
> >  fs/btrfs/extent-tree.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index e390451c72e6..bded1ddd1bb6 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -4837,7 +4837,7 @@ static int may_commit_transaction(struct 
> > btrfs_fs_info *fs_info,
> >  
> > spin_lock(_rsv->lock);
> > if (percpu_counter_compare(_info->total_bytes_pinned,
> > -  bytes - delayed_rsv->size) >= 0) {
> > +  bytes - delayed_rsv->size) < 0) {
> > spin_unlock(_rsv->lock);
> > return -ENOSPC;
> > }
> 
> I found this bug in my latest enospc investigation, too. However, the
> total_bytes_pinned counter is pretty broken. E.g., on my laptop:
> 
> $ sync; grep -H '' /sys/fs/btrfs/*/allocation/*/total_bytes_pinned
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/data/total_bytes_pinned:48693501952
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/metadata/total_bytes_pinned:-258146304
> /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/system/total_bytes_pinned:0
> 
> I have a patch to fix it that I haven't cleaned up yet, below. Without
> it, Bo's fix will probably cause early ENOSPCs. Dave, should we pull
> Bo's patch out of for-next? In any case, I'll get my fix sent out.
>

While it does result in early ENOSPC, I don't think it's that fix's
problem, but because we don't precisely track this %pinned counter.

It has shown metadata's %total_bytes_pinned that is negative, have you
seen a negative counter for data?

The problem I observed is btrfs_free_tree_block(), which doesn't add
pinned bytes when it's not the last reference, while
__btrfs_free_extent() dec pinned bytes when it's not the last ref.  (I
have made a simple reproducer.)

But for data, looks like we could over-add pinned bytes if a removed
ref got re-added later, I'll try to make a test for this case.

> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index be70d90dfee5..93ffa898df6d 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle 
> *trans,
>  static noinline void
>  update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
>struct btrfs_delayed_ref_node *existing,
> -  struct btrfs_delayed_ref_node *update)
> +  struct btrfs_delayed_ref_node *update,
> +  int *old_ref_mod_ret)

The head ref doesn't seem related to the negative thing, we dec the
counter when running delayed refs, not head ref.

>  {
>   struct btrfs_delayed_ref_head *existing_ref;
>   struct btrfs_delayed_ref_head *ref;
> @@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root 
> *delayed_refs,
>* currently, for refs we just added we know we're a-ok.
>*/
>   old_ref_mod = existing_ref->total_ref_mod;
> + if (old_ref_mod_ret)
> + *old_ref_mod_ret = old_ref_mod;
>   existing->ref_mod += update->ref_mod;
>   existing_ref->total_ref_mod += update->ref_mod;
>  
> @@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>struct btrfs_delayed_ref_node *ref,
>struct btrfs_qgroup_extent_record *qrecord,
>u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
> -  int action, int is_data, int *qrecord_inserted_ret)
> +  int action, int is_data, int *qrecord_inserted_ret,
> +  int *old_ref_mod, int *new_ref_mod)
>  {
>   struct btrfs_delayed_ref_head *existing;
>   struct btrfs_delayed_ref_head *head_ref = NULL;
> @@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   if (existing) {
>   WARN_ON(ref_root && reserved && existing->qgroup_ref_root
>   && existing->qgroup_reserved);
> - update_existing_head_ref(delayed_refs, >node, ref);
> + update_existing_head_ref(delayed_refs, >node, ref,
> +  old_ref_mod);
>   /*
>* we've updated the existing ref, free the newly
>* allocated ref
> @@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>   

Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation

2017-06-05 Thread Goffredo Baroncelli
On 2017-06-05 17:27, Hans van Kranenburg wrote:
> +  * When doing a tree search, we're actually taking a slice from a linear
> +  * search space of 136-bit keys:
> +  *
> +  * Key of the first possible item to be returned:
> +  *   (min_objectid << 72) + (min_type << 64) + min_offset
> +  * Key of the last possible item to be returned:
> +  *   (max_objectid << 72) + (max_type << 64) + max_offset
> +  *


As non English people, I prefer a less verbose and more programmatic form, like:

+* When doing a tree search, we're actually taking a slice from a linear
+* search space of 136-bit keys:
+*
+* A key is returned if 
+*   ((min_objectid << 72) + (min_type << 64) + min_offset  <=
+*(objectid << 72) + (type << 64) + offset))  &&
+*   ((max_objectid << 72) + (max_type << 64) + max_offset >= 
+*(objectid << 72) + (type << 64) + offset))
+*




> +  * [...] In other
> +  * words, they are not used to filter the type or offset of intermediate
> +  * keys encountered.

Even this is correct, I still find a bit complicate to fully understand the 
meaning.

I would prefer to replace "not used" with "not usable"... But as stated above I 
am not a native English people :-)

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size

2017-06-05 Thread Timofey Titovets
2017-06-05 19:10 GMT+03:00 David Sterba :
> On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote:
>> Btrfs already skip store of data where compression didn't
>> free at least one byte. Let's make logic better and make check
>> that compression free at least one sector size
>> because in another case it useless to store this data compressed
>>
>> Signed-off-by: Timofey Titovets 
>> ---
>>  fs/btrfs/inode.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 17cbe930..2793007b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode 
>> *inode,
>>   /*
>>* one last check to make sure the compression is really a
>>* win, compare the page count read with the blocks on disk
>> +  * compression must free at least one sector size
>>*/
>>   total_in = ALIGN(total_in, PAGE_SIZE);
>> - if (total_compressed >= total_in) {
>> + if (total_compressed + blocksize > total_in) {
>
> We're doing aligned calculation here, shouldn't this be >= ?
>
> If total_compressed + blocksize == total_in, we have saved exactly one
> blocksize.

We discussed that already in:
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg64350.html

IIRC: long in short:
- input data size 8192
- output data size 4096

This invertion logic, i.e. check if compression can be skipped, if
comparasion are true -> skip compression.

In case of above check,
old logic:
total_compressed >= total_in
4096 >= 8192 -> will_compress=1
With if blocksize added:
4096+4096 >= 8192 -> will_compress=0
So this must be changed to:
4096+4096 > 8192 -> will_compress=1
Because compression save one blocksize

Also will_compress not used after this code, so in theory this code
can be refactored to be more obvious, like that:
  total_in = ALIGN(total_in, PAGE_SIZE);
- if (total_compressed + blocksize > total_in) {
-  will_compress = 0;
- } else {
+if (total_compressed + blocksize <= total_in) {
  num_bytes = total_in;
   *num_added += 1;

Thanks

-- 
Have a nice day,
Timofey.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PULL] Btrfs fixes for 4.12

2017-06-05 Thread David Sterba
Hi,

please pull the following assorted fixes to 4.12. Thanks.


The following changes since commit 9bcaaea7418d09691f1ffab5c49aacafe3eef9d0:

  btrfs: fix the gfp_mask for the reada_zones radix tree (2017-05-04 16:56:11 
-0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12

for you to fetch changes up to 2eec59665d545ce443184b9c7a32a65b4dde6d4c:

  Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io (2017-06-01 17:00:11 
+0200)


Colin Ian King (1):
  btrfs: fix incorrect error return ret being passed to mapping_set_error

David Sterba (1):
  btrfs: use correct types for page indices in btrfs_page_exists_in_range

Jan Kara (1):
  btrfs: Make flush bios explicitely sync

Jeff Mahoney (2):
  btrfs: fix memory leak in update_space_info failure path
  btrfs: fix race with relocation recovery and fs_root setup

Liu Bo (2):
  Btrfs: skip commit transaction if we don't have enough pinned bytes
  Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io

Qu Wenruo (1):
  btrfs: fiemap: Cache and merge fiemap extent before submit it to user

Su Yue (1):
  btrfs: tree-log.c: Wrong printk information about namelen

 fs/btrfs/dir-item.c|   2 +-
 fs/btrfs/disk-io.c |  10 ++--
 fs/btrfs/extent-tree.c |   9 ++--
 fs/btrfs/extent_io.c   | 126 +++--
 fs/btrfs/inode.c   |   6 +--
 5 files changed, 138 insertions(+), 15 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] btrfs: qgroup: Fix hang when using inode_cache and qgroup

2017-06-05 Thread Goldwyn Rodrigues


On 05/31/2017 03:08 AM, Qu Wenruo wrote:
> Commit 48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT")
> is causing hang, with the following backtrace:
> 
> Call Trace:
>  __schedule+0x374/0xaf0
>  schedule+0x3d/0x90
>  wait_for_commit+0x4a/0x80 [btrfs]
>  ? wake_atomic_t_function+0x60/0x60
>  btrfs_commit_transaction+0xe0/0xa10 [btrfs]  <<< Here
>  ? start_transaction+0xad/0x510 [btrfs]
>  qgroup_reserve+0x1f0/0x350 [btrfs]
>  btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs]
>  ? _raw_spin_unlock+0x27/0x40
>  btrfs_check_data_free_space+0x6d/0xb0 [btrfs]
>  btrfs_delalloc_reserve_space+0x25/0x70 [btrfs]
>  btrfs_save_ino_cache+0x402/0x650 [btrfs]
>  commit_fs_roots+0xb7/0x170 [btrfs]
>  btrfs_commit_transaction+0x425/0xa10 [btrfs] <<< And here
>  qgroup_reserve+0x1f0/0x350 [btrfs]
>  btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs]
>  ? _raw_spin_unlock+0x27/0x40
>  btrfs_check_data_free_space+0x6d/0xb0 [btrfs]
>  btrfs_delalloc_reserve_space+0x25/0x70 [btrfs]
>  btrfs_direct_IO+0x1c5/0x3b0 [btrfs]
>  generic_file_direct_write+0xab/0x150
>  btrfs_file_write_iter+0x243/0x530 [btrfs]
>  __vfs_write+0xc9/0x120
>  vfs_write+0xcb/0x1f0
>  SyS_pwrite64+0x79/0x90
>  entry_SYSCALL_64_fastpath+0x18/0xad
> 
> The problem is that, inode_cache will be written in commit_fs_roots(),
> which is called in btrfs_commit_transaction().
> 
> And when it fails to reserve enough data space, qgroup_reserve() will
> try to call btrfs_commit_transaction() again, then we are waiting for
> ourselves.
> 
> The patch will introduce can_retry parameter for qgroup_reserve(),
> allowing related callers to avoid deadly commit transaction deadlock.
> 
> Now for space cache inode, we will not allow qgroup retry, so it will
> not cause deadlock.
> 
> Fixes: 48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT")
> Cc: Goldwyn Rodrigues 
> Signed-off-by: Qu Wenruo 
> ---
> Commit  48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT")
> is not only causing such deadlock, but also screwing up qgroup reserved
> space for even generic test cases.
> 
> I'm afraid we may need to revert that commit if we can't find a good way
> to fix the newly caused qgroup meta reserved space underflow.
> (Unlike old bug which is qgroup data reserved space underflow, this time
> the commit is causing new metadata space underflow).

I tried the same with direct I/O and have the same results. I run into
underflows often. By reverting the patch, we are avoiding the problem
not resolving it. The numbers don't add up and the point is to find out
where the numbers are getting lost (or counted in excess). I will
continue investigating on this front.

By ignoring the warning (unset BTRFS_DEBUG) and continuing during
overflow, we are just avoiding the problem. It does not show up in dmesg
any longer.


-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size

2017-06-05 Thread David Sterba
On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote:
> Btrfs already skip store of data where compression didn't
> free at least one byte. Let's make logic better and make check
> that compression free at least one sector size
> because in another case it useless to store this data compressed
> 
> Signed-off-by: Timofey Titovets 
> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 17cbe930..2793007b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode 
> *inode,
>   /*
>* one last check to make sure the compression is really a
>* win, compare the page count read with the blocks on disk
> +  * compression must free at least one sector size
>*/
>   total_in = ALIGN(total_in, PAGE_SIZE);
> - if (total_compressed >= total_in) {
> + if (total_compressed + blocksize > total_in) {

We're doing aligned calculation here, shouldn't this be >= ?

If total_compressed + blocksize == total_in, we have saved exactly one
blocksize.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation

2017-06-05 Thread Hans van Kranenburg
On 06/05/2017 05:27 PM, Hans van Kranenburg wrote:
> A programmer who is trying to implement calling the btrfs SEARCH
> or SEARCH_V2 ioctl will probably soon end up reading this struct
> definition.
> 
> Properly document the input fields to prevent common misconceptions:
>  1. The search space is linear, not 3 dimensional.
>  2. The transaction id (a.k.a. generation) filter applies only on
>  transaction id of the last COW operation on a whole metadata page, not
>  on individual items.
> 
> Ad 1. The first misunderstanding was helped by the previous misleading
> comments on min/max type and offset: "keys returned will be
>> = min and <= max".
> 
> Ad 2. For example, running btrfs balance will happily cause rewriting of
> metadata pages that contain a filesystem tree of a read only subvolume,
> causing transids to be increased.
> 
> Signed-off-by: Hans van Kranenburg 
> ---
>  include/uapi/linux/btrfs.h | 63 
> +++---
>  1 file changed, 43 insertions(+), 20 deletions(-)
> 
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index a456e5309238..864ad86c5d80 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -427,30 +427,53 @@ struct btrfs_ioctl_ino_lookup_args {
>  };
>  
>  struct btrfs_ioctl_search_key {
> - /* which root are we searching.  0 is the tree of tree roots */
> - __u64 tree_id;

Since this 0 is incorrect... I also fixed that...

> -
> - /* keys returned will be >= min and <= max */
> - __u64 min_objectid;
> - __u64 max_objectid;
> -
> - /* keys returned will be >= min and <= max */
> - __u64 min_offset;
> - __u64 max_offset;
> -
> - /* max and min transids to search for */
> - __u64 min_transid;
> - __u64 max_transid;
> + /*
> +  * The tree we're searching in. 1 is the tree of tree roots, 2 is the
> +  * extent tree, etc...

But after trying to feed a tree 0 to SEARCH, I got output, while this
tree does not exist at all...

Then I found this, in ioctl.c:

if (sk->tree_id == 0) {
/* search the root of the inode that was passed */
root = BTRFS_I(inode)->root;
}

I'll send an updated patch later to also mention that special case,
which is quite useful to know about actually...

Hans

> +  */
> + __u64 tree_id;  /* in */
>  
> - /* keys returned will be >= min and <= max */
> - __u32 min_type;
> - __u32 max_type;
> + /*
> +  * This struct is used to provide the search key range for the SEARCH 
> and
> +  * SEARCH_V2 ioctls.
> +  *
> +  * When doing a tree search, we're actually taking a slice from a linear
> +  * search space of 136-bit keys:
> +  *
> +  * Key of the first possible item to be returned:
> +  *   (min_objectid << 72) + (min_type << 64) + min_offset
> +  * Key of the last possible item to be returned:
> +  *   (max_objectid << 72) + (max_type << 64) + max_offset
> +  *
> +  * All of the min/max input numbers only define the ultimate lower and
> +  * upper boundary of the keys of items that will be returned. In other
> +  * words, they are not used to filter the type or offset of intermediate
> +  * keys encountered.
> +  *
> +  * Additionally, we can filter the items returned on transaction id of 
> the
> +  * metadata block they're stored in by specifying a transid range.  Be
> +  * aware that this transaction id only denotes when the metadata page 
> that
> +  * currently contains the item got written the last time as result of a 
> COW
> +  * operation.  The number does not have any meaning related to the
> +  * transaction in which an individual item that is being returned was
> +  * created or changed.
> +  */
> + __u64 min_objectid; /* in */
> + __u64 max_objectid; /* in */
> + __u64 min_offset;   /* in */
> + __u64 max_offset;   /* in */
> + __u64 min_transid;  /* in */
> + __u64 max_transid;  /* in */
> + __u32 min_type; /* in */
> + __u32 max_type; /* in */
>  
>   /*
> -  * how many items did userland ask for, and how many are we
> -  * returning
> +  * input: The maximum amount of results desired.
> +  * output: The actual amount of items returned, restricted by either
> +  *   stopping the search when reaching the input nr_items amount of 
> items,
> +  *   or restricted by the size of the supplied memory buffer.
>*/
> - __u32 nr_items;
> + __u32 nr_items; /* in/out */
>  
>   /* align to 64 bits */
>   __u32 unused;
> 


-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/2] Btrfs: lzo.c - compressed data size must be less then input size

2017-06-05 Thread David Sterba
On Tue, May 30, 2017 at 02:18:04AM +0300, Timofey Titovets wrote:
> Logic already return error if compression
> make data bigger, let's sync logic with zlib
> and also return error if compressed size
> are equal to input size
> 
> Signed-off-by: Timofey Titovets 

Reviewed-by: David Sterba 

I've updated the changelog a bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation

2017-06-05 Thread Hans van Kranenburg
A programmer who is trying to implement calling the btrfs SEARCH
or SEARCH_V2 ioctl will probably soon end up reading this struct
definition.

Properly document the input fields to prevent common misconceptions:
 1. The search space is linear, not 3 dimensional.
 2. The transaction id (a.k.a. generation) filter applies only on
 transaction id of the last COW operation on a whole metadata page, not
 on individual items.

Ad 1. The first misunderstanding was helped by the previous misleading
comments on min/max type and offset: "keys returned will be
>= min and <= max".

Ad 2. For example, running btrfs balance will happily cause rewriting of
metadata pages that contain a filesystem tree of a read only subvolume,
causing transids to be increased.

Signed-off-by: Hans van Kranenburg 
---
 include/uapi/linux/btrfs.h | 63 +++---
 1 file changed, 43 insertions(+), 20 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a456e5309238..864ad86c5d80 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -427,30 +427,53 @@ struct btrfs_ioctl_ino_lookup_args {
 };
 
 struct btrfs_ioctl_search_key {
-   /* which root are we searching.  0 is the tree of tree roots */
-   __u64 tree_id;
-
-   /* keys returned will be >= min and <= max */
-   __u64 min_objectid;
-   __u64 max_objectid;
-
-   /* keys returned will be >= min and <= max */
-   __u64 min_offset;
-   __u64 max_offset;
-
-   /* max and min transids to search for */
-   __u64 min_transid;
-   __u64 max_transid;
+   /*
+* The tree we're searching in. 1 is the tree of tree roots, 2 is the
+* extent tree, etc...
+*/
+   __u64 tree_id;  /* in */
 
-   /* keys returned will be >= min and <= max */
-   __u32 min_type;
-   __u32 max_type;
+   /*
+* This struct is used to provide the search key range for the SEARCH 
and
+* SEARCH_V2 ioctls.
+*
+* When doing a tree search, we're actually taking a slice from a linear
+* search space of 136-bit keys:
+*
+* Key of the first possible item to be returned:
+*   (min_objectid << 72) + (min_type << 64) + min_offset
+* Key of the last possible item to be returned:
+*   (max_objectid << 72) + (max_type << 64) + max_offset
+*
+* All of the min/max input numbers only define the ultimate lower and
+* upper boundary of the keys of items that will be returned. In other
+* words, they are not used to filter the type or offset of intermediate
+* keys encountered.
+*
+* Additionally, we can filter the items returned on transaction id of 
the
+* metadata block they're stored in by specifying a transid range.  Be
+* aware that this transaction id only denotes when the metadata page 
that
+* currently contains the item got written the last time as result of a 
COW
+* operation.  The number does not have any meaning related to the
+* transaction in which an individual item that is being returned was
+* created or changed.
+*/
+   __u64 min_objectid; /* in */
+   __u64 max_objectid; /* in */
+   __u64 min_offset;   /* in */
+   __u64 max_offset;   /* in */
+   __u64 min_transid;  /* in */
+   __u64 max_transid;  /* in */
+   __u32 min_type; /* in */
+   __u32 max_type; /* in */
 
/*
-* how many items did userland ask for, and how many are we
-* returning
+* input: The maximum amount of results desired.
+* output: The actual amount of items returned, restricted by either
+*   stopping the search when reaching the input nr_items amount of 
items,
+*   or restricted by the size of the supplied memory buffer.
 */
-   __u32 nr_items;
+   __u32 nr_items; /* in/out */
 
/* align to 64 bits */
__u32 unused;
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: simplify code with bio_io_error

2017-06-05 Thread David Sterba
On Fri, Jun 02, 2017 at 04:08:50PM +0800, Guoqing Jiang wrote:
> bio_io_error was introduced in the commit 4246a0b
> ("block: add a bi_error field to struct bio"), so
> use it to simplify code.
> 
> Signed-off-by: Guoqing Jiang 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 07/10] fs: return on congested block device

2017-06-05 Thread Jens Axboe
On 06/04/2017 11:35 PM, Goldwyn Rodrigues wrote:
> @@ -1900,6 +1905,17 @@ generic_make_request_checks(struct bio *bio)
>   goto end_io;
>   }
>  
> + /*
> +  * For a REQ_NOWAIT based request, return -EOPNOTSUPP
> +  * if queue does not have QUEUE_FLAG_NOWAIT_SUPPORT set
> +  * and if it is not a request based queue.
> +  */
> +
> + if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) {
> + err = -EOPNOTSUPP;
> + goto end_io;
> + }

There is no QUEUE_FLAG_NOWAIT, this looks like a somewhat stale comment.

This patch should be prefixed with 'block', not 'fs'.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps

2017-06-05 Thread David Sterba
On Mon, Jun 05, 2017 at 12:12:31AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> First, instead of open-coding the vmalloc() fallback, use the new
> kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of
> GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which
> could lead to deadlocks.
> 
> Signed-off-by: Omar Sandoval 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 8/9] btrfs: Check namelen before in 'btrfs_del_root_ref'

2017-06-05 Thread David Sterba
On Thu, Jun 01, 2017 at 04:57:15PM +0800, Su Yue wrote:
> Call btrfs_is_namelen_valid before memcmp.
> 
> Signed-off-by: Su Yue 
> ---
>  fs/btrfs/root-tree.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c
> index 7d6bc308bf43..7a5450600723 100644
> --- a/fs/btrfs/root-tree.c
> +++ b/fs/btrfs/root-tree.c
> @@ -390,6 +390,13 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans,
>   WARN_ON(btrfs_root_ref_dirid(leaf, ref) != dirid);
>   WARN_ON(btrfs_root_ref_name_len(leaf, ref) != name_len);
>   ptr = (unsigned long)(ref + 1);
> + ret = btrfs_is_namelen_valid(leaf, path->slots[0], ptr,
> +  name_len);
> + if (!ret) {
> + err = -EIO;

This results in many fstests failures, eg.

[ 1886.766605] run fstests btrfs/008 at 2017-06-05 17:18:52
[ 1897.043952] BTRFS: device fsid 136be123-23f0-4fa7-bcbe-2001c90fb638 devid 1 
transid 5 /dev/sdb6
[ 1897.100674] BTRFS info (device sdb6): disk space caching is enabled
[ 1897.100684] BTRFS info (device sdb6): has skinny extents
[ 1897.100689] BTRFS info (device sdb6): flagging fs with big metadata feature
[ 1897.106492] BTRFS info (device sdb6): detected SSD devices, enabling SSD mode
[ 1897.107360] BTRFS info (device sdb6): creating UUID tree
[ 1897.304521] BTRFS critical (device sdb5): invalid dir item name len: 7
[ 1897.304541] BTRFS: error (device sdb5) in btrfs_unlink_subvol:4244: errno=-5 
IO failure
[ 1897.304548] BTRFS info (device sdb5): forced readonly
[ 1897.304557] BTRFS: error (device sdb5) in btrfs_ioctl_snap_destroy:2508: 
errno=-5 IO failure
[ 1897.465795] BTRFS error (device sdb5): cleaner transaction attach returned 
-30

with current for-next, so I'll remove the branch with namelen validation for 
now.

> + goto out;
> + }
> +
>   WARN_ON(memcmp_extent_buffer(leaf, name, ptr, name_len));
>   *sequence = btrfs_root_ref_sequence(leaf, ref);
>  
> -- 
> 2.13.0
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: snapshot destruction making IO extremely slow

2017-06-05 Thread Jakob Schürz
Am 2017-05-24 um 10:23 schrieb Marat Khalili:
> Hello,
> 
>> It occurs when enabling quotas on a volume. When there are a
>> lot of snapshots that are deleted, the system becomes extremely
>> unresponsive (IO often waiting for 30s on a SSD). When I don't have
>> quotas, removing snapshots is fast.
> Same problem here. It is now common knowledge in the list that qgroups
> cause performance problems. I try to avoid deleting many snapshots at
> once because of this.

I enabled quota a few weeks ago, then my system gets so unresponsible,
that services stopped working and nothing worked anymore... disabling
qouta was also not able, it endet up in a kernel-panic and frozen system.

Hat to reinstall my system back from the backup...

greetings
Jakob

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps

2017-06-05 Thread Omar Sandoval
From: Omar Sandoval 

First, instead of open-coding the vmalloc() fallback, use the new
kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of
GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which
could lead to deadlocks.

Signed-off-by: Omar Sandoval 
---
 fs/btrfs/free-space-tree.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
index fc0bd8406758..e5e92e6ba734 100644
--- a/fs/btrfs/free-space-tree.c
+++ b/fs/btrfs/free-space-tree.c
@@ -17,7 +17,7 @@
  */
 
 #include 
-#include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "locking.h"
@@ -153,21 +153,21 @@ static inline u32 free_space_bitmap_size(u64 size, u32 
sectorsize)
 
 static u8 *alloc_bitmap(u32 bitmap_size)
 {
-   void *mem;
+   u8 *ret;
+   unsigned int nofs_flag;
 
/*
-* The allocation size varies, observed numbers were < 4K up to 16K.
-* Using vmalloc unconditionally would be too heavy, we'll try
-* contiguous allocations first.
+* GFP_NOFS doesn't work with kvmalloc(), but we really can't recurse
+* into the filesystem as the free space bitmap can be modified in the
+* critical section of a transaction commit.
+*
+* TODO: push the memalloc_nofs_{save,restore}() to the caller where we
+* know that recursion is unsafe.
 */
-   if  (bitmap_size <= PAGE_SIZE)
-   return kzalloc(bitmap_size, GFP_NOFS);
-
-   mem = kzalloc(bitmap_size, GFP_NOFS | __GFP_NOWARN);
-   if (mem)
-   return mem;
-
-   return __vmalloc(bitmap_size, GFP_NOFS | __GFP_ZERO, PAGE_KERNEL);
+   nofs_flag = memalloc_nofs_save();
+   ret = kvzalloc(bitmap_size, GFP_KERNEL);
+   memalloc_nofs_restore(nofs_flag);
+   return ret;
 }
 
 int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans,
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html