Re: [PATCH] btrfs-progs: send operates on ro snapshots only
Hans van Kranenburg posted on Tue, 06 Jun 2017 01:32:18 +0200 as excerpted: > While talking to another btrfs user on IRC today, it became clear that a > major point of confusion in the btrfs send manual is that it's not > telling the user soon enough that send/receive solely operates on > subvolume snapshots instead of the original (read/write) subvolumes. > > So, change the first few alineas to explicitely mention snapshots ... lines to explicitly ... > instead. Technically, snapshots are also just subvolumes, but requiring > this level of technical detailed knowledge doesn't help the user which > is just trying out things. [actual patch omitted] LGTM (other than the commit description spelling as above, neither my read-thru nor the spellchecker flagged anything in the actual patch). The new wording is indeed much clearer. Thanks. =:^) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: skip commit transaction if we don't have enough pinned bytes
On Fri, Jun 02, 2017 at 11:14:13AM -0700, Omar Sandoval wrote: > On Fri, May 19, 2017 at 11:39:15AM -0600, Liu Bo wrote: > > We commit transaction in order to reclaim space from pinned bytes because > > it could process delayed refs, and in may_commit_transaction(), we check > > first if pinned bytes are enough for the required space, we then check if > > that plus bytes reserved for delayed insert are enough for the required > > space. > > > > This changes the code to the above logic. > > > > Signed-off-by: Liu Bo> > --- > > fs/btrfs/extent-tree.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > > index e390451c72e6..bded1ddd1bb6 100644 > > --- a/fs/btrfs/extent-tree.c > > +++ b/fs/btrfs/extent-tree.c > > @@ -4837,7 +4837,7 @@ static int may_commit_transaction(struct > > btrfs_fs_info *fs_info, > > > > spin_lock(_rsv->lock); > > if (percpu_counter_compare(_info->total_bytes_pinned, > > - bytes - delayed_rsv->size) >= 0) { > > + bytes - delayed_rsv->size) < 0) { > > spin_unlock(_rsv->lock); > > return -ENOSPC; > > } > > I found this bug in my latest enospc investigation, too. However, the > total_bytes_pinned counter is pretty broken. E.g., on my laptop: > > $ sync; grep -H '' /sys/fs/btrfs/*/allocation/*/total_bytes_pinned > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/data/total_bytes_pinned:48693501952 > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/metadata/total_bytes_pinned:-258146304 > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/system/total_bytes_pinned:0 > > I have a patch to fix it that I haven't cleaned up yet, below. Without > it, Bo's fix will probably cause early ENOSPCs. Dave, should we pull > Bo's patch out of for-next? In any case, I'll get my fix sent out. > When data delayed refs are not merged, __btrfs_free_extent() will dec the pinned as expected, but when they got merged, %pinned bytes only got inc'd but not dec'd because the delayed ref has been deleted due to ref_count == 0. This doesn't happen to tree block ref since tree block ref doesn't have more than one ref, so ref_mod is either 1 or -1, and tree block ref gets removed via btrfs_free_tree_block(), which also needs to be changed. Based on that, I think we can solve the problem by touching btrfs_free_tree_block() and data delayed ref instead of head ref. thanks, -liubo > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c > index be70d90dfee5..93ffa898df6d 100644 > --- a/fs/btrfs/delayed-ref.c > +++ b/fs/btrfs/delayed-ref.c > @@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle > *trans, > static noinline void > update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs, >struct btrfs_delayed_ref_node *existing, > - struct btrfs_delayed_ref_node *update) > + struct btrfs_delayed_ref_node *update, > + int *old_ref_mod_ret) > { > struct btrfs_delayed_ref_head *existing_ref; > struct btrfs_delayed_ref_head *ref; > @@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root > *delayed_refs, >* currently, for refs we just added we know we're a-ok. >*/ > old_ref_mod = existing_ref->total_ref_mod; > + if (old_ref_mod_ret) > + *old_ref_mod_ret = old_ref_mod; > existing->ref_mod += update->ref_mod; > existing_ref->total_ref_mod += update->ref_mod; > > @@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, >struct btrfs_delayed_ref_node *ref, >struct btrfs_qgroup_extent_record *qrecord, >u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved, > - int action, int is_data, int *qrecord_inserted_ret) > + int action, int is_data, int *qrecord_inserted_ret, > + int *old_ref_mod, int *new_ref_mod) > { > struct btrfs_delayed_ref_head *existing; > struct btrfs_delayed_ref_head *head_ref = NULL; > @@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, > if (existing) { > WARN_ON(ref_root && reserved && existing->qgroup_ref_root > && existing->qgroup_reserved); > - update_existing_head_ref(delayed_refs, >node, ref); > + update_existing_head_ref(delayed_refs, >node, ref, > + old_ref_mod); > /* >* we've updated the existing ref, free the newly >* allocated ref > @@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, > kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref); > head_ref = existing; > } else { > + if
device-mapper btrfs Warning Spam
Hello all, I have a btrfs raid-0 array (initially a raid-6 but long ago converted due to warnings not to use raid56, not that I think this impacts my situation). One of the disks in the array started to go bad. Some bad sectors meant one of my video files was unable to be read. This issue was discovered during a btrfs scrub. By deleting the offending file occupying the bad sector, I was able to successfully remove the offending disk from the array. Replacement disk is in and no issues, restored the file from backup. Great. My problem now is the following warning message repeatedly spamming my system logs every minute (even after fresh reboot and everything with the disk management settled). "kernel: device-mapper: core: btrfs: sending ioctl 5331 to DM device without required privilege." I have never seen the error before the disk issue, and I have confirmed it only started after a reboot after the disk had started going bad. I've not been able to figure out any other factors that maybe contributing. Everything appears functional and without issue, and I see no consequences of these messages in system performance or behaviour. I'm running fully updated Fedora 25 which is using btrfs-progs version 4.6.1-1.fc25 on kernel 4.11.3-200.fc25.x86_64. Has anyone ever seen this issue? Any help or advice would be greatly appreciated, thank you. Regards, Steven -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] btrfs-progs: send operates on ro snapshots only
While talking to another btrfs user on IRC today, it became clear that a major point of confusion in the btrfs send manual is that it's not telling the user soon enough that send/receive solely operates on subvolume snapshots instead of the original (read/write) subvolumes. So, change the first few alineas to explicitely mention snapshots instead. Technically, snapshots are also just subvolumes, but requiring this level of technical detailed knowledge doesn't help the user which is just trying out things. Signed-off-by: Hans van Kranenburg--- Documentation/btrfs-send.asciidoc | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/Documentation/btrfs-send.asciidoc b/Documentation/btrfs-send.asciidoc index 1c5ccbe6..ef345f68 100644 --- a/Documentation/btrfs-send.asciidoc +++ b/Documentation/btrfs-send.asciidoc @@ -3,7 +3,7 @@ btrfs-send(8) NAME -btrfs-send - generate a stream of changes between two subvolumes +btrfs-send - generate a stream of changes between two subvolume snapshots SYNOPSIS @@ -13,20 +13,21 @@ DESCRIPTION --- This command will generate a stream of instructions that describe changes -between two subvolumes. The stream can be consumed by the *btrfs receive* -command to replicate the sent subvolume on a different filesystem. +between two subvolume snapshots. The stream can be consumed by the *btrfs +receive* command to replicate the sent snapshot on a different filesystem. The command operates in two modes: full and incremental. -All subvolumes involved in one send command must be read-only (ie. the -read-only snapshots and this status cannot be changed if there's a running send -operation that uses the subvolume). +All snapshots involved in one send command must be read-only, and this status +cannot be changed as long as there's a running send operation that uses the +snapshot. -In the full mode, the entire subvolume data and metadata will end up in the +In the full mode, the entire snapshot data and metadata will end up in the stream. -In the incremental mode (options '-p' and '-c'), there can be one or more -parent subvolumes that will establish the base for determining the changes. -The final stream will be smaller compared to the full mode. +In the incremental mode (options '-p' and '-c'), previously sent snapshots that +are available on both the sending and receiving side can be used to reduce the +amount of information that has to be sent to reconstruct the sent snapshot on a +different filesystem. It is allowed to omit the '-p ' option when '-c ' options are given, in which case *btrfs send* will determine a suitable parent among the -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Btrfs: btrfs_ioctl_search_key documentation
A programmer who is trying to implement calling the btrfs SEARCH or SEARCH_V2 ioctl will probably soon end up reading this struct definition. Properly document the input fields to prevent common misconceptions: 1. The search space is linear, not 3 dimensional. The invidual min/max values for objectid, type and offset cannot be used to filter the result, they only define the endpoints of an interval. 2. The transaction id (a.k.a. generation) filter applies only on transaction id of the last COW operation on a whole metadata page, not on individual items. Ad 1. The first misunderstanding was helped by the previous misleading comments on min/max type and offset: "keys returned will be >= min and <= max". Ad 2. For example, running btrfs balance will happily cause rewriting of metadata pages that contain a filesystem tree of a read only subvolume, causing transids to be increased. Also, improve descriptions of tree_id and nr_items and add in/out annotations. Signed-off-by: Hans van Kranenburg--- Most interesting changes since v1: - mention the special tree_id input value 0 - rewrite the part about min_key and max_key, trying to be more concise Less interesting changes since v1: - the first line of the commit message was 51 characters long - a > ended up at the beginning of the line in the commit message, messing up the >= notation in some mail programs include/uapi/linux/btrfs.h | 62 +++--- 1 file changed, 42 insertions(+), 20 deletions(-) diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index a456e5309238..88ae3d096a21 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -426,31 +426,53 @@ struct btrfs_ioctl_ino_lookup_args { char name[BTRFS_INO_LOOKUP_PATH_MAX]; }; +/* Search criteria for the btrfs SEARCH ioctl family. */ struct btrfs_ioctl_search_key { - /* which root are we searching. 0 is the tree of tree roots */ - __u64 tree_id; - - /* keys returned will be >= min and <= max */ - __u64 min_objectid; - __u64 max_objectid; - - /* keys returned will be >= min and <= max */ - __u64 min_offset; - __u64 max_offset; - - /* max and min transids to search for */ - __u64 min_transid; - __u64 max_transid; + /* +* The tree we're searching in. 1 is the tree of tree roots, 2 is the +* extent tree, etc... +* +* A special tree_id value of 0 will cause a search in the subvolume tree +* that the inode which is passed to the ioctl is part of. +*/ + __u64 tree_id; /* in */ - /* keys returned will be >= min and <= max */ - __u32 min_type; - __u32 max_type; + /* +* When doing a tree search, we're actually taking a slice from a linear +* search space of 136-bit keys. +* +* A full 136-bit tree key is composed as: +* (objectid << 72) + (type << 64) + offset +* +* The individual min and max values for objectid, type and offset define +* the min_key and max_key values for the search range. All metadata items +* with a key in the interval [min_key, max_key] will be returned. +* +* Additionally, we can filter the items returned on transaction id of the +* metadata block they're stored in by specifying a transid range. Be +* aware that this transaction id only denotes when the metadata page that +* currently contains the item got written the last time as result of a COW +* operation. The number does not have any meaning related to the +* transaction in which an individual item that is being returned was +* created or changed. +*/ + __u64 min_objectid; /* in */ + __u64 max_objectid; /* in */ + __u64 min_offset; /* in */ + __u64 max_offset; /* in */ + __u64 min_transid; /* in */ + __u64 max_transid; /* in */ + __u32 min_type; /* in */ + __u32 max_type; /* in */ /* -* how many items did userland ask for, and how many are we -* returning +* input: The maximum amount of results desired. +* output: The actual amount of items returned, restricted by any of: +* - reaching the upper bound of the search range +* - reaching the input nr_items amount of items +* - completely filling the supplied memory buffer */ - __u32 nr_items; + __u32 nr_items; /* in/out */ /* align to 64 bits */ __u32 unused; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation
On 06/05/2017 09:00 PM, Goffredo Baroncelli wrote: > On 2017-06-05 17:27, Hans van Kranenburg wrote: >> + * When doing a tree search, we're actually taking a slice from a linear >> + * search space of 136-bit keys: >> + * >> + * Key of the first possible item to be returned: >> + * (min_objectid << 72) + (min_type << 64) + min_offset >> + * Key of the last possible item to be returned: >> + * (max_objectid << 72) + (max_type << 64) + max_offset >> + * > As non English people, I prefer a less verbose [...] Yeah, it's a bit meh... I started to change the text again and ended up rewriting it in a different way for patch V2 (sending in a minute). > [...] and more programmatic form, like: > > + * When doing a tree search, we're actually taking a slice from a linear > + * search space of 136-bit keys: > +* > + * A key is returned if > + * ((min_objectid << 72) + (min_type << 64) + min_offset <= > +*(objectid << 72) + (type << 64) + offset)) && > + * ((max_objectid << 72) + (max_type << 64) + max_offset >= > +*(objectid << 72) + (type << 64) + offset)) > +* TBH, these lines mostly have an effect of dancing around before my eyes. The point is, the search starts somewhere, end it ends somewhere. All intermediate objects are returned. The min/max values are not applied as a check to every key found in that range again. This way of explaining ("is returned if") adds to that wrong idea again imho. >> + * [...] In other >> + * words, they are not used to filter the type or offset of intermediate >> + * keys encountered. > > Even this is correct, I still find a bit complicate to fully understand the > meaning. > > I would prefer to replace "not used" with "not usable"... But as stated above > I am not a native English people :-) I'm dutch. ;) But for the user, using usable instead of used is nice indeed, because it provides something that can be acted on, instead of having something somewhere that "uses" it and apparently makes decisions about what it does for some reason. Anyway, in the rewrite of that part above, it's gone. -- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: skip commit transaction if we don't have enough pinned bytes
On Fri, Jun 02, 2017 at 11:14:13AM -0700, Omar Sandoval wrote: > On Fri, May 19, 2017 at 11:39:15AM -0600, Liu Bo wrote: > > We commit transaction in order to reclaim space from pinned bytes because > > it could process delayed refs, and in may_commit_transaction(), we check > > first if pinned bytes are enough for the required space, we then check if > > that plus bytes reserved for delayed insert are enough for the required > > space. > > > > This changes the code to the above logic. > > > > Signed-off-by: Liu Bo> > --- > > fs/btrfs/extent-tree.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > > index e390451c72e6..bded1ddd1bb6 100644 > > --- a/fs/btrfs/extent-tree.c > > +++ b/fs/btrfs/extent-tree.c > > @@ -4837,7 +4837,7 @@ static int may_commit_transaction(struct > > btrfs_fs_info *fs_info, > > > > spin_lock(_rsv->lock); > > if (percpu_counter_compare(_info->total_bytes_pinned, > > - bytes - delayed_rsv->size) >= 0) { > > + bytes - delayed_rsv->size) < 0) { > > spin_unlock(_rsv->lock); > > return -ENOSPC; > > } > > I found this bug in my latest enospc investigation, too. However, the > total_bytes_pinned counter is pretty broken. E.g., on my laptop: > > $ sync; grep -H '' /sys/fs/btrfs/*/allocation/*/total_bytes_pinned > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/data/total_bytes_pinned:48693501952 > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/metadata/total_bytes_pinned:-258146304 > /sys/fs/btrfs/f31cc926-37d3-442d-b50a-10c62d47badc/allocation/system/total_bytes_pinned:0 > > I have a patch to fix it that I haven't cleaned up yet, below. Without > it, Bo's fix will probably cause early ENOSPCs. Dave, should we pull > Bo's patch out of for-next? In any case, I'll get my fix sent out. > While it does result in early ENOSPC, I don't think it's that fix's problem, but because we don't precisely track this %pinned counter. It has shown metadata's %total_bytes_pinned that is negative, have you seen a negative counter for data? The problem I observed is btrfs_free_tree_block(), which doesn't add pinned bytes when it's not the last reference, while __btrfs_free_extent() dec pinned bytes when it's not the last ref. (I have made a simple reproducer.) But for data, looks like we could over-add pinned bytes if a removed ref got re-added later, I'll try to make a test for this case. > diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c > index be70d90dfee5..93ffa898df6d 100644 > --- a/fs/btrfs/delayed-ref.c > +++ b/fs/btrfs/delayed-ref.c > @@ -470,7 +470,8 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle > *trans, > static noinline void > update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs, >struct btrfs_delayed_ref_node *existing, > - struct btrfs_delayed_ref_node *update) > + struct btrfs_delayed_ref_node *update, > + int *old_ref_mod_ret) The head ref doesn't seem related to the negative thing, we dec the counter when running delayed refs, not head ref. > { > struct btrfs_delayed_ref_head *existing_ref; > struct btrfs_delayed_ref_head *ref; > @@ -523,6 +524,8 @@ update_existing_head_ref(struct btrfs_delayed_ref_root > *delayed_refs, >* currently, for refs we just added we know we're a-ok. >*/ > old_ref_mod = existing_ref->total_ref_mod; > + if (old_ref_mod_ret) > + *old_ref_mod_ret = old_ref_mod; > existing->ref_mod += update->ref_mod; > existing_ref->total_ref_mod += update->ref_mod; > > @@ -550,7 +553,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, >struct btrfs_delayed_ref_node *ref, >struct btrfs_qgroup_extent_record *qrecord, >u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved, > - int action, int is_data, int *qrecord_inserted_ret) > + int action, int is_data, int *qrecord_inserted_ret, > + int *old_ref_mod, int *new_ref_mod) > { > struct btrfs_delayed_ref_head *existing; > struct btrfs_delayed_ref_head *head_ref = NULL; > @@ -638,7 +642,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, > if (existing) { > WARN_ON(ref_root && reserved && existing->qgroup_ref_root > && existing->qgroup_reserved); > - update_existing_head_ref(delayed_refs, >node, ref); > + update_existing_head_ref(delayed_refs, >node, ref, > + old_ref_mod); > /* >* we've updated the existing ref, free the newly >* allocated ref > @@ -646,6 +651,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info, >
Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation
On 2017-06-05 17:27, Hans van Kranenburg wrote: > + * When doing a tree search, we're actually taking a slice from a linear > + * search space of 136-bit keys: > + * > + * Key of the first possible item to be returned: > + * (min_objectid << 72) + (min_type << 64) + min_offset > + * Key of the last possible item to be returned: > + * (max_objectid << 72) + (max_type << 64) + max_offset > + * As non English people, I prefer a less verbose and more programmatic form, like: +* When doing a tree search, we're actually taking a slice from a linear +* search space of 136-bit keys: +* +* A key is returned if +* ((min_objectid << 72) + (min_type << 64) + min_offset <= +*(objectid << 72) + (type << 64) + offset)) && +* ((max_objectid << 72) + (max_type << 64) + max_offset >= +*(objectid << 72) + (type << 64) + offset)) +* > + * [...] In other > + * words, they are not used to filter the type or offset of intermediate > + * keys encountered. Even this is correct, I still find a bit complicate to fully understand the meaning. I would prefer to replace "not used" with "not usable"... But as stated above I am not a native English people :-) BR G.Baroncelli -- gpg @keyserver.linux.it: Goffredo Baroncelli Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size
2017-06-05 19:10 GMT+03:00 David Sterba: > On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote: >> Btrfs already skip store of data where compression didn't >> free at least one byte. Let's make logic better and make check >> that compression free at least one sector size >> because in another case it useless to store this data compressed >> >> Signed-off-by: Timofey Titovets >> --- >> fs/btrfs/inode.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index 17cbe930..2793007b 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode >> *inode, >> /* >>* one last check to make sure the compression is really a >>* win, compare the page count read with the blocks on disk >> + * compression must free at least one sector size >>*/ >> total_in = ALIGN(total_in, PAGE_SIZE); >> - if (total_compressed >= total_in) { >> + if (total_compressed + blocksize > total_in) { > > We're doing aligned calculation here, shouldn't this be >= ? > > If total_compressed + blocksize == total_in, we have saved exactly one > blocksize. We discussed that already in: https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg64350.html IIRC: long in short: - input data size 8192 - output data size 4096 This invertion logic, i.e. check if compression can be skipped, if comparasion are true -> skip compression. In case of above check, old logic: total_compressed >= total_in 4096 >= 8192 -> will_compress=1 With if blocksize added: 4096+4096 >= 8192 -> will_compress=0 So this must be changed to: 4096+4096 > 8192 -> will_compress=1 Because compression save one blocksize Also will_compress not used after this code, so in theory this code can be refactored to be more obvious, like that: total_in = ALIGN(total_in, PAGE_SIZE); - if (total_compressed + blocksize > total_in) { - will_compress = 0; - } else { +if (total_compressed + blocksize <= total_in) { num_bytes = total_in; *num_added += 1; Thanks -- Have a nice day, Timofey. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL] Btrfs fixes for 4.12
Hi, please pull the following assorted fixes to 4.12. Thanks. The following changes since commit 9bcaaea7418d09691f1ffab5c49aacafe3eef9d0: btrfs: fix the gfp_mask for the reada_zones radix tree (2017-05-04 16:56:11 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12 for you to fetch changes up to 2eec59665d545ce443184b9c7a32a65b4dde6d4c: Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io (2017-06-01 17:00:11 +0200) Colin Ian King (1): btrfs: fix incorrect error return ret being passed to mapping_set_error David Sterba (1): btrfs: use correct types for page indices in btrfs_page_exists_in_range Jan Kara (1): btrfs: Make flush bios explicitely sync Jeff Mahoney (2): btrfs: fix memory leak in update_space_info failure path btrfs: fix race with relocation recovery and fs_root setup Liu Bo (2): Btrfs: skip commit transaction if we don't have enough pinned bytes Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io Qu Wenruo (1): btrfs: fiemap: Cache and merge fiemap extent before submit it to user Su Yue (1): btrfs: tree-log.c: Wrong printk information about namelen fs/btrfs/dir-item.c| 2 +- fs/btrfs/disk-io.c | 10 ++-- fs/btrfs/extent-tree.c | 9 ++-- fs/btrfs/extent_io.c | 126 +++-- fs/btrfs/inode.c | 6 +-- 5 files changed, 138 insertions(+), 15 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] btrfs: qgroup: Fix hang when using inode_cache and qgroup
On 05/31/2017 03:08 AM, Qu Wenruo wrote: > Commit 48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT") > is causing hang, with the following backtrace: > > Call Trace: > __schedule+0x374/0xaf0 > schedule+0x3d/0x90 > wait_for_commit+0x4a/0x80 [btrfs] > ? wake_atomic_t_function+0x60/0x60 > btrfs_commit_transaction+0xe0/0xa10 [btrfs] <<< Here > ? start_transaction+0xad/0x510 [btrfs] > qgroup_reserve+0x1f0/0x350 [btrfs] > btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs] > ? _raw_spin_unlock+0x27/0x40 > btrfs_check_data_free_space+0x6d/0xb0 [btrfs] > btrfs_delalloc_reserve_space+0x25/0x70 [btrfs] > btrfs_save_ino_cache+0x402/0x650 [btrfs] > commit_fs_roots+0xb7/0x170 [btrfs] > btrfs_commit_transaction+0x425/0xa10 [btrfs] <<< And here > qgroup_reserve+0x1f0/0x350 [btrfs] > btrfs_qgroup_reserve_data+0xf8/0x2f0 [btrfs] > ? _raw_spin_unlock+0x27/0x40 > btrfs_check_data_free_space+0x6d/0xb0 [btrfs] > btrfs_delalloc_reserve_space+0x25/0x70 [btrfs] > btrfs_direct_IO+0x1c5/0x3b0 [btrfs] > generic_file_direct_write+0xab/0x150 > btrfs_file_write_iter+0x243/0x530 [btrfs] > __vfs_write+0xc9/0x120 > vfs_write+0xcb/0x1f0 > SyS_pwrite64+0x79/0x90 > entry_SYSCALL_64_fastpath+0x18/0xad > > The problem is that, inode_cache will be written in commit_fs_roots(), > which is called in btrfs_commit_transaction(). > > And when it fails to reserve enough data space, qgroup_reserve() will > try to call btrfs_commit_transaction() again, then we are waiting for > ourselves. > > The patch will introduce can_retry parameter for qgroup_reserve(), > allowing related callers to avoid deadly commit transaction deadlock. > > Now for space cache inode, we will not allow qgroup retry, so it will > not cause deadlock. > > Fixes: 48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT") > Cc: Goldwyn Rodrigues> Signed-off-by: Qu Wenruo > --- > Commit 48a89bc4f2ce ("btrfs: qgroups: Retry after commit on getting EDQUOT") > is not only causing such deadlock, but also screwing up qgroup reserved > space for even generic test cases. > > I'm afraid we may need to revert that commit if we can't find a good way > to fix the newly caused qgroup meta reserved space underflow. > (Unlike old bug which is qgroup data reserved space underflow, this time > the commit is causing new metadata space underflow). I tried the same with direct I/O and have the same results. I run into underflows often. By reverting the patch, we are avoiding the problem not resolving it. The numbers don't add up and the point is to find out where the numbers are getting lost (or counted in excess). I will continue investigating on this front. By ignoring the warning (unset BTRFS_DEBUG) and continuing during overflow, we are just avoiding the problem. It does not show up in dmesg any longer. -- Goldwyn -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/2] Btrfs: compression must free at least one sector size
On Tue, May 30, 2017 at 02:18:05AM +0300, Timofey Titovets wrote: > Btrfs already skip store of data where compression didn't > free at least one byte. Let's make logic better and make check > that compression free at least one sector size > because in another case it useless to store this data compressed > > Signed-off-by: Timofey Titovets> --- > fs/btrfs/inode.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 17cbe930..2793007b 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -609,9 +609,10 @@ static noinline void compress_file_range(struct inode > *inode, > /* >* one last check to make sure the compression is really a >* win, compare the page count read with the blocks on disk > + * compression must free at least one sector size >*/ > total_in = ALIGN(total_in, PAGE_SIZE); > - if (total_compressed >= total_in) { > + if (total_compressed + blocksize > total_in) { We're doing aligned calculation here, shouldn't this be >= ? If total_compressed + blocksize == total_in, we have saved exactly one blocksize. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation
On 06/05/2017 05:27 PM, Hans van Kranenburg wrote: > A programmer who is trying to implement calling the btrfs SEARCH > or SEARCH_V2 ioctl will probably soon end up reading this struct > definition. > > Properly document the input fields to prevent common misconceptions: > 1. The search space is linear, not 3 dimensional. > 2. The transaction id (a.k.a. generation) filter applies only on > transaction id of the last COW operation on a whole metadata page, not > on individual items. > > Ad 1. The first misunderstanding was helped by the previous misleading > comments on min/max type and offset: "keys returned will be >> = min and <= max". > > Ad 2. For example, running btrfs balance will happily cause rewriting of > metadata pages that contain a filesystem tree of a read only subvolume, > causing transids to be increased. > > Signed-off-by: Hans van Kranenburg> --- > include/uapi/linux/btrfs.h | 63 > +++--- > 1 file changed, 43 insertions(+), 20 deletions(-) > > diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h > index a456e5309238..864ad86c5d80 100644 > --- a/include/uapi/linux/btrfs.h > +++ b/include/uapi/linux/btrfs.h > @@ -427,30 +427,53 @@ struct btrfs_ioctl_ino_lookup_args { > }; > > struct btrfs_ioctl_search_key { > - /* which root are we searching. 0 is the tree of tree roots */ > - __u64 tree_id; Since this 0 is incorrect... I also fixed that... > - > - /* keys returned will be >= min and <= max */ > - __u64 min_objectid; > - __u64 max_objectid; > - > - /* keys returned will be >= min and <= max */ > - __u64 min_offset; > - __u64 max_offset; > - > - /* max and min transids to search for */ > - __u64 min_transid; > - __u64 max_transid; > + /* > + * The tree we're searching in. 1 is the tree of tree roots, 2 is the > + * extent tree, etc... But after trying to feed a tree 0 to SEARCH, I got output, while this tree does not exist at all... Then I found this, in ioctl.c: if (sk->tree_id == 0) { /* search the root of the inode that was passed */ root = BTRFS_I(inode)->root; } I'll send an updated patch later to also mention that special case, which is quite useful to know about actually... Hans > + */ > + __u64 tree_id; /* in */ > > - /* keys returned will be >= min and <= max */ > - __u32 min_type; > - __u32 max_type; > + /* > + * This struct is used to provide the search key range for the SEARCH > and > + * SEARCH_V2 ioctls. > + * > + * When doing a tree search, we're actually taking a slice from a linear > + * search space of 136-bit keys: > + * > + * Key of the first possible item to be returned: > + * (min_objectid << 72) + (min_type << 64) + min_offset > + * Key of the last possible item to be returned: > + * (max_objectid << 72) + (max_type << 64) + max_offset > + * > + * All of the min/max input numbers only define the ultimate lower and > + * upper boundary of the keys of items that will be returned. In other > + * words, they are not used to filter the type or offset of intermediate > + * keys encountered. > + * > + * Additionally, we can filter the items returned on transaction id of > the > + * metadata block they're stored in by specifying a transid range. Be > + * aware that this transaction id only denotes when the metadata page > that > + * currently contains the item got written the last time as result of a > COW > + * operation. The number does not have any meaning related to the > + * transaction in which an individual item that is being returned was > + * created or changed. > + */ > + __u64 min_objectid; /* in */ > + __u64 max_objectid; /* in */ > + __u64 min_offset; /* in */ > + __u64 max_offset; /* in */ > + __u64 min_transid; /* in */ > + __u64 max_transid; /* in */ > + __u32 min_type; /* in */ > + __u32 max_type; /* in */ > > /* > - * how many items did userland ask for, and how many are we > - * returning > + * input: The maximum amount of results desired. > + * output: The actual amount of items returned, restricted by either > + * stopping the search when reaching the input nr_items amount of > items, > + * or restricted by the size of the supplied memory buffer. >*/ > - __u32 nr_items; > + __u32 nr_items; /* in/out */ > > /* align to 64 bits */ > __u32 unused; > -- Hans van Kranenburg -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 1/2] Btrfs: lzo.c - compressed data size must be less then input size
On Tue, May 30, 2017 at 02:18:04AM +0300, Timofey Titovets wrote: > Logic already return error if compression > make data bigger, let's sync logic with zlib > and also return error if compressed size > are equal to input size > > Signed-off-by: Timofey TitovetsReviewed-by: David Sterba I've updated the changelog a bit. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: Improve btrfs_ioctl_search_key documentation
A programmer who is trying to implement calling the btrfs SEARCH or SEARCH_V2 ioctl will probably soon end up reading this struct definition. Properly document the input fields to prevent common misconceptions: 1. The search space is linear, not 3 dimensional. 2. The transaction id (a.k.a. generation) filter applies only on transaction id of the last COW operation on a whole metadata page, not on individual items. Ad 1. The first misunderstanding was helped by the previous misleading comments on min/max type and offset: "keys returned will be >= min and <= max". Ad 2. For example, running btrfs balance will happily cause rewriting of metadata pages that contain a filesystem tree of a read only subvolume, causing transids to be increased. Signed-off-by: Hans van Kranenburg--- include/uapi/linux/btrfs.h | 63 +++--- 1 file changed, 43 insertions(+), 20 deletions(-) diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h index a456e5309238..864ad86c5d80 100644 --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -427,30 +427,53 @@ struct btrfs_ioctl_ino_lookup_args { }; struct btrfs_ioctl_search_key { - /* which root are we searching. 0 is the tree of tree roots */ - __u64 tree_id; - - /* keys returned will be >= min and <= max */ - __u64 min_objectid; - __u64 max_objectid; - - /* keys returned will be >= min and <= max */ - __u64 min_offset; - __u64 max_offset; - - /* max and min transids to search for */ - __u64 min_transid; - __u64 max_transid; + /* +* The tree we're searching in. 1 is the tree of tree roots, 2 is the +* extent tree, etc... +*/ + __u64 tree_id; /* in */ - /* keys returned will be >= min and <= max */ - __u32 min_type; - __u32 max_type; + /* +* This struct is used to provide the search key range for the SEARCH and +* SEARCH_V2 ioctls. +* +* When doing a tree search, we're actually taking a slice from a linear +* search space of 136-bit keys: +* +* Key of the first possible item to be returned: +* (min_objectid << 72) + (min_type << 64) + min_offset +* Key of the last possible item to be returned: +* (max_objectid << 72) + (max_type << 64) + max_offset +* +* All of the min/max input numbers only define the ultimate lower and +* upper boundary of the keys of items that will be returned. In other +* words, they are not used to filter the type or offset of intermediate +* keys encountered. +* +* Additionally, we can filter the items returned on transaction id of the +* metadata block they're stored in by specifying a transid range. Be +* aware that this transaction id only denotes when the metadata page that +* currently contains the item got written the last time as result of a COW +* operation. The number does not have any meaning related to the +* transaction in which an individual item that is being returned was +* created or changed. +*/ + __u64 min_objectid; /* in */ + __u64 max_objectid; /* in */ + __u64 min_offset; /* in */ + __u64 max_offset; /* in */ + __u64 min_transid; /* in */ + __u64 max_transid; /* in */ + __u32 min_type; /* in */ + __u32 max_type; /* in */ /* -* how many items did userland ask for, and how many are we -* returning +* input: The maximum amount of results desired. +* output: The actual amount of items returned, restricted by either +* stopping the search when reaching the input nr_items amount of items, +* or restricted by the size of the supplied memory buffer. */ - __u32 nr_items; + __u32 nr_items; /* in/out */ /* align to 64 bits */ __u32 unused; -- 2.11.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: simplify code with bio_io_error
On Fri, Jun 02, 2017 at 04:08:50PM +0800, Guoqing Jiang wrote: > bio_io_error was introduced in the commit 4246a0b > ("block: add a bi_error field to struct bio"), so > use it to simplify code. > > Signed-off-by: Guoqing JiangReviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 07/10] fs: return on congested block device
On 06/04/2017 11:35 PM, Goldwyn Rodrigues wrote: > @@ -1900,6 +1905,17 @@ generic_make_request_checks(struct bio *bio) > goto end_io; > } > > + /* > + * For a REQ_NOWAIT based request, return -EOPNOTSUPP > + * if queue does not have QUEUE_FLAG_NOWAIT_SUPPORT set > + * and if it is not a request based queue. > + */ > + > + if ((bio->bi_opf & REQ_NOWAIT) && !queue_is_rq_based(q)) { > + err = -EOPNOTSUPP; > + goto end_io; > + } There is no QUEUE_FLAG_NOWAIT, this looks like a somewhat stale comment. This patch should be prefixed with 'block', not 'fs'. -- Jens Axboe -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps
On Mon, Jun 05, 2017 at 12:12:31AM -0700, Omar Sandoval wrote: > From: Omar Sandoval> > First, instead of open-coding the vmalloc() fallback, use the new > kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of > GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which > could lead to deadlocks. > > Signed-off-by: Omar Sandoval Reviewed-by: David Sterba -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 8/9] btrfs: Check namelen before in 'btrfs_del_root_ref'
On Thu, Jun 01, 2017 at 04:57:15PM +0800, Su Yue wrote: > Call btrfs_is_namelen_valid before memcmp. > > Signed-off-by: Su Yue> --- > fs/btrfs/root-tree.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/fs/btrfs/root-tree.c b/fs/btrfs/root-tree.c > index 7d6bc308bf43..7a5450600723 100644 > --- a/fs/btrfs/root-tree.c > +++ b/fs/btrfs/root-tree.c > @@ -390,6 +390,13 @@ int btrfs_del_root_ref(struct btrfs_trans_handle *trans, > WARN_ON(btrfs_root_ref_dirid(leaf, ref) != dirid); > WARN_ON(btrfs_root_ref_name_len(leaf, ref) != name_len); > ptr = (unsigned long)(ref + 1); > + ret = btrfs_is_namelen_valid(leaf, path->slots[0], ptr, > + name_len); > + if (!ret) { > + err = -EIO; This results in many fstests failures, eg. [ 1886.766605] run fstests btrfs/008 at 2017-06-05 17:18:52 [ 1897.043952] BTRFS: device fsid 136be123-23f0-4fa7-bcbe-2001c90fb638 devid 1 transid 5 /dev/sdb6 [ 1897.100674] BTRFS info (device sdb6): disk space caching is enabled [ 1897.100684] BTRFS info (device sdb6): has skinny extents [ 1897.100689] BTRFS info (device sdb6): flagging fs with big metadata feature [ 1897.106492] BTRFS info (device sdb6): detected SSD devices, enabling SSD mode [ 1897.107360] BTRFS info (device sdb6): creating UUID tree [ 1897.304521] BTRFS critical (device sdb5): invalid dir item name len: 7 [ 1897.304541] BTRFS: error (device sdb5) in btrfs_unlink_subvol:4244: errno=-5 IO failure [ 1897.304548] BTRFS info (device sdb5): forced readonly [ 1897.304557] BTRFS: error (device sdb5) in btrfs_ioctl_snap_destroy:2508: errno=-5 IO failure [ 1897.465795] BTRFS error (device sdb5): cleaner transaction attach returned -30 with current for-next, so I'll remove the branch with namelen validation for now. > + goto out; > + } > + > WARN_ON(memcmp_extent_buffer(leaf, name, ptr, name_len)); > *sequence = btrfs_root_ref_sequence(leaf, ref); > > -- > 2.13.0 > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: snapshot destruction making IO extremely slow
Am 2017-05-24 um 10:23 schrieb Marat Khalili: > Hello, > >> It occurs when enabling quotas on a volume. When there are a >> lot of snapshots that are deleted, the system becomes extremely >> unresponsive (IO often waiting for 30s on a SSD). When I don't have >> quotas, removing snapshots is fast. > Same problem here. It is now common knowledge in the list that qgroups > cause performance problems. I try to avoid deleting many snapshots at > once because of this. I enabled quota a few weeks ago, then my system gets so unresponsible, that services stopped working and nothing worked anymore... disabling qouta was also not able, it endet up in a kernel-panic and frozen system. Hat to reinstall my system back from the backup... greetings Jakob -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Btrfs: use memalloc_nofs and kvzalloc() for free space tree bitmaps
From: Omar SandovalFirst, instead of open-coding the vmalloc() fallback, use the new kvzalloc() helper. Second, use memalloc_nofs_{save,restore}() instead of GFP_NOFS, as vmalloc() uses some GFP_KERNEL allocations internally which could lead to deadlocks. Signed-off-by: Omar Sandoval --- fs/btrfs/free-space-tree.c | 26 +- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c index fc0bd8406758..e5e92e6ba734 100644 --- a/fs/btrfs/free-space-tree.c +++ b/fs/btrfs/free-space-tree.c @@ -17,7 +17,7 @@ */ #include -#include +#include #include "ctree.h" #include "disk-io.h" #include "locking.h" @@ -153,21 +153,21 @@ static inline u32 free_space_bitmap_size(u64 size, u32 sectorsize) static u8 *alloc_bitmap(u32 bitmap_size) { - void *mem; + u8 *ret; + unsigned int nofs_flag; /* -* The allocation size varies, observed numbers were < 4K up to 16K. -* Using vmalloc unconditionally would be too heavy, we'll try -* contiguous allocations first. +* GFP_NOFS doesn't work with kvmalloc(), but we really can't recurse +* into the filesystem as the free space bitmap can be modified in the +* critical section of a transaction commit. +* +* TODO: push the memalloc_nofs_{save,restore}() to the caller where we +* know that recursion is unsafe. */ - if (bitmap_size <= PAGE_SIZE) - return kzalloc(bitmap_size, GFP_NOFS); - - mem = kzalloc(bitmap_size, GFP_NOFS | __GFP_NOWARN); - if (mem) - return mem; - - return __vmalloc(bitmap_size, GFP_NOFS | __GFP_ZERO, PAGE_KERNEL); + nofs_flag = memalloc_nofs_save(); + ret = kvzalloc(bitmap_size, GFP_KERNEL); + memalloc_nofs_restore(nofs_flag); + return ret; } int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans, -- 2.13.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html