[PATCH 1/1] btrfs: fix improper return value

2016-12-03 Thread Pan Bian
In function btrfs_uuid_tree_iterate(), errno is assigned to variable ret
on errors. However, it directly returns 0. It may be better to return
ret. This patch also removes the warning, because the caller already
prints a warning.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
Signed-off-by: Pan Bian 
---
 fs/btrfs/uuid-tree.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
index 7fc89e4..83bb2f2 100644
--- a/fs/btrfs/uuid-tree.c
+++ b/fs/btrfs/uuid-tree.c
@@ -351,7 +351,5 @@ int btrfs_uuid_tree_iterate(struct btrfs_fs_info *fs_info,
 
 out:
btrfs_free_path(path);
-   if (ret)
-   btrfs_warn(fs_info, "btrfs_uuid_tree_iterate failed %d", ret);
-   return 0;
+   return ret;
 }
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: system hangs due to qgroups

2016-12-03 Thread Adam Borowski
On Sat, Dec 03, 2016 at 10:46:40PM +0100, Marc Joliet wrote:
> As it's a rescue shell, I have only the one shell AFAIK, and it's occupied
> by mount.  So I can't tell if there are dmesg entries, however, when this
> happens during a normal running system, I never saw any dmesg entries. 

You can use "open" (might be named "openvt") to spawn a shell on
tty2/tty3/etc.  And if you have "screen" installed, Ctrl-a c spawns new
terminals (Ctrl-a n/p/0-9 to switch).

> The output of sysrq+t is too big to capture all of it (i.e., I can't scroll 
> back to the beginning)

You may use netconsole to log everything kernel says to another machine.  I
can't provide you with the incantations from the top of my head (got working
serial (far more reliable) on all my dev boxes, and it doesn't work with
bridging ie containers on production), but as your rescue shell has no
network sharing, assuming your network card driver supports a feature
netconsole needs _and_ stars were aligned right when your network card was
manufactured, netconsole is a valuable aid.

The system might be not dead enough to stop userland network logging from
getting through, too.


Meow!
-- 
The bill declaring Jesus as the King of Poland fails to specify whether
the addition is at the top or end of the list of kings.  What should the
historians do?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/18] btrfs: call functions that overwrite their root parameter with fs_info

2016-12-03 Thread Jeff Mahoney
On 12/2/16 5:32 PM, Omar Sandoval wrote:
> On Fri, Dec 02, 2016 at 12:07:21AM -0500, je...@suse.com wrote:
>> From: Jeff Mahoney 
>>
>> There are 11 functions that accept a root parameter and immediately
>> overwrite it.  We can pass those an fs_info pointer instead.
>>
>> Signed-off-by: Jeff Mahoney 
>> ---
>>  fs/btrfs/ctree.h|  4 ++--
>>  fs/btrfs/disk-io.c  |  4 ++--
>>  fs/btrfs/extent-tree.c  | 17 +++---
>>  fs/btrfs/file-item.c|  5 ++---
>>  fs/btrfs/free-space-cache.c |  5 ++---
>>  fs/btrfs/free-space-cache.h |  2 +-
>>  fs/btrfs/transaction.c  |  9 
>>  fs/btrfs/tree-log.c |  6 ++---
>>  fs/btrfs/volumes.c  | 55 
>> +
>>  fs/btrfs/volumes.h  |  4 ++--
>>  10 files changed, 52 insertions(+), 59 deletions(-)
>>
> 
> [snip]
> 
>>  int btrfs_remove_chunk(struct btrfs_trans_handle *trans,
>> -   struct btrfs_root *root, u64 chunk_offset)
>> +   struct btrfs_fs_info *fs_info, u64 chunk_offset)
>>  {
>> +struct btrfs_root *root = fs_info->chunk_root;
>>  struct extent_map_tree *em_tree;
>>  struct extent_map *em;
>>  struct btrfs_root *extent_root = root->fs_info->extent_root;
>> @@ -2812,8 +2811,7 @@ int btrfs_remove_chunk(struct btrfs_trans_handle 
>> *trans,
>>  struct btrfs_fs_devices *fs_devices = root->fs_info->fs_devices;
>>  
>>  /* Just in case */
>> -root = root->fs_info->chunk_root;
>> -em_tree = >fs_info->mapping_tree.map_tree;
>> +em_tree = _info->mapping_tree.map_tree;
> 
> I think this "Just in case" comment was referring to the reassignment of
> the root, so it's just confusing now.

Agreed.  Thanks for the review.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: system hangs due to qgroups

2016-12-03 Thread Chris Murphy
On Sat, Dec 3, 2016 at 2:46 PM, Marc Joliet <mar...@gmx.de> wrote:
> On Saturday 03 December 2016 13:42:42 Chris Murphy wrote:
>> On Sat, Dec 3, 2016 at 11:40 AM, Marc Joliet <mar...@gmx.de> wrote:
>> > Hello all,
>> >
>> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
>> > Specifically, some file system activities (e.g., snapshot creation,
>> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
>> > about 40 minutes, maybe more.
>>
>> Do you get any blocked tasks kernel messages? If so, issue sysrq+w
>> during the hang, and then check the system log (dmesg may not contain
>> everything if the command fills the message buffer). If it's a hang
>> without any kernel messages, then issue sysrq+t.
>>
>> https://www.kernel.org/doc/Documentation/sysrq.txt
>
> As it's a rescue shell, I have only the one shell AFAIK, and it's occupied by
> mount.  So I can't tell if there are dmesg entries, however, when this happens
> during a normal running system, I never saw any dmesg entries.  Anyway, I ran
> both.

OK so this is root fs? I would try to work on it from another volume.
An advantage of openSUSE Tumbleweed is they claim to fully support
qgroups, where upstream uses much more guarded language about its
stability.

Whereas last night's Fedora Rawhide has kernel 4.9-rc7 and btrfs-progs 4.8.5.
https://kojipkgs.fedoraproject.org/compose/rawhide/Fedora-Rawhide-20161203.n.0/compose/Workstation/x86_64/iso/Fedora-Workstation-netinst-x86_64-Rawhide-20161203.n.0.iso

You can use dd to write the ISO to a USB stick, it supports BIOS and
UEFI and Secure Boot.

Troubleshooting > Rescue a Fedora system > option 3 to get to a shell
The sysrq+t and sysrq+w can be written out in their entirety with
monotonic time using 'journalctl -b -k -o short-monotonic >
kernelmessages.log'

Unfortunately this is not a live system, so you can't (as far as I
know) install script to more easily capture everything to a single
file; 'btrfs check  > btrfscheck.log' should capture most of the
output, but it misses a few early lines for some reason.

And then scp those files to another system, or mount another stick and
copy locally.

>
> Should I take photos?  That'll be annoying to do with all the scrolling, but I
> can do that if need be.

I can't decipher it anyway, it's mainly for a dev who wanders across
this thread or if you file a bug report. But you can get the complete
output using the method above.

>
>> > After I next turned on the laptop, the balance resumed, causing bootup to
>> > fail, after which I remembered about the skip_balance mount option, which
>> > I
>> > tried in a rescue shell from an initramfs.
>>
>> The file system is the root filesystem? If so, skip_balance may not be
>> happening soon enough. Use kernel parameter rootflags=skip_balance
>> which will apply this mount option at the very first moment the file
>> system is mounted during boot.
>
> Yes, it's the root file system (there's that plus a swap partition).  I
> believe I tried rootflags, but I think it also failed, which is why I'm using
> a rescue shell now.  I can try it again, though, if anybody thinks that
> there's no point in waiting, especially if btrfs_scrub_pause in the btrfs-
> transaction call trace is significant.

It sounds like it's resuming a scrub. That won't happen if you boot
from an alternate volume. There's a scrub file found at
/var/lib/btrfs/ that tracks the progress of scrubs for each btrfs
volume - that directory with an inprogress scrub for your file system
is actually in the directory on that file system. If you haven't had
luck with btrfs scrub cancel, you can just remove the files in that
directory when you get a chance to rw mount the volume.



>
>> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
>> > a
>> > read-only file system, I decided to wait for a regular mount to finish.
>> > That has been running since Tuesday, and I am slowly growing impatient.
>> Haha, no kidding! I think that's very patient.
>
> Heh :) . I've still got my main desktop (as ancient as it may be), so I'm
> content with waiting for now, but I don't want to wait forever, especially if
> there might not even be a point.

How big is the file system? Sounds like it's a single device volume on
a laptop so I'm guessing at most 1TB, and that'd mean at most 100GiB
of metadata, which should mean around 15 minutes max to completely
read and process all the metadata, and maybe a few hours to do a
scrub. I'd bail after a few hours for sure.



>
>> > Thus I arrive at my question(s): is there anything else I can try, short
>> > of
>> > reformatting and restoring from back

Re: system hangs due to qgroups

2016-12-03 Thread Marc Joliet
On Saturday 03 December 2016 13:42:42 Chris Murphy wrote:
> On Sat, Dec 3, 2016 at 11:40 AM, Marc Joliet  wrote:
> > Hello all,
> > 
> > I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> > Specifically, some file system activities (e.g., snapshot creation,
> > baloo_file_extractor from KDE Plasma) cause the system to hang for up to
> > about 40 minutes, maybe more.
> 
> Do you get any blocked tasks kernel messages? If so, issue sysrq+w
> during the hang, and then check the system log (dmesg may not contain
> everything if the command fills the message buffer). If it's a hang
> without any kernel messages, then issue sysrq+t.
> 
> https://www.kernel.org/doc/Documentation/sysrq.txt

As it's a rescue shell, I have only the one shell AFAIK, and it's occupied by 
mount.  So I can't tell if there are dmesg entries, however, when this happens 
during a normal running system, I never saw any dmesg entries.  Anyway, I ran 
both.

The output of sysrq+w mentions two tasks: "btrfs-transaction" with 
btrfs_scrub_pause+0xbe/0xd0 as the top-most entry in the call trace, and 
"mount" with its top-most entry at schedule+0x33/0x90 (it looks like it's 
still in the "early" processing, since there's also 
"btrfs_parse_early_options+0190/0x190" in the call trace).

The output of sysrq+t is too big to capture all of it (i.e., I can't scroll 
back to the beginning), but just looking at the task names that I *can* see, 
there are: btrfs-fixup, various btrfs-endio*, btrfs-rmw, btrfs-freespace, 
btrfs-delayed-m (cut off), btrfs-readahead, btrfs-qgroup-re (cut off), btrfs-
extent-re (cut off), btrfs-cleaner, and btrfs-transaction.  Oh, and a bunch of 
kworkers.

Should I take photos?  That'll be annoying to do with all the scrolling, but I 
can do that if need be.

> > After I next turned on the laptop, the balance resumed, causing bootup to
> > fail, after which I remembered about the skip_balance mount option, which
> > I
> > tried in a rescue shell from an initramfs.
> 
> The file system is the root filesystem? If so, skip_balance may not be
> happening soon enough. Use kernel parameter rootflags=skip_balance
> which will apply this mount option at the very first moment the file
> system is mounted during boot.

Yes, it's the root file system (there's that plus a swap partition).  I 
believe I tried rootflags, but I think it also failed, which is why I'm using 
a rescue shell now.  I can try it again, though, if anybody thinks that 
there's no point in waiting, especially if btrfs_scrub_pause in the btrfs-
transaction call trace is significant.

> > Since I couldn't use skip_balance, and logically can't destroy qgroups on
> > a
> > read-only file system, I decided to wait for a regular mount to finish. 
> > That has been running since Tuesday, and I am slowly growing impatient.
> Haha, no kidding! I think that's very patient.

Heh :) . I've still got my main desktop (as ancient as it may be), so I'm 
content with waiting for now, but I don't want to wait forever, especially if 
there might not even be a point.

> > Thus I arrive at my question(s): is there anything else I can try, short
> > of
> > reformatting and restoring from backup?  Can I use btrfs-check here, or
> > any
> > other tool?  Or...?
> 
> Yes, btrfs-progs 4.8.5 has the latest qgroup checks, so if there's
> something wrong it should find it and if not that's a bug of its own.

The initramfs has 4.8.4, but it looks like 4.8.5 was "only" an urgent bug fix, 
with no changes to qgroups handling, so I can use that, too.  Can it repair 
qgroups problems, too?

> > Also, should I be able to avoid reformatting: how do I properly disable
> > quota support?
> 
> 'btrfs quota disable' is the only command that applies to this and it
> requires rw mount; there's no 'noquota' mount option.

OK, thanks.

So what should I try next?  I'm sick at home, so I can spend more time on this 
than usual.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


[PATCH] btrfs: limit async_work allocation and worker func duration

2016-12-03 Thread Maxim Patlasov
Problem statement: unprivileged user who has read-write access to more than
one btrfs subvolume may easily consume all kernel memory (eventually
triggering oom-killer).

Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):

[root@kteam1 ~]# cat prep.sh

DEV=/dev/sdb
mkfs.btrfs -f $DEV
mount $DEV /mnt
for i in `seq 1 16`
do
mkdir /mnt/$i
btrfs subvolume create /mnt/SV_$i
ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
chmod a+rwx /mnt/$i
done

[root@kteam1 ~]# sh prep.sh

[maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done

[root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | 
grep -v dma; sleep 60; done
kmalloc-12810144  10144128   321 : tunables000 : 
slabdata317317  0
kmalloc-128   9992352 9992352128   321 : tunables000 : 
slabdata 312261 312261  0
kmalloc-128   24226752 24226752128   321 : tunables000 
: slabdata 757086 757086  0
kmalloc-128   42754240 42754240128   321 : tunables000 
: slabdata 1336070 1336070  0

The huge numbers above come from insane number of async_work-s allocated
and queued by btrfs_wq_run_delayed_node.

The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
works if the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
worker func (btrfs_async_run_delayed_root) processes at least
BTRFS_DELAYED_BATCH items (if they are present in the list). So, the machinery
works as expected while the list is almost empty. As soon as it is getting
bigger, worker func starts to process more than one item at a time, it takes
longer, and the chances to have async_works queued more than needed is getting
higher.

The problem above is worsened by another flaw of delayed-inode implementation:
if async_work was queued in a throttling branch (number of items >=
BTRFS_DELAYED_WRITEBACK), corresponding worker func won't quit until
the number of items < BTRFS_DELAYED_BACKGROUND / 2. So, it is possible that
the func occupies CPU infinitely (up to 30sec in my experiments): while the
func is trying to drain the list, the user activity may add more and more
items to the list.

The patch fixes both problems in straightforward way: refuse queuing too
many works in btrfs_wq_run_delayed_node and bail out of worker func if
at least BTRFS_DELAYED_WRITEBACK items are processed.

Signed-off-by: Maxim Patlasov 
---
 fs/btrfs/async-thread.c  |8 
 fs/btrfs/async-thread.h  |1 +
 fs/btrfs/delayed-inode.c |6 --
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index e0f071f..29f6252 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -86,6 +86,14 @@ btrfs_work_owner(struct btrfs_work *work)
return work->wq->fs_info;
 }
 
+bool btrfs_workqueue_normal_congested(struct btrfs_workqueue *wq)
+{
+   int thresh = wq->normal->thresh != NO_THRESHOLD ?
+   wq->normal->thresh : num_possible_cpus();
+
+   return atomic_read(>normal->pending) > thresh * 2;
+}
+
 BTRFS_WORK_HELPER(worker_helper);
 BTRFS_WORK_HELPER(delalloc_helper);
 BTRFS_WORK_HELPER(flush_delalloc_helper);
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 8e52484..1f95973 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -84,4 +84,5 @@ void btrfs_workqueue_set_max(struct btrfs_workqueue *wq, int 
max);
 void btrfs_set_work_high_priority(struct btrfs_work *work);
 struct btrfs_fs_info *btrfs_work_owner(struct btrfs_work *work);
 struct btrfs_fs_info *btrfs_workqueue_owner(struct __btrfs_workqueue *wq);
+bool btrfs_workqueue_normal_congested(struct btrfs_workqueue *wq);
 #endif
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 3eeb9cd..de946dd 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1356,7 +1356,8 @@ release_path:
total_done++;
 
btrfs_release_prepared_delayed_node(delayed_node);
-   if (async_work->nr == 0 || total_done < async_work->nr)
+   if ((async_work->nr == 0 && total_done < BTRFS_DELAYED_WRITEBACK) ||
+   total_done < async_work->nr)
goto again;
 
 free_path:
@@ -1372,7 +1373,8 @@ static int btrfs_wq_run_delayed_node(struct 
btrfs_delayed_root *delayed_root,
 {
struct btrfs_async_delayed_work *async_work;
 
-   if (atomic_read(_root->items) < BTRFS_DELAYED_BACKGROUND)
+   if (atomic_read(_root->items) < BTRFS_DELAYED_BACKGROUND ||
+   btrfs_workqueue_normal_congested(fs_info->delayed_workers))
return 0;
 
async_work = kmalloc(sizeof(*async_work), GFP_NOFS);

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

Re: system hangs due to qgroups

2016-12-03 Thread Chris Murphy
On Sat, Dec 3, 2016 at 11:40 AM, Marc Joliet  wrote:
> Hello all,
>
> I'm having some trouble with btrfs on a laptop, possibly due to qgroups.
> Specifically, some file system activities (e.g., snapshot creation,
> baloo_file_extractor from KDE Plasma) cause the system to hang for up to about
> 40 minutes, maybe more.

Do you get any blocked tasks kernel messages? If so, issue sysrq+w
during the hang, and then check the system log (dmesg may not contain
everything if the command fills the message buffer). If it's a hang
without any kernel messages, then issue sysrq+t.

https://www.kernel.org/doc/Documentation/sysrq.txt




>
> After I next turned on the laptop, the balance resumed, causing bootup to
> fail, after which I remembered about the skip_balance mount option, which I
> tried in a rescue shell from an initramfs.

The file system is the root filesystem? If so, skip_balance may not be
happening soon enough. Use kernel parameter rootflags=skip_balance
which will apply this mount option at the very first moment the file
system is mounted during boot.



> Since I couldn't use skip_balance, and logically can't destroy qgroups on a
> read-only file system, I decided to wait for a regular mount to finish.  That
> has been running since Tuesday, and I am slowly growing impatient.

Haha, no kidding! I think that's very patient.


> Thus I arrive at my question(s): is there anything else I can try, short of
> reformatting and restoring from backup?  Can I use btrfs-check here, or any
> other tool?  Or...?

Yes, btrfs-progs 4.8.5 has the latest qgroup checks, so if there's
something wrong it should find it and if not that's a bug of its own.



> Also, should I be able to avoid reformatting: how do I properly disable quota
> support?

'btrfs quota disable' is the only command that applies to this and it
requires rw mount; there's no 'noquota' mount option.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs-progs: utils: negative numbers are more plausible than sizes over 8 EiB

2016-12-03 Thread Zygo Blaxell
I got tired of seeing "16.00EiB" whenever btrfs-progs encounters a
negative size value, e.g. during resize:

Unallocated:
   /dev/mapper/datamd18   16.00EiB

This version is much more useful:

Unallocated:
   /dev/mapper/datamd18  -26.29GiB

Signed-off-by: Zygo Blaxell 

---
v2: change the function prototype so it's easier to see that the
mangling implied by the name "pretty" includes "reinterpretation
of the u64 value as a signed quantity."
---
 utils.c | 12 ++--
 utils.h |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/utils.c b/utils.c
index 69b580a..07e8443 100644
--- a/utils.c
+++ b/utils.c
@@ -2575,7 +2575,7 @@ out:
  * Note: this function uses a static per-thread buffer. Do not call this
  * function more than 10 times within one argument list!
  */
-const char *pretty_size_mode(u64 size, unsigned mode)
+const char *pretty_size_mode(s64 size, unsigned mode)
 {
static __thread int ps_index = 0;
static __thread char ps_array[10][32];
@@ -2594,20 +2594,20 @@ static const char* unit_suffix_binary[] =
 static const char* unit_suffix_decimal[] =
{ "B", "kB", "MB", "GB", "TB", "PB", "EB"};
 
-int pretty_size_snprintf(u64 size, char *str, size_t str_size, unsigned 
unit_mode)
+int pretty_size_snprintf(s64 size, char *str, size_t str_size, unsigned 
unit_mode)
 {
int num_divs;
float fraction;
-   u64 base = 0;
+   s64 base = 0;
int mult = 0;
const char** suffix = NULL;
-   u64 last_size;
+   s64 last_size;
 
if (str_size == 0)
return 0;
 
if ((unit_mode & ~UNITS_MODE_MASK) == UNITS_RAW) {
-   snprintf(str, str_size, "%llu", size);
+   snprintf(str, str_size, "%lld", size);
return 0;
}
 
@@ -2642,7 +2642,7 @@ int pretty_size_snprintf(u64 size, char *str, size_t 
str_size, unsigned unit_mod
   num_divs = 0;
   break;
default:
-   while (size >= mult) {
+   while ((size < 0 ? -size : size) >= mult) {
last_size = size;
size /= mult;
num_divs++;
diff --git a/utils.h b/utils.h
index 366ca29..525bde9 100644
--- a/utils.h
+++ b/utils.h
@@ -174,9 +174,9 @@ int check_mounted_where(int fd, const char *file, char 
*where, int size,
 int btrfs_device_already_in_root(struct btrfs_root *root, int fd,
 int super_offset);
 
-int pretty_size_snprintf(u64 size, char *str, size_t str_bytes, unsigned 
unit_mode);
+int pretty_size_snprintf(s64 size, char *str, size_t str_bytes, unsigned 
unit_mode);
 #define pretty_size(size)  pretty_size_mode(size, UNITS_DEFAULT)
-const char *pretty_size_mode(u64 size, unsigned mode);
+const char *pretty_size_mode(s64 size, unsigned mode);
 
 u64 parse_size(char *s);
 u64 parse_qgroupid(const char *p);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: utils: negative numbers are more plausible than sizes over 8 EiB

2016-12-03 Thread Zygo Blaxell
On Sat, Dec 03, 2016 at 10:25:17AM -0800, Omar Sandoval wrote:
> On Sat, Dec 03, 2016 at 01:19:38AM -0500, Zygo Blaxell wrote:
> > I got tired of seeing "16.00EiB" whenever btrfs-progs encounters a
> > negative size value.
> > 
> > e.g. during filesystem shrink we see:
> > 
> > Unallocated:
> >/dev/mapper/testvol0   16.00EiB
> > 
> > Interpreting this as a signed quantity is much more useful:
> > 
> > Unallocated:
> >/dev/mapper/testvol0  -26.29GiB
> > 
> > Signed-off-by: Zygo Blaxell 
> > ---
> >  utils.c | 13 -
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/utils.c b/utils.c
> > index 69b580a..bd2b66e 100644
> > --- a/utils.c
> > +++ b/utils.c
> > @@ -2594,20 +2594,23 @@ static const char* unit_suffix_binary[] =
> >  static const char* unit_suffix_decimal[] =
> > { "B", "kB", "MB", "GB", "TB", "PB", "EB"};
> >  
> > -int pretty_size_snprintf(u64 size, char *str, size_t str_size, unsigned 
> > unit_mode)
> > +int pretty_size_snprintf(u64 usize, char *str, size_t str_size, unsigned 
> > unit_mode)
> >  {
> > int num_divs;
> > float fraction;
> > -   u64 base = 0;
> > +   s64 base = 0;
> > int mult = 0;
> > const char** suffix = NULL;
> > -   u64 last_size;
> > +   s64 last_size;
> >  
> > if (str_size == 0)
> > return 0;
> >  
> > +   /* Negative numbers are more plausible than sizes over 8 EiB. */
> > +   s64 size = (s64)usize;
> 
> Just make pretty_size_snprintf() take an s64 size so it's clear from the
> function signature that it's signed instead of hidden in the definition.

I intentionally buried the unsigned -> signed conversion in the lowest
level function so I wouldn't trigger signed/unsigned conversion warnings
at all 46 call sites for pretty_size_mode.  The btrfs code uses u64
endemically for all size data, and I wasn't about to try to change that.

The word "pretty" in the function name should imply that what comes out
is a possibly lossy transformation of what goes in.  Since "16.00EiB"
is much more lossy than "-29.96GiB", I believe I am merely reducing the
lossiness quantitatively rather than qualitatively.

On the other hand, the signed/unsigned warning isn't enabled by default
in this project.  I can certainly do it that way if you prefer.

> > +
> > if ((unit_mode & ~UNITS_MODE_MASK) == UNITS_RAW) {
> > -   snprintf(str, str_size, "%llu", size);
> > +   snprintf(str, str_size, "%lld", size);
> > return 0;
> > }
> >  
> > @@ -2642,7 +2645,7 @@ int pretty_size_snprintf(u64 size, char *str, size_t 
> > str_size, unsigned unit_mod
> >num_divs = 0;
> >break;
> > default:
> > -   while (size >= mult) {
> > +   while ((size < 0 ? -size : size) >= mult) {
> > last_size = size;
> > size /= mult;
> > num_divs++;
> > -- 
> > 2.1.4
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


signature.asc
Description: Digital signature


Re: missing checksums on reboot

2016-12-03 Thread Chris Murphy
On Fri, Dec 2, 2016 at 1:36 PM, Blake Lewis  wrote:
> Well, 3.10 is what you get with the RHEL7.x distributions, so that's
> why people are running it.
> Apparently, it is "good enough" for many purposes.

Nope. Btrfs in RHEL 7 is a technology preview, and ...

"Technology Preview features are currently unsupported, may not be
functionally complete, and are not suitable for deployment in
production. "


> My real goal here is to understand the scope of the bug and whether
> any mitigation is
> possible.  Of course, I don't expect anyone else to make a patch for
> me (or even to accept
> my patch), but if I knew what the bug is and what was done to fix it,
> I'd be in a much better
> position to decide what to do.  If anyone can shed any light on this,
> I'd be very grateful.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/?id=v4.8.12=v3.19=2

There's more than 1 lines changed between 3.19 and 4.8.12. You're
in the right place, but it's a shot in the dark if any developer
remembers the particular problem you're having compared to the massive
pile of bugs fixed and changes made since then. The RHEL 3.10 kernel
as I understand it has v3.19 Btrfs code. You might test with a 4.1.36
kernel to see if your problem reproduces. There was some O_DIRECTbug
fixes a while back but I have no idea what kernel versions those were.

If the fix is relatively recent, it simply may not have been
backported very far because kernels go EOL and backporting just
doesn't happen, not that it can't be done. So you might search the
list for O_DIRECT, and see if anything looks familiar. If that doesn't
pan out you can start doing simple bisects with longterm kernel.org
kernels, starting with the oldest (which is where I got 4.1.36) and
see if you can find a commit that fixes the problem, and then if it
applies cleanly to the much older kernel you're using.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] btrfs: fix improper return value

2016-12-03 Thread Omar Sandoval
On Sat, Dec 03, 2016 at 06:55:16PM +0800, Pan Bian wrote:
> In function btrfs_uuid_tree_iterate(), errno is assigned to variable
> ret on errors. However, it directly returns 0. It may be better to
> return ret.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731
> 
> Signed-off-by: Pan Bian 
> ---
>  fs/btrfs/uuid-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
> index 7fc89e4..44bcc1f 100644
> --- a/fs/btrfs/uuid-tree.c
> +++ b/fs/btrfs/uuid-tree.c
> @@ -353,5 +353,5 @@ int btrfs_uuid_tree_iterate(struct btrfs_fs_info *fs_info,
>   btrfs_free_path(path);
>   if (ret)
>   btrfs_warn(fs_info, "btrfs_uuid_tree_iterate failed %d", ret);
> - return 0;
> + return ret;

This one makes sense, since the caller is checking the return value. The
caller is already printing a warning, can you also get rid of the
btrfs_warn(fs_info, "btrfs_uuid_tree_iterate failed %d", ret)?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


system hangs due to qgroups

2016-12-03 Thread Marc Joliet
Hello all,

I'm having some trouble with btrfs on a laptop, possibly due to qgroups.  
Specifically, some file system activities (e.g., snapshot creation, 
baloo_file_extractor from KDE Plasma) cause the system to hang for up to about 
40 minutes, maybe more.  It always causes (most of) my desktop to hang, 
(although I can usually navigate between pre-existing Konsole tabs) and 
prevents new programs from starting.  I've seen the system load go up to >30 
before the laptop suddenly resumes normal operation.  I've been seeing this 
since Linux 4.7, maybe already 4.6.

Now, I thought that maybe this was (indirectly) due to an overly full file 
system (~90% full), so I deleted some things I didn't need to get it up to 15% 
free.  (For the record, I also tried mounting with ssd_spread.)  After that, I 
ran a balance with -dusage=50, which started out promising, but then went back 
to the "bad" behaviour.  *But* it seemed better than before overall, so I 
started a balance with -musage=10, then -musage=50.  That turned out to be a 
mistake.  Since I had to transport the laptop, and couldn't wait for "balance 
cancel" to return (IIUC it only returns after the next block (group?) is 
freed), I forced the laptop off.

After I next turned on the laptop, the balance resumed, causing bootup to 
fail, after which I remembered about the skip_balance mount option, which I 
tried in a rescue shell from an initramfs.  But wait, that failed, too!  
Specifically, the stack trace I get whenever I try it includes as one of the 
last lines:

"RIP [] qgroup_fix_relocated_data_extents+0x1f/0x2a8"

(I can take photos of the full stack trace if requested.)

So then I ran "btrfs qgroup show /sysroot/", which showed many quota groups, 
much to my surprise.  On the upside, at least now I discovered the likely 
reason for the performance problems.

(I actually think I know why I'm seeing qgroups: at one point I was trying out 
various snapshot/backup tools for btrfs, and one (I forgot which) 
unconditionally activated quota support, which infuriated me, but I promptly 
deactivated it, or so I thought.  Is quota support automatically enabled when 
qgroups are discovered, or did I perhaps not disable quota support properly?)

Since I couldn't use skip_balance, and logically can't destroy qgroups on a 
read-only file system, I decided to wait for a regular mount to finish.  That 
has been running since Tuesday, and I am slowly growing impatient.

Thus I arrive at my question(s): is there anything else I can try, short of 
reformatting and restoring from backup?  Can I use btrfs-check here, or any 
other tool?  Or...?

Also, should I be able to avoid reformatting: how do I properly disable quota 
support?

(BTW, searching for qgroup_fix_relocated_data_extents turned up the ML thread 
"[PATCH] Btrfs: fix endless loop in balancing block groups", could that be 
related?)

The laptop is currently running Gentoo with Linux 4.8.10 and btrfs-progs 
4.8.4.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


Re: [PATCH 1/1] btrfs: volumes: fix improper return value

2016-12-03 Thread Omar Sandoval
On Sat, Dec 03, 2016 at 07:01:45PM +0800, Pan Bian wrote:
> Variable ret takes the errno on failures. However, it directly returns 0.
> It may be better to return "ret".
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188741
> 
> Signed-off-by: Pan Bian 
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 71a60cc..32fd903 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4222,7 +4222,7 @@ static int btrfs_uuid_scan_kthread(void *data)
>   else
>   set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, _info->flags);
>   up(_info->uuid_tree_rescan_sem);
> - return 0;
> + return ret;

This is a kthread, no one checks the return value. We already let the
user know if it fails:

btrfs_warn(fs_info, "btrfs_uuid_scan_kthread failed %d", ret);

>  }
>  
>  /*
> -- 
> 1.9.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: utils: negative numbers are more plausible than sizes over 8 EiB

2016-12-03 Thread Omar Sandoval
On Sat, Dec 03, 2016 at 01:19:38AM -0500, Zygo Blaxell wrote:
> I got tired of seeing "16.00EiB" whenever btrfs-progs encounters a
> negative size value.
> 
> e.g. during filesystem shrink we see:
> 
> Unallocated:
>/dev/mapper/testvol0   16.00EiB
> 
> Interpreting this as a signed quantity is much more useful:
> 
> Unallocated:
>/dev/mapper/testvol0  -26.29GiB
> 
> Signed-off-by: Zygo Blaxell 
> ---
>  utils.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/utils.c b/utils.c
> index 69b580a..bd2b66e 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -2594,20 +2594,23 @@ static const char* unit_suffix_binary[] =
>  static const char* unit_suffix_decimal[] =
>   { "B", "kB", "MB", "GB", "TB", "PB", "EB"};
>  
> -int pretty_size_snprintf(u64 size, char *str, size_t str_size, unsigned 
> unit_mode)
> +int pretty_size_snprintf(u64 usize, char *str, size_t str_size, unsigned 
> unit_mode)
>  {
>   int num_divs;
>   float fraction;
> - u64 base = 0;
> + s64 base = 0;
>   int mult = 0;
>   const char** suffix = NULL;
> - u64 last_size;
> + s64 last_size;
>  
>   if (str_size == 0)
>   return 0;
>  
> + /* Negative numbers are more plausible than sizes over 8 EiB. */
> + s64 size = (s64)usize;

Just make pretty_size_snprintf() take an s64 size so it's clear from the
function signature that it's signed instead of hidden in the definition.

> +
>   if ((unit_mode & ~UNITS_MODE_MASK) == UNITS_RAW) {
> - snprintf(str, str_size, "%llu", size);
> + snprintf(str, str_size, "%lld", size);
>   return 0;
>   }
>  
> @@ -2642,7 +2645,7 @@ int pretty_size_snprintf(u64 size, char *str, size_t 
> str_size, unsigned unit_mod
>  num_divs = 0;
>  break;
>   default:
> - while (size >= mult) {
> + while ((size < 0 ? -size : size) >= mult) {
>   last_size = size;
>   size /= mult;
>   num_divs++;
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs_destroy_inode warn (outstanding extents)

2016-12-03 Thread Dave Jones
On Thu, Dec 01, 2016 at 10:32:09AM -0500, Dave Jones wrote:
 > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents.txt
 > 
 > Also same bug, different run, but a different traceview 
 > http://codemonkey.org.uk/junk/btrfs-destroy-inode-outstanding-extents-function-graph.txt
 > 
 > (function-graph screws up the RIP for some reason, 'return_to_handler'
 >  should actually be btrfs_destroy_inode)

Chris pointed me at a pending patch that took care of this warning.

 > Anyways, I've got some code that works pretty well for dumping the
 > ftrace buffer now when things go awry.  I just need to run it enough
 > times that I hit that bad page state instead of this, or a lockdep bug first.

Which allowed me to run long enough to get this trace..
http://codemonkey.org.uk/junk/bad-page-state.txt

Does this shed any light ?
The interesting process here seems to be kworker/u8:17, and the trace
captures some of what that was doing before that bad page was hit.

I've got another run going now, so I'll compare that trace when it
happens to see if it matches my current theory that it's something to
do with that btrfs_scrubparity_helper. I've seen that show up in stack
traces a few times while chasing this.

Dave

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


`btrfs check --repair` stuck in a loop // filesystem repair case study

2016-12-03 Thread Tomasz Melcer

Hi,

I have a btrfs filesystem on a 4TB HDD connected with USB 2.0. Some time 
ago I accidentally disconnected the drive while doing heavy writes. 
After reconnecting it seemed like the filesystem still works (it mounted 
fine and I could read some files chosen at random), but I ran `btrfs 
scrub` to be sure.


`btrfs scrub` aborted itself after ~20 hours, after reading ~3.5TB of 
data. `dmesg` contained a single line:


#v+
BTRFS error (device dm-0): bad tree block start 0 3527021166592
#v-

I couldn't find any further details anywhere in logs. I assume this 
means that some data have actually been lost from this filesystem. I 
have backups of data from this drive, so I decided to play a little 
trying out btrfs recovery strategies.


I checked whether there are any bad blocks on the raw device — all 
blocks were read successfully.


I created a devicemapper snapshot/overlay to keep the raw device data 
read only and track the changes made by any recovery procedures.


I ran `btrfstune -u` on the overlay to avoid having two devices with the 
same uuid. This was done using a dedicated VM which did not see the raw 
device (suggested by `Ke` on IRC). BTW, this command resulted in the 
overlay device growing by ~25GB, which IIUC means that around 6M 
4096-byte blocks were changed in the process (is that expected?).


I was recommended to run `btrfs check`. The result is here: [1] (323 
lines of output), and IIRC it finished in few hours.


 [1] https://gist.github.com/liori/f8c5e69677e8c9d6038d2e3e4db9aa42

(5 data checksum errors are a preexisting condition, I knew about them 
before the incident).


I then started `btrfs check --repair`. This was about a week ago, and it 
is still going. The partial output is here: [2] (already almost 18k 
lines). The same problems are being found again and again in a loop, as 
if it was stuck.


 [2] https://gist.github.com/liori/01494afbe63cd19ba49be663be937d84

I do observe that the ctime of the overlay file is updated every once a 
while, but the file itself does not grow anymore after some initial 
change of ~70k blocks. My interpretation is that even if the repair 
process writes anything, it only keeps writing in the same places again 
and again.


I did not have any snapshots on this filesystem. I did have some 
deduplicated content, but no more than 4 copies of any data block, and 
deduplication resulted in saving ~1TB of space total. The device was 
never a part of a multi-device setup.


Is there anything more I can do with this filesystem to bring it to a 
state where I can `btrfs scrub` it, know what have been lost, etc? Is 
this behavior of `btrfs scrub --repair` expected and will it ever finish?


Thank you,


--
Tomasz Melcer
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] btrfs: volumes: fix improper return value

2016-12-03 Thread Pan Bian
Variable ret takes the errno on failures. However, it directly returns 0.
It may be better to return "ret".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188741

Signed-off-by: Pan Bian 
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 71a60cc..32fd903 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4222,7 +4222,7 @@ static int btrfs_uuid_scan_kthread(void *data)
else
set_bit(BTRFS_FS_UPDATE_UUID_TREE_GEN, _info->flags);
up(_info->uuid_tree_rescan_sem);
-   return 0;
+   return ret;
 }
 
 /*
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] btrfs: fix improper return value

2016-12-03 Thread Pan Bian
In function btrfs_uuid_tree_iterate(), errno is assigned to variable
ret on errors. However, it directly returns 0. It may be better to
return ret.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188731

Signed-off-by: Pan Bian 
---
 fs/btrfs/uuid-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/uuid-tree.c b/fs/btrfs/uuid-tree.c
index 7fc89e4..44bcc1f 100644
--- a/fs/btrfs/uuid-tree.c
+++ b/fs/btrfs/uuid-tree.c
@@ -353,5 +353,5 @@ int btrfs_uuid_tree_iterate(struct btrfs_fs_info *fs_info,
btrfs_free_path(path);
if (ret)
btrfs_warn(fs_info, "btrfs_uuid_tree_iterate failed %d", ret);
-   return 0;
+   return ret;
 }
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html