Re: [PATCH] Btrfs: fix the mount failure due to missing devices

2017-05-02 Thread Adam Borowski
On Tue, May 02, 2017 at 05:48:40PM -0600, Liu Bo wrote:
> Say there is a raid1 btrfs which consists of two disks, after one disk
> becomes unavailable, we can still mount it in degraded mode once, for
> the second mount it would refuse to mount it with an error
> 
> "BTRFS warning (device sdf): missing devices (1) exceeds the limit (0), 
> writeable mount is not allowed"
> 
> The reason is that during the first mount (with the default mount
> option), it creates a chunk of single profile so that another mount
> will report the limit to tolerate missing or faulty devices as 0.
> 
> But we're mounting the filesystem from the device where the single
> profile chunk lives, we can safely allow it to be mounted in the
> degraded mode.

Nope, this approach doesn't work.  This patch breaks JBOD setups and any
other scheme that has single chunks that are present on missing devices.

I've tested, with obvious results.  Reproducer:
dd if=/dev/zero of=ra bs=1048576 seek=4095 count=1
dd if=/dev/zero of=rb bs=1048576 seek=4095 count=1
losetup -D
losetup -f ra
losetup -f rb
mkfs.btrfs -mraid1 -dsingle /dev/loop0 /dev/loop1
mount -onoatime /dev/loop0 /mnt/vol1
dd if=/dev/zero of=/mnt/vol1/foo bs=1048576 count=2048
umount /mnt/vol1
losetup -D
losetup -f ra
mount -onoatime,degraded /dev/loop0 /mnt/vol1

Both vanilla and with Qu's chunk check patch, the mount properly fails. 
With this patch instead, it succeeds with disastrous results.  There's
little point in allowing mounting when half of the data -- or worse,
metadata -- is missing.


> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/disk-io.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index eb1ee7b..b65a265 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3686,8 +3686,15 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
>  );
>   if (space.total_bytes == 0 || space.used_bytes == 0)
>   continue;
> - flags = space.flags;
>  
> + /*
> +  * skip single profile as we have opened this
> +  * device for single profile
> +  */
> + if ((space.flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)
> + continue;
> +
> + flags = space.flags;
>   num_tolerated_disk_barrier_failures = min(
>   num_tolerated_disk_barrier_failures,
>   btrfs_get_num_tolerated_disk_barrier_failures(
> -- 
> 1.8.3.1

Meow!
-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: check: Fix heap use after free.

2017-05-02 Thread Su Yue
fsck/004-no-dir-index makes valgrinds complaining about Invalid read.
==31890== Invalid read of size 1
==31890==at 0x453D09: repair_inode_backrefs (cmds-check.c:2690)
==31890==by 0x453D09: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D09: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890==at 0x4C2C14B: free (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890==by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Block was alloc'd at
==31890==at 0x4C2AF1F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890==by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890==by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890==by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890==by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890==by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==
==31890== Invalid read of size 8
==31890==at 0x452D66: repair_inode_backrefs (cmds-check.c:2731)
==31890==by 0x452D66: check_inode_recs (cmds-check.c:3330)
==31890==by 0x452D66: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890==at 0x4C2C14B: free (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890==by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Block was alloc'd at
==31890==at 0x4C2AF1F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890==by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890==by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890==by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890==by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890==by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==

While iterating over backrefs in repair_inode_backrefs, there are several
situations to repair one backref according backref->found_dir_item and
backref->found_dir_index.
Two of these branches may free the backref, but next judgments will still
access the freed memory.

Because these branches are independent, let repair_inode_backrefs skip to
handle next backref after free can fix it.

Signed-off-by: Su Yue 
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 897b1587..e52fbb84 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2666,6 +2666,7 @@ static int repair_inode_backrefs(struct btrfs_root *root,
repaired++;
list_del(>list);
free(backref);
+   continue;
}
 
if (!delete && !backref->found_dir_index &&
@@ -2676,12 +2677,12 @@ static int repair_inode_backrefs(struct btrfs_root 
*root,
break;
repaired++;
if (backref->found_dir_item &&
-   backref->found_dir_index &&
backref->found_dir_index) {
if (!backref->errors &&
backref->found_inode_ref) {
list_del(>list);
free(backref);
+   continue;
}
}
}
-- 
2.12.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More 

[PATCH] Btrfs: fix the mount failure due to missing devices

2017-05-02 Thread Liu Bo
Say there is a raid1 btrfs which consists of two disks, after one disk
becomes unavailable, we can still mount it in degraded mode once, for
the second mount it would refuse to mount it with an error

"BTRFS warning (device sdf): missing devices (1) exceeds the limit (0), 
writeable mount is not allowed"

The reason is that during the first mount (with the default mount
option), it creates a chunk of single profile so that another mount
will report the limit to tolerate missing or faulty devices as 0.

But we're mounting the filesystem from the device where the single
profile chunk lives, we can safely allow it to be mounted in the
degraded mode.

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb1ee7b..b65a265 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3686,8 +3686,15 @@ int btrfs_calc_num_tolerated_disk_barrier_failures(
   );
if (space.total_bytes == 0 || space.used_bytes == 0)
continue;
-   flags = space.flags;
 
+   /*
+* skip single profile as we have opened this
+* device for single profile
+*/
+   if ((space.flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0)
+   continue;
+
+   flags = space.flags;
num_tolerated_disk_barrier_failures = min(
num_tolerated_disk_barrier_failures,
btrfs_get_num_tolerated_disk_barrier_failures(
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Adam Borowski
On Tue, May 02, 2017 at 10:15:06PM +0200, Kai Krakow wrote:
> Ideally, the btrfs wouldn't even appear in /dev until it was assembled
> by udev. But apparently that's not the case, and I think this is where
> the problems come from. I wish, btrfs would not show up as device nodes
> in /dev that the mount command identified as btrfs. Instead, btrfs
> would expose (probably through udev) a device node
> in /dev/btrfs/fs_identifier when it is ready.
> 
> Apparently, the core problem of how to handle degraded btrfs still
> remains. Maybe it could be solved by adding more stages of btrfs nodes,
> like /dev/btrfs-incomplete (for unusable btrfs), /dev/btrfs-degraded
> (for btrfs still missing devices but at least one stripe of btrfs raid
> available) and /dev/btrfs as the final stage.

The problem is, we can't tell these states apart other than doing the vast
majority of mount's work.  As I described earlier in this thread, even the
"fully available" stage is not trivial.


Meow!
-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Kai Krakow
Am Tue, 2 May 2017 21:50:19 +0200
schrieb Goffredo Baroncelli :

> On 2017-05-02 20:49, Adam Borowski wrote:
> >> It could be some daemon that waits for btrfs to become complete.
> >> Do we have something?  
> > Such a daemon would also have to read the chunk tree.  
> 
> I don't think that a daemon is necessary. As proof of concept, in the
> past I developed a mount helper [1] which handled the mount of a
> btrfs filesystem: this handler first checks if the filesystem is a
> multivolume devices, if so it waits that all the devices are
> appeared. Finally mount the filesystem.
> 
> > It's not so simple -- such a btrfs device would have THREE states:
> > 
> > 1. not mountable yet (multi-device with not enough disks present)
> > 2. mountable ro / rw-degraded
> > 3. healthy  
> 
> My mount.btrfs could be "programmed" to wait a timeout, then it
> mounts the filesystem as degraded if not all devices are present.
> This is a very simple strategy, but this could be expanded.
> 
> I am inclined to think that the current approach doesn't fit well the
> btrfs requirements.  The roles and responsibilities are spread to too
> much layer (udev, systemd, mount)... I hoped that my helper could be
> adopted in order to concentrate all the responsibility to only one
> binary; this would reduce the interface number with the other
> subsystem (eg systemd, udev).
> 
> For example, it would be possible to implement a sane check that
> prevent to mount a btrfs filesystem if two devices exposes the same
> UUID... 

Ideally, the btrfs wouldn't even appear in /dev until it was assembled
by udev. But apparently that's not the case, and I think this is where
the problems come from. I wish, btrfs would not show up as device nodes
in /dev that the mount command identified as btrfs. Instead, btrfs
would expose (probably through udev) a device node
in /dev/btrfs/fs_identifier when it is ready.

Apparently, the core problem of how to handle degraded btrfs still
remains. Maybe it could be solved by adding more stages of btrfs nodes,
like /dev/btrfs-incomplete (for unusable btrfs), /dev/btrfs-degraded
(for btrfs still missing devices but at least one stripe of btrfs raid
available) and /dev/btrfs as the final stage. That way, a mount process
could wait for a while, and if the device doesn't appear, it tries the
degraded stage instead. If the fs is opened from the degraded dev node
stage, udev (or other processes) that scan for devices should stop
assembling the fs if they still do so.

bcache has a similar approach by hiding an fs within a protective
superblock. Unless bcache is setup, the fs won't show up in /dev, and
that fs won't be visible by other means. Btrfs should do something
similar and only show a single device node if assembled completely. The
component devices would have superblocks ignored by mount, and only the
final node would expose a virtual superblock and the compound device
after it. Of course, this makes things like compound device resizing
more complicated maybe even impossible.

If I'm not totally wrong, I think this is also how zfs exposes its
pools. You need user space tools to make the fs pools visible in the
tree. If zfs is incomplete, there's nothing to mount, and thus no race
condition. But I never tried zfs seriously, so I do not know.

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-02 Thread Kai Krakow
Am Mon, 1 May 2017 22:56:06 -0600
schrieb Chris Murphy :

> On Mon, May 1, 2017 at 9:23 PM, Marc MERLIN  wrote:
> > Hi Chris,
> >
> > Thanks for the reply, much appreciated.
> >
> > On Mon, May 01, 2017 at 07:50:22PM -0600, Chris Murphy wrote:  
> >> What about btfs check (no repair), without and then also with
> >> --mode=lowmem?
> >>
> >> In theory I like the idea of a 24 hour rollback; but in normal
> >> usage Btrfs will eventually free up space containing stale and no
> >> longer necessary metadata. Like the chunk tree, it's always
> >> changing, so you get to a point, even with snapshots, that the old
> >> state of that tree is just - gone. A snapshot of an fs tree does
> >> not make the chunk tree frozen in time.  
> >
> > Right, of course, I was being way over optimistic here. I kind of
> > forgot that metadata wasn't COW, my bad.  
> 
> Well it is COW. But there's more to the file system than fs trees, and
> just because an fs tree gets snapshot doesn't mean all data is
> snapshot. So whether snapshot or not, there's metadata that becomes
> obsolete as the file system is updated and those areas get freed up
> and eventually overwritten.
> 
> 
> >  
> >> In any case, it's a big problem in my mind if no existing tools can
> >> fix a file system of this size. So before making anymore changes,
> >> make sure you have a btrfs-image somewhere, even if it's huge. The
> >> offline checker needs to be able to repair it, right now it's all
> >> we have for such a case.  
> >
> > The image will be huge, and take maybe 24H to make (last time it
> > took some silly amount of time like that), and honestly I'm not
> > sure how useful it'll be.
> > Outside of the kernel crashing if I do a btrfs balance, and
> > hopefully the crash report I gave is good enough, the state I'm in
> > is not btrfs' fault.
> >
> > If I can't roll back to a reasonably working state, with data loss
> > of a known quantity that I can recover from backup, I'll have to
> > destroy and filesystem and recover from scratch, which will take
> > multiple days. Since I can't wait too long before getting back to a
> > working state, I think I'm going to try btrfs check --repair after
> > a scrub to get a list of all the pathanmes/inodes that are known to
> > be damaged, and work from there.
> > Sounds reasonable?  
> 
> Yes.
> 
> 
> >
> > Also, how is --mode=lowmem being useful?  
> 
> Testing. lowmem is a different implementation, so it might find
> different things from the regular check.
> 
> 
> >
> > And for re-parenting a sub-subvolume, is that possible?
> > (I want to delete /sub1/ but I can't because I have /sub1/sub2
> > that's also a subvolume and I'm not sure how to re-parent sub2 to
> > somewhere else so that I can subvolume delete sub1)  
> 
> Well you can move sub2 out of sub1 just like a directory and then
> delete sub1. If it's read-only it can't be moved, but you can use
> btrfs property get/set ro true/false to temporarily make it not
> read-only, move it, then make it read-only again, and it's still fine
> to use with btrfs send receive.
> 
> 
> 
> 
> 
> >
> > In the meantime, a simple check without repair looks like this. It
> > will likely take many hours to complete:
> > gargamel:/var/local/space# btrfs check /dev/mapper/dshelf2
> > Checking filesystem on /dev/mapper/dshelf2
> > UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
> > checking extents
> > checksum verify failed on 3096461459456 found 0E6B7980 wanted
> > FBE5477A checksum verify failed on 3096461459456 found 0E6B7980
> > wanted FBE5477A checksum verify failed on 2899180224512 found
> > 7A6D427F wanted 7E899EE5 checksum verify failed on 2899180224512
> > found 7A6D427F wanted 7E899EE5 checksum verify failed on
> > 2899180224512 found ABBE39B0 wanted E0735D0E checksum verify failed
> > on 2899180224512 found 7A6D427F wanted 7E899EE5 bytenr mismatch,
> > want=2899180224512, have=3981076597540270796 checksum verify failed
> > on 1449488023552 found CECC36AF wanted 199FE6C5 checksum verify
> > failed on 1449488023552 found CECC36AF wanted 199FE6C5 checksum
> > verify failed on 1449544613888 found 895D691B wanted A0C64D2B
> > checksum verify failed on 1449544613888 found 895D691B wanted
> > A0C64D2B parent transid verify failed on 1671538819072 wanted
> > 293964 found 293902 parent transid verify failed on 1671538819072
> > wanted 293964 found 293902 checksum verify failed on 1671603781632
> > found 18BC28D6 wanted 372655A0 checksum verify failed on
> > 1671603781632 found 18BC28D6 wanted 372655A0 checksum verify failed
> > on 1759425052672 found 843B59F1 wanted F0FF7D00 checksum verify
> > failed on 1759425052672 found 843B59F1 wanted F0FF7D00 checksum
> > verify failed on 2182657212416 found CD8EFC0C wanted 70847071
> > checksum verify failed on 2182657212416 found CD8EFC0C wanted
> > 70847071 checksum verify failed on 2898779357184 found 96395131
> > wanted 433D6E09 checksum verify failed on 2898779357184 found
> > 96395131 wanted 

Re: 4.11 relocate crash, null pointer + rolling back a filesystem by X hours?

2017-05-02 Thread Kai Krakow
Am Tue, 2 May 2017 05:01:02 + (UTC)
schrieb Duncan <1i5t5.dun...@cox.net>:

> Of course on-list I'm somewhat known for my arguments propounding the 
> notion that any filesystem that's too big to be practically
> maintained (including time necessary to restore from backups, should
> that be necessary for whatever reason) is... too big... and should
> ideally be broken along logical and functional boundaries into a
> number of individual smaller filesystems until such point as each one
> is found to be practically maintainable within a reasonably practical
> time frame. Don't put all the eggs in one basket, and when the bottom
> of one of those baskets inevitably falls out, most of your eggs will
> be safe in other baskets. =:^)

Hehe... Yes, you're a fan of small filesystems. I'm more from the
opposite camp, preferring one big filesystem to not mess around with
size constraints of small filesystems fighting for the same volume
space. It also gives such filesystems better chances for data locality
of data staying in totally different parts across your fs mounts and
can reduce head movement. Of course, much of this is not true if you
use different devices per filesystem, or use SSDs, or SAN where you
have no real control over the physical placement of image stripes
anyway. But well...

In an ideal world, subvolumes of btrfs would be totally independent of
each other, just only share the same volume and dynamically allocating
chunks of space from it. If one is broken, it is simply not usable and
it should be destroyable. A garbage collector would grab the leftover
chunks from the subvolume and free them, and you could recreate this
subvolume from backup. In reality, shared extents will cross subvolume
borders so it is probably not how things could work anytime in the near
of far future.

This idea is more like having thinly provisioned LVM volumes which
allocate space as the filesystems on top need them, much like doing
thinly provisioned images with a VM host system. The problem here is,
unlike subvolumes, those chunks of space could never be given back to
the host as it doesn't know if it is still in use. Of course, there's
implementations available which allow thinning the images by passing
through TRIM from the guest to the host (or by other means of
communication channels between host and guest), but that is usually not
giving good performance, if even supported.

I tried once to exploit this in VirtualBox and hoped it would translate
guest discards into hole punching requests on the host, and it's even
documented to work that way... But (a) it was horrible slow, and (b) it
was incredibly unstable to the point of being useless. OTOH, it's not
announced as a stable feature and has to be enabled by manually editing
the XML config files.

But I still like the idea: Is it possible to make btrfs still work if
one subvolume gets corrupted? Of course it should have ways of telling
the user which other subvolumes are interconnected through shared
extents so those would be also discarded upon corruption cleanup - at
least if those extents couldn't be made any sense of any longer. Since
corruption is an issue mostly of subvolumes being written to, snapshots
should be mostly safe.

Such a feature would also only make sense if btrfs had an online repair
tool. BTW, are there plans for having an online repair tool in the
future? Maybe one that only scans and fixes part of the filesystems
(for obvious performance reasons, wrt Duncans idea of handling
filesystems), i.e. those parts that the kernel discovered having
corruptions? If I could then just delete and restore affected files,
this would be even better than having independent subvolumes like above.

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Goffredo Baroncelli
On 2017-05-02 20:49, Adam Borowski wrote:
>> It could be some daemon that waits for btrfs to become complete.  Do we
>> have something?
> Such a daemon would also have to read the chunk tree.

I don't think that a daemon is necessary. As proof of concept, in the past I 
developed a mount helper [1] which handled the mount of a btrfs filesystem:
this handler first checks if the filesystem is a multivolume devices, if so it 
waits that all the devices are appeared. Finally mount the filesystem.

> It's not so simple -- such a btrfs device would have THREE states:
> 
> 1. not mountable yet (multi-device with not enough disks present)
> 2. mountable ro / rw-degraded
> 3. healthy

My mount.btrfs could be "programmed" to wait a timeout, then it mounts the 
filesystem as degraded if not all devices are present. This is a very simple 
strategy, but this could be expanded.

I am inclined to think that the current approach doesn't fit well the btrfs 
requirements.  The roles and responsibilities are spread to too much layer 
(udev, systemd, mount)... I hoped that my helper could be adopted in order to 
concentrate all the responsibility to only one binary; this would reduce the 
interface number with the other subsystem (eg systemd, udev).

For example, it would be possible to implement a sane check that prevent to 
mount a btrfs filesystem if two devices exposes the same UUID... 


BR
G.Baroncelli

[1] See "[RFC][PATCH v2] mount.btrfs helper" thread ( 
https://marc.info/?l=linux-btrfs=141736989508243=2 )



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Adam Borowski
On Tue, May 02, 2017 at 05:19:34PM +0300, Andrei Borzenkov wrote:
> On Tue, May 2, 2017 at 4:58 PM, Adam Borowski  wrote:
> > On Sun, Apr 30, 2017 at 08:47:43AM +0300, Andrei Borzenkov wrote:
> >> systemd waits for the final device that makes btrfs complete and mounts
> >> it using this device name.
> >
> >> But in /proc/self/mountinfo we actually see another
> >> device name. Due to peculiarities of systemd implementation this device
> >> "does not exist" from systemd PoV.
> >>
> >> Looking at btrfs code I start to suspect that we actually do not know
> >> what device was used to mount it at all.
> >>
> >> So we always show device with the smallest devid, irrespectively of what
> >> device was actually used to mount it.
> >
> > Devices come and go (ok, it's not like you hot-remove disks every day,
> > but...).  Storing the device that started the mount is pointless: btrfs
> > can handle removal fine so such a stored device would point nowhere -- or
> > worse, to some unrelated innocent disk you put in for data recovery (you may
> > have other plans than re-provisioning that raid).
> 
> Yes, I understand all of this, you do not need to convince me. OTOH
> the problem is real - we need to have some way to order btrfs mounts
> during bootup. In the past it was solved by delays. Systemd tries to
> eliminate ad hoc delays ... which is by itself not bad. So what can be
> utilized from btrfs side to implement ordering? We need /something/ to
> wait for. It could be virtual device that represents btrfs RAID and
> have state online/offline (similar to Linux MD).

It's not so simple -- such a btrfs device would have THREE states:

1. not mountable yet (multi-device with not enough disks present)
2. mountable ro / rw-degraded
3. healthy

The distinction between 1 and 2 is important, especially because systemd for
some reason insists on forcing unmount if it thinks the filesystem is in a
bad state (why?!?).  On distributions that follow the traditional remount
scheme (ie, you mount ro during boot, run fsck/whatever (no-op for btrfs),
then remount rw), starting as soon as we're in state 2 would be faster.  It
would also allow automatically going degraded if a timeout is hit[1].

To distinguish between 1 and 2 you need to halfway mount the filesystem, at
least to read the chunk tree (Qu's "why the heck it wasn't merged 2 years
ago" chunk check patch would help).

Naively thinking, it might be tempting to have only two states, varying it
whether the filesystem is already mounted -- currently it's 1+2 vs 3; it
would be: before mount: 1+2 vs 3, after mount: 1 vs 2+3.

But this would lead to breakage in corner cases.

For example: a box has a 3-way raid1 on sda sdb sdc.  Due to a cable not
being firmly seated, power supply or controller having a hiccup, etc,
suddenly sda goes offline.  Btrfs handles that fine, the admin gets worried
and hot-plugs a fourth disk, adding it to the raid.  Reboot.  sda gets up
first, boot goes fine so far, mountall/systemd starts, wants to mount that
filesystem.  sda appears to be fine, systemd reads it and sees there are _3_
disks (as obviously sda doesn't yet know about the fourth).  As sdd was a
random slow crap disk the admin happened to have on the shelf, it's not yet
up.  So systemd sees sda sdb sdc on -- they have all the device IDs it's
looking for, the count is ok, so it assumes all is fine.  It tries to mount,
but btrfs then properly notices there are four disks needed, and because
there was no -odegraded, the mount fails.  Boom.

Thus, there's no real way to know if the mount will succeed beforehand.


> It could be some daemon that waits for btrfs to become complete.  Do we
> have something?

Such a daemon would also have to read the chunk tree.


Meow!

[1]. Not entirely sure if that's a good default -- in one of my boxes, two
disks throw scary errors like UnrecovData BadCRC then after ninetysomething
seconds all goes well, although md (/ is on a 5GB 5-way raid1 md) first goes
degraded then starts autorecovery.  As systemd likes timeouts of 90 seconds,
just a few seconds shy of what this box needs to settle, having systemd and
auto-degrade there would lead to unpaired blocks, which btrfs doesn't yet
repair without being ordered to by hand.

-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs check --repair: failed to repair damaged filesystem, aborting

2017-05-02 Thread Marc MERLIN
(cc trimmed)

The one in debian/unstable crashed:
gargamel:~# btrfs --version
btrfs-progs v4.7.3
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
bytenr mismatch, want=2899180224512, have=3981076597540270796
extent-tree.c:2721: alloc_reserved_tree_block: Assertion `ret` failed.
btrfs[0x43e418]
btrfs[0x43e43f]
btrfs[0x43f276]
btrfs[0x43f46f]
btrfs[0x4407ef]
btrfs[0x440963]
btrfs(btrfs_inc_extent_ref+0x513)[0x44107a]
btrfs[0x420053]
btrfs[0x4265eb]
btrfs(cmd_check+0x)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f6b632e82b1]
btrfs(_start+0x2a)[0x40a37a]

Ok, it's old, let's take git from today:
gargamel:~# btrfs --version
btrfs-progs v4.10.2
As a note, 
gargamel:~# btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
enabling repair mode
ERROR: low memory mode doesn't support repair yet

As a note, a 32bit binary on a 64bit kernel:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 03e9a50c-1ae6-4782-ab9c-5f310a98e653
checking extents
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449488023552 found CECC36AF wanted 199FE6C5
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
checksum verify failed on 1449544613888 found 895D691B wanted A0C64D2B
parent transid verify failed on 1671538819072 wanted 293964 found 293902
parent transid verify failed on 1671538819072 wanted 293964 found 293902
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
checksum verify failed on 1671603781632 found 18BC28D6 wanted 372655A0
cmds-check.c:6291: add_data_backref: BUG_ON `!back` triggered, value 1
Aborted

let's try again with a 64bit binary built from git:
(...)
Repaired extent references for 4227617038336
ref mismatch on [4227872751616 4096] extent item 1, found 0
Incorrect local backref count on 4227872751616 parent 3493071667200 owner 0
offset 0 found 0 wanted 1 back 0x56470b18e7f0  
Backref disk bytenr does not match extent record, bytenr=4227872751616, ref
bytenr=0
backpointer mismatch on [4227872751616 4096]
owner ref check failed [4227872751616 4096]
repair deleting extent record: key 4227872751616 168 4096
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E  
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
Repaired extent references for 4227872751616
ref mismatch on [6674127745024 32768] extent item 0, found 1
Backref 6674127745024 parent 7566652473344 owner 0 offset 0 num_refs 0 not
found in extent tree
Incorrect local backref count on 6674127745024 parent 7566652473344 owner 0
offset 0 found 1 wanted 0 back 0x5648afda0f20  
backpointer mismatch on [6674127745024 32768]
checksum verify failed on 6983266418688 found 393B112A wanted 2B19CD5C
checksum verify failed on 6983266418688 found 393B112A wanted 2B19CD5C
checksum verify failed on 6983266418688 found BCBF9E15 wanted 785FF67E
checksum verify failed on 6983266418688 found 393B112A wanted 2B19CD5C
bytenr mismatch, want=6983266418688, have=13671317608077697645
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr mismatch, want=2899180224512, have=3981076597540270796
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
checksum verify failed on 2899180224512 found ABBE39B0 wanted E0735D0E
checksum verify failed on 2899180224512 found 7A6D427F wanted 7E899EE5
bytenr 

Re: [PATCH] Tidy while loop in end_compressed_writeback

2017-05-02 Thread Sahil Kang
I didn't consider it before, but I agree with your point on explicitly handling 
the return value from find_get_pages_contig.

Thanks for looking over it,
Sahil


 Original Message 
From: David Sterba 
Sent: May 2, 2017 9:01:34 AM PDT
To: Sahil Kang 
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Tidy while loop in end_compressed_writeback

On Wed, Apr 19, 2017 at 05:43:19AM -0700, Sahil Kang wrote:
> This is my first patch so it's a minor code change.
> I think removing the early continue from the loop makes the function a 
> little easier to follow.

I think the opposite. In the current code, it's clear that if ret is 0,
then there's a shortcut. While your code change is functionally
equivalent, it takes a bit more time to see that the for loop does
nothing if ret == 0 and overall it looks like we don't handle the return
value coming from find_get_pages_contig, in exchange for some ?:
trickery that requires a comment.

So I don't think this patch makes code better and I'm not going to apply
it, sorry.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: tests: add variable quotation to convert-tests

2017-05-02 Thread David Sterba
On Mon, Apr 24, 2017 at 12:30:00AM +0530, Lakshmipathi.G wrote:
> Signed-off-by: Lakshmipathi.G 

Applied, thanks.

> - if [[ $TEST_LOG =~ dump ]]; then
> + if [[ "$TEST_LOG" =~ dump ]]; then

Please note, that the matched string should not be quoted, since bash 3.2.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs-progs: run_next_block: Return error code when extent buffer read fails

2017-05-02 Thread David Sterba
On Mon, Apr 24, 2017 at 04:57:17PM +0530, Praveen K Pandey wrote:
> run_next_block() ignores the return value of read_tree_block() and
> always returns success status. This might cause deal_root_from_list()
> to loop infinitely when reading metadata block fails. This bug fixes
> the issue by extracting and returning the error code from the return
> value of read_tree_block().
> 
> Bug ID : https://bugzilla.kernel.org/show_bug.cgi?id=11
> 
> Signed-off-by: Praveen K Pandey 

Applied, thanks. As noted in the bug, there are many more to fix.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3 v2] Make max_size consistent with nr

2017-05-02 Thread David Sterba
On Fri, Apr 28, 2017 at 11:51:21AM +0200, Christophe de Dinechin wrote:
> Since we memset tmpl, max_size==0. This does not seem consistent with nr = 1.
> In check_extent_refs, we will call:
> 
>   set_extent_dirty(root->fs_info->excluded_extents,
>rec->start,
>rec->start + rec->max_size - 1);
> 
> This ends up with BUG_ON(end < start) in insert_state.
> 
> Signed-off-by: Christophe de Dinechin 

Applied. Thanks for debugging the issue!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3 v2] Prevent attempt to insert extent record with max_size==0

2017-05-02 Thread David Sterba
On Fri, Apr 28, 2017 at 11:50:23AM +0200, Christophe de Dinechin wrote:
> When this happens, we will trip a BUG_ON(end < start) in insert_state
> because in check_extent_refs, we use this max_size expecting it's not zero:
> 
>   set_extent_dirty(root->fs_info->excluded_extents,
>rec->start,
>rec->start + rec->max_size - 1);
> 
> See https://bugzilla.redhat.com/show_bug.cgi?id=1435567
> for an example where this scenario occurs.
> 
> Signed-off-by: Christophe de Dinechin 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: always write superblocks synchronously

2017-05-02 Thread Davidlohr Bueso
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation.
Since REQ_FUA and REQ_FLUSH flags are stripped from submitted IO
when the disk doesn't have volatile write cache and thus effectively
make the write async. This was seen to cause performance hits up
to 90% regression in disk IO related benchmarks such as reaim and
dbench[1].

Fix the problem by making sure the first superblock write is also
treated as synchronous since they can block progress of the
journalling (commit, log syncs) machinery and thus the whole filesystem.

[1] https://www.spinics.net/lists/linux-ext4/msg56238.html

Fixes: b685d3d65ac (block: treat REQ_FUA and REQ_PREFLUSH as synchronous)
Cc: stable 
Cc: Jan Kara 
Signed-off-by: Davidlohr Bueso 
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 061c1d1f774f..51b2fd8ceccb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3468,7 +3468,7 @@ static int write_dev_supers(struct btrfs_device *device,
 * to go down lazy.
 */
if (i == 0)
-   ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_FUA, bh);
+   ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_FUA | 
REQ_SYNC, bh);
else
ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
if (ret)
-- 
2.12.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] Disambiguate between cases where add_tree_backref fails

2017-05-02 Thread David Sterba
On Fri, Apr 28, 2017 at 11:12:35AM +0200, Christophe de Dinechin wrote:
> See https://bugzilla.redhat.com/show_bug.cgi?id=1435567 for an example
> where the message occurs,
> 
> Signed-off-by: Christophe de Dinechin 

Applied, thanks. I've moved the strings to next line so they don't
overflow 80 cols.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Tidy while loop in end_compressed_writeback

2017-05-02 Thread David Sterba
On Wed, Apr 19, 2017 at 05:43:19AM -0700, Sahil Kang wrote:
> This is my first patch so it's a minor code change.
> I think removing the early continue from the loop makes the function a 
> little easier to follow.

I think the opposite. In the current code, it's clear that if ret is 0,
then there's a shortcut. While your code change is functionally
equivalent, it takes a bit more time to see that the for loop does
nothing if ret == 0 and overall it looks like we don't handle the return
value coming from find_get_pages_contig, in exchange for some ?:
trickery that requires a comment.

So I don't think this patch makes code better and I'm not going to apply
it, sorry.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: remove an unused variable

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 12:25:27PM +0300, Dan Carpenter wrote:
> "item" is never used.
> 
> Signed-off-by: Dan Carpenter 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1 linux-next] btrfs: kmap() can't fail

2017-05-02 Thread David Sterba
On Tue, Apr 25, 2017 at 08:11:02PM +0200, Fabian Frederick wrote:
> Remove NULL test on kmap()
> 
> Signed-off-by: Fabian Frederick 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] btrfs: Make flush bios explicitely sync

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 05:03:50PM +0200, Jan Kara wrote:
> Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
> synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
> definitions.  generic_make_request_checks() however strips REQ_FUA and
> REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
> write cache and thus write effectively becomes asynchronous which can
> lead to performance regressions
> 
> Fix the problem by making sure all bios which are synchronous are
> properly marked with REQ_SYNC.
> 
> CC: David Sterba 
> CC: linux-btrfs@vger.kernel.org
> Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
> Signed-off-by: Jan Kara 

Acked-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: btrfs-convert: Add larger device support

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 05:06:01PM +0200, David Sterba wrote:
> On Thu, Apr 27, 2017 at 03:35:34PM +0530, Lakshmipathi.G wrote:
> > Bug: Fail to convert 22TB EXT4 to BTRFS
> > https://bugzilla.kernel.org/show_bug.cgi?id=194795
> 
> Thanks for working on the bug, the proposed fix looks good to me judging
> by the use of extended API. This would need a test though. I'm trying
> this on a sparse file, 3TB occupies 50G and is slow to create.
> Conversion works and hasn't finished yet, so this is not suitable for
> tests run by default.

So it took a few more minutes. I'm testing it again, not on NFS,  and
now the creation of empty filesystem took just a few seconds and empty
file is ~160MB. This sounds much better. So please write a test, using
something like

truncate -s3T image
mkfs.ext4 image
mount && write some data
convert && rollback

Thanks.  Later we might need to add some mkfs.ext4 option coverage.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: btrfs-convert: Add larger device support

2017-05-02 Thread Lakshmipathi.G
On Tue, May 02, 2017 at 05:06:01PM +0200, David Sterba wrote:
> On Thu, Apr 27, 2017 at 03:35:34PM +0530, Lakshmipathi.G wrote:
> > Bug: Fail to convert 22TB EXT4 to BTRFS
> > https://bugzilla.kernel.org/show_bug.cgi?id=194795
> 
> Thanks for working on the bug, the proposed fix looks good to me judging
> by the use of extended API. This would need a test though. I'm trying
> this on a sparse file, 3TB occupies 50G and is slow to create.
> Conversion works and hasn't finished yet, so this is not suitable for
> tests run by default.
> 
> Please write a more detailed changelog, a reference to bugzilla is not
> really sufficient.

Sure, will send a patch with better commit description about the bug and
proposed fix. 

Cheers.
Lakshmipathi.G
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: check: Fix heap use after free.

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 02:26:19PM +0800, Su Yue wrote:
> The repair_inode_backrefs use the backref again after free while iterating 
> over backrefs.
> So let it continue to next step after free can fix it.

Please enhance the changelog. I'm missing some explanation that the new
code is still correct. I can find that out from code after a while but I
need to compare my findings with yours. It's for documentation and also
to make sure we both agree on the proposed way to fix it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: btrfs-convert: Add larger device support

2017-05-02 Thread David Sterba
On Thu, Apr 27, 2017 at 03:35:34PM +0530, Lakshmipathi.G wrote:
> Bug: Fail to convert 22TB EXT4 to BTRFS
> https://bugzilla.kernel.org/show_bug.cgi?id=194795

Thanks for working on the bug, the proposed fix looks good to me judging
by the use of extended API. This would need a test though. I'm trying
this on a sparse file, 3TB occupies 50G and is slow to create.
Conversion works and hasn't finished yet, so this is not suitable for
tests run by default.

Please write a more detailed changelog, a reference to bugzilla is not
really sufficient.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] Fix fallout from changes to FUA and PREFLUSH definitions

2017-05-02 Thread Jan Kara
Hello,

this series addresses a performance issue caused by commit b685d3d65ac7 "block:
treat REQ_FUA and REQ_PREFLUSH as synchronous". We know for certain this
problem significanly regresses (over 10%, in some cases up to 100%) ext4 and
btrfs for dbench4 and reaim benchmarks.  Based on this I have fixed up also
other places which suffer from the same problem however those changes are
untested so maintainers please have a look whether the change makes sense to
you and also whether I possibly didn't miss some cases where REQ_SYNC should be
also added. Patches in this series are completely independent so if maintainers
agree with the change, feel free to take it through your tree.

The core of the problem is that above mentioned commit removed REQ_SYNC flag
from WRITE_{FUA|PREFLUSH|...} definitions.  generic_make_request_checks()
however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage
doesn't report volatile write cache and thus write effectively becomes
asynchronous which can lead to performance regressions.

A side note for ext4: The two patches for ext4 & jbd2 are on top of the change
that got merged in the ext4 tree already.

Honza
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] btrfs: Make flush bios explicitely sync

2017-05-02 Thread Jan Kara
Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

CC: David Sterba 
CC: linux-btrfs@vger.kernel.org
Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3
Signed-off-by: Jan Kara 
---
 fs/btrfs/disk-io.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eb1ee7b6f532..af75a9aab81e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3488,10 +3488,12 @@ static int write_dev_supers(struct btrfs_device *device,
 * we fua the first super.  The others we allow
 * to go down lazy.
 */
-   if (i == 0)
-   ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_FUA, bh);
-   else
+   if (i == 0) {
+   ret = btrfsic_submit_bh(REQ_OP_WRITE,
+   REQ_SYNC | REQ_FUA, bh);
+   } else {
ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
+   }
if (ret)
errors++;
}
@@ -3555,7 +3557,7 @@ static int write_dev_flush(struct btrfs_device *device, 
int wait)
 
bio->bi_end_io = btrfs_end_empty_barrier;
bio->bi_bdev = device->bdev;
-   bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
+   bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
init_completion(>flush_wait);
bio->bi_private = >flush_wait;
device->flush_bio = bio;
-- 
2.12.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: regression test for nocsum buffered read's repair

2017-05-02 Thread Filipe Manana
On Fri, Apr 28, 2017 at 6:26 PM, Liu Bo  wrote:
> This is to test whether buffered read retry-repair code is able to work in
> raid1 case as expected.
>
> Please note that without checksum, btrfs doesn't know if the data used to
> repair is correct, so repair is more of resync which makes sure that both
> of the copy has the same content.
>
> Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed and 
> drop
> checks") introduced the regression.
>
> The upstream fix is
> Btrfs: bring back repair during read
>
> Signed-off-by: Liu Bo 

Reviewed-by: Filipe Manana 

Just a comment below.

> ---
> v2: - Add 'mkfs -b 1G' to limit filesystem size to 2G in raid1 profile so that
>   we get a consistent output.
>
>  tests/btrfs/143 | 197 
> 
>  tests/btrfs/143.out |  39 +++
>  tests/btrfs/group   |   1 +
>  3 files changed, 237 insertions(+)
>  create mode 100755 tests/btrfs/143
>  create mode 100644 tests/btrfs/143.out
>
> diff --git a/tests/btrfs/143 b/tests/btrfs/143
> new file mode 100755
> index 000..70f3f9f
> --- /dev/null
> +++ b/tests/btrfs/143
> @@ -0,0 +1,197 @@
> +#! /bin/bash
> +# FS QA Test 143
> +#
> +# Regression test for btrfs buffered read's repair during read without 
> checksum.
> +#
> +# This is to test whether buffered read retry-repair code is able to work in
> +# raid1 case as expected.
> +#
> +# Please note that without checksum, btrfs doesn't know if the data used to
> +# repair is correct, so repair is more of resync which makes sure that both
> +# of the copy has the same content.
> +#
> +# Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed and 
> drop
> +# checks") introduced the regression.
> +#
> +# The upstream fix is
> +#Btrfs: bring back repair during read
> +#
> +#---
> +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +
> +_require_btrfs_command inspect-internal dump-tree
> +_require_command "$FILEFRAG_PROG" filefrag
> +
> +# helpe to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +   if (/blocks? of (\d+) bytes/) {
> +   $blocksize = $1;
> +   next
> +   }
> +   ($ext, $logical, $physical, $length) =
> +   (/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +   or next;
> +   ($flags) = /.*:\s*(\S*)$/;
> +   print $physical * $blocksize, "#",
> + $length * $blocksize, "#",
> + $logical * $blocksize, "#",
> + $flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with three numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +   tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +   cmd="filefrag -v $1"
> +   echo "# $cmd" >> $seqres.full
> +   out=`$cmd | _filter_extents`
> +   if [ -z "$out" ]; then
> +   return 1
> +   fi
> +   echo "after filter: $out" >> $seqres.full
> +   echo $out
> +   return 0
> +}
> +
> +_check_repair()
> +{
> +   filter=${1:-cat}
> +   dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac 
> | $filter | grep -q -e "read error corrected"
> +   if [ $? -eq 0 ]; then
> +   echo 1
> +   else
> +   echo 0
> +   fi
> 

Re: [PATCH v2] fstests: regression test for nocsum dio read's repair

2017-05-02 Thread Filipe Manana
On Fri, Apr 28, 2017 at 6:26 PM, Liu Bo  wrote:
> Commit 2dabb3248453 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
> introduced this regression.  It'd cause 'Segmentation fault' error.
>
> The upstream fix is
> Btrfs: fix segment fault when doing dio read
>
> Signed-off-by: Liu Bo 

Reviewed-by: Filipe Manana 

Just a comment below.

> ---
> v2: - Add 'mkfs -b 1G' to limit filesystem size to 2G in raid1 profile so that
>   we get a consistent output.
>
>  tests/btrfs/142 | 189 
> 
>  tests/btrfs/142.out |  39 +++
>  tests/btrfs/group   |   1 +
>  3 files changed, 229 insertions(+)
>  create mode 100755 tests/btrfs/142
>  create mode 100644 tests/btrfs/142.out
>
> diff --git a/tests/btrfs/142 b/tests/btrfs/142
> new file mode 100755
> index 000..94566de
> --- /dev/null
> +++ b/tests/btrfs/142
> @@ -0,0 +1,189 @@
> +#! /bin/bash
> +# FS QA Test 142
> +#
> +# Regression test for btrfs DIO read's repair during read without checksum.
> +#
> +# Commit 2dabb3248453 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
> +# introduced this regression.  It'd cause 'Segmentation fault' error.
> +#
> +# The upstream fix is
> +#  Btrfs: fix segment fault when doing dio read
> +#
> +#---
> +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +
> +_require_btrfs_command inspect-internal dump-tree
> +_require_command "$FILEFRAG_PROG" filefrag
> +
> +# helpe to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +   if (/blocks? of (\d+) bytes/) {
> +   $blocksize = $1;
> +   next
> +   }
> +   ($ext, $logical, $physical, $length) =
> +   (/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +   or next;
> +   ($flags) = /.*:\s*(\S*)$/;
> +   print $physical * $blocksize, "#",
> + $length * $blocksize, "#",
> + $logical * $blocksize, "#",
> + $flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with three numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +   tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +   cmd="filefrag -v $1"
> +   echo "# $cmd" >> $seqres.full
> +   out=`$cmd | _filter_extents`
> +   if [ -z "$out" ]; then
> +   return 1
> +   fi
> +   echo "after filter: $out" >> $seqres.full
> +   echo $out
> +   return 0
> +}
> +
> +_check_repair()
> +{
> +   filter=${1:-cat}
> +   dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac 
> | $filter | grep -q -e "direct IO failed"
> +   if [ $? -eq 0 ]; then
> +   echo 1
> +   else
> +   echo 0
> +   fi
> +}
> +
> +_get_physical()
> +{
> +# $1 is logical address
> +# print chunk tree and find devid 2 which is $SCRATCH_DEV
> +$BTRFS_UTIL_PROG inspect-internal dump-tree -t 3 $SCRATCH_DEV | grep 
> $1 -A 6 | awk '($1 ~ /stripe/ && $3 ~ /devid/ && $4 ~ /1/) { print $6 }'
> +}
> +
> +
> +SYSFS_BDEV=`_sysfs_dev $SCRATCH_DEV`
> +
> +start_fail()
> +{
> +   echo 100 > $DEBUGFS_MNT/fail_make_request/probability
> +   echo 1 > $DEBUGFS_MNT/fail_make_request/times
> +   echo 0 > $DEBUGFS_MNT/fail_make_request/verbose
> +   echo 

Re: [PATCH v3] fstests: regression test for btrfs buffered read's repair

2017-05-02 Thread Filipe Manana
On Fri, Apr 28, 2017 at 6:26 PM, Liu Bo  wrote:
> This case tests whether buffered read can repair the bad copy if we
> have a good copy.
>
> Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
> and drop checks") introduced the regression.
>
> The upstream fix is
> Btrfs: bring back repair during read
>
> Signed-off-by: Liu Bo 

Reviewed-by: Filipe Manana 

Just a comment below.

> ---
> v2: - Add regression commit and the fix to the description
> - Use btrfs inspect-internal dump-tree to get rid of the dependence 
> btrfs-map-logical
> - Add comments in several places
> - Fix typo, dio->buffered.
>
> v3: - Add 'mkfs -b 1G' to limit filesystem size to 2G in raid1 profile so that
>   we get a consistent output.
>
>  tests/btrfs/141 | 169 
> 
>  tests/btrfs/141.out |  39 
>  tests/btrfs/group   |   1 +
>  3 files changed, 209 insertions(+)
>  create mode 100755 tests/btrfs/141
>  create mode 100644 tests/btrfs/141.out
>
> diff --git a/tests/btrfs/141 b/tests/btrfs/141
> new file mode 100755
> index 000..c4e08ed
> --- /dev/null
> +++ b/tests/btrfs/141
> @@ -0,0 +1,169 @@
> +#! /bin/bash
> +# FS QA Test 141
> +#
> +# Regression test for btrfs buffered read's repair during read.
> +#
> +# Commit 20a7db8ab3f2 ("btrfs: add dummy callback for readpage_io_failed
> +# and drop checks") introduced the regression.
> +#
> +# The upstream fix is
> +#  Btrfs: bring back repair during read
> +#
> +#---
> +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +
> +_require_btrfs_command inspect-internal dump-tree
> +_require_command "$FILEFRAG_PROG" filefrag
> +
> +
> +# helpe to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +   if (/blocks? of (\d+) bytes/) {
> +   $blocksize = $1;
> +   next
> +   }
> +   ($ext, $logical, $physical, $length) =
> +   (/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +   or next;
> +   ($flags) = /.*:\s*(\S*)$/;
> +   print $physical * $blocksize, "#",
> + $length * $blocksize, "#",
> + $logical * $blocksize, "#",
> + $flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with three numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +   tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +   cmd="filefrag -v $1"
> +   echo "# $cmd" >> $seqres.full
> +   out=`$cmd | _filter_extents`
> +   if [ -z "$out" ]; then
> +   return 1
> +   fi
> +   echo "after filter: $out" >> $seqres.full
> +   echo $out
> +   return 0
> +}
> +
> +_check_repair()
> +{
> +   filter=${1:-cat}
> +   dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac 
> | $filter | grep -q -e "csum failed"
> +   if [ $? -eq 0 ]; then
> +   echo 1
> +   else
> +   echo 0
> +   fi
> +}
> +
> +_get_physical()
> +{
> +# $1 is logical address
> +# print chunk tree and find devid 2 which is $SCRATCH_DEV
> +$BTRFS_UTIL_PROG inspect-internal dump-tree -t 3 $SCRATCH_DEV | grep 
> $1 -A 6 | awk '($1 ~ /stripe/ && $3 ~ /devid/ && $4 ~ /1/) { print $6 }'
> +}
> +
> 

Re: [PATCH v3] fstests: regression test for btrfs dio read repair

2017-05-02 Thread Filipe Manana
On Fri, Apr 28, 2017 at 6:25 PM, Liu Bo  wrote:
> This case tests whether dio read can repair the bad copy if we have
> a good copy.
>
> Commit 2dabb3248453 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
> introduced the regression.
>
> The upstream fix is
> Btrfs: fix invalid dereference in btrfs_retry_endio
>
> Signed-off-by: Liu Bo 

Reviewed-by: Filipe Manana 

Just a comment below.

> ---
> v2: - Add regression commit and the fix to the description
> - Use btrfs inspect-internal dump-tree to get rid of the dependence 
> btrfs-map-logical
> - Add comments in several places
>
> v3: - Add 'mkfs -b 1G' to limit filesystem size to 2G in raid1 profile so that
>   we get a consistent output.
>
>  tests/btrfs/140 | 167 
> 
>  tests/btrfs/140.out |  39 
>  tests/btrfs/group   |   1 +
>  3 files changed, 207 insertions(+)
>  create mode 100755 tests/btrfs/140
>  create mode 100644 tests/btrfs/140.out
>
> diff --git a/tests/btrfs/140 b/tests/btrfs/140
> new file mode 100755
> index 000..dcd8807
> --- /dev/null
> +++ b/tests/btrfs/140
> @@ -0,0 +1,167 @@
> +#! /bin/bash
> +# FS QA Test 140
> +#
> +# Regression test for btrfs DIO read's repair during read.
> +#
> +# Commit 2dabb3248453 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
> +# introduced the regression.
> +# The upstream fix is
> +#  Btrfs: fix invalid dereference in btrfs_retry_endio
> +#
> +#---
> +# Copyright (c) 2017 Liu Bo.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1   # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> +   cd /
> +   rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +
> +_require_btrfs_command inspect-internal dump-tree
> +_require_command "$FILEFRAG_PROG" filefrag
> +_require_odirect
> +
> +# helpe to convert 'file offset' to btrfs logical offset
> +FILEFRAG_FILTER='
> +   if (/blocks? of (\d+) bytes/) {
> +   $blocksize = $1;
> +   next
> +   }
> +   ($ext, $logical, $physical, $length) =
> +   (/^\s*(\d+):\s+(\d+)..\s+\d+:\s+(\d+)..\s+\d+:\s+(\d+):/)
> +   or next;
> +   ($flags) = /.*:\s*(\S*)$/;
> +   print $physical * $blocksize, "#",
> + $length * $blocksize, "#",
> + $logical * $blocksize, "#",
> + $flags, " "'
> +
> +# this makes filefrag output script readable by using a perl helper.
> +# output is one extent per line, with three numbers separated by '#'
> +# the numbers are: physical, length, logical (all in bytes)
> +# sample output: "1234#10#5678" -> physical 1234, length 10, logical 5678
> +_filter_extents()
> +{
> +   tee -a $seqres.full | $PERL_PROG -ne "$FILEFRAG_FILTER"
> +}
> +
> +_check_file_extents()
> +{
> +   cmd="filefrag -v $1"
> +   echo "# $cmd" >> $seqres.full
> +   out=`$cmd | _filter_extents`
> +   if [ -z "$out" ]; then
> +   return 1
> +   fi
> +   echo "after filter: $out" >> $seqres.full
> +   echo $out
> +   return 0
> +}
> +
> +_check_repair()
> +{
> +   filter=${1:-cat}
> +   dmesg | tac | sed -ne "0,\#run fstests $seqnum at $date_time#p" | tac 
> | $filter | grep -q -e "csum failed"
> +   if [ $? -eq 0 ]; then
> +   echo 1
> +   else
> +   echo 0
> +   fi
> +}
> +
> +_get_physical()
> +{
> +   # $1 is logical address
> +   # print chunk tree and find devid 2 which is $SCRATCH_DEV
> +   $BTRFS_UTIL_PROG inspect-internal dump-tree -t 3 $SCRATCH_DEV | grep 
> $1 -A 6 | awk '($1 ~ /stripe/ && $3 ~ /devid/ && $4 ~ /1/) { print $6 }'
> +}
> +
> +_scratch_dev_pool_get 2
> +# step 1, create a raid1 

Re: [PATCH] btrfs-progs: Use more restrict check to read out tree root

2017-05-02 Thread David Sterba
On Tue, Apr 25, 2017 at 04:40:16PM +0800, Qu Wenruo wrote:
> For fuzzed image bko-156811-bad-parent-ref-qgroup-verify.raw, it cause
> qgroup to report -ENOMEM.
> 
> But the fact is, such image is heavy damaged so there is not valid root
> item for extent tree.
> 
> Normal extent tree key in root tree should be (EXTENT_TREE ROOT_ITEM 0),
> while in that fuzzed image, we got (EXTENT_TREE EXXTENT_DATA SOME_NUMBER).
> 
> It's btrfs_find_last_root() that only checks the objectid, not caring
> key type leads to such problem.
> 
> Fix by doing extra check on key type for such case.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix fuzz-test for bko-161821.raw.txt

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 03:36:09PM +0800, Lu Fengqi wrote:
> Fuzzed image bko-161821.raw cause btrfs check to get segmentation fault.
> 
> The function check_owner_ref attempts to access a non-exist quota tree
> when dealing with extent_item [4198400 4096] in the corrupted filesystem.
> 
> The function btrfs_new_fs_info always allocate memory for
> fs_info->quota_root regardless of whether quota_tree exists or not.
> Additionally, the function btrfs_read_fs_root will directly return
> fs_info->quota_root if location->objectid == BTRFS_QUOTA_TREE_OBJECTID.
> 
> This patch does the following things:
> 1. Do extra check and return ENOENT if quota tree does not exist in the
> function btrfs_read_fs_root.
> 2. Free useless fs_info->quota_root in the function btrfs_setup_all_roots
> to reduce confusion.
> 3. free_extent_buffer even if check_child_node failed in the function
> walk_down_tree.
> 
> Signed-off-by: Lu Fengqi 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: check: Fix memory leak in check_chunks_and_extents

2017-05-02 Thread David Sterba
On Tue, May 02, 2017 at 11:17:25AM +0800, Qu Wenruo wrote:
> fsck/003-shift-offsets makes valgrinds complaining about memory leaks.
> ==5910==
> ==5910== HEAP SUMMARY:
> ==5910== in use at exit: 1,112 bytes in 11 blocks
> ==5910==   total heap usage: 161 allocs, 150 frees, 164,800 bytes allocated
> ==5910==
> ==5910== 216 (72 direct, 144 indirect) bytes in 1 blocks are definitely lost 
> in loss record 3 of 5
> ==5910==at 0x4C2AF1F: malloc (in 
> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==5910==by 0x4815A3: add_root_item_to_list (cmds-check.c:9683)
> ==5910==by 0x481CE2: check_chunks_and_extents (cmds-check.c:9886)
> ==5910==by 0x4B: cmd_check (cmds-check.c:12977)
> ==5910==by 0x40A8C5: main (btrfs.c:246)
> ==5910==
> 
> The check_chunks_and_extents() memory leaks are caused by not freeing
> added root items of normal_trees and dropping_trees.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: tests: fssum, fix memory leak

2017-05-02 Thread David Sterba
On Wed, Apr 26, 2017 at 09:22:41AM +0800, Lu Fengqi wrote:
> Free the alloced memory and close dir before exit.
> 
> Signed-off-by: Lu Fengqi 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Andrei Borzenkov
On Tue, May 2, 2017 at 4:58 PM, Adam Borowski  wrote:
> On Sun, Apr 30, 2017 at 08:47:43AM +0300, Andrei Borzenkov wrote:
>> I'm chasing issue with btrfs mounts under systemd
>> (https://github.com/systemd/systemd/issues/5781) - to summarize, systemd
>> waits for the final device that makes btrfs complete and mounts it using
>> this device name.
>
> Systemd is wrong here -- its approach can possibly work only on clean mounts;
> if you need to mount degraded it will hang forever.  Even worse, if it's not
> the root filesystem, it will "helpfully" unmount it after you mount manually
> (for root fs, it will _try_ to unmount but that obviously fails, resulting
> in nothing but some CPU wasted on trying to unmount over and over).
>
>> But in /proc/self/mountinfo we actually see another
>> device name. Due to peculiarities of systemd implementation this device
>> "does not exist" from systemd PoV.
>>
>> Looking at btrfs code I start to suspect that we actually do not know
>> what device was used to mount it at all.
>>
>> So we always show device with the smallest devid, irrespectively of what
>> device was actually used to mount it.
>
> Devices come and go (ok, it's not like you hot-remove disks every day,
> but...).  Storing the device that started the mount is pointless: btrfs
> can handle removal fine so such a stored device would point nowhere -- or
> worse, to some unrelated innocent disk you put in for data recovery (you may
> have other plans than re-provisioning that raid).
>

Yes, I understand all of this, you do not need to convince me. OTOH
the problem is real - we need to have some way to order btrfs mounts
during bootup. In the past it was solved by delays. Systemd tries to
eliminate ad hoc delays ... which is by itself not bad. So what can be
utilized from btrfs side to implement ordering? We need /something/ to
wait for. It could be virtual device that represents btrfs RAID and
have state online/offline (similar to Linux MD). It could be some
daemon that waits for btrfs to become complete. Do we have something?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix memory leak when 0 sized block group item is found

2017-05-02 Thread David Sterba
On Tue, Apr 25, 2017 at 04:01:17PM +0800, Qu Wenruo wrote:
> When a 0 sized block group item is found, set_extent_bits() will not
> really set any bits.
> While set_state_private() still inserts allocated block group cache into
> block group extent_io_tree.
> 
> So at close_ctree() time, we won't free the private block group cache
> stored since we can't find any bit set for the 0 sized block group.
> 
> To fix it, at btrfs_read_block_groups() we skip any 0 sized block group,
> so such leak won't happen.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: tests: add variable quotation to fsck-tests

2017-05-02 Thread David Sterba
On Mon, Apr 24, 2017 at 06:59:40PM +0530, Lakshmipathi.G wrote:
> Signed-off-by: Lakshmipathi.G 

Applied, thanks. As this was done out of order with other patches,
tests/fsck-tests/026-check-inode-link/test.sh was skipped.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can I see what device was used to mount btrfs?

2017-05-02 Thread Adam Borowski
On Sun, Apr 30, 2017 at 08:47:43AM +0300, Andrei Borzenkov wrote:
> I'm chasing issue with btrfs mounts under systemd
> (https://github.com/systemd/systemd/issues/5781) - to summarize, systemd
> waits for the final device that makes btrfs complete and mounts it using
> this device name.

Systemd is wrong here -- its approach can possibly work only on clean mounts;
if you need to mount degraded it will hang forever.  Even worse, if it's not
the root filesystem, it will "helpfully" unmount it after you mount manually
(for root fs, it will _try_ to unmount but that obviously fails, resulting
in nothing but some CPU wasted on trying to unmount over and over).

> But in /proc/self/mountinfo we actually see another
> device name. Due to peculiarities of systemd implementation this device
> "does not exist" from systemd PoV.
> 
> Looking at btrfs code I start to suspect that we actually do not know
> what device was used to mount it at all.
> 
> So we always show device with the smallest devid, irrespectively of what
> device was actually used to mount it.

Devices come and go (ok, it's not like you hot-remove disks every day,
but...).  Storing the device that started the mount is pointless: btrfs
can handle removal fine so such a stored device would point nowhere -- or
worse, to some unrelated innocent disk you put in for data recovery (you may
have other plans than re-provisioning that raid).


Meow!
-- 
Don't be racist.  White, amber or black, all beers should be judged based
solely on their merits.  Heck, even if occasionally a cider applies for a
beer's job, why not?
On the other hand, corpo lager is not a race.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS warning (device sda7): block group 181491728384 has wrong amount of free space

2017-05-02 Thread Duncan
Marat Khalili posted on Tue, 02 May 2017 09:14:12 +0300 as excerpted:

> I cannot understand two messages in syslog, could someone please shed
> some light? Here they are:
> 
>> Apr 29 08:54:03 container-name kernel: [792742.662375] BTRFS warning
>> (device sda7): block group 181491728384 has wrong amount of free space

>> Apr 29 08:54:03 container-name kernel: [792742.662381] BTRFS warning
>> (device sda7): failed to load free space cache for block group
>> 181491728384, rebuilding it now

> Especially strange is the fact that messages appear in LXC container's
> syslog, but not in syslog of a host system. I only saw network and
> apparmor-related messages in container syslogs before.
> 
> I didn't run any usermode btrfs tools at the time (especially in
> container, since they are not even installed there), but there's a quota
> set for this subvolume, and it was coming close to exhausting by large
> mysql database. There're no snapshots this time. smartmon finds no
> problems.

These aren't anything to be alarmed about in themselves; they're just low 
priority warnings from the free-space *cache*, saying it somehow got out 
of sync with what's actually there.  As that's just a cache, designed to 
help speed up looking for free space without having to actually go and 
check every time, and problems will be caught and that block-group's 
cache invalidated and recreated (with those messages indicating that's 
exactly what's happening) if btrfs finds there's actually data written in 
a block that the cache said was free, no harm done.

Inconsistencies are typically generated in an unclean unmount situation, 
but may not be caught until sometime later, when btrfs goes to actually 
use some space the cache says should be available, that actually isn't.

AFAIK (I'm a btrfs user and list regular, not a dev), it shouldn't have 
anything to do with quota.

I wouldn't worry about it myself, particularly if I hadn't been able to 
do a clean unmount at some point, tho if I saw the warning, I'd probably 
do a btrfs scrub just to be sure that warning wasn't hinting at some more 
real problem somewhere.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: remove an unused variable

2017-05-02 Thread Dan Carpenter
"item" is never used.

Signed-off-by: Dan Carpenter 

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1e4e4532f381..337aef86dae5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5881,7 +5881,6 @@ static int btrfs_real_readdir(struct file *file, struct 
dir_context *ctx)
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_root *root = BTRFS_I(inode)->root;
-   struct btrfs_item *item;
struct btrfs_dir_item *di;
struct btrfs_key key;
struct btrfs_key found_key;
@@ -5932,7 +5931,6 @@ static int btrfs_real_readdir(struct file *file, struct 
dir_context *ctx)
continue;
}
 
-   item = btrfs_item_nr(slot);
btrfs_item_key_to_cpu(leaf, _key, slot);
 
if (found_key.objectid != key.objectid)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: Fix fuzz-test for bko-161821.raw.txt

2017-05-02 Thread Lu Fengqi
Fuzzed image bko-161821.raw cause btrfs check to get segmentation fault.

The function check_owner_ref attempts to access a non-exist quota tree
when dealing with extent_item [4198400 4096] in the corrupted filesystem.

The function btrfs_new_fs_info always allocate memory for
fs_info->quota_root regardless of whether quota_tree exists or not.
Additionally, the function btrfs_read_fs_root will directly return
fs_info->quota_root if location->objectid == BTRFS_QUOTA_TREE_OBJECTID.

This patch does the following things:
1. Do extra check and return ENOENT if quota tree does not exist in the
function btrfs_read_fs_root.
2. Free useless fs_info->quota_root in the function btrfs_setup_all_roots
to reduce confusion.
3. free_extent_buffer even if check_child_node failed in the function
walk_down_tree.

Signed-off-by: Lu Fengqi 
---
 cmds-check.c |  1 +
 disk-io.c| 13 ++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/cmds-check.c b/cmds-check.c
index 17b7efbf..4c7532d0 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2185,6 +2185,7 @@ static int walk_down_tree(struct btrfs_root *root, struct 
btrfs_path *path,
 
ret = check_child_node(cur, path->slots[*level], next);
if (ret) {
+   free_extent_buffer(next);
err = ret;
goto out;
}
diff --git a/disk-io.c b/disk-io.c
index 985c4a9f..6aa6d98a 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -815,7 +815,8 @@ struct btrfs_root *btrfs_read_fs_root(struct btrfs_fs_info 
*fs_info,
if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
return fs_info->csum_root;
if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
-   return fs_info->quota_root;
+   return fs_info->quota_enabled ? fs_info->quota_root :
+   ERR_PTR(-ENOENT);
 
BUG_ON(location->objectid == BTRFS_TREE_RELOC_OBJECTID ||
   location->offset != (u64)-1);
@@ -837,12 +838,14 @@ struct btrfs_root *btrfs_read_fs_root(struct 
btrfs_fs_info *fs_info,
 
 void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
 {
+   if (fs_info->quota_root)
+   free(fs_info->quota_root);
+
free(fs_info->tree_root);
free(fs_info->extent_root);
free(fs_info->chunk_root);
free(fs_info->dev_root);
free(fs_info->csum_root);
-   free(fs_info->quota_root);
free(fs_info->free_space_root);
free(fs_info->super_copy);
free(fs_info->log_root_tree);
@@ -1057,8 +1060,12 @@ int btrfs_setup_all_roots(struct btrfs_fs_info *fs_info, 
u64 root_tree_bytenr,
 
ret = find_and_setup_root(root, fs_info, BTRFS_QUOTA_TREE_OBJECTID,
  fs_info->quota_root);
-   if (ret == 0)
+   if (ret) {
+   free(fs_info->quota_root);
+   fs_info->quota_root = NULL;
+   } else {
fs_info->quota_enabled = 1;
+   }
 
if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
ret = find_and_setup_root(root, fs_info, 
BTRFS_FREE_SPACE_TREE_OBJECTID,
-- 
2.12.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs won't mount any more

2017-05-02 Thread Marc Haber
On Thu, Apr 13, 2017 at 10:45:09AM +0200, Marc Haber wrote:
> I do have a dd copy of the device now.
> 
> $ sudo losetup --find --show ./dropbtr0.btrfs
> $ sudo mount -o skip_balance -t btrfs /dev/loop0 /mnt/tempdisk
> 
> does immediately result in:
> 
> Apr 12 22:37:48 fan kernel: [  124.742104] loop: module loaded
> Apr 12 22:37:48 fan kernel: [  124.784727] BTRFS: device label dropbtr0 devid 
> 1 transid 1530529 /dev/loop0
> Apr 12 22:38:07 fan kernel: [  143.120268] BTRFS info (device loop0): disk 
> space caching is enabled
> Apr 12 22:38:07 fan kernel: [  143.207872] BUG: unable to handle kernel NULL 
> pointer dereference at 01f0

This is now https://bugzilla.kernel.org/show_bug.cgi?id=195631

Greetings
Marc
-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


runtime btrfsck

2017-05-02 Thread Stefan Priebe - Profihost AG
Hello list,

i wanted to check an fs cause it has bad key ordering.

But btrfscheck is now running since 7 days. Current output:
# btrfsck -p --repair /dev/mapper/crypt_md0
enabling repair mode
Checking filesystem on /dev/mapper/crypt_md0
UUID: 37b15dd8-b2e1-4585-98d0-cc0fa2a5a7c9
bad key ordering 39 40
checking extents [O]

FS is a 12TB BTRFS Raid 0 over 3 mdadm Raid 5 devices. How long should
btrfsck run and is there any way to speed it up? btrfs tools is 4.8.5

Thanks!

Greets,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: check: Fix heap use after free.

2017-05-02 Thread Su Yue
fsck/004-no-dir-index makes valgrinds complaining about Invalid read.
==31890== Invalid read of size 1
==31890==at 0x453D09: repair_inode_backrefs (cmds-check.c:2690)
==31890==by 0x453D09: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D09: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890==at 0x4C2C14B: free (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890==by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Block was alloc'd at
==31890==at 0x4C2AF1F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890==by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890==by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890==by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890==by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890==by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==
==31890== Invalid read of size 8
==31890==at 0x452D66: repair_inode_backrefs (cmds-check.c:2731)
==31890==by 0x452D66: check_inode_recs (cmds-check.c:3330)
==31890==by 0x452D66: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Address 0x5cb7b90 is 16 bytes inside a block of size 50 free'd
==31890==at 0x4C2C14B: free (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x453D08: repair_inode_backrefs (cmds-check.c:2684)
==31890==by 0x453D08: check_inode_recs (cmds-check.c:3330)
==31890==by 0x453D08: check_fs_root (cmds-check.c:4012)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==  Block was alloc'd at
==31890==at 0x4C2AF1F: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==31890==by 0x45055C: get_inode_backref (cmds-check.c:1075)
==31890==by 0x45055C: add_inode_backref (cmds-check.c:1097)
==31890==by 0x45180C: process_dir_item (cmds-check.c:1525)
==31890==by 0x45180C: process_one_leaf (cmds-check.c:1838)
==31890==by 0x45180C: walk_down_tree (cmds-check.c:2134)
==31890==by 0x45180C: check_fs_root (cmds-check.c:3957)
==31890==by 0x45E788: check_fs_roots (cmds-check.c:4098)
==31890==by 0x45E788: cmd_check (cmds-check.c:13031)
==31890==by 0x40A88A: main (btrfs.c:246)
==31890==

The repair_inode_backrefs use the backref again after free while iterating over 
backrefs.
So let it continue to next step after free can fix it.

Signed-off-by: Su Yue 
---
 cmds-check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 5cc84690..c910e520 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -2666,6 +2666,7 @@ static int repair_inode_backrefs(struct btrfs_root *root,
repaired++;
list_del(>list);
free(backref);
+   continue;
}
 
if (!delete && !backref->found_dir_index &&
@@ -2676,12 +2677,12 @@ static int repair_inode_backrefs(struct btrfs_root 
*root,
break;
repaired++;
if (backref->found_dir_item &&
-   backref->found_dir_index &&
backref->found_dir_index) {
if (!backref->errors &&
backref->found_inode_ref) {
list_del(>list);
free(backref);
+   continue;
}
}
}
-- 
2.12.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


BTRFS warning (device sda7): block group 181491728384 has wrong amount of free space

2017-05-02 Thread Marat Khalili

Dear all,

I cannot understand two messages in syslog, could someone please shed 
some light? Here they are:


Apr 29 08:54:03 container-name kernel: [792742.662375] BTRFS warning 
(device sda7): block group 181491728384 has wrong amount of free space
Apr 29 08:54:03 container-name kernel: [792742.662381] BTRFS warning 
(device sda7): failed to load free space cache for block group 
181491728384, rebuilding it now
Especially strange is the fact that messages appear in LXC container's 
syslog, but not in syslog of a host system. I only saw network and 
apparmor-related messages in container syslogs before.


I didn't run any usermode btrfs tools at the time (especially in 
container, since they are not even installed there), but there's a quota 
set for this subvolume, and it was coming close to exhausting by large 
mysql database. There're no snapshots this time. smartmon finds no problems.



marat@host:~$ uname -a
Linux host 4.4.0-72-generic #93-Ubuntu SMP Fri Mar 31 14:07:41 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux

marat@host:~$ lsb_release -a
No LSB modules are available.
Distributor ID:Ubuntu
Description:Ubuntu 16.04.2 LTS
Release:16.04
Codename:xenial
marat@host:~$ btrfs --version
btrfs-progs v4.4
marat@host:~$ sudo btrfs qgroup show -F -pcre 
/mnt/lxc/container-name/rootfs

qgroupid rfer excl max_rfer max_excl parent  child
     --  -
0/80263.93GiB 63.93GiB 64.00GiB none --- ---
marat@host:~$ sudo btrfs filesystem show /dev/sda7 # run after freeing 
space by clearing database

Label: 'data'  uuid: 37d3313a-e2ad-4b7f-98fc-a01d815952e0
Total devices 2 FS bytes used 47.73GiB
devid1 size 2.71TiB used 114.01GiB path /dev/sda7
devid2 size 2.71TiB used 114.01GiB path /dev/sdb7
marat@host:~$ sudo btrfs filesystem df /mnt/lxc/container-name/rootfs# 
run after freeing space by clearing database

Data, RAID1: total=111.00GiB, used=46.83GiB
System, RAID1: total=8.00MiB, used=32.00KiB
Metadata, RAID1: total=3.00GiB, used=983.11MiB
GlobalReserve, single: total=336.00MiB, used=0.00B
marat@host:~$ sudo lxc-attach -n container-name cat /proc/mounts | 
grep sda7
/dev/sda7 / btrfs 
rw,relatime,space_cache,subvolid=802,subvol=/lxc/container-name/rootfs 0 0


--

With Best Regards,
Marat Khalili

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html