Re: Qgroups are not applied when snapshotting a subvol?

2017-03-26 Thread Qu Wenruo



At 03/27/2017 11:26 AM, Andrei Borzenkov wrote:

27.03.2017 03:39, Qu Wenruo wrote:



At 03/26/2017 06:03 AM, Moritz Sichert wrote:

Hi,

I tried to configure qgroups on a btrfs filesystem but was really
surprised that when you snapshot a subvolume, the snapshot will not be
assigned to the qgroup the subvolume was in.

As an example consider the small terminal session in the attachment: I
create a subvol A, assign it to qgroup 1/1 and set a limit of 5M on
that qgroup. Then I write a file into A and eventually get "disk quota
exceeded". Then I create a snapshot of A and call it B. B will not be
assigned to 1/1 and writing a file into B confirms that no limits at
all are imposed for B.

I feel like I must be missing something here. Considering that
creating a snapshot does not require root privileges this would mean
that any user can just circumvent any quota and therefore make them
useless.

Is there a way to enforce quotas even when a user creates snapshots?



Yes, there is always a method to attach the subvolume/snapshot to a
specified higher level qgroup.

Just use "btrfs subvolume snapshot -i 1/1".



This requires cooperation from whoever creates the subvolume, while the
question was: is it possible to enforce it without the need for an explicit
option/action when the snapshot is created?

To reiterate: if the user omits "-i 1/1", (s)he "escapes" from quota
enforcement.


What if the user really wants to create a subvolume assigned to another qgroup?

You're implying a *policy*: if the source subvolume belongs to a higher
level qgroup, then the snapshot created from it should also follow that
higher level qgroup.


However, the kernel should only provide the *mechanism*, not the *policy*.
And btrfs does that: it provides the method; whether to use it or not is the
user's responsibility.


If you want to implement that policy, please do it at a higher level, in
something like SUSE snapper, not in the kernel.
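
Purely as an illustration of carrying that policy in userspace rather than
in the kernel, a hypothetical wrapper script could look like this (the 1/1
qgroup id comes from this thread; the script itself is invented):

  #!/bin/sh
  # snap-with-quota.sh -- hypothetical wrapper that always applies the
  # site policy of putting new snapshots into qgroup 1/1.
  # Usage: snap-with-quota.sh <source-subvol> <dest>
  set -e
  SRC="$1"; DST="$2"
  btrfs subvolume snapshot -i 1/1 "$SRC" "$DST"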


Thanks,

Qu




Re: Qgroups are not applied when snapshotting a subvol?

2017-03-26 Thread Andrei Borzenkov
27.03.2017 03:39, Qu Wenruo wrote:
> 
> 
> At 03/26/2017 06:03 AM, Moritz Sichert wrote:
>> Hi,
>>
>> I tried to configure qgroups on a btrfs filesystem but was really
>> surprised that when you snapshot a subvolume, the snapshot will not be
>> assigned to the qgroup the subvolume was in.
>>
>> As an example consider the small terminal session in the attachment: I
>> create a subvol A, assign it to qgroup 1/1 and set a limit of 5M on
>> that qgroup. Then I write a file into A and eventually get "disk quota
>> exceeded". Then I create a snapshot of A and call it B. B will not be
>> assigned to 1/1 and writing a file into B confirms that no limits at
>> all are imposed for B.
>>
>> I feel like I must be missing something here. Considering that
>> creating a snapshot does not require root privileges this would mean
>> that any user can just circumvent any quota and therefore make them
>> useless.
>>
>> Is there a way to enforce quotas even when a user creates snapshots?
>>
> 
> Yes, there is always a method to attach the subvolume/snapshot to a
> specified higher level qgroup.
> 
> Just use "btrfs subvolume snapshot -i 1/1".
> 

This requires cooperation from whoever creates the subvolume, while the
question was: is it possible to enforce it without the need for an explicit
option/action when the snapshot is created?

To reiterate: if the user omits "-i 1/1", (s)he "escapes" from quota
enforcement.




Re: [PATCH v2 5/5] btrfs: raid56: Use bio_counter to protect scrub

2017-03-26 Thread Qu Wenruo



At 03/25/2017 07:21 AM, Liu Bo wrote:

On Fri, Mar 24, 2017 at 10:00:27AM +0800, Qu Wenruo wrote:

Unlike other places calling btrfs_map_block(), in raid56 scrub we don't
use bio_counter to protect against races with dev replace.

This patch uses bio_counter to provide protection from the beginning of
the btrfs_map_sblock() call until rbio endio.

Liu Bo 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/raid56.c | 2 ++
 fs/btrfs/scrub.c  | 5 +
 2 files changed, 7 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 1571bf26dc07..3a083165400f 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2642,6 +2642,7 @@ static void async_scrub_parity(struct btrfs_raid_bio *rbio)

 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 {
+   rbio->generic_bio_cnt = 1;


To keep consistent with other places, can you please do this setting when
allocating rbio?


No problem.




if (!lock_stripe_add(rbio))
async_scrub_parity(rbio);
 }
@@ -2694,6 +2695,7 @@ static void async_missing_raid56(struct btrfs_raid_bio *rbio)

 void raid56_submit_missing_rbio(struct btrfs_raid_bio *rbio)
 {
+   rbio->generic_bio_cnt = 1;
if (!lock_stripe_add(rbio))
async_missing_raid56(rbio);
 }


Ditto.


diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2a5458004279..265387bf3af8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2379,6 +2379,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
int ret;
int i;

+   btrfs_bio_counter_inc_blocked(fs_info);
ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
		       &length, &bbio, 0, 1);
if (ret || !bbio || !bbio->raid_map)
@@ -2423,6 +2424,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 rbio_out:
bio_put(bio);
 bbio_out:
+   btrfs_bio_counter_dec(fs_info);
btrfs_put_bbio(bbio);
	spin_lock(&sctx->stat_lock);
sctx->stat.malloc_errors++;
@@ -2966,6 +2968,8 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
goto out;

length = sparity->logic_end - sparity->logic_start;
+
+   btrfs_bio_counter_inc_blocked(fs_info);
ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, sparity->logic_start,
		       &length, &bbio, 0, 1);
if (ret || !bbio || !bbio->raid_map)
@@ -2993,6 +2997,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 rbio_out:
bio_put(bio);
 bbio_out:
+   btrfs_bio_counter_dec(fs_info);
btrfs_put_bbio(bbio);
bitmap_or(sparity->ebitmap, sparity->ebitmap, sparity->dbitmap,
  sparity->nsectors);
--
2.12.1





If patches 4 and 5 are still supposed to fix the same problem, can you please
merge them into one patch so that a future bisect can be precise?


Yes, they are still fixing the same problem, and tests have already shown
that the fix is working. (We found a physical machine on which the normal
btrfs/069 test can easily trigger it.)


I'll merge them into one patch in the next version.



And while I believe this fixes the crash described in patch 4,
scrub_setup_recheck_block() also retrieves all stripes, and if we scrub
one device while another device is being replaced (so it could be freed
during scrub), is that another potential race case?


Seems to be another race. It can only be triggered when a corruption
is detected, while the current test case doesn't include such a corruption
scenario (unless using a degraded mount), so we haven't encountered it yet.


Although I can fix it in the next update, I'm afraid we won't have a proper
test case for it until we have a good enough btrfs-corrupt-block.


Thanks,
Qu



Thanks,

-liubo







[PATCH] btrfs-progs: Cleanup kernel-shared dir when executing make clean

2017-03-26 Thread Qu Wenruo
Reported-by: Chris 
Signed-off-by: Qu Wenruo 
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index fdc63c3a..d1af4311 100644
--- a/Makefile
+++ b/Makefile
@@ -502,6 +502,7 @@ clean: $(CLEANDIRS)
@echo "Cleaning"
$(Q)$(RM) -f -- $(progs) *.o *.o.d \
kernel-lib/*.o kernel-lib/*.o.d \
+   kernel-shared/*.o kernel-shared/*.o.d \
image/*.o image/*.o.d \
convert/*.o convert/*.o.d \
mkfs/*.o mkfs/*.o.d \
-- 
2.12.1





Re: Qgroups are not applied when snapshotting a subvol?

2017-03-26 Thread Qu Wenruo



At 03/26/2017 06:03 AM, Moritz Sichert wrote:

Hi,

I tried to configure qgroups on a btrfs filesystem but was really surprised 
that when you snapshot a subvolume, the snapshot will not be assigned to the 
qgroup the subvolume was in.

As an example consider the small terminal session in the attachment: I create a subvol A, 
assign it to qgroup 1/1 and set a limit of 5M on that qgroup. Then I write a file into A 
and eventually get "disk quota exceeded". Then I create a snapshot of A and 
call it B. B will not be assigned to 1/1 and writing a file into B confirms that no 
limits at all are imposed for B.
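
The attached session is not reproduced in the archive; a minimal sketch of
a session of that shape, with an assumed mount point and an assumed qgroup
id for A, might look like this:

  btrfs quota enable /mnt
  btrfs subvolume create /mnt/A
  btrfs qgroup create 1/1 /mnt
  btrfs qgroup assign 0/257 1/1 /mnt    # 0/257 = A's own qgroup id (assumed)
  btrfs qgroup limit 5M 1/1 /mnt
  dd if=/dev/zero of=/mnt/A/f bs=1M count=10   # eventually: Disk quota exceeded
  btrfs subvolume snapshot /mnt/A /mnt/B       # B is *not* added to 1/1
  dd if=/dev/zero of=/mnt/B/f bs=1M count=10   # succeeds; no limit applies to B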

I feel like I must be missing something here. Considering that creating a 
snapshot does not require root privileges this would mean that any user can 
just circumvent any quota and therefore make them useless.

Is there a way to enforce quotas even when a user creates snapshots?



Yes, there is always a method to attach the subvolume/snapshot to a
specified higher level qgroup.


Just use "btrfs subvolume snapshot -i 1/1".

Thanks,
Qu



Moritz







Re: send snapshot from snapshot incremental

2017-03-26 Thread Peter Grandi
[ ... ]
> BUT if I take a snapshot of the system and want to transfer
> it to the external HD, I cannot set a parent subvolume,
> because there isn't any.

Questions like this are based on an incomplete understanding of
'send' and 'receive'; on IRC, user "darkling" explained it
fairly well:

> When you use -c, you're telling the FS that it can expect to
> find a sent copy of that subvol on the receiving side, and
> that anything shared with it can be sent by reference. OK, so
> with -c on its own, you're telling the FS that "all the data
> in this subvol already exists on the remote".

> So, when you send your subvol, *all* of the subvol's metadata
> is sent, and where that metadata refers to an extent that's
> shared with the -c subvol, the extent data isn't sent, because
> it's known to be on the other end already, and can be shared
> directly from there.

> OK. So, with -p, there's a "base" subvol. The send subvol and
> the -p reference subvol are both snapshots of that base (at
> different times). The -p reference subvol, as with -c, is
> assumed to be on the remote FS. However, because it's known to
> be an earlier version of the same data, you can be more
> efficient in the sending by saying "start from the earlier
> version, and modify it in this way to get the new version"

> So, with -p, not all of the metadata is sent, because you know
> you've already got most of it on the remote in the form of the
> earlier version.

> So -p is "take this thing and apply these differences to it"
> and -c is "build this thing from scratch, but you can share
> some of the data with these sources"

Also, here are some additional details:

  http://logs.tvrrug.org.uk/logs/%23btrfs/2016-06-29.html#2016-06-29T22:39:59

The requirement for read-only snapshots exists because that way it is
reasonably certain that the same content is present on both the origin
and the target volume.
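
As a hedged illustration of the two modes described above (the paths and
snapshot names are invented for the example):

  # read-only snapshots on the source, as 'send' requires
  btrfs subvolume snapshot -r /data /data/.snap-new

  # -p: incremental; the earlier snapshot is already on the target,
  # so only the differences are sent
  btrfs send -p /data/.snap-old /data/.snap-new | btrfs receive /backup

  # -c: full metadata is sent, but extents shared with the clone source,
  # which is known to exist on the target, are sent by reference
  btrfs send -c /data/.snap-old /data/.snap-new | btrfs receive /backup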

It may help to compare with RSYNC: it has to scan both the full
origin and target trees, because it cannot be told that there is
a parent tree that is the same on origin and target; but with
option '--link-dest' it can do something similar to 'send -c'.
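
For example, a minimal '--link-dest' sketch (directory names invented):

  # files unchanged since the previous run are hard-linked instead of copied,
  # roughly analogous to how 'send -c' shares extents already on the target
  rsync -a --link-dest=/backup/2017-03-25/ /data/ /backup/2017-03-26/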


Missing __error symbol in btrfs-progs-4.10

2017-03-26 Thread Mike Gilbert
A Gentoo user reports that snapper broke after upgrading to btrfs-progs-4.10.

snapper: symbol lookup error: /usr/lib64/libbtrfs.so.0: undefined
symbol: __error

https://bugs.gentoo.org/show_bug.cgi?id=613890

It seems that the __error symbol is referenced, but not included in libbtrfs.

Exporting this symbol seems like a bad idea, so I would suggest
renaming it or going back to inlining it in the headers.


Re: backing up a file server with many subvolumes

2017-03-26 Thread Peter Grandi
> [ ... ] In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem.  [ ... ] The scheme
> is something like this:

> /backup/<machine>/<filesystem>/<snapshot>

BTW, hopefully this does not amount to too many subvolumes in
the '.../backup/' volume, because that can create complications;
"too many" IIRC is more than a few dozen (even if a low
number of hundreds is still doable).

> I'd like to try to back up (duplicate) the file server
> filesystem containing these snapshot subvolumes for each
> remote machine. The problem is that I don't think I can use
> send/receive to do this. "Btrfs send" requires "read-only"
> snapshots, and snapshots are not recursive as yet.

Why is that a problem? What is a recursive snapshot?

> I think there are too many subvolumes which change too often
> to make doing this without recursion practical.

It is not clear to me how the «incremental snapshot subvolumes
for that filesystem» are made, whether with RSYNC or with 'send' and
'receive' itself. It is also not clear to me why those snapshots
«change too often»: why would they change at all? Once a backup
is made, in whichever way, to an «incremental snapshot», why would
that «incremental snapshot» ever change, other than by being deleted?

There are some tools that rely on the specific abilities of
'send' with options '-p' and '-c' to save a lot of network
bandwidth and target storage space; perhaps you might be
interested in searching for them.

Anyhow, I'll repeat here part of an answer to a similar message:
issues like yours are usually based on an incomplete understanding
of 'send' and 'receive'; on IRC, user "darkling" explained it
fairly well:

> When you use -c, you're telling the FS that it can expect to
> find a sent copy of that subvol on the receiving side, and
> that anything shared with it can be sent by reference. OK, so
> with -c on its own, you're telling the FS that "all the data
> in this subvol already exists on the remote".

> So, when you send your subvol, *all* of the subvol's metadata
> is sent, and where that metadata refers to an extent that's
> shared with the -c subvol, the extent data isn't sent, because
> it's known to be on the other end already, and can be shared
> directly from there.

> OK. So, with -p, there's a "base" subvol. The send subvol and
> the -p reference subvol are both snapshots of that base (at
> different times). The -p reference subvol, as with -c, is
> assumed to be on the remote FS. However, because it's known to
> be an earlier version of the same data, you can be more
> efficient in the sending by saying "start from the earlier
> version, and modify it in this way to get the new version"

> So, with -p, not all of the metadata is sent, because you know
> you've already got most of it on the remote in the form of the
> earlier version.

> So -p is "take this thing and apply these differences to it"
> and -c is "build this thing from scratch, but you can share
> some of the data with these sources"

Also, here are some additional details:

  http://logs.tvrrug.org.uk/logs/%23btrfs/2016-06-29.html#2016-06-29T22:39:59

The requirement for read-only snapshots exists because that way it is
reasonably certain that the same content is present on both the origin
and the target volume.

It may help to compare with RSYNC: it has to scan both the full
origin and target trees, because it cannot be told that there is
a parent tree that is the same on origin and target; but with
option '--link-dest' it can do something similar to 'send -c'.


Re: backing up a file server with many subvolumes

2017-03-26 Thread Adam Borowski
On Sun, Mar 26, 2017 at 02:14:36PM +0500, Roman Mamedov wrote:
> You could have done time-based snapshots on the top level (for /backup/), say,
> every 6 hours, and keep those for e.g. a month. Then don't bother with any
> other kind of subvolumes/snapshots on the backup machine, and do backups from
> remote machines into their respective subdirectories using simple 'rsync'.
> 
> That's what a sensible scheme looks like IMO, as opposed to a Btrfs-induced
> exercise in futility that you have (there are subvolumes? must use them for
> everything, even the frigging /boot/; there is send/receive? absolutely must
> use it for backing up; etc.)

Using old boring rsync is actually a pretty good idea, with caveats.

I for one don't herd server farms, thus the systems I manage tend to be special
snowflakes.  Some run modern btrfs, some are on ancient kernels, usually /
is on mdraid with a traditional filesystem, and I have a bunch of ARM SoCs at
home -- plus even an ARM hosted server at Scaleway.  Standardizing on rsync
lets me make all those snowflakes back up the same way.  Only on the
destination do I make full use of btrfs features.

Another benefit of rsync is that I don't exactly trust that send from 3.13
to receive on 4.9 won't have a data loss bug, while rsync is extremely well
tested.

On the other hand, rsync is _slow_.  Mere stat() calls on a non-trivial
piece of spinning rust can take half an hour.  That's something that's fine
in a nightly, but what if you want to back up important stuff every 3 hours?
Especially if those are, say, Maildir mails -- many many files to stat,
almost all of them cold.  Here send/receive shines.

And did I say that's important stuff?  So you send/receive to one target
every 3 hours, and rsync nightly to another.
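
A hedged sketch of that split schedule (paths, targets and the bookkeeping
of the previous snapshot are all assumptions):

  # every 3 hours: read-only snapshot, then incremental send to the fast target
  NOW=$(date +%Y%m%d%H)
  btrfs subvolume snapshot -r /srv/mail /srv/mail/.snap-$NOW
  btrfs send -p /srv/mail/.snap-prev /srv/mail/.snap-$NOW \
      | ssh target1 btrfs receive /backup/mail

  # nightly: plain rsync to the second, independent target
  rsync -a /srv/mail/ target2:/backup/mail/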

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄ preimage for double rot13!


btrfs-progs clean is incomplete

2017-03-26 Thread Chris
"make clean-all" leaves generated files under the "kernel-shared" directory.

This causes problems when trying to build in the same tree after the
absolute path changes.

For example: if you mount the source tree in one location,
generate/build, then mount it in a different location, run clean-all and
re-build, an error appears: "No rule to make target
'.../kerncompat.h', needed by 'kernel-shared/ulist.o'.  Stop."  This
happens because the "kernel-shared/ulist.o.d" file was not cleaned and
lists an incorrect path.
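
A hedged illustration of how the leftovers show up (paths illustrative):

  make             # writes kernel-shared/*.o and *.o.d with absolute paths
  make clean-all   # before the fix, kernel-shared/ is not cleaned
  ls kernel-shared/*.o.d   # stale dependency files still list the old path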

Chris


Re: backing up a file server with many subvolumes

2017-03-26 Thread Roman Mamedov
On Sat, 25 Mar 2017 23:00:20 -0400
"J. Hart"  wrote:

> I have a Btrfs filesystem on a backup server.  This filesystem has a 
> directory to hold backups for filesystems from remote machines.  In this 
> directory is a subdirectory for each machine.  Under each machine 
> subdirectory is one directory for each filesystem (ex /boot, /home, etc) 
> on that machine.  In each filesystem subdirectory are incremental 
> snapshot subvolumes for that filesystem.  The scheme is something like 
> this:
> 
> /backup/<machine>/<filesystem>/<snapshot>
> 
> I'd like to try to back up (duplicate) the file server filesystem 
> containing these snapshot subvolumes for each remote machine.  The 
> problem is that I don't think I can use send/receive to do this. "Btrfs 
> send" requires "read-only" snapshots, and snapshots are not recursive as 
> yet.  I think there are too many subvolumes which change too often to 
> make doing this without recursion practical.

You could have done time-based snapshots on the top level (for /backup/), say,
every 6 hours, and keep those for e.g. a month. Then don't bother with any
other kind of subvolumes/snapshots on the backup machine, and do backups from
remote machines into their respective subdirectories using simple 'rsync'.
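
A minimal sketch of that kind of scheme (paths, schedule and the pull
direction are invented for the example):

  # on the backup server, e.g. every 6 hours from cron:
  btrfs subvolume snapshot -r /backup /backup/.snapshots/$(date +%Y%m%d-%H)

  # from the backup server, pull each remote machine into its own directory:
  rsync -a --delete root@host1:/home/ /backup/host1/home/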

That's what a sensible scheme looks like IMO, as opposed to a Btrfs-induced
exercise in futility that you have (there are subvolumes? must use them for
everything, even the frigging /boot/; there is send/receive? absolutely must
use it for backing up; etc.)

-- 
With respect,
Roman