Re: difference between -c and -p for send-receive?

2017-09-18 Thread Duncan
Dave posted on Mon, 18 Sep 2017 20:41:45 -0400 as excerpted:

>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>>
>>
> Starting months ago when I began using btrfs seriously, I have been
> reading, rereading and trying to understand this:
> 
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
> 
> The comment above suddenly gives me another clue...
> 
> However, I still don't understand terms like "clone range ioctl",
> although I can guess it is something like a hard link.
> 
> Would it be correct to say the following?
> 
> 1. "-c" causes (appropriate) files in the newly transferred snapshot to
> be "hard linked" to existing files in another snapshot on the
> destination.

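Technically, it's not a hard link but a reflink.  The hard-link analogy is 
reasonably accurate for understanding the process, though; a reflink just 
works at a different layer, sharing data extents between separate files 
rather than sharing one inode between several names.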
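
A minimal sketch of that difference, assuming GNU cp and a scratch 
directory (the demo_links name is made up for illustration).  On btrfs, 
cp --reflink=auto shares extents; elsewhere it falls back to a plain copy, 
which looks the same from the outside:

```shell
# Contrast a hard link with a reflink-style copy in a scratch directory.
demo_links() {
    dir=$(mktemp -d) || return 1
    printf 'original\n' > "$dir/a"
    ln "$dir/a" "$dir/hard"                  # second name for the same inode
    cp --reflink=auto "$dir/a" "$dir/ref" 2>/dev/null \
        || cp "$dir/a" "$dir/ref"            # separate inode, maybe shared extents
    printf 'changed\n' > "$dir/a"            # rewrite a in place (same inode)
    echo "hard: $(cat "$dir/hard")"
    echo "ref: $(cat "$dir/ref")"
    rm -rf "$dir"
}
```

Calling demo_links shows the hard link following the change while the 
reflink copy keeps the old content: sharing extents saves space but does 
not tie the two files together.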

> Doesn't "-p" do something equivalent though?

Yes.  See below for the difference.

> 2. The -c and -p options can be used together or individually.

Yes.

> Questions:
> 
> If "-c" "will send all of the metadata of @B.1, but will leave out the
> data for @B.1/bigfile, because it's already in the backups filesystem,
> and can be reflinked from there" what will -p do in contrast?
> 
> Will "-p" not send all the metadata?
> 
> Will "-p" also leave out the data for @B.1/bigfile, when it's also
> already in the backups?

-c is less strict than -p, and sends more metadata over the wire as a 
result, but where the data is the same (reflink points to the same 
extent), it won't be sent in either case.  See below.

> What would make me choose one of these options over the other? I still
> struggle to see the difference.

What -p does is tell send that the named snapshot is a snapshot of an 
earlier state of the snapshot being sent, and that said earlier-state 
snapshot exists on both the send and receive end, so only the changes 
(both data and metadata) from the earlier snapshot must be sent.

Put a different way, the snapshot being sent is the parent, plus any 
changes since then, so to recreate the new snapshot, only the operations 
needed to update the state from the previous to the new state must be 
sent, and done by receive on the other end.
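
As a rough sketch of the -p case (the paths and the function name are 
hypothetical, and the function only prints the pipeline rather than 
running it):

```shell
# Print (not run) the pipeline for an incremental send of snapshot $2
# relative to parent snapshot $1, received into directory $3.
# The parent must be read-only and already exist on the receiving side.
incremental_send_cmd() {
    printf 'btrfs send -p %s %s | btrfs receive %s\n' "$1" "$2" "$3"
}
```

For example, incremental_send_cmd /mnt/src/@home.1 /mnt/src/@home.2 
/mnt/backup prints a pipeline that transfers only the changes between 
@home.1 and @home.2.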

-c is less strict than -p.  It doesn't consider the named snapshot to be 
an earlier state of the snapshot being sent, but simply says that the two 
may have some data in common, as defined by reflinks to the same shared 
extents.

So -c will send more over the wire, in particular much more metadata.  I 
believe (being no dev or expert, just a list regular) it sends essentially 
all metadata, because nothing is assumed about how the metadata of the 
snapshot being sent relates to that of the clone source.  But it can and 
does still assume that any extents reflinked in common can be sent by 
reference instead of as literal data, because -c says the other end 
already has the named clone-source snapshot and can simply reflink from it 
there as well.

The wording of the manpage description for -c suggests that it picks one 
(and only one if there's more than one) -c clone and considers it a 
parent, which would allow it to shortcut sending the metadata in common 
for it as well, but not being a dev, I haven't looked at the code to be 
sure, and in any case, there can be only one parent, so it can do it for 
only one clone, even if there's more than one -c snapshot supplied.


So -p is primarily for the case where the named snapshot is an earlier 
state of the one being sent, and should be much more efficient than -c in 
that case.  However, the less strict -c should also work, and if the 
wording of the manpage can be believed, a single named -c snapshot will 
be treated as -p anyway.  But -c can also be used for snapshots that 
aren't related with one being an earlier state of the other, where 
there's simply some reflinks in common, perhaps due to dedup.  It should 
still result in the data with the common reflinks being only sent by 
reference, but much more metadata will be sent, and if there's not a lot 
of reflinks in common, it's likely to require enough additional 
processing that the relatively trivial amount of common reflinked data it 
might save may not be worth it, compared to simply sending a full non-
incremental snapshot.
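
To make the -c case concrete, here is a small sketch that only prints the 
pipeline; the function name and any paths you pass are hypothetical, and 
every clone source is assumed to exist read-only on both sides:

```shell
# Print (not run) a send of snapshot $1 received into directory $2.
# Every remaining argument is named as a -c clone source.
clone_send_cmd() {
    snap=$1
    dest=$2
    shift 2
    opts=
    for src in "$@"; do
        opts="$opts -c $src"
    done
    printf 'btrfs send%s %s | btrfs receive %s\n' "$opts" "$snap" "$dest"
}
```

With no clone sources the same function prints a plain full send, which 
is the fallback suggested above when there are few reflinks in common.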

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Help Recovering BTRFS array

2017-09-18 Thread Duncan
grondinm posted on Mon, 18 Sep 2017 14:14:08 -0300 as excerpted:

> superblock: bytenr=65536, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 65536
> 
> superblock: bytenr=67108864, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 67108864
> 
> superblock: bytenr=274877906944, device=/dev/md0
> -
> ERROR: bad magic on superblock on /dev/md0 at 274877906944
> 
> Now i'm really panicked. Is the FS toast? Can any recovery be attempted?

First, I'm a user and list regular, not a dev.  With luck they can help 
beyond the suggestions below...

However, there's no need to panic in any case, due to the sysadmin's 
first rule of backups: The true value of any data is defined by the 
number of backups of that data you consider(ed) it worth having.

As a result, there are precisely two possibilities, neither one of which 
calls for panic.

1) No need to panic because you have a backup, and recovery is as simple 
as restoring from that backup.

2) You don't have a backup, in which case the lack of that backup means 
you have defined the value of the data as only trivial, worth less than 
the time/trouble/resources you saved by not making that backup.  Because 
the data is only of trivial value anyway, and you saved the more valuable 
assets of the time/trouble/resources you would have put into that backup 
were the data of more than trivial value, you've still saved the stuff 
you considered most valuable, so again, no need to panic.

It's a binary state.  There's no third possibility available, and no 
possibility you lost what your actions, or lack of them in the case of no 
backup, defined as of most value to you.

(As for the freshness of that backup, the same rule applies, but to the 
data delta between the state as of the backup and the current state.  If 
the value of the changed data is worth it to you to have it backed up, 
you'll have freshened your backup.  If not, you defined it to be as of 
such trivial value as to not be worth the time/trouble/resources to do 
so.)


That said, at the time you're calculating the value of the data against 
the value of the time/trouble/resources required to back it up, the loss 
potential remains theoretical.  Once something actually happens to the 
data, it's no longer theoretical, and the data, while of trivial enough 
value to be worth the risk when it was theoretical, may still be valuable 
enough to you to spend at least some time/trouble on trying to recover it.

In that case, since you can still mount, I'd suggest mounting read-only 
to prevent any further damage, and then copying whatever data you can to 
a different, unaffected filesystem.
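
A minimal sketch of those two steps, printing the commands rather than 
running them (the function name is made up, and the mount point and 
target directory are whatever you choose):

```shell
# Print (not run) the commands to mount device $1 read-only on $2 and
# copy its contents, preserving attributes, into $3 on another filesystem.
rescue_copy_cmds() {
    printf 'mount -o ro %s %s\n' "$1" "$2"
    printf 'cp -a %s/. %s/\n' "$2" "$3"
}
```

For the array in this thread that would be something like 
rescue_copy_cmds /dev/md0 /mnt/broken /mnt/rescue.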

Then if there's still data you want that you couldn't simply copy off, 
you can try btrfs restore.  While I do have backups here, a couple times 
when things went bad, btrfs restore was able to get back pretty much 
everything to current, while were I to have had to restore from backups, 
I'd have lost enough changed data to hurt, even if I had defined it as of 
trivial enough value when the risk remained theoretical that I hadn't yet 
freshened the backup.  (Since then I upgraded the rest of my storage to 
ssd, thus lowering the time and hassle cost of backups, encouraging me to 
do them more frequently.  Talking about which, I need to freshen them in 
the near future.  It's now on my list for my next day off...)
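
For btrfs restore, a cautious sequence might look like the sketch below. 
It only prints the commands; the function name and target directory are 
hypothetical, and restore is meant to be run against the unmounted device:

```shell
# Print (not run) a two-step btrfs restore from device $1 into directory $2:
# first a dry run that only lists what would be recovered (-D), then the
# real run, verbose (-v) and continuing past errors (-i).
restore_cmds() {
    printf 'btrfs restore -D -v %s %s\n' "$1" "$2"
    printf 'btrfs restore -v -i %s %s\n' "$1" "$2"
}
```

The dry run is cheap and shows whether restore can find anything at all 
before you commit space on the rescue filesystem.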

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Duncan
Tomasz Chmielewski posted on Mon, 18 Sep 2017 18:27:09 +0900 as excerpted:

> And perhaps more important - can I assume that right now, with the
> latest stable kernel (4.13.2 right now), running "btrfs balance" is not
> safe and can lead to data corruption or loss?
> 
> 
> Consider the following case:
> 
> - system admin runs btrfs balance on a filesystem with 100 GB free and
> assumes it is enough space to complete successfully
> 
> - btrfs balance fails due to some bug with "No space left on device"
> 
> - at the same time, a database using this filesystem will fail with "No
> space left on device", apt/rpm will fail a package upgrade, some program
> using temp space will fail, log collector will fail to catch some data,
> because of "No space left on device" and so on?

To the best of my knowledge that shouldn't be a problem, and certainly not 
one I'd worry about if you're following the sysadmin's first rule of 
backups: the true value of data to you is defined not by any claims but 
by the number of backups of that data you consider it worth having.  It 
follows that no backups means you've defined the data as worth less than 
the time/trouble/resources it would take to create at least that one 
backup.

The ENOSPC is because the internal calculation for the reserved-space 
requirement is buggy ATM, but AFAIK it's just that, an /internal/ 
calculation that goes waayyy wild.  It stops the operation before the 
space is actually reserved, so it never gets to the point of affecting 
anything else.

Talking about which... I've not seen it mentioned in the bug discussion, 
but I wonder if doing a btrfs balance start -d, followed by another 
balance with -m replacing the -d, thus separating the data and metadata 
balances, might work around the problem.  At least that way you'd know for 
sure which one is causing it, and could complete a balance of the other.  
And if that blocks on one or the other, you could split the job up 
further using the devid= and drange= filters (see the btrfs-balance 
manpage), doing only part of the filesystem at a time.  My speculation is 
that you should be able to divide the operation up enough so that even if 
the reserve space calculation is off, it'll still complete.
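
Spelled out, the idea looks roughly like this.  It is an untested sketch 
that only prints the commands; the function names are made up, and the 
byte range is whatever slice you decide to try:

```shell
# Print (not run) separate data and metadata balances for mount point $1.
balance_split_cmds() {
    printf 'btrfs balance start -d %s\n' "$1"
    printf 'btrfs balance start -m %s\n' "$1"
}

# Print (not run) a data balance of mount point $1 limited to device id $2
# and to block groups overlapping the byte range $3..$4 on that device.
balance_slice_cmd() {
    printf 'btrfs balance start -ddevid=%s,drange=%s..%s %s\n' \
        "$2" "$3" "$4" "$1"
}
```

For instance, balance_slice_cmd /mnt 1 0 107374182400 prints a balance 
restricted to the first 100 GiB of device 1.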

Meanwhile, I don't believe it's just balance that's affected, either, tho 
it's the most commonly reported.  By my understanding, any sufficiently 
large operation could trigger it, tho obviously a full btrfs balance is 
about the largest operation a btrfs is likely to have, so it stands to 
reason that would trigger it more reliably than common generic filesystem 
operations.

Of course if you're paranoid, you can refrain from doing balances until 
you know the bug is fixed, but then I'd have to ask, if you're that 
paranoid of a filesystem failure, why are you running the still 
stabilizing, not yet entirely stable and mature, btrfs, in the first 
place?  Seems a bit like the folks still running RHEL/CentOS 6 with their 
stable kernels because they want stability, yet choosing to run the still 
not entirely stable btrfs, definitely not entirely stable on that old a 
kernel, on top of them.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



[PATCH] btrfs: cleanup mount path

2017-09-18 Thread Misono, Tomohiro
Summary:
Clean up the mount path by avoiding calling btrfs_mount() twice.
This makes the code easier to understand; there is no functional change.

Explanation:
btrfs uses mount_subtree() to mount a subvolume directly.  This function
needs a vfsmount* of device's root (/), which is a return value of
vfs_kern_mount() (therefore root has to be mounted internally anyway).

The current approach to getting root's vfsmount* at mount time is a bit tricky:
1. the mount system call calls vfs_kern_mount() on the way
2. btrfs_mount() is called
3. btrfs_parse_early_options() parses the "subvolid=" mount option and sets
   subvol_objectid to its value. Otherwise, subvol_objectid keeps its
   initial value of 0
4. check whether subvol_objectid is 5. This time the id is not 5, and
   btrfs_mount() returns by calling mount_subvol()
5. In mount_subvol(), the original mount options are modified to contain
   "subvolid=0" in setup_root_args(). Then, vfs_kern_mount() is called with
   these new options to get root's vfsmount*
6. btrfs_mount() is called again
7. btrfs_parse_early_options() parses "subvolid=0" and sets subvol_objectid
   to 5 (instead of 0)
8. check whether subvol_objectid is 5. This time the id is 5 and
   mount_subvol() is not called. btrfs_mount() finishes mounting the root
9. (in mount_subvol()) using the return value of vfs_kern_mount(), it
   calls mount_subtree()
10. return the subvolume's dentry

As illustrated above, calling btrfs_mount() twice complicates the mount path.
The mount-time callback (btrfs_mount()) is specified in the struct
file_system_type that is passed to vfs_kern_mount(). Therefore, we can
avoid the second call by passing a different file_system_type to our
vfs_kern_mount() call. There is no need to modify the mount options.

In this approach: 
1. btrfs_mount() is called
2. parse the "subvolid=" option and set subvol_objectid to its value
3. mount the device's root by calling vfs_kern_mount() with a different
   file_system_type specified. Then, a different callback function
   (mount_root()) is called. Most of this new function is the same as the
   original btrfs_mount()
4. return by calling mount_subtree()

I think this approach is the same as that of nfsv4, which is currently the
only other filesystem using mount_subtree(), and is easy to understand.

Most of the change is done by just reorganizing the original code of
btrfs_mount()/mount_subvol() into btrfs_mount()/mount_subvol()/mount_root().

btrfs_parse_early_options() is split into two parts so that the "device="
option is not handled twice (though handling it twice causes no harm).
setup_root_args() is deleted as it is no longer needed.


Signed-off-by: Tomohiro Misono 
---
 fs/btrfs/super.c | 226 ++-
 1 file changed, 123 insertions(+), 103 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 12540b6..3a183c0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -66,6 +66,7 @@
 
 static const struct super_operations btrfs_super_ops;
 static struct file_system_type btrfs_fs_type;
+static struct file_system_type btrfs_root_fs_type;
 
 static int btrfs_remount(struct super_block *sb, int *flags, char *data);
 
@@ -447,7 +448,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
case Opt_subvolrootid:
case Opt_device:
/*
-* These are parsed by btrfs_parse_early_options
+* These are parsed by btrfs_parse_subvol_options
+* and btrfs_parse_early_options
 * and can be happily ignored here.
 */
break;
@@ -854,11 +856,58 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
  * only when we need to allocate a new super block.
  */
 static int btrfs_parse_early_options(const char *options, fmode_t flags,
-   void *holder, char **subvol_name, u64 *subvol_objectid,
-   struct btrfs_fs_devices **fs_devices)
+   void *holder, struct btrfs_fs_devices **fs_devices)
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
+   int error = 0;
+
+   if (!options)
+   return 0;
+
+   /*
+* strsep changes the string, duplicate it because btrfs_parse_options
+* gets called later
+*/
+   opts = kstrdup(options, GFP_KERNEL);
+   if (!opts)
+   return -ENOMEM;
+   orig = opts;
+
+   while ((p = strsep(&opts, ",")) != NULL) {
+   int token;
+   if (!*p)
+   continue;
+
+   token = match_token(p, tokens, args);
+   switch (token) {
+   case Opt_device:
+   device_name = match_strdup(&args[0]);
+   if (!device_name) {
+   error = -ENOMEM;
+   goto out;
+   }
+   error = 

difference between -c and -p for send-receive?

2017-09-18 Thread Dave
new subject for new question

On Mon, Sep 18, 2017 at 1:37 PM, Andrei Borzenkov  wrote:

> >> What scenarios can lead to "ERROR: parent determination failed"?
> >
> > The man page for btrfs-send is reasonably clear on the requirements
> > btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
> > options) then the specified snapshots must exist on both the source and
> > destination. If you don't have a suitable existing snapshot then don't
> > use -c or -p and just do a full send.
> >
>
> Well, I do not immediately see why -c must imply incremental send. We
> want to reduce amount of data that is transferred, so reuse data from
> existing snapshots, but it is really orthogonal to whether we send full
> subvolume or just changes since another snapshot.
>

Starting months ago when I began using btrfs seriously, I have been
reading, rereading and trying to understand this:

FAQ - btrfs Wiki
https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F

The comment above suddenly gives me another clue...

However, I still don't understand terms like "clone range ioctl",
although I can guess it is something like a hard link.

Would it be correct to say the following?

1. "-c" causes (appropriate) files in the newly transferred snapshot
to be "hard linked" to existing files in another snapshot on the
destination. Doesn't "-p" do something equivalent though?

2. The -c and -p options can be used together or individually.

Questions:

If "-c" "will send all of the metadata of @B.1, but will leave out the
data for @B.1/bigfile, because it's already in the backups filesystem,
and can be reflinked from there" what will -p do in contrast?

Will "-p" not send all the metadata?

Will "-p" also leave out the data for @B.1/bigfile, when it's also
already in the backups?

What would make me choose one of these options over the other? I still
struggle to see the difference.


Re: [PATCH] Btrfs: fix unexpected result when dio reading corrupted blocks

2017-09-18 Thread Goffredo Baroncelli
On 09/15/2017 11:06 PM, Liu Bo wrote:
> commit 4246a0b63bd8 ("block: add a bi_error field to struct bio")
> changed the logic of how dio read endio reports errors.
> 
> For single stripe dio read, %bio->bi_status reflects the error before
> verifying checksum, and now we're updating it when data block matches
> with its checksum, while in the mismatching case, %bio->bi_status is
> not updated to reflect that.
> 
> When some blocks in a file have been corrupted on disk, reading such a
> file ends up with
> 
> 1) checksum errors are reported in kernel log
> 2) read(2) returns successfully with some content being 0x01.
> 
> In order to fix it, we need to report its checksum mismatch error to
> the upper layer (dio layer in this case) as well.
> 
> Signed-off-by: Liu Bo 
> Reported-by: Goffredo Baroncelli 
> cc: Goffredo Baroncelli 

Tested-by: Goffredo Baroncelli 

> ---
>  fs/btrfs/inode.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 24bcd5c..a46799e 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8267,11 +8267,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
>   struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
>   blk_status_t err = bio->bi_status;
>  
> - if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) {
> + if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
>   err = btrfs_subio_endio_read(inode, io_bio, err);
> - if (!err)
> - bio->bi_status = 0;
> - }
>  
>   unlock_extent(&BTRFS_I(inode)->io_tree, dip->logical_offset,
> dip->logical_offset + dip->bytes - 1);
> @@ -8279,7 +8276,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
>  
>   kfree(dip);
>  
> - dio_bio->bi_status = bio->bi_status;
> + dio_bio->bi_status = err;
>   dio_end_io(dio_bio);
>  
>   if (io_bio->end_io)
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: kernel BUG at fs/btrfs/extent_io.c:1989

2017-09-18 Thread Kai Krakow
On Mon, 18 Sep 2017 20:30:41 +0200,
Holger Hoffstätte wrote:

> On 09/18/17 19:09, Liu Bo wrote:
> > This 'mirror 0' looks fishy, (as mirror comes from
> > btrfs_io_bio->mirror_num, which should be at least 1 if raid1 setup
> > is in use.)
> > 
> > Not sure if 4.13.2-gentoo made any changes on btrfs, but can you  
> 
> No, it did not; Gentoo always strives to be as close to mainline as
> possible except for urgent security & low-risk convenience fixes.

According to
https://dev.gentoo.org/~mpagano/genpatches/patches-4.13-2.htm
it's not only security patches.

But as the list shows, there are indeed no btrfs patches. But there's
one that may change btrfs behavior (tho unlikely), that is enabling
native gcc optimizations if you choose so. I don't think that's a
default option in Gentoo.

I'm using native optimizations myself and see no strange mirror issues
in btrfs. OTOH, I've lately switched to gentoo ck patchset to get
better optimizations for gaming and realtime apps. But it's still at
the 4.12 series.

Are you sure the system crashed and wasn't just stuck at reading from
the disks? If the disks have error correction and recovery enabled, the
Linux block layer times out on the requests that the drives eventually
won't fix anyways and resets the link after 30s. The drive timeout is
120s by default.

You can change that on enterprise-grade and NAS-ready drives; a handful
of desktop drives support it as well. smartctl is used to set the
values (search for "smartctl scterc"). You could also raise the timeout
of the scsi layer above the drive timeout, which means more than 120s,
if you cannot change scterc. I think it makes most sense not to reset
the link before the drive has had its chance to answer the request.

I think there are pros and cons to changing these values. I always
recommend making sure the scsi timeout is above the scterc timeout.
Personally, I lower the scterc timeout to 70 deciseconds (7 seconds;
smartctl takes the value in units of 100 ms), and leave the scsi
timeout at its default. RAID setups should use this to get control of
their own error correction methods: The drive returns from the request
early and the RAID layer, i.e. btrfs or mdraid, can do its job of
reading from another copy, then repair the bad one by writing back a
correct copy, which the drive converts into a sector relocation aka
self-repair.
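
A minimal sketch of the two knobs, printing the commands rather than
running them. The function names are made up, the device name is
whatever your disk is called, and the sysfs path assumes a SCSI/SATA
disk:

```shell
# Print (not run) the command that sets the SCT ERC read and write limits
# of /dev/$1 to $2 deciseconds (70 means 7 seconds).
set_erc_cmd() {
    printf 'smartctl -l scterc,%s,%s /dev/%s\n' "$2" "$2" "$1"
}

# Print (not run) the command that sets the kernel's SCSI command timeout
# for /dev/$1 to $2 seconds.
set_scsi_timeout_cmd() {
    printf 'echo %s > /sys/block/%s/device/timeout\n' "$2" "$1"
}

# Succeed if an ERC limit of $1 deciseconds expires before a SCSI timeout
# of $2 seconds, i.e. the drive gives up before the kernel resets the link.
erc_below_scsi_timeout() {
    [ "$1" -lt $(( $2 * 10 )) ]
}
```

With scterc at 70 and the default 30s scsi timeout,
erc_below_scsi_timeout 70 30 succeeds; with a drive that may take 120s
(1200 deciseconds) it fails, which is the case where raising the scsi
timeout makes sense.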

Other people may jump in and recommend their own perspective of why or
why not change which knob to which value.

But well, as long as you saw no scsi errors reported when the "crash"
occurred, these values are not involved in your problem anyways.

What about "btrfs device stats"?
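
If the output is long, a small filter like this one shows only the
counters that are not zero. It is a sketch that assumes the usual
"[device].counter value" line format; the function name is made up:

```shell
# Read "btrfs device stats" output on stdin and print only the counters
# whose value is non-zero, as "[device].counter = value".
nonzero_stats() {
    awk 'NF == 2 && $2 + 0 != 0 { print $1 " = " $2 }'
}
```

Usage would be along the lines of: btrfs device stats /mnt | nonzero_stats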


-- 
Regards,
Kai

Replies to list-only preferred.




Re: kernel BUG at fs/btrfs/extent_io.c:1989

2017-09-18 Thread Holger Hoffstätte
On 09/18/17 19:09, Liu Bo wrote:
> This 'mirror 0' looks fishy, (as mirror comes from
> btrfs_io_bio->mirror_num, which should be at least 1 if raid1 setup is
> in use.)
> 
> Not sure if 4.13.2-gentoo made any changes on btrfs, but can you

No, it did not; Gentoo always strives to be as close to mainline as
possible except for urgent security & low-risk convenience fixes.

-h


Re: kernel BUG at fs/btrfs/extent_io.c:1989

2017-09-18 Thread Liu Bo
On Mon, Sep 18, 2017 at 08:55:29AM +, Paul Jones wrote:
> Hi
>  I have a system that crashed during a defrag, upon reboot I got the 
> following trace while resuming the defrag.
> Filesystem is BTRFS Raid1 on lvm+cache, kernel 4.13.2
> Check --repair gives lots of warnings about parent transid verify failed, but 
> otherwise completes without issue.
> 
> Ran scrub which seems to have fixed most of the issues without crashing:
> 
> scrub status for d844164a-239e-4f37-9126-d3b2f3ab72be
> scrub started at Mon Sep 18 15:59:05 2017 and finished after 02:04:00
> total bytes scrubbed: 2.22TiB with 22890 errors
> error details: verify=1078 csum=21812
> corrected errors: 22886, uncorrectable errors: 4, unverified errors: 0
> 
> I'll see how it goes when I use rsync to verify from the other backup.
> 
> Thanks,
> Paul.
> 
> <...>
> [  136.376559] BTRFS info (device dm-15): read error corrected: ino 0 off 
> 6517887115264 (dev /dev/mapper/lvmB-backup--b sector 178527312)
> [  136.376659] BTRFS info (device dm-15): read error corrected: ino 0 off 
> 6517887119360 (dev /dev/mapper/lvmB-backup--b sector 178527320)
> [  174.761517] BTRFS warning (device dm-15): csum failed root 7692 ino 534939 
> off 5639217152 csum 0xdbbb090f expected csum 0x74d6a9b2 mirror 0
> [  174.761800] BTRFS warning (device dm-15): csum failed root 7692 ino 534939 
> off 5639217152 csum 0xdbbb090f expected csum 0x74d6a9b2 mirror 0
> [  174.761838] BTRFS warning (device dm-15): csum failed root 7692 ino 534939 
> off 5639217152 csum 0xdbbb090f expected csum 0x74d6a9b2 mirror 0
> [  174.761880] BTRFS warning (device dm-15): csum failed root 7692 ino 534939 
> off 5639217152 csum 0xdbbb090f expected csum 0x74d6a9b2 mirror 0
> [  174.761924] BTRFS warning (device dm-15): csum failed root 7692 ino 534939 
> off 5639217152 csum 0xdbbb090f expected csum 0x74d6a9b2 mirror 0

This 'mirror 0' looks fishy, (as mirror comes from
btrfs_io_bio->mirror_num, which should be at least 1 if raid1 setup is
in use.)

Not sure if 4.13.2-gentoo made any changes on btrfs, but can you
please verify with the upstream kernel, say, v4.13?


Thanks,

-liubo

> [  174.761986] [ cut here ]
> [  174.761987] kernel BUG at fs/btrfs/extent_io.c:1989!
> [  174.761989] invalid opcode:  [#1] SMP
> [  174.762034] Modules linked in: cls_u32 sch_htb sch_sfq nf_conntrack_pptp 
> nf_conntrack_proto_gre nf_conntrack_sane nf_conntrack_sip ts_kmp 
> nf_conntrack_amanda nf_conntrack_snmp nf_conntrack_h323 
> nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_tftp 
> nf_conntrack_ftp nf_conntrack_irc xt_NETMAP xt_TCPMSS xt_CHECKSUM 
> ipt_rpfilter xt_DSCP xt_dscp xt_statistic xt_CT xt_AUDIT xt_NFLOG xt_time 
> xt_connlimit xt_realm xt_NFQUEUE xt_tcpmss xt_addrtype xt_pkttype iptable_raw 
> xt_TPROXY nf_defrag_ipv6 xt_CLASSIFY xt_mark xt_hashlimit xt_comment 
> xt_length xt_connmark xt_owner xt_recent xt_iprange xt_physdev xt_policy 
> iptable_mangle xt_nat xt_multiport xt_conntrack ipt_REJECT nf_reject_ipv4 
> ipt_MASQUERADE nf_nat_masquerade_ipv4 ipt_ECN ipt_CLUSTERIP ipt_ah 
> iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> [  174.762144]  nf_nat
> [  174.762145] [ cut here ]
> [  174.762145] kernel BUG at fs/btrfs/extent_io.c:1989!
> [  174.762267] [ cut here ]
> [  174.762268] kernel BUG at fs/btrfs/extent_io.c:1989!
> [  174.762334] [ cut here ]
> [  174.762335] kernel BUG at fs/btrfs/extent_io.c:1989!
> [  174.762487]  iptable_filter ip_tables nfsd auth_rpcgss oid_registry 
> nfs_acl binfmt_misc dm_cache_smq dm_cache dm_persistent_data dm_bufio 
> dm_bio_prison k10temp intel_powerclamp coretemp pcbc hwmon_vid iTCO_wdt 
> iTCO_vendor_support aesni_intel crypto_simd cryptd glue_helper pcspkr lpc_ich 
> i2c_i801 mfd_core xts aes_x86_64 cbc sha512_generic iscsi_tcp libiscsi_tcp 
> libiscsi scsi_transport_iscsi ixgb macvlan igb dca i2c_algo_bit e1000 atl1c 
> fuse nfs lockd grace sunrpc dm_mirror dm_region_hash dm_log dm_mod 
> hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_gyration 
> xhci_plat_hcd ohci_pci ohci_hcd uhci_hcd usb_storage megaraid_sas 
> megaraid_mbox megaraid_mm megaraid mptsas scsi_transport_sas mptspi 
> scsi_transport_spi mptscsih mptbase sata_inic162x ata_piix sata_nv sata_sil24 
> pata_jmicron pata_amd pata_mpiix
> [  174.762629]  usbhid ahci libahci xhci_pci r8169 ehci_pci xhci_hcd mii 
> ehci_hcd
> [  174.762682] CPU: 5 PID: 6683 Comm: kworker/u16:22 Not tainted 
> 4.13.2-gentoo #5
> [  174.762730] Hardware name: System manufacturer System Product Name/P8Z68-V 
> LE, BIOS 4101 05/09/2013
> [  174.762786] Workqueue: btrfs-endio btrfs_endio_helper
> [  174.762833] task: 8803e315d240 task.stack: c98c4000
> [  174.762883] RIP: 0010:repair_io_failure+0x1b5/0x200
> [  174.762930] RSP: 0018:c98c7c78 EFLAGS: 00010246
> [  174.762978] RAX: 8803b71ba480 RBX:  RCX: 
> 

Re: cp --reflink and qgroup limit

2017-09-18 Thread Antoine Belvire

Hello Qu,

On 18/09/2017 at 01:33, Qu Wenruo wrote:

> That's a bug, and should be fixed.

OK.

> But it's a little complicated to fix.
> The biggest problem is, until reflink is done and we commit a
> transaction, we don't know if the reflink operation will increase
> "rfer" or not.
> (Considering a case where both reflink source and destination are inside
> the same subvolume)
>
> Yes, we can always try to reserve the size of the data, and cause false
> alert, but the handling of reserved data range will be another problem.
>
> I'll check if we can solve it.

Thanks for your explanation. I'm looking forward to hearing good news 
about this then :)


Regards,

--
Antoine


Re: ERROR: parent determination failed (btrfs send-receive)

2017-09-18 Thread Andrei Borzenkov
18.09.2017 11:45, Graham Cobb wrote:
> On 18/09/17 07:10, Dave wrote:
>> For my understanding, what are the restrictions on deleting snapshots?
>>
>> What scenarios can lead to "ERROR: parent determination failed"?
> 
> The man page for btrfs-send is reasonably clear on the requirements
> btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
> options) then the specified snapshots must exist on both the source and
> destination. If you don't have a suitable existing snapshot then don't
> use -c or -p and just do a full send.
> 

Well, I do not immediately see why -c must imply incremental send. We
want to reduce amount of data that is transferred, so reuse data from
existing snapshots, but it is really orthogonal to whether we send full
subvolume or just changes since another snapshot.

>> I use snap-sync to create and send snapshots.
>>
>> GitHub - wesbarnett/snap-sync: Use snapper snapshots to backup to external 
>> drive
>> https://github.com/wesbarnett/snap-sync
> 
> I am not familiar with this tool. Your question should be sent to the
> author of the tool, if that is what is deciding what -p and -c options
> are being used.
> 

I am not sure how it could come to this error. I looked at a more or less
default installation of openSUSE here, and all snapper snapshots have as
parent UUID the subvolume that is mounted as root (by default only one
configuration, for the root subvolume, exists). So it is not possible to
remove this subvolume unless some rollback to another snapshot was
performed.


Re: [PATCH] Btrfs: fix unexpected result when dio reading corrupted blocks

2017-09-18 Thread Liu Bo
On Sat, Sep 16, 2017 at 01:58:34PM +0200, Goffredo Baroncelli wrote:
> On 09/15/2017 11:06 PM, Liu Bo wrote:
> > commit 4246a0b63bd8 ("block: add a bi_error field to struct bio")
> > changed the logic of how dio read endio reports errors.
> > 
> > For single stripe dio read, %bio->bi_status reflects the error before
> > verifying checksum, and now we're updating it when data block matches
> > with its checksum, while in the mismatching case, %bio->bi_status is
> > not updated to reflect that.
> > 
> > When some blocks in a file have been corrupted on disk, reading such a
> > file ends up with
> > 
> > 1) checksum errors are reported in kernel log
> > 2) read(2) returns successfully with some content being 0x01.
> > 
> > In order to fix it, we need to report its checksum mismatch error to
> > the upper layer (dio layer in this case) as well.
> 
> I tested it, and now it works: even using O_DIRECT -EIO is returned if the 
> file is corrupted.
> 
> ghigo@venice:~/btrfs/crash-o-direct/t$ ls -li
> total 16384
> 257 -rw-r--r-- 1 root root 16777216 Sep 15 20:51 abcd
> ghigo@venice:~/btrfs/crash-o-direct/t$ date
> Sat Sep 16 13:56:26 CEST 2017
> 
> ghigo@venice:~/btrfs/crash-o-direct/t$ cat abcd 
> cat: abcd: Input/output error
> ghigo@venice:~/btrfs/crash-o-direct/t$ dmesg -T | tail -1
> [Sat Sep 16 13:56:29 2017] BTRFS warning (device sdd5): csum failed root 5 
> ino 257 off 0 csum 0x98f94189 expected csum 0x0ab6be80 mirror 1
> 
> ghigo@venice:~/btrfs/crash-o-direct/t$ dd if=abcd iflag=direct
> dd: error reading 'abcd': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes copied, 0.000404156 s, 0.0 kB/s
> ghigo@venice:~/btrfs/crash-o-direct/t$ dmesg -T | tail -1
> [Sat Sep 16 13:56:41 2017] BTRFS warning (device sdd5): csum failed root 5 
> ino 257 off 0 csum 0x98f94189 expected csum 0x0ab6be80 mirror 1
>

Thanks a lot, any chance I can get your 'Tested-by' tag?

Thanks,

-liubo
> 
> 
> 
> > 
> > Signed-off-by: Liu Bo 
> > Reported-by: Goffredo Baroncelli 
> > cc: Goffredo Baroncelli 
> > ---
> >  fs/btrfs/inode.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 24bcd5c..a46799e 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -8267,11 +8267,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
> > struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
> > blk_status_t err = bio->bi_status;
> >  
> > -   if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) {
> > +   if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
> > err = btrfs_subio_endio_read(inode, io_bio, err);
> > -   if (!err)
> > -   bio->bi_status = 0;
> > -   }
> >  
> > unlock_extent(_I(inode)->io_tree, dip->logical_offset,
> >   dip->logical_offset + dip->bytes - 1);
> > @@ -8279,7 +8276,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
> >  
> > kfree(dip);
> >  
> > -   dio_bio->bi_status = bio->bi_status;
> > +   dio_bio->bi_status = err;
> > dio_end_io(dio_bio);
> >  
> > if (io_bio->end_io)
> > 
> 
> 
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli 
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
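
For readers who do not want to trace the kernel code, the following is a toy Python model of the status propagation before and after the patch quoted above. It is only a sketch: the `Bio` class, the two `endio_*` functions and the constant values are illustrative stand-ins, it assumes the BTRFS_DIO_ORIG_BIO_SUBMITTED branch is taken, and it ignores retries and locking entirely.

```python
from dataclasses import dataclass

BLK_STS_OK = 0
BLK_STS_IOERR = 10  # any non-zero value means failure in this toy model


@dataclass
class Bio:
    bi_status: int = BLK_STS_OK


def subio_endio_read(csum_ok, err):
    """Stand-in for btrfs_subio_endio_read(): report a checksum mismatch."""
    if err:
        return err
    return BLK_STS_OK if csum_ok else BLK_STS_IOERR


def endio_before_patch(bio, dio_bio, csum_ok):
    err = subio_endio_read(csum_ok, bio.bi_status)
    if not err:
        bio.bi_status = BLK_STS_OK
    # Bug: `err` is dropped here; only bio.bi_status reaches the dio layer,
    # and it was never updated for the mismatch case.
    dio_bio.bi_status = bio.bi_status


def endio_after_patch(bio, dio_bio, csum_ok):
    err = subio_endio_read(csum_ok, bio.bi_status)
    # Fix: hand the verified result to the dio layer.
    dio_bio.bi_status = err


if __name__ == "__main__":
    for fn in (endio_before_patch, endio_after_patch):
        dio = Bio()
        fn(Bio(), dio, csum_ok=False)
        print(fn.__name__, "->", dio.bi_status)
```

In the model, a block read that succeeds at the device level but fails checksum verification ends up with a zero status before the patch and a non-zero status after it, which corresponds to read(2) returning success versus -EIO in the test above.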


Re: [PATCH] Btrfs: fix unexpected result when dio reading corrupted blocks

2017-09-18 Thread Liu Bo
On Mon, Sep 18, 2017 at 03:49:34PM +0200, Holger Hoffstätte wrote:
> 
> Hello, quick question for backporting..
> 
> On 09/15/17 23:06, Liu Bo wrote:
> > commit 4246a0b63bd8 ("block: add a bi_error field to struct bio")
> > changed the logic of how dio read endio reports errors.
> [snip]
> 
> I've tried to merge this into my 4.9.x++ tree but have a question since
> the DIO APIs changed recently and it's hard to tell what is a bug
> and what is a feature.. :-/
>

Good point, hopefully it's the last change on those API.

> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -8267,11 +8267,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
> > struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
> > blk_status_t err = bio->bi_status;
> >  
> > -   if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) {
> > +   if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
> > err = btrfs_subio_endio_read(inode, io_bio, err);
> > -   if (!err)
> > -   bio->bi_status = 0;
> > -   }
> >  
> > unlock_extent(_I(inode)->io_tree, dip->logical_offset,
> >   dip->logical_offset + dip->bytes - 1);
> 
> This hunk is fairly easy, just revert bi_status to bi_error.
> However..
> 
> > @@ -8279,7 +8276,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
> >  
> > kfree(dip);
> >  
> > -   dio_bio->bi_status = bio->bi_status;
> > +   dio_bio->bi_status = err;
> > dio_end_io(dio_bio);
> ^^^
> 
> Same here, except that the call to dio_end_io used to take a second parameter
> (the error code, which has been moved into bi_status in 4.10+) and looked
> like this:
> 
> dio_end_io(dio_bio, bio->bi_error);
> 
> Given that "bio->bi_error" should have been "err" instead, I think err should
> also be passed to dio_end_io(), so that the whole hunk would look like:
> 
> ..
> -dio_bio->bi_error = bio->bi_error;
> -dio_end_io(dio_bio, bio->bi_error);
> +dio_bio->bi_error = err;
> +dio_end_io(dio_bio, err);
> ..
> 
> Would this be correct or did I misunderstand some subtle aspect about the
> DIO error handling?

Yes, the diff looks good to me.

Thanks,

-liubo


Help Recovering BTRFS array

2017-09-18 Thread grondinm
Hello,

I will try to provide all information pertinent to the situation I find myself 
in.

Yesterday, while I was trying to write some data to a BTRFS filesystem on top of 
an mdadm raid5 array encrypted with dmcrypt, consisting of four 1 TB HDDs, my 
system became unresponsive and I had no choice but to hard reset. The system came 
back up with no problem and the array in question mounted without complaint. 
Once I tried to write data to it again, however, the system became unresponsive 
again and required another hard reset. Again the system came back up and 
everything mounted with no complaints.

This time I decided to run some checks. I ran a raid check by issuing 
'echo check > /sys/block/md0/md/sync_action'. This completed without a single 
error. So I performed a proper restart, just because, and once the system came 
back up I initiated a scrub on the btrfs filesystem. This gave me my first 
indication that something is wrong:

btrfs sc stat /media/Storage2 
scrub status for e5bd5cf3-c736-48ff-b1c6-c9f678567788
scrub started at Mon Sep 18 06:05:21 2017, running for 07:40:47
total bytes scrubbed: 1.03TiB with 1 errors
error details: super=1
corrected errors: 0, uncorrectable errors: 0, unverified errors: 0

I was concerned, but since it was still scrubbing I left it. Now things look 
really bleak... 

Every few minutes the scrub process goes into D state, as shown by htop. It 
eventually keeps going and, as far as I can see, is still scrubbing (slowly). I 
decided to check something else (based on the error above): I ran btrfs 
inspect-internal dump-super -a -f /dev/md0, which gave me this:

superblock: bytenr=65536, device=/dev/md0 
-
ERROR: bad magic on superblock on /dev/md0 at 65536

superblock: bytenr=67108864, device=/dev/md0
-
ERROR: bad magic on superblock on /dev/md0 at 67108864

superblock: bytenr=274877906944, device=/dev/md0
-
ERROR: bad magic on superblock on /dev/md0 at 274877906944

Now I'm really panicked. Is the FS toast? Can any recovery be attempted?
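
As a cross-check of what dump-super is complaining about, here is a minimal sketch that looks for the btrfs magic string at the three superblock offsets listed above. The function name `check_superblocks` is made up for this example; the magic value `_BHRfS_M` and its position 0x40 bytes into the superblock are taken from the on-disk format as I understand it. It only reads, and it must be pointed at the device btrfs actually lives on. If a dm-crypt layer sits between md0 and btrfs, the raw /dev/md0 would hold ciphertext and a mismatch there would be expected, though that is a guess about this setup.

```python
import io

BTRFS_MAGIC = b"_BHRfS_M"
MAGIC_OFFSET = 0x40  # after csum[32], fsid[16], bytenr and flags
SUPERBLOCK_OFFSETS = (65536, 67108864, 274877906944)  # bytenr values above


def check_superblocks(dev):
    """Return {offset: True/False/None}; None means the copy is unreadable
    (for example because the device is smaller than that offset)."""
    results = {}
    for off in SUPERBLOCK_OFFSETS:
        try:
            dev.seek(off + MAGIC_OFFSET)
            magic = dev.read(len(BTRFS_MAGIC))
        except OSError:
            results[off] = None
            continue
        if len(magic) < len(BTRFS_MAGIC):
            results[off] = None
        else:
            results[off] = magic == BTRFS_MAGIC
    return results


if __name__ == "__main__":
    # Self-contained demo on an in-memory image with one valid-looking copy.
    image = bytearray(65536 + 4096)
    image[65536 + MAGIC_OFFSET:65536 + MAGIC_OFFSET + 8] = BTRFS_MAGIC
    print(check_superblocks(io.BytesIO(bytes(image))))
```

On a real system the file object would come from something like `open(path, "rb")` run as root, with `path` being whichever block device is under investigation.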

Here is the output of dump-super with the -F option:

superblock: bytenr=65536, device=/dev/md0
-
csum_type               43668 (INVALID)
csum_size               32
csum                    0x76c647b04abf1057f04e40d1dc52522397258064b98a1b8f6aa6934c74c0dd55 [DON'T MATCH]
bytenr                  6376050623103086821
flags                   0x7edcc412b742c79f
                        ( WRITTEN |
                          RELOC |
                          METADUMP |
                          unknown flag: 0x7edcc410b742c79c )
magic                   ..l~...q [DON'T MATCH]
fsid                    2cf827fa-7ab8-e290-b152-1735c2735a37
label                   .a.9.@.=4.#.|.D...]..dh=d,..k..n..~.5.i.8...(.._.tl.a.@..2..qidj.>Hy.U..{X5.kG0.)t..;/.2...@.T.|.u.<.`!J*9./8...&.g\.V...*.,/95.uEs..W.i..z..h...n(...VGn^F...H...5.DT..3.A..mK...~..}.1..n.
generation              1769598730239175261
root                    14863846352370317867
sys_array_size          1744503544
chunk_root_generation   18100024505086712407
root_level              79
chunk_root              10848092274453435018
chunk_root_level        156
log_root                7514172289378668244
log_root_transid        6227239369566282426
log_root_level          18
total_bytes             5481087866519986730
bytes_used              13216280034370888020
sectorsize              4102056786
nodesize                1038279258
leafsize                276348297
stripesize              2473897044
root_dir                12090183195204234845
num_devices             12836127619712721941
compat_flags            0xf98ff436fc954bd4
compat_ro_flags         0x3fe8246616164da7
                        ( FREE_SPACE_TREE |
                          FREE_SPACE_TREE_VALID |
                          unknown flag: 0x3fe8246616164da4 )
incompat_flags          0x3989a5037330bfd8
                        ( COMPRESS_LZO |
                          COMPRESS_LZOv2 |
                          EXTENDED_IREF |
                          RAID56 |
                          SKINNY_METADATA |
                          NO_HOLES |
                          unknown flag: 0x3989a5037330bc10 )
cache_generation        10789185961859482334
uuid_tree_generation    14921288820846890813
dev_item.uuid           e6e382b3-de66-4c25-7cc9-3cc43cde9c24
dev_item.fsid           f8430e37-12ca-adaf-b038-f0ee10ce6327 [DON'T MATCH]
dev_item.type           7909001383421391155
dev_item.total_bytes    4839925749276763097
dev_item.bytes_used     14330418354255459170
dev_item.io_align       4136652250
dev_item.io_width       1113335506
dev_item.sector_size    1197062542
dev_item.devid          16559830033162408461
dev_item.dev_group      3271056113

Re: [PATCH 2/2] Remove misleading BCP 78 boilerplate

2017-09-18 Thread David Sterba
On Sun, Sep 17, 2017 at 07:52:27PM -0400, Nicholas D Steeves wrote:
> BCP 78 applies to RFC 6234, but sha224-256.c is Simplified BSD.
> 
> This causes the following lintian error when building on Debian and
> Debian derivatives:
> 
> E: btrfs-progs source: license-problem-non-free-RFC-BCP78
>tests/sha224-256.c
> 
> Please consult the following email from debian-le...@lists.debian.org
> for more information:
> 
> https://lists.debian.org/debian-legal/2017/08/msg4.html

Thanks, this looks like I've copied too much from the RFC and was not
aware of the BCP license issues. I believe the copyright notice(s) past
the line mentioning the filename(s) should be enough to satisfy the
licensing requirements and also the debian license checker.


Re: [PATCH] Btrfs: fix unexpected result when dio reading corrupted blocks

2017-09-18 Thread Holger Hoffstätte

Hello, quick question for backporting..

On 09/15/17 23:06, Liu Bo wrote:
> commit 4246a0b63bd8 ("block: add a bi_error field to struct bio")
> changed the logic of how dio read endio reports errors.
[snip]

I've tried to merge this into my 4.9.x++ tree but have a question since
the DIO APIs changed recently and it's hard to tell what is a bug
and what is a feature.. :-/

> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8267,11 +8267,8 @@ static void btrfs_endio_direct_read(struct bio *bio)
>   struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
>   blk_status_t err = bio->bi_status;
>  
> - if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED) {
> + if (dip->flags & BTRFS_DIO_ORIG_BIO_SUBMITTED)
>   err = btrfs_subio_endio_read(inode, io_bio, err);
> - if (!err)
> - bio->bi_status = 0;
> - }
>  
>   unlock_extent(_I(inode)->io_tree, dip->logical_offset,
> dip->logical_offset + dip->bytes - 1);

This hunk is fairly easy, just revert bi_status to bi_error.
However..

> @@ -8279,7 +8276,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
>  
>   kfree(dip);
>  
> - dio_bio->bi_status = bio->bi_status;
> + dio_bio->bi_status = err;
>   dio_end_io(dio_bio);
^^^

Same here, except that the call to dio_end_io used to take a second parameter
(the error code, which has been moved into bi_status in 4.10+) and looked
like this:

dio_end_io(dio_bio, bio->bi_error);

Given that "bio->bi_error" should have been "err" instead, I think err should
also be passed to dio_end_io(), so that the whole hunk would look like:

..
-dio_bio->bi_error = bio->bi_error;
-dio_end_io(dio_bio, bio->bi_error);
+dio_bio->bi_error = err;
+dio_end_io(dio_bio, err);
..

Would this be correct or did I misunderstand some subtle aspect about the
DIO error handling?

Thanks :)

Holger


Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski

On 2017-09-18 22:44, Peter Becker wrote:

I'm not sure if it would help, but maybe you could try adding an 8GB
(or more) USB flash drive to the pool and try to start the balance.
If it works out, you can remove it from the pool afterwards.


I really can't, it's an "online server".

But I've removed some 65 GB of data, so now there is 171 GB free, i.e. the 
filesystem is 60% used.


The balance still fails.


Tomasz Chmielewski
https://lxadm.com


Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Peter Becker
I'm not sure if it would help, but maybe you could try adding an 8GB
(or more) USB flash drive to the pool and try to start the balance.
If it works out, you can remove it from the pool afterwards.


Re: using fio to test btrfs compression

2017-09-18 Thread Timofey Titovets
2017-09-18 16:28 GMT+03:00 shally verma :
> On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets  
> wrote:
>> 2017-09-18 10:36 GMT+03:00 shally verma :
>>> Hi
>>>
>>> I wanted to test btrfs compression using fio command but somehow
>>> during fio writes, I don't see code taking route of compression blocks
>>> where as If I do a copy to btrfs compression enabled mount point then
>>> I can easily see code falling through compression.c.
>>>
>>> Here's how I do my setup
>>>
>>> 1. mkfs.btrfs /dev/sdb1
>>> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
>>> 3. cp  /mnt
>>> 4. dmesg shows print staments from compression.c and zlib.c confirming
>>> compression routine was invoked during write
>>> 5. now, copy back from btrfs mount point to home directory also shows
>>> decompress call invokation
>>>
>>> Now, try same with fio commands:
>>>
>>> fio command
>>>
>>> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
>>> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
>>> --name=test --size=10G --runtime=180 --time_based
>>>
>>> But it seems to write uncompressed data.
>>>
>>> Any help here? what's missing?
>>>
>>> Thanks
>>> Shally
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib
>> 2. Tune fio to generate compressible data
>>
> How do I "tune" fio to generate data. I had assumed once compression
> is enabled on btrfs any system fwrite call will simply compress data
> into it .Isn't it so?
> Can you share fio command that I can test?
> Thanks
> Shally
>>
>> --
>> Have a nice day,
>> Timofey.

It's useless to try to compress incompressible data.
Also, you enabled compress, not compress-force, so after the first
incompressible write btrfs simply stops trying to compress that file.

From man fio:
   buffer_compress_percentage=int
          If this is set, then fio will attempt to provide I/O buffer
          content (on WRITEs) that compresses to the specified level. Fio
          does this by providing a mix of random data and a fixed pattern.
          The fixed pattern is either zeros, or the pattern specified by
          buffer_pattern. If the pattern option is used, it might skew the
          compression ratio slightly. Note that this is per block size
          unit, for file/disk wide compression level that matches this
          setting, you'll also want to set refill_buffers.

   buffer_compress_chunk=int
          See buffer_compress_percentage. This setting allows fio to manage
          how big the ranges of random data and zeroed data is. Without
          this set, fio will provide buffer_compress_percentage of
          blocksize random data, followed by the remaining zeroed. With
          this set to some chunk size smaller than the block size, fio can
          alternate random and zeroed data throughout the I/O buffer.
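
To see why the buffer content matters, here is a small sketch that builds a block the way the man page excerpt roughly describes (random bytes plus a zeroed remainder) and measures how well zlib handles it. The function names and the exact split are assumptions made for illustration; this is not fio's implementation, and zlib in userspace is only a stand-in for what btrfs does in the kernel.

```python
import os
import zlib


def make_buffer(block_size, compress_percentage):
    """Random data followed by zeros, so that roughly `compress_percentage`
    percent of the block is trivially compressible."""
    zero_len = block_size * compress_percentage // 100
    return os.urandom(block_size - zero_len) + bytes(zero_len)


def zlib_ratio(buf):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(buf)) / len(buf)


if __name__ == "__main__":
    for pct in (0, 50, 90):
        buf = make_buffer(64 * 1024, pct)
        print(pct, round(zlib_ratio(buf), 2))
```

With a percentage of 0 the block is pure random data and does not shrink at all, which is the kind of write that makes btrfs give up on a file when only compress (not compress-force) is set.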

Good luck :)
-- 
Have a nice day,
Timofey.


Re: using fio to test btrfs compression

2017-09-18 Thread shally verma
On Mon, Sep 18, 2017 at 1:56 PM, Timofey Titovets  wrote:
> 2017-09-18 10:36 GMT+03:00 shally verma :
>> Hi
>>
>> I wanted to test btrfs compression using fio command but somehow
>> during fio writes, I don't see code taking route of compression blocks
>> where as If I do a copy to btrfs compression enabled mount point then
>> I can easily see code falling through compression.c.
>>
>> Here's how I do my setup
>>
>> 1. mkfs.btrfs /dev/sdb1
>> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
>> 3. cp  /mnt
>> 4. dmesg shows print staments from compression.c and zlib.c confirming
>> compression routine was invoked during write
>> 5. now, copy back from btrfs mount point to home directory also shows
>> decompress call invokation
>>
>> Now, try same with fio commands:
>>
>> fio command
>>
>> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
>> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
>> --name=test --size=10G --runtime=180 --time_based
>>
>> But it seems to write uncompressed data.
>>
>> Any help here? what's missing?
>>
>> Thanks
>> Shally
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> 1. mount -t btrfs -o compress=zlib,compress-force -> compress-force=zlib
> 2. Tune fio to generate compressible data
>
How do I "tune" fio to generate such data? I had assumed that once compression
is enabled on btrfs, any write call would simply compress the data written
to it. Isn't that so?
Can you share fio command that I can test?
Thanks
Shally
>
> --
> Have a nice day,
> Timofey.


Re: qemu-kvm VM died during partial raid1 problems of btrfs

2017-09-18 Thread Adam Borowski
On Wed, Sep 13, 2017 at 08:21:01AM -0400, Austin S. Hemmelgarn wrote:
> On 2017-09-12 17:13, Adam Borowski wrote:
> > On Tue, Sep 12, 2017 at 04:12:32PM -0400, Austin S. Hemmelgarn wrote:
> > > On 2017-09-12 16:00, Adam Borowski wrote:
> > > > Noted.  Both Marat's and my use cases, though, involve VMs that are off
> > > > most of the time, and at least for me, turned on only to test something.
> > > > Touching mtime makes rsync run again, and it's freaking _slow_: worse
> > > > than 40 minutes for a 40GB VM (source:SSD target:deduped HDD).
> > > 40 minutes for 40GB is insanely slow (that's just short of 18 MB/s) if
> > > you're going direct to a hard drive.  I get better performance than that
> > > on my somewhat pathetic NUC based storage cluster (I get roughly 20 MB/s
> > > there, but it's for archival storage so I don't really care).  I'm
> > > actually curious what the exact rsync command you are using is (you can
> > > obviously redact paths as you see fit), as the only way I can think of
> > > that it should be that slow is if you're using both --checksum (but if
> > > you're using this, you can tell rsync to skip the mtime check, and that
> > > issue goes away) and --inplace, _and_ your HDD is slow to begin with.
> >
> > rsync -axX --delete --inplace --numeric-ids /mnt/btr1/qemu/ mordor:$BASE/qemu
> > The target is single, compress=zlib SAMSUNG HD204UI, 34976 hours old but
> > with nothing notable on SMART, in a Qnap 253a, kernel 4.9.
> compress=zlib is probably your biggest culprit.  As odd as this sounds, I'd
> suggest switching that to lzo (seriously, the performance difference is
> ludicrous), and then setting up a cron job (or systemd timer) to run defrag
> over things to switch to zlib.  As a general point of comparison, we do
> archival backups to a file server running BTRFS where I work, and the
> archiving process runs about four to ten times faster if we take this
> approach (LZO for initial compression, then recompress using defrag once the
> initial transfer is done) than just using zlib directly.

Turns out that lzo is actually the slowest, but only by a bit.

I tried a different disk, in the same Qnap; also an old disk but 7200 rpm
rather than 5400.  Mostly empty, only a handful of subvolumes, not much
reflinking.  I made three separate copies, fallocated -d, upgraded Windows
inside the VM, then:

[/mnt/btr1/qemu]$ for x in none lzo zlib;do time rsync -axX --delete --inplace --numeric-ids win10.img mordor:/SOME/DIR/$x/win10.img;done

real    31m37.459s
user    27m21.587s
sys     2m16.210s

real    33m28.258s
user    27m19.745s
sys     2m17.642s

real    32m57.058s
user    27m24.297s
sys     2m17.640s

Note the "user" values.  So rsync does something bad on the source side.
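
A quick way to quantify that observation is to turn the three `time` results into a CPU share, i.e. (user + sys) / real. The sketch below does just that; the helper names are made up, and the mapping of the three runs to none, lzo and zlib simply follows the order of the for loop above.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '31m37.459s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if m is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(m.group(1)) * 60 + float(m.group(2))


def cpu_share(real, user, sys):
    """Fraction of wall-clock time the local rsync spent on the CPU."""
    return (to_seconds(user) + to_seconds(sys)) / to_seconds(real)


# (real, user, sys) as reported above, in loop order.
RUNS = {
    "none": ("31m37.459s", "27m21.587s", "2m16.210s"),
    "lzo": ("33m28.258s", "27m19.745s", "2m17.642s"),
    "zlib": ("32m57.058s", "27m24.297s", "2m17.640s"),
}

if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, round(cpu_share(*run), 2))
```

All three runs spend the large majority of their wall-clock time on the local CPU, regardless of the compression setting on the target.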

Despite fragmentation, reads on the source are not a problem:

[/mnt/btr1/qemu]$ time cat /dev/null

real    1m28.815s
user    0m0.061s
sys     0m48.094s
[/mnt/btr1/qemu]$ /usr/sbin/filefrag win10.img 
win10.img: 63682 extents found
[/mnt/btr1/qemu]$ btrfs fi def win10.img
[/mnt/btr1/qemu]$ /usr/sbin/filefrag win10.img 
win10.img: 18015 extents found
[/mnt/btr1/qemu]$ time cat /dev/null

real    1m17.879s
user    0m0.076s
sys     0m37.757s

> `--inplace` is probably not helping (especially if most of the file changed,
> on BTRFS, it actually is marginally more efficient to just write out a whole
> new file and then replace the old one with a rename if you're rewriting most
> of the file), but is probably not as much of an issue as compress=zlib.

Yeah, scp + dedupe would run faster.  For deduplication, instead of
duperemove it'd be better to call file_extent_same on the first 128K, then
the second, ... -- without even hashing the blocks beforehand.

Not that this particular VM takes enough backup space to make spending too
much time worthwhile, but it's a good test case for performance issues like
this.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ I've read an article about how lively happy music boosts
⣾⠁⢰⠒⠀⣿⡁ productivity.  You can read it, too, you just need the
⢿⡄⠘⠷⠚⠋⠀ right music while doing so.  I recommend Skepticism
⠈⠳⣄ (funeral doom metal).


Re: snapshots of encrypted directories?

2017-09-18 Thread Austin S. Hemmelgarn

On 2017-09-15 15:41, Ulli Horlacher wrote:

On Fri 2017-09-15 (13:16), Austin S. Hemmelgarn wrote:


And then mount ecryptfs:

mount.ecryptfs / /


This is only possible as root.
A user cannot get access to his own snapshots this way.
Bad.


Which is why you use EncFS (which is a FUSE module that runs in
userspace and requires no root privileges) instead of eCryptFS (which is
a kernel assisted filesystem that doesn't use FUSE, has more complicated
setup constraints, and requires CAP_SYS_ADMIN or root access).


I use both, encfs and ecryptfs, for different use cases.
I use ecryptfs on my notebooks for $HOME, which has some kind of
automounter on login (via pam).
This setup is not possible with encfs, which is also much slower and has
a lower security level.
Actually it is, it's just not trivially easy like with eCryptFS.  The 
pam_script module can be used to perform auto-mounting on login as well.


But even for encfs it is very cumbersome for a user to have access to
snapshots.
It's still a case where it's a problem of the combined usage of the two, 
and it's not likely to get fixed by either.  In theory, it should be 
possible to have some hook added that handles mounting the snapshots 
when one is taken and when the user logs in, but that isn't the job of 
BTRFS at all (filesystems are supposed to not care about what's using 
them), and I don't see it as likely that EncFS or eCryptFS will add 
support either (they can't reliably watch for snapshot creation, so they 
would have to add snapshot support and force you to go through them). 
Overall, you're likely to be better off arguing for BTRFS native support 
for the VFS encryption API (that is, F2FS and ext4 style native per-file 
encryption).



Re: A user cannot remove his readonly snapshots?!

2017-09-18 Thread Austin S. Hemmelgarn

On 2017-09-16 10:28, Ulli Horlacher wrote:

On Sat 2017-09-16 (13:47), Kai Krakow wrote:


Or you do "btrfs device stats .", it shows the associated device(s).


tux@xerus:/test/tux/zz: btrfs device stats .
ERROR: getting dev info for devstats failed: Operation not permitted

Not possible for a normal user.

`btrfs fi show` should be possible for a normal user, and that also 
lists devices.  Alternatively, you can check /sys/fs/btrfs and correlate 
UUID's (possibly easier programmatically than analyzing regular output).
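
The sysfs route could look roughly like the sketch below. It assumes each filesystem appears as /sys/fs/btrfs/<UUID>/ with a devices/ subdirectory holding one entry per member device; the function name `btrfs_devices` is made up, and the root path is a parameter so the code can be tried against a scratch directory.

```python
import os


def btrfs_devices(sysfs_root="/sys/fs/btrfs"):
    """Map filesystem UUID -> sorted device names under <uuid>/devices."""
    result = {}
    if not os.path.isdir(sysfs_root):
        return result
    for entry in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, entry, "devices")
        # Entries without a devices/ directory are not filesystems.
        if os.path.isdir(dev_dir):
            result[entry] = sorted(os.listdir(dev_dir))
    return result


if __name__ == "__main__":
    for uuid, devices in btrfs_devices().items():
        print(uuid, " ".join(devices))
```

This needs no special privileges as long as sysfs is readable by the user.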




Re: A user cannot remove his readonly snapshots?!

2017-09-18 Thread Austin S. Hemmelgarn

On 2017-09-15 15:32, Ulli Horlacher wrote:

On Fri 2017-09-15 (13:08), Austin S. Hemmelgarn wrote:

On 2017-09-15 12:37, Ulli Horlacher wrote:


I have my btrfs filesystem mounted with option user_subvol_rm_allowed
tux@xerus: btrfs subvolume delete /test/tux/zz/.snapshot/2017-09-15_1824.test
Delete subvolume (no-commit): '/test/tux/zz/.snapshot/2017-09-15_1824.test'
ERROR: cannot delete '/test/tux/zz/.snapshot/2017-09-15_1824.test': Read-only 
file system

root can delete this snapshot, but not the user. Why?



Add 'user_subvol_rm' to the mount options and try again.


root@xerus:~# mount -vo user_subvol_rm /dev/sdd4 /test
mount: wrong fs type, bad option, bad superblock on /dev/sdd4,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.

root@xerus:~# dmesg | tail -2
[1514588.018991] BTRFS info (device sdd4): unrecognized mount option 
'user_subvol_rm'
[1514588.028430] BTRFS: open_ctree failed

user_subvol_rm is not listed on
https://btrfs.wiki.kernel.org/index.php/Mount_options

Did you mean user_subvol_rm_allowed?

Yes, that's what I meant, sorry about the confusion.

I have already used this option, without success. See above.

I missed that, sorry.



Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski

On 2017-09-18 17:29, Andrei Borzenkov wrote:
On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski wrote:

# df -h /var/lib/lxd

FWIW, standard (aka util-linux) df is effectively useless in a situation
such as this, as it really doesn't give you the information you need (it
can say you have lots of space available, but if btrfs has all of it
allocated into chunks, even if the chunks have space in them still, there
can be problems).



I see here on RAID-1 that "df -h" shows pretty much the same amount of free
space as "btrfs fi show":

- "df -h" shows 105G free
- "btrfs fi show" says: Free (estimated): 104.28GiB (min: 104.28GiB)



I think both use the same algorithm to compute free space (df at the
end just shows what kernel returns). The problem is that this
algorithm itself is just approximation in general case. For uniform
RAID1 profile it should be correct though.
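
For reference, the numbers df prints come straight from the statfs/statvfs call, which the short sketch below reproduces. The function name `df_view` is made up. Note what is absent: nothing in this structure says how much of the device is still unallocated to chunks, which is the figure that matters for balance.

```python
import os


def df_view(path):
    """Return the byte counts that df derives its columns from."""
    st = os.statvfs(path)
    return {
        "size": st.f_blocks * st.f_frsize,
        "free": st.f_bfree * st.f_frsize,    # free blocks, including reserved
        "avail": st.f_bavail * st.f_frsize,  # what an unprivileged user may use
    }


if __name__ == "__main__":
    for name, value in df_view("/").items():
        print(name, round(value / 2**30, 2), "GiB")
```

On btrfs these values are the kernel's estimate, so they can agree with the btrfs tool's "Free (estimated)" figure while still hiding an allocation problem.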


And perhaps more important - can I assume that right now, with the 
latest stable kernel (4.13.2 right now), running "btrfs balance" is not 
safe and can lead to data corruption or loss?



Consider the following case:

- system admin runs btrfs balance on a filesystem with 100 GB free and 
assumes it is enough space to complete successfully


- btrfs balance fails due to some bug with "No space left on device"

- at the same time, a database using this filesystem will fail with "No 
space left on device", apt/rpm will fail a package upgrade, some program 
using temp space will fail, log collector will fail to catch some data, 
because of "No space left on device" and so on?




Tomasz Chmielewski
https://lxadm.com


kernel BUG at fs/btrfs/extent_io.c:1989

2017-09-18 Thread Paul Jones
Hi
I have a system that crashed during a defrag; upon reboot I got the following 
trace while resuming the defrag.
Filesystem is BTRFS Raid1 on lvm+cache, kernel 4.13.2
Check --repair gives lots of warnings about parent transid verify failed, but 
otherwise completes without issue.

Ran scrub which seems to have fixed most of the issues without crashing:

scrub status for d844164a-239e-4f37-9126-d3b2f3ab72be
scrub started at Mon Sep 18 15:59:05 2017 and finished after 02:04:00
total bytes scrubbed: 2.22TiB with 22890 errors
error details: verify=1078 csum=21812
corrected errors: 22886, uncorrectable errors: 4, unverified errors: 0
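
As a sanity check on those figures, here is a small sketch that parses the scrub summary and verifies that the per-type counts and the corrected/uncorrectable counts both add up to the reported total. The function names are made up, and the regular expressions only target the output format shown above, which may differ between btrfs-progs versions.

```python
import re


def parse_scrub_status(text):
    """Extract the error counters from `btrfs scrub status` style output."""
    total = int(re.search(r"with (\d+) errors", text).group(1))
    details_line = re.search(r"error details: (.*)", text).group(1)
    details = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", details_line)}
    outcome = {
        k: int(v)
        for k, v in re.findall(r"(corrected|uncorrectable|unverified) errors: (\d+)", text)
    }
    return total, details, outcome


def is_consistent(total, details, outcome):
    """True if both breakdowns sum to the reported total."""
    resolved = outcome["corrected"] + outcome["uncorrectable"]
    return sum(details.values()) == total and resolved == total


SAMPLE = """\
total bytes scrubbed: 2.22TiB with 22890 errors
error details: verify=1078 csum=21812
corrected errors: 22886, uncorrectable errors: 4, unverified errors: 0
"""

if __name__ == "__main__":
    print(is_consistent(*parse_scrub_status(SAMPLE)))
```

The four uncorrectable errors are the ones worth chasing with the rsync comparison against the other backup.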

I'll see how it goes when I use rsync to verify from the other backup.

Thanks,
Paul.

[   52.687705] BTRFS error (device dm-15): parent transid verify failed on 
6822688718848 wanted 1044475 found 1044411
[   52.688346] BTRFS info (device dm-15): read error corrected: ino 0 off 
6822688718848 (dev /dev/mapper/lvmB-backup--b sector 2340415488)
[   52.688401] BTRFS info (device dm-15): read error corrected: ino 0 off 
6822688722944 (dev /dev/mapper/lvmB-backup--b sector 2340415496)
[   52.688451] BTRFS info (device dm-15): read error corrected: ino 0 off 
6822688727040 (dev /dev/mapper/lvmB-backup--b sector 2340415504)
[   52.688501] BTRFS info (device dm-15): read error corrected: ino 0 off 
6822688731136 (dev /dev/mapper/lvmB-backup--b sector 2340415512)
[   53.332383] BTRFS error (device dm-15): parent transid verify failed on 
6522612940800 wanted 1044486 found 1042732
[   53.332668] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522612940800 (dev /dev/mapper/lvmB-backup--b sector 491844480)
[   53.332732] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522612944896 (dev /dev/mapper/lvmB-backup--b sector 491844488)
[   53.332794] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522612948992 (dev /dev/mapper/lvmB-backup--b sector 491844496)
[   53.332846] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522612953088 (dev /dev/mapper/lvmB-backup--b sector 491844504)
[   53.395581] BTRFS error (device dm-15): parent transid verify failed on 
6823548452864 wanted 1044475 found 1044413
[   53.395979] BTRFS info (device dm-15): read error corrected: ino 0 off 
6823548452864 (dev /dev/mapper/lvmB-backup--b sector 2342094656)
[   53.396054] BTRFS info (device dm-15): read error corrected: ino 0 off 
6823548456960 (dev /dev/mapper/lvmB-backup--b sector 2342094664)
[   53.527429] BTRFS error (device dm-15): parent transid verify failed on 
6823548583936 wanted 1044475 found 1044413
[   55.516066] br0: port 1(eth0) entered forwarding state
[   55.516068] br0: topology change detected, propagating
[   55.516101] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[  126.354423] BTRFS error (device dm-15): parent transid verify failed on 
6522613661696 wanted 1044486 found 1043710
[  126.354696] repair_io_failure: 6 callbacks suppressed
[  126.354698] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522613661696 (dev /dev/mapper/lvmB-backup--b sector 491845888)
[  126.354765] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522613665792 (dev /dev/mapper/lvmB-backup--b sector 491845896)
[  126.354824] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522613669888 (dev /dev/mapper/lvmB-backup--b sector 491845904)
[  126.354886] BTRFS info (device dm-15): read error corrected: ino 0 off 
6522613673984 (dev /dev/mapper/lvmB-backup--b sector 491845912)
[  126.484340] BTRFS error (device dm-15): parent transid verify failed on 
6517401976832 wanted 1044482 found 1044204
[  126.484890] BTRFS info (device dm-15): read error corrected: ino 0 off 
6517401976832 (dev /dev/mapper/lvmB-backup--b sector 798336768)
[  126.484939] BTRFS info (device dm-15): read error corrected: ino 0 off 
6517401980928 (dev /dev/mapper/lvmB-backup--b sector 798336776)
[  126.484989] BTRFS info (device dm-15): read error corrected: ino 0 off 
6517401985024 (dev /dev/mapper/lvmB-backup--b sector 798336784)
[  126.485040] BTRFS info (device dm-15): read error corrected: ino 0 off 
6517401989120 (dev /dev/mapper/lvmB-backup--b sector 798336792)
[  126.667061] BTRFS error (device dm-15): parent transid verify failed on 
6523036008448 wanted 1044486 found 1044206
[  126.667340] BTRFS info (device dm-15): read error corrected: ino 0 off 
6523036008448 (dev /dev/mapper/lvm-backup--a sector 375252800)
[  126.667377] BTRFS info (device dm-15): read error corrected: ino 0 off 
6523036012544 (dev /dev/mapper/lvm-backup--a sector 375252808)
[  126.828898] BTRFS error (device dm-15): parent transid verify failed on 
6522547240960 wanted 1044486 found 1044206
[  126.829325] BTRFS error (device dm-15): parent transid verify failed on 
6522547257344 wanted 1044486 found 1043052
[  126.831141] BTRFS error (device dm-15): parent transid verify failed on 
6522547650560 wanted 1044486 found 1044206
[  126.846967] BTRFS 

Re: ERROR: parent determination failed (btrfs send-receive)

2017-09-18 Thread Graham Cobb
On 18/09/17 07:10, Dave wrote:
> For my understanding, what are the restrictions on deleting snapshots?
> 
> What scenarios can lead to "ERROR: parent determination failed"?

The man page for btrfs-send is reasonably clear on the requirements
btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
options) then the specified snapshots must exist on both the source and
destination. If you don't have a suitable existing snapshot then don't
use -c or -p and just do a full send.
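
The rule above can be sketched as a small pre-flight check. This is only an
illustration: pick_send_args is a hypothetical helper, it prints the arguments
instead of calling btrfs, and it tests for the parent by path, whereas btrfs
itself matches snapshots by UUID.

```shell
# Hypothetical helper: print the arguments one would pass to "btrfs".
# $1 = snapshot to send
# $2 = candidate parent snapshot on the source (may be empty)
# $3 = path where the same parent should already exist on the destination
pick_send_args() {
    snap=$1 parent=$2 dest_parent=$3
    if [ -n "$parent" ] && [ -d "$parent" ] && [ -d "$dest_parent" ]; then
        # Parent present on both sides: an incremental send is possible.
        printf '%s\n' "send -p $parent $snap"
    else
        # No common parent: fall back to a full send.
        printf '%s\n' "send $snap"
    fi
}
```

If the parent is missing on either side, the helper simply drops -p, which is
the "just do a full send" case.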

> I use snap-sync to create and send snapshots.
> 
> GitHub - wesbarnett/snap-sync: Use snapper snapshots to backup to external 
> drive
> https://github.com/wesbarnett/snap-sync

I am not familiar with this tool. Your question should be sent to the
author of the tool, if that is what is deciding what -p and -c options
are being used.

Personally I use and recommend btrbk. I have never had this issue and
the configuration options let me limit the snapshots it saves on both
the source and destination disks separately (so I keep fewer on the
source than on the backup disk).

Graham


Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Andrei Borzenkov
On Mon, Sep 18, 2017 at 11:20 AM, Tomasz Chmielewski  wrote:
>>> # df -h /var/lib/lxd
>>>
>>> FWIW, standard (aka util-linux) df is effectively useless in a situation
>>> such as this, as it really doesn't give you the information you need (it
>>> can say you have lots of space available, but if btrfs has all of it
>>> allocated into chunks, even if the chunks have space in them still, there
>>> can be problems).
>
>
> I see here on RAID-1, "df -h" it shows pretty much the same amount of free
> space as "btrfs fi show":
>
> - "df -h" shows 105G free
> - "btrfs fi show" says: Free (estimated):104.28GiB  (min:
> 104.28GiB)
>

I think both use the same algorithm to compute free space (df in the
end just shows what the kernel returns). The problem is that this
algorithm is itself only an approximation in the general case. For a
uniform RAID1 profile it should be correct, though.
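
To illustrate why the two numbers can agree on a uniform RAID1 filesystem,
here is a simplified sketch of the arithmetic. It is an assumption about the
general shape of such an estimate, not the kernel's or btrfs-progs' actual
code: free space inside already-allocated data chunks, plus raw unallocated
space divided by the RAID1 factor of 2. The function name and the sample
numbers are made up.

```shell
# Hypothetical helper; all arguments are whole GiB.
# $1 = data chunk size (logical), $2 = data used (logical),
# $3 = raw unallocated space summed over all devices.
estimate_free_raid1() {
    # RAID1 stores every byte twice, so raw unallocated space counts half.
    echo $(( ($1 - $2) + $3 / 2 ))
}

estimate_free_raid1 400 380 170   # 20 GiB inside chunks + 85 GiB from unallocated
```

With no unallocated space left, such an estimate can still look comfortable
while there is nowhere to allocate a new chunk, which is the situation the
quoted text warns about.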


Re: using fio to test btrfs compression

2017-09-18 Thread Timofey Titovets
2017-09-18 10:36 GMT+03:00 shally verma :
> Hi
>
> I wanted to test btrfs compression using fio command but somehow
> during fio writes, I don't see code taking route of compression blocks
> where as If I do a copy to btrfs compression enabled mount point then
> I can easily see code falling through compression.c.
>
> Here's how I do my setup
>
> 1. mkfs.btrfs /dev/sdb1
> 2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
> 3. cp  /mnt
> 4. dmesg shows print staments from compression.c and zlib.c confirming
> compression routine was invoked during write
> 5. now, copy back from btrfs mount point to home directory also shows
> decompress call invokation
>
> Now, try same with fio commands:
>
> fio command
>
> fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
> --ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
> --name=test --size=10G --runtime=180 --time_based
>
> But it seems to write uncompressed data.
>
> Any help here? what's missing?
>
> Thanks
> Shally

1. In the mount options, change "-o compress=zlib,compress-force" to
   "-o compress-force=zlib".
2. Tune fio to generate compressible data.
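
Both points can be sketched as commands. The device and mount point are taken
from the original post; buffer_compress_percentage and refill_buffers are
existing fio options, but the value 50 is only an illustrative assumption.
The helper (a hypothetical name) prints the commands so they can be reviewed
before being run as root.

```shell
# Print (do not run) the mount and fio invocations.
# $1 = block device, $2 = mount point
print_compress_test_cmds() {
    dev=$1 mnt=$2
    # compress-force takes the algorithm as its value.
    echo "mount -t btrfs -o compress-force=zlib $dev $mnt"
    # Ask fio for write buffers that are roughly 50% compressible (assumed value).
    echo "fio --directory=$mnt --name=test --rw=write --bs=64k --size=10G" \
         "--refill_buffers --buffer_compress_percentage=50"
}

print_compress_test_cmds /dev/sdb1 /mnt
```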


-- 
Have a nice day,
Timofey.


Re: how to run balance successfully (No space left on device)?

2017-09-18 Thread Tomasz Chmielewski

>> # df -h /var/lib/lxd
>>
>> FWIW, standard (aka util-linux) df is effectively useless in a situation
>> such as this, as it really doesn't give you the information you need (it
>> can say you have lots of space available, but if btrfs has all of it
>> allocated into chunks, even if the chunks have space in them still, there
>> can be problems).

I see here on RAID-1, "df -h" shows pretty much the same amount of
free space as "btrfs fi show":

- "df -h" shows 105G free
- "btrfs fi show" says: Free (estimated): 104.28GiB (min: 104.28GiB)

>> But chances are pretty good that once you get that patch integrated,
>> whether by integrating it yourself to what you have currently, or by
>> trying 4.14-rc1 or waiting until it hits release or stable, that bug will
>> have been squashed! =:^)

OK, will wait for 4.14.


Tomasz Chmielewski
https://lxadm.com


[PATCH 2/2] xfstests: Add echo logic to show the information of TEST_FS_MOUNT_OPTS

2017-09-18 Thread Gu Jinxiang
There is a message showing the user the scratch mount options, but no
message showing the TEST_FS_MOUNT_OPTS of the test mount.
Add logic to show the test mount options.

Signed-off-by: Gu Jinxiang 
---
 check |  1 +
 common/rc | 13 +
 2 files changed, 14 insertions(+)

diff --git a/check b/check
index f8db3cd..676b16c 100755
--- a/check
+++ b/check
@@ -579,6 +579,7 @@ for section in $HOST_OPTIONS_SECTIONS; do
# print out our test configuration
echo "FSTYP -- `_full_fstyp_details`"
echo "PLATFORM  -- `_full_platform_details`"
+   echo "TEST_FS_MOUNT_OPTS -- `_test_mount_options`"
if [ ! -z "$SCRATCH_DEV" ]; then
  echo "MKFS_OPTIONS  -- `_scratch_mkfs_options`"
  echo "MOUNT_OPTIONS -- `_scratch_mount_options`"
diff --git a/common/rc b/common/rc
index eb9c469..1e7fee2 100644
--- a/common/rc
+++ b/common/rc
@@ -312,6 +312,19 @@ _overlay_mount_options()
 $OVERLAY_MOUNT_OPTIONS
 }
 
+_test_mount_options()
+{
+   _test_options mount
+
+   if [ "$FSTYP" == "overlay" ]; then
+   echo `_overlay_mount_options`
+   return 0
+   fi
+
+   echo $TEST_OPTIONS $TEST_FS_MOUNT_OPTS $SELINUX_MOUNT_OPTIONS $* \
+   $TEST_DEV $TEST_DIR
+}
+
 _scratch_mount_options()
 {
_scratch_options mount
-- 
1.9.1





[PATCH 1/2] xfstests: Split MOUNT_OPTIONS to TEST_FS_MOUNT_OPTS and MOUNT_OPTIONS

2017-09-18 Thread Gu Jinxiang
Resolve the inconsistent use of mount options.
Btrfs uses MOUNT_OPTIONS for both scratch_dev and test_dev. Change to
MOUNT_OPTIONS for the scratch mount and TEST_FS_MOUNT_OPTS for the test dev
mount.

Signed-off-by: Gu Jinxiang 
---
As mentioned by https://patchwork.kernel.org/patch/9742039/, the usage
of MOUNT_OPTIONS is inconsistent.

 common/btrfs | 56 
 common/rc|  2 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/common/btrfs b/common/btrfs
index fd762ef..2a1cfaf 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -89,6 +89,62 @@ _require_btrfs_fs_feature()
_notrun "Feature $feat not supported by the available btrfs version"
 }
 
+_check_test_btrfs_filesystem()
+{
+   device=$1
+
+   # If type is set, we're mounted
+   type=`_fs_type $device`
+   ok=1
+
+   if [ "$type" = "$FSTYP" ]; then
+   # mounted ...
+   mountpoint=`_umount_or_remount_ro $device`
+   fi
+
+   if [ -f ${RESULT_DIR}/require_scratch.require_qgroup_report ]; then
+   $BTRFS_UTIL_PROG check $device --qgroup-report > $tmp.qgroup_report 2>&1
+   if grep -qE "Counts for qgroup.*are different" $tmp.qgroup_report ; then
+   _log_err "_check_btrfs_filesystem: filesystem on $device has wrong qgroup numbers"
+   echo "*** qgroup_report.$FSTYP output ***"  >>$seqres.full
+   cat $tmp.qgroup_report  >>$seqres.full
+   echo "*** qgroup_report.$FSTYP output ***"  >>$seqres.full
+   fi
+   rm -f $tmp.qgroup_report
+   fi
+
+   $BTRFS_UTIL_PROG check $device >$tmp.fsck 2>&1
+   if [ $? -ne 0 ]; then
+   _log_err "_check_btrfs_filesystem: filesystem on $device is inconsistent"
+   echo "*** fsck.$FSTYP output ***"   >>$seqres.full
+   cat $tmp.fsck   >>$seqres.full
+   echo "*** end fsck.$FSTYP output"   >>$seqres.full
+
+   ok=0
+   fi
+   rm -f $tmp.fsck
+
+   if [ $ok -eq 0 ]; then
+   echo "*** mount output ***" >>$seqres.full
+   _mount  >>$seqres.full
+   echo "*** end mount output" >>$seqres.full
+   elif [ "$type" = "$FSTYP" ]; then
+   # was mounted ...
+   _mount_or_remount_rw "$TEST_FS_MOUNT_OPTS" $device $mountpoint
+   ok=$?
+   fi
+
+   if [ $ok -eq 0 ]; then
+   status=1
+   if [ "$iam" != "check" ]; then
+   exit 1
+   fi
+   return 1
+   fi
+
+   return 0
+}
+
 _check_btrfs_filesystem()
 {
device=$1
diff --git a/common/rc b/common/rc
index cd53a37..eb9c469 100644
--- a/common/rc
+++ b/common/rc
@@ -2624,7 +2624,7 @@ _check_test_fs()
# do nothing for now
;;
 btrfs)
-   _check_btrfs_filesystem $TEST_DEV
+   _check_test_btrfs_filesystem $TEST_DEV
;;
 tmpfs)
# no way to check consistency for tmpfs
-- 
1.9.1





using fio to test btrfs compression

2017-09-18 Thread shally verma
Hi

I wanted to test btrfs compression using fio, but during fio writes I
don't see the code taking the compression path, whereas if I copy a file
to a btrfs mount point with compression enabled I can easily see the code
going through compression.c.

Here's how I do my setup:

1. mkfs.btrfs /dev/sdb1
2. mount -t btrfs -o compress=zlib,compress-force /dev/sdb1 /mnt
3. cp  /mnt
4. dmesg shows print statements from compression.c and zlib.c, confirming
that the compression routine was invoked during the write
5. copying back from the btrfs mount point to the home directory also shows
the decompress call being invoked

Now, try same with fio commands:

fio command

fio --directory=/mnt/ --numjobs=1 --direct=0 --buffered=1
--ioengine=libaio --group_reporting --bs=64k --rw=write --iodepth=128
--name=test --size=10G --runtime=180 --time_based

But it seems to write uncompressed data.

Any help here? what's missing?

Thanks
Shally


[PATCH v3.1 11/14] btrfs-progs: tests/mkfs: Add basic test case for rootdir parameter

2017-09-18 Thread Qu Wenruo
Add a new test case to check whether the "--rootdir" option of mkfs.btrfs
handles file content, permissions and xattrs correctly.

The new test case reuses the convert facility, and looks just like
convert-tests/001-ext2-basic.

Signed-off-by: Qu Wenruo 
---
changelog:
v3.1:
Cleanup the temporary rootdir directory.
---
 tests/mkfs-tests/010-basic-rootdir/test.sh | 79 ++
 1 file changed, 79 insertions(+)
 create mode 100755 tests/mkfs-tests/010-basic-rootdir/test.sh

diff --git a/tests/mkfs-tests/010-basic-rootdir/test.sh b/tests/mkfs-tests/010-basic-rootdir/test.sh
new file mode 100755
index ..49f9b860
--- /dev/null
+++ b/tests/mkfs-tests/010-basic-rootdir/test.sh
@@ -0,0 +1,79 @@
+#!/bin/bash
+# Check basic operations for "mkfs.btrfs --rootdir", including:
+# 1)   Checksum, permission and acl
+#  Should be consistent with source directory
+# 2)   Failure condition
+# 2.1)  Non-existent file/block as destination
+# 2.2) Too small destination file
+# 2.3)  No privilege to read source directory
+#  All failure condition should fail, but without segfault/backtrace
+
+source "$TOP/tests/common"
+source "$TOP/tests/common.convert"
+
+setup_root_helper
+prepare_test_dev 512M
+check_prereq mkfs.btrfs
+check_global_prereq dd
+check_global_prereq sed
+
+# Since our test dir will be in /tmp, which is nowadays tmpfs for most
+# distributions, and tmpfs xattr doesn't support user xattr, here we
+# use a special populate_fs() which won't create user xattr.
+#
+# Don't worry, both acl and user xattr is implemented by xattr in btrfs,
+# so "acls" should cover the case.
+populate_tmpfs() {
+   _assert_path "$1"
+
+for dataset_type in 'small' 'hardlink' 'fast_symlink' 'brokenlink' 'perm' 'sparse' 'acls' 'fifo' 'slow_symlink'; do
+   generate_dataset "$dataset_type" "$1"
+   done
+}
+
+# Basic content checker for different nodesize/features
+content_test() {
+   local features
+   local nodesize
+   local src_dir
+   local csum_tmp 
+   local perm_tmp 
+   local acl_tmp
+
+   features="$1"
+   nodesize="$2"
+   src_dir=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XXX)
+
+   echo "[TEST/mkfs_rootdir]   nodesize=$nodesize" "${features:-defaults}"
+   echo "creating test dir at $src_dir" >> "$RESULTS"
+
+   populate_tmpfs "$src_dir"
+   csum_tmp=$(mktemp --tmpdir btrfs-progs-convert.XX)
+   perm_tmp=$(mktemp --tmpdir btrfs-progs-convert.permXX)
+   acl_tmp=$(mktemp --tmpdir btrfs-progs-convert.aclsXXX)
+   convert_test_gen_checksums "$csum_tmp" "$src_dir"
+   convert_test_perm "$perm_tmp" "$src_dir"
+   convert_test_acl "$acl_tmp" "$src_dir"
+
+   run_check "$TOP/mkfs.btrfs" ${1:+-O "$1"} ${2:+-n "$2"} \
+   "--rootdir" "$src_dir" "$TEST_DEV"
+   run_check "$TOP/btrfs" check "$TEST_DEV"
+   run_check_mount_test_dev
+   convert_test_post_check_checksums "$csum_tmp"
+   convert_test_post_check_permissions "$perm_tmp"
+   convert_test_post_check_acl "$acl_tmp"
+   run_check_umount_test_dev
+
+   rm -- "$csum_tmp"
+   rm -- "$perm_tmp"
+   rm -- "$acl_tmp"
+   run_check $SUDO_HELPER rm -rf -- "$src_dir"
+}
+
+for feature in '' 'extref' 'skinny-metadata' 'no-holes'; do
+   content_test "$feature" 4096
+   content_test "$feature" 8192
+   content_test "$feature" 16384
+   content_test "$feature" 32768
+   content_test "$feature" 65536
+done
-- 
2.14.1



[PATCH v3 13/14] btrfs-progs: mkfs: Fix overwritten return value for mkfs

2017-09-18 Thread Qu Wenruo
On mkfs failure, especially for --rootdir errors like EPERM/ENOSPC, the "out"
branch overwrites the return value, causing a wrong exit status.

Fix it so that the upcoming test cases can pass.
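
The same pitfall is easy to reproduce outside C. Below is a minimal shell
sketch of the pattern (keep the earlier error in ret and track the cleanup
status separately). All function names are hypothetical, and returning the
cleanup status only when nothing failed earlier is an assumption of this
sketch, not something the patch states.

```shell
populate() { return 28; }   # pretend the main work failed (28 stands in for ENOSPC)
close_fs() { return 0; }    # the cleanup itself succeeds

make_fs_buggy() {
    ret=0
    populate || ret=$?
    close_fs; ret=$?        # overwrites the earlier failure with 0
    return "$ret"
}

make_fs_fixed() {
    ret=0 close_ret=0
    populate || ret=$?
    close_fs || close_ret=$?   # separate variable for the cleanup status
    if [ "$ret" -ne 0 ]; then
        return "$ret"          # the earlier error wins
    fi
    return "$close_ret"
}
```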

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index f78c24ce..caa5c2e2 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1339,6 +1339,7 @@ int main(int argc, char **argv)
int zero_end = 1;
int fd = -1;
int ret;
+   int close_ret;
int i;
int mixed = 0;
int nodesize_forced = 0;
@@ -1814,9 +1815,9 @@ raid_groups:
 */
fs_info->finalize_on_close = 1;
 out:
-   ret = close_ctree(root);
+   close_ret = close_ctree(root);
 
-   if (!ret) {
+   if (!close_ret) {
optind = saved_optind;
dev_cnt = argc - optind;
while (dev_cnt-- > 0) {
-- 
2.14.1



[PATCH v3 12/14] btrfs-progs: tests/common: Detect ungraceful failure case

2017-09-18 Thread Qu Wenruo
run_mustfail() checks the return value but doesn't check whether the failure
is graceful.

Add a check for ungraceful failures, for the upcoming "mkfs.btrfs --rootdir"
test cases, as the old "--rootdir" never failed gracefully.
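
One detail worth keeping in mind when checking for statuses above 128 (the
shell's convention for a command killed by a signal): $? is reset by every
command, including "[", so the status has to be saved before it is tested
more than once. A self-contained sketch of that idea follows; the function
name is hypothetical and this is not the btrfs-progs helper itself.

```shell
# Returns 0 only if the command fails gracefully (status 1..128).
must_fail_gracefully() {
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?   # capture the status once, right away
    if [ "$rc" -eq 0 ]; then
        echo "succeeded (unexpected): $*"
        return 1
    elif [ "$rc" -gt 128 ]; then
        echo "failed (ungracefully, status $rc): $*"
        return 1
    fi
    echo "failed (expected): $*"
    return 0
}
```

For example, "must_fail_gracefully false" is accepted, while a command that
dies from a signal is reported as an ungraceful failure.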

Signed-off-by: Qu Wenruo 
---
 tests/common | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/common b/tests/common
index 242ded1d..b49d9ff3 100644
--- a/tests/common
+++ b/tests/common
@@ -227,6 +227,11 @@ run_mustfail()
$INSTRUMENT "$@" >> "$RESULTS" 2>&1
fi
if [ $? != 0 ]; then
+   if [ $? -gt 128 ]; then
+   echo "failed (ungracefully): $@" >> "$RESULTS"
+   _fail "unexpected ungraceful failure: $msg"
+   return 1
+   fi
echo "failed (expected): $@" >> "$RESULTS"
return 0
else
-- 
2.14.1



[PATCH v3 07/14] btrfs-progs: Doc/mkfs: Add extra condition for rootdir option

2017-09-18 Thread Qu Wenruo
Explain the extra limitations of the --rootdir option, including:
1) Size limitation
   I have decided to follow the "mkfs.ext4 -d" behavior, so the user is
   responsible for making sure the block device/file is large enough.

2) Read permission
   If the user can't read the content, mkfs will just fail.
   So the user is also responsible for having enough privilege.

3) Extra warning about the behavior change
   Since we no longer shrink the fs or create the file image, add a
   warning about this to the documentation.

Signed-off-by: Qu Wenruo 
---
 Documentation/mkfs.btrfs.asciidoc | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index d53d9e26..645a2881 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -106,6 +106,19 @@ Please see the mount option 'discard' for that in `btrfs`(5).
 *-r|--rootdir *::
 Populate the toplevel subvolume with files from 'rootdir'.  This does not
 require root permissions and does not mount the filesystem.
++
+With this option, only one device can be specified.
++
+NOTE: The user should make sure the block device/file has enough space to
+contain the source directory, and has enough privilege to read the source directory.
+Otherwise mkfs will just fail.
++
+WARNING: Before v4.14 btrfs-progs, *--rootdir* would shrink the filesystem,
+preventing the user from making use of the remaining space.
+In v4.14 btrfs-progs, this behavior is changed, and the fs is no longer shrunk.
+The result should be the same as `mkfs`, `mount` and then `cp -r`. +
+Also, if the destination file/block device does not exist, *--rootdir* will not
+create the image file, so that it follows the normal mkfs behavior.
 
 *-O|--features [,...]*::
 A list of filesystem features turned on at mkfs time. Not all features are
-- 
2.14.1



[PATCH v3 09/14] btrfs-progs: tests/common: Introduce optional parameter to specify destination directory for generate_dataset

2017-09-18 Thread Qu Wenruo
Normally generate_dataset() creates data in
$TEST_MNT/$dataset_type.

This is OK since most tests do their operations in $TEST_MNT.
However, this is not the case for the "mkfs --rootdir" test.

This patch adds an optional parameter to generate_dataset() to
specify the destination directory, for the later "mkfs --rootdir" test
cases.
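
The optional-parameter behavior can be sketched on its own with a default
expansion. This is only an illustration: dataset_dir is a hypothetical helper
and /mnt/test is a placeholder default, whereas the patch itself uses an
explicit [ -z "$2" ] test.

```shell
TEST_MNT=${TEST_MNT:-/mnt/test}   # placeholder default for this sketch

# Print the directory a dataset of type $1 would be generated in.
# $2 (optional) overrides the default destination $TEST_MNT.
dataset_dir() {
    dataset_type=$1
    # ${2:-...} falls back to $TEST_MNT when $2 is unset or empty,
    # which gives the same result as testing [ -z "$2" ].
    echo "${2:-$TEST_MNT}/$dataset_type"
}
```

Existing callers that pass only the dataset type keep writing under
$TEST_MNT, so they do not need to change.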

Signed-off-by: Qu Wenruo 
---
 tests/common | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tests/common b/tests/common
index 9aebe3e7..242ded1d 100644
--- a/tests/common
+++ b/tests/common
@@ -440,7 +440,11 @@ DATASET_SIZE=50
 generate_dataset() {
 
dataset_type="$1"
-   dirpath=$TEST_MNT/$dataset_type
+   if [ -z "$2" ]; then
+   dirpath="$TEST_MNT/$dataset_type"
+   else
+   dirpath="$2/$dataset_type"
+   fi
run_check $SUDO_HELPER mkdir -p "$dirpath"
 
case "$dataset_type" in
-- 
2.14.1



[PATCH v3 10/14] btrfs-progs: tests/common: Make checksum, permission and acl check path independent

2017-09-18 Thread Qu Wenruo
convert_test_gen_checksums(), convert_test_perm() and convert_test_acl()
all use absolute paths, which is good enough for the convert tests.

However, for the "mkfs --rootdir" test we want all of the above functions to
use relative paths, making the output path independent.

This patch modifies all these functions by:
1) Adding a new optional parameter to specify the destination directory
   Callers and corresponding checkers also get this new optional parameter.
2) Changing directory before generating the file list/csum file
   and returning to the old pwd after the work is done.
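
The effect of changing directory first can be shown with a small standalone
sketch. It is not the patch's code: relative_checksums is a hypothetical
name, and it uses a subshell instead of saving and restoring the old pwd,
which achieves the same thing for the caller.

```shell
# Print "checksum  ./relative/path" lines for every regular file under $1.
# Running inside a subshell leaves the caller's working directory untouched.
relative_checksums() {
    (
        cd "$1" || exit 1
        # Paths start with "./", so the output does not depend on where
        # the directory lives; sort by path for a stable order.
        find . -type f -exec md5sum {} + | sort -k 2
    )
}
```

Two directories with identical content then produce identical output, which
is what allows a source directory to be compared with the mounted filesystem.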

Signed-off-by: Qu Wenruo 
---
 tests/common.convert | 91 +++-
 1 file changed, 83 insertions(+), 8 deletions(-)

diff --git a/tests/common.convert b/tests/common.convert
index 45174b7e..8a36cba3 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -93,45 +93,92 @@ convert_test_prep_fs() {
 
 # generate md5 checksums of files on $TEST_MNT
 # $1: path where the checksums will be stored
+# $2: (optional) destination directory if we're not using $TEST_MNT
 convert_test_gen_checksums() {
+   local dir_path
+   local csum_file
+   local saved_pwd
+
_assert_path "$1"
+   csum_file="$1"
+   if [ -z "$2" ]; then
+   dir_path="$TEST_MNT"
+   else
+   dir_path="$2"
+   fi
 
-   run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/test" "bs=$nodesize" \
+   run_check $SUDO_HELPER dd if=/dev/zero of="$dir_path/test" "bs=$nodesize" \
count=1 >/dev/null 2>&1
-   run_check_stdout $SUDO_HELPER find "$TEST_MNT" -type f ! -name 'image' -exec md5sum {} \+ > "$1"
+
+   # We change directory into destination, so generated md5sum file won't
+   # include absolute path, making the result path independent.
+   saved_pwd="$(pwd)"
+   run_check cd "$dir_path"
+   run_check_stdout $SUDO_HELPER find . -type f ! -name 'image' -exec md5sum {} \+ \
+   > "$csum_file"
+   run_check cd "$saved_pwd"
 }
+
 # list $TEST_MNT data set file permissions.
 # $1: path where the permissions will be stored
+# $2: (optional) destination directory if we're not using $TEST_MNT
 convert_test_perm() {
local PERMTMP
+   local saved_pwd
+   local dir_path
 
_assert_path "$1"
PERMTMP="$1"
+   if [ -z "$2" ]; then
+   dir_path="$TEST_MNT"
+   else
+   dir_path="$2"
+   fi
FILES_LIST=$(mktemp --tmpdir btrfs-progs-convert.fileslistXX)
 
-   run_check $SUDO_HELPER dd if=/dev/zero of="$TEST_MNT/test" "bs=$nodesize" \
+   run_check $SUDO_HELPER dd if=/dev/zero of="$dir_path/test" "bs=$nodesize" \
count=1 >/dev/null 2>&1
-   run_check_stdout $SUDO_HELPER find "$TEST_MNT" -type f ! -name 'image' -fprint "$FILES_LIST"
+
+   # Same as convert_test_gen_checksums(), make output path independent
+   saved_pwd="$(pwd)"
+   run_check cd "$dir_path"
+   run_check_stdout $SUDO_HELPER find . -type f ! -name 'image' -fprint "$FILES_LIST"
# Fix directory entries order
sort "$FILES_LIST" -o "$FILES_LIST"
for file in `cat "$FILES_LIST"` ;do
run_check_stdout $SUDO_HELPER getfacl --absolute-names "$file" >> "$PERMTMP"
done
+   run_check cd "$saved_pwd"
rm -- "$FILES_LIST"
 }
+
 # list acls of files on $TEST_MNT
 # $1: path where the acls will be stored
+# $2: (optional) destination directory if we're not using $TEST_MNT
 convert_test_acl() {
local ACLSTMP
+   local dir_path
+   local saved_pwd
+
+   _assert_path "$1"
ACLTMP="$1"
+   if [ -z "$2" ]; then
+   dir_path="$TEST_MNT"
+   else
+   dir_path="$2"
+   fi
FILES_LIST=$(mktemp --tmpdir btrfs-progs-convert.fileslistXX)
 
-   run_check_stdout $SUDO_HELPER find "$TEST_MNT/acls" -type f -fprint "$FILES_LIST"
+   # Make find result and later getfattr output path independent
+   saved_pwd="$(pwd)"
+   run_check cd "$dir_path"
+   run_check_stdout $SUDO_HELPER find "./acls" -type f -fprint "$FILES_LIST"
# Fix directory entries order
sort "$FILES_LIST" -o "$FILES_LIST"
for file in `cat "$FILES_LIST"`;do
run_check_stdout $SUDO_HELPER getfattr --absolute-names -d "$file" >> "$ACLTMP"
done
+   run_check cd "$saved_pwd"
rm -- "$FILES_LIST"
 }
 
@@ -149,11 +196,18 @@ convert_test_do_convert() {
 convert_test_post_check_permissions() {
local EXT_PERMTMP
local BTRFS_PERMTMP
+   local dir_path
+   local saved_pwd
 
_assert_path "$1"
EXT_PERMTMP="$1"
+   if [ -z "$2" ]; then
+   dir_path="$TEST_MNT"
+   else
+   dir_path="$2"
+   fi
BTRFS_PERMTMP=$(mktemp --tmpdir btrfs-progs-convert.permXX)
-   convert_test_perm "$BTRFS_PERMTMP"
+   convert_test_perm 

[PATCH v3 11/14] btrfs-progs: tests/mkfs: Add basic test case for rootdir parameter

2017-09-18 Thread Qu Wenruo
Add a new test case to check whether the "--rootdir" option of mkfs.btrfs
handles file content, permissions and xattrs correctly.

The new test case reuses the convert facility, and looks just like
convert-tests/001-ext2-basic.

Signed-off-by: Qu Wenruo 
---
 tests/mkfs-tests/010-basic-rootdir/test.sh | 78 ++
 1 file changed, 78 insertions(+)
 create mode 100755 tests/mkfs-tests/010-basic-rootdir/test.sh

diff --git a/tests/mkfs-tests/010-basic-rootdir/test.sh b/tests/mkfs-tests/010-basic-rootdir/test.sh
new file mode 100755
index ..6ff3e1af
--- /dev/null
+++ b/tests/mkfs-tests/010-basic-rootdir/test.sh
@@ -0,0 +1,78 @@
+#!/bin/bash
+# Check basic operations for "mkfs.btrfs --rootdir", including:
+# 1)   Checksum, permission and acl
+#  Should be consistent with source directory
+# 2)   Failure condition
+# 2.1)  Non-existent file/block as destination
+# 2.2) Too small destination file
+# 2.3)  No privilege to read source directory
+#  All failure condition should fail, but without segfault/backtrace
+
+source "$TOP/tests/common"
+source "$TOP/tests/common.convert"
+
+setup_root_helper
+prepare_test_dev 512M
+check_prereq mkfs.btrfs
+check_global_prereq dd
+check_global_prereq sed
+
+# Since our test dir will be in /tmp, which is nowadays tmpfs for most
+# distributions, and tmpfs xattr doesn't support user xattr, here we
+# use a special populate_fs() which won't create user xattr.
+#
+# Don't worry, both acl and user xattr is implemented by xattr in btrfs,
+# so "acls" should cover the case.
+populate_tmpfs() {
+   _assert_path "$1"
+
+for dataset_type in 'small' 'hardlink' 'fast_symlink' 'brokenlink' 'perm' 'sparse' 'acls' 'fifo' 'slow_symlink'; do
+   generate_dataset "$dataset_type" "$1"
+   done
+}
+
+# Basic content checker for different nodesize/features
+content_test() {
+   local features
+   local nodesize
+   local src_dir
+   local csum_tmp 
+   local perm_tmp 
+   local acl_tmp
+
+   features="$1"
+   nodesize="$2"
+   src_dir=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XXX)
+
+   echo "[TEST/mkfs_rootdir]   nodesize=$nodesize" "${features:-defaults}"
+   echo "creating test dir at $src_dir" >> "$RESULTS"
+
+   populate_tmpfs "$src_dir"
+   csum_tmp=$(mktemp --tmpdir btrfs-progs-convert.XX)
+   perm_tmp=$(mktemp --tmpdir btrfs-progs-convert.permXX)
+   acl_tmp=$(mktemp --tmpdir btrfs-progs-convert.aclsXXX)
+   convert_test_gen_checksums "$csum_tmp" "$src_dir"
+   convert_test_perm "$perm_tmp" "$src_dir"
+   convert_test_acl "$acl_tmp" "$src_dir"
+
+   run_check "$TOP/mkfs.btrfs" ${1:+-O "$1"} ${2:+-n "$2"} \
+   "--rootdir" "$src_dir" "$TEST_DEV"
+   run_check "$TOP/btrfs" check "$TEST_DEV"
+   run_check_mount_test_dev
+   convert_test_post_check_checksums "$csum_tmp"
+   convert_test_post_check_permissions "$perm_tmp"
+   convert_test_post_check_acl "$acl_tmp"
+   run_check_umount_test_dev
+
+   rm -- "$csum_tmp"
+   rm -- "$perm_tmp"
+   rm -- "$acl_tmp"
+}
+
+for feature in '' 'extref' 'skinny-metadata' 'no-holes'; do
+   content_test "$feature" 4096
+   content_test "$feature" 8192
+   content_test "$feature" 16384
+   content_test "$feature" 32768
+   content_test "$feature" 65536
+done
-- 
2.14.1



[PATCH v3 01/14] btrfs-progs: Refactor find_next_chunk() to get rid of parameter root and objectid

2017-09-18 Thread Qu Wenruo
find_next_chunk() is used to find the start position of the next chunk. It
should only search the chunk tree, and the objectid is fixed to
BTRFS_FIRST_CHUNK_TREE_OBJECTID.

So refactor the parameter list to get rid of @root, which can be obtained
from fs_info->chunk_root, and @objectid, which is fixed to
BTRFS_FIRST_CHUNK_TREE_OBJECTID.

Signed-off-by: Qu Wenruo 
---
 volumes.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/volumes.c b/volumes.c
index 2ae2d1bb..2209e5a9 100644
--- a/volumes.c
+++ b/volumes.c
@@ -505,8 +505,9 @@ err:
return ret;
 }
 
-static int find_next_chunk(struct btrfs_root *root, u64 objectid, u64 *offset)
+static int find_next_chunk(struct btrfs_fs_info *fs_info, u64 *offset)
 {
+   struct btrfs_root *root = fs_info->chunk_root;
struct btrfs_path *path;
int ret;
struct btrfs_key key;
@@ -517,7 +518,7 @@ static int find_next_chunk(struct btrfs_root *root, u64 objectid, u64 *offset)
if (!path)
return -ENOMEM;
 
-   key.objectid = objectid;
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
key.offset = (u64)-1;
key.type = BTRFS_CHUNK_ITEM_KEY;
 
@@ -533,7 +534,7 @@ static int find_next_chunk(struct btrfs_root *root, u64 objectid, u64 *offset)
} else {
btrfs_item_key_to_cpu(path->nodes[0], _key,
  path->slots[0]);
-   if (found_key.objectid != objectid)
+   if (found_key.objectid != BTRFS_FIRST_CHUNK_TREE_OBJECTID)
*offset = 0;
else {
chunk = btrfs_item_ptr(path->nodes[0], path->slots[0],
@@ -995,8 +996,7 @@ again:
}
return -ENOSPC;
}
-   ret = find_next_chunk(chunk_root, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
- );
+   ret = find_next_chunk(info, );
if (ret)
return ret;
key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
@@ -1129,9 +1129,7 @@ int btrfs_alloc_data_chunk(struct btrfs_trans_handle *trans,
} else {
u64 tmp;
 
-   ret = find_next_chunk(chunk_root,
- BTRFS_FIRST_CHUNK_TREE_OBJECTID,
- );
+   ret = find_next_chunk(info, );
key.offset = tmp;
if (ret)
return ret;
-- 
2.14.1



[PATCH v3 00/14] Mkfs: Rework --rootdir to a more generic behavior

2017-09-18 Thread Qu Wenruo
The patchset can be fetched from github:
https://github.com/adam900710/btrfs-progs/tree/mkfs_rootdir_rework

mkfs.btrfs --rootdir gives users a way to generate a btrfs filesystem with
pre-written content without needing root privileges to mount the fs.

However, the code is quite old and has not received much review or testing.
This leads to some strange behavior, from customized chunk allocation
(which uses the reserved 0~1M device space) to various BUG_ON()s caused by
ENOSPC or EPERM.

The reworked --rootdir is based on the traditional mkfs: everything is
processed after the traditional mkfs has finished, so nothing is customized.

The result is equivalent to mkfs, mount, cp, umount
(provided the btrfs-progs chunk/extent allocator behaves the same as the
kernel's).

Documentation and test cases (especially test cases) are also enhanced.
The documentation has extra explanation of the behavior change, and the
test cases follow the convert tests, testing all possible feature and
nodesize combinations.

Patches 1~6 change the rootdir behavior.
Patch 7 enhances the documentation.
Patches 8~14 enhance the mkfs test cases, with one bug fix exposed by the
new test cases.

Changelog:
  v2:
Follow the "mkfs.ext4 -d" behavior, which will neither create the file
for a non-existent destination nor shrink the fs.
Add extra explanation for the behavior change.

  v3:
Make the convert test facilities more flexible to handle the test pattern
of "--rootdir".
Introduce new mkfs test cases to check both file content and error
handlers.
Slightly enhance the documentation explanation of --rootdir.
Since the main change is in the test cases, patches 1~5 are not modified.

Qu Wenruo (14):
  btrfs-progs: Refactor find_next_chunk() to get rid of parameter root
and objectid
  btrfs-progs: Fix one-byte overlap bug in free_block_group_cache
  btrfs-progs: mkfs: Rework rootdir option to avoid custom chunk layout
  btrfs-progs: mkfs: Update allocation info before verbose output
  btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens
  btrfs-progs: mkfs: Workaround BUG_ON caused by rootdir option
  btrfs-progs: Doc/mkfs: Add extra condition for rootdir option
  btrfs-progs: tests/common: Split user xattr into its own branch for
generate_dataset
  btrfs-progs: tests/common: Introduce optional parameter to specify
destination directory for generate_dataset
  btrfs-progs: tests/common: Make checksum, permission and acl check
path independent
  btrfs-progs: tests/mkfs: Add basic test case for rootdir parameter
  btrfs-progs: tests/common: Detect ungraceful failure case
  btrfs-progs: mkfs: Fix overwritten return value for mkfs
  btrfs-progs: tests/mkfs: Check error handler for rootdir parameter

 Documentation/mkfs.btrfs.asciidoc  |  13 +
 extent-tree.c  |   5 +-
 mkfs/main.c| 286 ++---
 tests/common   |  17 +-
 tests/common.convert   |  93 ++-
 tests/mkfs-tests/010-basic-rootdir/test.sh |  78 ++
 .../mkfs-tests/011-rootdir-fail-condition/test.sh  |  82 ++
 volumes.c  |  32 ++-
 8 files changed, 377 insertions(+), 229 deletions(-)
 create mode 100755 tests/mkfs-tests/010-basic-rootdir/test.sh
 create mode 100755 tests/mkfs-tests/011-rootdir-fail-condition/test.sh

-- 
2.14.1



[PATCH v3 03/14] btrfs-progs: mkfs: Rework rootdir option to avoid custom chunk layout

2017-09-18 Thread Qu Wenruo
mkfs.btrfs --rootdir uses its own custom chunk layout.
This makes it possible to limit the filesystem to a minimal size.

However, this custom chunk allocation has several problems.
The most obvious one is that it allocates chunks starting from device
offset 0, while both the kernel and normal mkfs reserve the 0~1M range
of each device.

This rework removes all the related custom chunk allocation and size
calculation.
Less code to maintain is always a good thing, especially for minor or
less maintained code.

So every --rootdir operation will produce the same result as mkfs.btrfs +
mount + cp (the same result as mkfs.ext4 -d).
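
As a rough illustration of the reserved-range problem, the check below flags any device extent that touches the first 1MiB of a device. RESERVED_END and dev_extent_in_reserved_range() are hypothetical names made up for this sketch; they are not taken from btrfs-progs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The first 1MiB of each device is reserved (the 0~1M range). */
#define RESERVED_END (1024ULL * 1024ULL)

/*
 * Return true if the device extent [start, start + len) touches the
 * reserved range. With len > 0, that is the case exactly when the
 * extent starts below the 1MiB mark.
 */
static bool dev_extent_in_reserved_range(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return start < RESERVED_END;
}
```

An extent allocated at device offset 0, as the old custom layout did, fails this check, while one starting at 1MiB passes.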

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 236 +---
 1 file changed, 34 insertions(+), 202 deletions(-)

diff --git a/mkfs/main.c b/mkfs/main.c
index 7592c1fb..0ce1ae26 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -991,53 +991,6 @@ fail_no_dir:
goto out;
 }
 
-static int create_chunks(struct btrfs_trans_handle *trans,
-struct btrfs_root *root, u64 num_of_meta_chunks,
-u64 size_of_data,
-struct mkfs_allocation *allocation)
-{
-   struct btrfs_fs_info *fs_info = root->fs_info;
-   u64 chunk_start;
-   u64 chunk_size;
-   u64 meta_type = BTRFS_BLOCK_GROUP_METADATA;
-   u64 data_type = BTRFS_BLOCK_GROUP_DATA;
-   u64 minimum_data_chunk_size = SZ_8M;
-   u64 i;
-   int ret;
-
-   for (i = 0; i < num_of_meta_chunks; i++) {
-   ret = btrfs_alloc_chunk(trans, fs_info,
-   _start, _size, meta_type);
-   if (ret)
-   return ret;
-   ret = btrfs_make_block_group(trans, fs_info, 0,
-meta_type, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, chunk_size);
-   allocation->metadata += chunk_size;
-   if (ret)
-   return ret;
-   set_extent_dirty(>fs_info->free_space_cache,
-chunk_start, chunk_start + chunk_size - 1);
-   }
-
-   if (size_of_data < minimum_data_chunk_size)
-   size_of_data = minimum_data_chunk_size;
-
-   ret = btrfs_alloc_data_chunk(trans, fs_info,
-_start, size_of_data, data_type, 0);
-   if (ret)
-   return ret;
-   ret = btrfs_make_block_group(trans, fs_info, 0,
-data_type, BTRFS_FIRST_CHUNK_TREE_OBJECTID,
-chunk_start, size_of_data);
-   allocation->data += size_of_data;
-   if (ret)
-   return ret;
-   set_extent_dirty(>fs_info->free_space_cache,
-chunk_start, chunk_start + size_of_data - 1);
-   return ret;
-}
-
 static int make_image(const char *source_dir, struct btrfs_root *root)
 {
int ret;
@@ -1082,86 +1035,6 @@ out:
return ret;
 }
 
-/*
- * This ignores symlinks with unreadable targets and subdirs that can't
- * be read.  It's a best-effort to give a rough estimate of the size of
- * a subdir.  It doesn't guarantee that prepopulating btrfs from this
- * tree won't still run out of space.
- */
-static u64 global_total_size;
-static u64 fs_block_size;
-static int ftw_add_entry_size(const char *fpath, const struct stat *st,
- int type)
-{
-   if (type == FTW_F || type == FTW_D)
-   global_total_size += round_up(st->st_size, fs_block_size);
-
-   return 0;
-}
-
-static u64 size_sourcedir(const char *dir_name, u64 sectorsize,
- u64 *num_of_meta_chunks_ret, u64 *size_of_data_ret)
-{
-   u64 dir_size = 0;
-   u64 total_size = 0;
-   int ret;
-   u64 default_chunk_size = SZ_8M;
-   u64 allocated_meta_size = SZ_8M;
-   u64 allocated_total_size = 20 * SZ_1M;  /* 20MB */
-   u64 num_of_meta_chunks = 0;
-   u64 num_of_data_chunks = 0;
-   u64 num_of_allocated_meta_chunks =
-   allocated_meta_size / default_chunk_size;
-
-   global_total_size = 0;
-   fs_block_size = sectorsize;
-   ret = ftw(dir_name, ftw_add_entry_size, 10);
-   dir_size = global_total_size;
-   if (ret < 0) {
-   error("ftw subdir walk of %s failed: %s", dir_name,
-   strerror(errno));
-   exit(1);
-   }
-
-   num_of_data_chunks = (dir_size + default_chunk_size - 1) /
-   default_chunk_size;
-
-   num_of_meta_chunks = (dir_size / 2) / default_chunk_size;
-   if (((dir_size / 2) % default_chunk_size) != 0)
-   num_of_meta_chunks++;
-   if (num_of_meta_chunks <= num_of_allocated_meta_chunks)
-   num_of_meta_chunks = 0;
-   else
-   num_of_meta_chunks -= num_of_allocated_meta_chunks;
-
-   

[PATCH v3 04/14] btrfs-progs: mkfs: Update allocation info before verbose output

2017-09-18 Thread Qu Wenruo
Since the new --rootdir can allocate chunks, it will change the chunk
allocation result.

This patch updates the allocation info before the verbose output so that
the output reflects these changes.
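
The classification rule used by the new helper can be sketched in isolation. The block group type bits below are the on-disk flag values; the toy_* structs and the array-based walk are simplified assumptions that stand in for the real block group cache lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Block group type bits. */
#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)

/* Simplified stand-ins, for illustration only. */
struct toy_block_group {
	uint64_t flags;
	uint64_t length;
};

struct toy_allocation {
	uint64_t data;
	uint64_t metadata;
	uint64_t mixed;
	uint64_t system;
};

/* Recompute the per-type totals from scratch over all block groups. */
static void toy_update_allocation(const struct toy_block_group *bgs, size_t nr,
				  struct toy_allocation *alloc)
{
	const uint64_t mixed_flag =
		BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA;
	size_t i;

	assert(alloc != NULL);
	alloc->data = alloc->metadata = alloc->mixed = alloc->system = 0;

	for (i = 0; i < nr; i++) {
		/* Mixed must be tested first, it has both bits set. */
		if ((bgs[i].flags & mixed_flag) == mixed_flag)
			alloc->mixed += bgs[i].length;
		else if (bgs[i].flags & BTRFS_BLOCK_GROUP_DATA)
			alloc->data += bgs[i].length;
		else if (bgs[i].flags & BTRFS_BLOCK_GROUP_METADATA)
			alloc->metadata += bgs[i].length;
		else
			alloc->system += bgs[i].length;
	}
}
```

Resetting the counters before the walk matters: the totals recorded during the initial mkfs would otherwise be counted twice.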

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 0ce1ae26..6561ac52 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1276,6 +1276,38 @@ out:
return ret;
 }
 
+/*
+ * Just update chunk allocation info, since --rootdir may allocate new
+ * chunks which is not updated in @allocation structure.
+ */
+static void update_chunk_allocation(struct btrfs_fs_info *fs_info,
+   struct mkfs_allocation *allocation)
+{
+   struct btrfs_block_group_cache *bg_cache;
+   u64 mixed_flag = BTRFS_BLOCK_GROUP_DATA | BTRFS_BLOCK_GROUP_METADATA;
+   u64 search_start = 0;
+
+   allocation->mixed = 0;
+   allocation->data = 0;
+   allocation->metadata = 0;
+   allocation->system = 0;
+   while (1) {
+   bg_cache = btrfs_lookup_first_block_group(fs_info,
+ search_start);
+   if (!bg_cache)
+   break;
+   if ((bg_cache->flags & mixed_flag) == mixed_flag)
+   allocation->mixed += bg_cache->key.offset;
+   else if (bg_cache->flags & BTRFS_BLOCK_GROUP_DATA)
+   allocation->data += bg_cache->key.offset;
+   else if (bg_cache->flags & BTRFS_BLOCK_GROUP_METADATA)
+   allocation->metadata += bg_cache->key.offset;
+   else
+   allocation->system += bg_cache->key.offset;
+   search_start = bg_cache->key.objectid + bg_cache->key.offset;
+   }
+}
+
 int main(int argc, char **argv)
 {
char *file;
@@ -1733,6 +1765,7 @@ raid_groups:
if (verbose) {
char features_buf[64];
 
+   update_chunk_allocation(fs_info, );
printf("Label:  %s\n", label);
printf("UUID:   %s\n", mkfs_cfg.fs_uuid);
printf("Node size:  %u\n", nodesize);
-- 
2.14.1



[PATCH v3 06/14] btrfs-progs: mkfs: Workaround BUG_ON caused by rootdir option

2017-09-18 Thread Qu Wenruo
The --rootdir option starts a transaction to fill the fs. However, if
something goes wrong, from ENOSPC to lack of permission, we don't commit
the transaction, and the uncommitted transaction triggers a BUG_ON:

--
extent buffer leak: start 29392896 len 16384
extent_io.c:579: free_extent_buffer: BUG_ON `eb->flags & EXTENT_DIRTY` triggered, value 1
--

The root fix is to introduce btrfs_abort_transaction() in btrfs-progs.
However, in this particular case we can work around it by force
committing the transaction.

During mkfs, the magic of btrfs is set to an invalid one, so without
setting fs_info->finalize_on_close() the fs can never be mounted.
So even if we force the commit of a wrong transaction, we won't make
things worse.
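
The error-path idea (commit anyway, but report the original failure) can be sketched with toy types. toy_trans, toy_commit() and toy_make_image() are hypothetical stand-ins for this illustration, not btrfs-progs functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a transaction handle. */
struct toy_trans {
	int committed;
};

/* Stand-in for the commit step; it may itself fail with commit_ret. */
static int toy_commit(struct toy_trans *trans, int commit_ret)
{
	trans->committed = 1;
	return commit_ret;
}

/* fill_ret simulates the result of populating the fs from the rootdir. */
static int toy_make_image(struct toy_trans *trans, int fill_ret, int commit_ret)
{
	int ret = fill_ret;

	assert(trans != NULL);
	if (ret < 0)
		goto fail;
	return toy_commit(trans, commit_ret);

fail:
	/*
	 * Commit anyway so that no dirty state is left behind, but ignore
	 * the commit result and keep the original error.
	 */
	(void)toy_commit(trans, commit_ret);
	return ret;
}
```

The caller still sees the ENOSPC or EPERM that actually caused the failure, not whatever the forced commit returned.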

Signed-off-by: Qu Wenruo 
---
 mkfs/main.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/mkfs/main.c b/mkfs/main.c
index 6561ac52..f78c24ce 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1025,6 +1025,18 @@ static int make_image(const char *source_dir, struct btrfs_root *root)
printf("Making image is completed.\n");
return 0;
 fail:
+   /*
+* XXX:
+* To avoid BUG_ON() triggered by uncommitted transaction,
+* here we must commit transaction before we have proper
+* btrfs_abort_transaction() in btrfs-progs.
+*
+* Don't worry, the magic number is not valid so the fs can't be
+* mounted by kernel even we commit the trans.
+* And we don't want to pollute the original error, so we ignore
+* the return value from btrfs_commit_transaction().
+*/
+   btrfs_commit_transaction(trans, root);
while (!list_empty(_head.list)) {
dir_entry = list_entry(dir_head.list.next,
   struct directory_name_entry, list);
-- 
2.14.1



[PATCH v3 08/14] btrfs-progs: tests/common: Split user xattr into its own branch for generate_dataset

2017-09-18 Thread Qu Wenruo
generate_dataset() combines file ACLs and file xattrs into the "acls" type,
since both of them are implemented using file xattrs.

However, in some cases we don't want user xattrs, for example when
populating files on tmpfs, which doesn't support user xattrs.

So this patch splits the original "acls" type into "user_xattr" and "acls",
making it easier to use on tmpfs.

Signed-off-by: Qu Wenruo 
---
 tests/common | 6 +-
 tests/common.convert | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/common b/tests/common
index 08a25918..9aebe3e7 100644
--- a/tests/common
+++ b/tests/common
@@ -498,10 +498,14 @@ generate_dataset() {
for num in $(seq 1 "$DATASET_SIZE"); do
run_check $SUDO_HELPER touch "$dirpath/$dataset_type.$num"
run_check $SUDO_HELPER setfacl -m "u:root:x" "$dirpath/$dataset_type.$num"
+   done
+   ;;
+   user_xattr)
+   for num in $(seq 1 "$DATASET_SIZE"); do
+   run_check $SUDO_HELPER touch "$dirpath/$dataset_type.$num"
run_check $SUDO_HELPER setfattr -n user.foo -v "bar$num" "$dirpath/$dataset_type.$num"
done
;;
-
fifo)
for num in $(seq 1 "$DATASET_SIZE"); do
run_check $SUDO_HELPER mkfifo "$dirpath/$dataset_type.$num"
diff --git a/tests/common.convert b/tests/common.convert
index 1be804cf..45174b7e 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -36,7 +36,7 @@ run_check_mount_convert_dev()
 
 populate_fs() {
 
-for dataset_type in 'small' 'hardlink' 'fast_symlink' 'brokenlink' 'perm' 'sparse' 'acls' 'fifo' 'slow_symlink'; do
+for dataset_type in 'small' 'hardlink' 'fast_symlink' 'brokenlink' 'perm' 'sparse' 'acls' 'user_xattr' 'fifo' 'slow_symlink'; do
generate_dataset "$dataset_type"
done
 }
-- 
2.14.1



[PATCH v3 14/14] btrfs-progs: tests/mkfs: Check error handler for rootdir parameter

2017-09-18 Thread Qu Wenruo
mkfs.btrfs --rootdir must fail gracefully in the following cases:
1) No permission to read content of rootdir
2) Too small destination file
3) Non-existent destination file
4) Zero sized destination file
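
For cases 3) and 4), a graceful pre-flight check could look roughly like the sketch below. check_dest_file() and the error codes it returns are assumptions made for illustration; this is not what mkfs.btrfs actually does internally.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return 0 if @path looks usable as a mkfs
 * destination, or a negative errno instead of hitting a BUG_ON later.
 */
static int check_dest_file(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) < 0)
		return -errno;		/* e.g. -ENOENT for case 3) */
	if (S_ISREG(st.st_mode) && st.st_size == 0)
		return -EINVAL;		/* case 4): zero-sized regular file */
	return 0;
}
```

Block devices report st_size as 0, which is why the size test is limited to regular files.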

Signed-off-by: Qu Wenruo 
---
 .../mkfs-tests/011-rootdir-fail-condition/test.sh  | 82 ++
 1 file changed, 82 insertions(+)
 create mode 100755 tests/mkfs-tests/011-rootdir-fail-condition/test.sh

diff --git a/tests/mkfs-tests/011-rootdir-fail-condition/test.sh b/tests/mkfs-tests/011-rootdir-fail-condition/test.sh
new file mode 100755
index ..efdb4e9f
--- /dev/null
+++ b/tests/mkfs-tests/011-rootdir-fail-condition/test.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+#
+# Check if "mkfs.btrfs --rootdir" exits gracefully for different error cases
+
+source "$TOP/tests/common"
+source "$TOP/tests/common.convert"
+
+prepare_test_dev 128M 
+check_prereq mkfs.btrfs
+check_global_prereq fallocate
+
+no_permission() {
+   local dir_path
+   local bad_file 
+
+   dir_path=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XX)
+   bad_file="$dir_path/no_read_permission"
+
+   echo "[TEST/mkfs_rootdir]   error handler for -EPERM"
+   echo "creating test dir at $dir_path" >> "$RESULTS"
+
+   run_check fallocate -l 4k "$bad_file"
+   run_check chmod 000 "$bad_file"
+
+   run_mustfail "no permission to read" "$TOP/mkfs.btrfs" \
+   --rootdir "$dir_path" "$TEST_DEV"
+   rm -rf -- "$dir_path"
+}
+
+too_small_dest() {
+   local dir_path
+   local bad_file 
+
+   dir_path=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XX)
+   bad_file="$dir_path/no_space"
+
+   echo "[TEST/mkfs_rootdir]   error handler for -ENOSPC"
+   echo "creating test dir at $dir_path" >> "$RESULTS"
+
+   run_check fallocate -l 256M "$bad_file"
+
+   run_mustfail "too small destination file" "$TOP/mkfs.btrfs" \
+   --rootdir "$dir_path" "$TEST_DEV"
+   rm -rf -- "$dir_path"
+}
+
+non_existent_dest() {
+   local dir_path
+   local dest_file
+
+   dir_path=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XX)
+   dest_file=$(mktemp --tmpdir btrfs-progs-mkfs-dest-file.)
+   run_check rm -- "$dest_file"
+
+   echo "[TEST/mkfs_rootdir]   error handler for non-existent destination file"
+   echo "creating test dir at $dir_path" >> "$RESULTS"
+
+   run_mustfail "non-existent destination file" "$TOP/mkfs.btrfs" \
+   --rootdir "$dir_path" "$dest_file"
+   rm -rf -- "$dir_path"
+}
+
+zero_sized_dest() {
+   local dir_path
+   local dest_file
+
+   dir_path=$(mktemp --tmpdir --directory btrfs-progs-mkfs-rootdir.XX)
+   dest_file=$(mktemp --tmpdir btrfs-progs-mkfs-dest-file.)
+
+   echo "[TEST/mkfs_rootdir]   error handler for zero-sized destination file"
+   echo "creating test dir at $dir_path" >> "$RESULTS"
+
+   run_mustfail "zero-sized destination file" "$TOP/mkfs.btrfs" \
+   --rootdir "$dir_path" "$dest_file"
+   rm -rf -- "$dir_path"
+   rm -- "$dest_file"
+}
+
+no_permission
+too_small_dest
+non_existent_dest
+zero_sized_dest
-- 
2.14.1



[PATCH v3 02/14] btrfs-progs: Fix one-byte overlap bug in free_block_group_cache

2017-09-18 Thread Qu Wenruo
free_block_group_cache() calls clear_extent_bits() with the wrong end, which
is one byte past the correct range.

This causes the next adjacent cache state to be split.
Due to the split, its private pointer (which points to the block group
cache) is reset to NULL.

This is very hard to detect, as this function is only called from
cleanup_temp_chunks(), just before mkfs finishes.
The bug was only exposed when reworking the --rootdir option.
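
The off-by-one is easy to see with inclusive ranges. The two helpers below are hypothetical and exist only for this illustration; the sketch assumes, as the fix implies, that clear_extent_bits() treats its end argument as inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive end of a range that starts at @start and is @len bytes long. */
static uint64_t range_end_inclusive(uint64_t start, uint64_t len)
{
	assert(len > 0);
	return start + len - 1;
}

/* True if the inclusive ranges [s1, e1] and [s2, e2] share a byte. */
static bool ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 <= e2 && s2 <= e1;
}
```

For two adjacent block groups at 0 and 4096, each 4096 bytes long, using bytenr + len as the end gives [0, 4096], which overlaps the neighbour [4096, 8191] by one byte. Using bytenr + len - 1 gives [0, 4095], which does not.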

Signed-off-by: Qu Wenruo 
---
 extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extent-tree.c b/extent-tree.c
index eed56886..525a237e 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -3724,7 +3724,7 @@ static int free_block_group_cache(struct btrfs_trans_handle *trans,
btrfs_remove_free_space_cache(cache);
kfree(cache->free_space_ctl);
}
-   clear_extent_bits(_info->block_group_cache, bytenr, bytenr + len,
+   clear_extent_bits(_info->block_group_cache, bytenr, bytenr + len - 1,
  (unsigned int)-1);
ret = free_space_info(fs_info, flags, len, 0, NULL);
if (ret < 0)
-- 
2.14.1



[PATCH v3 05/14] btrfs-progs: Avoid BUG_ON for chunk allocation when ENOSPC happens

2017-09-18 Thread Qu Wenruo
When passing a directory larger than the block device via the --rootdir
parameter, we get the following backtrace:

--
extent-tree.c:2693: btrfs_reserve_extent: BUG_ON `ret` triggered, value -28
./mkfs.btrfs(+0x1a05d)[0x557939e6b05d]
./mkfs.btrfs(btrfs_reserve_extent+0xb5a)[0x557939e710c8]
./mkfs.btrfs(+0xb0b6)[0x557939e5c0b6]
./mkfs.btrfs(main+0x15d5)[0x557939e5de04]
/usr/lib/libc.so.6(__libc_start_main+0xea)[0x7f83b101af6a]
./mkfs.btrfs(_start+0x2a)[0x557939e5af5a]
--

Nothing special, just BUG_ON() abuse in ancient code.

Fix it by returning the error properly.
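
The general pattern (return the error and unwind through cleanup labels instead of BUG_ON) is sketched below with toy types. toy_step() and toy_alloc_chunk() are hypothetical; unlike the real function, this toy frees both allocations on success as well, purely to stay leak-free.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Simplified stand-ins, for illustration only. */
struct toy_chunk { int id; };
struct toy_map { int num_stripes; };

/* Stand-in for a step that may fail, e.g. with -ENOSPC. */
static int toy_step(int result)
{
	return result;
}

static int toy_alloc_chunk(int step1_ret, int step2_ret)
{
	struct toy_chunk *chunk;
	struct toy_map *map;
	int ret;

	chunk = malloc(sizeof(*chunk));
	if (!chunk)
		return -ENOMEM;
	map = malloc(sizeof(*map));
	if (!map) {
		ret = -ENOMEM;
		goto out_chunk;
	}

	ret = toy_step(step1_ret);	/* previously: BUG_ON(ret) */
	if (ret < 0)
		goto out_chunk_map;
	ret = toy_step(step2_ret);	/* previously: BUG_ON(ret) */
	if (ret < 0)
		goto out_chunk_map;
	ret = 0;

out_chunk_map:
	free(map);
out_chunk:
	free(chunk);
	return ret;
}
```

The labels are ordered so that each failure point frees exactly what has been allocated so far.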

Signed-off-by: Qu Wenruo 
---
 extent-tree.c |  3 ++-
 volumes.c | 18 ++
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/extent-tree.c b/extent-tree.c
index 525a237e..055582c3 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -2690,7 +2690,8 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans,
   search_start, search_end, hint_byte, ins,
   trans->alloc_exclude_start,
   trans->alloc_exclude_nr, data);
-   BUG_ON(ret);
+   if (ret < 0)
+   return ret;
clear_extent_dirty(>free_space_cache,
   ins->objectid, ins->objectid + ins->offset - 1);
return ret;
diff --git a/volumes.c b/volumes.c
index 2209e5a9..e1ee27d5 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1032,11 +1032,13 @@ again:
 info->chunk_root->root_key.objectid,
 BTRFS_FIRST_CHUNK_TREE_OBJECTID, key.offset,
 calc_size, _offset, 0);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
device->bytes_used += calc_size;
ret = btrfs_update_device(trans, device);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
map->stripes[index].dev = device;
map->stripes[index].physical = dev_offset;
@@ -1075,16 +1077,24 @@ again:
map->ce.size = *num_bytes;
 
ret = insert_cache_extent(>mapping_tree.cache_tree, >ce);
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk_map;
 
if (type & BTRFS_BLOCK_GROUP_SYSTEM) {
ret = btrfs_add_system_chunk(info, ,
chunk, btrfs_chunk_item_size(num_stripes));
-   BUG_ON(ret);
+   if (ret < 0)
+   goto out_chunk;
}
 
kfree(chunk);
return ret;
+
+out_chunk_map:
+   kfree(map);
+out_chunk:
+   kfree(chunk);
+   return ret;
 }
 
 /*
-- 
2.14.1



Re: ERROR: parent determination failed (btrfs send-receive)

2017-09-18 Thread Dave
On Mon, Sep 18, 2017 at 12:23 AM, Andrei Borzenkov  wrote:
> 18.09.2017 05:31, Dave wrote:
>> Sometimes when using btrfs send-receive, I get errors like this:
>>
>> ERROR: parent determination failed for 
>>
>> When this happens, btrfs send-receive backups fail. And all subsequent
>> backups fail too.
>>
>> The issue seems to stem from the fact that an automated cleanup
>> process removes certain earlier subvolumes. (I'm using Snapper.)
>>
>> I'd like to understand exactly what is happening so that my backups do
>> not unexpectedly fail.
>>
>
> You try to send incremental changes but you deleted subvolume to compute
> changes against. It is hard to tell more without seeing subvolume list
> with uuid/parent uuid.

I do not have a current subvolume list to provide UUIDs. To ensure the
integrity of my backups, I was forced to delete all backup snapshots
and start over. (After this initial parent determination error, my
attempted solution created a new problem related to Received UUIDs,
so removing all backups was the best solution in the end.)

For my understanding, what are the restrictions on deleting snapshots?

What scenarios can lead to "ERROR: parent determination failed"?

After all, it should not be necessary to retain every snapshot ever
made. We have to delete snapshots periodically.

In my case, I still retained every snapshot on the target volume. None
had ever been deleted (yet). And I also retained around 30 recent
snapshots on the source, according to the snapper timeline cleanup
config. Yet I still ran into "ERROR: parent determination failed".

>
>> In my scenario, no parent subvolumes have been deleted from the
>> target. Some subvolumes have been deleted from the source, but why
>> does that matter? I am able to take a valid snapshot at this time and
>> every snapshot ever taken continues to reside at the target backup
>> destination (seemingly meaning that a parent subvolume can be found at
>> the target).
>>
>> This issue seems to make btrfs send-receive a very fragile backup
>> solution.
>
> btrfs send/receive is not backup solution - it is low level tool that
> does exactly what it is told to do. You may create backup solution that
> is using btrfs send/receive to transfer data stream, but then do not
> blame tool for incorrect usage.
>
> To give better advice how to fix your situation you need to describe
> your backup solution - how exactly you select/create snapshots.

I use snap-sync to create and send snapshots.

GitHub - wesbarnett/snap-sync: Use snapper snapshots to backup to external drive
https://github.com/wesbarnett/snap-sync

>
>> I hope, instead, there is some knowledge I'm missing, that
>> when learned, will make this a robust backup solution.
>>
>> Thanks