Re: subvolume / folder compression flag

2014-12-13 Thread Robert White

On 12/13/2014 06:59 PM, Ali AlipourR wrote:


2- ... and rsync files without compression flag ...


The --compress flag for rsync has nothing to do with how the files are 
stored on either end. It determines whether the data is compressed as it 
passes from the source rsync to the destination rsync process over the 
network or whatever transport is in use.


I don't have any information about the rest of the question but the 
rsync option doesn't change the data as read or written, just as 
transmitted.


The only difference between using compressed or not is:

source --normal--> rsync --compressed--> rsync --normal--> dst
source --normal--> rsync --not_compressed--> rsync --normal--> dst

If the filesystems at source or dst do transparent compression then that 
will be done/undone transparently.
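
For anyone following along, a minimal sketch of the two layers (the
property name and value are the standard btrfs ones, but treat the exact
commands and paths as illustrative, not gospel):

# transparent compression is decided on the btrfs side, per subvolume/dir
btrfs property set /mnt/backup/subvol compression zlib

# rsync's -z only compresses the stream between the two rsync processes;
# the data lands on disk the same way with or without it
rsync -a    /data/ host:/mnt/backup/subvol/
rsync -a -z /data/ host:/mnt/backup/subvol/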





subvolume / folder compression flag

2014-12-13 Thread Ali AlipourR
Hi,

1- Is setting the compression flag per subvolume implemented?
(I did read on the wiki that it is not implemented, but I can set it via
"btrfs property")

2- If I set the compression flag via "btrfs property" or "chattr" on a
subvolume, and rsync files without the compression flag from an ext4 file
system, preserving attributes with rsync -AX, will these files be
compressed on that btrfs subvolume?
I mean, will the files be compressed even if their own compression flag
wasn't set, and only their subvolume's compression flag was set?

3- What if I do the same as 2 but on a normal folder instead of a subvolume
(chattr +c that folder)? Will files within that folder be compressed
even if their own compression flag wasn't set?
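
(For reference, the kind of setup I'm describing would look roughly like
this - the paths are just examples:)

# set compression on a subvolume, or on a plain folder
btrfs property set /mnt/btrfs/subvol compression zlib
chattr +c /mnt/btrfs/subvol/folder

# copy from ext4, preserving ACLs and xattrs
rsync -a -A -X /mnt/ext4/data/ /mnt/btrfs/subvol/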

Thanks,
Ali


Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Robert White

On 12/13/2014 01:52 PM, Tomasz Chmielewski wrote:

On 2014-12-13 21:54, Robert White wrote:


- rsync many remote data sources (-a -H --inplace --partial) + snapshot


Using --inplace on a Copy On Write filesystem has only one effect, it
increases fragmentation... a lot...


...if the file was changed.


If the file hasn't changed then it won't be transferred by 
definition. So the unchanged file is not terribly interesting.


And I did think about the rest of (most of) your points right after 
sending the original email. Particularly since I don't know your actual 
use case. But there is no "un-send" which I suddenly realized I wanted 
to do... Because I needed to change my answer. Like ten seconds later. 
/sigh.


I'm still strongly against forcing compression.

That said, my knee-jerk reaction to using --inplace is still strong for 
almost all file types.


And it remains almost absolute in your case simply because you are 
finding yourself needing to balance and whatnot.


E.g. the theoretical model of efficient partial copies as you present it is 
fine... up until we get back to your original complaint about what a mess it 
makes.


The ruling precept here is Ben Franklin's "penny wise, pound foolish". 
What you _might_ be saving up-front with --inplace is charging you 
double on the back end in maintenance.



Every new block is going to get
written to a new area anyway,


Exactly - "every new block". But that's true with and without --inplace.
Also - without --inplace, it is "every block". In other words, without
--inplace, the file is likely to be rewritten by rsync to a new one, and
CoW is lost (more below).


I don't know the nature of the particular files you are transferring, but 
I do know a lot about rsync and file layout in general for lots of 
different types of files.


(rsync details here for readers-along :: 
http://rsync.samba.org/how-rsync-works.html )


Now I am assuming you took this advice from something like the manual 
page [QUOTE] This [--inplace] option  is useful for transferring large 
files with block-based changes or appended data, and also on systems 
that are disk bound, not network bound.  It can also help keep a 
copy-on-write filesystem snapshot from diverging the entire contents of 
a file that only has minor changes.[/QUOTE] Though maybe not since the 
description goes on to say that --inplace implies --partial so 
specifying both is redundant.


But here's the thing, those files are really rare. Way more rare than 
you might think. They consist almost entirely of block-based database 
extents (like an Oracle tablespace file) and logfiles (such as 
/var/log/messages etc.); VM disk image files (particularly raw images) 
and ISO images that are _only_ modified by adding tracks may fall into 
this category as well.



So we've already skipped the unchanged files...

So, inserting a single byte into, or removing a single byte from, any 
file will cause a re-write from that point on. It will resend that file 
from the block boundary containing that byte onward. Just about anything 
with a header and a history is going to get re-sent almost completely. 
This includes the output from any word processing program you are likely 
to encounter.


Anything with linear compression (such as Open Document Format, which is 
basically a ZIP file) will be resent entirely.


All compiled program binaries will be resent entirely if the program 
changed at all (the headers again, the changes in text segments, the 
changes in layout that a single-byte difference in size causes the ELF 
or DLL formats to juggle significantly).


And I could go on at length, but I'll skip that...
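
Whatever the file type, the cheap way to settle how much actually gets 
re-sent is to let rsync report it. A sketch (--no-whole-file matters for 
local tests, because purely local copies skip the delta algorithm by 
default):

# "Literal data" = bytes actually re-sent,
# "Matched data" = bytes reused from the copy already on the destination
rsync -a --no-whole-file --stats changed-file old-copy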

And _then_ the forced compression comes into play.

Rsync is going to impose its default block size to frame changes (see 
--block-size=) and then BTRFS is going to impose its compression frame 
sizes (presuming it is done by block size). If these are not exactly the 
same size, any rsync block that updates will result in one or two "extra" 
compression blocks being re-written by the tiling overlap effect.



so if you have enough slack space to
keep the one new copy of the new file, which you will probably use up
anyway in the COW event, laying in the fresh copy in a likely more
contiguous way will tend to make things cleaner over time.

--inplace is doubly useless with compression as compression is
perturbed by default if one byte changes in the original file.


No. If you change 1 byte in a 100 MB file, or perhaps 1 GB file, you
will likely lose a few kBs of CoW. The whole file is certainly not
rewritten if you use --inplace. However it will be wholly rewritten if
you don't use --inplace.



The only time --inplace might be helpful is if the file is NOCOW...
except...


No, you're wrong.
By default, rsync creates a new file if it detects any file modification
- like "touch file".

Consider this experiment:

# create a "large file"
dd if=/dev/urandom of=bigfile bs=1M count=3000

# copy it with rsync

Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Tomasz Chmielewski

On 2014-12-13 21:54, Robert White wrote:

- rsync many remote data sources (-a -H --inplace --partial) + 
snapshot


Using --inplace on a Copy On Write filesystem has only one effect, it
increases fragmentation... a lot...


...if the file was changed.



Every new block is going to get
written to a new area anyway,


Exactly - "every new block". But that's true with and without --inplace.
Also - without --inplace, it is "every block". In other words, without 
--inplace, the file is likely to be rewritten by rsync to a new one, and 
CoW is lost (more below).




so if you have enough slack space to
keep the one new copy of the new file, which you will probably use up
anyway in the COW event, laying in the fresh copy in a likely more
contiguous way will tend to make things cleaner over time.

--inplace is doubly useless with compression as compression is
perturbed by default if one byte changes in the original file.


No. If you change 1 byte in a 100 MB file, or perhaps 1 GB file, you 
will likely lose a few kBs of CoW. The whole file is certainly not 
rewritten if you use --inplace. However it will be wholly rewritten if 
you don't use --inplace.



The only time --inplace might be helpful is if the file is NOCOW... 
except...


No, you're wrong.
By default, rsync creates a new file if it detects any file modification 
- like "touch file".


Consider this experiment:

# create a "large file"
dd if=/dev/urandom of=bigfile bs=1M count=3000

# copy it with rsync
rsync -a -v --progress bigfile bigfile2

# copy it again - blazing fast, no change
rsync -a -v --progress bigfile bigfile2

# "touch" the original file
touch bigfile

# try copying again with rsync - notice rsync creates a temp file, like 
.bigfile2.J79ta2

# No change to the file except the timestamp, but good bye your CoW.
rsync -a -v --progress bigfile bigfile2

# Now try the same with --inplace; compare data written to disk with 
iostat -m in both cases.
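
# Spelled out, that --inplace variant is roughly (a sketch - device
# names and the exact counters will vary):
touch bigfile
iostat -m                                  # note MB_wrtn before
rsync -a -v --progress --inplace bigfile bigfile2
sync
iostat -m                                  # ...and after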



Same goes for append-only files - even if they are compressed, most CoW 
will be shared. I'd say it will be similar for lightly modified files 
(changed data will be CoW-unshared, some compressed "overhead" will be 
unshared, but the rest will be untouched / shared by CoW between the 
snapshots).
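
One way to actually see that sharing - a rough sketch, paths and offsets 
are just examples:

# snapshot, then flip one byte in the middle of the file without truncating
btrfs subvolume snapshot /mnt/vol /mnt/vol-snap
printf 'X' | dd of=/mnt/vol/bigfile bs=1 seek=50000000 conv=notrunc
sync

# compare extent layouts: only the extent around the changed byte should
# have moved; everything else keeps the same physical offsets (still shared)
filefrag -v /mnt/vol/bigfile /mnt/vol-snap/bigfile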





- around 500 snapshots in total, from 20 or so subvolumes


That's a lot of snapshots and subvolumes. Not an impossibly high
number, but a lot. That needs its own use-case evaluation. But
regardless...

Even if you set the NOCOW option on a file to make the --inplace rsync
work, if that file is snapshotted (snapshot?) between the rsync
modification events it will be in 1COW mode because of the snapshot
anyway and you are back to the default anti-optimal conditions.


Again - if the file was changed a lot, it doesn't matter if it's 
--inplace or not. If the file data was not changed, or changed little - 
--inplace will help preserve CoW.




Especially rsync's --inplace option combined with many snapshots and
large fragmentation was deadly for btrfs - I was seeing system freezes
right when rsyncing a highly fragmented, large file.


You are kind of doing all that to yourself.


To clarify - by freezes I mean kernel bugs exposed and the machine freezing.
I think we all agree that whatever userspace is doing in the filesystem, 
it should not result in a kernel BUG / freeze.




By combining _forced_
compression with denying the natural opportunity for the re-write of
the file to move it to nicely contiguous "new locations", and then
pinning it all in place with multiple snapshots, you've created the
worst of all possible worlds.


I disagree. It's quite compact, for my data usage. If I needed blazing 
fast file access, I wouldn't be using a CoW filesystem nor snapshots in 
the first place. For data mostly stored and rarely read, it is OK.



(...)


And keep repeating this to yourself :: "balance does not reorganize
anything, it just moves the existing disorder to a new location". This
is not a perfect summation, and it's clearly wrong if you are using
"convert", but it's the correct way to view what's happening while
asking yourself "should I balance?".


I agree - I don't run it unless I need to (or I'm curious to see if it 
would expose some more bugs).
It would be quite a step back for a filesystem to need some periodic 
maintenance like that after all.


Also I'm of the opinion that balance should not cause the kernel to BUG 
- it should abort, possibly remount the fs ro etc. (and suggest running 
btrfsck, if there is enough confidence in this tool), but definitely not 
BUG.



--
Tomasz Chmielewski
http://www.sslrack.com



Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Robert White

On 12/13/2014 05:53 AM, Tomasz Chmielewski wrote:

My usage case is quite simple:

- skinny extents, extended inode refs

okay


- mount compress-force=zlib


I'd, personally, "never" force compression. This can increase the size 
of files by five or more percent if it is an inherently incompressible 
file. While it is easy to deliberately create a file that will trick the 
compression check logic into not compressing something that would benefit 
from compression, that does _not_ happen by chance very often at all.
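
For comparison, the non-forced variant keeps that check in place. A 
sketch (device and mountpoint are placeholders, zlib is just the example 
algorithm):

# forced: every extent is run through zlib, even if it doesn't shrink
mount -o compress-force=zlib /dev/sdX /mnt/backup

# non-forced: btrfs backs off on data that doesn't look compressible
mount -o compress=zlib /dev/sdX /mnt/backup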



- rsync many remote data sources (-a -H --inplace --partial) + snapshot


Using --inplace on a Copy On Write filesystem has only one effect, it 
increases fragmentation... a lot... Every new block is going to get 
written to a new area anyway, so if you have enough slack space to keep 
the one new copy of the new file, which you will probably use up anyway 
in the COW event, laying in the fresh copy in a likely more contiguous 
way will tend to make things cleaner over time.


--inplace is doubly useless with compression as compression is perturbed 
by default if one byte changes in the original file.


The only time --inplace might be helpful is if the file is NOCOW... 
except...




- around 500 snapshots in total, from 20 or so subvolumes


That's a lot of snapshots and subvolumes. Not an impossibly high number, 
but a lot. That needs its own use-case evaluation. But regardless...


Even if you set the NOCOW option on a file to make the --inplace rsync 
work, if that file is snapshotted (snapshot?) between the rsync 
modification events it will be in 1COW mode because of the snapshot 
anyway and you are back to the default anti-optimal conditions.
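
(For readers following along: the NOCOW option here is the chattr +C 
attribute. A sketch - the path is a placeholder, and note that +C only 
takes effect on empty files, so it is usually set on the directory so 
that files created inside it inherit it:)

chattr +C /mnt/backup/vm-images
lsattr -d /mnt/backup/vm-images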




Especially rsync's --inplace option combined with many snapshots and
large fragmentation was deadly for btrfs - I was seeing system freezes
right when rsyncing a highly fragmented, large file.


You are kind of doing all that to yourself. By combining _forced_ 
compression with denying the natural opportunity for the re-write of the 
file to move it to nicely contiguous "new locations", and then pinning it 
all in place with multiple snapshots, you've created the worst of all 
possible worlds.


The more you use optional gross-behavior options on some sorts of things, 
the more you are fighting the "natural organization" of the system. That 
is, every system is designed around a set of core assumptions, and 
behavioral options tend to invalidate the mainline assumptions. Some 
options, like "recursive", are naturally part of those assumptions and 
play into them; other options, particularly things with "force" in the 
name, tend to be "if you really think you must, sure, I'll do what you 
say, but if it turns out bad it's on _your_ head" options. Which options 
are which is a judgment call, but the combination you've chosen is 
definitely working in that bad area.


And keep repeating this to yourself :: "balance does not reorganize 
anything, it just moves the existing disorder to a new location". This 
is not a perfect summation, and it's clearly wrong if you are using 
"convert", but it's the correct way to view what's happening while 
asking yourself "should I balance?".




[RFC] btrfs-progs: Support for musl libc (and perhaps also uclibc)

2014-12-13 Thread Merlijn Wajer
Hi,

I've been experimenting with musl-libc Gentoo systems. I used the HEAD
of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

I found that btrfs-progs does not compile with musl-libc, for a few reasons:

* It makes use of the private glibc __always_inline macro.

* Various headers that should be included are not included:
linux/limits.h and limits.h. (for XATTR_SIZE_MAX and PATH_MAX)

* backtrace() using execinfo.h is enabled by default; execinfo.h is
glibc-specific and thus does not work on other libc's. musl does not
support it, and I think uclibc also does not support it.

I have worked around the problems in the following way:

* Define __always_inline if __glibc__ is not defined. This is arguably
the cleanest solution. It would be better to simply not use the
__always_inline macro (instead, use __attribute__) throughout
btrfs-progs, but I was not sure what the developers would prefer. This
is currently done in kerncompat.h, but you may want to move that to
another file.

* Include various headers where required.

* If __glibc__ is not defined, define BTRFS_DISABLE_BACKTRACE. Currently
the define magic happens in kerncompat.h, because that is also where BTRFS
includes execinfo. Personally, I think it would make more sense to
always disable backtrace instead of enabling it by default -- but
perhaps in a testing phase, enabling it by default is the sensible choice.

Attached are the two patches generated with git format-patch. I am aware
that this may not be the required format for submitting patches -- but
please give me some time to get used to the etiquette. :-)

Please let me know if musl-libc (or any other libc) is a supported
platform, and if so, if and how I can improve on said patches.

Regards,
Merlijn
From da43021732fab3f70c75f155bded8e5f35fdffe3 Mon Sep 17 00:00:00 2001
From: Merlijn Wajer 
Date: Sat, 13 Dec 2014 15:07:25 +0100
Subject: [PATCH 1/2] Include headers required for musl-libc.

This fixes various compilation errors where PATH_MAX and XATTR_SIZE_MAX
were missing. To my knowledge, this should have no bad side effects.
---
 btrfs-convert.c | 1 +
 help.c  | 1 +
 mkfs.c  | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index 02c5e94..7b69a13 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "disk-io.h"
diff --git a/help.c b/help.c
index fab942b..56aaf9c 100644
--- a/help.c
+++ b/help.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "commands.h"
 #include "utils.h"
diff --git a/mkfs.c b/mkfs.c
index e10e62d..6343831 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -35,6 +35,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include "ctree.h"
-- 
2.0.4

From 01d0bfe48dc78b66b6e86d4935d9b9d20194b135 Mon Sep 17 00:00:00 2001
From: Merlijn Wajer 
Date: Sat, 13 Dec 2014 15:08:43 +0100
Subject: [PATCH 2/2] Disable backtrace and define __always_inline

Disable backtrace and define __always_inline when glibc is not used as
libc. This, together with some header changes allows btrfs-progs to
compile with musl-libc.
---
 kerncompat.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kerncompat.h b/kerncompat.h
index 8afadc8..05823a7 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -29,6 +29,12 @@
 #include 
 #include 
 #include 
+
+#ifndef __glibc__
+#define BTRFS_DISABLE_BACKTRACE
+#define __always_inline __inline __attribute__ ((__always_inline__))
+#endif
+
 #ifndef BTRFS_DISABLE_BACKTRACE
 #include 
 #endif
-- 
2.0.4





Re: Device only missing if unmounted

2014-12-13 Thread Florian Uekermann
Dear Anand,
thank you for your help.

> On December 10, 2014 at 3:46 AM Anand Jain  wrote:
> It depends on the disk that is read first, you could read the super block
> using btrfs-show-super and check its num_devices.

I checked this for all three devices and num_devices is 4 for all of them.
The full output is below.

> It may fail if you are mounting more than one subvol during boot and if
> you don't have this patch.
>
> commit 0f23ae74f589304bf33233f85737f4fd368549eb
> Author: Chris Mason 
> Date: Thu Sep 18 07:49:05 2014 -0700
>
> Revert "Btrfs: device_list_add() should not update list when mounted"
>
> This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

I am running 3.18-rc5+ from Linus' tree. If I am not mistaken,
that commit has been in his tree since 3.17-rc6.

So I guess the question is: Why does the superblock say
num_devices=4 (and how do I fix it)?
Or am I misunderstanding the problem?

Best regards,
Florian

root@oot:/home/shared# btrfs-show-super /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
-
csum        0x55a3eb1d [match]
bytenr  65536
flags   0x1
magic   _BHRfS_M [match]
fsid        be2b3499-7452-4b91-b664-4ec4d7ff62b9
label   
generation  3003
root441395970048
sys_array_size  129
chunk_root_generation   3001
root_level  1
chunk_root  442469761024
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
total_bytes 1099274698752
bytes_used  160703954944
sectorsize  4096
nodesize16384
leafsize16384
stripesize  4096
root_dir6
num_devices 4
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x61
( MIXED_BACKREF |
  BIG_METADATA |
  EXTENDED_IREF )
csum_type   0
csum_size   4
cache_generation3003
uuid_tree_generation3003
dev_item.uuid   5b8c8d20-c330-48a2-814c-c6986b787d52
dev_item.fsid   be2b3499-7452-4b91-b664-4ec4d7ff62b9 [match]
dev_item.type   0
dev_item.total_bytes500107862016
dev_item.bytes_used 162168569856
dev_item.io_align   4096
dev_item.io_width   4096
dev_item.sector_size4096
dev_item.devid  2
dev_item.dev_group  0
dev_item.seek_speed 0
dev_item.bandwidth  0
dev_item.generation 0


Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Tomasz Chmielewski

On 2014-12-13 10:39, Robert White wrote:


Might I ask why you are running balance? After a persistent error I'd
understand going straight to scrub, but balance is usually for
transformation or to redistribute things after atypical use.


There were several reasons for running balance on this system:

1) I was getting "no space left", even though there were hundreds of GBs 
left. Not sure if this still applies to the current kernels (3.18 and 
later) though, but it was certainly the problem in the past.


2) The system was regularly freezing, I'd say once a week was the norm. 
Sometimes I was getting btrfs traces logged in syslog.
After a few freezes the fs was getting corrupted to varying degrees. At 
some point, it was so bad that it was only possible to use it read only. 
So I had to get the data off, reformat, copy back... It would start 
crashing again after a few weeks of usage.


My usage case is quite simple:

- skinny extents, extended inode refs
- mount compress-force=zlib
- rsync many remote data sources (-a -H --inplace --partial) + snapshot
- around 500 snapshots in total, from 20 or so subvolumes

Especially rsync's --inplace option combined with many snapshots and 
large fragmentation was deadly for btrfs - I was seeing system freezes 
right when rsyncing a highly fragmented, large file.


Then, running balance on the "corrupted" filesystem was more of an exercise 
(if scrub passes fine, I would expect balance to pass as well). The BUGs it 
was triggering were sometimes fixed in newer kernels, sometimes not 
(btrfsck was not really usable a few months back).


3) I had mixed luck with recovering btrfs after a failed drive (in 
RAID-1). Sometimes it worked as expected; sometimes the fs was getting 
broken so much I had to rsync data off it and format from scratch (whereas 
mdraid would kick the drive after getting write errors - that's not the 
case with btrfs, and weird things can happen).
Sometimes, running "btrfs device delete missing" (it's a balance in 
principle, I think) would take weeks, during which a second drive could 
easily die.
Again, running balance would be more of an exercise there, to see if the 
newer kernel still crashes.




An entire generation of folks have grown used to defragging Windows
boxes and all, but if you've already got an array that is going to
take "many days" to balance what benefit do you actually expect to
receive?


For me - it's a good test to see if btrfs is finally getting stable 
(some cases explained above).




Defrag -- used for "I think I'm getting a lot of unnecessary head seek
in this application, these files need to be brought into closer
order".


Fragmentation was an issue for btrfs, at least a few kernels back (as 
explained above, with rsync's --inplace).
However, I'm not running autodefrag anywhere - not sure how it affects 
snapshots.




Scrub -- used for defensive checking a-la checkdisk. "I suspect that
after that unexpected power outage something may be a little off", or
alternately "I think my disks are giving me bitrot, I better check".


For me, it was passing fine, where balance was crashing the kernel.


Again, my main rationale for running balance is to see if btrfs is 
behaving stably. While I have systems with btrfs which are running fine 
for months, I also have ones which will crash after 1-2 weeks (once the 
system grows in size / complexity).


So hopefully, btrfsck has fixed that fs - once it is running stable for 
a week or two, I might be brave enough to re-enable btrfs quotas (that was 
another system freezer, at least a few kernels back).



--
Tomasz Chmielewski
http://www.sslrack.com



[PATCH][BTRFS-PROGS] Print metadata profile instead of data profile

2014-12-13 Thread Goffredo Baroncelli

The function test_num_disk_vs_raid() shows an error message if
the raid profile is incompatible with the number of devices.

Unfortunately, when the error is related to the data profile,
the message prints the metadata profile.

How to reproduce:

  $ mkfs.btrfs -f -m dup -d raid5 /dev/vdb
  Error: unable to create FS with data profile 32 (have 1 devices)

Expected result:

  Error: unable to create FS with data profile 128 (have 1 devices)

mkfs.btrfs should print the profile 128 (==raid5) and not 32 (==dup)


[PATCH] BUG: use metadata_profile instead of data_profile

2014-12-13 Thread Goffredo Baroncelli
The function test_num_disk_vs_raid() shows an error message if
the raid profile is incompatible with the number of devices.

Unfortunately, when the error is related to the data profile,
the message prints the metadata profile.

This patch corrects the bug.

Signed-off-by: Goffredo Baroncelli 
---
 utils.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utils.c b/utils.c
index fc69e9b..e1b56ac 100644
--- a/utils.c
+++ b/utils.c
@@ -2116,7 +2116,7 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
if (data_profile & ~allowed) {
snprintf(estr, sz, "unable to create FS with data "
"profile %llu (have %llu devices)\n",
-   metadata_profile, dev_cnt);
+   data_profile, dev_cnt);
return 1;
}
 
-- 
2.1.3



Re: [PATCH v2 1/3] Btrfs: get more accurate output in df command.

2014-12-13 Thread Dongsheng Yang

On 12/13/2014 08:50 AM, Duncan wrote:

Goffredo Baroncelli posted on Fri, 12 Dec 2014 19:00:20 +0100 as
excerpted:


$ sudo ./btrfs fi df /mnt/btrfs1/
Data, RAID1: total=1.00GiB, used=512.00KiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

In this case the filesystem is empty (it was a new filesystem!).
However a 1G metadata chunk was already allocated. This is the reason
why the free space is only 4GB.

Trivial(?) correction.

Metadata chunks are quarter-gig, not 1 gig.  So that's 4 quarter-gig
metadata chunks allocated, not a (one/single) 1-gig metadata chunk.


Sorry but from my reading of the code, I have to say that in this case 
it is 1G.


Maybe I am wrong, but I would say: yes, the chunk size for a btrfs filesystem
smaller than 50G is 256M by default. But in mkfs, we allocate 1G for the first
metadata chunk.



On my system the ratio metadata/data is 234MB/8.82GB = ~3%, so ignoring
the metadata chunk from the free space may not be a big problem.

Presumably your use-case is primarily reasonably large files; too large
for their data to be tucked directly into metadata instead of allocating
an extent from a data chunk.

That's not always going to be the case.  And given the multi-device
default allocation of raid1 metadata, single data, files small enough to
fit into metadata have a default size effect of double their actual size!
(Tho it can be noted that given btrfs' 4 KiB standard block size, without
metadata packing there'd still be an outsized effect for files smaller
than half that, 2 KiB or under, but there it'd be in data chunks, not
metadata.)





Re: [PATCH v2 1/3] Btrfs: get more accurate output in df command.

2014-12-13 Thread Dongsheng Yang

On 12/13/2014 02:00 AM, Goffredo Baroncelli wrote:

On 12/11/2014 09:31 AM, Dongsheng Yang wrote:

When the function btrfs_statfs() calculates the total size of the fs, it is
calculating the total size of the disks and then dividing it by a factor. But
in some use cases, the result is not good for the user.

I am checking it; to me it seems a good improvement. However
I noticed that df now no longer seems to report the space consumed
by the meta-data chunk; eg:

# I have two disks of 5GB each
$ sudo ~/mkfs.btrfs -f -m raid1 -d raid1 /dev/vgtest/disk /dev/vgtest/disk1

$ df -h /mnt/btrfs1/
Filesystem   Size  Used Avail Use% Mounted on
/dev/mapper/vgtest-disk  5.0G  1.1G  4.0G  21% /mnt/btrfs1

$ sudo btrfs fi show
Label: none  uuid: 884414c6-9374-40af-a5be-3949cdf6ad0b
Total devices 2 FS bytes used 640.00KB
devid2 size 5.00GB used 2.01GB path /dev/dm-1
devid1 size 5.00GB used 2.03GB path /dev/dm-0

$ sudo ./btrfs fi df /mnt/btrfs1/
Data, RAID1: total=1.00GiB, used=512.00KiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=1.00GiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B

In this case the filesystem is empty (it was a new filesystem!). However a
1G metadata chunk was already allocated. This is the reason why the free
space is only 4GB.


Actually, in the original btrfs_statfs(), the space in the metadata chunk 
was also not considered as available. But you will get 5G in this case with 
the original btrfs_statfs() because there is a bug in it. As I use another 
implementation to calculate the @available in the df command, I did not 
mention this problem. I describe it below.

In the original btrfs_statfs(), we only consider the free space in data 
chunks as available.

<
	list_for_each_entry_rcu(found, head, list) {
		if (found->flags & BTRFS_BLOCK_GROUP_DATA) {
			int i;

			total_free_data += found->disk_total - found->disk_used;
>
Later, we add total_free_data to @available in the output of df:
	buf->f_bavail = total_free_data;
BUT:
This is incorrect! It should be:
	buf->f_bavail = div_u64(total_free_data, factor);
That said, this bug adds one more (data_chunk_disk_total -
data_chunk_disk_used = 1G) to @available by mistake. Unfortunately, the
free space in the metadata chunk is also 1G.

So we get 5G in the case you provided here.

Conclusion:
Even in the original btrfs_statfs(), the free space in metadata is 
considered as not available, but a bug in it makes this easy to 
misunderstand. My new btrfs_statfs() still considers the free space 
in metadata as not available; furthermore, I add it into @used.


On my system the ratio metadata/data is 234MB/8.82GB = ~3%, so ignoring the
metadata chunk from the free space may not be a big problem.






Example:
# mkfs.btrfs -f /dev/vdf1 /dev/vdf2 -d raid1
# mount /dev/vdf1 /mnt
# dd if=/dev/zero of=/mnt/zero bs=1M count=1000
# df -h /mnt
Filesystem  Size  Used Avail Use% Mounted on
/dev/vdf1   3.0G 1018M  1.3G  45% /mnt
# btrfs fi show /dev/vdf1
Label: none  uuid: f85d93dc-81f4-445d-91e5-6a5cd9563294
Total devices 2 FS bytes used 1001.53MiB
devid1 size 2.00GiB used 1.85GiB path /dev/vdf1
devid2 size 4.00GiB used 1.83GiB path /dev/vdf2
a. df -h should report Size as 2GiB rather than as 3GiB.
Because this is 2 device raid1, the limiting factor is devid 1 @2GiB.

b. df -h should report Avail as 0.97GiB or less, rather than as 1.3GiB.
 1.85   (the capacity of the allocated chunk)
-1.018  (the file stored)
+(2-1.85=0.15)  (the residual capacity of the disks
 considering a raid1 fs)
---
=   0.97

This patch drops the factor entirely and calculates the size observable to
the user, without considering which raid level the data is in or what the
exact size on disk is.
After this patch applied:
# mkfs.btrfs -f /dev/vdf1 /dev/vdf2 -d raid1
# mount /dev/vdf1 /mnt
# dd if=/dev/zero of=/mnt/zero bs=1M count=1000
# df -h /mnt
Filesystem  Size  Used Avail Use% Mounted on
/dev/vdf1   2.0G  1.3G  713M  66% /mnt
# df /mnt
Filesystem 1K-blocksUsed Available Use% Mounted on
/dev/vdf12097152 1359424729536  66% /mnt
# btrfs fi show /dev/vdf1
Label: none  uuid: e98c1321-645f-4457-b20d-4f41dc1cf2f4
Total devices 2 FS bytes used 1001.55MiB
devid1 size 2.00GiB used 1.85GiB path /dev/vdf1
devid2 size 4.00GiB used 1.83GiB path /dev/vdf2
a). The @Size is 2G as we expected.
b). @Available is 700M = 1.85G - 1.3G + (2G - 1.85G).
c). @Used is changed to 1.3G rather than 1018M as above. Because
 th

Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Robert White

On 12/13/2014 12:16 AM, Tomasz Chmielewski wrote:

On 2014-12-12 23:58, Robert White wrote:


I don't have the history to answer this definitively, but I don't
think you have a choice. Nothing else is going to touch that error.

I have not seen any "oh my god, btrfsck just ate my filesystem errors"
since I joined the list -- but I am a relative newcomer.

I know that you, of course, as a conscientious and well-traveled system
administrator, already have a current backup since you are doing
storage maintenance... right? 8-)


Who needs backups with btrfs, right? :)

So apparently btrfsck --repair fixed some issues, the fs is still
mountable and looks fine.

Running balance again, but that will take many days there.


Might I ask why you are running balance? After a persistent error I'd 
understand going straight to scrub, but balance is usually for 
transformation or to redistribute things after atypical use.


An entire generation of folks have grown used to defragging Windows boxes 
and all, but if you've already got an array that is going to take "many 
days" to balance what benefit do you actually expect to receive?



Defrag -- used for "I think I'm getting a lot of unnecessary head seek 
in this application, these files need to be brought into closer order".


Scrub -- used for defensive checking a-la checkdisk. "I suspect that 
after that unexpected power outage something may be a little off", or 
alternately "I think my disks are giving me bitrot, I better check".


Btrfsck -- used for "I suspect structural problems caused by real world 
events like power hits or that one time when the cat knocked over my 
tower case while I was vacuuming all my sql tables." (often reserved for 
"hey, I'm getting weird messages from the kernel about things in my 
filesystem".)


Balance -- primary -- used for "Well I used to use this filesystem for a 
small number of large files, but now I am processing a large number of 
small files and I'm running out of metadata even though I've got a lot 
of space." (or vice versa)


Balance -- other -- used for "I just changed the geometry of my 
filesystem by adding or removing a disk and I want to spread out."


Balance -- (conversion/restructuring) -- used for "single is okay, but 
I'd rather raid-0 to spread out my load across these many disks" or 
"gee, I'd like some redundancy now that I have the room."




Frequent balancing of a Copy On Write filesystem will tend to make 
things somewhat anti-optimal. You are burping the natural working space 
out of the natural layout.


Since COW implies mandatory movement of data, every time you burp out 
all the slack and pack all the data together you are taking your 
regularly modified files and moving them far, far away from the places 
where frequently modified files are most happy (e.g. the 
only-partly-full data region they were just living in).


Similarly two files that usually get modified at the same time (say a 
database file and its rollback log) will tend to end up in the same 
active data extent as time goes on, and if balance decides it can "clean 
up" that extent it will likely give those two files a data-extent 
divorce and force them to the opposite ends of dataland.


COW systems are inherently somewhat chaotic. If you fight that too 
aggressively you will, at best, be wasting the maintenance time.


It may be a decrease in performance measured in very small quanta, but 
so is the expected benefit of most maintenance.



From the wiki::

https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F

btrfs filesystem balance is an operation which simply takes all of the 
data and metadata on the filesystem, and re-writes it in a different 
place on the disks, passing it through the allocator algorithm on the 
way. It was originally designed for multi-device filesystems, to spread 
data more evenly across the devices (i.e. to "balance" their usage). 
This is particularly useful when adding new devices to a nearly-full 
filesystem.

Due to the way that balance works, it also has some useful side-effects:
If there is a lot of allocated but unused data or metadata chunks, a 
balance may reclaim some of that allocated space. This is the main 
reason for running a balance on a single-device filesystem.
On a filesystem with damaged replication (e.g. a RAID-1 FS with a dead 
and removed disk), it will force the FS to rebuild the missing copy of 
the data on one of the currently active devices, restoring the RAID-1 
capability of the filesystem.





Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-13 Thread Tomasz Chmielewski

On 2014-12-12 23:58, Robert White wrote:


I don't have the history to answer this definitively, but I don't
think you have a choice. Nothing else is going to touch that error.

I have not seen any "oh my god, btrfsck just ate my filesystem errors"
since I joined the list -- but I am a relative newcomer.

I know that you, of course, as a conscientious and well-traveled system
administrator, already have a current backup since you are doing
storage maintenance... right? 8-)


Who needs backups with btrfs, right? :)

So apparently btrfsck --repair fixed some issues, the fs is still 
mountable and looks fine.


Running balance again, but that will take many days there.

# btrfsck --repair /dev/sdc1
fixing root item for root 8681, current bytenr 5568935395328, current 
gen 70315, current level 2, new bytenr 5569014104064, new gen 70316, new 
level 2

Fixed 1 roots.
checking extents
checking free space cache
checking fs roots
root 696 inode 2765103 errors 400, nbytes wrong
root 696 inode 2831256 errors 400, nbytes wrong
root 9466 inode 2831256 errors 400, nbytes wrong
root 9505 inode 2831256 errors 400, nbytes wrong
root 10139 inode 2831256 errors 400, nbytes wrong
root 10525 inode 2831256 errors 400, nbytes wrong
root 10561 inode 2831256 errors 400, nbytes wrong
root 10633 inode 2765103 errors 400, nbytes wrong
root 10633 inode 2831256 errors 400, nbytes wrong
root 10650 inode 2765103 errors 400, nbytes wrong
root 10650 inode 2831256 errors 400, nbytes wrong
root 10680 inode 2765103 errors 400, nbytes wrong
root 10680 inode 2831256 errors 400, nbytes wrong
root 10681 inode 2765103 errors 400, nbytes wrong
root 10681 inode 2831256 errors 400, nbytes wrong
root 10701 inode 2765103 errors 400, nbytes wrong
root 10701 inode 2831256 errors 400, nbytes wrong
root 10718 inode 2765103 errors 400, nbytes wrong
root 10718 inode 2831256 errors 400, nbytes wrong
root 10735 inode 2765103 errors 400, nbytes wrong
root 10735 inode 2831256 errors 400, nbytes wrong
enabling repair mode
Checking filesystem on /dev/sdc1
UUID: 371af1dc-d88b-4dee-90ba-91fec2bee6c3
cache and super generation don't match, space cache will be invalidated
found 942113871627 bytes used err is 1
total csum bytes: 2445349244
total tree bytes: 28743073792
total fs tree bytes: 22880043008
total extent tree bytes: 2890547200
btree space waste bytes: 5339534781
file data blocks allocated: 2779865800704
 referenced 3446026993664
Btrfs v3.17.3

real76m27.845s
user19m1.470s
sys 2m55.690s


--
Tomasz Chmielewski
http://www.sslrack.com
