Re: Switch raid mode without rebalance?

2016-08-26 Thread Gert Menke

Hi,

On 2016-08-26 13:52, Austin S. Hemmelgarn wrote:


Regular 'df' isn't to be trusted when dealing with BTRFS, the only
reason we report anything there is because many things break horribly
if we don't.

Yeah, I noticed. Seems to produce a reasonable guess, though.


Additionally, while running with multiple profiles while not balancing
should work, it's pretty much untested, and any number of things may
break.

Oh. Good to know.


Assuming your two disks have similar latency and transfer
speed, you're almost certainly better off just converting completely
to single mode (which works like raid0, just at the chunk level
instead of the block level).

Okay, I see.
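
If I understand correctly, the full conversion would then be something
like this (with /mnt/backup standing in for my actual mount point):

  # rewrite all data chunks as single; -dconvert leaves metadata alone
  btrfs balance start -dconvert=single /mnt/backup
  # check progress from another shell
  btrfs balance status /mnt/backup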


On a slightly separate note, if you're doing backups more frequently
than once a week, you're probably better off just leaving the disks
connected and running.  Regular load/unload cycles are generally
harder on the mechanical components in modern disks than just leaving
them running 24/7.

True. A bit of context: I first want to make a full backup locally, then
use the disks off-site for nightly incremental backups via internet.


Cheers
Gert


Re: Switch raid mode without rebalance?

2016-08-26 Thread Gert Menke

Hi Chris,

first off, thank you for the detailed explanations!

On 2016-08-26 01:04, Chris Murphy wrote:

No, it's not a file, directory or subvolume specific command. It
applies to a whole volume.

You are right, but all I was after in the first place was a way to
change the mode for new data on the whole volume.
But I understand now that there is no simple switch for that. (And to be 
honest it's not overly important, I was just being curious.)



If I add another file, I'll get another data chunk allocated, and
it'll be added to the chunk tree as item 5, and it'll have its own
physical  offset on each device.

And this chunk just uses the same profile as the last one (or the parent
in the tree), I suppose.



So the point now is, in order to change the profile of a chunk, it has
to be completely rewritten.

That makes sense.


To do what you want is planned, with no work picked up yet as far as I
know. It'd probably involve some work to associate something like an
xattr to let the allocator know which profile the user wants for the
data, and then to allocate it to the proper existing chunk or create a
new chunk with that profile as needed.

I see.

I think it would be very nice to have something like that for the 
different compression modes. For example, use LZ4 for daily use but LZMA 
for the subvolume that stores backups, and no compression at all for 
/boot, so the bootloader doesn't have to know about all the different 
compression algorithms.
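
As far as I can tell, a limited version of this already exists per file
or directory via the compression property, though only for the
algorithms btrfs supports today (zlib and lzo) - e.g. (the path is just
an example):

  # request lzo compression for new writes below this directory
  btrfs property set /mnt/pool/backups compression lzo
  # verify what is set
  btrfs property get /mnt/pool/backups compression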


Speaking of which, I read here: 
https://btrfs.wiki.kernel.org/index.php/Compression#Why_does_not_du_report_the_compressed_size.3F
that du will not tell me the compressed size of a file. This is very 
counter-intuitive, isn't it?
The reason stated is that some tools apparently determine the 
sparseness of a file by comparing the size with the stat.st_blocks
value.
I do not know if there is a better way to do that, so maybe my argument 
falls apart right here, BUT: this looks to me like working around one 
bug by introducing another. Wouldn't it be better to have a mount option 
"make_du_lie_for_buggy_tools" for those that really need it? BTW, which 
tools would an honest du break, and how? (What harm is there in thinking 
that a compressed file is sparse?)
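
To make the st_blocks point concrete, the two sizes can be compared with
plain GNU coreutils (nothing btrfs-specific involved):

  ls -l somefile                               # apparent size (stat.st_size)
  du --block-size=1 somefile                   # allocated size (from stat.st_blocks)
  du --apparent-size --block-size=1 somefile   # apparent size, via du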


Thanks!
Gert


Re: Switch raid mode without rebalance?

2016-08-25 Thread Gert Menke

Hi,

On 2016-08-25 21:50, Chris Murphy wrote:

It's incidental right now. It's not something controllable or intended
to have enduring mixed profile block groups.

I see. (Kind of.)


Such a switch doesn't exist, there's no way to define what files,
directories, or subvolumes, have what profiles.

Well, it kind of does - a running balance process seems to have just that
effect, it's just not persistent (and has the side effect of, well, 
balancing the existing data).



How does btrfs
find out which raid mode to use when writing new data?


That's kind of an interesting question. If you were to do 'btrfs
balance start -dconvert=single -mconvert=raid1' and very soon after
that do 'btrfs balance cancel' you'll end up with one or a few new
chunks with those profiles. When data is allocated to those chunks,
they will have those profile characteristics. When data is allocated
to old chunks that are still raid0, it will be raid0. The thing is,
you can't really tell or control what data will be placed in what
chunk. So it's plausible that some new data goes in old raid0 chunk,
and some old data goes in new single/raid1 chunks.


I'm not quite familiar with the concept of a chunk here.
Are chunks allocated for new data, or is the unallocated space divided 
into chunks, too?
In the former case, when creating a new chunk, does btrfs just look into 
a random already existing chunk and copy the raid mode from there?
In the latter case, could you (in theory) change the raid mode of all 
empty chunks only?
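
In case it's relevant: the per-profile allocation at least is easy to
see with 'btrfs filesystem df' (the numbers here are invented, but the
shape of the output is real):

  btrfs filesystem df /mnt/backup
  # Data, RAID0: total=5.50TiB, used=5.40TiB
  # Data, single: total=96.00GiB, used=12.00GiB
  # Metadata, RAID1: total=6.00GiB, used=4.40GiB
  # System, RAID1: total=32.00MiB, used=400.00KiB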


I know this is not an intended usage scenario; just being curious here.

Thanks!
Gert


Re: Switch raid mode without rebalance?

2016-08-25 Thread Gert Menke

Hi,

On 2016-08-25 20:26, Justin Kilpatrick wrote:


I'm not sure why you want to avoid a balance,

I didn't check, but I imagined it would slow down my rsync
significantly.


Once you start this command all the new data should follow the new 
rules.

Ah, now that's interesting.
When the balance is running, df shows 4TB of free space; when I cancel
the balance, the free space goes back to a few hundred GB.


But if the balancing only happens when the disk would otherwise be idle, 
and the mere fact that there is a balance process running will cause the 
new data to be written in single mode, I'm all set. I would not even 
have to wait for the balance to finish after the rsync is done; I could 
just cancel it and unmount the disks. A bit hacky, but in this case 
totally acceptable.
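
So the whole hack would boil down to something like this (my paths,
untested end to end):

  btrfs balance start -dconvert=single /mnt/backup &  # convert in the background
  rsync -aH /data/ /mnt/backup/data/                  # new writes should land in single chunks
  btrfs balance cancel /mnt/backup                    # stop converting once rsync is done
  umount /mnt/backup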


Thanks!
Gert


Switch raid mode without rebalance?

2016-08-25 Thread Gert Menke

Hi,

I recently created a new btrfs on two disks - one 6TB, one 2TB - for 
temporary backup purposes.
It apparently defaulted to raid0 for data, and I didn't realize at the 
time that this would become a problem.
Now the 2TB is almost full, and df tells me I only have about 200GB of 
free space. Which makes sense, because raid0 spreads the data evenly on 
all disks.
However, I know that btrfs can have different raid modes on the same 
filesystem at the same time. So I was hoping I could just tell it to 
"switch to single mode for all new data", but I don't have a clue how to 
do that. I *could* rebalance, of course, but is that really necessary? 
How does btrfs find out which raid mode to use when writing new data?
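
For reference, the per-device picture behind those df numbers can be
seen like this (sizes are approximate, from memory):

  btrfs filesystem show /mnt/backup
  #  Total devices 2 FS bytes used 3.25TiB
  #  devid 1 size 5.46TiB used 1.64TiB path /dev/sdb
  #  devid 2 size 1.82TiB used 1.64TiB path /dev/sdc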


Gert


Re: BTRFS as image store for KVM?

2015-09-29 Thread Gert Menke

Hi,

thank you all for your helpful comments.

From what I've read, I distilled the following guidelines (for myself;
ymmv):
- Use btrfs for generic data storage on spinning disks and for 
everything on ssds.
- Use zfs for spinning disks that may be used for cow-unfriendly 
workloads, like vm images (if they are too big and/or too fast-changing 
for a scheduled defrag to make sense).


For now I'm going with the following setup: a Debian system with root on 
btrfs/raid1 on two ssds, and a raidz1 pool for storage and vm images. 
However, those few vms that really should be fast would also fit on the 
SSDs, so I might move them there and switch from ZFS to btrfs on the 
storage pool at some point in the future.
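
In concrete terms, the layout amounts to something like this (device
names are placeholders):

  # root: btrfs raid1 across the two SSDs
  mkfs.btrfs -m raid1 -d raid1 /dev/sda2 /dev/sdb2
  # storage and vm images: raidz1 across the spinning disks
  zpool create tank raidz1 /dev/sdc /dev/sdd /dev/sde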


Some of the ideas presented here sound really interesting - for example 
I think that improving the Linux page cache to be more "arc-like" will 
probably benefit not only btrfs. Having both the page cache and the arc 
in parallel when using ZoL does not feel like an elegant solution, so 
maybe there's hope for that. (But I don't know if it is feasible for ZoL 
to abandon the arc in favor of an improved Linux page cache; I imagine 
it might be much work for little benefit.)


Thanks again
Gert


Re: BTRFS as image store for KVM?

2015-09-18 Thread Gert Menke

On 2015-09-18 04:22, Duncan wrote:

one way or another, you're going to have to write two things, one a
checksum of the other, and if they are in-place-overwrites, while the
race can be narrowed, there's always going to be a point at which
either one or the other will have been written, while the other hasn't
been, and if failure occurs at that point...


...then you still can recover the old data from the mirror or parity, 
and at least you don't have any inconsistent data. It's like the failure 
occurred just a tiny bit earlier.


The only real way around that is /some/ form of copy-on-write, such
that both the change and its checksum can be written to a different
location than the old version, with a single, atomic write then
updating a pointer to point to the new version of both the data and
its checksum, instead of the old one.


Or an intent log, but I guess that introduces a lot of additional writes 
(and seeks) that would impact performance noticeably...


Gert



Re: BTRFS as image store for KVM?

2015-09-17 Thread Gert Menke

Hi,

thank you for your answers!

So it seems there are several suboptimal alternatives here...

MD+LVM is very close to what I want, but md has no way to cope with 
silent data corruption. So if I'd want to use a guest filesystem that 
has no checksums either, I'm out of luck.
I'm honestly a bit confused here - isn't checksumming one of the most 
obvious things to want in a software RAID setup? Is it a feature that 
might appear in the future? Maybe I should talk to the md guys...


BTRFS looks really nice feature-wise, but is not (yet) optimized for my 
use-case I guess. Disabling COW would certainly help, but I don't want 
to lose the data checksums. Is nodatacowbutkeepdatachecksums a feature 
that might turn up in the future?
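
For reference, the way to get nodatacow per directory today, checksums
lost and all, is the C attribute; files created in the directory
afterwards inherit it:

  mkdir /var/lib/libvirt/images
  chattr +C /var/lib/libvirt/images   # new files here are NOCOW, without checksums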


Maybe ZFS is the best choice for my scenario. At least, it seems to work 
fine for Joyent - their SmartOS virtualization OS is essentially Illumos 
(Solaris) with ZFS, and KVM ported from Linux.
Since ZFS supports "Volumes" (virtual block devices inside a ZPool), I 
suspect these are probably optimized to be used for VM images (i.e. do 
as little COW as possible). Of course, snapshots will always degrade 
performance to a degree.
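
Creating such a volume would look roughly like this (pool layout, name
and sizes are just examples):

  # 40G virtual block device; volblocksize matched to the guest's typical I/O
  zfs create -V 40G -o volblocksize=8k tank/vm/disk0
  # it then appears as /dev/zvol/tank/vm/disk0 for qemu/libvirt to use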


However, there are some drawbacks to ZFS:
- It's less flexible, especially when it comes to reconfiguration of
disk arrays. Being able to add or remove a disk to/from a raidz and
rebalance would be just awesome. It's possible in BTRFS, but not ZFS. :-(
- The not-so-good integration of the fs cache, at least on Linux. I 
don't know if this is really an issue, though. Actually, I imagine it's 
more of an issue for guest systems, because it probably breaks memory 
ballooning. (?)


So it seems there are two options for me:
1. Go with ZFS for now, until BTRFS finds a better way to handle disk 
images, or until md gets data checksums.
2. Buy a bunch of SSDs for VM disk images and use spinning disks for 
data storage only. In that case, BTRFS should probably do fine.


Any comments on that? Am I missing something?

Thanks!
Gert


Re: BTRFS as image store for KVM?

2015-09-17 Thread Gert Menke

On 17.09.2015 at 21:43, Hugo Mills wrote:

On Thu, Sep 17, 2015 at 07:56:08PM +0200, Gert Menke wrote:

BTRFS looks really nice feature-wise, but is not (yet) optimized for
my use-case I guess. Disabling COW would certainly help, but I don't
want to lose the data checksums. Is nodatacowbutkeepdatachecksums a
feature that might turn up in the future?

[snip]

No. If you try doing that particular combination of features, you
end up with a filesystem that can be inconsistent: there's a race
condition between updating the data in a file and updating the csum
record for it, and the race can't be fixed.

I'm no filesystem expert, but isn't that what an intent log is for?
(Does btrfs have an intent log?)


And, is this also true for mirrored or raid5 disks?
I'm thinking something like "if the data does not match the checksum, 
just restore both from mirror/parity" should be possible, right?
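
As far as I understand, that is what a scrub does on btrfs when there
is redundancy: verify every block against its checksum and rewrite bad
copies from the good mirror (or reconstruct them from parity):

  btrfs scrub start /mnt
  btrfs scrub status /mnt   # reports checksum errors found and corrected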


Gert


Re: BTRFS as image store for KVM?

2015-09-17 Thread Gert Menke

On 17.09.2015 at 20:35, Chris Murphy wrote:

You can use Btrfs in the guest to get at least notification of SDC.

Yes, but I'd rather not depend on all potential guest OSes having btrfs
or something similar.



Another way is to put a conventional fs image on e.g. GlusterFS with
checksumming enabled (and at least distributed+replicated filtering).

This sounds interesting! I'll have a look at this.


If you do this directly on Btrfs, maybe you can mitigate some of the
fragmentation issues with bcache or dmcache;

Thanks, I did not know about these. bcache seems to be more or less what
"zpool add foo cache /dev/ssd" does. Definitely worth a look.


and for persistent snapshotting, use qcow2 to do it instead of Btrfs.
You'd use Btrfs snapshots to create a subvolume for doing backups of
the images, and then get rid of the Btrfs snapshot.

Good idea.
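
If I understand the suggestion correctly, the backup cycle would be
roughly this (paths are examples):

  # read-only snapshot to get a stable view of the images
  btrfs subvolume snapshot -r /var/lib/libvirt/images /var/lib/libvirt/images-backup
  rsync -a /var/lib/libvirt/images-backup/ /backup/vm-images/
  btrfs subvolume delete /var/lib/libvirt/images-backup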

Thanks!


BTRFS as image store for KVM?

2015-09-15 Thread Gert Menke

Hi everybody,

first off, I'm not 100% sure if this is the right place to ask, so if 
it's not, I apologize and I'd appreciate a pointer in the right direction.


I want to build a virtualization server to replace my current home 
server. I'm thinking about a Debian system with libvirt/KVM. The system 
will have one or two SSDs and five harddisks with some kind of software 
RAID5 for storage. I'd like to have a filesystem with data checksums, so 
BTRFS seems like the right way to go. However, I read that BTRFS does 
not perform well as storage for KVM disk images.

(See here: http://www.linux-kvm.org/page/Tuning_KVM )

Is this still true?

I would appreciate any comments and/or tips you might have on this topic.

Is anyone using BTRFS as an image store? Are there any special settings 
I should be aware of to make it work well?
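
(For context, the usual tuning advice I've found so far boils down to
raw images with the host page cache bypassed, i.e. qemu options along
these lines - whether that is still the right thing on btrfs is exactly
what I'm asking:)

  qemu-system-x86_64 ... \
    -drive file=/var/lib/libvirt/images/vm.img,format=raw,cache=none,aio=native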


Thanks,
Gert