On Tue, Jan 5, 2016 at 5:16 PM, lee <l...@yagibdah.de> wrote:
> Rich Freeman <ri...@gentoo.org> writes:
>
>>
>> I would run btrfs on bare partitions and use btrfs's raid1
>> capabilities.  You're almost certainly going to get better
>> performance, and you get more data integrity features.
>
> That would require me to set up software raid with mdadm as well, for
> the swap partition.

Correct, assuming you don't want a kernel panic when a single drive
backing swap fails.
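
Something like this is what I had in mind for the swap side (just a
sketch - device names are made up, adjust to match your disks):

    # mirror two partitions with mdadm, then put swap on the md device
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md0
    swapon /dev/md0
    # and an fstab entry so it comes back on boot:
    # /dev/md0   none   swap   sw   0 0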

>
>> If you have a silent corruption with mdadm doing the raid1 then btrfs
>> will happily warn you of your problem and you're going to have a
>> really hard time fixing it,
>
> BTW, what do you do when you have silent corruption on a swap partition?
> Is that possible, or does swapping use its own checksums?

If the kernel pages in data from the good mirror, nothing happens.  If
the kernel pages in data from the bad mirror, then whatever data
happens to be there is what will get loaded and used and/or executed.
If you're lucky the modified data will be part of unused heap or
something.  If not, well, just about anything could happen.

Nothing in this scenario will check that the data is correct, short of
a forced scrub of the disks.  A scrub would probably detect the
mismatch, but I don't think mdadm can recover it - with raid1 it has
no way to tell which mirror holds the good copy.  Your best bet is
probably to reboot immediately and save what you can.  A less risky
option, assuming there's nothing critical in RAM, is an immediate hard
reset, so there's no chance of bad data getting swapped in and
overwriting good data on your normal filesystems.
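
For reference, kicking off that check on an md array is just a sysfs
poke (md0 here being whatever your array is called):

    # start a scrub; progress shows up in /proc/mdstat
    echo check > /sys/block/md0/md/sync_action
    # mismatch count after it finishes (0 means the mirrors agree)
    cat /sys/block/md0/md/mismatch_cnt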

> It's still odd.  I already have two different file systems and the
> overhead of one kind of software raid while I would rather stick to one
> file system.  With btrfs, I'd still have two different file systems ---
> plus mdadm and the overhead of three different kinds of software raid.

I'm not sure why you'd need two different filesystems - just btrfs for
your data.  I'm also not sure where you're counting three kinds of
software raid - you'd have mdadm for swap and btrfs raid1 for
everything else.  And I don't think any of this involves significant
overhead, other than configuration.
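
To make it concrete, the data side is a single btrfs filesystem doing
its own raid1 - roughly this (partition names made up again):

    # btrfs raid1 across two partitions, for both data and metadata
    mkfs.btrfs -m raid1 -d raid1 /dev/sda3 /dev/sdb3
    mkdir -p /mnt/data
    mount /dev/sda3 /mnt/data    # either device works once both are scanned
    # a periodic scrub verifies checksums and repairs from the good copy
    btrfs scrub start /mnt/data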

>
> How would it be so much better to triple the software raids and to still
> have the same number of file systems?

Well, the difference would be more data integrity as far as hardware
failure goes, but certainly more risk of logical errors (IMO).

>
>>> When you use hardware raid, it
>>> can be disadvantageous compared to btrfs-raid --- and when you use it
>>> anyway, things are suddenly much more straightforward because everything
>>> is on raid to begin with.
>>
>> I'd stick with mdadm.  You're never going to run mixed
>> btrfs/hardware-raid on a single drive,
>
> A single disk doesn't make for a raid.

You misunderstood my statement.  If you have two drives, you can't run
both hardware raid and btrfs raid across them.  Hardware raid
generally can't be applied to only part of a drive, and a mixed setup
would require exactly that - hardware raid across one part of each
drive and btrfs raid across the rest.

>
>> and the only time I'd consider
>> hardware raid is with a high quality raid card.  You'd still have to
>> convince me not to use mdadm even if I had one of those lying around.
>
> From my own experience, I can tell you that mdadm already does have
> significant overhead when you use a raid1 of two disks and a raid5 with
> three disks.  This overhead may be somewhat due to the SATA controller
> not being as capable as one would expect --- yet that doesn't matter
> because one thing you're looking at, besides reliability, is the overall
> performance.  And the overall performance very noticeably increased when
> I migrated from mdadm raids to hardware raids, with the same disks and
> the same hardware, except that the raid card was added.

Well, sure, the raid card probably had battery-backed cache if it was
decent, so linux could complete its commits to the card's RAM and not
have to wait for the disks.

>
> And that was only 5 disks.  I also know that the performance with a ZFS
> mirror with two disks was disappointingly poor.  Those disks aren't
> exactly fast, but still.  I haven't tested yet if it changed after
> adding 4 mirrored disks to the pool.  And I know that the performance of
> another hardware raid5 with 6 disks was very good.

You're probably going to find the performance of a COW filesystem to
be inferior to that of an overwrite-in-place filesystem, simply
because the latter has to do less work.

>
> Thus I'm not convinced that software raid is the way to go.  I wish they
> would make hardware ZFS (or btrfs, if it ever becomes reliable)
> controllers.

I doubt it would perform any better.  What would that controller do
that your CPU wouldn't do?  Well, other than have battery-backed
cache, which would help in any circumstance.  If you stuck 5 raid
cards in your PC, put one drive on each card, and ran mdadm or ZFS
across all five, it would almost certainly perform better - because
you're adding battery-backed cache, not because the cards are doing
anything your CPU couldn't.

>
> The relevant advantage of btrfs is being able to make snapshots.  Is
> that worth all the (potential) trouble?  Snapshots are worthless when
> the file system destroys them with the rest of the data.

And that is why I wouldn't use btrfs on a production system unless the
use case mitigated this risk and there was benefit from the snapshots.
Of course you're taking on more risk using an experimental filesystem.

>>
>> btrfs does not support swap files at present.
>
> What happens when you try it?

No idea.  Should be easy to test in a VM.  I suspect either an error
or a kernel bug/panic/etc.
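
If you want to try it, the whole test in a throwaway VM is only a few
commands (don't do this on a filesystem you care about):

    # create and activate a swap file the usual way
    fallocate -l 1G /swapfile
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile    # on btrfs I'd expect this step to fail or misbehave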

>
>> When it does you'll need to disable COW for them (using chattr)
>> otherwise they'll be fragmented until your system grinds to a halt.  A
>> swap file is about the worst case scenario for any COW filesystem -
>> I'm not sure how ZFS handles them.
>
> Well, then they need to make special provisions for swap files in btrfs
> so that we can finally get rid of the swap partitions.

I'm sure they'll happily accept patches.  :)
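
For what it's worth, disabling COW is just a chattr away already, with
the caveat that +C only takes effect on new or empty files, so you set
it on the directory (or an empty file) before writing any data
(directory name is just an example):

    # new files created in this directory will be nodatacow
    # (note: nodatacow also means no checksums for that data)
    mkdir /var/lib/vm-images
    chattr +C /var/lib/vm-images
    lsattr -d /var/lib/vm-images    # should show the 'C' attribute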

>
>> If I had done that in the past I think I would have completely avoided
>> that issue that required me to restore from backups.  That happened in
>> the 3.15/3.16 timeframe and I'd have never even run those kernels.
>> They were stable kernels at the time, and a few versions in when I
>> switched to them (I was probably just following gentoo-sources stable
>> keywords back then), but they still had regressions (fixes were
>> eventually backported).
>
> How do you know if an old kernel you pick because you think the btrfs
> part works well enough is the right pick?  You can either encounter a
> bug that has been fixed or a regression that hasn't been
> discovered/fixed yet.  That way, you can't win.

You read the lists closely.  If you want to be bleeding-edge it will
take more work than if you just go with the flow.  That's why I'm not
on 4.1 yet - I read the lists and am not quite sure they're ready yet.

>
>> I think btrfs is certainly usable today, though I'd be hesitant to run
>> it on production servers depending on the use case (I'd be looking for
>> a use case that actually has a significant benefit from using btrfs,
>> and which somehow mitigates the risks).
>
> There you go, it's usable, and the risk of using it is too high.

That is a judgement that everybody has to make based on their
requirements.  The important thing is to make an informed decision.  I
don't get paid if you pick btrfs.

>
>> Right now I keep a daily rsnapshot (rsync on steroids - it's in the
>> Gentoo repo) backup of my btrfs filesystems on ext4.  I occasionally
>> debate whether I still need it, but I sleep better knowing I have it.
>> This is in addition to my daily duplicity cloud backups of my most
>> important data (so, /etc and /home are in the cloud, and mythtv's
>> /var/video is just on a local rsync backup).
>
> I wouldn't give my data out of my hands.

Somehow I doubt the folks at Amazon are going to break RSA anytime soon.
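
To be clear, duplicity GPG-encrypts everything locally before it
uploads, so only ciphertext ever reaches Amazon.  The invocation is
roughly this (key ID and bucket name are placeholders):

    # encrypted incremental backups pushed to S3
    duplicity --encrypt-key 1234ABCD /etc s3+http://my-backup-bucket/etc
    duplicity --encrypt-key 1234ABCD /home s3+http://my-backup-bucket/home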

>
> Snapper?  I've never heard of that ...
>

http://snapper.io/

Basically snapshots+crontab and some wrappers to set retention
policies and such.  That and some things like package-manager plugins
so that you get snapshots before you install stuff.
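
Getting started with it is roughly this (the config name "root" is
arbitrary):

    # register the root btrfs filesystem with snapper
    snapper -c root create-config /
    # take a manual snapshot and see what's there
    snapper -c root create --description "before emerge"
    snapper -c root list
    # timeline snapshots and cleanup then run from cron based on the
    # retention settings in /etc/snapper/configs/root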

>
> Queuing up the data when there's more data than the system can deal with
> only works when the system has sufficient time to catch up with the
> queue.  Otherwise, you have to block something at some point, or you
> must drop the data.  At that point, it doesn't matter how you arrange
> the contents of the queue within it.

Absolutely true.  You need to throttle the data before it gets into
the queue, so that the fullness of the queue is exposed to the
applications and they can behave appropriately (falling back to
lower-bandwidth alternatives, etc).  In my case, if mythtv's write
buffers are filling up and I'm also running an emerge install phase,
the correct answer (per ionice) is for emerge to block so that my
realtime video capture buffers are safely flushed.  What you don't
want is for the kernel to let emerge dump a few GB of low-priority
data into the write cache alongside my 5Mbps HD recording stream.
Granted, it isn't as big a problem as it used to be now that RAM
sizes have increased.
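
On the emerge side that just means running it at idle IO priority,
something like this (package name is only a placeholder; I believe
portage also has a make.conf knob along the lines of
PORTAGE_IONICE_COMMAND to do it automatically, but check the docs):

    # let the build's disk IO yield to everything else (idle class)
    ionice -c 3 emerge --oneshot some-package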

>
> Gentoo /is/ fire-and-forget in that it works fine.  Btrfs is not in that
> it may work or not.
>

Well, we certainly must have come a long way then.  :)  I still
remember the last time the glibc ABI changed and I was basically
rebuilding everything from single-user mode holding my breath.


-- 
Rich
