Martin Steigerwald posted on Sat, 28 Jan 2012 13:08:52 +0100 as excerpted:

> Am Donnerstag, 26. Januar 2012 schrieb Duncan:

>> The current layout has a total of 16 physical disk partitions on each
>> of the four drives, most of which are 4-disk md/raid1, but with a
>> couple of md/raid1s for local cache of redownloadables, etc, thrown
>> in.  Some of the mds are further partitioned (mdp), some not.  A
>> couple are only 2-disk md/raid1 instead of the usual 4-disk.  Most
>> mds have a working and backup copy of exactly the same partitioned
>> size, thus explaining the multitude of partitions, since most of them
>> come in pairs.  No LVM, as I'm not running an initrd, which meant it
>> couldn't handle root, and I wasn't confident in my ability to recover
>> the system in an emergency with LVM either, so I was best off without
>> it.
> 
> Sounds like a quite complex setup.

It is.  I was actually writing a rather more detailed description, but 
decided few would care and it'd turn into a tl;dr.  It was, I think, 
the 4th rewrite that finally got it down to something reasonable while 
still hopefully conveying any details that might be corner cases 
someone knows something about.

>> Three questions:
>> 
>> 1) My /boot partition and its backup (which I do want to keep separate
>> from root) are only 128 MB each.  The wiki recommends 1 gig sizes
>> minimum, but there's some indication that's dated info due to mixed
>> data/ metadata mode in recent kernels.
>> 
>> Is a 128 MB btrfs reasonable?  What's the mixed-mode minimum
>> recommended, and what is overhead going to look like?
> 
> I don't know.
> 
> You could try with a loop device. Just create one and mkfs.btrfs on it,
> mount it and copy your stuff from /boot over to see whether that works
> and how much space is left.

The loop device is a really good idea that hadn't occurred to me.  Thanks!
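For reference, the loop-device experiment Martin suggests can be sketched roughly as follows.  The file path and mount point are hypothetical, and the mkfs/mount steps need root and btrfs-progs installed, so they're shown commented:

```shell
# Create a 128 MB sparse file to stand in for the /boot partition.
truncate -s 128M /tmp/boot-test.img

# The actual experiment needs root; -M is the mixed data/metadata
# mode discussed below:
#   mkfs.btrfs -M /tmp/boot-test.img
#   mount -o loop /tmp/boot-test.img /mnt/test
#   cp -a /boot/. /mnt/test/
#   btrfs filesystem df /mnt/test
#   umount /mnt/test

# Verify the backing file is the expected size (128 * 1024 * 1024).
stat -c %s /tmp/boot-test.img    # → 134217728
```

Since the backing file is sparse, it costs almost no real disk space until the filesystem writes to it.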

> On BTRFS I recommend using btrfs filesystem df for more exact figures
> of space utilization than df would return.

Yes.  I've read about the various space reports on the wiki so I have 
the general idea, but will of course need to review it again after I 
get something set up so I can actually type in the commands and see 
for myself.  Still, thanks for the reinforcement.  It certainly won't hurt, 
and of course it's quite possible that others will end up reading this 
too, so it could end up being a benefit to many people, not just me. =:^)

> You may try with:
> 
>        -M, --mixed
>               Mix  data  and  metadata  chunks together for more
>               efficient space utilization.  This feature  incurs a 
>               performance  penalty in larger filesystems.  It is
>               recommended for use with filesystems of  1  GiB or
>               smaller.
> 
> for smaller partitions (see manpage of mkfs.btrfs).

I had actually seen that too, but as it's newer there are significantly 
fewer mentions of it out there, so the reinforcement is DEFINITELY 
valued!  I like to have a good general sysadmin's idea of what's going 
on and how everything fits together, as opposed to simply following 
instructions by rote, before I'm really comfortable with something as 
critical as filesystem maintenance (keeping in mind that when one really 
tends to need that knowledge is in an already stressful recovery 
situation, very possibly without all the usual documentation and net 
resources available).  Repetition of the basics helps in getting 
comfortable with it, so I'm very happy for it even if it isn't "new" to 
me. =:^)  (As mentioned, that was a big reason behind my ultimate 
rejection of LVM; I simply couldn't get comfortable enough with it to be 
confident of my ability to recover it in an emergency.)

>> 2)  The wiki indicates that btrfs-raid1 and raid-10 only mirror data 2-
>> way, regardless of the number of devices.  On my now aging disks, I
>> really do NOT like the idea of only 2-copy redundancy.  I'm far happier
>> with the 4-way redundancy, twice for the important stuff since it's in
>> both working and backup mds altho they're on the same 4-disk set (tho I
>> do have an external drive backup as well, but it's not kept as
>> current).
>> 
>> If true that's a real disappointment, as I was looking forward to
>> btrfs-raid1 with checksummed integrity management.
> 
> I didn't see anything like this.
> 
> Would be nice to be able to adapt the redundancy degree where possible.

I posted the wiki reference in reply to someone else recently.  Let's see 
if I can find it again...

Here it is.  This is from the bottom of the RAID and data replication 
section (immediately above "Balancing") on the SysadminGuide page:

>>>>>
With RAID-1 and RAID-10, only two copies of each byte of data are 
written, regardless of how many block devices are actually in use on the 
filesystem. 
<<<<<

But that's one of the bits I had hoped was stale, and that btrfs now 
allowed setting the number of copies for both data and metadata.  
However, I don't see any options along that line to feed to either 
mkfs.btrfs or btrfs, so it would seem it's not there yet, at least not 
in btrfs-tools as built just a couple of days ago from the official/
mason tree on kernel.org.  I haven't tried the integration tree (aka 
Hugo Mills' aka darksatanic.net tree).  So I guess that wiki quote is 
still correct.  Oh, well... maybe later this year, in a few kernel 
cycles.

> An idea might be splitting into a delayed synchronisation mirror:
> 
> Have two BTRFS RAID-1 - original and backup - and have a cronjob with
> rsync mirroring files every hour or so. Later this might be replaced by
> btrfs send/receive - or by RAID-1 with higher redundancy.

That's an interesting idea.  However, as I run git kernels and don't 
accumulate a lot of uptime in any case, what I'd probably do is set up 
the rsync to be run after a successful boot or mount of the filesystem in 
question.  That way, if it ever failed to boot/mount for whatever reason, 
I could be relatively confident that the backup version remained intact 
and usable.

That's actually /quite/ an interesting idea.  While I have working and 
backup partitions for most stuff now, the process remains a manual one, 
when I think the system is stable enough and enough time has passed since 
the last one, so the backup tends to be weeks or months old as opposed to 
days or hours.  This idea, modified to do it once per boot or mount or 
whatever, would keep the backups far more current and be much less hassle 
than the manual method I'm using now.  So even if I don't immediately 
switch to btrfs as I had thought I might, I can implement those scripts 
on the current system now, and then they'll be ready and tested, needing 
little modification when I switch to btrfs, later.

Thanks for the ideas! =:^)

>> 3) How does btrfs space overhead (and ENOSPC issues) compare to
>> reiserfs with its (default) journal and tail-packing?  My existing
>> filesystems are 128 MB and 4 GB at the low end, and 90 GB and 16 GB at
>> the high end.  At the same size, can I expect to fit more or less data
>> on them?  Do the compression options change that by much "IRL"?  Given
>> that I'm using same-sized partitions for my raid-1s, I guess at least
>> /that/ angle of it's covered.
> 
> The efficiency of the compression options depends highly on the kind
> of data you want to store.
> 
> I tried lzo on an external disk with movies, music files, images and
> software archives.  The effect was minimal, about 3% or so.  But for
> unpacked source trees, lots of clear-text files, and likely also
> virtual machine image files or other nicely compressible data, the
> effect should be better.

Back in the day, MS-DOS 6.2 on a 130 MB hard drive, I used to run MS 
Drivespace (which I guess they partnered with Stacker to get the tech 
for, then dropped the Stacker partnership like a hot potato after they'd 
sucked out all the tech they wanted, killing Stacker in the process...), 
so I'm familiar with the idea of filesystem or lower integrated 
compression and realize that it's definitely variable.  I was just 
wondering what the real-life usage scenarios had come up with, realizing 
even as I wrote it that the question wasn't one that could be answered in 
anything but general terms.

But I run Gentoo and thus deal with a lot of build scripts, etc, plus 
the usual *ix-style plain-text config files, so I expect compression 
will be pretty good for that.  Rather less so on the media and 
bzip-tarballed binpkgs partitions, certainly, with the home partition 
likely intermediate since it has a lot of plain text /and/ a lot of 
pre-compressed data.

Meanwhile, even without a specific answer, just the discussion is helping 
to clarify my understanding and expectations regarding compression, so 
thanks.
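That variability is easy to demonstrate with any general-purpose compressor; here's a rough sketch, with gzip standing in for btrfs's zlib/lzo and the sample data invented:

```shell
# Repetitive plain text (config/ebuild-like) vs. random data, which
# behaves like already-compressed media or binpkgs.
yes 'DEPEND="sys-apps/portage"' | head -c 100000 > /tmp/text.dat
head -c 100000 /dev/urandom > /tmp/random.dat

# Compressed sizes in bytes: the text shrinks to a tiny fraction,
# while the random data barely changes (or even grows slightly).
gzip -c /tmp/text.dat | wc -c
gzip -c /tmp/random.dat | wc -c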

> Although BTRFS received a lot of fixes for ENOSPC issues, I would be
> a bit reluctant with very small filesystems.  But that is just a gut
> feeling.  So I do not know whether the option -M from above is widely
> tested.  I doubt it.

The only real small filesystem/raid I have is /boot, the 128 MB 
mentioned.  But in thinking it over a bit more since I wrote the initial 
post, I realized that given the 9-ish gigs of unallocated freespace at 
the end of the drives and the fact that most of the partitions are at a 
quarter-gig offset due to the 128 MB /boot and the combined 128 MB BIOS 
and UEFI reserved partitions, I have room to expand both by several 
times, and making the total of all 3 (plus the initial few sectors of 
unpartitioned boot area) at the beginning of the drive an even 1 gig 
would give me even gig offsets for all the other partitions/raids as well.

So I'll almost certainly expand /boot from 1/8 gig to 1/4 gig, and maybe 
to half or even 3/4 gig, just so the offsets for everything else end up 
at even half or full gig boundaries, instead of the quarter-gig I have 
now.  Between that and mixed-mode, I think the potential sizing issue of 
/boot pretty much disappears.  One less problem to worry about. =:^)
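The offset arithmetic, as a quick sanity check (sizes in MiB, ignoring the few unpartitioned sectors mentioned above; the candidate /boot sizes are just illustrative):

```shell
# Current front-of-drive total: 128 MiB /boot plus the combined
# 128 MiB BIOS/UEFI reserved partitions = the quarter-gig offset.
echo $(( 128 + 128 ))    # → 256

# Growing /boot to 384 MiB lands later partitions on a half-gig
# boundary...
echo $(( 384 + 128 ))    # → 512

# ...and 896 MiB would make the front of the drive an even 1 GiB.
echo $(( 896 + 128 ))    # → 1024
```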


So the big sticking point now is two-copy-only data on btrfs-raid1, 
regardless of the number of drives; sticking that on top of md/raid is 
a workaround, tho obviously I'd much rather have a btrfs that could 
mirror both data and metadata an arbitrary number of ways instead of 
just two.  (There are some hints that metadata at least gets mirrored 
to all drives in a btrfs-raid1, tho nothing clearly states it one way 
or the other.  But without data mirrored to all drives as well, I'm 
just not comfortable.)

But while not ideal, the data integrity checking of two-way btrfs-raid1 
on two-way md/raid1 should at least be better than entirely unverified
4-way md/raid1, and I expect the rest will come over time, so I may 
simply upgrade anyway.

OTOH, in general as I've looked closer, I've found btrfs to be rather 
further from exiting experimental status than the prominent adoption by 
various distros had led me to believe, and without N-way mirroring raid, 
one of the two big features that I was looking forward to (the other 
being the data integrity checking) just vaporized in front of my eyes, so 
I may well hold off on upgrading until, potentially, late this year 
instead of early this year, even if there are workarounds.  I'm just not 
sure it's worth the cost of dealing with the still experimental aspects.


Either way, however, this little foray into previously unexplored 
territory leaves me with a MUCH firmer grasp of btrfs.  It's no longer 
simply a vague filesystem with some vague features out there.

And now that I'm here, I'll probably stay on the list as well, as I've 
already answered a number of questions posted by others, based on the 
material in the wiki and manpages, so I think I have something to 
contribute, and keeping up with developments will be far easier if I stay 
involved.


Meanwhile, again and overall, thanks for the answer.  I did have most of 
the bits of info I needed there floating around, but having someone to 
discuss my questions with has definitely helped solidify the concepts, 
and you've given me at least two very good suggestions that were entirely 
new to me and that would have certainly taken me quite some time to come 
up with on my own, if I'd been able to do so at all, so thanks, indeed! 
=:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
