On 11/9/22 00:24, hw wrote:
> On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:
> Hmm, when you can backup like 3.5TB with that, maybe I should put FreeBSD on my
> server and give ZFS a try. Worst thing that can happen is that it crashes and
> I'd have made an experiment that wasn't successful. Best thing, I guess, could
> be that it works and backups are way faster because the server doesn't have to
> actually write so much data because it gets deduplicated and reading from the
> clients is faster than writing to the server.
Be careful that you do not confuse a ~33 GiB full backup set, plus 78
snapshots of that same full backup set taken over six months, with a
full backup of 3.5 TiB of data. I would suggest a 10 TiB pool to back
up the latter.
Writing to a ZFS filesystem with deduplication is much slower than
simply writing to, say, an ext4 filesystem -- because ZFS has to hash
every incoming block and see if it matches the hash of any existing
block in the destination pool. Storing the existing block hashes in a
dedicated dedup virtual device will expedite this process.
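As a sketch (pool and device names here are examples, not from this thread), a dedicated dedup vdev on fast SSDs can be specified at pool creation time or added later:

```shell
# Create a raidz pool with a mirrored dedup vdev on SSDs
# ("tank" and the device names are hypothetical):
zpool create tank raidz da0 da1 da2 da3 dedup mirror nvd0 nvd1

# Or add a dedup vdev to an existing pool:
zpool add tank dedup mirror nvd0 nvd1

# Deduplication is then enabled per filesystem:
zfs set dedup=on tank/backup
```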
>> I run my backup script each night. It uses rsync to copy files and
>
> Aww, I can't really do that because my server eats like 200-300W because it has
> so many disks in it. Electricity is outrageously expensive here.
Perhaps platinum-rated power supplies? Energy-efficient HDDs/SSDs?
>> directories from various LAN machines into ZFS filesystems named after
>> each host -- e.g. pool/backup/hostname (ZFS namespace) and
>> /var/local/backup/hostname (Unix filesystem namespace). I have a
>> cron(8) that runs zfs-auto-snapshot once each day and once each month
>> that takes a recursive snapshot of the pool/backup filesystems. Their
>> contents are then available via Unix namespace at
>> /var/local/backup/hostname/.zfs/snapshot/snapshotname. If I want to
>> restore a file from, say, two months ago, I use Unix filesystem tools to
>> get it.
>
> Sounds like a nice setup. Does that mean you use snapshots to keep multiple
> generations of backups and make backups by overwriting everything after you
> made a snapshot?
Yes.
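A restore is then ordinary file copying. For example (the snapshot name below is only illustrative of what zfs-auto-snapshot generates):

```shell
# See which snapshots of a host's backups are available:
ls /var/local/backup/hostname/.zfs/snapshot/

# Copy a file back out of a two-month-old monthly snapshot
# (snapshot name is an example, not an actual one from my pool):
cp /var/local/backup/hostname/.zfs/snapshot/zfs-auto-snap_monthly-2022-09-01-0000/etc/fstab /tmp/fstab.restored
```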
> In that case, is deduplication that important/worthwhile? You're not
> duplicating it all by writing another generation of the backup but store only
> what's different through making use of the snapshots.
Without deduplication or compression, my backup set and 78 snapshots
would require 3.5 TiB of storage. With deduplication and compression,
they require 86 GiB of storage.
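You can see what deduplication and compression are buying you with the pool and filesystem statistics (pool name "tank" is an example):

```shell
# dedupratio is pool-wide; compressratio and logicalused vs. used
# show per-filesystem savings:
zpool list -o name,size,allocated,dedupratio tank
zfs get compressratio,used,logicalused tank/backup
```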
> ... I only never got around to figure [ZFS snapshots] out because I
> didn't have the need.
I accidentally trash files on occasion. Being able to restore them
quickly and easily with a cp(1), scp(1), etc., is a killer feature.
Users can recover their own files without needing help from a system
administrator.
> But it could also be useful for "little" things like taking a snapshot of the
> root volume before updating or changing some configuration and being able to
> easily undo that.
FreeBSD with ZFS-on-root has a killer feature called "Boot Environments"
that has taken that idea to the next level:
https://klarasystems.com/articles/managing-boot-environments/
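In rough outline (the environment name is just an example), the workflow with bectl(8) looks like this:

```shell
# Snapshot the running system into a new boot environment before an upgrade:
bectl create pre-upgrade
bectl list

# If the upgrade goes wrong, activate the saved environment and reboot
# into the pre-upgrade system:
bectl activate pre-upgrade && shutdown -r now
```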
>> I have 3.5 TiB of backups.
It is useful to group files with similar characteristics (size,
workload, compressibility, duplicates, backup strategy, etc.) into
specific ZFS filesystems (or filesystem trees). You can then adjust ZFS
properties and backup strategies to match.
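For instance (filesystem names and property values are examples, not my actual layout):

```shell
# Backups: compressible, lots of duplicates across generations:
zfs create -o compression=lz4 -o dedup=on tank/backup/hostname

# Already-compressed media: neither property helps, so leave both off:
zfs create -o compression=off -o dedup=off tank/media

# Verify what a filesystem inherited or was set to:
zfs get compression,dedup tank/backup/hostname
```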
>>>> For compressed and/or encrypted archives, image, etc., I do not use
>>>> compression or de-duplication
>>>
>>> Yeah, they wouldn't compress. Why no deduplication?
>>
>>
>> Because I very much doubt that there will be duplicate blocks in
>> such files.
>
> Hm, would it hurt?
Yes. ZFS deduplication is resource intensive.
> Oh it's not about performance when degraded, but about performance. IIRC when
> you have a ZFS pool that uses the equivalent of RAID5, you're still limited to
> the speed of a single disk. When you have a mysql database on such a ZFS
> volume, it's dead slow, and removing the SSD cache when the SSDs failed didn't
> make it any slower. Obviously, it was a bad idea to put the database there,
> and I wouldn't do it again when I can avoid it. I also had my data on such a
> volume and I found that the performance with 6 disks left much to be desired.
What were the makes and models of the 6 disks? Of the SSDs? If you
have a 'zpool status' console session from then, please post it.
Constructing a ZFS pool to match the workload is not easy. STFW; there
are plenty of articles. Here is a general one I found recently:
https://klarasystems.com/articles/choosing-the-right-zfs-pool-layout/
MySQL appears to have the ability to use raw disks. Tuned correctly,
this should give the best results:
https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tablespace.html#innodb-raw-devices
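Per that page, the configuration is a my.cnf fragment along these lines (device paths and sizes are placeholders; the partitions are first initialized with 'newraw', then the suffix is changed to 'raw' after the first start):

```ini
[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
```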
If ZFS performance is not up to your expectations, and there are no
hardware problems, next steps include benchmarking, tuning, and/or
adding or adjusting the hardware and its usage.
>> ... invest in hardware to get performance.
> Hardware like?
Server chassis, motherboards, chipsets, processors, memory, disk host
bus adapters, disk racks, disk drives, network interface cards, etc.
> In theory, using SSDs for cache with ZFS should improve performance. In
> practise, it only wore out the SSDs after a while, and now it's not any
> faster without SSD cache.
Please run 'zpool status' and post the console session (prompt, command
entered, output displayed). Please correlate the vdevs to disk drive
makes and models.
On 11/9/22 03:41, hw wrote:
> I don't have anything without ECC RAM,
Nice.
> and my server was never meant for ZFS.
What is the make and model of your server?
> With mirroring, I could fit only one backup, not two.
Add another mirror to your pool. Or, use a process of substitution and
resilvering to replace existing drives with larger capacity drives.
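Either route is a couple of commands (pool and device names are examples):

```shell
# Grow the pool by adding another mirror vdev:
zpool add tank mirror da4 da5

# Or replace each existing drive with a larger one, one at a time,
# letting each resilver finish first; with autoexpand on, the vdev
# grows once every drive in it has been replaced:
zpool set autoexpand=on tank
zpool replace tank da0 da6
zpool status tank   # wait for resilver to complete before replacing da1
```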
> In any case, I'm currently tending to think that putting FreeBSD with ZFS on
> my server might be the best option. But then, apparently I won't be able to
> configure the controller cards, so that won't really work.
What is the make and model of your controller cards?
> And ZFS with Linux
> isn't so great because it keeps fuse in between.
On 11/9/22 09:24, hw wrote:
> On Wed, 2022-11-09 at 17:29 +0100, DdB wrote:
>> NO fuse, neither FreeBSD nor debian would need the outdated zfs-fuse,
>> use the in.kernel modules from zfsonlinux.org (packages for debian are
>> in contrib IIRC).
>>
>
> Ok, all the better --- I only looked at the package management. Ah, let me
> see, I have a Debian VM and no contrib ... hm, zfs-dkms and such? That's
> promising,
+1
https://packages.debian.org/bullseye/zfs-dkms
https://packages.debian.org/bullseye/zfsutils-linux
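Roughly (assuming Debian bullseye and a stock sources.list; adjust to taste):

```shell
# Add the contrib component, then install the DKMS kernel module and tools:
sed -i 's/ main$/ main contrib/' /etc/apt/sources.list
apt update
apt install zfs-dkms zfsutils-linux
modprobe zfs
```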
On 11/9/22 03:20, hw wrote:
> The use case comes down to making backups once in a while. When making
> another backup, at least the latest previous backup must not be overwritten.
> Sooner or later, there won't be enough disk space to keep two full backups.
> With disk prices as crazy high as they currently are, I might even move discs
> from the backup server to the active server when it runs out of space before
> I move data into archive (without backup) or start deleting stuff. All
> prices keep going up, so I don't expect disc prices to go down.
>
> Deduplication is only one possible way to go about it. I'm undecided if it's
> better to have only one full backup and to use snapshots instead.
> Deduplicating the backups would kinda turn two copies into only one for
> whatever gets deduplicated, so that might not be better than snapshots. Or I
> could use both and perhaps save even more space.
On 11/9/22 04:28, hw wrote:
> Of course it would be better to have more than one machine, but I don't
> have that.
If you already have a ZFS pool, the way to back it up is to replicate
the pool to another pool. Set up an external drive with a pool and
replicate your server pool to that periodically.
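Sketched with zfs send/receive (pool names "tank" and "extpool" and the snapshot names are examples):

```shell
# Initial full replication to a pool on the external drive:
zfs snapshot -r tank@repl-2022-11-09
zfs send -R tank@repl-2022-11-09 | zfs receive -Fdu extpool

# Later runs send only the changes since the last replicated snapshot:
zfs snapshot -r tank@repl-2022-12-09
zfs send -R -i tank@repl-2022-11-09 tank@repl-2022-12-09 | zfs receive -du extpool
```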
David