On 11/9/22 00:24, hw wrote:
> On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:
> Hmm, when you can backup like 3.5TB with that, maybe I should put FreeBSD on my
> server and give ZFS a try. Worst thing that can happen is that it crashes and
> I'd have made an experiment that wasn't successful. Best thing, I guess, could
> be that it works and backups are way faster because the server doesn't have to
> actually write so much data because it gets deduplicated and reading from the
> clients is faster than writing to the server.
Be careful that you do not confuse a ~33 GiB full backup set, plus 78
snapshots of that same full backup set taken over six months, with a
full backup of 3.5 TiB of data. I would suggest a 10 TiB pool to back
up the latter.
Writing to a ZFS filesystem with deduplication is much slower than
simply writing to, say, an ext4 filesystem -- because ZFS has to hash
every incoming block and see if it matches the hash of any existing
block in the destination pool. Storing the existing block hashes in a
dedicated dedup virtual device will expedite this process.
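As a sketch (pool and device names here are examples, not from this thread), a dedicated dedup vdev on fast SSDs can be specified at pool creation time or added later:

```shell
# Create a raidz pool with a mirrored dedup vdev on SSDs
# ("tank" and the device names are hypothetical):
zpool create tank raidz da0 da1 da2 da3 dedup mirror nvd0 nvd1

# Or add a dedup vdev to an existing pool:
zpool add tank dedup mirror nvd0 nvd1

# Deduplication is then enabled per filesystem:
zfs set dedup=on tank/backup
```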
>> I run my backup script each night. It uses rsync to copy files and
>
> Aww, I can't really do that because my server eats like 200-300W because it has
> so many disks in it. Electricity is outrageously expensive here.
Perhaps platinum-rated power supplies? Energy-efficient HDDs/SSDs?
>> directories from various LAN machines into ZFS filesystems named after
>> each host -- e.g. pool/backup/hostname (ZFS namespace) and
>> /var/local/backup/hostname (Unix filesystem namespace). I have a
>> cron(8) that runs zfs-auto-snapshot once each day and once each month
>> that takes a recursive snapshot of the pool/backup filesystems. Their
>> contents are then available via Unix namespace at
>> /var/local/backup/hostname/.zfs/snapshot/snapshotname. If I want to
>> restore a file from, say, two months ago, I use Unix filesystem tools to
>> get it.
>
> Sounds like a nice setup. Does that mean you use snapshots to keep multiple
> generations of backups and make backups by overwriting everything after you
> made a snapshot?
Yes.
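A restore is then ordinary file copying. For example (the snapshot name below is only illustrative of what zfs-auto-snapshot generates):

```shell
# See which snapshots of a host's backups are available:
ls /var/local/backup/hostname/.zfs/snapshot/

# Copy a file back out of a two-month-old monthly snapshot
# (snapshot name is an example, not an actual one from my pool):
cp /var/local/backup/hostname/.zfs/snapshot/zfs-auto-snap_monthly-2022-09-01-0000/etc/fstab /tmp/fstab.restored
```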
> In that case, is deduplication that important/worthwhile? You're not
> duplicating it all by writing another generation of the backup but store only
> what's different through making use of the snapshots.
Without deduplication or compression, my backup set and 78 snapshots
would require 3.5 TiB of storage. With deduplication and compression,
they require 86 GiB of storage.
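You can see what deduplication and compression are buying you with the pool and filesystem statistics (pool name "tank" is an example):

```shell
# dedupratio is pool-wide; compressratio and logicalused vs. used
# show per-filesystem savings:
zpool list -o name,size,allocated,dedupratio tank
zfs get compressratio,used,logicalused tank/backup
```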
> ... I only never got around to figure [ZFS snapshots] out because I
> didn't have the need.
I accidentally trash files on occasion. Being able to restore them
quickly and easily with a cp(1), scp(1), etc., is a killer feature.
Users can recover their own files without needing help from a system
administrator.
> But it could also be useful for "little" things like taking a snapshot of the
> root volume before updating or changing some configuration and being able to
> easily undo that.
FreeBSD with ZFS-on-root has a killer feature called "Boot Environments"
that has taken that idea to the next level:
https://klarasystems.com/articles/managing-boot-environments/
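In rough outline (the environment name is just an example), the workflow with bectl(8) looks like this:

```shell
# Snapshot the running system into a new boot environment before an upgrade:
bectl create pre-upgrade
bectl list

# If the upgrade goes wrong, activate the saved environment and reboot
# into the pre-upgrade system:
bectl activate pre-upgrade && shutdown -r now
```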
>> I have 3.5 TiB of backups.
It is useful to group files with similar characteristics (size,
workload, compressibility, duplicates, backup strategy, etc.) into
specific ZFS filesystems (or filesystem trees). You can then adjust ZFS
properties and backup strategies to match.
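For instance (filesystem names and property values are examples, not my actual layout):

```shell
# Backups: compressible, lots of duplicates across generations:
zfs create -o compression=lz4 -o dedup=on tank/backup/hostname

# Already-compressed media: neither property helps, so leave both off:
zfs create -o compression=off -o dedup=off tank/media

# Verify what a filesystem inherited or was set to:
zfs get compression,dedup tank/backup/hostname
```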
>>>> For compressed and/or encrypted archives, image, etc., I do not use
>>>> compression or de-duplication
>>>
>>> Yeah, they wouldn't compress. Why no deduplication?
>>
>>
>> Because I very much doubt that there will be duplicate blocks in
>> such files.
>
> Hm, would it hurt?
Yes. ZFS deduplication is resource intensive.
> Oh it's not about performance when degraded, but about performance. IIRC when
> you have a ZFS pool that uses the equivalent of RAID5, you're still limited to
> the speed of a single disk. When you have a mysql database on such a ZFS
> volume, it's dead slow, and removing the SSD cache when the SSDs failed didn't
> make it any slower. Obviously, it was a bad idea to put the database there,
> and I wouldn't do it again when I can avoid it. I also had my data on such a
> volume and I found that the performance with 6 disks left much to be desired.
What were the makes and models of the 6 disks? Of the SSDs? If you
have a 'zpool status' console session from then, please post it.
Constructing a ZFS pool to match the workload is not easy. STFW; there
are plenty of articles. Here is a general one I found recently:
https://klarasystems.com/articles/choosing-the-right-zfs-pool-layout/
MySQL appears to have the ability to use raw disks. Tuned correctly,
this should give the best results:
https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tablespace.html#innodb-raw-devices
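Per that page, the configuration is a my.cnf fragment along these lines (device paths and sizes are placeholders; the partitions are first initialized with 'newraw', then the suffix is changed to 'raw' after the first start):

```ini
[mysqld]
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
```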
If ZFS performance is not up to your expectations, and there are no
hardware problems, next steps include benchmarking, tuning, and/or
adding or adjusting the hardware and its usage.
>> ... invest in hardware to get performance.
> Hardware like?
Server chassis, motherboards, chipsets, processors, memory, disk host
bus adapters, disk racks, disk drives, network interface cards, etc.
> In theory, using SSDs for cache with ZFS should improve performance. In
> practise, it only wore out the SSDs after a while, and now it's not any
> faster without SSD cache.
Please run 'zpool status' and post the console session (prompt, command
entered, output displayed). Please correlate the vdevs to disk drive
makes and models.
On 11/9/22 03:41, hw wrote:
> I don't have anything without ECC RAM,
Nice.
> and my server was never meant for ZFS.
What is the make and model of your server?
> With mirroring, I could fit only one backup, not two.
Add another mirror to your pool. Or, use a process of substitution and
resilvering to replace existing drives with larger capacity drives.
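Either route is a couple of commands (pool and device names are examples):

```shell
# Grow the pool by adding another mirror vdev:
zpool add tank mirror da4 da5

# Or replace each existing drive with a larger one, one at a time,
# letting each resilver finish first; with autoexpand on, the vdev
# grows once every drive in it has been replaced:
zpool set autoexpand=on tank
zpool replace tank da0 da6
zpool status tank   # wait for resilver to complete before replacing da1
```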
> In any case, I'm currently tending to think that putting FreeBSD with ZFS on
> my server might be the best option. But then, apparently I won't be able to
> configure the controller cards, so that won't really work.
What is the make and model of your controller cards?
> And ZFS with Linux
> isn't so great because it keeps fuse in between.
On 11/9/22 09:24, hw wrote:
> On Wed, 2022-11-09 at 17:29 +0100, DdB wrote:
>> NO fuse, neither FreeBSD nor debian would need the outdated zfs-fuse,
>> use the in.kernel modules from zfsonlinux.org (packages for debian are
>> in contrib IIRC).
>>
>
> Ok, all the better --- I only looked at the package management. Ah, let me
> see, I have a Debian VM and no contrib ... hm, zfs-dkms and such? That's
> promising,
+1
https://packages.debian.org/bullseye/zfs-dkms
https://packages.debian.org/bullseye/zfsutils-linux
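Roughly (assuming Debian bullseye and a stock sources.list; adjust to taste):

```shell
# Add the contrib component, then install the DKMS kernel module and tools:
sed -i 's/ main$/ main contrib/' /etc/apt/sources.list
apt update
apt install zfs-dkms zfsutils-linux
modprobe zfs
```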
On 11/9/22 03:20, hw wrote:
> The use case comes down to making backups once in a while. When making
> another backup, at least the latest previous backup must not be overwritten.
> Sooner or later, there won't be enough disk space to keep two full backups.
> With disk prices as crazy high as they currently are, I might even move discs
> from the backup server to the active server when it runs out of space before
> I move data into archive (without backup) or start deleting stuff. All
> prices keep going up, so I don't expect disc prices to go down.
>
> Deduplication is only one possible way to go about it. I'm undecided if it's
> better to have only one full backup and to use snapshots instead.
> Deduplicating the backups would kinda turn two copies into only one for
> whatever gets deduplicated, so that might not be better than snapshots. Or I
> could use both and perhaps save even more space.
On 11/9/22 04:28, hw wrote:
> Of course it would be better to have more than one machine, but I don't
> have that.
If you already have a ZFS pool, the way to back it up is to replicate
the pool to another pool. Set up an external drive with a pool and
replicate your server pool to that periodically.
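Sketched with zfs send/receive (pool names "tank" and "extpool" and the snapshot names are examples):

```shell
# Initial full replication to a pool on the external drive:
zfs snapshot -r tank@repl-2022-11-09
zfs send -R tank@repl-2022-11-09 | zfs receive -Fdu extpool

# Later runs send only the changes since the last replicated snapshot:
zfs snapshot -r tank@repl-2022-12-09
zfs send -R -i tank@repl-2022-11-09 tank@repl-2022-12-09 | zfs receive -du extpool
```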
David