Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-15 Thread hw
On Mon, 2022-11-14 at 20:37 +0100, Linux-Fan wrote:
> hw writes:
> 
> > On Fri, 2022-11-11 at 22:11 +0100, Linux-Fan wrote:
> > > hw writes:
> > > > On Thu, 2022-11-10 at 22:37 +0100, Linux-Fan wrote:
> [...]
> > How do you intend to copy files at any other level than at file level?  At that
> > level, the only thing you know about is files.
> 
> You can copy only a subset of files but you cannot mirror only a subset of a  
> volume in a RAID unless you specifically designed that in at the time of  
> partitioning. With RAID redundancy you have to decide upfront what you  
> want to have mirrored. With files, you can change it any time.

You can do that with RAID as well.  It might take more work, though.

> [...]
> 
> > > Multiple, well established tools exist for file tree copying. In RAID 
> > > scenarios the mode of operation is integral to the solution.
> > 
> > What has file tree copying to do with RAID scenarios?
> 
> Above, I wrote that making copies of the data may be recommendable over  
> using a RAID. You answered “Huh?” which I understood as a question to expand  
> on the advantages of copying files rather than using RAID.

So file tree copying doesn't have anything to do with RAID scenarios.

> [...]
> 
> > > File trees can be copied to slow target storages without slowing down the 
> > > source file system significantly. On the other hand, in RAID scenarios, 
> 
> [...]
> 
> > Copying the VM images to the slow HDD would slow the target down just as it
> > might slow down a RAID array.
> 
> This is true and does not contradict what I wrote.

I didn't say that it contradicts.  My point is only that it doesn't matter what
kind of files you're copying to a disk for the disk to slow down, while you seemed
to be making a distinction that isn't necessary where slowing down disks is
concerned.

> 
> > > ### when
> > > 
> > > For file copies, the target storage need not always be online. You can 
> > > connect it only for the time of synchronization. This reduces the chance 
> > > that line overvoltages and other hardware faults destroy both copies at  
> > > the same time. For a RAID, all drives must be online at all times (lest the
> > > array becomes degraded).
> > 
> > No, you can always turn off the array just as you can turn off single disks.
> > When I'm done making backups, I shut down the server and not much can happen
> > to the backups.
> 
> If you try this in practice, it is quite limited compared to file copies.

What's the difference between the target storage being offline and the target
storage server being switched off?  You can't copy the files either way because
there's nothing available to copy them to.

> 
> > > Additionally, when using files, only the _used_ space matters. Beyond  
> > > that, the size of the source and target file systems are decoupled. On the
> > > other 
> > > hand, RAID mandates that the sizes of disks adhere to certain properties 
> > > (like all being equal or wasting some of the storage).
> > 
> > And?
> 
> If these limitations are insignificant to you then lifting them provides no  
> advantage to you. You can then safely ignore this point :)

Since you can't copy files into thin air, limitations always apply.

> 
> [...]
> 
> > > > Hm, I haven't really used Debian in a long time.  There's probably no
> > > > reason 
> > > > to change that.  If you want something else, you can always go for it.
> > > 
> > > Why are you asking on a Debian list when you neither use it nor intend to
> > > use it?
> > 
> > I didn't say that I don't use Debian, nor that I don't intend to use it.
> 
> This must be a language barrier issue. I do not understand how your  
> statements above do not contradict each other.

It's possible that the context has escaped you because it hasn't been quoted.

> [...]
> 
> > > Now check with 
> > > 
> > > I get the following (smaller number => more popular):
> > > 
> > > 87   e2fsprogs
> > > 1657 btrfs-progs
> > > 2314 xfsprogs
> > > 2903 zfs-dkms
> > > 
> > > Surely this does not really measure if people are actually using these
> > > file systems. Feel free to provide a more accurate means of measurement.  
> > > For me this strongly suggests that the most popular FS on Debian is ext4.
> > 
> > ext4 doesn't show up in this list.  And it doesn't matter if ext4 is most
> 
> e2fsprogs contains the related tools like `mkfs.ext4`.

So one could think that not many people use ext4.

> 
> [...]
> 
> > 
> > 
> > I was referring to snapshots of backups.  Keeping many full copies requires a
> > lot more disk space than using snapshots.
> 
> Modern backup tools write their backups to files and still manage to be
> similarly efficient compared to the snapshot-based technologies when it comes
> to storage usage, with the additional benefit that you can copy or
> synchronize the output of these tools to almost any file system.
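
For example, with a tool like borg (one of the candidates evaluated elsewhere in
this thread), deduplicated and compressed backups land in plain files on whatever
file system holds the repository; the paths below are only illustrative:

# create a deduplicating, compressed repository on an ordinary file system
borg init --encryption=repokey /mnt/backup/repo
# each run stores only chunks that are not already in the repository
borg create --stats --compression zstd /mnt/backup/repo::'{hostname}-{now}' /home
# expire old archives, roughly comparable to destroying old snapshots
borg prune --keep-daily=10 /mnt/backup/repo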

When they make full copies they'll also require at least as 

Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-14 Thread David Christensen

On 11/14/22 13:48, hw wrote:

On Fri, 2022-11-11 at 21:55 -0800, David Christensen wrote:



Lots of snapshots slows down commands that involve snapshots (e.g.  'zfs
list -r -t snapshot ...').  This means sysadmin tasks take longer when
the pool has more snapshots.


Hm, how long does it take?  It's not like I'm planning on making hundreds of
snapshots ...


2022-11-14 18:00:12 toor@f3 ~
# time zfs list -r -t snapshot bootpool | wc -l
  49

real    0m0.020s
user    0m0.011s
sys     0m0.012s

2022-11-14 18:00:55 toor@f3 ~
# time zfs list -r -t snapshot soho2_zroot | wc -l
 222

real    0m0.120s
user    0m0.041s
sys     0m0.082s

2022-11-14 18:01:18 toor@f3 ~
# time zfs list -r -t snapshot p3 | wc -l
3864

real    0m0.649s
user    0m0.159s
sys     0m0.494s


I surprised myself -- I recall p3 taking 10+ seconds to list all the 
snapshots.  But, I added another mirror since then, I try to destroy old 
snapshots periodically, and the machine has been up for 16+ days (so 
metadata is likely cached).




The Intel Optane Memory Series products are designed to be cache devices
-- when using compatible hardware, Windows, and Intel software.  My
hardware should be compatible (Dell PowerEdge T30), but I am unsure if
FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane
Memory Series product.


Try it out?



Eventually, yes.



I thought Optane comes as very expensive PCI cards.  I don't have any m.2 slots,
and it seems difficult to even find mainboards with at least two that support
the same cards, which would be a requirement because there's no storing data
without redundancy.



I was thinking of getting an NVMe M.2 SSD to PCIe x4 adapter card for 
the machines without a motherboard M.2 slot.




# zpool status
   pool: moon
  state: ONLINE
config:

 NAME        STATE     READ WRITE CKSUM
 moon        ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     sdc     ONLINE       0     0     0
     sdg     ONLINE       0     0     0
   raidz1-1  ONLINE       0     0     0
     sdl     ONLINE       0     0     0
     sdm     ONLINE       0     0     0
     sdn     ONLINE       0     0     0
     sdp     ONLINE       0     0     0
     sdq     ONLINE       0     0     0
     sdr     ONLINE       0     0     0
   raidz1-2  ONLINE       0     0     0
     sdd     ONLINE       0     0     0
     sde     ONLINE       0     0     0
     sdf     ONLINE       0     0     0
     sdh     ONLINE       0     0     0
     sdi     ONLINE       0     0     0
     sdj     ONLINE       0     0     0
   mirror-3  ONLINE       0     0     0
     sdk     ONLINE       0     0     0
     sdo     ONLINE       0     0     0


Some of the disks are 15 years old ...  It made sense to me to group the disks
by the ones that are the same (size and model) and use raidz or mirror depending
on how many disks there are.

I don't know if that's ideal.  Would zfs have it figured out by itself if I had
added all of the disks in a raidz?  With two groups of only two disks each that
might have wasted space?



So, 16 HDD's of various sizes?


Without knowing the interfaces, ports, and drives that correspond to 
devices sd[cdefghijklmnopqr], it is difficult to comment.  I do find it 
surprising that you have two mirrors of 2 drives each and two raidz1's 
of 6 drives each.



If you want maximum server IOPS and bandwidth, lay out your pool of 16 
drives as 8 mirrors of 2 drives each.  Try to match the sizes of the 
drives in each mirror.  It is okay if the mirrors are not all the same 
size.  ZFS will proportion writes to top-level vdev's based upon their 
available space.  Reads come from whichever vdev's have the data.
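
With made-up pool and device names, such a layout could be created roughly like
this (only the first pairs are shown; the remaining mirrors follow the same
pattern and can also be appended later with 'zpool add'):

# create a pool as a stripe of two-disk mirrors, pairing drives of equal size
zpool create tank \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf \
  mirror sdg sdh
# append another mirror to the same pool later
zpool add tank mirror sdi sdj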



When I built my latest server, I tried different pool layouts with 4 
HDD's and ran benchmarks.  (2 striped mirrors of 2 HDD's each was the 
winner.)



You can monitor pool I/O with:

# zpool iostat -v moon 10


On FreeBSD, top(1) includes ZFS ARC memory usage:

ARC: 8392M Total, 5201M MFU, 797M MRU, 3168K Anon, 197M Header, 2194M Other
 3529M Compressed, 7313M Uncompressed, 2.07:1 Ratio


Is the SSD cache even relevant for a backup server?  



Yes, because the backup server is really a secondary server in a 
primary-secondary scheme.  Both servers contain a complete set of data, 
backups, archives, and images.  The primary server is up 24x7.  I boot 
the secondary periodically and replicate.  If the primary dies, I will 
swap roles and try to recover content that changed since the last 
replication.
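
If the replication is done with ZFS itself, a sketch of such a periodic sync
could look like this (pool, snapshot, and host names are made up; rsync works
just as well):

# on the primary: recursive snapshot, then send only the changes since last time
zfs snapshot -r tank@repl-2022-11-14
zfs send -R -i tank@repl-2022-11-07 tank@repl-2022-11-14 | \
    ssh secondary zfs receive -F tank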




I might have two unused
80GB SSDs I may be able to plug in to use as cache.  



Split each SSD into two or more partitions.  Add one partition on each 
SSD as a cache device for the HDD pool.  Using another partition on each 
SSD, add a dedicated dedup mirror for the HDD pool.
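
In zpool terms that would be roughly the following, assuming a ZFS version with
allocation classes and using placeholder partition names:

# one partition from each SSD as L2ARC cache (no redundancy needed for cache)
zpool add moon cache sds1 sdt1
# the other partitions as a mirrored dedup vdev holding the dedup tables
zpool add moon dedup mirror sds2 sdt2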



I am thinking of using a third 

Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-14 Thread hw
On Fri, 2022-11-11 at 21:55 -0800, David Christensen wrote:
> [...]
> As with most filesystems, performance of ZFS drops dramatically as you 
> approach 100% usage.  So, you need a data destruction policy that keeps 
> storage usage and performance at acceptable levels.
> 
> Lots of snapshots slows down commands that involve snapshots (e.g.  'zfs 
> list -r -t snapshot ...').  This means sysadmin tasks take longer when 
> the pool has more snapshots.

Hm, how long does it take?  It's not like I'm planning on making hundreds of
snapshots ...

> 
> > > I have considered switching to one Intel Optane Memory
> > > Series and a PCIe 4x adapter card in each server [for a ZFS cache].
> > 
> > Isn't that very expensive and wears out just as well?
> 
> 
> The Intel Optane Memory Series products are designed to be cache devices 
> -- when using compatible hardware, Windows, and Intel software.  My 
> hardware should be compatible (Dell PowerEdge T30), but I am unsure if 
> FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane 
> Memory Series product.

Try it out?


> Intel Optane Memory M10 16 GB PCIe M.2 80mm are US $18.25 on Amazon.
> 
> 
> Intel Optane Memory M.2 2280 32GB PCIe NVMe 3.0 x2 are US $69.95 on Amazon.

I thought Optane comes as very expensive PCI cards.  I don't have any m.2 slots,
and it seems difficult to even find mainboards with at least two that support
the same cards, which would be a requirement because there's no storing data
without redundancy.


> > Wouldn't it be better to have the cache in RAM?
> 
> Adding memory should help in more ways than one.  Doing so might reduce 
> ZFS cache device usage, but I am not certain.  But, more RAM will not 
> address the excessive wear problems when using a desktop SSD as a ZFS 
> cache device.

Well, after some fruitless experimentation with btrfs, I finally decided to go
with ZFS for the backups.  It's the better choice because btrfs can't reliably
do RAID5, and deduplication seems still very experimental.  I tried that, too,
and after like 5 hours or so, deduplication with bees freed only about 0.1% disk
space, so that was ridiculous.

ZFS gives me almost twice as much storage capacity as btrfs and also has
snapshots.

> 8 GB ECC memory modules to match the existing modules in my SOHO server 
> are $24.95 each on eBay.  I have two free memory slots.

Apparently ZFS gets really memory hungry with deduplication.  I'll go without
it, and when using snapshots, I'll have plenty of space left.

> 
> > > Please run and post the relevant command for LVM, btrfs, whatever.
> > 
> > Well, what would that tell you?
> 
> 
> That would provide accurate information about the storage configuration 
> of your backup server.

Oh I only created that this morning:


# zpool status
  pool: moon
 state: ONLINE
config:

NAME        STATE     READ WRITE CKSUM
moon        ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    sdc     ONLINE       0     0     0
    sdg     ONLINE       0     0     0
  raidz1-1  ONLINE       0     0     0
    sdl     ONLINE       0     0     0
    sdm     ONLINE       0     0     0
    sdn     ONLINE       0     0     0
    sdp     ONLINE       0     0     0
    sdq     ONLINE       0     0     0
    sdr     ONLINE       0     0     0
  raidz1-2  ONLINE       0     0     0
    sdd     ONLINE       0     0     0
    sde     ONLINE       0     0     0
    sdf     ONLINE       0     0     0
    sdh     ONLINE       0     0     0
    sdi     ONLINE       0     0     0
    sdj     ONLINE       0     0     0
  mirror-3  ONLINE       0     0     0
    sdk     ONLINE       0     0     0
    sdo     ONLINE       0     0     0


Some of the disks are 15 years old ...  It made sense to me to group the disks
by the ones that are the same (size and model) and use raidz or mirror depending
on how many disks there are.

I don't know if that's ideal.  Would zfs have it figured out by itself if I had
added all of the disks in a raidz?  With two groups of only two disks each that
might have wasted space?

> Here is the pool in my backup server.  mirror-0 and mirror-1 each use 
> two Seagate 3 TB HDD's.  dedup and cache each use partitions on two 
> Intel SSD 520 Series 180 GB SSD's:
> 
> 2022-11-11 20:41:09 toor@f1 ~
> # zpool status p1
>    pool: p1
>   state: ONLINE
>    scan: scrub repaired 0 in 7 days 22:18:11 with 0 errors on Sun Sep  4 
> 14:18:21 2022
> config:
> 
> NAME  STATE READ WRITE CKSUM
> p1    ONLINE   0 0 0
>   mirror-0    ONLINE   0 0 0
>     gpt/p1a.eli   ONLINE   0 0 0
>     gpt/p1b.eli   ONLINE   0 0 0
>   mirror-1  

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-14 Thread Linux-Fan

hw writes:


On Fri, 2022-11-11 at 22:11 +0100, Linux-Fan wrote:
> hw writes:
> > On Thu, 2022-11-10 at 22:37 +0100, Linux-Fan wrote:
>
> [...]
>
> > >  If you do not value the uptime making actual (even
> > >  scheduled) copies of the data may be recommendable over
> > >  using a RAID because such schemes may (among other advantages)
> > >  protect you from accidental file deletions, too.
> >
> > Huh?
>
> RAID is limited in its capabilities because it acts at the file system, 
> block (or in case of hardware RAID even disk) level. Copying files can 
> operate on any subset of the data and is very flexible when it comes to 
> changing what is going to be copied, how, when and where to.

How do you intend to copy files at any other level than at file level?  At that
level, the only thing you know about is files.


You can copy only a subset of files but you cannot mirror only a subset of a  
volume in a RAID unless you specifically designed that in at the time of  
partitioning. With RAID redundancy you have to decide upfront what you  
want to have mirrored. With files, you can change it any time.


[...]


> Multiple, well established tools exist for file tree copying. In RAID 
> scenarios the mode of operation is integral to the solution.

What has file tree copying to do with RAID scenarios?


Above, I wrote that making copies of the data may be recommendable over  
using a RAID. You answered “Huh?” which I understood as a question to expand  
on the advantages of copying files rather than using RAID.


[...]


> File trees can be copied to slow target storages without slowing down the 
> source file system significantly. On the other hand, in RAID scenarios, 


[...]


Copying the VM images to the slow HDD would slow the target down just as it
might slow down a RAID array.


This is true and does not contradict what I wrote.


> ### when
>
> For file copies, the target storage need not always be online. You can 
> connect it only for the time of synchronization. This reduces the chance 
> that line overvoltages and other hardware faults destroy both copies at  
> the same time. For a RAID, all drives must be online at all times (lest the
> array becomes degraded).

No, you can always turn off the array just as you can turn off single disks.
When I'm done making backups, I shut down the server and not much can happen
to the backups.


If you try this in practice, it is quite limited compared to file copies.

> Additionally, when using files, only the _used_ space matters. Beyond  
> that, the size of the source and target file systems are decoupled. On the other
> hand, RAID mandates that the sizes of disks adhere to certain properties
> (like all being equal or wasting some of the storage).

And?


If these limitations are insignificant to you then lifting them provides no  
advantage to you. You can then safely ignore this point :)


[...]


> > Hm, I haven't really used Debian in a long time.  There's probably no
> > reason 
> > to change that.  If you want something else, you can always go for it.
>
> Why are you asking on a Debian list when you neither use it nor intend to
> use it?


I didn't say that I don't use Debian, nor that I don't intend to use it.


This must be a language barrier issue. I do not understand how your  
statements above do not contradict each other.


[...]


> Now check with 
>
> I get the following (smaller number => more popular):
>
> 87   e2fsprogs
> 1657 btrfs-progs
> 2314 xfsprogs
> 2903 zfs-dkms
>
> Surely this does not really measure if people are actually using these
> file systems. Feel free to provide a more accurate means of measurement.  
> For me this strongly suggests that the most popular FS on Debian is ext4.


ext4 doesn't show up in this list.  And it doesn't matter if ext4 is most


e2fsprogs contains the related tools like `mkfs.ext4`.


widespread on Debian when more widespread distributions use different file
systems.  I don't have a way to get the numbers for that.

Today I installed Debian on my backup server and didn't use ext4.  Perhaps  
the "most widely-deployed" file system is FAT.


Probably yes. With the advent of ESPs it may have even increased in  
popularity again :)


[...]

> I like to be able to store my backups on any file system. This will not  
> work for snapshots unless I “materialize” them by copying out all files of a
> snapshot.
>
> I know that some backup strategies suggest always creating backups based  
> on snapshots rather than the live file system as to avoid issues with  
> changing files during the creation of backups.

>
> I can see the merit in implementing it this way but have not yet found a 
> strong need for this feature since I backup files that I create/modify 
> myself and can thus manually ensure to not change them during the running 
> backup process.

I was referring to snapshots of backups.  Keeping many full copies 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-13 Thread David Christensen

On 11/13/22 13:02, hw wrote:

On Fri, 2022-11-11 at 07:55 -0500, Dan Ritter wrote:

hw wrote:

On Thu, 2022-11-10 at 20:32 -0500, Dan Ritter wrote:

Linux-Fan wrote:


[...]
* RAID 5 and 6 restoration incurs additional stress on the other
   disks in the RAID which makes it more likely that one of them
   will fail. The advantage of RAID 6 is that it can then recover
   from that...


Disks are always being stressed when used, and they're being stressed as well
when other types of RAID arrays than 5 or 6 are being rebuilt.  And is there
evidence that disks fail *because* RAID arrays are being rebuilt or would they
have failed anyway when stressed?


Does it matter? The observed fact is that some notable
proportion of RAID 5/6 rebuilds fail because another drive in
that group has failed.


Fortunately, I haven't observed that.  And why would only RAID 5 or 6 be
affected and not RAID 1 or other levels?



Any RAID level can suffer additional disk failures while recovering from 
a disk failure.  I saw this exact scenario on my SOHO server in August 
2022.  The machine has a stripe of two mirrors of two HDD's each (e.g. 
ZFS equivalent of RAID10).  One disk was dying, so I replaced it.  While 
the replacement disk was resilvering, a disk in the other mirror started 
dying.  I let the first resilver finish, then replaced the second disk. 
Thankfully, no more disks failed.  I got lucky.



David



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-13 Thread hw
On Fri, 2022-11-11 at 22:11 +0100, Linux-Fan wrote:
> hw writes:
> 
> > On Thu, 2022-11-10 at 22:37 +0100, Linux-Fan wrote:
> 
> [...]
> 
> > >  If you do not value the uptime making actual (even
> > >  scheduled) copies of the data may be recommendable over
> > >  using a RAID because such schemes may (among other advantages)
> > >  protect you from accidental file deletions, too.
> > 
> > Huh?
> 
> RAID is limited in its capabilities because it acts at the file system,  
> block (or in case of hardware RAID even disk) level. Copying files can  
> operate on any subset of the data and is very flexible when it comes to  
> changing what is going to be copied, how, when and where to.

How do you intend to copy files at any other level than at file level?  At that
level, the only thing you know about is files.

> ### what
> 
> When copying files, it's a standard feature to allow certain patterns of file
> names to be excluded.

sure

> [...]
> ### how
> 
> Multiple, well established tools exist for file tree copying. In RAID  
> scenarios the mode of operation is integral to the solution.

What has file tree copying to do with RAID scenarios?

> ### where to
> 
> File trees are much easier copied to network locations compared to adding a  
> “network mirror” to any RAID (although that _is_ indeed an option, DRBD was  
> mentioned in another post...).

Dunno, btrfs and ZFS have some ability to send file systems over the network,
which is intended to make copying more efficient.  There must be reasons why this
feature was developed.
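
With btrfs, for example, the send/receive pair streams a read-only snapshot to
another host; paths and the host name here are invented:

# create a read-only snapshot, then stream it to the remote side
btrfs subvolume snapshot -r /data /data/.snap-2022-11-13
btrfs send /data/.snap-2022-11-13 | ssh backuphost btrfs receive /srv/backup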

> File trees can be copied to slow target storages without slowing down the  
> source file system significantly. On the other hand, in RAID scenarios,  
> slow members are expected to slow down the performance of the entire array.  
> This alone may allow saving a lot of money. E.g. one could consider copying  
> the entire tree of VM images that is residing on a fast (and expensive) SSD  
> to a slow SMR HDD that only costs a fraction of the SSD. The same thing is  
> not possible with a RAID mirror except by slowing down the write operations  
> on the mirror to the speed of the HDD or by having two (or more) of the  
> expensive SSDs. SMR drives are advised against in RAID scenarios btw.

Copying the VM images to the slow HDD would slow the target down just as it
might slow down a RAID array.

> ### when
> 
> For file copies, the target storage need not always be online. You can  
> connect it only for the time of synchronization. This reduces the chance  
> that line overvoltages and other hardware faults destroy both copies at the  
> same time. For a RAID, all drives must be online at all times (lest the  
> array becomes degraded).

No, you can always turn off the array just as you can turn off single disks. 
When I'm done making backups, I shut down the server and not much can happen to
the backups.

> Additionally, when using files, only the _used_ space matters. Beyond that,  
> the size of the source and target file systems are decoupled. On the other  
> hand, RAID mandates that the sizes of disks adhere to certain properties  
> (like all being equal or wasting some of the storage).

And?

> > > > Is anyone still using ext4?  I'm not saying it's bad or anything, it  
> > > > only seems that it has gone out of fashion.
> > > 
> > > IIRC its still Debian's default.
> > 
> > Hm, I haven't really used Debian in a long time.  There's probably no
> > reason  
> > to change that.  If you want something else, you can always go for it.
> 
> Why are you asking on a Debian list when you neither use it nor intend to use
> it?

I didn't say that I don't use Debian, nor that I don't intend to use it.

> [...]
> > > licensing or stability issues whatsoever. By its popularity its probably  
> > > one of the most widely-deployed Linux file systems which may enhance the  
> > > chance that whatever problem you incur with ext4 someone else has had
> > > before...
> > 
> > I'm not sure it's most widespread.
> [...]
> Now check with 
> 
> I get the following (smaller number => more popular):
> 
> 87   e2fsprogs
> 1657 btrfs-progs
> 2314 xfsprogs
> 2903 zfs-dkms 
> 
> Surely this does not really measure if people are actually using these
> file systems. Feel free to provide a more accurate means of measurement. For  
> me this strongly suggests that the most popular FS on Debian is ext4.

ext4 doesn't show up in this list.  And it doesn't matter if ext4 is most
widespread on Debian when more widespread distributions use different file
systems.  I don't have a way to get the numbers for that.

Today I installed Debian on my backup server and didn't use ext4.  Perhaps the
"most widely-deployed" file system is FAT.

> > So assuming that RHEL and Centos may be more widespread than Debian because
> > there's lots of hardware supporting those but not Debian, I wouldn't think  
> > that
> > ext4 is most widespread 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-13 Thread hw
On Fri, 2022-11-11 at 07:55 -0500, Dan Ritter wrote:
> hw wrote: 
> > On Thu, 2022-11-10 at 20:32 -0500, Dan Ritter wrote:
> > > Linux-Fan wrote: 
> > > 
> > > 
> > > [...]
> > > * RAID 5 and 6 restoration incurs additional stress on the other
> > >   disks in the RAID which makes it more likely that one of them
> > >   will fail. The advantage of RAID 6 is that it can then recover
> > >   from that...
> > 
> > Disks are always being stressed when used, and they're being stressed as well
> > when other types of RAID arrays than 5 or 6 are being rebuilt.  And is there
> > evidence that disks fail *because* RAID arrays are being rebuilt or would they
> > have failed anyway when stressed?
> 
> Does it matter? The observed fact is that some notable
> proportion of RAID 5/6 rebuilds fail because another drive in
> that group has failed.

Fortunately, I haven't observed that.  And why would only RAID 5 or 6 be
affected and not RAID 1 or other levels?

>  The drives were likely to be from the
> same cohort of the manufacturer, and to have experienced very
> similar read/write activity over their lifetime.

Yes, and that means that they might fail all at about the same time due to age
and not because an array is being rebuilt.

The question remains what the ratio between surviving volumes and lost volumes
is.



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-12 Thread Dan Ritter
David Christensen wrote: 
> The Intel Optane Memory Series products are designed to be cache devices --
> when using compatible hardware, Windows, and Intel software.  My hardware
> should be compatible (Dell PowerEdge T30), but I am unsure if FreeBSD 12.3-R
> will see the motherboard NVMe slot or an installed Optane Memory Series
> product.
> 
> 
> Intel Optane Memory M10 16 GB PCIe M.2 80mm are US $18.25 on Amazon.
> 
> 
> Intel Optane Memory M.2 2280 32GB PCIe NVMe 3.0 x2 are US $69.95 on Amazon.

Note that the entire product line is discontinued, so if you want this,
assume that you will not be able to get a replacement in future.

-dsr-



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-11 Thread David Christensen

On 11/11/22 00:43, hw wrote:

On Thu, 2022-11-10 at 21:14 -0800, David Christensen wrote:

On 11/10/22 07:44, hw wrote:

On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:

On 11/9/22 00:24, hw wrote:
   > On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:



Taking snapshots is fast and easy.  The challenge is deciding when to
destroy them.


That seems like an easy decision, just keep as many as you can and destroy the
ones you can't keep.



As with most filesystems, performance of ZFS drops dramatically as you 
approach 100% usage.  So, you need a data destruction policy that keeps 
storage usage and performance at acceptable levels.



Lots of snapshots slows down commands that involve snapshots (e.g.  'zfs 
list -r -t snapshot ...').  This means sysadmin tasks take longer when 
the pool has more snapshots.




I have considered switching to one Intel Optane Memory
Series and a PCIe 4x adapter card in each server [for a ZFS cache].


Isn't that very expensive and wears out just as well?



The Intel Optane Memory Series products are designed to be cache devices 
-- when using compatible hardware, Windows, and Intel software.  My 
hardware should be compatible (Dell PowerEdge T30), but I am unsure if 
FreeBSD 12.3-R will see the motherboard NVMe slot or an installed Optane 
Memory Series product.



Intel Optane Memory M10 16 GB PCIe M.2 80mm are US $18.25 on Amazon.


Intel Optane Memory M.2 2280 32GB PCIe NVMe 3.0 x2 are US $69.95 on Amazon.



Wouldn't it be better to have the cache in RAM?



Adding memory should help in more ways than one.  Doing so might reduce 
ZFS cache device usage, but I am not certain.  But, more RAM will not 
address the excessive wear problems when using a desktop SSD as a ZFS 
cache device.



8 GB ECC memory modules to match the existing modules in my SOHO server 
are $24.95 each on eBay.  I have two free memory slots.




Please run and post the relevant command for LVM, btrfs, whatever.


Well, what would that tell you?



That would provide accurate information about the storage configuration 
of your backup server.



Here is the pool in my backup server.  mirror-0 and mirror-1 each use 
two Seagate 3 TB HDD's.  dedup and cache each use partitions on two 
Intel SSD 520 Series 180 GB SSD's:


2022-11-11 20:41:09 toor@f1 ~
# zpool status p1
  pool: p1
 state: ONLINE
  scan: scrub repaired 0 in 7 days 22:18:11 with 0 errors on Sun Sep  4 
14:18:21 2022

config:

NAME                          STATE     READ WRITE CKSUM
p1                            ONLINE       0     0     0
  mirror-0                    ONLINE       0     0     0
    gpt/p1a.eli               ONLINE       0     0     0
    gpt/p1b.eli               ONLINE       0     0     0
  mirror-1                    ONLINE       0     0     0
    gpt/p1c.eli               ONLINE       0     0     0
    gpt/p1d.eli               ONLINE       0     0     0
dedup
  mirror-2                    ONLINE       0     0     0
    gpt/CVCV**D0180EGN-2.eli  ONLINE       0     0     0
    gpt/CVCV**7K180EGN-2.eli  ONLINE       0     0     0
cache
  gpt/CVCV**D0180EGN-1.eli    ONLINE       0     0     0
  gpt/CVCV**7K180EGN-1.eli    ONLINE       0     0     0

errors: No known data errors



I suggest creating a ZFS pool with a mirror vdev of two HDD's.  If you
can get past your dislike of SSD's, add a mirror of two SSD's as a
dedicated dedup vdev.  (These will not see the hard usage that cache
devices get.)  Create a filesystem 'backup'.  Create child filesystems,
one for each host.  Create grandchild filesystems, one for the root
filesystem on each host.


Huh?  What's with these relationships?



ZFS datasets can be organized into hierarchies.  Child dataset 
properties can be inherited from the parent dataset.  Commands can be 
applied to an entire hierarchy by specifying the top dataset and using a 
"recursive" option.  Etc..



When a host is decommissioned and you no longer need the backups, you 
can destroy the backups for just that host.  When you add a new host, 
you can create filesystems for just that host.  You can use different 
backup procedures for different hosts.  Etc..




Set up daily rsync backups of the root filesystems on the various hosts to
the ZFS grandchild filesystems.  Set up zfs-auto-snapshot to take daily
snapshots of everything, and retain 10 snapshots.  Then watch what happens.
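
A daily job along these lines would implement that; host and dataset names are
the invented ones from the sketch above, and zfs-auto-snapshot would take care
of creating and expiring the snapshots:

# pull a full copy of the remote root filesystem into its grandchild dataset
# (-x stays on the source filesystem, so /proc, /sys etc. are skipped)
rsync -aHx --delete root@host1:/ /p2/backup/host1/root/
# the equivalent plain-zfs snapshot step, one per day
zfs snapshot -r p2/backup@daily-$(date +%F)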


What do you expect to happen?  



I expect the first full backup and snapshot will use an amount of 
storage that is something less than the sum of the sizes of the source 
filesystems (due to compression).  The second through tenth backups and 
snapshots will each increase the storage usage by something less than 
the sum of the daily churn of the source filesystems.  On day 11, and 
every day thereafter, the oldest 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-11 Thread Linux-Fan

hw writes:


On Thu, 2022-11-10 at 22:37 +0100, Linux-Fan wrote:


[...]


>  If you do not value the uptime making actual (even
>  scheduled) copies of the data may be recommendable over
>  using a RAID because such schemes may (among other advantages)
>  protect you from accidental file deletions, too.

Huh?


RAID is limited in its capabilities because it acts at the file system,  
block (or in case of hardware RAID even disk) level. Copying files can  
operate on any subset of the data and is very flexible when it comes to  
changing what is going to be copied, how, when and where to.


### what

When copying files, it's a standard feature to allow certain patterns of file
names to be excluded. This allows fine-tuning the system to avoid
unnecessary storage costs by not duplicating the files of which duplicates
are not needed (.iso or /tmp files could be an example of files that some
users may not consider worth duplicating).
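
With rsync, for example, that looks roughly like this (paths are invented):

# copy a tree but skip ISO images and the tmp directory at the top level
rsync -a --exclude='*.iso' --exclude='/tmp/' /data/ /mnt/backup/data/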


### how

Multiple, well established tools exist for file tree copying. In RAID  
scenarios the mode of operation is integral to the solution.


### where to

File trees are much easier copied to network locations compared to adding a  
“network mirror” to any RAID (although that _is_ indeed an option, DRBD was  
mentioned in another post...).


File trees can be copied to slow target storages without slowing down the  
source file system significantly. On the other hand, in RAID scenarios,  
slow members are expected to slow down the performance of the entire array.  
This alone may allow saving a lot of money. E.g. one could consider copying  
the entire tree of VM images that is residing on a fast (and expensive) SSD  
to a slow SMR HDD that only costs a fraction of the SSD. The same thing is  
not possible with a RAID mirror except by slowing down the write operations  
on the mirror to the speed of the HDD or by having two (or more) of the  
expensive SSDs. SMR drives are advised against in RAID scenarios btw.


### when

For file copies, the target storage need not always be online. You can  
connect it only for the time of synchronization. This reduces the chance  
that line overvoltages and other hardware faults destroy both copies at the  
same time. For a RAID, all drives must be online at all times (lest the  
array becomes degraded).


Additionally, when using files, only the _used_ space matters. Beyond that,  
the size of the source and target file systems are decoupled. On the other  
hand, RAID mandates that the sizes of disks adhere to certain properties  
(like all being equal or wasting some of the storage).


> > Is anyone still using ext4?  I'm not saying it's bad or anything, it  
> > only seems that it has gone out of fashion.

>
> IIRC its still Debian's default.

Hm, I haven't really used Debian in a long time.  There's probably no reason  
to change that.  If you want something else, you can always go for it.


Why are you asking on a Debian list when you neither use it nor intend to use
it?



>  It's my file system of choice unless I have
> very specific reasons against it. I have never seen it fail outside of 
> hardware issues. Performance of ext4 is quite acceptable out of the box. 
> E.g. it seems to be slightly faster than ZFS for my use cases. 
> Almost every Linux live system can read it. There are no problematic 
> licensing or stability issues whatsoever. By its popularity it's probably
> one of the most widely-deployed Linux file systems which may enhance the  
> chance that whatever problem you incur with ext4 someone else has had before...


I'm not sure it's most widespread.  Centos (and Fedora) defaulted to xfs quite
some time ago, and Fedora more recently defaulted to btrfs (a while after Redhat
announced they would remove btrfs from RHEL altogether).  Centos went down the
drain when it mutated into an outdated version of Fedora, and RHEL probably
isn't any better.


~$ dpkg -S zpool | cut -d: -f 1 | sort -u
[...]
zfs-dkms
zfsutils-linux
~$ dpkg -S mkfs.ext4
e2fsprogs: /usr/share/man/man8/mkfs.ext4.8.gz
e2fsprogs: /sbin/mkfs.ext4
~$ dpkg -S mkfs.xfs
xfsprogs: /sbin/mkfs.xfs
xfsprogs: /usr/share/man/man8/mkfs.xfs.8.gz
~$ dpkg -S mkfs.btrfs
btrfs-progs: /usr/share/man/man8/mkfs.btrfs.8.gz
btrfs-progs: /sbin/mkfs.btrfs

Now check with 

I get the following (smaller number => more popular):

87   e2fsprogs
1657 btrfs-progs
2314 xfsprogs
	2903 zfs-dkms 

Surely this does not really measure if people are actually using these
file systems. Feel free to provide a more accurate means of measurement. For  
me this strongly suggests that the most popular FS on Debian is ext4.



So assuming that RHEL and Centos may be more widespread than Debian because
there's lots of hardware supporting those but not Debian, I wouldn't think that
ext4 is most 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-11 Thread Michael Stone

On Fri, Nov 11, 2022 at 09:03:45AM +0100, hw wrote:

On Thu, 2022-11-10 at 23:12 -0500, Michael Stone wrote:

The advantage to RAID 6 is that it can tolerate a double disk failure.
With RAID 1 you need 3x your effective capacity to achieve that and even
though storage has gotten cheaper, it hasn't gotten that cheap. (e.g.,
an 8 disk RAID 6 has the same fault tolerance as an 18 disk RAID 1 of
equivalent capacity, ignoring pointless quibbling over probabilities.)


so with RAID6, 3x8 is 18 instead of 24


you have 6 disks of useable capacity with the 8 disk raid 6, two disks 
worth of parity. 6 disks of useable capacity on a triple redundant 
mirror is 6*3 = 18.




Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-11 Thread Dan Ritter
hw wrote: 
> On Thu, 2022-11-10 at 20:32 -0500, Dan Ritter wrote:
> > Linux-Fan wrote: 
> > 
> > 
> > [...]
> > * RAID 5 and 6 restoration incurs additional stress on the other
> >   disks in the RAID which makes it more likely that one of them
> >   will fail. The advantage of RAID 6 is that it can then recover
> >   from that...
> 
> Disks are always being stressed when used, and they're being stressed as well
> when other types of RAID arrays than 5 or 6 are being rebuilt.  And is there
> evidence that disks fail *because* RAID arrays are being rebuilt or would they
> have failed anyway when stressed?

Does it matter? The observed fact is that some notable
proportion of RAID 5/6 rebuilds fail because another drive in
that group has failed. The drives were likely to be from the
same cohort of the manufacturer, and to have experienced very
similar read/write activity over their lifetime.

To some extent this can be ameliorated by using disks from
multiple manufacturers or different batches, but there are only
three rotating disk makers left and managing this is difficult
to arrange at scale.


> > Most of the computers in my house have one disk. If I value any
> > data on that disk,
> 
> Then you don't use only one disk but redundancy.  There's also your time and
> nerves you might value.

It turns out to be really hard to fit a second disk in a laptop,
or in a NUC-sized machine.


> >  I back it up to the server, which has 4 4TB
> > disks in ZFS RAID10. If a disk fails in that, I know I can
> > survive that and replace it within 24 hours for a reasonable
> > amount of money -- rather more reasonable in the last few
> > months.
> 
> How do you get a new suitable disk within 24 hours?  For reasonable amounts of
> money?  Disk prices keep changing all the time.

My local store, MicroCenter, has --- 20ish 4TB disks in stock. I
can go get one in an hour.

Amazon will ship me a suitable drive next day or faster -- I
have ordered some items in the morning and received them before
nightfall -- at a lower cost, but at the price of enriching
Bezos.


-dsr-



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-11 Thread DdB
Am 11.11.2022 um 07:36 schrieb hw:
> That's on https://docs.freebsd.org/en/books/handbook/zfs/
> 
> I don't remember where I read about 8, could have been some documentation 
> about
> FreeNAS.

Well, OTOH there do exist some considerations, which may have led to
that number sticking somewhere, but i have seen people with MUCH larger
pools.

In order to avoid wasting too much space, there was a "formula" to
calculate optimum pool size:
First take a number of disks that is a power of two, 2^n (like
2/4/8/16/...) and then add the number of disks you need for redundancy
(raidz = 1, raidz2 = 2, raidz3 = 3). That would give nice spots like 4+2
= 6 (identical) disks for raidz2, or 11 for raidz3. Those numbers are
sweet spots for the size of vdevs; otherwise, more space gets wasted on
the drives. But that is only ONE consideration. My motherboard has 8
connectors for SATA, + 2 for NVME, which limited my options more than
anything.
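
Spelled out as shell arithmetic:

echo $(( 2**2 + 2 ))   # raidz2 sweet spot: 4 data disks + 2 parity = 6
echo $(( 2**3 + 3 ))   # raidz3 sweet spot: 8 data disks + 3 parity = 11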
And after long considerations, i opted for 4 mirrored vdevs, giving even
more space to redundancy, but gaining read speed.



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-11 Thread hw
On Thu, 2022-11-10 at 21:14 -0800, David Christensen wrote:
> On 11/10/22 07:44, hw wrote:
> > On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:
> > > On 11/9/22 00:24, hw wrote:
> > >   > On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:
> 
> [...]
>
> > 
> Taking snapshots is fast and easy.  The challenge is deciding when to 
> destroy them.

That seems like an easy decision, just keep as many as you can and destroy the
ones you can't keep.

> [...]
> > > Without deduplication or compression, my backup set and 78 snapshots
> > > would require 3.5 TiB of storage.  With deduplication and compression,
> > > they require 86 GiB of storage.
> > 
> > Wow that's quite a difference!  What makes this difference, the compression
> > or
> > the deduplication? 
> 
> 
> Deduplication.

Hmm, that means that deduplication shrinks your data down to about 1/40 of its
size.  That's an awesome rate.

> > When you have snapshots, you would store only the
> > differences from one snapshot to the next, 
> > and that would mean that there aren't
> > so many duplicates that could be deduplicated.
> 
> 
> I do not know -- I have not crawled the ZFS code; I just use it.

Well, it's like a miracle :)

> > > Users can recover their own files without needing help from a system
> > > administrator.
> > 
> > You have users who know how to get files out of snapshots?
> 
> 
> Not really; but the feature is there.

That means you're still the one to get the files.

> [...]
> > 
> > 
> > > What were the makes and models of the 6 disks?  Of the SSD's?  If you
> > > have a 'zpool status' console session from then, please post it.
> > 
> > They were (and still are) 6x4TB WD Red (though one or two have failed over
> > time)
> > and two Samsung 850 PRO, IIRC.  I don't have an old session anymore.
> > 
> > These WD Red are slow to begin with.  IIRC, both SSDs failed and I removed
> > them.
> > 
> > The other instance didn't use SSDs but 6x2TB HGST Ultrastar.  Those aren't
> > exactly slow but ZFS is slow.
> 
> 
> Those HDD's should be fine with ZFS; but those SSD's are desktop drives, 
> not cache devices.  That said, I am making the same mistake with Intel 
> SSD 520 Series.  I have considered switching to one Intel Optane Memory 
> Series and a PCIe 4x adapter card in each server.

Isn't that very expensive and wears out just as well?  Wouldn't it be better to
have the cache in RAM?


> Please run and post the relevant command for LVM, btrfs, whatever.

Well, what would that tell you?

> [...]
> > 
> > > What is the make and model of your controller cards?
> > 
> > They're HP smart array P410.  FreeBSD doesn't seem to support those.
> 
> 
> I use the LSI 9207-8i with "IT Mode" firmware (e.g. host bus adapter, 
> not RAID):

Well, I couldn't get those when I wanted them.  Since I didn't plan on using
ZFS, the P410s have to do.

> [...]
> > ... the data to back up is mostly (or even all) on btrfs. ... copy the
> > files over with rsync.  ...
> > the data comes from different machines and all backs up to one volume.
> 
> 
> I suggest creating a ZFS pool with a mirror vdev of two HDD's.

That would be way too small.

>   If you 
> can get past your dislike of SSD's,

I don't dislike them.  I'm using them where they give me advantages, and I don't
use them where they would give me disadvantages.

>  add a mirror of two SSD's as a 
> dedicated dedup vdev.  (These will not see the hard usage that cache 
> devices get.)

I think I have 2x80GB SSDs that are currently not in use.

>   Create a filesystem 'backup'.  Create child filesystems, 
> one for each host.  Create grandchild filesystems, one for the root 
> filesystem on each host.

Huh?  What's with these relationships?

>   Set up daily rsync backups of the root 
> filesystems on the various hosts to the ZFS grandchild filesystems.  Set 
> up zfs-auto-snapshot to take daily snapshots of everything, and retain 
> 10 snapshots.  Then watch what happens.

What do you expect to happen?  I'm thinking about changing my backup server ...
In any case, I need to do more homework first.



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-11 Thread hw
On Thu, 2022-11-10 at 23:12 -0500, Michael Stone wrote:
> On Thu, Nov 10, 2022 at 08:32:36PM -0500, Dan Ritter wrote:
> > * RAID 5 and 6 restoration incurs additional stress on the other
> >  disks in the RAID which makes it more likely that one of them
> >  will fail.
> 
> I believe that's mostly apocryphal; I haven't seen science backing that 
> up, and it hasn't been my experience either.

Maybe it's a myth that comes about when someone rebuilds a RAID and yet another
disk in it fails (because they're all the same age and have been running under
the same conditions).  It's easy to jump to conclusions, and easy jumps are what
people like.

OTOH, it's not too implausible that a disk might fail just when it's working
particularly hard.  If it hadn't been working so hard, maybe it would have
failed later because it had more time to wear out or when the ambient
temperatures are higher in the summer.  So who knows?

> >  The advantage of RAID 6 is that it can then recover
> >  from that...
> 
> The advantage to RAID 6 is that it can tolerate a double disk failure. 
> With RAID 1 you need 3x your effective capacity to achieve that and even 
> though storage has gotten cheaper, it hasn't gotten that cheap. (e.g., 
> an 8 disk RAID 6 has the same fault tolerance as an 18 disk RAID 1 of 
> equivalent capacity, ignoring pointless quibbling over probabilities.)

so with RAID6, 3x8 is 18 instead of 24

With 18 disks more can go wrong than with 8.  That's all kinda confusing.



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 20:32 -0500, Dan Ritter wrote:
> Linux-Fan wrote: 
> 
> 
> [...]
> * RAID 5 and 6 restoration incurs additional stress on the other
>   disks in the RAID which makes it more likely that one of them
>   will fail. The advantage of RAID 6 is that it can then recover
>   from that...

Disks are always being stressed when used, and they're being stressed as well
when other types of RAID arrays than 5 or 6 are being rebuilt.  And is there
evidence that disks fail *because* RAID arrays are being rebuilt or would they
have failed anyway when stressed?

> * RAID 10 gets you better read performance in terms of both
>   throughput and IOPS relative to the same number of disks in
>   RAID 5 or 6. Most disk activity is reading.
> 

and it requires more disks for the same capacity

For disks used for backups, most activity is writing.  That goes for some other
purposes as well.

> [...]
> 
>  The power of open source software is that we can make
> opportunities open to people with small budgets that are
> otherwise reserved for people with big budgets.

That's only one advantage.

> Most of the computers in my house have one disk. If I value any
> data on that disk,

Then you don't use only one disk but redundancy.  There's also your time and
nerves you might value.

>  I back it up to the server, which has 4 4TB
> disks in ZFS RAID10. If a disk fails in that, I know I can
> survive that and replace it within 24 hours for a reasonable
> amount of money -- rather more reasonable in the last few
> months.

How do you get a new suitable disk within 24 hours?  For reasonable amounts of
money?  Disk prices keep changing all the time.

Backups are no substitute for redundancy.



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 22:37 +0100, Linux-Fan wrote:
> hw writes:
> 
> > On Wed, 2022-11-09 at 19:17 +0100, Linux-Fan wrote:
> > > hw writes:
> > > > On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> > > > > Le 09/11/2022 à 12:41, hw a écrit :
> 
> [...]
> 
> > > > I'd
> > > > have to use mdadm to create a RAID5 (or use the hardware RAID but that  
> > > > isn't
> > > 
> > > AFAIK BTRFS also includes some integrated RAID support such that you do  
> > > not necessarily need to pair it with mdadm.
> > 
> > Yes, but RAID56 is broken in btrfs.
> > 
> > > It is advised against using for RAID 
> > > 5 or 6 even in most recent Linux kernels, though:
> > > 
> > > https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#raid56-status-and-recommended-practices
> > 
> > Yes, that's why I would have to use btrfs on mdadm when I want to make a  
> > RAID5.
> > That kinda sucks.
> > 
> > > RAID 5 and 6 have their own issues you should be aware of even when  
> > > running 
> > > them with the time-proven and reliable mdadm stack. You can find a lot of 
> > > interesting results by searching for “RAID5 considered harmful” online.  
> > > This 
> > > one is the classic that does not seem to make it to the top results,  
> > > though:
> > 
> > Hm, really?  The only time that RAID5 gave me trouble was when the hardware 
> 
> [...]
> 
> I have never used RAID5 so how would I know :)
> 
> I think the arguments of the RAID5/6 critics summarized were as follows:
> 
>  * Running in a RAID level that is 5 or 6 degrades performance while
>    a disk is offline significantly. RAID 10 keeps most of its speed and
>    RAID 1 only degrades slightly for most use cases.

It's fine when they pay for the additional disks required by RAID1 or 10.

>  * During restore, RAID5 and 6 are known to degrade performance more compared
>    to restoring one of the other RAID levels.

When that matters, don't use this RAID level.  It's not an issue about keeping
data safe.

>  * Disk space has become so cheap that the savings of RAID5 may
>    no longer rectify the performance and reliability degradation
>    compared to RAID1 or 10.

When did that happen?  Disk space is anything but cheap and the electricity
needed to run these disks is anything but cheap.  Needing more disks for the
same capacity is anything but cheap because a server can fit only so many disks,
and the more disks you put into a server, the more expensive that server gets.
You might even need more servers, further increasing costs for hardware and
electricity.

> All of these arguments come from a “server” point of view where it is  
> assumed that
> 
>  (1) You win something by running the server so you can actually
>  tell that there is an economic value in it. This allows for
>  arguments like “storage is cheap” which may not be the case at
>  all if you are using up some tightly limited private budget.

You're not going to win anything when your storage gets too expensive.

>  (2) Uptime and delivering the service is paramount.

That can get expensive very quickly.

>  Hence there
>  are some considerations regarding the online performance of
>  the server while the RAID is degraded and while it is restoring.
>  If you are fine to take your machine offline or accept degraded
>  performance for prolonged times then this does not apply of
>  course.

Sure, when you have issues like that, find a different solution.

>  If you do not value the uptime making actual (even
>  scheduled) copies of the data may be recommendable over
>  using a RAID because such schemes may (among other advantages)
>  protect you from accidental file deletions, too.

Huh?

> Also note that in today's computing landscape, not all unwanted file  
> deletions are accidental. With the advent of “crypto trojans” adversaries  
> exist that actually try to encrypt or delete your data to extort a ransom.
> 

When have unwanted file deletions been exclusively accidental?

> > More than one disk can fail?  Sure can, and it's one of the reasons why I  
> > make
> > backups.
> > 
> > You also have to consider costs.  How much do you want to spend on storage  
> > and
> > and on backups?  And do you want make yourself crazy worrying about your  
> > data?
> 
> I am pretty sure that if I separate my PC into GPU, CPU, RAM and Storage, I  
> spent most on storage actually. Well established schemes of redundancy and  
> backups make me worry less about my data.

Well, what did they say: Disk space has become cheap.  Yeah, for sure ... :)

> I still worry enough about backups to have written my own software:
> https://masysma.net/32/jmbb.xhtml
> and that I am also evaluating new developments in that area to probably  
> replace my self-written program by a more reliable (because used by more  
> people!) alternative:
> https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtml
> > 

cool :)

> [...]
> > 
> > Is anyone still using ext4?  I'm not saying it's bad or 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 14:28 +0100, DdB wrote:
> Am 10.11.2022 um 13:03 schrieb Greg Wooledge:
> > If it turns out that '?' really is the filename, then it becomes a ZFS
> > issue with which I can't help.
> 
> just tested: i could create, rename, delete a file with that name on a
> zfs filesystem just as with any other fileystem.
> 
> But: i recall having seen an issue with corrupted filenames in a
> snapshot once (several years ago though). At the time, i did resort to
> send/recv to get the issue straightened out.

Well, the ZFS version in use is ancient ...  But that I could rename it is a
good sign.

> But it is very much more likely, that the filename '?' is entirely
> unrelated to zfs. Although zfs is perceived as being easy to handle
> (only 2 commands need to be learned: zpool and zfs),

Ha, it's far from easy.  These commands have many options ...

>  it takes a while to
> get acquainted with all the concepts and behaviors. Take some time to
> play with an installation (in a vm or just with a file based pool should
> be considered).

Ah, yes, that's a good idea :)



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 08:48 -0500, Dan Ritter wrote:
> hw wrote: 
> > And I've been reading that when using ZFS, you shouldn't make volumes with
> > more
> > than 8 disks.  That's very inconvenient.
> 
> 
> Where do you read these things?

I read things like this:

"Sun™ recommends that the number of devices used in a RAID-Z configuration be
between three and nine. For environments requiring a single pool consisting of
10 disks or more, consider breaking it up into smaller RAID-Z groups. If two
disks are available, ZFS mirroring provides redundancy if required. Refer to
zpool(8) for more details."

That's on https://docs.freebsd.org/en/books/handbook/zfs/

I don't remember where I read about 8, could have been some documentation about
FreeNAS.  I've also been reading different amounts of RAM required for
deduplication, so who knows what's true.

> The number of disks in a zvol can be optimized, depending on
> your desired redundancy method, total number of drives, and
> tolerance for reduced performance during resilvering. 
> 
> Multiple zvols together form a zpool. Filesystems are allocated from
> a zpool.
> 
> 8 is not a magic number.
> 

You mean like here:
https://pthree.org/2012/12/21/zfs-administration-part-xiv-zvols/

That seems rather complicated.  I guess it's just a bad guide.  I'll find out if
I use ZFS.



Re: weird directory entry on ZFS volume (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread David Christensen

On Thu, Nov 10, 2022 at 05:54:00AM +0100, hw wrote:

ls -la
insgesamt 5
drwxr-xr-x  3 namefoo namefoo    3 16. Aug 22:36 .
drwxr-xr-x 24 root    root    4096  1. Nov 2017  ..
drwxr-xr-x  2 namefoo namefoo    2 21. Jan 2020  ?
namefoo@host /srv/datadir $ ls -la '?'
ls: Zugriff auf ? nicht möglich: Datei oder Verzeichnis nicht gefunden
namefoo@host /srv/datadir $


This directory named ? appeared on a ZFS volume for no reason and I can't access
it and can't delete it.  A scrub doesn't repair it.  It doesn't seem to do any
harm yet, but it's annoying.

Any idea how to fix that?



2022-11-10 21:24:23 dpchrist@f3 ~/foo
$ freebsd-version ; uname -a
12.3-RELEASE-p7
FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p6 FreeBSD 
12.3-RELEASE-p6 GENERIC  amd64


2022-11-10 21:24:45 dpchrist@f3 ~/foo
$ bash --version
GNU bash, version 5.2.0(3)-release (amd64-portbld-freebsd12.3)
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 



This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

2022-11-10 21:24:52 dpchrist@f3 ~/foo
$ ll
total 13
drwxr-xr-x   2 dpchrist  dpchrist   2 2022/11/10 21:24:21 .
drwxr-xr-x  14 dpchrist  dpchrist  30 2022/11/10 21:24:04 ..

2022-11-10 21:25:03 dpchrist@f3 ~/foo
$ touch '?'

2022-11-10 21:25:08 dpchrist@f3 ~/foo
$ ll
total 14
drwxr-xr-x   2 dpchrist  dpchrist   3 2022/11/10 21:25:08 .
drwxr-xr-x  14 dpchrist  dpchrist  30 2022/11/10 21:24:04 ..
-rw-r--r--   1 dpchrist  dpchrist   0 2022/11/10 21:25:08 ?

2022-11-10 21:25:11 dpchrist@f3 ~/foo
$ rm '?'
remove ?? y

2022-11-10 21:25:19 dpchrist@f3 ~/foo
$ ll
total 13
drwxr-xr-x   2 dpchrist  dpchrist   2 2022/11/10 21:25:19 .
drwxr-xr-x  14 dpchrist  dpchrist  30 2022/11/10 21:24:04 ..


David



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread David Christensen

On 11/10/22 07:44, hw wrote:

On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:

On 11/9/22 00:24, hw wrote:
  > On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:



Be careful that you do not confuse a ~33 GiB full backup set, and 78
snapshots over six months of that same full backup set, with a full
backup of 3.5 TiB of data.



The full backup isn't deduplicated?



"Full", "incremental", etc., occur at the backup utility level -- e.g. 
on top of the ZFS filesystem.  (All of my backups are full backups using 
rsync.)  ZFS deduplication occurs at the block level -- e.g. the bottom 
of the ZFS filesystem.  If your backup tool is writing to, or reading 
from, a ZFS filesystem, the backup tool is oblivious to the internal 
operations of ZFS (compression or none, deduplicaton or none, etc.) so 
long as the filesystem "just works".




Writing to a ZFS filesystem with deduplication is much slower than
simply writing to, say, an ext4 filesystem -- because ZFS has to hash
every incoming block and see if it matches the hash of any existing
block in the destination pool.  Storing the existing block hashes in a
dedicated dedup virtual device will expedite this process.
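
As a rough sketch of both knobs (pool, dataset and device names are made up, and
the dedicated dedup vdev class needs a reasonably recent OpenZFS):

  zfs set dedup=on tank/backup                  # dedup only where it pays off
  zpool add tank dedup mirror nvme0n1 nvme1n1   # keep the dedup table on fast, mirrored devices
  zpool status -D tank                          # -D prints dedup table statistics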


But when it needs to write almost nothing because almost everything gets
deduplicated, can't it be faster than having to write everything?



There are many factors that affect how fast ZFS can write files to disk. 
 You will get the best answers if you run benchmarks using your 
hardware and data.




  >> I run my backup script each night.  It uses rsync to copy files and
  >
  > Aww, I can't really do that because my server eats like 200-300W because it has
  > so many disks in it.  Electricity is outrageously expensive here.


Perhaps platinum rated power supplies?  Energy efficient HDD's/ SSD's?


If you pay for it ... :)

Running it once in a while for some hours to make backups is still possible.
Replacing the hardware is way more expensive.



My SOHO server has ~1 TiB of data.  A ZFS snapshot takes a few seconds. 
ZFS incremental replication to the backup server proceeds at anywhere 
from 0 to 50 MB/s, depending upon how much content is new or has changed.
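
That replication is essentially a recursive snapshot plus an incremental send; a
sketch with made-up dataset, snapshot and host names:

  zfs snapshot -r pool/backup@2022-11-15
  zfs send -R -i pool/backup@2022-11-14 pool/backup@2022-11-15 \
      | ssh backupserver zfs receive -du tank/copies   # send only what changed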





  > Sounds like a nice setup.  Does that mean you use snapshots to keep
multiple
  > generations of backups and make backups by overwriting everything
after you made
  > a snapshot?

Yes.


I start thinking more and more that I should make use of snapshots.



Taking snapshots is fast and easy.  The challenge is deciding when to 
destroy them.



zfs-auto-snapshot can do both automatically:

https://packages.debian.org/bullseye/zfs-auto-snapshot

https://manpages.debian.org/bullseye/zfs-auto-snapshot/zfs-auto-snapshot.8.en.html
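
Under the hood that boils down to plain zfs snapshot and zfs destroy plus a
retention policy; a hand-rolled sketch with made-up names:

  zfs snapshot -r pool/backup@daily-$(date +%F)   # take today's snapshot
  zfs list -t snapshot -r pool/backup             # see what exists
  zfs destroy -r pool/backup@daily-2022-08-15     # expire an old one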



Without deduplication or compression, my backup set and 78 snapshots
would require 3.5 TiB of storage.  With deduplication and compression,
they require 86 GiB of storage.


Wow that's quite a difference!  What makes this difference, the compression or
the deduplication? 



Deduplication.



When you have snapshots, you would store only the differences from one snapshot
to the next, and that would mean that there aren't so many duplicates that could
be deduplicated.



I do not know -- I have not crawled the ZFS code; I just use it.



Users can recover their own files without needing help from a system
administrator.


You have users who know how to get files out of snapshots?



Not really; but the feature is there.



   For compressed and/or encrypted archives, image, etc., I do not use
   compression or de-duplication
  >>>
  >>> Yeah, they wouldn't compress.  Why no deduplication?
  >>
  >>
  >> Because I very much doubt that there will be duplicate blocks in
such files.
  >
  > Hm, would it hurt?

Yes.  ZFS deduplication is resource intensive.


But you're using it already.



I have learned the hard way to only use deduplication when it makes sense.
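
In ZFS that choice is per dataset, so it can be switched on only where it pays
off; a sketch with invented names:

  zfs create -o compression=lz4 -o dedup=on  tank/backup     # lots of duplicate blocks
  zfs create -o compression=off -o dedup=off tank/archives   # already-compressed data
  zfs get compression,dedup tank/backup tank/archives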



What were the makes and models of the 6 disks?  Of the SSD's?  If you
have a 'zpool status' console session from then, please post it.


They were (and still are) 6x4TB WD Red (though one or two have failed over time)
and two Samsung 850 PRO, IIRC.  I don't have an old session anymore.

These WD Red are slow to begin with.  IIRC, both SSDs failed and I removed them.

The other instance didn't use SSDs but 6x2TB HGST Ultrastar.  Those aren't
exactly slow but ZFS is slow.



Those HDD's should be fine with ZFS; but those SSD's are desktop drives, 
not cache devices.  That said, I am making the same mistake with Intel 
SSD 520 Series.  I have considered switching to one Intel Optane Memory 
Series and a PCIe 4x adapter card in each server.




MySQL appears to have the ability to use raw disks.  Tuned correctly,
this should give the best results:

https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tablespace.html#innodb-raw-devices


Could mysql 5.6 already do that?  I'll have to see if mariadb can do that now
...



I do not know -- I do not run MySQL or Maria.




Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Michael Stone

On Thu, Nov 10, 2022 at 08:32:36PM -0500, Dan Ritter wrote:

* RAID 5 and 6 restoration incurs additional stress on the other
 disks in the RAID which makes it more likely that one of them
 will fail.


I believe that's mostly apocryphal; I haven't seen science backing that 
up, and it hasn't been my experience either.



 The advantage of RAID 6 is that it can then recover
 from that...


The advantage to RAID 6 is that it can tolerate a double disk failure. 
With RAID 1 you need 3x your effective capacity to achieve that and even 
though storage has gotten cheaper, it hasn't gotten that cheap. (e.g., 
an 8 disk RAID 6 gives 6 disks of usable capacity and survives any two 
failures; matching that with mirroring means 3-way mirrors, i.e. an 18 
disk RAID 1 of equivalent capacity, ignoring pointless quibbling over 
probabilities.)




Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Dan Ritter
Linux-Fan wrote: 
> I think the arguments of the RAID5/6 critics summarized were as follows:
> 
> * Running in a RAID level that is 5 or 6 degrades performance while
>   a disk is offline significantly. RAID 10 keeps most of its speed and
>   RAID 1 only degrades slightly for most use cases.
> 
> * During restore, RAID5 and 6 are known to degrade performance more compared
>   to restoring one of the other RAID levels.

* RAID 5 and 6 restoration incurs additional stress on the other
  disks in the RAID which makes it more likely that one of them
  will fail. The advantage of RAID 6 is that it can then recover
  from that...

* RAID 10 gets you better read performance in terms of both
  throughput and IOPS relative to the same number of disks in
  RAID 5 or 6. Most disk activity is reading.

> * Disk space has become so cheap that the savings of RAID5 may
>   no longer justify the performance and reliability degradation
>   compared to RAID1 or 10.

I think that's a case-by-case basis. Every situation is
different, and should be assessed for cost, reliability and
performance concerns.

> All of these arguments come from a “server” point of view where it is
> assumed that
> 
> (1) You win something by running the server so you can actually
> tell that there is an economic value in it. This allows for
> arguments like “storage is cheap” which may not be the case at
> all if you are using up some tightly limited private budget.
> 
> (2) Uptime and delivering the service is paramount. Hence there
> are some considerations regarding the online performance of
> the server while the RAID is degraded and while it is restoring.
> If you are fine to take your machine offline or accept degraded
> performance for prolonged times then this does not apply of
> course. If you do not value the uptime making actual (even
> scheduled) copies of the data may be recommendable over
> using a RAID because such schemes may (among other advantages)
> protect you from accidental file deletions, too.

Even in household situations, knowing that you could have traded $100
last year for a working computer right now is an incentive to set up
disk mirroring. If you're storing lots of data that other
people in the household depend on, that might factor in to your
decisions, too.

Everybody has a budget. Some have big budgets, and some have
small. The power of open source software is that we can make
opportunities open to people with small budgets that are
otherwise reserved for people with big budgets.

Most of the computers in my house have one disk. If I value any
data on that disk, I back it up to the server, which has 4 4TB
disks in ZFS RAID10. If a disk fails in that, I know I can
survive that and replace it within 24 hours for a reasonable
amount of money -- rather more reasonable in the last few
months.
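
A pool like that is simply two mirror vdevs in one zpool create; a sketch with
placeholder device ids:

  zpool create tank \
      mirror /dev/disk/by-id/ata-DISK0 /dev/disk/by-id/ata-DISK1 \
      mirror /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3
  zpool replace tank /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK4   # swap out a failed member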

> > Is anyone still using ext4?  I'm not saying it's bad or anything, it
> > only seems that it has gone out of fashion.
> 
> IIRC it's still Debian's default. It's my file system of choice unless I have
> very specific reasons against it. I have never seen it fail outside of
> hardware issues. Performance of ext4 is quite acceptable out of the box.
> E.g. it seems to be slightly faster than ZFS for my use cases. Almost every
> Linux live system can read it. There are no problematic licensing or
> stability issues whatsoever. By its popularity it's probably one of the most
> widely-deployed Linux file systems which may enhance the chance that
> whatever problem you incur with ext4 someone else has had before...

All excellent reasons to use ext4.

-dsr-



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread DdB

Am 10.11.2022 um 22:37 schrieb Linux-Fan:
> Ext4 still does not offer snapshots. The traditional way to do
> snapshots outside of fancy BTRFS and ZFS file systems is to add LVM
> to the equation although I do not have any useful experience with
> that. Specifically, I am not using snapshots at all so far, besides
> them being readily available on ZFS

Yes, although i am heavily dependent on zfs, my OS resides on an
nvme-ssd with ext4. I rsync it as needed to an image file that i keep on
compressed zfs, and which i snapshot, back up and so on, rather than
the ext4 partition itself. - Seems like a good compromise to me.
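
Sketched out, that workflow might look roughly like this (dataset, image size and
mountpoint are made up):

  zfs create -o compression=lz4 tank/images
  truncate -s 100G /tank/images/root-ext4.img
  mkfs.ext4 -F /tank/images/root-ext4.img          # -F: it's a regular file, not a device
  mkdir -p /mnt/rootcopy
  mount -o loop /tank/images/root-ext4.img /mnt/rootcopy
  rsync -aAXH --one-file-system --delete / /mnt/rootcopy/   # stays off /proc, /sys, other mounts
  umount /mnt/rootcopy
  zfs snapshot tank/images@$(date +%F)             # the image file is now versioned by ZFS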



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Linux-Fan

hw writes:


On Wed, 2022-11-09 at 19:17 +0100, Linux-Fan wrote:
> hw writes:
> > On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> > > Le 09/11/2022 à 12:41, hw a écrit :


[...]


> > I'd
> > have to use mdadm to create a RAID5 (or use the hardware RAID but that  
> > isn't

>
> AFAIK BTRFS also includes some integrated RAID support such that you do  
> not necessarily need to pair it with mdadm.


Yes, but RAID56 is broken in btrfs.

> It is advised against using for RAID 
> 5 or 6 even in most recent Linux kernels, though:
>
> 
https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#raid56-status-and-recommended-practices

Yes, that's why I would have to use btrfs on mdadm when I want to make a  
RAID5.

That kinda sucks.
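
For what it's worth, that layering is only a few commands; a sketch with
hypothetical device names:

  mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[b-g]
  mkfs.btrfs -L data /dev/md0
  mount /dev/md0 /srv/data

btrfs on top of md still checksums everything, but since it only holds one copy
of the data it can detect silent corruption, not repair it.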

> RAID 5 and 6 have their own issues you should be aware of even when  
> running 

> them with the time-proven and reliable mdadm stack. You can find a lot of 
> interesting results by searching for “RAID5 considered harmful” online.  
> This 
> one is the classic that does not seem to make it to the top results,  
> though:


Hm, really?  The only time that RAID5 gave me trouble was when the hardware  


[...]

I have never used RAID5 so how would I know :)

I think the arguments of the RAID5/6 critics summarized were as follows:

* Running in a RAID level that is 5 or 6 degrades performance while
  a disk is offline significantly. RAID 10 keeps most of its speed and
  RAID 1 only degrades slightly for most use cases.

* During restore, RAID5 and 6 are known to degrade performance more compared
  to restoring one of the other RAID levels.

* Disk space has become so cheap that the savings of RAID5 may
  no longer justify the performance and reliability degradation
  compared to RAID1 or 10.

All of these arguments come from a “server” point of view where it is  
assumed that


(1) You win something by running the server so you can actually
tell that there is an economic value in it. This allows for
arguments like “storage is cheap” which may not be the case at
all if you are using up some tightly limited private budget.

(2) Uptime and delivering the service is paramount. Hence there
are some considerations regarding the online performance of
the server while the RAID is degraded and while it is restoring.
If you are fine to take your machine offline or accept degraded
performance for prolonged times then this does not apply of
course. If you do not value the uptime making actual (even
scheduled) copies of the data may be recommendable over
using a RAID because such schemes may (among other advantages)
protect you from accidental file deletions, too.

Also note that in today's computing landscape, not all unwanted file  
deletions are accidental. With the advent of “crypto trojans” adversaries  
exist that actually try to encrypt or delete your data to extort a ransom.


More than one disk can fail?  Sure can, and it's one of the reasons why I make
backups.

You also have to consider costs.  How much do you want to spend on storage and
on backups?  And do you want to make yourself crazy worrying about your data?


I am pretty sure that if I separate my PC into GPU, CPU, RAM and Storage, I  
spent most on storage actually. Well established schemes of redundancy and  
backups make me worry less about my data.


I still worry enough about backups to have written my own software:
https://masysma.net/32/jmbb.xhtml
and that I am also evaluating new developments in that area to probably  
replace my self-written program by a more reliable (because used by more  
people!) alternative:

https://masysma.net/37/backup_tests_borg_bupstash_kopia.xhtml


> https://www.baarf.dk/BAARF/RAID5_versus_RAID10.txt
>
> If you want to go with mdadm (irrespective of RAID level), you might also 
> consider running ext4 and trade the complexity and features of the  
> advanced file systems for a good combination of stability and support.


Is anyone still using ext4?  I'm not saying it's bad or anything, it only  
seems that it has gone out of fashion.


IIRC it's still Debian's default. It's my file system of choice unless I have  
very specific reasons against it. I have never seen it fail outside of  
hardware issues. Performance of ext4 is quite acceptable out of the box.  
E.g. it seems to be slightly faster than ZFS for my use cases.  
Almost every Linux live system can read it. There are no problematic  
licensing or stability issues whatsoever. By its popularity it's probably one  
of the most widely-deployed Linux file systems which may enhance the chance  
that whatever problem you incur with ext4 someone else has had before...



I'm considering using snapshots.  Ext4 didn't have those last time I checked.


Ext4 still does not offer snapshots. The traditional way to do snapshots  
outside of fancy BTRFS and ZFS file systems is to add LVM to the equation  
although I do not have any useful experience with that. Specifically, I am  
not 

Re: weird directory entry on ZFS volume (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread Greg Wooledge
On Thu, Nov 10, 2022 at 06:54:31PM +0100, hw wrote:
> Ah, yes.  I tricked myself because I don't have hd installed,

It's just a symlink to hexdump.

lrwxrwxrwx 1 root root 7 Jan 20  2022 /usr/bin/hd -> hexdump

unicorn:~$ dpkg -S usr/bin/hd
bsdextrautils: /usr/bin/hd
unicorn:~$ dpkg -S usr/bin/hexdump
bsdextrautils: /usr/bin/hexdump

> It's an ancient Gentoo

A.  Anyway, from the Debian man page:

   -C, --canonical
  Canonical  hex+ASCII display.  Display the input offset in hexa‐
  decimal, followed by sixteen space-separated, two-column,  hexa‐
  decimal  bytes, followed by the same sixteen bytes in %_p format
  enclosed in '|' characters.  Invoking the program as hd  implies
  this option.

Why on earth the default format of "hexdump" uses that weird 16-bit
little endian nonsense is beyond me.
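
For comparison, feeding the same bytes as in the transcripts above (the 0xc2 file
plus "waht") through both forms, assuming bash's printf, gives roughly:

  $ printf '\xc2waht' | hexdump
  0000000 77c2 6861 0074
  0000005
  $ printf '\xc2waht' | hexdump -C
  00000000  c2 77 61 68 74                                    |.waht|
  00000005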



Re: weird directory entry on ZFS volume (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 09:30 -0500, Greg Wooledge wrote:
> On Thu, Nov 10, 2022 at 02:48:28PM +0100, hw wrote:
> > On Thu, 2022-11-10 at 07:03 -0500, Greg Wooledge wrote:
> 
> [...]
> > printf '%s\0' * | hexdump
> > 000 00c2 6177 7468     
> > 007
> 
> I dislike this output format, but it looks like there are two files
> here.  The first is 0xc2, and the second is 0x77 0x61 0x68 0x74 if
> I'm reversing and splitting the silly output correctly.  (This spells
> "waht", if I got it right.)
> > 

Ah, yes.  I tricked myself because I don't have hd installed, so I redirected
the output of printf into a file --- which I wanted to name 'what' but I
mistyped as 'waht' --- so I could load it into emacs and use hexl-mode.  But the
display kinda sucked and I found I have hexdump installed and used that. 
Meanwhile I totally forgot about the file I had created.

> [...]
> > 
> The file in question appears to have a name which is the single byte 0xc2.
> Since that's not a valid UTF-8 character, ls chooses something to display
> instead.  In your case, it chose a '?' character.

I'm the only one who can create files there, and I didn't create that.  Using
0xc2 as a file name makes it very unlikely that I created that file
accidentally.

>   I'm guessing this is on
> an older release of Debian.

It's an ancient Gentoo which couldn't be updated in years because they broke the
update process.  Back then, Gentoo was the only Linux distribution that didn't
need fuse for ZFS that I could find.

> In my case, it does this:
> 
> unicorn:~$ mkdir /tmp/x && cd "$_"
> unicorn:/tmp/x$ touch $'\xc2'
> unicorn:/tmp/x$ ls -la
> total 80
> -rw-r--r--  1 greg greg 0 Nov 10 09:21 ''$'\302'
> drwxr-xr-x  2 greg greg  4096 Nov 10 09:21  ./
> drwxrwxrwt 20 root root 73728 Nov 10 09:21  ../
> 
> In my version of ls, there's a --quoting-style= option that can help
> control what you see.  But that's a tangent you can explore later.
> 
> Since we know the actual name of the file (subdirectory) now, let's just
> rename it to something sane.
> 
> mv $'\xc2' subdir
> 
> Then you can investigate it, remove it, or do whatever else you want.

Cool, I've renamed it, thank you very much :)  I'm afraid that the file system
will crash when I remove it ...  It's an empty directory.  Ever since I noticed
it, I couldn't do anything with it and I thought it's some bug in the file
system.



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread hw
On Thu, 2022-11-10 at 10:47 +0100, DdB wrote:
> Am 10.11.2022 um 06:38 schrieb David Christensen:
> > What is your technique for defragmenting ZFS?
> well, that was meant more or less a joke: there is none apart from
> offloading all the data, destroying and rebuilding the pool, and filling
> it again from the backup. But i do it from time to time if fragmentation
> got high, the speed improvements are obvious. OTOH the process takes
> days on my SOHO servers
> 

Does the faster access afterwards save you more time than the days you spend on
defragmenting it?

Perhaps after so many days of not defragging, but how many days?

Maybe use an archive pool that doesn't get deleted from?



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread hw
On Wed, 2022-11-09 at 21:36 -0800, David Christensen wrote:
> On 11/9/22 00:24, hw wrote:
>  > On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:
> 
>  > Hmm, when you can backup like 3.5TB with that, maybe I should put 
> FreeBSD on my
>  > server and give ZFS a try.  Worst thing that can happen is that it 
> crashes and
>  > I'd have made an experiment that wasn't successful.  Best thing, I 
> guess, could
>  > be that it works and backups are way faster because the server 
> doesn't have to
>  > actually write so much data because it gets deduplicated and reading 
> from the
>  > clients is faster than writing to the server.
> 
> 
> Be careful that you do not confuse a ~33 GiB full backup set, and 78 
> snapshots over six months of that same full backup set, with a full 
> backup of 3.5 TiB of data.  I would suggest a 10 GiB pool to backup the 
> latter.

The full backup isn't deduplicated?

> Writing to a ZFS filesystem with deduplication is much slower than 
> simply writing to, say, an ext4 filesystem -- because ZFS has to hash 
> every incoming block and see if it matches the hash of any existing 
> block in the destination pool.  Storing the existing block hashes in a 
> dedicated dedup virtual device will expedite this process.

But when it needs to write almost nothing because almost everything gets
deduplicated, can't it be faster than having to write everything?

>  >> I run my backup script each night.  It uses rsync to copy files and
>  >
>  > Aww, I can't really do that because my server eats like 200-300W because it has
>  > so many disks in it.  Electricity is outrageously expensive here.
> 
> 
> Perhaps platinum rated power supplies?  Energy efficient HDD's/ SSD's?

If you pay for it ... :)

Running it once in a while for some hours to make backups is still possible. 
Replacing the hardware is way more expensive.

> [...]
>  > Sounds like a nice setup.  Does that mean you use snapshots to keep 
> multiple
>  > generations of backups and make backups by overwriting everything 
> after you made
>  > a snapshot?
> 
> Yes.

I start thinking more and more that I should make use of snapshots.

>  > In that case, is deduplication that important/worthwhile?  You're not
>  > duplicating it all by writing another generation of the backup but 
> store only
>  > what's different through making use of the snapshots.
> 
> Without deduplication or compression, my backup set and 78 snapshots 
> would require 3.5 TiB of storage.  With deduplication and compression, 
> they require 86 GiB of storage.

Wow that's quite a difference!  What makes this difference, the compression or
the deduplication?  When you have snapshots, you would store only the
differences from one snapshot to the next, and that would mean that there aren't
so many duplicates that could be deduplicated.

>  > ... I only never got around to figure [ZFS snapshots] out because I 
> didn't have the need.
> 
> 
> I accidentally trash files on occasion.  Being able to restore them 
> quickly and easily with a cp(1), scp(1), etc., is a killer feature.

indeed

> Users can recover their own files without needing help from a system 
> administrator.

You have users who know how to get files out of snapshots?

>  > But it could also be useful for "little" things like taking a 
> snapshot of the
>  > root volume before updating or changing some configuration and being 
> able to
>  > easily to undo that.
> 
> 
> FreeBSD with ZFS-on-root has a killer feature called "Boot Environments" 
> that has taken that idea to the next level:
> 
> https://klarasystems.com/articles/managing-boot-environments/

That's really cool.  Linux is missing out on a lot by treating ZFS as an alien.

I guess btrfs could, in theory, make something like boot environments possible,
but you can't really boot from btrfs because it'll fail to boot as soon as the
boot volume is degraded, for example when a disk has failed.  Then you're screwed
because you can't log in through ssh to fix anything and have to physically go to
the machine to get it back up.  That's a non-option, so you have to boot from
something other than btrfs.

>  >> I have 3.5 TiB of backups.
> 
> 
> It is useful to group files with similar characteristics (size, 
> workload, compressibility, duplicates, backup strategy, etc.) into 
> specific ZFS filesystems (or filesystem trees).  You can then adjust ZFS 
> properties and backup strategies to match.

That's a good idea.

>   For compressed and/or encrypted archives, image, etc., I do not use
>   compression or de-duplication
>  >>>
>  >>> Yeah, they wouldn't compress.  Why no deduplication?
>  >>
>  >>
>  >> Because I very much doubt that there will be duplicate blocks in 
> such files.
>  >
>  > Hm, would it hurt?
> 
> 
> Yes.  ZFS deduplication is resource intensive.

But you're using it already.

>  > Oh it's not about performance when degraded, but about performance. 
> IIRC when
>  > you have a ZFS pool that uses the equivalent 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Dan Ritter
Brad Rogers wrote: 
> On Thu, 10 Nov 2022 08:48:43 -0500
> Dan Ritter  wrote:
> 
> Hello Dan,
> 
> >8 is not a magic number.
> 
> Clearly, you don't read Terry Pratchett.   :-)

In the context of ZFS, 8 is not a magic number.

May you be ridiculed by Pictsies.

-dsr-



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Brad Rogers
On Thu, 10 Nov 2022 08:48:43 -0500
Dan Ritter  wrote:

Hello Dan,

>8 is not a magic number.

Clearly, you don't read Terry Pratchett.   :-)

-- 
 Regards  _   "Valid sig separator is {dash}{dash}{space}"
 / )  "The blindingly obvious is never immediately apparent"
/ _)rad   "Is it only me that has a working delete key?"
It's only the children of the f** wealthy tend to be good looking
Ugly - The Stranglers




Re: weird directory entry on ZFS volume (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread Greg Wooledge
On Thu, Nov 10, 2022 at 02:48:28PM +0100, hw wrote:
> On Thu, 2022-11-10 at 07:03 -0500, Greg Wooledge wrote:
> good idea:
> 
> printf %s * | hexdump
> 000 77c2 6861 0074 
> 005

Looks like there might be more than one file here.

> > If you misrepresented the situation, and there's actually more than one
> > file in this directory, then use something like this instead:
> > 
> > shopt -s failglob
> > printf '%s\0' ? | hd
> 
> shopt -s failglob
> printf '%s\0' ? | hexdump
> 000 00c2   
> 002

OK, that's a good result.

> > Note that the ? is *not* quoted here, because we want it to match any
> > one-character filename, no matter what that character actually is.  If
> > this doesn't work, try ?? or * as the glob, until you manage to find it.
> 
> printf '%s\0' ?? | hexdump
> -bash: Keine Entsprechung: ??
> 
> (meaning something like "no equivalent")

The English version is "No match".

> printf '%s\0' * | hexdump
> 000 00c2 6177 7468 
> 007

I dislike this output format, but it looks like there are two files
here.  The first is 0xc2, and the second is 0x77 0x61 0x68 0x74 if
I'm reversing and splitting the silly output correctly.  (This spells
"waht", if I got it right.)

> > If it turns out that '?' really is the filename, then it becomes a ZFS
> > issue with which I can't help.
> 
> I would think it is.  Is it?

The file in question appears to have a name which is the single byte 0xc2.
Since that's not a valid UTF-8 character, ls chooses something to display
instead.  In your case, it chose a '?' character.  I'm guessing this is on
an older release of Debian.

In my case, it does this:

unicorn:~$ mkdir /tmp/x && cd "$_"
unicorn:/tmp/x$ touch $'\xc2'
unicorn:/tmp/x$ ls -la
total 80
-rw-r--r--  1 greg greg 0 Nov 10 09:21 ''$'\302'
drwxr-xr-x  2 greg greg  4096 Nov 10 09:21  ./
drwxrwxrwt 20 root root 73728 Nov 10 09:21  ../

In my version of ls, there's a --quoting-style= option that can help
control what you see.  But that's a tangent you can explore later.

Since we know the actual name of the file (subdirectory) now, let's just
rename it to something sane.

mv $'\xc2' subdir

Then you can investigate it, remove it, or do whatever else you want.



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Dan Ritter
hw wrote: 
> And I've been reading that when using ZFS, you shouldn't make volumes with 
> more
> than 8 disks.  That's very inconvenient.


Where do you read these things?

The number of disks in a zvol can be optimized, depending on
your desired redundancy method, total number of drives, and
tolerance for reduced performance during resilvering. 

Multiple zvols together form a zpool. Filesystems are allocated from
a zpool.

8 is not a magic number.

-dsr-



weird directory entry on ZFS volume (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 07:03 -0500, Greg Wooledge wrote:
> On Thu, Nov 10, 2022 at 05:54:00AM +0100, hw wrote:
> > ls -la
> > insgesamt 5
> > drwxr-xr-x  3 namefoo namefoo    3 16. Aug 22:36 .
> > drwxr-xr-x 24 root    root    4096  1. Nov 2017  ..
> > drwxr-xr-x  2 namefoo namefoo    2 21. Jan 2020  ?
> > namefoo@host /srv/datadir $ ls -la '?'
> > ls: Zugriff auf ? nicht möglich: Datei oder Verzeichnis nicht gefunden
> > namefoo@host /srv/datadir $ 
> > 
> > 
> > This directory named ? appeared on a ZFS volume for no reason and I can't
> > access
> > it and can't delete it.  A scrub doesn't repair it.  It doesn't seem to do
> > any
> > harm yet, but it's annoying.
> > 
> > Any idea how to fix that?
> 
> ls -la might not be showing you the true name.  Try this:
> 
> printf %s * | hd
> 
> That should give you a hex dump of the bytes in the actual filename.

good idea:

printf %s * | hexdump
000 77c2 6861 0074 
005

> If you misrepresented the situation, and there's actually more than one
> file in this directory, then use something like this instead:
> 
> shopt -s failglob
> printf '%s\0' ? | hd

shopt -s failglob
printf '%s\0' ? | hexdump
000 00c2   
002

> Note that the ? is *not* quoted here, because we want it to match any
> one-character filename, no matter what that character actually is.  If
> this doesn't work, try ?? or * as the glob, until you manage to find it.

printf '%s\0' ?? | hexdump
-bash: Keine Entsprechung: ??

(meaning something like "no equivalent")


printf '%s\0' * | hexdump
000 00c2 6177 7468 
007


> If it turns out that '?' really is the filename, then it becomes a ZFS
> issue with which I can't help.

I would think it is.  Is it?

perl -e 'print chr(0xc2) . "\n"'

... prints a blank line.  What's 0xc2?  I guess that should be UTF8 ...


printf %s *
aht

What would you expect it to print after shopt?



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread DdB
Am 10.11.2022 um 13:03 schrieb Greg Wooledge:
> If it turns out that '?' really is the filename, then it becomes a ZFS
> issue with which I can't help.

just tested: i could create, rename, delete a file with that name on a
zfs filesystem just as with any other fileystem.

But: i recall having seen an issue with corrupted filenames in a
snapshot once (several years ago though). At the time, i did resort to
send/recv to get the issue straightened out.

But it is much more likely that the filename '?' is entirely
unrelated to zfs. Although zfs is perceived as being easy to handle
(only 2 commands need to be learned: zpool and zfs), it takes a while to
get acquainted with all the concepts and behaviors. Take some time to
play with an installation (a vm or just a file based pool should
be considered).



block devices vs. partitions (Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 10:59 +0100, DdB wrote:
> Am 10.11.2022 um 04:46 schrieb hw:
> > On Wed, 2022-11-09 at 18:26 +0100, Christoph Brinkhaus wrote:
> > > Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:
> > > [...]
> [...]
> > > 
> > Why would partitions be better than the block device itself?  They're like
> > an
> > additional layer and what could be faster and easier than directly using the
> > block devices?
> > 
> > 
> hurts my eyes to see such disinformation circulating.

What's wrong about it?



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread hw
On Thu, 2022-11-10 at 10:34 +0100, Christoph Brinkhaus wrote:
> Am Thu, Nov 10, 2022 at 04:46:12AM +0100 schrieb hw:
> > On Wed, 2022-11-09 at 18:26 +0100, Christoph Brinkhaus wrote:
> > > Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:
> > > [...]
> [...]
> > > 
> > 
> > Why would partitions be better than the block device itself?  They're like
> > an
> > additional layer and what could be faster and easier than directly using the
> > block devices?
>  
>  Using the block device is no issue until you have a mirror or so.
>  In case of a mirror ZFS will use the capacity of the smallest drive.

But you can't make partitions larger than the drive.

>  I have read that, for example, a 100GB disk might be slightly larger
>  than 100GB. When you want to replace a 100GB disk with a spare one
>  which is slightly smaller than the original one, the pool will not fit on
>  the disk and the replacement fails.

Ah yes, right!  I kinda did that a while ago for spinning disks that might be
replaced by SSDs eventually and wanted to make sure that the SSDs wouldn't be
too small.  I forgot about that, my memory really isn't what it used to be ...

>  With partitions you can specify the space. It does not hurt if there
>  are a few MB unallocated. But then the partitions of the disks have
>  exactly the same size.

yeah
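
A sketch of that partitioning approach (device names and sizes are invented; the
point is to leave a little slack at the end so a marginally smaller replacement
disk still fits):

  parted -s /dev/sdb mklabel gpt
  parted -s /dev/sdb mkpart zfs0 1MiB 3725GiB
  parted -s /dev/sdc mklabel gpt
  parted -s /dev/sdc mkpart zfs0 1MiB 3725GiB
  zpool create tank mirror /dev/sdb1 /dev/sdc1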



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Greg Wooledge
On Thu, Nov 10, 2022 at 05:54:00AM +0100, hw wrote:
> ls -la
> insgesamt 5
> drwxr-xr-x  3 namefoo namefoo    3 16. Aug 22:36 .
> drwxr-xr-x 24 root    root    4096  1. Nov 2017  ..
> drwxr-xr-x  2 namefoo namefoo    2 21. Jan 2020  ?
> namefoo@host /srv/datadir $ ls -la '?'
> ls: Zugriff auf ? nicht möglich: Datei oder Verzeichnis nicht gefunden
> namefoo@host /srv/datadir $ 
> 
> 
> This directory named ? appeared on a ZFS volume for no reason and I can't 
> access
> it and can't delete it.  A scrub doesn't repair it.  It doesn't seem to do any
> harm yet, but it's annoying.
> 
> Any idea how to fix that?

ls -la might not be showing you the true name.  Try this:

printf %s * | hd

That should give you a hex dump of the bytes in the actual filename.

If you misrepresented the situation, and there's actually more than one
file in this directory, then use something like this instead:

shopt -s failglob
printf '%s\0' ? | hd

Note that the ? is *not* quoted here, because we want it to match any
one-character filename, no matter what that character actually is.  If
this doesn't work, try ?? or * as the glob, until you manage to find it.

If it turns out that '?' really is the filename, then it becomes a ZFS
issue with which I can't help.



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread DdB
Am 10.11.2022 um 04:46 schrieb hw:
> On Wed, 2022-11-09 at 18:26 +0100, Christoph Brinkhaus wrote:
>> Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:
>> [...]
>>> FreeBSD has ZFS but can't even configure the disk controllers, so that won't
>>> work.  
>>
>> If I understand you right you mean RAID controllers?
> 
> yes
> 
>> According to my knowledge ZFS should be used without any RAID
>> controllers. Disks or, better, partitions are fine.
> 
> I know, but it's what I have.  JBOD controllers are difficult to find.  And it
> doesn't really matter because I can configure each disk as a single disk ---
> still RAID though.  It may even be an advantage because the controllers have 
> 1GB
> cache each and the computer's CPU doesn't need to do command queuing.
> 
> And I've been reading that when using ZFS, you shouldn't make volumes with 
> more
> than 8 disks.  That's very inconvenient.
> 
> Why would partitions be better than the block device itself?  They're like an
> additional layer and what could be faster and easier than directly using the
> block devices?
> 
> 
hurts my eyes to see such disinformation circulating. But i myself have
been a happy zfs user for a decade now. I suggest getting in contact
with the zfs gurus on ZoL (or read the archive from
https://zfsonlinux.topicbox.com/groups/zfs-discuss)



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-10 Thread DdB
Am 10.11.2022 um 06:38 schrieb David Christensen:
> What is your technique for defragmenting ZFS?
well, that was meant more or less as a joke: there is none apart from
offloading all the data, destroying and rebuilding the pool, and filling
it again from the backup. But i do it from time to time if fragmentation
got high; the speed improvements are obvious. OTOH the process takes
days on my SOHO servers



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-10 Thread Christoph Brinkhaus
Am Thu, Nov 10, 2022 at 04:46:12AM +0100 schrieb hw:
> On Wed, 2022-11-09 at 18:26 +0100, Christoph Brinkhaus wrote:
> > Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:
> > [...]
> > > FreeBSD has ZFS but can't even configure the disk controllers, so that 
> > > won't
> > > work.  
> > 
> > If I understand you right you mean RAID controllers?
> 
> yes
> 
> > According to my knowledge ZFS should be used without any RAID
> > controllers. Disks or, better, partitions are fine.
> 
> I know, but it's what I have.  JBOD controllers are difficult to find.  And it
> doesn't really matter because I can configure each disk as a single disk ---
> still RAID though.  It may even be an advantage because the controllers have 
> 1GB
> cache each and the computer's CPU doesn't need to do command queuing.
> 
> And I've been reading that when using ZFS, you shouldn't make volumes with 
> more
> than 8 disks.  That's very inconvenient.
> 
> Why would partitions be better than the block device itself?  They're like an
> additional layer and what could be faster and easier than directly using the
> block devices?
 
 Using the block device is no issue until you have a mirror or so.
 In case of a mirror ZFS will use the capacity of the smallest drive.

 I have read that, for example, a 100GB disk might be slightly larger
 than 100GB. When you want to replace a 100GB disk with a spare one
 which is slightly smaller than the original one, the pool will not fit on
 the disk and the replacement fails.

 With partitions you can specify the space. It does not hurt if there
 are a few MB unallocated. But then the partitions of the disks have
 exactly the same size.

 Kind regards,
 Christoph



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread David Christensen

On 11/9/22 01:35, DdB wrote:

> But
i am satisfied with zfs performance from spinning rust, if i don't fill
up the pool too much, and defrag after a while ...



What is your technique for defragmenting ZFS?


David




Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread David Christensen

On 11/9/22 05:29, didier gaumet wrote:

- *BSDs nowadays have departed from old ZFS code and use the same source 
code stack as Linux (OpenZFS)



AIUI FreeBSD 12 and prior use ZFS-on-Linux code, while FreeBSD 13 and 
later use OpenZFS code.




On 11/9/22 05:44, didier gaumet wrote:

> I was using the word "performance" here as I would have in french (same
> word), thinking of technical abilities (speed, scalability and so on)
> without realizing that in english in the particular context of computer
> science that means primarily speed (if I understand correctly) :-)


I tend to use the term "performance" to mean minimum processor cycles, 
minimum memory consumption, minimum latency, and/or maximum data 
transfer per unit time.



David



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread David Christensen

On 11/9/22 00:24, hw wrote:
> On Tue, 2022-11-08 at 17:30 -0800, David Christensen wrote:

> Hmm, when you can backup like 3.5TB with that, maybe I should put 
FreeBSD on my
> server and give ZFS a try.  Worst thing that can happen is that it 
crashes and
> I'd have made an experiment that wasn't successful.  Best thing, I 
guess, could
> be that it works and backups are way faster because the server 
doesn't have to
> actually write so much data because it gets deduplicated and reading 
from the

> clients is faster than writing to the server.


Be careful that you do not confuse a ~33 GiB full backup set, and 78 
snapshots over six months of that same full backup set, with a full 
backup of 3.5 TiB of data.  I would suggest a 10 GiB pool to backup the 
latter.



Writing to a ZFS filesystem with deduplication is much slower than 
simply writing to, say, an ext4 filesystem -- because ZFS has to hash 
every incoming block and see if it matches the hash of any existing 
block in the destination pool.  Storing the existing block hashes in a 
dedicated dedup virtual device will expedite this process.



>> I run my backup script each night.  It uses rsync to copy files and
>
> Aww, I can't really do that because my server eats like 200-300W because it has
> so many disks in it.  Electricity is outrageously expensive here.


Perhaps platinum rated power supplies?  Energy efficient HDD's/ SSD's?


>> directories from various LAN machines into ZFS filesystems named after
>> each host -- e.g. pool/backup/hostname (ZFS namespace) and
>> /var/local/backup/hostname (Unix filesystem namespace).  I have a
>> cron(8) that runs zfs-auto-snapshot once each day and once each month
>> that takes a recursive snapshot of the pool/backup filesystems.  Their
>> contents are then available via Unix namespace at
>> /var/local/backup/hostname/.zfs/snapshot/snapshotname.  If I want to
>> restore a file from, say, two months ago, I use Unix filesystem tools to
>> get it.
>
> Sounds like a nice setup.  Does that mean you use snapshots to keep 
multiple
> generations of backups and make backups by overwriting everything 
after you made

> a snapshot?


Yes.


> In that case, is deduplication that important/worthwhile?  You're not
> duplicating it all by writing another generation of the backup but 
store only

> what's different through making use of the snapshots.


Without deduplication or compression, my backup set and 78 snapshots 
would require 3.5 TiB of storage.  With deduplication and compression, 
they require 86 GiB of storage.



> ... I only never got around to figure [ZFS snapshots] out because I 
didn't have the need.



I accidentally trash files on occasion.  Being able to restore them 
quickly and easily with a cp(1), scp(1), etc., is a killer feature. 
Users can recover their own files without needing help from a system 
administrator.



> But it could also be useful for "little" things like taking a 
snapshot of the
> root volume before updating or changing some configuration and being 
able to

> easily to undo that.


FreeBSD with ZFS-on-root has a killer feature called "Boot Environments" 
that has taken that idea to the next level:


https://klarasystems.com/articles/managing-boot-environments/
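
On FreeBSD 12 and later the command-line side of this is bectl; a minimal sketch
(the environment name is arbitrary):

  bectl create pre-upgrade     # clone the current boot environment
  bectl list                   # show environments and which one is active
  bectl activate pre-upgrade   # if the upgrade goes wrong, boot the old one next reboot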


>> I have 3.5 TiB of backups.


It is useful to group files with similar characteristics (size, 
workload, compressibility, duplicates, backup strategy, etc.) into 
specific ZFS filesystems (or filesystem trees).  You can then adjust ZFS 
properties and backup strategies to match.



 For compressed and/or encrypted archives, image, etc., I do not use
 compression or de-duplication
>>>
>>> Yeah, they wouldn't compress.  Why no deduplication?
>>
>>
>> Because I very much doubt that there will be duplicate blocks in 
such files.

>
> Hm, would it hurt?


Yes.  ZFS deduplication is resource intensive.


> Oh it's not about performance when degraded, but about performance. 
IIRC when
> you have a ZFS pool that uses the equivalent of RAID5, you're still 
limited to

> the speed of a single disk.  When you have a mysql database on such a ZFS
> volume, it's dead slow, and removing the SSD cache when the SSDs 
failed didn't
> make it any slower.  Obviously, it was a bad idea to put the database 
there, and
> I wouldn't do again when I can avoid it.  I also had my data on such 
a volume

> and I found that the performance with 6 disks left much to desire.


What were the makes and models of the 6 disks?  Of the SSD's?  If you 
have a 'zpool status' console session from then, please post it.



Constructing a ZFS pool to match the workload is not easy.  STFW there 
are plenty of articles.  Here is a general article I found recently:


https://klarasystems.com/articles/choosing-the-right-zfs-pool-layout/


MySQL appears to have the ability to use raw disks.  Tuned correctly, 
this should give the best results:


https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tablespace.html#innodb-raw-devices
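
A rough sketch of what that configuration looks like (partition path and size are
placeholders; the linked page has the authoritative steps):

  [mysqld]
  innodb_data_home_dir=
  innodb_data_file_path=/dev/sdb1:10Gnewraw
  # start the server once so InnoDB initializes the partition, then
  # switch "newraw" to "raw" before actually using it:
  # innodb_data_file_path=/dev/sdb1:10Graw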


If ZFS 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-09 Thread hw
On Wed, 2022-11-09 at 19:17 +0100, Linux-Fan wrote:
> hw writes:
> 
> > On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> > > Le 09/11/2022 à 12:41, hw a écrit :
> 
> [...]
> 
> > > I am really not so well aware of ZFS state but my impression was that:
> > > - FUSE implementation of ZoL (ZFS on Linux) is deprecated and that,
> > > Ubuntu excepted (classic module?), ZFS is now integrated by a DKMS module
> > 
> > Hm that could be.  Debian doesn't seem to have it as a module.
> 
> As already mentioned by others, zfs-dkms is readily available in the contrib  
> section along with zfsutils-linux. Here is what I noted down back when I  
> installed it:
> 
> https://masysma.net/37/zfs_commands_shortref.xhtml

Thanks, that's good information.

> I have been using ZFS on Linux on Debian since end of 2020 without any  
> issues. In fact, the dkms-based approach has run much more reliably than  
> my previous experiences with out-of-tree modules would have suggested...

Hm, issues?  I have one:


ls -la
insgesamt 5
drwxr-xr-x  3 namefoo namefoo    3 16. Aug 22:36 .
drwxr-xr-x 24 root    root    4096  1. Nov 2017  ..
drwxr-xr-x  2 namefoo namefoo    2 21. Jan 2020  ?
namefoo@host /srv/datadir $ ls -la '?'
ls: Zugriff auf ? nicht möglich: Datei oder Verzeichnis nicht gefunden
namefoo@host /srv/datadir $ 


This directory named ? appeared on a ZFS volume for no reason and I can't access
it and can't delete it.  A scrub doesn't repair it.  It doesn't seem to do any
harm yet, but it's annoying.

Any idea how to fix that?

> Nvidia drivers have been working for me in all releases from Debian 6 to 10
> both  
> inclusive. I did not have any need for them on Debian 11 yet, since I have  
> switched to an AMD card for my most recent system.
> 

Maybe it was longer ago.  I recently switched to AMD, too.  NVIDIA remains
uncooperative and their drivers are a hassle, so why would I support NVIDIA by
buying their products.  It was a good choice and it just works out of the box.

I can't get the 2nd monitor to work, but that's probably not an AMD issue.

> > However, Debian has apparently bad ZFS support (apparently still only Gentoo
> > actually supports it), so I'd go with btrfs.  Now that's gonna suck because
> 
> You can use ZFS on Debian (see link above). Of course it remains your choice  
> whether you want to trust your data to the older, but less-well-integrated  
> technology (ZFS) or to the newer, but more easily integrated technology  
> (BTRFS).
> 
> 

It's fine when using the kernel module.  This isn't about which is newer; ZFS
seems more mature than btrfs.  Somehow, development of btrfs is excruciatingly slow.

If it doesn't work out, I can always do something else and make a new backup.

> > I'd
> > have to use mdadm to create a RAID5 (or use the hardware RAID but that isn't
> 
> AFAIK BTRFS also includes some integrated RAID support such that you do not  
> necessarily need to pair it with mdadm.

Yes, but RAID56 is broken in btrfs.

>  It is advised against using for RAID  
> 5 or 6 even in most recent Linux kernels, though:
> 
> https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#raid56-status-and-recommended-practices
> 

Yes, that's why I would have to use btrfs on mdadm when I want to make a RAID5.
That kinda sucks.

> RAID 5 and 6 have their own issues you should be aware of even when running  
> them with the time-proven and reliable mdadm stack. You can find a lot of  
> interesting results by searching for “RAID5 considered harmful” online. This  
> one is the classic that does not seem to make it to the top results, though:

Hm, really?  The only time that RAID5 gave me trouble was when the hardware RAID
controller steadfastly refused to rebuild the array after a failed disk was
replaced.  How often does that happen?

So yes, there are people saying that RAID5 is so bad, and I think it's exaggerated.
At the end of the day, for all I know lightning could strike the server and
burn out all the disks and no alternative to RAID5 could prevent that.  So all
variants of RAID are bad and ZFS and btrfs and whatever are all just as bad and
any way of storing data is bad because something could happen to the data. 
Gathering data is actually bad to begin with and getting worse all the time. 
The less data you have, the better, because less data is less unwieldy.

There is a write hole with RAID5?  Well, I have a UPS and the controllers have
backup batteries.  So is there really gonna be a write hole?  When I use mdadm, I
don't have a backup battery.  Then what?  Do JBOD controllers have backup
batteries or are you forced to use file systems that make them unnecessary? 
Bits can flip and maybe whatever controls the RAID may not be able to tell which
copy is the one to use.  The checksums ZFS and btrfs use may be insufficient and
then what.  ZFS and btrfs may not be a good idea to use because the software,
like Centos 7, is too old and prefers xfs instead.  Now what?  Rebuild the
server like every year or so to 

Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-09 Thread hw
On Wed, 2022-11-09 at 18:26 +0100, Christoph Brinkhaus wrote:
> Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:
> [...]
> > FreeBSD has ZFS but can't even configure the disk controllers, so that won't
> > work.  
> 
> If I understand you right you mean RAID controllers?

yes

> According to my knowledge ZFS should be used without any RAID
> controllers. Disks or, better, partitions are fine.

I know, but it's what I have.  JBOD controllers are difficult to find.  And it
doesn't really matter because I can configure each disk as a single disk ---
still RAID though.  It may even be an advantage because the controllers have 1GB
cache each and the computer's CPU doesn't need to do command queuing.

And I've been reading that when using ZFS, you shouldn't make volumes with more
than 8 disks.  That's very inconvenient.

Why would partitions be better than the block device itself?  They're like an
additional layer and what could be faster and easier than directly using the
block devices?



Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-09 Thread Linux-Fan

hw writes:


On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> Le 09/11/2022 à 12:41, hw a écrit :


[...]


> I am really not so well aware of ZFS state but my impression was that:
> - FUSE implementation of ZoL (ZFS on Linux) is deprecated and that,
> Ubuntu excepted (classic module?), ZFS is now integrated by a DKMS module

Hm that could be.  Debian doesn't seem to have it as a module.


As already mentioned by others, zfs-dkms is readily available in the contrib  
section along with zfsutils-linux. Here is what I noted down back when I  
installed it:


https://masysma.net/37/zfs_commands_shortref.xhtml

I have been using ZFS on Linux on Debian since end of 2020 without any  
issues. In fact, the dkms-based approach has run much more reliably than  
my previous experiences with out-of-tree modules would have suggested...


My setup works with a mirrored zpool and no deduplication, I did not need  
nor test anything else yet.
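
For reference, a minimal sketch of that kind of setup on Debian (the disk ids are
placeholders, and contrib has to be enabled in the apt sources first):

  apt update
  apt install linux-headers-amd64 zfs-dkms zfsutils-linux
  zpool create -o ashift=12 tank \
      mirror /dev/disk/by-id/ata-DISK0 /dev/disk/by-id/ata-DISK1
  zfs create -o compression=lz4 tank/data    # deduplication stays off by default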



> - *BSDs integrate directly ZFS because there are no licences conflicts
> - *BSDs nowadays have departed from old ZFS code and use the same source
> code stack as Linux (OpenZFS)
> - Linux distros don't directly integrate ZFS because they generally
> consider there are licences conflicts. The notable exception being
> Ubuntu that considers that after legal review the situation is clear and
> there is no licence conflicts.


[...]


broke something.  Arch is apparently for masochists, and I don't want
derivatives, especially not Ubuntu, and that leaves only Debian.  I don't  
want

Debian either because when they introduced their brokenarch, they managed to
make it so that NVIDIA drivers didn't work anymore with no fix in sight and
broke other stuff as well, and you can't let your users down like that.  But
what's the alternative?


Nvidia drivers have been working for me in all releases from Debian 6 to 10 both  
inclusive. I did not have any need for them on Debian 11 yet, since I have  
switched to an AMD card for my most recent system.



However, Debian has apparently bad ZFS support (apparently still only Gentoo
actually supports it), so I'd go with btrfs.  Now that's gonna suck because


You can use ZFS on Debian (see link above). Of course it remains your choice  
whether you want to trust your data to the older, but less-well-integrated  
technology (ZFS) or to the newer, but more easily integrated technology  
(BTRFS).



I'd
have to use mdadm to create a RAID5 (or use the hardware RAID but that isn't


AFAIK BTRFS also includes integrated RAID support, so you do not necessarily
need to pair it with mdadm. Using it for RAID 5 or 6 is advised against even
in the most recent Linux kernels, though:


https://btrfs.readthedocs.io/en/latest/btrfs-man5.html#raid56-status-and-recommended-practices
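To illustrate the built-in RAID support, a sketch using the raid1 profile
(which is generally suggested instead of raid5/6), with hypothetical devices:

  # mirror both data and metadata across two devices
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /mnt/data

  # a device can be added later and the data rebalanced onto it
  btrfs device add /dev/sdd /mnt/data
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data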

RAID 5 and 6 have their own issues you should be aware of even when running  
them with the time-proven and reliable mdadm stack. You can find a lot of  
interesting results by searching for “RAID5 considered harmful” online. This  
one is the classic that does not seem to make it to the top results, though:


https://www.baarf.dk/BAARF/RAID5_versus_RAID10.txt

If you want to go with mdadm (irrespective of RAID level), you might also  
consider running ext4 and trade the complexity and features of the advanced  
file systems for a good combination of stability and support.
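A sketch of that combination with hypothetical device names (RAID5 shown here
since that is what is being discussed, the caveats above notwithstanding):

  # three-disk RAID5 array
  mdadm --create /dev/md0 --level=5 --raid-devices=3 \
      /dev/sdb /dev/sdc /dev/sdd

  # plain ext4 on top of the array
  mkfs.ext4 /dev/md0
  mount /dev/md0 /srv/data

  # record the array so it assembles at boot
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  update-initramfs -u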



fun
after I've seen the hardware RAID refusing to rebuild a volume after a failed
disk was replaced) and put btrfs on that because btrfs doesn't even support
RAID5.


YMMV
Linux-Fan

öö




Re: else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-09 Thread Christoph Brinkhaus
Am Wed, Nov 09, 2022 at 06:11:34PM +0100 schrieb hw:

Hi hw,

> On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> > Le 09/11/2022 à 12:41, hw a écrit :
> > [...]
> > > In any case, I'm currently tending to think that putting FreeBSD with ZFS 
> > > on
> > > my
> > > server might be the best option.  But then, apparently I won't be able to
> > > configure the controller cards, so that won't really work.  And ZFS with
> > > Linux
> > > isn't so great because it keeps fuse in between.
> > 
> > I am not really up to date on the state of ZFS, but my impression was that:
> > - the FUSE implementation of ZoL (ZFS on Linux) is deprecated and that,
> > Ubuntu excepted (classic module?), ZFS is now integrated via a DKMS module
> 
> Hm that could be.  Debian doesn't seem to have it as a module.
> 
> > - the *BSDs integrate ZFS directly because there are no licence conflicts
> > - the *BSDs have nowadays departed from the old ZFS code and use the same
> > source code stack as Linux (OpenZFS)
> > - Linux distros don't integrate ZFS directly because they generally
> > consider there to be licence conflicts. The notable exception is Ubuntu,
> > which considers that after legal review the situation is clear and there
> > is no licence conflict.
> 
> Well, I'm not touching Ubuntu.  I want to get away from Fedora because of 
> their
> hostility, and that includes CentOS since that has become a derivative of it.
> FreeBSD has ZFS, but I can't even configure the disk controllers under it, so
> that won't work.

If I understand you right you mean RAID controllers?
To my knowledge, ZFS should be used without any RAID
controllers. Disks, or better, partitions are fine.

> I don't want to go with Gentoo because updating is a nightmare to the
> point where you suddenly find yourself unable to update at all because they
> broke something.  Arch is apparently for masochists, and I don't want
> derivatives, especially not Ubuntu, and that leaves only Debian.  I don't want
> Debian either because when they introduced their brokenarch, they managed to
> make it so that NVIDIA drivers didn't work anymore with no fix in sight and
> broke other stuff as well, and you can't let your users down like that.  But
> what's the alternative?
> 
> However, Debian has apparently bad ZFS support (apparently still only Gentoo
> actually supports it), so I'd go with btrfs.  

I have no knowledge about the status of ZFS on Linux distributions,
just about FreeBSD.

> Now that's gonna suck because I'd
> have to use mdadm to create a RAID5 (or use the hardware RAID but that isn't 
> fun
> after I've seen the hardware RAID refusing to rebuild a volume after a failed
> disk was replaced) and put btrfs on that because btrfs doesn't even support
> RAID5.
> 
> Or what else?
> 
Kind regards,
Christoph



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Wed, 2022-11-09 at 17:29 +0100, DdB wrote:
> Am 09.11.2022 um 12:41 schrieb hw:
> > In any case, I'm currently tending to think that putting FreeBSD with ZFS on
> > my
> > server might be the best option.  But then, apparently I won't be able to
> > configure the controller cards, so that won't really work.  And ZFS with
> > Linux
> > isn't so great because it keeps fuse in between.
> 
> No FUSE: neither FreeBSD nor Debian needs the outdated zfs-fuse; use the
> in-kernel modules from zfsonlinux.org (packages for Debian are in contrib,
> IIRC).
> 

Ok, all the better --- I only looked at the package management.  Ah, let me see,
I have a Debian VM and no contrib ... hm, zfs-dkms and such?  That's promising,
thank you :)
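A quick way to double-check, for what it's worth (assuming the usual Debian
sources layout):

  # confirm contrib is enabled and the packages are visible
  grep -r contrib /etc/apt/sources.list /etc/apt/sources.list.d/ 2>/dev/null
  apt-cache policy zfs-dkms zfsutils-linux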



else or Debian (Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?))

2022-11-09 Thread hw
On Wed, 2022-11-09 at 14:29 +0100, didier gaumet wrote:
> Le 09/11/2022 à 12:41, hw a écrit :
> [...]
> > In any case, I'm currently tending to think that putting FreeBSD with ZFS on
> > my
> > server might be the best option.  But then, apparently I won't be able to
> > configure the controller cards, so that won't really work.  And ZFS with
> > Linux
> > isn't so great because it keeps fuse in between.
> 
> I am not really up to date on the state of ZFS, but my impression was that:
> - the FUSE implementation of ZoL (ZFS on Linux) is deprecated and that,
> Ubuntu excepted (classic module?), ZFS is now integrated via a DKMS module

Hm that could be.  Debian doesn't seem to have it as a module.

> - the *BSDs integrate ZFS directly because there are no licence conflicts
> - the *BSDs have nowadays departed from the old ZFS code and use the same
> source code stack as Linux (OpenZFS)
> - Linux distros don't integrate ZFS directly because they generally
> consider there to be licence conflicts. The notable exception is Ubuntu,
> which considers that after legal review the situation is clear and there
> is no licence conflict.

Well, I'm not touching Ubuntu.  I want to get away from Fedora because of their
hostility, and that includes CentOS since that has become a derivative of it.
FreeBSD has ZFS, but I can't even configure the disk controllers under it, so
that won't work.  I don't want to go with Gentoo because updating is a
nightmare to the point where you suddenly find yourself unable to update at
all because they broke something.  Arch is apparently for masochists, and I
don't want
derivatives, especially not Ubuntu, and that leaves only Debian.  I don't want
Debian either because when they introduced their brokenarch, they managed to
make it so that NVIDIA drivers didn't work anymore with no fix in sight and
broke other stuff as well, and you can't let your users down like that.  But
what's the alternative?

However, Debian has apparently bad ZFS support (apparently still only Gentoo
actually supports it), so I'd go with btrfs.  Now that's gonna suck because I'd
have to use mdadm to create a RAID5 (or use the hardware RAID but that isn't fun
after I've seen the hardware RAID refusing to rebuild a volume after a failed
disk was replaced) and put btrfs on that because btrfs doesn't even support
RAID5.

Or what else?



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread DdB
Am 09.11.2022 um 12:41 schrieb hw:
> In any case, I'm currently tending to think that putting FreeBSD with ZFS on 
> my
> server might be the best option.  But then, apparently I won't be able to
> configure the controller cards, so that won't really work.  And ZFS with Linux
> isn't so great because it keeps fuse in between.

No FUSE: neither FreeBSD nor Debian needs the outdated zfs-fuse; use the
in-kernel modules from zfsonlinux.org (packages for Debian are in contrib,
IIRC).



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread D. R. Evans

hw wrote on 11/9/22 04:41:


configure the controller cards, so that won't really work.  And ZFS with Linux
isn't so great because it keeps fuse in between.



That isn't true. I've been using ZFS with Debian for years without FUSE, 
through the ZFSonLinux project.


The only slightly discomforting issue is that it's not officially supported on 
Debian because of a perceived license conflict.


  Doc

--
Web:  http://enginehousebooks.com/drevans



Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread didier gaumet

Le 09/11/2022 à 12:41, hw a écrit :
[...]

In any case, I'm currently tending to think that putting FreeBSD with ZFS on my
server might be the best option.  But then, apparently I won't be able to
configure the controller cards, so that won't really work.  And ZFS with Linux
isn't so great because it keeps fuse in between.


I am not really up to date on the state of ZFS, but my impression was that:
- the FUSE implementation of ZoL (ZFS on Linux) is deprecated and that,
Ubuntu excepted (classic module?), ZFS is now integrated via a DKMS module

- the *BSDs integrate ZFS directly because there are no licence conflicts
- the *BSDs have nowadays departed from the old ZFS code and use the same
source code stack as Linux (OpenZFS)
- Linux distros don't integrate ZFS directly because they generally
consider there to be licence conflicts. The notable exception is Ubuntu,
which considers that after legal review the situation is clear and there
is no licence conflict.






Re: ZFS performance (was: Re: deduplicating file systems: VDO with Debian?)

2022-11-09 Thread hw
On Wed, 2022-11-09 at 10:35 +0100, DdB wrote:
> Am 09.11.2022 um 09:24 schrieb hw:
> > > Learn more about ZFS and invest in hardware to get performance.
> > Hardware like?  In theory, using SSDs for cache with ZFS should improve
> > performance.  In practice, it only wore out the SSDs after a while, and now
> > it's not any faster without SSD cache.
> > 
> > 
> 
> Me too, I had that unpleasant experience of a worn-out SSD that had been
> used as a ZFS cache and ZIL device. After that, my next computer got huge
> amounts of ECC RAM, freeing up the SSD for the OS and stuff.

I don't have anything without ECC RAM, and my server was never meant for ZFS.
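For reference, cache (L2ARC) and log (ZIL/SLOG) devices of the kind discussed
above are attached and detached at runtime; a sketch with hypothetical names:

  # add an SSD as read cache and another as a separate intent-log device
  zpool add tank cache /dev/disk/by-id/nvme-SSD_CACHE
  zpool add tank log /dev/disk/by-id/nvme-SSD_LOG

  # both can be removed again without harming the pool
  zpool remove tank /dev/disk/by-id/nvme-SSD_CACHE
  zpool remove tank /dev/disk/by-id/nvme-SSD_LOG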

> Also, I changed the geometry of the main pool to a collection of mirrors
> (much faster than RAID) and left the RAID only on the slower backup server.

But then you have less capacity ...

> Due to snapshots and increments, I am now backing up only once every 2
> weeks, which takes around 1 hour because of a slow connection. But I am
> satisfied with ZFS performance from spinning rust, if I don't fill up the
> pool too much and defrag once in a while ... ;-)

With mirroring, I could fit only one backup, not two.
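The snapshot-and-increments workflow mentioned above boils down to something
like this; a sketch, assuming a dataset tank/data and a backup host reachable
via ssh:

  # take a snapshot, then ship only the delta to the previous one
  zfs snapshot tank/data@2022-11-09
  zfs send -i tank/data@2022-10-26 tank/data@2022-11-09 | \
      ssh backuphost zfs receive -F backuppool/data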

In any case, I'm currently tending to think that putting FreeBSD with ZFS on my
server might be the best option.  But then, apparently I won't be able to
configure the controller cards, so that won't really work.  And ZFS with Linux
isn't so great because it keeps fuse in between.