Re: [OpenAFS] Advice on using BTRFS for vicep partitions on Linux

2023-03-22 Thread Ciprian Craciun
On Wed, Mar 22, 2023 at 10:30 AM  wrote:
> OpenAFS implements its own CoW and using CoW below that again has no benefits 
> and disturbs the fileserver's "free-space" assumptions. It knows when it makes 
> in-place updates and does not expect to run out of space in that situation.


At what level does OpenAFS implement CoW?  Is it implemented at
whole-file granularity (i.e. changing a file that is part of a
replicated / backup volume copies the entire file), or at some finer
range granularity (i.e. only the changed range is copied, while the
rest stays shared)?

I'm asking this because I've assumed (based on empirical observations)
that all files stored in OpenAFS (via the proper `afsd`) will end up
somewhere in `/vicepX` as individual files.  (I.e. if I were to
`md5sum` all the files from `/afs/some-cell`, and then `md5sum` all
the files in `/vicepX`, then the first set of `/afs/...` digests would
be a subset of the second `/vicepX` set.)



> > Unfortunately (at least for my use-case) losing the checksumming and
> > compression is a no-go, because these were exactly the features that
> > made BTRFS appealing versus Ext4.
>
> If you say so...
> AFS does its own data checksumming.


Can one force OpenAFS to verify these checksums and report back any
issues?

What kind of checksums are these?  Cryptographic ones (MD5 / SHA /
newer), or CRC-style ones?



> > Also, regarding RAID scrubbing, it doesn't cover the issue of
> > checksumming, because (for example with RAID5) it can only detect that
> > one of the disks has corrupted data, but couldn't say which.
>
> Do not use RAID to prevent data loss! That's what backups are for. RAID is 
> for operative redundancy. Scrubbing also tells you about your state of FS 
> metadata. So, it's not that it has no use without checksumming. I only use 
> RAID 1 and 1-0. They have lower data-loss probabilities than RAID 5.


Granted, RAID is not a backup solution; it should instead protect one
from faulty hardware.  Which is exactly what it doesn't do 100%:
if one of the drives in the array returns corrupted data, the RAID
system can't say which one it is (based purely on the returned data).
Granted, disks don't usually return random data without any other
failure or symptom.
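The detect-but-not-locate limitation can be illustrated with the XOR
parity that RAID5 uses; a toy sketch with disk blocks modelled as small
integers:

```shell
# Hypothetical RAID5 stripe: three data blocks plus one XOR parity block.
d1=170; d2=204; d3=15
p=$(( d1 ^ d2 ^ d3 ))             # parity written when the stripe was created

s_clean=$(( d1 ^ d2 ^ d3 ^ p ))   # scrub of intact data: syndrome is 0
s_d2=$(( d1 ^ 205 ^ d3 ^ p ))     # disk 2 silently flips its lowest bit
s_d3=$(( d1 ^ d2 ^ 14 ^ p ))      # disk 3 flipping its lowest bit instead

# Both corruptions yield the SAME non-zero syndrome, so the mismatch is
# detected, but parity alone cannot say which disk lied.
echo "clean=$s_clean d2-corrupt=$s_d2 d3-corrupt=$s_d3"   # clean=0 d2-corrupt=1 d3-corrupt=1
```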

With regard to file-system scrubbing, to my knowledge only file-systems
that actually checksum data can do this, which on Linux currently means
either BTRFS or ZFS.



> All -sync properties are ineffective with NAS, because the network layer and 
> far-end OS decide on actual commit strategies. So you might as well stop 
> deceiving yourself and disable write barriers.

I think that barriers have other implications, especially for
journaled file-systems.



> You will use subvolumes the moment you start making snapshots. So be careful 
> to not deceive yourself. A forgotten snapshot can easily get you into trouble 
> the moment you move off some volumes to make room for a large addition, just 
> to realise no space opened up at all.

This is true.  It is true even of OpenAFS backup volumes.  :)


Ciprian.
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Advice on using BTRFS for vicep partitions on Linux

2023-03-22 Thread Ciprian Craciun
On Wed, Mar 22, 2023 at 3:11 PM Dirk Heinrichs  wrote:
> > it's not in-kernel; which means sooner or later one would encounter
> > problems.
>
> Can you please elaborate? I run two ZFS systems @home where one is an
> OpenAFS fileserver and client, the other one a client only. They both
> started as Debian Stretch and have been updated to Buster and then
> Bullseye and I've never had any problems because of ZFS being
> out-of-tree. The Debian DKMS system does quite a good job.


Well, I base this supposition on a simple observation about OpenAFS's
own client, which is also out-of-tree and requires custom module builds
(via DKMS or equivalent).

For example I use OpenSUSE Tumbleweed (rolling release), and sometimes
I need to delay my updates until the distribution manages to get the
modules ready (with the latest Linux kernel).

Granted, this doesn't usually happen on OpenSUSE Leap, although (for
some reason) the package manager from time to time decides to remove
the old `libafs.ko` for the currently running kernel, which (in case
`afsd` must be restarted, for example during updates) requires me to
reboot the system.

I bet the same applies to ZFS also.

Ciprian.


Re: [OpenAFS] Advice on using BTRFS for vicep partitions on Linux

2023-03-22 Thread Ciprian Craciun
On Tue, Mar 21, 2023 at 9:32 PM  wrote:
> The main ingredient on BTRFS is to disable Copy-on-Write for the respective 
> partition. This also somewhat mitigates surprising out-of-space issues.


What is the reason behind disabling copy-on-write for BTRFS?  Does it
break OpenAFS in some way, or is it only the out-of-space issue?



> You need to provide the 'nodatacow' mount option.
> You lose data checksumming and compression on BTRFS. So, reasonable RAID 
> config and scrubbing may be more important, now.


Unfortunately (at least for my use-case) losing the checksumming and
compression is a no-go, because these were exactly the features that
made BTRFS appealing versus Ext4.
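For reference, the quoted `nodatacow` approach would look roughly like
this (device name and mount point are illustrative; `chattr +C` is the
per-directory alternative):

```shell
# Whole-partition: mount the vicep file-system with CoW disabled.
# Equivalent /etc/fstab entry:  /dev/md0  /vicepa  btrfs  nodatacow,noatime  0 0
mount -o nodatacow,noatime /dev/md0 /vicepa

# Per-directory alternative: mark the directory NOCOW; only files
# created afterwards inherit the attribute.
chattr +C /vicepa
```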

Also, regarding RAID scrubbing, it doesn't cover the issue of
checksumming, because (for example with RAID5) it can only detect that
one of the disks has corrupted data, but couldn't say which.

(As an alternative to file-system-provided checksumming, at least on
Linux, there is `dm-integrity`, configured via `integritysetup`, which
can provide checksumming at the block level;  but at the moment I'm
still experimenting with it for other use-cases.)
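A minimal `dm-integrity` sketch, assuming a dedicated device (the
device name is illustrative, and `format` destroys existing data):

```shell
# Format the block device with integrity metadata, then open it as a
# mapped device that verifies checksums on every read.
integritysetup format /dev/sdX
integritysetup open   /dev/sdX integr0

# Any file-system can then sit on top of the checksummed mapping:
mkfs.ext4 /dev/mapper/integr0
mount /dev/mapper/integr0 /vicepa
```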



> Additionally, depending on your exact setup, you may want to disable write 
> barriers (e.g. for network attached storage, 'nobarrier') when it is without 
> effect.


Could you elaborate more on this?  I guess it doesn't apply to
directly attached disks.  Is this meant to increase write performance,
or is there another reason?

Have you also changed the `-sync` file-server option?

I'm using `-sync onclose` to be sure that my data is actually stored
on the disk.  The write performance does suffer, especially for
use-cases like Git, where some simple operations (like repacking) take
forever (because for some reason Git tries to touch each and every
`.git/objects/XX` folder...)
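For context, `-sync` is an option on the fileserver's own command line;
an illustrative fragment (binary path and server name are assumptions,
the actual configured command can be checked with `bos status -long`):

```shell
# Inspect the fileserver bnode's configured command line:
bos status fs1.example.com -long

# The fileserver itself would then be started with, e.g.:
/usr/lib/openafs/fileserver -sync onclose
```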



> Last remark. BTRFS, to my knowledge, does not support reservations. You MUST 
> make sure to use a pre-allocated storage for the /vicepX mountpoint or the 
> ugly day of failing AFS writes will come during your next overseas vacation.


You mean in the case `/vicepX` is a separate volume, but on the same
actual disk with other volumes, right?

(In my case I intend to use a dedicated BTRFS disk, over RAID, without
any subvolumes.)
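Even with a dedicated disk, BTRFS can hit out-of-space conditions while
`df` still shows free room, because space is handed out in chunks; a
sketch of the usual checks (mount point illustrative):

```shell
# Show per-profile allocation, including remaining unallocated device space:
btrfs filesystem usage /vicepa

# Reclaim under-used data chunks if unallocated space runs low:
btrfs balance start -dusage=50 /vicepa
```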



> ZFS, although you don't want to go that way, works fine as well. Again, make 
> sure to create a filesystem (i.e. subvolume) with a fixed reservation. AFAIK 
> the FS takes care of providing enough space although you cannot disable COW. 
> You keep all the goodies, duplication, deduplication, checksumming. I would 
> suggest reading on ZFS setups for heavy database loads, should I have got you 
> interested.


Thanks for the ZFS suggestions; however, for me ZFS is a complete
no-go for one main reason:  it's not in-kernel, which means sooner or
later one would encounter problems.  A secondary reason is complexity:
I use OpenAFS for my own "self-hosted" / "home" needs, thus I want
something I can easily debug in case something goes wrong.  ZFS
doesn't give me much peace of mind;  too complex, too many options...

Thanks,
Ciprian.


[OpenAFS] Advice on using BTRFS for vicep partitions on Linux

2023-03-21 Thread Ciprian Craciun
Hello all!

I've searched the mailing-list archive (and the internet) about this
topic, but came up a bit empty-handed, as the last proper mention of
BTRFS on this mailing list was 10 years ago.

So, given so much time has passed, I would like to ask the OpenAFS
community if anyone has any experience on using BTRFS (on Linux) as a
file-system for `/vicepX` partitions.

Based on what I know, the OpenAFS file-server doesn't have many
requirements for the underlying file-system besides the usual POSIX
meta-data (protection mode, uid+gid, and timestamps?), thus I see no
reason why it wouldn't work.  (But before trying it out, I thought
I'd ask.)  :)



Currently I use Ext4 over MD RAID5, and all is nice.  However, I'm
testing out BTRFS as a successor to Ext4, for three main reasons:
* data integrity;  (both to detect bit-rot, and possible software
glitches that might corrupt my files;)
* compression;  (especially for text files and similar;)
* snapshots;  (these could provide a guard against any possible OpenAFS
glitches, although to date I've only suffered one symlink corruption;)
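For what it's worth, these features map onto standard BTRFS tooling; a
sketch with illustrative device and mount-point names:

```shell
# Mount with transparent compression (data checksumming is on by default):
mount -o compress=zstd:3,noatime /dev/md0 /vicepa

# Periodically verify all data and metadata checksums (foreground run):
btrfs scrub start -B /vicepa

# Create a read-only snapshot as a guard against software glitches:
btrfs subvolume snapshot -r /vicepa /vicepa/.snapshots/2023-03-21
```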



Yes, I know there is also ZFS, but I don't want to go that route.  At
the moment I want to take a more conservative approach and just use
BTRFS over RAID5 as with the previous Ext4 (and JFS before that).

Thanks,
Ciprian.