On Sat, Nov 15, 2025 at 7:16 AM Marcel Menzel <[email protected]> wrote:
> For the PostgreSQL upgrade to version 18, I took the opportunity to test
> the reflink support in pg_upgrade (with --clone) on OpenZFS 2.3.4 /
> Linux 6.15.11 and it worked flawlessly, being a huge time saver here.

Nice!

> I've looked into the documentation for pg_upgrade and it only
> mentions btrfs and XFS on Linux and not FreeBSD at all, so I thought
> it'd be an interesting heads-up to report that Linux gained a 3rd FS and
> also, I think, FreeBSD in general gained the ability to do reflink copies.

It does mention both Linux and FreeBSD under --copy-file-range.  I
didn't try to list all the relevant file systems there though, partly
because I didn't feel like documenting all the quirks (only works if
you created your XFS file system with the feature enabled, might need
to frobnicate a ZFS sysctl, which NFS clients and servers can push it
down, likewise for non-COW file systems and device drivers, etc etc).
It might be nice to find a decent reference for all that stuff
somewhere else and point to it, but I don't think we can maintain that
accurately ourselves.

I was actually surprised to hear that ioctl(dest_fd, FICLONE, src_fd)
worked for you.  I knew that it was really BTRFS's ioctl and XFS
accepted it too, but I didn't know that ZFS also understood it[1] in
2.3.  They apparently didn't really expect anyone to call it, and
since ZFS 2.4 is apparently about to ship without it[2], it seems like
a bad time to add it to the documentation for --clone.

> OpenZFS has been supporting this since 2.2 but has had it disabled due
> to data corruption bugs, now since 2.3 the sysctl (zfs_bclone_enabled on
> Linux, vfs.zfs.bclone_enabled on FreeBSD) has been enabled by default so
> only the zpool feature "block_cloning" has to be enabled, which might be
> the case when running "zpool upgrade".

Yeah, those data corruption reports (which turned out to be
misattributed IIRC?) provided one reason to keep the old BTRFS ioctl()
under --clone but add the new behaviour under --copy-file-range.
--copy-file-range should work for all COW filesystems on Linux via
proper VFS entrypoints, and is the official way to do this from user
space.  Perhaps we should eventually harmonise this under a single
option and drop the ioctl() stuff.  One semantic change would be that
copy_file_range() means "copy with your best trick" (could be cloning,
network/driver pushdown or user space buffer copy, silently selecting
the behaviour), while the BTRFS ioctl() means "clone or fail" IIRC, so
that was another reason to want a separate option for now.
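
To make that difference concrete, here is a minimal sketch in Python
(pg_upgrade itself is C, and this is not its code).  The function names
are made up for illustration, and the FICLONE value is the Linux request
number from <linux/fs.h> as it comes out on x86 and arm; other
architectures may encode it differently.

```python
import fcntl
import os

# Assumed value of FICLONE, i.e. _IOW(0x94, 9, int), on x86/arm Linux.
FICLONE = 0x40049409


def clone_or_fail(src_path, dst_path):
    """Share blocks via the FICLONE ioctl.

    Raises OSError if the file system can't clone; there is no fallback.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # The ioctl is issued on the destination fd, with the source fd
        # passed by value as the argument.
        fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())


def copy_best_effort(src_path, dst_path):
    """Copy via copy_file_range().

    The kernel may clone, push the copy down to a server or device, or
    copy bytes itself; the caller can't tell which one happened.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            # May copy fewer bytes than requested, so loop until done.
            n = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if n == 0:
                break  # unexpected EOF on the source
            remaining -= n
```

On a file system without reflink support, clone_or_fail() raises while
copy_best_effort() normally still produces a full copy, which is the
"clone or fail" vs "best trick" distinction above.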

For reference, the macOS copyfile() call used for --clone has flags
that should cause it to fail if it can't clone IIUC, while the Windows
CopyFile() call used for --copy might even clone blocks on ReFS even
if you don't specify --clone... huh.

> I haven't had the possibility to check this on FreeBSD yet, but I don't
> see why this should not work as I also can't spot anything in the
> OpenZFS docs regarding reflink / block cloning limitations on FreeBSD.
> Also I saw one of the OpenZFS devs writing on Reddit about block cloning
> being supported on FreeBSD v14.

copy_file_range() always succeeds on FreeBSD, but it only actually
clones if you set vfs.zfs.bclone_enabled=1.  I've tested all our "clone"
features with that and they work nicely.  The sysctl wasn't on by
default in FreeBSD 14.x, but 15 is about to ship and the "experimental"
label was removed in man 4 zfs.

If you haven't seen them yet, you might also like these COW tricks:

Shared storage of basic catalog tables when you have a lot of databases:
SET file_copy_method = CLONE;
CREATE DATABASE ... STRATEGY=FILE_COPY;

Fast database clone/snapshot of very large databases (caveats: users
can't be connected to source, checkpoint forced):
SET file_copy_method = CLONE;
CREATE DATABASE ... STRATEGY=FILE_COPY TEMPLATE=source_db;

Combine a chain of incremental backups and a full backup to produce a
new full backup, sharing disk blocks with the ancestor backups:
pg_combinebackup --copy-file-range

That last one is a really powerful use of copy_file_range()'s subfile
cloning powers.  Another subfile cloning trick I've proposed before is
making relation segment size user-controllable, and then allowing
pg_upgrade to migrate between segment sizes by splicing them together.
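
As a rough illustration of what "subfile" means here (a hypothetical
sketch in Python, not pg_combinebackup or pg_upgrade code):
copy_file_range() takes explicit source and destination offsets, so
ranges from several files can be placed into one output file.  Whether
the blocks end up shared depends on the file system and, as far as I
know, on the ranges being block-aligned.  The function name is made up.

```python
import os


def splice_files(segment_paths, dst_path):
    """Concatenate the given files into dst_path using copy_file_range()
    with explicit offsets.  Returns the number of bytes written."""
    dst_off = 0
    with open(dst_path, "wb") as dst:
        for path in segment_paths:
            with open(path, "rb") as src:
                size = os.fstat(src.fileno()).st_size
                src_off = 0
                while src_off < size:
                    # With explicit offsets the fds' file positions are
                    # left untouched, so we track both offsets ourselves.
                    n = os.copy_file_range(
                        src.fileno(), dst.fileno(),
                        size - src_off, src_off, dst_off)
                    if n == 0:
                        break  # unexpected EOF on the source
                    src_off += n
                    dst_off += n
    return dst_off
```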

[1] https://github.com/openzfs/zfs/commit/9927f219f1e9f4ee886d426190500abf5b1d602e
[2] https://github.com/openzfs/zfs/commit/4800181b3b950d67a62aca7c9e28d34c8b303242

