I have opened a PR for this (includes patch):
https://github.com/openzfs/zfs/pull/14713

We still need to address the a_fsizetd problem.

Maybe

On 4. 4. 2023 7:22, Rick Macklem wrote:
On Mon, Apr 3, 2023 at 10:07 PM Cy Schubert <cy.schub...@cschubert.com> wrote:
In message <CAM5tNy4p5UbAkujksyTR=J0tGuDf=Y6v2LuctK4P_p-H3PZpPg@mail.gmail.com>, Rick Macklem writes:
On Mon, Apr 3, 2023 at 9:00 PM Cy Schubert <Cy.Schubert@cschubert.com> wrote:
In message <202304031513.333fd6qw014...@gitrepo.freebsd.org>, Martin Matuska writes:
The branch main has been updated by mm:

URL: https://cgit.FreeBSD.org/src/commit/?id=2a58b312b62f908ec92311d1bd8536dbaeb8e55b

commit 2a58b312b62f908ec92311d1bd8536dbaeb8e55b
Merge: b98fbf3781df 431083f75bdd
Author:     Martin Matuska <m...@freebsd.org>
AuthorDate: 2023-04-03 14:49:30 +0000
Commit:     Martin Matuska <m...@freebsd.org>
CommitDate: 2023-04-03 14:49:30 +0000

     zfs: merge openzfs/zfs@431083f75

     Notable upstream pull request merges:
       #12194 Fix short-lived txg caused by autotrim
       #13368 ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
       #13392 Implementation of block cloning for ZFS
       #13741 SHA2 reworking and API for iterating over multiple implementations
       #14282 Sync thread should avoid holding the spa config write lock
              when possible
       #14283 txg_sync should handle write errors in ZIL
       #14359 More adaptive ARC eviction
       #14469 Fix NULL pointer dereference in zio_ready()
       #14479 zfs redact fails when dnodesize=auto
       #14496 improve error message of zfs redact
       #14500 Skip memory allocation when compressing holes
       #14501 FreeBSD: don't verify recycled vnode for zfs control directory
       #14502 partially revert PR 14304 (eee9362a7)
       #14509 Fix per-jail zfs.mount_snapshot setting
       #14514 Fix data race between zil_commit() and zil_suspend()
       #14516 System-wide speculative prefetch limit
       #14517 Use rw_tryupgrade() in dmu_bonus_hold_by_dnode()
       #14519 Do not hold spa_config in ZIL while blocked on IO
       #14523 Move dmu_buf_rele() after dsl_dataset_sync_done()
       #14524 Ignore too large stack in case of dsl_deadlist_merge
       #14526 Use .section .rodata instead of .rodata on FreeBSD
       #14528 ICP: AES-GCM: Refactor gcm_clear_ctx()
       #14529 ICP: AES-GCM: Unify gcm_init_ctx() and gmac_init_ctx()
       #14532 Handle unexpected errors in zil_lwb_commit() without ASSERT()
       #14544 icp: Prevent compilers from optimizing away memset()
              in gcm_clear_ctx()
       #14546 Revert zfeature_active() to static
       #14556 Remove bad kmem_free() oversight from previous zfsdev_state_list
              patch
       #14563 Optimize the is_l2cacheable functions
       #14565 FreeBSD: zfs_znode_alloc: lock the vnode earlier
       #14566 FreeBSD: fix false assert in cache_vop_rmdir when replaying ZIL
       #14567 spl: Add cmn_err_once() to log a message only on the first call
       #14568 Fix incremental receive silently failing for recursive sends
       #14569 Restore ASMABI and other Unify work
       #14576 Fix detection of IBM Power8 machines (ISA 2.07)
       #14577 Better handling for future crypto parameters
       #14600 zcommon: Refactor FPU state handling in fletcher4
       #14603 Fix prefetching of indirect blocks while destroying
       #14633 Fixes in persistent error log
       #14639 FreeBSD: Remove extra arc_reduce_target_size() call
       #14641 Additional limits on hole reporting
       #14649 Drop lying to the compiler in the fletcher4 code
       #14652 panic loop when removing slog device
       #14653 Update vdev state for spare vdev
       #14655 Fix cloning into already dirty dbufs
       #14678 Revert "Do not hold spa_config in ZIL while blocked on IO"

     Obtained from:  OpenZFS
     OpenZFS commit: 431083f75bdd3efaee992bdd672625ec7240d252
Another problem related to copy_file_range() is the following example.

slippy$ df -h build/make/Makefile .
Filesystem        Size    Used   Avail Capacity  Mounted on
t/wrkdir/amd64     52G     53M     52G     0%    /export/wrkdir/amd64
t/wrkdir/amd64     52G     53M     52G     0%    /export/wrkdir/amd64
slippy$ cp build/make/Makefile .
cp: build/make/Makefile: Cross-device link
slippy$

And,

slippy$ cp y4menc.h foobar
cp: y4menc.h: Cross-device link
slippy$

But the following works because /tmp is mfs and /var/tmp is ufs.

slippy$ cp y4menc.h /tmp
slippy$
slippy$ cp y4menc.h /var/tmp
slippy$

Copying files from one zpool to a dataset in another zpool also works as
does copying files from different datasets on the same zpool. Only copying
files from/to the same dataset results in cross device link.

zfs_copy_file_range() will only be called if the source and destination
both exist on the same mount point. (I'm guessing that's what you mean
by "same dataset".)
Correct.
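
As a rough userland illustration of the "same mount point" condition, the
sketch below compares the st_dev values that stat(2) reports for two paths.
This is only a proxy and is not how the kernel decides which copy routine to
call; the helper name same_filesystem() is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if both paths report the same st_dev (same filesystem),
 * 0 if they differ, and -1 if either stat() call fails.
 */
int
same_filesystem(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return (-1);
	return (sa.st_dev == sb.st_dev);
}
```

Calling same_filesystem("build/make/Makefile", ".") in the failing example
above would be expected to return 1, since df shows both paths on
t/wrkdir/amd64.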

The long term solution to the problem (for users) would be to enable
block_cloning. One can test this as follows: checkpoint the pool, enable
block_cloning (the problem goes away), then export the pool and import it
with --rewind-to-checkpoint to look at the regression again.

As to why we see a Cross-device link,

dtrace: script './dtrace.d' matched 1 probe
CPU     ID                    FUNCTION:NAME
   3  26104        vn_copy_file_range:return int64_t 0x12

0x12 == EXDEV.
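
For anyone who wants to double-check the mapping on their own host, here is a
minimal C sketch. It assumes the host's <errno.h> defines EXDEV as 18 (0x12),
as FreeBSD does; the helper names is_exdev() and describe_errno() are
hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when a traced return value equals EXDEV on this host. */
int
is_exdev(int64_t rv)
{
	return (rv == EXDEV);
}

/* Returns the host's error string for an errno-style return value. */
const char *
describe_errno(int64_t rv)
{
	return (strerror((int)rv));
}
```

Passing the traced value 0x12 to describe_errno() should yield the same text
cp printed, although the exact wording of the string varies between systems.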

As to why, zfs_clone_range() is called unconditionally at
zfs/zfs_vnops_os.c:6292. If the feature is not enabled zfs_clone_range()
sets the return code to EXDEV at line 1083 of zfs_vnops.c. Therefore
block_cloning MUST be enabled.

To reiterate, the long term solution would be to enable block_cloning. Some
users will either wish or need to defer enabling block_cloning, for example
because the zpool is occasionally imported with write access by older
versions of FreeBSD, or for some other reason.

If there is a way to test to see if block cloning is enabled at the beginning
of zfs_copy_file_range(), it should just call vn_generic_copy_file_range()
if it is not enabled (or something like that).
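
The suggestion above can be sketched as a small userland mock. This is not
kernel code and not the actual patch: every mock_* name and the boolean flag
are hypothetical stand-ins for zfs_copy_file_range(), zfs_clone_range(),
vn_generic_copy_file_range() and the feature check, whose real form is not
given here.

```c
#include <errno.h>
#include <stdbool.h>

/* Records which path handled the copy; exists only for this illustration. */
enum copy_path { PATH_NONE, PATH_CLONE, PATH_GENERIC };

static enum copy_path last_path = PATH_NONE;

/* Stand-in for zfs_clone_range(): returns EXDEV when the feature is off. */
static int
mock_clone_range(bool cloning_enabled)
{
	if (!cloning_enabled)
		return (EXDEV);
	last_path = PATH_CLONE;
	return (0);
}

/* Stand-in for vn_generic_copy_file_range(): an ordinary data copy. */
static int
mock_generic_copy(void)
{
	last_path = PATH_GENERIC;
	return (0);
}

/*
 * Sketch of the proposed logic: test the feature first and use the
 * generic copy when block cloning is not enabled, so EXDEV never
 * reaches the caller in that case.
 */
int
mock_copy_file_range(bool cloning_enabled)
{
	if (!cloning_enabled)
		return (mock_generic_copy());
	return (mock_clone_range(cloning_enabled));
}

/* Accessor so a caller can see which path was taken. */
enum copy_path
mock_last_path(void)
{
	return (last_path);
}
```

With the flag false, mock_copy_file_range() returns 0 through the generic
path instead of surfacing EXDEV, which is the behaviour cp needs.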

rick

--
Cheers,
Cy Schubert <cy.schub...@cschubert.com>
FreeBSD UNIX:  <c...@freebsd.org>   Web:  https://FreeBSD.org
NTP:           <c...@nwtime.org>    Web:  https://nwtime.org

                         e^(i*pi)+1=0



