Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 01:30:25PM -0800, Rick Macklem wrote:
> On Tue, Nov 14, 2023 at 1:20 PM Konstantin Belousov wrote:
> > On Tue, Nov 14, 2023 at 06:47:46PM +0100, Mateusz Guzik wrote:
> > > The crash at hand comes from nullfs. If "outward" vnodes are both
> > > nullfs, but only one underlying vnode is zfs, you get the above.
> >
> > If this is the reason, the check must be done by nullfs bypass for
> > vop_copy_file_range().
> I suppose this is a reasonable alternative, although it means that
> all stacked file systems will need the check.
> It just seems easier to do it in the actual VOPs, but it is up to others.

In theory, we can eventually have many more implementations of the VOP than callers of it. I.e., fixing all stacked filesystems just means adding unionfs to my patch. Forcing this requirement on all future VOP_COPY_FILE_RANGE implementations is, IMO, not good. BTW, we already have zfs, nfs, and fuse implementing the VOP.

> Btw, the stuff above the VOP_COPY_FILE_RANGE() that busies the
> mounts and checks mnt_vfc being the same could be dropped, if the
> VOP_COPY_FILE_RANGE() calls like NFS were careful to lock the
> vnodes before doing a "same fs type or same mount" check.
> (I suppose that would be a subtle change in VOP semantics that
> is arguably not allowed for a minor version.)
>
> Anyhow, I am happy with whatever others decide.
>
> rick
Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 1:20 PM Konstantin Belousov wrote:
> On Tue, Nov 14, 2023 at 06:47:46PM +0100, Mateusz Guzik wrote:
> > The crash at hand comes from nullfs. If "outward" vnodes are both
> > nullfs, but only one underlying vnode is zfs, you get the above.
>
> If this is the reason, the check must be done by nullfs bypass for
> vop_copy_file_range().

I suppose this is a reasonable alternative, although it means that all stacked file systems will need the check. It just seems easier to do it in the actual VOPs, but it is up to others.

Btw, the code above the VOP_COPY_FILE_RANGE() call that busies the mounts and checks that mnt_vfc is the same could be dropped if the VOP_COPY_FILE_RANGE() implementations, such as NFS's, were careful to lock the vnodes before doing a "same fs type or same mount" check. (I suppose that would be a subtle change in VOP semantics that is arguably not allowed for a minor version.)

Anyhow, I am happy with whatever others decide.

rick
Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 1:15 PM Mateusz Guzik wrote:
>
> On 11/14/23, Rick Macklem wrote:
> > Although it would be nice to do the check before the VOP call, I don't
> > see an easy way to do that.
> >
> > It looks like the simple solution is to add a check in each of the
> > VOP_COPY_FILE_RANGE() calls, such as mjg@ has proposed
> > for ZFS. At this point there is only the three and I can easily do the
> > NFS one.
> >
>
> All filesystems except for zfs are already covered because they check
> for mismatched mount.

Yes, now that the mount point(s) are busied. The NFS check is done before the vnodes are locked, so it is unsafe without busying the mount points. (That was not my patch, but I missed the problem during review.)

rick
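Why the busy reference matters can be shown with a single-threaded model. It is a sketch with hypothetical names (model_busy(), model_unmount() and friends); it is only in the spirit of vfs_busy(9)/vfs_unbusy(9), and the kernel's unmount path handles busied mounts differently in detail (this model simply refuses with EBUSY). What it illustrates: a "same mount" answer is only trustworthy while both mounts are pinned, so the comparison and the copy that relies on it should both happen under the busy references.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mount: a count of busy references and an "unmounted" flag. */
struct model_mount {
	int  busy;
	bool unmounted;
};

/* Take a busy reference; fails once the mount is gone. */
static int
model_busy(struct model_mount *mp)
{
	if (mp->unmounted)
		return (ENOENT);
	mp->busy++;
	return (0);
}

static void
model_unbusy(struct model_mount *mp)
{
	assert(mp->busy > 0);
	mp->busy--;
}

/* Simplification: unmount is refused while any busy reference is held. */
static int
model_unmount(struct model_mount *mp)
{
	if (mp->busy > 0)
		return (EBUSY);
	mp->unmounted = true;
	return (0);
}

/*
 * Busy both mounts before comparing them, and keep them busied for as long
 * as the answer is relied upon (in the kernel: for the whole copy).
 */
static int
model_copy_same_mount_check(struct model_mount *inmp,
    struct model_mount *outmp, bool *same)
{
	int error;

	if ((error = model_busy(inmp)) != 0)
		return (error);
	if ((error = model_busy(outmp)) != 0) {
		model_unbusy(inmp);
		return (error);
	}
	*same = (inmp == outmp);
	/* ... the copy would run here, with neither mount able to go away ... */
	model_unbusy(outmp);
	model_unbusy(inmp);
	return (0);
}
```

In the kernel the danger is a concurrent unmount on another thread; this model only checks the bookkeeping that keeps such an unmount out.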
Re: crash zfs_clone_range()
On 11/14/23, Rick Macklem wrote:
> Although it would be nice to do the check before the VOP call, I don't
> see an easy way to do that.
>
> It looks like the simple solution is to add a check in each of the
> VOP_COPY_FILE_RANGE() calls, such as mjg@ has proposed
> for ZFS. At this point there is only the three and I can easily do the
> NFS one.
>

All filesystems except for zfs are already covered, because they check for mismatched mounts.

--
Mateusz Guzik
Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 10:46 AM Alexander Motin wrote:
> Thinking again, what happen if there are two nullfs mounts on top of two
> different file systems, one of which is indeed not ZFS? Do we need to
> add those checks to all ZFS, NFS and FUSE, implementing
> VOP_COPY_FILE_RANGE, or it is responsibility of nullfs or VFS?

Although it would be nice to do the check before the VOP call, I don't see an easy way to do that.

It looks like the simple solution is to add a check in each of the VOP_COPY_FILE_RANGE() implementations, such as the one mjg@ has proposed for ZFS. At this point there are only the three (ZFS, NFS and FUSE), and I can easily do the NFS one.

rick
Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 07:51:39PM +0100, Mateusz Guzik wrote:
> I already advocated for not trying to guess for filesystems what they
> can or cannot handle internally.
>
> That is to say vn_copy_file_range should call VOP_COPY_FILE_RANGE,
> that can try to figure out what to do and if it got nothing punt to a
> fallback. This already happens for some of the cases.
>
It is nullfs that is to blame there. See https://reviews.freebsd.org/D42603
Re: crash zfs_clone_range()
On 11/14/23, Alexander Motin wrote:
> Thinking again, what happen if there are two nullfs mounts on top of two
> different file systems, one of which is indeed not ZFS? Do we need to
> add those checks to all ZFS, NFS and FUSE, implementing
> VOP_COPY_FILE_RANGE, or it is responsibility of nullfs or VFS?
>

I already advocated for not trying to guess, on behalf of filesystems, what they can or cannot handle internally.

That is to say, vn_copy_file_range should call VOP_COPY_FILE_RANGE, which can try to figure out what to do and, if that gets nowhere, punt to a fallback. This already happens for some of the cases.

--
Mateusz Guzik
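The approach Mateusz describes, where the caller always invokes the file system's method and the method decides, might look roughly like this in a user-space model. The structures and function names are hypothetical; generic_copy() stands in for vn_generic_copy_file_range(), and the early return plays roughly the role of the goto bad_write_fallback in Mateusz's patch. The method recognizes its own vnodes by comparing mnt_vfc against its own vfsconf, so it does not depend on what the caller or a stacking layer checked.

```c
#include <assert.h>

struct model_vfsconf { const char *vfc_name; };
struct model_mount   { struct model_vfsconf *mnt_vfc; };
struct model_vnode   { struct model_mount *v_mount; };

/* The (hypothetical) file system type this method belongs to. */
static struct model_vfsconf zfs_vfsconf = { "zfs" };

static int generic_calls, clone_calls;

/* Stand-in for vn_generic_copy_file_range(): slow, but works for any pair. */
static int
generic_copy(struct model_vnode *invp, struct model_vnode *outvp)
{
	(void)invp;
	(void)outvp;
	generic_calls++;
	return (0);
}

/*
 * The caller always invokes the file system's method; the method decides.
 * If either vnode is not on one of its own mounts, it punts to the generic
 * fallback instead of treating foreign vnodes as its own.
 */
static int
zfs_copy_file_range_model(struct model_vnode *invp, struct model_vnode *outvp)
{
	if (invp->v_mount->mnt_vfc != &zfs_vfsconf ||
	    outvp->v_mount->mnt_vfc != &zfs_vfsconf)
		return (generic_copy(invp, outvp));
	clone_calls++;          /* stand-in for the block-cloning path */
	return (0);
}
```

The trade-off raised elsewhere in the thread is that every VOP_COPY_FILE_RANGE implementation then has to carry such a check.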
Re: crash zfs_clone_range()
On 14.11.2023 12:44, Alexander Motin wrote:
> vn_copy_file_range() verifies for that:

Thinking again, what happens if there are two nullfs mounts on top of two different file systems, one of which is indeed not ZFS? Do we need to add those checks to all of ZFS, NFS and FUSE, which implement VOP_COPY_FILE_RANGE, or is it the responsibility of nullfs or VFS?

--
Alexander Motin
Re: crash zfs_clone_range()
On Tue, Nov 14, 2023 at 06:47:46PM +0100, Mateusz Guzik wrote:
> The crash at hand comes from nullfs. If "outward" vnodes are both
> nullfs, but only one underlying vnode is zfs, you get the above.

If this is the reason, the check must be done by the nullfs bypass for vop_copy_file_range().
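For comparison, the kind of check Konstantin suggests, done in the stacking layer before it forwards the call, could be sketched as below. This is a hypothetical user-space model, not the code in D42603: function pointers stand in for VOP_COPY_FILE_RANGE() and vn_generic_copy_file_range(), and the two arguments are the lower vnodes nullfs would otherwise forward. The file system types are compared by mnt_vfc pointer, as in Mateusz's proposed ZFS patch.

```c
#include <assert.h>
#include <stddef.h>

struct model_vfsconf { const char *vfc_name; };
struct model_mount   { struct model_vfsconf *mnt_vfc; };

struct model_vnode;
/* Stand-in for a file system's copy_file_range method; returns 0 or an errno. */
typedef int (*copy_method)(struct model_vnode *, struct model_vnode *);

struct model_vnode {
	struct model_mount *v_mount;
	copy_method         fs_copy;    /* this file system's method, if any */
};

static int generic_calls, fs_calls;

/* Stand-in for vn_generic_copy_file_range(): works for any pair. */
static int
generic_copy(struct model_vnode *invp, struct model_vnode *outvp)
{
	(void)invp;
	(void)outvp;
	generic_calls++;
	return (0);
}

/* Stand-in for a method that assumes both vnodes belong to it (like ZFS). */
static int
fs_specific_copy(struct model_vnode *invp, struct model_vnode *outvp)
{
	(void)invp;
	(void)outvp;
	fs_calls++;
	return (0);
}

/*
 * Hypothetical check in the stacking layer: lowerin/lowerout are the vnodes
 * nullfs would forward.  Only forward when both lower vnodes live on the
 * same file system type; otherwise use the generic copy.
 */
static int
null_copy_file_range_model(struct model_vnode *lowerin,
    struct model_vnode *lowerout)
{
	if (lowerin->v_mount->mnt_vfc != lowerout->v_mount->mnt_vfc ||
	    lowerin->fs_copy == NULL)
		return (generic_copy(lowerin, lowerout));
	return (lowerin->fs_copy(lowerin, lowerout));
}
```

With the check in the stacking layer, a method like the ZFS one never sees a foreign vnode, whatever file systems sit underneath; the cost is that every stacking file system (nullfs, unionfs) needs it.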
Re: crash zfs_clone_range()
On 11/14/23, Alexander Motin wrote:
> vn_copy_file_range() verifies for that:

The crash at hand comes from nullfs. If the "outward" vnodes are both nullfs, but only one underlying vnode is zfs, you get the above.

--
Mateusz Guzik
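Mateusz's explanation can be illustrated with a small user-space model. The types are hypothetical stand-ins, not kernel structures; lower plays the role of the lower vnode that nullfs forwards operations to, and "ufs" is just an example of a non-ZFS file system. The type check in vn_copy_file_range() only sees the upper, nullfs, mounts, so it passes, while the vnodes that reach the lower file system are of different types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct model_vfsconf { const char *vfc_name; };
struct model_mount   { struct model_vfsconf *mnt_vfc; };

/*
 * Hypothetical vnode.  For a nullfs vnode, 'lower' is the vnode that nullfs
 * forwards operations to; it is NULL for other vnodes.
 */
struct model_vnode {
	struct model_mount *v_mount;
	struct model_vnode *lower;
};

/* The vn_copy_file_range()-style test: same mount or same fs type name. */
static bool
same_fs_type(const struct model_vnode *a, const struct model_vnode *b)
{
	return (a->v_mount == b->v_mount ||
	    strcmp(a->v_mount->mnt_vfc->vfc_name,
	    b->v_mount->mnt_vfc->vfc_name) == 0);
}

/* The vnode a bypass would actually hand to the lower file system. */
static const struct model_vnode *
bypass_target(const struct model_vnode *vp)
{
	return (vp->lower != NULL ? vp->lower : vp);
}
```

Both upper vnodes report "nullfs", so the call is routed to VOP_COPY_FILE_RANGE(); the bypass then hands ZFS one ZFS vnode and one non-ZFS vnode, a pair the ZFS method did not expect.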
Re: crash zfs_clone_range()
On 14.11.2023 12:39, Mateusz Guzik wrote:
> One of the vnodes is probably not zfs, I suspect this will do it
> (untested):
>
> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> index 107cd69c756c..e799a7091b8e 100644
> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
> @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
>  			goto bad_write_fallback;
>  		}
>  	}
> +
> +	if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
> +		goto bad_write_fallback;
> +	}
> +
>  	if (invp == outvp) {
>  		if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
>  			goto bad_write_fallback;

vn_copy_file_range() verifies for that:

	/*
	 * If the two vnodes are for the same file system type, call
	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
	 * which can handle copies across multiple file system types.
	 */
	*lenp = len;
	if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
	    outmp->mnt_vfc->vfc_name) == 0)
		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
		    lenp, flags, incred, outcred, fsize_td);
	else
		error = vn_generic_copy_file_range(invp, inoffp, outvp,
		    outoffp, lenp, flags, incred, outcred, fsize_td);

--
Alexander Motin
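The dispatch rule in the vn_copy_file_range() excerpt above can be restated as a small user-space function. The model_* types are hypothetical stand-ins that keep only the fields the condition reads; this is a sketch of the rule, not the kernel code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins: only the fields the check reads are kept. */
struct model_vfsconf {
	const char *vfc_name;           /* "zfs", "ufs", "nullfs", ... */
};

struct model_mount {
	struct model_vfsconf *mnt_vfc;  /* file system type of the mount */
};

enum copy_path { COPY_VOP, COPY_GENERIC };

/*
 * Same condition as the excerpt: the same mount, or two mounts whose file
 * system types share a name, go to VOP_COPY_FILE_RANGE(); anything else goes
 * to vn_generic_copy_file_range().
 */
static enum copy_path
choose_copy_path(const struct model_mount *inmp,
    const struct model_mount *outmp)
{
	if (inmp == outmp ||
	    strcmp(inmp->mnt_vfc->vfc_name, outmp->mnt_vfc->vfc_name) == 0)
		return (COPY_VOP);
	return (COPY_GENERIC);
}
```

The same mount, or two different ZFS mounts, select the VOP path; a ZFS mount and a UFS mount select the generic path. The rule only ever sees the two mounts it is handed.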
Re: crash zfs_clone_range()
On 11/14/23, Ronald Klop wrote:
> Response below
Re: crash zfs_clone_range()
From: Ronald Klop
Date: Tuesday, 14 November 2023 13:59
To: Konstantin Belousov
CC: Alexander Motin , curr...@freebsd.org
Subject: Re: crash zfs_clone_range()

Response below
Re: crash zfs_clone_range()
Response below

From: Konstantin Belousov
Date: Sunday, 12 November 2023 19:47
To: Alexander Motin
CC: Ronald Klop , curr...@freebsd.org
Subject: Re: crash zfs_clone_range()

On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote:
> Hi Ronald,
>
> As I can see, the clone request to ZFS came through nullfs, and it crashed
> immediately on entry. I've never been a VFS layer expert, but to me it may
> be a nullfs problem, not zfs. Is there a chance you were (un-)mounting
> something when this happened?

It is not a nullfs issue, I believe, but the lack of the busy reference on the
upper mount. I think https://reviews.freebsd.org/D42554 should cover it.

>
> On 10.11.2023 05:12, Ronald Klop wrote:
> > Hi,
> >
> > I had this crash today on RPI4/15-CURRENT.
> >
> > FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19
> > main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023
> > ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG
> > arm64
> >
> > $ sysctl -a | grep bclon
> > vfs.zfs.bclone_enabled: 1
> >
> > I started a jail with poudriere to build a package. The jail uses null
> > mounts over ZFS.
> > > > [root]# cu -s 115200 -l /dev/cuaU0 > > Connected > > > > db> bt > > Tracing pid 95213 tid 100438 td 0xe1e97900 > > db_trace_self() at db_trace_self > > db_stack_trace() at db_stack_trace+0x120 > > db_command() at db_command+0x2e4 > > db_command_loop() at db_command_loop+0x58 > > db_trap() at db_trap+0x100 > > kdb_trap() at kdb_trap+0x334 > > handle_el1h_sync() at handle_el1h_sync+0x18 > > --- exception, esr 0xf200 > > kdb_enter() at kdb_enter+0x48 > > vpanic() at vpanic+0x1dc > > panic() at panic+0x48 > > data_abort() at data_abort+0x2fc > > handle_el1h_sync() at handle_el1h_sync+0x18 > > --- exception, esr 0x9604 > > rms_rlock() at rms_rlock+0x1c > > zfs_clone_range() at zfs_clone_range+0x68 > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > > null_bypass() at null_bypass+0x118 > > vn_copy_file_range() at vn_copy_file_range+0x18c > > kern_copy_file_range() at kern_copy_file_range+0x36c > > sys_copy_file_range() at sys_copy_file_range+0x8c > > do_el0_sync() at do_el0_sync+0x634 > > handle_el0_sync() at handle_el0_sync+0x48 > > --- exception, esr 0x5600 > > > > > > Oh.. While typing this I rebooted the machine and it happened again. I > > didn't start anything in particular although the machine runs some > > jails. 
> > > > x0: 0x00e0 > >x1: 0xa00090317a48 > >x2: 0xa000f79d4f00 > >x3: 0xa000c61a44a8 > >x4: 0xdeefe460 ($d.2 + 0xdd776560) > >x5: 0xa001250e4c00 > >x6: 0xe54025b5 ($d.5 + 0xc) > >x7: 0x030a > >x8: 0xe1559000 ($d.2 + 0xdfdd1100) > >x9: 0x0001 > > x10: 0x > > x11: 0x0001 > > x12: 0x0002 > > x13: 0x > > x14: 0x0001 > > x15: 0x > > x16: 0x016dce88 (__stop_set_modmetadata_set + 0x1310) > > x17: 0x004e0d44 (rms_rlock + 0x0) > > x18: 0xdeefe280 ($d.2 + 0xdd776380) > > x19: 0x > > x20: 0xdeefe460 ($d.2 + 0xdd776560) > > x21: 0x7fff > > x22: 0xa00090317a48 > > x23: 0xa000f79d4f00 > > x24: 0xa001067ef910 > > x25: 0x00e0 > > x26: 0xa000158a8000 > > x27: 0x > > x28: 0xa000158a8000 > > x29: 0xdeefe280 ($d.2 + 0xdd776380) > >sp: 0xdeefe280 > >lr: 0x01623564 (zfs_clone_range + 0x6c) > > elr: 0x004e0d60 (rms_rlock + 0x1c) > > spsr: 0xa045 > > far: 0x0108 > > esr: 0x9604 > > panic: data abort in critical section or under mutex > > cpuid = 1 > > time = 1699610885 > > KDB: stack backtrace: > > db_trace_self() at db_trace_self > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38 > > vpanic() at vpanic+0x1a0 > > panic() at panic+0x48 > > data_abort() at data_abort+0x2fc > > handle_el1h_sync() at handle_el1h_sync+0x18 > > --- exception, esr 0x9604 > > rms_rlock() at rms_rlock+0x1c > > zfs_clone_range() at zfs_clone_range+0x68 > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c > > null_bypass() at null_bypass+0x118 > > vn_copy_file_range() at vn_copy_file
Re: crash zfs_clone_range()
On Sun, Nov 12, 2023 at 11:51:40AM -0500, Alexander Motin wrote:
> Hi Ronald,
>
> As I can see, the clone request to ZFS came through nullfs, and it crashed
> immediately on entry. I've never been a VFS layer expert, but to me it may
> be a nullfs problem, not zfs. Is there a chance you were (un-)mounting
> something when this happened?

It is not a nullfs issue, I believe, but the lack of the busy reference
on the upper mount. I think https://reviews.freebsd.org/D42554 should
cover it.

> On 10.11.2023 05:12, Ronald Klop wrote:
> > Hi,
> >
> > Had this crash today on RPI4/15-CURRENT.
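kib attributes the panic to a missing busy reference on the upper mount. The toy model below illustrates why such a reference matters: while an operation holds it, an unmount may start but cannot finish tearing down per-mount state. It is a single-threaded sketch with hypothetical names (toy_mount, toy_busy() and friends); it is not the vfs_busy(9) API and not the change in D42554, and the error values are chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* HYPOTHETICAL toy model of a mount busy reference, not kernel code. */
struct toy_mount {
	int	busy;		/* in-flight busy references */
	bool	unmounting;	/* an unmount has started */
	bool	unmounted;	/* per-mount state has been torn down */
};

/* Take a busy reference; refuse once an unmount has begun. */
static int
toy_busy(struct toy_mount *mp)
{
	if (mp->unmounting)
		return (ENOENT);
	mp->busy++;
	return (0);
}

/* Drop a busy reference taken by toy_busy(). */
static void
toy_unbusy(struct toy_mount *mp)
{
	assert(mp->busy > 0);
	mp->busy--;
}

/* Start an unmount: from now on, new busy attempts fail. */
static void
toy_unmount_begin(struct toy_mount *mp)
{
	mp->unmounting = true;
}

/* Tear down per-mount state only after all busy references drained. */
static int
toy_unmount_finish(struct toy_mount *mp)
{
	if (mp->busy > 0)
		return (EBUSY);
	mp->unmounted = true;
	return (0);
}

/* An operation that holds a busy reference for its whole duration. */
static int
toy_copy(struct toy_mount *mp)
{
	int error;

	if ((error = toy_busy(mp)) != 0)
		return (error);
	/* Per-mount state cannot be torn down while we are busy. */
	assert(!mp->unmounted);
	toy_unbusy(mp);
	return (0);
}
```

With a reference held before toy_unmount_begin(), toy_unmount_finish() returns EBUSY until toy_unbusy() drops it, and a copy attempted after the unmount completes fails up front instead of touching torn-down state.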
Re: crash zfs_clone_range()
Hi Ronald,

As I can see, the clone request to ZFS came through nullfs, and it crashed immediately on entry. I've never been a VFS layer expert, but to me it may be a nullfs problem, not zfs. Is there a chance you were (un-)mounting something when this happened?

On 10.11.2023 05:12, Ronald Klop wrote:
> Hi,
>
> Had this crash today on RPI4/15-CURRENT.

-- 
Alexander Motin
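The backtrace shows the path described above: vn_copy_file_range() sees two vnodes of the same type (both nullfs) and calls VOP_COPY_FILE_RANGE(), and null_bypass() forwards the call to the lower ZFS vnodes, so zfs_freebsd_copy_file_range() runs. The toy model below sketches that dispatch. toy_vnode and both functions are hypothetical and only mimic the type comparison quoted from vn_copy_file_range() elsewhere in this thread; this is not the real VFS code. The third test case shows the situation raised later in the thread: two nullfs vnodes whose lower file systems differ still reach the ZFS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * HYPOTHETICAL toy model, not FreeBSD VFS code. A stacked (nullfs)
 * vnode points at the vnode it shadows; a leaf vnode has lower == NULL.
 */
struct toy_vnode {
	const char		*fstype;	/* e.g. "nullfs", "zfs", "ufs" */
	struct toy_vnode	*lower;		/* shadowed vnode, or NULL */
};

/*
 * Stand-in for VOP_COPY_FILE_RANGE(): a nullfs pair is bypassed to its
 * lower vnodes; otherwise the input vnode's file system handles it.
 * Returns the name of the file system whose implementation ran.
 */
static const char *
toy_vop_copy_file_range(struct toy_vnode *invp, struct toy_vnode *outvp)
{
	assert(invp != NULL && outvp != NULL);
	if (strcmp(invp->fstype, "nullfs") == 0 &&
	    strcmp(outvp->fstype, "nullfs") == 0)
		return (toy_vop_copy_file_range(invp->lower, outvp->lower));
	/* outvp is not re-checked here; it may be another fs type. */
	return (invp->fstype);
}

/*
 * Stand-in for vn_copy_file_range(): only the types of the vnodes it is
 * handed are compared, never the types of any lower vnodes.
 */
static const char *
toy_copy_file_range(struct toy_vnode *invp, struct toy_vnode *outvp)
{
	if (strcmp(invp->fstype, outvp->fstype) == 0)
		return (toy_vop_copy_file_range(invp, outvp));
	return ("generic");	/* the vn_generic_copy_file_range() path */
}
```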
Re: crash zfs_clone_range()
Hi Ronald,

Hitting the panic with a DEBUG kernel would be great, and it would be very nice if I could somehow reproduce the panic. I have the option to rent a cheap arm64 virtual host at Hetzner, so I could test that in an environment close to yours.

Please try compiling a GENERIC-DEBUG kernel with:

    include GENERIC
    ident GENERIC-DEBUG
    options INVARIANTS
    options INVARIANT_SUPPORT
    options WITNESS
    options WITNESS_SKIPSPIN
    options DEBUG_LOCKS
    options DEBUG_VFS_LOCKS
    options DIAGNOSTIC
    options DDB

Cheers,
mm

On 10. 11. 2023 11:12, Ronald Klop wrote:
> Hi,
>
> Had this crash today on RPI4/15-CURRENT.
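The panic enters the kernel through sys_copy_file_range(), so a reproduction attempt only needs a program that calls copy_file_range(2) on files under a nullfs mount of a ZFS dataset with vfs.zfs.bclone_enabled=1, the setup Ronald describes (poudriere jails using null mounts over ZFS). This is a minimal sketch; copy_path() and its helpers are hypothetical names. The read(2)/write(2) fallback only matters on systems whose copy_file_range(2) can return EXDEV, ENOSYS or EOPNOTSUPP; on FreeBSD, vn_copy_file_range() already falls back to vn_generic_copy_file_range() inside the kernel.

```c
#define _GNU_SOURCE	/* declares copy_file_range() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK	((size_t)1 << 20)	/* bytes per call; arbitrary */

/* Fallback: plain read(2)/write(2) loop from the current offsets. */
static int
copy_rw(int infd, int outfd)
{
	char buf[65536];
	ssize_t n;

	while ((n = read(infd, buf, sizeof(buf))) > 0) {
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(outfd, buf + off, (size_t)(n - off));
			if (w < 0)
				return (-1);
			off += w;
		}
	}
	return (n < 0 ? -1 : 0);
}

/* Copy the rest of infd to outfd; NULL offsets use the file offsets. */
static int
copy_all(int infd, int outfd)
{
	ssize_t n;

	while ((n = copy_file_range(infd, NULL, outfd, NULL, CHUNK, 0)) > 0)
		;
	if (n == 0)
		return (0);	/* reached end of the input file */
	assert(n < 0);
	if (errno == EXDEV || errno == ENOSYS || errno == EOPNOTSUPP)
		return (copy_rw(infd, outfd));
	return (-1);
}

/* Copy file src to dst (created or truncated); returns 0 or -1. */
static int
copy_path(const char *src, const char *dst)
{
	int infd, outfd, error;

	if ((infd = open(src, O_RDONLY)) < 0)
		return (-1);
	if ((outfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		close(infd);
		return (-1);
	}
	error = copy_all(infd, outfd);
	close(infd);
	if (close(outfd) != 0)
		error = -1;
	return (error);
}
```

To exercise the path from the backtrace, call copy_path() with both paths inside the jail's nullfs mount (for example, two files under a hypothetical /jail/usr/ports null mount of a ZFS dataset).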
crash zfs_clone_range()
Hi,

Had this crash today on RPI4/15-CURRENT.

FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64

$ sysctl -a | grep bclon
vfs.zfs.bclone_enabled: 1

I started a jail with poudriere to build a package. The jail uses null mounts over ZFS.

[root]# cu -s 115200 -l /dev/cuaU0
Connected

db> bt
Tracing pid 95213 tid 100438 td 0xe1e97900
db_trace_self() at db_trace_self
db_stack_trace() at db_stack_trace+0x120
db_command() at db_command+0x2e4
db_command_loop() at db_command_loop+0x58
db_trap() at db_trap+0x100
kdb_trap() at kdb_trap+0x334
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0xf200
kdb_enter() at kdb_enter+0x48
vpanic() at vpanic+0x1dc
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x9604
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x18c
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x5600

Oh.. While typing this I rebooted the machine and it happened again. I didn't start anything in particular, although the machine runs some jails.

x0: 0x00e0
x1: 0xa00090317a48
x2: 0xa000f79d4f00
x3: 0xa000c61a44a8
x4: 0xdeefe460 ($d.2 + 0xdd776560)
x5: 0xa001250e4c00
x6: 0xe54025b5 ($d.5 + 0xc)
x7: 0x030a
x8: 0xe1559000 ($d.2 + 0xdfdd1100)
x9: 0x0001
x10: 0x
x11: 0x0001
x12: 0x0002
x13: 0x
x14: 0x0001
x15: 0x
x16: 0x016dce88 (__stop_set_modmetadata_set + 0x1310)
x17: 0x004e0d44 (rms_rlock + 0x0)
x18: 0xdeefe280 ($d.2 + 0xdd776380)
x19: 0x
x20: 0xdeefe460 ($d.2 + 0xdd776560)
x21: 0x7fff
x22: 0xa00090317a48
x23: 0xa000f79d4f00
x24: 0xa001067ef910
x25: 0x00e0
x26: 0xa000158a8000
x27: 0x
x28: 0xa000158a8000
x29: 0xdeefe280 ($d.2 + 0xdd776380)
sp: 0xdeefe280
lr: 0x01623564 (zfs_clone_range + 0x6c)
elr: 0x004e0d60 (rms_rlock + 0x1c)
spsr: 0xa045
far: 0x0108
esr: 0x9604
panic: data abort in critical section or under mutex
cpuid = 1
time = 1699610885
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x9604
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x18c
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x5600
KDB: enter: panic
[ thread pid 3792 tid 100394 ]
Stopped at kdb_enter+0x48: str xzr, [x19, #768]
db>

I'll keep the debugger open for a while. Can I type something for additional info?

Regards,
Ronald.