FreeBSD ??.?-RELEASE matching vs. zpool create -o compatibility= use

2023-04-17 Thread Mark Millard
Warner Losh  wrote on
Date: Tue, 18 Apr 2023 01:16:01 UTC :
[For a different subject.]

> . . .
> 
> Related question: what zfs branch is stable/14 going to track? With 13 it
> was whatever the next stable branch was.

I have a somewhat related question, using 13.2-RELEASE as an
example of the general case.

FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC

generates:

# zpool version
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
zfs-2.1.9-FreeBSD_g92e0d9d18
zfs-kmod-2.1.9-FreeBSD_g92e0d9d18

but there is only:

# ls -C1 /usr/share/zfs/compatibility.d/openzfs*freebsd
/usr/share/zfs/compatibility.d/openzfs-2.0-freebsd
/usr/share/zfs/compatibility.d/openzfs-2.1-freebsd

No openzfs-2.1.9-freebsd file is available for use
with the likes of the notation:

-o compatibility=openzfs-2.1.9-freebsd

Such a file presumably would also enable (based on
what is reported for an existing
openzfs-2.1-freebsd style pool):

# zpool get all | grep feature@ | grep disabled
zoptb  feature@edonr        disabled  local
zoptb  feature@zilsaxattr   disabled  local
zoptb  feature@head_errlog  disabled  local
zoptb  feature@blake3       disabled  local

but there would then be a named compatibility
assignment available that covers them.
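
For reference, a compatibility file is just a list of feature
names, one per line, and the compatibility property accepts
either a file name looked up under /etc/zfs/compatibility.d or
/usr/share/zfs/compatibility.d, or an absolute path. A minimal
sketch of rolling one's own (the openzfs-2.1.9-freebsd name and
its contents are hypothetical, not something 13.2-RELEASE ships):

# mkdir -p /etc/zfs/compatibility.d
# cp /usr/share/zfs/compatibility.d/openzfs-2.1-freebsd \
    /etc/zfs/compatibility.d/openzfs-2.1.9-freebsd
(append whatever extra feature names the release's zfs supports,
one per line)
# zpool create -o compatibility=openzfs-2.1.9-freebsd ztest da0
# zpool set compatibility=openzfs-2.1.9-freebsd zoptb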

It is normal for me to hold compatibility at what some
FreeBSD ??.?-RELEASE (and later) supports, even for
main-based systems (my normal context). (I do not know
how common that personal policy is in the world.) I
stick to the most recent file that is official in the
??.?-RELEASE's

/usr/share/zfs/compatibility.d/
Would it be appropriate for 13.2-RELEASE (e.g.) to
have such a:

/usr/share/zfs/compatibility.d/openzfs-2.1.9-freebsd

file that matches the FreeBSD release but does not
necessarily list everything stable/13 or main will
eventually support?

( stable/13 and main would then also end up with
the file being present. But I'm working backwards
from the end result to how to get there. )

Note: One could imagine an openzfs-2.1.10-freebsd
in releng/14.0 that did not list block_cloning, even
if there was (only) an odd way to get block_cloning
enabled for testing purposes.

===
Mark Millard
marklmi at yahoo.com




Re: another crash and going forward with zfs

2023-04-17 Thread Warner Losh
On Mon, Apr 17, 2023, 5:37 PM Rick Macklem  wrote:

> On Mon, Apr 17, 2023 at 4:29 PM Cy Schubert 
> wrote:
> >
> > In message , Pawel Jakub Dawidek writes:
> > > On 4/18/23 05:14, Mateusz Guzik wrote:
> > > > On 4/17/23, Pawel Jakub Dawidek  wrote:
> > > >> Correct me if I'm wrong, but from my understanding there were zero
> > > >> problems with block cloning when it wasn't in use or now disabled.
> > > >>
> > > >> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> > > >> avoid mess like this and give us more time to sort all the problems out
> > > >> while making it easy for people to try it.
> > > >>
> > > >> If there is no plan to revert the whole import, I don't see what value
> > > >> removing just block cloning will bring if it is now disabled by default
> > > >> and didn't cause any problems when disabled.
> > > >>
> > > >
> > > > The feature definitely was not properly stress tested and what not and
> > > > trying to do it keeps running into panics. Given the complexity of the
> > > > feature I would expect there are many bug lurking, some of which
> > > > possibly related to the on disk format. Not having to deal with any of
> > > > this is can be arranged as described above and is imo the most
> > > > sensible route given the timeline for 14.0
> > >
> > > Block cloning doesn't create, remove or modify any on-disk data until it
> > > is in use.
> > >
> > > Again, if we are not going to revert the whole merge, I see no point in
> > > reverting block cloning as until it is enabled, its code is not
> > > executed. This allow people who upgraded the pools to do nothing special
> > > and it will allow people to test it easily.
> >
> > In this case zpool upgrade and zpool status should return no feature
> > upgrades are available instead of enticing users to zpool upgrade. The
> > userland zpool command should test for this sysctl and print nothing
> > regarding block_cloning. I can see a scenario when a user zpool upgrades
> > their pools, notices the sysctl and does the unthinkable. Not only would
> > this fill the mailing lists with angry chatter but it would spawn a number
> > of PRs plus give us a lot of bad press for data loss.
> >
> > Should we keep the new ZFS in 14, we should:
> >
> > 1. Make sure that zpool(8) does not mention or offer block_cloning in any
> > way if the sysctl is disabled.
> >
> > 2. Print a cautionary note in release notes advising people not to enable
> > this experimental sysctl. Maybe even have it print "(experimental)" to warn
> > users that it will hurt.
> >
> > 3. Update the man pages to caution that block_cloning is experimental and
> > unstable.
> I would suggest going a step further and making the sysctl RO for FreeBSD14.
> (This could be changed for FreeBSD14.n if/when block_cloning is believed to
>  be debugged.)
>
> I would apply all 3 of the above to "main", since some that install "main"
> will not know how "bleeding edge" this is unless the above is done.
> (Yes, I know "main" is "bleeding edge", but some still expect a stable
>  test system will result from installing it.)
>
> Thanks go to all that tracked this problem down, rick
>

Related question: what zfs branch is stable/14 going to track? With 13 it
was whatever the next stable branch was.

Warner


>
> > It's not enough to have a sysctl without hiding block_cloning completely
> > from view. Only expose it in zpool(8) when the sysctl is enabled. Let's
> > avoid people mistakenly enabling it.
> >
> >
> > --
> > Cheers,
> > Cy Schubert 
> > FreeBSD UNIX: Web:  https://FreeBSD.org
> > NTP:   Web:  https://nwtime.org
> >
> > e^(i*pi)+1=0
> >
> >
> >
>
>


Re: another crash and going forward with zfs

2023-04-17 Thread Rick Macklem
On Mon, Apr 17, 2023 at 4:29 PM Cy Schubert  wrote:
>
> In message , Pawel Jakub Dawidek writes:
> > On 4/18/23 05:14, Mateusz Guzik wrote:
> > > On 4/17/23, Pawel Jakub Dawidek  wrote:
> > >> Correct me if I'm wrong, but from my understanding there were zero
> > >> problems with block cloning when it wasn't in use or now disabled.
> > >>
> > >> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> > >> avoid mess like this and give us more time to sort all the problems out
> > >> while making it easy for people to try it.
> > >>
> > >> If there is no plan to revert the whole import, I don't see what value
> > >> removing just block cloning will bring if it is now disabled by default
> > >> and didn't cause any problems when disabled.
> > >>
> > >
> > > The feature definitely was not properly stress tested and what not and
> > > trying to do it keeps running into panics. Given the complexity of the
> > > feature I would expect there are many bug lurking, some of which
> > > possibly related to the on disk format. Not having to deal with any of
> > > this is can be arranged as described above and is imo the most
> > > sensible route given the timeline for 14.0
> >
> > Block cloning doesn't create, remove or modify any on-disk data until it
> > is in use.
> >
> > Again, if we are not going to revert the whole merge, I see no point in
> > reverting block cloning as until it is enabled, its code is not
> > executed. This allow people who upgraded the pools to do nothing special
> > and it will allow people to test it easily.
>
> In this case zpool upgrade and zpool status should return no feature
> upgrades are available instead of enticing users to zpool upgrade. The
> userland zpool command should test for this sysctl and print nothing
> regarding block_cloning. I can see a scenario when a user zpool upgrades
> their pools, notices the sysctl and does the unthinkable. Not only would
> this fill the mailing lists with angry chatter but it would spawn a number
> of PRs plus give us a lot of bad press for data loss.
>
> Should we keep the new ZFS in 14, we should:
>
> 1. Make sure that zpool(8) does not mention or offer block_cloning in any
> way if the sysctl is disabled.
>
> 2. Print a cautionary note in release notes advising people not to enable
> this experimental sysctl. Maybe even have it print "(experimental)" to warn
> users that it will hurt.
>
> 3. Update the man pages to caution that block_cloning is experimental and
> unstable.
I would suggest going a step further and making the sysctl RO for FreeBSD 14.
(This could be changed for FreeBSD 14.n if/when block_cloning is believed to
 be debugged.)
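
If the sysctl were made read-only at runtime, testers could presumably
still flip it at boot as a loader tunable (assuming the knob remains a
tunable); a rough sketch:

# echo 'vfs.zfs.bclone_enabled=1' >> /boot/loader.conf
# shutdown -r now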

I would apply all 3 of the above to "main", since some who install "main"
will not know how "bleeding edge" this is unless the above is done.
(Yes, I know "main" is "bleeding edge", but some still expect a stable
 test system will result from installing it.)

Thanks go to all that tracked this problem down, rick

>
> It's not enough to have a sysctl without hiding block_cloning completely
> from view. Only expose it in zpool(8) when the sysctl is enabled. Let's
> avoid people mistakenly enabling it.
>
>
> --
> Cheers,
> Cy Schubert 
> FreeBSD UNIX: Web:  https://FreeBSD.org
> NTP:   Web:  https://nwtime.org
>
> e^(i*pi)+1=0
>
>
>



Re: another crash and going forward with zfs

2023-04-17 Thread Cy Schubert
In message , Pawel Jakub Dawidek writes:
> On 4/18/23 05:14, Mateusz Guzik wrote:
> > On 4/17/23, Pawel Jakub Dawidek  wrote:
> >> Correct me if I'm wrong, but from my understanding there were zero
> >> problems with block cloning when it wasn't in use or now disabled.
> >>
> >> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> >> avoid mess like this and give us more time to sort all the problems out
> >> while making it easy for people to try it.
> >>
> >> If there is no plan to revert the whole import, I don't see what value
> >> removing just block cloning will bring if it is now disabled by default
> >> and didn't cause any problems when disabled.
> >>
> > 
> > The feature definitely was not properly stress tested and what not and
> > trying to do it keeps running into panics. Given the complexity of the
> > feature I would expect there are many bug lurking, some of which
> > possibly related to the on disk format. Not having to deal with any of
> > this is can be arranged as described above and is imo the most
> > sensible route given the timeline for 14.0
>
> Block cloning doesn't create, remove or modify any on-disk data until it 
> is in use.
>
> Again, if we are not going to revert the whole merge, I see no point in 
> reverting block cloning as until it is enabled, its code is not 
> executed. This allow people who upgraded the pools to do nothing special 
> and it will allow people to test it easily.

In this case, zpool upgrade and zpool status should report that no feature
upgrades are available instead of enticing users to zpool upgrade. The
userland zpool command should test for this sysctl and print nothing
regarding block_cloning. I can see a scenario where a user zpool upgrades
their pools, notices the sysctl and does the unthinkable. Not only would
this fill the mailing lists with angry chatter but it would spawn a number
of PRs plus give us a lot of bad press for data loss.

Should we keep the new ZFS in 14, we should:

1. Make sure that zpool(8) does not mention or offer block_cloning in any 
way if the sysctl is disabled.

2. Print a cautionary note in release notes advising people not to enable 
this experimental sysctl. Maybe even have it print "(experimental)" to warn 
users that it will hurt.

3. Update the man pages to caution that block_cloning is experimental and 
unstable.

It's not enough to have a sysctl without hiding block_cloning completely 
from view. Only expose it in zpool(8) when the sysctl is enabled. Let's 
avoid people mistakenly enabling it.
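
In the meantime, a quick way for an admin to check both knobs before
touching zpool upgrade (a sketch; the pool name and output are
illustrative):

# sysctl vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: 0
# zpool get feature@block_cloning zroot
NAME   PROPERTY               VALUE     SOURCE
zroot  feature@block_cloning  disabled  local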


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0





Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 05:14, Mateusz Guzik wrote:

On 4/17/23, Pawel Jakub Dawidek  wrote:

Correct me if I'm wrong, but from my understanding there were zero
problems with block cloning when it wasn't in use or now disabled.

The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
avoid mess like this and give us more time to sort all the problems out
while making it easy for people to try it.

If there is no plan to revert the whole import, I don't see what value
removing just block cloning will bring if it is now disabled by default
and didn't cause any problems when disabled.



The feature definitely was not properly stress tested and what not and
trying to do it keeps running into panics. Given the complexity of the
feature I would expect there are many bug lurking, some of which
possibly related to the on disk format. Not having to deal with any of
this is can be arranged as described above and is imo the most
sensible route given the timeline for 14.0


Block cloning doesn't create, remove or modify any on-disk data until it 
is in use.


Again, if we are not going to revert the whole merge, I see no point in 
reverting block cloning as until it is enabled, its code is not 
executed. This allows people who upgraded their pools to do nothing special
and it will allow people to test it easily.


--
Pawel Jakub Dawidek




Re: find(1): I18N gone wild ?

2023-04-17 Thread Yuri
Xin LI wrote:
> This is expected behavior (in en_US.UTF-8 the ordering is AaBb, not
> ABab).  You might want to set LC_COLLATE to C if C behavior is desirable.
> 
> On Mon, Apr 17, 2023 at 2:06 PM Poul-Henning Kamp wrote:
> 
> This surprised me:
> 
>         # mkdir /tmp/P
>         # cd /tmp/P
>         # touch FOO
>         # touch bar
>         # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
>         ./FOO
>         # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
>         ./FOO
>         ./bar
> 
> Really ?!

A bit more detail:

find uses fnmatch(3) here, where the RE Bracket Expression rules apply
(except for ! instead of ^, but that's unrelated):

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05

...which has the following note:

7. In the POSIX locale, a range expression represents the set of
collating elements that fall between two elements in the collation
sequence, inclusive. In other locales, a range expression has
unspecified behavior: strictly conforming applications shall not rely on
whether the range expression is valid, or on the set of collating
elements matched.

Indeed, it's unfortunate that collations in non-POSIX locales are not
that linear and range expressions can break, but I don't see an easy
way of "fixing" this.

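The practical workarounds are to pin collation to C or to use a
character class instead of a range; a sketch against the same test
directory (output shown as expected, not verified here):

# env LANG=en_US.UTF-8 LC_COLLATE=C find . -name '[A-Z]*' -print
./FOO
# env LANG=en_US.UTF-8 find . -name '[[:upper:]]*' -print
./FOO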


Re: find(1): I18N gone wild ?

2023-04-17 Thread Xin LI
This is expected behavior (in en_US.UTF-8 the ordering is AaBb, not ABab).
You might want to set LC_COLLATE to C if C behavior is desirable.

On Mon, Apr 17, 2023 at 2:06 PM Poul-Henning Kamp 
wrote:

> This surprised me:
>
> # mkdir /tmp/P
> # cd /tmp/P
> # touch FOO
> # touch bar
> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
> ./FOO
> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
> ./FOO
> ./bar
>
> Really ?!
>
> --
> Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
> p...@freebsd.org | TCP/IP since RFC 956
> FreeBSD committer   | BSD since 4.3-tahoe
> Never attribute to malice what can adequately be explained by incompetence.
>
>


find(1): I18N gone wild ?

2023-04-17 Thread Poul-Henning Kamp
This surprised me:

# mkdir /tmp/P
# cd /tmp/P
# touch FOO
# touch bar
# env LANG=C.UTF-8 find . -name '[A-Z]*' -print
./FOO
# env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
./FOO
./bar

Really ?!

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.



Re: another crash and going forward with zfs

2023-04-17 Thread Mateusz Guzik
On 4/17/23, Pawel Jakub Dawidek  wrote:
> On 4/18/23 03:51, Mateusz Guzik wrote:
>> After bugfixes got committed I decided to zpool upgrade and sysctl
>> vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
>> quickly got a new crash:
>>
>> panic: VERIFY(arc_released(db->db_buf)) failed
>>
>> cpuid = 9
>> time = 1681755046
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe0a90b8e5f0
>> vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
>> spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
>> dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
>> dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
>> 0xfe0a90b8e700
>> dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame
>> 0xfe0a90b8e780
>> dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
>> zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
>> zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
>> VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
>> vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
>> vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
>> vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
>> vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
>> dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
>> sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
>> amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>> 0xfe0a90b8ef30
>> --- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
>> 0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
>> KDB: enter: panic
>> [ thread pid 95000 tid 135035 ]
>> Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)
>>
>> The posted 14.0 schedule which plans to branch stable/14 on May 12 and
>> one cannot bet on the feature getting beaten up into production shape
>> by that time. Given whatever non-block_clonning and not even zfs bugs
>> which are likely to come out I think this makes the feature a
>> non-starter for said release.
>>
>> I note:
>> 1. the current problems did not make it into stable branches.
>> 2. there was block_cloning-related data corruption (fixed) and there may
>> be more
>> 3. there was unrelated data corruption (see
>> https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
>> the problematic commit in FreeBSD, not yet sorted out upstream
>>
>> As such people's data may be partially hosed as is.
>>
>> Consequently the proposed plan is as follows:
>> 1. whack the block cloning feature for the time being, but make sure
>> pools which upgraded to it can be mounted read-only
>> 2. run ztest and whatever other stress testing on FreeBSD, along with
>> restoring openzfs CI -- I can do the first part, I'm sure pho will not
>> mind to run some tests of his own
>> 3. recommend people create new pools and restore data from backup. if
>> restoring from backup is not an option, tar or cp (not zfs send) from
>> the read-only mount
>>
>> block cloning beaten into shape would use block_cloning_v2 or whatever
>> else, key point that the current feature name would be considered
>> bogus (not blocking RO import though) to prevent RW usage of the
>> current pools with it enabled.
>>
>> Comments?
>
> Correct me if I'm wrong, but from my understanding there were zero
> problems with block cloning when it wasn't in use or now disabled.
>
> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> avoid mess like this and give us more time to sort all the problems out
> while making it easy for people to try it.
>
> If there is no plan to revert the whole import, I don't see what value
> removing just block cloning will bring if it is now disabled by default
> and didn't cause any problems when disabled.
>

The feature definitely was not properly stress tested and whatnot, and
trying to do it keeps running into panics. Given the complexity of the
feature I would expect there are many bugs lurking, some of them
possibly related to the on-disk format. Not having to deal with any of
this can be arranged as described above and is imo the most
sensible route given the timeline for 14.0.

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 21:28, José Pérez wrote:

Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we 
recover from the worst case scenario which is a machine with a kernel > 
2a58b312b62f and ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this 
situation) and I need to recover them, ideally minimizing data loss. The 
builders are also hosting current and used to build kernels and worlds 
for 13 and current: as of now all my production machines are stuck on 
the 13 they run, I cannot update binaries nor packages and I would like 
to be back online.


José,

I can only speak of block cloning in detail, but I'll try to address
everything.


The easiest way to avoid block_cloning-related corruption on a kernel
after the last OpenZFS merge but before e0bb199925 is to set the
compression property to off and the sync property to something other than
disabled. This will avoid both the block_cloning-related corruption and
the zil_replaying() panic.
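
A sketch of what that looks like, using a placeholder dataset name
(apply it to whatever datasets actually see writes):

# zfs set compression=off zroot/poudriere
# zfs set sync=standard zroot/poudriere
# zfs get compression,sync zroot/poudriere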


As for the other corruption, unfortunately I don't know the details, but
my understanding is that it happens under higher load. I'm not sure I'd
trust a kernel built on a machine with this bug present. What I would do
is compile the kernel as of 068913e4ba somewhere else, boot the
problematic machine in single-user mode, and install the newly built kernel.


As far as I can tell, contrary to some initial reports, none of the
problems introduced by the recent OpenZFS merge corrupt the pool
metadata, only file data. You can locate the files modified with the
bogus kernel using find(1) with a proper modification-time cutoff, but
you have to decide what to do with them (either throw them away, restore
them from backup, or inspect them).
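
For example, something along these lines, where the reference timestamp
is whenever the post-merge kernel was first booted (a sketch; the path
and date are placeholders):

# touch -t 202304130000 /tmp/merge-boot-marker
# find /usr /home -xdev -type f -newer /tmp/merge-boot-marker -print > /root/suspect-files.txt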


--
Pawel Jakub Dawidek




Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 03:51, Mateusz Guzik wrote:

After bugfixes got committed I decided to zpool upgrade and sysctl
vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed

cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
0xfe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)

The posted 14.0 schedule which plans to branch stable/14 on May 12 and
one cannot bet on the feature getting beaten up into production shape
by that time. Given whatever non-block_clonning and not even zfs bugs
which are likely to come out I think this makes the feature a
non-starter for said release.

I note:
1. the current problems did not make it into stable branches.
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see
https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
the problematic commit in FreeBSD, not yet sorted out upstream

As such people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure
pools which upgraded to it can be mounted read-only
2. run ztest and whatever other stress testing on FreeBSD, along with
restoring openzfs CI -- I can do the first part, I'm sure pho will not
mind to run some tests of his own
3. recommend people create new pools and restore data from backup. if
restoring from backup is not an option, tar or cp (not zfs send) from
the read-only mount

block cloning beaten into shape would use block_cloning_v2 or whatever
else, key point that the current feature name would be considered
bogus (not blocking RO import though) to prevent RW usage of the
current pools with it enabled.

Comments?


Correct me if I'm wrong, but from my understanding there were zero
problems with block cloning when it wasn't in use, and there are none
now that it is disabled.


The reason I introduced the vfs.zfs.bclone_enabled sysctl was exactly to
avoid a mess like this and to give us more time to sort all the problems
out while making it easy for people to try it.


If there is no plan to revert the whole import, I don't see what value
removing just block cloning will bring if it is now disabled by default
and didn't cause any problems when disabled.


--
Pawel Jakub Dawidek




another crash and going forward with zfs

2023-04-17 Thread Mateusz Guzik
After bugfixes got committed I decided to zpool upgrade and sysctl
vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed

cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
0xfe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at  kdb_enter+0x32: movq    $0,0x9e4153(%rip)

The posted 14.0 schedule plans to branch stable/14 on May 12, and
one cannot bet on the feature getting beaten into production shape
by that time. Given whatever non-block_cloning and not-even-zfs bugs
that are likely to come out, I think this makes the feature a
non-starter for said release.

I note:
1. the current problems did not make it into stable branches.
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see
https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
the problematic commit in FreeBSD, not yet sorted out upstream

As such people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure
pools which upgraded to it can be mounted read-only
2. run ztest and whatever other stress testing on FreeBSD, along with
restoring openzfs CI -- I can do the first part, I'm sure pho will not
mind to run some tests of his own
3. recommend people create new pools and restore data from backup. if
restoring from backup is not an option, tar or cp (not zfs send) from
the read-only mount
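
For 3, roughly (a sketch; pool names, device names and paths are
placeholders):

# zpool import -o readonly=on -R /mnt/old oldpool
# zpool create newpool da1
# tar -C /mnt/old/some/dataset -cf - . | tar -C /newpool/some/dataset -xpf -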

Block cloning beaten into shape would use block_cloning_v2 or whatever
else; the key point is that the current feature name would be considered
bogus (not blocking RO import, though) to prevent RW usage of the
current pools with it enabled.

Comments?

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Mark Millard
José Pérez  wrote on
Date: Mon, 17 Apr 2023 12:28:40 UTC :

> El 2023-04-17 12:43, Pawel Jakub Dawidek escribió:
> > On 4/17/23 18:15, Pawel Jakub Dawidek wrote:
> >> There were three issues that I know of after the recent OpenZFS merge:
> >> 
> >> 1. Data corruption unrelated to block cloning, so it can happen even 
> >> with block cloning disabled or not in use. This was the problematic 
> >> commit:
> >> 
> >> 
> >> https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9
> >> 
> >> It was reverted in 63ee747febbf024be0aace61161241b53245449e.
> >> 
> >> 2. Data corruption with embedded blocks when block cloning is enabled. 
> >> It can happen when compression is enabled and the block contains 
> >> between 60 to 112 bytes (this might be hard to determine). Fix exists, 
> >> it is merged to OpenZFS already, but isn't in FreeBSD yet.
> >> OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739
> >> 
> >> 3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is 
> >> triggered when block cloning is enabled, the sync property is set to 
> >> disabled and copy_file_range(2) is used. Easy fix exists, it is not 
> >> yet merged to OpenZFS and not yet in FreeBSD HEAD.
> >> OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758
> >> 
> >> Block cloning was disabled in 
> >> 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.
> > 
> > As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are
> > fixed, as far as I can tell.
> > 
> > Block cloning remains disabled for now just to be on the safe side,
> > but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.
> > 
> > Don't relay on this sysctl as it will be removed in 2-3 weeks.
> 
> Hi Pawel,
> thank you for your reply and for the fixes.
> 
> I think there is a 4th issue that needs to be addressed: how do we 
> recover from the worst case scenario which is a machine with a kernel > 
> 2a58b312b62f and ZFS root upgraded with block cloning enabled.
> 
> In particular, is it safe to turn such a machine on in the first place, 
> and what are the risks involved in doing so? Any potential data loss?
> 
> Would such a machine be able to fix itself by compiling a kernel, or 
> would compilation fail and might data be corrupted in the process?
> 
> I have two poudriere builders powered off (I am not alone in this 
> situation) and I need to recover them, ideally minimizing data loss. The 
> builders are also hosting current and used to build kernels and worlds 
> for 13 and current: as of now all my production machines are stuck on 
> the 13 they run, I cannot update binaries nor packages and I would like 
> to be back online.
> 
> Whatever the fixing procedure, it shall be outlined in the UPDATING 
> document.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270811 is an example
issue where a FreeBSD powerpc package-building server cannot boot, even
after being patched so that it no longer gets a boot-time "panic:
floating-point unavailable trap" (that jhibbits patch is still not
committed):

QUOTE from the description:
. . .
nda1: 953869MB (1953525168 512 byte sectors)
GEOM_MIRROR: Device mirror/swap0 launched (2/2).
Mounting from zfs:zroot failed with error 6; retrying for 3 more seconds
Mounting from zfs:zroot failed with error 6.

Loader variables:
vfs.root.mountfrom=zfs:zroot

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:zroot/ROOT/default
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot>

This machine is part of the FreeBSD cluster for building PowerPC packages,
so we can build kernels to test anytime necessary.
END  QUOTE

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread José Pérez

El 2023-04-17 12:43, Pawel Jakub Dawidek escribió:

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic 
commit:
 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9


It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains 
between 60 to 112 bytes (this might be hard to determine). Fix exists, 
it is merged to OpenZFS already, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is 
triggered when block cloning is enabled, the sync property is set to 
disabled and copy_file_range(2) is used. Easy fix exists, it is not 
yet merged to OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 
46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are
fixed, as far as I can tell.

Block cloning remains disabled for now just to be on the safe side,
but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.

Don't relay on this sysctl as it will be removed in 2-3 weeks.


Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we
recover from the worst-case scenario, which is a machine with a kernel
newer than 2a58b312b62f and a ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this
situation) and I need to recover them, ideally minimizing data loss. The
builders also host current and are used to build kernels and worlds
for 13 and current: as of now all my production machines are stuck on
the 13 they run; I cannot update binaries or packages, and I would like
to be back online.


Whatever the fixing procedure, it shall be outlined in the UPDATING 
document.


Thank you.

BR,

--
José Pérez



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic commit:

 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9

It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains between 
60 to 112 bytes (this might be hard to determine). Fix exists, it is 
merged to OpenZFS already, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered 
when block cloning is enabled, the sync property is set to disabled and 
copy_file_range(2) is used. Easy fix exists, it is not yet merged to 
OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, 
so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are 
fixed, as far as I can tell.


Block cloning remains disabled for now just to be on the safe side, but 
can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.


Don't rely on this sysctl, as it will be removed in 2-3 weeks.
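
For anyone who wants to help test on a throw-away pool, the sequence is
roughly (a sketch; the pool name is a placeholder, and zpool upgrade
permanently enables the feature on that pool):

# sysctl vfs.zfs.bclone_enabled=1
# zpool upgrade testpool
# zpool get feature@block_cloning testpool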

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread José Pérez

Hi Pawel,
thank you for the patch.

Can you please elaborate a little more?

Did you run any tests? Is it safe to use your patch to access pools with 
feature@block_cloning active? Is it possible to build a kernel from such 
a pool?


Asking for others: does this fix any data that was already corrupted?

Thank you.

BR,

El 2023-04-17 06:35, Pawel Jakub Dawidek escribió:

On 4/16/23 01:07, Florian Smeets wrote:
On the pool that has block_cloning enabled I see the above insta panic 
when poudriere starts building. I found a workaround though:


--- /usr/local/share/poudriere/include/fs.sh.orig    2023-04-15 
18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh    2023-04-15 
18:04:04.144736000 +0200

@@ -295,7 +295,6 @@
  fi

  zfs clone -o mountpoint=${mnt} \
-    -o sync=disabled \
  -o atime=off \
  -o compression=off \
  ${fs}@${snap} \

With this workaround I was able to build thousands of packages without 
panics or failures due to data corruption.


Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758


--
José Pérez