poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

2023-08-23 Thread Mark Millard
[Forked off the ZFS deadlock 14 discussion, per feedback.]

On Aug 23, 2023, at 11:40, Alexander Motin  wrote:

> On 22.08.2023 14:24, Mark Millard wrote:
>> Alexander Motin  wrote on
>> Date: Tue, 22 Aug 2023 16:18:12 UTC:
>>> I am waiting for final test results from George Wilson and then will
>>> request a quick merge of both to the zfs-2.2-release branch.
>>> Unfortunately there are still not many reviewers for the PR, since the
>>> code is not trivial, but at least with the test reports Brian Behlendorf
>>> and Mark Maybee seem to be OK with merging the two PRs into 2.2. If
>>> somebody else has tested and/or reviewed the PR, you may comment on it.
>> I had written to the list that when I tried to test the system
>> doing poudriere builds (initially with your patches) using
>> USE_TMPFS=no, so that zfs had to handle all the file I/O, only
>> one builder ever became active; the others never reached
>> "Builder started":
> 
>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>> . . .
>>  362 0 root 400  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
>>  349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
>>  328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
>>  304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
>>  282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
>>  242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
>> . . .
>> But those processes did show CPU?? on occasion, as well as
>> *vnode less often. None of the cpdup's was stuck in
>> Removing your patches did not change the behavior.
> 
> Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I have
> not been deep in that area recently, so somebody with more experience
> there could try to diagnose it.  At the very least it does not look
> related to the ZIL issue discussed in this thread, at least with the
> information provided, so I am not surprised that the mentioned patches
> do not affect it.

I did the above intending to test the deadlock in my context but
ended up not getting that far when I tried to make zfs handle all
the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
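
(For concreteness, the relevant poudriere configuration is a single knob;
a minimal sketch, assuming the stock /usr/local/etc/poudriere.conf
location:)

# Disable tmpfs for builders entirely, so wrkdir/data/localbase
# all land on ZFS:
USE_TMPFS=no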

The zfs context is a simple single partition on the boot media. I
use ZFS for bectl BE use, not for other typical reasons. The media
here is PCIe Optane 1.4T media. The machine is a ThreadRipper
1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
swap, also on that Optane.
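
(The bectl-based boot-environment use mentioned above corresponds to the
zoptb/ROOT/* datasets in the zfs list output below; purely as an
illustrative check:)

# bectl(8) lists the boot environments backed by zoptb/ROOT/*:
bectl list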

# uname -apKU
FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
amd64 amd64 1400096 1400096
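
(The trailing "1400096 1400096" pair are, as I understand the -K and -U
flags, the kernel and userland __FreeBSD_version values; they can also be
queried on their own:)

# Kernel vs. userland osreldate; matching values rule out a
# mixed kernel/world install:
uname -K
uname -U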

The GENERIC-DBG variant of the kernel did not report any issues in
earlier testing.

The later-referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
installed from the same build.

# zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zoptb                                        79.9G   765G    96K  /zoptb
zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
zoptb/ROOT                                   13.1G   765G    96K  none
zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
zoptb/poudriere/data/.m                       112K   765G   112K  /usr/local/poudriere/data/.m
zoptb/poudriere/data/cache                   17.4M
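
(The zoptb/poudriere/* datasets, including data/.m, are what poudriere
creates when run in its ZFS mode; a sketch of the poudriere.conf lines
implied by the layout above, inferred rather than confirmed:)

# Pool and dataset root for poudriere's jails/ports/data;
# ZROOTFS defaults to /poudriere:
ZPOOL=zoptb
ZROOTFS=/poudriere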

Re: ZFS deadlock in 14

2023-08-23 Thread Mark Millard
On Aug 23, 2023, at 11:40, Alexander Motin  wrote:

> On 22.08.2023 14:24, Mark Millard wrote:
>> Alexander Motin  wrote on
>> Date: Tue, 22 Aug 2023 16:18:12 UTC:
>>> I am waiting for final test results from George Wilson and then will
>>> request a quick merge of both to the zfs-2.2-release branch.
>>> Unfortunately there are still not many reviewers for the PR, since the
>>> code is not trivial, but at least with the test reports Brian Behlendorf
>>> and Mark Maybee seem to be OK with merging the two PRs into 2.2. If
>>> somebody else has tested and/or reviewed the PR, you may comment on it.
>> I had written to the list that when I tried to test the system
>> doing poudriere builds (initially with your patches) using
>> USE_TMPFS=no, so that zfs had to handle all the file I/O, only
>> one builder ever became active; the others never reached
>> "Builder started":
> 
>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>> . . .
>>  362 0 root 400  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
>>  349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
>>  328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
>>  304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
>>  282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
>>  242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
>> . . .
>> But those processes did show CPU?? on occasion, as well as
>> *vnode less often. None of the cpdup's was stuck in
>> Removing your patches did not change the behavior.
> 
> Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I have
> not been deep in that area recently, so somebody with more experience
> there could try to diagnose it.  At the very least it does not look
> related to the ZIL issue discussed in this thread, at least with the
> information provided, so I am not surprised that the mentioned patches
> do not affect it.

Thanks for the information. Good to know. I'll redirect this to a separate
discussion.



===
Mark Millard
marklmi at yahoo.com




Re: ZFS deadlock in 14

2023-08-23 Thread Alexander Motin

On 22.08.2023 14:24, Mark Millard wrote:

Alexander Motin  wrote on
Date: Tue, 22 Aug 2023 16:18:12 UTC:


I am waiting for final test results from George Wilson and then will
request a quick merge of both to the zfs-2.2-release branch. Unfortunately
there are still not many reviewers for the PR, since the code is not
trivial, but at least with the test reports Brian Behlendorf and Mark
Maybee seem to be OK with merging the two PRs into 2.2. If somebody else
has tested and/or reviewed the PR, you may comment on it.


I had written to the list that when I tried to test the system
doing poudriere builds (initially with your patches) using
USE_TMPFS=no, so that zfs had to handle all the file I/O, only
one builder ever became active; the others never reached
"Builder started":



Top was showing lots of "vlruwk" for the cpdup's. For example:

. . .
  362 0 root 400  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
  349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
  328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
  304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
  282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
  242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
. . .

But those processes did show CPU?? on occasion, as well as
*vnode less often. None of the cpdup's was stuck in

Removing your patches did not change the behavior.


Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I have
not been deep in that area recently, so somebody with more experience
there could try to diagnose it.  At the very least it does not look
related to the ZIL issue discussed in this thread, at least with the
information provided, so I am not surprised that the mentioned patches
do not affect it.
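
(Following that suggestion, a first-pass diagnostic sketch with stock
tools; the PID is taken from the top output above, and the sysctl names
are as of recent FreeBSD releases:)

# Compare live vnode counts against the cap; vlruwk stalls while
# vfs.numvnodes sits near kern.maxvnodes would fit a vnode-limit wait:
sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes
# Sample the kernel stack of one stalled cpdup to see where it sleeps:
procstat -kk 349
# As an experiment only: raise the cap and watch whether builders proceed.
sysctl kern.maxvnodes=$(( $(sysctl -n kern.maxvnodes) * 2 ))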


--
Alexander Motin