Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

2023-08-24 Thread Mark Millard
On Aug 24, 2023, at 00:22, Mark Millard  wrote:

> On Aug 23, 2023, at 22:54, Mateusz Guzik  wrote:
> 
>> On 8/24/23, Mark Millard  wrote:
>>> On Aug 23, 2023, at 15:10, Mateusz Guzik  wrote:
>>> 
>>>> On 8/23/23, Mark Millard  wrote:
>>>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>>> . . .
>>>>
>>>> This is a known problem, but it is unclear if you should be running
>>>> into it in this setup.
>>> 
>>> The change fixed the issue, so I do run into the issue
>>> for this setup. See below.
>>> 
>>>> Can you try again but this time *revert*
>>>> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
>>>> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>>
>>>> You may want to switch to a different branch first, for example:
>>>> git checkout -b vfstesting
>>> 
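>>> [Illustration: the suggested sequence amounts to roughly the
>>> following sketch, assuming the tree lives in /usr/main-src as used
>>> below:
>>>
>>> # cd /usr/main-src
>>> # git checkout -b vfstesting
>>> # git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>
>>> which leaves the working tree with the diff shown next.]
>>>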
>>> # git -C /usr/main-src/ diff sys/kern/vfs_subr.c
>>> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
>>> index 0f3f00abfd4a..5dff556ac258 100644
>>> --- a/sys/kern/vfs_subr.c
>>> +++ b/sys/kern/vfs_subr.c
>>> @@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
>>>     MPASS(curthread->td_pinned > 0);
>>>     MPASS(vd->index == VDBATCH_SIZE);
>>>  
>>> +   mtx_lock(&vnode_list_mtx);
>>>     critical_enter();
>>> -   if (mtx_trylock(&vnode_list_mtx)) {
>>> -       for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -           vp = vd->tab[i];
>>> -           vd->tab[i] = NULL;
>>> -           TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> -           TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> -           MPASS(vp->v_dbatchcpu != NOCPU);
>>> -           vp->v_dbatchcpu = NOCPU;
>>> -       }
>>> -       mtx_unlock(&vnode_list_mtx);
>>> -   } else {
>>> -       for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -           vp = vd->tab[i];
>>> -           vd->tab[i] = NULL;
>>> -           MPASS(vp->v_dbatchcpu != NOCPU);
>>> -           vp->v_dbatchcpu = NOCPU;
>>> -       }
>>> +   for (i = 0; i < VDBATCH_SIZE; i++) {
>>> +       vp = vd->tab[i];
>>> +       TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> +       TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> +       MPASS(vp->v_dbatchcpu != NOCPU);
>>> +       vp->v_dbatchcpu = NOCPU;
>>>     }
>>> +   mtx_unlock(&vnode_list_mtx);
>>> +   bzero(vd->tab, sizeof(vd->tab));
>>>     vd->index = 0;
>>>     critical_exit();
>>> }
>>> 
>>> Still with:
>>> 
>>> # grep USE_TMPFS= /usr/local/etc/poudriere.conf
>>> # EXAMPLE: USE_TMPFS="wrkdir data"
>>> #USE_TMPFS=all
>>> #USE_TMPFS="data"
>>> USE_TMPFS=no
>>> 
>>> 
>>> That allowed the other builders to eventually reach "Builder started"
>>> and later activity ("[00:05:50] [27] [00:02:29] Builder started" was
>>> the first non-[01] builder to do so), with no vlruwk states observed
>>> in what I saw in top:
>>> 
>>> . . .
>>> 
>>> Now testing for the zfs deadlock issue should be possible for
>>> this setup.
>>> 
>> 
>> Thanks for testing. I wrote a fix:
>> 
>> https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
>> 
>> It applies to a *stock* kernel (i.e., without the revert).
> 
> I'm going to leave the deadlock test running for when
> I sleep tonight. So it is going to be a while before
> I get to testing this. $ work will likely happen first
> as well. (No deadlock observed yet, by the way. 6+ hrs
> and 3000+ ports built so far.)
> 
> I can easily restore the modern sys/kern/vfs_subr.c and then
> do normal 14.0-ALPHA2-ish based patching, so that is not
> a problem. Thanks.
> 

I stopped the deadlock experiment, cleaned out the partial
bulk -a, put back the modern sys/kern/vfs_subr.c, applied
your patch (roughly the sequence sketched after the log below),
built, installed, rebooted, and started another bulk -a run.
It made progress on all the builders to and past "Builder started":

. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:22] [02] [00:00:00] Builder starting
[00:03:22] [03] [00:00:00] Builder starting
[00:03:22] [04] [00:00:00] Builder starting
[00:03:22] [05] [00:00:00] Builder starting
[00:03:22] [06] [00:00:00] Builder starting
[00:03:22] [07] [00:00:00] Builder starting
[00:03:22] [08] [00:00:00] Builder starting
[00:03:22] [09] [00:00:00] Builder starting
[00:03:22] [10] [00:00:00] Builder starting
[00:03:22] [11] [00:00:00] Builder starting
[00:03:22] [12] [00:00:00] Builder starting
[00:03:22] [13] [00:00:00] Builder starting
[00:03:22] [14] [00:00:00] Builder starting
[00:03:22] [15] [00:00:00] Builder starting
[00:03:22] [16] [00:00:00] Builder starting
[00:03:22] [17] [00:00:00] Builder starting
[00:03:22] [18] [00:00:00] Builder 
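
[For reference, the patch-and-rebuild cycle described above amounts to
roughly the following sketch; the source-tree path and the KERNCONF
name are taken from the uname output quoted later in the thread, and
the -j value is an assumption:

# cd /usr/main-src
# fetch https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
# git apply vfs-recycle-fix.diff
# make -j 32 buildkernel KERNCONF=GENERIC-NODBG
# make installkernel KERNCONF=GENERIC-NODBG
# shutdown -r now
]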

poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

2023-08-23 Thread Mark Millard
[Forked off the ZFS deadlock 14 discussion, per feedback.]

On Aug 23, 2023, at 11:40, Alexander Motin  wrote:

> On 22.08.2023 14:24, Mark Millard wrote:
>> Alexander Motin  wrote on
>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>> I am waiting for final test results from George Wilson and then will
>>> request quick merge of both to zfs-2.2-release branch. Unfortunately
>>> there are still not many reviewers for the PR, since the code is not
>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>> has tested and/or reviewed the PR, you may comment on it.
>> I had written to the list that when I tried to test the system
>> doing poudriere builds (initially with your patches) using
>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>> instead got only one builder that ended up active, the others
>> never reaching "Builder started":
> 
>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>> . . .
>>  362 0 root 400  27076Ki   13776Ki CPU19   19   4:23   0.00% cpdup -i0 -o ref 32
>>  349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20   0.01% cpdup -i0 -o ref 31
>>  328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30   0.01% cpdup -i0 -o ref 30
>>  304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18   0.01% cpdup -i0 -o ref 29
>>  282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33   0.01% cpdup -i0 -o ref 28
>>  242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28   0.00% cpdup -i0 -o ref 27
>> . . .
>> But those processes did show CPU?? on occasion, as well as
>> *vnode less often. None of the cpdup's was stuck in
>> . . .
>> Removing your patches did not change the behavior.
> 
> Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I was not
> deep in that area at least recently, so somebody with more experience there
> could try to diagnose it.  At the very least it does not look related to the
> ZIL issue discussed in this thread, at least with the information provided,
> so I am not surprised that the mentioned patches do not affect it.
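
[For reference: whether a vnode limit is being hit can be checked with
stock sysctl counters, and the kernel stack behind a "vlruwk" wait can
be captured with procstat. A sketch, using one of the cpdup PIDs from
the top output above:

# sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes
# procstat -kk 349
]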

I did the above intending to test the deadlock in my context but
ended up not getting that far when I tried to make zfs handle all
the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).

The zfs context is a simple single partition on the boot media. I
use ZFS for bectl BE use, not for other typical reasons. The media
here is PCIe Optane 1.4T media. The machine is a ThreadRipper
1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
swap, also on that Optane.

# uname -apKU
FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112 main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023 root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1400096 1400096

The GENERIC-DBG variant of the kernel did not report any issues in
earlier testing.

The later-referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
installed from the same build.

# zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zoptb                                        79.9G   765G    96K  /zoptb
zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
zoptb/ROOT                                   13.1G   765G    96K  none
zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
zoptb/poudriere/data/.m                       112K   765G   112K  /usr/local/poudriere/data/.m
zoptb/poudriere/data/cache                   17.4M