Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-18 Thread Mateusz Guzik
To sum up what happened: Yamagi was kind enough to test several
patches, and ultimately the issue was solved by
https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
The patch was also merged into releng/13.0.



-- 
Mateusz Guzik 
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Yamagi Burmeister
This time the poudriere run completed:

  % sysctl vfs.highest_numvnodes
  vfs.highest_numvnodes: 500976



-- 
Homepage: https://www.yamagi.org
Github:   https://github.com/yamagi
GPG:  0x1D502515




Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Mateusz Guzik
Thanks, I'm going to have to ponder this a little bit.

In the meantime, can you apply this patch:
https://people.freebsd.org/~mjg/maxvnodes.diff

Once you boot, tweak maxvnodes:
sysctl kern.maxvnodes=1049226

Run poudriere. Once it finishes, inspect sysctl vfs.highest_numvnodes.



-- 
Mateusz Guzik 


Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Yamagi
Hi Mateusz,
the sysctl output from about 10 minutes into the problem is attached.
In case it's stripped by Mailman, a copy can be found here:
https://deponie.yamagi.org/temp/sysctl_vlruwk.txt.xz

Regards,
Yamagi



-- 
Homepage: https://www.yamagi.org
Github:   https://github.com/yamagi
GPG:  0x1D502515




Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Mateusz Guzik
Can you reproduce the problem and obtain "sysctl -a" output?

In general, there is a vnode limit which is probably too small. The
reclamation mechanism is deficient in that it will eventually inject
an arbitrary pause.
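
How close a system is running to that limit can be eyeballed by comparing the live vnode count against the cap. A minimal sketch in shell: the `vnode_pct` helper is a made-up name for illustration, while `vfs.numvnodes` and `kern.maxvnodes` are the FreeBSD sysctls discussed in this thread.

```shell
# vnode_pct NUM MAX -> prints how full the vnode table is, in percent.
# Integer arithmetic truncates toward zero; helper name is hypothetical.
vnode_pct() {
    echo $(( $1 * 100 / $2 ))
}

# On a live FreeBSD system one would feed it the real counters:
#   vnode_pct "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)"
# Here, the peak count and limit reported later in this thread:
vnode_pct 500976 1049226
```

With the numbers Yamagi eventually reported (highest_numvnodes of 500976 against a raised limit of 1049226) this prints 47, i.e. the run peaked well below the raised limit.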



-- 
Mateusz Guzik 


13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Yamagi
Hi,
some other users in the ##bsdforen.de IRC channel and I have the
problem that, during Poudriere runs, processes get stuck in the
'vlruwk' state.

For me it's fairly reproducible. The problems begin about 20 to 25
minutes after I've started poudriere. At first only some ccache
processes hang in the 'vlruwk' state; after another 2 to 3 minutes
nearly everything hangs and the total CPU load drops to about 5%.
When I stop poudriere with Ctrl-C it takes another 3 to 5 minutes
until the system recovers.

First the setup:
* poudriere runs in a bhyve VM on a zvol. The host is 12.2-RELEASE-p2.
  The zvol has an 8k block size and the guest's partitions are aligned
  to 8k. The guest has only one zpool, created with ashift=13. The VM
  has 16 E5-2620 cores and 16 gigabytes of RAM assigned to it.
* poudriere is configured with ccache and ALLOW_MAKE_JOBS=yes. Removing
  either of these options significantly lowers the probability of the
  problem showing up.

I've tried several git revisions, starting with 14-CURRENT at
54ac6f721efccdba5a09aa9f38be0a1c4ef6cf14, in the hope of finding at
least one known-good revision. No luck; even a kernel built from
0932ee9fa0d82b2998993b649f9fa4cc95ba77d6 (Wed Sep 2 19:18:27 2020)
has the problem. The problem isn't reproducible with 12.2-RELEASE.

The kernel stack ('procstat -kk') of a hanging process is:
mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x3f1
sleepq_wait_sig+0x9 _sleep+0x2aa kern_wait6+0x482 sys_wait4+0x7d
amd64_syscall+0x140 fast_syscall_common+0xf8

The kernel stack of vnlru is changing, even while the processes are
hanging:
* mi_switch+0x155 sleepq_switch+0x109 sleepq_timedwait+0x4b
_sleep+0x29b vnlru_proc+0xa05 fork_exit+0x80 fork_trampoline+0xe
* fork_exit+0x80 fork_trampoline+0xe
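
Collecting these stacks can be scripted: first spot the stuck processes, then dump each one's kernel stack. The `stuck_pids` helper below is a made-up name; `ps -axo pid,wchan` and `procstat -kk` are the FreeBSD tools used above.

```shell
# stuck_pids reads "PID WCHAN" lines on stdin and prints the PIDs whose
# wait channel matches the given name (default: vlruwk).
stuck_pids() {
    awk -v ch="${1:-vlruwk}" '$2 == ch { print $1 }'
}

# On a live FreeBSD system:
#   for pid in $(ps -axo pid,wchan | stuck_pids); do
#       echo "=== pid $pid ==="; procstat -kk "$pid"
#   done
# Demonstration with canned ps-style input:
printf '101 vlruwk\n102 select\n103 vlruwk\n' | stuck_pids
```

The demonstration prints 101 and 103, the two PIDs whose wait channel is vlruwk; the same filter works for any other wait channel name passed as the first argument.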

Since vnlru is accumulating CPU time, it looks like it's doing at
least something. As an educated guess, I would say that vn_alloc_hard()
is waiting a long time, or even forever, to allocate new vnodes.

I can provide more information; I just need to know what.


Regards,
Yamagi

-- 
Homepage: https://www.yamagi.org
Github:   https://github.com/yamagi
GPG:  0x1D502515

