weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
Hi!

I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
20) to test the rge(4) changes, and started a bulk build, and the
packages using ghc seem to wait for something and make no progress.

In one of my sandboxes there is a hs-data-array-byte build but it's not
doing anything.

The log stops at:

===> Creating toolchain wrappers for hs-data-array-byte-0.1.0.1nb2
===> Configuring for hs-data-array-byte-0.1.0.1nb2
=> Checking for portability problems in extracted files
[1 of 2] Compiling Main ( Setup.hs, Setup.o )

From ps:

pbulk   26131  0.0  0.1 1073923564  140684 ?  Il  8:23PM 0:00.23 /usr/pkg/lib/ghc-9.4.7/bin/./ghc-9.4.7 -B/usr/pkg/lib/ghc-9.4.7/lib -package-env - --make Setup -dynamic

(btw, that is a really huge process size?!)

Attaching with gdb shows me:

[Switching to LWP 20090 of process 26131]
0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
(gdb) bt
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa22f010, 
pMut=pMut@entry=0x7195fa22f038) at rts/posix/OSThreads.c:143
#3  0x7195faa903e1 in waitForWorkerCapability (task=<optimized out>) at 
rts/Capability.c:707
#4  yieldCapability (pCap=pCap@entry=0x7195f77fff10, 
task=task@entry=0x7195fa22f000, gcAllowed=gcAllowed@entry=true) at 
rts/Capability.c:1011
#5  0x7195faab0026 in scheduleYield (task=0x7195fa22f000, 
pcap=0x7195f77fff08) at rts/Schedule.c:709
#6  schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa22f000) at rts/Schedule.c:319
#7  0x7195faab20b9 in scheduleWorker (cap=cap@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa22f000) at rts/Schedule.c:2668
#8  0x7195faab78a2 in workerStart (task=0x7195fa22f000) at rts/Task.c:444
#9  0x7195fa97f2df in pthread__create_tramp () from /usr/lib/libpthread.so.1
#10 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#11 0x0020 in ?? ()
#12 0x in ?? ()
(gdb) thread apply all bt

Thread 6 (LWP 26131 of process 26131 ""):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2010, 
pMut=pMut@entry=0x7195fa2b2038) at rts/posix/OSThreads.c:143
#3  0x7195faa903e1 in waitForWorkerCapability (task=<optimized out>) at 
rts/Capability.c:707
#4  yieldCapability (pCap=pCap@entry=0x7f7fff2287c0, 
task=task@entry=0x7195fa2b2000, gcAllowed=gcAllowed@entry=true) at 
rts/Capability.c:1011
#5  0x7195faab0026 in scheduleYield (task=0x7195fa2b2000, 
pcap=0x7f7fff2287b8) at rts/Schedule.c:709
#6  schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 
, task=task@entry=0x7195fa2b2000) at rts/Schedule.c:319
#7  0x7195faab2069 in scheduleWaitThread (tso=0x4200406ce8, 
ret=ret@entry=0x0, pcap=pcap@entry=0x7f7fff228940) at rts/Schedule.c:2651
#8  0x7195faaa85fb in rts_evalLazyIO (cap=cap@entry=0x7f7fff228940, 
p=p@entry=0x1071e60, ret=ret@entry=0x0) at rts/RtsAPI.c:566
#9  0x7195faaabb48 in hs_main (argc=<optimized out>, argv=<optimized out>, 
main_closure=0x1071e60, rts_config=...) at rts/RtsMain.c:72
#10 0x01063124 in main ()

Thread 5 (LWP 7329 of process 26131 "ghc_ticker"):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fab21bc0 
, pMut=pMut@entry=0x7195fab21b80 ) at 
rts/posix/OSThreads.c:143
#3  0x7195faae040e in itimer_thread_func (_handle_tick=0x7195faab9c57 
) at rts/posix/ticker/Pthread.c:140
#4  0x7195fa97f2df in pthread__create_tramp () from /usr/lib/libpthread.so.1
#5  0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#6  0x in ?? ()

Thread 4 (LWP 15032 of process 26131 "ghc_worker"):
#0  0x7195fa5a030a in _sys___kevent100 () from /usr/lib/libc.so.12
#1  0x7195fa97a8a7 in __kevent100 () from /usr/lib/libpthread.so.1
#2  0x7195fba014f2 in base_GHCziEventziKQueue_new12_info () from 
/usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3  0x in ?? ()

Thread 3 (LWP 17781 of process 26131 "ghc_worker"):
#0  0x7195fa5a016a in poll () from /usr/lib/libc.so.12
#1  0x7195fa97ae63 in poll () from /usr/lib/libpthread.so.1
#2  0x7195fba0ff55 in ?? () from 
/usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3  0x in ?? ()

Thread 2 (LWP 23219 of process 26131 "ghc_worker"):
#0  0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7195fa97dc4d in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2190, 
pMut=pMut@entry=0x7195fa2b21b8) at rts/posix/OSThreads.c:143
#3  

Re: weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> 20) to test the rge(4) changes, and started a bulk build, and the
> packages using ghc seem to wait for something and make no progress.
...
> I see one other new weird behaviour on that machine - gnucash doesn't
> finish starting up.

I've backed out ad's changes from the 13th, and both problems are gone.

I'll attach my local change.

Andrew, can you please take a look?

Thanks,
 Thomas
Module Name:    src
Committed By:   ad
Date:   Fri Oct 13 18:48:56 UTC 2023

Modified Files:
src/sys/kern: kern_condvar.c kern_sleepq.c
src/sys/rump/librump/rumpkern: locks.c locks_up.c
src/sys/sys: condvar.h lwp.h

Log Message:
Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming.  Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.


To generate a diff of this commit:
cvs rdiff -u -r1.59 -r1.60 src/sys/kern/kern_condvar.c
cvs rdiff -u -r1.83 -r1.84 src/sys/kern/kern_sleepq.c
cvs rdiff -u -r1.86 -r1.87 src/sys/rump/librump/rumpkern/locks.c
cvs rdiff -u -r1.12 -r1.13 src/sys/rump/librump/rumpkern/locks_up.c
cvs rdiff -u -r1.17 -r1.18 src/sys/sys/condvar.h
cvs rdiff -u -r1.227 -r1.228 src/sys/sys/lwp.h


Module Name:    src
Committed By:   ad
Date:   Fri Oct 13 18:50:39 UTC 2023

Modified Files:
src/sys/kern: uipc_socket.c uipc_syscalls.c
src/sys/sys: socketvar.h

Log Message:
Use cv_fdrestart() to implement fo_restart.


To generate a diff of this commit:
cvs rdiff -u -r1.305 -r1.306 src/sys/kern/uipc_socket.c
cvs rdiff -u -r1.208 -r1.209 src/sys/kern/uipc_syscalls.c
cvs rdiff -u -r1.165 -r1.166 src/sys/sys/socketvar.h


Module Name:    src
Committed By:   ad
Date:   Fri Oct 13 19:07:09 UTC 2023

Modified Files:
src/sys/ddb: db_command.c db_interface.h db_xxx.c
src/sys/kern: sys_pipe.c
src/sys/sys: pipe.h
src/usr.bin/fstat: fstat.c

Log Message:
Simplify/streamline pipes a little bit:

- Allocate only one struct pipe not two (no need to be bidirectional here).
- Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops.
- Never wake the other side or acquire long-term (I/O) lock unless needed.
- Whenever possible, defer wakeups until after locks have been released.
- Do some things locklessly in pipe_ioctl() and pipe_poll().

Some notable results:

- -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process.
- 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine.
- 1.5x bandwidth with 1kB messages on the same 48 CPU machine (8kB: same b/w).
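
For readers who want a rough feel for the "1 byte ping-pong within a single process" measurement mentioned above, here is a minimal userland sketch. It is not the benchmark used for the commit; the function name `pipe_ping_pong` and the `rounds` parameter are made up for illustration, and the numbers include Python and thread-switch overhead, so they are only useful for comparing two kernels on the same machine.

```python
import os
import threading
import time


def pipe_ping_pong(rounds: int = 1000) -> float:
    """Bounce one byte between two threads over two pipes.

    Returns the mean round-trip time in seconds. Hypothetical helper,
    not taken from the commit or from any NetBSD test suite.
    """
    a_r, a_w = os.pipe()  # main thread -> echo thread
    b_r, b_w = os.pipe()  # echo thread -> main thread

    def echo() -> None:
        # Send every byte straight back until the writer closes its end.
        while True:
            data = os.read(a_r, 1)
            if not data:  # EOF: the main thread closed a_w
                break
            os.write(b_w, data)

    worker = threading.Thread(target=echo)
    worker.start()

    start = time.perf_counter()
    for _ in range(rounds):
        os.write(a_w, b"x")
        if os.read(b_r, 1) != b"x":
            raise RuntimeError("unexpected reply on pipe")
    elapsed = time.perf_counter() - start

    os.close(a_w)  # lets echo() see EOF and return
    worker.join()
    for fd in (a_r, b_r, b_w):
        os.close(fd)
    return elapsed / rounds


if __name__ == "__main__":
    print(f"mean round trip: {pipe_ping_pong() * 1e6:.1f} us")
```

Running it once on a kernel with the pipe changes and once without gives two comparable figures; a run that never finishes would itself be a data point.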


To generate a diff of this commit:
cvs rdiff -u -r1.186 -r1.187 src/sys/ddb/db_command.c
cvs rdiff -u -r1.41 -r1.42 src/sys/ddb/db_interface.h
cvs rdiff -u -r1.77 -r1.78 src/sys/ddb/db_xxx.c
cvs rdiff -u -r1.164 -r1.165 src/sys/kern/sys_pipe.c
cvs rdiff -u -r1.39 -r1.40 src/sys/sys/pipe.h
cvs rdiff -u -r1.118 -r1.119 src/usr.bin/fstat/fstat.c



ad.backed.out.diff.gz
Description: Binary data


Re: weird hangs in current (ghc, gnucash)

2023-10-22 Thread Thomas Klausner
On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> > 20) to test the rge(4) changes, and started a bulk build, and the
> > packages using ghc seem to wait for something and make no progress.
> ...
> > I see one other new weird behaviour on that machine - gnucash doesn't
> > finish starting up.
> 
> I've backed out ad's changes from the 13th, and both problems are gone.
> 
> I'll attach my local change.
> 
> Andrew, can you please take a look?

Two test cases to see the problem I have:

1. start gnucash, it doesn't finish starting up, the splash screen hangs.

2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
   The 'build' step has two parts, it hangs after the first one.
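
Both test cases need a full pkgsrc or desktop setup. Since the commits backed out above touch pipe and socket wakeups, a much smaller probe might help narrow things down. The sketch below starts a short-lived child and reads its output through poll() until EOF, giving up after a timeout. That the hang is a missed pipe wakeup is only an assumption here; the thread establishes no more than that backing out the commits makes both problems go away. The name `child_pipe_probe` and the 10 second default are made up for illustration.

```python
import os
import select
import subprocess
import sys
import time


def child_pipe_probe(timeout_s: float = 10.0) -> bytes:
    """Read a short-lived child's stdout via poll() until EOF.

    Raises TimeoutError if neither data nor EOF shows up in time, which
    is roughly how a lost pipe wakeup would look from userland.
    Hypothetical helper, not part of pkgsrc or the NetBSD tests.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.stdout.write('done')"],
        stdout=subprocess.PIPE,
    )
    fd = child.stdout.fileno()
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLHUP)

    chunks = []
    deadline = time.monotonic() + timeout_s
    try:
        while True:
            remaining_ms = int((deadline - time.monotonic()) * 1000)
            if remaining_ms <= 0 or not poller.poll(remaining_ms):
                child.kill()
                raise TimeoutError("no wakeup from child pipe")
            data = os.read(fd, 4096)
            if not data:  # EOF: the child closed its end of the pipe
                break
            chunks.append(data)
    finally:
        child.wait()
        child.stdout.close()
    return b"".join(chunks)


if __name__ == "__main__":
    print(child_pipe_probe())
```

If this returns promptly on an affected kernel, the problem likely needs a different trigger (for example kqueue, which the ghc worker thread in the backtrace is sitting in) and the probe would have to be adapted.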

 Thomas


Re: weird hangs in current (ghc, gnucash)

2023-10-22 Thread Markus Kilbinger
... and probably

3. PR kern/57660
https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660

Markus



Re: weird hangs in current (ghc, gnucash)

2023-11-01 Thread Chavdar Ivanov
This weird hang still takes place on

❯ uname -a
NetBSD ymir.lorien.lan 10.99.10 NetBSD 10.99.10 (GENERIC) #13: Mon Oct 30 19:45:39 GMT 2023 sysbu...@ymir.lorien.lan:/dumps/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64

- again during building a haskell package:

===> Configuring for hs-tagged-0.8.8
[1 of 2] Compiling Main ( Setup.lhs, Setup.o )


Htop gives weird output for the process not-yet-created:

11506 root  63   0 33283   873 S   0.0  0.0  0:00.00 |  `- make
20458 root  62   0 34832   613 S   0.0  0.0  0:00.00 |  `- /bin/sh -c set -e; test -n "" && echo 1>&2 "ERROR:"  && exit 1;  exec 3<&0;??? whil
24942 root  63   0 33296   882 S   0.0  0.0  0:00.00 |  `- /usr/bin/make _MAKE=/usr/bin/make OPSYS=NetBSD OS_VERSION=10.99.10 OPSYS_VERSION=109910 LOWE
21643 root  58   0 34302   606 S   0.0  0.0  0:00.00 |  `- /bin/sh -c set -e;? if test -n "" &&  /usr/pkg/sbin/pkg_info -K /usr/pkg/pkgdb -qe hs
19149 root  63   0 34367   920 S   0.0  0.0  0:00.00 |  `- /usr/bin/make LOWER_OPSYS=netbsd _PKGSRC_BARRIER=yes ALLOW_VULNERABLE_PACKAGES= reinst
23303 root  58   0 33685   603 S   0.0  0.0  0:00.00 |   `- /bin/sh -c set -e; ulimit -d `ulimit -H -d`; ulimit -v `ulimit -H -v`; cd /usr/pkgs
27078 root  21   0  256G 37735 S   0.0  0.9  0:00.00 | `- /usr/pkg/lib/ghc-9.6.3/bin/./ghc-9.6.3 -B/usr/pkg/lib/ghc-9.6.3/lib -package-env
22058 root   -22   0 0 0 Z   0.0  0.0  0:00.00 | `- gcc   <==
---


I guess it is back to the kernel from the 9th of October.

Chavdar

-

On Mon, 23 Oct 2023 at 09:27, Chavdar Ivanov  wrote:
>
> I can confirm that after reverting to the kernel from 9th of October 
> devel/happy builds OK.


Re: weird hangs in current (ghc, gnucash)

2023-11-01 Thread Thomas Klausner
Should we back out ad's changes until he has time to look at them?
 Thomas



Re: weird hangs in current (ghc, gnucash)

2023-11-02 Thread Martin Husemann
On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> Should we back out ad's changes until he has time to look at them?

I just did that on behalf of core.
Can you test if this solves your problem?

Martin


Re: weird hangs in current (ghc, gnucash)

2023-11-02 Thread Chavdar Ivanov
On Thu, 2 Nov 2023 at 10:33, Martin Husemann  wrote:
>
> On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> > Should we back out ad's changes until he has time to look at them?
>
> I just did that on behalf of core.
> Can you test if this solves your problem?

I rebuilt the system a few hours ago. The previously failing packages
(e.g. devel/hs-assoc) now build.

I can restart my present rolling replace, I guess.

>
> Martin

Chavdar




Re: weird hangs in current (ghc, gnucash)

2023-11-04 Thread Thomas Klausner
On Thu, Nov 02, 2023 at 11:33:54AM +0100, Martin Husemann wrote:
> On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> > Should we back out ad's changes until he has time to look at them?
> 
> I just did that on behalf of core.
> Can you test if this solves your problem?

Thank you, both my test cases work again with a GENERIC.
 Thomas