On 25/05/23(Thu) 16:33, Kurt Miller wrote:
> On May 22, 2023, at 2:27 AM, Claudio Jeker <clau...@openbsd.org> wrote:
> > I have seen these WITNESS warnings on other systems as well. I doubt this
> > is the problem. IIRC this warning is because sys_mount() is doing it wrong
> > but it is not really an issue since sys_mount is not called often.
> 
> Yup. I see that now that I have tested witness on several arches. They all
> show this lock order reversal right after booting the system. I guess this
> means what I am seeing isn’t something that witness detects.
> 
> On -current with my T4-1, I can reliably reproduce the issues I am seeing.
> While the problem is intermittent I can’t get very far into the jdk build 
> without
> tripping it. Instructions for reproducing the issue are:
> 
> Add wxallowed to /usr/local/ and /usr/ports (or wherever WRKOBJDIR has
> been changed to)
> 
> doas pkg_add jdk zip unzip cups-libs bash gmake libiconv giflib
> 
> cd /usr/ports/devel/jdk/1.8
> FLAVOR=native_bootstrap make
> 
> There are two stages to the problem. A java command (or javac or javah)
> gets stuck making forward progress and nearly all of its cpu time is in
> sys time category. You can see this in top as 1500-3000% CPU time on
> the java process. ktrace of the process in this state shows endless
> sched_yield() calls. Debugging shows many threads in
> pthread_cond_wait(3). The condition vars are configured to use
> CLOCK_MONOTONIC.
> 
> The second stage of the problem is when things lock up. While java is
> spinning in this sched_yield() state, if you display the process arguments in
> top (pressing the right arrow) you trip the lockups. top stops responding.
> getty will reprompt if enter is pressed, but locks up if a username is 
> entered.
> Most processes lock up when doing anything after this point. ddb ps at this
> stage shows top waiting on vmmaplk and the rest of the stuck processes
> waiting on sysctllk (sshd, systat, login).

So it seems the java process is holding the `sysctl_lock' for too long
and block all other sysctl(2).  This seems wrong to me.  We should come
up with a clever way to prevent vslocking too much memory.  A single
lock obviously doesn't fly with that many CPUs. 

Once that's improved it should be easier to debug the java issue.
sched_yield() is called from _spinlock() in librthread.  sparc64 should
use the futex version of pthread_cond_wait(3) which doesn't rely on
_spinlock() and sched_yield(2).  So I'm puzzled.  This seems like a poor
man scheduling issue hoping that another thread/process will make
progress.  Can you figure out where this sched_yield() is coming from?

> I tried bisecting when this was introduced but as I go back in time with
> kernels it becomes more intermittent and I didn’t notice that so I would need
> to redo the bisecting. I can say I have seen the problem reproduce as far
> back as Feb 16th kernel. When I updated the jdk in late January I didn’t 
> notice
> it but it could have been a lucky build as I tend to only do one 
> native_bootstrap
> build of the jdk when updating as a way to test the resulting package.
> 
> Here is some sample output of top, systat and ddb ps output on -current in
> my last reproduction of the problem.
> 
> load averages: 15.27,  4.25,  1.68                                            
>                                                       
> oracle.intricatesoftware.com 16:13:10
> 64 processes: 62 idle, 2 on processor                                         
>                                                                          up 0 
> days 00:04:58
> 64  CPUs:  0.0% user,  1.8% nice, 55.5% sys,  0.4% spin,  0.1% intr, 42.2% 
> idle
> Memory: Real: 143M/2676M act/tot Free: 13G Cache: 2334M Swap: 0K/16G
>   PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
> 38582 _pbuild   10    5  290M   64M onproc/19 fsleep   17:16 2892.14% javac
> 
>    3 users Load 15.27 4.25 1.68                    oracle.intricatesof 
> 16:13:09
> 
>             memory totals (in KB)            PAGING   SWAPPING     Interrupts
>            real   virtual     free           in  out   in  out    12777 total
> Active   146752    146752 13198032   ops                                vcons0
> All     2740160   2740160 29713104   pages                            5 
> vpci0:0
>                                                                       2 mpii0
> Proc:r  d  s  w    Csw   Trp   Sys   Int   Sof  Flt       forks         mpii1
>     18   148     78089 18666  8090 94097      18891       fkppw       2 em0
>                                                           fksvm      31 ehci0
>    0.1%Int   0.3%Spn  55.1%Sys   1.8%Usr  42.7%Idle       pwait   12737 clock
> |    |    |    |    |    |    |    |    |    |    |       relck
> ============================>                             rlkok
>                                                           noram
> Namei         Sys-cache    Proc-cache    No-cache         ndcpy
>     Calls     hits    %    hits     %    miss   %         fltcp
>                                                           zfod
>                                                           cow
> Disks   cd0   sd0   sd1   sd2   cd1   sd3           66409 fmin
> seeks                                               88545 ftarg
> xfers                       3                             itarg
> speed                     18K                           2 wired
>   sec                     0.0                             pdfre
>                                                           pdscn
>                                                           pzidl       1 IPKTS
>                                                        13 kmape       1 OPKTS
> 
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  94135  299634   9252      0  3        0x82  sysctllk      sshd
>  38582  227559  44647     55  3    0x200083  fsleep        javac
>  38582  430054  44647     55  3   0x4200083  fsleep        javac
>  38582  454002  44647     55  7   0x4200003                javac
>  38582   92205  44647     55  3   0x4200083  fsleep        javac
>  38582  391595  44647     55  7   0x4200003                javac
>  38582  306671  44647     55  7   0x4200003                javac
>  38582  444915  44647     55  7   0x4200003                javac
>  38582   68842  44647     55  7   0x4200003                javac
>  38582  210759  44647     55  7   0x4200003                javac
>  38582  295002  44647     55  3   0x4200003  uobjlk        javac
>  38582   20089  44647     55  7   0x4200003                javac
>  38582  426972  44647     55  7   0x4200003                javac
>  38582  485651  44647     55  7   0x4200003                javac
>  38582  198526  44647     55  7   0x4200003                javac
>  38582  140392  44647     55  7   0x4200003                javac
>  38582  426536  44647     55  7   0x4200003                javac
>  38582  237845  44647     55  3   0x4200003  vmmaplk       javac
>  38582   11519  44647     55  7   0x4200003                javac
>  38582  493801  44647     55  7   0x4200003                javac
>  38582  431112  44647     55  7   0x4200003                javac
>  38582    5954  44647     55  3   0x4200003  uobjlk        javac
>  38582  334521  44647     55  7   0x4200003                javac
>  38582   63717  44647     55  7   0x4200003                javac
>  38582  278840  44647     55  3   0x4200003  uobjlk        javac
>  38582  288942  44647     55  3   0x4200003  vmmaplk       javac
>  38582  358539  44647     55  3   0x4200003  vmmaplk       javac
>  38582  292341  44647     55  3   0x4200003  uobjlk        javac
>  38582   71353  44647     55  7   0x4200003                javac
>  38582  173462  44647     55  7   0x4200003                javac
>  38582  369162  44647     55  7   0x4200003                javac
>  38582  103608  44647     55  3   0x4200003  uobjlk        javac
>  38582  283692  44647     55  7   0x4200003                javac
>  38582   73296  44647     55  7   0x4200003                javac
>  38582  107123  44647     55  7   0x4200003                javac
>  38582  190345  44647     55  7   0x4200003                javac
>  38582  519032  44647     55  7   0x4200003                javac
>  38582  512468  44647     55  7   0x4200003                javac
>  38582   30652  44647     55  7   0x4200003                javac
>  38582  379236  44647     55  3   0x4200003  uobjlk        javac
>  38582  269178  44647     55  7   0x4200003                javac
>  38582  268232  44647     55  3   0x4200003  uobjlk        javac
>  38582  352770  44647     55  7   0x4200003                javac
>  38582  377632  44647     55  7   0x4200003                javac
>  38582  244819  44647     55  7   0x4200003                javac
>  38582   72610  44647     55  7   0x4200003                javac
>  38582  469953  44647     55  3   0x4200083  fsleep        javac
>  38582  371383  44647     55  3   0x4200083  fsleep        javac
>  38582  158857  44647     55  3   0x4200083  fsleep        javac
>  38582   11735  44647     55  3   0x4200083  fsleep        javac
>  38582   84102  44647     55  3   0x4200083  fsleep        javac
>  38582   75248  44647     55  3   0x4200083  fsleep        javac
>  38582  375420  44647     55  3   0x4200083  fsleep        javac
>  38582  518610  44647     55  3   0x4200083  fsleep        javac
>  38582  127230  44647     55  3   0x4200003  vmmaplk       javac
>  38582   83423  44647     55  3   0x4200083  fsleep        javac
>  38582  159603  44647     55  3   0x4200083  fsleep        javac
>  38582   31091  44647     55  3   0x4200083  fsleep        javac
>  38582  390888  44647     55  3   0x4200083  fsleep        javac
>  38582  166886  44647     55  3   0x4200083  fsleep        javac
>  38582  377933  44647     55  3   0x4200083  fsleep        javac
>  38582  376867  44647     55  3   0x4200083  fsleep        javac
>  38582  110280  44647     55  3   0x4200083  fsleep        javac
>  38582  396501  44647     55  3   0x4200083  fsleep        javac
>  38582  337157  44647     55  3   0x4200083  fsleep        javac
>  38582  312539  44647     55  3   0x4200083  fsleep        javac
>  38582  439725  44647     55  3   0x4200083  fsleep        javac
>  38582  174728  44647     55  3   0x4200083  fsleep        javac
>  38582  350117  44647     55  3   0x4200083  fsleep        javac
>  38582  336583  44647     55  3   0x4200083  fsleep        javac
>  44647   53285   6389     55  3    0x100089  sigsusp       sh
>   6389    9627   3586     55  3    0x10008b  sigsusp       sh
>   3586  519556  31484     55  3        0x83  wait          gmake
>  29401  282226  87417     55  3    0x100083  piperd        tee
>  31484  384291  87417     55  3        0x81  wait          bash
>  10686  220950  84161     55  3    0x100083  piperd        tee
>  87417   52900  84161     55  3        0x81  wait          bash
>  84161  354717  90894     55  3        0x83  wait          bash
>  90894  299418  54844     55  3    0x100089  sigsusp       sh
>  54844   38202  23322     55  3    0x10008b  sigsusp       sh
>  23322  499463  12410     55  3        0x83  wait          gmake
>   2137  306483   5971   1000  3        0x83  sysctllk      systat
>  96354  374280      1   1000  3    0x1000b0  kqread        ssh-agent
>   5971  326285  53265   1000  3    0x10008b  sigsusp       ksh
>  53265   95696  81543   1000  3        0x98  kqread        sshd
>  81543  262878   9252      0  3        0x82  kqread        sshd
>  12410   25695  53684   1000  3    0x10008b  sigsusp       make
>  53684  389550   2559   1000  3    0x10008b  sigsusp       sh
>  91269  101622  48618   1000  3    0x100003  vmmaplk       top
>  26730   50864      1   1000  3    0x1000b0  kqread        ssh-agent
>  48618  194268  38192   1000  3    0x10008b  sigsusp       ksh
>  38192   94599  15809   1000  3        0x98  kqread        sshd
>  15809  420302   9252      0  3        0x82  kqread        sshd
>   2559  172546  11253   1000  3    0x10008b  sigsusp       make
>  48347  451749      1   1000  3    0x1000b0  kqread        ssh-agent
>  11253  329919  75474   1000  3    0x10008b  sigsusp       ksh
>  75474  434695  24864   1000  3        0x98  kqread        sshd
>  24864  460675   9252      0  3        0x82  kqread        sshd
>  80339  457654      1      0  3        0x83  sysctllk      login
>   2783  471804      1      0  3    0x100098  kqread        cron
>  13986  329572      1     99  3   0x1100090  kqread        sndiod
>  17161  216329      1    110  3    0x100090  kqread        sndiod
>   7125  459497  81420     95  3   0x1100092  kqread        smtpd
>  34588  306264  81420    103  3   0x1100092  kqread        smtpd
>  35539  431962  81420     95  3   0x1100092  kqread        smtpd
>  95046  110434  81420     95  3    0x100092  kqread        smtpd
>  75832  395854  81420     95  3   0x1100092  kqread        smtpd
>  90905  131444  81420     95  3   0x1100092  kqread        smtpd
>  81420   40917      1      0  3    0x100080  kqread        smtpd
>   9252  412804      1      0  3        0x88  kqread        sshd
>  77215  300303  47896      0  3        0x80  nfsd          nfsd
>  39163  328218  47896      0  3        0x80  nfsd          nfsd
>  18940   88416  47896      0  3        0x80  nfsd          nfsd
>  30770  448875  47896      0  3        0x80  nfsd          nfsd
>  47896   35491      1      0  3        0x80  netcon        nfsd
>  69834  381834   1172      0  3        0x80  kqread        mountd
>   1172    8898      1      0  3    0x100080  kqread        mountd
>  34849  159509      1     28  3   0x1100090  kqread        portmap
>  99111  374653      1      0  3    0x100080  kqread        ntpd
>  96514  463552  69056     83  3    0x100092  sysctllk      ntpd
>  69056  185686      1     83  3   0x1100092  sysctllk      ntpd
>  76215   96020  37258     74  3   0x1100092  bpf           pflogd
>  37258  158509      1      0  3        0x80  netio         pflogd
>  74134   93758  35337     73  3   0x1100090  kqread        syslogd
>  35337  436947      1      0  3    0x100082  netio         syslogd
>  49461  134918      1      0  3    0x100080  kqread        resolvd
>  15802  138340  35598     77  3    0x100092  kqread        dhcpleased
>  85374  145907  35598     77  3    0x100092  kqread        dhcpleased
>  35598   69070      1      0  3        0x80  kqread        dhcpleased
>  78965  485143  22355    115  3    0x100092  kqread        slaacd
>  49372  473630  22355    115  3    0x100092  kqread        slaacd
>  22355  252711      1      0  3    0x100080  kqread        slaacd
>  77472  293410      0      0  3     0x14200  bored         smr
>  96287    2177      0      0  3     0x14200  pgzero        zerothread
>  22397  479660      0      0  3     0x14200  aiodoned      aiodoned
>  40550  268537      0      0  3     0x14200  syncer        update
>  30711  491254      0      0  3     0x14200  cleaner       cleaner
>  19845  282487      0      0  3     0x14200  reaper        reaper
>  85959  130144      0      0  3     0x14200  pgdaemon      pagedaemon
>  70483  182566      0      0  3     0x14200  usbtsk        usbtask
>  51296  152380      0      0  3     0x14200  usbatsk       usbatsk
>  16787  444929      0      0  3     0x14200  bored         sensors
>  63429   82409      0      0  7  0x40014200                idle63
>  92645  493403      0      0  7  0x40014200                idle62
>  20350  375563      0      0  7  0x40014200                idle61
>  35222   74521      0      0  7  0x40014200                idle60
>  33426  375581      0      0  7  0x40014200                idle59
>  22679  177213      0      0  7  0x40014200                idle58
>  14639  386227      0      0  7  0x40014200                idle57
>  39400  120262      0      0  7  0x40014200                idle56
>  12654    7800      0      0  7  0x40014200                idle55
>  48608  174405      0      0  3  0x40014200                idle54
>  94678  113021      0      0  7  0x40014200                idle53
>   6170  359112      0      0  3  0x40014200                idle52
>  92056  451703      0      0  7  0x40014200                idle51
>  94062  149790      0      0  3  0x40014200                idle50
>   7490  352645      0      0  7  0x40014200                idle49
>  58618  321162      0      0  3  0x40014200                idle48
>  34596   30496      0      0  3  0x40014200                idle47
>  42019  493114      0      0  7  0x40014200                idle46
>  19753  346089      0      0  3  0x40014200                idle45
>  35289  255668      0      0  7  0x40014200                idle44
>  71157  135665      0      0  7  0x40014200                idle43
>   2489  156561      0      0  7  0x40014200                idle42
>  69267  449921      0      0  3  0x40014200                idle41
>  17078   23764      0      0  3  0x40014200                idle40
>  18986  140559      0      0  7  0x40014200                idle39
>  31565  508859      0      0  3  0x40014200                idle38
>  83892  503397      0      0  3  0x40014200                idle37
>   3971  218332      0      0  3  0x40014200                idle36
>  56087  426177      0      0  3  0x40014200                idle35
>   8255  404604      0      0  3  0x40014200                idle34
>  88769  308362      0      0  7  0x40014200                idle33
>  17805  369954      0      0  7  0x40014200                idle32
>  44764  388232      0      0  7  0x40014200                idle31
>   6184  164272      0      0  7  0x40014200                idle30
>  57793  246870      0      0  3  0x40014200                idle29
>  94847  282657      0      0  3  0x40014200                idle28
>  20730  274685      0      0  7  0x40014200                idle27
>  49977  358582      0      0  3  0x40014200                idle26
>  76662  491808      0      0  3  0x40014200                idle25
>    339  381654      0      0  7  0x40014200                idle24
>  95798  435838      0      0  7  0x40014200                idle23
>  17729  220014      0      0  3  0x40014200                idle22
>  32933  457100      0      0  3  0x40014200                idle21
>  93991  300506      0      0  7  0x40014200                idle20
>  83270   35155      0      0  3  0x40014200                idle19
>    113  190949      0      0  7  0x40014200                idle18
>  47202   13764      0      0  3  0x40014200                idle17
>  26572   32690      0      0  7  0x40014200                idle16
>  79156   94305      0      0  3  0x40014200                idle15
>  91078  227738      0      0  3  0x40014200                idle14
>  92007  190955      0      0  3  0x40014200                idle13
>  30258  465800      0      0  7  0x40014200                idle12
>  21174  286794      0      0  7  0x40014200                idle11
>  38572  384863      0      0  3  0x40014200                idle10
>  39753  266722      0      0  7  0x40014200                idle9
>   2402  328673      0      0  3  0x40014200                idle8
>   7204  135132      0      0  3  0x40014200                idle7
>  50857  147720      0      0  3  0x40014200                idle6
>  98253  405788      0      0  3  0x40014200                idle5
>  45666  511079      0      0  3  0x40014200                idle4
>  31437  507896      0      0  3  0x40014200                idle3
>  22223  410682      0      0  7  0x40014200                idle2
>  66020  259824      0      0  3  0x40014200                idle1
>   1702  438104      0      0  3     0x14200  bored         softnet3
>  79408  470282      0      0  3     0x14200  bored         softnet2
>  65184  187165      0      0  3     0x14200  bored         softnet1
>  19984  214261      0      0  3     0x14200  bored         softnet0
>  96155  318872      0      0  3     0x14200  bored         systqmp
>  56400  116597      0      0  3     0x14200  bored         systq
>  90431  390796      0      0  3  0x40014200  bored         softclock
> *64111  191058      0      0  7  0x40014200                idle0
>  18862  308472      0      0  3     0x14200  kmalloc       kmthread
>      1  421752      0      0  3        0x82  wait          init
>      0       0     -1      0  3     0x10200  scheduler     swapper
> 
> 

Reply via email to