Re: strange happenings with new system build
On Tue, 5 Mar 2024, Paul Goyette wrote:

> Well, I just built me a new toy and it mostly works just fine. But
> there are some mostly display-related strangenesses...
>
> The entire dmesg for the new build is attached. dmesg for the old
> machine is not available since I had to cannibalize some parts.
>
> As a summary, the new build is an amd64 ``AMD Ryzen 9 7950X3D'' with
> 128GB of DDR5 on an Asus ROG Crosshair Hero motherboard.
>
> 1. Using the same video card as from the original machine (identified
>    ``NVIDIA GeForce GTX 1050 Ti''), a normal boot fails to complete

s/1050/1080/

>    the initial mode-set that occurs during auto-config. The bottom
>    25% or so of the screen is broken up into 4 sets of "garbage"
>    (looks like bar code, but not really), and blind typing results
>    in scrolling of the 25% section, 1 set at a time.
>
>    After about 10 or 15 minutes, it suddenly starts working and
>    displays the usual xdm login dialog!
>
> 2. Things will go along nicely for (some value of) a while, and then
>    suddenly switching to the console (via ctrl-alt-f1) hangs. This,
>    too, eventually unhangs.
>
> 3. There's a dwiic0 attached via acpi, and an iic0 attached to the
>    dwiic0. The attach seems to succeed, but approximately 17 seconds
>    later, immediately after a USB4 HCI fails to attach, I start to
>    get a flurry of
>
>      dwiic0: timed out waiting for rx_full intr
>      dwiic0: timed out reading remaining 0
>
>    The messages always come in pairs, and there is roughly a 0.5sec
>    interval between pairs. In total, there are about 150 pairs, then
>    the messages just stop.
>
> 4. There are lots of nouveau0 errors, and they seem to correspond to
>    mode-switch attempts:
>
>    [ 1796.949817] nouveau0: autoconfiguration error: error: DRM: core notifier timeout
>    [ 1826.944253] nouveau0: autoconfiguration error: error: DRM: core notifier timeout
>    [ 1887.113245] nouveau0: autoconfiguration error: error: DRM: base-0: timeout
>    [ 1889.114196] nouveau0: autoconfiguration error: error: DRM: core notifier timeout
>    [ 1891.115161] nouveau0: autoconfiguration error: error: DRM: core notifier timeout
>    [ 1893.196173] nouveau0: autoconfiguration error: error: DRM: core notifier timeout

+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint:     | E-mail addresses:    |
| (Retired)           | 1B11 1849 721C 56C8 F63A | p...@whooppee.com    |
| Software Developer  | 6E2E 05FD 15CE 9F2D 5102 | pgoye...@netbsd.org  |
| & Network Engineer  |                          | pgoyett...@gmail.com |
+---------------------+--------------------------+----------------------+
daily CVS update output
Updating src tree:
P src/common/lib/libutil/snprintb.c
P src/sys/arch/aarch64/aarch64/aarch64_reboot.c
P src/sys/arch/algor/algor/machdep.c
P src/sys/arch/alpha/alpha/machdep.c
P src/sys/arch/amd64/amd64/machdep.c
P src/sys/arch/amiga/amiga/machdep.c
P src/sys/arch/amigappc/amigappc/machdep.c
P src/sys/arch/amigappc/include/autoconf.h
P src/sys/arch/arc/arc/machdep.c
P src/sys/arch/arm/arm32/arm32_machdep.c
P src/sys/arch/atari/atari/machdep.c
P src/sys/arch/bebox/bebox/machdep.c
P src/sys/arch/cesfic/cesfic/machdep.c
P src/sys/arch/cobalt/cobalt/machdep.c
P src/sys/arch/dreamcast/dreamcast/machdep.c
P src/sys/arch/emips/emips/machdep.c
P src/sys/arch/evbarm/imx23_olinuxino/imx23_olinuxino_machdep.c
P src/sys/arch/evbmips/adm5120/machdep.c
P src/sys/arch/evbmips/alchemy/machdep.c
P src/sys/arch/evbmips/atheros/machdep.c
P src/sys/arch/evbmips/cavium/machdep.c
P src/sys/arch/evbmips/gdium/machdep.c
P src/sys/arch/evbmips/ingenic/machdep.c
P src/sys/arch/evbmips/loongson/machdep.c
P src/sys/arch/evbmips/malta/machdep.c
P src/sys/arch/evbmips/mipssim/machdep.c
P src/sys/arch/evbmips/rasoc/machdep.c
P src/sys/arch/evbmips/rmixl/machdep.c
P src/sys/arch/evbmips/sbmips/machdep.c
P src/sys/arch/evbppc/ev64260/machdep.c
P src/sys/arch/evbppc/pmppc/machdep.c
P src/sys/arch/evbppc/wii/machdep.c
P src/sys/arch/ews4800mips/ews4800mips/machdep.c
P src/sys/arch/hp300/hp300/machdep.c
P src/sys/arch/hpcmips/hpcmips/machdep.c
P src/sys/arch/hpcsh/hpcsh/machdep.c
P src/sys/arch/hppa/hppa/machdep.c
P src/sys/arch/i386/i386/machdep.c
P src/sys/arch/ibmnws/ibmnws/machdep.c
P src/sys/arch/landisk/landisk/machdep.c
P src/sys/arch/luna68k/luna68k/machdep.c
P src/sys/arch/mac68k/dev/adb_direct.c
P src/sys/arch/macppc/dev/adb_direct.c
P src/sys/arch/macppc/macppc/machdep.c
P src/sys/arch/mipsco/mipsco/machdep.c
P src/sys/arch/mvme68k/mvme68k/machdep.c
P src/sys/arch/mvmeppc/mvmeppc/machdep.c
P src/sys/arch/news68k/news68k/machdep.c
P src/sys/arch/newsmips/newsmips/machdep.c
P src/sys/arch/next68k/next68k/machdep.c
P src/sys/arch/ofppc/include/autoconf.h
P src/sys/arch/ofppc/ofppc/machdep.c
P src/sys/arch/playstation2/playstation2/machdep.c
P src/sys/arch/pmax/pmax/machdep.c
P src/sys/arch/powerpc/booke/booke_machdep.c
P src/sys/arch/powerpc/ibm4xx/ibm4xx_machdep.c
P src/sys/arch/prep/prep/machdep.c
P src/sys/arch/riscv/riscv/riscv_machdep.c
P src/sys/arch/rs6000/include/autoconf.h
P src/sys/arch/sandpoint/sandpoint/machdep.c
P src/sys/arch/sbmips/sbmips/machdep.c
P src/sys/arch/sgimips/sgimips/machdep.c
P src/sys/arch/sparc/sparc/machdep.c
P src/sys/arch/sparc64/sparc64/machdep.c
P src/sys/arch/vax/vax/machdep.c
P src/sys/arch/virt68k/dev/gfrtc_mainbus.c
P src/sys/arch/virt68k/virt68k/machdep.c
P src/sys/arch/x86/x86/intr.c
P src/sys/arch/zaurus/zaurus/machdep.c
P src/sys/dev/goldfish/gfrtc.c
P src/sys/dev/ic/mc146818.c
P src/sys/dev/ic/mc146818var.h
P src/sys/kern/init_main.c
P src/sys/kern/kern_reboot.c
P src/sys/kern/subr_cpu.c
P src/sys/sys/cpu.h
P src/sys/uvm/uvm_page.c
P src/sys/uvm/pmap/pmap.c
P src/usr.bin/make/unit-tests/var-scope-local.exp
P src/usr.bin/make/unit-tests/var-scope-local.mk

Updating xsrc tree:

Killing core files:

Updating release-8 src tree (netbsd-8):

Updating release-8 xsrc tree (netbsd-8):

Updating release-9 src tree (netbsd-9):

Updating release-9 xsrc tree (netbsd-9):

Updating release-10 src tree (netbsd-10):

Updating release-10 xsrc tree (netbsd-10):

Updating file list:
-rw-rw-r-- 1 srcmastr netbsd 45805367 Mar 6 03:14 ls-lRA.gz
re: rc.d start order
Paul Goyette writes:
> On Tue, 5 Mar 2024, Paul Goyette wrote:
>
> > I _think_ it will work correctly if I modify fstab to refer to
> > NAME=Builds instead of ccd0. I will update here after I confirm.
>
> Yes this seems to work.

this is very much preferred. "ccd0" is the device

i suspect if you re-ran 'MAKEDEV ccd0' you'd end up with a new
/dev/ccd0 that is an alias for the rawpart (c or d, d for amd64.)

so, perhaps the failure to run this and get a modern netbsd
device name present actually got you to use the right way of
talking to wedges :)

.mrg.
Re: rc.d start order
    Date:        Tue, 5 Mar 2024 10:03:35 -0800 (PST)
    From:        Paul Goyette
    Message-ID:

  | The resulting device, however, is a gpt device (with one wedge
  | named ``Builds'').

Sounds normal enough.

  | There is an entry for ccd0 in /etc/fstab,

How did that get there? fstab should have filesystem devices
(here the wedge, either as its /dev/dkN name or NAME=xxx) not the
name of the containing device - there's no filesystem to check there.

  | but there is no /dev/ccd0

Why not? MAKEDEV should make that one (not that you should be
using it for the setup described) along with all the other ccd0[a-p]
nodes (none of which, except one of ccd0[cd] perhaps, are useful with GPT).

kre
Re: rc.d start order
On Tue, 5 Mar 2024, Paul Goyette wrote:

> I _think_ it will work correctly if I modify fstab to refer to
> NAME=Builds instead of ccd0. I will update here after I confirm.

Yes this seems to work.

+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint:     | E-mail addresses:    |
| (Retired)           | 1B11 1849 721C 56C8 F63A | p...@whooppee.com    |
| Software Developer  | 6E2E 05FD 15CE 9F2D 5102 | pgoye...@netbsd.org  |
| & Network Engineer  |                          | pgoyett...@gmail.com |
+---------------------+--------------------------+----------------------+
Re: rc.d start order
On Mon, 4 Mar 2024, Paul Goyette wrote:

> Hmmm, you're right (as usual) regarding the sequence-control
> keywords, as verified by rcorder. There must've been something
> else going on. I'll have to check again after I resolve a few
> other issues.

Ah, OK, I think I understand the problem now.

The sequence of calls is correct: ccd is called before fsck. The ccd
is created correctly.

The resulting device, however, is a gpt device (with one wedge named
``Builds''). There is an entry for ccd0 in /etc/fstab, but there is
no /dev/ccd0, so I get the following failure logged in /var/run/rc.log

    Can't stat `ccd0' (No such file or directory)
    Can't stat ccd0: No such file or directory
    CAN'T CHECK FILE SYSTEM.
    ccd0: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

I _think_ it will work correctly if I modify fstab to refer to
NAME=Builds instead of ccd0. I will update here after I confirm.

+---------------------+--------------------------+----------------------+
| Paul Goyette (.sig) | PGP Key fingerprint:     | E-mail addresses:    |
| (Retired)           | 1B11 1849 721C 56C8 F63A | p...@whooppee.com    |
| Software Developer  | 6E2E 05FD 15CE 9F2D 5102 | pgoye...@netbsd.org  |
| & Network Engineer  |                          | pgoyett...@gmail.com |
+---------------------+--------------------------+----------------------+
Re: new BIND in 10.0_RC5/sparc dies w/Bus error
On 2024-03-05 1:13 am, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> ===================================================================
> RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
> retrieving revision 1.8.2.1
> diff -p -u -r1.8.2.1 netmgr-int.h
> --- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 - 1.8.2.1
> +++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
> @@ -276,7 +276,7 @@ struct isc_nmhandle {
>  	LINK(isc_nmhandle_t) active_link;
>  #endif
>  	void *opaque;
> -	char extra[];
> +	char extra[] __attribute__((__aligned__(8)));
>  };
>
>  typedef enum isc__netievent_type {

Perhaps:

    union {
        void *p;
        long double d;
        long long lld;
        intmax_t im;
    } extra[];

Or simpler:

    struct {
        void *p;
    } extra[];

Does the second form work?

christos
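The alignment arithmetic behind the diagnosis above can be sketched in a few lines of Python. This is only a model, not the real `struct isc_nmhandle`: the member names `a`, `b` and `added_ptr` are hypothetical, the sizes assume a 32-bit ABI with 4-byte pointers, and the allocation itself is assumed to start on an 8-byte boundary.

```python
def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def member_offsets(members):
    """Lay out (name, size, align) members in declaration order.

    Each member goes at the next offset that satisfies its own
    alignment, which is how a C compiler places struct members.
    """
    offsets = {}
    offset = 0
    for name, size, align in members:
        offset = round_up(offset, align)
        offsets[name] = offset
        offset += size
    return offsets


# Hypothetical header: two 4-byte pointers, then the flexible array.
OLD = [("a", 4, 4), ("b", 4, 4), ("extra", 0, 1)]
# The same header after growing one more 4-byte pointer member.
NEW = [("a", 4, 4), ("b", 4, 4), ("added_ptr", 4, 4), ("extra", 0, 1)]
# The workaround: request 8-byte alignment for the trailing array.
FIXED = NEW[:-1] + [("extra", 0, 8)]
# A pointer-only element type has the alignment of a pointer (4 here).
PTR_ONLY = NEW[:-1] + [("extra", 0, 4)]

for label, members in (("old", OLD), ("new", NEW),
                       ("aligned(8)", FIXED), ("pointer-only", PTR_ONLY)):
    off = member_offsets(members)["extra"]
    print(f"{label:12s} extra at offset {off:2d}, offset % 8 = {off % 8}")
```

In this model the extra pointer moves `extra` from a multiple of 8 to an offset that is only 4-byte aligned, and forcing 8-byte alignment inserts the padding that restores it. A pointer-only element type stays at 4-byte alignment in the model, so that form is worth checking on the real target before relying on it.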
Re: new BIND in 10.0_RC5/sparc dies w/Bus error
On 2024-03-05 1:13 am, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> ===================================================================
> RCS file: /cvsroot/src/external/mpl/bind/dist/lib/isc/netmgr/netmgr-int.h,v
> retrieving revision 1.8.2.1
> diff -p -u -r1.8.2.1 netmgr-int.h
> --- lib/isc/netmgr/netmgr-int.h 25 Feb 2024 15:47:24 - 1.8.2.1
> +++ lib/isc/netmgr/netmgr-int.h 5 Mar 2024 06:12:50 -
> @@ -276,7 +276,7 @@ struct isc_nmhandle {
>  	LINK(isc_nmhandle_t) active_link;
>  #endif
>  	void *opaque;
> -	char extra[];
> +	char extra[] __attribute__((__aligned__(8)));
>  };
>
>  typedef enum isc__netievent_type {

Does the following work, and is it more palatable?

    union {
        void *p;
        long double d;
        long long lld;
        intmax_t im;
    } extra[];

or just:

-- 
christos
Re: Unacceptable firefox behavior with nouveau graphics card
FWIW, I have an old

radeon0 at pci1 dev 0 function 0: ATI Technologies FirePro W5000 (rev. 0x00)

which works close enough to perfect for me under -current (why stick
to 10.99.4, BTW ?). Firefox is GL-accelerated, cad.onshape.com/check
returns rather decent numbers, comparable to what I remember I ran on
the same card years ago under W10 or Linux (but it has been running
NetBSD for at least 7 years now, always following -current to within
a week).

I've had no key issues with Firefox, apart from the brief period when
the current version did not have accelerated GL at all and I had to
revert to the LTS version, which had it.

So, I can attest this card works well under NetBSD. I have had
occasional issues - when I forget the screen running xfce4 and the
screensaver kicks in, some of the 3D savers are perhaps buggy or
something similar and I've seen the system left pingable but
impossible to connect to.

Last time I checked my other laptop with Intel 530 graphics {and 3D
controller: NVIDIA Corporation GM107M [GeForce GTX 950M] (rev a2),
which I could never get to work with nouveau} also had firefox 3D
accelerated and no particular issues.

My current laptop also perhaps *should work* - a P16s with on-chip
Radeon 680 - but I haven't bothered to test it, as the WiFi chip is
not recognized (Qualcomm WCN685x), last time I checked.

On Tue, 5 Mar 2024 at 12:08, Riccardo Mottola wrote:
>
> Hi,
>
> Paul Goyette wrote:
> >
> > 1) Does this sound familiar to anyone else? Or am I just a set-of-1 ?
>
> I use firefox on Linux with nouveau and it works quite well, performance
> is not to the same level of a Mac or Windows "equivalent", but quite
> nice. Just to say that Firefox+Nouveau "can" work.
> I wonder if it is a specific issue for your video card, an issue with
> NetBSD or else.
>
> >
> > 2) Are there any alternatives to firefox?
>
> Hot topic. I think you have
> 1) Chromium and all its other derivatives. They have their own share of
> issues. You might not like the interface, how it works
> 2) Firefox.. and derivatives. There you might have some luck
>
> I have NetBSD on a laptop with an old nvidia card. Do you have a
> specific way to reproduce the lag? e.g. startup? opening a specific
> site? new tab?
> Does it have to do with video? WebGL?
>
> >
> > 3) Any recommendations on potential replacement of the video card?
>
> If you are on the cheap side, you could get another nvidia card, used, try
> your luck and even "compare".
>
> You might want to tweak things in firefox. Disable Acceleration. Check
> crashes.
> Go into about:config and look for:
>
> webgl.*
>
> gfx.blacklist.*
>
> Try to tweak gfx.webrender.software.opengl
>
> bold items are changed values!
>
> Riccardo

--
re: new BIND in 10.0_RC5/sparc dies w/Bus error
On Tue, 5 Mar 2024, John D. Baker wrote:

> Thanks for the rapid analysis and workaround. I've applied it to my
> netbsd-10 tree, rebuilt sparc and am updating now.

It seems to be working now. Thanks again.

-- 
|/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD       FreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645
Re: Unacceptable firefox behavior with nouveau graphics card
Hi,

Paul Goyette wrote:
>
> 1) Does this sound familiar to anyone else? Or am I just a set-of-1 ?

I use firefox on Linux with nouveau and it works quite well, performance
is not to the same level of a Mac or Windows "equivalent", but quite
nice. Just to say that Firefox+Nouveau "can" work.
I wonder if it is a specific issue for your video card, an issue with
NetBSD or else.

>
> 2) Are there any alternatives to firefox?

Hot topic. I think you have
1) Chromium and all its other derivatives. They have their own share of
issues. You might not like the interface, how it works
2) Firefox.. and derivatives. There you might have some luck

I have NetBSD on a laptop with an old nvidia card. Do you have a
specific way to reproduce the lag? e.g. startup? opening a specific
site? new tab?
Does it have to do with video? WebGL?

>
> 3) Any recommendations on potential replacement of the video card?

If you are on the cheap side, you could get another nvidia card, used, try
your luck and even "compare".

You might want to tweak things in firefox. Disable Acceleration. Check
crashes.
Go into about:config and look for:

webgl.*

gfx.blacklist.*

Try to tweak gfx.webrender.software.opengl

bold items are changed values!

Riccardo
syslog, ENOBUFS and non-C implementations
Hi,

over the last couple of months I have seen at least two non-C
(rust and python) implementations of syslog() equivalent
functionality causing applications written in those languages to
become brittle.

The reason, I hear you ask?

In C, the return type of syslog() is void, so it can't return any
error. Our C implementation makes a reasonable attempt at
re-trying in the face of OS-errors:

	/*
	 * If the send() failed, there are two likely scenarios:
	 *  1) syslogd was restarted
	 *  2) /dev/log is out of socket buffer space
	 * We attempt to reconnect to /dev/log to take care of
	 * case #1 and keep send()ing data to cover case #2
	 * to give syslogd a chance to empty its socket buffer.
	 */
	for (tries = 0; tries < MAXTRIES; tries++) {
		if (send(data->log_file, tbuf, cnt, 0) != -1)
			break;
		if (errno != ENOBUFS) {
			disconnectlog_r(data);
			connectlog_r(data);
		} else
			(void)usleep(1);
	}

and if the number of retries is exceeded, our C code tries to
send the syslog message to the console instead.

However, the rust and python implementations have the possibility
of returning errors or raising exceptions, but the applications
using those syslog-like functions are evidently unprepared to
deal with any errors from that functionality, causing those
applications to exit if an error occurred during syslog'ing.

This has caused me to file

  https://github.com/Geal/rust-syslog/issues/79

which has not seen any activity or comments since I submitted it.
This issue caused the net/routinator program (an RPKI validator
written in rust) to exit if I had turned up the logging level
"too high" (to trigger the ENOBUFS condition) when running it on
NetBSD. I worked around this issue by dialing down the syslog
level in my routinator configuration.
The python issue I'm having is that similarly, the
sysutils/py-borgmatic package is not prepared to handle errors
from syslog'ing, causing it to exit with this error message:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 987, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 991, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available
Call stack:
  File "/usr/pkg/bin/borgmatic", line 33, in <module>
    sys.exit(load_entry_point('borgmatic==1.8.5', 'console_scripts', 'borgmatic')())
  File "/usr/pkg/lib/python3.10/site-packages/borgmatic/commands/borgmatic.py", line 894, in main
    logger.handle(log)

This is operationally troublesome, to say the least, for
potentially long-running programs.

Since this points to the syslog code inside python itself (in
this case python 3.10), I tried replicating parts of the C
semantics and eliminating the raising of exceptions in the syslog
code by applying the attached local patch. I've not yet tried to
submit this one upstream, and I'm still testing it locally.

IMHO, having long-running programs become brittle just because
logging failed is just ... silly, and it appears that almost
every programmer out there expects the C semantics that
syslog()'ing never fails.

Secondly: is it something particular we are doing on the NetBSD
end of things which contributes to this problem? Don't other
OSes return ENOBUFS if syslogd isn't able to keep up by consuming
the messages at the receiving end?

Other comments?

Regards,

- Håvard

$NetBSD$

Introduce code to re-try sending of log message up to 10 times,
and drop messages if the retry count is exceeded, instead of
raising an error. Calling code is seldom prepared to handle
exceptions from syslog-like functions, and becomes needlessly
brittle if syslog-ing can raise an exception.

--- Lib/logging/handlers.py.orig	2024-03-05 08:27:17.479574742 +
+++ Lib/logging/handlers.py
@@ -25,6 +25,7 @@ To use, simply 'import logging.handlers'
 
 import io, logging, socket, os, pickle, struct, time, re
 from stat import ST_DEV, ST_INO, ST_MTIME
+import errno
 import queue
 import threading
 import copy
@@ -983,16 +984,39 @@ class SysLogHandler(logging.Handler):
             msg = msg.encode('utf-8')
             msg = prio + msg
             if self.unixsocket:
-                try:
-                    self.socket.send(msg)
-                except OSError:
-                    self.socket.close()
-                    self._connect_unixsocket(self.address)
-                    self.socket.send(msg)
+                tries = 10
+                while tries > 0:
+                    try:
+                        self.socket.send(msg)
+                    except OSError as err:
+                        tries -= 1
+
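Since the patch above is cut off in this archive, here is a minimal standalone sketch of the retry-and-drop behaviour the message describes, modelled on the C loop quoted earlier. It is not the author's patch: the function name `send_with_retry`, its `send` and `reconnect` callables, and the `pause` parameter are all hypothetical names chosen for illustration, and the console fallback of the C code is left out.

```python
import errno
import time


def send_with_retry(send, reconnect, msg, tries=10, pause=0.001):
    """Try to deliver msg, retrying on OSError; never raise.

    send(msg) and reconnect() are caller-supplied callables
    (hypothetical interface). Returns True if the message was sent,
    False if it was dropped after `tries` failed attempts.
    """
    for _ in range(tries):
        try:
            send(msg)
            return True
        except OSError as err:
            if err.errno == errno.ENOBUFS:
                # Receiver is behind: wait briefly so it can drain
                # its socket buffer, then try again.
                time.sleep(pause)
            else:
                # Receiver may have been restarted: reconnect, and
                # ignore a failed reconnect until the next attempt.
                try:
                    reconnect()
                except OSError:
                    pass
    return False
```

In a `SysLogHandler` subclass, `send` could be something like `lambda m: self.socket.send(m)` and `reconnect` a small function that closes the socket and calls `self._connect_unixsocket(self.address)`, as the original code in the diff does. Returning a boolean instead of raising leaves it to the caller to decide whether a dropped message matters.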
re: new BIND in 10.0_RC5/sparc dies w/Bus error
On Tue, 5 Mar 2024, matthew green wrote:

> ah. the problem is that struct isc_nmhandle grew a pointer member,
> adding 4 bytes to the struct size, and it uses C99 [] variable array
> for the final member, which is later assigned to other pointers, and
> this memory was now only 4-byte aligned. this hack patch works to
> stop named crashing for me, but i'll let christos figure out what the
> right general solution here is.
>
> .mrg.
>
> Index: lib/isc/netmgr/netmgr-int.h
> [diff]

Thanks for the rapid analysis and workaround. I've applied it to my
netbsd-10 tree, rebuilt sparc and am updating now.

In the interim, pointing the mailserver's 'resolv.conf' at my backup
nameserver instead of itself and restarting sendmail has allowed mail
to start working again. The backup nameserver is amd64 so should not
have a problem with the changes in the new BIND when I get a chance to
update it to 10.0_RC5.

(I had long before set the "kern.defcorename=/var/tmp/cores/%n.core",
but there was nothing there. I forget if the subdirectory "cores" will
be automatically created or not, but it still isn't present on the
system.)

-- 
|/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
|\ / jdbaker[snail]consolidated[flyspeck]net  OpenBSD       FreeBSD
| X  No HTML/proprietary data in email.   BSD just sits there and works!
|/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645