Re: Intermittent t_timerfd failures
Taylor R Campbell wrote:
> sys/lib/libc/sys/t_timerfd.c timerfd_block is intermittently failing
> on the i386 test bed.

Here are some random observations from grepping the logs on various
testbeds.

1. The failures have been happening for as long as the timerfd test
case has existed.  The test case was first run with source date
2021.09.19.15.52.55, and the first failure was soon thereafter, using
source date 2021.09.21.09.24.15.

2. The execution time recorded by ATF on i386 indicates that the
failed test cases completed in less than a second, while the
successful ones took longer.  For example:

2023/2023.07.08.15.32.58/test.log.gz:timerfd_block: [1.211653s] Passed.
2023/2023.07.08.16.13.00/test.log.gz:timerfd_block: [1.540414s] Passed.
2023/2023.07.08.17.43.13/test.log.gz:timerfd_block: [1.649699s] Passed.
2023/2023.07.08.19.10.00/test.log.gz:timerfd_block: [0.864084s] Failed: /tmp/build/2023.07.08.19.10.00-i386/src/tests/lib/libc/sys/t_timerfd.c:198: then=1368.605876566 now=1369.462708087 delta=0.856831521
2023/2023.07.08.20.02.10/test.log.gz:timerfd_block: [1.203076s] Passed.
2023/2023.07.08.23.42.48/test.log.gz:timerfd_block: [1.699977s] Passed.
2023/2023.07.09.00.01.55/test.log.gz:timerfd_block: [1.231773s] Passed.

3. The TNF amd64 testbed shows failures similar to the i386 one.  Both
are using "qemu -accel kvm" hosted on NetBSD 9.

4. The TNF sparc testbed shows no failures in 3456 runs (using plain
qemu, no acceleration).

5. The TNF macppc testbed (also plain qemu) shows frequent failures,
but unlike i386 and amd64, the execution time indicated by ATF is more
than a second:

./2023/2023.07.06.07.59.00/test.log.gz:timerfd_block: [1.132080s] Passed.
./2023/2023.07.07.00.20.39/test.log.gz:timerfd_block: [1.062924s] Passed.
./2023/2023.07.07.04.43.15/test.log.gz:timerfd_block: [1.029156s] Failed: /tmp/build/2023.07.07.04.43.15-macppc/src/tests/lib/libc/sys/t_timerfd.c:198: check_value_against_bounds(delta.tv_sec, 1, 1) not met
./2023/2023.07.07.08.39.23/test.log.gz:timerfd_block: [1.056571s] Passed.
./2023/2023.07.07.20.19.08/test.log.gz:timerfd_block: [1.081552s] Passed.

6. My own amd64 testbed running on real hardware shows no failures in
205 runs.

7. My own i386 testbed using "qemu -accel kvm" on Linux shows no
failures in 359 runs.

--
Andreas Gustafsson, g...@netbsd.org
Re: ZFS: time to drop Big Scary Warning
Greg Troxel wrote:
> > That's a good test, but how does zfs compare in for the same test
> > with lets say ffs or ext2fs (filesystems that offer persistence)?
>
> With the same system, booted in the same way, but with 3 different
> filesystems mounted on /tmp, I get similar numbers of failures:
>
>   tmpfs 12
>   ffs2  13
>   zfs   18
>
> So tmpfs/ffs2 are ~equal and zfs has a few more failures (but it all
> looks a bit random and non-repeatable).  So it's hard to sort out
> "zfs is buggy" vs "some tests fail in timing-related
> hard-to-understand ways and that seems provoked slightly more with
> /tmp on zfs".

Since I'm all too familiar with the randomly failing tests in the
NetBSD test suite, let me sort them out for you.

Of the test cases in the zfs run that didn't match the releng results,
these two are known random failures reported in PR 55770:

> ./usr.bin/cc/t_tsan_data_race:data_race_pie
> ./usr.bin/c++/t_tsan_data_race:data_race_pie

This is probably the known random failure reported in PR 55692:

> ./fs/nfs/t_rquotad:get_nfs_be_1_group

This failure is already reported in PR 55603, though your report is
the first of it failing on real hardware:

> ./modules/t_x86_pte:svs_g_bit_set

This one is reported as randomly failing in PR 55331, but since that
problem appears to be tmpfs related, it may be the case that tmpfs and
zfs are both buggy:

> ./lib/libarchive/t_libarchive:libarchive

The remaining four failed test cases are _not_ known to fail randomly,
and as far as I know, do not have existing PRs.  Since they all
involve file system operations, it seems likely that they are in fact
zfs related in some way:

> ./bin/cp/t_cp:file_to_file
> ./lib/libc/stdlib/t_mktemp:mktemp_large_template
> ./lib/libc/sys/t_stat:stat_chflags
> ./usr.bin/ztest/t_ztest:assert

--
Andreas Gustafsson, g...@gson.org
Re: boot -d
Edgar Fuß wrote:
> I had a look at the relevant commits
>	src/sys/arch/x86/include/pmap.h 1.100
>	src/sys/arch/x86/x86/pmap.c 1.330
>	src/sys/arch/xen/x86/xen_pmap.c 1.31
> but unfortunately am unable to back-port the second one to -8.
> I know nothing about pmap, and the -current version uses PTE_P and
> PTE_PS while the -8 version uses PG_V/nothing.

It's probably easier to revert src/sys/arch/x86/x86/db_memrw.c 1.6.

--
Andreas Gustafsson, g...@gson.org
Re: boot -d
Edgar Fuß wrote:
> Real hardware (AMD64), 8.2_STABLE from yesterday, custom config.

This looks like PR 53311.  The commit where that problem started
(src/sys/arch/x86/x86/db_memrw.c 1.6) was pulled up to the -8 branch,
and apparently the commits that fixed it were not.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Hauke Fath wrote:
> On Tue, 29 Sep 2020 11:09:25 +0300, Andreas Gustafsson wrote:
> > I have now committed the code to log the message, without rate
> > limiting.  If a consensus should arise that rate limiting the
> > message is a good idea, the code to do that should be committed by
> > someone in favor of it.
>
> I have seen kernel subsystems spam the console to the point of making
> it unusable, which can get quite awkward on production machines.
>
> While I don't know if this is a relevant scenario for your change,
> declaring it "somebody else's problem" has a funny ring.

I have already explained why I believe it's not a relevant scenario
for this change.  And I don't mean to make it someone else's problem -
I take full responsibility for my commit, and if it turns out that it
actually causes spammage, I will fix that one way or another.  And
similarly, if rate limiting is added and causes problems, I expect the
developer adding it to take responsibility for that commit.  I just
don't want to end up being held responsible for problems caused by
rate limiting I never wanted in the first place.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
I have now committed the code to log the message, without rate limiting. If a consensus should arise that rate limiting the message is a good idea, the code to do that should be committed by someone in favor of it. -- Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
nia wrote:
> Unless you buy one of the snake oil devices Mr. Gustafsson sells, of
> course.

The devices you call "snake oil" and me pushing for NetBSD to stop
generating predictable keys are both aspects of me working to address
entropy related vulnerabilities ever since recognizing them as a major
weak link in Internet security while working on BIND 9 some 20 years
ago.  But of course I have no way of proving my sincere intent to you,
just like you have no way of proving to me that you are not paid by
the NSA to undermine the security of NetBSD.  So please, let's stop
with the ad hominem attacks and focus on the technical issues.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
nia wrote:
> On Tue, Sep 22, 2020 at 02:32:23PM +0200, Manuel Bouyer wrote:
> > OK, so the printf should never happen when the system has been
> > properly configured.  In this case I have no objection.
>
> No, it will happen frequently in VMs and on non-recent-x86 hardware.

The cases of processes blocking on randomness we tend to see reports
about are pkgsrc builds and web browsers.  Since entropy does not run
out in -current as already discussed on this list, each hanging pkgsrc
build or web browser will cause the message to be logged exactly once.
How often do they hang for you?

If PR 55659 is fixed, there will be more cases where the message is
logged (at least ssh-keygen), but still only one per blocking process.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Taylor R Campbell wrote:
> I think this is fine, but it should be rate-limited the same way
> other messages in kern_entropy.c are rate-limited.
>
>	if (ratecheck(, ))
>		printf("...");

I would prefer them not to be rate limited.  Repeated messages are
useful because they will typically contain different PIDs and program
names.

Consider a system logging the following message to the console at boot
time:

[ 22.8406587] entropy: pid 188 (syslogd) blocking due to lack of entropy

and then hanging.  You hit control-c to recover, the boot continues
but then immediately hangs again as another process also blocks on
entropy.  In this case, I would like a second message to be logged so
that it is immediately apparent that another process has blocked, and
which process that is.  Or if you start two different daemons in
parallel and both block immediately, it would be helpful to get
messages identifying both of them.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Manuel Bouyer wrote:
> > In -current, entropy does not run out.
>
> So, how can it block ?

When there's too little entropy to begin with.  Once you have gathered
enough, it unblocks, and never blocks again.

This is assuming default settings.  If you actually want entropy to
run out, you can do "sysctl -w kern.entropy.depletion=1", but there's
no good reason to ever do that outside of testing.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Manuel Bouyer wrote:
> If you run a dd on /dev/random I guess the system will run out of
> entropy pretty fast.

In -current, entropy does not run out.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Manuel Bouyer wrote:
> I think we should find and remove theses (or make them conditional)
> instead of adding unconditional new ones

It would already be conditional in the sense that it's only printed if
the system has no entropy.  If a multi-user system is lacking entropy,
a user spamming the console about it is doing the administrator a
favor.

--
Andreas Gustafsson, g...@gson.org
Re: Logging a kernel message when blocking on entropy
Manuel Bouyer wrote:
> I'm not sure we want a user-triggerable kernel printf enabled by
> default.  This could be used to DOS the system (especially on serial
> consoles)

You can already trigger kernel printfs as an unprivileged user.  The
first one that comes to mind is "sorry, pid %d was killed: orphaned
traced process", but I'm sure there are many others.

--
Andreas Gustafsson, g...@gson.org
Logging a kernel message when blocking on entropy
All,

The following patch will cause a kernel message to be logged when a
process blocks on /dev/random or some other randomness API.  It may
help some users befuddled by pkgsrc builds blocking on /dev/random,
and I'm finding it useful when testing changes aimed at fixing
PR 55659.  OK to commit?

Index: src/sys/kern/kern_entropy.c
===================================================================
RCS file: /bracket/repo/src/sys/kern/kern_entropy.c,v
retrieving revision 1.23
diff -u -r1.23 kern_entropy.c
--- src/sys/kern/kern_entropy.c	14 Aug 2020 00:53:16 -0000	1.23
+++ src/sys/kern/kern_entropy.c	20 Sep 2020 13:53:46 -0000
@@ -1297,6 +1297,9 @@
 	/* Wait for some entropy to come in and try again. */
 	KASSERT(E->stage >= ENTROPY_WARM);
+	printf("entropy: pid %d (%s) blocking due to lack of entropy\n",
+	    curproc->p_pid, curproc->p_comm);
+
 	if (ISSET(flags, ENTROPY_SIG)) {
 		error = cv_wait_sig(&E->cv, &E->lock);
 		if (error)

--
Andreas Gustafsson, g...@gson.org
Fix for slow run(4) configuration on OHCI/UHCI
Hi all,

When I connect a USB WiFi adapter based on a Ralink RT5370 chip to a
USB port that uses an OHCI or UHCI host controller, running "ifconfig
run0 up" takes a very long time, about 20-30 seconds.  This is because
the run(4) driver writes a large amount of data to the device just two
bytes at a time using the WRITE_2 command, and each write takes two
full USB frames of 1 millisecond each on UHCI, or three frames on
OHCI.  With an EHCI or XHCI controller, the large number of transfers
is not a problem as these controllers can perform multiple control
transfers within a single frame.

The driver already contains code to do the transfers in larger blocks
using the WRITE_REGION_1 command, but it's #if'ed out with a comment
saying it is "not stable on RT2860".  The FreeBSD driver is similar,
except that its version of the #if'ed-out code limits the transfers to
a maximum of 64 bytes at a time.

I have verified that enabling the use of WRITE_REGION_1 with the
64-byte limit from FreeBSD works with my RT5370 based adapter, and
makes it configure about ten times faster on OHCI and UHCI.  I'm now
using the following patch, which enables the use of WRITE_REGION_1
using blocks of up to 64 bytes for the RT5370 only:

https://www.gson.org/netbsd/patches/run-faster.patch

OK to commit?

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Christos Zoulas wrote:
> It could be due to tcsh doing its file descriptor dance
> differently...  What shell are you using?

The default shell of the root user.  The notion of changing the shell
according to a personal preference doesn't really apply when running
automated tests on a fresh automated install.

The shell command used to run the tests is:

  mkdir /tmp/tests && \
  cd /usr/tests && \
  { atf-run; echo $? >/tmp/tests/test.status; } | tee /tmp/tests/test.tps | atf-report -o ticker:- -o xml:/tmp/tests/test.xml

which may also matter as file descriptors are allocated for log files
and pipes.

But rather than manually doing a fresh install, logging in as root,
and running that command, it may be easier to just run the anita
command starting on line 3 of the log from the first failed test run
on b5:

http://releng.netbsd.org/b5reports/i386/2020/2020.03.22.00.56.45/test.log

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Robert Elz wrote:
> Not an idea, but a possibility - the change to route.c (1.167) was
> unimportant - it doesn't really matter (to the tests) if it does
> anything useful or not - it is possible that it just happened that
> the fd that the setsockopt() was being performed on was a socket (a
> suitable socket) prior to the openssl update, but after that, the
> rump fd's shifted around, and what the setsockopt() was operating
> upon was no longer a socket.
>
> No idea if that is really what happened or not, but something like
> that is at least plausible (even though it would seem that the
> changes of the sys call having worked by accident seem to be not
> very high).

I agree that this sounds plausible.  Also, the tests never failing for
Christos might then be explained by him running them in an environment
that has a different number of file descriptors already in use.

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Martin Husemann wrote:
> I analyzed this particular one (202 steps back because rump.netstat
> dumps core) - will fix it soon.

With martin's changes, the number of unexpected test failures went
down from 413 to 6 on my bare metal testbed:

http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.04.html#2020.04.03.16.20.52

The remaining 6 or so failures are unrelated.  Thanks to everyone who
helped get this fixed.

Does anyone have an idea why the tests didn't start failing
immediately when route.c 1.167 was committed, but only after the
seemingly unrelated openssl update?

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Christos Zoulas wrote:
> > That should take care of the failing network related tests that
> > contain rump.route commands, but that's not all of the failing
> > tests.
>
> Thanks!  I fixed that now.  Let's see how many break after this...

413 on my bare metal testbed (20 steps forward, 202 steps back):

http://www.gson.org/netbsd/bugs/build/amd64-baremetal/commits-2020.04.html#2020.04.02.21.36.03

There have also been other commits since the previous run, so these
changes in test outcomes may not all be due to your sbin/route
commits.

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Christos Zoulas wrote:
> All the tests are failing for you the same way:
>
> rump.route: SO_RERROR: Socket operation on non-socket
>
> I doubt that my gif change affected that.  This smells to me like the
> rump fd hijack is not working either because we have some new system
> call involved or something is messing up the file descriptors.  What
> is your build host?

The tests are failing on both the TNF testbed and my own.  The
respective OS versions are:

NetBSD babylon5.netbsd.org 8.1_STABLE NetBSD 8.1_STABLE (BABYLON5) #1: Fri Jan 24 21:50:18 UTC 2020  s...@franklin.netbsd.org:/home/netbsd/8/amd64/obj/sys/arch/amd64/compile/BABYLON5 amd64

NetBSD guido.araneus.fi 9.0 NetBSD 9.0 (GENERIC) #0: Fri Feb 14 00:06:28 UTC 2020  mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Christos Zoulas wrote:
> I've been looking into this:
> 1. The libcrypto/bn test just needs more time

That may be.  That one never failed on real hardware for me (it just
went from taking 3 seconds to 14), but 200+ other test cases did fail,
and still do.

> 2. The gif related tests are failing because of a recent change to
>    record mac addresses
>    I committed a fix for that.

Your fix didn't work; the gif tests are still failing with
src/tests/net/net_common.sh 1.40:

http://www.gson.org/netbsd/bugs/build/amd64-baremetal/2020/2020.03.30.13.01.39/test.html#net_if_gif_t_gif_gif_basic_ipv4overipv4

> 3. The rest of the tests (I've sampled 5 of them) don't fail for me.

If you do a full release build from scratch, install the release, and
run the tests in the installed release like the testbeds do, I bet
they will fail for you, too.

--
Andreas Gustafsson, g...@gson.org
Re: All (?) network tests failing
Martin Husemann wrote:
> -current just had a serious regression in test results, it seems like
> ~all networking tests are failing now:

Many (most?) of these have been failing for more than a week now, as
reported on current-users in

http://mail-index.netbsd.org/current-users/2020/03/23/msg038127.html

which identified both the commits and the developer responsible for
the breakage.

--
Andreas Gustafsson, g...@gson.org
re: usbhist support for urtwn
matthew green wrote:
> this looks fine, but if you feel up to it, you can re-combine
> the single DPRINTF() -> two calls using URTWNHIST_CALLARGS().
> if_aue.c is a recent file converted to this method.  it has
> the advantage of consuming fewer kernhist slots -- 1 line vs 2.

There is a separate flag bit DBG_FN in urtwn_debug that enables
logging of function calls, and I have already translated the sequence
URTWNHIST_CALLED(); DPRINTFN(DBG_FN, ...) into URTWNHIST_CALLARGS(...).
I have deliberately avoided folding DPRINTFN() calls enabled by other
flag bits into URTWNHIST_CALLARGS() so that they can still be enabled
and disabled separately from the logging of function calls.

> nit: the new DPRINTFN(), URTWNHIST_CALLED(), and
> URTWNHIST_CALLARGS() macros should use the do { .. } while(0)
> idiom the old DPRINTFN() did so they remain/are safe for us in
> all contexts.

They already do, they're just formatted differently.

--
Andreas Gustafsson, g...@gson.org
usbhist support for urtwn
Hi all, I have this patch to replace the debug printfs in sys/dev/usb/if_urtwn.c by usbhist calls, roughly modelled after the use of usbhist in if_axe.c: https://www.gson.org/netbsd/patches/urtwn-usbhist.patch OK to commit? -- Andreas Gustafsson, g...@gson.org
Re: __{read,write}_once
Maxime Villard wrote:
> How about _onestep()?  Or _nosplit()?

Or just conclude that there are no good options and we might as well
call it _once() to be compatible with other OSes.

--
Andreas Gustafsson, g...@gson.org
Re: x86 bootstrap features
Michael van Elst wrote:
> This assumes that code and data segment is the same and assigns
> the following 64k block to the stack.  We switch to protected
> mode soon after, but the memory layout seems to stay the same.
> Maybe changing to
>
>	addw	$0x2001, %ax
>
> is all that is needed.

I guess we also need to bump these definitions in the Makefile?

SAMISCCPPFLAGS+= -DHEAP_START=0x1 -DHEAP_LIMIT=0x3

--
Andreas Gustafsson, g...@gson.org
re: x86 bootstrap features
matthew green wrote:
> more research with more systems would be good here, but perhaps
> we simply need to provide multiple versions of each style (eg,
> netboot + something, biosboot + something, etc.)

Does anyone know the reason for the pxeboot 64k limit, and if it might
be possible to increase it?

--
Andreas Gustafsson, g...@gson.org
Plentiful unpredictable randomness
Taylor R Campbell wrote:
> It has become popular to redefine the traditional semantics of
> /dev/random or /dev/urandom so that one or both will block once at
> boot until the OS thinks the entropy pool may have been seeded, and
> then never block again.

IMO, those are the most useful semantics for a /dev/*random device,
and NetBSD ought to provide a device that works this way.  This would
combine the advantages of the existing /dev/random and /dev/urandom by
providing randomness that is both unpredictable (like /dev/random) and
plentiful (like /dev/urandom) once seeded.

It would not solve the problem of /dev/random blocking forever when a
system has no entropy available at all, but it would solve the more
common problem of it blocking due to becoming "exhausted" from use.

I expect there will be strong differences in opinion as to whether
/dev/random and/or /dev/urandom should be changed to work like this,
but we could start by implementing it under a third name so that at
least those of us who want it can use it.  An implementation might
also help clarify the debate by serving as a concrete example of what
the new semantics might be.

> I don't want to do this because code paths that may block but only
> in extreme circumstances, like early at boot on an embedded system,
> are likely never to be exercised even during what might otherwise be
> extensive testing, and as noted blocking when not expected can have
> severe consequences.

That's only an issue with changing /dev/urandom to block until seeded
(I happen to think that ought to be done regardless, but reasonable
people may disagree).  There is no such issue with changing
/dev/random to no longer block once seeded.

Full disclosure: I sell hardware random number generators for a
living.  What I'm proposing here could in theory either increase or
decrease the demand for my products - I'm not sure which.

--
Andreas Gustafsson, g...@gson.org
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Frank Kardel wrote:
> Fixing that requires some more work.  But I am surprised that the
> qemu interrupt rate is seemingly somewhat around 50Hz.
> Could it be a bug in qemu getting the frequeny not right.  qemu
> should read the clock to get the frequencies right an possibly skip
> usleeps less than i/HZ possibly managing an error-budget.  I haven't
> looked into qemu at all, but an error of a factor 2 looks suspicious.

I fully agree.

--
Andreas Gustafsson, g...@gson.org
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Robert Elz wrote:
> I want to leave /bin/sh to percolate for a while, make sure there are
> no issues with it as it is, before starting on the next round of
> cleanups and bug fixes, so I was looking for something else to poke
> my nose into ...
>
> [Aside: the people I added to the cc of this message are those who
> have added text to PR kern/43997 and so I thought might be
> interested, if you're not, just say...]
>
> kern/43997 is the "qemu is too slow, clock interrupts get lost,
> timing gets all messed up" problem that plagues many of the ATF tests
> that kind of expect time to be maintained rationally.

Thank you for looking into this.

> Now there's no question that qemu is slow, for example, on my amd64
> Xen DomU test system, the shell arithmetic test of ++x (etc) takes:
>	var_preinc: [0.077617s] Passed.
> whereas from the latest completed b5 (qemu) test run (as of this
> e-mail)
>	var_preinc	Passed	N/A	6.200489s
>
> That's about 80 times slower (and most of the other tests show
> similar factors).  I don't think we can blame qemu for that, given
> what it is doing.
>
> So, it is hardly surprising that, to borrow Paul's words from the PR:
>	On (at least) amd64 architecture, qemu cannot simulate clock
>	interrupts at 100Hz.

I don't think the slowness of qemu's emulation is the actual cause of
its inability to simulate clock interrupts at 100 Hz.  Rather, I think
it is more likely caused by the inability of qemu to sleep for periods
shorter than 10 ms due to limitations of the underlying host OS, such
as that documented in the BUGS section of nanosleep(2).

That this is at least partly a host system issue is supported by the
observation that when qemu is hosted on a Linux system, the timing in
the NetBSD guest is much more accurate than when qemu is hosted on
NetBSD, on similar hardware:

NetBSD-on-qemu-on-NetBSD# time sleep 10
       13.00 real         0.00 user         0.03 sys

NetBSD-on-qemu-on-Linux# time sleep 10
       10.13 real         0.02 user         0.02 sys

If my theory is correct, there are at least three ways the problem
could be fixed:

- Improve the time resolution of sleeps on the host system, as
  recently discussed on tech-kern in a thread starting with
  http://mail-index.netbsd.org/tech-kern/2017/07/02/msg022024.html

- Make qemu deal better with hosts unable to sleep for short periods
  of time, or

- Make the guest system deal better with missed timer interrupts.

--
Andreas Gustafsson, g...@gson.org
Re: Restricting rdtsc [was: kernel aslr]
Maxime Villard wrote:
> Having read several papers on the exploitation of cache latency to
> defeat aslr (kernel or not), it appears that disabling the rdtsc
> instruction is a good mitigation on x86.  However, some applications
> can legitimately use it, so I would rather suggest restricting it to
> root instead.

It's ASLR that's broken, not rdtsc, and I strongly object to
restricting the latter just so that people can continue to gain a
false sense of security from the former.

--
Andreas Gustafsson, g...@gson.org
Re: netbsd-7 XEN3PAE_DOM0 crash on agp
Emmanuel Dreyfus wrote:
> In case it rings a bell for someone: Trying latest netbsd-7
> XEN3PAE_DOM0 build for i386, I get a reproductible crash when
> attaching agp:

PR 50446 may be related.

--
Andreas Gustafsson, g...@gson.org
Re: CVS commit: src/sys/kern
Martin Husemann wrote:
> I have another evbarm machine now hanging as well, so it is not just
> alpha.

The sparc test runs on babylon5 are affected, too.  I just committed a
fix, let's see if it works.

--
Andreas Gustafsson, g...@netbsd.org
Re: KGDB/i386 broken/supposed to work?
Timo Buhrmester wrote:
> Using a GENERIC kernel with only the modifications required to enable
> KGDB (see bottom for config diff), I get the following behavior on
> the TARGET machine:
>
> | boot netbsd -d
> | 15741968+590492+466076 [689568+730405]=0x1161fd4
> | kernel text is mapped with 4 large pages and 5 normal pages
> | Loaded initial symtab at 0xc110750c, strtab at 0xc11afaac, # entries 43075
> | kgdb waiting...fatal breakpoint trap in supervisor mode
> | trap type 1 code 0 eip c02a6744 cs 8 eflags 202 cr2 0 ilevel 8 esp c1265ea0
> | curlwp 0xc1078900 pid 0 lid 1 lowest kstack 0xc12632c0
>
> There is no delay between ``kgdb waiting...'' and ``fatal breakpoint
> trap in supervisor mode''.  I'm not sure whether or not this is the
> expected behavior, because eip c02a6744 is in the `breakpoint`
> function so that would make sense; but the documentation makes it
> sound like it should just say ``kgdb waiting...''.
>
> Looks like that behavior was introduced by the following commit to
> src/sys/arch/i386/i386/trap.c:
>
> revision 1.239
> date: 2008-05-30 13:38:21 +0300;  author: ad;  state: Exp;  lines: +10 -11;
> Since breakpoints don't work, dump basic info about the trap before
> entering the debugger.  Sometimes ddb only makes the situation worse.
>
> Any idea whether a) KGDB is tested/supposed to work and b) what I
> might be doing wrong?

KGDB over serial on i386 worked for me in January 2008.  I don't know
if it is still working now; the kernel debugging I've done since then
has been using qemu's built-in gdb stub instead, as described in
https://wiki.netbsd.org/kernel_debugging_with_qemu/.

--
Andreas Gustafsson, g...@gson.org
Re: recent sysctl changes
David Laight wrote:
> > Both of these aspects of the API are now broken: reading a 4-byte
> > CTLTYPE_INT variable now works for any buffer size >= 4 *except* 8,
>
> That wasn't the intent of the change.  The intent was that if the
> size was 8 then the code would return a numeric value of size 8,
> otherwise the size would be chnaged to 4 and/or ENOMEM (stupid errno
> choice) returned.

I understand the intent; I'm talking about the unintended
consequences.

> > and attempting to read an 8-byte CTLTYPE_QUAD variable into a
> > buffer of less than 8 bytes is now guaranteed to yield ENOMEM
> > *except* if the buffer size happens to be 4.
>
> A request to read a CTLTYPE_QUAD variable into a buffer that is
> shorter than 8 bytes has always been a programming error.

An application could, for example, maintain a single, shared,
malloc'ed buffer that is reused for multiple sysctl() calls and only
resized on ENOMEM returns.  IMO, this is allowed by the API, but with
your change, a read of a CTLTYPE_QUAD variable will return the wrong
result if the buffer happened to be left with a size of 4 by a
previous read of a CTLTYPE_INT variable.

> The intent of the change was to relax that is the length happened to
> be 4.
>
> > IMO, this behavior violates both the letter of the sysctl() man
> > page and the principle of least astonishment.
>
> I'm not sure about the latter.  I run 'sysctl -a' and find the name
> of the sysctl I'm interested in.  The result is a small number so I
> pass the address and size of a integer variable and then print the
> result.  (Or the value is rather large and I think it might exceed
> 2^31 so I use an int64.)  The 'principle of least astonishment' would
> mean that I get the value that 'sysctl -a' printed.

If you use a C variable of a size different than the documented size
of the sysctl variable in question, your code is incorrect, and the
resulting undefined behavior should not astonish you.

> On a BE system I have to be extremely careful with the return values
> from sysctl() or I see garbage.

Yes, you do have to be careful, and we should not encourage
carelessness by making incorrect code work in more cases.

> Note that code calling systctl() has to either know whether the value
> it is expecting is a string, structure, or number, or use the API
> calls that expose the kernel internals in order to find out.

Not only that, but in the case of a number, it also has to know
whether it's an int or a quad, or use the appropriate API calls to
find out.

> > Also, the work-around is ineffective in the case of a caller that
> > allocates the buffer dynamically using the size given by an initial
> > sysctl() call with oldp = NULL.
>
> Code that does that for a numeric value will be quite happy with
> either a 32bit of 64bit result.

Not if it is expecting the other one.

--
Andreas Gustafsson, g...@netbsd.org
Recent sysctl changes
I posted this on source-changes-d earlier today, but Jeff Rizzo asked
me to take it to tech-kern.

On Feb 27, David Laight said:

  Module Name:	src
  Committed By:	dsl
  Date:		Thu Feb 27 22:50:52 UTC 2014

  Modified Files:
  	src/sys/kern: kern_sysctl.c

  Log Message:
  Allow CTLTYPE_INT and CTLTYPE_QUAD to be read and written as either
  4 or 8 byte values regardless of the type.
  64bit writes to 32bit variables must be valid (signed) values.
  32bit reads of large values return -1.
  Amongst other things this should fix libm's code that reads
  machdep.sse as a 32bit int, but I'd changed it to 64bit (to common
  up some code).

  To generate a diff of this commit:
  cvs rdiff -u -r1.246 -r1.247 src/sys/kern/kern_sysctl.c

In private correspondence with David, I have objected to this commit
as well as his earlier change to the type of the machdep.sse sysctl
variable.  Regrettably, David and I have not been able to reach an
agreement, and he has requested that I ask for opinions on a public
list.  So...

1. I object to the earlier change of the following sysctl variables
from type CTLTYPE_INT to type CTLTYPE_QUAD:

  machdep.fpu_present
  machdep.osfxsr
  machdep.sse
  machdep.sse2
  machdep.biosbasemem
  machdep.biosextmem

My understanding of David's rationale for changing them is that it
facilitated sharing code with the machdep.xsave_features variable
which actually needs to be of type CTLTYPE_QUAD.  By my reckoning, the
savings from this code sharing amount to eight lines of code.

Changing the types of existing sysctl variables breaks both source and
binary compatibility and should not be done lightly; that's why we
have both hw.physmem and hw.physmem64, for example.  Now, the types of
multiple variables have been incompatibly changed for no reason other
than saving a few lines of code.  I believe the harm caused by the
incompatible type change far outweighs the cost of a few added lines
of code, and would like the original types to be restored.
I am attaching a patch to do so; it also changes the newly added variables machdep.fpu_save and machdep.fpu_save_size to CTLTYPE_INT, but if David really thinks those should be CTLTYPE_QUAD, I will not argue that point.

2. I also object to the change of kern_sysctl.c 1.247. This change attempts to work around the problems caused by the changes to the variable types by making sysctl() return different types depending on the value of the *oldlenp argument.

IMO, this is a bad idea. The *oldlenp argument does *not* specify the size of the data type expected by the caller, but rather the size of a buffer. The sysctl() API allows the caller to pass a buffer larger than the variable being read, and conversely, guarantees that passing a buffer that is too small results in ENOMEM. Both of these aspects of the API are now broken: reading a 4-byte CTLTYPE_INT variable now works for any buffer size >= 4 *except* 8, and attempting to read an 8-byte CTLTYPE_QUAD variable into a buffer of less than 8 bytes is now guaranteed to yield ENOMEM *except* if the buffer size happens to be 4. IMO, this behavior violates both the letter of the sysctl() man page and the principle of least astonishment.

Also, the work-around is ineffective in the case of a caller that allocates the buffer dynamically using the size given by an initial sysctl() call with oldp = NULL.

If the original types of the sysctl variables are restored, this work-around will no longer serve a purpose, and I'm asking for it to be removed.

Opinions?
-- Andreas Gustafsson, g...@netbsd.org

Index: x86_machdep.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/x86_machdep.c,v
retrieving revision 1.63
diff -u -r1.63 x86_machdep.c
--- x86_machdep.c	23 Feb 2014 22:38:40 -0000	1.63
+++ x86_machdep.c	3 Mar 2014 18:41:06 -0000
@@ -1056,7 +1056,17 @@
 }
 
 static void
-const_sysctl(struct sysctllog **clog, const char *name, u_quad_t value, int tag)
+const_sysctl_int(struct sysctllog **clog, const char *name, int value, int tag)
+{
+	sysctl_createv(clog, 0, NULL, NULL,
+	    CTLFLAG_PERMANENT | CTLFLAG_IMMEDIATE,
+	    CTLTYPE_INT, name, NULL,
+	    NULL, (u_quad_t)value, NULL, 0,
+	    CTL_MACHDEP, tag, CTL_EOL);
+}
+
+static void
+const_sysctl_quad(struct sysctllog **clog, const char *name, u_quad_t value, int tag)
 {
 	sysctl_createv(clog, 0, NULL, NULL,
 	    CTLFLAG_PERMANENT | CTLFLAG_IMMEDIATE,
@@ -1115,17 +1125,17 @@
 	    CTL_MACHDEP, CTL_CREATE, CTL_EOL);
 
 	/* None of these can ever change once the system has booted */
-	const_sysctl(clog, "fpu_present", i386_fpu_present, CPU_FPU_PRESENT);
-	const_sysctl(clog, "osfxsr", i386_use_fxsave, CPU_OSFXSR);
-	const_sysctl(clog, "sse", i386_has_sse, CPU_SSE);
-	const_sysctl(clog, "sse2", i386_has_sse2, CPU_SSE2
Re: Recent sysctl changes
Thor Lancelot Simon wrote:
> I don't actually know of any code that hands over a wrong-size buffer
> and will therefore break, though.  Do you?

No, but I think we should aim for "correctness", not just "works for me".

-- Andreas Gustafsson, g...@netbsd.org