Re: panic in amap_wipeout()
On 10/07/2018 at 16:44, Edgar Fuß wrote:
> So the machine [...] just panicked: It's running 6.1/amd64.

You are using NetBSD 6 and IPF; that's about the least bug-free configuration I can think of. Really, you should switch to NetBSD 8 - even if IPF is not maintained, at least the rest of the system is. And basically no one is going to try to investigate what's wrong with your system if the kernel you're using reaches EOL in one or two months... Unless you can reproduce the issue on NetBSD 8.
Re: Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)
Thanks for confirming! :-) I'll still keep my original promise of waiting a month to do so.
Re: Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)
On Tue, Jul 10, 2018 at 06:56:53PM +, co...@sdf.org wrote:
> Hi,
>
> The code in sys/dev/pci/n8 has bitrotted - it still makes references to
> LKM_ system things, so it is unlikely it builds.
> This has been the case since netbsd-6.

I still have the hardware, but I seriously doubt anyone's using it. We imported this driver mostly as a testbed for hardware crypto improvements (it was not really performance-competitive any more even at the time; but we had a good relationship with the IC designer and extensive documentation on their SDK and OpenSSL modifications). The hardware is no longer made and was, as far as I know, used under NetBSD only within the engineering organization at Coyote Point, which doesn't exist any more.

I think the clock's up on this one; take it out please.

Thor
Re: Removing viadrm
On Mon, Jul 02, 2018 at 12:18:51PM +, co...@sdf.org wrote:
> Hi folks,
>
> we have two ports of linux drm code. old drm, which exists because not
> all devices/drivers work with the newer; also non-x86 architectures use
> this.
>
> new drm ("drm2"), which hopefully we'll transition to.
>
> there are two via drivers:
> viadrmums (drm2)
> viadrm (old drm)
>
> according to PR port-i386/53364, viadrm doesn't work any more.
> viadrmums does, and uses the newer drm code.
>
> viadrmums shouldn't be significantly different in terms of support (the
> driver was almost abandoned upstream until recently).
>
> I don't know of any possible non-x86 via graphics users.
>
> If nobody objects, I will delete viadrm in 1-2 weeks.

Removed. https://mail-index.netbsd.org/source-changes/2018/07/10/msg096670.html

While testing this, I also found out that the Xorg driver was dysfunctional, and fixed it.
Removing bitrotted sys/dev/pci/n8 (NetOctave NSP2000)
Hi,

The code in sys/dev/pci/n8 has bitrotted - it still makes references to LKM_ system things, so it is unlikely it builds. This has been the case since netbsd-6.

I am interested in removing this because, while playing with a text-processing tool to look for bugs, I came across this code and spent some time looking at something that looked like a bug before realizing the code doesn't build.

Thoughts? Objections? If no one objects within a month, I will remove all the files in sys/dev/pci/n8 and all related references.
Re: Console on both VGA/Keyboard and IPMI
I asked:
> Is there any way to have the console (more precisely: the thing where panic
> messages go and where DDB operates on) both on VGA/Physical Keyboard (for
> on-site access) and something like IPMI SOL (for off-site access)?

How does DDB select the device it communicates on?
Re: panic in amap_wipeout()
> Since it's a development server, I let it sit in DDB in case someone wants me
> to examine something.

I tried to dump (reboot 104), but that froze, so I had to press The Button; the above no longer holds, sorry.
Re: panic in amap_wipeout()
> So the machine [...] just panicked: It's running 6.1/amd64.
panic in amap_wipeout() (was: ipnat ftp proxy suddenly stopped working)
So the machine where ipnat's ftp proxy misbehaved just panicked:

uvm_fault(0x8076d460, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 8045102e cs 8 rflags 10212 cr2 8 cpl 0 rsp fe812a095a90
kernel: page fault trap, code=0
Stopped in pid 27044.1 (perl) at netbsd:amap_wipeout+0x81: movq 8(%rax),%rdx
db{0}> bt
amap_wipeout() at netbsd:amap_wipeout+0x81
uvm_unmap_detach() at netbsd:uvm_unmap_detach+0x43
uvmspace_free() at netbsd:uvmspace_free+0xe7
exit1() at netbsd:exit1+0x175
sys_exit() at netbsd:sys_exit+0x3e
syscall() at netbsd:syscall+0xc4
db{0}> show reg
ds          92d8
es          0
fs          2c80
gs          4b73
rdi         8075e100    amap_list_lock
rsi         0
rbp         fe812a095ab0
rbx         fe8294b648e8
rdx         2
rcx         fe841338a140
rax         0
r8          fe83f3c192d8
r9          2
r10         0
r11         1
r12         2
r13         fe825f43cb88
r14         0
r15         fe811d870520
rip         8045102e    amap_wipeout+0x81
cs          8
rflags      10212
rsp         fe812a095a90
ss          10
netbsd:amap_wipeout+0x81: movq 8(%rax),%rdx
db{0}>

Since it's a development server, I let it sit in DDB in case someone wants me to examine something.
Re: 8.0 performance issue when running build.sh?
On Tue, Jul 10, 2018 at 12:11:41PM +0200, Kamil Rytarowski wrote:
> After the switch from NetBSD-HEAD (version from 1 year ago) to 8.0RC2,
> the ld(1) linker has serious issues with linking Clang/LLVM single
> libraries within 20 minutes. This causes frequent timeouts on the NetBSD
> buildbot in the LLVM buildfarm. Timeouts were never observed in the
> past, today there might be few of them daily.

Sounds like a binutils issue (or something like too little RAM available on the machine).

> Another observation is that grep(1) on one NetBSD server is
> significantly slower between the switch from -7 to 8RC1.

Please file separate PRs for each (and maybe provide some input files to reproduce the issue).

Martin
Re: 8.0 performance issue when running build.sh?
On 10.07.2018 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
>> I have no scientific data yet, but just noticed that build times on the
>> auto build cluster did rise very dramatically since it has been updated
>> to run NetBSD 8.0 RC2.
>>
>> Since builds move around build slaves sometimes (not exactly randomly,
>> but anyway) I picked the alpha port as an example (the first few
>> architectures in the alphabetical list get build slaves assigned pretty
>> consistently).
>
> Here is an intermediate result from further experiments and statistics:
>
> - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>   and not what will be in the final 8.0 release) has a measurable performance
>   impact - but it is not the big issue here.
>
> - if we ignore netbsd-7* branches, the performance loss is reasonably
>   explainable by the SVS penalty - we are going to check that theory soon.
>
> - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>   interaction with SVS, making those build times skyrocket - if turning
>   off SVS does not solve this, we will need to dig deeper.
>
> So stay tuned, maybe only Intel is to blame ;-)
>
> If anyone has concrete pointers for the last issue (or ideas what to change/
> measure) please speak up.
>
> Martin

After the switch from NetBSD-HEAD (a version from 1 year ago) to 8.0RC2, the ld(1) linker can no longer link single Clang/LLVM libraries within 20 minutes. This causes frequent timeouts on the NetBSD buildbot in the LLVM buildfarm. Timeouts were never observed in the past; now there can be a few of them daily. We experimented with SVS disabled, but it didn't help.

Another observation is that grep(1) on one NetBSD server is significantly slower since the switch from -7 to 8RC1.
Re: 8.0 performance issue when running build.sh?
On 10/07/2018 at 11:01, Martin Husemann wrote:
> On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
>> I have no scientific data yet, but just noticed that build times on the
>> auto build cluster did rise very dramatically since it has been updated
>> to run NetBSD 8.0 RC2.
>>
>> Since builds move around build slaves sometimes (not exactly randomly,
>> but anyway) I picked the alpha port as an example (the first few
>> architectures in the alphabetical list get build slaves assigned pretty
>> consistently).
>
> Here is an intermediate result from further experiments and statistics:
>
> - fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current
>   and not what will be in the final 8.0 release) has a measurable performance
>   impact - but it is not the big issue here.

For the record: EagerFPU has a fixed performance cost, namely saving and restoring the FPU context on each context switch. LazyFPU, however, had a variable performance cost: during context switches the FPU state was kept on the CPU, in the hope that if we switched back to the owner lwp we would not have to do a save+restore. If we did have to, however, we needed to send an IPI, and cost(IPI+save+restore) > cost(save+restore). So LazyFPU could be less or more expensive than EagerFPU, depending on the workload/scheduling.

The reason it is more expensive for you is maybe that on your machine each lwp ("make" thread) stays on the same CPU, and the kpreemptions cause a save+restore that is not actually necessary, since each CPU always comes back to the owner lwp. (As you said, you also have the old version of EagerFPU in RC2, which is more expensive than what is in the current -current, so that is part of the problem too.)

I've already said it, but XSAVEOPT actually eliminates this problem, since it achieves the equivalent of LazyFPU (not saving+restoring when not needed) without requiring an IPI.
> - if we ignore netbsd-7* branches, the performance loss is reasonably
>   explainable by the SVS penalty - we are going to check that theory soon.
>
> - maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad
>   interaction with SVS, making those build times skyrocket - if turning
>   off SVS does not solve this, we will need to dig deeper.
>
> So stay tuned, maybe only Intel is to blame ;-)
>
> If anyone has concrete pointers for the last issue (or ideas what to change/
> measure) please speak up.

Not sure this is related, but it seems that the performance of build.sh on netbsd-current is oscillating wildly. Maybe I just hadn't noticed that it was oscillating this much before, but right now a build.sh -j 4 kernel=GENERIC lands anywhere in [5min; 5min35s], even with SpectreV2/SpectreV4/SVS/EagerFPU all disabled. It seems to me that at one time two or three builds were enough and the oscillation was <5s; now it looks like I need more than 5 builds to get a relevant average.

Maxime
Re: interesting skylake perf tidbit
On 06/07/2018 at 21:47, Maxime Villard wrote:
> I guess we should do both; use "monitor" when possible, and in the places
> that are still required to use "pause", use a lower BACKOFF_MIN (set at
> boot time, depending on the cpu model) to compensate for the increased
> CPU latency.

Here are two patches [1] [2]. We reduce the backoff values for PAUSE, and use MWAIT instead when possible. The code for MWAIT is not very beautiful, because we need to pass in the condition, and therefore we need a macro. And we do a 64-bit mwait, while the value could actually be smaller.

I've benchmarked the latency of PAUSE on a Kabylake (Core i5). It seems that Kabylake indeed has the same latency: on average PAUSE takes ~136 cycles, which matches the 140 cycles documented for Skylake. The reason for the uncertainty is that the Kabylake microarchitecture is an optimization of Skylake, and it is not documented whether the increased latency was inherited from Skylake; it looks like it was.

I guess we'll have to add the CPU models manually in probe_intel_slowpause(). I don't have a lot of conviction down there...

If you could measure the performance difference with the patches applied, that would be nice.

Maxime

[1] http://m00nbsd.net/garbage/idle/slowpause.diff
[2] http://m00nbsd.net/garbage/idle/spinwmait.diff
Re: CVS commit: src/sys/arch/x86/x86
On 08.07.2018 17:44, Kamil Rytarowski wrote:
> I will try to scratch a new header unaligned.h with the set of macros
> and submit it to evaluation.

I've prepared a scratch of unaligned.h with get_unaligned(): http://netbsd.org/~kamil/kubsan/unaligned.h

There are at least two problems to proceed:

1. GCC 8.x is required for the no_sanitize attributes: https://gcc.gnu.org/gcc-8/changes.html This version will also ship with the NetBSD code for sanitization. The base-system GCC version in HEAD (6.4.0) is too old.

2. get_unaligned() is oriented toward fundamental types (char, int, long, etc.). A large part of the issues detected in the kernel are due to a misaligned pointer to a struct being passed (like disklabel or in6_addr). I think these cases should be addressed directly in the kernel code and treated as buggy.

I'm deferring the work on unaligned.h for now and waiting for a higher required minimum version of GCC in the base. I will keep the KUBSan reports non-fatal for now.
Re: 8.0 performance issue when running build.sh?
On Fri, Jul 06, 2018 at 04:04:50PM +0200, Martin Husemann wrote:
> I have no scientific data yet, but just noticed that build times on the
> auto build cluster did rise very dramatically since it has been updated
> to run NetBSD 8.0 RC2.
>
> Since builds move around build slaves sometimes (not exactly randomly,
> but anyway) I picked the alpha port as an example (the first few
> architectures in the alphabetical list get build slaves assigned pretty
> consistently).

Here is an intermediate result from further experiments and statistics:

- fpu_eager (as it is on NetBSD 8.0 RC2, which is not what is in -current and not what will be in the final 8.0 release) has a measurable performance impact - but it is not the big issue here.

- if we ignore netbsd-7* branches, the performance loss is reasonably explainable by the SVS penalty - we are going to check that theory soon.

- maybe the netbsd-7 /bin/sh and/or /usr/bin/make cause some very bad interaction with SVS, making those build times skyrocket - if turning off SVS does not solve this, we will need to dig deeper.

So stay tuned, maybe only Intel is to blame ;-)

If anyone has concrete pointers for the last issue (or ideas what to change/measure) please speak up.

Martin