Kernel locks when configuring motherboard ethernet
... Since using the LOCKDEBUG kernel, this system can't use the network at all without locking up, even after a hardware reset. It's colocated, so while I can have someone physically power cycle the machine, I figured I'd leave it in case more information can be gained from it as it is. The serial console can be accessed via another system via cu, and the other system can also do a hardware reset. The system obviously can't talk on the Internet, but it has netbsd-10 sources and can compile a kernel for itself. The previous kernel that has been running for a couple of weeks had locked up twice, and I don't know if that's directly related to this, because it had nothing to do with configuring network ports. Interestingly, I've seen the same lockups with the previous machine that this machine replaced (8 gig Raspberry Pi 4, netbsd-10). These machines are public facing and are routing parts of a class C over tinc tunnels. Here's one lockup: [ 495715.4076245] fatal breakpoint trap in supervisor mode [ 495715.4076245] trap type 1 code 0 rip 0x80235385 cs 0x8 rflags 0x202 cr2 0x76f4a20740 00 ilevel 0x8 rsp 0xa80839aac8c8 [ 495715.4076245] curlwp 0xa0ed91107480 pid 0.3 lowest kstack 0xa80839aa82c0 Stopped in pid 0.3 (system) at netbsd:breakpoint+0x5: leave breakpoint() at netbsd:breakpoint+0x5 comintr() at netbsd:comintr+0x7e0 intr_kdtrace_wrapper() at netbsd:intr_kdtrace_wrapper+0x26 Xhandle_ioapic_edge1() at netbsd:Xhandle_ioapic_edge1+0x75 --- interrupt --- npf_tcpsaw() at netbsd:npf_tcpsaw+0x1d npf_conn_inspect() at netbsd:npf_conn_inspect+0x86 npfk_packet_handler() at netbsd:npfk_packet_handler+0x18e pfil_run_hooks() at netbsd:pfil_run_hooks+0x128 ip_output() at netbsd:ip_output+0x4c0 ip_forward() at netbsd:ip_forward+0x138 ipintr() at netbsd:ipintr+0xa80 softint_dispatch() at netbsd:softint_dispatch+0x95 DDB lost frame for netbsd:Xsoftintr+0x4c, trying 0xa80839aad0f0 Xsoftintr() at netbsd:Xsoftintr+0x4c --- interrupt --- b31c059c10208e97: ds c9a0 es ddb3 fs 1 gs e8d9 
rdi 81845120x86_io rsi 800 rbp a80839aac8c8 rbx a8003df8c01c rdx 7f rcx 22 rax 1 r8 a80839aaca94 r9 0 r10 5ed7b6ca02a0 r11 a8003df91008 r12 a0e6944a1790 r13 800 r14 cc r15 a0e6944a16c0 rip 80235385breakpoint+0x5 cs 8 rflags 202 rsp a80839aac8c8 ss 0 netbsd:breakpoint+0x5: leave Does anyone have any suggestions about what to try next? Does anyone want to have a look around themselves? Thanks, John Klos
Re: mfii0 disappears on warm reboot
Hi, Here are the pcidumps from both the working mfii0 and mfii0 after warm reboot: https://www.klos.com/~john/pcidump_raid.txt https://www.klos.com/~john/pcidump_noraid.txt This is what's different: 3c3 < 0x00: 0x005b1000 0x0010 0x01040003 0x --- 0x00: 0x005b1000 0x0017 0x01040003 0x0010 7,10c7,10 < Command register: 0x < I/O space accesses: off < Memory space accesses: off < Bus mastering: off --- Command register: 0x0007 I/O space accesses: on Memory space accesses: on Bus mastering: on 40c40 < Cache Line Size: 0bytes (0x00) --- Cache Line Size: 64bytes (0x10) 43c43 < 0x10: 0x0001 0x0004 0x 0x0004 --- 0x10: 0xe001 0xfea60004 0x 0xfea4 45c45 < 0x30: 0x 0x0050 0x 0x0100 --- 0x30: 0xfea4 0x0050 0x 0x0105 49c49 < base: 0x, disabled --- base: 0xe000 52c52 < base: 0x, disabled --- base: 0xfea6 55c55 < base: 0x, disabled --- base: 0xfea0 61,62c61,62 < Expansion ROM Base Address Register: 0x < base: 0x --- Expansion ROM Base Address Register: 0xfea4 base: 0xfea4 71c71 < Interrupt line: 0x00 --- Interrupt line: 0x05 141c141 < Device Control Register: 0x2810 --- Device Control Register: 0x2840 146,147c146,147 < Enable Relaxed Ordering: on < Max Payload Size: 128 byte --- Enable Relaxed Ordering: off Max Payload Size: 512 byte 173c173 < Link Control Register: 0x0080 --- Link Control Register: 0x00c0 178c178 < Common Clock Configuration: off --- Common Clock Configuration: on 257c257 < MSI-X Enable: off --- MSI-X Enable: on 269c269 < 0x70: 0x00092810 0x00407482 0x10420080 0x --- 0x70: 0x00092840 0x00407482 0x104200c0 0x 274c274 < 0xc0: 0x000f0011 0x2001 0x3001 0x --- 0xc0: 0x800f0011 0x2001 0x3001 0x 383c383 < 0x120: 0x000f 0x0101004c 0x73cd647b --- 0x120: 0x000f 0x0101004c 0x2d3319c8 471c471 < 0x120: 0x000f 0x0101004c 0x73cd647b 0x --- 0x120: 0x000f 0x0101004c 0x2d3319c8 0x When the RAID card is working, detaching with "drvctl -d mfii0" and trying to reattach caused a panic: [ 256.157518] dk5 at sd0 (doozerroot) deleted [ 256.157518] dk4 at sd0 (doozerswap) deleted [ 
256.157518] sd0: detached [ 256.157518] scsibus0: detached [ 256.287530] mfii0: detached [ 269.317565] mfii0 at pci1 dev 0 function 0panic: kernel diagnostic assertion "msipic_find_msi_pic_locked(msipic->mp_devid) == NULL" failed: file "/usr/src/sys/arch/x86/pci/msipic.c", line 262 [ 269.327575] cpu0: Begin traceback... [ 269.327575] vpanic() at netbsd:vpanic+0x183 [ 269.327575] kern_assert() at netbsd:kern_assert+0x4b [ 269.327575] msipic_construct_common_msi_pic() at netbsd:msipic_construct_common_msi_pic+0x325 [ 269.327575] msipic_construct_msix_pic() at netbsd:msipic_construct_msix_pic+0x6e [ 269.337510] pci_msix_alloc_common.part.0() at netbsd:pci_msix_alloc_common.part.0+0x26 [ 269.337510] pci_msix_alloc_exact() at netbsd:pci_msix_alloc_exact+0x5c [ 269.337510] pci_intr_alloc() at netbsd:pci_intr_alloc+0x57 [ 269.337510] mfii_attach() at netbsd:mfii_attach+0x2c0 [ 269.337510] config_attach_internal() at netbsd:config_attach_internal+0x19f [ 269.347509] config_found() at netbsd:config_found+0xc3 [ 269.347509] pci_probe_device() at netbsd:pci_probe_device+0x661 [ 269.347509] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x1a4 [ 269.347509] pcirescan() at netbsd:pcirescan+0x4e [ 269.347509] rescanbus() at netbsd:rescanbus+0x16d [ 269.347509] drvctl_ioctl() at netbsd:drvctl_ioctl+0x534 [ 269.357510] sys_ioctl() at netbsd:sys_ioctl+0x56d [ 269.357510] syscall() at netbsd:syscall+0x196 [ 269.357510] --- syscall (number 54) --- [ 269.357510] netbsd:syscall+0x196: [ 269.357510] cpu0: End traceback... [ 269.357510] dumping to dev 168,1 (offset=8, size=6146568): [ 269.357510] dump [ 1.000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, As per some excellent recommendations, I'd like to try to reset the card, but I'm not sure how to use pcictl to do this. Does anyone have an idea? John
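For anyone comparing the two dumps by hand, the register differences can be decoded with a few bit masks. This is a minimal sketch, not part of the original report: the function names are made up for illustration, and the bit positions are the standard PCI Command register and PCIe Device Control register layouts as I understand them. The Command register value on the "off" side is truncated in the diff above, so 0x0000 is only an assumed stand-in for "low three bits clear".

```python
# Hypothetical helpers for reading the pcidump diff; names are illustrative only.

def decode_command(reg):
    """Decode the low three bits of the PCI Command register."""
    return {
        "io_space": bool(reg & 0x1),      # bit 0: I/O space accesses
        "memory_space": bool(reg & 0x2),  # bit 1: memory space accesses
        "bus_master": bool(reg & 0x4),    # bit 2: bus mastering
    }


def decode_device_control(reg):
    """Decode selected fields of the PCIe Device Control register."""
    return {
        "relaxed_ordering": bool(reg & (1 << 4)),              # bit 4
        "max_payload_bytes": 128 << ((reg >> 5) & 0x7),        # bits 7:5
        "max_read_request_bytes": 128 << ((reg >> 12) & 0x7),  # bits 14:12
    }


if __name__ == "__main__":
    # 0x0007 is from the diff; 0x0000 is an assumption (the dump shows only "0x").
    for value in (0x0007, 0x0000):
        print(f"Command 0x{value:04x}:", decode_command(value))
    # Both Device Control values appear in the diff.
    for value in (0x2810, 0x2840):
        print(f"DevCtl  0x{value:04x}:", decode_device_control(value))
```

Decoded this way, 0x2810 gives relaxed ordering on with a 128-byte max payload, and 0x2840 gives relaxed ordering off with a 512-byte max payload, which agrees with the text lines in the diff.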
Re: mfii0 disappears on warm reboot
Hi, I don't know exactly how I sent an email with absolutely no "From:" (I have an idea, but I probably couldn't do it again if I tried), but that was me, if anyone wants to reply. Thanks, John
Lock of NetBSD-current with ifconfig down / up
Hi, Here's a nice issue :) Plug in ure* USB ethernet to amd64 machine running NetBSD-current (9.99.99, 22-August-2022): [ 1791670.446266] ure0 at uhub8 port 4 [ 1791670.446266] ure0: Realtek (0x0bda) USB 10/100/1000 LAN (0x8153), rev 2.10/30.00, addr 6 [ 1791670.446266] ure0: RTL8153 ver 5c30 [ 1791670.566267] rgephy0 at ure0 phy 0: RTL8251 1000BASE-T media interface, rev. 0 [ 1791670.586267] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, auto [ 1791670.586267] ure0: Ethernet address a0:ce:c8:e7:88:5f [ 1791673.256299] ugen1 at uhub8 port 5 [ 1791673.256299] ugen1: VIA Labs, Inc. (0x2109) PD3.0 USB-C Device (0x), rev 2.01/0.01, addr 7 ifconfig ure0 up No problem: ure0: flags=0x8943 mtu 1500 capabilities=0x3ff00 capabilities=0x3ff00 capabilities=0x3ff00 enabled=0 ec_capabilities=0x1 ec_enabled=0 address: a0:ce:c8:e7:88:5f media: Ethernet autoselect (100baseTX full-duplex) status: active inet6 fe80::a2ce:c8ff:fee7:885f%ure0/64 flags 0 scopeid 0x9 ifconfig ure0 down Locks the machine. I couldn't get more information because it's 3000 miles away. There's nothing in dmesg because the machine was power cycled. Initially I imagined it might be due to the ure* driver, but then it happened locally. On an amd64 system running 9.99.98 from 16-July-2022, I ran "ifconfig re1 down" and the machine locked - no ICMP, nothing for SIGINFO, no response to keyboard cnmagic. It doesn't appear to be hardware, but here's this just because. [ 1.044097] re1 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x0c) [ 1.044097] re1: interrupting at msix2 vec 0 [ 1.044097] re1: RTL8168G/8111G (0x4c00) [ 1.044097] re1: Ethernet address 4c:cc:6a:01:a5:e0 [ 1.044097] re1: using 256 tx descriptors [ 1.044097] rgephy1 at re1 phy 7: RTL8251 1000BASE-T media interface, rev. 0 I've ordered some PS/2 keyboards, because I take it that's the only way to reliably get in to the kernel debugger on amd64, unless someone knows a trick to make USB keyboards usable. 
send-pr? Thanks, John
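The symbolic flag names are missing from the ifconfig output quoted above, so only flags=0x8943 survives. A minimal sketch of decoding it follows; the table is my recollection of the traditional BSD IFF_* values from net/if.h and should be treated as an assumption, and the function name is hypothetical.

```python
# Assumed IFF_* bit values (traditional BSD <net/if.h>); verify against your tree.
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0004: "DEBUG",
    0x0008: "LOOPBACK",
    0x0010: "POINTOPOINT",
    0x0040: "RUNNING",
    0x0080: "NOARP",
    0x0100: "PROMISC",
    0x0200: "ALLMULTI",
    0x0400: "OACTIVE",
    0x0800: "SIMPLEX",
    0x1000: "LINK0",
    0x2000: "LINK1",
    0x4000: "LINK2",
    0x8000: "MULTICAST",
}


def decode_if_flags(flags):
    """Return the names of the bits set in an interface flags word, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    print(",".join(decode_if_flags(0x8943)))
```

If the table is right, 0x8943 includes PROMISC, so the interface was in promiscuous mode when it was taken down.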
Scanning floppy devices with assumed density
Hi, all, It has been ages since I've used floppies in NetBSD very much, but checking the man page, different slice letters are used to indicate disk densities. From fdc(4): The driver supports the following floppy diskette formats by using particular partitions: 1.44MB 3.5-inch (b) 1.2MB 5.25-inch (c) 360KB 5.25-inch (1.2MB drive) (d) 360KB 5.25-inch (IBM-PC drive) (e) 720KB 3.5-inch (f) 720KB 5.25-inch (g) 360KB 3.5-inch (h) A user on Reddit pointed out this error on booting a NetBSD 9 kernel on an i80486 system: boot device: fd0 [ 5.121888] fd0d: hard error reading fsbn 0 of 0-2 (st0 0x40 st1 0x1 st2 0x0 cyl 0 head 0 sec 1) https://www.reddit.com/r/NetBSD/comments/vh4wgc/a_little_bit_of_fun_booting_the_netbsd_162/idyrf95/ She wondered why fd0d is being used here. I can't imagine this is due to scanning for a disklabel, since they've been around forever, so is this perhaps due to dkwedge_discover? John
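To make the error line easier to read, here is a minimal sketch that maps the partition letter to the format listed in the fdc(4) excerpt above and decodes the controller status bytes. The status bit meanings are the usual NEC 765-style ST0/ST1 definitions as I remember them, so treat them as an assumption; the helper names are hypothetical.

```python
# Partition letter -> format, copied from the fdc(4) excerpt quoted above.
FD_PARTITION_FORMATS = {
    "b": "1.44MB 3.5-inch",
    "c": "1.2MB 5.25-inch",
    "d": "360KB 5.25-inch (1.2MB drive)",
    "e": "360KB 5.25-inch (IBM-PC drive)",
    "f": "720KB 3.5-inch",
    "g": "720KB 5.25-inch",
    "h": "360KB 3.5-inch",
}

# Assumed 765-style status bits (not taken from the post).
ST0_INTERRUPT_CODE = {
    0: "normal termination",
    1: "abnormal termination",
    2: "invalid command",
    3: "abnormal termination (ready changed)",
}
ST1_BITS = {
    0x01: "missing address mark",
    0x02: "not writable",
    0x04: "no data",
    0x10: "overrun",
    0x20: "data error",
    0x80: "end of cylinder",
}


def describe_fd_device(name):
    """Return the format implied by a device name such as 'fd0d'."""
    return FD_PARTITION_FORMATS.get(name[-1], "not listed in the excerpt")


def decode_status(st0, st1):
    """Return (ST0 interrupt code text, list of ST1 error names)."""
    code = ST0_INTERRUPT_CODE[(st0 >> 6) & 0x3]
    errors = [text for bit, text in sorted(ST1_BITS.items()) if st1 & bit]
    return code, errors


if __name__ == "__main__":
    print(describe_fd_device("fd0d"))
    print(decode_status(0x40, 0x01))
```

Under those assumptions, the reported st0 0x40 / st1 0x1 reads as an abnormal termination with a missing address mark, which is what one might expect when a disk is read with the wrong density setting.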
Re: mfii hanging on boot
> I saw almost the same situation. To recover from the error, I had to power-down the machine, unplug the battery, keep a few minutes, plug the battery and power-up again.
>
> I committed the change yesterday. I guess that it fixes kern/55192 and kern/56669.

My machine is only 2500 km away, but it'd still be hard to test. I have a spare card I will dig out to test, though.

Thanks, John
Re: Complete lock-up from using pkgsrc/net/darkstat
Hi,

> It also might be relevant to note which port you're running. It must be capable of having re and wm interfaces, since you name them, but that still includes a fair bit.

> I still see no statement which port(s) you're running...

Here my brain equated port with ethernet, not with NetBSD port :P These are all NetBSD/amd64 systems.

> Maybe the problem is related to the number of DNS lookups the child process is doing rather than the number of TB the parent process is counting?

I know DNS can be a big issue when there're a lot of attempts to look up reverse DNS, which is often broken, so I run darkstat with DNS lookup disabled, like so:

darkstat_flags="-i re0 -b 127.0.0.1 --no-dns"

Where I am I don't have all the things I normally have, so I'm still waiting for a null modem adapter so I can get a serial console on the machine that's physically local. Once that's here, I'm going to try very hard to get a lockup.

> I'm guessing it's a memory issue with darkstat -- specifically, it has a memory leak that runs the system it runs on out of RAM. I bet if you add a ton of swap to a system on which you run darkstat, you'll find it runs longer before it hangs, and, I'm guessing you'll notice there is a lot of swap in use before it hangs

darkstat is run as user "nobody" and shows 23 megabytes after a week. It has no special unlimiting of any resources. The systems where issues were seen range from 4 to 16 gigabytes of memory, and I'd have noticed if any of them were in to swap at all (none were).

> Is it possible the machine is not, strictly, hung, just doing something that renders it unresponsive for a human-perceptible time? You wrote of having to get remote hands to poke an unresponsive machine; how long did that take? Did your remote hands notice whether the disk light was lit (if there is such a light)?

When a system was in this specific state, I had someone plug in a USB keyboard and tell me if any new green text appeared on the screen. It did not (he sent a photo of the screen, too). I then asked him to press (and not hold) the power button. He did, and he said nothing happened - nothing on the screen, no disk lighting, et cetera (I told him to look for that). The systems normally power themselves down relatively nicely from a simple press. I was communicating with him in the morning, hours after attempting to stop darkstat, so it had plenty of time to recover. I had also logged in to the backup machine and saw that I couldn't reach the internal interface of the frozen machine.

> I've had machines appear to lock up hard when what's actually going on is that a large process is dumping core.

If it didn't finish in hours, then that would be a problem :)

I'll post more when I've got a serial console set up.

Thanks, John
Re: Complete lock-up from using pkgsrc/net/darkstat
> On NetBSD 8, 9, current, [...] Stop darkstat. Machine locks. > [...] in case anyone can imagine how and why a complete system lockup > could happen as the result of an interface being used in promiscuous > mode for long periods of time (and not when used that way for short > periods of time. Don't forget, it may _not_ be "as the result of an interface being used in promiscuous mode for long periods of time". That's merely a correlate (and possibly not a perfect correlate - your sample size is small); the causality may be more complicated. (For example, maybe it's actually as a result of receiving certain traffic which is on that segment but which it wouldn't normally receive. Maybe it's got nothing to do with network interfaces and instead is related to something else darkstat does - I know nothing about what darkstat does or doesn't do, except for your implication that it runs interfaces promiscuous.) You're absolutely right - I don't know this for sure, but I can add some additional information. I've seen occasional lockups (once or twice per year) on a number of systems - at least five different systems - which are all running as NAT routers and firewalls for various heavily used networks. Two systems were running NetBSD 8 with ipfilter, one with wm* as the public interface, the other with re*. Two systems were running NetBSD 9 with npf, one with wm*, one with re*. The fifth was running 9.99.93 with re0 as the public interface and npf. It was on this last one that I ran "/etc/rc.d/darkstat stop" and saw that it completely locked up, and I had to have someone physically go and power cycle it. I know that when interfaces switch from promiscuous to non-promiscuous, they can lose link for a moment, but the machine wasn't reachable from the internal network, either, and didn't respond when a USB keyboard was connected to it (no green lines from the kernel). Also, pressing the power button didn't trigger a poweroff event, so I know it was completely locked. 
Random lockups are one thing, and a specific lockup when stopping darkstat is another, but to add to this, one location has two identical machines, one which occasionally locked under exceptionally high network load, and the other that never did. To ascertain whether it was a hardware fault, the drives were swapped between them. The problem continued and moved with the drive, so then the OSes were reinstalled, and one still kept occasionally locking up. Only after seeing the lockup when stopping darkstat did I realize that the one that continuously had occasional lockups was running darkstat on boot. These lockups have bugged the heck out of me for many years - at least five - and I'm kicking myself that I only realize now that all the machines that were 100% stable for multiple years weren't running darkstat, and the ones that were problematic were running darkstat. I should've realized this ages ago. It also might be relevant to note which port you're running. It must be capable of having re and wm interfaces, since you name them, but that still includes a fair bit. With five different machines, I don't think it's likely an issue with all five interfaces, but it's always better to have too much information than too little: NetBSD 8.2 (1-May-2020): wm0 at pci2 dev 0 function 0: Intel i82574L (rev. 0x00) wm0: for TX and RX interrupting at msix2 vec 0 affinity to 1 wm0: for TX and RX interrupting at msix2 vec 1 affinity to 2 wm0: for LINK interrupting at msix2 vec 2 wm0: PCI-Express bus wm0: 2048 words FLASH, version 1.8.0, Image Unique ID wm0: ASPM L0s and L1 are disabled to workaround the errata. wm0: Ethernet address 00:1b:21:b5:51:e7 wm0: 0x224480 makphy0 at wm0 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1 NetBSD 8.2 (1-May-2020): re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 
0x0c) re0: interrupting at msi2 vec 0 re0: Ethernet address 4c:cc:6a:0b:ee:1a re0: using 256 tx descriptors rgephy0 at re0 phy 7: RTL8251 1000BASE-T media interface, rev. 0 NetBSD 9.0 (12-June-2020): [ 1.004517] wm0 at pci3 dev 0 function 0: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06) [ 1.004517] wm0: interrupting at ioapic1 pin 3 [ 1.004517] wm0: PCI-Express bus [ 1.004517] wm0: 4096 words (16 address bits) SPI EEPROM, version 5.10.2, Image Unique ID [ 1.004517] wm0: ASPM L1 is disabled to workaround the errata. [ 1.004517] wm0: Ethernet address 00:15:17:73:0d:15 [ 1.004517] wm0: 0x24440 [ 1.004517] igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0 ... NetBSD 9.1 (24-April-2021): [ 1.008819] re0 at pci6 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. 0x0c) [ 1.008819] re0: interrupting at msix2 vec 0 [ 1.008819] re0: Ethernet address e0:d5:5e:48:2c:58 [ 1.008819] re0: using 256 tx descriptors [ 1.008819] rgephy0 at re0 phy 7: RTL8251 1000BASE-T media interface, rev. 0 NetBSD 9.99.93 (26-F
Complete lock-up from using pkgsrc/net/darkstat
So here's an interesting problem: On NetBSD 8, 9, and -current, with both ipfilter and npf, with different kinds of ethernet interfaces (re*, wm*), run pkgsrc/net/darkstat. Pass a lot of traffic (like a week's worth of Internet traffic). Stop darkstat. Machine locks. I've only recently been able to ascertain that it's directly related to darkstat, but because it hasn't happened locally, I don't have any more information than that, so not enough for a PR. I'm going to try to reproduce the lockup on a physically local machine with a USB keyboard already attached in hopes that I can actually get into the kernel debugger. I figured it's worth mentioning here in case anyone can imagine how and why a complete system lockup could happen as the result of an interface being used in promiscuous mode for long periods of time (and not when used that way for short periods of time). Thanks, John
Issues with older wd* / IDE on various platforms
Hi, all, I've noticed problems in three places and only recently did it occur to me that they may all be related.
1) The last time I tried to install NetBSD on an hpcarm Jornada, I got stuck because once the kernel booted from Windows CE, it couldn't access the CompactFlash card.
2) Recently I tried running NetBSD-9 and -current on an m68040 Mac Quadra 630. After booting, the kernel would show some errors but would never be able to talk to the IDE disk. The exact same hardware and disk work fine with NetBSD-7 kernels, though.
3) My colocated Amiga 1200 lost the ability to write to its CompactFlash card, which is attached to IDE and is used for booting the kernel. I thought the card had gone bad, so I mailed a new one to the datacenter, but the issue continued. This happened, I think, around the beginning of the year when I upgraded to NetBSD-9. Now I have no way to upgrade the kernel remotely :(
All three of these machines have much older IDE, so I'm wondering what changed in NetBSD that may have caused this. Soon I'll be back and will have access to the Quadra and Jornada, and I'm open to suggestions about which commits to examine and test. Thanks, John Klos
Re: /dev/crypto missing
> > I erroneously thought that if pseudo-device crypto wasn't in the kernel, crypto would be done in userland. That's not the case:
>
> What makes you think crypto isn't being done in userland?

Just a bad guess that the reason for pseudo-device crypto was to do some things in the kernel.

> The problem looks to me like the server returns garbage on a TLS connection, which gets mixed up with an OpenSSL debugging message -- or possibly it is garbage _because_ it got mixed up with the OpenSSL debugging message. Maybe OpenSSL should handle ENXIO quietly like it handles ENOENT there, but it looks like there's a deeper problem if crap that OpenSSL printed got included in the TLS stream!
>
> > If this is the case, then why isn't crypto in every kernel configuration by default, except perhaps special cases?
>
> /dev/crypto is totally obsolete as it exists today. Really the only reason it continues to exist is to test opencrypto drivers from userland before using them in the kernel.

Hmmm... Then I wonder what's really going on. This is from trying to use bozohttpd with TLS on an Amiga with exactly the same configuration as used on ARM and amd64. I'll have to look in to this a bit more and perhaps open a PR.

Thanks, John
/dev/crypto missing
Hi, I erroneously thought that if pseudo-device crypto wasn't in the kernel, crypto would be done in userland. That's not the case: openssl s_client -debug -connect 192.80.49.7:443 Could not open /dev/crypto: Device not configured CONNECTED(0003) write to 0xe4f02d0 [0xe546000] (293 bytes => 293 (0x125)) - 16 03 01 01 20 01 00 01-1c 03 03 40 b2 73 a3 d5 ..@.s.. 0010 - 13 f4 91 bb ad cf 6b 49-f1 33 6f 86 ae 5b 1e 1e ..kI.3o..[.. 0020 - f5 cb db 10 5e 27 a5 07-10 97 8d 20 f6 9b 7c 26 ^'. ..|& 0030 - f3 52 e6 e5 19 1e 57 24-c2 ff c7 07 6d 34 23 74 .RW$m4#t 0040 - 6c 36 da 86 f8 39 f9 a8-7e 24 1b 6c 00 3e 13 02 l6...9..~$.l.>.. 0050 - 13 03 13 01 c0 2c c0 30-00 9f cc a9 cc a8 cc aa .,.0 0060 - c0 2b c0 2f 00 9e c0 24-c0 28 00 6b c0 23 c0 27 .+./...$.(.k.#.' 0070 - 00 67 c0 0a c0 14 00 39-c0 09 c0 13 00 33 00 9d .g.9.3.. 0080 - 00 9c 00 3d 00 3c 00 35-00 2f 00 ff 01 00 00 95 ...=.<.5./.. 0090 - 00 0b 00 04 03 00 01 02-00 0a 00 0c 00 0a 00 1d 00a0 - 00 17 00 1e 00 19 00 18-00 23 00 00 00 16 00 00 .#.. 00b0 - 00 17 00 00 00 0d 00 30-00 2e 04 03 05 03 06 03 ...0 00c0 - 08 07 08 08 08 09 08 0a-08 0b 08 04 08 05 08 06 00d0 - 04 01 05 01 06 01 03 03-02 03 03 01 02 01 03 02 00e0 - 02 02 04 02 05 02 06 02-00 2b 00 09 08 03 04 03 .+.. 00f0 - 03 03 02 03 01 00 2d 00-02 01 01 00 33 00 26 00 ..-.3.&. 0100 - 24 00 1d 00 20 74 f9 da-78 03 7e ab f9 52 6d da $... t..x.~..Rm. 0110 - cf 19 9b 11 0d 3c 24 c2-00 44 f1 bf 4b e8 92 33 .<$..D..K..3 0120 - dd 79 33 d7 1e.y3.. 
read from 0xe4f02d0 [0xe4e7003] (5 bytes => 5 (0x5)) - 43 6f 75 6c 64Could 4294967295:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:/usr/src/crypto/external/bsd/openssl/dist/ssl/record/ssl3_record.c:332: --- no peer certificate available --- No client certificate CA names sent --- SSL handshake has read 5 bytes and written 293 bytes Verification: OK --- New, (NONE), Cipher is (NONE) Secure Renegotiation IS NOT supported Compression: NONE Expansion: NONE No ALPN negotiated Early data was not sent Verify return code: 0 (ok) --- read from 0xe4f02d0 [0xe558000] (8192 bytes => 45 (0x2D)) - 20 6e 6f 74 20 6f 70 65-6e 20 2f 64 65 76 2f 63not open /dev/c 0010 - 72 79 70 74 6f 3a 20 44-65 76 69 63 65 20 6e 6f rypto: Device no 0020 - 74 20 63 6f 6e 66 69 67-75 72 65 64 0at configured. If this is the case, then why isn't crypto in every kernel configuration by default, except perhaps special cases? John Klos
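The two reads in the debug output can be decoded by hand to see why OpenSSL reports "wrong version number". This is a minimal sketch, not a diagnosis: it only assumes the standard 5-byte TLS record header layout (1-byte content type, 2-byte version, 2-byte length), and the function name is hypothetical. The hex bytes are copied from the dump above.

```python
import struct


def parse_tls_record_header(data):
    """Split the first 5 bytes into (content_type, version, length)."""
    return struct.unpack("!BHH", data[:5])


# The 5 bytes the client read where it expected a TLS record header.
first_read = bytes.fromhex("43 6f 75 6c 64")

# The 45 bytes that followed.
second_read = bytes.fromhex(
    "20 6e 6f 74 20 6f 70 65 6e 20 2f 64 65 76 2f 63"
    "72 79 70 74 6f 3a 20 44 65 76 69 63 65 20 6e 6f"
    "74 20 63 6f 6e 66 69 67 75 72 65 64 0a"
)

if __name__ == "__main__":
    content_type, version, length = parse_tls_record_header(first_read)
    # Real TLS records use content types 20-24 and versions 0x0301-0x0304.
    print(f"type={content_type} version=0x{version:04x} length={length}")
    print(repr((first_read + second_read).decode("ascii")))
```

Read as text, the server's reply is the string "Could not open /dev/crypto: Device not configured" followed by a newline, so the "header" the client tried to parse was just the word "Could". That is consistent with the error message being written into the connection on the server side rather than with crypto failing in the client.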
Re: Horrendous RAIDframe reconstruction performance
> Any thoughts about what's going on here? Is this because the drives are 512e drives? Three weeks is a LONG time to reconstruct.

So this turns out to be a failing drive. SMART doesn't show it's failing, but the one that's failing defaults to having the write cache off, and turning it on doesn't change the speed. I guess it's still usable, in a limited way - I can only write at 5 or 6 MB/sec, but I can read at 200 MB/sec. Maybe I'll use it in an m68k Mac.

Also, the autoconfigure works, but the forcing of root FS status didn't because I was testing it on a system that already had a RAIDframe with forced root. However, it still doesn't work on aarch64, but I'll recheck this after trying Jared's boot.cfg support.

Thanks, Greg, Michael and Edgar. I learned something :)

I am still curious about whether I was seeing both good read and write speeds because writes weren't going to both drives. I suppose I assumed that all writes would go to both drives even while reconstructing, but I suppose that only happens when the writes are inside of the area which has already been reconstructed, yes?

John
Horrendous RAIDframe reconstruction performance
Hello, I'm setting up two helium, non-SMR, 512e 8 TB disks (HGST HUH728080ALE604) in a RAIDframe mirror: [ 2.829768] wd2 at atabus2 drive 0 [ 2.829768] wd2: [ 2.829768] wd2: drive supports 16-sector PIO transfers, LBA48 addressing [ 2.829768] wd2: 7452 GB, 15504021 cyl, 16 head, 63 sec, 512 bytes/sect x 15628053168 sectors (0 bytes/physsect; first aligned sector: 8) [ 2.859768] wd2: GPT GUID: 4086e8f6-0ddd-4689-a942-80bf1b598539 [ 2.859768] dk0 at wd2: "raid8tb0", 15611274240 blocks at 1024, type: raidframe [ 2.859768] dk1 at wd2: "swap8tb0", 16777216 blocks at 15611275264, type: swap [ 2.869768] wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags) w/PRIO [ 2.869768] wd2(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), NCQ (31 tags) w/PRIO (Strange that it says "0 bytes/physsect") First, it seems that autoconfigure doesn't allow forcing the root filesystem. I'm guessing because this is using GPT: Autoconfig: Yes Root partition: Force Next, raidctl doesn't handle NAME= for device yet: raidctl -v -a NAME=raid8tb1 raid0 raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: No such file or directory Finally, even though these are absolutely not SMR drives, rebuild is running at 3.5 to 4 MB/sec, whether attached via USB-3 or directly attached via SATA: # raidctl -v -S raid0 Reconstruction is 0% complete. Parity Re-write is 100% complete. Copyback is 100% complete. Reconstruction status: 0% | | ETA: 485:13:54 - Interestingly, a bonnie++ run shows 80+ megabytes per second in block writes and 160+ megabytes per second on block read, and 185 random seeks per second, while reconstructing. Any thoughts about what's going on here? Is this because the drives are 512e drives? Three weeks is a LONG time to reconstruct. This is observed on NetBSD 9.99.68 on both i386 and on aarch64. Thanks, John Klos
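As a sanity check on the "three weeks" figure, the rebuild time can be estimated from the wedge size in the dmesg output. This is a minimal sketch under two assumptions that are mine, not the post's: that reconstruction has to cover the whole "raid8tb0" wedge, and that MB means 10^6 bytes. The function names are hypothetical.

```python
SECTOR_BYTES = 512
RAID_BLOCKS = 15_611_274_240  # "raid8tb0" wedge size in blocks, from the dmesg above


def rebuild_days(blocks, bytes_per_second, sector_bytes=SECTOR_BYTES):
    """Days needed to copy the whole wedge at a constant rate."""
    return blocks * sector_bytes / bytes_per_second / 86_400


def eta_to_rate(blocks, eta, sector_bytes=SECTOR_BYTES):
    """Average bytes/second implied by an 'HHH:MM:SS' ETA for the whole wedge."""
    hours, minutes, seconds = (int(part) for part in eta.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return blocks * sector_bytes / total_seconds


if __name__ == "__main__":
    for rate in (3.5e6, 4.0e6):
        print(f"{rate / 1e6:.1f} MB/s -> {rebuild_days(RAID_BLOCKS, rate):.1f} days")
    print(f"ETA 485:13:54 -> {eta_to_rate(RAID_BLOCKS, '485:13:54') / 1e6:.2f} MB/s")
```

At 3.5 to 4 MB/sec this works out to roughly 23 to 26 days, and the 485-hour ETA corresponds to an average a little above 4.5 MB/sec, so the raidctl estimate and the observed rate are in the same range.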
Reproducible NetBSD 9.0/amd64 panic, but no crash dump
I've had the same NetBSD 9.0/amd64 system, compiled from 22-Feb sources, panic twice while running the same workload, which is just fetching many static files continuously via command-line ftp. This appears to be reproducible, since both panics happened within an hour of starting the workload. However, no crash dump was recorded after either panic even though kern.dump_on_panic is set. savecore says there's no core dump, and there's plenty of disk space... Not sure how to diagnose this. Suggestions? Thanks, John [ 236045.981322] uvm_fault(0x8cf49ee4fa20, 0x0, 2) -> e [ 236045.981322] fatal page fault in supervisor mode [ 236045.981322] trap type 6 code 0x2 rip 0x8098bb73 cs 0x8 rflags 0x10286 cr2 0 ilevel 0x4 rsp 0xc200c0b827f0 [ 236045.981322] curlwp 0x8cf4e470e8c0 pid 1070.1 lowest kstack 0xc200c0b802c0 [ 236045.981322] panic: trap [ 236045.981322] cpu3: Begin traceback... [ 236045.981322] vpanic() at netbsd:vpanic+0x160 [ 236045.981322] snprintf() at netbsd:snprintf [ 236045.981322] startlwp() at netbsd:startlwp [ 236045.981322] alltraps() at netbsd:alltraps+0xbb [ 236045.981322] thmap_del() at netbsd:thmap_del+0x218 [ 236045.981322] npf_conndb_remove() at netbsd:npf_conndb_remove+0x32 [ 236045.981322] npf_conn_establish() at netbsd:npf_conn_establish+0x1b6 [ 236045.981322] npfk_packet_handler() at netbsd:npfk_packet_handler+0x318 [ 236045.981322] pfil_run_hooks() at netbsd:pfil_run_hooks+0x122 [ 236045.991327] ip_output() at netbsd:ip_output+0x49e [ 236045.991327] tcp_output() at netbsd:tcp_output+0x1970 [ 236045.991327] tcp_connect_wrapper() at netbsd:tcp_connect_wrapper+0x22b [ 236045.991327] do_sys_connect() at netbsd:do_sys_connect+0x90 [ 236045.991327] sys_connect() at netbsd:sys_connect+0x49 [ 236045.991327] syscall() at netbsd:syscall+0x157 [ 236045.991327] --- syscall (number 98) --- [ 236045.991327] 759b6e042bfa: [ 236045.991327] cpu3: End traceback... 
[ 236045.991327] dumping to dev 0,1 (offset=16877935, size=2084794): [ 236045.991327] dump [ 2159.007866] uvm_fault(0x8413f09b9458, 0x0, 2) -> e [ 2159.007866] fatal page fault in supervisor mode [ 2159.007866] trap type 6 code 0x2 rip 0x8098bb73 cs 0x8 rflags 0x10286 cr2 0 ilevel 0x4 rsp 0xb900b05fe7f0 [ 2159.007866] curlwp 0x8414201602c0 pid 6556.1 lowest kstack 0xb900b05fc2c0 [ 2159.007866] panic: trap [ 2159.007866] cpu3: Begin traceback... [ 2159.007866] vpanic() at netbsd:vpanic+0x160 [ 2159.007866] snprintf() at netbsd:snprintf [ 2159.007866] startlwp() at netbsd:startlwp [ 2159.007866] alltraps() at netbsd:alltraps+0xbb [ 2159.007866] thmap_del() at netbsd:thmap_del+0x218 [ 2159.007866] npf_conndb_remove() at netbsd:npf_conndb_remove+0x32 [ 2159.017872] npf_conn_establish() at netbsd:npf_conn_establish+0x1b6 [ 2159.017872] npfk_packet_handler() at netbsd:npfk_packet_handler+0x318 [ 2159.017872] pfil_run_hooks() at netbsd:pfil_run_hooks+0x122 [ 2159.017872] ip_output() at netbsd:ip_output+0x49e [ 2159.017872] tcp_output() at netbsd:tcp_output+0x1970 [ 2159.017872] tcp_connect_wrapper() at netbsd:tcp_connect_wrapper+0x22b [ 2159.017872] do_sys_connect() at netbsd:do_sys_connect+0x90 [ 2159.017872] sys_connect() at netbsd:sys_connect+0x49 [ 2159.017872] syscall() at netbsd:syscall+0x157 [ 2159.017872] --- syscall (number 98) --- [ 2159.017872] 7f2580c42bfa: [ 2159.017872] cpu3: End traceback... [ 2159.017872] dumping to dev 0,1 (offset=16877935, size=2084794): [ 2159.017872] dump
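One quick way to confirm that two panics took the same path is to strip the timestamps and compare only the function names in each traceback. This is a minimal sketch with hypothetical helper names; it assumes the "name() at netbsd:name+0xNN" frame format shown in the logs above and nothing else.

```python
import re

# Matches frames such as "thmap_del() at netbsd:thmap_del+0x218".
FRAME_RE = re.compile(r"([\w.]+)\(\) at netbsd:")


def frames(trace_text):
    """Return the function names in a traceback, in the order printed."""
    return FRAME_RE.findall(trace_text)


def same_call_chain(trace_a, trace_b):
    """True if both tracebacks list the same functions in the same order."""
    return frames(trace_a) == frames(trace_b)


if __name__ == "__main__":
    # Short excerpts from the two panics above.
    first = (
        "[ 236045.981322] thmap_del() at netbsd:thmap_del+0x218 "
        "[ 236045.981322] npf_conndb_remove() at netbsd:npf_conndb_remove+0x32"
    )
    second = (
        "[ 2159.007866] thmap_del() at netbsd:thmap_del+0x218 "
        "[ 2159.007866] npf_conndb_remove() at netbsd:npf_conndb_remove+0x32"
    )
    print(frames(first))
    print(same_call_chain(first, second))
```

Feeding the full text of each panic through frames() makes it easy to see whether both go through npf_conn_establish, npf_conndb_remove and thmap_del in the same order.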
Re: amd64: svs
It looks like two of the Dell machines I use have BIOS updates that include the new microcode, but I don't know when it'll be available for the other machines I use. The standalone microcode update at https://downloadcenter.intel.com/download/27337/Linux-Processor-Microcode-Data-File is dated 20171117, and from what I've read, does not contain the latest changes. FWIW, Intel updated their microcode on 8-Jan-2018: https://downloadcenter.intel.com/download/27431/ I can't find any documentation which shows what exactly is fixed for each family of processors. I'm not updating my BIOSes for now because using intel-microcode-netbsd package lets me easily make before-and-after comparisons. Plus, many of the BIOS updates I've seen don't say what Intel updates they use - you have to run cpuctl yourself afterwards to see... John
USB fixes hopefully getting in to -8
Hi, I have a 2 TB drive connected via USB to a Raspberry Pi 2 which is running netbsd-8. I tried five times, unsuccessfully, to copy a 200 gigabyte file via scp to the drive. Each time the Pi locked up, either while just copying or while doing things in other ssh sessions. It seemed to coincide at least twice with either logging out or trying to log in. Since ethernet is on USB on the Pi along with the disk, I decided to try an 8.99.3 kernel compiled from yesterday's sources. The scp worked without any issue at all, even with other things going on like cvs updates and compiling. Does anyone who might know what's been fixed know if the changes are going to be pulled in to -8? Thanks, John
NetBSD 7 on amd64 panics
Hi, I have an amd64 system (Intel Core i3) which has been stable for a couple of years doing NAT, IPv6 routing, email, web and so on. Since updating to NetBSD 7 in May, it has panicked six times. The latest: fatal page fault in supervisor mode trap type 6 code 0 rip 809cc927 cs 8 rflags 10207 cr2 1004f ilevel 4 rsp fe811cfd4ba0 curlwp 0xfe842df3b420 pid 0.3 lowest kstack 0xfe811cfd22c0 panic: trap cpu0: Begin traceback... vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf startlwp() at netbsd:startlwp alltraps() at netbsd:alltraps+0x96 ipf_frag_natknown() at netbsd:ipf_frag_natknown+0x3a ipf_nat6_checkin() at netbsd:ipf_nat6_checkin+0xe6 ipf_check() at netbsd:ipf_check+0x82b pfil_run_hooks() at netbsd:pfil_run_hooks+0xc4 ip6_input() at netbsd:ip6_input+0x307 ip6intr() at netbsd:ip6intr+0x4b softint_dispatch() at netbsd:softint_dispatch+0x79 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe811cfd4ff0 Xsoftintr() at netbsd:Xsoftintr+0x4f --- interrupt --- 0: cpu0: End traceback... uvm_fault(0xfe842690b480, 0x0, 2) -> e dumpifnagt atlo dpeavg e 0f,a1u lt(o fifns ets=up2e1r2v6i7s1or, msoidzee= 4167952): trap type 6 code 2 ridpu mpf fff805dc489 cs 8 rflags 10202 cr2 84 ilevel 8 rsp fe811dd97e10 curlwp 0xfe8406cb7480 pid 2044.5 lowest kstack 0xfe811dd952c0 Skipping crash dump on recursive panic panic: wddump: polled command has been queued cpu0: Begin traceback... 
vpanic() at netbsd:vpanic+0x13c snprintf() at netbsd:snprintf wddump() at netbsd:wddump+0x282 dumpsys_seg() at netbsd:dumpsys_seg+0xc7 dump_seg_iter() at netbsd:dump_seg_iter+0xce dodumpsys() at netbsd:dodumpsys+0x24c dumpsys() at netbsd:dumpsys+0x1d vpanic() at netbsd:vpanic+0x145 snprintf() at netbsd:snprintf startlwp() at netbsd:startlwp alltraps() at netbsd:alltraps+0x96 ipf_frag_natknown() at netbsd:ipf_frag_natknown+0x3a ipf_nat6_checkin() at netbsd:ipf_nat6_checkin+0xe6 ipf_check() at netbsd:ipf_check+0x82b pfil_run_hooks() at netbsd:pfil_run_hooks+0xc4 ip6_input() at netbsd:ip6_input+0x307 ip6intr() at netbsd:ip6intr+0x4b softint_dispatch() at netbsd:softint_dispatch+0x79 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe811cfd4ff0 Xsoftintr() at netbsd:Xsoftintr+0x4f --- interrupt --- 0: cpu0: End traceback... rebooting... Ideas? I've kept the crashdumps. John
netbsd-7 panic
I have an amd64 machine running netbsd-7 from 6-August-2015 which does common hosting (email, web, DNS), IPv6 tunnels and NAT, amongst other things. It panicked like this a few weeks ago when it was running netbsd-7 from March, but otherwise it's been problem free. Any thoughts or ideas about what could be causing this?

fatal page fault in supervisor mode
trap type 6 code 0 rip 80722fb2 cs 8 rflags 10297 cr2 36 ilevel 2 rsp ff fffe811cfebec0
curlwp 0xfe842df3b860 pid 0.5 lowest kstack 0xfe811cfe92c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ipf_frag_delete() at netbsd:ipf_frag_delete+0x74
ipf_frag_expire() at netbsd:ipf_frag_expire+0x152
ipf_slowtimer() at netbsd:ipf_slowtimer+0x15
ipf_timer_func() at netbsd:ipf_timer_func+0x2d
callout_softclock() at netbsd:callout_softclock+0x248
softint_dispatch() at netbsd:softint_dispatch+0x79
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe811cfebff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...
if_wm between netbsd-6 and netbsd-7 issue
Hi, I have a machine which is currently running netbsd-6 with six wm* ethernet interfaces. The first four are on a PCIe card which shows up like so:

pci4: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci4 dev 0 function 0: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm0: interrupting at ioapic0 pin 17
wm0: PCI-Express bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:15:17:73:0d:15
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
wm1 at pci4 dev 0 function 1: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm1: interrupting at ioapic0 pin 16
wm1: PCI-Express bus
wm1: 65536 word (16 address bits) SPI EEPROM
wm1: Ethernet address 00:15:17:73:0d:14
igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb4 at pci3 dev 4 function 0: vendor 0x111d product 0x8018 (rev. 0x0e)
ppb4: PCI Express 1.0
pci5 at ppb4 bus 5
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
wm2 at pci5 dev 0 function 0: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm2: interrupting at ioapic0 pin 19
wm2: PCI-Express bus
wm2: 65536 word (16 address bits) SPI EEPROM
wm2: Ethernet address 00:15:17:73:0d:17
igphy2 at wm2 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
wm3 at pci5 dev 0 function 1: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm3: interrupting at ioapic0 pin 18
wm3: PCI-Express bus
wm3: 65536 word (16 address bits) SPI EEPROM
wm3: Ethernet address 00:15:17:73:0d:16
igphy3 at wm3 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

I was going to upgrade this system to netbsd-7 (built from the sources from three hours ago), but when booting the kernel, this is what I get for wm0 through wm3:

pci4: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci4 dev 0 function 0: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm0: interrupting at ioapic0 pin 17
wm0: PCI-Express bus
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: No EEPROM
wm0: unable to read Ethernet address
wm1 at pci4 dev 0 function 1: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm1: interrupting at ioapic0 pin 16
wm1: PCI-Express bus
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: No EEPROM
wm1: unable to read Ethernet address
ppb4 at pci3 dev 4 function 0: vendor 0x111d product 0x8018 (rev. 0x0e)
ppb4: PCI Express capability version 1 x4 @ 2.5GT/s
pci5 at ppb4 bus 5
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
wm2 at pci5 dev 0 function 0: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm2: interrupting at ioapic0 pin 19
wm2: PCI-Express bus
wm2: 4096 words (16 address bits) SPI EEPROM
wm2: Ethernet address 00:15:17:73:0d:17
igphy0 at wm2 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
wm3 at pci5 dev 0 function 1: Intel PRO/1000 PT Quad Port Server Adapter (rev. 0x06)
wm3: interrupting at ioapic0 pin 18
wm3: PCI-Express bus
wm3: 4096 words (16 address bits) SPI EEPROM
wm3: Ethernet address 00:15:17:73:0d:16
igphy1 at wm3 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto

Does anyone know what's not right now with support for these Intel quad gigabit cards? Thanks, John Klos
Re: sysctl weirdness on m68k
On NetBSD/atari, luna68k and x68k, kernels print the following sysctl_createv errors during boot:

I just noticed that evbarm machines also have those errors:

NetBSD 6.99.49 (PI) #1: Fri Jul 25 19:42:40 UTC 2014
j...@chi.ziaspace.com:/usr/current/obj-evbarm/sys/arch/evbarm/compile/PI
total memory = 448 MB
avail memory = 436 MB
sysctl_createv: sysctl_create(machine_arch) returned 17
sysctl_createv: sysctl_locate(multicast) returned 2
sysctl_createv: sysctl_locate(multicast_kludge) returned 2

John
Re: sysctl weirdness on m68k
On NetBSD/atari, luna68k and x68k, kernels print the following sysctl_createv errors during boot: ...

Is there anyone who also sees these "sysctl_createv: sysctl_locate(multicast) returned 2" messages on other ports?

Yes. I don't have current on an Amiga at the moment, but mac68k does:

NetBSD 6.99.46 (BRIGGS-$Revision: 6.999 $) #0: Tue Jul 8 22:01:50 UTC 2014
j...@chi.ziaspace.com:/usr/current/obj-mac68k/sys/arch/mac68k/compile/BRIGGS
Apple Macintosh Quadra 610 (68040)
cpu: delay factor 1601
fpu: mc68040
total memory = 260 MB
avail memory = 249 MB
sysctl_createv: sysctl_locate(multicast) returned 2
sysctl_createv: sysctl_locate(multicast_kludge) returned 2

John Klos
Re: Unexpected RAIDframe behavior
If the state of the RAID is not being maintained, then that's a bug, and needs to be fixed right away. To my knowledge, however, it does maintain things correctly. Feel free to file a PR with the specifics of any failures in this regard... I will test this via a clean install and create a PR if I observe the same thing. I have a bunch of PRs to go through... John
re: Unexpected RAIDframe behavior
Parity Re-write is 79% complete. OK, so this is really more about how parity checking works than anything else (i guess.) for RAID1, it reads both disks and compares them, and if one fails it will write the "master" data. (more generally, it reads all disks and if anything fails parity check it writes corrected parity back to it.) Ah, so a reboot caused RAIDframe to switch from reconstruction to parity creation. That explains what was going on. However, it makes me wonder if the state of the RAID is not properly being maintained through reboot. I didn't really need all that non-zero data. Thanks, John
re: Unexpected RAIDframe behavior
what does raidctl -s and -m (separate commands) say?

raidctl -s raid0
Components:
/dev/wd0a: optimal
/dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2013090100, Mod Counter: 75
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 3873470720
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Component label for /dev/wd1a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 2013090100, Mod Counter: 75
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 3873470720
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 79% complete.
Copyback is 100% complete.

raidctl -m raid0
raid0: parity map enabled with 4096 regions of 462MB
raid0: regions marked clean after 8 intervals of 40.000s
raid0: write/sync/clean counters 43509/2112/1712
raid0: 871 dirty regions
raid0: parity map will remain enabled on next configure
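The figures in this output can be cross-checked with a little arithmetic. The Python sketch below is only that: it assumes the "462MB" region size is reported in units of 2^20 bytes, and it does not claim to reproduce how raidctl derives its percentages. All constants are copied from the output above; the variable names are made up for illustration.

```python
# Values copied from the raidctl -s / raidctl -m output above.
NUM_BLOCKS = 3873470720   # numBlocks from the component label
BLOCK_SIZE = 512          # blocksize, in bytes
REGIONS = 4096            # "parity map enabled with 4096 regions"
DIRTY_REGIONS = 871       # "871 dirty regions"

# Size of the RAID set and of one parity map region.
total_bytes = NUM_BLOCKS * BLOCK_SIZE
region_mib = total_bytes / REGIONS / 2**20  # assumption: "MB" means 2**20 bytes

# Share of parity map regions that are dirty, and the remainder.
dirty_pct = 100 * DIRTY_REGIONS / REGIONS
clean_pct = 100 - dirty_pct

print(f"region size:   {region_mib:.2f} MiB")
print(f"dirty regions: {dirty_pct:.1f}%")
print(f"clean regions: {clean_pct:.1f}%")
```

The region size comes out at about 461.75 MiB, which rounds to the 462MB reported, and roughly 78.7% of the regions are not dirty, which is at least in line with the "Parity Re-write is 79% complete" figure. Whether the two are actually linked that way is a guess.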
Unexpected RAIDframe behavior
Hi, After setting up a RAID-1 mirror with RAIDframe in netbsd-6 (compiled from yesterday's tree), I'm seeing strange disk issues. The initial mirror was set up and was initializing while the system was booted off of the install kernel, and both drives (wd0 and wd1) were writing at around 110 MB/sec each. Some time after a reboot onto the installed system, I started seeing this from iostat:

      tty            wd0               wd1             raid0           CPU
 tin tout   KB/t  t/s   MB/s   KB/t  t/s   MB/s   KB/t t/s MB/s   us ni sy in  id
   0   79  64.00  231  14.42  64.00  462  28.90   0.0000 0.000     0  0  0  0 100
   0  239  64.00  253  15.84  64.00  506  31.62   0.0000 0.000     0  0  1  0  99
   0   79  64.00  200  12.50  64.00  400  25.00   0.0000 0.000     0  0  0  1  99
   0   79  64.00  194  12.13  64.00  388  24.26   0.0000 0.000     0  0  1  0  99
   0   79  64.00  218  13.61  64.00  436  27.23   0.0000 0.000     0  0  0  0  99

raid0 is completely idle - swap isn't even enabled - yet wd1 is doing twice the I/O of wd0. Does anyone know why this is the case? John
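To put a number on "twice the I/O", the per-sample ratio between the two disks can be computed directly from the iostat figures above. This is a minimal Python sketch; the tuple layout and variable names are only illustrative, and the values are copied from the five samples shown.

```python
# (wd0 t/s, wd0 MB/s, wd1 t/s, wd1 MB/s) copied from the iostat samples above.
samples = [
    (231, 14.42, 462, 28.90),
    (253, 15.84, 506, 31.62),
    (200, 12.50, 400, 25.00),
    (194, 12.13, 388, 24.26),
    (218, 13.61, 436, 27.23),
]

# Ratio of wd1 to wd0 for transfers per second and for throughput.
tps_ratios = [wd1_tps / wd0_tps for wd0_tps, _, wd1_tps, _ in samples]
mbs_ratios = [wd1_mbs / wd0_mbs for _, wd0_mbs, _, wd1_mbs in samples]

for tps, mbs in zip(tps_ratios, mbs_ratios):
    print(f"t/s ratio: {tps:.2f}   MB/s ratio: {mbs:.2f}")
```

In every sample wd1 performs exactly twice as many transfers as wd0, all at 64 KB each. A ratio that steady looks more like a fixed background pattern than ordinary filesystem traffic, though the numbers alone do not say what RAIDframe is doing.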
Booting with dk0 root
Hi, Apparently we can't compile a kernel with netbsd root on dk0 type ffs because dk0 isn't something the kernel knows about until later. How does one use a gpt wedge as a root filesystem? I'm loading the kernel from a CompactFlash but would like the root filesystem to be on a 4 TB drive. Thanks, John Klos
Can't get 100% CPU on all cores
Is this a NetBSD issue, or something else?

load averages: 7.99, 7.94, 7.91; up 0+21:43:07 21:25:56
34 processes: 2 runnable, 24 sleeping, 8 on CPU
CPU0 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU2 states: 0.0% user, 39.7% nice, 0.0% system, 0.0% interrupt, 60.3% idle
CPU3 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU4 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU5 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU6 states: 0.0% user, 100% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU7 states: 0.0% user, 99.6% nice, 0.4% system, 0.0% interrupt, 0.0% idle
Memory: 1732M Act, 10M Exec, 1698M File, 27G Free
Swap: 8193M Total, 8193M Free

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
 6102 john       5   20 8256K 8124K RUN/3  25:56 99.02% 99.02% dnetc
10545 john       5   20 8212K 4540K CPU/4  24:45 99.02% 99.02% dnetc
24632 john       5   20 8200K 4544K CPU/5  25:01 94.58% 94.58% dnetc
14493 john       5   20 8216K 4552K RUN/6  24:34 93.65% 93.65% dnetc
10715 john       6   20 8204K 4540K CPU/6  25:21 92.72% 92.72% dnetc
13965 john       5   20 8224K 4540K CPU/0  25:14 89.75% 89.75% dnetc
 8117 john       5   20 8220K 2508K CPU/7  25:52 82.03% 82.03% dnetc
28320 john       5   20 8208K 4552K CPU/1  24:20 81.84% 81.84% dnetc

For some reason I can't get 100% CPU on all eight cores. This is with netbsd-6 compiled from two days ago on an eight core AMD Zambezi:

cpu0 at mainbus0 apid 16: AMD FX(tm)-8150 Eight-Core Processor , id 0x600f12
...

Nothing else is running on the machine. Strange... Ideas? John
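One thing the snapshot itself shows is how the eight dnetc processes are spread across the CPUs. The Python sketch below tallies the CPU number from the STATE column of the top output above. It assumes the number after the slash is the CPU the process is on or last ran on, and since this is a single snapshot it says nothing about why the scheduler placed things that way.

```python
from collections import Counter

# (pid, STATE, WCPU %) copied from the top snapshot above.
procs = [
    (6102, "RUN/3", 99.02),
    (10545, "CPU/4", 99.02),
    (24632, "CPU/5", 94.58),
    (14493, "RUN/6", 93.65),
    (10715, "CPU/6", 92.72),
    (13965, "CPU/0", 89.75),
    (8117, "CPU/7", 82.03),
    (28320, "CPU/1", 81.84),
]
NCPU = 8  # eight cores, per the dmesg line in the post

# Count how many processes are associated with each CPU number.
per_cpu = Counter(int(state.split("/")[1]) for _, state, _ in procs)

unused = [cpu for cpu in range(NCPU) if per_cpu[cpu] == 0]
shared = [cpu for cpu in range(NCPU) if per_cpu[cpu] > 1]
total_wcpu = sum(wcpu for _, _, wcpu in procs)

print("CPUs with no dnetc process:", unused)
print("CPUs with more than one:   ", shared)
print(f"total WCPU: {total_wcpu:.2f}% of {NCPU * 100}%")
```

In this snapshot two processes (14493 and 10715) are both tied to CPU 6 while none is listed on CPU 2, which is also the only CPU showing idle time. That would be consistent with two processes sharing one core at that moment, but a single sample is not proof of it.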
Areca 1880?
Hi, all, Is there any support for the Areca 1880, be it in -current or someone's not-yet-checked-in tree? Thanks, John Klos
Interesting USB observation
Hi, While this may be known and not that unexpected, I thought I'd share a little observation. I have a quad core amd64 system running i386 NetBSD 5 (I haven't figured out how to get the code to run properly under amd64). I'm running some CPU intensive applications which will take almost a week and are generating a couple of terabytes of intermediate data. I was running low on space on one of the SATA drives, so I connected a 1 TB USB drive.

Observation 1: when scp'ing files from another machine to one of the SATA connected drives at 35 MB/sec, I started another scp to the same drive; the total rate went up to 50 MB/sec (which I think is all the drive can do). While just one of the scps was running, I started an scp to the USB drive. The total rate for both scps dropped to 10 MB/sec!

Observation 2: even when reading a modest amount of data from the USB drive (about 3 MB/sec), the CPU intensive tasks get slowed down significantly. Even though at least two of them are getting their data from the SATA drives, they're anywhere from 10% to 50% idle while USB is being used for anything. When I stop talking to the USB drives, they then run at 96% to 98% CPU each. While transferring over USB, system overhead for all four CPUs is less than 5% per CPU, and interrupt overhead is less than 2% for one CPU and 0% for the other three.

With no USB transfers and two tasks running off of a SATA drive transferring about 5 MB/sec, the total number of interrupts is around 200 to 250 a second. When running tasks or transferring over USB, that jumps to about 2000 to 2300! I always knew that USB kinda sucked, but I had no idea it is this bad! Does anyone know if other OSes have this much of a performance impact when using USB? Is this just due to really horrible interrupt handling and overhead?

ohci0 at pci0 dev 18 function 0: vendor 0x1002 product 0x4397 (rev. 0x00)
ohci0: interrupting at ioapic0 pin 16
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
...
ehci0 at pci0 dev 18 function 2: vendor 0x1002 product 0x4396 (rev. 0x00)
ehci0: interrupting at ioapic0 pin 17
ehci0: dropped intr workaround enabled
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controllers, 3 ports each: ohci0 ohci1
...

Thanks, John Klos
WAPBL and IDE mac68k
Hi, I've been having a problem with my Quadra 630 system panicking on boot. It was hard to figure out what was going on - a netbsd-5 kernel panics, a -current kernel boots, but a -current kernel from 31-December-2009 works. It happens even when I try to boot to single user mode, because I see the message saying "/: replaying log to memory" right before it panics. Not sure why the journaling stuff happens when booting in single user mode without mounting any filesystems, but that's what it is. When I moved the drive to the same machine's SCSI bus, it worked fine with any kernel, so this is specific to the IDE bus of the Quadra 630-type machines. (1) How does one start up in single user mode WITHOUT filesystems getting read? (2) Who knows enough about WAPBL and IDE busses to guess where to look for a possible solution to this problem? Thanks, John Klos
Re: Hardware RAID problem with NetBSD 5?
How old is your kernel ? I've had strange issues with ahci and interrupts on some machines, and a fix (or rather a workaround) for this has just been pulled up to netbsd-5. It could be your problem as well ... I had been trying 5.0.2 because I had to keep reinstalling, but I'll try a new kernel. Thanks, John
Hardware RAID problem with NetBSD 5?
Hi, I'm not sure if this is a problem with the motherboard I'm using or with NetBSD, but here goes anyway. I have an MSI MS-7511 amd64 motherboard which has a form of hardware RAID on the motherboard. However, after using it for a few minutes, the machine locks up HARD. I can't get into the debugger, I can't get any sort of activity, and even the reset button doesn't work! It makes me think that perhaps the system disables the reset so that writes which need to go out don't get interrupted the very instant that the reset button is pushed, but if that's the case this would be the first time I've seen anything like this on an x86 system.

If I set the SATA controller to AHCI mode in the BIOS, the system can run for days. If set to RAID (I'm simply mirroring two 1 TB drives), it might finish a newfs and untargzip a set, but never all of them. The drives show up as wd* even when they're configured in a mirror:

ahcisata0 port 2: device present, speed: 1.5Gb/s
ahcisata0 port 4: device present, speed: 3.0Gb/s
ahcisata0 port 5: device present, speed: 3.0Gb/s
wd0 at atabus3 drive 0:
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 78167 MB, 158816 cyl, 16 head, 63 sec, 512 bytes/sect x 160086528 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd1 at atabus5 drive 0:
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata0:4:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd2 at atabus6 drive 0:
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(ahcisata0:5:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
...
ataraid0: found 1 RAID volume
ld0 at ataraid0 vendtype 3 unit 0: nVidia ATA RAID-1 array
ld0: 931 GB, 121601 cyl, 255 head, 63 sec, 512 bytes/sect x 1953525120 sectors
opendisk: can't open dev wd1 (16)
opendisk: can't open dev wd2 (16)
opendisk: can't open dev wd1 (16)
opendisk: can't open dev wd2 (16)
opendisk: can't open dev wd1 (16)
opendisk: can't open dev wd2 (16)
...

wd0 in this boot is another disk which isn't configured as part of any RAID. Strange... Does anyone have any ideas? Has anyone seen behaviour like this, particularly the reset button getting disabled? Thanks, John
4k sector sizes on new disks?
Hi, all, Does newfs automatically choose 4k fragment sizes for new drives with 4k sectors? I'm wondering how much fallout there will be when these drives become more common. There's a story about it on Slashdot: http://hardware.slashdot.org/story/10/02/14/1541244/Linux-Not-Quite-Ready-For-New-4K-Sector-Drives John