Re: raidframe and gpt
hello. If you put a gpt label on a raid set and add partitions to it, those partitions will show up as dkn devices automatically, when the raid set is configured. This is true whether the raid set is configured automatically by the kernel at boot or manually via raidctl -C or -c. This has been true since NetBSD-5. The NAME= syntax in the fstab file works as well. -thanks -Brian
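As an illustration of the NAME= syntax mentioned above, an /etc/fstab fragment might look like the following. This is only a sketch: the labels "root" and "swap" are hypothetical and would have to match the labels actually given to the GPT partitions on the raid set.

```
# /etc/fstab fragment; the labels "root" and "swap" are assumed examples
NAME=root	/	ffs	rw	1 1
NAME=swap	none	swap	sw	0 0
```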
Re: openssl3+postfix issue (ca md too weak)
hello Ken. Yes, I missed that part of what you were trying to say. You're right, I didn't try that. I'm not sure that's possible when configuring SSL with sendmail. I elected to arrange for sendmail to have access to valid public certificates so it could present a certificate both as a server, when receiving mail, and as a client, when sending mail. Thanks for the clarification. -thanks -Brian
Re: openssl3+postfix issue (ca md too weak)
hello Ken. It may be that the RFC says the client need not present a valid certificate, but I have found that smtp clients I manage that want to send mail to Microsoft managed domains cannot set up an SSL encrypted smtp session unless the client presents a valid certificate as part of the key negotiation process. This may be something they're doing in violation of the RFC, but I found when I configured sendmail to present a valid certificate, one that could be verified versus a self-signed certificate, mail which wasn't flowing began flowing again. Note I'm not talking about an smtp-auth situation where an individual user is authenticating to a smtp service, but rather server-to-server communications where two smtp MTA agents want to exchange mail with each other. -thanks -Brian On Nov 14, 9:30am, Ken Hornstein wrote: } Subject: Re: openssl3+postfix issue (ca md too weak) } > Hello Taylor. Just as a point of reference, smtp clients that } >connect to domains hosted by Microsoft, i.e. outlook.com and any other } >domains that use their infrastructure for e-mail, will have to present } >a valid SSL certificate in order to submit mail to their smtp servers. } } I do not believe this statement is correct. My reading of RFC 8461 } is that all it says is that the _server_ has to have a valid certificate } and says nothing about client certificates. In my limited experience } configuring your SMTP _client_ to present a certificate is very very } rare. } } --Ken >-- End of excerpt from Ken Hornstein
Re: openssl3+postfix issue (ca md too weak)
Hello Taylor. Just as a point of reference, smtp clients that connect to domains hosted by Microsoft, i.e. outlook.com and any other domains that use their infrastructure for e-mail, will have to present a valid SSL certificate in order to submit mail to their smtp servers. But that is a different issue than Manuel is describing, as I understand it. I think he is saying that the server is presenting an SSL certificate that his client doesn't like when he tries to send mail to an external smtp server. In that case, I agree with you, his client shouldn't be overly concerned about whether the server presented SSL certificate can be verified all the way down the verification chain. I guess it's fine if it does the verification and puts a note in the headers, but it shouldn't stop mail from going out. -thanks -Brian
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
hello. Yes, this behavior is expected. It ensures that there is no conflict between the device on the domu end of the vif port and the device on the dom0 end. This is more sane behavior than FreeBSD, which zeros out the MAC address on the dom0 side of the vif. -thanks -Brian
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
hello. A couple of quick questions based on the conversation and the snippets of logs shown in the e-mails. 1. Is the MAC address shown in the ARP replies the correct one for the dom0? No reason it should be wrong, but it's worth verifying, just in case there is an unknown host replying on the network. 2. Can you capture the same tcpdumps using the -e flag? The -e flag will print the source and destination MAC addresses, as well as the source and destination IP addresses or host names, depending on whether you use the -n flag. This might provide additional insight into what's happening on the network. -thanks -Brian
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
Hello. Here are the network configuration settings I've been using for a number of years, all the way through -current.
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
-thanks -Brian
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
Hello. The ARP cache timeout used to be 1200 seconds or 20 minutes, hard coded. Now, it looks like it's either 1200 seconds or 300 seconds; I'm not sure which after a quick romp through the kernel source. In any case, the fact that you're getting regular delays on your pings suggests there is a delay between the time when the arp cache times out and when it gets refreshed. As a consequence of that delay, if you have a high speed stream running when the cache times out, it's possible the send buffer of the sending process, i.e. sshd, is filling up before that cache gets refreshed and the packets can flow again. What is the value of net.inet.tcp.sendbuf_max on your dom0? Also, is net.inet.tcp.sendbuf_auto set to 1? If not, try setting that to 1 with sysctl(8) and see if that changes the behavior at all. -thanks -Brian
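For anyone wanting to check the two values asked about above, here is a small sketch of a helper. It assumes sysctl prints its output as "name = value" lines, and the function name sendbuf_report is made up for this example; it only reads text on stdin and changes nothing on the system.

```sh
# Summarize the two TCP send buffer settings from sysctl output.
# Assumption: input lines look like "net.inet.tcp.sendbuf_auto = 1".
# sendbuf_report is a hypothetical helper name, not an existing tool.
sendbuf_report() {
    awk -F' *= *' '
        $1 == "net.inet.tcp.sendbuf_auto" {
            if ($2 == 1)
                print "sendbuf_auto: enabled"
            else
                print "sendbuf_auto: disabled; try: sysctl -w net.inet.tcp.sendbuf_auto=1"
        }
        $1 == "net.inet.tcp.sendbuf_max" {
            printf "sendbuf_max: %d bytes\n", $2
        }
    '
}

# Typical use on the dom0:
#   sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.sendbuf_max | sendbuf_report
```

The helper only prints a suggestion; actually changing the setting is still done by hand with sysctl(8).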
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
hello. My understanding is that the arp caching mechanism works regardless of whether you use static MAC addresses or dynamically generated ones. The reason is that arp bridges the gap between the layer 2 network, i.e. the MAC addresses, and the layer 3 network, i.e. the IP addresses those MAC addresses map to. You can demonstrate this interaction by shutting down the vif interface to your domu, then deleting the MAC address from the arp cache for that vif by using arp -d, then trying to ping your domu from dom0. After about 20 seconds, you should see the host is down message. Then, use arp -a to look for your domu's IP address. What you'll see in the MAC field is the word "incomplete". If you then run brconfig on the bridge containing the domu, you'll see the MAC address you assigned, or which was assigned dynamically, alive and well. My guess is that you're running into some sort of short term memory crunch inside the dom0's network stack. The long term ping test should provide more details about where this memory crunch might be. The long time favorite variable for this issue is the good ole nmbclusters value, tunable in the kernel config and visible through: /sbin/sysctl kern.mbuf.nmbclusters Although it's a blunt instrument, the output from: netstat -m might be helpful as well. Specifically, the value listed as the number of calls to protocol drain routines. Yet another possibility is if you have a firewall set up, either on the dom0, or on the domu in question. If you're running into some rule that restricts access or bandwidth on the path between the dom0 and the domu, you might see this kind of behavior. Unfortunately, in my experience, when one runs into a firewall issue of this nature, the error messaging around it is very misleading. It's important to remember that the IP stacks on the dom0 or domu, respectively, don't know that the IP address for the machine at the other end of the connection is actually running on the same hardware.
Consequently, if there are firewall rules set up on either the dom0 or the domu in question, or possibly both, be sure your firewall rules provide full access between the dom0 and domu in question, just as you would if you were writing rules for remote machines. The fact that you're only seeing this problem when communicating between the dom0 and the domu, and not between the domu and the rest of the world, suggests to me the problem is on the dom0, so I would start by looking there first. Hope these notes help. -Brian
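To make the netstat -m suggestion above a little more concrete, the following sketch pulls out just the drain counter so it can be sampled over time. It assumes the line of interest reads "N calls to protocol drain routines"; the function name drain_calls is invented for this example.

```sh
# Print the number of calls to protocol drain routines from netstat -m output.
# Assumption: the relevant line looks like "12 calls to protocol drain routines".
# drain_calls is a hypothetical helper name.
drain_calls() {
    awk '/calls to protocol drain routines/ { print $1 }'
}

# Typical use on the dom0, for example once a minute during the ping test:
#   netstat -m | drain_calls
```

A counter that climbs while the ssh session stalls would point toward the mbuf shortage theory; a counter that stays flat would make the firewall or arp explanations more likely.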
Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)
hello. Actually, on the server side, where you get the "host is down" message, that is a system error from the network stack itself. I've seen it when the arp cache times out and can't be refreshed in a timely manner. What happens if you run an extended ping session between the dom0 and domu hosts, starting the ping from the dom0 side? And, by extended session, I mean running a ping session for an hour or two, capturing all the output in a log file. Do you get any packet loss during that interval? If so, what errors does ping show when the loss is occurring? -thanks -Brian
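One possible way to look through the resulting log afterwards is sketched below. It assumes reply lines contain "bytes from" and an "icmp_seq=N" field, and it reports gaps in the sequence numbers; the function name ping_gaps and the file name ping.log are only examples.

```sh
# Report gaps in the icmp_seq numbers of a saved ping log.
# Assumption: reply lines contain "bytes from" and "icmp_seq=N".
# ping_gaps is a hypothetical helper name; it reads files given as
# arguments, or stdin when none are given.
ping_gaps() {
    awk '
        /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; keep only the number
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq > last + 1)
                printf "lost %d packet(s) after seq %d\n", seq - last - 1, last
            last = seq
            seen = 1
        }
    ' "$@"
}

# Typical use:
#   ping -n <domu address> > ping.log 2>&1    (leave running for an hour or two)
#   ping_gaps ping.log
```

Error lines such as "Host is down" are skipped by the filter, so they stay in the log for manual inspection next to the reported gaps.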
Re: 10.0 BETA : Poor audio quality if 3.5mm jack is fully inserted
Hello. I've seen this issue and I believe I understand the problem, though I don't have a driver fix at this time. The issue is that the audio output jack on modern Realtek sound chips can be configured for a number of purposes: mono or stereo, line out or speaker out. Some chips even allow you to configure the jack for both audio input and output. Our hdaudio(4) driver doesn't know how to twiddle the bits in the Realtek chip to configure these changes itself, so whatever the default settings are, or whatever the BIOS/firmware does to the chip before NetBSD loads is what we get. I have this problem as well on my Dell Optiplex 5050. My fix was to modify the hdaudio(4) driver to disable the headset jack, allowing me to use the line out jack on the back of the machine without having to listen to the internal speaker. It is my intention to read through the linux driver and figure out how to twiddle the correct bits on the Realtek chip, but I've not yet learned how our driver works well enough to be able to translate linux driver speak into NetBSD architecture speak. The linux driver is full of defines and simple functions that twiddle the right bits in the correct registers, but I had some trouble figuring out how to map those registers into equivalent definitions in our audio framework. The hdaudio(4) driver talks to hdaudio 1.0 standards compliant chips, and the registers I'm looking at are extensions beyond the scope of what our driver knows about. It may be that if hdaudio(4) were extended to talk to hdaudio 2.x devices, this problem would go away. As a side note, the FreeBSD audio drivers have the same issue, except they seem to know even less about the characteristics of specific output devices attached to the audio chips they talk to with their pcm(4) driver, which is the hdaudio(4) equivalent. I don't claim to be an expert on this at all, so if I've got it completely wrong, I'd love to have someone correct me. -thanks -Brian
Re: Issue with samba and xen.
hello. I really don't know what's wrong, but I wonder if the xen logs show the domu running some instruction that's trapped by xen, flagged as dangerous and xen is then faulting the vcpu with an illegal instruction error? I would expect to see that error in the xen log for the specific domu. -thanks -Brian
Re: About kern/57136, panic assertion, probably a diagnotic panic
hello Brad. In reading about your panics on day 6 of uptime, I wonder if the issue might be related to memory allocation? Specifically, by day 6, I expect that memory allocations in the system are pretty fragmented and thus more CPU time is spent doing things like cleaning memory, freeing memory and, perhaps, paging. What happens if you give the domu in question more or less memory in terms of its stability? I'm guessing less memory makes it panic faster, more memory makes it stay up longer. Of course, this is pure speculation on my part and I have no idea what's going on. :) -Brian
Re: Using the audio(4) driver for recording under -current?
hello Michael. Here is a demonstration of the issue I'm seeing, using audiorecord with two audio devices on the same machine. The audio1 device is a USB C-Media audio dongle. The audio0 device is a Realtek, Product ID: 0255, built into this Dell Optiplex 5050 desktop machine. Hopefully, this bug was a transient one that has been fixed in newer kernels. I'll try to do more testing now that you folks have reported that you're getting data from your audio devices. Thanks for the help and the confirmation that things are working in -current. -Brian
%audiorecord -d /dev/sound1 -s 44100 -c 2 -P 16 -e slinear_le -t 1:30 -V test1.wav
sample_rate=44100 channels=2 precision=16 encoding=slinear_le
recording for 90 seconds, 0 microseconds
audiorecord: read failed: Undefined error: 0
Oct 27 11:54:46 mirkwood /netbsd: [ 510110.9126614] audio1: device timeout
%audiorecord -d /dev/sound0 -s 44100 -c 2 -P 16 -e slinear_le -t 1:30 -V test1.wav
sample_rate=44100 channels=2 precision=16 encoding=slinear_le
recording for 90 seconds, 0 microseconds
audiorecord: read failed: Resource temporarily unavailable
Oct 27 11:57:31 mirkwood /netbsd: [ 510276.2426610] audio0: device timeout
Re: Using the audio(4) driver for recording under -current?
hello. The hdaudio driver I'm using is a locally patched version to work around an issue where the driver doesn't configure the headphone jack correctly. Specifically, it seems the default configuration configures the jack for use with a microphone, rather than with just a headset, which is how I use it. The linux driver seems to have some knobs which permit selecting the various operating modes for the jack. Because I needed something working quickly, I reworked the driver to disable the headphone jack and internal speaker. It has been my intention to go back and fix it so we have the knobs as well, but that hasn't happened yet. So, if you know how to add the appropriate controls, I'm very interested. With that said, here's the dmesg output. Remember, it won't look quite right because it's missing a bunch of output ports. -thanks -Brian Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 The NetBSD Foundation, Inc. All rights reserved. Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved. NetBSD 9.99.77 (MIRKWOOD) #0: Tue Jan 19 14:34:20 PST 2021 buh...@loth-9.nfbcal.org:/usr/local/netbsd/obj-current/sys/arch/amd64/compile/MIRKWOOD total memory = 16088 MB avail memory = 15564 MB entropy: entering seed from bootloader with 256 bits of entropy WARNING: module error: module `msdos' pushed by boot loader already exists timecounter: Timecounters tick every 10.000 msec Kernelized RAIDframe activated timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100 efi: systbl at pa dbd17018 Dell Inc. 
OptiPlex 5050 mainbus0 (root) ACPI: RSDP 0xD0ED9000 24 (v02 DELL ) ACPI: XSDT 0xD0ED90B8 F4 (v01 DELL CBX3 01072009 AMI 00010013) ACPI: FACP 0xD0F009F8 00010C (v05 DELL CBX3 01072009 AMI 00010013) ACPI: DSDT 0xD0ED9240 0277B4 (v02 DELL CBX3 01072009 INTL 20160422) ACPI: FACS 0xDB80DF00 40 ACPI: APIC 0xD0F00B08 84 (v03 DELL CBX3 01072009 AMI 00010013) ACPI: FPDT 0xD0F00B90 44 (v01 DELL CBX3 01072009 AMI 00010013) ACPI: FIDT 0xD0F00BD8 AC (v01 DELL CBX3 01072009 AMI 00010013) ACPI: MCFG 0xD0F00C88 3C (v01 DELL CBX3 01072009 MSFT 0097) ACPI: HPET 0xD0F00CC8 38 (v01 DELL CBX3 01072009 AMI. 0005000B) ACPI: SSDT 0xD0F00D00 003176 (v02 SaSsdt SaSsdt 3000 INTL 20160422) ACPI: SSDT 0xD0F03E78 0025A5 (v02 PegSsd PegSsdt 1000 INTL 20160422) ACPI: HPET 0xD0F06420 38 (v01 INTEL SKL 0001 MSFT 005F) ACPI: SSDT 0xD0F06458 000DE5 (v02 INTEL Ther_Rvp 1000 INTL 20160422) ACPI: SSDT 0xD0F07240 0008F6 (v02 INTEL DELL_SFF INTL 20160422) ACPI: UEFI 0xD0F07B38 42 (v01 ) ACPI: SSDT 0xD0F07B80 000EDE (v02 CpuRef CpuSsdt 3000 INTL 20160422) ACPI: LPIT 0xD0F08A60 94 (v01 INTEL SKL MSFT 005F) ACPI: SSDT 0xD0F08AF8 000141 (v02 INTEL HdaDsp INTL 20160422) ACPI: SSDT 0xD0F08C40 00029F (v02 INTEL sensrhub INTL 20160422) ACPI: SSDT 0xD0F08EE0 003002 (v02 INTEL PtidDevc 1000 INTL 20160422) ACPI: SSDT 0xD0F0BEE8 00050D (v02 INTEL TbtTypeC INTL 20160422) ACPI: DBGP 0xD0F0C3F8 34 (v01 INTEL 0002 MSFT 005F) ACPI: DBG2 0xD0F0C430 54 (v00 INTEL 0002 MSFT 005F) ACPI: MSDM 0xD0F0C488 55 (v03 DELL CBX3 06222004 AMI 00010013) ACPI: SLIC 0xD0F0C4E0 000176 (v03 DELL CBX3 01072009 MSFT 00010013) ACPI: TCPA 0xD0F0C658 32 (v02 ALASKA NAPAASF MSFT 0113) ACPI: ASF! 0xD0F0C690 A0 (v32 INTEL HCG 0001 TFSM 000F4240) ACPI: BGRT 0xD0F0C730 38 (v00 ?? 
01072009 AMI 00010013) ACPI: DMAR 0xD0F0C768 A8 (v01 INTEL SKL 0001 INTL 0001) ACPI: 10 ACPI AML tables successfully acquired and loaded ioapic0 at mainbus0 apid 2: pa 0xfec0, version 0x20, 120 pins cpu0 at mainbus0 apid 0 cpu0: Use lfence to serialize rdtsc cpu0: CPU base freq 32 Hz cpu0: CPU max freq 36 Hz cpu0: TSC freq CPUID 319200 Hz cpu0: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3 cpu0: node 0, package 0, core 0, smt 0 cpu1 at mainbus0 apid 2 cpu1: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3 cpu1: node 0, package 0, core 1, smt 0 cpu2 at mainbus0 apid 4 cpu2: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3 cpu2: node 0, package 0, core 2, smt 0 cpu3 at mainbus0 apid 6 cpu3: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3 cpu3: node 0, package 0, core 3, smt 0 acpi0 at mainbus0: Intel ACPICA 20201113 acpi0: X/RSDT: OemId ,
Re: Using the audio(4) driver for recording under -current?
hello. Just to clarify, I'm not trying to use two audio devices for recording at the same time. What I was doing was something like:
#set up the audio device first
audioctl -d /dev/sound2 -w record.rate=44100 record.channels=2 record.precision=16 \
record.encoding=slinear_le
#Now, the mixer device
mixerctl -d /dev/mixer2 -w inputs.line=240,240 record.source=mic inputs.mic.mute=off \
record.volume=240,240
Then, to record:
cat /dev/sound2 > rawrecordingfile
Of course, the above parameters need to be changed for the specific audio device which is to be used, but the resulting behavior is the same, whether I'm using a uaudio device or an hdaudio device. The behavior is:
The recording file gets no data from /dev/soundx
cat fails with an error: resource temporarily unavailable
and, I get kernel messages like:
audio2: device timeout
At least you guys are getting data out of your audio devices. :) -Brian
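As a rough sanity check for a raw recording made this way, the expected file size can be worked out from the recording parameters. The sketch below assumes uncompressed linear PCM with a precision that is a multiple of 8 bits; the function name pcm_bytes is invented for this example.

```sh
# Expected size in bytes of a raw linear PCM recording.
# usage: pcm_bytes RATE CHANNELS PRECISION_BITS SECONDS
# Assumption: uncompressed PCM, precision a multiple of 8 bits.
# pcm_bytes is a hypothetical helper name.
pcm_bytes() {
    echo $(( $1 * $2 * ($3 / 8) * $4 ))
}

# 44100 Hz, 2 channels, 16 bits, 90 seconds:
#   pcm_bytes 44100 2 16 90
# Compare with the size of the file actually written:
#   ls -l rawrecordingfile
```

A file that is empty or far smaller than this figure is consistent with the device timeouts described above rather than with recording from the wrong input.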
Re: Using the audio(4) driver for recording under -current?
hello. thanks for the feedback. In my case, recording doesn't work for hdaudio or uaudio devices. What's strange is that I would expect data to be taken from the wrong input, resulting in a file of silence, but I wouldn't expect kernel messages telling me the audio device timed out. I'll try with a newer kernel. I'll also see if I can test with a 9.x kernel, to make sure it's just a transient error that we had during our march to NetBSD-10. -thanks -Brian On Oct 26, 9:22am, RVP wrote: } Subject: Re: Using the audio(4) driver for recording under -current? } On Tue, 25 Oct 2022, Brian Buhrow wrote: } } > Is anyone using the audio(4) driver for recording successfully under -current? } > } } audiorecord worked for me on 9.99.102: } } ``` } hdaudio0 at pci0 dev 27 function 0: HD Audio Controller } hdaudio0: interrupting at msi1 vec 0 } hdaudio0: HDA ver. 1.0, OSS 4, ISS 4, BSS 0, SDO 1, 64-bit } hdafg0 at hdaudio0: VIA product 8446 } hdafg0: DAC00 2ch: Speaker [Built-In], HP Out [Jack] } hdafg0: ADC01 2ch: Mic In [Built-In] } hdafg0: 2ch/2ch 48000Hz PCM16* } audio0 at hdafg0: playback, capture, full duplex, independent } audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback } audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording } spkr0 at audio0: PC Speaker (synthesized) } wsbell at spkr0 not configured } hdafg1 at hdaudio0: Intel product 2806 } hdafg1: DP00 8ch: Digital Out [Jack] } hdafg1: 8ch/0ch 48000Hz PCM16* } audio1 at hdafg1: playback, capture, full duplex, independent } audio1: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback } audio1: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording } spkr1 at audio1: PC Speaker (synthesized) } wsbell at spkr1 not configured } ``` } } } > The symptom is that after I set up the recording parameters, channels, encoding, sampling rate, } > when I try to read data from the /dev/soundx device, I get resource temporarily unavailable } > errors back from the read calls and 
kernel messages telling me that the audio device has timed } > out. } > } } Perhaps the correct ADC was not selected? See Section 10.6.1 in the Guide[1]. } } -RVP } } [1]: https://netbsd.org/docs/guide/en/chap-audio.html#chap-audio-hdaudio-dacs-adcs >-- End of excerpt from RVP
Using the audio(4) driver for recording under -current?
hello. Is anyone using the audio(4) driver for recording successfully under -current? I'm using an admittedly older -current (9.99.77), but a browsing of the cvs logs doesn't suggest any fixes have been committed since the version I'm running, audio.c, rev 1.86. The symptom is that after I set up the recording parameters, channels, encoding, sampling rate, when I try to read data from the /dev/soundx device, I get resource temporarily unavailable errors back from the read calls and kernel messages telling me that the audio device has timed out. I've reproduced this on multiple uaudio USB devices as well as Realtek built-in audio chips. Playback works fine on all of these devices, and, for the USB devices at least, recording works fine under NetBSD-5. Perhaps there is some setting I'm missing when doing the recording setup calls to set the channels, sampling rate and encoding? I'm using 2 channels, 44,100 sampling rate and slinear_le encoding. I'm not sure if this worked under 9.x or not. Any thoughts would be greatly appreciated. -thanks -Brian
Re: How to map Alt keys in X windows under NetBSD-9?
hello. Thanks for the pointers. The following lines in my .Xresources file fixed things up, mostly. It looks like all is working as I expect, except I can't seem to generate an alt-return key sequence. I just get a bell when I try to do this and the application I'm using doesn't receive the keystroke. However, with that said, this is a huge improvement. -thanks -brian
xterm*AltIsNotMeta: true
xterm*AltSendsEscape: true
How to map Alt keys in X windows under NetBSD-9?
hello. I'm using both the text consoles and the X display on my NetBSD-9 based machine and I'm running into an issue with the keyboard that I think should be simple to figure out, but which I'm finding a bit confusing. Using a USB attached keyboard, when I'm running in the consoles, the alt keys generate a 0x1b (hex) or 27 as the 16-bit key code sent with other keys. That is, applications see 0x1b + the actual ASCII code of the key being pressed with the alt keys. Under X, a different key is sent to the applications running in xterm. I guess I have two questions: 1. How can I figure out which code is being sent when the alt keys are pressed under X? 2. How can I change which code is received by the applications running under xterm so they match what is seen under the text consoles? -thanks -Brian
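To illustrate the console behavior described above, here is a small sketch that emits the byte sequence an application would see for an alt-modified key, assuming the convention is simply an ESC byte (0x1b) followed by the key's own ASCII code. The function name alt_seq is made up for this example.

```sh
# Emit the bytes for an Alt-modified key as described for the text console:
# an ESC byte (0x1b) followed by the key's own character.
# alt_seq is a hypothetical helper name.
alt_seq() {
    printf '\033%s' "$1"
}

# Show the bytes in hex; for the "a" key this is 1b followed by 61:
#   alt_seq a | od -An -tx1
```

For the first question, reading raw bytes the other way around (for instance running od -An -tx1 in an xterm and pressing the key, then Return and Control-D) is one way to see what the terminal actually sends.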
SSH/SCP INSTALLED ON INSTALL IMAGE?
hello. How hard would it be to add ssh/scp to the install image that gets grafted into the netbsd-INSTALL kernel? Is it not there because we don't have room on the install image? Not having ssh capabilities from the install miniroot environment is becoming a real impediment to doing new installs with images copied from other working machines. One of my standard techniques for building a new system is to build a standard machine and keep it as a prototype. Then, dump/restore its image to a new installation and tweak as necessary for that specific application. This gives me all the packages and kernel options I need for our production environment without having to recreate them every time I build a new machine. By booting the NetBSD-INSTALL kernel, I can escape to a shell, partition the disks on a new machine, mount them and write these saved images directly to the disk, thereby getting a new working machine in one step without having to install each package via script or manually. Traditionally, I've been copying these images around via ftp, but as I begin using more cloud based services, it seems like a real security risk to not be able to copy such images in a secure manner. Thoughts?
Re: ssh, HPN extension and TCP auto-tuning
hello. Refresh my memory. Is it the case that the HPN code only runs if both ends of the ssh connection support HPN and have it turned on? I've been using it for a very long time under NetBSD-5, but I notice that newer versions of openssh as shipped on FreeBSD don't show HPN support in the banner. So, I assume any HPN support I have enabled on NetBSD is not used when talking to FreeBSD machines. Is that true? If that's correct, then I've not noticed a significant drop in network performance without HPN enabled in recent versions of openssh, as shipped with FreeBSD-13 and NetBSD-9. Are there other OS's that support HPN natively? -thanks -Brian
Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)
hello. that's interesting. Do the cores used for the vms also get used for the host os? Can you arrange things so that the host os gets dedicated cores that the vms can't use? If you do that, do you still see a performance drop when you add cores to the vms? -thanks -Brian
Re: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nvmm + ZVOL)
hello. If you want to use zfs for your storage, which I strongly recommend, lose the zvols and use flat files inside zfs itself. I think you'll find your storage performance goes up by orders of magnitude. I struggled with this on FreeBSD for over a year before I found the myriad of tickets on google regarding the terrible performance of zvols. It's a real shame, because zvols are such a tidy way to manage virtual servers. However, the performance penalty is just too big to ignore. -thanks -Brian On Aug 17, 5:43pm, Matthias Petermann wrote: } Subject: Qemu storage performance drops when smp > 1 (NetBSD 9.3 + Qemu/nv } Hello, } } I'm trying to find the cause of a performance problem and don't really } know how to proceed. }
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
hello. I just spun up a Linode server, which I think is also running a Q35 virtual chip set and it seems to work just fine, except that in paravirtual mode it doesn't detect all the disks. Is that the issue you all are seeing? -thanks -Brian NetBSD 9.99.77 (LINODE) #0: Fri Aug 12 20:23:43 PDT 2022 buh...@loth-9.nfbcal.org:/usr/local/netbsd/src-9977/sys/arch/amd64/compile/LINODE total memory = 4095 MB avail memory = 3943 MB entropy: entering seed from bootloader with 256 bits of entropy timecounter: Timecounters tick every 10.000 msec Kernelized RAIDframe activated timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100 QEMU Standard PC (Q35 + ICH9, 2009) (pc-q35-7.0)
Re: iscsi target on a zfs zvol?
hello Brad. Yes, I've fooled with the block size, cache sizes, a bunch of other variables. If you search for slow read/write performance with zvols under FreeBSD or Linux, you'll find a number of references to this problem, both directly in the openzfs development bug reports and as bug reports for TrueNAS and other file server packages. I've been struggling with this speed problem for over a year and a half with ZFS and FreeBSD and when I finally admitted defeat and began using flat files in zfs filesystems, I discovered the true magnitude of the problem. My read/write performance jumped by a factor of 5, which really astounded me. -thanks -Brian On Jul 16, 10:32pm, Brad Spencer wrote: } Subject: Re: iscsi target on a zfs zvol? } Brian Buhrow writes: } } > hello. Yes, I was vaguely aware of the lack of extended attributes for NetBSD-Zfs, but } > what I was suggesting was just using a flat file, exported via iscsi through istgt or your } > initiator of choice, on top of zfs, rather than a zvol, because you'll find the read/write speed } > to be so much faster. Unfortunately, it seems the upstream zfs maintainers have decided that } > zvols are not worth the time to optimize, so while they're functional, they're not performant } > under any openzfs-using implementation. This makes me sad because zvols are such a tidy way to } > manage so many different kinds of things. } > } > -thanks } > -Brian } } } I freely admit that I don't use zvols very much in NetBSD, but did you } mess with the volblocksize any on the volume?? }
Re: iscsi target on a zfs zvol?
hello. Yes, I was vaguely aware of the lack of extended attributes for NetBSD-Zfs, but what I was suggesting was just using a flat file, exported via iscsi through istgt or your initiator of choice, on top of zfs, rather than a zvol, because you'll find the read/write speed to be so much faster. Unfortunately, it seems the upstream zfs maintainers have decided that zvols are not worth the time to optimize, so while they're functional, they're not performant under any openzfs-using implementation. This makes me sad because zvols are such a tidy way to manage so many different kinds of things. -thanks -Brian
Re: iscsi target on a zfs zvol?
Hello. While this is orthogonal to the task you're working on in this e-mail, I'll note that you'll get much better read-write performance if you create a standard zfs filesystem for your time machine backup, then create a regular file in it which you export via iscsi. I discovered this the hard way after years of wondering why I couldn't get zvols to give me the performance I expected. The performance difference is on the order of 4-5 times better with files under zfs than with zvols. -thanks -Brian
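A minimal sketch of the "regular file instead of a zvol" setup described above follows. It assumes a sparse backing file is acceptable and uses dd to create it; the function name make_backing_file, the dataset name and the paths in the comments are hypothetical and not taken from any of the messages.

```sh
# Create a sparse backing file of SIZE_MIB mebibytes at PATH.
# usage: make_backing_file PATH SIZE_MIB
# Assumption: a sparse file is acceptable as the backing store.
# make_backing_file is a hypothetical helper name.
make_backing_file() {
    # count=0 with seek=N sets the file length to N blocks of 1 MiB
    # without writing any data blocks
    dd if=/dev/zero of="$1" bs=1048576 count=0 seek="$2" 2>/dev/null
}

# Hypothetical use, with an ordinary zfs filesystem instead of a zvol:
#   zfs create tank/timemachine
#   make_backing_file /tank/timemachine/backup.img 512000
# The resulting file would then be named as the backing store in the
# iscsi target configuration.
```

Whether the backing file should be sparse or fully preallocated is a separate choice; the sketch only shows that a plain file in a zfs filesystem takes the place of the zvol.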
Re: scp -r incompatibility between -current and NetBSD releases
Hello. What version of openssh are you using? I just tested between NetBSD-5.2 and -current as of 99.77. Those versions are: 5.2: OpenSSH_5.0 NetBSD_Secure_Shell-20080403-hpn13v1 99.77: OpenSSH_8.4 NetBSD_Secure_Shell-20201204-hpn13v14-lpk. Your command, with a nested directory, works in both directions between these two machines without an issue. -thanks -Brian
Re: NetBSD Xen guest freezes system + vif MAC address confusion (NetBSD 9.99.97 / Xen 4.15.2)
hello. The MAC address confusion is explainable. The MAC address on the virtual domain (domu) cannot match the MAC address on the corresponding interface that's created on the dom0 to service the domu. To solve this problem, the xen backend driver increments the fourth octet of the domu's assigned MAC address to generate a unique MAC address for the dom0. FreeBSD, by contrast, zeros out the entire MAC address on the backend. I like the NetBSD approach much better, because it means both the dom0 and domu sides of each connection have a unique MAC address. Even better, by looking at the dom0 MAC addresses by using brconfig(8), it's easy to match the addresses with their respective domu domains. As to the hanging, I'm less sure about that, but it might be that you're not assigning enough memory to the dom0. Unless I'm mistaken, it looks like you're only allocating 512Mb to the dom0. That seems like a very very small amount for the dom0 to do its work. What happens if you allocate 1G of memory? -thanks -Brian On May 27, 10:12am, Matthias Petermann wrote: } Subject: NetBSD Xen guest freezes system + vif MAC address confusion (NetB } } Hello all, } } currently I am not able to instantiate a NetBSD Xen guest on NetBSD 9.99 } (side fact: I also have problems with a Windows guest, but it is not } that important at the moment). 
} } The problem occurs in the following environment: } } - Xen Kernel 4.15.2 and matching Xen Tools from pkgsrc 2022Q1 (built } 29.04.2022) } - NetBSD/Xen 9.99.97 (build 25.05.2022) } } The host is booted with this boot.cfg (if this matters): } } ``` } menu=Boot Xen:load /netbsd-XEN3_DOM0.gz console=pc;multiboot } /xen.gz dom0_mem=512M vga=keep console=vga } ``` } } The guest config looks like this: } } ``` } name = "net" } type="pv" } kernel = "/netbsd-INSTALL_XEN3_DOMU.gz" } #kernel = "/netbsd-XEN3_DOMU.gz" } memory = 2048 } vcpus = 2 } vif = [ 'mac=00:16:3E:01:00:01,bridge=bridge0' ] } disk = [ } 'file:/data/vhd/net.img,hda,rw', } 'file:/data/vhd/net-export.img,hdb,rw' } ] } ``` } } When I try to instantiate the guest, I get the following output on the } controlling terminal: } } ``` } ganymed$ doas xl create net } Parsing config from net } libxl: error: libxl_device.c:1109:device_backend_callback: Domain } 1:unable to add device with path /local/domain/0/backend/vif/1/0 } libxl: error: libxl_create.c:1862:domcreate_attach_devices: Domain } 1:unable to add vif devices } ``` } } At the same time the following message appears on the system console: } } ``` } [ 184.680057] xbd backend: attach device vnd0d (size 1048576000) for } domain 1 } [ 184.910057] xbd backend: attach device vnd1d (size 33554432) for } domain 1 } [ 195.260077] xvif1i0: Ethernet address 00:16:3e:02:00:01 } [ 195.320059] xbd backend: detach device vnd1d for domain 1 } [ 195.350051] xbd backend: detach device vnd0d for domain 1 } [ 195.450054] xvif1i0: disconnecting } ``` } } After the messages appear on the system console, the system does not } respond to any input either via SSH or on the local console. It seems to } be frozen. I can still activate the kernel debugger with } Control+Alt+Escape. } } What surprises me: the 4th digit of the MAC address in the system log } seems to be 1 higher than specified in the guest configuration. 
I have } checked this again because I initially assumed a configuration error. Is } this somehow explainable or might this already be a indication of the } root cause? }
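As an illustration of the MAC address derivation described in the reply, here is a minimal Python sketch, not the driver's actual code. It assumes the backend address is simply the guest address with the fourth octet incremented by one; the function name `backend_mac` is hypothetical, and the wrap-around at 0xff is an assumption, since the thread does not say what the driver does on overflow.

```python
def backend_mac(guest_mac: str) -> str:
    """Return guest_mac with its fourth octet incremented by one."""
    octets = [int(part, 16) for part in guest_mac.split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 6-octet MAC address: {guest_mac!r}")
    # Assumption: wrap at 0xff; the thread does not describe overflow handling.
    octets[3] = (octets[3] + 1) & 0xFF
    return ":".join(f"{o:02x}" for o in octets)


if __name__ == "__main__":
    # Guest address from the quoted configuration.
    print(backend_mac("00:16:3E:01:00:01"))  # 00:16:3e:02:00:01
```

With the guest address from the quoted configuration, the sketch yields the address shown for xvif1i0 in the quoted console log.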
Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers
hello. Following up on this thread yet again, I've figured out a fix for the problem and committed it. Below is the CVS log for the change. -thanks -Brian Source-Changes archive __ * To: source-changes%NetBSD.org@localhost * Subject: CVS commit: src/sys/dev/pci * From: "Brian Buhrow" * Date: Thu, 19 May 2022 04:43:43 + __ Module Name:src Committed By: buhrow Date: Thu May 19 04:43:43 UTC 2022 Modified Files: src/sys/dev/pci: if_bge.c Log Message: For chips which contain an ASF/IPMI firmware, instruct the chip to shut the host ASF firmware down when attaching the device so the IPMI BMC can use the same physical port even when NetBSD doesn't have a network configuration on the device. By contrast, when the device gets a network configuration assigned to it and bge_init() is called, the host ASF firmware is brought up so both NetBSD and the IPMI BMc can use the same physical port. This now matches FreeBSD behavior, as well as behavior from NetBSD-5.2. Tested on a Sunfire X2200-M2 system with the following chip: bge1 at pci7 dev 4 function 1: Broadcom BCM5715 Gigabit Ethernet bge1: interrupting at ioapic0 pin 11 bge1: HW config 00d4, 0014, , bge1: ASIC BCM5715 A3 (0x9003), Ethernet address 00:1e:68:XX:XX:XX bge1: setting short Tx thresholds brgphy1 at bge1 phy 1: BCM5714 1000BASE-T/X media interface, rev. 0 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto To generate a diff of this commit: cvs rdiff -u -r1.352 -r1.353 src/sys/dev/pci/if_bge.c Please note that diffs are not public domain; they are subject to the copyright notices on the relevant files.
Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers
hello. After close to a month of struggling with this issue, I feel like I have a better understanding of the bge(4) driver, but I still don't have a solution to my specific problem. Following up on my original message regarding this issue, see below, I've figured out how to get the port to autonegotiate speed and duplex at boot time, but I still can't get the ASF/IPMI side of the chip to auto-enable itself as it does under V1.152.4.6 of the if_bge.c file. While I can get the IPMI port to work if I log into the machine once it's booted and type: ifconfig bge1 up; ifconfig bge1 down this does me no good if I have to boot the machine in single user mode for some maintenance purpose. And, no, because of the way one of my installations is set up, abandoning the port that's running in dual-use mode from the NetBSD side of the port isn't an option because the machine in question is remotely located and cannot be physically accessed in a timely manner. So, any thoughts would be helpful in tracking this issue down. -thanks -Brian On Apr 20, 11:02pm, Brian Buhrow wrote: } Subject: Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers } hello. Following up on this post, I can now more succinctly describe the problem. } The issue appears to be that when the port is configured at boot time, the media autoselect } code selects 10baset-fdx on the port with ASF running even though the actual speed should be } 100baset-fdx. Typing: } ifconfig bge1 up;ifconfig bge1 down } causes the autoselect code to select the correct speed and duplex. } I realize the ifconfig bge1 down isn't necessary, but I want to show that turning the port off } doesn't revert it to the broken state. } } What I don't understand is what's different between the initial sequence of configuring the } port and doing it again with ifconfig up. 
I've combed through the if_bge.c file, looking at } the initialization differences between bge_init() and bge_attach(), and they look pretty much } the same relative to the handling of the phy. } Clearly, however, they are not. } Also,I've tried to factor out the differences between what the driver in NetBSD-5.2 does, } versus the current driver, since the 5.2 driver works correctly relative to the ASF firmware. } } Any thoughts anyone might have would be greatly appreciated. I feel I'm close to the answer, } but don't yet have it. } -thanks } -Brian } >-- End of excerpt from Brian Buhrow
Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers
hello. The bug is similar to, but not the same as, kern/32767. The ASF firmware provides a virtual console and IPMI management tools for the server on the same physical port as one of the host's ethernet ports. When the system boots, it knocks the IPMI board off the net by virtue of the fact that the phy gets set to the wrong speed by the bge(4) driver. I am explicitly not setting any parameters in /etc/ifconfig.bge1 because, while that would work around the problem if the system boots in multi user mode, it means that if the system needs to be booted into single user mode, one loses the IPMI management system and, as a consequence, access to the console. Worse, it also means that one needs physical access to the machine to reset it. Since these machines live in data centers and must be remotely managed, that is not an acceptable workaround. -thanks -Brian On Apr 21, 12:37pm, ya...@sdf.org wrote: } Subject: Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers } Apologies for the naïve question... } Are you explicitly setting the interface parameters in /etc/ifconfig.bge1 } file? } Or is there some reason you prefer not to? } } >-- End of excerpt from ya...@sdf.org
Re: Using NetBSD-current/amd64 on Sunfire X2200-M2 servers
hello. Following up on this post, I can now more succinctly describe the problem. The issue appears to be that when the port is configured at boot time, the media autoselect code selects 10baset-fdx on the port with ASF running even though the actual speed should be 100baset-fdx. Typing: ifconfig bge1 up;ifconfig bge1 down causes the autoselect code to select the correct speed and duplex. I realize the ifconfig bge1 down isn't necessary, but I want to show that turning the port off doesn't revert it to the broken state. What I don't understand is what's different between the initial sequence of configuring the port and doing it again with ifconfig up. I've combed through the if_bge.c file, looking at the initialization differences between bge_init() and bge_attach(), and they look pretty much the same relative to the handling of the phy. Clearly, however, they are not. Also, I've tried to factor out the differences between what the driver in NetBSD-5.2 does, versus the current driver, since the 5.2 driver works correctly relative to the ASF firmware. Any thoughts anyone might have would be greatly appreciated. I feel I'm close to the answer, but don't yet have it. -thanks -Brian
Using NetBSD-current/amd64 on Sunfire X2200-M2 servers
hello. I'm trying to update some of our very old Sunfire X2200-M2 servers from NetBSD-5.2 to NetBSD-current/amd64. These are the machines with ELOM/BMC modules on one of the Broadcom ethernet ports which share the physical port with the host machine. The latest commit for NetBSD-5.2 fixed pr kern/49657, which also fixed an issue where when the machine booted, it would knock the LOM board off the net. Unfortunately, this problem crops up again with NetBSD-9.x and NetBSD-current sources. I've been poring over the 5.2 driver versus the -current driver to see if I can figure out what the difference is that causes the problem. What I've figured out is that it has something to do with the way the phy gets initialized. Specifically, if the LOM negotiates 100 mbits/sec, full duplex with the switch, when NetBSD boots, it leaves the media set to 10 mbits/sec full-duplex. It's possible to fix the issue manually, by performing an ifconfig up on the interface. That forces the media to reset to the correct value for the NetBSD host and the LOM. Running ifconfig down on the interface after running the initial "up" command doesn't cause the corrected value to go away. That is to say, once one brings the connection "up", it will stay up regardless of the current running state of the interface from NetBSD's perspective. The 5.2 code doesn't do this; it seems to be able to leave the pre-negotiated media value alone. One difference I notice between the two drivers is that the older one uses the MIIF_ANEG flag when it initializes the phy media. So, I tried doing that on the new driver, with no change. Has anyone else run into this with the bge(4) driver and, if so, what fixes or ideas did you come up with to correct the issue? I realize I could work around this with user-land code, but that doesn't solve the problem that if I need to bring the system up in single user mode, I can't do it using the LOM console remotely. 
That's a real problem for me, since I'm miles away from these machines. Any ideas would be greatly appreciated. -thanks -Brian
Re: State of NET_MPSAFE in -9 and/or -current?
hello. Thanks for the reply. I'll check out the document. Good to know TCP and UDP aren't yet MP safe. -Brian
State of NET_MPSAFE in -9 and/or -current?
hello. I'm wondering if someone could comment on the state of using NET_MPSAFE kernels? Is it ready for production use yet? -thanks -Brian
Re: current - unable to rebuild gobject-introspection due to libffi
hello. I recently ran into this problem. To get around it, I built devel/libffi by hand, installed it, then rebuilt all the packages that depend on libffi, including gobject-introspection. In order for me to do this, I needed to pkg_delete the existing version of gobject-introspection on the build system in order to get things to build. Once I did that, however, the process of moving to libffi version 8 was pretty straightforward. This is with pkgsrc sources as of January 24, 2022. -thanks -Brian
Re: IDENTIFY failed
Hello. Without going and reading the probe routines, I wonder if we can create some sort of hybrid approach? Specifically, probe with the shorter delays, then, if we get a timeout, reset and probe with the longer delays? That will let hardware that doesn't exhibit the behavior work with the faster probes, while slightly slowing the non-working hardware during boot, since it's probed twice. Again, I'm not sure how difficult it is to introduce that logic, but it's similar to the logic we used to determine if old PATA drives needed specific ATA commands to address blocks over 148GB, or something like that. (We'd try the command with the standard command and, if it failed, then try it with the altered command and set a quirk.) -thanks -Brian
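The hybrid approach suggested above can be sketched as follows. This is a rough Python illustration of the control flow only, not a patch against the actual probe routines: the names `probe_with_fallback`, `ProbeTimeout` and `SLOW_PROBE`, the callables `identify` and `reset`, and both delay values are hypothetical.

```python
FAST_DELAY_MS = 10    # hypothetical short probe delay
SLOW_DELAY_MS = 100   # hypothetical long probe delay


class ProbeTimeout(Exception):
    """Raised by identify() when the device does not answer in time."""


def probe_with_fallback(identify, reset, quirks):
    """Probe with short delays first; on timeout, reset and retry slowly.

    identify(delay_ms) returns the IDENTIFY data or raises ProbeTimeout.
    reset() resets the channel. quirks is a set that records whether
    the slow path was needed, much like a driver quirk flag.
    """
    try:
        return identify(FAST_DELAY_MS)
    except ProbeTimeout:
        reset()
        data = identify(SLOW_DELAY_MS)  # a second timeout propagates
        quirks.add("SLOW_PROBE")
        return data
```

Hardware that answers the fast probe never pays for the longer delays; only devices that time out are reset and probed a second time, and the recorded quirk lets later code know which path was taken.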
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello. No. The .lss file change doesn't work because there's no text rendered to highlight. The showcursor problem is a different one and I think that one has been solved with Brett's current fix, but until the current non-rendering of pop-up selections is fixed, I can't say for sure what works and what doesn't, because once the screen is not rendered properly, I become less able to tell what should be there and what should not. -thanks -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello. Okay. I tried -nocolor with lynx + ncurses. It works fine and doesn't demonstrate the same problem. This is a different problem -- the select choices in the pop-up window are not rendered at all, it's not that they're rendered in a transparent color, they're just not there. I think there is still an issue with the base libcurses library and, like Brett, I think it's different than the problem he already solved. Good to know about the color issue, though. -thanks -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
Hello. If that is the problem, why does lynx not demonstrate this problem when linked against libncurses? -thanks -Brian On Sep 16, 8:57am, RVP wrote: } Subject: Re: Help with libcurses and lynx under NetBSD-9 and -current? } On Wed, 15 Sep 2021, Brian Buhrow wrote: } } This, actually isn't a curses issue. In this situation, on colour } terminals, lynx highlights the selected item in yellow; on mono } terms, no highlight is applied at all. You can change this by using } a custom .lss file. Just copy the default /usr/pkg/etc/lynx.lss as } ~/.lynx.lss, then make this small change: } } $ diff /usr/pkg/etc/lynx.lss ~/.lynx.lss } 64c64 } < menu.active: normal: yellow: black } --- } > menu.active: reverse:yellow: black } $ } } Run lynx as: lynx -lss=$HOME/.lynx.lss } You can also: export LYNX_LSS=$HOME/.lynx.lss } } Apart from minor display glitches, it works out OK. } } -RVP >-- End of excerpt from RVP
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello Brett. I'm wondering if you saw my feedback to the PR you worked on regarding the libcurses issue? Here it is, in case you missed it. -thanks -Brian --- Forwarded mail from "Brian Buhrow" From: Brian Buhrow Date: Tue, 10 Aug 2021 11:48:09 -0700 In-Reply-To: <20210720165501.8dd341a9...@mollari.netbsd.org> X-Mailer: Mail User's Shell (7.2.6 beta(4.pl1)+dynamic 2103) To: gnats-b...@netbsd.org, bl...@netbsd.org, gnats-ad...@netbsd.org, netbsd-b...@netbsd.org Subject: Re: pkg/55931: Lynx-2.8.9rel.1 doesn't work with libcurses in NetBSD-9 Cc: buh...@nfbcal.org X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (nfbcal.org [127.0.0.1]); Tue, 10 Aug 2021 11:48:10 -0700 (PDT) X-Status: X-Keywords: X-UID: 8592 Status: OR Hello Brett. My apologies for taking so long to look at this issue. Your fixes improve the situation, but don't appear to entirely solve it. Specifically, in lynx, if trying to select from a drop down menu, the options for the drop down menu do not appear on the screen. Here's how to reproduce the issue I'm seeing and my environment, in case that helps clarify the problem. 1. Run lynx against any web site. lynx -show_cursor 2. While lynx is running, press o to open the options page. 3. Arrow down until you reach the SSL prompting menu item. 4. press return on that menu item. This should produce a popup window in the middle of your text and your arrow keys should let you select one of the choices. 5. On a working system, I get the popup window and I can arrow around. On the libcurses with your changes, nothing appears on the screen, though the arrow keys do change the choices from inside Lynx, meaning when you hit return to select an option, you can see you've changed your selection back on the main screen. 
My environment is as follows: NetBSD mirkwood.nfbcal.org 9.99.77 NetBSD 9.99.77 (MIRKWOOD) #0: Tue Jan 19 14:34:20 PST 2021 buh...@loth-9.nfbcal.org:/usr/local/netbsd/obj-current/sys/arch/amd64/compile/MIRKWOOD amd64 Just so you're clear, I built a new build of the entire NetBSD-current sources that I've been running for a while, with your patch installed. That way, your patch is the only change from the environment I've been using. My shell environment looks like: HOME=/usr/home/buhrow SHELL=/bin/csh TERM=window-v2 LOGNAME=buhrow USER=buhrow PATH=/bin:/usr/bin:/usr/pkg/bin:/usr/local/bin:/usr/sbin:/usr/X11R7/bin:/usr/games:/usr/home/buhrow/bin:. PWD=/usr/home/buhrow AUDIODEV=/dev/sound0 TERMCAP=WW|window-v2|window program version 2: :am:bs:da:db:ms:pt:cr=^M:nl=^J:bl=^G:ta=^I: :cm=\EY%+ %+ :le=^H:nd=\EC:up=\EA:do=\EB:ho=\EH: :cd=\EJ:ce=\EK:cl=\EE:me=\Er^?:co#100:li#18:se=\ErA:so=\EsA:mr=\EsA:ue=\ErD:us=\EsD:ae=\ErH:as=\EsH:al=\EL:dl=\EM:kb=^H:ku=^[OA:kd=^[OB:kl=^[OD:kr=^[OC: WINDOW_ID=7 Hopefully this is a simple problem for you to reproduce or you can tell me I'm doing something silly that's creating the trouble. -thanks -Brian --- End of forwarded message from "Brian Buhrow"
Re: booting xen [was Re: serial console puzzle]
hello. Because of the BIOS mixup on your serial port numbering, what happens if you change the com1 to com0 on the boot.cfg line where you tell xen to use com1? (Leave the consdev com0 alone for the NetBSD kernel). It may not work, but it might give you more information. -Brian
Re: serial console puzzle
hello Patrick. Are you using a custom kernel with the console defined in it? That overrides any settings in the boot.cfg file. Another thing to check. Are you using a machine that has had NetBSD on it for a long time? Is it possible the boot block on the boot disk is old enough that it doesn't actually read the boot.cfg file? I had one installation for many years where I updated the OS, but the boot block never got updated, so boot.cfg was never relevant on that system. -thanks -Brian
Possible problem with com(4) at 115200 baud when 16550 has only 1 byte in its fifo?
Hello. It looks like there is a problem in the comsoft() routine in sys/dev/ic/com.c. When a panic occurred, I was using com0 on the machine in question, and the port was sending and receiving data at a baud rate of 115200 simultaneously. It's been a long time since I touched this com.c code, but it looks to me like comsoft() doesn't use the mutex comintr() uses to ensure exclusive access. My question is, what happens if comintr() fires, it does its thing, launches comsoft() and, before comsoft() finishes, comintr() fires again? The serial port on this machine has a fifo of 1 byte, so interrupts can come in pretty fast when it's receiving at 115200 baud. On the machine in question, a NetBSD-9.99.77/amd64 device with a 1-byte fifo 16550 compatible serial chip, I was able to reproduce this panic twice within a few minutes of each other. -thanks -Brian
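To put a number on "pretty fast", here is a small back-of-the-envelope calculation in Python. It assumes 8N1 framing (10 bits on the wire per character), which the message does not state, and the function name is hypothetical.

```python
def rx_interrupt_interval_us(baud, bits_per_char=10, fifo_depth=1):
    """Microseconds between receive interrupts at a sustained line rate.

    bits_per_char=10 assumes 8N1 framing: 1 start + 8 data + 1 stop bit.
    fifo_depth is how many characters arrive per interrupt.
    """
    chars_per_second = baud / bits_per_char
    return fifo_depth * 1_000_000 / chars_per_second


if __name__ == "__main__":
    print(f"{rx_interrupt_interval_us(115200):.1f} us between interrupts")
```

Under those assumptions a 1-byte fifo at 115200 baud means 11520 characters per second, or a receive interrupt roughly every 87 microseconds, which is the window in which the soft interrupt handler would have to finish before the hard interrupt fires again.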
Getting the name of a kernel thread?
hello. I just got two panics on a system running NetBSD-9.99.77/amd64. Unfortunately, I didn't have enough swap configured to capture a dump file. However, I did figure out that the problem is in thread 6 of the kernel process, process 0. However, I am having trouble figuring out what the name of thread 6 is. Is there a key word I can use with ps(1) to see the name of a particular thread? I thought I might be able to do it with the wchan field, but for this particular thread, the wchan field is "-". If I can figure out what thread 6 is on this system, it might give me a clue as to where the problem might be. The problem is hard to reproduce, so any data I can get from the running system would be helpful. -thanks -Brian
Re: running xen on current
hello. The difference between UEFI and legacy booting is significant. I'm not sure about the current state of NetBSD and xen-dom0, but with FreeBSD, legacy booting is required unless you're running 13-current. I think NetBSD/xen-dom0 supports UEFI booting, but it requires you use multiboot mode instead of the standard NetBSD boot mode. In addition, I think you need to be running a pretty recent -current, i.e. something since January 1 2021. If you can boot your systems in legacy mode, however, NetBSD-9.x/Xen works very well, except in conjunction with zfs. Hope that helps. -Brian On Apr 15, 9:53am, Patrick Welche wrote: } Subject: running xen on current } I have tried and failed to run xen on 3 -current/amd64 systems with } 3 different failure modes: } } 1) laptop: xen.gz Building a PV Dom0 / ELF: not an ELF binary -> panic/reboot } 2) desktop: XEN3_DOM0 panics including PR port-xen/55978 } 3) server: Trampoline space cannot be allocated; will try fallback -> reboot } } They are all working NetBSD-current/amd64 systems. } } My conclusion was that xen is hopelessly broken, so was quite surprised } by Greg Wood's thread about the finer points of running a guest OS, given } that those systems won't even start the host OS. } } I dug out an old desktop, and to my pleasant surprise it booted XEN3_DOM0, } and I have managed to run some XEN3_DOMUs. } } The difference between the working/broken setups seems to be that the } working one is "BIOS" booting rather than EFI booting. } } Among all your xen success stories, are any of you EFI booting? } } } Cheers, } } Patrick } } } = } } Some extra gory details } } 1) laptop: } } Building a PV Dom0 } ELF: Not an ELF binary } } *** } Panic on CPU 0: } Could not set up DOM0 guest OS } *** } } Reboot in five seconds... } } } 2) desktop: selection of panics in addition to PR port-xen/55978 } } } [ 80.989] panic: LIST_INSERT_HEAD 0xa080073eec28 ../../../../arch/x86/x86/pmap.c:2285 } [ 80.989] cpu13: Begin traceback... 
} [ 80.989] vpanic() at netbsd:vpanic+0x14a } [ 80.989] snprintf() at netbsd:snprintf } [ 80.989] pmap_enter_ma() at netbsd:pmap_enter_ma+0x14e7 } [ 80.989] pmap_enter() at netbsd:pmap_enter+0x32 } [ 80.989] udv_fault() at netbsd:udv_fault+0x100 } [ 80.989] uvm_fault_internal() at netbsd:uvm_fault_internal+0x574 } [ 80.989] trap() at netbsd:trap+0x432 } [ 80.989] --- trap (number 6) --- } [ 80.989] 7a60617787af: } [ 80.989] cpu13: End traceback... } } [ 75.6599981] panic: kernel diagnostic assertion "ncp->nc_dvp == dvp" failed: file "../../../../kern/vfs_cache.c", line 432 } [ 75.6599981] cpu0: Begin traceback... } [ 75.6599981] vpanic() at netbsd:vpanic+0x14a } [ 75.6599981] kern_assert() at netbsd:kern_assert+0x48 } [ 75.6599981] cache_lookup_entry() at netbsd:cache_lookup_entry+0xde } [ 75.6599981] cache_lookup_linked() at netbsd:cache_lookup_linked+0x160 } [ 75.6599981] namei_tryemulroot() at netbsd:namei_tryemulroot+0x298 } [ 75.6599981] namei() at netbsd:namei+0x29 } [ 75.6599981] vn_open() at netbsd:vn_open+0x8f } [ 75.6599981] do_open() at netbsd:do_open+0x119 } [ 75.6599981] do_sys_openat() at netbsd:do_sys_openat+0x74 } [ 75.6599981] sys_open() at netbsd:sys_open+0x24 } [ 75.6599981] syscall() at netbsd:syscall+0x9c } [ 75.6599981] --- syscall (number 5) --- } [ 75.6599981] netbsd:syscall+0x9c: } [ 75.6599981] cpu0: End traceback... } } } 3) server: EFI boot of Feb 6 2021, xenkernel413-4.13.3.tgz, serial console } } On serial console, all that is seen is: } } 2415648+1324000=0x3910ec } Loading /var/db/entropy-file } Loading /netbsd-XEN3_DOM0 } Start @ 0xce60 [1=0xce991000-0xce9910ec]... } Trampoline space cannot be allocated; will try fallback. } } then it reboots >-- End of excerpt from Patrick Welche
Re: I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)
hello. This must be some kind of regression that's been around a while. I'm running a xen dom0 with NetBSD-5.2 and xen-3.3.2, very old, but vnd(4) does expose the entire file to the domu's including FreeBSD 11 and 12 without any corruption or booting issues. Do you know when this trouble began? -thanks -Brian
Re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
hello John. Just for completeness, and perhaps I'm mistaken, but it looks like your new setup isn't actually using tls in its smtp transactions at the moment, see below. Since I'm not familiar with your setup, I could be completely mistaken, but I note it here in case you want to verify that it's working. -thanks -Brian --- Forwarded mail from "John D. Baker" Received: from mail238c25.carrierzone.com (mail448c25.carrierzone.com [209.235.146.218]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.netbsd.org (Postfix) with ESMTPS id 3ED3484C86; Thu, 8 Apr 2021 23:02:18 + (UTC) X-Authenticated-User: jdba...@consolidated.net DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=carrierzone.com; s=mailmia; t=1617921724; bh=6XzMGbgl7szZ3CIWi7d5kU2vpW5Rc8K2pQGJ9P7D+4Y=; h=Date:From:To:cc:Subject:In-Reply-To:References:From; b=sxU6yJLo+boAQ9PI6eYOx6aBgVk6/nBgyGBwURw49jF8rOdHyk1nlOP3YXrDRstsM TQXuA37waCc+XmDQDZ+z2zKpH1ltCn8267DeeYM/+o7RJ77Fon1X9rz2ktk9KHjjvk XfawamtZ5KKPwsNF9nwT7Qpnduj0I1M1yg1x9Hps= Feedback-ID:jdbaker@consoli Received: from david.technoskunk.fur (dsl-dhcp-katytxxchrc-64-92-10-42.consolidated.net [64.92.10.42]) (authenticated bits=0) by mail238c25.carrierzone.com (8.14.9/8.13.1) with ESMTP id 138MfvHE018238; Thu, 8 Apr 2021 22:42:03 +
Re: mail/sendmail not relaying on netbsd-9/sparc, problem with OpenSSL update?
hello. Have you tried running a ktrace on the sendmail process, and its children, to see where things are getting stuck? One other test I can think of is to connect to your ISP's smtp port using openssl from the bad sparc machine and see if you can start up an SSL session using the s_client command. This should tell you whether or not openssl is the problem or if it's somewhere else. Yet another idea is to verify that DNS resolution is working for the afflicted host, by using dig and the always useful sendmail -bt interface. With these tests in hand, you should have a much better idea of what's actually going on. -thanks -Brian
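For a host where Python is handier than the openssl command line, the same kind of TLS check can be approximated with the standard smtplib and ssl modules. This is only a sketch of the idea, not a replacement for the s_client test: the function name `check_starttls` and the `smtp_factory` parameter (there only so the function can be exercised without a network) are hypothetical, and the host name in the usage note is a placeholder.

```python
import smtplib
import ssl


def check_starttls(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Try to upgrade an SMTP session to TLS and report the protocol used.

    Returns the negotiated TLS version string, or None if the server
    does not advertise STARTTLS. Handshake or certificate failures are
    left to propagate as exceptions so the cause stays visible.
    """
    context = ssl.create_default_context()
    with smtp_factory(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        if not smtp.has_extn("starttls"):
            return None
        smtp.starttls(context=context)
        smtp.ehlo()
        # After starttls() the session socket is an ssl.SSLSocket.
        return smtp.sock.version()
```

Calling `check_starttls("mail.example.org")` against the ISP's server would either return something like a TLS version string or raise, and the exception text usually says whether the failure is in name resolution, the connection, or the TLS handshake itself.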
Re: regarding the changes to kernel entropy gathering
Hello. As I understand it, Greg ran into this problem on a xen domu. In checking my NetBSD-9 system running as a domu under xen-4.14.1, there is no rdrand or rdseed feature exposed to domu's by xen. This observation is confirmed by looking at the xen command line reference page: https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html So, it seems the best answer is to update our documentation to say that the xen hypervisor, by default, doesn't provide the rdrand and rdseed instructions to the xen guests and NetBSD doesn't trust the random sources provided by the xennet(4) and xbd(4) drivers. Therefore, the only solution to get randomness working for the first time on a newly installed domu is to write 32 bytes to /dev/random. -thanks -Brian
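The final step can be illustrated with a short Python sketch. It only shows the mechanics of writing a 32-byte seed to the device; the function name `write_seed` is hypothetical, and where the seed comes from is left to the caller on purpose, since it has to originate from a source you trust (for example, generated on another machine) rather than from the unseeded guest itself.

```python
import os

SEED_BYTES = 32  # the amount named in the message above


def write_seed(seed, device="/dev/random"):
    """Write a caller-supplied 32-byte seed to the random device.

    Returns the number of bytes written. The seed must come from a
    trusted source outside the unseeded guest; this function does not
    generate it.
    """
    if len(seed) != SEED_BYTES:
        raise ValueError(f"expected {SEED_BYTES} bytes, got {len(seed)}")
    fd = os.open(device, os.O_WRONLY)
    try:
        return os.write(fd, seed)
    finally:
        os.close(fd)
```

The `device` parameter exists so the function can be tried against an ordinary file before pointing it at the real device.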
Re: zfs: 9 vs current, and ZIL/L2ARC on ssd?
hello Greg. ZFS seems to be much more stable in -current than 9.x. In particular, if you're using xen, then you definitely want -current because ZFS and xen under 9.x use different maxphys values for data transfers, which leads to a lot of corruption and crashes when using zfs as backing store for xen domains. Also, under -current, dom0 can be a multi-vcpu system, which should help performance significantly when it comes to i/o on the domu's, especially if running any kind of hvm or pvh domain. Hope that helps. -thanks -Brian On Feb 11, 1:17pm, Greg Troxel wrote: } Subject: zfs: 9 vs current, and ZIL/L2ARC on ssd? } --=-=-= } Content-Type: text/plain } } } I am about to try to use zfs for the first time and have a few } questions. } } I have a machine that is running NetBSD-9/amd64 with 2 cores, 8G of RAM, } a single 1T SSD, with a smallish root/swap/usr, and about 870 GiB free } intended for zfs. I am heading for one po0l that is not raid at all. } } I'm not all that worried about transitions or stability; this is a build } machine for packages, not particularly precious, and it being down for a } week while I fix it is no big deal. } } I will likely pivot the machine to be xen dom0; I hope that doesn't } matter much (other than 1 core only in the dom0). Or I might use nvmm, } or both. } } I might add a spinning disk later, either internal or USB. (I realize } that there, I probably want both ZIL and L2ARC on SSD. I would rather } move bits later than do things now to ease that, since I do not have an } actual plan.) } } My questions are: } } Is 9/current close enough to the same zfs code that it doesn't matter } which I run? If I'm inclined to run current for other reasons, is } that a bad idea zfs-wise? } } I understand that zfs has an intent log always, and that can be within } the pool, or one can add a ZIL device. With the pool having one } device which is an SSD, I see no point in partitioning off part of } that SSD to be the ZIL. 
} } I understand that zfs has ARC in RAM, and can have L2ARC on disk. } Given that the pool is on SSD, it seems pointless to split off some } for L2ARC. } } My expected answers are: } } The code is basically the same and it doesn't really matter but } probably current has some bugfixes 9 doesn't. There's no reason } current is scary becuase of zfs. } } There is no point in a ZIL on the same SSD as the pool. } } There is really no point in L2ARC on the same SSD as the pool. } } } Corrections/clues appreciated. }
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello Roy and Brett. As usual in these cases, there was a bit of noise in the beginning of this process while I figured out what was wrong and what I was doing wrong. Here is my current understanding of the situation: 1. By default, terminfo reads the TERMCAP environment variable and produces an internal representation of the terminal capabilities, which libcurses can parse and use. That part works fine. 2. Using the TERMCAP environment variable works for most screen applications that use libcurses, except for lynx, which, after correcting for some capability errors in the TERMCAP definition, demonstrates the problem whereby cursor tracking doesn't work when select pop-up menus are in use. 3. The captoinfo program and the terminfo tree holding the terminfo definitions for libncurses are incompatible with the captoinfo program in the base distribution, so if one wants to use libncurses, one needs to have both a .terminfo directory in one's home directory, as well as a .terminfo.cdb file in one's home directory, in order to store the translated capabilities for each curses library. This works because, as I discovered by accident, our libterminfo library doesn't actually utilize the plain text .terminfo file it looks up even if it's in the appropriate format. In order for the libterminfo library to produce a working terminal description, suitable for consumption by libcurses, the one in our base distribution, it either needs to read a .cdb file directly, or translate it on the fly from the TERMCAP environment variable. 4. When lynx is linked against the libncurses library and all of the terminal capabilities are properly presented to the libncurses library, it works fine, properly drawing the screen and presenting the cursor during all operations. 
This leads me to the conclusion that we do have a problem in the native libcurses library, and I suspect it has something to do with the changes to the getch.c file in the library, but I don't understand the internals of curses enough to be very helpful in this regard. If it helps, the same version of lynx, when linked against the libcurses from NetBSD-5.2 works fine. 5. As a further confusion, I tried using lynx inside of tmux under NetBSD-9.1, but I got screen drawing errors. This may have been due to the terminal description issues I worked out in 1 above, but I've not gone back to check again since I got lynx working with libncurses under window(1). -thanks -Brian
Issues with ukbd.c Rev. 1.147?
hello. Since upgrading my kernel to 9.99.77 a couple of weeks ago, I've noticed a strange problem with my USB keyboard. If I'm in vi, editing a file, and I try to scroll down the file using the j key rapidly, instead of scrolling, I get usage errors from vi. In most cases, I'm editing files that I'm reaching via ssh from the host which I upgraded, i.e. I log in using the keyboard in question to the local host, then ssh over to another host, which is on the same LAN and do my editing, which exhibits the problem. The USB controller is an xhci(4) chip, but I don't think that's important, though it may contribute to the problem. I actually think the problem is related to the changes introduced in Rev. 1.146 of sys/dev/usb/ukbd.c. Specifically, the introduction of a ring buffer for keystrokes. I think the comment beginning on line 689 of the 1.147 version of the file, shown below, suggests the problem I'm seeing. In any case, I think the read pointer is bypassing the write pointer in the ring buffer and generating garbage when I hit keystrokes. If I scroll more slowly, then things work. This problem did not occur with NetBSD-9.1, using the same hardware, though it exhibited other USB issues under 9.1. Has anyone else run into this issue? -thanks -Brian /* * Some keyboards have a peculiar quirk. They sometimes * generate a key up followed by a key down for the same * key after about 10 ms. * We avoid this bug by holding off decoding for 20 ms. * Note that this comes at a cost: we deliberately overwrite * the data for any keyboard event that is followed by * another one within this time window. */ NetBSD 9.99.77 (MIRKWOOD) #0: Tue Jan 19 14:34:20 PST 2021 buh...@loth-9.nfbcal.org:/usr/local/netbsd/obj-current/sys/arch/amd64/compile/MIRKWOOD total memory = 16088 MB avail memory = 15564 MB . . . xhci0 at pci0 dev 20 function 0: vendor 8086 product a2af (rev. 
0x00) xhci0: 64-bit DMA xhci0: interrupting at msi0 vec 0 xhci0: xHCI version 1.0 usb1 at xhci0: USB revision 2.0 uhub1 at usb1: NetBSD (0x) xHCI root hub (0x), class 9/0, rev 2.00/1.00, addr 0 uhub2 at uhub1 port 4: API (0x04a5) API USB KB HUB (0x9213), class 9/0, rev 1.00/1.01, addr 2 uhub2: 3 ports with 2 removable, bus powered uhidev1 at uhub2 port 1 configuration 1 interface 0 uhidev1: API (0x04a5) API USB KB HUB (0x0001), rev 1.00/1.01, addr 3, iclass 3/1 ukbd0 at uhidev1
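To make the suspected failure mode concrete, here is a minimal sketch in C of a keystroke ring buffer in which the read index can never pass the write index. This is NOT the code from sys/dev/usb/ukbd.c; the names struct keyring, keyring_put, keyring_get and the RING_SIZE value are all hypothetical, chosen only to illustrate the invariant that a reader "bypassing" the writer would violate.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical capacity; one slot is always kept empty */

struct keyring {
	unsigned char buf[RING_SIZE];
	size_t head;	/* index of the next slot to write */
	size_t tail;	/* index of the next slot to read */
};

/* Store one key code. Returns 1 on success, 0 if the ring is full. */
int
keyring_put(struct keyring *r, unsigned char code)
{
	size_t next;

	assert(r != NULL);
	next = (r->head + 1) % RING_SIZE;
	if (next == r->tail)
		return 0;	/* full: refuse rather than overwrite unread data */
	r->buf[r->head] = code;
	r->head = next;
	return 1;
}

/* Fetch one key code. Returns 1 on success, 0 if the ring is empty. */
int
keyring_get(struct keyring *r, unsigned char *code)
{
	assert(r != NULL && code != NULL);
	if (r->tail == r->head)
		return 0;	/* empty: the reader must stop at the writer */
	*code = r->buf[r->tail];
	r->tail = (r->tail + 1) % RING_SIZE;
	return 1;
}
```

In this sketch a fast typist fills the ring and further keystrokes are dropped, but the reader never returns stale or unwritten slots. If a reader skipped the empty check, it would hand back whatever bytes happened to be in the buffer, which would look much like the garbage described above.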
Multiple bells on console terminal?
hello. I have what is probably a simple question on my new NetBSD-9 system. I have a machine with two sound cards in it -- the native audio chip built onto the motherboard, and a USB audio device that plugs into one of the USB ports. Whenever the terminal bell rings on the console device, the simulated beep sounds first on the internal audio chip, then on the USB audio device. How do I tell wscons which device to sound the console terminal bell on? -thanks -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello. The PREFER.curses=pkgsrc worked fine to get lynx to link against the ncurses library. The ncurses library uses a terminfo database, but it's incompatible with the NetBSD libterminfo library. Fortunately, the NetBSD libterminfo library can get by with just a file called .terminfo.cdb in one's home directory, containing all of the compiled terminal descriptions one plans to use. The ncurses library wants a tree of compiled terminal description files, one for each terminal, in a directory called .terminfo in one's home directory. And, just to make things even more interesting, one must use the appropriate version of tic(1) to compile these descriptions for each of the libraries. The upshot of all of this is the following observations: 1. When lynx is linked against the NetBSD curses library, the .terminfo.cdb file can be used successfully if it contains a terminal description for the terminal you're using with either tmux or window(1). However, when looking at select menus in lynx, the show_cursor feature does not work with the NetBSD curses library. I think, but do not know, this is because the select window is implemented as a pad on top of the regular screen, and it appears that however lynx accomplishes the cursor tracking in this mode doesn't work with our library. 2. When lynx is linked against the ncurses library and the .terminfo tree is properly installed, all screen rendering and cursor tracking works in all modes. So, while our curses library isn't as broken as I originally thought, I do think there is an issue with the library in that it produces this strange behavior with lynx when using select menus. Finally, I want to thank everyone for the help in getting to the bottom of this issue. I'm usually pretty good at chasing down issues, but this one was definitely confounding me. I will try to look at the lynx code and see if I can figure out what it does differently when trying to show cursor tracking for select menus. If it's a simple fix, then I'd like to see us implement it and publish it in our tree. -thanks -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello. Just to clarify. Right now, the cursor is tracking properly when displaying regular screens, but when select popup menus are in use, the cursor goes to the bottom of the screen. I think this is, again, probably due to a translation error between the termcap spec for the terminal and the terminfo spec for the same terminal. I'm working on checking each field between the termcap spec and the terminfo spec. Hopefully, that will fix this remaining issue. -thanks -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
hello. Thanks again for the tip. After reading the terminfo library sources, I realized that while I had a terminfo description of the terminal, it wasn't being used because, even though the library opens a plain text file containing the terminal description, it only uses the compiled version. The reason vi(1), top, more, etc. work is because they still use the TERMCAP variable if it's available. Something I didn't realize until I read the source. So, now, lynx is working better, but it's still not perfect because it doesn't properly display popup windows, as are used for drop down menus. I assume this is because my translation of termcap data to terminfo data is imperfect. I will work on that. -Brian
Re: Help with libcurses and lynx under NetBSD-9 and -current?
Hello. Thanks for the feedback. Some followup questions: 1. How do I get pkgsrc/www/lynx to compile using ncurses instead of the native curses library? I tried setting various options in /etc/mk.conf, but it looks like it really wants to compile using the native curses library. I tried changing options.mk in the pkgsrc directory, but I apparently don't fully understand the maze of pkgsrc Makefiles. 2. I think you're right about terminfo versus termcap, but vi(1) works the same way and it works fine with window(1) in full screen mode. Also, lynx definitely draws a full screen, so it seems like it partially works. I created a .terminfo directory in my home directory to translate the termcap info to the terminfo form, but it doesn't look like that's getting used. Perhaps it is, and my problem is that I have an imperfect mapping between termcap and terminfo that works with vi, more, top, etc. but isn't good enough for lynx. -thanks -Brian On Jan 27, 9:17am, RVP wrote: } Subject: Re: Help with libcurses and lynx under NetBSD-9 and -current? } This might be due to the fact that window(1) relies on setting a } custom TERMCAP environment variable to inform programs running } under it of the term. capabilities it supports, and the curses } library no longer makes use of that. } } With ncurses, building it with the `--enable-termcap' option } makes it use the TERMCAP variable if it set in the environment. } } The ncurses(w) in pkgsrc is not built with that option, so, I } compiled the latest ncurses from source with that option added } and lynx -show_cursor worked just fine under window(1). } } -RVP >-- End of excerpt from RVP
Help with libcurses and lynx under NetBSD-9 and -current?
hello. I'm trying to use lynx, pkgsrc/www/lynx, with NetBSD-9 and NetBSD-current, under window(1), from the misc/window package, and I'm having trouble similar to the trouble described in lib/54263. I use the -show_cursor option to lynx, so the cursor tracks the links on the page. Under NetBSD-9 and under NetBSD-9.99.77, the cursor gets hidden when selecting from drop down menus, and the screen gets garbled much as described in 54263. The same version of lynx, built from the same pkgsrc tree, builds and works fine under NetBSD-5.2, with the native curses there. Might someone be able to shed some light on where the trouble is? I filed a bug for this issue: pkg-55931. I'm sure this is a libcurses problem, rather than a pkgsrc problem at this point, but I didn't realize that when I filed the bug. I use lynx every day, so having this be broken under NetBSD-9 and -current is a bit of a show stopper. Any help anyone can provide on this topic would be greatly appreciated. -thanks -Brian
Re: Best practice for setting up disks for ZFS on NetBSD
hello David. In the absence of other variables, I'd suggest using wedges. That gives you the ability to replace disks that go bad with differently sized disks in the future, while still retaining your zfs vdev sizes, something zfs likes a lot. Also, I'm pretty sure zfs recovers fine from wedge renumbering, at least it does under FreeBSD, much like raidframe does when it's autoconfiguring. I should say that while I have a lot of experience with zfs under FreeBSD, I've not used it much under NetBSD, mostly due to its instability, which is apparently now becoming much less of a problem -- something I'm very happy about. Anyway, hope that helps. -Brian On Dec 3, 12:30am, David Brownlee wrote: } Subject: Best practice for setting up disks for ZFS on NetBSD } What would be the best practice for setting up disks to use under ZFS } on NetBSD, with particular reference to handling renumbered devices? } } The two obvious options seem to be: } } - Wedges, setup as a single large gpt partition of type zfs (eg /dev/dk7) } - Entire disk (eg: /dev/wd0 or /dev/sd4) } } (I'm going to skip disklabel partitions as they are size limited and } also encounter other issues with zpool import) } } Creating disks with single zfs wedges has the advantage of marking } each disk as "hey, zfs in use here", so it should be less likely to } accidentally overwrite it with something else, the wedge layer is } light enough to not add any measurable overhead, and providing the zfs } partition is aligned correctly zfs should be getting close enough to } "the real disk" } } Using the entire disk seems simpler, and the system (including tools } like iostat) are not suddenly cluttered by (in this case) a set of } unnecessary dk entries. } } In the event of disk renumbering both are thrown out, needing a "zfs } export foo;zfs import foo" to recover. Is there some way to avoid } that? } } David >-- End of excerpt from David Brownlee
Etherip(4) interoperability with NetBSD-9?
hello. I have a bunch of NetBSD-5.2 machines networked together using the etherip(4) protocol. I'm interested in beginning to think about how to upgrade that fleet of devices to NetBSD-9 in hopes of capturing better network performance. I know the etherip(4) driver was removed from NetBSD-9, so I have the following questions: 1. I don't remember if the networking stack in NetBSD-9 is multi-threaded or not. Can someone say how much concurrency is available in the NetBSD-9 stack? 2. Is there a way to get the NetBSD-9 stack to speak to NetBSD-5.2 using the etherip(4) protocol without having to update the NetBSD-5.2 machines? -thanks -Brian
Re: Unable to send packets with ixg(4) driver and NetBSD-9_stable
hello. I have not. The machine is remotely located from me and I'm not sure we have anything but Fiberstore SFP+ modules in stock. -thanks -Brian --- Forwarded mail from SAITOH Masanobu Have you tried any other SFP+ modules?
Unable to send packets with ixg(4) driver and NetBSD-9_stable
hello. I'm trying to get a 10G interface working on a NetBSD-9.0_stable/amd64 machine. I'm able to receive packets on this interface, but appear to be unable to send packets, though the driver doesn't report any errors. Is this a known issue? Version and driver details below. This is with NetBSD-9 CVS sources as of 09/03/2020. Ideas on how to go about troubleshooting this would be greatly appreciated. -thanks -Brian NetBSD 9.0_STABLE (GENERIC) #0: Fri Sep 4 09:06:40 PDT 2020 buh...@nat-1.via.net:/usr/local/netbsd/obj-64/sys/arch/amd64/compile/GENERIC total memory = 32759 MB avail memory = 31784 MB . . . ixg0 at pci4 dev 0 function 0: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k ixg0: device 82599EB ixg0: ETrackID 86c5 ixg0: for TX/RX, interrupting at msix2 vec 0, bound queue 0 to cpu 0 ixg0: for TX/RX, interrupting at msix2 vec 1, bound queue 1 to cpu 1 ixg0: for TX/RX, interrupting at msix2 vec 2, bound queue 2 to cpu 2 ixg0: for TX/RX, interrupting at msix2 vec 3, bound queue 3 to cpu 3 ixg0: for TX/RX, interrupting at msix2 vec 4, bound queue 4 to cpu 4 ixg0: for TX/RX, interrupting at msix2 vec 5, bound queue 5 to cpu 5 ixg0: for TX/RX, interrupting at msix2 vec 6, bound queue 6 to cpu 6 ixg0: for TX/RX, interrupting at msix2 vec 7, bound queue 7 to cpu 7 ixg0: for TX/RX, interrupting at msix2 vec 8, bound queue 8 to cpu 8 ixg0: for TX/RX, interrupting at msix2 vec 9, bound queue 9 to cpu 9 ixg0: for TX/RX, interrupting at msix2 vec 10, bound queue 10 to cpu 10 ixg0: for TX/RX, interrupting at msix2 vec 11, bound queue 11 to cpu 11 ixg0: for TX/RX, interrupting at msix2 vec 12, bound queue 12 to cpu 12 ixg0: for TX/RX, interrupting at msix2 vec 13, bound queue 13 to cpu 13 ixg0: for TX/RX, interrupting at msix2 vec 14, bound queue 14 to cpu 14 ixg0: for TX/RX, interrupting at msix2 vec 15, bound queue 15 to cpu 15 ixg0: for link, interrupting at msix2 vec 16, affinity to cpu 0 ixg0: Using MSI-X interrupts with 17 vectors ixg0: Ethernet 
address a0:36:9f:66:47:24 ixg0: PCI Express Bus: Speed 5.0GT/s Width x8 ixg0: feature cap 0x1780 ixg0: feature ena 0x400 ixg1 at pci4 dev 0 function 1: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 4.0.1-k ixg1: device 82599EB WARNING: Intel (R) Network Connections are quality tested using Intel (R) Ethernet Optics. Using untested modules is not supported and may cause unstable operation or damage to the module or the adapter. Intel Corporation is not responsible for any harm caused by using untested modules. ixg1: ETrackID 86c5 ixg1: for TX/RX, interrupting at msix3 vec 0, bound queue 0 to cpu 0 ixg1: for TX/RX, interrupting at msix3 vec 1, bound queue 1 to cpu 1 ixg1: for TX/RX, interrupting at msix3 vec 2, bound queue 2 to cpu 2 ixg1: for TX/RX, interrupting at msix3 vec 3, bound queue 3 to cpu 3 ixg1: for TX/RX, interrupting at msix3 vec 4, bound queue 4 to cpu 4 ixg1: for TX/RX, interrupting at msix3 vec 5, bound queue 5 to cpu 5 ixg1: for TX/RX, interrupting at msix3 vec 6, bound queue 6 to cpu 6 ixg1: for TX/RX, interrupting at msix3 vec 7, bound queue 7 to cpu 7 ixg1: for TX/RX, interrupting at msix3 vec 8, bound queue 8 to cpu 8 ixg1: for TX/RX, interrupting at msix3 vec 9, bound queue 9 to cpu 9 ixg1: for TX/RX, interrupting at msix3 vec 10, bound queue 10 to cpu 10 ixg1: for TX/RX, interrupting at msix3 vec 11, bound queue 11 to cpu 11 ixg1: for TX/RX, interrupting at msix3 vec 12, bound queue 12 to cpu 12 ixg1: for TX/RX, interrupting at msix3 vec 13, bound queue 13 to cpu 13 ixg1: for TX/RX, interrupting at msix3 vec 14, bound queue 14 to cpu 14 ixg1: for TX/RX, interrupting at msix3 vec 15, bound queue 15 to cpu 15 ixg1: for link, interrupting at msix3 vec 16, affinity to cpu 0 ixg1: Using MSI-X interrupts with 17 vectors ixg1: Ethernet address a0:36:9f:66:47:26 WARNING: Intel (R) Network Connections are quality tested using Intel (R) Ethernet Optics. 
Using untested modules is not supported and may cause unstable operation or damage to the module or the adapter. Intel Corporation is not responsible for any harm caused by using untested modules. ixg1: PCI Express Bus: Speed 5.0GT/s Width x8 ixg1: feature cap 0x1780 ixg1: feature ena 0x400
Re: NetBSD bug/misbehavior in vdprintf
hello. I'm pretty sure fprintf can return an error that means there was an i/o error or that something about the underlying file descriptor needs investigating. -Brian On Aug 29, 8:25am, Rob Newberry wrote: } Subject: Re: NetBSD bug/misbehavior in vdprintf } >>> NetBSD's implementation of vdprintf makes a special check -- if the } >>> descriptor is in non-blocking mode, it needs to be a regular file (I } >>> think I read that code correctly). But it apparently doesn't have this } >>> check problem for vfprintf. I think it's been there a long time (since } >>> the introduction of vdprintf), but it makes vdprintf behave differently } >>> than vfprintf. In my view, "vfprintf( FILE, ...)" and "vdprintf( } >>> fileno( FILE ), ... )" ought to behave the same -- but they don't (on } >>> NetBSD) if "fileno( FILE )" has been marked non-blocking and it's not a } >>> regular file. } >> } >> You are right, it should work and I removed the test. } > } > Isn't the situation a bit more complicated? Normally, stdio will ensure } > data isn't just lost for non-blocking sockets on the blocking condition. } > But I don't think the whole dprintf interface allows dealing with error } > conditions in any sane way. } } Is the interface any different for fprintf than dprintf? Does fprintf (by virtue of having a FILE* instead of just a descriptor) have the ability to deal with those errors better? } } } >-- End of excerpt from Rob Newberry
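As an illustration of the point about fprintf reporting errors, here is a minimal sketch in C. It relies only on standard behavior: fprintf returns a negative value when output fails, and because stdio buffers data, a write error on the underlying descriptor may only surface when the stream is flushed. The function name report_line is hypothetical and is not taken from the NetBSD sources; this says nothing about how vdprintf itself is implemented.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one line to fp and push it to the underlying descriptor.
 * Returns 0 on success, -1 if either the formatting/write step or
 * the flush reports an error. On failure errno is normally set by
 * the failing call, and ferror(fp) can be checked afterwards.
 */
int
report_line(FILE *fp, const char *msg)
{
	assert(fp != NULL && msg != NULL);

	/* A negative return means an output or encoding error. */
	if (fprintf(fp, "%s\n", msg) < 0)
		return -1;

	/* Buffered data may only fail once it is actually written out. */
	if (fflush(fp) == EOF)
		return -1;

	return 0;
}
```

A caller that gets -1 back can look at errno (for example EAGAIN on a non-blocking descriptor) and at ferror(fp) to decide whether to retry or give up; a bare descriptor passed to dprintf has no such per-stream error state to consult.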
Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1
Hello. I'm thinking of notebooks. Yes, they have screens and keyboards, but those are not always usable and, having a serial console over USB could let someone install to a notebook remotely. Also, I've encountered some Intel based appliance boards that don't have easily used serial ports on them. When they're installed in cramped wiring closets, it's much easier to get a USB serial port on them than it is to get a screen and keyboard. -Brian On Jul 6, 5:07pm, Martin Husemann wrote: } Subject: Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot } On Mon, Jul 06, 2020 at 07:55:29AM -0700, Brian Buhrow wrote: } > hello. In my case, there are times when I want a serial console, for } > set up or troubleshooting, but cannot use the built-in display for various } > reasons. So, I think it would be useful in more situations than might } > first appear. Yes, it wouldn't give you DDB on that console, but for } > environments where the kernel loads and runs, it would give you access to } > everything else over a serial port. } } Stupid question: are there now actually x86 boards that do *not* have a real } serial on-board? I have not seen any so far (none of the new ones come with } an external connector of course, but they can be added easily unless it is } a notebook). } } Martin >-- End of excerpt from Martin Husemann
Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1
hello. In my case, there are times when I want a serial console, for set up or troubleshooting, but cannot use the built-in display for various reasons. So, I think it would be useful in more situations than might first appear. Yes, it wouldn't give you DDB on that console, but for environments where the kernel loads and runs, it would give you access to everything else over a serial port. -Brian On Jul 6, 6:05am, Mouse wrote: } Subject: Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot } > I agree with Mouse, except that I also think it would be very helpful } > and useful to have a serial console on USB only devices. } } Oh, sure, it'd be helpful/useful. Lots of difficult things would be. } } > I wonder if we could make the console a virtual device which is } > attached dynamically to a USB serial port if and when available. } } I have no doubt that could be done. } } I'm not sure how useful it would be. It seems to me that the set of } times when it's most important to have a serial console overlaps } heavily with the set of times when the USB stack is least likely to } work. } } So I guess my reaction is "probably better than nothing - but not by } all that much". } } Reworking the USB stack so serial ports can run polled would probably } help, but my own experience echos Mike Pumford's. I've even had times } when I've had boot troubles and a *PS/2* keyboard didn't work - and at } least half the times when I've used a USB keyboard as console keyboard } (which != a USB serial port as serial console) it's worked because the } BIOS and hardware have collaborated to give the illusion of a PS/2 } keyboard. } } /~\ The ASCII Mouse } \ / Ribbon Campaign } X Against HTML mo...@rodents-montreal.org } / \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B >-- End of excerpt from Mouse
Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1
Hello. I agree with Mouse, except that I also think it would be very helpful and useful to have a serial console on USB only devices. I wonder if we could make the console a virtual device which is attached dynamically to a USB serial port if and when available. That would let the system think it has a console, but one would only see it when the kernel and the USB subsystem are up. Yes, I get this would make watching things boot challenging, but by the time you get to single user mode, the kernel is fully up and running and USB is or should be available by then. Thoughts?
Re: RAIDframe question
hello. If you reboot again, the raid2 will probably look as you expect. The general procedure for disk replacement is: 1. raidctl -a /dev/newdisk raidset 2. raidctl -F /dev/baddisk raidset (fails the bad disk, uses the spare and reconstructs to it) 3. Raid is left with a used_spare, but all is well. 4. Reboot. All components become optimal. It has long been my desire that once a spare is used, it get automatically promoted to optimal without the intervening reboot. I probably could have made this change with Greg's blessing, but I never did the work. Hope that helps. -Brian On Jun 16, 12:18am, Greywolf wrote: } Subject: Re: RAIDframe question } I don't know what I did to get that volume to recover but ripping } it apart and placing the good component first on reconfiguration } produced a good volume on a rebuild. As I recall it looked a lot like this: } } Components: } component0: failed } /dev/wd1c: optimal } Spares: } /dev/wd0c: spare } component0 status is: failed. skipping label } Component label for /dev/wd1c: }Row: 0, Column: 1, Num Rows: 1, Num Columns: 2 }Version: 2, Serial Number: 1984, Mod Counter: 7232 }Clean: No, Status: 0 }sectPerSU: 128, SUsPerPU: 4, SUsPerRU: 1 }Queue size: 120, blocksize: 512, numBlocks: 976772992 }RAID Level: 1 }Autoconfig: Yes }Root partition: No }Last configured as: raid1 } /dev/wd0c status is: spare. Skipping label. } Reconstruction is 100% complete. } Parity Re-write is 100% complete. } Copyback is 100% complete. 
} } On the other hand, I have the following showing up after } a rebuild (different volume, "raid2", mirrored 2TB disks): } } Components: } /dev/dk0: optimal } component1: spared } Spares: } /dev/dk1: used_spare } Component label for /dev/dk0: }Row: 0, Column: 0, Num Rows: 1, Num Columns: 2 }Version: 2, Serial Number: 3337, Mod Counter: 468 }Clean: No, Status: 0 }sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1 }Queue size: 100, blocksize: 512, numBlocks: 3907028992 }RAID Level: 1 }Autoconfig: Yes }Root partition: No }Last configured as: raid2 } component1 status is: spared. Skipping label. } Component label for /dev/dk1: }Row: 0, Column: 1, Num Rows: 1, Num Columns: 2 }Version: 2, Serial Number: 3337, Mod Counter: 468 }Clean: No, Status: 0 }sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1 }Queue size: 100, blocksize: 512, numBlocks: 3907028992 }RAID Level: 1 }Autoconfig: Yes }Root partition: No }Last configured as: raid2 } Parity status: clean } Reconstruction is 100% complete. } Parity Re-write is 100% complete. } Copyback is 100% complete. } } I've been thru enough different results it's hard to tell whether that is sane; } I would have expected /dev/dk1 to have shifted up to 'optimal' and component1 to } have vanished. } } On Sat, Jun 13, 2020 at 11:48 PM Martin Husemann wrote: } > } > On Sat, Jun 13, 2020 at 09:44:35PM -0700, Greywolf wrote: } > > raidctl -a /dev/wd0c raid1 } > > } > > raidctl -F component0 raid1 } > } > I would have expected that to work. What is the raidctl status output } > after the -a ? } > } > Martin } } } } -- } --*greywolf; >-- End of excerpt from Greywolf
Re: USB keyboard input overrun on EHCI?
hello. My recollection may be slightly wrong here since I'm still running NetBSD-5 in most cases, but my understanding is that ehci(4) connected devices are all USB-2.0 and for slower devices, the uhci(4) or ohci(4) hub drivers provide service. Do these bar code scanners attach as either USB-2 or USB-1, depending on what's available on the hardware or are they USB-1.0 only devices? On older NetBSD-5 systems, if there is an ehci(4) device, but no matching uhci(4) or ohci(4) device, USB-1.0 attached devices just don't work. I don't know how we worked around that in -current, but I wonder if we're trying to run USB-1.0 devices through ehci(4) in some software emulated mode that's not working right? How does the scanner attach to the working hardware versus the broken hardware? -thanks -Brian
Re: panic on zpool create
hello David. I wonder if you're running into another manifestation of kern/54724, which shows zfs corrupting kernel memory in NetBSD-9. There, I show the problem as having to do with xen, but I now believe the problem is entirely with zfs and it was only coincidental that I ran into it with xen. I left the machine in question, running zfs, for a month, doing pretty much nothing, but when I came back to it, I got weird messages like "proc table is full". There were a lot of processes running, but I couldn't determine which one was "really" stuck, since the number of commands I could run was extremely limited. -thanks -Brian On Jan 8, 1:07am, David Brownlee wrote: } Subject: Re: panic on zpool create } On Tue, 7 Jan 2020 at 12:48, David Brownlee wrote: } > } > On Tue, 7 Jan 2020 at 10:07, J. Hannken-Illjes wrote: } > > } > > For some reason locking the directory we want to mount on crashes. } > > } > > Anything special with the root on this machine? } > > } > > Does the directory (/angus_media I suppose) exist? } > } > /angus_media was present after the initial panic, removing it did not } > seem to help. } > } > Ran a fsck -fyP on the system which picked up some issues, rebooted } > and then tried again but the system still panics on zpool import (I'd } > sent a follow up email noting that it appears the create is working } > fine but the import panics, based on being able to create the pool on } > another box running the same OS, but that pool then panics on import } > on this box). } > } > The system is a Dell T320 with three filesystem on an LSI mfii0 } > controller, but I'm testing ZFS as the only two disk on the onboard } > AHCI } > } > I'm going to try a current kernel in a bit (its a server and gateway } > box so I'm trying to minimise downtime :) } } O-Kaayyy... I have another data point. } } The panic does not occur in single user, but does when the system is running. 
} } Given this box takes approximately four minutes to complete POST to } the point it actually starts to boot anything, and has the primary } copy of quite a lot of data I value, my testing tomorrow is likely to } be a little painful... } } Will update when I know more :) } } David >-- End of excerpt from David Brownlee
Problems building gcc48 on NetBSD-9_BETA
hello. I'm running into the following build error when trying to build pkgsrc/lang/gcc48 using pkgsrc-2019Q3 sources. The problem appears to be that libgcc_s.so, which is in /lib on the machine on which I'm building, isn't included in the LDFLAGS argument. Either that, or a locally built copy is in play, which doesn't have the symbols referenced below. Has anyone else run into this issue and, if so, is there an easy fix? -thanks -Brian cc1-checksum.o libbackend.a main.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a ./../intl/libintl.a ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -L/usr/pkg/lib -L/usr/pkg/lib -L/usr/pkg/lib -lmpc -lmpfr -lgmp -rdynamic -L../zlib -lz /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__gttf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__extenddftf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__fixunstfdi@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__eqtf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__lttf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__letf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__addtf3@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__multf3@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__getf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__floatunditf@GCC_4.2.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__netf2@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__subtf3@GCC_3.0' /usr/bin/ld: /usr/pkg/lib/libmpfr.so: undefined reference to `__trunctfdf2@GCC_3.0' collect2: error: ld returned 1 exit status
Panic on NetBSD-9 using zfs receive and xen
hello. I'm experimenting with a NetBSD-9 Xen server running zfs volumes as domU disks. While running a zfs receive of a snapshot of a domU disk from another server, I experienced the following crash. While this could be zfs related, I've been running zfs receives numerous times on the system without a problem. This is on Dom0 with 8GB of RAM and 1 CPU. This is from NetBSD-9_BETA from about September 25, 2019. I wasn't able to get a crash dump, but, maybe, this log will help figure out what's going wrong. It seems like a race condition in the pool code, but I'm not familiar enough with that part of the kernel to provide a lot of detail at the moment. Has anyone else seen this issue? Is it a known problem or should I file a bug? -thanks -Brian [ 12156.8100589] panic: kernel diagnostic assertion "(pp->pr_nout > 0)" failed: file "/usr/local/netbsd/src-90/sys/kern/subr_pool.c", line 1146 pool_do_put: [xbbrp] putting with none out [ 12156.8100589] cpu0: Begin traceback... [ 12156.8100589] vpanic() at netbsd:vpanic+0x143 [ 12156.8100589] kern_assert() at netbsd:kern_assert+0x48 [ 12156.8100589] pool_put() at netbsd:pool_put+0x5b6 [ 12156.8100589] pool_cache_invalidate_groups() at netbsd:pool_cache_invalidate_groups+0x59 [ 12156.8100589] pool_reclaim() at netbsd:pool_reclaim+0x72 [ 12156.8100589] pool_drain() at netbsd:pool_drain+0x85 [ 12156.8200580] uvmpd_pool_drain_thread() at netbsd:uvmpd_pool_drain_thread+0x74 [ 12156.8200580] cpu0: End traceback... [ 12156.8200580] dumping to dev 168,11 (offset=2097007, size=0): not possible [ 12156.8200580] rebooting...
Errors building cmake-3.15.3 with pkgsrc-2019Q3
hello. I'm trying to build devel/cmake from pkgsrc using the pkgsrc-2019Q3 branch of the pkgsrc tree. I'm building on a NetBSD-9.0/amd64 system. The system has xbase and xcomp sets installed from the snapshots on nyftp.netbsd.org from October 17 or so, and the rest of the OS is built from sources as of September 25, 2019 or so. This looks like an X problem, but I have no idea where to look for the missing file. Has anyone run into this error? -thanks -Brian => Bootstrap dependency digest>=20010302: found digest-20190127 ===> Skipping vulnerability checks. WARNING: No /var/db/pkg/pkg-vulnerabilities file found. WARNING: To fix run: `/usr/sbin/pkg_admin -K /var/db/pkg fetch-pkg-vulnerabilities'. => Checksum SHA1 OK for cmake-3.15.3.tar.gz => Checksum RMD160 OK for cmake-3.15.3.tar.gz => Checksum SHA512 OK for cmake-3.15.3.tar.gz ===> Installing dependencies for cmake-3.15.3 == The following variables will affect the build process of this package, cmake-3.15.3. Their current value is shown below: * CURSES_DEFAULT = curses * KRB5_DEFAULT = heimdal * SSLBASE = /usr * SSLCERTS = /etc/openssl/certs * SSLDIR = /etc/openssl * SSLKEYS = /etc/openssl/private Based on these variables, the following variables have been set: * CURSES_TYPE = curses * KRB5BASE (defined, but empty) * KRB5_TYPE = heimdal You may want to abort the process now with CTRL-C and change their value before continuing. Be sure to run `/usr/bin/make clean' after the changes. 
== => Tool dependency gmake>=3.81: found gmake-4.2.1nb1 => Build dependency rhash>=1.3.1: found rhash-1.3.8 => Build dependency cwrappers>=20150314: found cwrappers-20180325 => Full dependency libuv>=1.6: found libuv-1.32.0 => Full dependency curl>=7.65.3nb3: found curl-7.66.0nb1 ===> Overriding tools for cmake-3.15.3 ===> Extracting for cmake-3.15.3 ===> Patching for cmake-3.15.3 => Applying pkgsrc patches for cmake-3.15.3 ===> Creating toolchain wrappers for cmake-3.15.3 ===> Configuring for cmake-3.15.3 => Fixing LOCALBASE and X11 paths. /bin/cp /usr/local/netbsd/pkgsrc/devel/cmake/files/Source_Checks_cm_cxx_cbegin.cxx /usr/local/netbsd/pkgsrc/devel/cmake/work/cmake-3.15.3/Source/Checks/cm_cxx_cbegin.cxx cp: /usr/local/netbsd/pkgsrc/devel/cmake/files/Source_Checks_cm_cxx_cbegin.cxx: No such file or directory *** Error code 1 Stop. make[1]: stopped in /usr/local/netbsd/pkgsrc/devel/cmake *** Error code 1 Stop. make: stopped in /usr/local/netbsd/pkgsrc/devel/cmake
Re: Converting termcap entries to terminfo entries
hello Roy. I must have been tired when I looked at the problem before writing my message. Brad suggested I look at the captoinfo program in the ncurses package again to make sure it actually produced binary output instead of terminfo source. In the ncurses package I'm using, from 2016, captoinfo is a symlink to tic, which may have been part of the issue. In any case, I found some formatting issues with my original termcap file that captoinfo didn't like, and when I fixed them, it produced a file suitable for your version of tic. So, vi and other screen using programs are now working with the window(1) program again. Thank you to you and Brad for writing back with such helpful messages. -thanks -Brian
Converting termcap entries to terminfo entries
hello. I'm in the process of building NetBSD-9.0 systems in an effort to consider upgrading from my fleet of NetBSD-5.2 systems to NetBSD-9. As a long time window(1) user, I have a termcap entry for the window terminal type that I use on systems that I ssh into from window(1) panes. It is my practice to put a termcap and a terminfo database in my home directory on such systems, so that regardless of whether a program at the far end wants termcap or terminfo, it will be able to draw on the screen in full screen mode. What I need is a way of converting the termcap entries I have into a terminfo source file that tic(1) can compile into a .cdb file which can be used on NetBSD-9 systems. I have an older version of captoinfo(1) from the ncurses pkg, but it produces binary terminfo output unsuitable for the tic(1) program. I'm fully aware that window(1) has been deprecated in favor of tmux(1), but I haven't climbed the learning curve of tmux(1) yet and I'm not sure it does everything I get from the window(1) program. So, can someone tell me what program I should use to convert termcap files into terminfo source files suitable for the new terminfo libraries in NetBSD-8 and 9? -thanks -Brian
Problem with building installboot on NetBSD-5.2.
hello. I'm working to build a release of NetBSD-9 on a NetBSD-5.2 system. Among other errors, documented elsewhere, I'm seeing the error shown below. The problem seems to be triggered by a commit Jason made to usr.sbin/installboot/Makefile V1.52. It looks like this conditional block of code doesn't take into account the fact that older versions of gcc might be in use and, thus, not know about the new command line switches. Could someone suggest the best way to fix this problem without breaking builds on newer versions of NetBSD? Below is the error I'm seeing, followed by the diff that causes the trouble. Any suggestions on the best way to correct this problem would be greatly appreciated. -thanks -Brian

# compile installboot/fdt.lo
cc -O -I/usr/local/netbsd/src-90/tools/installboot/../../common/include -I. -I/usr/local/netbsd/src-90/tools/installboot -I/usr/local/netbsd/src-90/tools/installboot/../mips-elf2ecoff -I/usr/local/netbsd/obj-64/tooldir.NetBSD-5.2_STABLE-i386/include/nbinclude -DEVBOARDS_PLIST_BASE=\"/usr/local/netbsd/obj-64/tooldir.NetBSD-5.2_STABLE-i386\" -I/usr/local/netbsd/obj-64/tooldir.NetBSD-5.2_STABLE-i386/include -I/usr/local/netbsd/obj-64/tooldir.NetBSD-5.2_STABLE-i386/include/nbinclude -I/usr/local/netbsd/obj-64/tooldir.NetBSD-5.2_STABLE-i386/include/compat -I/usr/local/netbsd/src-90/tools/compat -DHAVE_NBTOOL_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -DSUPPORT_FDT -I/usr/local/netbsd/src-90/tools/installboot/../../usr.sbin/installboot/../../sys/external/bsd/libfdt/dist -I/usr/local/netbsd/src-90/tools/installboot/../../usr.sbin/installboot -I. -D_KERNTYPES -c -o fdt.lo.o -Wno-error=sign-compare /usr/local/netbsd/src-90/tools/installboot/../../usr.sbin/installboot/../../sys/external/bsd/libfdt/dist/fdt.c
cc1: error: unrecognized command line option "-Wno-error=sign-compare"
*** [fdt.lo] Error code 1

Index: Makefile
===
RCS file: /cvsroot/src/usr.sbin/installboot/Makefile,v
retrieving revision 1.51
retrieving revision 1.52
diff -u -r1.51 -r1.52
--- Makefile    11 Jan 2017 12:19:44 - 1.51
+++ Makefile    7 May 2019 05:02:42 - 1.52
@@ -1,4 +1,4 @@
-# $NetBSD: Makefile,v 1.51 2017/01/11 12:19:44 joerg Exp $
+# $NetBSD: Makefile,v 1.52 2019/05/07 05:02:42 thorpej Exp $
 #
 
 .include
@@ -11,9 +11,17 @@
 ARCH_XLAT+= sun2-sun68k.c sun3-sun68k.c
 
 .if !defined(SMALLPROG) && !defined(ARCH_FILES)
-ARCH_FILES= alpha.c amiga.c emips.c ews4800mips.c hp300.c hppa.c i386.c
-ARCH_FILES+= landisk.c macppc.c news.c next68k.c pmax.c
-ARCH_FILES+= sparc.c sparc64.c sun68k.c vax.c x68k.c
+ARCH_FILES= alpha.c amiga.c
+ARCH_FILES+= emips.c evbarm.c ews4800mips.c
+ARCH_FILES+= hp300.c hppa.c
+ARCH_FILES+= i386.c
+ARCH_FILES+= landisk.c
+ARCH_FILES+= macppc.c
+ARCH_FILES+= news.c next68k.c
+ARCH_FILES+= pmax.c
+ARCH_FILES+= sparc.c sparc64.c sun68k.c
+ARCH_FILES+= vax.c
+ARCH_FILES+= x68k.c
 .else
 ARCH_FILES?= ${ARCH_XLAT:M${MACHINE}-*:S/${MACHINE}-//}
 .if empty(ARCH_FILES)
@@ -23,8 +31,30 @@
 
 SRCS+=${ARCH_FILES}
 
+.if !empty(ARCH_FILES:C/(evbarm)/evboard/:Mevboard.c)
+SRCS+=evboards.c
+.endif
+
+.if !empty(ARCH_FILES:C/(evbarm)/fdt/:Mfdt.c)
+FDTDIR=${.CURDIR}/../../sys/external/bsd/libfdt/dist
+.PATH: ${FDTDIR}
+CPPFLAGS+= -DSUPPORT_FDT -I${FDTDIR}
+SRCS+=fdt.c fdt_ro.c fdt_strerror.c
+# XXX libfdt has some sign-comparison issues
+COPTS.fdt.c+= -Wno-error=sign-compare
+COPTS.fdt_ro.c+= -Wno-error=sign-compare
+COPTS.fdt_strerror.c+= -Wno-error=sign-compare
+.endif
+
+
+.if !defined(HOSTPROGNAME)
+.if !empty(ARCH_FILES:C/(evbarm)/ofw/:Mofw.c)
+CPPFLAGS+= -DSUPPORT_OPENFIRMWARE
+.endif
+.endif
+
 .if empty(ARCH_FILES:C/(macppc|news|sparc|sun68k|x68k)/stg2/:Mstg2.c)
-CPPFLAGS += -DNO_STAGE2
+CPPFLAGS+= -DNO_STAGE2
 .else
 SRCS+= bbinfo.c
 
@@ -47,6 +77,11 @@
 COPTS.${f}.c+= -Wno-pointer-sign
 .endfor
 
+.if !empty(SRCS:Mevboards.c)
+LDADD+=-lprop
+DPADD+=${LIBPROP}
+.endif
+
 LDADD+= -lutil
 DPADD+= ${LIBUTIL}
 .endif
Re: NetBSD on a wireless router?
hello. My Dell Latitude 400 has an Atheros mini-pci wifi card in it that runs great with NetBSD-5.2 in hostap mode, serving 802.11BG clients. I believe it's using the open source HAL code. While this isn't 802.11N or 802.11AC, I use it regularly in this mode to "tether" devices wishing to use my USB Verizon Internet modem. So, if you can find a piece of equipment with a mini-pci slot in it and an antenna, I believe you can get these cards very cheap. I bought mine on Ebay some 11 or 12 years ago. -thanks -Brian
Re: kern/54289 hosed my RAID. Recovery possible?
hello. Yes, raidctl -C with the original config file that created the raid, or one you faked up for the occasion, should get you going again. Once you configure the raid with raidctl -C, you can then run parity checks and filesystem checks without a problem. I've done this sort of thing many times over the years. -Brian
On Aug 15, 12:03am, jdba...@consolidated.net wrote:
} Subject: kern/54289 hosed my RAID. Recovery possible?
} The SiI3214 SATALink card suffers from the identify problem in netbsd-9
} and -current (PR kern/54289).
}
} Booting a netbsd-9 kernel, the drives failed to identify which caused
} RAIDframe to mark the 4 drives on that card (of 8) in my RAID as FAILED.
} Rebooting netbsd-8, the drives identify properly, but are still marked
} as FAILED.
}
} Is there any way to unmark them so the raid will configure and recover?
} Normally 'raidctl -C' is used during first time configuration. Could it
} be used to force configuration, ignoring the FAILED status? Would the
} RAID be recoverable with parity rebuild afterwards?
}
} Thanks.
}
} John D. Baker
}
} Sorry for the poor (or lack of) formatting. I've had to evacuate to my
} ISP's web mail until this is sorted out (or I get my "oil lamps" in
} place).
>-- End of excerpt from jdba...@consolidated.net
Re: Switching ttys to /dev/constty by default (Was: Enabling xdm in sysinst breaks console login?)
hello. What happens with xen consoles or serial consoles? -thanks -Brian
Re: recurring tstile hangs on -current
hello. If I were looking at this issue, I'd be looking at the perl process stuck in bioloc, to see what it's doing. As I understand it, processes stuck in tstile are a symptom, rather than a cause. That is, any process that is waiting for access to some subsystem in an indirect manner will show as waiting in tstile, rather than the actual thing it's waiting for. Perl, on the other hand, is in bioloc, short for biolock I assume, and my question is why? If you can clear that process, I'm thinking everything else will spring to life and begin working again. Just my 2 cents. -thanks -Brian
On Jun 28, 9:42pm, Thomas Klausner wrote:
} 28391 pbulk117033M 6240K bioloc/0 0:00 0.00% 0.00% perl
Re: mcelog?
hello. Does the server on which you're running Xen have a BMC that keeps track of hardware conditions and the like? If it does, then, if mcelog is too hard to port, you might be able to get the details you want from ipmitool through the BMC. To answer your question, it looks like mcelog has been ported to FreeBSD, with some limitations. However, if I remember correctly, there needs to be some support in the kernel for trapping and logging the mce errors and I'm not sure the NetBSD kernel does that. -thanks -Brian
On Mar 20, 11:22am, John Nemeth wrote:
} Subject: mcelog?
} I originally posted this on port-amd64, but didn't get any
} response, so now trying a list with a wider audience.
}
} One of my Xen hosts has been getting this error a lot:
}
} (XEN) Bank 4: 945a4000fd080813 atef3581180
} (XEN) MCE: polling routine found correctable error. Use mcelog to parse above error output.
}
} My research tells me that "mcelog" is a Linux program for
} reading and interpreting the MCE registers. Do we have anything
} like mcelog or anyway to read MCE errors? If not, any idea what
} it would take to port mcelog? It appears to need a device called,
} /dev/mcelog.
}
} In any event, if I'm reading the above correctly, I believe
} that it is telling that there is bad memory?
>-- End of excerpt from John Nemeth
Re: zsh crash in recent -current
hello Robert. Given this code fragment and the discussion you raise about it, allow me to ask what is perhaps a naive question. If the sample you quote is incorrect, what is the correct way to accomplish the same task? -thanks -Brian
On Mar 13, 6:27pm, Robert Elz wrote:
} Subject: Re: zsh crash in recent -current
} Date:Wed, 13 Mar 2019 10:06:42 +
} From:Chavdar Ivanov
} Message-ID:
}
} | I saw the one with the trashed history as well.
} |
} | I don't think it is zsh's problem, though. As I mentioned above, I've
} | used v5.7 since it came out without any problems until perhaps 3-4
} | days ago.
}
} I would guess that maybe there is code like this
}
}     for (list_ptr = list_head; list_ptr != NULL; list_ptr = list_ptr->nxt)
}     {
}         /* do stuff on list */
}         if (element_should_be_deleted) {
}             /* with testing for NULLs added but not shown here */
}             list_ptr->prev->nxt = list_ptr->nxt;
}             list_ptr->nxt->prev = list_ptr->prev;
}             free(list_ptr);
}         }
}     }
}
} which will "work" perfectly with most versions of malloc, as
} that free does not change anything in the memory that has been
} freed, but will collapse in a giant heap if free() scribbles
} over the memory as part of deleting things, which some of the
} dumps that various people have shown on this (and similar) issues
} looks to be what is happening (the scribbling - it is deliberate
} to expose bugs like this one).
}
} Code like the above is easy to write, and most of the time works fine
} (and would have worked with the previous malloc) but will die
} big time when the arena is scrambled (not just zeroed, usually).
}
} Someone should look for something like this in the areas of zsh
} that are crashing, and other programs.
}
} This is far more likely than the new malloc being broken, and just
} only happening to hit a few programs, and is more likely than some
} random memory corruption that simply has never been noticed until
} now.
}
} kre
}
>-- End of excerpt from Robert Elz
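One common answer to the question above is to read the next pointer before calling free(), so the loop never touches freed memory. What follows is only a minimal sketch of that idea, not code from zsh or from the thread: the struct layout, the function names (list_push, list_remove_if, list_len, is_even) and the predicate callback are all hypothetical, and only the nxt/prev member names are borrowed from the quoted fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type; nxt/prev mirror the names in the quoted fragment. */
struct node {
	int value;
	struct node *nxt;
	struct node *prev;
};

/* Push a new node on the front of the list and return the new head. */
struct node *
list_push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->value = value;
	n->prev = NULL;
	n->nxt = head;
	if (head != NULL)
		head->prev = n;
	return n;
}

/*
 * Unlink and free every node for which should_delete() returns non-zero.
 * The successor is saved in "next" BEFORE the node is freed, so the loop
 * increment never dereferences freed memory.  Returns the new head.
 */
struct node *
list_remove_if(struct node *head, int (*should_delete)(int))
{
	struct node *cur, *next;

	assert(should_delete != NULL);
	for (cur = head; cur != NULL; cur = next) {
		next = cur->nxt;	/* saved before any free() */
		if (!should_delete(cur->value))
			continue;
		if (cur->prev != NULL)
			cur->prev->nxt = cur->nxt;
		else
			head = cur->nxt;	/* deleting the first node */
		if (cur->nxt != NULL)
			cur->nxt->prev = cur->prev;
		free(cur);
	}
	return head;
}

/* Count the nodes in the list. */
size_t
list_len(const struct node *head)
{
	size_t n = 0;

	for (; head != NULL; head = head->nxt)
		n++;
	return n;
}

/* Example predicate: delete even values. */
int
is_even(int v)
{
	return v % 2 == 0;
}
```

Building a list holding 1 through 5 with list_push() and then calling list_remove_if(head, is_even) should leave 1, 3 and 5 linked in both directions. The only change relative to the quoted loop is that the increment uses the saved pointer rather than list_ptr->nxt after the free.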
Re: problems with USB/CDC serial (umodem) - devices work with Linux, Mac OS X, and FreeBSD, but not NetBSD
hello. My suggestion is to have two windows open to the box while you're debugging this. In the first window, run the command that blocks. In the second, do a ps -l of the process ID of the blocking process. That should tell you right where, inside the kernel, your process is stuck. The field you're interested in is the wchan field. If you post that to the list, I think you'll get a lot more insight into what's going on and it will help you focus your efforts. -thanks -Brian
On Feb 22, 11:41am, Rob Newberry wrote:
} Subject: Re: problems with USB/CDC serial (umodem) - devices work with Lin
} I've been debugging this from as many angles as I can, and right now, the summary question is this:
}
} Why would "read( fd, buf, 1);" block for almost 30 seconds, when:
}
} - the descriptor is in non-blocking mode
} - poll indicates the descriptor is safely readable
} - top indicates there is no CPU load (machine is also interactively responsive)
}
} Any ideas? I think my next step is to trace down through the "read" system call and try to see where it's blocking, but so far I've gotten a little lost trying to figure out the path of the read call into the TTY code (mostly because I don't know all this code, and I'm slowly figuring out how "/dev/ttyU0" ends up mapping to a vnode, how the v_ops get mapped, where TTY's buffering gets mapped in, etc.). I will eventually figure this out, but it's a bit painful just reading and adding printfs :-).
}
} Here's some more details:
}
} It may indeed be unrelated to USB/umodem.
}
} The code does use select() and poll() to wait for input from the serial port.
}
} Eventually, the code decides that it can read without blocking from the port (both select and poll have indicated that the descriptor is safely readable).
}
} It then calls "read( fd, buf, 1 );" (the code is adamant about reading a byte at a time).
}
} This call blocks for a LONG time.
}
} But the descriptor is in non-blocking mode.
}
} I've added debug code just before the "read" call that (a) checks the flags to make sure the descriptor is non-blocking, (b) does a "poll" once more to make sure the descriptor appears readable, and (c) prints out the time before the call, and then I added a print out of the time AFTER the "read()" call completes (using clock_gettime(CLOCK_MONOTONIC)).
}
} In every case, the descriptor is in non-blocking mode, and poll says it is readable.
}
} But the "read" call takes 25 seconds or more!
}
} I've got "top" running in another shell; there's no CPU load while I run this.
}
} Here's the output:
}
} FDWrapper::read 0x7f7fffab3479, 1
}     in non-blocking mode (flags = 0x4006)
}     poll indicates descriptor is readable (0x0040)
}     time before read = 8436, 316529950
}     FDWrapper::read - got 1
}     time after read = 8461, 854271618
}
} Here's the code that generates the above output:
}
} ssize_t
} FDWrapper::read(void* data, size_t len)
} {
}
}     printf("FDWrapper::read %p, %u\n", data, (unsigned int)len );
}
}     {
}         int fl;
}         fl = fcntl( mFDRead, F_GETFL );
}         if ( fl & O_NONBLOCK )
}         {
}             printf( "\tin non-blocking mode (flags = 0x%08X)\n", fl );
}         }
}         else
}         {
}             printf( "\tin BLOCKING mode (flags = 0x%08X)\n", fl );
}         }
}     }
}
}     {
}         struct pollfd pfd;
}         int flags = POLLRDNORM|POLLERR|POLLNVAL|POLLHUP;
}         int err;
}
}         pfd.fd = mFDRead;
}         pfd.events = flags;
}         pfd.revents = 0;
}
}         err = poll( &pfd, 1, 0 );
}         if ( ( err > 0 ) && ( pfd.revents & flags ) )
}         {
}             printf( "\tpoll indicates descriptor is readable (0x%08lX)\n", (long)(pfd.revents & flags ) );
}         }
}         else
}         {
}             printf( "\tpoll indicates descriptor is NOT readable (err = %d, 0x%08lX)\n", err, (long)(pfd.revents & flags ) );
}         }
}     }
}
}     {
}         struct timespec start;
}         clock_gettime( CLOCK_MONOTONIC, &start );
}         printf( "\ttime before read = %lu, %lu\n", start.tv_sec, start.tv_nsec );
}     }
}
}     ssize_t ret = ::read(mFDRead, data, len);
}     printf("\tFDWrapper::read - got %d\n", (int)ret );
}
}     {
}         struct timespec stop;
}         clock_gettime( CLOCK_MONOTONIC, &stop );
}         printf( "\ttime after read = %lu, %lu\n", stop.tv_sec, stop.tv_nsec );
}     }
}
}     …
}
} > On Feb 18, 2019, at 11:05 AM, Michael van Elst wrote:
} >
} > mar...@duskware.de (Martin Husemann)
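The two timestamps printed in the quoted output can be subtracted to confirm the "25 seconds or more" figure. The helper below is a small sketch for doing that with struct timespec values; the function name elapsed_ns is made up for illustration and is not part of the quoted FDWrapper code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Nanoseconds elapsed between two struct timespec samples, such as two
 * CLOCK_MONOTONIC readings taken around a system call.  The nanosecond
 * term may be negative when stop->tv_nsec < start->tv_nsec; adding it to
 * the whole-second term handles that borrow.
 */
int64_t
elapsed_ns(const struct timespec *start, const struct timespec *stop)
{
	assert(start != NULL && stop != NULL);
	return (int64_t)(stop->tv_sec - start->tv_sec) * 1000000000LL +
	    (int64_t)(stop->tv_nsec - start->tv_nsec);
}
```

Fed the values from the quoted run (8436 s, 316529950 ns before and 8461 s, 854271618 ns after), this works out to 25537741668 ns, or roughly 25.5 seconds spent inside the single read call.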
Re: Root device independent bootable disk images
hello. Perhaps I don't understand what this change means, exactly, but if this change goes forward, will one still be able to specify a specific device as the root disk even if it is not the boot disk? For example, specifying a raid5 set as the root when booting from a single disk, or, setting a hard disk as the root when booting from flash media? -thanks -Brian On Dec 13, 5:51pm, Christos Zoulas wrote: } Subject: Re: Root device independent bootable disk images } In article <23569.9846.9159.416...@guava.gson.org>, } Andreas Gustafsson wrote: } >Hi all, } > } >Since jmcneill's commit of src/lib/libutil/getfsspecname.c 1.5, NetBSD } >supports the special string "ROOT." as an alias for the root device in } >/etc/fstab. This can be used to avoid hard-coding the device name of } >the root disk on bootable disk images, allowing a single image to be } >booted from disks having different device names. } > } >This feature is currently used by the ARM images, but not by the } >images for other architectures. I would like to change this. My } >immediate motivation for this is to fix PR 51503, "7.0.1/amd64 USB } >install image root mount fails when sd present", but I belive it would } >also be useful on live images as well as install images, and on } >other architectures. Note that I am not proposing changing the fstab } >that gets written to the target disk when installing a system using } >sysinst, only that of pre-built disk images such as those from } >"build.sh install-image" or "build.sh live-image". } > } >The question is, is there any reason to keep the existing machinery } >for specifying a fixed device name via the BOOTDISK make variable? } >Or in other words, can anyone think of an architecture or type of disk } >image where the "ROOT." reference might not work, or where a } >hard-coded root disk device in /etc/fstab might otherwise be } >desirable? } > } >If not, the change I'm proposing would basically amount to changing } >"/dev/@@BOOTDISK@@" to "ROOT." 
in src/distrib/common/bootimage/fstab.in } >and fstab.install.in, followed by a bunch of cleanup work to remove } >things that are no longer used or needed, such as all references to } >BOOTDISK in the Makefiles. } > } >The "build.sh live-image" target currently builds two live images each } >for i386 and amd64, with names containing "-wd0root" and "-sd0root", } >respectively. With the proposed change, these would become almost } >identical, differing only in size and the OMIT_SWAPIMG setting, and } >probably ought to be merged into one. Other architectures only have } >at most a single live image each, but their names also contain strings } >like "-sd0root" or "-ra0root" that would now be meaningless and should } >be removed. } > } >Comments? Objections? } } I think this is a good idea! } } christos } >-- End of excerpt from Christos Zoulas
Re: earmv7hf test status, specifically utimensat
Hello. atime is something I use for diagnostic purposes on a regular basis. I wish Windows had it and when it's not available, I miss it often. -thanks -Brian On Aug 30, 11:01am, Jason Thorpe wrote: } Subject: Re: earmv7hf test status, specifically utimensat } } } > On Aug 30, 2018, at 10:21 AM, Greg Troxel wrote: } > } > It turns out the system I found this on was mounted noatime, just } > like the install image does by default. I had not set up tmpfs, even } > though I usually do. } > } > Adding a tmpfs makes the test pass. } > } > Also, when I said lockup, what really happened was that the system was } > running but the usb wifi interface wedged: } > } > Aug 30 11:00:20 rpi3 /netbsd: urtwn0: timeout waiting for firmware readiness } } I see this all the bloody time on my RPI, and it drives me batty. } } > obviously noatime is about not breaking uSD cards, but I wonder if } > that's still necessary with modern cards? } } Yes. } } > the rpi3 image seems to default to noatime, and I wonder if that } > should continue to be, given the balance of saving wear and oddness } } Honestly, atime is one of the dumbest file attributes there is. Personally, I think we should be defaulting to noatime unless running some sort of conformance test. } } -- thorpej } } >-- End of excerpt from Jason Thorpe
Re: Problem with shutting down the Xserver
hello Paul. If it's not too expensive, I suggest getting a USB->serial dongle. Prolific-based chips work well with NetBSD and Windows, so you could use it on either machine. This should give you the functionality you need to debug this issue further. -thanks -Brian
On Jul 27, 4:26pm, Paul Goyette wrote:
} Subject: Re: Problem with shutting down the Xserver
} On Fri, 27 Jul 2018, Martin Husemann wrote:
}
} The problem machine is a desktop. It has a serial port. Unfortunately
} the Windoze laptop does not, so nothing to which the other end of a
} serial cable could be attached.
Re: Possible regression in wm(4)?
Hello. That's good news. I'll second that the patch worked perfectly. -thanks -Brian On Dec 7, 9:40am, Masanobu SAITOH wrote: } Subject: Re: Possible regression in wm(4)? } On 2017/12/06 22:26, Bert Kiers wrote: } > On Fri, Dec 01, 2017 at 04:40:37PM +0900, Masanobu SAITOH wrote: } >> Hi, all } >> } >> On 2017/11/22 0:21, Bert Kiers wrote: } >>> Hi, } >>> } >>> A different computer with the same type motherboard has the same } >>> problem. A quad I350 (also wm(4)) works fine (with GENERIC netbsd-8 } >>> kernel). } >>> } >>> Still wondering what queue drops are. } >>> } >>> Grtnx, } >> } >> Could you test the following diff? } > } > Yes! Works! } > Thank you! } } Thanks. The diff have been committed now and will be pulled } up to netbsd-8. } } -- } --- } SAITOH Masanobu (msai...@execsw.org } msai...@netbsd.org) >-- End of excerpt from Masanobu SAITOH
Networking issues with NetBSD-8 on Supermicro with X8DTU board?
hello. I'm trying to run the latest NetBSD-8 code on a Supermicro board, but I can't get the wm(4) network cards to work. The dmesg is below. NetBSD-5.2, using my production sources works just fine. It looks like an interrupt routing issue to me, but I don't understand enough about how interrupts work in NetBSD-8 yet to know what to focus on to narrow the problem down. Can someone look at the two dmesg outputs, one from NetBSD-8, the other from NetBSD-5.2 and make suggestions as to what to try to figure out what's going wrong? The NetBSD-8 sources are CVS'd from 11/28/2017. Ideas welcome. -thanks -Brian

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993 The Regents of the University of California. All rights reserved.
NetBSD 8.0_BETA (GENERIC) #1: Tue Nov 28 11:11:37 PST 2017
buh...@lothlorien.nfbcal.org:/usr/local/netbsd/obj-80/sys/arch/i386/compile/GENERIC
total memory = 3063 MB
avail memory = 2992 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Supermicro X8DTU (1234567890)
mainbus0 (root)
ACPI: RSDP 0x000FACE0 24 (v02 ACPIAM)
ACPI: XSDT 0xBF790100 8C (v01 SMCI20120803 MSFT 0097)
ACPI: FACP 0xBF790290 F4 (v03 080312 FACP1521 20120803 MSFT 0097)
ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20170303/tbfadt-642)
ACPI: DSDT 0xBF7906A0 006580 (v01 10600 1060 INTL 20051117)
ACPI: FACS 0xBF79E000 40
ACPI: FACS 0xBF79E000 40
ACPI: APIC 0xBF790390 00011E (v01 080312 APIC1521 20120803 MSFT 0097)
ACPI: MCFG 0xBF7904B0 3C (v01 080312 OEMMCFG 20120803 MSFT 0097)
ACPI: SLIT 0xBF7904F0 30 (v01 080312 OEMSLIT 20120803 MSFT 0097)
ACPI: OEMB 0xBF79E040 86 (v01 080312 OEMB1521 20120803 MSFT 0097)
ACPI: SRAT 0xBF79A6A0 0001A8 (v01 080312 OEMSRAT 0001 INTL 0001)
ACPI: HPET 0xBF79A850 38 (v01 080312 OEMHPET 20120803 MSFT 0097)
ACPI: DMAR 0xBF79E0D0 000128 (v01 AMIOEMDMAR 0001 MSFT 0097)
ACPI: SSDT 0xBF7A1B30 000363 (v01 DpgPmm CpuPm0012 INTL 20051117)
ACPI: EINJ 0xBF79A890 000130 (v01 AMIER AMI_EINJ 20120803 MSFT 0097)
ACPI: BERT 0xBF79AA20 30 (v01 AMIER AMI_BERT 20120803 MSFT 0097)
ACPI: ERST 0xBF79AA50 0001B0 (v01 AMIER AMI_ERST 20120803 MSFT 0097)
ACPI: HEST 0xBF79AC00 A8 (v01 AMIER ABC_HEST 20120803 MSFT 0097)
ACPI: Executed 1 blocks of module-level executable AML code
ACPI: 2 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 6: pa 0xfec0, version 0x20, 24 pins
ioapic1 at mainbus0 apid 7: pa 0xfec8a000, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 2
cpu1: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu1: package 0, core 1, smt 0
cpu2 at mainbus0 apid 18
cpu2: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu2: package 0, core 9, smt 0
cpu3 at mainbus0 apid 20
cpu3: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu3: package 0, core 10, smt 0
cpu4 at mainbus0 apid 32
cpu4: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu4: package 1, core 0, smt 0
cpu5 at mainbus0 apid 34
cpu5: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu5: package 1, core 1, smt 0
cpu6 at mainbus0 apid 50
cpu6: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu6: package 1, core 9, smt 0
cpu7 at mainbus0 apid 52
cpu7: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu7: package 1, core 10, smt 0
cpu8 at mainbus0 apid 1
cpu8: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu8: package 0, core 0, smt 1
cpu9 at mainbus0 apid 3
cpu9: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu9: package 0, core 1, smt 1
cpu10 at mainbus0 apid 19
cpu10: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu10: package 0, core 9, smt 1
cpu11 at mainbus0 apid 21
cpu11: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu11: package 0, core 10, smt 1
cpu12 at mainbus0 apid 33
cpu12: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu12: package 1, core 0, smt 1
cpu13 at mainbus0 apid 35
cpu13: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu13: package 1, core 1, smt 1
cpu14 at mainbus0 apid 51
cpu14: Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, id 0x206c2
cpu14: package 1, core 9, smt 1
cpu15 at mainbus0 apid 53
Re: Fatal page fault in cbq_enqueue()
Hello. We use altq plus pf all over the place in our company. We've looked at using npf, but it doesn't have the feature set we need to make all of our stuff go. Right now, we're using NetBSD-5, which is rock solid in terms of reliability. I don't know if it's easier to make pf and altq work in NET_MPSAFE mode, or add the missing functionality to npf, but for my part, I think it's easier for me to fix pf plus altq in terms of using it in an SMP environment than it is to add the missing features to npf. -thanks -Brian
On Sep 26, 11:37am, Paul Ripke wrote:
} Subject: Re: Fatal page fault in cbq_enqueue()
} Recently upgraded to netbsd-8 branch, and I'm still seeing these
} occasionally. Eg:
}
} Sep 25 20:57:16 slave /netbsd: fatal page fault in supervisor mode
} Sep 25 20:57:16 slave /netbsd: trap type 6 code 0 rip 0x807a68b9 cs 0x8 rflags 0x10286 cr2 0x8 ilevel 0x8 rsp 0xfe80400077e0
} Sep 25 20:57:16 slave /netbsd: curlwp 0xfe811f932420 pid 0.3 lowest kstack 0xfe80400042c0
} Sep 25 20:57:16 slave /netbsd: panic: trap
} Sep 25 20:57:16 slave /netbsd: cpu0: Begin traceback...
} Sep 25 20:57:16 slave /netbsd: vpanic() at netbsd:vpanic+0x140
} Sep 25 20:57:16 slave /netbsd: snprintf() at netbsd:snprintf
} Sep 25 20:57:16 slave /netbsd: trap() at netbsd:trap+0xc6b
} Sep 25 20:57:16 slave /netbsd: --- trap (number 6) ---
} Sep 25 20:57:16 slave /netbsd: rmc_queue_packet() at netbsd:rmc_queue_packet+0x150
} Sep 25 20:57:16 slave /netbsd: cbq_enqueue() at netbsd:cbq_enqueue+0xee
} Sep 25 20:57:16 slave /netbsd: ifq_enqueue2() at netbsd:ifq_enqueue2+0xc4
} Sep 25 20:57:16 slave /netbsd: sppp_output() at netbsd:sppp_output+0x1ab
} Sep 25 20:57:16 slave /netbsd: ip6_if_output() at netbsd:ip6_if_output+0x60
} Sep 25 20:57:16 slave /netbsd: ipf_fastroute() at netbsd:ipf_fastroute+0x97e
} Sep 25 20:57:16 slave /netbsd: ipf_send_ip() at netbsd:ipf_send_ip+0x13d
} Sep 25 20:57:16 slave /netbsd: ipf_check() at netbsd:ipf_check+0xcfc
} Sep 25 20:57:16 slave /netbsd: pfil_run_hooks() at netbsd:pfil_run_hooks+0x117
} Sep 25 20:57:16 slave /netbsd: ip6_input() at netbsd:ip6_input+0x278
} Sep 25 20:57:16 slave /netbsd: ip6intr() at netbsd:ip6intr+0x71
} Sep 25 20:57:16 slave /netbsd: softint_dispatch() at netbsd:softint_dispatch+0xd3
} Sep 25 20:57:16 slave /netbsd: DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe8040007ff0
} Sep 25 20:57:16 slave /netbsd: Xsoftintr() at netbsd:Xsoftintr+0x4f
}
} Are there many users of altq+ipf out there? Perhaps now is a good time to
} switch to npf...
}
} --
} Paul Ripke
} "Great minds discuss ideas, average minds discuss events, small minds
} discuss people."
} -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
>-- End of excerpt from Paul Ripke
Re: ssh, HPN extension and TCP auto-tuning
Hello. I'm not sure if the kern.sbmax value needs to be as large as the window size since, presumably, on the receive side, the data is sent to the application long before the socket buffer gets full. And, on the transmit side, I think the data is passed into the stack as MTU-sized mbufs and stored as a chain, rather than as one large socket buffer. But, I could be completely off on this. However, if I look at slow TCP connections which fill up, the send-queue size seems to reflect the window size, rather than the socket buffer size. I'll pay more attention to this in the future and see if I've missed that obvious adjustment. I can tell you, however, that with those tuning parameters in place, I can fill a 100 Mbit/sec pipe with ssh traffic. Before those changes, I could not. -thanks -Brian
On Sep 21, 11:25am, Havard Eidnes wrote:
} Subject: Re: ssh, HPN extension and TCP auto-tuning
} Hi,
}
} >
} > # Improves TCP performance significantly with ssh.
} > net.inet.tcp.recvbuf_auto=1
} > net.inet.tcp.sendbuf_auto=1
} > net.inet.tcp.sendbuf_max=16777216
} > net.inet.tcp.recvbuf_max=16777216
}
} Thanks for the suggestions, and I've done some initial
} adjustments with beneficial results. I was a bit more
} conservative and went for a 1MB sendbuf_max / recvbuf_max.
}
} One thing I didn't see was any corresponding adjustment of
} kern.sbmax; doesn't it also need to be as large as you want the
} TCP window to be able to grow?
}
} Best regards,
}
} - Håvard
>-- End of excerpt from Havard Eidnes
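As a rough way to reason about how large a TCP window (and therefore a buffer ceiling) has to be before a link can be kept full, the usual rule of thumb is the bandwidth-delay product: link rate times round-trip time. The sketch below just does that arithmetic. The function names are made up for illustration, and the round-trip times are assumptions, since the thread gives none; nothing here reads or sets the actual sysctl values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bandwidth-delay product in bytes: the amount of data that must be in
 * flight to keep a path of the given rate and round-trip time full.
 * Both arguments are supplied by the caller; no value is measured here.
 */
uint64_t
bdp_bytes(uint64_t link_bits_per_sec, uint64_t rtt_ms)
{
	uint64_t bytes_per_sec = link_bits_per_sec / 8;

	/* Guard against overflow in the multiplication below. */
	assert(rtt_ms == 0 || bytes_per_sec <= UINT64_MAX / rtt_ms);
	return bytes_per_sec * rtt_ms / 1000;
}

/* Non-zero if a buffer ceiling of buf_bytes can hold one full window. */
int
buffer_covers_path(uint64_t buf_bytes, uint64_t link_bits_per_sec,
    uint64_t rtt_ms)
{
	return buf_bytes >= bdp_bytes(link_bits_per_sec, rtt_ms);
}
```

For a 100 Mbit/s link and an assumed 100 ms round trip, bdp_bytes(100000000, 100) comes to 1250000 bytes. By that estimate a 1 MB (1048576-byte) ceiling would cover such a link up to about 84 ms of round-trip time, while the 16777216-byte maximum quoted above would cover it up to roughly 1.3 seconds.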
Re: ssh, HPN extension and TCP auto-tuning
Hello. I spent quite a bit of time looking at this under NetBSD-5 and discovered that the default ssh settings, along with the default tcp network settings, precluded the adaptive network performance from working. As a result, I've added the following lines to the ssh configs as well as the sysctl.conf files on the NetBSD-5 hosts we manage. As far as I've been able to tell, we've been able to realize good performance gains as a result of these changes. Unless there have been a lot of regressions, I have no reason to believe that these settings won't yield similar performance improvements under NetBSD-7 and NetBSD-8. Here are the fixes I came up with.

# Improves TCP performance significantly with ssh.
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216

# Put the following lines in both /etc/ssh/ssh_config
# and /etc/ssh/sshd_config
#Enable High Performance Networking options (BB 12/27/2010)
#Turn on HPN features
HPNDisabled no
#Allow 5MB of ssh window buffer
HPNBufferSize 5000
#Enable dynamic window sizing of SSH buffers
#You must have tcp autotuning turned on in the kernel for this to work
TcpRcvBufPoll yes
Re: CVS commit: src
hello. COMPAT_IBCS2 implements the functionality necessary to run SCO OS/5 binary files. I haven't used this in a long while, but it was good enough to run a whole set of Oracle tools better than they ran under the native OS. (I used it with great success under NetBSD-1.4.) I realize that's a long time ago, but SCO hasn't changed their binary format for this platform in forever and there used to be a lot of closed source software that ran under that platform. SCO was pretty much pure SVR3, so noting that COMPAT_IBCS2 implements SVR3 functionality is pretty much correct. Is anyone still using it? -Brian
Re: NetBSD with a gaming keyboard
Hello. It looks like the ukbd driver only allows 8 modifier keys on the keyboard, although the comment says it's 32 keys. sys/dev/usb/ukbd.c:87 says:
#define MAXMOD 8 /* max 32 */
Assuming the comment is correct, it looks like a change from 8 to 16 would allow for up to 64 modifier keys. A quick trip through this file suggests that this is a pretty harmless change and might just fix your problem. So, I suggest changing the above line to:
#define MAXMOD 16 /* max 64 */
Re: NetBSD with a gaming keyboard
hello. I'm curious what brought you back to NetBSD from FreeBSD. -thanks -Brian
Re: Using NET_MPSAFE
hello Hauke. Are you saying that pf(4) doesn't work well in NetBSD-8_BETA? -thanks -Brian
On Aug 9, 8:31am, Hauke Fath wrote:
} Subject: Re: Using NET_MPSAFE
} On Tue, 8 Aug 2017 16:53:11 -0700, Brian Buhrow wrote:
} > Unfortunately, npf(4) doesn't have all
} > the functionality we need to implement the configurations we use.
} > Consequently, it may be necessary to MP-ify pf(4) as well, as I suspect
} > that's easier than implementing its functionality in npf(4).
}
} FreeBSD has taken that route, and after my ordeal with a netbsd-{7,8}
} pf router pair I am quite happy with the result.
}
} Cheerio,
} hauke
}
} --
} Hauke Fath<ha...@espresso.rhein-neckar.de>
} Ernst-Ludwig-Straße 15
} 64625 Bensheim
} Germany
>-- End of excerpt from Hauke Fath
Re: Using NET_MPSAFE
hello Ryota San. Thank you for your detailed response. Yes, I'm interested in using NetBSD as a router. We use NetBSD-5 as router devices and find it quite reliable, but for higher speed applications, we're running into the bottleneck where cpu0 takes all interrupts. I can't promise to deliver anything in a timely manner, but I can look at the agr(4) driver and see if I can MP-ify it. Are there any notes in English describing the basic procedures for MP-ifying a driver? I've done a bit of it for drivers under NetBSD-5, so it's not completely foreign to me. We also use pf(4) extensively. Unfortunately, npf(4) doesn't have all the functionality we need to implement the configurations we use. Consequently, it may be necessary to MP-ify pf(4) as well, as I suspect that's easier than implementing its functionality in npf(4). -thanks -Brian
Using NET_MPSAFE
hello. I'm excited to see the development of the MP-safe network stack in NetBSD. Now that some progress has been made in that regard and there are MP-safe drivers and stack components to use, I have some questions. I'm interested in using options NET_MPSAFE in NetBSD-8.0_BETA and the eventual netbsd-8 release. Here are my questions. I apologize if some of them seem obvious, but I don't want to make any assumptions when trying this new stuff.
1. If I enable NET_MPSAFE in the kernel, will non-MP-ify'd components work in that kernel using the kernel lock? In other words, if I enable NET_MPSAFE and use the wm(4) driver, I'll get MP performance out of the network stack. However, what if I try to use a non-MP-ify'd component on that same machine, i.e. agr(4) or pf(4)? It looks to me like things should work, but traffic through the non-MP-ify'd components will be single threaded. Is this correct?
2. Am I correct that when NET_MPSAFE is turned on, the network stack is running as an LWP inside the kernel? And, am I correct that this means that even if a particular network component is single-threaded, it's able to execute on any CPU, thus reducing CPU congestion on CPU0 as happens on the stock NetBSD kernels?
3. How stable is the NET_MPSAFE stack? Is anyone using it in any sort of production environment? The BSDCan paper I read suggests it's pretty stable, but I'm wondering if anyone can report their experience.
-thanks -Brian
route-to and reply-to equivalents in npf?
Hello. I'm working on transitioning services at our shop from NetBSD-5 to NetBSD-8. As part of that effort, I'm working on figuring out how to write configurations for npf(7) which are direct replacements for our pf(4) configurations. The process looks pretty straightforward, except for one case which we use quite extensively. In pf(4), one can use the route-to and reply-to rules to explicitly route packets sourced from certain interfaces to other interfaces. We use this to allow us to use public IP addresses from one ISP through a series of VPNs, allowing us to provide service through non-local networks. As an example, here is a snippet from one of our working systems. (IP addresses changed to protect the innocent.) In the below example, $dmz_if is one interface of this machine that routes to a local subnet. $vpn_if is the interface that runs through an IP tunnel. Any traffic from the $dmz_if should be routed through the $vpn_if instead of using the standard routing table. Any traffic originating from the $vpn_if destined for the $dmz_if should be return-routed to the $vpn_if. This works beautifully under pf(4) under NetBSD-4 and NetBSD-5. How can I replicate this behavior under NetBSD-8 with npf? -thanks -Brian
# Allow the back office to keep using foreign addresses (06/26/2017)
pass in quick on $dmz_if from $dmz_if:network to $dmz_if:network no state
#Pass internal network traffic through the VPN to expose it to the Internet
pass in quick on $dmz_if route-to { ($vpn_if 10.0.94.105) } from $dmz_if:network to any keep state
pass out quick on $dmz_if from $dmz_if:network to $dmz_if:network no state
pass out quick on $dmz_if reply-to { ($vpn_if 10.0.94.105) } from any to $dmz_if:network keep state
block out quick on $dmz_if from any to any no state
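To make the intended behavior of the rules above concrete, here is a rough model of the first-match ("quick") decisions on $dmz_if. It is only a sketch for discussion: it does not show how to express this in npf, it ignores state tracking, and the DMZ subnet (a documentation range), the example addresses, and the function names are all hypothetical, since the real addresses were changed in the post.

```python
import ipaddress

# Hypothetical stand-in for $dmz_if:network; the post does not give the real subnet.
DMZ_NET = ipaddress.ip_network("192.0.2.0/24")
# Stands for ($vpn_if 10.0.94.105) in the rules.
VPN_NEXT_HOP = ("vpn_if", "10.0.94.105")


def inbound_on_dmz(src, dst):
    """Model the 'pass in quick on $dmz_if' rules; the first matching rule wins."""
    src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if src in DMZ_NET and dst in DMZ_NET:
        return ("pass", None)          # local traffic, normal routing
    if src in DMZ_NET:
        return ("pass", VPN_NEXT_HOP)  # route-to: bypass the routing table
    return ("no match", None)          # falls through to rules not shown here


def outbound_on_dmz(src, dst):
    """Model the 'pass out' and 'block out' rules on $dmz_if."""
    src, dst = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if src in DMZ_NET and dst in DMZ_NET:
        return ("pass", None)          # local traffic, normal routing
    if dst in DMZ_NET:
        return ("pass", VPN_NEXT_HOP)  # reply-to: replies return via the VPN
    return ("block", None)             # final block out quick rule


if __name__ == "__main__":
    print(inbound_on_dmz("192.0.2.10", "198.51.100.7"))
    print(outbound_on_dmz("198.51.100.7", "192.0.2.10"))
```

The second element of each result is the forced next hop, if any; None means the normal routing table applies. Any npf replacement would need to reproduce both the route-to case on inbound traffic and the reply-to case on outbound traffic.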