Re: nvme timeout issues with hardware and bhyve vm's
On Thu, Dec 7, 2023 at 2:39 PM Pete Wright wrote:
...
> Hi Warner, just resurfacing this thread because I've had a few lockups
> on my workstation running 14.0-STABLE. I was able to capture a photo of
> the hang and this seems to be the most important line:
>
> nvme0: Resetting controller due to a timeout and possible hot unplug.
>
> When I scan the device after reboot I don't see any errors, but if there
> is a particular thing I should check via nvmecontrol please let me know.
> Also, since it mentions possible hot unplug I wonder if this is
> hardware/firmware related to my system?

Does the device support Persistent Log pages (LID=0x0d)? If so, it
might be interesting to dump those.

--chuck
Re: nvme timeout issues with hardware and bhyve vm's
On Fri, Oct 13, 2023 at 7:34 PM Warner Losh wrote:
...
> Let me know if you see similar messages in stable/14. I think I've fixed
> all the issues with timeouts, though you shouldn't ever see them in a vm
> setup unless something else weird is going on.

I'd be interested in a repro case too as I haven't seen the NVMe
emulation in bhyve do this before. Were there any error messages from
bhyve? The guest log messages seem to suggest that the backing storage
for the emulated device is timing out. If your comment

> I had similar issues on my workstation as well. Scrubbing the NVMe
> device on my real-hardware workstation hasn't turned up any issues, but
> the system has locked up a handful of times.

means the host is seeing NVMe errors on the drives backing the
zpool/zvol used by the emulated device, this might explain it. Although,
it is curious the emulated controller had trouble resetting (i.e., the
error message "controller ready did not become 1 within 30500 ms").

--chuck
Re: Can't assign address to igc
Bah, actually adding freebsd-net this time

On Tue, Aug 15, 2023 at 7:44 PM Chuck Tuffli wrote:
>
> [Adding freebsd-net@]
>
> On Tue, Aug 1, 2023 at 10:46 AM Chuck Tuffli wrote:
> >
> > Running a recent-ish version (n264266-8f8da1bcc799) on an Intel NUC
> > (RNUC11PABi5), assigning an IPv4 address to igc0 doesn't work. There
> > is no error message, and this has been working in the past. Looking
> > through dmesg only shows:
> >
> > igc0: link state changed to UP
> > igc0: link state changed to DOWN
> > igc0: link state changed to UP
> >
> > The address is set in rc.conf:
> > ifconfig_igc0="inet 192.168.5.10 netmask 255.255.255.0"
> >
> > And manually setting it via
> > # ifconfig igc0 inet 192.168.5.10/24 up
> > does not work either. Any suggestions?
>
> I spent a little time debugging this but still need some help
> understanding what might be wrong.
>
> Instrumenting dump_sa() in the AF_INET case, the .sin_addr contains
> the value of the IP address I'm adding (printf shows 0x0a05a8c0, i.e.
> 192.168.5.10 in network byte order). Switching the log level in
> sys/netlink/route/iface.c to LOG_DEBUG3 shows messages:
> [nl_iface] dump_iface_addr: dumping ifa 0xf8000db0cd80 type
> inet(2) for interface igc0
> Does this mean the address information was written to the netlink
> buffer? If so, what might make ifconfig not display that address?
>
> One other tidbit is, eventually ifconfig reports the address and
> things like sshd start working. Typically, this takes < 100 iterations.
Re: Can't assign address to igc
[Adding freebsd-net@]

On Tue, Aug 1, 2023 at 10:46 AM Chuck Tuffli wrote:
>
> Running a recent-ish version (n264266-8f8da1bcc799) on an Intel NUC
> (RNUC11PABi5), assigning an IPv4 address to igc0 doesn't work. There
> is no error message, and this has been working in the past. Looking
> through dmesg only shows:
>
> igc0: link state changed to UP
> igc0: link state changed to DOWN
> igc0: link state changed to UP
>
> The address is set in rc.conf:
> ifconfig_igc0="inet 192.168.5.10 netmask 255.255.255.0"
>
> And manually setting it via
> # ifconfig igc0 inet 192.168.5.10/24 up
> does not work either. Any suggestions?

I spent a little time debugging this but still need some help
understanding what might be wrong.

Instrumenting dump_sa() in the AF_INET case, the .sin_addr contains
the value of the IP address I'm adding (printf shows 0x0a05a8c0, i.e.
192.168.5.10 in network byte order). Switching the log level in
sys/netlink/route/iface.c to LOG_DEBUG3 shows messages:
[nl_iface] dump_iface_addr: dumping ifa 0xf8000db0cd80 type
inet(2) for interface igc0
Does this mean the address information was written to the netlink
buffer? If so, what might make ifconfig not display that address?

One other tidbit is, eventually ifconfig reports the address and
things like sshd start working. Typically, this takes < 100 iterations.
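For anyone checking the byte-order arithmetic in the message above, here is a minimal sketch. It assumes the printf ran on a little-endian host (amd64) and printed the raw 32-bit .sin_addr word; the helper name is made up for illustration.

```python
import socket
import struct


def sockaddr_word_to_dotted(value, host_byteorder="little"):
    """Render a 32-bit sin_addr word, as printed on the host, in dotted form.

    sin_addr is stored in network byte order, so the bytes in memory are the
    address octets in order. Printing that memory as a native integer on a
    little-endian host shows the octets reversed.
    """
    fmt = "<I" if host_byteorder == "little" else ">I"
    # Re-create the in-memory bytes, then let inet_ntoa format them.
    return socket.inet_ntoa(struct.pack(fmt, value))


print(sockaddr_word_to_dotted(0x0A05A8C0))
```

Read this way, the value in the debug output is the configured address, which suggests the sockaddr handed to netlink is intact.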
Can't assign address to igc
Running a recent-ish version (n264266-8f8da1bcc799) on an Intel NUC
(RNUC11PABi5), assigning an IPv4 address to igc0 doesn't work. There
is no error message, and this has been working in the past. Looking
through dmesg only shows:

igc0: link state changed to UP
igc0: link state changed to DOWN
igc0: link state changed to UP

The address is set in rc.conf:
ifconfig_igc0="inet 192.168.5.10 netmask 255.255.255.0"

And manually setting it via
# ifconfig igc0 inet 192.168.5.10/24 up
does not work either. Any suggestions?

--chuck
Re: nvme related(?) panic on recent -CURRENT
On Thu, Jun 29, 2023 at 12:47 PM Juraj Lutter wrote:
>
> With recent -current, following occurred:
>
> db> bt
> Tracing pid 0 tid 100063 td 0xfe00c5c35e40
> kdb_enter() at kdb_enter+0x32/frame 0xfe00c5e31c90
> vpanic() at vpanic+0x181/frame 0xfe00c5e31ce0
> panic() at panic+0x43/frame 0xfe00c5e31d40
> nvme_ctrlr_identify() at nvme_ctrlr_identify+0x10e/frame 0xfe00c5e31d90
> nvme_ctrlr_start() at nvme_ctrlr_start+0x91/frame 0xfe00c5e31e10
> nvme_ctrlr_reset_task() at nvme_ctrlr_reset_task+0xec/frame 0xfe00c5e31e40
> taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfe00c5e31ec0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfe00c5e31ef0
> fork_exit() at fork_exit+0x7d/frame 0xfe00c5e31f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5e31f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>
> machine is a bhyve guest.

If I'm lldb'ing correctly, nvme_ctrlr_identify+0x10e is the panic in
nvme_completion_poll() if the NVMe command does not complete within the
timeout period (10 seconds). In this case, it is the Identify Controller
command. In the bhyve emulation, this command effectively memcpy's the
data structure to the memory provided by the guest and completes the
command.

If this panic is reproducible, I can provide a patch to enhance the
debug output to figure out if this is an emulation or driver issue.

--chuck
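To make the failure mode concrete, the following is a rough sketch of a poll-with-deadline loop of the kind described above. It is not the driver's code: the function name, the polling interval, and returning False (where the driver panics) are all assumptions for illustration. Only the 10 second timeout comes from the message.

```python
import time


def poll_for_completion(is_done, timeout_s=10.0, interval_s=0.001):
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns True on completion and False on timeout. The kernel driver
    described above panics on timeout instead of returning.
    """
    deadline = time.monotonic() + timeout_s
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
    return True
```

Since the emulated Identify Controller is little more than a memory copy, hitting a 10 second deadline here points at a completion that was never posted or never seen, rather than at a slow command.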
Re: nvme related(?) panic on recent -CURRENT
On Thu, Jun 29, 2023 at 12:47 PM Juraj Lutter wrote:
>
> With recent -current, following occurred:
>
> db> bt
> Tracing pid 0 tid 100063 td 0xfe00c5c35e40
> kdb_enter() at kdb_enter+0x32/frame 0xfe00c5e31c90
> vpanic() at vpanic+0x181/frame 0xfe00c5e31ce0
> panic() at panic+0x43/frame 0xfe00c5e31d40
> nvme_ctrlr_identify() at nvme_ctrlr_identify+0x10e/frame 0xfe00c5e31d90
> nvme_ctrlr_start() at nvme_ctrlr_start+0x91/frame 0xfe00c5e31e10
> nvme_ctrlr_reset_task() at nvme_ctrlr_reset_task+0xec/frame 0xfe00c5e31e40
> taskqueue_run_locked() at taskqueue_run_locked+0x182/frame 0xfe00c5e31ec0
> taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfe00c5e31ef0
> fork_exit() at fork_exit+0x7d/frame 0xfe00c5e31f30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5e31f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>
> machine is a bhyve guest.

Did bhyve log any warnings or errors? I'm curious if its NVMe emulation
did something dumb or itself noticed an issue.

--chuck
build failure WITH_ASAN
make buildworld -DWITH_UBSAN -DWITH_ASAN
is failing for me with the error:

building shared library libc.so.7
ld: error: cannot open /usr/home/ctuffli/dev/freebsd/obj/usr/home/ctuffli/dev/freebsd/src.git/amd64.amd64/tmp/usr/lib/clang/14.0.5/lib/freebsd/libclang_rt.asan_static-x86_64.a: No such file or directory
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** Error code 1

This does use meta mode but still occurs after running cleanworld.
Grepping through the sources, I see a reference to
libclang_rt.asan_static-x86_64.a in
tools/build/mk/OptionalObsoleteFiles.inc in the MK_CLANG == no section.

What other information can I provide to help figure this out? Thanks!

--chuck
bhyve core dump related to llvm 14
I have a virtual machine used to test the NVMe emulation in bhyve. All
of the tests in the VM pass running under FreeBSD 13.1-R, but the same
VM running under -current causes bhyve(8) to dump core because of a
segmentation fault.

git bisect identified the last "good" commit on main as
    cb2ae6163174 sysvsem: Fix a typo
After this commit, there are a half-dozen commits related to merging
the llvm project release/14.x

The core dump is repeatable and consistent. Back traces under lldb look
similar to this:

* thread #22, name = 'vcpu 2', stop reason = signal SIGSEGV: invalid address (fault address: 0xb8)
  * frame #0: 0x383eb9fc916b bhyve`pci_nvme_read(ctx=0x38483ad2d700, vcpu=0, pi=0x, baridx=-188391150, offset=0, size=0) at pci_nvme.c:3035:34
    frame #1: 0x384834616280
    frame #2: 0x383eb9fc1f7a bhyve`pci_emul_mem_handler(ctx=, vcpu=, dir=, addr=, size=, val=, arg1=0x3846e5b71600, arg2=0) at pci_emul.c:498:4

In frame 0, pi being NULL causes the core dump, but most of the
arguments are invalid / garbage. Looking earlier in the stack, the vcpu
value should be 2, the ctx pointer doesn't match, and the value passed
to pi isn't NULL.

Poking around in frame 2, I can see that the "direction" is a memory
write (dir == MEM_F_WRITE) and the statement being executed is this:

    (*pe->pe_barwrite)(ctx, vcpu, pdi, bidx, offset, size, *val);

Confusingly, the function pointer pe_barwrite is pci_nvme_write() and
not pci_nvme_read() where the crash occurs. I've confirmed the fault is
in pci_nvme_read() by adding an assert for pi != NULL. This is
especially odd because pci_emul_mem_handler() directly calls
pci_nvme_read() and pci_nvme_write(). So why does frame 1 exist at all?

Using gdb, the back traces either don't decode at all or look similar
to this:

(gdb) bt
#0 pci_nvme_read (ctx=0x944c1168700, vcpu=0, pi=0x0, baridx=-1835053270, offset=0, size=0) at /poudriere/jails/14-current-amd64/usr/src/usr.sbin/bhyve/pci_nvme.c:3035
#1 0x09436891d8e8 in _CurrentRuneLocale () from /lib/libc.so.7
#2 0x09436a73ca28 in ?? ()
#3 0x09436a73e1c0 in ?? ()
...
#34 0x09436a747600 in ?? ()
#35 0x093b3e76b088 in pci_de_lpc ()
#36 0x09436a716500 in ?? ()
#37 0x0944c3196d10 in ?? ()
#38 0x093b3e74501a in pci_emul_mem_handler (ctx=0x9436a7bd670, vcpu=0, dir=, addr=, size=0, val=0x646165725f657469, arg1=0x1, arg2=10185153275136) at /poudriere/jails/14-current-amd64/usr/src/usr.sbin/bhyve/pci_emul.c:498

Other random tidbits:
 - disabling compiler optimization (i.e. -O0) for the two files in
   question (pci_nvme.c and pci_emul.c) makes the core dump go away
 - using the default optimization level but generously sprinkling
   debug printf everywhere makes the core dump go away.

I'm not sure where to go from here and could use some help.

--chuck
Re: nvme INVALID_FIELD in dmesg.boot
On Wed, May 25, 2022 at 6:59 AM Alexander Motin wrote:
...
> > nvme0: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:000b cdw11:031f
> > nvme0: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
...
> Those messages mean that driver tried to enable certain types of
> asynchronous events, but probably the hardware does not support some of

Du-oh! I read the 'b' in 000b as 'binary'. Alexander and Warner are
correct that this is from setting an AEN, and I was wrong about CDW10
causing the error.

FWIW, "CDW" in NVMe-land is shorthand for Command DWord as NVMe
commands are composed of 32-bit fields (ancient Intel/Microsoft "double
words").
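As a rough illustration of how the two dwords in that log line break down, here is a small decoder. The feature identifier and bit positions are written from my reading of the NVMe base specification and should be treated as assumptions to verify against the spec revision the drive claims; the function name is invented for this sketch.

```python
# Assumed from the NVMe base specification: the Feature Identifier lives in
# CDW10 bits 7:0, and 0x0b is Asynchronous Event Configuration.
FID_ASYNC_EVENT_CONFIG = 0x0B

# Assumed CDW11 bit assignments for Asynchronous Event Configuration.
AEN_NOTICE_BITS = {
    8: "Namespace Attribute Notices",
    9: "Firmware Activation Notices",
}


def decode_set_features(cdw10, cdw11):
    """Return (feature_id, list of requested event classes)."""
    fid = cdw10 & 0xFF
    requested = []
    if fid != FID_ASYNC_EVENT_CONFIG:
        return fid, requested
    smart_mask = cdw11 & 0xFF  # SMART / Health critical warning bits
    if smart_mask:
        requested.append(f"SMART/Health critical warnings mask 0x{smart_mask:02x}")
    for bit, name in AEN_NOTICE_BITS.items():
        if cdw11 & (1 << bit):
            requested.append(name)
    return fid, requested


fid, requested = decode_set_features(0x000B, 0x031F)
print(hex(fid), requested)
```

If the controller rejects any one of the requested event classes, it can fail the whole command with INVALID_FIELD, which matches Alexander's explanation.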
Re: nvme INVALID_FIELD in dmesg.boot
On Wed, May 25, 2022 at 5:26 AM Matteo Riondato wrote:
...
> nvme0: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:000b
> cdw11:031f
> nvme0: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
...
> nda0 at nvme0 bus 0 scbus16 target 0 lun 1
> nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
...
> The disks seem to work fine, from what I can tell.
>
> Are the "INVALID_FIELD" messages harmless, or can they be avoided with
> some tuning, or maybe with some patch?

The log messages mean the driver is sending the Set Features command
with an invalid parameter. Usually, this won't be fatal, which seems to
be the case here as the nda devices appear.

If the logging output is to be believed, the invalid parameter is CDW10
which shouldn't be 0x0. That said, I'm not immediately seeing how that
could be the case. It would be interesting to set
hw.nvme.verbose_cmd_dump to confirm this is happening.

--chuck
Re: network address not restored on resume
On Tue, Mar 22, 2022 at 4:25 PM Kevin Oberman wrote:
> Not enough information to guess.
>
> What is the content of /etc/rc.conf in regard to configuration of this
> interface?

defaultrouter="192.168.5.1"
ifconfig_igc0="inet 192.168.5.10 netmask 255.255.255.0"

> What shows up in the log file when this happens (usually /var/log/messages)?
> The log information is particularly important. (Note that I don't have an igc
> interface.)

Mostly USB noise, other than igc0: link state changed to UP at the end
...
Mar 23 07:48:33 stargate acpi[2172]: resumed at 20220323 07:48:33
Mar 23 07:48:34 stargate kernel: ugen1.2: at usbus1
Mar 23 07:48:34 stargate kernel: uaudio0 on uhub1
Mar 23 07:48:34 stargate kernel: uaudio0: on usbus1
Mar 23 07:48:34 stargate kernel: uaudio0: Play[0]: 48000 Hz, 2 ch, 16-bit S-LE PCM format, 2x2ms buffer.
Mar 23 07:48:34 stargate kernel: uaudio0: Play[0]: 44100 Hz, 2 ch, 16-bit S-LE PCM format, 2x2ms buffer.
Mar 23 07:48:34 stargate kernel: uaudio0: Record[0]: 48000 Hz, 2 ch, 16-bit S-LE PCM format, 2x2ms buffer.
Mar 23 07:48:34 stargate kernel: uaudio0: Record[0]: 44100 Hz, 2 ch, 16-bit S-LE PCM format, 2x2ms buffer.
Mar 23 07:48:34 stargate kernel: uaudio0: No MIDI sequencer.
Mar 23 07:48:34 stargate kernel: pcm2: on uaudio0
Mar 23 07:48:34 stargate kernel: uaudio0: HID volume keys found.
Mar 23 07:48:34 stargate kernel: ugen1.3: at usbus1
Mar 23 07:48:34 stargate kernel: uhub2 on uhub1
Mar 23 07:48:34 stargate kernel: uhub2: on usbus1
Mar 23 07:48:34 stargate kernel: uhub2: MTT enabled
Mar 23 07:48:35 stargate kernel: uhub2: 4 ports with 4 removable, self powered
Mar 23 07:48:35 stargate kernel: ugen1.4: at usbus1
Mar 23 07:48:35 stargate kernel: ukbd0 on uhub2
Mar 23 07:48:35 stargate kernel: ukbd0: on usbus1
Mar 23 07:48:35 stargate kernel: kbd1 at ukbd0
Mar 23 07:48:35 stargate kernel: ums0 on uhub2
Mar 23 07:48:35 stargate kernel: ums0: on usbus1
Mar 23 07:48:35 stargate kernel: ums0: 16 buttons and [XYZT] coordinates ID=2
Mar 23 07:48:35 stargate kernel: uhid0 on uhub2
Mar 23 07:48:35 stargate kernel: uhid0: on usbus1
Mar 23 07:48:35 stargate kernel: ugen1.5: at usbus1
Mar 23 07:48:35 stargate kernel: ums1 on uhub2
Mar 23 07:48:35 stargate kernel: ums1: on usbus1
Mar 23 07:48:35 stargate kernel: ums1: 3 buttons and [XYZ] coordinates ID=0
Mar 23 07:48:35 stargate kernel: ukbd1 on uhub2
Mar 23 07:48:35 stargate kernel: ukbd1: on usbus1
Mar 23 07:48:35 stargate kernel: kbd2 at ukbd1
Mar 23 07:48:35 stargate kernel: uhid1 on uhub2
Mar 23 07:48:35 stargate kernel: uhid1: on usbus1
Mar 23 07:48:36 stargate kernel: ugen1.6: at usbus1
Mar 23 07:49:26 stargate kernel: igc0: link state changed to UP
Mar 23 07:51:28 stargate shutdown[2287]: reboot by root:
network address not restored on resume
On a recent-ish current (git hash 66b86c8a7604), after resuming from
sleep, the main network interface doesn't get restored. Further,
manually fixing this via service netif or ifconfig seems to fail. Am I
doing something wrong?

root@stargate:~ # uname -a
FreeBSD stargate.tuffli.net 14.0-CURRENT FreeBSD 14.0-CURRENT #2 main-n253430-66b86c8a7604: Sat Feb 26 23:35:02 PST 2022
root@stargate:~ # ifconfig igc0
igc0: flags=8863 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        inet 192.168.5.10 netmask 0xff00 broadcast 192.168.5.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
root@stargate:~ # zzz
root@stargate:~ # ifconfig igc0
igc0: flags=8c22 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
root@stargate:~ # service netif restart igc0
Stopping Network: igc0.
igc0: flags=8c22 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
Starting Network: igc0.
igc0: flags=8863 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        inet 192.168.5.10 netmask 0xff00 broadcast 192.168.5.255
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29
root@stargate:~ #
root@stargate:~ # ifconfig igc0
igc0: flags=8c22 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
root@stargate:~ # ifconfig igc0 inet 192.168.5.10/24
igc0: flags=8c22 metric 0 mtu 1500
        options=4e527bb
        ether 1c:69:7a:a9:cd:1c
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
root@stargate:~ #
build failure on -current
When building current from git, I keep hitting the error below. This is
with meta-mode, but I've also tried deleting the object directory. The
system also has a couple of tweaks to src-env.conf that were an attempt
to avoid building any (most?) of clang. Relevant system information:

$ uname -mrsv
FreeBSD 14.0-CURRENT FreeBSD 14.0-CURRENT main-22c4ab6cb0 GENERIC amd64
$ cat /etc/src-env.conf
WITH_META_MODE=YES
WITHOUT_CLANG=YES
WITHOUT_CLANG_BOOTSTRAP=YES
$ env MAKEOBJDIRPREFIX=$(realpath ../obj) make buildworld -j$(sysctl -n hw.ncpu)
--- Core/ModuleList.o ---
In file included from /usr/home/ctuffli/dev/freebsd/src.git/contrib/llvm-project/lldb/source/Core/ModuleList.cpp:34:
In file included from /usr/home/ctuffli/dev/freebsd/src.git/contrib/llvm-project/clang/include/clang/Driver/Driver.h:12:
In file included from /usr/home/ctuffli/dev/freebsd/src.git/contrib/llvm-project/clang/include/clang/Basic/Diagnostic.h:17:
/usr/home/ctuffli/dev/freebsd/src.git/contrib/llvm-project/clang/include/clang/Basic/DiagnosticIDs.h:71:10: fatal error: 'clang/Basic/DiagnosticCommonKinds.inc' file not found
#include "clang/Basic/DiagnosticCommonKinds.inc"
         ^~~
1 error generated.
*** [Core/ModuleList.o] Error code 1

Where did I goof? TIA

--chuck
Re: problem with re(4) interface
On Mon, Nov 22, 2021 at 9:34 AM Chris wrote:
>
> On 2021-11-22 08:47, Chuck Tuffli wrote:
> > Running on a recent-ish -current
> > # uname -a
> > FreeBSD stargate.tuffli.net 14.0-CURRENT FreeBSD 14.0-CURRENT
> > main-81b22a9892 GENERIC amd64
> >
> > I'm having trouble using the second NIC interface in a bridge to provide
> > network connectivity to bhyve VMs and need some help figuring out what is
> > wrong.
...
> Because there's subtle differences between them; are you using the re driver
> from base, or from ports?

The driver is from base. Didn't realize there was one in ports.

--chuck
problem with re(4) interface
Running on a recent-ish -current
# uname -a
FreeBSD stargate.tuffli.net 14.0-CURRENT FreeBSD 14.0-CURRENT main-81b22a9892 GENERIC amd64

I'm having trouble using the second NIC interface in a bridge to
provide network connectivity to bhyve VMs and need some help figuring
out what is wrong. The system is an AMD Ryzen mini-pc (UM250) with two
RealTek gigabit NICs (8168/8111). The second NIC (re1) is a member of a
bridge. A configuration similar to this works on a different system
with Intel NICs, but on this system, the VMs aren't able to connect to
the network. I've done the easy things and verified, for example, the
interface can pass traffic (i.e. hardware, cable, switch are fine).

There are some additional "odd" things. For example, ifconfig doesn't
configure an address or even enable the interface. E.g.,

# ifconfig re1 10.0.0.10/24 up
# ifconfig re1
re1: flags=8902 metric 0 mtu 1500
        options=82099
        ether 1c:83:41:28:c9:e4
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29

The command does appear to enable/disable the port:

# tail -f /var/log/messages
Nov 22 08:31:03 stargate kernel: re1: link state changed to DOWN
Nov 22 08:31:11 stargate kernel: re1: link state changed to UP

Note that setting the interface's address from rc.conf works, but after
the system boots, setting the address from the command line doesn't.

What else should I check?

# ifconfig -a -G lo
re0: flags=8843 metric 0 mtu 1500
        options=8209b
        ether 1c:83:41:28:c9:e3
        inet 192.168.5.10 netmask 0xff00 broadcast 192.168.5.255
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
re1: flags=8902 metric 0 mtu 1500
        options=82099
        ether 1c:83:41:28:c9:e4
        media: Ethernet autoselect (1000baseT )
        status: active
        nd6 options=29
vm-public: flags=8843 metric 0 mtu 1500
        ether 46:76:29:af:7b:fa
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143
                ifmaxaddr 0 port 5 priority 128 path cost 200
        member: re1 flags=143
                ifmaxaddr 0 port 2 priority 128 path cost 2
        groups: bridge vm-switch viid-4c918@
        nd6 options=9
tap0: flags=8943 metric 0 mtu 1500
        description: vmnet-freebsd-0-public
        options=8
        ether 58:9c:fc:10:ff:f6
        groups: tap vm-port
        media: Ethernet autoselect
        status: active
        nd6 options=29
        Opened by PID 38298

--chuck
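The archive has dropped the symbolic flag names that ifconfig prints after the hex value, so here is a small sketch for decoding the values quoted above. The bit assignments are the common interface flag values as I remember them from FreeBSD's net/if.h and cover only the bits needed here; treat the table as an assumption and check it against the header.

```python
# Assumed subset of FreeBSD interface flag bits (see net/if.h).
IFF_FLAGS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0040: "RUNNING",
    0x0100: "PROMISC",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_if_flags(flags):
    """Return (names of known set bits in ascending bit order, leftover bits)."""
    names = [name for bit, name in sorted(IFF_FLAGS.items()) if flags & bit]
    known = 0
    for bit in IFF_FLAGS:
        known |= bit
    return names, flags & ~known


for label, value in (("re0", 0x8843), ("re1", 0x8902)):
    names, leftover = decode_if_flags(value)
    print(label, ",".join(names), hex(leftover))
```

With this table, re0 decodes with both UP and RUNNING set while re1 has neither, which lines up with the observation that the ifconfig command did not enable the interface.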
Re: I got a panic for "nvme0: cpl does not map to outstanding cmd" on a MACHIATObin Double Shot
On Thu, Jun 24, 2021 at 11:50 PM Mark Millard via freebsd-current wrote:
>
> I've given up on figuring any useful out for this example.
> I've also not had a repeat so far.
>
> I'm progressing to much more recent commits for the
> environment to be based on as well.
>
> The primary aarch64 system for my access is switching to
> be a HoneyComb. The Optane was moved to the HoneyComb.

Since the architecture isn't x86, I'm wondering if what you are seeing
is related to the changes being proposed in these Differentials:
https://reviews.freebsd.org/D30995
https://reviews.freebsd.org/D31002

--chuck
Re: bhyve fopen failure
On Tue, Mar 2, 2021 at 10:13 AM Mark Johnston wrote:
>
> On Tue, Mar 02, 2021 at 09:31:22AM -0800, Chuck Tuffli wrote:
> > I'm porting some code to bhyve and am getting a failure I don't
> > understand. This is git as of af11c2029006 FWIW.
> >
> > The code in question is for an emulated device and looks like:
> >     dbg = fopen("/tmp/bhyve_ata.log", "w+");
> >     if (dbg == NULL)
> >         perror("fopen");
> >
> > Running this fails with:
> > fopen: Not permitted in capability mode
> > Googling suggests this might be capsicum related. If so, what do I
> > need to change to allow writes to a debug file?
>
> You would need to either open the file in the driver's initialization
> routine, which I believe is executed before bhyve enters capability
> mode, or add -DWITHOUT_CAPSICUM to the bhyve CFLAGS and recompile.

Thanks to you both; that did the trick. I was confused as other
emulated devices are doing the same thing, but there must be an
ordering difference that allows them to work (I assume).

--chuck
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
bhyve fopen failure
I'm porting some code to bhyve and am getting a failure I don't
understand. This is git as of af11c2029006 FWIW.

The code in question is for an emulated device and looks like:
    dbg = fopen("/tmp/bhyve_ata.log", "w+");
    if (dbg == NULL)
        perror("fopen");

Running this fails with:
fopen: Not permitted in capability mode
Googling suggests this might be capsicum related. If so, what do I
need to change to allow writes to a debug file?

--chuck
Re: Intel TigerLake NVMe vmd: Adding Support & Debugging a Patch
On Wed, Dec 30, 2020 at 4:38 PM Neel Chauhan wrote:
>
> Hi Chuck,
>
> On 2020-12-30 10:04, Chuck Tuffli wrote:
> > What is the output from
> > # pciconf -rb pci0:0:14:0 0x40:0x48
>
> The output is:
>
> 01 00 00 00 01 2e 68 02 00

Perfect. The Linux driver says the 8086:9a0b device you have "... may
provide root port configuration information which limits bus numbering"
which causes the code to read the VM Capability register (0x40) and the
VM Configuration register (0x44). Here, VMCAP = 0x0001 where bit 0 set
appears to mean the config register has starting bus number
information. VMCFG = 0x2e01 where bits 9:8 give the coded start number
(0b10 here), which maps to bus 224 or 0xe0. That matches the PCI bridge
shown in the lspci output (i.e. 1:e0:06.0).

I wonder if mirroring the logic in [1] and setting
    bus->rman.rm_start = 224;
in vmd_attach() might help.

> I was also able to stop kernel panics by adding:
>
> rman_fini(&sc->vmd_bus.rman);
>
> In the fail: statement in vmd_attach().
>
> But I still cannot detect the SSD.

[1] https://github.com/torvalds/linux/blob/master/drivers/pci/controller/vmd.c#L507

--chuck
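For anyone wanting to redo the register arithmetic on another machine, here is a minimal sketch that decodes the pciconf byte dump. It assumes, from my reading of the Linux driver linked as [1], that bit 0 of the 16-bit word at 0x40 flags the bus restriction and that bits 9:8 of the 16-bit word at 0x44 carry a code mapping to start bus 0, 128, or 224. The function and table names are invented for this sketch.

```python
import struct

# Assumed mapping of the restriction code to the first bus number.
BUS_START_BY_CODE = {0: 0, 1: 128, 2: 224}


def vmd_bus_start(config_bytes):
    """Decode the VMD start bus from bytes dumped starting at offset 0x40.

    Returns None for a code this sketch does not know about.
    """
    (vmcap,) = struct.unpack_from("<H", config_bytes, 0)  # offset 0x40
    (vmcfg,) = struct.unpack_from("<H", config_bytes, 4)  # offset 0x44
    if not vmcap & 0x1:
        return 0  # no bus restriction advertised
    return BUS_START_BY_CODE.get((vmcfg >> 8) & 0x3)


raw = bytes.fromhex("01 00 00 00 01 2e 68 02 00")
print(vmd_bus_start(raw))
```

Feeding in the nine bytes from the pciconf output above gives the same start bus as the hand calculation.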
Re: Intel TigerLake NVMe vmd: Adding Support & Debugging a Patch
On Tue, Dec 29, 2020 at 6:30 PM Neel Chauhan wrote:
>
> Hi freebsd-hackers@, CC'd freebsd-current@,
>
> I hope you all had a wonderful holiday season.
>
> I recently got a HP Spectre x360 13t-aw200 which is an Intel
> TigerLake-based laptop. It has the Intel "Evo" branding and an "Optane"
> SSD which I disabled (so I can get a "second" SSD).
>
> On the Spectre, the NVMe is not detected: https://imgur.com/a/ighTwHQ
>
> I don't know if it is HP or Intel, but the VMD IDs device id is
> 8086:9a0b. I'm guessing Intel since Dell laptops (XPS, Vostro) also have
> this device ID [1].
>
> Sadly, NVMe RAID is forced on this laptop.
>
> I wrote a rough patch to add the device IDs, and the patch is below:

FWIW, that is the same change I would have made. Peeking at the Linux
vmd driver, it doesn't appear to do anything special for 8086:9a0b as
compared to the 8086:2a0c device the FreeBSD driver already supports.
That said, the Linux driver reads a capability register to determine
the bus number start (vmd_bus_number_start()) which I don't see in the
FreeBSD driver. This is curious because, looking at the "lspci all"
output from the XPS link you provided, the NVMe device shows up in PCI
domain 0x1000 (i.e. not 0x). Which (and I have no direct experience
with this device or code) only happens if the bus number start function
returns 0x0.

What is the output from
# pciconf -rb pci0:0:14:0 0x40:0x48

--chuck
Re: port build fails with missing sys/smr_types.h
On Thu, Dec 3, 2020 at 4:43 PM Mark Johnston wrote:
...
> $ fetch
> http://ftp.freebsd.org/pub/FreeBSD/snapshots/VM-IMAGES/13.0-CURRENT/amd64/20201126/FreeBSD-13.0-CURRENT-amd64-20201126-9e082d278b9.raw.xz
> $ unxz FreeBSD-13.0-CURRENT-amd64-20201126-9e082d278b9.raw.xz
> $ sudo mdconfig -a -f FreeBSD-13.0-CURRENT-amd64-20201126-9e082d278b9.raw
> md0
> $ sudo mount /dev/md0p4 /mnt
> $ stat /mnt/usr/include/sys/smr_types.h
> 544 241404 -r--r--r-- 1 root wheel 554933 4985 "Nov 26 03:57:51 2020" "Nov 26
> 03:51:14 2020" "Nov 26 03:58:26 2020" "Nov 26 03:51:14 2020" 32768 16 0
> /mnt/usr/include/sys/smr_types.h
> $
>
> So I'm not sure what's going on in your case. smr_types.h was added a
> number of months ago.

Weird, it looks like I borked my system somehow. But thank you, that
helped immensely!

--chuck
Re: port build fails with missing sys/smr_types.h
On Thu, Dec 3, 2020 at 3:18 PM Mark Johnston wrote:
>
> On Thu, Dec 03, 2020 at 01:08:52PM -0800, Chuck Tuffli wrote:
> > Hi
> >
> > I'm trying to fix the build of qemu-utils but am seeing failures on
> > CURRENT (13.0-HEAD-9e082d278b9) like:
> >
> > In file included from util/oslib-posix.c:50:
> > In file included from /usr/include/sys/user.h:51:
> > In file included from /usr/include/sys/proc.h:50:
> > /usr/include/sys/filedesc.h:47:10: fatal error: 'sys/smr_types.h' file not
> > found
> > #include <sys/smr_types.h>
> >          ^
> >
> > # uname -a
> > FreeBSD sv0.tuffli.net 13.0-HEAD-9e082d278b9 FreeBSD
> > 13.0-HEAD-9e082d278b9 #0 9e082d278b91-c254726(HEAD)-dirty: Fri Nov 27
> > 00:09:50 PST 2020
> > root@freebsd:/build/9e082d278b9/obj/build/9e082d278b9/src/amd64.amd64/sys/GENERIC-NODEBUG
> > amd64
> > # ls -l /usr/include/sys/*smr*
> > -r--r--r-- 1 root wheel 1988 Nov 30 14:04 /usr/include/sys/_smr.h
> > -r--r--r-- 1 root wheel 7822 Nov 30 14:04 /usr/include/sys/smr.h
> >
> > So it appears the file is missing. Any ideas?
>
> How old is your world? I have /usr/include/sys/smr_types.h on my
> systems. It's present on freefall as well.

It is the FreeBSD-13.0-CURRENT-amd64-20201126-9e082d278b9 snapshot. If
this is fixed in recent snapshots, I can move to one of those.

--chuck
port build fails with missing sys/smr_types.h
Hi

I'm trying to fix the build of qemu-utils but am seeing failures on
CURRENT (13.0-HEAD-9e082d278b9) like:

In file included from util/oslib-posix.c:50:
In file included from /usr/include/sys/user.h:51:
In file included from /usr/include/sys/proc.h:50:
/usr/include/sys/filedesc.h:47:10: fatal error: 'sys/smr_types.h' file not found
#include <sys/smr_types.h>
         ^

# uname -a
FreeBSD sv0.tuffli.net 13.0-HEAD-9e082d278b9 FreeBSD 13.0-HEAD-9e082d278b9 #0 9e082d278b91-c254726(HEAD)-dirty: Fri Nov 27 00:09:50 PST 2020 root@freebsd:/build/9e082d278b9/obj/build/9e082d278b9/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
# ls -l /usr/include/sys/*smr*
-r--r--r-- 1 root wheel 1988 Nov 30 14:04 /usr/include/sys/_smr.h
-r--r--r-- 1 root wheel 7822 Nov 30 14:04 /usr/include/sys/smr.h

So it appears the file is missing. Any ideas?

--chuck
Re: PCIe NVME drives not detected on Dell R6515
On Mon, May 4, 2020 at 11:12 AM Miroslav Lachman <000.f...@quip.cz> wrote:
>
> On 2020-04-27 08:02, Miroslav Lachman wrote:
> > I don't know what is with Scott. I hope he is well.
> > Is there somebody else who can help me with this issue?
>
> Scott wrote there are hotplug PCIe buses not probed during boot process.
>
> I am not a developer so I cannot move forward alone.
>
> The problem is with PCIe Hot Plug.
> Hot Plug bus was not enumerated thus no NVME detected.

I may have just been bitten by this as well when running FreeBSD under
qemu. The q35 machine type with PCIe emulation enables PCIe hot plug on
all the root ports, but I am not seeing any downstream devices (either
emulated like e1000 or passed through by the host) because of a check
in pcib_hotplug_present():

	/*
	 * Require the Electromechanical Interlock to be engaged if
	 * present.
	 */
	if (sc->pcie_slot_cap & PCIEM_SLOT_CAP_EIP &&
	    (sc->pcie_slot_sta & PCIEM_SLOT_STA_EIS) == 0)
		return (0);

Under qemu, the slot indicates an Electromechanical Interlock is
Present in the capabilities register, but it does not set the
Electromechanical Interlock Status bit. This causes the PCI driver to
not probe any children. Commenting out the above code made both
emulated PCIe devices as well as host devices passed through appear in
FreeBSD. As a data point, I'm not seeing similar checks in the Linux
kernel.

Miroslav, would it be possible to comment out/delete the above code in
your kernel and retest to see if that helps your case as well?

--chuck
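To check whether a given slot would trip that test, the predicate can be reproduced outside the kernel from the raw Slot Capabilities and Slot Status register values. A minimal sketch follows; the two mask values are what I believe FreeBSD's pcireg.h defines (Slot Capabilities bit 17 and Slot Status bit 7), so treat them as assumptions to confirm against the header. The function name is invented.

```python
# Assumed values of the pcireg.h constants used by pcib_hotplug_present().
PCIEM_SLOT_CAP_EIP = 0x00020000  # Electromechanical Interlock Present
PCIEM_SLOT_STA_EIS = 0x0080      # Electromechanical Interlock Status


def interlock_blocks_probe(slot_cap, slot_sta):
    """True when the quoted check would make the bridge skip its children."""
    has_interlock = bool(slot_cap & PCIEM_SLOT_CAP_EIP)
    engaged = bool(slot_sta & PCIEM_SLOT_STA_EIS)
    return has_interlock and not engaged


# The qemu q35 case described above: interlock advertised, status bit clear.
print(interlock_blocks_probe(PCIEM_SLOT_CAP_EIP, 0x0000))
```

Running this against the register values from a real slot would show whether the Dell R6515 case is the same condition as the qemu one.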
Re: Build failed compiling ittnotify_static.pico
On Sat, Mar 14, 2020 at 1:16 AM Dimitry Andric wrote:
>
> On 14 Mar 2020, at 04:53, Masachika ISHIZUKA wrote:
> >
> >>> cc: error: no such file or directory:
> >>> '/usr/src/contrib/llvm-project/openmp/runtime/src/thirdparty/ittnotify/ittnotify_static.c'
> >>> cc: error: no input files
> >>> *** [ittnotify_static.pico] Error code 1
> >>
> >> 'make cleanworld' solved it, build without error.
> >> I used to always delete obj for years, but I've not done that for a
> >> couple of years without any problems I've noticed.
> >
> > Thank you for information.
> > I'll try it.
>
> Obviously, cleaning up your /usr/obj will always help, but that is not
> the point of the fix. It should always be possible to do an incremental
> buildworld, but there is a deficiency in our dependency tracking system.
> When a file changes extension, but its basename stays the same, the
> tracking does not notice it.

0. rm -rf the obj directory contents
1. buildworld using r358851 (just prior to the llvm/clang commit); this completed. Note that I do use MAKEOBJDIRPREFIX when building
2. update to r359007 and buildworld fails with:

cc: error: no such file or directory: '/usr/home/ctuffli/dev/freebsd/freebsd.hg/contrib/llvm-project/openmp/runtime/src/thirdparty/ittnotify/ittnotify_static.c'
cc: error: no input files
*** [ittnotify_static.pico] Error code 1

The build log does not show "Removing stale dependencies".

The first two lines of the amd64 depend are:

$ head -2 ../obj/usr/home/ctuffli/dev/freebsd/freebsd.hg/amd64.amd64/lib/libomp/.depend.ittnotify_static.pico
ittnotify_static.pico: \
  /usr/home/ctuffli/dev/freebsd/freebsd.hg/contrib/llvm-project/openmp/runtime/src/thirdparty/ittnotify/ittnotify_static.c \

The first two lines of the 32-bit depend are:

$ head -2 ../obj/usr/home/ctuffli/dev/freebsd/freebsd.hg/amd64.amd64/obj-lib32/lib/libomp/.depend.ittnotify_static.pico
ittnotify_static.pico: \
  /usr/home/ctuffli/dev/freebsd/freebsd.hg/contrib/llvm-project/openmp/runtime/src/thirdparty/ittnotify/ittnotify_static.c \

--chuck
Re: nda(4) does not work (reliably) in VMware Workstation
On Sun, Dec 9, 2018 at 3:50 PM Chuck Tuffli wrote:
> On Sun, Dec 9, 2018 at 3:43 PM Yuri Pankov wrote:
>
>> Chuck Tuffli wrote:
>> > On Sat, Dec 8, 2018 at 12:28 PM Yuri Pankov <mailto:yur...@yuripv.net> wrote:
>> >
>> >     Hi,
>> >
>> >     Running -HEAD in VMware Workstation 15.0.2 VM. Trying to use nda(4)
>> >     instead of nvd(4) shows the following list of errors, and eventually
>> >     panics:
>> >
>> >     https://people.freebsd.org/~yuripv/nda1.png
>> >     https://people.freebsd.org/~yuripv/nda2.png
>> >
>> >     nvd(4) works without issues in this VM. nda(4) works as well in VMware
>> >     ESXi VMs. Is this a problem with WS NVMe emulation?
>> >
>> >
>> > Since I don't have access to ESXi, the attached is a speculative fix. If
>> > it works, I'll clean this up a bit and get it committed. If not, please
>> > post the output from:
>> > nvmecontrol identify nvme0
>>
>> Thank you, it seems to help (was seeing the issue previously immediately
>> after boot). BTW, the ESXi VM works fine with nda, it's only the
>> Workstation that had the problem. Comparing `nvmecontrol identify`
>> output from both, the only difference is (first is WS, second is ESXi):
>>
>> -Dataset Management Command: Not Supported
>> +Dataset Management Command: Supported
>>
>
> OK, that makes sense given the error message and the patch working. I'll
> get this cleaned up and committed. Thanks for the report!
>

Committed r342046 to address this.

--chuck
Re: nda(4) does not work (reliably) in VMware Workstation
On Sun, Dec 9, 2018 at 3:43 PM Yuri Pankov wrote:
> Chuck Tuffli wrote:
> > On Sat, Dec 8, 2018 at 12:28 PM Yuri Pankov <mailto:yur...@yuripv.net> wrote:
> >
> >     Hi,
> >
> >     Running -HEAD in VMware Workstation 15.0.2 VM. Trying to use nda(4)
> >     instead of nvd(4) shows the following list of errors, and eventually
> >     panics:
> >
> >     https://people.freebsd.org/~yuripv/nda1.png
> >     https://people.freebsd.org/~yuripv/nda2.png
> >
> >     nvd(4) works without issues in this VM. nda(4) works as well in VMware
> >     ESXi VMs. Is this a problem with WS NVMe emulation?
> >
> >
> > Since I don't have access to ESXi, the attached is a speculative fix. If
> > it works, I'll clean this up a bit and get it committed. If not, please
> > post the output from:
> > nvmecontrol identify nvme0
>
> Thank you, it seems to help (was seeing the issue previously immediately
> after boot). BTW, the ESXi VM works fine with nda, it's only the
> Workstation that had the problem. Comparing `nvmecontrol identify`
> output from both, the only difference is (first is WS, second is ESXi):
>
> -Dataset Management Command: Not Supported
> +Dataset Management Command: Supported
>

OK, that makes sense given the error message and the patch working. I'll get this cleaned up and committed. Thanks for the report!

--chuck
Re: nda(4) does not work (reliably) in VMware Workstation
On Sat, Dec 8, 2018 at 12:28 PM Yuri Pankov wrote:
> Hi,
>
> Running -HEAD in VMware Workstation 15.0.2 VM. Trying to use nda(4)
> instead of nvd(4) shows the following list of errors, and eventually
> panics:
>
> https://people.freebsd.org/~yuripv/nda1.png
> https://people.freebsd.org/~yuripv/nda2.png
>
> nvd(4) works without issues in this VM. nda(4) works as well in VMware
> ESXi VMs. Is this a problem with WS NVMe emulation?
>

Since I don't have access to ESXi, the attached is a speculative fix. If it works, I'll clean this up a bit and get it committed. If not, please post the output from:
nvmecontrol identify nvme0

--chuck

diff -r 1fbb2025b263 sys/cam/nvme/nvme_da.c
--- a/sys/cam/nvme/nvme_da.c	Sun Dec 09 21:53:45 2018 +
+++ b/sys/cam/nvme/nvme_da.c	Sun Dec 09 15:18:08 2018 -0800
@@ -798,7 +798,7 @@
 	disk->d_mediasize = (off_t)(disk->d_sectorsize * nsd->nsze);
 	disk->d_delmaxsize = disk->d_mediasize;
 	disk->d_flags = DISKFLAG_DIRECT_COMPLETION;
-//	if (cd->oncs.dsm) // XXX broken?
+	if ((cd->oncs >> NVME_CTRLR_DATA_ONCS_DSM_SHIFT) & NVME_CTRLR_DATA_ONCS_DSM_MASK)
 		disk->d_flags |= DISKFLAG_CANDELETE;
 	vwc_present = (cd->vwc >> NVME_CTRLR_DATA_VWC_PRESENT_SHIFT) &
 		NVME_CTRLR_DATA_VWC_PRESENT_MASK;
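To illustrate the shift-and-mask test the patch introduces, here is a small standalone sketch. It is not the nda(4) code: ONCS_DSM_SHIFT, ONCS_DSM_MASK and oncs_supports_dsm() are hypothetical stand-ins for the kernel's NVME_CTRLR_DATA_ONCS_DSM_* definitions, and the values assume that Dataset Management is bit 2 of the Identify Controller ONCS field.

```c
#include <stdint.h>

/*
 * Stand-in values for this sketch only, assuming Dataset Management is
 * bit 2 of the Optional NVM Command Support (ONCS) field.
 */
#define	ONCS_DSM_SHIFT	2
#define	ONCS_DSM_MASK	0x1

/*
 * Hypothetical helper: returns 1 if the controller advertises the
 * Dataset Management command in its ONCS field, 0 otherwise.
 */
static int
oncs_supports_dsm(uint16_t oncs)
{
	return ((oncs >> ONCS_DSM_SHIFT) & ONCS_DSM_MASK);
}
```

Under these assumptions, a controller that does not advertise Dataset Management (bit 2 clear) yields 0, so a caller gating DISKFLAG_CANDELETE on this result would leave the flag unset and deletes would not be sent to that device.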
Re: linux-c7 and opengl apps?
On Wed, Oct 3, 2018 at 2:39 AM Johannes Lundberg wrote:
> Hi
>
> Has anyone successfully run opengl apps with linux-c7?
>
> Linux opengl apps work great with linux-c6 on gpu < kabylake but
> linux-c6-dri does not include support for kabylake gpus.
> Linux glxinfo in c7 shows support for hardware rendering on kabylake but any
> attempt to run an opengl app results in application seg fault or other
> crash (I believe this is also the case with skylake gpus on linux-c7).
>
> Is there any way to run gdb on linux apps/core dumps?
>

I have a patch to generate core dumps that a Linux gdb can decode. It has had limited testing, but I'm happy to rebase + share it with you.

--chuck
Re: Announcing Emulex 10G Ethernet NIC driver availability
On Fri, Feb 10, 2012 at 12:03 AM, Daniel Braniss wrote:
>> On Tue, Feb 7, 2012 at 10:39 PM, Matthew Jacob wrote:
>> > Any plans for iscsi, fcoe?
>> >
>> >> Hi All,
>> >>
>> >> Please find the 10Gb Ethernet NIC driver for Emulex OneConnect
>> >> (BladeEngine) and Lancer family of network adapters at
>>
>> Yes, Emulex is working on a native FC/FCoE driver (initiator and/or
>> target) that should be ready for wider testing in the next 3-4 months.
>> Note this driver only supports the more recent devices such as the
>> 10GbE FCoE CNA and 16G FC HBA (i.e. LPe1600x).
>
> any plans for iSCSI?

There is definitely a desire but no commitments at this point.

---chuck
Re: Announcing Emulex 10G Ethernet NIC driver availability
On Tue, Feb 7, 2012 at 10:39 PM, Matthew Jacob wrote:
> Any plans for iscsi, fcoe?
>
>> Hi All,
>>
>> Please find the 10Gb Ethernet NIC driver for Emulex OneConnect
>> (BladeEngine) and Lancer family of network adapters at

Yes, Emulex is working on a native FC/FCoE driver (initiator and/or target) that should be ready for wider testing in the next 3-4 months. Note this driver only supports the more recent devices such as the 10GbE FCoE CNA and 16G FC HBA (i.e. LPe1600x).

---chuck