Re: [Bug 204376] Cavium ThunderX system heavily loaded while at db> prompt
On Mon, 9 Nov 2015 bugzilla-nore...@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376
>
> --- Comment #2 from Conrad E. Meyer ---
> If ARM is anything like amd64, it just spinwaits in IPI_STOP (waiting for the CPU to be re-enabled).
>
> On amd64 you could reduce it to 2 CPUs spinning pretty easily (hlt any non-panic and non-BSP core -- they'll never be needed until reboot). But that still leaves 2 CPUs spinning.
>
> The patch attempted to hlt all non-panic CPUs in IPI_STOP, but leave interrupts enabled so they could be woken again. This does Not Work Well in panic context (I forget the details, but if you've panicked you really don't want normal interrupt code running on the non-ddb CPU(s)).

Enabling normal interrupts breaks ddb context too.

ddb is already broken in restarting other CPUs when it single steps. This usually enables interrupts on other CPUs (if not the current one), so the state might be completely different after you step a single instruction. The same can happen in normal operation for unlocked state, but more so here: in normal operation the single instruction runs in a few cycles, but for single stepping it takes thousands or millions of cycles in real time (and the other CPUs run for many of those cycles after they are restarted, before they are stopped again). It is inconvenient for the state that you are trying to debug to change much.

Bruce
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
Re: [Bug 204376] Cavium ThunderX system heavily loaded while at db> prompt
On Sun, 8 Nov 2015 comment-igno...@freebsd.org wrote:

> --- Comment #1 from NGie Cooper ---
> It's not just arm64; amd64 does/did a horrible job at yielding when in the debugger (part of the reason why we have a script which goes and suspends test VMs at $work if/when they panic).

I thought it worked OK since it uses cpu_pause(). But Haswell runs much hotter with cpu_pause() than with an empty loop. monitor+mwait works better. For normal idling, here it runs about 10 degrees C hotter than ACPI C2. Loops in userland without cpu_pause() (foo: jmp foo) run 25-30 degrees hotter. Loops in userland with cpu_pause() (foo: pause; jmp foo) run 25-35 degrees hotter. ddb is about the same as this. But normal idling with cpu_pause() runs 40-45 degrees hotter. So ddb saves power compared with misconfigured normal idling :-).

> Conrad had a patch out for amd64 a few months ago which yielded in the debugger a bit on amd64, but IIRC there were issues at the time. I'll let him comment on it though. It would be nice if dropping into the debugger didn't spin all the CPUs at ~100% though.

CPUs stopped by the debugger or anything else cannot do any normal yielding or interrupt handling. Perhaps they can halt with interrupts disabled and depend on an NMI to wake them up. NMIs are hard to program, especially near kdb, but stopping CPUs already depends on NMIs to promptly stop ones that are spinning with interrupts disabled. When a CPU is stopped, it must not handle any NMIs, but just let them kick it out of a halt, and halt again unless the start/stop masks allow it to proceed.

I think all console drivers are too stupid to use cpu_pause() in spinloops waiting for input, so sitting at the debugger prompt burns at least 1 CPU, but since cpu_pause() apparently doesn't work very well it might not make much difference.

cpu_pause() seemed to help when I used it to make the busy-waiting in DEVICE_POLLING in idle less harmful. Perhaps it works better in i/o loops. That might help the console drivers' i/o loops too. But userland tests show that doing a slow ISA i/o is cool enough by itself (perhaps a little cooler than the dumb spinloop, and now adding the pause makes little difference). It seems reasonable to expect the CPU to mostly shut itself down when waiting 5000+ cycles for slow i/o if neither is emulated, and to blame the emulation if it uses too many or too few cycles to emulate this.

The normal idle spinloop must be doing something stupid to run so much hotter. It calls sched_runnable() in a loop. This obviously uses more CPU resources than "foo: jmp foo", but in other tests I never saw much difference caused by the instruction mix. Apparently sched_runnable() uses lots of CPU resources and pausing probably makes little difference. Normal idle needs to wake up fast so it needs to check a lot, but even the function call for this is wasteful. Stopped CPUs don't need to restart so fast (except for faster tracing in ddb -- it is already about 1000 times too slow).

Bruce
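For anyone wanting to repeat the userland comparison above, here is a minimal sketch of the two loops in C, bounded so that they terminate. The function names (spin_hint, spin_plain, spin_with_pause) are made up for illustration and are not from any FreeBSD source; the inline asm assumes a GCC- or Clang-compatible compiler, and on non-x86 targets the hint degrades to a plain compiler barrier.

```c
#include <stdint.h>

/*
 * One spin-wait hint. On x86 this emits the "pause" instruction; on
 * other architectures it is only a compiler barrier (an assumption made
 * so the sketch still builds there).
 */
static inline void
spin_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("pause" ::: "memory");
#else
	__asm__ __volatile__("" ::: "memory");
#endif
}

/* Bounded stand-in for "foo: jmp foo": nothing but the loop itself. */
static uint64_t
spin_plain(uint64_t iterations)
{
	volatile uint64_t n = 0;	/* volatile keeps the loop from being optimized away */

	while (n < iterations)
		n++;
	return (n);
}

/* Bounded stand-in for "foo: pause; jmp foo". */
static uint64_t
spin_with_pause(uint64_t iterations)
{
	volatile uint64_t n = 0;

	while (n < iterations) {
		spin_hint();
		n++;
	}
	return (n);
}
```

Calling each function with a large iteration count from a small test program while watching the package temperature would approximate the experiment; the sketch itself measures nothing and makes no claim about which loop runs hotter.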
[Bug 204376] Cavium ThunderX system heavily loaded while at db> prompt
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376

--- Comment #2 from Conrad E. Meyer ---
If ARM is anything like amd64, it just spinwaits in IPI_STOP (waiting for the CPU to be re-enabled).

On amd64 you could reduce it to 2 CPUs spinning pretty easily (hlt any non-panic and non-BSP core -- they'll never be needed until reboot). But that still leaves 2 CPUs spinning.

The patch attempted to hlt all non-panic CPUs in IPI_STOP, but leave interrupts enabled so they could be woken again. This does Not Work Well in panic context (I forget the details, but if you've panicked you really don't want normal interrupt code running on the non-ddb CPU(s)).

--
You are receiving this mail because:
You are the assignee for the bug.
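The spinwait described in Comment #2 can be modelled in userland roughly as follows. This is a sketch only, not the kernel's stop handler: started_cpus_model, wait_hint_fn and park_until_restarted are hypothetical names, and the poll bound exists purely so that the example terminates. The point it illustrates is that the cost of a stopped CPU depends on what the wait hint does between polls (nothing, a pause-style hint, or a real halt that needs some wakeup source).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the set of CPUs that have been told to resume. */
static _Atomic uint64_t started_cpus_model;

/* Hypothetical hook run between polls, e.g. a pause or halt instruction. */
typedef void (*wait_hint_fn)(void);

/*
 * Model of a CPU parked by a stop IPI. It polls the "started" mask until
 * its own bit is set, then clears that bit and returns. max_polls bounds
 * the loop for demonstration; a real stopped CPU would wait indefinitely.
 * Returns the number of unsuccessful polls performed.
 */
static unsigned long
park_until_restarted(unsigned cpu, unsigned long max_polls, wait_hint_fn hint)
{
	uint64_t bit;
	unsigned long polls = 0;

	assert(cpu < 64);		/* the model mask only holds 64 CPUs */
	bit = UINT64_C(1) << cpu;

	while (polls < max_polls) {
		if (atomic_load_explicit(&started_cpus_model,
		    memory_order_acquire) & bit) {
			/* Consume the restart request for this CPU. */
			atomic_fetch_and_explicit(&started_cpus_model, ~bit,
			    memory_order_release);
			break;
		}
		if (hint != NULL)
			hint();		/* with NULL this is a pure busy-wait */
		polls++;
	}
	return (polls);
}
```

With a NULL hint every parked CPU busy-polls at full speed, which is the behaviour the bug report complains about. Substituting a halt for the hint only works if something is guaranteed to wake the CPU again, which is where the interrupt-handling concerns in the comment come in.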
[Bug 204376] Cavium ThunderX system heavily loaded while at db> prompt
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376

NGie Cooper changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |c...@freebsd.org,
                   |                            |n...@freebsd.org

--- Comment #1 from NGie Cooper ---
It's not just arm64; amd64 does/did a horrible job at yielding when in the debugger (part of the reason why we have a script which goes and suspends test VMs at $work if/when they panic).

Conrad had a patch out for amd64 a few months ago which yielded in the debugger a bit on amd64, but IIRC there were issues at the time. I'll let him comment on it though. It would be nice if dropping into the debugger didn't spin all the CPUs at ~100% though.
[Bug 204378] xhci fails on Cavium ThunderX
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204378

            Bug ID: 204378
           Summary: xhci fails on Cavium ThunderX
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ema...@freebsd.org

ThunderX XHCI requires MSI

I tried booting with:

hw.usb.xhci.use_polling=0
hw.usb.xhci.debug=255

and I see:

xhci0: mem 0x8680-0x8680001f,0x86800020-0x8680002f at device 16.0 on pci0
xhci_init:
xhci_init: CAPLENGTH=0x20
xhci_init: RUNTIMEOFFSET=0x440
xhci_init: DOOROFFSET=0x480
xhci_init: xHCI version = 0x0100
xhci_init: HCS0 = 0x0220f665
xhci0: 64 bytes context size, 64-bit DMA
xhci_init: Max slots: 64
xhci_init: HCS2=0x0cf1
xhci_init: Max scratch: 1
xhci0: Could not allocate IRQ
xhci_halt_controller:
device_attach: xhci0 attach returned 6
xhci0: mem 0x8690-0x8690001f,0x86900020-0x8690002f at device 17.0 on pci0
xhci_init:
xhci_init: CAPLENGTH=0x20
xhci_init: RUNTIMEOFFSET=0x440
xhci_init: DOOROFFSET=0x480
xhci_init: xHCI version = 0x0100
xhci_init: HCS0 = 0x0220f665
xhci0: 64 bytes context size, 64-bit DMA
xhci_init: Max slots: 64
xhci_init: HCS2=0x0cf1
xhci_init: Max scratch: 1
xhci0: Could not allocate IRQ
xhci_halt_controller:
device_attach: xhci0 attach returned 6

I tried booting with xhci polling:

set hw.usb.xhci.use_polling=1
set hw.usb.xhci.debug=255

and I see:

xhci0: mem 0x8680-0x8680001f,0x86800020-0x8680002f at device 16.0 on pci0
xhci_init:
xhci_init: CAPLENGTH=0x20
xhci_init: RUNTIMEOFFSET=0x440
xhci_init: DOOROFFSET=0x480
xhci_init: xHCI version = 0x0100
xhci_init: HCS0 = 0x0220f665
xhci0: 64 bytes context size, 64-bit DMA
xhci_init: Max slots: 64
xhci_init: HCS2=0x0cf1
xhci_init: Max scratch: 1
xhci0: Could not allocate IRQ
xhci0: Interrupt polling at 1000Hz
xhci_interrupt: real interrupt (status=0x0001)
xhci_interrupt: host controller halted
xhci_halt_controller:
xhci_start_controller:
xhci_start_controller: CONFIG=0x -> 0x0040
xhci_start_controller: ERSTSZ=0x -> 0x0001
xhci_start_controller: ERDP(0)=0x07600080
xhci_start_controller: ERSTBA(0)=0x0760
xhci_start_controller: CRCR=0x07600d80
usbus0 on xhci0
xhci0: usbpf: Attached
random: harvesting attach, 8 bytes (4 bits) from usbus0
random: harvesting attach, 8 bytes (4 bits) from xhci0
xhci1: mem 0x8690-0x8690001f,0x86900020-0x8690002f at device 17.0 on pci0
xhci_init:
xhci_init: CAPLENGTH=0x20
xhci_init: RUNTIMEOFFSET=0x440
xhci_init: DOOROFFSET=0x480
xhci_init: xHCI version = 0x0100
xhci_init: HCS0 = 0x0220f665
xhci1: 64 bytes context size, 64-bit DMA
xhci_init: Max slots: 64
xhci_init: HCS2=0x0cf1
xhci_init: Max scratch: 1
xhci1: Could not allocate IRQ
xhci1: Interrupt polling at 1000Hz
xhci_interrupt: real interrupt (status=0x0001)
xhci_interrupt: host controller halted
xhci_halt_controller:
xhci_start_controller:
xhci_start_controller: CONFIG=0x -> 0x0040
xhci_start_controller: ERSTSZ=0x -> 0x0001
xhci_start_controller: ERDP(0)=0x066cd080
xhci_start_controller: ERSTBA(0)=0x066cd000
xhci_start_controller: CRCR=0x066cdd80
usbus1 on xhci1
xhci1: usbpf: Attached
random: harvesting attach, 8 bytes (4 bits) from usbus1
random: harvesting attach, 8 bytes (4 bits) from xhci1

and then every few seconds:

xhci_set_hw_power:
xhci_set_hw_power:
xhci_set_hw_power:
xhci_set_hw_power:
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0001
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0001
xhci_roothub_exec: UR_GET_STATUS i=1
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0001
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: UR_GET_STATUS i=1
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0002
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: UR_GET_STATUS i=2
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0002
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: UR_GET_STATUS i=2
xhci_roothub_exec: UR_GET_STATUS i=1
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0001
xhci_roothub_exec: type=0xa3 request=0x00 wLen=0x0004 wValue=0x wIndex=0x0002
xhci_roothub_exec: UR_GET_STATUS i=1
xhci_roothub_exec: UR_GET_STATUS i=2
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: port status=0x02a0
xhci_roothub_exec: type=0xa3 request=0x00
[Bug 204376] Cavium ThunderX system heavily loaded while at db> prompt
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376

Ed Maste changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|System heavily loaded while |Cavium ThunderX system
                   |at db> prompt               |heavily loaded while at db>
                   |                            |prompt
[Bug 204376] System heavily loaded while at db> prompt
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376

Ed Maste changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|Any                         |arm64
[Bug 204376] System heavily loaded while at db> prompt
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204376

            Bug ID: 204376
           Summary: System heavily loaded while at db> prompt
           Product: Base System
           Version: 11.0-CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: ema...@freebsd.org

I don't have a way to directly quantify this, but have observed it as a side-effect.

I have a Cavium ThunderX 96-core system beside me in a hotel room. While operating normally the system fans are reasonably quiet, but when the system panics they start increasing in speed as the system heats up. After a short while they are producing enough noise that I cannot leave the system running, ending my debug session.
[Bug 204326] [patch] sys/dev/oce bugged promiscuous mode
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204326

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-...@freebsd.org
[Bug 204358] zfs loader zfs_probe_args secsz is too small, causing memory corruption
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204358

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd...@freebsd.org
           Keywords|                            |patch
[Bug 204371] lib/libmd/sha1c.c - silence a compiler warning
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204371

Mark Linimon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch