Re: Weird clock behaviour with current (amd64) kernel
This is likely somewhat similar to what I reported here:
http://mail-index.netbsd.org/current-users/2019/07/29/msg036293.html

tl;dr: weird clock behaviour on GCE micro instances. This at least
provides a nice easy testbed.

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
 -- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: Weird clock behaviour with current (amd64) kernel
On Sun, Aug 14, 2022 at 09:00:20AM +0700, Robert Elz wrote:
> Date: Sun, 14 Aug 2022 00:28:38 +0200
> From: Joerg Sonnenberger
> Message-ID:
>
>   | I'm more wondering about the LAPIC frequency here. That one is normally
>   | used to drive the clockintr and if that frequency is off, interrupt rate
>   | would be off too. Does the interrupt rate match HZ?
>
> That's a very good question, I never thought to check that, and should have.
> I will do later today, when I also test Michael's latest patch for HPET
> overflow.
>
> Thanks both.
>
> Do you (either of you, or anyone else) consider that what is happening
> here might be related to PR 43997? If it is, then this might not be
> quite so unimportant as I had been considering it.

PR 43997 is more of a bug in qemu than anything else. You cannot emulate
a 100Hz interrupt when your clock granularity for sleep is 10ms. The
best you can do is to catch up interrupts when you are too late, but
that has other problems. Qemu doesn't catch up, and so the emulated
interrupt effectively runs at 50Hz.

Linux (tickless kernel) has a clock granularity of ideally zero (in
reality limited by clock resolution and CPU speed), so you don't see
such a problem there. You can still get a discrepancy between sleep time
and wall clock time as these clocks run independently (starting with the
fact that you sleep for "at least" some interval), but that's a
different problem.

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
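[Editorial aside: Michael's qemu point can be sketched numerically. If the
host only wakes sleepers on 10 ms boundaries, and the emulator re-arms its
100 Hz timer only after each wakeup (no catch-up), each nominal 10 ms period
stretches to the next host boundary after it, i.e. 20 ms. This is a
hypothetical model, not qemu's actual code:]

```c
#include <stddef.h>

/*
 * Model: a guest timer with period period_us is re-armed after each
 * wakeup; the host wakes sleepers only at multiples of gran_us, and
 * "at least" semantics round each deadline up to the next boundary.
 * Returns the effective interrupt rate in Hz.
 */
static double
effective_rate(long period_us, long gran_us, int nticks)
{
	long now = 1;			/* start just after a host boundary */
	long start = now;

	for (int i = 0; i < nticks; i++) {
		now += 1;		/* tiny processing delay before re-arming */
		long deadline = now + period_us;
		/* wake at the first host boundary at or after the deadline */
		now = ((deadline + gran_us - 1) / gran_us) * gran_us;
	}
	return (double)nticks * 1e6 / (now - start);
}
```

[With period_us = 10000 and gran_us = 10000 this comes out at ~50 Hz; with
fine granularity (gran_us = 1) it stays at ~100 Hz, matching the tickless
behaviour Michael describes.]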
daily CVS update output
Updating src tree:
P src/share/man/man4/viocon.4
P src/share/man/man4/man4.amiga/amidisplaycc.4
P src/share/man/man9/bus_space.9
P src/sys/arch/arm/acpi/acpi_pci_machdep.c
P src/sys/arch/arm/acpi/acpipchb.c
P src/sys/arch/arm/fdt/pcihost_fdt.c
P src/sys/arch/evbarm/conf/GENERIC64
P src/sys/arch/macppc/macppc/disksubr.c
P src/sys/arch/x86/x86/bus_dma.c
P src/sys/dev/audio/audio.c
P src/sys/dev/audio/audiodef.h
P src/sys/dev/pci/pciconf.c
P src/sys/dev/virtio/viocon.c
P src/tests/dev/audio/audiotest.c
P src/tests/dev/cgd/t_cgdconfig.sh

Updating xsrc tree:

Killing core files:

Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  38466670 Aug 14 03:04 ls-lRA.gz
Re: Weird clock behaviour with current (amd64) kernel
Date: Sun, 14 Aug 2022 00:28:38 +0200
From: Joerg Sonnenberger
Message-ID:

  | I'm more wondering about the LAPIC frequency here. That one is normally
  | used to drive the clockintr and if that frequency is off, interrupt rate
  | would be off too. Does the interrupt rate match HZ?

That's a very good question, I never thought to check that, and should
have. I will do later today, when I also test Michael's latest patch
for HPET overflow.

Thanks both.

Do you (either of you, or anyone else) consider that what is happening
here might be related to PR 43997? If it is, then this might not be
quite so unimportant as I had been considering it.

kre
Re: Weird clock behaviour with current (amd64) kernel
On Sun, Aug 14, 2022 at 02:38:07AM +0700, Robert Elz wrote:
> To avoid delays in a message turnaround, this is what sysctl says is
> available (this output is from a normal boot, not the PCIDUMP one).
>
> kern.timecounter.choice = TSC(q=3000, f=3417601000 Hz)
> clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=3840 Hz)
> hpet0(q=2000, f=1920 Hz) ACPI-Fast(q=1000, f=3579545 Hz)
> i8254(q=100, f=1193182 Hz) dummy(q=-100, f=100 Hz)

I'm more wondering about the LAPIC frequency here. That one is normally
used to drive the clockintr and if that frequency is off, interrupt
rate would be off too. Does the interrupt rate match HZ?

Joerg
Re: Weird clock behaviour with current (amd64) kernel
mlel...@serpens.de (Michael van Elst) writes:

>In your case, you say it takes ~6 minutes between attachment and
>calibration and your hpet runs at 19.2MHz.

>This is enough for HPET_MCOUNT_LO to overflow.

This patch adds a separate delay of ~0.1 seconds to calibrate the
timers. This should avoid any overflow.

Index: sys/dev/ic/hpet.c
===
RCS file: /cvsroot/src/sys/dev/ic/hpet.c,v
retrieving revision 1.17
diff -p -u -r1.17 hpet.c
--- sys/dev/ic/hpet.c	16 May 2020 23:06:40 -	1.17
+++ sys/dev/ic/hpet.c	13 Aug 2022 21:24:58 -
@@ -54,8 +54,6 @@ static u_int hpet_get_timecount(struct t
 static bool	hpet_resume(device_t, const pmf_qual_t *);
 
 static struct hpet_softc *hpet0 __read_mostly;
-static uint32_t hpet_attach_val;
-static uint64_t hpet_attach_tsc;
 
 int
 hpet_detach(device_t dv, int flags)
@@ -147,14 +145,6 @@ hpet_attach_subr(device_t dv)
 	eval = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
 	val = eval - sval;
 	sc->sc_adj = (int64_t)val * sc->sc_period / 1000;
-
-	/* Store attach-time values for computing TSC frequency later. */
-	if (cpu_hascounter() && sc == hpet0) {
-		(void)cpu_counter();
-		val = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
-		hpet_attach_tsc = cpu_counter();
-		hpet_attach_val = val;
-	}
 }
 
 static u_int
@@ -214,33 +204,37 @@ uint64_t
 hpet_tsc_freq(void)
 {
 	struct hpet_softc *sc;
-	uint64_t td, val, freq;
-	uint32_t hd;
+	uint64_t td0, td, val, freq;
+	uint32_t hd0, hd;
 	int s;
 
 	if (hpet0 == NULL || !cpu_hascounter())
 		return 0;
 
-	/* Slow down if we got here from attach in under 0.1s. */
 	sc = hpet0;
-	hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
-	hd -= hpet_attach_val;
-	if (hd < (uint64_t)10 * 10 / sc->sc_period)
-		hpet_delay(10);
+
+	s = splhigh();
+	(void)cpu_counter();
+	(void)bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
+	hd0 = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
+	td0 = cpu_counter();
+	splx(s);
+
+	hpet_delay(10);
 
 	/*
 	 * Determine TSC freq by comparing how far the TSC and HPET have
-	 * advanced since attach time.  Take the cost of reading HPET
-	 * register into account and round result to the nearest 1000.
+	 * advanced and round result to the nearest 1000.
 	 */
 	s = splhigh();
 	(void)cpu_counter();
+	(void)bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
 	hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
 	td = cpu_counter();
 	splx(s);
-	hd -= hpet_attach_val;
-	val = ((uint64_t)hd * sc->sc_period - sc->sc_adj) / 1;
-	freq = (td - hpet_attach_tsc) * 1000 / val;
+
+	val = (uint64_t)(hd - hd0) * sc->sc_period / 1;
+	freq = (td - td0) * 1000 / val;
 
 	return rounddown(freq + 500, 1000);
 }
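[Editorial aside: the heart of the patched calculation — two paired
(HPET, TSC) samples, with the 32-bit HPET delta taken modulo 2^32 — can be
sketched in isolation. This is a hypothetical standalone function, not the
kernel code; the kernel version works in HPET periods rather than taking a
reference frequency directly:]

```c
#include <stdint.h>

/*
 * Two-sample frequency calibration: hd0/hd are 32-bit reference-counter
 * samples (counter frequency ref_hz), td0/td the TSC values read at the
 * same moments.  The uint32_t subtraction is modulo 2^32, so a single
 * wrap of the reference counter between the samples still yields the
 * correct delta; more than one wrap silently gives a wrong answer.
 */
static uint64_t
calib_freq(uint32_t hd0, uint64_t td0, uint32_t hd, uint64_t td,
    uint64_t ref_hz)
{
	uint32_t dref = hd - hd0;	/* wrap-safe for one overflow only */
	uint64_t dtsc = td - td0;

	/*
	 * elapsed seconds = dref / ref_hz, so freq = dtsc * ref_hz / dref
	 * (for ~0.1 s intervals the 64-bit product cannot overflow).
	 */
	return dtsc * ref_hz / dref;
}
```

[E.g. a 19.2 MHz counter advancing 1920000 counts (0.1 s) while the TSC
advances 341760100 gives 3417601000 Hz, and the same inputs with hd0/hd
straddling the 2^32 boundary give the identical answer. The removed
attach-time scheme failed precisely because its sampling interval could
exceed one wrap, violating the single-wrap assumption.]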
Re: HPE H240 in HBA mode
I have seen some somewhat official recommendations from HPE to leave
such controllers in RAID mode and to then create an individual RAID 0
target for each disk. It seems more complicated than is needed, but
it's what they said to do, and it did seem to work.

Cheers

On 13/08/22 19:35, os...@fessel.org wrote:
> Hej,
> while trying to get rid of the issues with the mfii driver when running
> xen, I popped in an H240 controller into my DL380. This works fine, but
> there seems to be no real driver for that card. When running in RAID
> mode, ciss claims this device and works. But I want to avoid double
> raid overhead (this runs zfs), so I configured the controller to HBA
> mode. Obviously, the ciss driver now does not recognize the connected
> drives:
>
> [ 1.03] ciss1 at pci12 dev 0 function 0: HP Smart Array 10
> [ 1.03] ciss1: interrupting at msix6 vec 0
> [ 1.03] ciss1: 0 LDs, HW rev 1, FW 7.00/7.00, 64bit fifo rro, method perf 0x2005
> [ 1.03] scsibus2 at ciss1: 0 targets, 1 lun per target
>
> Looks to me like ciss only operates with the HP virtual disks. Is
> there a driver for HBA mode on these cards?
>
> Cheers
> Oskar
Re: Weird clock behaviour with current (amd64) kernel
On Sun, Aug 14, 2022 at 02:38:07AM +0700, Robert Elz wrote:
> Date: Sat, 13 Aug 2022 17:41:05 +0200
> From: Michael van Elst
> Message-ID:
>
>   | If you boot the kernel in debug mode (netbsd -x),
>
> I did.
>
>   | you may see output like:
>
> which was:
>
> [ 1.03] cpu0: TSC freq CPUID 341760 Hz
> [ 1.03] cpu0: TSC freq from CPUID 341760 Hz
> [ 1.064451] xhci0: hcc2=0x1fd
> [ 1.064451] xhci3: hcc2=0xfd
> [ 1.064451] cpu0: TSC freq from HPET 9007294000 Hz
> [ 1.064451] cpu0: TSC freq CPUID 341760 Hz
> [ 1.064451] cpu0: TSC freq calibrated 9007294000 Hz

So it's the HPET calibration that goes wrong.

The calibration works like:

Fetch hpet and tsc at attach time.

	(void)cpu_counter();
	val = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
	hpet_attach_tsc = cpu_counter();
	hpet_attach_val = val;

When calibrating, make sure that hpet has counted for at least 0.1
seconds:

	hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
	hd -= hpet_attach_val;
	if (hd < (uint64_t)10 * 10 / sc->sc_period)
		hpet_delay(10);

Fetch hpet and tsc again:

	s = splhigh();
	(void)cpu_counter();
	hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
	td = cpu_counter();
	splx(s);

Compute tsc frequency from hpet frequency:

	hd -= hpet_attach_val;
	val = ((uint64_t)hd * sc->sc_period - sc->sc_adj) / 1;
	freq = (td - hpet_attach_tsc) * 1000 / val;

	return rounddown(freq + 500, 1000);

In your case, you say it takes ~6 minutes between attachment and
calibration and your hpet runs at 19.2MHz. This is enough for
HPET_MCOUNT_LO to overflow.

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
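[Editorial aside: to make the overflow concrete (my arithmetic, not from the
mail): a free-running 32-bit counter at 19.2 MHz wraps in 2^32 / 19.2e6
seconds, i.e. about 223 s — under 4 minutes — so a ~6 minute
attach-to-calibration gap guarantees at least one wrap of HPET_MCOUNT_LO:]

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit counter at counter_hz wraps. */
static uint64_t
wrap_seconds(uint64_t counter_hz)
{
	/* 2^32 counts divided by counts per second */
	return (1ULL << 32) / counter_hz;
}
```

[wrap_seconds(19200000) is 223, so the single-wrap assumption behind the
32-bit delta fails long before the PCI_CONFIG_DUMP pause ends, which is why
the computed TSC frequency comes out wildly wrong.]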
Re: Weird clock behaviour with current (amd64) kernel
Date: Sat, 13 Aug 2022 17:41:05 +0200
From: Michael van Elst
Message-ID:

  | If you boot the kernel in debug mode (netbsd -x),

I did.

  | you may see output like:

which was:

[ 1.03] cpu0: TSC freq CPUID 341760 Hz
[ 1.03] cpu0: TSC freq from CPUID 341760 Hz
[ 1.064451] xhci0: hcc2=0x1fd
[ 1.064451] xhci3: hcc2=0xfd
[ 1.064451] cpu0: TSC freq from HPET 9007294000 Hz
[ 1.064451] cpu0: TSC freq CPUID 341760 Hz
[ 1.064451] cpu0: TSC freq calibrated 9007294000 Hz
[ 1.972808] cpu1: TSC skew=4 drift=0
[ 1.993656] cpu2: TSC skew=0 drift=0
[ 2.060514] cpu3: TSC skew=81 drift=-16
[ 2.187906] cpu4: TSC skew=-61 drift=4
[ 2.213693] cpu5: TSC skew=0 drift=0
[ 2.289145] cpu6: TSC skew=0 drift=0
[ 2.415347] cpu7: TSC skew=0 drift=0
[ 2.441701] cpu8: TSC skew=-65 drift=4
[ 2.557519] cpu9: TSC skew=0 drift=0
[ 2.583039] cpu10: TSC skew=-2 drift=-2
[ 2.705257] cpu11: TSC skew=68 drift=-127
[ 2.731613] cpu12: TSC skew=0 drift=0
[ 2.799767] cpu13: TSC skew=0 drift=0
[ 2.875167] cpu14: TSC skew=0 drift=0
[ 2.944878] cpu15: TSC skew=-51 drift=49
[ 3.074237] cpu16: TSC skew=-28 drift=2
[ 3.100594] cpu17: TSC skew=-27 drift=-2
[ 3.219004] cpu18: TSC skew=-27 drift=-1
[ 3.244362] cpu19: TSC skew=-31 drift=6
[ 3.365815] cpu20: TSC skew=-9 drift=0
[ 3.391221] cpu21: TSC skew=-5 drift=0
[ 3.512663] cpu22: TSC skew=-33 drift=4
[ 3.538003] cpu23: TSC skew=-3 drift=-1
[ 3.654063] timecounter: Timecounter "TSC" frequency 9007294000 Hz quality 3000

(ignore the xhci lines, they just matched the grep pattern "TSC").

A normal boot (no PCIDUMP) ends with:

[ 1.678054] timecounter: Timecounter "TSC" frequency 3417601000 Hz quality 3000

jacaranda$ bc
scale=3
9007294000/341760
2.635

which is almost exactly the slowdown I was seeing in user mode once it
booted.

Note that aside from time running slowly, everything else seems to work
fine.
After switching to hpet0 (after having booted as above) time advances
at the proper rate again, but internal delays (as in that needed for
"sleep 1" to sleep for a second) keep running at the slow rate.

In the dmesg, everything that happens at timestamp 1.064451 is while
the PCI CONFIG DUMP code is running - that lasts about 6 minutes, maybe
a bit more with -x (I didn't time it this time), during which the
timestamps don't advance. The timestamps that happen after that are
running on slow time (a 1 second increment is really about 2.6
seconds).

  | "from delay" is the first calibration against the i8254 timer.

I don't have any "from delay" output.

  | The patch should improve the accuracy of the "from delay" value.

In that case it is no surprise that nothing changed much. I am guessing
that that line would appear without -x as well, as the other ones do -
I don't see a "from delay" line in any dmesg output I have saved
(unless that was something new you just added, and does only appear
with -x - in which case it simply wasn't invoked on my system).

  | It's also the only place that could have been influenced by e.g. console
  | output.

Clearly there's some other place; the only difference between the
kernel that has this problem and one that doesn't is the PCIDUMP option
being enabled:

jacaranda$ (command cd ~/src/sys-conf; diff JACARANDA JACARANDA_PCI_DUMP)
207c207
< #options 	PCI_CONFIG_DUMP	# verbosely dump PCI config space
---
> options 	PCI_CONFIG_DUMP	# verbosely dump PCI config space
jacaranda$

  | If you have a working HPET,

I do, that one works fine in general.

  | the second calibration should be better.
  | Here it always returns exactly the same number.

See above.

To avoid delays in a message turnaround, this is what sysctl says is
available (this output is from a normal boot, not the PCIDUMP one).
kern.timecounter.choice = TSC(q=3000, f=3417601000 Hz)
clockinterrupt(q=0, f=100 Hz) lapic(q=-100, f=3840 Hz)
hpet0(q=2000, f=1920 Hz) ACPI-Fast(q=1000, f=3579545 Hz)
i8254(q=100, f=1193182 Hz) dummy(q=-100, f=100 Hz)

As I have said before, this is most probably an exceedingly unimportant
issue, not worth spending a lot of time on. That is, unless perhaps it
is exposing some other bug that might reveal itself in some other
situations.

kre
Re: specfs/spec_vnops.c diagnostic assertion panic
Date: Sat, 13 Aug 2022 14:44:46 +
From: Taylor R Campbell
Message-ID: <20220813144453.af03d60...@jupiter.mumble.net>

  | When _userland_ opens the raw /dev/rdkN _character_ device, for a
  | wedge on (say) raid0, the _kernel_ will do the equivalent of opening
  | the /dev/raid0 _block_ device.

Oh yes, sorry, I saw that in your earlier messages, and somehow failed
to take any notice...

  | The patch to spec_vnops.c is necessary to make spec_open gracefully
  | return EBUSY instead of crashing the kernel when this happens.

I have no problem with that patch; my question was more whether I
should keep it removed in the kernels I am running (I can do, no
problem with that, if it might help).

kre
Re: Weird clock behaviour with current (amd64) kernel
Date: Sun, 7 Aug 2022 09:17:52 - (UTC)
From: mlel...@serpens.de (Michael van Elst)
Message-ID:

  | Does this help ?
  |
  | Index: sys/arch/x86/x86/cpu.c

Finally got a chance to test that patch properly - sorry for the delay.

The result: "not much", if anything at all.

Running that system (after booting with PCIDUMP enabled) and using TSC
as the clock source, a

	while sleep 1; do date; done

loop prints the date (which increases a second at a time) about every
2.6 seconds (previously it was about 3 - but that was a less precise
measurement).

After switching to hpet0 (no reboot) the same loop runs at about the
same speed, but the date (time really) printed is advancing in
accordance with clock time - successive times reported differ by 2 or 3
seconds.

I have the dmesg from the boot (I used -x for this one) if that might
help (or some parts of it perhaps - the file is just a little under
1.5MB).

kre
Re: Weird clock behaviour with current (amd64) kernel
On Sat, Aug 13, 2022 at 10:15:30PM +0700, Robert Elz wrote:
>
> The result: "not much" if anything at all.

If you boot the kernel in debug mode (netbsd -x), you may see output
like:

[ 1.03] cpu0: TSC freq from delay 2521276800 Hz

maybe also something like:

[ ] cpu0: TSC freq from CPUID XX Hz

[ 1.057594] cpu0: TSC freq from HPET 2491906000 Hz
[ 1.957885] cpu1: TSC skew=8 drift=0
[ 2.014612] cpu2: TSC skew=34 drift=4
[ 2.181611] cpu3: TSC skew=34 drift=4
[ 2.291306] timecounter: Timecounter "TSC" frequency 2491906000 Hz quality 3000

"from delay" is the first calibration against the i8254 timer.
"from CPUID" is a value that the CPU reports.
"from HPET" is a second calibration against the HPET timer.

The patch should improve the accuracy of the "from delay" value. It's
also the only place that could have been influenced by e.g. console
output.

If you have a working HPET, the second calibration should be better.
Here it always returns exactly the same number.

Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: specfs/spec_vnops.c diagnostic assertion panic
> Date: Sat, 13 Aug 2022 19:47:54 +0700
> From: Robert Elz
>
> However, since we now know the issue we have been looking at does involve
> the raw devices, not the block ones, I'm not sure what is the point of
> reverting that spec_vnops.c patch, which only affects the block device
> open. If that is needed, we might as well keep it, right? It shouldn't
> affect current testing either way.

When _userland_ opens the raw /dev/rdkN _character_ device, for a wedge
on (say) raid0, the _kernel_ will do the equivalent of opening the
/dev/raid0 _block_ device. All I/O by userland through /dev/rdkN goes
through the block device in the kernel.

Normally, the paths in the kernel for open/close on /dev/rdkN arrange
to open the block device only once at a time, and serialize the opening
and closing of the block device under a lock -- well, except they
_don't_ serialize closing the block device under that lock.

So if, say, fsck opens and closes /dev/rdkN, and dkctl opens /dev/rdkM
at about the same time, dkctl might race to open the block device (in
the kernel) before fsck has finished closing it (again, in the kernel).
That's the race that the patch to dk.c avoids.

The patch to spec_vnops.c is necessary to make spec_open gracefully
return EBUSY instead of crashing the kernel when this happens. The
patch to dk.c is necessary to serialize the /dev/rdkN open/close logic
so that it never hits this case at all when opening the raid0 block
device -- and thus never spuriously fails with EBUSY _or_ crashes the
kernel.
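[Editorial aside: the serialization Taylor describes can be illustrated
generically. This is a userland sketch with a pthread mutex standing in for
the kernel lock — illustrative names, not the actual dk.c code. The bug
class is an open/close pair where only the open path holds the lock; the
fix is to put the close under the same lock, so a racing open never sees a
half-closed parent device:]

```c
#include <pthread.h>

/* A shared parent (block) device opened on behalf of several wedges. */
struct parent {
	pthread_mutex_t lock;
	int opens;		/* wedges currently holding the parent open */
};

static void
wedge_open(struct parent *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->opens++ == 0) {
		/* first opener would really open the block device here */
	}
	pthread_mutex_unlock(&p->lock);
}

static void
wedge_close(struct parent *p)
{
	pthread_mutex_lock(&p->lock);	/* the fix: close under the lock too */
	if (--p->opens == 0) {
		/* last closer would really close the block device here */
	}
	pthread_mutex_unlock(&p->lock);
}
```

[With the close outside the lock, a concurrent wedge_open could run between
the decrement and the actual device close and observe a parent that is
neither open nor closed — the EBUSY/panic window described above.]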
Re: specfs/spec_vnops.c diagnostic assertion panic
Turns out that I was misled by cvs diff ... I did that on the kernel
after the previous crash, just to verify that nothing unexpected had
happened, but forgot that that diffs the checked out version against
the version it was checked out from. When I saw no diffs for
spec_vnops.c I jumped to the conclusion that it had been updated to
1.213 by my earlier update (I use a script, which has the ability to
use -A, though I didn't knowingly cause that to happen). I should have
actually checked the file, and now have.

The currently running kernel has 1.212 of that file, the dk.c "move the
unlocks" patch, plus Michael's TSC calibration patch, and mine to allow
opens of wd drives with no sectors, just the 'd' partition. Apart from
those it is HEAD as of an hour or two ago.

No problems booting (which doesn't really mean a lot, but I will keep
running this one - and/or the same thing with PCIDUMP enabled to check
the effect of Michael's patch), until something newer is needed.

However, since we now know the issue we have been looking at does
involve the raw devices, not the block ones, I'm not sure what is the
point of reverting that spec_vnops.c patch, which only affects the
block device open. If that is needed, we might as well keep it, right?
It shouldn't affect current testing either way.

kre
Re: specfs/spec_vnops.c diagnostic assertion panic
OK, ignore the previous crash - whatever it was, either it was some
weird one-off, or it was something unrelated that has been fixed in the
past 18 hours (and caused in the period not long before that).

I did as I said: updated the src tree again, backed out your dk.c
patch, and rebuilt the kernel. But it appears that somehow while doing
that I lost the spec_vnops.c version backout, so I will repeat that
again in a while. That version worked fine.

I reapplied the dk.c patch, verified that only dk.o and vers.o were
recompiled in the kernel build, and tested that one. That's what I am
running now (ie: no issues).

Now I need to try again with the older spec_vnops.c (will take a while
before I can get to that, but in a few hours).

kre
Re: specfs/spec_vnops.c diagnostic assertion panic
Date: Sat, 13 Aug 2022 12:10:46 - (UTC)
From: mlel...@serpens.de (Michael van Elst)
Message-ID:

  | That panic should be fixed by now, it was an inverted assertion.

OK, thanks - I did see it gone in my latest test (the message about
which was delayed getting to mail.netbsd.org) - but nice to know it was
a known issue and fixed, so I can simply trash that kernel binary and
don't need to investigate.

Things are now in a state, I believe, where I should be able to boot a
PCIDUMP kernel again, and properly test your patch... I will do that
soon.

kre
Re: specfs/spec_vnops.c diagnostic assertion panic
k...@munnari.oz.au (Robert Elz) writes:

>vpanic()
>kern_assert()
>_bus_dmamem_unmap.constprop.0() at +0x157

That panic should be fixed by now; it was an inverted assertion.
Re: specfs/spec_vnops.c diagnostic assertion panic
Date: Fri, 12 Aug 2022 23:35:26 +
From: Taylor R Campbell
Message-ID: <20220812233531.8c22560...@jupiter.mumble.net>

  | Can you try _reverting_ specfs_blockopen.patch, and _applying_ the
  | attached dkopenclose.patch, and see if you can reproduce any crash?

OK, I put spec_vnops.c back to 1.212 and applied that patch, and yes,
getting a crash from that is easy - but it isn't in any way related to
specfs that I can see.

Further, this one happens early - very early - too early for the kernel
to have attached my USB keyboard, so I cannot interact with ddb at all.
But I do think I managed to see the full backtrace from the "command on
enter" stuff that happens (but it didn't get as far as the register
dump); the actual panic message was lost, but it is certainly a KASSERT
failing.

I should also note that between the last kernel build and this one, I
had updated my src tree, and I see that there were some autoconf
changes applied in that, so it is possible that it isn't your patch
that caused the problem.

I am going to undo that, and build a new kernel (with an even more
updated src tree, not that I see any changes today that are likely to
matter - the audio changes cannot be related) and see what happens. If
that works, I will apply your patch again, so that is the only change
being made (I will leave spec_vnops.c at 1.212 through all of this, and
temporarily simply ignore any crash that looks like the fsck_ffs/dkctl
race issue for now).
The kernel stack trace (with most details omitted, though I have a
photo which shows it all):

vpanic()
kern_assert()
_bus_dmamem_unmap.constprop.0() at +0x157
nvme_dmamem_free() at +0x2c
nvme_attach() at +0x4e3
nvme_pci_attach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
pciattach()
config_attach_internal()
config_found()
ppbattach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
pciattach()
config_attach_internal()
config_found()
ppbattach()
config_attach_internal()
config_found()
pci_probe_device()
pci_enumerate_bus()
pcirescan()
ppciattach()
config_attach_internal()
config_found()
mp_pci_scan()
amd64_mainbus_attach()
config_attach_internal()
config_rootfound()
cpu_configure()
main()

Note that I had to piece that together from the msgbuf stacktrace that
the panic prints, and the bt ddb command that ddb runs when entered,
and because of that (and the repetitive nature of some of it) it is
entirely possible that some of the frames listed above are duplicates.
There are definitely at least 2 instances of
pciattach()/pcirescan()/pci_enumerate_bus() in the stacktrace (those
are visible in both the dmesg and bt stack traces) - but I am unable to
tell if they are the same frames or not (actually, I cannot be certain
there aren't more instances of that group of stack frames that don't
appear on the screen at all). The two ends of the trace will be correct
however, just not necessarily all that is in the middle.

More later after I do some more tests.

kre
HPE H240 in HBA mode
Hej,

while trying to get rid of the issues with the mfii driver when running
xen, I popped in an H240 controller into my DL380. This works fine, but
there seems to be no real driver for that card.

When running in RAID mode, ciss claims this device and works. But I
want to avoid double raid overhead (this runs zfs), so I configured the
controller to HBA mode. Obviously, the ciss driver now does not
recognize the connected drives:

[ 1.03] ciss1 at pci12 dev 0 function 0: HP Smart Array 10
[ 1.03] ciss1: interrupting at msix6 vec 0
[ 1.03] ciss1: 0 LDs, HW rev 1, FW 7.00/7.00, 64bit fifo rro, method perf 0x2005
[ 1.03] scsibus2 at ciss1: 0 targets, 1 lun per target

Looks to me like ciss only operates with the HP virtual disks. Is there
a driver for HBA mode on these cards?

Cheers
Oskar