re: panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed

2023-10-16 Thread matthew green
> #3  0x80fe6e5f in kern_assert ()
> #4  0x8058be67 in bus_dmamap_sync ()
> #5  0x8044edc7 in rge_rxeof ()
> #6  0x804536fd in rge_intr ()

i'm pretty sure this is the 2nd bus_dmamap_sync() call, as that's
the only dma map that has load/unload applied at run time rather
than in the init sequence only.  it implies to me that the rx dma
map has had allocation failures deplete the entire ring of mbufs,
so there are no mappings left in the dma map, which leaves the
dm_mapsize as 0 and triggers this bug.

if i'm right, what's happened is this:

1237    for (i = sc->rge_ldata.rge_rxq_considx; ; i = RGE_NEXT_RX_DESC(i)) {

1245            if (RGE_OWN(cur_rx))
1246                    break;

1252            rxq = &sc->rge_ldata.rge_rxq[i];
1253            m = rxq->rxq_mbuf;

1257            /* Invalidate the RX mbuf and unload its map. */
1258            bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1259                rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);
1260            bus_dmamap_unload(sc->sc_dmat, rxq->rxq_dmamap);

1283             * If allocating a replacement mbuf fails,
1284             * reload the current one.

1287            if (rge_newbuf(sc, i) != 0) {
1288                    if (sc->rge_head != NULL) {
1289                            m_freem(sc->rge_head);
1290                            sc->rge_head = sc->rge_tail = NULL;
1291                    }
1292                    rge_discard_rxbuf(sc, i);
1293                    continue;
1294            }

loop index 'i' ranges between 0 and 1023, and accesses each
ring entry's rge_rxq.  if, over time, every value between 0
and 1023 triggers the rge_newbuf() failure path, each
successive entry will be lost, never to be replaced unless an
explicit ifconfig down/up occurs.
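
a minimal sketch of a guard that would avoid tripping the
assertion in this state (hypothetical, not the committed code,
and it only papers over syncing an unloaded map rather than
fixing the depletion itself):

    /*
     * hypothetical guard: only sync/unload when the map
     * actually has something loaded.
     */
    if (rxq->rxq_dmamap->dm_mapsize != 0) {
            bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
                rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);
            bus_dmamap_unload(sc->sc_dmat, rxq->rxq_dmamap);
    }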

hmmm, but in this case, no buffers should be set up as
available for rx, so nothing should pass the RGE_OWN() check
at L1245, i'd hope.  i still see the problem with everything
being depleted, but then the interface should just stop
receiving any rx packets at all...

networking folks, am i missing something here?  i see the
same problem in wm(4) as well.  if wm_add_rxbuf() fails,
where will this ring entry's mbuf ever be replaced again?
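
one common mitigation pattern would be to note the depletion and
retry the allocation later, e.g. from the driver's periodic tick
(a sketch only; the flag and helper here are hypothetical, and
neither driver does this today):

    /* rx loop, on allocation failure (hypothetical): */
    if (rge_newbuf(sc, i) != 0) {
            sc->rge_rx_depleted = true;     /* hypothetical flag */
            rge_discard_rxbuf(sc, i);
            continue;
    }

    /* periodic tick: re-attempt allocation for empty slots */
    if (sc->rge_rx_depleted)
            rge_rx_refill(sc);              /* hypothetical helper */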


.mrg.


daily CVS update output

2023-10-16 Thread NetBSD source update


Updating src tree:
P src/sys/arch/amd64/amd64/locore.S
P src/sys/arch/i386/i386/locore.S
P src/sys/arch/vax/vax/pmap.c
P src/sys/arch/x86/acpi/acpi_machdep.c
P src/sys/arch/x86/acpi/acpi_wakeup.c
P src/sys/arch/x86/include/genfb_machdep.h
P src/sys/arch/x86/pci/pci_machdep.c
P src/sys/arch/x86/x86/genfb_machdep.c
P src/sys/arch/x86/x86/hyperv.c
P src/sys/arch/xen/include/hypervisor.h
P src/sys/arch/xen/x86/pvh_consinit.c
P src/sys/arch/xen/xen/xen_machdep.c
P src/sys/dev/sequencer.c
P src/sys/dev/hid/hid.c
P src/sys/net/lagg/if_lagg.c
P src/tests/net/if_lagg/t_lagg.sh

Updating xsrc tree:


Killing core files:




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  42120453 Oct 17 03:03 ls-lRA.gz


Re: panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed

2023-10-16 Thread Thomas Klausner
On Tue, Oct 17, 2023 at 10:07:14AM +1100, Matthew Green wrote:
> > panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed: file
> > "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0
> 
> this is from:
> 
> KASSERTMSG(offset < map->dm_mapsize,
>     "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
>     offset, map->dm_mapsize);
> 
> the mapsize being zero indicates that there's nothing mapped
> currently in this dma map, so there's nothing to sync.  ie,
> the caller seems to be trying to sync something not mapped.
> 
> can you post the full back trace?

Sure:

(gdb) bt
#0  0x80239c75 in cpu_reboot ()
#1  0x80ddb28d in kern_reboot ()
#2  0x80e21798 in vpanic ()
#3  0x80fe6e5f in kern_assert ()
#4  0x8058be67 in bus_dmamap_sync ()
#5  0x8044edc7 in rge_rxeof ()
#6  0x804536fd in rge_intr ()
#7  0x80592c15 in intr_biglock_wrapper ()
#8  0x80214405 in Xhandle_ioapic_edge18 ()
#9  0x8023547d in x86_mwait ()
#10 0x805819d0 in acpicpu_cstate_idle ()
#11 0x80dbe5d6 in idle_loop ()
#12 0x80210327 in lwp_trampoline ()
#13 0x in ?? ()

 Thomas


re: panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed

2023-10-16 Thread matthew green
> panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed: file
> "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

this is from:

KASSERTMSG(offset < map->dm_mapsize,
    "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
    offset, map->dm_mapsize);

the mapsize being zero indicates that there's nothing mapped
currently in this dma map, so there's nothing to sync.  ie,
the caller seems to be trying to sync something not mapped.
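
roughly, the invariant at play (an illustration only, not the
actual x86 bus_dma.c code):

    bus_dmamap_unload(tag, map);    /* resets map->dm_mapsize to 0 */
    /*
     * any later sync against the unloaded map fails the check,
     * even at offset 0, because "0 < 0" is false:
     */
    bus_dmamap_sync(tag, map, 0, 0, BUS_DMASYNC_POSTREAD);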

can you post the full back trace?


.mrg.


panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed

2023-10-16 Thread Thomas Klausner
Hi!

I just tried checking out pkgsrc on an nvme when the machine paniced:

panic: kernel diagnostic assertion "offset < map->dm_mapsize" failed: file
"/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

That's a GENERIC 10.99.10/amd64 from releng, Oct 11.

Has anyone seen this one before?

I have a crash dump but no debug kernel, since I didn't build it
myself. dmesg attached, there is one warning from ACPI:
acpi0: autoconfiguration error: invalid PCI address for D005
no idea if that could be related.
 Thomas


dmesg.redacted.txt.gz
Description: Binary data


Automated report: NetBSD-current/i386 test failure

2023-10-16 Thread NetBSD Test Fixture
This is an automatically generated notice of a new failure of the
NetBSD test suite.

The newly failing test case is:

net/if_lagg/t_lagg:lagg_mtu

The above test failed in each of the last 4 test runs, and passed in
at least 26 consecutive runs before that.

The following commits were made between the last successful test and
the first failed test:

2023.10.16.07.57.40 yamaguchi src/tests/net/if_lagg/t_lagg.sh 1.9

Logs can be found at:


http://releng.NetBSD.org/b5reports/i386/commits-2023.10.html#2023.10.16.07.57.40


Re: new rust

2023-10-16 Thread Havard Eidnes
>> for i in 0..u64::MAX {
>>     match libc::_cpuset_isset(i, set) {
>> [...]
>> but ... under which conditions would it seg-fault inside that
>> function?
>
> What does the Rust impl. of _cpuset_isset() look like? Does it
> take ints by any chance and you're passing a u64 to it here. A C
> compiler will complain if you use `-m32', but, that's all. Don't
> know how the Rust FFI will handle this. That's all I can think
> of...

The relevant rust definitions were (from
vendor/libc/src/unix/bsd/netbsdlike/netbsd/mod.rs):

pub type cpuid_t = u64;

extern "C" {
pub fn _cpuset_isset(cpu: cpuid_t, set: *const cpuset_t) -> ::c_int;
}

Of these, the cpuid_t was wrong, because in C it is

typedef unsigned long   cpuid_t;

(from <sys/types.h>), and that's a 32-bit type on ILP32 ports.
On such systems, seen from the 32-bit "actual" libc side, this
would cause rust to do the equivalent of _cpuset_isset(0, NULL),
which is of course going to cause an immediate NULL pointer
de-reference.
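
For reference, the C side is roughly this (paraphrased from the
system headers):

    /* <sys/types.h>: 32 bits on ILP32 ports, 64 on LP64 */
    typedef unsigned long cpuid_t;

    /* <sched.h> */
    int _cpuset_isset(cpuid_t, const cpuset_t *);

On an ILP32 target the u64 first argument from rust occupies two
32-bit argument slots, so what the C callee picks up as its pointer
parameter is really the (zero) high half of the u64.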

This is now all on the way to being fixed, since this pull request has
been accepted and applied upstream:

  https://github.com/rust-lang/libc/pull/3386

and I've applied this patch to the various "rust libc*" versions
vendored inside rust, and have re-built the 1.72.1 bits with this
fix as well.

>> Debugging the C program reveals that pthread_getaffinity_np() has
>> done exactly nothing to the "cset" contents as near as I can
>> tell, the "bits" entry doesn't change.
>
> pthread_getaffinity_np() _can_ be used to get the no. of "online"
> CPUs on both Linux and FreeBSD, but it looks (from my perusal just
> now) like threads default to no affinity on NetBSD and the scheduler
> just picks whatever CPUs available for it--unless the affinity is
> explicitly set, in which case it's inherited.
>
> I think you should just use sysconf(_SC_NPROCESSORS_ONLN) or the
> equivalent on NetBSD.

That threads default to no affinity on NetBSD matches what I'm
seeing and hearing.  However, the affinity set *can* be tweaked
by schedctl (which appears to require root privileges).

The fallback code in rust already does as you suggest: if the
probe for the number of CPUs the thread has affinity to is 0, the
code probes for _SC_NPROCESSORS_ONLN, and if that returns < 1,
then probes for HW_NCPU.
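
In C terms the probing order is roughly (a sketch of the logic
described above, not the actual rust code):

    #include <unistd.h>
    #include <sys/param.h>
    #include <sys/sysctl.h>

    static long
    ncpus_fallback(long affinity_count)
    {
            int mib[2] = { CTL_HW, HW_NCPU }, ncpu;
            size_t len = sizeof(ncpu);
            long n;

            if (affinity_count > 0)         /* CPUs in the affinity set */
                    return affinity_count;
            if ((n = sysconf(_SC_NPROCESSORS_ONLN)) >= 1)
                    return n;
            if (sysctl(mib, 2, &ncpu, &len, NULL, 0) == 0 && ncpu >= 1)
                    return ncpu;
            return 1;                       /* last resort */
    }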

Regards,

- Håvard