re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread matthew green
> panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file 
> "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0

this is from:

KASSERTMSG(offset < map->dm_mapsize,
    "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
    offset, map->dm_mapsize);

the mapsize being zero indicates that there's nothing mapped
currently in this dma map, so there's nothing to sync.  ie,
the caller seems to be trying to sync something not mapped.
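
as a minimal sketch of why the message reads "0x0 >= 0x0": the
check only passes when the offset is strictly below dm_mapsize, so
an unloaded map (dm_mapsize == 0) fails even for offset 0.  the
struct and function names below are hypothetical stand-ins, not the
real bus_dma(9) types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one bus_dmamap_t field the check reads. */
struct fake_dmamap {
	size_t dm_mapsize;	/* 0 when no buffer is loaded */
};

/* Mirrors the condition "offset < map->dm_mapsize" from the KASSERTMSG. */
static bool
sync_offset_ok(const struct fake_dmamap *map, size_t offset)
{
	return offset < map->dm_mapsize;
}
```

with a loaded map of 2048 bytes, offsets 0..2047 pass; with an
unloaded map, no offset passes.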

can you post the full back trace?


.mrg.


Re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread Thomas Klausner
On Tue, Oct 17, 2023 at 10:07:14AM +1100, Matthew Green wrote:
> > panic: kernel diagnostic assertion "offset < map->dm_maps" failed: file 
> > "/usr/src/sys/arch/x86/x86/bus_dma.c", line 826 bad offset 0x0 >= 0x0
> 
> this is from:
> 
> KASSERTMSG(offset < map->dm_mapsize,
>     "bad offset 0x%"PRIxBUSADDR" >= 0x%"PRIxBUSSIZE,
>     offset, map->dm_mapsize);
> 
> the mapsize being zero indicates that there's nothing mapped
> currently in this dma map, so there's nothing to sync.  ie,
> the caller seems to be trying to sync something not mapped.
> 
> can you post the full back trace?

Sure:

(gdb) bt
#0  0x80239c75 in cpu_reboot ()
#1  0x80ddb28d in kern_reboot ()
#2  0x80e21798 in vpanic ()
#3  0x80fe6e5f in kern_assert ()
#4  0x8058be67 in bus_dmamap_sync ()
#5  0x8044edc7 in rge_rxeof ()
#6  0x804536fd in rge_intr ()
#7  0x80592c15 in intr_biglock_wrapper ()
#8  0x80214405 in Xhandle_ioapic_edge18 ()
#9  0x8023547d in x86_mwait ()
#10 0x805819d0 in acpicpu_cstate_idle ()
#11 0x80dbe5d6 in idle_loop ()
#12 0x80210327 in lwp_trampoline ()
#13 0x in ?? ()

 Thomas


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-16 Thread matthew green
> #3  0x80fe6e5f in kern_assert ()
> #4  0x8058be67 in bus_dmamap_sync ()
> #5  0x8044edc7 in rge_rxeof ()
> #6  0x804536fd in rge_intr ()

i'm pretty sure this is the 2nd bus_dmamap_sync() call, as that's
the only dma map that has load/unload applied at run time, vs the
init sequence only, and it implies to me that rx dma map has had
allocation failures to deplete the entire ring of mbufs, and then
there are no mappings in the dma map, which leaves the dm_mapsize
as 0, and triggers this bug.

if i'm right, what's happened is this:

1237 	for (i = sc->rge_ldata.rge_rxq_considx; ; i = RGE_NEXT_RX_DESC(i)) {

1245 		if (RGE_OWN(cur_rx))
1246 			break;

1252 		rxq = &sc->rge_ldata.rge_rxq[i];
1253 		m = rxq->rxq_mbuf;

1257 		/* Invalidate the RX mbuf and unload its map. */
1258 		bus_dmamap_sync(sc->sc_dmat, rxq->rxq_dmamap, 0,
1259 		    rxq->rxq_dmamap->dm_mapsize, BUS_DMASYNC_POSTREAD);
1260 		bus_dmamap_unload(sc->sc_dmat, rxq->rxq_dmamap);

1283 		 * If allocating a replacement mbuf fails,
1284 		 * reload the current one.

1287 		if (rge_newbuf(sc, i) != 0) {
1288 			if (sc->rge_head != NULL) {
1289 				m_freem(sc->rge_head);
1290 				sc->rge_head = sc->rge_tail = NULL;
1291 			}
1292 			rge_discard_rxbuf(sc, i);
1293 			continue;
1294 		}

the loop index 'i' can range from 0 to 1023, and the loop
accesses each ring entry's rge_rxq.  if, over time, every
value between 0 and 1023 triggers the rge_newbuf() failure
path, each successive entry will be lost, never to be
replaced unless an explicit ifconfig down/up occurs.

hmmm, but in this case, no buffers should be set to
be available for rx, so nothing should pass RGE_OWN() at
L1245 i'd hope.  i still see the problem with everything
being depleted, but then it should just stop getting any
rx packets at all...

networking folks, am i missing something here?  i see the
same problem in wm(4) as well.  if wm_add_rxbuf() fails,
where will this ring entry's mbuf ever be replaced again?


.mrg.


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-17 Thread matthew green
> hmmm, but in this case, no buffers should be set to
> be available for rx, so nothing should pass RGE_OWN() at
> L1245 i'd hope.  i still see the problem with everything
> being depleted, but then it should just stop getting any
> rx packets at all...
>
> networking folks, am i missing something here?  i see the
> same problem in wm(4) as well.  if wm_add_rxbuf() fails,
> where will this ring entry's mbuf ever be replaced again?

i see the thing i missed.

i was looking at openbsd if_rge.c 1.16, which m_free()s
the mbuf in this case, which in our tree has nothing that
would refill it, but our if_rge.c has this comment:

   * If allocating a replacement mbuf fails,
   * reload the current one.

which means that when we have a mbuf allocation error,
we basically drop the current packet, and leave the mbuf
in place ready for use next time.  that means there is no
mbuf leak in our current code, and i think the only part
of openbsd if_rge.c 1.16 we want is the if_ierrors++
(that we call if_statinc(ifp, if_ierrors).)

i think i see the problem (no, really, this time :-).

when we have a memory failure, we don't re-load the
map after bus_dmamap_unload(), so that's why it has zero
size.
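
the sequence can be modelled in a few lines.  this is only a
sketch under stated assumptions: fake_dmamap, fake_load(),
fake_unload(), fake_rxeof_slot() and FAKE_MCLBYTES are hypothetical
names standing in for the real map, the load/unload calls and one
iteration of the rx loop; none of them exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MCLBYTES 2048	/* assumed buffer size, illustration only */

/* Hypothetical stand-in for the rx slot's dma map. */
struct fake_dmamap {
	size_t dm_mapsize;	/* 0 when nothing is loaded */
};

static void
fake_load(struct fake_dmamap *m)
{
	m->dm_mapsize = FAKE_MCLBYTES;
}

static void
fake_unload(struct fake_dmamap *m)
{
	m->dm_mapsize = 0;
}

/*
 * One pass of the receive path over a single slot.  Returns false if
 * the sync precondition (offset 0 < dm_mapsize) would have failed.
 */
static bool
fake_rxeof_slot(struct fake_dmamap *map, bool newbuf_fails)
{
	if (!(0 < map->dm_mapsize))
		return false;	/* the KASSERTMSG would fire here */
	fake_unload(map);	/* sync + unload, as at lines 1258-1260 */
	if (newbuf_fails)
		return true;	/* old mbuf kept, but map left unloaded */
	fake_load(map);		/* replacement mbuf loaded */
	return true;
}
```

a pass with a failing replacement leaves dm_mapsize at 0, and the
next pass over the same slot trips the check.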

the fix isn't simple because the load of the new mbuf
can fail, and then we want to reload the old one, but
it was the load event that failed, why would it work
again for the old mbuf now?  seems like we need to have
a (very short) timer that tries to realloc it again,
but i'm hoping someone else has solved this problem and
we can use their method..
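
one possible way around the "reload the old one" problem, sketched
here as an assumption and not taken from this thread or from any
posted diff, is to load the replacement mbuf into a spare map first
and only touch the slot's own map once that load has succeeded.
every name below (fake_rxslot, fake_newbuf(), the load_fails flag)
is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dmamap { size_t dm_mapsize; };
struct fake_mbuf { int id; };

struct fake_rxslot {
	struct fake_dmamap *map;	/* map currently backing the slot */
	struct fake_mbuf *mbuf;		/* mbuf currently in the slot */
};

/* Hypothetical load: fails on request, otherwise marks the map loaded. */
static bool
fake_load(struct fake_dmamap *map, bool fail)
{
	if (fail)
		return false;
	map->dm_mapsize = 2048;
	return true;
}

/*
 * Try to install newm in the slot.  The new mbuf is loaded into *spare
 * first; the slot's own map is unloaded only after that load succeeded.
 * On failure the slot is left exactly as it was, still loaded.
 */
static bool
fake_newbuf(struct fake_rxslot *slot, struct fake_dmamap **spare,
    struct fake_mbuf *newm, bool load_fails)
{
	struct fake_dmamap *old;

	if (!fake_load(*spare, load_fails))
		return false;		/* old mbuf and mapping untouched */

	old = slot->map;
	old->dm_mapsize = 0;		/* unload the old mapping */
	slot->map = *spare;		/* swap the maps */
	*spare = old;
	slot->mbuf = newm;
	return true;
}
```

the cost is one extra map per ring; the benefit in this model is
that a failed load never leaves a slot with dm_mapsize == 0.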


.mrg. 


re: panic: kernel diagnostic assertion "offset < map->dm_maps" failed

2023-10-18 Thread matthew green
i'm pretty sure i've solved this properly this time, but
review of this change would be appreciated.

   https://www.netbsd.org/~mrg/if_rge.c.v3.diff

it includes a potential way to avoid wm(4) calling panic() if
bus_dmamap_load*() fails..


.mrg.