On Thu, Sep 15, 2016 at 08:25:10AM +0000, [email protected] wrote:
> On Wed, Sep 14, 2016 at 09:46:35PM +0200, [email protected] wrote:
> > On Tue, Sep 13, 2016 at 08:50 +0000, Olivier Cherrier wrote:
> > > >Synopsis: crash with oce(4)
> > > >Category: network
> > > >Environment:
> > > System : OpenBSD 6.0
> > > Details : OpenBSD 6.0 (GENERIC.MP) #2319: Tue Jul 26 13:00:43 MDT 2016
> > >
> > > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > >
> > > After upgrading systems from 5.9 (with patch 004) to 6.0, I am getting
> > > a crash a few seconds after the network is configured. The problem
> > > seems to be linked to oce(4) and pool, and not to carp/vlan, since
> > > I can reproduce the crash with just «ifconfig ocex up» commands, as
> > > shown here while booting in single user:
> > >
> >
> > I didn't test CARP, but I couldn't reproduce this with vlans on
> > top of a trunk on top of two oce's with 6.0-release. I will
> > double check -current tomorrow. I don't see a good reason for
> > the "missing descriptor in rxeof" unless it's a stray interrupt
> > with a valid completion queue entry which is a bit too weird.
> >
> > Perhaps we're not filling the Rx ring with enough slots and get
> > a heavily fragmented jumbo frame that the card has managed to
> > only partially fit into provided space. How about this diff?
> >
> > diff --git sys/dev/pci/if_oce.c sys/dev/pci/if_oce.c
> > index ee74185..a74b35b 100644
> > --- sys/dev/pci/if_oce.c
> > +++ sys/dev/pci/if_oce.c
> > @@ -1078,7 +1078,7 @@ oce_init(void *arg)
> > rq->ring->index = 0;
> >
> > /* oce splits jumbos into 2k chunks... */
> > - if_rxr_init(&rq->rxring, 8, rq->nitems);
> > + if_rxr_init(&rq->rxring, OCE_MAX_TX_ELEMENTS, rq->nitems);
> >
> > if (!oce_alloc_rx_bufs(rq)) {
> > printf("%s: failed to allocate rx buffers\n",
> > @@ -1560,8 +1560,8 @@ oce_rxeof(struct oce_rq *rq, struct oce_nic_rx_cqe *cqe)
> >
> > for (i = 0; i < cqe->u0.s.num_fragments; i++) {
> > if ((pkt = oce_pkt_get(&rq->pkt_list)) == NULL) {
> > - printf("%s: missing descriptor in rxeof\n",
> > - sc->sc_dev.dv_xname);
> > + printf("%s: missing descriptor in rxeof, frag %d/%u\n",
> > + sc->sc_dev.dv_xname, i, cqe->u0.s.num_fragments);
> > goto exit;
> > }
> >
> >
>
>
> Hi Mike,
>
> With Current and your patch, it is stable (no crash) and there is no
> "missing descriptor in rxeof" message anymore.
>
> But there is still the vlan part that is not working.
More precisely, vlan seems to fail only when I try to configure it on
top of a trunk.
I moved all the hostname.* files into a directory called
"/etc/hostname.ALL". Then I booted from scratch (so with a 'blank'
network config) and experimented this way:
# cd /etc/hostname.ALL/
# ls -la
total 48
drwxr-xr-x 2 root wheel 512 Sep 15 15:22 .
drwxr-xr-x 24 root wheel 2048 Sep 15 15:22 ..
-rw-r----- 1 root wheel 85 Dec 15 2015 hostname.carp0
-rw-r----- 1 root wheel 100 Dec 15 2015 hostname.carp1
-rw-r----- 1 root wheel 73 Jan 27 2016 hostname.carp2
-rw-r----- 1 root wheel 88 Jan 27 2016 hostname.carp3
-rw-r----- 1 root wheel 3 Dec 15 2015 hostname.oce0
-rw-r----- 1 root wheel 3 Dec 15 2015 hostname.oce1
-rw-r----- 1 root wheel 33 Dec 15 2015 hostname.pfsync0
-rw-r----- 1 root wheel 56 Dec 15 2015 hostname.trunk0
-rw-r----- 1 root wheel 58 Dec 15 2015 hostname.vlan1
-rw-r----- 1 root wheel 60 Dec 15 2015 hostname.vlan20
#
# cat hostname.vlan20
vlan 20 vlandev trunk0
inet x.x.x.x 255.255.255.0
up
#
# ifconfig oce0 up
# ifconfig vlan20 create
# ifconfig vlan20 vlan 20 vlandev oce0
# ifconfig vlan20 inet x.x.x.x 255.255.255.0
# ifconfig vlan20 up
#
#
# echo it works
it works
#
#
# ifconfig vlan
vlan20: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:17:a4:77:04:3e
index 10 priority 0 llprio 3
vlan: 20 parent interface: oce0
vnetid: 20
parent: oce0
groups: vlan
status: active
inet x.x.x.x netmask 0xffffff00 broadcast x.x.x.x
#
# ifconfig vlan20 destroy
#
# ifconfig vlan
vlan: no such interface
# ifconfig oce1 up
#
# cat hostname.trunk0
trunkport oce0 trunkport oce1 trunkproto loadbalance
up
#
# ifconfig trunk0 create
# ifconfig trunk0 trunkport oce0 trunkport oce1 trunkproto loadbalance
# ifconfig trunk0 up
#
#
# ifconfig vlan20 create
# ifconfig vlan20 vlan 20 vlandev trunk0
ifconfig: SIOCSETVLAN: No buffer space available
#
So it doesn't seem to be related to oce(4), but rather to trunk(4).
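In case it helps anyone reproduce this, the failing sequence above can be
condensed into a small script. This is only a sketch: the IFCONFIG variable
and the function name repro_vlan_on_trunk are my own additions for
illustration (they are not part of any tool), while the ifconfig arguments
are the ones from the session above. It assumes a root shell on a machine
with oce0 and oce1 and no existing trunk0/vlan20.

```shell
#!/bin/sh
# Hypothetical override so the sequence can be dry-run (e.g. IFCONFIG=echo).
IFCONFIG=${IFCONFIG:-ifconfig}

# Bring up both oce(4) ports, build trunk0 on top of them, then try to
# attach vlan20 to the trunk. Stops at the first failing command.
repro_vlan_on_trunk() {
	$IFCONFIG oce0 up &&
	$IFCONFIG oce1 up &&
	$IFCONFIG trunk0 create &&
	$IFCONFIG trunk0 trunkport oce0 trunkport oce1 trunkproto loadbalance &&
	$IFCONFIG trunk0 up &&
	$IFCONFIG vlan20 create &&
	# On my machine this is the step that fails with
	# "SIOCSETVLAN: No buffer space available".
	$IFCONFIG vlan20 vlan 20 vlandev trunk0
}
```

Calling repro_vlan_on_trunk after sourcing the file runs the steps for
real; setting IFCONFIG=echo first only prints the arguments that would be
passed to ifconfig.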
Thanks
Best
oc