Thanks for your support. I'll try to test when you get it done.

On Mon, May 9, 2022 at 8:51 PM Alexandr Nedvedicky <
alexandr.nedvedi...@oracle.com> wrote:

> Hello Barbaros,
>
> thank you for testing and for the excellent report.
>
> </snip>
>
> > ddb{1}> trace
> > db_enter() at db_enter+0x10
> > panic(ffffffff81f22e39) at panic+0xbf
> > __assert(ffffffff81f96c9d,ffffffff81f85ebc,a3,ffffffff81fd252f) at __assert+0x25
> > assertwaitok() at assertwaitok+0xcc
> > mi_switch() at mi_switch+0x40
>
>     the assert indicates we attempt to sleep inside an SMR section,
>     which must be avoided.
>
> > sleep_finish(ffff800025574da0,1) at sleep_finish+0x10b
> > rw_enter(ffffffff822cfe50,1) at rw_enter+0x1cb
> > pf_test(2,1,ffff80000520e000,ffff800025575058) at pf_test+0x1088
> > ip_input_if(ffff800025575058,ffff800025575064,4,0,ffff80000520e000) at ip_input_if+0xcd
> > ipv4_input(ffff80000520e000,fffffd8053616700) at ipv4_input+0x39
> > ether_input(ffff80000520e000,fffffd8053616700) at ether_input+0x3ad
> > vport_if_enqueue(ffff80000520e000,fffffd8053616700) at vport_if_enqueue+0x19
> > veb_port_input(ffff8000051c3800,fffffd806064c200,ffffffffffff,ffff800002066600) at veb_port_input+0x4d2
> > ether_input(ffff8000051c3800,fffffd806064c200) at ether_input+0x100
> > vlan_input(ffff80000095a050,fffffd806064c200,ffff8000255752bc) at vlan_input+0x23d
> > ether_input(ffff80000095a050,fffffd806064c200) at ether_input+0x85
> > if_input_process(ffff80000095a050,ffff800025575358) at if_input_process+0x6f
> > ifiq_process(ffff80000095a460) at ifiq_process+0x69
> > taskq_thread(ffff800000035080) at taskq_thread+0x100
>
>     above is the call stack that did the bad thing (sleeping inside an
>     SMR section)
>
> in my opinion the primary suspect is veb_port_input(), whose code reads
> as follows:
>
>  966 static struct mbuf *
>  967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
>  968 {
>  969         struct veb_port *p = brport;
>  970         struct veb_softc *sc = p->p_veb;
>  971         struct ifnet *ifp = &sc->sc_if;
>  972         struct ether_header *eh;
>  ...
> 1021         counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
> 1022             m->m_pkthdr.len);
> 1023
> 1024         /* force packets into the one routing domain for pf */
> 1025         m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
> 1026
> 1027 #if NBPFILTER > 0
> 1028         if_bpf = READ_ONCE(ifp->if_bpf);
> 1029         if (if_bpf != NULL) {
> 1030                 if (bpf_mtap_ether(if_bpf, m, 0) != 0)
> 1031                         goto drop;
> 1032         }
> 1033 #endif
> 1034
> 1035         veb_span(sc, m);
> 1036
> 1037         if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
> 1038             veb_ip_filter(m))
> 1039                 goto drop;
> 1040
> 1041         if (!ISSET(ifp->if_flags, IFF_LINK0) &&
> 1042             veb_vlan_filter(m))
> 1043                 goto drop;
> 1044
> 1045         if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
> 1046                 goto drop;
>
> the call to veb_span() at line 1035 seems to be the culprit (in my
> opinion):
>
>  356         smr_read_enter();
>  357         SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
>  358                 ifp0 = p->p_ifp0;
>  359                 if (!ISSET(ifp0->if_flags, IFF_RUNNING))
>  360                         continue;
>  361
>  362                 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
>  363                 if (m == NULL) {
>  364                         /* XXX count error */
>  365                         continue;
>  366                 }
>  367
>  368                 if_enqueue(ifp0, m); /* XXX count error */
>  369         }
>  370         smr_read_leave();
>
> the loop above comes from veb_span(), which calls if_enqueue() from within
> an SMR section. Line 368 ends up calling here:
>
> 2191 static int
> 2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
> 2193 {
> 2194         /*
> 2195          * switching an l2 packet toward a vport means pushing it
> 2196          * into the network stack. this function exists to make
> 2197          * if_vinput compat with veb calling if_enqueue.
> 2198          */
> 2199
> 2200         if_vinput(ifp, m);
> 2201
> 2202         return (0);
> 2203 }
>
> which in turn calls if_vinput(), which calls further down into the IP
> stack, and the IP stack may sleep. We must change veb_span() so that
> calls to if_vinput() happen outside of the SMR section.
>
> I don't have such a complex setup with vlans and virtual ports. I'll try
> to cook some diff and pass it to you for testing.
>
> thanks again for coming back to us with the report.
>
> regards
> sashan
>
>
>
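
The direction described in the quoted message (only collect work while
inside the SMR section, and call the delivery path that may sleep after
leaving it) can be modelled in plain userland C. This is only a sketch under
stated assumptions: read_enter(), read_leave(), deliver_may_sleep(),
span_inside() and span_deferred() are hypothetical stand-ins, not OpenBSD
kernel functions, and this is not the diff being prepared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; none of these are OpenBSD kernel APIs. */
static int in_read_section;	/* nonzero while inside the modelled section */
static int sleep_violations;	/* "sleeping" calls made inside the section */
static int delivered;		/* packets handed to the delivery path */

static void read_enter(void) { in_read_section++; }
static void read_leave(void) { in_read_section--; }

/* Models a delivery path that may sleep, e.g. by taking a sleeping lock. */
static void
deliver_may_sleep(int pkt)
{
	(void)pkt;
	if (in_read_section)
		sleep_violations++;	/* what the kernel assert would catch */
	delivered++;
}

#define NPORTS 3
static const int port_running[NPORTS] = { 1, 0, 1 };

/* Problematic shape: delivery happens inside the read section. */
static void
span_inside(int pkt)
{
	int i;

	read_enter();
	for (i = 0; i < NPORTS; i++) {
		if (!port_running[i])
			continue;
		deliver_may_sleep(pkt);
	}
	read_leave();
}

/* Deferred shape: collect inside the section, deliver after leaving it. */
static void
span_deferred(int pkt)
{
	int pending[NPORTS];
	int n = 0, i;

	read_enter();
	for (i = 0; i < NPORTS; i++) {
		if (!port_running[i])
			continue;
		pending[n++] = pkt;	/* remember the work, do not deliver yet */
	}
	read_leave();

	assert(in_read_section == 0);
	for (i = 0; i < n; i++)
		deliver_may_sleep(pending[i]);
}
```

With two of the three modelled ports running, span_inside() records two
violations while span_deferred() records none, and both deliver two packets.
A real change would also have to keep each target port alive after leaving
the section, which this sketch ignores.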
