On Sat, 30 Jan 2021 06:29:20 -0800 Xie He wrote:
> On Fri, Jan 29, 2021 at 5:36 PM Jakub Kicinski <k...@kernel.org> wrote:
> > I'm still struggling to wrap my head around this.
> >
> > Did you test your code with lockdep enabled? Which Qdisc are you using?
> > You're queuing the frames back to the interface they came from - won't
> > that cause locking issues?
>
> Hmm... Thanks for bringing this to my attention. I indeed find issues
> when the "noqueue" qdisc is used.
>
> When using a qdisc other than "noqueue", when sending an skb:
> "__dev_queue_xmit" will call "__dev_xmit_skb";
> "__dev_xmit_skb" will call "qdisc_run_begin" to mark the beginning of
> a qdisc run, and if the qdisc is already running, "qdisc_run_begin"
> will fail, then "__dev_xmit_skb" will just enqueue this skb without
> starting qdisc. There is no problem.
>
> When using "noqueue" as the qdisc, when sending an skb:
> "__dev_queue_xmit" will try to send this skb directly. Before it does
> that, it will first check "txq->xmit_lock_owner" and will find that
> the current cpu already owns the xmit lock, it will then print a
> warning message "Dead loop on virtual device ..." and drop the skb.
>
> A solution can be queuing the outgoing L2 frames in this driver first,
> and then using a tasklet to send them to the qdisc TX queue.
>
> Thanks! I'll make changes to fix this.

Sounds like too much effort for a sub-optimal workaround. The qdisc
semantics are broken in the proposed scheme (double counting packets) -
both in terms of statistics and if the user decides to add a policer,
filter etc.

Another worry is that something may just inject a packet with
skb->protocol == ETH_P_HDLC but unexpected structure (IDK if that's
a real concern).

It may be better to teach LAPB to stop / start the internal queue.
The lower level drivers just need to call LAPB instead of making the
start/wake calls directly to the stack, and LAPB can call the stack.
Would that not work?
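
To make the stop / start idea concrete, here is a rough userspace model of
the flow, not kernel code. Every name in it (struct lapb_sketch,
lapb_sketch_lower_stop(), lapb_sketch_lower_wake(), the queue length of 8)
is hypothetical and only illustrates the shape: the lower driver reports
its state to LAPB, LAPB holds frames in its own queue, and only LAPB
decides when the stack is stopped or woken (the stack_stopped flag stands
in for the netif_stop_queue() / netif_wake_queue() calls).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LAPB_SKETCH_QLEN 8 /* hypothetical size of LAPB's internal queue */

/* Hypothetical model of one LAPB instance between the stack and a driver. */
struct lapb_sketch {
	int queue[LAPB_SKETCH_QLEN]; /* frame ids held inside LAPB */
	size_t head, len;
	bool lower_stopped;  /* lower driver asked LAPB to hold frames */
	bool stack_stopped;  /* LAPB has stopped the stack's TX queue */
	int sent_to_lower;   /* frames handed on to the lower driver */
};

/* Drain the internal queue while the lower driver accepts frames, then
 * stop or wake the stack depending on whether the queue is full. */
static void lapb_sketch_flush(struct lapb_sketch *l)
{
	while (!l->lower_stopped && l->len > 0) {
		l->head = (l->head + 1) % LAPB_SKETCH_QLEN;
		l->len--;
		l->sent_to_lower++;
	}
	l->stack_stopped = (l->len == LAPB_SKETCH_QLEN);
}

/* Called by the stack: each frame passes through a queue exactly once. */
static bool lapb_sketch_xmit(struct lapb_sketch *l, int frame)
{
	assert(l->len <= LAPB_SKETCH_QLEN);
	if (l->len == LAPB_SKETCH_QLEN)
		return false; /* stack should already be stopped */
	l->queue[(l->head + l->len) % LAPB_SKETCH_QLEN] = frame;
	l->len++;
	lapb_sketch_flush(l);
	return true;
}

/* Called by the lower driver instead of stopping the stack directly. */
static void lapb_sketch_lower_stop(struct lapb_sketch *l)
{
	l->lower_stopped = true;
}

/* Called by the lower driver instead of waking the stack directly. */
static void lapb_sketch_lower_wake(struct lapb_sketch *l)
{
	l->lower_stopped = false;
	lapb_sketch_flush(l);
}
```

In this model nothing is re-queued to the interface it came from, so a
frame is counted by the qdisc once, and the stack is only stopped when
LAPB's own queue fills up while the lower driver is busy.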