Re: [Kgdb-bugreport] [PATCH 2.6.20-rc7] 8139too KGDBoE fix

Mark Huth Fri, 23 Feb 2007 10:12:00 -0800

Amit S. Kale wrote:

Hi Net Gurus,
This thread came up on the kgdb-bugreport mailing list. Could you please suggest to us the correct way of fixing this problem?
1. When running kgdb on an RTL8139 ethernet interface, the 8139too driver prints too many "Out-of-sync dirty pointer" messages on the console and gdb can't connect to the kgdb stub. These messages can be suppressed, though connection failures still occur frequently.

We think this comes from calling the driver while the queue is stopped. Drivers should not do horrible things when hard start is called with the queue stopped, but unfortunately, at this time, at least some drivers do explode or complain under that condition.

2. Here is how kgdb uses the polling mechanism to communicate with gdb. kgdb calls netpoll_set_trap(1) just before entering a loop in which it communicates with gdb. It calls netpoll_set_trap(0) after it is done and wants to resume the kernel. The communication with gdb goes through the netpoll_poll (which calls the kgdb rx_hook) and netpoll_send_udp functions.
3. A queue for an interface may have been stopped by its driver by calling netif_stop_queue. If kgdb then attempts to enter communication with gdb, it'll call netpoll_set_trap(1), after which the queue can't be started again. This is a potential deadlock situation. Is there a way out of this?

We are trying without setting the CONFIG_NETPOLL_TRAP option. This option is what turns off the function of the netif_stop/wake_queue calls, which breaks the usual flow control mechanism used by the netpoll transmit function. It also prevents the netif_schedule call, which puts the device on the tx softirq queue. However, in the case where interrupts are off and scheduling is not allowed - which would be the netpoll_set_trap(1) condition - the softirq will not run until netpoll is done and the user of netpoll returns the system to normal operation. So it is unclear to me that allowing the schedule is a problem. There may be some obscure race conditions on SMP, so we are trying to analyze that part, but for the moment we are testing with the netif_schedule call allowed in the event of queuing the device.
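To make the effect described above concrete, here is a minimal user-space model, not kernel code. Every name in it (sim_dev, sim_set_trap, sim_stop_queue, sim_wake_queue, sim_queue_stopped) is hypothetical. It assumes, as this thread describes, that while the trap is set both the stop and the wake calls return early without touching the queue state or scheduling the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel symbols. */
struct sim_dev {
    bool xoff;       /* models the __LINK_STATE_XOFF bit */
    int  scheduled;  /* counts modelled netif_schedule() calls */
};

static int sim_trapped;  /* models the netpoll trap counter */

/* Models netpoll_set_trap(): nonzero raises the counter, zero lowers it. */
static void sim_set_trap(int on)
{
    if (on) {
        sim_trapped++;
    } else {
        assert(sim_trapped > 0);
        sim_trapped--;
    }
}

/* Models netif_stop_queue() under the assumed CONFIG_NETPOLL_TRAP behavior. */
static void sim_stop_queue(struct sim_dev *dev)
{
    if (sim_trapped)
        return;            /* flow control is switched off while trapped */
    dev->xoff = true;
}

/* Models netif_wake_queue() under the same assumption. */
static void sim_wake_queue(struct sim_dev *dev)
{
    if (sim_trapped)
        return;            /* a stopped queue stays stopped while trapped */
    if (dev->xoff) {
        dev->xoff = false;
        dev->scheduled++;  /* stands in for the netif_schedule() call */
    }
}

static bool sim_queue_stopped(const struct sim_dev *dev)
{
    return dev->xoff;
}
```

In this model, a queue stopped before sim_set_trap(1) cannot be woken until sim_set_trap(0), which is the deadlock concern raised in point 3. A stop issued while trapped is silently dropped, so the sender never sees the queue as full.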

4. Is it necessary to call netpoll_set_trap(1) at all before entering the gdb communication loop? Even if a driver stops the queue in the middle of the communication, netpoll_poll and netpoll_send_udp calls can recover from that by calling the driver's interrupt and poll routines. Is this a valid statement?

netpoll_set_trap() is necessary, as it informs the netpoll code to respond to ARP requests on behalf of the netpoll user, as well as making sure that skbs are freed without needing the completion queue stuff to run (I think).

Thanks a lot.
-Amit



On Thursday 22 February 2007 22:11, Sergei Shtylyov wrote:

Hello, I wrote:

Even with this patch, the packets probably get stuck somewhere in
the driver, as cross-gdb sees the tail of the $g packet reply only in
reply to the next packet...

 This wasn't happening on x86 probably because the register packet
should be much shorter there than on PPC...

 Argh! That's all because of the CONFIG_NETPOLL_TRAP that
CONFIG_KGDBOE* options select -- since the initial breakpoint enables
trapping via KGDBoE's pre_exception() handler,
netif_{stop/wake}_queue() stop working and that causes KGDBoE to
literally flood 8139too with packets (although it can't queue up
more than 4). Looks like a general design issue to me... :-/

Well, maybe not. But many drivers are surely unprepared for their
hard_start_xmit() method being called with the queue already stopped and
those with a small TX queue (like natsemi with which we're also having
trouble) would get flooded as well. I'm going to submit a patch to
netdev adding an extra check for the TX ring being full -- after/if it gets
accepted, this patch won't be needed anymore.
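The kind of extra check mentioned above can be sketched in user space as follows. This is only an illustration of the idea, not the patch itself. All names (ring_state, ring_used, ring_hard_start_xmit, ring_tx_complete, RING_TX_OK, RING_TX_BUSY) are hypothetical, and the ring size of 4 is taken from the "can't queue up more than 4" remark earlier in this message.

```c
#include <assert.h>

#define RING_NUM_TX_DESC 4  /* per the thread: at most 4 packets queued */
#define RING_TX_OK   0      /* stands in for a successful transmit */
#define RING_TX_BUSY 1      /* stands in for NETDEV_TX_BUSY */

/* Hypothetical model of a tiny TX descriptor ring. */
struct ring_state {
    unsigned int cur_tx;    /* next slot to fill */
    unsigned int dirty_tx;  /* oldest slot not yet reclaimed */
    int slots[RING_NUM_TX_DESC];
};

static unsigned int ring_used(const struct ring_state *r)
{
    return r->cur_tx - r->dirty_tx;
}

/*
 * Transmit entry point that does not trust the caller to honor a
 * stopped queue: it checks the ring itself and reports busy rather
 * than overwriting a descriptor that is still in flight.
 */
static int ring_hard_start_xmit(struct ring_state *r, int pkt)
{
    assert(ring_used(r) <= RING_NUM_TX_DESC);
    if (ring_used(r) == RING_NUM_TX_DESC)
        return RING_TX_BUSY;
    r->slots[r->cur_tx % RING_NUM_TX_DESC] = pkt;
    r->cur_tx++;
    return RING_TX_OK;
}

/* Models the TX-done interrupt reclaiming one descriptor. */
static void ring_tx_complete(struct ring_state *r)
{
    if (ring_used(r) > 0)
        r->dirty_tx++;
}
```

With a check like this, a fifth back-to-back send is refused instead of clobbering slot 0, and succeeds once one completion has been processed.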

Here is what comes to my mind right away. It might need some more
polishing or cleaning up:

A potential solution would be to check whether hard_start_xmit() returns
NETDEV_TX_BUSY. In case the transmit queue is busy (due to a lot of threads
or the queue getting full), we should wait in netpoll_send_skb(), call a
cleanup through poll() and then retry sending the packet.

  This is already being done by netpoll itself. The thing is that
hard_start_xmit() doesn't return NETDEV_TX_BUSY in those drivers. :-/
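The retry-on-busy pattern under discussion can be modelled roughly like this. It is a user-space sketch with hypothetical names (np_nic, np_xmit, np_poll, np_send_with_retry, NP_TX_OK, NP_TX_BUSY), not the netpoll source. It assumes the device reports busy when full and that each poll reclaims one descriptor.

```c
#include <assert.h>

#define NP_TX_OK   0   /* stands in for a successful transmit */
#define NP_TX_BUSY 1   /* stands in for NETDEV_TX_BUSY */

/* Hypothetical model of a NIC with a bounded TX queue. */
struct np_nic {
    int capacity;   /* TX descriptors available in total */
    int in_flight;  /* descriptors queued but not yet reclaimed */
    int sent;       /* packets accepted so far */
};

/* Models a driver transmit routine that does report a full ring. */
static int np_xmit(struct np_nic *nic)
{
    assert(nic->in_flight <= nic->capacity);
    if (nic->in_flight == nic->capacity)
        return NP_TX_BUSY;
    nic->in_flight++;
    nic->sent++;
    return NP_TX_OK;
}

/* Models polling the driver, which reclaims one finished descriptor. */
static void np_poll(struct np_nic *nic)
{
    if (nic->in_flight > 0)
        nic->in_flight--;
}

/* Try to send; on busy, poll to clean up and retry until tries run out. */
static int np_send_with_retry(struct np_nic *nic, int max_tries)
{
    int tries;

    for (tries = max_tries; tries > 0; tries--) {
        if (np_xmit(nic) == NP_TX_OK)
            return NP_TX_OK;
        np_poll(nic);
    }
    return NP_TX_BUSY;
}
```

The loop only helps when the transmit routine actually returns the busy code. A driver that accepts the packet regardless never triggers the poll-and-retry path, which is the problem pointed out above.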

In addition to that we set trapped. I wonder whether it is possible that
a queue is stopped and we enter kgdb. It would be a deadlock.
-Amit

    Why? Netpoll does call the driver's interrupt and NAPI handlers in
that case (until the retry count is 0).

    Ah, got it -- since the traffic trapping (when enabled) effectively
bypasses netif_wake_queue(), a queue would never be actually woken up.
Maybe it's worth always returning 0 from netif_queue_stopped() in this
case? Or maybe the correct thing to do when trapping is to just twiddle the
__LINK_STATE_XOFF bit, bypassing the call to netif_schedule()?
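The second idea (toggle the bit but skip the schedule while trapping) might look like this in a user-space model. The names (alt_dev, alt_stop_queue, alt_wake_queue) are hypothetical, and this is a sketch of the suggestion rather than a tested kernel change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these are not kernel symbols. */
struct alt_dev {
    bool xoff;       /* models the __LINK_STATE_XOFF bit */
    int  scheduled;  /* counts modelled netif_schedule() calls */
};

/* Stopping always sets the bit, trapped or not. */
static void alt_stop_queue(struct alt_dev *dev)
{
    assert(dev != NULL);
    dev->xoff = true;
}

/*
 * Waking always clears the bit so the sender sees the queue as usable
 * again, but the tx softirq is only scheduled when not trapping.
 */
static void alt_wake_queue(struct alt_dev *dev, bool trapped)
{
    assert(dev != NULL);
    if (!dev->xoff)
        return;
    dev->xoff = false;
    if (!trapped)
        dev->scheduled++;  /* stands in for netif_schedule() */
}
```

Under this variant the driver's flow control stays visible to the sender during trapping, so a queue stopped before entering kgdb can still be reopened by the driver's completion path.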

Regards,
Mithlesh Thukral

WBR, Sergei
