Major bug found in TX part - see below.

Andrew Lunn wrote:
>> During debugging, I see that 'priv->tx_busy' stays 'true', because the
>> TX-Complete bit (COMP of TSR reg) is never set - I also see that the
>> TX-Used Bit Read (UBR of TSR reg) is set, which is normal I think
>> because the TX buffers have not yet been released (because COMP is not
>> set..). But the TX-GO bit is 0, meaning that transmit is not busy.
>>
>> I added code to release the TX buffers and to clear 'priv->tx_busy' when
>> TX-GO is read 0. Then at each ARP request, there is an ARP reply, but it
>> does not get out.. (monitoring with Wireshark)
>> What's happening - it is like the EMAC TX is blocked, but why?
>
> It might be worth printing out the value of isr in at91_eth_isr. As
> the comment says, error conditions are not handled. Maybe you are
> getting a buffer under run?

Indeed Andrew, looking at the value of the ISR led me to the cause of the error, and hence to the solution.

The TX status bits of the ISR can be read in AT91_EMAC_TSR. The problem is bit 0 of this register: UBR = "Used Bit Read". The datasheet (DS) is very unclear on this point (the ATSAM9260 DS and the ATSAM7X DS have exactly the same section about their EMAC). But after a full day of debugging, I'm quite sure I'm right.

So, the Used Bit indicates whether the corresponding packet buffer (with the sg_list[i] data) has already been used by the EMAC. The Used Bit must be set to 0 by the user SW, and is set to 1 by the EMAC. In the case of the TX EMAC, the Used Bit is set to 1 by the EMAC after it has transmitted the corresponding buffer.

The TX EMAC holds a private counter/pointer into the transmit buffer descriptor list. Its current value can be read from TBQP; its start location is set by writing TBQP. (Only the RX part is explained in the DS.) Normally this counter/pointer cycles through the circular transmit buffer descriptor list.
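To make the ownership rule concrete, here is a minimal host-side sketch of a TX buffer descriptor and the Used Bit handshake as I understand it. All names (ex_tbd, ex_tbd_queue, ex_tbd_sent) are made up for illustration and are not from if_at91.c. Bit 31 of word 1 as the Used Bit is from the DS text quoted in this mail; the Wrap (bit 30), Last Buffer (bit 15) and length (bits 10:0) positions are my reading of the DS and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word 1 (status) bits of a TX buffer descriptor.
 * USED = bit 31 is stated in the datasheet text quoted in this mail.
 * WRAP, LAST and the length mask are assumptions to verify. */
#define EX_TBD_USED     (1u << 31)
#define EX_TBD_WRAP     (1u << 30)
#define EX_TBD_LAST     (1u << 15)
#define EX_TBD_LEN_MASK 0x7FFu

/* Hypothetical descriptor layout: word 0 = buffer address, word 1 = status. */
struct ex_tbd {
    uint32_t addr;
    uint32_t status;
};

/* Hand one buffer to the EMAC: the software clears the Used Bit (0 means
 * "owned by EMAC") and fills in length and last-buffer flag. The Wrap bit
 * is preserved because it belongs to the ring layout, not to the frame. */
void ex_tbd_queue(struct ex_tbd *d, uint32_t buf_addr, uint32_t len, bool last)
{
    uint32_t keep_wrap = d->status & EX_TBD_WRAP;

    d->addr = buf_addr;
    d->status = keep_wrap | (len & EX_TBD_LEN_MASK) | (last ? EX_TBD_LAST : 0u);
}

/* The EMAC sets the Used Bit to 1 once it has transmitted the buffer,
 * so the software may release the buffer when this returns true. */
bool ex_tbd_sent(const struct ex_tbd *d)
{
    return (d->status & EX_TBD_USED) != 0;
}
```

On real hardware the descriptors would have to be declared volatile and kept out of the data cache; the sketch leaves that out so it can be compiled and tried on a host.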
Now, after transmitting the last buffer (that last buffer has to be marked by the SW with an EOF flag), the TX EMAC advances its TX counter/pointer so that it already points to the next TX buffer. But that next TX buffer - which is free, of course - has been marked by the SW as "Used", because its Used Bit is set. Therefore the UBR error is flagged, and therefore the TX EMAC *resets its counter/pointer* !! Because the SW does not reset its tbd_idx pointer, the two pointers are out of sync, so all further transmits are blocked.

The major bug is that the if_at91.c SW unconditionally sets the Used Bit (in the buffer descriptor) of all free buffers to 1, meaning "Used". At the end of the EMAC DS (36.4.1.3 for SAM9260 vG and 37.4.1.3 for SAM7 vG, point 2) it says:

  "Mark all entries in this list as owned by EMAC, i.e. bit 31 of word 1 set to 0."

Therefore I changed at91_tb_init() and at91_reset_tbd() accordingly.

Solving the "Used Bit" issue is not enough. All of the TX errors UBR, RLE, BEX and UND cause the counter/pointer to be reset. Therefore I changed the deliver() function and added an extra argument to at91_reset_tbd() to reset the SW tbd_idx pointer. I also added those TX errors to the interrupt enable register (IER) setup and to the ISR.
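The idea of the fix can be sketched as follows. This is not the code from the diff: it is a simplified host-side model with made-up names (sk_tx_state, sk_reset_tbd, sk_handle_tsr) that works on plain memory instead of the EMAC registers. UBR = bit 0 of TSR is as described above; the bit positions I use for RLE, BEX and UND, and the Wrap bit in the descriptor, are assumptions to be checked against the DS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TSR error bits. UBR = bit 0 is stated in this mail; the other
 * positions are assumptions to be verified against the datasheet. */
#define SK_TSR_UBR    (1u << 0)
#define SK_TSR_RLE    (1u << 2)
#define SK_TSR_BEX    (1u << 4)
#define SK_TSR_UND    (1u << 6)
#define SK_TSR_ERRORS (SK_TSR_UBR | SK_TSR_RLE | SK_TSR_BEX | SK_TSR_UND)

/* Descriptor word 1 bits: USED = bit 31 per the datasheet quote,
 * WRAP = bit 30 is an assumption. */
#define SK_TBD_USED   (1u << 31)
#define SK_TBD_WRAP   (1u << 30)

#define SK_TBD_COUNT  4   /* hypothetical ring size */

struct sk_tbd {
    uint32_t addr;
    uint32_t status;
};

/* Hypothetical software-side TX state, loosely modelled on the driver. */
struct sk_tx_state {
    struct sk_tbd ring[SK_TBD_COUNT];
    size_t tbd_idx;   /* software's index into the ring */
    bool tx_busy;
};

/* Mark every descriptor as owned by the EMAC (Used Bit = 0) and keep the
 * Wrap bit on the final entry only. Optionally restart the software index
 * at the beginning of the ring. */
void sk_reset_tbd(struct sk_tx_state *tx, bool reset_idx)
{
    for (size_t i = 0; i < SK_TBD_COUNT; i++) {
        tx->ring[i].status = (i == SK_TBD_COUNT - 1) ? SK_TBD_WRAP : 0u;
    }
    if (reset_idx) {
        tx->tbd_idx = 0;
    }
    tx->tx_busy = false;
}

/* If any of the TX errors is flagged, the EMAC is assumed to have gone
 * back to the start of its list, so the software index must follow.
 * Returns true when a resynchronisation was done. */
bool sk_handle_tsr(struct sk_tx_state *tx, uint32_t tsr)
{
    if ((tsr & SK_TSR_ERRORS) == 0) {
        return false;
    }
    sk_reset_tbd(tx, true);
    return true;
}
```

The point of the boolean argument is that a normal TX-complete path can release the descriptors without moving tbd_idx, while the error path also rewinds the index so that both pointers start again from the same descriptor.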
Attached is a diff - it also contains my previous changes. I also added some comments, and added '#ifdef CYGINT_IO_ETH_INT_SUPPORT_REQUIRED' to the start and stop functions. And I added more debug printing - even in the ISR..

I tested this fix for 5 minutes - long enough to see that a ping finally works... There are of course still some problems - the ping reply is not synchronized with the ping request: it runs 1 ping iteration behind - more on that in my next mail.

Kind regards,
Juergen

P.S.: The current code can only work if the driver is initialized after each TX, or because each TX uses all buffers so that the pointer is wrapped. I guess redboot must work this way?? Or am I still missing something?
