Re: sky2 silicon bugs and workarounds...

2007-07-06 Thread Stephen Hemminger
On Mon, 2 Jul 2007 14:37:06 +0100
Daniel J Blueman [EMAIL PROTECTED] wrote:

> Hi Stephen,
>
> When the sky2 driver initialises, it sets the ISR timer register
> (STAT_ISR_TIMER_INI) to 125 * 20 = 2500, whereas the vendor sk98lin
> driver sets it to 400, irrespective of the clock speed of the NIC
> processor.
>
> I guess you found more performance/stability from this value...?
>
> I've checked through the errata workarounds common to my rev-1 and 2
> Yukon-EC chips...the HWF_WA_DEV_4167 oversize receive hang
> workaround checks and can reset the (as I understand) bus master unit
> of the NIC (in CheckRxPath) in a periodic timer that is fired, where
> it finds no progress is made.
 

My best guess is that it handles the chip bug that causes the receiver
to hang if a packet larger than the receive DMA buffer is received.
The sky2 driver doesn't need this because it allocates a slightly
larger buffer than necessary, and truncates the oversize packet.
This works because the hardware has a truncation register that was probably
designed for use when packet sniffing.
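
The buffer-plus-truncation idea above can be sketched roughly as follows. This is a minimal model, not sky2 code: RX_SLACK, the helper names and the exact placement of the truncation limit are assumptions made for illustration. The point is only that the limit sits above every legal frame and never above the buffer, so an oversize frame is cut short before it can overrun the DMA buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_HLEN   14u  /* Ethernet header */
#define VLAN_HLEN   4u  /* optional 802.1Q tag */
#define RX_SLACK    8u  /* hypothetical headroom, not a sky2 constant */

/* Largest frame the driver is willing to accept for a given MTU. */
static size_t rx_max_frame(size_t mtu)
{
    return mtu + ETH_HLEN + VLAN_HLEN;
}

/* The buffer is slightly larger than the largest legal frame. */
static size_t rx_buf_size(size_t mtu)
{
    return rx_max_frame(mtu) + RX_SLACK;
}

/* Truncation limit: above every legal frame, never above the buffer. */
static size_t rx_trunc_limit(size_t mtu)
{
    size_t limit = rx_max_frame(mtu) + RX_SLACK / 2;

    assert(limit <= rx_buf_size(mtu));
    return limit;
}

struct rx_result {
    size_t stored;    /* bytes the modelled hardware wrote to the buffer */
    bool   truncated; /* set when the frame was cut at the limit */
};

/* Model of the receive path: the hardware cuts the frame at the limit. */
static struct rx_result rx_frame(size_t mtu, size_t frame_len)
{
    size_t limit = rx_trunc_limit(mtu);
    struct rx_result r;

    r.truncated = frame_len > limit;
    r.stored = r.truncated ? limit : frame_len;
    return r;
}
```

In this model a driver would simply drop any frame whose truncated flag is set; a frame of legal size is never affected because the limit lies above rx_max_frame().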

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


sky2 silicon bugs and workarounds...

2007-07-02 Thread Daniel J Blueman

Hi Stephen,

When the sky2 driver initialises, it sets the ISR timer register
(STAT_ISR_TIMER_INI) to 125 * 20 = 2500, whereas the vendor sk98lin
driver sets it to 400, irrespective of the clock speed of the NIC
processor.

I guess you found more performance/stability from this value...?
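
For reference, the two values can be compared in time units. The sketch below assumes that the 125 in 125 * 20 is the core clock in MHz and the 20 a delay in microseconds; the thread does not say so explicitly, and the helper names are hypothetical rather than taken from either driver.

```c
#include <assert.h>
#include <stdint.h>

/* Convert microseconds to core-clock ticks for a clock given in MHz. */
static uint32_t us_to_clk(uint32_t clk_mhz, uint32_t us)
{
    return clk_mhz * us;
}

/* Convert ticks back to nanoseconds (integer maths, no floating point). */
static uint32_t clk_to_ns(uint32_t clk_mhz, uint32_t ticks)
{
    assert(clk_mhz != 0);
    return (uint32_t)((uint64_t)ticks * 1000u / clk_mhz);
}
```

Under that assumption, us_to_clk(125, 20) gives the 2500 ticks quoted above, while the fixed value of 400 ticks corresponds to 3.2 microseconds at 125 MHz and to a different delay on any part with a different core clock.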

I've checked through the errata workarounds common to my rev-1 and 2
Yukon-EC chips...the HWF_WA_DEV_4167 oversize receive hang
workaround checks and can reset the (as I understand) bus master unit
of the NIC (in CheckRxPath) in a periodic timer that is fired, where
it finds no progress is made.

Could the issues we see be detected earlier, by noticing that the stats
counters are no longer being incremented, and then handled by resetting
the bus-master unit, rather than the whole chip getting kicked after a
far longer period?

It looks like if it is a silicon bug, we can just acknowledge it and
have a better framework to detect the chip's PCI interface locking up
and kick it in a smarter way perhaps...
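
One way to picture the "kick it in a smarter way" idea is a small escalation policy: try the cheap bus-master reset first and fall back to a full chip reset only if the stall persists. The sketch below is hypothetical throughout (names, thresholds and the action enum are not from sky2 or sk98lin), and how a stall is detected is deliberately left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum wd_action { WD_NONE, WD_RESET_BMU, WD_RESET_CHIP };

/* Hypothetical thresholds, counted in periodic timer ticks. */
#define WD_BMU_TICKS   2u  /* stalled this long: reset the bus-master unit */
#define WD_CHIP_TICKS  6u  /* still stalled: reset the whole chip */

struct rx_watchdog {
    unsigned stalled_ticks;  /* consecutive ticks reported as stalled */
};

/* Called once per timer tick; 'stalled' comes from the caller's own check. */
static enum wd_action wd_tick(struct rx_watchdog *wd, bool stalled)
{
    assert(wd != NULL);

    if (!stalled) {
        wd->stalled_ticks = 0;
        return WD_NONE;
    }

    wd->stalled_ticks++;
    if (wd->stalled_ticks == WD_BMU_TICKS)
        return WD_RESET_BMU;
    if (wd->stalled_ticks >= WD_CHIP_TICKS) {
        wd->stalled_ticks = 0;  /* start over after the heavy reset */
        return WD_RESET_CHIP;
    }
    return WD_NONE;
}
```

The policy is only as good as the stalled flag fed into it; a check that fires on an idle link would make it reset a healthy device.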

Daniel
--
Daniel J Blueman


Re: sky2 silicon bugs and workarounds...

2007-07-02 Thread Daniel J Blueman

On 02/07/07, Stephen Hemminger [EMAIL PROTECTED] wrote:
> On Mon, 2 Jul 2007 14:37:06 +0100
> Daniel J Blueman [EMAIL PROTECTED] wrote:
>> When the sky2 driver initialises, it sets the ISR timer register
>> (STAT_ISR_TIMER_INI) to 125 * 20 = 2500, whereas the vendor sk98lin
>> driver sets it to 400, irrespective of the clock speed of the NIC
>> processor.
>>
>> I guess you found more performance/stability from this value...?
>
> Not really, it was just a rough guess to try and get more frames
> per irq under DoS load.  Haven't fine tuned those values.
>
>> I've checked through the errata workarounds common to my rev-1 and 2
>> Yukon-EC chips...the HWF_WA_DEV_4167 oversize receive hang
>> workaround checks and can reset the (as I understand) bus master unit
>> of the NIC (in CheckRxPath) in a periodic timer that is fired, where
>> it finds no progress is made.
>
> Where did you get those errata? I keep having to reverse engineer
> by guessing from the vendor driver.

From sk98lin.tar.bz2 inside
http://www.syskonnect.de/e_en/products/adapters/pcie_server/sk-9exx/software/linux/driver/install_v10.0.4.3.tar.bz2

--- defined in ./common/h/skgeinit.h

/*-RMV- DWORD 1: Deviations */
#define HWF_WA_DEV_53           0x1100UL      /*-RMV- 5.3 (Tx Done LSOv2 rep) */
#define HWF_WA_DEV_LIM_IPV6_RSS 0x1080UL      /*-RMV- IPV6 RSS limitted */
#define HWF_WA_DEV_4217         0x1040UL      /*-RMV- 4.217 (PCI-E blockage) */
#define HWF_WA_DEV_4200         0x1020UL      /*-RMV- 4.200 (D3 Blue Screen) */
#define HWF_WA_DEV_4185CS       0x1010UL      /*-RMV- 4.185 (ECU 100 CS cal) */
#define HWF_WA_DEV_4185         0x1008UL      /*-RMV- 4.185 (ECU Tx h check) */
#define HWF_WA_DEV_4167         0x1004UL      /*-RMV- 4.167 (Rx OvSize Hang) */
#define HWF_WA_DEV_4152         0x1002UL      /*-RMV- 4.152 (RSS issue) */
#define HWF_WA_DEV_4115         0x1001UL      /*-RMV- 4.115 (Rx MAC FIFO) */
#define HWF_WA_DEV_4109         0x10008000UL  /*-RMV- 4.109 (BIU hang) */
#define HWF_WA_DEV_483          0x10004000UL  /*-RMV- 4.83 (Rx TCP wrong) */
#define HWF_WA_DEV_479          0x10002000UL  /*-RMV- 4.79 (Rx BMU hang II) */
#define HWF_WA_DEV_472          0x10001000UL  /*-RMV- 4.72 (GPHY2 MDC clk) */
#define HWF_WA_DEV_463          0x1800UL      /*-RMV- 4.63 (Rx BMU hang I) */
#define HWF_WA_DEV_427          0x1400UL      /*-RMV- 4.27 (Tx Done Rep) */
#define HWF_WA_DEV_42           0x1200UL      /*-RMV- 4.2 (pref unit burst) */
#define HWF_WA_DEV_46           0x1100UL      /*-RMV- 4.6 (CPU crash II) */
#define HWF_WA_DEV_43_418       0x1080UL      /*-RMV- 4.3 4.18 (PCI unexp */
                                              /*-RMV- complStat BMU deadl) */
#define HWF_WA_DEV_420          0x1040UL      /*-RMV- 4.20 (Status BMU ov) */
#define HWF_WA_DEV_423          0x1020UL      /*-RMV- 4.23 (TCP Segm Hang) */
#define HWF_WA_DEV_424          0x1010UL      /*-RMV- 4.24 (MAC reg overwr) */
#define HWF_WA_DEV_425          0x1008UL      /*-RMV- 4.25 (Magic packet */
                                              /*-RMV- with odd offset) */
#define HWF_WA_DEV_428          0x1004UL      /*-RMV- 4.28 (Poll-U BigEndi) */
#define HWF_WA_FIFO_FLUSH_YLA0  0x1002UL      /*-RMV- dis Rx GMAC FIFO Flush */
                                              /*-RMV- for Yu-L Rev. A0 only */
#define HWF_WA_COMA_MODE        0x1001UL      /*-RMV- Coma Mode WA req */

--- common/skgeinit.c:SkGeSetUpSupFeatures()

    case CHIP_ID_YUKON_EC:
        pAC->GIni.HwF.Features[HW_DEV_LIST] =
            HWF_WA_DEV_42 | HWF_WA_DEV_46 | HWF_WA_DEV_43_418 |
...
    case CHIP_ID_YUKON_FE:
        pAC->GIni.HwF.Features[HW_DEV_LIST] =
            HWF_WA_DEV_427  | HWF_WA_DEV_4109 |
            HWF_WA_DEV_4152 | HWF_WA_DEV_4167;
        break;
    case CHIP_ID_YUKON_XL:
... etc

It's worthwhile looking at 2.6/skge.c:CheckRxPath() and its call site
in the timer handler.

Thanks,
 Daniel
--
Daniel J Blueman


Re: sky2 silicon bugs and workarounds...

2007-07-02 Thread Stephen Hemminger
On Mon, 2 Jul 2007 14:37:06 +0100
Daniel J Blueman [EMAIL PROTECTED] wrote:

> Hi Stephen,
>
> When the sky2 driver initialises, it sets the ISR timer register
> (STAT_ISR_TIMER_INI) to 125 * 20 = 2500, whereas the vendor sk98lin
> driver sets it to 400, irrespective of the clock speed of the NIC
> processor.
>
> I guess you found more performance/stability from this value...?
>
> I've checked through the errata workarounds common to my rev-1 and 2
> Yukon-EC chips...the HWF_WA_DEV_4167 oversize receive hang
> workaround checks and can reset the (as I understand) bus master unit
> of the NIC (in CheckRxPath) in a periodic timer that is fired, where
> it finds no progress is made.

This code in the vendor driver is not acceptable. It causes the device
to continually reset itself when idle, because the sk98lin driver
cannot tell the difference between no packets arriving and a hung receiver!
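
To make the distinction concrete: a "no progress" check on a single counter fires on an idle link, whereas comparing two counters can separate idle from hung. The sketch below is illustrative only. It assumes the hardware exposes a MAC-level received-frame counter that keeps counting while the DMA side is stuck; that is not established in this thread, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rx_sample {
    uint64_t mac_rx_frames;  /* frames the MAC saw on the wire (assumed counter) */
    uint64_t drv_rx_frames;  /* frames actually delivered to the driver */
};

/* Naive check: also true on a perfectly healthy but idle link. */
static bool naive_stalled(struct rx_sample prev, struct rx_sample now)
{
    return now.drv_rx_frames == prev.drv_rx_frames;
}

/* Hung only if the wire was active and nothing reached the driver. */
static bool rx_hung(struct rx_sample prev, struct rx_sample now)
{
    bool wire_active, delivered;

    assert(now.mac_rx_frames >= prev.mac_rx_frames);
    wire_active = now.mac_rx_frames != prev.mac_rx_frames;
    delivered = now.drv_rx_frames != prev.drv_rx_frames;
    return wire_active && !delivered;
}
```

Even this is only a heuristic: frames that the MAC counts but legitimately discards (filtered or errored frames, for instance) would need to be excluded before treating the result as a hang.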


-- 
Stephen Hemminger [EMAIL PROTECTED]