Re: [PATCH 0/4] RFC: Realtek 83xx SMI driver core

2018-04-02 Thread Carl-Daniel Hailfinger
Hi Linus,

did you make any progress with this?
I noticed that the Vodafone Easybox 904xdsl/904lte models both make use
of the RTL8367 switch. About one million of these routers have been
deployed in Germany.
There is an OpenWrt fork at
https://github.com/Quallenauge/Easybox-904-XDSL/commits/master-lede
which depends on the out-of-tree patches which seem to be the basis for
your Realtek 83xx driver patches.

Having your Realtek 83xx patches in the upstream Linux kernel would help
tremendously in getting support for those router models merged in OpenWrt.

Regards,
Carl-Daniel


Re: [RFD] L2 Network namespace infrastructure

2007-06-23 Thread Carl-Daniel Hailfinger
On 23.06.2007 19:19, Eric W. Biederman wrote:
 Patrick McHardy [EMAIL PROTECTED] writes:
 
 Eric W. Biederman wrote:
 
 Depending upon the data structure it will either be modified to hold
 a per entry network namespace pointer or there will be a separate
 copy per network namespace.  For large global data structures like
 the ipv4 routing cache hash table adding an additional pointer to the
 entries appears the more reasonable solution.

 So the routing cache is shared between all namespaces?
 
 Yes.  Each namespace has its own view so semantically it's not
 shared.  But the initial fan out of the hash table 2M or something
 isn't something we want to replicate on a per namespace basis even
 assuming the huge page allocations could happen.
 
 So we just tag the entries and add the network namespace as one more
 part of the key when doing hash table look ups.

Can one namespace DoS other namespaces' access to the routing cache?
Two scenarios come to mind:
* provoking hash collisions
* lock contention (sorry, haven't checked whether/how we do locking)

Regards,
Carl-Daniel
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Virtual ethernet tunnel (v.2)

2007-06-08 Thread Carl-Daniel Hailfinger
On 08.06.2007 19:00, Ben Greear wrote:
 I have another sysfs patch that allows setting a default skb->mark for
 an interface so that you can set the skb->mark before it hits the
 connection tracking logic, but I've been told this one has very little
 chance of getting into the kernel.  The skb->mark patch is only useful
 (as far as I can tell) if you also include a patch Patrick McHardy did
 for me that allowed the conn-tracking logic to use skb->mark as part of
 its tuple.  This allows me to do NAT between virtual routers (routing
 tables) on the same machine using veth-equivalent drivers to connect
 the routers.  He thinks this will probably not ever get into the kernel
 either.

Are these patches available somewhere? I'm currently doing NAT between
virtual routers by some advanced iproute2/iptables trickery, but I have
no way to handle the occasional tuple conflict.

Regards,
Carl-Daniel


Re: [LARTC] [ANNOUNCE] iproute2-2.6.18-061002

2006-10-04 Thread Carl-Daniel Hailfinger
Stephen Hemminger wrote:
 This is a much delayed update to the iproute2 command set.
 It can be downloaded from:
   
 http://developer.osdl.org/dev/iproute2/download/iproute2-2.6.18-061002.tar.gz

Thanks!

Are there any plans to merge the ip arp patches at
http://www.ssi.bg/~ja/#iparp ? Apologies if this has already
been rejected before. Searching the archives I couldn't find
such a discussion.


Regards,
Carl-Daniel


-- 
http://www.hailfinger.org/


[PATCH] Revert sky2 to 0.13a

2006-02-25 Thread Carl-Daniel Hailfinger
Hi Jeff,

you may want to push this patch into 2.6.16. The version it reverts to
has been running stable for over four weeks for various folks (CC'ed)
and we have had no success communicating with the maintainer.

Regards,
Carl-Daniel


Revert sky2 to 0.13 with a four-line fix on top of it.
Later versions cause random oopses and just hang on some chips.

Signed-off-by: Carl-Daniel Hailfinger [EMAIL PROTECTED]


diff -Nurp linux-2.6.16-rc4-git8/drivers/net/sky2.c linux-2.6.16-rc4-git8-sky2fix/drivers/net/sky2.c
--- linux-2.6.16-rc4-git8/drivers/net/sky2.c	2006-02-25 02:38:35.0 +0100
+++ linux-2.6.16-rc4-git8-sky2fix/drivers/net/sky2.c	2006-02-26 01:29:45.0 +0100
@@ -23,6 +23,12 @@
  * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
+/*
+ * TOTEST
+ * - speed setting
+ * - suspend/resume
+ */
+
 #include <linux/config.h>
 #include <linux/crc32.h>
 #include <linux/kernel.h>
@@ -51,7 +57,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"0.15"
+#define DRV_VERSION		"0.13a"
 #define PFX			DRV_NAME " "
 
 /*
@@ -96,10 +102,6 @@ static int copybreak __read_mostly = 256
 module_param(copybreak, int, 0);
 MODULE_PARM_DESC(copybreak, "Receive copy threshold");
 
-static int disable_msi = 0;
-module_param(disable_msi, int, 0);
-MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
-
 static const struct pci_device_id sky2_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9000) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_SYSKONNECT, 0x9E00) },
@@ -195,11 +197,11 @@ static int sky2_set_power_state(struct s
 	pr_debug("sky2_set_power_state %d\n", state);
 	sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_ON);
 
-	power_control = sky2_pci_read16(hw, hw->pm_cap + PCI_PM_PMC);
-	vaux = (sky2_read16(hw, B0_CTST) & Y2_VAUX_AVAIL) &&
+	pci_read_config_word(hw->pdev, hw->pm_cap + PCI_PM_PMC, &power_control);
+	vaux = (sky2_read8(hw, B0_CTST) & Y2_VAUX_AVAIL) &&
 		(power_control & PCI_PM_CAP_PME_D3cold);
 
-	power_control = sky2_pci_read16(hw, hw->pm_cap + PCI_PM_CTRL);
+	pci_read_config_word(hw->pdev, hw->pm_cap + PCI_PM_CTRL, &power_control);
 
 	power_control |= PCI_PM_CTRL_PME_STATUS;
 	power_control &= ~(PCI_PM_CTRL_STATE_MASK);
@@ -223,7 +225,7 @@ static int sky2_set_power_state(struct s
 			sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
 		/* Turn off phy power saving */
-		reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+		pci_read_config_dword(hw->pdev, PCI_DEV_REG1, &reg1);
 		reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
 
 		/* looks like this XL is back asswards .. */
@@ -232,28 +234,18 @@ static int sky2_set_power_state(struct s
 			if (hw->ports > 1)
 				reg1 |= PCI_Y2_PHY2_COMA;
 		}
-
-		if (hw->chip_id == CHIP_ID_YUKON_EC_U) {
-			sky2_pci_write32(hw, PCI_DEV_REG3, 0);
-			reg1 = sky2_pci_read32(hw, PCI_DEV_REG4);
-			reg1 &= P_ASPM_CONTROL_MSK;
-			sky2_pci_write32(hw, PCI_DEV_REG4, reg1);
-			sky2_pci_write32(hw, PCI_DEV_REG5, 0);
-		}
-
-		sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
-
+		pci_write_config_dword(hw->pdev, PCI_DEV_REG1, reg1);
 		break;
 
 	case PCI_D3hot:
 	case PCI_D3cold:
 		/* Turn on phy power saving */
-		reg1 = sky2_pci_read32(hw, PCI_DEV_REG1);
+		pci_read_config_dword(hw->pdev, PCI_DEV_REG1, &reg1);
 		if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
 			reg1 &= ~(PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
 		else
 			reg1 |= (PCI_Y2_PHY1_POWD | PCI_Y2_PHY2_POWD);
-		sky2_pci_write32(hw, PCI_DEV_REG1, reg1);
+		pci_write_config_dword(hw->pdev, PCI_DEV_REG1, reg1);
 
 		if (hw->chip_id == CHIP_ID_YUKON_XL && hw->chip_rev > 1)
 			sky2_write8(hw, B2_Y2_CLK_GATE, 0);
@@ -275,7 +267,7 @@ static int sky2_set_power_state(struct s
 		ret = -1;
 	}
 
-	sky2_pci_write16(hw, hw->pm_cap + PCI_PM_CTRL, power_control);
+	pci_write_config_byte(hw->pdev, hw->pm_cap + PCI_PM_CTRL, power_control);
 	sky2_write8(hw, B2_TST_CTRL1, TST_CFG_WRITE_OFF);
 	return ret;
 }
@@ -473,31 +465,16 @@ static void sky2_phy_init(struct sky2_hw
 		ledover |= PHY_M_LED_MO_RX(MO_LED_OFF);
 	}
 
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U && hw->chip_rev >= 2) {
-		/* apply fixes in PHY AFE */
-		gm_phy_write(hw, port, 22, 255);
-		/* increase differential signal amplitude in 10BASE-T */
-		gm_phy_write(hw, port, 24, 0xaa99);
-		gm_phy_write(hw, port, 23, 0x2011

Re: [PATCH 02/02] add mask options to fwmark masking code

2006-02-20 Thread Carl-Daniel Hailfinger
Michael Richardson wrote:
 [PATCH] This patch introduces a mask to the fwmark test cases in the advanced
 routing. This lets one test individual bits of the fwmark to determine
 how things should be routed (pick a routing table). This patch retains
 compatibility with tests that do not set the mask by assuming a mask
 of 0 is equivalent to a mask of 0x.

Sorry if I misunderstood the intention of your patch, but isn't similar code
already in mainline?

linux-2.6.16-rc3/net/sched/cls_u32.c:146
#ifdef CONFIG_CLS_U32_MARK
	if ((skb->nfmark & n->mark.mask) != n->mark.val) {
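
The check quoted above masks the packet mark and compares the result with a stored value. A minimal sketch of that comparison in bash arithmetic follows. The function name mark_matches and the example values are made up for illustration; they are not part of cls_u32 or of the patch under discussion.

```shell
#!/bin/bash
# Hypothetical helper (not kernel code): succeed if (mark & mask) == val.
# Arguments may be decimal or 0x-prefixed hexadecimal.
mark_matches() {
    local mark=$(( $1 )) mask=$(( $2 )) val=$(( $3 ))
    (( (mark & mask) == val ))
}

# Example: test only the low four bits of the mark.
#   mark_matches 0x12 0x0f 0x02 && echo "low nibble is 2"
```

With a mask of 0x0f, marks 0x12 and 0x32 both match the value 0x02, which is the "test individual bits" behaviour the patch description talks about.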


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: [RFT] sky2 0.16

2006-02-20 Thread Carl-Daniel Hailfinger
Ian Kumlien wrote:
 On Sun, 2006-02-19 at 14:20 +0100, Wolfgang Hoffmann wrote:
 
On Saturday 18 February 2006 18:00, Carl-Daniel Hailfinger wrote:

Hi,

Stephen Hemminger wrote:

Could everyone who has problems with hangs try the
following patch (against current 2.6.16-rc3 version)

If Stephen's patch doesn't work for you, could you try replacing
sky2.c and sky2.h with the ones attached to this mail? I'd be very
interested in feedback for my version of the hangfix.

Yes, your version cures my hangs.
 
 It's official, it cures my hangs as well, but it doesn't do MSI, and MSI
 might add some  additional complexity.

Could you all please test the attached patch against 2.6.16-rc4?
It is a straight forward-port of my sky2 version that worked for you.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/
diff -urN linux-2.6.16-rc4/drivers/net/sky2.c linux-2.6.16-rc4-sky2fix/drivers/net/sky2.c
--- linux-2.6.16-rc4/drivers/net/sky2.c 2006-02-21 01:31:18.0 +0100
+++ linux-2.6.16-rc4-sky2fix/drivers/net/sky2.c 2006-02-21 01:27:42.0 +0100
@@ -1863,6 +1863,17 @@
 
 	sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
 
+	/*
+	 * Kick the STAT_LEV_TIMER_CTRL timer.
+	 * This fixes my hangs on Yukon-EC (0xb6) rev 1.
+	 * The if clause is there to start the timer only if it has been
+	 * configured correctly and not been disabled via ethtool.
+	 */
+	if (sky2_read8(hw, STAT_LEV_TIMER_CTRL) == TIM_START) {
+		sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_STOP);
+		sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_START);
+	}
+
 	hwidx = sky2_read16(hw, STAT_PUT_IDX);
 	BUG_ON(hwidx >= STATUS_RING_SIZE);
 	rmb();


Re: [RFT] sky2 0.16

2006-02-18 Thread Carl-Daniel Hailfinger
Ian Kumlien wrote:
 On Sat, 2006-02-18 at 18:00 +0100, Carl-Daniel Hailfinger wrote:
 
Hi,

Stephen Hemminger wrote:

Could everyone who has problems with hangs try the
following patch (against current 2.6.16-rc3 version)

If Stephen's patch doesn't work for you, could you try replacing
sky2.c and sky2.h with the ones attached to this mail? I'd be very
interested in feedback for my version of the hangfix.
 
 
 Using that time stop and start on current 0.17 did not help:

And what about the modified 0.13 version I had attached?


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: [PATCH] sky2: fix hang on Yukon-EC (0xb6) rev 1

2006-01-25 Thread Carl-Daniel Hailfinger
Carl-Daniel Hailfinger wrote:
 Stephen Hemminger wrote:
 
On Tue, 24 Jan 2006 14:19:56 +0100
Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:



This patch for sky2 fixes a hang on Yukon-EC (0xb6) rev 1
where suddenly no more interrupts were delivered.

I don't know the real cause of the hang due to lack of docs,
but the patch has been running stable for a few hours
whereas the unmodified driver will hang after less than
2 minutes.

OK, the patch has been stable for me for about 30 hours.
As an added benefit, it seems to have reduced the NMI rate
on my box from 1/second to 0.15/second, so something was
wrong with these cards.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


[PATCH] sky2: fix ethtool ops

2006-01-24 Thread Carl-Daniel Hailfinger
This fixes setting rx_coalesce_usecs_irq via ethtool in sky2.
The write was directed to the wrong register.

Signed-off-by: Carl-Daniel Hailfinger [EMAIL PROTECTED]

--- linux/drivers/net/sky2.c	2006-01-23 23:41:35.0 +0100
+++ linux/drivers/net/sky2.c	2006-01-24 12:52:11.0 +0100
@@ -2843,7 +2843,7 @@
 	if (ecmd->rx_coalesce_usecs_irq == 0)
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
 	else {
-		sky2_write32(hw, STAT_TX_TIMER_INI,
+		sky2_write32(hw, STAT_ISR_TIMER_INI,
 			     sky2_us2clk(hw, ecmd->rx_coalesce_usecs_irq));
 		sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
 	}




Re: [PATCH] sky2: fix hang on Yukon-EC (0xb6) rev 1

2006-01-24 Thread Carl-Daniel Hailfinger
Stephen Hemminger wrote:
 On Tue, 24 Jan 2006 14:19:56 +0100
 Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
 
 
This patch for sky2 fixes a hang on Yukon-EC (0xb6) rev 1
where suddenly no more interrupts were delivered.

I don't know the real cause of the hang due to lack of docs,
but the patch has been running stable for a few hours
whereas the unmodified driver will hang after less than
2 minutes.
 
 
 This shouldn't be necessary, but I don't have specifications (yet) either.
 The logic is that clearing the interrupt should be okay since later on
 we check to see if the status ring is empty.

Hm. We check the TX status, but do we really also check the RX status?
I always had the feeling that the hangs were due to RX packets not getting
serviced. Well, we'll see the real reason once you have the docs.

 I'll hold off till I get hardware specifications, that are coming any
 day now.

Understandable. Just wanted to let you know that the patch has now been
stable under various loads for 8 hours.

Should there indeed be a documented bug in my Yukon2 version, will the
bugfix make it into 2.6.16?


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-23 Thread Carl-Daniel Hailfinger
Stephen Hemminger wrote:
 You might try adjusting the interrupt coalescing parameters with
   ethtool -C eth0 ...
 But I can't give you hard guidelines as to what would make it better.
 
 I have a debug patch, but it needs work still.

I don't care whether that debug patch will freeze the box or perform
other random funnies. All the debugging printks I added to the driver
did not trigger and I'd try anything. So yes, I'm desperate.

Does the sk98lin driver have any code for such problems?

Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-23 Thread Carl-Daniel Hailfinger
Stephen Hemminger wrote:
 On Mon, 23 Jan 2006 20:57:10 +0100
 Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:
 
 
Stephen Hemminger wrote:

You might try adjusting the interrupt coalescing parameters with
 ethtool -C eth0 ...
But I can't give you hard guidelines as to what would make it better.

I have a debug patch, but it needs work still.

I don't care whether that debug patch will freeze the box or perform
other random funnies. All the debugging printks I added to the driver
did not trigger and I'd try anything. So yes, I'm desperate.

Does the sk98lin driver have any code for such problems?
 
 
 There are several differences that the sk98lin driver has.
 * It programs some parts of the chip differently. But most
   of those are wrong. I started copying it, but where it was wrong
   I didn't copy the mistakes.
 * Sk98lin does NAPI wrong. It has interrupts disabled and runs
   packets through soft irq twice.
 * Sk98lin does its own buggy rx checksum validation.
 * Sk98lin does not do VLAN
 * Sk98lin programs PCI-Ex for 2K transfers, but that causes data
   corruption
 
 The one that probably is saving you with sk98lin, is it has a watchdog
 routine that tries to work around all the possible driver hangs.
 I prefer to find and fix these hangs, because a watchdog routine like that
 just masks the problem and introduces a bunch of SMP race conditions which
 the sk98lin author either didn't see or ignored.

Oh. Now that is news to me. Glad I didn't have an SMP machine with the old
driver.

There is a bug in ethtool support in sky2. Namely, rx-frames{,-irq}=64 is
wrapped to zero. And rx-usecs-irq is 20 no matter what I set it to.

# ethtool -C bridgeint0 rx-frames 64 rx-frames-irq 64 rx-usecs 1 rx-usecs-irq 1 \
  tx-usecs 1 tx-frames 64
# ethtool -c bridgeint0
Coalesce parameters for bridgeint0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 20
rx-frames-irq: 0

tx-usecs: 1
tx-frames: 64
tx-usecs-irq: 0
tx-frames-irq: 0

Will continue investigating.
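
To spot this kind of mismatch without reading the whole table, the requested value can be compared with what `ethtool -c` reports afterwards. The sketch below assumes the "name: value" output layout shown above. The helper names coalesce_value and check_coalesce are hypothetical and not part of ethtool.

```shell
#!/bin/bash
# Hypothetical helper: print the value of one coalescing field
# (for example "rx-frames") from `ethtool -c` output on stdin.
coalesce_value() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical helper: compare a reported value with the expected one.
# Usage: check_coalesce IFACE FIELD EXPECTED
check_coalesce() {
    local got
    got=$(ethtool -c "$1" | coalesce_value "$2")
    if [ "$got" != "$3" ]; then
        echo "$1: $2 is '$got', expected '$3'" >&2
        return 1
    fi
}

# Example (needs the interface to exist):
#   check_coalesce bridgeint0 rx-frames 64
#   check_coalesce bridgeint0 rx-usecs-irq 1
```

Run against the output above, both example checks would fail, since rx-frames reads back as 0 and rx-usecs-irq as 20.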


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-23 Thread Carl-Daniel Hailfinger
Carl-Daniel Hailfinger wrote:
 Stephen Hemminger wrote:
 
On Mon, 23 Jan 2006 20:57:10 +0100
Carl-Daniel Hailfinger [EMAIL PROTECTED] wrote:


Stephen Hemminger wrote:

You might try adjusting the interrupt coalescing parameters with
ethtool -C eth0 ...
But I can't give you hard guidelines as to what would make it better.

I have a debug patch, but it needs work still.

I don't care whether that debug patch will freeze the box or perform
other random funnies. All the debugging printks I added to the driver
did not trigger and I'd try anything. So yes, I'm desperate.

Does the sk98lin driver have any code for such problems?


There are several differences that the sk98lin driver has.
* It programs some parts of the chip differently. But most
  of those are wrong. I started copying it, but where it was wrong
  I didn't copy the mistakes.
* Sk98lin does NAPI wrong. It has interrupts disabled and runs
  packets through soft irq twice.
* Sk98lin does its own buggy rx checksum validation.
* Sk98lin does not do VLAN
* Sk98lin programs PCI-Ex for 2K transfers, but that causes data
  corruption

The one that probably is saving you with sk98lin, is it has a watchdog
routine that tries to work around all the possible driver hangs.
I prefer to find and fix these hangs, because a watchdog routine like that
just masks the problem and introduces a bunch of SMP race conditions which
the sk98lin author either didn't see or ignored.
 
 
 Oh. Now that is news to me. Glad I didn't have an SMP machine with the old
 driver.
 
 There is a bug in ethtool support in sky2. Namely, rx-frames{,-irq}=64 is
 wrapped to zero. And rx-usecs-irq is 20 no matter what I set it to.

The following whitespace-damaged patch should help with the latter problem.
--- a/drivers/net/sky2.c  2006-01-23 23:41:35.0 +0100
+++ b/drivers/net/sky2.c  2006-01-24 03:41:21.0 +0100
@@ -2843,7 +2843,7 @@
  if (ecmd->rx_coalesce_usecs_irq == 0)
   sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
  else {
-  sky2_write32(hw, STAT_TX_TIMER_INI,
+  sky2_write32(hw, STAT_ISR_TIMER_INI,
     sky2_us2clk(hw, ecmd->rx_coalesce_usecs_irq));
   sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
  }


Despite all the problems I'm having with sky2, I want to thank you
for writing it. The driver is easily readable and I can at least try
to get it running. With sk98lin I'm just stuck due to coding style
and general obfuscation.

Yeah!
I got the nic to reproducibly auto-recover. With the following ethtool
settings it would hang after a few minutes and not recover until a
rmmod/modprobe cycle. Now it comes back reliably.
# ethtool -C bridgeext0 rx-frames 63 rx-frames-irq 63 tx-frames 63 \
rx-usecs 250 rx-usecs-irq 250 tx-usecs 250

Patch follows:
--- a/drivers/net/sky2.c  2006-01-23 23:41:35.0 +0100
+++ b/drivers/net/sky2.c  2006-01-24 04:59:38.0 +0100
@@ -1623,6 +1623,12 @@
 	unsigned txq = txqaddr[sky2->port];
 	u16 ridx;

+	//sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+	sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_STOP);
+	//sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_STOP);
+	//sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
+	sky2_write8(hw, STAT_LEV_TIMER_CTRL, TIM_START);
+	//sky2_write8(hw, STAT_ISR_TIMER_CTRL, TIM_START);
 	/* Maybe we just missed an status interrupt */
 	spin_lock(&sky2->tx_lock);
 	ridx = sky2_read16(hw,
@@ -1639,6 +1645,7 @@
 	if (netif_msg_timer(sky2))
 		printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);

+#if 0
 	sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
 	sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);

@@ -1646,6 +1653,7 @@

 	sky2_qset(hw, txq);
 	sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+#endif
 }

Properties of the patch above: The device will fail after
some time, enter the tx_timeout handler, recover and continue.
Now if I could avoid entering the tx_timeout handler, I would
be happy because it triggers only after hanging for approx.
10 seconds.

Error log with my patch so far:
Jan 24 05:09:27 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:09:27 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:09:41 switch kernel: NETDEV WATCHDOG: bridgeext0: transmit timed out
Jan 24 05:09:41 switch kernel: sky2 bridgeext0: tx timeout
Jan 24 05:09:41 switch kernel: sky2 bridgeext0: rx error, status 0x7ffc0001 length 1312
Jan 24 05:11:12 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:11:12 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:11:12 switch kernel: sky2 bridgeint0: rx error, status 0x7ffc0001 length 592
Jan 24 05:11:42 switch kernel: NETDEV WATCHDOG: bridgeint0: transmit timed out
Jan 24 05:11:42 switch kernel: sky2 bridgeint0: tx timeout
Jan 24 05:11:42 switch kernel: sky2
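
To see at a glance which interface keeps running into the timeout handler, the "tx timeout" lines of a log like the one above can be counted per interface. This is only a sketch that assumes the "sky2 IFACE: tx timeout" message format shown in the log. The function name count_tx_timeouts is made up.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log lines on stdin and print
# "interface count" for every sky2 interface that logged a tx timeout.
count_tx_timeouts() {
    awk '/sky2 [^ ]+: tx timeout/ {
             iface = $0
             sub(/.*sky2 /, "", iface)   # drop everything up to the driver prefix
             sub(/:.*/, "", iface)       # drop the message after the name
             count[iface]++
         }
         END { for (i in count) print i, count[i] }' | sort
}

# Example:
#   dmesg | count_tx_timeouts
```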

Re: sky2 0.11 instability

2006-01-22 Thread Carl-Daniel Hailfinger
Hi,

Carl-Daniel Hailfinger wrote:
 Carl-Daniel Hailfinger wrote:
 
Carl-Daniel Hailfinger wrote:


after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears
dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.

I have now added a hard reset routine to the tx timeout
path and hope it won't kill my machine.
 
 
 Apologies for mangled whitespace, this is just a rough cut'n'paste.
 --- linux-2.6.15/drivers/net/sky2.c.orig 2006-01-21 16:00:15.0 +0100
 +++ linux-2.6.15/drivers/net/sky2.c 2006-01-21 14:08:28.0 +0100
 @@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
  return 0;
   }
  
 +static int sky2_reset(struct sky2_hw *hw);
   /*
    * Interrupt from PHY are handled outside of interrupt context
    * because accessing phy registers requires spin wait which might
 @@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
  if (netif_msg_timer(sky2))
  printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);
  
 +   if (0) {
  sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
  sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);
  
 @@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d
  
  sky2_qset(hw, txq);
  sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
 +   } else {
 +   printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
 +   sky2_down(dev);
 +   sky2_reset(hw);
 +   sky2_up(dev);
 +   }
  }
 
 
 And every time the kernel throws this message, I run the following
 script:
 
 #!/bin/bash
 deadinterface=`dmesg|grep HARD|tail -1|sed "s/.*sky2 //;s/:.*//"`
 ip l s $deadinterface down
 ip l s $deadinterface up
 
 After that, everything continues to work until the next tx timeout
 happens, and then the script again saves the day.
 
 More results about the circumstances of this bug: It seems that
 it will only trigger under LOW load. As long as I keep the interface
 busy, it will have no problems at all.
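
The recovery script quoted above can be written a little more defensively, so that it does nothing when no matching message is present. The following is a sketch under the assumption that the kernel message looks like "sky2 IFACE: recovering the HARD way...". The function names last_dead_interface and bounce_dead_interface are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the interface named in the most recent
# "recovering the HARD way" message found on stdin (empty if none).
last_dead_interface() {
    grep 'recovering the HARD way' | tail -n 1 |
        sed -e 's/.*sky2 //' -e 's/:.*//'
}

# Hypothetical wrapper: take that interface down and up again.
# Needs root; does nothing if no message was found.
bounce_dead_interface() {
    local ifc
    ifc=$(dmesg | last_dead_interface)
    [ -n "$ifc" ] || return 0
    ip link set "$ifc" down
    ip link set "$ifc" up
}

# Example:
#   bounce_dead_interface
```

Splitting the parsing from the `ip link` calls keeps the text handling testable without touching a real interface.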

OK, more info about the circumstances of the bug.
- happens with sky2 0.11 and 0.13
- with low load (100 kB/s) it triggers after 12 hours and then
  approx. every 50 minutes
- with medium load (100-1200 kB/s) it triggers after 30 minutes
  and then approx. every 70 minutes
- with high RX load (9-12 MB/s) it triggers every 8 hours
- with high TX load (9-12 MB/s) I can't get it to trigger
- with stock tx_timeout handler, it will stay dead and no interrupts
  are received from the nic once it hangs
- simply taking the interface down and up again doesn't help
- with my modified tx_timeout handler, taking the interface down and
  up again after the timeout helps
- with stock tx_timeout handler, I have to unload and reload the
  module to fix up the card
- general pattern seems to be medium interrupt load -> instability
- ah yes, and this is a production machine at a slightly remote
  location. Silly me.

If you want me to test any patch, tell me. It can only get better.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-22 Thread Carl-Daniel Hailfinger
Stephen Hemminger wrote:
 You might try adjusting the interrupt coalescing parameters with
   ethtool -C eth0 ...
 But I can't give you hard guidelines as to what would make it better.
 
 I have a debug patch, but it needs work still.

ethtool -C bridgeint1 rx-frames 255 rx-frames-irq 255 rx-usecs 0 rx-usecs-irq 0 \
  tx-usecs 0 tx-frames 255

always results in a hang after less than 2 minutes if the network
activity is not too high (about 100-600 packets/s). So yes, I can
trigger this sucker on demand and give you all the debugging you
need.

Do you have any idea what the out-of-tree sk98lin did differently?


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-22 Thread Carl-Daniel Hailfinger
Carl-Daniel Hailfinger wrote:
 Stephen Hemminger wrote:
 
You might try adjusting the interrupt coalescing parameters with
  ethtool -C eth0 ...
But I can't give you hard guidelines as to what would make it better.

I have a debug patch, but it needs work still.

After experimenting further, the following command will always hang
the card after 2-3 seconds:

ethtool -C bridgeint1 rx-frames 63 rx-frames-irq 63 rx-usecs 0 rx-usecs-irq 0 \
  tx-usecs 0 tx-frames 63

Crude activity log (1 second interval) follows:

interrupts   RX packets   TX packets

# normal activity
18225503  1828622  2084564
18225914  1828932  2084939
18226422  1829361  2085422
18226875  1829694  2085832
18227286  1830012  2086183
18227622  1830270  2086465
18227963  1830541  2086738
18228340  1830827  2087057
18228710  1831107  2087382
18229091  1831390  2087694
18229467  1831677  2088002
18229835  1831954  2088338
# ethtool starts now
18230143  1832249  2088647
18230146  1832434  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
18230146  1832462  2088799
# the netdev watchdog triggers now
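
The stall is easier to see as per-second differences than as raw counters. The sketch below turns three-column samples like the ones above into deltas and flags samples in which the interrupt counter did not move. The function name counter_deltas is made up, and lines that do not start with a number (headers, comments) are simply skipped.

```shell
#!/bin/bash
# Hypothetical helper: read "interrupts rx tx" samples on stdin and
# print the difference to the previous sample for each column.
# A sample with no new interrupts is marked STALLED.
counter_deltas() {
    awk '$1 !~ /^[0-9]+$/ || NF < 3 { next }
         seen {
             di = $1 - pi; dr = $2 - pr; dt = $3 - pt
             printf "%d %d %d%s\n", di, dr, dt, (di == 0 ? " STALLED" : "")
         }
         { pi = $1; pr = $2; pt = $3; seen = 1 }'
}

# Example with four of the samples from the log above:
#   printf '%s\n' '18229835 1831954 2088338' '18230143 1832249 2088647' \
#                 '18230146 1832434 2088799' '18230146 1832462 2088799' |
#       counter_deltas
```

For those four samples the output is "308 295 309", "3 185 152" and "0 28 0 STALLED": the interrupt counter stops first, while a few RX packets are still counted.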


 So yes, I can trigger this sucker on demand and give you all the
 debugging you need.
 
 Do you have any idea what the out-of-tree sk98lin v8.14.3.3 did
 differently?


Regards,
Carl-Daniel


Re: sky2 0.11 instability

2006-01-21 Thread Carl-Daniel Hailfinger
Hi,

Carl-Daniel Hailfinger wrote:
 
 after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
 card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears
 dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.
 
 sky2 v0.11 addr 0xc900 irq 74 Yukon-EC (0xb6) rev 1
 sky2 eth3: addr 00:00:5a:70:30:fb
 [...]
 sky2 eth3: enabling interface
 [...]
 sky2 eth3: phy interrupt status 0x1c40 0x7d0c
 sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
 [...]
 NETDEV WATCHDOG: eth3: transmit timed out
 sky2 eth3: tx timeout
 NETDEV WATCHDOG: eth3: transmit timed out
 sky2 eth3: tx timeout
 
 
 switch:~ # ifconfig eth3
 eth3   Link encap:Ethernet  HWaddr 00:00:5A:70:30:FB
   inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
   TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:25980735946 (24777.1 Mb)  TX bytes:259787058579 (247752.2 Mb)
   Interrupt:74
 
 switch:~ # cat /proc/interrupts
             CPU0
    0:   11213627    IO-APIC-edge  timer
    1:      24783    IO-APIC-edge  i8042
    8:          0    IO-APIC-edge  rtc
    9:          0   IO-APIC-level  acpi
   15:     401558    IO-APIC-edge  ide1
   50:  249384881   IO-APIC-level  eth0
   58:  179123938   IO-APIC-level  sky2
   66:          3   IO-APIC-level  sky2, ohci1394
   74:   98956955   IO-APIC-level  sky2
   82:      19952   IO-APIC-level  sky2
  217:       1865   IO-APIC-level  libata, NVidia CK804
  225:     263052   IO-APIC-level  libata, ehci_hcd:usb1
  NMI:      11098
  LOC:   11214113
  ERR:          0
  MIS:          0
 
 Not only will the card not transmit anymore, it also doesn't
 receive any packet at all. ethtool -r eth3 doesn't change
 anything, taking the interface down and up again also doesn't
 help. The interrupt count of interrupt 74 stays constant after
 failing.
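
Whether an interrupt counter "stays constant" can be checked by sampling /proc/interrupts twice. The following sketch assumes the layout shown above (IRQ number, one count column per CPU, then the controller and device names). The helper names irq_count and irq_is_stuck are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the total count for one IRQ number, summed
# over all CPU columns, from /proc/interrupts-style text on stdin.
irq_count() {
    awk -v irq="$1:" '$1 == irq {
        n = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
            n += $i
        print n
        exit
    }'
}

# Hypothetical helper: succeed if the count for an IRQ does not change
# over the given number of seconds (default 5).
irq_is_stuck() {
    local before after
    before=$(irq_count "$1" < /proc/interrupts)
    sleep "${2:-5}"
    after=$(irq_count "$1" < /proc/interrupts)
    [ -n "$before" ] && [ "$before" = "$after" ]
}

# Example:
#   irq_is_stuck 74 10 && echo "IRQ 74 saw no interrupts for 10 seconds"
```

An idle link also produces no interrupts, so a check like this only means something while traffic is being sent to the card.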
 
 modprobe -r sky2; modprobe sky2
 fixes the problem for me, so maybe resetting the card on TX
 timeouts will help.
 
 The same problem appeared much earlier for another card which
 shared interrupt 58 with an onboard card driven by skge. After
 disabling the skge driver and rebooting, that card has been
 stable so far.
 
 The card is connected to a 100 MBit switch.
 
 These problems didn't appear with sk98lin v8.14.3.3 (that
 driver did survive about 10 TB of traffic before I rebooted).
 
 Register dumps are available on request (too big for this
 list).
 
 I will now try sky2 0.13 and report back.

And it hit the other interface after 200 MB transferred...
NETDEV WATCHDOG: bridgeext0: transmit timed out
sky2 bridgeext0: tx timeout
NETDEV WATCHDOG: bridgeext0: transmit timed out
sky2 transmit interrupt missed? recovered

Although the driver claims to recover, it doesn't recover at all.
What debug level would be advisable? It is now running with
modprobe sky2 debug=2, but I can't see more than the messages
above.

I have now added a hard reset routine to the tx timeout
path and hope it won't kill my machine.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/


Re: sky2 0.11 instability

2006-01-21 Thread Carl-Daniel Hailfinger
Carl-Daniel Hailfinger wrote:
 Hi,
 
 Carl-Daniel Hailfinger wrote:
 
after sending 259 GB and receiving 25 GB over my SysKonnect SK-9E21
card (sky2 says it is a Yukon-EC (0xb6) rev 1), the card appears
dead. Machine is an Athlon64 3200+ on an Asus A8N-SLI Deluxe board.

sky2 v0.11 addr 0xc900 irq 74 Yukon-EC (0xb6) rev 1
sky2 eth3: addr 00:00:5a:70:30:fb
[...]
sky2 eth3: enabling interface
[...]
sky2 eth3: phy interrupt status 0x1c40 0x7d0c
sky2 eth3: Link is up at 100 Mbps, full duplex, flow control both
[...]
NETDEV WATCHDOG: eth3: transmit timed out
sky2 eth3: tx timeout
NETDEV WATCHDOG: eth3: transmit timed out
sky2 eth3: tx timeout


switch:~ # ifconfig eth3
eth3   Link encap:Ethernet  HWaddr 00:00:5A:70:30:FB
  inet6 addr: fe80::200:5aff:fe70:30fb/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:130530358 errors:0 dropped:0 overruns:0 frame:0
  TX packets:209647800 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:25980735946 (24777.1 Mb)  TX bytes:259787058579 (247752.2 Mb)
  Interrupt:74

switch:~ # cat /proc/interrupts
   CPU0
  0:   11213627IO-APIC-edge  timer
  1:  24783IO-APIC-edge  i8042
  8:  0IO-APIC-edge  rtc
  9:  0   IO-APIC-level  acpi
 15: 401558IO-APIC-edge  ide1
 50:  249384881   IO-APIC-level  eth0
 58:  179123938   IO-APIC-level  sky2
 66:  3   IO-APIC-level  sky2, ohci1394
 74:   98956955   IO-APIC-level  sky2
 82:  19952   IO-APIC-level  sky2
217:   1865   IO-APIC-level  libata, NVidia CK804
225: 263052   IO-APIC-level  libata, ehci_hcd:usb1
NMI:  11098
LOC:   11214113
ERR:  0
MIS:  0

Not only will the card not transmit anymore, it also doesn't
receive any packet at all. ethtool -r eth3 doesn't change
anything, taking the interface down and up again also doesn't
help. The interrupt count of interrupt 74 stays constant after
failing.

modprobe -r sky2; modprobe sky2
fixes the problem for me, so maybe resetting the card on TX
timeouts will help.

The same problem appeared much earlier for another card which
shared interrupt 58 with an onboard card driven by skge. After
disabling the skge driver and rebooting, that card has been
stable so far.

The card is connected to a 100 MBit switch.

These problems didn't appear with sk98lin v8.14.3.3 (that
driver did survive about 10 TB of traffic before I rebooted).

Register dumps are available on request (too big for this
list).

I will now try sky2 0.13 and report back.
 
 
 And it hit the other interface after 200 MB transferred...
 NETDEV WATCHDOG: bridgeext0: transmit timed out
 sky2 bridgeext0: tx timeout
 NETDEV WATCHDOG: bridgeext0: transmit timed out
 sky2 transmit interrupt missed? recovered
 
 Although the driver claims to recover, it doesn't recover at all.
 What debug level would be advisable? It is now running with
 modprobe sky2 debug=2, but I can't see more than the messages
 above.
 
 I have now added a hard reset routine to the tx timeout
 path and hope it won't kill my machine.

Apologies for mangled whitespace, this is just a rough cut'n'paste.
--- linux-2.6.15/drivers/net/sky2.c.orig 2006-01-21 16:00:15.0 +0100
+++ linux-2.6.15/drivers/net/sky2.c 2006-01-21 14:08:28.0 +0100
@@ -1565,6 +1565,7 @@ static int sky2_autoneg_done(struct sky2
return 0;
 }

+static int sky2_reset(struct sky2_hw *hw);
 /*
  * Interrupt from PHY are handled outside of interrupt context
  * because accessing phy registers requires spin wait which might
@@ -1639,6 +1640,7 @@ static void sky2_tx_timeout(struct net_d
if (netif_msg_timer(sky2))
printk(KERN_ERR PFX "%s: tx timeout\n", dev->name);

+   if (0) {
sky2_write32(hw, Q_ADDR(txq, Q_CSR), BMU_STOP);
sky2_write32(hw, Y2_QADDR(txq, PREF_UNIT_CTRL), PREF_UNIT_RST_SET);

@@ -1646,6 +1648,12 @@ static void sky2_tx_timeout(struct net_d

sky2_qset(hw, txq);
sky2_prefetch_init(hw, txq, sky2->tx_le_map, TX_RING_SIZE - 1);
+   } else {
+   printk(KERN_ERR PFX "%s: recovering the HARD way...\n", dev->name);
+   sky2_down(dev);
+   sky2_reset(hw);
+   sky2_up(dev);
+   }
 }


And every time the kernel throws this message, I run the following
script:

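#!/bin/bash
# Find the interface named in the most recent "HARD" recovery message.
deadinterface=$(dmesg | grep HARD | tail -1 | sed 's/.*sky2 //;s/:.*//')
# Bounce the interface to bring it back to life.
ip link set "$deadinterface" down
ip link set "$deadinterface" up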

After that, everything continues to work until the next tx timeout
happens, and then the script again saves the day.
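
The lookup-and-bounce step of the script above could be split into two
small functions so that nothing gets bounced when no matching message
is found. This is only a sketch: the function names last_dead_interface
and bounce_interface are made up for illustration, and it assumes the
log line looks like "sky2 <ifname>: recovering the HARD way..." as
printed by the patch above.

```bash
#!/bin/bash
# Sketch only; function names are hypothetical, not part of any tool.

# Read kernel log text on stdin and print the interface name from the
# most recent "recovering the HARD way" line (prints nothing if none).
# Assumes the line format "sky2 <ifname>: recovering the HARD way...".
last_dead_interface() {
    grep 'recovering the HARD way' | tail -n 1 | sed 's/.*sky2 //;s/:.*//'
}

# Take the named interface down and up again.
# Refuses to run with an empty name so a failed lookup is harmless.
bounce_interface() {
    local ifname="$1"
    [ -n "$ifname" ] || return 1
    ip link set "$ifname" down
    ip link set "$ifname" up
}

# Possible usage (needs root):
#   bounce_interface "$(dmesg | last_dead_interface)"
```

Whether this is robust enough to run unattended is untested; it only
makes the manual step above a little safer.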

More results about the circumstances of this bug: It seems that
it will only trigger under LOW load. As long as I keep the interface
busy, it will have no problems at all.


Regards,
Carl-Daniel
-- 
http://www.hailfinger.org/