Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-05-02 Thread Nuri Jawad

Good evening,

Bugzilla entry on this is now here:
http://bugzilla.kernel.org/show_bug.cgi?id=6484

Note the interesting fact that kernel mode PPPoE is not affected. Thus it
could also be a bug in Roaring Penguin's PPPoE program. The problem is
that all other user space implementations seem to be quite outdated
(made for 2.2 kernels).

I still did observe around 500 packets being lost after a night of pinging 
this machine, compared to around 30 with 2.6.15.7 user mode pppoe and 
700-1100 with >=2.6.16 user mode. I made a little perl script that 
generates a histogram from ping's output and there were only single 
packets lost.
However, just when I was about to test kernel mode with 2.6.15.7 tonight, 
this effect disappeared. Might have been my ISP after all.
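
For readers who want to repeat the measurement: the following is a minimal
Python sketch of such a histogram, not the Perl script mentioned above. It
assumes ping prints one reply per line containing "icmp_seq=N", that replies
arrive in order, and that the sequence number wraps at 65536; the function
names are made up for illustration.

```python
import re
import sys
from collections import Counter

SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def loss_histogram(lines):
    """Map 'packets lost in a row' -> how many times that happened."""
    hist = Counter()
    prev = None
    for line in lines:
        match = SEQ_RE.search(line)
        if not match:
            continue  # header, summary or error line
        seq = int(match.group(1))
        if prev is not None:
            # Sequence numbers skipped since the previous reply,
            # allowing for the assumed 16-bit wrap-around.
            missing = (seq - prev) % 65536 - 1
            if missing > 0:
                hist[missing] += 1
        prev = seq
    return hist


def print_histogram(lines, out=sys.stdout):
    """Print one row per run length, shortest runs first."""
    for run, count in sorted(loss_histogram(lines).items()):
        out.write(f"{run:5d} lost in a row: {count} time(s)\n")
```

Calling print_histogram(sys.stdin) with saved ping output on standard input
prints the table. A result with only a row for run length 1 corresponds to
"only single packets lost"; a dropout of the kind discussed in this thread
would show up as a long run instead.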


Thus, for the time being, kernel mode PPPoE seems to be a viable 
workaround.
The whole matter is a bit strange to me. I would have expected that the
kernel only communicates with pppd, which then utilizes a process
encapsulating the packets in Ethernet frames. That's why I didn't
think this bug was a pure pppoe issue, which it seems to be.

Regards, Nuri
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-30 Thread Nuri Jawad

On Sun, 30 Apr 2006, [EMAIL PROTECTED] wrote:


> I observed a 1-2 sec stalling behaviour for the complete system every
> 10 seconds or so _seemingly_ only when my ADSL connection was up.


I had that idea too, but that sounds different from what I have here. I 
have also transferred lots of data at >900 Mbit/s with the e1000 and 
never had a single problem. The packets are not vanishing on the wire and 
the system does not stall, there's just nothing appearing on ppp0 tx at 
all. That sounds like an unrelated issue to me. 
BTW, there was no dropout in the last 8 hours, only after I started some 
tx load a while ago one of them came up within minutes.
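
One way to tell "nothing appearing on ppp0 tx" apart from packets lost on
the wire is to sample the interface's transmit counter while traffic should
be flowing. A minimal Python sketch, assuming the usual /proc/net/dev layout
(eight receive columns followed by the transmit columns); the function names
are hypothetical and this is not a tool used in this thread.

```python
def tx_packets(proc_net_dev_text, iface):
    """Return the transmitted-packet counter for iface, or None if absent."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or name.strip() != iface:
            continue  # header line or a different interface
        fields = rest.split()
        # fields[0..7] are receive counters; fields[8] is transmit bytes
        # and fields[9] is transmit packets (assumed column order).
        return int(fields[9])
    return None


def stalled_intervals(samples):
    """Given counter samples taken at a fixed interval, return the indices
    of the intervals during which the counter did not advance."""
    return [i for i in range(1, len(samples)) if samples[i] == samples[i - 1]]
```

Reading /proc/net/dev once per second, passing the text to
tx_packets(text, "ppp0") and feeding the collected values to
stalled_intervals() would mark the seconds in which no packet left through
ppp0. That only means something while there is steady outgoing traffic,
such as a running ping.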



> erroneous patch I realized I had changed the driver when upgrading the
> kernel to 2.6.14.


What does 2.6.14 have to do with it? The ppp problem appeared exactly with 
*2.6.16*. It looks like it will also be in 2.6.17 because nobody is 
stepping on the brake :/. All this with code that had worked perfectly 
fine for ages. I'm getting a bit frustrated here.


Well, I might try disabling the onboard e1000 and replacing it with a 
"good" old Realtek.


Regards, Nuri


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-30 Thread theosch

> Going back to e100 helped

Sorry, I meant: Going back to eepro100 helped



Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-30 Thread theosch

Maybe this is an issue of the e100 driver?

I observed a 1-2 sec stalling behaviour for the complete system every
10 seconds or so _seemingly_ only when my ADSL connection was up. That
was after I had changed the ethernet driver for a card _not_ connected
to the modem from eepro100 to e100.

After a lot of fiddling around with git-bisect trying to find the
erroneous patch I realized I had changed the driver when upgrading the
kernel to 2.6.14. The problem also existed in later kernel versions.
Going back to e100 helped.

Cheers
Arnold


If you have any questions, please Cc theosch at gmx.net as I am not
subscribed to the list.

+ + + + +
Using Kernel pppoe
PCI-Rev: Don't know (well, it says "PCI: PCI BIOS revision 2.10...")

# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

First ethernet (connected to ADSL modem): 
PCI: Found IRQ 11 for device :00:0c.0
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
:00:0c.0: 3Com PCI 3c905B Cyclone 100baseTx at e080. Vers LK1.1.19

Second ethernet (local LAN):
eepro100.c:v1.09j-t 9/29/99 Donald Becker 
http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin 
<[EMAIL PROTECTED]> and others
PCI: Found IRQ 10 for device :00:0b.0
eth1: :00:0b.0, 00:D0:B7:83:58:26, IRQ 10.
  Board assembly 721383-009, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0x04f4518b).



Re: Fw: Bug: PPP dropouts in >=2.6.16 - updates

2006-04-29 Thread Nuri Jawad

Some more info:

- turning off Hyper Threading and using a uniprocessor kernel
  did not improve things
- neither did using 2.6.17-rc3; in fact the bug manifested after
  only 4 minutes, with a 43-second gap
- those kernel debug watchdog routines don't detect anything

Going to try kernel PPPoE next time. Btw, at least with rp-pppoe 
it requires HDLC and that dependency isn't caught in menuconfig.


I would try to roll back some patches between 2.6.15.7 and 2.6.16 but
that changelog is pretty large. I'm sure there are good reasons for
the current development model, but with the old unstable/stable
system and its few changes between stable versions, the right
one could've been spotted easily :/.

Regards, Nuri


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-28 Thread Nuri Jawad

On Wed, 26 Apr 2006, Sven Schuster wrote:


> but don't hold your breath waiting for me, kernel compile
> takes more than two hours on my box :-)


Ouch. Takes 5 and 7 minutes here on the AMD64 and the P4, respectively.
Computer museum? :P
Anyhow, I tested PPP for 2.5 hours on the AMD64 the day before 
yesterday with a bidirectional transfer that maxed out the upstream.
Last night, I additionally put some load on the CPU. Another 3 hours, no 
problems whatsoever. Looks like the bug does not manifest on that system.


The next step will be to clone .config's settings as far as possible with 
the different hardware and try again.


Regards,
Nuri


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-26 Thread Sven Schuster

Hi Andrew,

On Wed, Apr 26, 2006 at 03:07:33PM -0700, Andrew Morton told us:
> So there's something in -mm which fixes your kernel?  It's usually the
> other way around ;)

actually this was the first time that I tried a "normal" kernel.
I haven't chosen to run -mm because it fixed something for me
originally, I just run -mm as a matter of taste ;)

> And it sounds like something which has been in -mm for a long time, so it
> might not be a patch which I was planning on sending upstream.
>
> Can you think of a way in which we can identify which patch does the good
> deed?

My first thought was it had something to do with pata_via, as
mkinitrd complained it cannot find that module in 2.6.16.9
when I installed it. Taking a closer look, it doesn't even seem
like pata_via is really used, its use count in lsmod output is 0.
But, in the last few releases of -mm I had problems every now and
then where my box didn't want to boot complaining about lost
interrupts on hdb (hdb here, not hda) or it just froze after some
days of uptime (I was able to do sysrq though). Later on I ran
SMART self tests on both my hard drives which didn't reveal any
errors. Google told me some other guys with VIA based boards had
similar problems which went away when using a board with another
vendor's chipset. Being a lazy bastard and having no real time I
stopped digging into this...
How to debug? I might try unapplying VIA and/or IDE related patches
from -mm until I get the same problem as with the stable series.
If someone could tell me which patches I should try :-)

Here's the dmesg output concerning my IDE controller:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot :00:07.1
PCI: Via IRQ fixup for :00:07.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: Maxtor 6Y120L0, ATA DISK drive
hdb: Maxtor 6Y120L0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: AOPEN CD-RW CRW4852 1.00 20030123, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >
hdb: max request size: 128KiB
hdb: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdb: cache flushes supported
 hdb: hdb1 hdb2 hdb3 hdb4 < hdb5 hdb6 hdb7 hdb8 >
hdc: ATAPI 40X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20

If someone wants me to provide more info, test patches or anything
please tell me :-)


Thanks

Sven

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 
i686 athlon i386 GNU/Linux
 07:19:57 up 12:02,  2 users,  load average: 0.19, 0.10, 0.13




Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-26 Thread Andrew Morton
Sven Schuster <[EMAIL PROTECTED]> wrote:
>
> On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> > Did you create a high load on the system in the manner I described?
> > The bug once only appeared after about 6 hours here when line + CPU had 
> > been mostly idle. But that was the longest time between failures. Can you 
> > test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say 
> 
> Unfortunately it seems like plain 2.6.16.x doesn't like the ide
> controller on my (VIA) mainboard, I'm getting I/O errors on hda
> when booting this kernel (but hard drive works ok with -mm) :-(
> actually I haven't been running a plain stable kernel for a while,
> I've been running -mm kernels for ages...
> 

So there's something in -mm which fixes your kernel?  It's usually the
other way around ;)

And it sounds like something which has been in -mm for a long time, so it
might not be a patch which I was planning on sending upstream.

Can you think of a way in which we can identify which patch does the good
deed?




Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-26 Thread Sven Schuster
On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> Did you create a high load on the system in the manner I described?
> The bug once only appeared after about 6 hours here when line + CPU had 
> been mostly idle. But that was the longest time between failures. Can you 
> test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say 

Unfortunately it seems like plain 2.6.16.x doesn't like the ide
controller on my (VIA) mainboard, I'm getting I/O errors on hda
when booting this kernel (but hard drive works ok with -mm) :-(
actually I haven't been running a plain stable kernel for a while,
I've been running -mm kernels for ages...


Sven

> for sure if CPU load is a factor, load on the connection seems to be.

-- 
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 
i686 athlon i386 GNU/Linux
 23:16:15 up  3:58,  2 users,  load average: 0.83, 0.82, 0.79




Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-25 Thread Sven Schuster

Hi,

On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> >no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.
> 
> Did you create a high load on the system in the manner I described?
> The bug once only appeared after about 6 hours here when line + CPU had 
> been mostly idle. But that was the longest time between failures. Can you 
> test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say 
> for sure if CPU load is a factor, load on the connection seems to be.

well, machine is mostly idle besides downloads now and then or
software compilations (kernel mostly) or periodic mail fetching
including virus and spam scanning. This is my box at home (on
which I'm currently writing this email). I'm currently compiling
2.6.16.9 and will test with this release later on. I will get
some periodic ping running to check for connection failures and
put some load on the machine. Will come back with the results
later, but don't hold your breath waiting for me, kernel compile
takes more than two hours on my box :-)


Regards,

Sven

>
> Regards,
> Nuri
>

-- 
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 
i686 athlon i386 GNU/Linux
 07:56:45 up 3 days, 11:30,  2 users,  load average: 2.79, 1.32, 0.68




Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-25 Thread Nuri Jawad

> no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.


Did you create a high load on the system in the manner I described?
The bug once only appeared after about 6 hours here when line + CPU had 
been mostly idle. But that was the longest time between failures. Can you 
test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say 
for sure if CPU load is a factor, load on the connection seems to be.


After using 2.6.15.7 for another 5 days now with some more stress
testing, I can confirm that 2.6.15 definitely does not produce any
dropouts on this machine.


For now I'll try to reproduce the effects on my second box (AMD64/nf4).
I'd be happy if someone could give me some hints on which patches I could 
try to revert as the changes to ppp between the two versions look fairly 
harmless. For the first time in 8.5 years, I cannot use a 'stable' kernel 
release and there is really nothing special about this system.


Regards,
Nuri


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-24 Thread Sven Schuster
On Sat, Apr 22, 2006 at 02:02:59AM +0200, Andi Kleen told us:
> On Friday 21 April 2006 19:15, Jesse Brandeburg wrote:
> > On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > >
> > > We do seem to have had a few reports of ppp regressions around this
> > > timeframe.
> > 
> > me too.  I couldn't use 2.6.16 at home on my pppoe connected router
> > because it was so slow.  I didn't have time to debug.  I can probably
> > try patches and provide more data too.  Tell me what is needed.
> 
> I seem to have some trouble on my PPPoE too. But it's not really unusable,
> just dropouts now and then.

no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.


Sven

> -Andi

-- 
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 
i686 athlon i386 GNU/Linux
 09:40:22 up 1 day, 13:14,  4 users,  load average: 0.34, 0.16, 0.11




Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-21 Thread Andi Kleen
On Friday 21 April 2006 19:15, Jesse Brandeburg wrote:
> On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > We do seem to have had a few reports of ppp regressions around this
> > timeframe.
> 
> me too.  I couldn't use 2.6.16 at home on my pppoe connected router
> because it was so slow.  I didn't have time to debug.  I can probably
> try patches and provide more data too.  Tell me what is needed.

I seem to have some trouble on my PPPoE too. But it's not really unusable,
just dropouts now and then.

-Andi


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-21 Thread Andrew Morton
"Jesse Brandeburg" <[EMAIL PROTECTED]> wrote:
>
> On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > We do seem to have had a few reports of ppp regressions around this
> > timeframe.
> 
> me too.  I couldn't use 2.6.16 at home on my pppoe connected router
> because it was so slow.  I didn't have time to debug.  I can probably
> try patches and provide more data too.  Tell me what is needed.

probably git-bisect, sorry.  It's the sort of thing you can do while
reading a good book ;)
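
For context, git-bisect is a binary search over the commits between a
known-good and a known-bad kernel, so the number of build-and-test rounds
grows only with the logarithm of the number of commits. The following is a
toy Python model of that search, not git itself. The commit list and the
is_bad predicate are hypothetical, and it assumes that every commit after
the first bad one is bad too.

```python
def bisect_first_bad(commits, is_bad):
    """Return (first bad commit, number of tests performed).

    commits is ordered oldest to newest, the last entry is known to be
    bad, and the commit before commits[0] is known to be good.
    """
    lo, hi = 0, len(commits) - 1
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build, boot and test per round
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo], tests


# Toy run: 4096 commits, the regression was introduced by commit 3000.
culprit, rounds = bisect_first_bad(list(range(4096)), lambda c: c >= 3000)
print(culprit, rounds)
```

In this model a range of 4096 commits is narrowed down in 12 rounds. The
expensive part in practice is each round: building and booting the kernel,
then waiting long enough to be confident that the dropout does or does not
occur.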

> Is there a bugzilla on this?

I don't think so.  Bugzilla records which I'm following which mention ppp
are:

http://bugzilla.kernel.org/show_bug.cgi?id=5695
http://bugzilla.kernel.org/show_bug.cgi?id=6197
http://bugzilla.kernel.org/show_bug.cgi?id=6402

Perhaps the final one is related.


Re: Fw: Bug: PPP dropouts in >=2.6.16

2006-04-21 Thread Jesse Brandeburg
On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> We do seem to have had a few reports of ppp regressions around this
> timeframe.

me too.  I couldn't use 2.6.16 at home on my pppoe connected router
because it was so slow.  I didn't have time to debug.  I can probably
try patches and provide more data too.  Tell me what is needed.

Is there a bugzilla on this?

Jesse