Re: Fw: Bug: PPP dropouts in >=2.6.16
Good evening,

The Bugzilla entry on this is now here: http://bugzilla.kernel.org/show_bug.cgi?id=6484

Note the interesting fact that kernel mode PPPoE is not affected. Thus it could also be a bug in Roaring Penguin's PPPoE program. The problem is that all other user space implementations seem to be quite outdated (made for 2.2 kernels).

I still did observe around 500 packets being lost after a night of pinging this machine, compared to around 30 with 2.6.15.7 user mode pppoe and 700-1100 with >=2.6.16 user mode. I made a little perl script that generates a histogram from ping's output and there were only single packets lost. However, just when I was about to test kernel mode with 2.6.15.7 tonight, this effect disappeared. Might have been my ISP after all. Thus, for the time being, kernel mode PPPoE seems to be a viable workaround.

The whole matter is a bit strange to me. I would have expected that the kernel only communicates with pppd, which then utilizes a process encapsulating the packets in ethernet frames. That's why I didn't think this bug was a pure pppoe issue, which it seems to be.

Regards, Nuri

-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
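The Perl script mentioned above was not posted to the list. As a rough illustration of the idea only, here is a minimal Python sketch that builds a histogram of loss-run lengths from ping output. It assumes Linux iputils-style reply lines containing `bytes from` and `icmp_seq=N`; the function name `loss_histogram` is hypothetical, and the sketch ignores sequence-number wraparound and out-of-order replies.

```python
import re
from collections import Counter

# Assumed reply format (Linux iputils ping), e.g.:
#   64 bytes from 192.0.2.1: icmp_seq=7 ttl=57 time=23.1 ms
# Error lines ("Destination Host Unreachable") do not contain "bytes from"
# and are therefore not counted as received replies.
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+)")


def loss_histogram(lines):
    """Map 'number of consecutive lost packets' -> 'how often that happened'.

    A gap between two consecutive received sequence numbers counts as one
    loss event whose length is the number of missing replies in between.
    """
    hist = Counter()
    prev = None
    for line in lines:
        match = REPLY_RE.search(line)
        if not match:
            continue  # header, statistics, or error line
        seq = int(match.group(1))
        if prev is not None and seq > prev + 1:
            hist[seq - prev - 1] += 1
        prev = seq
    return hist
```

Fed the lines of a saved ping log, a result such as `Counter({1: 500})` would correspond to "only single packets lost", whereas a dropout of tens of seconds would show up as one event with a large run length.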
Re: Fw: Bug: PPP dropouts in >=2.6.16
On Sun, 30 Apr 2006, [EMAIL PROTECTED] wrote:

> I observed a 1-2 sec stalling behaviour for the complete system every 10 seconds or so _seemingly_ only when my ADSL connection was up.

I had that idea too, but that sounds different from what I have here. I have also transferred lots of data at >900 MBits/s with the e1000 and never had a single problem. The packets are not vanishing on the wire and the system does not stall, there's just nothing appearing on ppp0 tx at all. That sounds like an unrelated issue to me. BTW, there was no dropout in the last 8 hours; only after I started some tx load a while ago did one of them come up within minutes.

> erroneous patch I realized I had changed the driver when upgrading the kernel to 2.6.14.

What does 2.6.14 have to do with it? The ppp problem appeared exactly with *2.6.16*. It looks like it will also be in 2.6.17 because nobody is stepping on the brake :/. All this with code that had worked perfectly fine for ages. I'm getting a bit frustrated here.

Well, I might try disabling the onboard e1000 and replacing it with a "good" old Realtek.

Regards, Nuri
Re: Fw: Bug: PPP dropouts in >=2.6.16
> Going back to e100 helped

Sorry, I meant: Going back to eepro100 helped
Re: Fw: Bug: PPP dropouts in >=2.6.16
Maybe this is an issue of the e100 driver? I observed a 1-2 sec stalling behaviour for the complete system every 10 seconds or so _seemingly_ only when my ADSL connection was up. That was after I had changed the ethernet driver for a card _not_ connected to the modem from eepro100 to e100. After a lot of fiddling around with git-bisect trying to find the erroneous patch I realized I had changed the driver when upgrading the kernel to 2.6.14. The problem also existed in later kernel versions. Going back to e100 helped.

Cheers
Arnold

If you have any questions, please Cc to theosch at gmx.net as I am not subscribed to the list.

+ + + + +

Using Kernel pppoe
PCI-Rev: Don't know (well, it says "PCI: PCI BIOS revision 2.10...")

# CONFIG_SMP is not set
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set

First ethernet (connected to ADSL modem):
PCI: Found IRQ 11 for device :00:0c.0
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
:00:0c.0: 3Com PCI 3c905B Cyclone 100baseTx at e080. Vers LK1.1.19

Second ethernet (local LAN):
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <[EMAIL PROTECTED]> and others
PCI: Found IRQ 10 for device :00:0b.0
eth1: :00:0b.0, 00:D0:B7:83:58:26, IRQ 10.
Board assembly 721383-009, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
Re: Fw: Bug: PPP dropouts in >=2.6.16 - updates
Some more info:

- turning off Hyper Threading and using a uniprocessor kernel did not improve things
- neither did using 2.6.17-rc3; in fact the bug manifested after only 4 minutes, with a 43-second gap
- those kernel debug watchdog routines don't detect anything

Going to try kernel PPPoE next time. Btw, at least with rp-pppoe it requires HDLC and that dependency isn't caught in menuconfig.

I would try to roll back some patches between 2.6.15.7 and 2.6.16 but that changelog is pretty large. I'm sure there are good reasons for the current development model, but with the old unstable/stable system and its few changes between stable versions, the right one could've been spotted easily :/.

Regards, Nuri
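Since the changelog between the two releases is large, bisection (git-bisect, as suggested elsewhere in this thread) is the usual alternative to reverting patches by hand: each step halves the set of candidate commits, so the number of steps grows only with the base-2 logarithm of the changelog size. The following Python sketch just does that arithmetic. The commit count and per-step times in the usage note are illustrative assumptions, not figures from the thread's changelog, and the function names are hypothetical.

```python
import math


def bisect_steps(num_commits):
    """Approximate number of build-and-test rounds needed to isolate one
    bad commit among num_commits candidates by repeated halving."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    return math.ceil(math.log2(num_commits))


def bisect_hours(num_commits, minutes_per_step):
    """Total wall-clock hours, where minutes_per_step covers both the
    kernel build and the time spent waiting for the bug to show up."""
    return bisect_steps(num_commits) * minutes_per_step / 60
```

For example, with an assumed 5000 candidate commits, `bisect_steps(5000)` gives 13 rounds. At 7 minutes of build time per round that is about 1.5 hours of compiling; the dominant cost in this case would be the test time per round, because the dropouts can take hours to appear.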
Re: Fw: Bug: PPP dropouts in >=2.6.16
On Wed, 26 Apr 2006, Sven Schuster wrote:

> but don't hold your breath waiting for me, kernel compile takes more than two hours on my box :-)

Ouch. Takes 5 and 7 minutes here on the AMD64 and the P4, respectively. Computer museum? :P

Anyhow, I tested PPP for 2.5 hours on the AMD64 the day before yesterday with a bidirectional transfer that maxed out the upstream. Last night, I additionally put some load on the CPU. Another 3 hours, no problems whatsoever. Looks like the bug does not manifest on that system. The next step will be to clone .config's settings as far as possible with the different hardware and try again.

Regards, Nuri
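When cloning one machine's .config onto another, it helps to see exactly which options still differ. The sketch below is one possible way to do that in Python, not a tool used in the thread. It relies only on the two line forms a kernel .config uses, `CONFIG_FOO=value` and `# CONFIG_FOO is not set`; the function names are hypothetical, and treating a missing option as unset is a simplifying assumption.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for every option that differs.

    Assumption: an option absent from one file is treated as 'n' there.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        value_a, value_b = a.get(key, "n"), b.get(key, "n")
        if value_a != value_b:
            differences[key] = (value_a, value_b)
    return differences
```

Reading both files with `open(path).read()` and passing the parsed results to `diff_configs` yields the remaining differences, which can then be split into those forced by the hardware and those worth aligning before the next test run.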
Re: Fw: Bug: PPP dropouts in >=2.6.16
Hi Andrew,

On Wed, Apr 26, 2006 at 03:07:33PM -0700, Andrew Morton told us:
> So there's something in -mm which fixes your kernel? It's usually the
> other way around ;)

actually this was the first time that I tried a "normal" kernel. I haven't chosen to run -mm because it fixed something for me originally, I just run -mm as a matter of taste ;)

> And it sounds like something which has been in -mm for a long time, so it
> might not be a patch which I was planning on sending upstream.
>
> Can you think of a way in which we can identify which patch does the good
> deed?

My first thought was it had something to do with pata_via, as mkinitrd complained it cannot find that module in 2.6.16.9 when I installed it. Taking a closer look, it doesn't even seem like pata_via is really used, its use count in lsmod output is 0. But, in the last few releases of -mm I had problems every now and then where my box didn't want to boot, complaining about lost interrupts on hdb (hdb here, not hda), or it just froze after some days of uptime (I was able to do sysrq though). Later on I ran SMART self tests on both my hard drives which didn't reveal any errors. Google told me some other guys with VIA based boards had similar problems which went away when using a board with another vendor's chipset. Being a lazy bastard and having no real time I stopped digging into this...

How to debug? I might try unapplying VIA and/or IDE related patches from -mm until I get the same problem as with the stable series.
If one would tell me which patches I should try :-)

Here's the dmesg output concerning my IDE controller:

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot :00:07.1
PCI: Via IRQ fixup for :00:07.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
hda: Maxtor 6Y120L0, ATA DISK drive
hdb: Maxtor 6Y120L0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: AOPEN CD-RW CRW4852 1.00 20030123, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 hda9 hda10 hda11 >
hdb: max request size: 128KiB
hdb: 240121728 sectors (122942 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
hdb: cache flushes supported
hdb: hdb1 hdb2 hdb3 hdb4 < hdb5 hdb6 hdb7 hdb8 >
hdc: ATAPI 40X CD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20

If someone wants me to provide more info, test patches or anything please tell me :-)

Thanks
Sven

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
07:19:57 up 12:02, 2 users, load average: 0.19, 0.10, 0.13
Re: Fw: Bug: PPP dropouts in >=2.6.16
Sven Schuster <[EMAIL PROTECTED]> wrote:
>
> On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> > Did you create a high load on the system in the manner I described?
> > The bug once only appeared after about 6 hours here when line + CPU had
> > been mostly idle. But that was the longest time between failures. Can you
> > test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say
>
> Unfortunately it seems like plain 2.6.16.x doesn't like the ide
> controller on my (VIA) mainboard, I'm getting I/O errors on hda
> when booting this kernel (but hard drive works ok with -mm) :-(
> actually I haven't been running a plain stable kernel for a while,
> I've been running -mm kernels for ages...
>

So there's something in -mm which fixes your kernel? It's usually the other way around ;)

And it sounds like something which has been in -mm for a long time, so it might not be a patch which I was planning on sending upstream.

Can you think of a way in which we can identify which patch does the good deed?
Re: Fw: Bug: PPP dropouts in >=2.6.16
On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> Did you create a high load on the system in the manner I described?
> The bug once only appeared after about 6 hours here when line + CPU had
> been mostly idle. But that was the longest time between failures. Can you
> test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say

Unfortunately it seems like plain 2.6.16.x doesn't like the ide controller on my (VIA) mainboard, I'm getting I/O errors on hda when booting this kernel (but hard drive works ok with -mm) :-( actually I haven't been running a plain stable kernel for a while, I've been running -mm kernels for ages...

Sven

> for sure if CPU load is a factor, load on the connection seems to be.

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
23:16:15 up 3:58, 2 users, load average: 0.83, 0.82, 0.79
Re: Fw: Bug: PPP dropouts in >=2.6.16
Hi,

On Wed, Apr 26, 2006 at 02:36:18AM +0200, Nuri Jawad told us:
> >no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.
>
> Did you create a high load on the system in the manner I described?
> The bug once only appeared after about 6 hours here when line + CPU had
> been mostly idle. But that was the longest time between failures. Can you
> test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say
> for sure if CPU load is a factor, load on the connection seems to be.

well, machine is mostly idle besides downloads now and then or software compilations (kernel mostly) or periodic mail fetching including virus and spam scanning. This is my box at home (on which I'm currently writing this email). I'm currently compiling 2.6.16.9 and will test with this release later on. I will get some periodic ping running to check for connection failures and put some load on the machine. Will come back with the results later, but don't hold your breath waiting for me, kernel compile takes more than two hours on my box :-)

Regards,
Sven

>
> Regards,
> Nuri
>

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
07:56:45 up 3 days, 11:30, 2 users, load average: 2.79, 1.32, 0.68
Re: Fw: Bug: PPP dropouts in >=2.6.16
> no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.

Did you create a high load on the system in the manner I described? The bug once only appeared after about 6 hours here when line + CPU had been mostly idle. But that was the longest time between failures. Can you test with one of the 2.6.16 kernels I tried (latest was .9)? Can't say for sure if CPU load is a factor, load on the connection seems to be.

After using 2.6.15.7 for another 5 days now with some more stress testing, I can confirm that 2.6.15 definitely does not produce any dropouts on this machine. For now I'll try to reproduce the effects on my second box (AMD64/nf4). I'd be happy if someone could give me some hints on which patches I could try to revert, as the changes to ppp between the two versions look fairly harmless.

For the first time in 8.5 years, I cannot use a 'stable' kernel release and there is really nothing special about this system.

Regards, Nuri
Re: Fw: Bug: PPP dropouts in >=2.6.16
On Sat, Apr 22, 2006 at 02:02:59AM +0200, Andi Kleen told us:
> On Friday 21 April 2006 19:15, Jesse Brandeburg wrote:
> > On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> > >
> > > We do seem to have had a few reports of ppp regressions around this
> > > timeframe.
> >
> > me too. I couldn't use 2.6.16 at home on my pppoe connected router
> > because it was so slow. I didn't have time to debug. I can probably
> > try patches and provide more data too. Tell me what is needed.
>
> I seem to have some trouble on my PPPoE too. But it's not really unusable,
> just dropouts now and then.

no problems here with pppoe, kernel is 2.6.17-rc1-mm1, ppp 2.4.4-b1.

Sven

> -Andi

--
Linux zion.homelinux.com 2.6.17-rc1-mm1_31 #31 Sat Apr 8 16:18:23 CEST 2006 i686 athlon i386 GNU/Linux
09:40:22 up 1 day, 13:14, 4 users, load average: 0.34, 0.16, 0.11
Re: Fw: Bug: PPP dropouts in >=2.6.16
On Friday 21 April 2006 19:15, Jesse Brandeburg wrote:
> On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > We do seem to have had a few reports of ppp regressions around this
> > timeframe.
>
> me too. I couldn't use 2.6.16 at home on my pppoe connected router
> because it was so slow. I didn't have time to debug. I can probably
> try patches and provide more data too. Tell me what is needed.

I seem to have some trouble on my PPPoE too. But it's not really unusable, just dropouts now and then.

-Andi
Re: Fw: Bug: PPP dropouts in >=2.6.16
"Jesse Brandeburg" <[EMAIL PROTECTED]> wrote:
>
> On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > We do seem to have had a few reports of ppp regressions around this
> > timeframe.
>
> me too. I couldn't use 2.6.16 at home on my pppoe connected router
> because it was so slow. I didn't have time to debug. I can probably
> try patches and provide more data too. Tell me what is needed.

probably git-bisect, sorry. It's the sort of thing you can do while reading a good book ;)

> Is there a bugzilla on this?

I don't think so. Bugzilla records which I'm following which mention ppp are:

http://bugzilla.kernel.org/show_bug.cgi?id=5695
http://bugzilla.kernel.org/show_bug.cgi?id=6197
http://bugzilla.kernel.org/show_bug.cgi?id=6402

Perhaps the final one is related.
Re: Fw: Bug: PPP dropouts in >=2.6.16
On 4/21/06, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> We do seem to have had a few reports of ppp regressions around this
> timeframe.

me too. I couldn't use 2.6.16 at home on my pppoe connected router because it was so slow. I didn't have time to debug. I can probably try patches and provide more data too. Tell me what is needed.

Is there a bugzilla on this?

Jesse