Re: pppoe repeated disconnects

2013-03-03 Thread Pim van Pelt
Hoi once more,

Closure on this.

2013/2/24 Stuart Henderson s...@spacehopper.org:
 On 2013-02-24, Pim van Pelt p...@ipng.nl wrote:
 .. , or the DSL has untrained

 It does seem like the line could well be dropping out then.
.. and that was it. I have since replaced the DSL modem with a
Fritz!Box 7390 and the line has been up for 6 days without pretty much
a single packet dropped. As a bonus, it now trains at 50/10 where it
used to be 30/6 or something significantly weaker. That is sufficient
for me to prove that (a) the OpenBSD pppoe(4) and pppoe(8) setups were
fine, and (b) DSL modem vendors make a difference :)

Thanks Stuart, for helping me think this through, and others on this
list for lurking.

groet,
Pim
-- 
Pim van Pelt p...@ipng.nl
PBVP1-RIPE - http://www.ipng.nl/



Re: pppoe repeated disconnects

2013-02-24 Thread Pim van Pelt
Hoi,

Thanks for helping me think about this, Stu.

2013/2/22 Stuart Henderson s...@spacehopper.org:
 At 00:27:30.706636 I see ingress to .86, which is replied at
 00:27:30.708050 egress. Then, ingress halts. I send three echos at
 00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
 at 00:28:22.563420 and send a Terminate-Request. I then send three
 Initiation requests, which are answered finally at 00:29:27.771647 by
 the ISP. The connection re-establishes and the cycle repeats.

 Your log only shows the last incoming packet and a few unanswered
 outgoing packets, so it's not enough to tell, does the ISP reply to
 any of your earlier LCP echo requests or does it ignore all of them?
You are right, I only pasted the tail end of the data, but I do have
the whole file still:
http://www.ipng.nl/pppoe-tcpdump/pppoe4.tcpdump (kernel)
http://www.ipng.nl/pppoe-tcpdump/pppoe8.tcpdump (userland)

In the call I was referring to above (pppoe4.tcpdump, the kernel
version), a whole session lasts from 00:25:32.588726 to
00:28:22.583343.
I took another look and it seems that my side does not send LCP echos,
but it does reply to them. Typically, the ISP (mac ending in 8d:20)
sends LCP echos every 10s, and my side (mac ending in 79:79) responds
to them, for example:
00:26:33.656302 00:90:1a:a4:8d:20 00:60:e0:53:79:79 8864 60:
PPPoE-Session code Session, version 1, type 1, id 0x124a, length 14
LCP: Echo-Request, Magic-Number=373307203
00:26:33.656321 00:60:e0:53:79:79 00:90:1a:a4:8d:20 8864 34:
PPPoE-Session code Session, version 1, type 1, id 0x124a, length 14
LCP: Echo-Reply, Magic-Number=1979569634

You mentioned that I would/should be sending echos as well, but my
side definitely does not do that every N seconds. It does, however,
when the connection did not see packets for a while, for example at
00:24:42.570312, 00:24:52.569996, 00:25:02.569679 (none responded by
ISP), and it then sends terminate-request at 00:25:12.569442. I
counted the Echo-Requests, I saw 20 from them, and replied 20 times. I
sent 6 to them, they replied 0 times. In the userland pppoe(8) I also
saw 20 of them, replied 20 times. I sent 0 to them, they replied 0
times. But I think it's relevant to note that I only sent LCP
Echo-Requests just before the line was disconnected (The one I
mentioned before is 00:24:42.570312 but there's another example at
00:27:52.564297) - they could very well not be replied because the ISP
stopped forwarding traffic to me, or the DSL has untrained (i mean,
while the LCPs are are being sent, I am also not seeing any IP traffic
coming in, whlie I have a ping running in the background, I should be
seeing 2 packets per second, for 40secs I see nothing and then see the
line disconnect).

 Normally if there is a period of 30 seconds with either no LCP
 keepalives or no user data, pppoe(4) will drop the session and log
 LCP keepalive timeout, which it looks like should show up at the
 bottom of dmesg output.
Confirmed. I can see those 3 LCP packets with 10s interval, and then
my side sending Terminate-Request and immediately thereafter
Terminate.

 So, if they do normally respond to LCP, it looks like the link
 is dropping out somehow..
I don't think we can conclude if they normally respond to LCP or not,
because at it seems I am not normally sending LCP Echos .. and I don't
see how to set this in ifconfig(8) or sppp(4) or pppoe(4). Do you have
any idea why my side is not sending LCP echos, or how I can configure
it to?

 Or, if they don't normally respond to LCP, you may either need some
 other way to generate data to keep the session alive, disable the
 timeout (by hacking the driver), or ideally see if they can start
 responding (easier if it's an ISP compliant with xkcd 806 ;)
There's ingress and egress data ~always. In my examples, it's a
running ping from internet to me. The ping just times out after a
while, for up to a minute or so while my side is sending (sometimes
multiple!) Initiation frames, for example 00:28:22.583356,
00:28:27.583185, 00:29:27.581285. Then the ISP responds with an Offer
and the cycle restarts.

The only other thing I can think of is that actually my DSL modem is
dropping the connection. I should easily be able to find a pocket of
downtime and inspect the DSL line status because the LCP keepalive
timeouts occur more than 100 times per day (there can be 8 hours
between them, or as little as 3 minutes) -
http://pastebin.com/1VikG019

groet,
Pim

-- 
Pim van Pelt p...@ipng.nl
PBVP1-RIPE - http://www.ipng.nl/



Re: pppoe repeated disconnects

2013-02-24 Thread Stuart Henderson
On 2013-02-24, Pim van Pelt p...@ipng.nl wrote:
 Your log only shows the last incoming packet and a few unanswered
 outgoing packets, so it's not enough to tell, does the ISP reply to
 any of your earlier LCP echo requests or does it ignore all of them?
 You are right, I only pasted the tail end of the data, but I do have
 the whole file still:
 http://www.ipng.nl/pppoe-tcpdump/pppoe4.tcpdump (kernel)
 http://www.ipng.nl/pppoe-tcpdump/pppoe8.tcpdump (userland)

 In the call I was referring to above (pppoe4.tcpdump, the kernel
 version), a whole session lasts from 00:25:32.588726 to
 00:28:22.583343.
 I took another look and it seems that my side does not send LCP echos,
 but it does reply to them.

Ah... this might indeed be normal then, the line I was looking at has
echoes every second sent by my ISP (http://aa.net.uk/kb-broadband-cqm.html)
so I wouldn't have noticed whether my side was sending them (I was looking
at tcpdump on the pppoe interface where it's not clear who sends them).
I suspect I was wrong then and we only send them if the line is quiet.

 You mentioned that I would/should be sending echos as well, but my
 side definitely does not do that every N seconds. It does, however,
 when the connection did not see packets for a while, for example at
 00:24:42.570312, 00:24:52.569996, 00:25:02.569679 (none responded by
 ISP), and it then sends terminate-request at 00:25:12.569442. I
 counted the Echo-Requests, I saw 20 from them, and replied 20 times. I
 sent 6 to them, they replied 0 times. In the userland pppoe(8) I also
 saw 20 of them, replied 20 times. I sent 0 to them, they replied 0
 times. But I think it's relevant to note that I only sent LCP
 Echo-Requests just before the line was disconnected (The one I
 mentioned before is 00:24:42.570312 but there's another example at
 00:27:52.564297) - they could very well not be replied because the ISP
 stopped forwarding traffic to me, or the DSL has untrained (i mean,
 while the LCPs are are being sent, I am also not seeing any IP traffic
 coming in, whlie I have a ping running in the background, I should be
 seeing 2 packets per second, for 40secs I see nothing and then see the
 line disconnect).

It does seem like the line could well be dropping out then.
Are you able to setup syslog on the DSL modem and get the logs sent to
the box running pppoe? Having the two logs together would make it
easier to correlate any problems logged by the modem with those logged
by pppoe. (it will need 'syslog_flags=-u' in rc.conf.local to accept
from the network, and /etc/rc.d/syslogd restart). Maybe you could also
leave a ping running to the modem to spot if there's a problem with
the ethernet link or something?



pppoe repeated disconnects

2013-02-21 Thread Pim van Pelt
Hoi,

I have recently switched from a Zyxel modem terminated pppoe
connection, to an OpenBSD based termination of the pppoe connection. I
think I have a reasonable configuration, and both kernel as well as
userland pppoe configs connect fine, but after a few minutes the
ingress traffic halts and the connection drops. I am running 5.2, with
the following /etc/ppp/ppp.conf ([1] for userland) and
/etc/hostname.pppoe0 ([2] for kernel). Both exhibit the same issue.
When the connection establishes, I have the following device:
$ ifconfig pppoe0
pppoe0: flags=8851UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST mtu 1492
priority: 0
dev: vlan5 state: session
sid: 0x1268 PADI retries: 8 PADR retries: 0 time: 00:04:02
sppp: phase network authproto chap authname u280...@dsl.green.ch
groups: pppoe
status: active
inet6 fe80::260:e0ff:fe53:7978%pppoe0 -  prefixlen 64 scopeid 0x1e
inet  81.6.62.85 -- 80.254.161.242 netmask 0x

The ISP routes me 81.6.62.84/30 and when the connection is up, this
works (I can ping all 4 IP addresses). There's a netscreen at
81.6.62.86 behind this machine and it is responding to ping's from the
internet just fine. A few minutes later, traffic stops flowing and
about one minute later, pppoe0 disconnects and then reconnects a few
minutes later, to rinse and repeat. So I tcpdump'ed the kernel pppoe,
and noticed the following snippet (see [3] for details). I am MAC
00:60:e0:53:79:79 , the ISP is MAC 00:90:1a:a4:8d:20.

At 00:27:30.706636 I see ingress to .86, which is replied at
00:27:30.708050 egress. Then, ingress halts. I send three echos at
00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
at 00:28:22.563420 and send a Terminate-Request. I then send three
Initiation requests, which are answered finally at 00:29:27.771647 by
the ISP. The connection re-establishes and the cycle repeats.

When I try the same thing using userland ppp (to benefit from
debugging), things look pretty good to me, but also after a few
minutes the physical link goes silent and I disconnect. The logs[4]
show that the connection establishes (and on the ppp command line,
'show ipcp' and 'show lcp' look fine to me), but the problem persists.

I've read up on ppp/pppd/pppoe for both userspace and kernel, but I'm
stumped and at this point I believe the problem is with the ISP (I say
this because traffic stops flowing while I'm sending LCP echos, and I
always reply the ISPs echos, but after a few failed echos pppoe on my
machine gives up). Can somebody please confirm this before I open a
problem ticket with my ISP, or help me find a way to gather more
information to diagnose this issue?

Kind regards,
Pim/ Zurich



[1] cat /etc/ppp/ppp.conf
default:
  set log Phase Chat LCP IPCP CCP tun command
  set speed sync
  disable acfcomp protocomp shortseq vjcomp
  deny acfcomp

green:
  set device !/usr/sbin/pppoe -i vlan5
  set mtu max 1492
  set mru max 1492
  set authname x...@dsl.green.ch
  set authkey Y
  set ifaddr 0.0.0.0 0.0.0.0
  disable echo lqr
  deny lqr

[2] cat /etc/hostname.pppoe0
inet 0.0.0.0 255.255.255.255 NONE pppoedev vlan5 authproto chap
authname 'x...@dsl.green.ch' authkey 'Y' up
dest 0.0.0.1

[3] http://pastebin.com/ph6xrNaW

[4] http://pastebin.com/grifHJ41
-- 
Pim van Pelt p...@ipng.nl
PBVP1-RIPE - http://www.ipng.nl/



Re: pppoe repeated disconnects

2013-02-21 Thread Stuart Henderson
On 2013-02-21, Pim van Pelt p...@ipng.nl wrote:
 Hoi,

 I have recently switched from a Zyxel modem terminated pppoe
 connection, to an OpenBSD based termination of the pppoe connection. I
 think I have a reasonable configuration, and both kernel as well as
 userland pppoe configs connect fine, but after a few minutes the
 ingress traffic halts and the connection drops. I am running 5.2, with
 the following /etc/ppp/ppp.conf ([1] for userland) and
 /etc/hostname.pppoe0 ([2] for kernel). Both exhibit the same issue.
 When the connection establishes, I have the following device:
 $ ifconfig pppoe0
 pppoe0: flags=8851UP,POINTOPOINT,RUNNING,SIMPLEX,MULTICAST mtu 1492
 priority: 0
 dev: vlan5 state: session
 sid: 0x1268 PADI retries: 8 PADR retries: 0 time: 00:04:02
 sppp: phase network authproto chap authname u280...@dsl.green.ch
 groups: pppoe
 status: active
 inet6 fe80::260:e0ff:fe53:7978%pppoe0 -  prefixlen 64 scopeid 0x1e
 inet  81.6.62.85 -- 80.254.161.242 netmask 0x

 The ISP routes me 81.6.62.84/30 and when the connection is up, this
 works (I can ping all 4 IP addresses). There's a netscreen at
 81.6.62.86 behind this machine and it is responding to ping's from the
 internet just fine. A few minutes later, traffic stops flowing and
 about one minute later, pppoe0 disconnects and then reconnects a few
 minutes later, to rinse and repeat. So I tcpdump'ed the kernel pppoe,
 and noticed the following snippet (see [3] for details). I am MAC
 00:60:e0:53:79:79 , the ISP is MAC 00:90:1a:a4:8d:20.

 At 00:27:30.706636 I see ingress to .86, which is replied at
 00:27:30.708050 egress. Then, ingress halts. I send three echos at
 00:27:52.564297, 00:28:02.563980 and 00:28:12.563665, and then give up
 at 00:28:22.563420 and send a Terminate-Request. I then send three
 Initiation requests, which are answered finally at 00:29:27.771647 by
 the ISP. The connection re-establishes and the cycle repeats.

Your log only shows the last incoming packet and a few unanswered
outgoing packets, so it's not enough to tell, does the ISP reply to
any of your earlier LCP echo requests or does it ignore all of them?

Normally if there is a period of 30 seconds with either no LCP
keepalives or no user data, pppoe(4) will drop the session and log
LCP keepalive timeout, which it looks like should show up at the
bottom of dmesg output.

So, if they do normally respond to LCP, it looks like the link
is dropping out somehow..

Or, if they don't normally respond to LCP, you may either need some
other way to generate data to keep the session alive, disable the
timeout (by hacking the driver), or ideally see if they can start
responding (easier if it's an ISP compliant with xkcd 806 ;)

 I've read up on ppp/pppd/pppoe for both userspace and kernel, but I'm
 stumped and at this point I believe the problem is with the ISP (I say
 this because traffic stops flowing while I'm sending LCP echos, and I
 always reply the ISPs echos, but after a few failed echos pppoe on my
 machine gives up). Can somebody please confirm this before I open a
 problem ticket with my ISP, or help me find a way to gather more
 information to diagnose this issue?

 [1] cat /etc/ppp/ppp.conf
 default:
   set log Phase Chat LCP IPCP CCP tun command
   set speed sync
   disable acfcomp protocomp shortseq vjcomp
   deny acfcomp

 green:
   set device !/usr/sbin/pppoe -i vlan5
   set mtu max 1492
   set mru max 1492
   set authname x...@dsl.green.ch
   set authkey Y
   set ifaddr 0.0.0.0 0.0.0.0
   disable echo lqr
   deny lqr

I haven't really used iij ppp since around 1998 so not too sure about
syntax here (though it seems ok).. but

 [2] cat /etc/hostname.pppoe0
 inet 0.0.0.0 255.255.255.255 NONE pppoedev vlan5 authproto chap
 authname 'x...@dsl.green.ch' authkey 'Y' up
 dest 0.0.0.1

this definitely looks right to me, and I have a bunch of pppoe(4)
running over vlans, so there's no problem in that respect.

 [3] http://pastebin.com/ph6xrNaW

 [4] http://pastebin.com/grifHJ41