Re: [lisp] [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

2014-01-14 Thread l.wood
That's robustness _for the tunnelled traffic_.

Not for anything else sharing the network - that hasn't been instrumented and 
measured.

Lloyd Wood
http://about.me/lloydwood

From: Curtis Villamizar [cur...@ipv6.occnc.com]
Sent: 15 January 2014 03:43
To: Wood L  Dr (Electronic Eng)
Cc: stbry...@cisco.com; w...@mti-systems.com; cur...@ipv6.occnc.com; 
go...@erg.abdn.ac.uk; m...@ietf.org; i...@ietf.org; ra...@psg.com; 
ts...@ietf.org; j...@mit.edu; lisp@ietf.org
Subject: Re: [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: 
gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

Lloyd,

The most relevant part of RFC 6936 section 3.1 might be:

   There is extensive experience with deployments using tunnel
   protocols in well-managed networks (e.g., corporate networks and
   service provider core networks).  This has shown the robustness of
   methods such as Pseudowire Emulation Edge-to-Edge (PWE3) and MPLS
   that do not employ a transport protocol checksum and that have not
   specified mechanisms to protect from corruption of the unprotected
   headers (such as the VPN Identifier in MPLS).  Reasons for the
   robustness may include:

If the rate of undetected modified packets is extremely low in
"well-managed networks", as we beleive is the case, then UDP checksums
won't change the situration much.

So why *not* make them optional, if experience has shown they are not
needed in the types of deployments we are talking about?

Curtis


In message <290e20b455c66743be178c5c84f1240847e6334...@exmb01cms.surrey.ac.uk>
l.w...@surrey.ac.uk writes:
>
> Stewart,
>
> your 'I'm not in tunnel applications' suggests you've misunderstood
> the argument here. The point is not to protect the tunnel traffic
> (which can quite happily checksum itself), it is to protect everything
> else on the network from misdelivery. It's not the tunnel application,
> it's every application sharing the internet with the tunnel which
> has UDP checksums turned off. See all of RFC 6936 section 3.1.
> The tunnel is fine; side effects of misdelivery do not affect the tunnel.
>
> And in IPv4 and IPv6, the pseudo-header checksum built into
> TCP and UDP is all we have. IPv6 deliberately copied v4 here.
>
> > What is the error rate in modern h/w and network systems?
>
> No-one measures end-to-end misdelivery. No-one knows.
>
> Lloyd Wood
> http://about.me/lloydwood
> 
> From: Stewart Bryant [stbry...@cisco.com]
> Sent: 14 January 2014 22:46
> To: Wesley Eddy; Wood L  Dr (Electronic Eng); cur...@ipv6.occnc.com
> Cc: go...@erg.abdn.ac.uk; m...@ietf.org; i...@ietf.org; ra...@psg.com; 
> ts...@ietf.org; j...@mit.edu; lisp@ietf.org
> Subject: Re: [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: 
> gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))
>
> On 14/01/2014 22:07, Wesley Eddy wrote:
> > On 1/14/2014 4:57 PM, l.w...@surrey.ac.uk wrote:
> >> I don't think saying 'oh, that error source is no longer a problem' disproves
> >> Stone's overall point about undetected errors, though the
> >> examples he uses from the technology of the day are necessarily
> >> dated. Dismissing the overall point because the examples use obsolete
> >> technology is throwing the baby out with the bathwater; a host-to-host
> >> error check catches things that intermediate checks cannot.
> >>
> >> Measuring error rates across end-to-end Internet traffic is something that has
> >> not received much attention, as error detection is not
> >> instrumented well - hence citing Stone's published work, in the absence
> >> of awareness of anything newer (and as high-profile/immediately recognisable
> >> as Sigcomm) in the area.
> >>
> >
> > +1 ... the message in the paper is applicable to layered systems
> > and internetworks in general.  Changes in the link technology
> > since then don't invalidate it, especially since we know that
> > the technology not only changes rapidly, but also is always
> > growing in diverse directions, such that things almost
> > universally true today may be turned on their heads tomorrow.
> >
> > Designs for stacking layers need to follow solid general
> > principles in order to be robust to changes (above and below).
> >
> Note that it is not only the link layer technology that has moved on,
> the signal integrity of the h/w at all stages of the design and
> implementation process has moved on.
>
> Can we agree that the statistics in the paper are discredited?
>
> If not, why not?
>
> What is the error rate in modern h/w and network systems?
>
> Is this significant in the application under consideration?
>
> Finally if we are really concerned that we do actually need a
> c/s (I am not in tunnel applications) why are we still happy to
> use what is frankly a pathetic check in modern terms? Why
> for example are we not moving to something like
> the Fletcher 64-bit c/s?
>
> Stewart
___

Re: [lisp] [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

2014-01-14 Thread l.wood
Stewart,

your 'I'm not in tunnel applications' suggests you've misunderstood
the argument here. The point is not to protect the tunnel traffic
(which can quite happily checksum itself), it is to protect everything
else on the network from misdelivery. It's not the tunnel application,
it's every application sharing the internet with the tunnel which
has UDP checksums turned off. See all of RFC 6936 section 3.1.
The tunnel is fine; side effects of misdelivery do not affect the tunnel.

And in IPv4 and IPv6, the pseudo-header checksum built into
TCP and UDP is all we have. IPv6 deliberately copied v4 here.
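The pseudo-header coverage Lloyd refers to can be sketched roughly as follows, assuming RFC 768/1071 semantics (the helper names and addresses are illustrative, not from any message in this thread):

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:                     # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_checksum_v4(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header plus the UDP segment,
    so the IP addresses are covered even though they live in the IP header."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(segment))
    return internet_checksum(pseudo + segment)

# Build a segment with the checksum field zeroed, then fill it in.
src, dst = bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2])
seg = struct.pack("!HHHH", 4341, 4341, 12, 0) + b"data"
csum = udp_checksum_v4(src, dst, seg)
filled = seg[:6] + struct.pack("!H", csum) + seg[8:]

# Verification sums to zero; a corrupted destination address does not.
assert udp_checksum_v4(src, dst, filled) == 0
assert udp_checksum_v4(src, bytes([198, 51, 100, 3]), filled) != 0
```

The point of the sketch is the last assertion: because the pseudo-header folds the IP addresses into the sum, a misdelivered datagram fails the check at the wrong receiver.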

> What is the error rate in modern h/w and network systems?

No-one measures end-to-end misdelivery. No-one knows.

Lloyd Wood
http://about.me/lloydwood

From: Stewart Bryant [stbry...@cisco.com]
Sent: 14 January 2014 22:46
To: Wesley Eddy; Wood L  Dr (Electronic Eng); cur...@ipv6.occnc.com
Cc: go...@erg.abdn.ac.uk; m...@ietf.org; i...@ietf.org; ra...@psg.com; 
ts...@ietf.org; j...@mit.edu; lisp@ietf.org
Subject: Re: [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: 
gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

On 14/01/2014 22:07, Wesley Eddy wrote:
> On 1/14/2014 4:57 PM, l.w...@surrey.ac.uk wrote:
>> I don't think saying 'oh, that error source is no longer a problem' disproves
>> Stone's overall point about undetected errors, though the
>> examples he uses from the technology of the day are necessarily
>> dated. Dismissing the overall point because the examples use obsolete
>> technology is throwing the baby out with the bathwater; a host-to-host
>> error check catches things that intermediate checks cannot.
>>
>> Measuring error rates across end-to-end Internet traffic is something that has
>> not received much attention, as error detection is not
>> instrumented well - hence citing Stone's published work, in the absence
>> of awareness of anything newer (and as high-profile/immediately recognisable
>> as Sigcomm) in the area.
>>
>
> +1 ... the message in the paper is applicable to layered systems
> and internetworks in general.  Changes in the link technology
> since then don't invalidate it, especially since we know that
> the technology not only changes rapidly, but also is always
> growing in diverse directions, such that things almost
> universally true today may be turned on their heads tomorrow.
>
> Designs for stacking layers need to follow solid general
> principles in order to be robust to changes (above and below).
>
Note that it is not only the link layer technology that has moved on,
the signal integrity of the h/w at all stages of the design and
implementation process has moved on.

Can we agree that the statistics in the paper are discredited?

If not, why not?

What is the error rate in modern h/w and network systems?

Is this significant in the application under consideration?

Finally if we are really concerned that we do actually need a
c/s (I am not in tunnel applications) why are we still happy to
use what is frankly a pathetic check in modern terms? Why
for example are we not moving to something like
the Fletcher 64-bit c/s?

Stewart
___
lisp mailing list
lisp@ietf.org
https://www.ietf.org/mailman/listinfo/lisp


Re: [lisp] [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

2014-01-14 Thread Stewart Bryant

On 14/01/2014 22:07, Wesley Eddy wrote:

On 1/14/2014 4:57 PM, l.w...@surrey.ac.uk wrote:

I don't think saying 'oh, that error source is no longer a problem' disproves
Stone's overall point about undetected errors, though the
examples he uses from the technology of the day are necessarily
dated. Dismissing the overall point because the examples use obsolete
technology is throwing the baby out with the bathwater; a host-to-host
error check catches things that intermediate checks cannot.

Measuring error rates across end-to-end Internet traffic is something that has
not received much attention, as error detection is not
instrumented well - hence citing Stone's published work, in the absence
of awareness of anything newer (and as high-profile/immediately recognisable
as Sigcomm) in the area.



+1 ... the message in the paper is applicable to layered systems
and internetworks in general.  Changes in the link technology
since then don't invalidate it, especially since we know that
the technology not only changes rapidly, but also is always
growing in diverse directions, such that things almost
universally true today may be turned on their heads tomorrow.

Designs for stacking layers need to follow solid general
principles in order to be robust to changes (above and below).


Note that it is not only the link layer technology that has moved on,
the signal integrity of the h/w at all stages of the design and
implementation process has moved on.

Can we agree that the statistics in the paper are discredited?

If not, why not?

What is the error rate in modern h/w and network systems?

Is this significant in the application under consideration?

Finally if we are really concerned that we do actually need a
c/s (I am not in tunnel applications) why are we still happy to
use what is frankly a pathetic check in modern terms? Why
for example are we not moving to something like
the Fletcher 64-bit c/s?
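One plausible reading of a "Fletcher 64-bit c/s" is two running 32-bit sums over 32-bit words. This is a sketch only; the modulus and word size are assumptions, not a proposal from the thread. The key property Stewart is after is that, unlike the one's-complement sum, the second accumulator makes the result position-sensitive:

```python
import struct

def fletcher64(data: bytes) -> int:
    """Fletcher-style checksum: two 32-bit running sums over 32-bit
    words, modulo 2**32 - 1, concatenated into one 64-bit value."""
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)   # zero-pad to whole words
    s1 = s2 = 0
    for (word,) in struct.iter_unpack("!I", data):
        s1 = (s1 + word) % 0xFFFFFFFF
        s2 = (s2 + s1) % 0xFFFFFFFF             # accumulates positions
    return (s2 << 32) | s1

# Because s2 folds in s1 at every step, the result depends on word
# order - a transposition the Internet checksum cannot see.
assert fletcher64(b"abcdefgh") != fletcher64(b"efghabcd")
```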

Stewart


Re: [lisp] [tsvwg] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: Milestones changed for tsvwg WG))

2014-01-14 Thread Wesley Eddy
On 1/14/2014 4:57 PM, l.w...@surrey.ac.uk wrote:
> I don't think saying 'oh, that error source is no longer a problem' disproves
> Stone's overall point about undetected errors, though the
> examples he uses from the technology of the day are necessarily
> dated. Dismissing the overall point because the examples use obsolete
> technology is throwing the baby out with the bathwater; a host-to-host
> error check catches things that intermediate checks cannot.
> 
> Measuring error rates across end-to-end Internet traffic is something that has
> not received much attention, as error detection is not
> instrumented well - hence citing Stone's published work, in the absence
> of awareness of anything newer (and as high-profile/immediately recognisable
> as Sigcomm) in the area.
> 


+1 ... the message in the paper is applicable to layered systems
and internetworks in general.  Changes in the link technology
since then don't invalidate it, especially since we know that
the technology not only changes rapidly, but also is always
> growing in diverse directions, such that things almost
universally true today may be turned on their heads tomorrow.

Designs for stacking layers need to follow solid general
principles in order to be robust to changes (above and below).

-- 
Wes Eddy
MTI Systems


Re: [lisp] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: [tsvwg] Milestones changed for tsvwg WG))

2014-01-14 Thread l.wood
I don't think saying 'oh, that error source is no longer a problem' disproves
Stone's overall point about undetected errors, though the
examples he uses from the technology of the day are necessarily
dated. Dismissing the overall point because the examples use obsolete
technology is throwing the baby out with the bathwater; a host-to-host
error check catches things that intermediate checks cannot.
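That "host-to-host versus intermediate" distinction can be shown with a toy two-hop path; CRC-32 stands in for both the link FCS and the end-to-end check, and all names are illustrative:

```python
import zlib

def link_send(frame: bytes) -> bytes:
    """Append a per-link CRC-32, as an Ethernet-style FCS would."""
    return frame + zlib.crc32(frame).to_bytes(4, "big")

def link_recv(wire: bytes) -> bytes:
    """Verify and strip the link FCS; only corruption on the wire of
    this one link can be caught here."""
    frame, fcs = wire[:-4], wire[-4:]
    assert zlib.crc32(frame).to_bytes(4, "big") == fcs, "link CRC failed"
    return frame

payload = b"hello, world"
e2e_check = zlib.crc32(payload)            # host-to-host check at the source

frame = link_recv(link_send(payload))      # hop 1: link check passes
frame = b"hellp" + frame[5:]               # corrupted in router memory
frame = link_recv(link_send(frame))        # hop 2: a fresh FCS is computed
                                           # over the bad data, so the link
                                           # check passes again
assert zlib.crc32(frame) != e2e_check      # only the end-to-end check notices
```

Corruption that happens between the links - in router memory, on a bus, in a DMA transfer - is invisible to every per-link check, because each link recomputes its FCS over whatever it was handed.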

Measuring error rates across end-to-end Internet traffic is something that has
not received much attention, as error detection is not
instrumented well - hence citing Stone's published work, in the absence
of awareness of anything newer (and as high-profile/immediately recognisable
as Sigcomm) in the area.

Lloyd Wood
http://about.me/lloydwood

From: Stewart Bryant [stbry...@cisco.com]
Sent: 14 January 2014 18:26
To: cur...@ipv6.occnc.com; Wood L  Dr (Electronic Eng)
Cc: go...@erg.abdn.ac.uk; m...@ietf.org; i...@ietf.org; lisp@ietf.org; 
david.bl...@emc.com; ra...@psg.com; ts...@ietf.org; j...@mit.edu
Subject: Re: [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft 
(was: RE: [tsvwg] Milestones changed for tsvwg WG))

I agree the paper is now obsolete.

Stewart


On 14/01/2014 17:06, Curtis Villamizar wrote:
> Lloyd,
>
> Maybe you should reread the paper too before citing it as evidence.
> Check the date on it.  Check the cited causes of errors.
>
> Packet traces from 1998 and 1999 are perhaps not so relevant today,
> particularly wrt error rates due to hosts, router memories, and link
> error rates.  Look at the cited causes of errors and realize that many
> things have changed.
>
> The largest single error in the paper (perhaps as high as 99.9% of
> the errors) was the ACK-of-FIN bug in Windows NT.  They may have fixed
> that by now.  Other large causes were host hardware and host software
> bugs (sent bad data in the first place).
>
> One fifth of campus errors were caused by two hosts.  This is cited as
> a bug in Mac OS on Powermac 8100, fixed at the time of publication.
>
> A lot of host software errors are discussed, where the checksums are
> sent bad and therefore will pass any router link FCS.
>
> Sending host hardware errors were thought to be in large part caused
> by no data integrity check on host DMA transfers.  I think progress
> has been made on that front.  You can check PCIe.  It has a 32-bit CRC
> per lane in the link layer and retransmits on error.
>
> Router memory errors were thought to play a large role in the non-host
> part of the error rate in the paper.  Routers use ECC now.  Going that
> far back they didn't always have parity RAM (and sometimes ran parity
> disabled to avoid parity error reloads).
>
> VJHC software errors?  Anyone still use VJHC?  Those errors should be
> gone.
>
> IP over HDLC over T1 missed a few errors at the link layer in those
> days.  That could be true, with occurrences dependent on the provider.
> In the paper that was thought to be a very small contributor.
>
> Both HDLC and PPP had an option for 16 bit or 32 bit FCS.  Often 16
> bit was used.  Some HDLC equipment could be and was configured to
> count errors and send the packets on their way on the assumption that
> it was better to count errors and deliver bad packets rather than
> deliver no packet.  Perhaps also to hide packet loss.  Today all of
> the link layers in use have 32 bit FCS and count and toss errored
> packets.  In most equipment all of the memories have ECC and all of
> the buses ECC or FCS if serialized.
>
> It is now 14 years since that paper was published at Sigcomm and 16
> years since some of the observations.  Things have changed.  Nice bit
> of nostalgia, but that paper may no longer be relevant.
>
> Curtis
>
> reference -
> http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf
>
> In message <290e20b455c66743be178c5c84f1240847e6334...@exmb01cms.surrey.ac.uk>
> l.w...@surrey.ac.uk writes:
>> Curtis
>>
>> I suggest reading Stone's work, particularly
>> 'When the CRC and TCP Checksum Disagree'
>> for discussion of corruption.
>>
>> Particularly its conclusions: 'In the internet, that means
>> we are sending large volumes of incorrect data without
>> anyone noticing'.
>>
>> The Layer-2 check is per link, not end-to-end. That matters.
>>
>> The MPLS assumption is that it crosses a link with a frame
>> checksum. Putting MPLS over UDP breaks that assumption.
>>
>> Lloyd Wood
>> http://about.me/lloydwood
>> 
>> From: Curtis Villamizar [cur...@ipv6.occnc.com]
>> Sent: 12 January 2014 18:09
>> To: Wood L  Dr (Electronic Eng)
>> Cc: adr...@olddog.co.uk; ra...@psg.com; go...@erg.abdn.ac.uk; m...@ietf.org; 
>> lisp@ietf.org; i...@ietf.org; david.bl...@emc.com; j...@mit.edu; 
>> ts...@ietf.org
>> Subject: Re: [mpls] draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: 
>> RE: [tsvwg] Milestones changed for tsvwg WG)
>>
>> In message 
>> <290e20b455c66743be178c5c84f1240847e6334...

Re: [lisp] [mpls] OT (was Re: draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: [tsvwg] Milestones changed for tsvwg WG))

2014-01-14 Thread Stewart Bryant


I agree the paper is now obsolete.

Stewart


On 14/01/2014 17:06, Curtis Villamizar wrote:

Lloyd,

Maybe you should reread the paper too before citing it as evidence.
Check the date on it.  Check the cited causes of errors.

Packet traces from 1998 and 1999 are perhaps not so relevant today,
particularly wrt error rates due to hosts, router memories, and link
error rates.  Look at the cited causes of errors and realize that many
things have changed.

The largest single error in the paper (perhaps as high as 99.9% of
the errors) was the ACK-of-FIN bug in Windows NT.  They may have fixed
that by now.  Other large causes were host hardware and host software
bugs (sent bad data in the first place).

One fifth of campus errors were caused by two hosts.  This is cited as
a bug in Mac OS on Powermac 8100, fixed at the time of publication.

A lot of host software errors are discussed, where the checksums are
sent bad and therefore will pass any router link FCS.

Sending host hardware errors were thought to be in large part caused
by no data integrity check on host DMA transfers.  I think progress
has been made on that front.  You can check PCIe.  It has a 32-bit CRC
per lane in the link layer and retransmits on error.

Router memory errors were thought to play a large role in the non-host
part of the error rate in the paper.  Routers use ECC now.  Going that
far back they didn't always have parity RAM (and sometimes ran parity
disabled to avoid parity error reloads).

VJHC software errors?  Anyone still use VJHC?  Those errors should be
gone.

IP over HDLC over T1 missed a few errors at the link layer in those
days.  That could be true, with occurrences dependent on the provider.
In the paper that was thought to be a very small contributor.

Both HDLC and PPP had an option for 16 bit or 32 bit FCS.  Often 16
bit was used.  Some HDLC equipment could be and was configured to
count errors and send the packets on their way on the assumption that
it was better to count errors and deliver bad packets rather than
deliver no packet.  Perhaps also to hide packet loss.  Today all of
the link layers in use have 32 bit FCS and count and toss errored
packets.  In most equipment all of the memories have ECC and all of
the buses ECC or FCS if serialized.

It is now 14 years since that paper was published at Sigcomm and 16
years since some of the observations.  Things have changed.  Nice bit
of nostalgia, but that paper may no longer be relevant.

Curtis

reference -
http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf

In message <290e20b455c66743be178c5c84f1240847e6334...@exmb01cms.surrey.ac.uk>
l.w...@surrey.ac.uk writes:

Curtis
  
I suggest reading Stone's work, particularly

'When the CRC and TCP Checksum Disagree'
for discussion of corruption.
  
Particularly its conclusions: 'In the internet, that means

we are sending large volumes of incorrect data without
anyone noticing'.
  
The Layer-2 check is per link, not end-to-end. That matters.
  
The MPLS assumption is that it crosses a link with a frame

checksum. Putting MPLS over UDP breaks that assumption.
  
Lloyd Wood

http://about.me/lloydwood

From: Curtis Villamizar [cur...@ipv6.occnc.com]
Sent: 12 January 2014 18:09
To: Wood L  Dr (Electronic Eng)
Cc: adr...@olddog.co.uk; ra...@psg.com; go...@erg.abdn.ac.uk; m...@ietf.org; 
lisp@ietf.org; i...@ietf.org; david.bl...@emc.com; j...@mit.edu; ts...@ietf.org
Subject: Re: [mpls] draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: 
[tsvwg] Milestones changed for tsvwg WG)
  
In message <290e20b455c66743be178c5c84f1240847e6334...@exmb01cms.surrey.ac.uk>

l.w...@surrey.ac.uk writes:
  

On nested checksums, the question is how they are nested; it's a matter
of scope. With a set of checksums each covering only a payload and any
inner checksums, nested like Russian Matryoshka dolls, the end-to-end argument
tells us that for reliable receipt of the payload, only the innermost checksum
matters.

But here, we are not solely checking the payload, but information on how to
deliver and identify that payload - and while an outer Ethernet CRC covers only
the last link, the UDP checksum, though weak, provides a check on the IP
addresses and UDP ports (via the pseudo-header check) and the MPLS stack
from UDP/IP source to UDP/IP destination (and the payload - the part
everyone focuses on as redundant and as a processing cost when the payload
has its own check, and the part that UDP-Lite can leave out).

Nothing else checks that scope. The scope is wider, and affects the network
as a whole. Errors in these unchecked fields lead to misdirection and
misdelivery, or pollution of other ports.

The MPLS assumption is that it's protected and checked by a strong link CRC like
Ethernet, and checked/regenerated by stack processing between hops; here,
in a path context, with zero UDP checksums MPLS has no checking at all.
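The "though weak" caveat above can be made concrete: the one's-complement sum is order-independent, so transposed 16-bit words - for example, swapped port fields - produce the identical checksum. A sketch with illustrative port values:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement 16-bit checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A UDP header (src port 4341, dst port 53, length 8, checksum 0) ...
hdr = struct.pack("!HHHH", 4341, 53, 8, 0)
# ... and the same header with its port fields transposed.
swapped = struct.pack("!HHHH", 53, 4341, 8, 0)

# Addition commutes, so the checksum cannot distinguish the two.
assert internet_checksum(hdr) == internet_checksum(swapped)
```

So even with the checksum enabled, reordered words escape detection; with the checksum set to zero, nothing on the UDP/IP path checks these fields at all.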
  
That UDP would be running over IP over

Re: [lisp] [mpls] draft-ietf-mpls-in-udp was RE: gre-in-udp draft (was: RE: [tsvwg] Milestones changed for tsvwg WG)

2014-01-14 Thread Stewart Bryant

Lloyd

I have just read the Stone paper and I have some significant
concerns about its validity with modern h/w. Certainly it
is hard to credit the notion that the error rate
is in the range 1:1000 to 1:32000 as reported by the authors.

The paper was written in 2000 with hardware that would have
been designed in the mid-1990s. In that era, h/w was
far more marginal, with performance traded against signal
integrity, and indeed a lot less signal integrity measurement
and simulation took place at both board and chip level. This
was also the era where metastability was just beginning to
become widely understood, and that lack of understanding
would be a possible source of DMA errors.

I therefore do not think we should place much reliance
on this paper, but should instead look at the rather more
modern statistics.

Such statistics ought to be readily available by looking at
the tcp/udp c/s error stats in hosts and routers. As a tiny
and perhaps erroneous sample, I looked at three Macs
in the office here and the tcp c/s error stats were
313/29144518, 0/300, 0/500. Only one of those
three systems got within a factor of 3 of the lowest error
rate reported by Stone.
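Taking the first sample at face value (the counts are from the paragraph above; Stone's bounds as quoted earlier in this thread), the arithmetic behind "within a factor of 3" works out as:

```python
# Stewart's first Mac: 313 TCP checksum errors in 29,144,518 segments.
observed = 313 / 29_144_518

# The lowest rate attributed to Stone earlier in the thread: 1 in 32,000.
stone_low = 1 / 32_000

ratio = stone_low / observed
# observed is roughly 1 error per 93,000 segments, so Stone's lowest
# reported rate is just under three times higher.
assert 2 < ratio < 3
```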

Bottom line, it seems that we could use more recent data
and then an understanding of how important these low
background error rates are in the tunneling application that
we are considering here.

- Stewart
