"Howard C. Berkowitz" wrote in message
news:200210311445.OAA31654@;groupstudy.com...
> At 10:22 AM +0000 10/31/02, Nigel Taylor wrote:
>
> There are several problems with using timestamped measurement in the
> router itself.  Some of these may be reduced with IPv6, but, for
> others, external passive hardware or special router hardware seems
> necessary.  See our BGP convergence drafts,

Clock synchronization and time skew are problems for all
cross-system, cross-network measurement.  In the SNMP world, they are
generally worked around with a few different methods (usually combined):
1) External NTP or GPS time synchronization (or internal atomic clocks)
2) Polling sysUpTime or ifCounterDiscontinuityTime
3) Polling the above variables with GetBulk requests at the same time
    the other variables are polled
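
A rough sketch of why method 3 helps: if sysUpTime arrives in the same
PDU as the counters, the rate can be computed against the agent's own
clock instead of the poller's.  The helper name and the tuple layout are
made up for illustration; the only facts assumed are that sysUpTime is in
TimeTicks (hundredths of a second) and that the counter is a Counter32.

```python
# Hypothetical helper: per-second rate from two polls, timed by the
# agent's sysUpTime (TimeTicks, 1/100 s) rather than the poller's clock.
WRAP32 = 2 ** 32  # Counter32 values wrap at 2^32


def rate_per_second(prev, curr):
    """prev and curr are (sys_uptime_ticks, counter_value) from one PDU each."""
    prev_ticks, prev_count = prev
    curr_ticks, curr_count = curr
    if curr_ticks < prev_ticks:
        # sysUpTime went backwards: the agent restarted (or sysUpTime
        # itself wrapped), so this interval cannot be trusted.
        return None
    elapsed = (curr_ticks - prev_ticks) / 100.0
    if elapsed == 0:
        return None
    delta = (curr_count - prev_count) % WRAP32  # tolerates one counter wrap
    return delta / elapsed


# 10 seconds apart by the agent's clock, with the counter wrapping once.
print(rate_per_second((1000, 4294967000), (2000, 704)))
```

Returning None on a sysUpTime regression is a design choice for the
sketch; a real poller might instead consult ifCounterDiscontinuityTime.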

If using external sources for time synchronization, you also have to
take into account things like jitter (which you wouldn't necessarily
have to account for in many applications like TCP/HTTP).

An extra problem with time skew is the rate at which the user-level
or kernel-level timestamps are created (e.g. when sending test packets).
The kernel-level clock granularity on PC (x86) computers causes at
least 10 ms of jitter, while NTP adjustment for VCO gain requires 1 ms
(see RFC 1305 for more details, or Google Groups below:
http://groups.google.com/groups?as_ugroup=comp.protocols.time.ntp ).
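
To see what granularity a given box actually delivers, something like
the following can be run on it.  This is only a sketch: the function name
is made up, and the numbers depend entirely on the OS, kernel and
hardware, so no particular result is implied.

```python
import time


def observed_tick(clock=time.time, samples=50):
    """Smallest step seen between successive clock readings, in seconds."""
    smallest = float("inf")
    for _ in range(samples):
        start = clock()
        now = clock()
        while now <= start:  # spin until the clock visibly advances
            now = clock()
        smallest = min(smallest, now - start)
    return smallest


print("smallest observed time.time() step: %.9f s" % observed_tick())
print("resolution reported by Python:", time.get_clock_info("time").resolution)
```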

One way to overcome this problem is a program like
rude (http://rude.sourceforge.net/), which dispatches packets with
a precision of 1us.  Normally, user-level timestamps are less accurate
than even kernel-level timestamps, but in this case they are not ;>
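
For illustration only, here is one common way a user-level sender can
pace packets more tightly than the scheduler tick allows: sleep for most
of the interval, then busy-wait the remainder.  This is not taken from
rude's source and makes no claim about how rude does it; the function
name, the `spin` parameter and the no-op send callback are assumptions
for the sketch.

```python
import time


def paced_send(send, count, interval, spin=0.002):
    """Call send(i) at start + i*interval; return each call's lateness in s."""
    lateness = []
    start = time.perf_counter()
    for i in range(count):
        target = start + i * interval
        remaining = target - time.perf_counter()
        if remaining > spin:
            time.sleep(remaining - spin)   # coarse wait, gives up the CPU
        while time.perf_counter() < target:
            pass                           # fine wait, burns CPU
        lateness.append(time.perf_counter() - target)
        send(i)
    return lateness


late = paced_send(lambda i: None, count=100, interval=0.001)
print("worst lateness: %.1f us" % (max(late) * 1e6))
```

The busy-wait trades CPU for precision, so the sender should be the only
thing the box is doing while it measures.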

> First, routers may not give sufficient precision in measurement,
> because they rate-limit ICMP to protect against ICMP floods, or
> simply don't prioritize it highly.  I mention IPv6 because
> authenticated source addresses may be used without fear of denial of
> service.

Rate-limiting of ICMP is actually the default in some versions of IOS.
Add in things like SPD (selective packet discard) and the primary fact
that ICMP takes a different path through software/hardware (every
platform has its own "life-of-a-packet")... all of this makes ICMP
fairly useless as an accurate measurement.

There are also problems with ECMP and ICMP (I love how the two
terms are totally unrelated otherwise), and traceroute doesn't like ECMP
much either.
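
A toy model of the traceroute half of that, assuming the routers do
per-flow load-sharing keyed on the 5-tuple.  The hash below is NOT any
vendor's algorithm, just something deterministic per flow; the addresses
are documentation prefixes and the source port is arbitrary.

```python
import ipaddress


def pick_path(src, dst, proto, sport, dport, n_paths):
    """Toy per-flow hash: the same 5-tuple always maps to the same path."""
    key = (int(ipaddress.ip_address(src)) + int(ipaddress.ip_address(dst))
           + proto + sport + dport)
    return key % n_paths


SRC, DST = "192.0.2.1", "198.51.100.7"
UDP, TCP = 17, 6

# Classic UDP traceroute bumps the destination port on every probe,
# so each probe is a different "flow" as far as the hash is concerned.
probe_paths = [pick_path(SRC, DST, UDP, 40000, 33434 + i, 4) for i in range(4)]

# An application's TCP connection keeps its 5-tuple and stays on one path.
flow_paths = [pick_path(SRC, DST, TCP, 40000, 80, 4) for _ in range(4)]

print("traceroute probes took paths:", probe_paths)
print("the HTTP flow took paths:   ", flow_paths)
```

So the hops traceroute prints can be stitched together from several
parallel paths, none of which is necessarily the one the application's
traffic is using.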

Sort of sad that after 18 years or so, VJ's tool "traceroute" is still the
only game in town (please don't mention things like mtr or visualpulse,
which effectively are the same thing).

> Second, the router may or may not have the capacity to capture and
> store a statistically valid amount of data. NetFlow data export, for
> example, summarizes to a degree. If you could shoot debug to syslog,
> you'd have a much better chance as long as the router could keep up
> with it, using something like a SPAN port.

NetFlow, Cisco debugs (when possible, which generally they are not),
packet capture infrastructure (taps are better than SPAN ports, for
similar timing reasons), and other tools are all great for measurement.
It's good to have other options.  Optical/copper taps feeding a FreeBSD
box running tcpdump (http://www.tcpdump.org/), plus tcpdump contrib tools
like tcptrace (http://www.tcpdump.org/related.html), are really useful for
determining things like TCP goodput.  These can generally be more
accurate than ping/traceroute for determining performance problems, but
only if you are skilled with ARP/IP/TCP/HTTP capture output and
network/server hardware (e.g. routers, switches, NIC's, drivers) and/or
TCP/IP stacks (sendspace/recvspace, socket buffers, maxconn's, mss
max/avg/min, retransmissions, retransmission timers, SACK, window
sizes, et al).  Commercial products can sometimes be substituted (e.g.
Niksun NetVCR or Finisar/Shomiti Surveyor - which are sort of like
NAI SnifferPro, except they work).
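
To make "goodput" concrete: it is the rate of unique payload bytes
delivered, as opposed to raw throughput, which also counts
retransmissions.  The sketch below works on hand-fed (timestamp, seq,
length) tuples for one direction of one connection; it is not tcptrace's
algorithm, the function name is made up, and it ignores sequence-number
wrap.

```python
def throughput_and_goodput(segments):
    """segments: list of (timestamp_s, seq, payload_len), one direction.

    Returns (throughput_bps, goodput_bps).
    """
    segments = sorted(segments)
    if len(segments) < 2 or segments[-1][0] <= segments[0][0]:
        raise ValueError("need at least two segments at different times")
    duration = segments[-1][0] - segments[0][0]
    total = sum(length for _, _, length in segments)

    # Merge [seq, seq+len) ranges so retransmitted bytes count only once.
    unique, covered_to = 0, None
    for seq, end in sorted((s, s + n) for _, s, n in segments):
        if covered_to is None or seq > covered_to:
            unique += end - seq
            covered_to = end
        elif end > covered_to:
            unique += end - covered_to
            covered_to = end
    return total * 8 / duration, unique * 8 / duration


# Four 500-byte segments over 2 s; the third retransmits the second.
capture = [(0.0, 1000, 500), (0.5, 1500, 500), (1.0, 1500, 500), (2.0, 2000, 500)]
print(throughput_and_goodput(capture))
```

A large gap between the two numbers points at retransmissions, which is
the sort of thing ping/traceroute will not show.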

Network problems in today's networks (particularly problems with HTTP
applications) generally revolve around only a few things inside the
network (at the server or client end):
1) Server/Client NIC problems
2) TCP/IP stack problems (visible by zero window sizes / window
    resets or too many retransmissions)
3) Layer 2 Spanning-Tree or HSRP/VRRP switch/router problems
4) Layer 3 BGP/OSPF/EIGRP convergence router problems
5) Layer 3 CEF or forwarding router problems
6) ACL or firewall problems (handling high pps or number of sessions)

More rarely, they happen outside of the network (on the Internet):
1) ISP rolling reboots during maintenance or unscheduled outage
2) ISP peering issues generally caused by congestion between two peers
3) ISP Layer 3 BGP/OSPF/ISIS/MPLS convergence router problems
4) ISP Layer 3 CEF or forwarding router problems

Sometimes they aren't network-related but instead browser (i.e.
application) or server-related:
1) Mis-configured or "bad-state" client or server (including proxies
    and caches which could be "on the network" somewhere in between
    the browser or http server)
2) Error occurs between chair and keyboard (user or sysadmin)

The main problem is that ping/traceroute OFTEN do not identify
these problems -- and it takes a deep understanding of many of the
concepts mentioned in this email to understand why they don't.

People in the security world are increasingly becoming aware of
"false positives" from NIDS tools.  However, this concept has not
carried over into network performance measurement.  Many users
today are aware of ping/traceroute and the fact that sometimes they
are correct and sometimes not (most network engineers fall into this
same category as well actually), but nobody has done anything about
it, and there is no real documentation on the problem.  Even the
measurement experts don't completely understand the problem and
are willing to admit/document where they lack consistency/information:
http://www.icir.org/vern/talks/vp-nrdm01.ps.gz

> http://www.ietf.org/internet-drafts/draft-ietf-bmwg-conterm-03.txt
>
> There is something wrong with the second one. I'll have to check on
> Monday.

Try:
http://www.watersprings.org/pub/id/draft-ietf-bmwg-bgpbas-01.txt
I believe it recently expired.

-dre




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=56625&t=56560