""Howard C. Berkowitz"" wrote in message news:200210311445.OAA31654@;groupstudy.com... > At 10:22 AM +0000 10/31/02, Nigel Taylor wrote: > > There are several problems with using timestamped measurement in the > router itself. Some of these may be reduced with IPv6, but, for > others, external passive hardware or special router hardware seems > necessary. See our BGP convergence drafts,

Clock synchronization and time skew are problems in all cross-system,
cross-network measurement. In the SNMP world, they are generally
avoided by combining a few different methods:

1) External NTP or GPS time synchronization (or internal atomic clocks)
2) Polling sysUpTime or ifCounterDiscontinuityTime
3) Polling the above variables with GET-BULK's at the same time the
   other variables are polled

If using external sources for time synchronization, you also have to
take into account things like jitter (which you wouldn't necessarily
have to account for in many applications like TCP/HTTP).

An extra problem with time skew is the rate at which the user-level or
kernel-level timestamps are created (e.g. when sending test packets).
The kernel-level clock granularities on PC (x86) computers cause at
least 10ms jitter, while NTP adjustment for VCO gain requires 1 ms (see
RFC 1305 for more details or Google Groups below:
http://groups.google.com/groups?as_ugroup=comp.protocols.time.ntp ).
One way to overcome this problem is to use a program like rude
(http://rude.sourceforge.net/), which dispatches packets with a
precision of 1us. Normally, user-level timestamps are less accurate
than even kernel-level timestamps, but in this case they are not ;>

> First, routers may not give sufficient precision in measurement,
> because they rate-limit ICMP to protect against ICMP floods, or
> simply don't prioritize it highly. I mention IPv6 because
> authenticated source addresses may be used without fear of denial of
> service.

Rate-limiting of ICMP is actually the default in some versions of IOS.
Add in things like SPD (selective packet discard) and the primary fact
that ICMP takes a different path in software/hardware (every platform
has its own way of doing "life-of-a-packet")... all this makes ICMP
fairly useless as an accurate measurement.
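
Going back to the SNMP methods listed earlier (polling sysUpTime and
ifCounterDiscontinuityTime in the same request as the counters), here
is a minimal sketch of why that helps: the rate is computed from the
agent's own clock, so skew on the polling station drops out. The dict
keys and the function name are made up for illustration, and it assumes
Counter32 octet counters and at most one wrap between polls.

```python
# Sketch: turn two SNMP polls into a bit rate using the agent's own clock.
# The field names ("uptime", "octets", "disc") are illustrative only.

COUNTER32_MOD = 2 ** 32  # Counter32 objects such as ifInOctets wrap at 2^32


def rate_bps(prev, curr):
    """Return bits/sec between two polls, or None if the sample is unusable.

    Each poll is a dict of values fetched in the same request:
      uptime - sysUpTime in TimeTicks (hundredths of a second)
      octets - an interface octet counter (Counter32)
      disc   - ifCounterDiscontinuityTime for that interface
    """
    # Counters were reset or re-initialized: the delta is meaningless.
    if curr["disc"] != prev["disc"]:
        return None
    ticks = curr["uptime"] - prev["uptime"]
    # Agent restarted (or sysUpTime wrapped): skip rather than guess.
    if ticks <= 0:
        return None
    # Modular subtraction handles a single counter wrap between polls.
    delta = (curr["octets"] - prev["octets"]) % COUNTER32_MOD
    return delta * 8 / (ticks / 100.0)


if __name__ == "__main__":
    prev = {"uptime": 1000, "octets": 4294967000, "disc": 0}
    curr = {"uptime": 2000, "octets": 704, "disc": 0}
    print(rate_bps(prev, curr))  # 1000 octets in 10 s
```

Returning None on a discontinuity throws away one sample instead of
reporting a bogus spike, which is usually the better trade for graphs.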

There are also problems with ECMP and ICMP (I love how the two terms
are totally unrelated otherwise), and traceroute doesn't like ECMP much
either. Sort of sad that after 18 years or so, VJ's tool "traceroute"
is still the only game in town (please don't mention things like mtr or
visualpulse, which effectively are the same thing).

> Second, the router may or may not have the capacity to capture and
> store a statistically valid amount of data. NetFlow data export, for
> example, summarizes to a degree. If you could shoot debug to syslog,
> you'd have a much better chance as long as the router could keep up
> with it, using something like a SPAN port.

NetFlow, Cisco debugs (yes, if possible and generally not), packet
capture infrastructure (better to use taps than SPAN's for similar
timing reasons), and other tools are great for measurement. It's good
to have other options.

Optical/copper taps along with a FreeBSD box running tcpdump
(http://www.tcpdump.org/) and tcpdump contrib tools like tcptrace
(http://www.tcpdump.org/related.html) are really useful for determining
things like TCP Goodput, etc. These can generally be more accurate than
ping/traceroute for determining performance problems, but only if you
are skilled with ARP/IP/TCP/HTTP capture output and network/server
hardware (e.g. routers, switches, NIC's, drivers) and/or TCP/IP stacks
(sendspace/recvspace, socket buffers, maxconn's, mss max/avg/min,
retransmissions, retransmission timers, SACK, window sizes, et al).
Commercial products can sometimes be substituted (e.g. Niksun NetVCR or
Finisar/Shomiti Surveyor - which are sort of like NAI SnifferPro,
except they work).
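
To make "TCP Goodput" concrete, here is a rough sketch of the
arithmetic, not a description of how tcptrace does it internally. It
assumes you have already pulled (timestamp, sequence number, payload
length) tuples for one direction of one connection out of a capture;
the function name and tuple layout are made up for illustration, and
sequence-number wrap is ignored.

```python
# Sketch: goodput vs. raw throughput for one direction of one TCP flow.
# Input tuples are (timestamp_seconds, seq, payload_len); layout is assumed.


def goodput(segments):
    """Return goodput/throughput in bits/sec plus retransmitted bytes.

    Goodput counts each payload byte once, however many times it was sent.
    Returns None if there is too little data to measure.
    """
    segs = [s for s in segments if s[2] > 0]  # drop pure ACKs
    if len(segs) < 2:
        return None

    # Merge overlapping [seq, seq+len) ranges to count unique bytes.
    intervals = sorted((seq, seq + length) for _, seq, length in segs)
    unique = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:          # gap: close the current range
            unique += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                        # overlap or adjacent: extend it
            cur_end = max(cur_end, end)
    unique += cur_end - cur_start

    total = sum(length for _, _, length in segs)
    times = [ts for ts, _, _ in segs]
    duration = max(times) - min(times)
    if duration <= 0:
        return None
    return {
        "goodput_bps": unique * 8 / duration,
        "throughput_bps": total * 8 / duration,
        "retransmitted_bytes": total - unique,
    }


if __name__ == "__main__":
    capture = [
        (0.0, 1000, 500),
        (0.1, 1500, 500),
        (0.5, 1500, 500),  # retransmission of the previous segment
        (1.0, 2000, 500),
    ]
    print(goodput(capture))
```

A large gap between throughput and goodput in a trace is the kind of
thing ping/traceroute will never show you.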

Network problems today (particularly problems with HTTP applications)
generally revolve around only a few things inside the network (at the
server or client end):

1) Server/Client NIC problems
2) TCP/IP stack problems (visible by zero window sizes / window resets
   or too many retransmissions)
3) Layer 2 Spanning-Tree or HSRP/VRRP switch/router problems
4) Layer 3 BGP/OSPF/EIGRP convergence router problems
5) Layer 3 CEF or forwarding router problems
6) ACL or firewall problems (handling high pps or number of sessions)

More rarely, they happen outside of the network (on the Internet):

1) ISP rolling reboots during maintenance or unscheduled outage
2) ISP peering issues generally caused by congestion between two peers
3) ISP Layer 3 BGP/OSPF/ISIS/MPLS convergence router problems
4) ISP Layer 3 CEF or forwarding router problems

Sometimes they aren't network-related but instead browser (i.e.
application) or server-related:

1) Mis-configured or "bad-state" client or server (including proxies
   and caches which could be "on the network" somewhere in between the
   browser and http server)
2) Error occurs between chair and keyboard (user or sysadmin)

The main problem is that ping/traceroute OFTEN do not identify these
problems -- and understanding why they don't requires a deep
understanding of many of the concepts mentioned in this email.

People in the security world are increasingly becoming aware of "false
positives" from NIDS tools. However, this concept has not carried over
into network performance measurement. Many users today are aware of
ping/traceroute and the fact that sometimes they are correct and
sometimes not (most network engineers fall into this same category as
well actually), but nobody has done anything about it, and there is no
real documentation on the problem.

Even the measurement experts don't completely understand the problem
and are willing to admit/document where they lack
consistency/information:

http://www.icir.org/vern/talks/vp-nrdm01.ps.gz

> http://www.ietf.org/internet-drafts/draft-ietf-bmwg-conterm-03.txt
>
> There is something wrong with the second one. I'll have to check on Monday.

Try:
http://www.watersprings.org/pub/id/draft-ietf-bmwg-bgpbas-01.txt

I believe it recently expired.

-dre

Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=56625&t=56560