Hi,

I have a dedicated SunFire V210 collecting NetFlow from 60 Cisco routers on my campus, running flow-tools 0.67. Recently the flow-capture processes started logging (very heavily) ftpdu_seq_check() errors, indicating that the sequence numbers on the flow records are not being processed in the expected order.

Here's a sample of the logs (filtered to show records from only one router). This is the raw output; directly following it is a stripped-down, more readable version:

Jan 26 16:03:35 borg.nyu.edu flow-capture[9428]: [ID 148558 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=58 received=610 lost=552
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 650602 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=3 received=1 lost=4294967293
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 467302 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=610 received=6 lost=4294966691
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 300642 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=958 received=33 lost=4294966370
Jan 26 16:03:36 borg.nyu.edu flow-capture[9428]: [ID 964148 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=291 received=51 lost=4294967055
Jan 26 16:03:37 borg.nyu.edu flow-capture[9428]: [ID 972929 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5 expecting=22 received=78 lost=56
Jan 26 16:03:37 borg.nyu.edu flow-capture[9428]: [ID 393162 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5 expecting=107 received=1 lost=4294967189
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 874228 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=78 received=341 lost=263
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 516023 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=6 received=1 lost=4294967290
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 706140 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.1 dst_ip=0.0.0.0 d_version=5 expecting=602 received=12 lost=4294966705
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 165033 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5 expecting=44915 received=17 lost=4294922397
Jan 26 16:03:39 borg.nyu.edu flow-capture[9428]: [ID 678267 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5 expecting=24 received=365 lost=341
Jan 26 16:03:41 borg.nyu.edu flow-capture[9428]: [ID 467412 local6.info] ftpdu_seq_check(): src_ip=xxx.xxx.1.35 dst_ip=0.0.0.0 d_version=5 expecting=341 received=771 lost=430
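
The very large lost values in the raw log are consistent with 32-bit unsigned wraparound rather than real loss. The sketch below is only an inference from the logged numbers, not taken from the flow-tools source: it assumes that lost is received minus expecting when the sequence number moved forward, and (0xFFFFFFFF - expecting) + received when it moved backward. The function name lost_count is hypothetical.

```python
MAX_U32 = 0xFFFFFFFF  # largest value of a 32-bit unsigned counter


def lost_count(expecting: int, received: int) -> int:
    """Hypothetical reconstruction of the logged 'lost' field (an assumption)."""
    if received >= expecting:
        # Sequence number jumped forward: the gap is reported as lost flows.
        return received - expecting
    # Sequence number went backward: treated as if the counter had wrapped.
    return (MAX_U32 - expecting) + received


# (expecting, received, lost) triples copied from the log sample above.
samples = [
    (58, 610, 552),
    (3, 1, 4294967293),
    (610, 6, 4294966691),
    (44915, 17, 4294922397),
]

for expecting, received, logged in samples:
    print(expecting, received, lost_count(expecting, received) == logged)
```

If this reading is right, a lost value near 2^32 only says that a sequence number lower than the expected one arrived; it does not mean that billions of flows were dropped.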

Here's the same data with only the time/expecting/received/lost fields:

Jan 26 16:03:35 expecting=58 received=610 lost=552
Jan 26 16:03:36 expecting=3 received=1 lost=4294967293
Jan 26 16:03:36 expecting=610 received=6 lost=4294966691
Jan 26 16:03:36 expecting=958 received=33 lost=4294966370
Jan 26 16:03:36 expecting=291 received=51 lost=4294967055
Jan 26 16:03:37 expecting=22 received=78 lost=56
Jan 26 16:03:37 expecting=107 received=1 lost=4294967189
Jan 26 16:03:39 expecting=78 received=341 lost=263
Jan 26 16:03:39 expecting=6 received=1 lost=4294967290
Jan 26 16:03:39 expecting=602 received=12 lost=4294966705
Jan 26 16:03:39 expecting=44915 received=17 lost=4294922397
Jan 26 16:03:39 expecting=24 received=365 lost=341
Jan 26 16:03:41 expecting=341 received=771 lost=430

It seems that the records are being processed out of order. For example, the first row received seq# 610, which was not expected until 1 second later, 2 rows down. The pattern repeats itself: seq# 78 is received at 16:03:37 but expected at 16:03:39, and seq# 341 is received at 16:03:39 but expected at 16:03:41. Also, the collector is showing overflow errors in its "lost" arithmetic: whenever the received number is lower than the expected one, lost comes out as a value just under 2^32 (e.g. lost=4294967293).

The collector's resources are not overloaded. I'm also pretty sure I've ruled out network delays as an explanation. First, the errors are happening too frequently. Second, I'm even getting them from routers that are directly linked to the collector via a single gigE switch, which should have (practically) zero delay. And last, it's happening consistently with almost every router on my network.

I'm trying to narrow down the possible explanations, and the more I look at it, the more it seems to be a configuration error or bug on the flow-capture side. Has anybody had this experience, or does anyone know how to handle the problem?

Thanks,
Ari Leichtberg

_______________________________________________
Flow-tools mailing list
[EMAIL PROTECTED]
http://mailman.splintered.net/mailman/listinfo/flow-tools
