Just following on from this, I'm trying to narrow down where the issue is. This is my understanding of how this all works.
== Flow-capture collects Netflow raw data from the Routers. Flowscan is launched every 5 mins to process this data from the tmp file created from the above process. Here's an example of what I'm seeing in the logs. 2008/02/22 15:25:31 flowscan-1.020 CUFlow: Cflow::find took 4 wallclock secs ( 3.69 usr + 0.00 sys = 3.69 CPU) for 642543 flow file bytes, flow hit ratio: 34353/35700 2008/02/22 15:30:08 flowscan-1.020 CUFlow: Cflow::find took 7 wallclock secs ( 6.55 usr + 0.00 sys = 6.55 CPU) for 700461 flow file bytes, flow hit ratio: 37792/39240 2008/02/22 15:35:15 flowscan-1.020 CUFlow: Cflow::find took 6 wallclock secs ( 6.07 usr + 0.01 sys = 6.08 CPU) for 638391 flow file bytes, flow hit ratio: 33862/35460 2008/02/22 15:40:18 flowscan-1.020 CUFlow: Cflow::find took 3 wallclock secs ( 3.19 usr + 0.00 sys = 3.19 CPU) for 568464 flow file bytes, flow hit ratio: 29665/31170 2008/02/22 15:45:22 flowscan-1.020 CUFlow: Cflow::find took 3 wallclock secs ( 3.23 usr + 0.00 sys = 3.23 CPU) for 580433 flow file bytes, flow hit ratio: 29981/31620 2008/02/22 15:50:26 flowscan-1.020 CUFlow: Cflow::find took 4 wallclock secs ( 3.41 usr + 0.00 sys = 3.41 CPU) for 611599 flow file bytes, flow hit ratio: 31890/33360 2008/02/22 15:55:30 flowscan-1.020 CUFlow: Cflow::find took 4 wallclock secs ( 3.76 usr + 0.01 sys = 3.77 CPU) for 661072 flow file bytes, flow hit ratio: 34889/36630 2008/02/22 16:00:06 flowscan-1.020 CUFlow: Cflow::find took 6 wallclock secs ( 6.01 usr + 0.00 sys = 6.01 CPU) for 640229 flow file bytes, flow hit ratio: 33088/34980 2008/02/22 16:05:11 flowscan-1.020 CUFlow: Cflow::find took 5 wallclock secs ( 4.66 usr + 0.01 sys = 4.67 CPU) for 668444 flow file bytes, flow hit ratio: 41535/42930 2008/02/22 16:10:17 flowscan-1.020 CUFlow: Cflow::find took 5 wallclock secs ( 5.50 usr + 0.01 sys = 5.51 CPU) for 586241 flow file bytes, flow hit ratio: 30329/31800 2008/02/22 16:15:27 flowscan-1.020 CUFlow: Cflow::find took 9 wallclock secs ( 5.31 usr + 0.01 sys = 5.32 CPU) for 575349 flow file bytes, flow hit ratio: 29203/30780 2008/02/22 16:20:38 flowscan-1.020 CUFlow: Cflow::find took 11 wallclock secs ( 5.82 usr + 0.00 sys = 5.82 CPU) for 626396 flow file bytes, flow hit ratio: 32176/33780 2008/02/22 16:25:20 flowscan-1.020 CUFlow: Cflow::find took 11 wallclock secs ( 6.08 usr + 0.02 sys = 6.10 CPU) for 645224 flow file bytes, flow hit ratio: 33314/34920 2008/02/22 16:30:36 flowscan-1.020 CUFlow: Cflow::find took 15 wallclock secs ( 6.12 usr + 0.01 sys = 6.13 CPU) for 644156 flow file bytes, flow hit ratio: 33146/34710 2008/02/22 16:35:25 flowscan-1.020 CUFlow: Cflow::find took 19 wallclock secs ( 7.62 usr + 0.01 sys = 7.63 CPU) for 769207 flow file bytes, flow hit ratio: 40907/42600 2008/02/22 16:40:47 flowscan-1.020 CUFlow: Cflow::find took 21 wallclock secs ( 8.37 usr + 0.00 sys = 8.37 CPU) for 841025 flow file bytes, flow hit ratio: 44987/46800 2008/02/22 16:45:43 flowscan-1.020 CUFlow: Cflow::find took 25 wallclock secs ( 7.70 usr + 0.00 sys = 7.70 CPU) for 795074 flow file bytes, flow hit ratio: 42475/44610 2008/02/22 16:50:35 flowscan-1.020 CUFlow: Cflow::find took 20 wallclock secs ( 6.97 usr + 0.01 sys = 6.98 CPU) for 744377 flow file bytes, flow hit ratio: 38356/40560 2008/02/22 16:55:27 flowscan-1.020 CUFlow: Cflow::find took 21 wallclock secs ( 7.60 usr + 0.01 sys = 7.61 CPU) for 802334 flow file bytes, flow hit ratio: 41759/43890 2008/02/22 17:00:53 flowscan-1.020 CUFlow: Cflow::find took 25 wallclock secs ( 7.68 usr + 0.01 sys = 7.69 CPU) for 797614 flow file bytes, flow hit ratio: 42425/44550 2008/02/22 17:05:49 flowscan-1.020 CUFlow: Cflow::find took 25 wallclock secs ( 7.84 usr + 0.00 sys = 7.84 CPU) for 793587 flow file bytes, flow hit ratio: 42936/44970 2008/02/22 17:10:47 flowscan-1.020 CUFlow: Cflow::find took 27 wallclock secs ( 8.21 usr + 0.00 sys = 8.21 CPU) for 814166 flow file bytes, flow hit ratio: 44722/46740 2008/02/22 17:15:45 flowscan-1.020 CUFlow: Cflow::find took 27 wallclock secs ( 7.57 usr + 0.00 sys = 7.57 CPU) for 749273 flow file bytes, flow hit ratio: 41375/43410 2008/02/22 17:20:48 flowscan-1.020 CUFlow: Cflow::find took 31 wallclock secs ( 8.68 usr + 0.01 sys = 8.69 CPU) for 863693 flow file bytes, flow hit ratio: 47824/49920 As you can see it's taking longer and longer (wallclock secs). It will eventually get to this: 008/02/23 17:31:18 flowscan-1.020 CUFlow: Cflow::find took 254 wallclock secs ( 9.08 usr + 0.01 sys = 9.09 CPU) for 922132 flow file bytes, flow hit ratio: 49702/51390 2008/02/23 17:35:39 flowscan-1.020 CUFlow: Cflow::find took 237 wallclock secs (11.15 usr + 0.00 sys = 11.15 CPU) for 1073584 flow file bytes, flow hit ratio: 60328/62400 2008/02/23 17:44:18 flowscan-1.020 CUFlow: Cflow::find took 506 wallclock secs (10.48 usr + 0.02 sys = 10.50 CPU) for 1049448 flow file bytes, flow hit ratio: 57060/59520 2008/02/23 17:48:38 flowscan-1.020 CUFlow: Cflow::find took 247 wallclock secs ( 8.91 usr + 0.02 sys = 8.93 CPU) for 923017 flow file bytes, flow hit ratio: 48785/50670 2008/02/23 17:52:06 flowscan-1.020 CUFlow: Cflow::find took 194 wallclock secs ( 8.92 usr + 0.00 sys = 8.92 CPU) for 923040 flow file bytes, flow hit ratio: 49157/51240 2008/02/23 17:55:14 flowscan-1.020 CUFlow: Cflow::find took 180 wallclock secs ( 9.34 usr + 0.00 sys = 9.34 CPU) for 955834 flow file bytes, flow hit ratio: 51547/53340 2008/02/23 17:58:02 flowscan-1.020 CUFlow: Cflow::find took 159 wallclock secs ( 8.54 usr + 0.00 sys = 8.54 CPU) for 884217 flow file bytes, flow hit ratio: 46810/49110 2008/02/23 18:11:51 flowscan-1.020 CUFlow: Cflow::find took 814 wallclock secs ( 9.64 usr + 0.02 sys = 9.66 CPU) for 986885 flow file bytes, flow hit ratio: 52130/54480 2008/02/23 18:30:48 flowscan-1.020 CUFlow: Cflow::find took 1091 wallclock secs ( 9.30 usr + 0.04 sys = 9.34 CPU) for 950331 flow file bytes, flow hit ratio: 50770/52500 2008/02/23 18:44:44 flowscan-1.020 CUFlow: Cflow::find took 745 wallclock secs ( 9.13 usr + 0.02 sys = 9.15 CPU) for 938829 flow file bytes, flow hit ratio: 49541/51810 2008/02/23 18:53:37 flowscan-1.020 CUFlow: Cflow::find took 386 wallclock secs ( 8.53 usr + 0.00 sys = 8.53 CPU) for 883836 flow file bytes, flow hit ratio: 46801/48750 2008/02/23 18:57:35 flowscan-1.020 CUFlow: Cflow::find took 230 wallclock secs ( 7.88 usr + 0.00 sys = 7.88 CPU) for 815832 flow file bytes, flow hit ratio: 42967/45120 2008/02/23 19:10:18 flowscan-1.020 CUFlow: Cflow::find took 708 wallclock secs ( 8.34 usr + 0.01 sys = 8.35 CPU) for 843948 flow file bytes, flow hit ratio: 45503/47550 2008/02/23 19:29:16 flowscan-1.020 CUFlow: Cflow::find took 1078 wallclock secs ( 9.34 usr + 0.03 sys = 9.37 CPU) for 941108 flow file bytes, flow hit ratio: 50325/52770 2008/02/23 19:41:23 flowscan-1.020 CUFlow: Cflow::find took 640 wallclock secs ( 8.78 usr + 0.02 sys = 8.80 CPU) for 895410 flow file bytes, flow hit ratio: 47862/49740 2008/02/23 19:54:02 flowscan-1.020 CUFlow: Cflow::find took 754 wallclock secs ( 8.83 usr + 0.01 sys = 8.84 CPU) for 893960 flow file bytes, flow hit ratio: 47974/49860 2008/02/23 20:08:45 flowscan-1.020 CUFlow: Cflow::find took 763 wallclock secs ( 8.59 usr + 0.01 sys = 8.60 CPU) for 878969 flow file bytes, flow hit ratio: 46564/48990 Very ugly. >From my understanding, using this example. 2008/02/22 16:20:38 flowscan-1.020 CUFlow: Cflow::find took 11 wallclock secs ( 5.82 usr + 0.00 sys = 5.82 CPU) for 626396 flow file bytes, flow hit ratio: 32176/33780 The flowscan process should be finished and be terminated. How does something taking ~10secs end up getting longer and longer ? I did a test on the secondary machine up there. Killed flow-capture and flowscan. Deleted all files in the /ft folder. d3-syd-equinix01:/var/netflow/ft# ls -al total 792 drwxr-xr-x 2 root root 794624 2008-02-25 13:38 . drwxr-xr-x 6 root root 12288 2008-02-25 13:37 .. d3-syd-equinix01:/var/netflow/ft# Then started flow-capture. 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme d3-syd-equinix01:/var/netflow/ft# ls -al total 848 drwxr-xr-x 2 root root 794624 2008-02-25 13:40 . drwxr-xr-x 6 root root 12288 2008-02-25 13:37 .. -rw-r--r-- 1 root root 49244 2008-02-25 13:40 tmp-v05.2008-02-25.134041+1100 d3-syd-equinix01:/var/netflow/ft# Now started flowscan. 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme 20253 pts/2 S 0:00 /usr/bin/perl bin/flowscan sleep 30... sleep 30... So far so good. On the 5min mark: 2008/02/25 13:45:05 working on file /var/netflow/ft-v05.2008-02-25.134041+1100... 2008/02/25 13:45:08 flowscan-1.020 CUFlow: Cflow::find took 3 wallclock secs ( 3.66 usr + 0.02 sys = 3.68 CPU) for 666906 flow file bytes, flow hit ratio: 37098/39630 2008/02/25 13:45:08 flowscan-1.020 CUFlow: report took 0 wallclock secs ( 0.00 usr 0.00 sys + 0.09 cusr 0.03 csys = 0.12 CPU) 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme 20253 pts/2 S 0:04 /usr/bin/perl bin/flowscan 20265 pts/2 D 0:35 /usr/bin/perl bin/flowscan top - 13:46:53 up 1:52, 3 users, load average: 0.96, 0.70, 2.13 Tasks: 76 total, 2 running, 74 sleeping, 0 stopped, 0 zombie Cpu0 : 88.1%us, 0.0%sy, 0.0%ni, 0.0%id, 11.9%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 1.0%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1036428k total, 1022988k used, 13440k free, 15964k buffers Swap: 5301368k total, 52k used, 5301316k free, 872916k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ PPID P SWAP TIME COMMAND 20265 root 18 0 21912 17m 1732 R 88 1.7 1:18.68 1 0 3900 1:18 flowscan Now here's the part I don't understand. Why is flowscan still running ? It finished didn't it ? Now on the next run (10min mark): 2008/02/25 13:50:09 working on file /var/netflow/ft-v05.2008-02-25.134500+1100... 2008/02/25 13:50:15 flowscan-1.020 CUFlow: Cflow::find took 6 wallclock secs ( 6.24 usr + 0.01 sys = 6.25 CPU) for 734307 flow file bytes, flow hit ratio: 40698/43290 2008/02/25 13:50:15 flowscan-1.020 CUFlow: report took 0 wallclock secs ( 0.00 usr 0.01 sys + 0.11 cusr 0.03 csys = 0.15 CPU) 20241 ? Ss 0:01 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme 20253 pts/2 S 0:10 /usr/bin/perl bin/flowscan 20265 pts/2 R 4:34 /usr/bin/perl bin/flowscan 20274 pts/2 R 0:35 /usr/bin/perl bin/flowscan top - 13:50:40 up 1:56, 3 users, load average: 1.34, 0.93, 1.90 Tasks: 77 total, 5 running, 72 sleeping, 0 stopped, 0 zombie Cpu0 : 88.0%us, 1.0%sy, 0.0%ni, 0.0%id, 11.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 86.1%us, 1.0%sy, 0.0%ni, 2.0%id, 10.9%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1036428k total, 1022256k used, 14172k free, 30528k buffers Swap: 5301368k total, 52k used, 5301316k free, 844556k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ PPID P SWAP TIME COMMAND 20274 root 18 0 22688 18m 1732 R 90 1.8 0:12.58 1 0 3900 0:12 flowscan 20265 root 18 0 21912 17m 1732 R 85 1.7 4:12.54 1 1 3900 4:12 flowscan Can anyone shed some light on this ? Regards, Ross From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ross Tsolakidis Sent: Thursday, 21 February 2008 4:06 PM To: [email protected] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: RE: [Flow-tools] cpu utilisation on flowscan processes Hi Dave, Thanks again for responding. I had a look at the rc scripts, I did notice flow-capture had 2 startup scripts, which was wierd. I removed that. But not flowscan, it's only being started once. Here's a top with the required info. top - 16:04:06 up 1 day, 5:34, 1 user, load average: 9.20, 8.68, 8.46 Tasks: 83 total, 7 running, 75 sleeping, 0 stopped, 1 zombie Cpu0 : 96.0%us, 4.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 98.0%us, 2.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1036428k total, 1022864k used, 13564k free, 17768k buffers Swap: 5301368k total, 52k used, 5301316k free, 758620k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ PPID P SWAP TIME COMMAND 9013 root 18 0 30280 26m 1732 R 44 2.7 2:38.67 1 0 2776 2:38 flowscan 16158 root 18 0 30284 26m 1732 R 44 2.7 0:51.91 1 1 2780 0:51 flowscan 24721 root 18 0 30116 26m 1728 R 40 2.6 6:12.83 1 0 2656 6:12 flowscan 3869 root 18 0 30120 26m 1728 R 39 2.6 3:51.55 1 0 2656 3:51 flowscan 2384 mysql 15 0 128m 36m 5220 S 13 3.6 252:44.53 2347 0 91m 252:44 mysqld 29815 root 23 0 30284 26m 1732 R 9 2.7 5:44.42 1 1 2780 5:44 flowscan Regards, Ross DISCLAIMER: This e-mail and any files transmitted with it may be privileged and confidential, and are intended only for the use of the intended recipient. If you are not the intended recipient or responsible for delivering this e-mail to the intended recipient, any use, dissemination, forwarding, printing or copying of this e-mail and any attachments is strictly prohibited. If you have received this e-mail in error, please REPLY TO the SENDER to advise the error AND then DELETE the e-mail from your system. Any views expressed in this e-mail and any files transmitted with it are those of the individual sender, except where the sender specifically states them to be the views of our organisation. Our organisation does not represent or warrant that the attached files are free from computer viruses or other defects. 25/2/2008 The user assumes all responsibility for any loss or damage resulting directly or indirectly from the use of the attached files. In any event, the liability to our organisation is limited to either the resupply of the attached files or the cost of having the attached files resupplied. _______________________________________________ Flow-tools mailing list [EMAIL PROTECTED] http://mailman.splintered.net/mailman/listinfo/flow-tools
