Thanks Mark. So, the more immediate problem that I'm encountering has nothing to do with my Linux cluster, but rather with flow-report running out of memory.
Basically, when I run flow-report on ~1.7GB of level-6 compressed data (one day's activity on a router), the process enters a permanent "disk sleep" state and just spins endlessly until I kill it. This usually happens when the process reaches around 96% of memory. Here is a sample snapshot of some stats taken after one such process had stalled:

--> Version, kernel: 2.4.27, gcc: 3.3.4

--> Results of top:
Mem:   901216k total,  895324k used,    5892k free,   256k buffers
Swap: 1999992k total,  887396k used, 1112596k free,  1400k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2930 netflow  19   0  884m 848m 827m D  2.6 96.5 34:39.24 flow-report

--> Contents of /proc/2930/status:
Name:     flow-report
State:    D (disk sleep)
Tgid:     2930
Pid:      2930
PPid:     2915
VmSize:   907724 kB
VmLck:         0 kB
VmRSS:    866036 kB
VmData:   905668 kB

--> Results of strace -p 2390:
read(0, "[EMAIL PROTECTED]"..., 32768) = 32708
read(0, "\336\232\201A\360k\3111T\33\262TE\261L\300\361|\24E\320"..., 32768) = 32708
read(0, "\336\232\201A\302\246C2\\\33\262TE\261L\300\202\3\321\310"..., 32768) = 32708

Clearly the process has run out of physical memory, yet there is still over 1GB of free swap. I'm also puzzled by its behavior once it reaches this point: it doesn't blow up with a malloc error or any other memory-related failure, it just keeps making read() system calls without making progress. Is this behavior expected when a flow-report process gets too big?

I have also run flow-report on even larger files on a Solaris box with twice as much memory. On occasion a process there would exit with an out-of-memory error, but it never cycled like this, so it seems like a Linux issue.

Any help would be greatly appreciated.
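To watch a run like this without eyeballing top, the relevant fields can be pulled out of /proc/<pid>/status programmatically. Below is a minimal Python sketch, assuming the Linux /proc layout shown above ("Key: value" lines, memory values in kB). The helper names parse_proc_status, kb and summarize are hypothetical and not part of flow-tools.

```python
def parse_proc_status(text):
    """Parse the 'Key: value' lines of a Linux /proc/<pid>/status file.

    Returns a dict of raw string values, e.g. {'State': 'D (disk sleep)'}.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line without a colon
            fields[key.strip()] = value.strip()
    return fields


def kb(value):
    """Convert a '866036 kB' style value to an integer count of kB."""
    number, _, unit = value.partition(" ")
    if unit and unit != "kB":
        raise ValueError(f"unexpected unit: {unit!r}")
    return int(number)


def summarize(pid):
    """Read /proc/<pid>/status (Linux only) and return state and memory use."""
    with open(f"/proc/{pid}/status") as f:
        fields = parse_proc_status(f.read())
    return {
        "state": fields.get("State"),
        "vm_size_kb": kb(fields["VmSize"]),
        "vm_rss_kb": kb(fields["VmRSS"]),
    }
```

Calling summarize(2930) periodically (say once a minute) during a run would show whether VmRSS keeps climbing steadily or plateaus before the process drops into the D state.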
Thanks,
Ari

> -----Original Message-----
> From: Mark Fullmer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 02, 2004 8:13 PM
> To: Ari Leichtberg
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Flow-tools] Flow-tools on linux cluster (Mosix)
>
>
> On Nov 2, 2004, at 3:21 PM, Ari Leichtberg wrote:
>
> >
> > On that note, does anybody know about the inner workings of
> > flow-report? My general understanding is that it loads up a huge
> > hashtable (or other data structure) in memory and then basically
> > dumps out quick stats. Not very cpu intensive. Is that accurate?
>
> If there are fewer than 64K buckets in a key it's a direct lookup,
> otherwise a hash. So an IP protocol or IP port report would not use a
> hash, an IP address report would. If the pps and bps calculations are
> not in the report the floating point calculations are skipped, which can
> really impact CPU cycles.
>
> Flow-report can run many reports on one pass of the data depending on
> the memory available. This usually translates to a big speed gain when
> running many reports vs flow-stat due to the reduced disk I/O.
>
> --
> mark
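Mark's description of the two lookup strategies, and of running several reports in one pass, can be modeled roughly as below. This is a toy Python sketch under stated assumptions, not flow-report's actual C implementation: the record fields (srcaddr, dstport, prot, octets, packets), the class names and the run_reports/report_rows helpers are all hypothetical. It shows why a port or protocol report has a fixed number of slots while an address report grows with the number of distinct addresses in the input, and how rate columns can be skipped entirely when not requested.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical threshold mirroring Mark's description: keys with fewer than
# 64K possible values get a flat, directly indexed array.
DIRECT_LIMIT = 1 << 16


@dataclass
class Counters:
    flows: int = 0
    octets: int = 0
    packets: int = 0

    def add(self, octets, packets):
        self.flows += 1
        self.octets += octets
        self.packets += packets


class DirectTable:
    """Array indexed by the key itself; the number of slots is fixed up front."""

    def __init__(self, size):
        if size > DIRECT_LIMIT:
            raise ValueError("key space too large for a direct table")
        self.buckets = [None] * size

    def add(self, key, octets, packets):
        if self.buckets[key] is None:
            self.buckets[key] = Counters()
        self.buckets[key].add(octets, packets)

    def items(self):
        return [(k, c) for k, c in enumerate(self.buckets) if c is not None]


class HashTable:
    """Hash keyed by value; grows with the number of distinct keys seen."""

    def __init__(self):
        self.buckets = defaultdict(Counters)

    def add(self, key, octets, packets):
        self.buckets[key].add(octets, packets)

    def items(self):
        return sorted(self.buckets.items())  # sorted for deterministic output


def run_reports(flows):
    """Feed every report from a single pass over the flow records."""
    by_port = DirectTable(DIRECT_LIMIT)  # 16-bit port: 65536 slots
    by_proto = DirectTable(256)          # 8-bit IP protocol: 256 slots
    by_srcaddr = HashTable()             # 32-bit address: hashed instead
    for f in flows:
        by_port.add(f["dstport"], f["octets"], f["packets"])
        by_proto.add(f["prot"], f["octets"], f["packets"])
        by_srcaddr.add(f["srcaddr"], f["octets"], f["packets"])
    return by_port, by_proto, by_srcaddr


def report_rows(table, seconds=None):
    """Rate columns (floating point) are computed only when requested."""
    rows = []
    for key, c in table.items():
        row = [key, c.flows, c.octets, c.packets]
        if seconds:
            row += [c.packets / seconds, c.octets * 8 / seconds]  # pps, bps
        rows.append(row)
    return rows
```

In this model, with a full day of router data by_port and by_proto never exceed 65536 and 256 slots no matter how much input is read, while by_srcaddr gains an entry for every new source address, so it is the structure whose memory tracks the size and diversity of the input.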
