On Thu, Jul 14, 2005 at 06:12:58PM +0200, Peter Valdemar Mørch scratched on the
wall:
> Thanks for your responses!
> 
> Adam Powers apowers-at-lancope.com |Lists| wrote:
> >Yeah, and your timeouts have a pretty significant impact as well. What are
> >you using?
> 
> NAT (or rather, PAT) translations time out after 3 hours, while TCP 
> connections time out after 1 hour. I guess if PAT timeout was tiny, like 
> 2 minutes, that would create many flows, because new PAT entries were 
> created on different ports all the time. Is that what you mean when you 
> say "pretty significant"?

  As Adam clarified in his second mail, there are some adjustable
  timers on the exporter (no matter what brand).  The two that are likely
  to have the largest impact are the maximum lifetime timer (defaults to
  30 min) and the "flush on inactivity" timer (which, IIRC, defaults to
  something like 15 sec.).  The max lifetime timer actually doesn't have
  that big of an effect on the total number of flows, but it will affect
  some of your other averages if you have long-running/high-traffic flows
  (e.g. large downloads, like ISO images, over a moderately slow link).
  The biggest reason to reduce the max lifetime value is if you have
  quasi-realtime graph systems-- the max lifetime value more or less
  defines the delay between "happened" and "reported", worst case.
  Unless this value gets into the 5 min or less range, it isn't going
  to have a drastic effect on your total flow count (for most
  applications).
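
  (On classic Cisco IOS exporters, for what it's worth, these knobs are
  "ip flow-cache timeout active <minutes>" and "ip flow-cache timeout
  inactive <seconds>"; other platforms have their own equivalents.)  To
  put rough numbers on the averages point, here's a quick Python
  sketch -- the transfer length and timer values are made up, not
  measurements:

      # How the max lifetime (active) timer splits one long-lived
      # transfer into multiple flow records.  Hypothetical numbers.
      def records_for_transfer(duration_min, active_timeout_min):
          # A record is exported every active_timeout_min minutes while
          # traffic keeps arriving, plus a final one when the flow ends.
          return max(1, -(-duration_min // active_timeout_min))  # ceil

      iso_download_min = 90            # a big ISO pull on a slow link
      for timeout_min in (30, 5):      # default vs. aggressive timer
          n = records_for_transfer(iso_download_min, timeout_min)
          print("active timeout %2d min -> %2d flow records"
                % (timeout_min, n))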

  The inactivity timer can have a fairly significant impact, however,
  if you're behind a slow(er) link or if you access a lot of data that
  is "stuttered".  Basically if a flow cache entry sees no data within
  the time period defined by this value, it flushes the cache entry
  (i.e. exports the flows).  This means that you end up with multiple
  flows for each side of a TCP connection.  The worst case of this is
  usually someone reading text over an SSH session.  They press the
  space bar, get one screen full of text (two flows), spend 30 to
  45 seconds reading it (the flows time out), then press the space bar again
  (two more flows), and so on.  Cranking this up to something like 60
  seconds isn't a bad idea, but it can also put a lot of stress on the
  flow cache, as most UDP flows depend on the inactivity timer to flush
  them out (unlike short TCP sessions, where the flow cache is aware of
  FIN flags).  Make sure you've got plenty of memory in your export
  device if you tune this value higher.
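
  To see why, note that the steady-state cache occupancy is roughly
  new-flows-per-second times the inactive timeout (Little's law).  A
  quick sketch, with a completely made-up exporter load:

      # Rough flow cache occupancy vs. the inactivity timer.  The
      # 500 new flows/sec figure is hypothetical, not a measurement.
      new_flows_per_sec = 500
      for inactive_timeout_sec in (15, 60):
          entries = new_flows_per_sec * inactive_timeout_sec
          print("inactive %2d sec -> ~%d cache entries in flight"
                % (inactive_timeout_sec, entries))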

  That said, if you have a well sized uplink for your needs (e.g. your
  Internet connection is not pegged all the time), even this value is
  unlikely to have a huge effect on your total flow count if you are
  running the defaults and have more or less "normal" (or at least not
  radically abnormal) traffic patterns.  Even in this kind of stutter
  situation, the total number of flows (say two per 15 seconds vs. two
  per 60) is still fairly low.  While this timer will have an impact,
  that "impact" is likely to be in the 5 to 10% range, or less.

> Jay, I was a bit unclear before. Yes I'm talking about 800 flows / 
> person / hour between the inside and the internet - not internally, 
> where it is bound to be higher! :-D

  I should also add that in talking about flows/IP|host|person/hour,
  the "hours" are work hours, not total hours.  In a business, I assume
  flows taper off considerably outside of the 8am - 6pm hours.  The
  University network tends to be much more 24x7 (especially when the
  students are around), but we do have our consistent low points (6am).

> Thanks for your data, Jay! We're trying to get an estimate of how huge 
> the data set is going to get, and so good estimates are very valuable 
> and your data points help!

  Assuming you're storing things in raw binary format, and not trying
  to put things into a database or something of that sort, storage
  shouldn't be a big issue.  Remember that the required write speeds are
  (fairly) low, so the only cost in the storage question is how fast
  you need to read it.  If you only keep the data for "what if"
  security issues, you rarely need to access it and read speeds may not
  be that important-- a simple ~200 GB drive (or a mirrored set) will do
  you very well.  If, on the other hand, you do a lot of post-analysis
  (say, per-hour or per-day reports) you may or may not care about read
  speeds-- depends on your traffic level and how complex the reports
  are.  We do a lot of interactive filtering and searching on our data,
  so read speeds are very important because I've got users sitting at a
  command line waiting for results (usually because some big security
  thing blew up and they need to know _RIGHT_NOW_).  If you do most of
  your graphs/reports in real-time, read speeds may not be important at
  all.

  We've got just over 5TB on our collector, but we just upgraded it,
  and only 1.4TB or so is actually available to the collector.  We also
  keep everything for 90 days and run ~200M flows per day, which works
  out to about 850GB of data.  Flows tend to compress fairly well,
  however, so we only use about 450GB (we compress after two weeks).
  We do, however, experience about 20% to 35% annual growth, which
  makes things interesting.
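
  If it helps your sizing estimates, here's the arithmetic behind those
  numbers (decimal GB assumed; the growth projection is just compound
  growth applied to our current footprint, not a promise):

      # Sanity-check: bytes per flow and growth, from the figures above.
      flows_per_day  = 200e6
      retention_days = 90
      bytes_per_flow = 850e9 / (flows_per_day * retention_days)
      # ~47 bytes; for comparison, a NetFlow v5 record is 48 bytes.
      print("~%.0f bytes per raw flow" % bytes_per_flow)
      for growth in (0.20, 0.35):
          print("450 GB -> ~%.0f GB in 3 years at %.0f%% annual growth"
                % (450 * (1 + growth) ** 3, growth * 100))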

   -j

-- 
                     Jay A. Kreibich | CommTech, Emrg Net Tech Svcs
                        [EMAIL PROTECTED] | Campus IT & Edu Svcs
          <http://www.uiuc.edu/~jak> | University of Illinois at U/C