Re: [Nfsen-discuss] Have we considered a database backend to store flowdata?

Werner Schram Tue, 02 Oct 2007 07:57:07 -0700

Hi,

You always need someone who disagrees in discussions like these, so
here I go :)

Adrian Popa wrote:
> Would any database manage the huge volume of data (I have about 200GB
>  of data from about 2 days of collecting)?

Databases are designed to handle huge amounts of data, so the answer to
your question would definitely be yes (companies like IBM build
SQL-based data warehouses that handle terabytes of data).

> I also have mysql setups that take about a minute to search through 3
> million records that take about 1GB (on similar hardware setups).
> 

In my opinion, this comparison doesn't really tell us much. Does this
database contain NetFlow information? If not, how is it comparable? Is
the database indexed correctly? Is your query optimized for these
indexes? Are you doing full-text searches? Is MySQL the best tool for the job?
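
One way to check whether a query can actually use an index is to ask the
database for its query plan. The following is a minimal sketch using
Python's bundled sqlite3 module rather than MySQL (MySQL's counterpart is
the EXPLAIN statement); the flows table, its columns and the index name
are hypothetical, with IPv4 addresses assumed to be stored as 32-bit
integers.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Join the 'detail' column (last column) of EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical flow table: addresses stored as 32-bit integers.
conn.execute("CREATE TABLE flows (src_ip INTEGER, dst_ip INTEGER, t_end INTEGER)")

# "Which sources talked to 192.168.1.0/24?" as an integer range query.
sql = "SELECT src_ip FROM flows WHERE dst_ip BETWEEN ? AND ?"
params = (0xC0A80100, 0xC0A801FF)  # 192.168.1.0 .. 192.168.1.255

plan_without_index = query_plan(conn, sql, params)

# Add a B-tree index on the destination column and ask again.
conn.execute("CREATE INDEX idx_flows_dst ON flows (dst_ip)")
plan_with_index = query_plan(conn, sql, params)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Before the index exists the plan should report a full SCAN of the table;
afterwards it should report a SEARCH using idx_flows_dst. The exact
wording varies between SQLite versions.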

> This is why I think a binary format + a fast application can go 
> through the data much faster than a conventional application.
> 

I partly agree, in that I think a binary format *can* be faster,
but I seriously doubt that the current nfcapd format *is* faster, as it
doesn't include any indexes or other structures that speed up random
searches on fields other than the end time of the flow. For example,
if I wanted to see all hosts that contacted a certain subnet during a
1-hour period, nfdump would traverse all the flows in this period (in
our case about 30 million) and compare the destination of every flow
to the subnet. An SQL database with a B-tree index on the "destination
IP" field would simply use this B-tree to locate the matching records,
avoiding millions of comparisons and file operations.
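
To make the difference concrete, here is a small in-memory sketch of the
subnet query described above. The flow records, addresses and helper
names are hypothetical. The linear scan mimics comparing every flow's
destination to the subnet, while a sorted array searched with bisect
stands in for a B-tree index (a real database keeps a balanced on-disk
tree, but the lookup idea is the same: binary-search to the start of the
range, then read only the matching keys).

```python
import bisect
import ipaddress
import random


def ip(text):
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


# Hypothetical flow records as (src_ip, dst_ip) integer pairs.
random.seed(1)
flows = [(random.getrandbits(32), random.getrandbits(32)) for _ in range(50_000)]
flows.append((ip("10.0.0.5"), ip("192.168.1.20")))
flows.append((ip("10.0.0.6"), ip("192.168.1.200")))

# Target subnet expressed as an inclusive integer range.
subnet = ipaddress.IPv4Network("192.168.1.0/24")
lo, hi = int(subnet.network_address), int(subnet.broadcast_address)


def sources_by_scan(flows, lo, hi):
    """Linear scan: test the destination of every single flow."""
    return {src for src, dst in flows if lo <= dst <= hi}


def build_dst_index(flows):
    """Sorted destination keys plus the record positions they point to."""
    order = sorted(range(len(flows)), key=lambda i: flows[i][1])
    keys = [flows[i][1] for i in order]
    return keys, order


def sources_by_index(flows, index, lo, hi):
    """Binary-search to the first key >= lo and stop after the last key <= hi."""
    keys, order = index
    start = bisect.bisect_left(keys, lo)
    stop = bisect.bisect_right(keys, hi)
    return {flows[order[j]][0] for j in range(start, stop)}


index = build_dst_index(flows)
matches = sources_by_index(flows, index, lo, hi)
print(sorted(str(ipaddress.IPv4Address(s)) for s in matches))
```

Building the index costs a sort over all records, so it only pays off
when it is built once (or maintained on insert, as a database does) and
then reused across many queries; each lookup afterwards touches only the
records inside the requested range.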

The flowd collector (http://www.mindrot.org/projects/flowd/) includes
scripts that can be used to create a MySQL-backed collector. I might
set up a system next week to compare its performance to nfdump's, as I
am quite curious to see how it actually compares (and whether I am
making idle claims :)

Werner

> Adrian Popa
> 
> On 10/2/07, Tristan RHODES <[EMAIL PROTECTED]> wrote:
>> Will using a database backend to store flowdata help improve query 
>> times?  Has anyone experimented with this?
>> 
>> Tristan Rhodes
>> Weber State University
>> 
>> 
>> -------------------------------------------------------------------------
>>  This SF.net email is sponsored by: Microsoft Defy all challenges.
>> Microsoft(R) Visual Studio 2005. 
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ 
>> _______________________________________________
>> Nfsen-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
>> 