Hi,

You always need someone who disagrees in discussions like these, so here I go :)

Adrian Popa wrote:
> Would any database manage the huge volume of data (I have about 200GB
> of data from about 2 days of collecting)?

Databases are designed to handle huge amounts of data, so the answer to your question is definitely yes (companies like IBM build SQL-based data warehouses that handle terabytes of data).

> I also have mysql setups that take about a minute to search through 3
> million records that take about 1GB (on similar hardware setups).

In my opinion, this doesn't really say anything. Does this database contain netflow information? If not, how is it comparable? Is the database indexed correctly? Is your query optimized for these indexes? Are you doing full-text searches? Is MySQL the best tool for the job?

> This is why I think a binary format + a fast application can go
> through the data much faster than a conventional application.

I partly agree, in that I think a binary format *can* be faster, but I seriously doubt that the current nfcapd format *is* faster, as it doesn't include any indexes or other methods that speed up random searches on fields other than the end time of the flow.

For example: if I wanted to see all hosts that contacted a certain subnet during a 1-hour period, nfdump would traverse all the flows in this period (in our case that would be about 30 million) and compare the destination of every flow to the subnet. A SQL database with a b-tree index on the "destination ip" field would simply use this b-tree to pick out the matching records, thus avoiding millions of comparisons and file operations.

The flowd collector (http://www.mindrot.org/projects/flowd/) includes scripts that can be used to create a MySQL-backed collector.
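
To make the index argument above concrete, here is a minimal sketch. Everything in it is an assumption for illustration only: it uses SQLite through Python's standard sqlite3 module rather than MySQL, the table and column names (flows, src_ip, dst_ip, end_time) and the index name idx_flows_dst are made up, and the flows are random synthetic data. It only shows that a range condition on an indexed destination address gives the same answer as comparing every flow, and lets you look at the query plan; it says nothing about how nfdump or a real MySQL setup would perform.

```python
import ipaddress
import random
import sqlite3

# Hypothetical query: hosts that contacted a subnet within a time window.
QUERY = (
    "SELECT DISTINCT src_ip FROM flows "
    "WHERE dst_ip BETWEEN ? AND ? AND end_time >= ? AND end_time < ?"
)


def subnet_bounds(cidr):
    """Return the first and last address of an IPv4 subnet as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def build_db(n_flows=20000, seed=1):
    """Create an in-memory table of synthetic flows with an index on dst_ip."""
    rng = random.Random(seed)
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    rows = [
        (rng.getrandbits(32), base + rng.randrange(1 << 16), rng.randrange(7200))
        for _ in range(n_flows)
    ]
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flows (src_ip INTEGER, dst_ip INTEGER, end_time INTEGER)"
    )
    conn.executemany("INSERT INTO flows VALUES (?, ?, ?)", rows)
    # SQLite stores ordinary indexes as b-trees.
    conn.execute("CREATE INDEX idx_flows_dst ON flows (dst_ip)")
    conn.commit()
    return conn, rows


def indexed_lookup(conn, cidr, t_start, t_end):
    """Let the database answer the query, using the index if it chooses to."""
    lo, hi = subnet_bounds(cidr)
    cur = conn.execute(QUERY, (lo, hi, t_start, t_end))
    return {row[0] for row in cur}


def full_scan(rows, cidr, t_start, t_end):
    """Compare every flow to the subnet, as a sequential reader would."""
    lo, hi = subnet_bounds(cidr)
    return {
        src for src, dst, end in rows
        if lo <= dst <= hi and t_start <= end < t_end
    }


def query_plan(conn, cidr, t_start, t_end):
    """Return SQLite's description of how it would execute the query."""
    lo, hi = subnet_bounds(cidr)
    plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (lo, hi, t_start, t_end))
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in plan)


if __name__ == "__main__":
    conn, rows = build_db()
    hosts = indexed_lookup(conn, "10.0.42.0/24", 0, 3600)
    print(len(hosts), "hosts contacted the subnet")
    print("same answer as full scan:", hosts == full_scan(rows, "10.0.42.0/24", 0, 3600))
    print(query_plan(conn, "10.0.42.0/24", 0, 3600))
```

Storing the addresses as integers is what lets a subnet match become a simple range condition, which is the kind of condition a b-tree can narrow down without touching every record. Note that the time condition is not indexed in this sketch, so it is still checked per matching row.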

I might set up a system next week to compare its performance to nfdump, as I am quite curious to see how it actually compares (and to check that I am not making idle claims :)

Werner

> Adrian Popa
>
> On 10/2/07, Tristan RHODES <[EMAIL PROTECTED]> wrote:
>> Will using a database backend to store flowdata help improve query
>> times? Has anyone experimented with this?
>>
>> Tristan Rhodes
>> Weber State University
>>
>> -------------------------------------------------------------------------
>> This SF.net email is sponsored by: Microsoft
>> Defy all challenges. Microsoft(R) Visual Studio 2005.
>> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
>> _______________________________________________
>> Nfsen-discuss mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
