Hello,

On 10/2/07, Werner Schram <[EMAIL PROTECTED]> wrote:
> Hi,
>
> You always need someone that disagrees in discussions like these, so
> here I go :)
>
>
> Adrian Popa wrote:
> > Would any database manage the huge volume of data (I have about 200GB
> > of data from about 2 days of collecting)?
>
> Databases are designed to handle huge amounts of data, so the answer to
> your question would definitely be yes (companies like IBM build
> sql-based data warehouses that handle terabytes of data).
>
Thank you for that information, but I was judging the performance of the database engine based on what I've seen in real life.

> > I also have mysql setups that take about a minute to search through 3
> > million records that take about 1GB (on similar hardware setups).
> >
>
> In my opinion, this doesn't really say anything. Does this database
> contain netflow information? If not, how is it comparable? Is the
> database indexed correctly? Is your query optimized for these indexes?
> Are you doing full text searches? Is mysql the best tool for the job?
>

The table I was talking about holds syslog information and has a date field and a text field that stores the actual message. Searches through it are very slow and painful. If mysql isn't the best tool for the job, I don't know what is...

> > This is why I think a binary format + a fast application can go
> > through the data much faster than a conventional application.
> >
>
> I partly agree, in that I think that a binary format *can* be faster,
> but I seriously doubt that the current nfcapd format *is* faster, as it
> doesn't include any indexes or other methods that improve the speed of
> random searches on fields other than the end time of the flow. For example:
> if I would like to see all hosts that contacted a certain subnet during
> a 1 hour period, nfdump would traverse all the flows in this period (in
> our case that would be about 30 million) and compare the destination of
> every flow to the subnet. A sql database with a b-tree index on the
> "destination ip" field would simply use this b-tree to filter out the
> correct records, thus preventing millions of comparisons and file
> operations.
>
> The flowd collector (http://www.mindrot.org/projects/flowd/) includes
> scripts that can be used to create a mysql-backed collector.
> I might set up a system next week to compare its performance to nfdump,
> as I am quite curious to see how it actually compares (and to see if I
> am not making idle claims :)
>

Looking forward to your measurements.

> Werner
>
> > Adrian Popa
> >
> > On 10/2/07, Tristan RHODES <[EMAIL PROTECTED]> wrote:
> >> Will using a database backend to store flowdata help improve query
> >> times? Has anyone experimented with this?
> >>
> >> Tristan Rhodes Weber State University
> >>
> >>
> >> -------------------------------------------------------------------------
> >> This SF.net email is sponsored by: Microsoft Defy all challenges.
> >> Microsoft(R) Visual Studio 2005.
> >> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> >> _______________________________________________
> >> Nfsen-discuss mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/nfsen-discuss
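To make Werner's index-versus-scan argument concrete, here is a small sketch in Python. It uses the standard-library sqlite3 module instead of MySQL (an assumption for portability; SQLite also keeps its indexes as B-trees). The table layout, the column names start_time/src_ip/dst_ip, the index name idx_flows_dst and the sample flows are all hypothetical and are not the nfcapd, nfdump or flowd schema. scan_hosts compares every flow in the time window against the subnet, roughly the kind of work Werner describes nfdump doing; indexed_hosts stores IPv4 addresses as integers so the subnet becomes one contiguous range that the query planner can look up in the dst_ip index.

```python
import ipaddress
import sqlite3

# Hypothetical sample flows: (start_time in seconds, src_ip, dst_ip).
FLOWS = [
    (100, "10.0.0.1", "192.168.1.10"),
    (200, "10.0.0.2", "192.168.1.20"),
    (300, "10.0.0.3", "172.16.0.5"),
    (5000, "10.0.0.4", "192.168.1.30"),
]


def scan_hosts(flows, subnet, t_start, t_end):
    """Linear scan: test every flow in the time window against the subnet."""
    net = ipaddress.IPv4Network(subnet)
    return sorted({src for t, src, dst in flows
                   if t_start <= t < t_end and ipaddress.IPv4Address(dst) in net})


def build_db(flows):
    """Load flows into an in-memory SQLite table with a B-tree index on dst_ip."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE flows (start_time INTEGER, src_ip INTEGER, dst_ip INTEGER)")
    # Store IPv4 addresses as integers so a subnet maps to one contiguous range.
    db.executemany(
        "INSERT INTO flows VALUES (?, ?, ?)",
        [(t, int(ipaddress.IPv4Address(s)), int(ipaddress.IPv4Address(d)))
         for t, s, d in flows],
    )
    db.execute("CREATE INDEX idx_flows_dst ON flows (dst_ip)")
    return db


def indexed_hosts(db, subnet, t_start, t_end):
    """Range query on dst_ip that the planner can answer through the index."""
    net = ipaddress.IPv4Network(subnet)
    params = (int(net.network_address), int(net.broadcast_address), t_start, t_end)
    sql = ("SELECT DISTINCT src_ip FROM flows "
           "WHERE dst_ip BETWEEN ? AND ? AND start_time >= ? AND start_time < ?")
    hosts = sorted(str(ipaddress.IPv4Address(row[0]))
                   for row in db.execute(sql, params))
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    plan = [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]
    return hosts, plan


db = build_db(FLOWS)
hosts, plan = indexed_hosts(db, "192.168.1.0/24", 0, 3600)
print(hosts)
print(plan)
```

With these four sample flows, both functions return 10.0.0.1 and 10.0.0.2 for 192.168.1.0/24 during the first hour. The printed plan should contain a step that names idx_flows_dst, though the exact wording differs between SQLite versions. Four rows say nothing about real-world speed; only a measurement like the one Werner proposes can settle that.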
