Hi everyone,

I've been exploring using Ferret to index large volumes of production log files. Right now we have a homemade system for searching through the logs: you specify a date/time range and it greps through the relevant files, which can take a very long time.
My initial tests (on 2 GB of log files) have been promising. I've taken two separate approaches.

The first is loading each line of each log file as a "document". The upside is that a search returns individual log lines as results, which is what I want. The downside is that indexing takes a very long time and the index is very large, even when I don't store the contents of the lines. This approach is not viable for indexing all of our logs.

The second approach is indexing each whole log file as a document. This is relatively fast (211 seconds for the 2 GB of logs), and the index comes to a nice 12% of the sample size. The downside is that after figuring out which files match your search terms, you have to crawl through each "hit" file to find the relevant lines.

For the sake of full disclosure: at any given time we keep roughly 30 days of logs, which comes to about 800 GB of log files. Each file is roughly 15 MB before it gets rotated.

Has anyone else tackled a problem like this who can offer ideas on how to go about searching these logs? The best idea I've come up with (not implemented yet, so I have no real numbers) is to index a certain window of logs by line, say the last 2 days, and another window by file, say the last week. That would give fast results for the most recent logs, and you would just have to be patient with the slightly older ones.

Any ideas/help?

Thanks,
Chris
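
P.S. In case it makes the two approaches clearer, here is a minimal sketch of what I mean, using Ferret's basic Index::Index API. The paths, field names, and the example query are just placeholders, not my actual setup.

require 'rubygems'
require 'ferret'

LOG_GLOB = '/var/log/app/*.log'   # placeholder path

# Approach 1: one document per log line.
# Searches return individual lines directly, but the index gets big
# (you'd probably also want to turn off storage of the :line field).
line_index = Ferret::Index::Index.new(:path => '/tmp/logs_by_line')
Dir.glob(LOG_GLOB).each do |path|
  File.open(path) do |f|
    f.each_with_index do |line, n|
      line_index << { :file => path, :line_no => n + 1, :line => line }
    end
  end
end

# Approach 2: one document per log file.
# Much faster to build and a smaller index, but hits need a second pass.
file_index = Ferret::Index::Index.new(:path => '/tmp/logs_by_file')
Dir.glob(LOG_GLOB).each do |path|
  file_index << { :file => path, :content => File.read(path) }
end

# Searching approach 2: find the matching files, then grep them
# to recover the actual matching lines.
file_index.search_each('content:"connection timed out"') do |doc_id, score|
  path = file_index[doc_id][:file]
  File.open(path) do |f|
    f.grep(/connection timed out/) { |line| puts "#{path}: #{line}" }
  end
end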

