Hi everyone,

I've been exploring using Ferret to index large volumes of production log files. Right now we have a homemade system for searching through the logs: you specify a date/time range and it greps through the relevant files, which can take a very long time.
My initial tests (on 2 GB of log files) have been promising. I've taken two separate approaches.

The first is loading each line of each log file as a "document". The upside is that a search returns individual log lines as results, which is what I want. The downside is that indexing takes a very long time and the index is very large, even when I don't store the contents of the lines. This approach is not viable for indexing all of our logs.

The second approach is indexing each whole log file as a document. This is relatively fast (211 seconds for the 2 GB of logs), and the index comes to a nice 12% of the sample size. The downside is that after figuring out which files match your search terms, you have to crawl through each "hit" file to find the relevant lines.

For the sake of full disclosure: at any given time we keep roughly 30 days of logs, which comes to about 800 GB of log files. Each file is roughly 15 MB before it gets rotated.

Has anyone else tackled a problem like this who can offer ideas on how to go about searching these logs? The best idea I've come up with (not implemented yet, so I have no real numbers) is to index a certain window of logs by line, say the last 2 days, and another window by file, say the last week. That would give fast results for the most recent logs, and you would just have to be patient with the slightly older ones.

Any ideas/help?

Thanks,
Chris
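
P.S. In case it makes the two approaches clearer, here is a minimal sketch of what I mean, using Ferret's basic Index::Index API. The paths, field names, and the example query are just placeholders, not my actual setup.

require 'rubygems'
require 'ferret'

LOG_GLOB = '/var/log/app/*.log'   # placeholder path

# Approach 1: one document per log line.
# Searches return individual lines directly, but the index gets big
# (you'd probably also want to turn off storage of the :line field).
line_index = Ferret::Index::Index.new(:path => '/tmp/logs_by_line')
Dir.glob(LOG_GLOB).each do |path|
  File.open(path) do |f|
    f.each_with_index do |line, n|
      line_index << { :file => path, :line_no => n + 1, :line => line }
    end
  end
end

# Approach 2: one document per log file.
# Much faster to build and a smaller index, but hits need a second pass.
file_index = Ferret::Index::Index.new(:path => '/tmp/logs_by_file')
Dir.glob(LOG_GLOB).each do |path|
  file_index << { :file => path, :content => File.read(path) }
end

# Searching approach 2: find the matching files, then grep them
# to recover the actual matching lines.
file_index.search_each('content:"connection timed out"') do |doc_id, score|
  path = file_index[doc_id][:file]
  File.open(path) do |f|
    f.grep(/connection timed out/) { |line| puts "#{path}: #{line}" }
  end
end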

