On Thu, 20 Jul 2006, Arya, Manish Kumar wrote:

> I want to use htdig for searching syslog-ng log
> outputs. I have installed apache and it is running on
> 127.0.0.1:8080 (with document root set to the log dir),
> and in the htdig conf I have given this url for indexing.
> Another apache is running on the public interface to serve
> the htdig html/cgi.
> My Q is: can htdig handle 5-10 G of data per day
> for indexing? I am planning to rebuild the index every
> 6 hrs or so.
I have never worked with this much data on a daily basis, so there is little I can say about the specifics. In general, how much you can index in a given day will depend almost entirely on how much hardware you throw at the problem. About the only way to get a definitive answer to this question, for your circumstances, is to just start indexing and see what happens.

You should also keep in mind that htdig indexes at the file level. If you try to index very large log files, your results are going to be very coarse: a hit might end up telling you nothing more than that a term was found somewhere in a file that is tens or hundreds of megabytes in size.

> second Q, I want to customize htdig to show the complete
> log message in the search output, meaning the message terminated
> at the newline "\n", and I don't want to show the link url in the
> search results (because it points to the 127.0.0.1
> interface, so it's of no use)

If you want to retain all log messages in a manner that allows them to be displayed, you will need to set max_head_length very high (larger than the largest log file). This is going to result in an extremely large database set and significantly increase the required indexing time.

Picking out individual log messages for display in the results is not going to work out of the box. htdig doesn't know anything about what constitutes a log entry, and I don't think any notion of newlines is even retained in the excerpts. I am fairly certain that you would need to hack the code that handles display to even attempt what you need here.

About the only way I can think of to reasonably use ht://Dig for the type of task you are describing is to add an extra component that splits each log file into individual files, one per log entry, and then index and search those files. There are various ways to map URLs if you can overcome the other issues; see, for example, the documentation on url_rewrite_rules, search_rewrite_rules, and url_part_aliases.
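For what it's worth, the settings mentioned above live in your htdig.conf. The values and the rewrite pattern below are only illustrative guesses for a setup like yours, not tested recommendations:

```
# Keep (up to) the first 50 MB of each document so excerpts can
# cover a large file -- this inflates the database set considerably.
max_head_length: 52428800

# Hypothetical rewrite to hide the internal 127.0.0.1 address in
# displayed results; adjust the pattern to your own URL layout.
search_rewrite_rules: http://127\.0\.0\.1:8080/(.*)  /logs/\1
```

You would still need to verify against the attribute documentation that the rewrite does what you want for your URL scheme.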
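The splitting approach could be sketched as below. This is a rough illustration only, assuming one log entry per line (true for typical syslog-ng output) and using made-up paths and file names; it is not part of ht://Dig itself:

```python
#!/usr/bin/env python
# Split a large log file into one small file per log entry, so that
# an ht://Dig hit points at an individual message rather than at a
# multi-gigabyte log file.  Paths/naming are illustrative only.
import os


def split_log(log_path, out_dir):
    """Write each non-blank line of log_path to its own file under
    out_dir.  Returns the number of entry files written."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(log_path, "r", errors="replace") as log:
        for i, line in enumerate(log):
            entry = line.rstrip("\n")
            if not entry:          # skip blank lines
                continue
            out_path = os.path.join(out_dir, "entry-%08d.txt" % i)
            with open(out_path, "w") as out:
                out.write(entry + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Example: split /var/log/messages into ./entries/ and point
    # htdig's start_url at a directory listing of ./entries/.
    n = split_log("/var/log/messages", "entries")
    print("wrote %d entry files" % n)
```

The per-entry files are what you would publish under the Apache document root for indexing; regenerating them on your 6-hour cycle is straightforward, though at 5-10 G/day the sheer number of small files may itself become the bottleneck.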
Jim

_______________________________________________
ht://Dig general mailing list: <[email protected]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.):
https://lists.sourceforge.net/lists/listinfo/htdig-general

