Hi Stas,

I just updated the search site for Apache.org with a newer version of swish. The context highlighting is a bit silly, but that can be fixed. I'm only caching the first 15K of text from each page for context highlighting.
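Here is a minimal sketch of how a cached-prefix approach to context highlighting might work. It is not the swish.cgi code. The helper names `cache_text` and `highlight_snippet` are hypothetical, and "15K" is assumed to mean 15 * 1024 characters. Assuming the full page is still indexed, a term that first appears past the cutoff can still match a search, but the snippet builder only sees the cached prefix, so that hit comes back without a highlighted excerpt.

```python
from __future__ import annotations

import html
import re

# Assumption: "15K" taken as 15 * 1024 characters of page text.
CACHE_LIMIT = 15 * 1024


def cache_text(page_text: str) -> str:
    """Keep only the first CACHE_LIMIT characters of a page for snippets."""
    return page_text[:CACHE_LIMIT]


def highlight_snippet(cached: str, term: str, width: int = 40) -> str | None:
    """Return up to `width` characters on each side of the first
    case-insensitive match of `term`, with the match wrapped in <b>...</b>.
    Returns None when the term is not in the cached prefix."""
    match = re.search(re.escape(term), cached, re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(cached), match.end() + width)
    # Escape the page text so it is safe to drop into an HTML results page.
    before = html.escape(cached[start:match.start()])
    hit = html.escape(match.group(0))
    after = html.escape(cached[match.end():end])
    return f"{before}<b>{hit}</b>{after}"
```

The trade-off is memory and speed against completeness. A small fixed prefix per page keeps the cache cheap for ~35,000 documents, at the cost of missing excerpts for terms that only show up deep in long pages.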
It's at http://search.apache.org and seems reasonably fast (it's not running under mod_perl currently, but it could, if mod_perl were on that server ;). It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org, so indexing the mod_perl list (and others) shouldn't be much trouble, I'd think, given fewer and smaller documents.

It doesn't do incremental indexing at this point, which is a drawback, but indexing is so fast that it normally doesn't matter (and for something like a mailing list there's an easy work-around to pick up new messages as they come in during the day).

Swish-e can also call a Perl program that feeds docs to swish. That makes it easy to parse the email into fields for something like:

http://swish-e.org/Discussion/search/swish.cgi

which looks a lot like the Apache search site... But what would be needed is a good threaded mail archiver, and I'd expect there are many to pick from.

> Some archives are browsable, but their search engines simply suck. e.g.
> marc.theaimsgroup.com I think is the only one that archives
> [EMAIL PROTECTED], but if you try to search for a perl string like
> APR::Table::FETCH it won't find anything. If you search for
> get_dir_config it will split it into 'get', 'dir', 'config' and give you
> a zillion matches when you know that there are just a few.

With swish you could say that ":" and "_" are part of words, and then those strings would be indexed as whole words. Or you could simply search for the phrase "get_dir_config", which would be searched as the phrase "get dir config" and would probably find what you want.

Maybe ":" and "_" are OK in words, but you have to think carefully about other characters. In many cases it's more flexible to split the words and use phrase searches.

Bill Moseley
mailto:[EMAIL PROTECTED]
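The word-splitting trade-off discussed above can be shown with a toy positional index. This is not swish-e's tokenizer or index format. As far as I know, the equivalent knob in swish-e is the WordCharacters setting in its config file; this sketch only models the idea. The names `tokenize`, `build_index`, `all_words_search`, and `phrase_search` are hypothetical. Letters, digits, and the given extra characters count as word characters, and everything else is a separator. Because word positions are stored, a phrase query can require the words to be adjacent and in order.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str, word_chars: str) -> list[str]:
    """Lowercased words; letters, digits, and word_chars are word characters,
    everything else is a separator (a simplified model, not swish-e's rules)."""
    pattern = "[A-Za-z0-9" + re.escape(word_chars) + "]+"
    return [w.lower() for w in re.findall(pattern, text)]


def build_index(docs: dict[str, str], word_chars: str):
    """Positional inverted index: word -> doc_id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text, word_chars)):
            index[word][doc_id].append(pos)
    return index


def all_words_search(index, query: str, word_chars: str) -> set[str]:
    """Docs containing every query word, anywhere and in any order."""
    words = tokenize(query, word_chars)
    if not words:
        return set()
    docs = set(index.get(words[0], {}))
    for w in words[1:]:
        docs &= set(index.get(w, {}))
    return docs


def phrase_search(index, query: str, word_chars: str) -> set[str]:
    """Docs where the query words appear consecutively, in order."""
    words = tokenize(query, word_chars)
    hits = set()
    for doc in all_words_search(index, query, word_chars):
        positions = [set(index[w][doc]) for w in words]
        # Phrase match: word i must occur at position start + i for every i.
        if any(all(start + i in positions[i] for i in range(len(words)))
               for start in positions[0]):
            hits.add(doc)
    return hits


if __name__ == "__main__":
    docs = {
        "a": "Use $r->get_dir_config to read PerlSetVar values.",
        "b": "get the dir listing, then edit the config file.",
    }
    split = build_index(docs, "")  # default: "_" and ":" split words
    print(sorted(all_words_search(split, "get_dir_config", "")))  # ['a', 'b']
    print(sorted(phrase_search(split, "get_dir_config", "")))     # ['a']
```

With the default splitting, get_dir_config matches any document containing "get", "dir", and "config" somewhere, while the phrase query keeps only documents where they appear in that order. Adding "_" and ":" to the word characters makes get_dir_config and APR::Table::FETCH single terms. Adding something like "." instead would leave sentence-ending periods glued to words, so "values." would no longer match a search for "values".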