Bill Moseley wrote:

> Hi Stas,
> 
> I just updated the search site for Apache.org with a newer version of
> swish.  The context highlighting is a bit silly, but that can be fixed.
> I'm only caching the first 15K of text from each page for context
> highlighting.
> 
> http://search.apache.org
> 
> It seems reasonably fast (it's not running under mod_perl currently, but
> could -- if mod_perl was in that server ;).
> 
> It takes about eight or nine minutes to reindex ~35,000 docs on *.apache.org,
> so the mod_perl list (and others) shouldn't be too much trouble, I'd think,
> with smaller numbers and smaller content.
> 
> It doesn't do incremental indexing at this point, which is a drawback, but
> indexing is so fast that it normally doesn't matter (and there's an easy
> workaround for something like a mailing list to pick up new messages as
> they come in during the day).
> 
> Swish-e can also call a perl program which feeds docs to swish.  That makes
> it easy to parse the email into fields for something like:
> 
>   http://swish-e.org/Discussion/search/swish.cgi
> 
> which looks a lot like the Apache search site...
> 
> But what would be needed is a good threaded mail archiver; there are
> many to pick from, I'd expect.
> 
> 
>>Some 
>>archives are browsable, but their search engines simply suck. e.g. 
>>marc.theaimsgroup.com I think is the only one that archives 
>>[EMAIL PROTECTED], but if you try to search for a Perl string like 
>>APR::Table::FETCH it won't find anything. If you search for
>>get_dir_config it will split it into 'get', 'dir', 'config' and give you 
>>a zillion matches when you know that there are just a few.
>>
> 
> On swish you could say that ":" and "_" are part of words, and those would be
> indexed as full words.  Or just search for the phrase "get_dir_config", and it
> would search for the phrase "get dir config", which would probably find what
> you want.
> 
> Maybe : and _ are ok in words, but you have to think carefully about
> others.  It's more flexible to split the words and use phrases in many cases.

Hi Bill,

It's great that search.apache.org has a new engine, but a few simple
tests show that it still doesn't handle what you've just described very
well. When I search for mod_perl, I'm searching for 'mod_perl', not
'mod' and 'perl'. There may be hundreds of pages containing either
'mod_perl' or 'mod' and 'perl' separately, and currently the pages with
'mod_perl' won't get higher relevance than those with just 'mod' and
'perl'. So it's not good.
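To make the tokenization point concrete, here is a minimal Python sketch. It is a simplified model, not Swish-e's actual tokenizer or ranking code: `tokenize`, `matches_all_words`, `matches_phrase`, the `word_chars` parameter and the three sample documents are all hypothetical. It shows that when "_" is not a word character, 'mod_perl' becomes 'mod' + 'perl', so a plain word search also hits a page that merely mentions "mod" and "perl" separately. A phrase search narrows that down, and making "_" a word character narrows it further.

```python
import re

def tokenize(text, word_chars="a-z0-9"):
    """Lowercase text and return runs of word_chars; any other character
    is a separator. Hypothetical simplified model, not Swish-e's tokenizer.
    word_chars is dropped into a regex character class, so a literal '-'
    would have to go last."""
    return re.findall(f"[{word_chars}]+", text.lower())

def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear consecutively in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))

# Hypothetical sample pages.
DOCS = {
    "a": "Install mod_perl and call get_dir_config",
    "b": "The mod rewrite module works with perl scripts",
    "c": "A mod perl build",
}

def matches_all_words(query, word_chars="a-z0-9"):
    """Pages containing every query token anywhere (plain AND search)."""
    q = tokenize(query, word_chars)
    return sorted(name for name, text in DOCS.items()
                  if set(q) <= set(tokenize(text, word_chars)))

def matches_phrase(query, word_chars="a-z0-9"):
    """Pages containing the query tokens as a consecutive phrase."""
    q = tokenize(query, word_chars)
    return sorted(name for name, text in DOCS.items()
                  if contains_phrase(tokenize(text, word_chars), q))

if __name__ == "__main__":
    print(matches_all_words("mod_perl"))           # default: "_" splits words
    print(matches_phrase("mod_perl"))              # phrase "mod perl"
    print(matches_phrase("mod_perl", "a-z0-9_"))   # "_" kept inside words
```

Running it prints `['a', 'b', 'c']`, then `['a', 'c']`, then `['a']`: only the last setting ranks the page that really says "mod_perl" apart from the others.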

Well, we've been through this already with Randy Kobes' version of the
searchable guide (also Swish-E based), which has been tuned to work with
Perl content. You may want to ask Randy for his tuned configuration.
You can compare all three search engines used at
http://perl.apache.org/guide/#search.

Thanks Bill!

_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/
