On Fri, 19 May 2000, Stas Bekman wrote:

> On Thu, 18 May 2000, Matt Sergeant wrote:
> 
> > One more point... The indexer or the searcher (or both) has a broken
> > tokenizer for anything involving perl. Try searching for
> > Apache::Constants, for example.
> 
> That's right. It's broken :( After searching for 'Apache::Constants' I've
> got 'apach constant'... 

Just to expand on this - I turned stemming of words on by default
in the search, which is why the stemmed words get returned. Perhaps
it would be better to turn stemming off by default and
make it a configurable option instead?

> The :: are stripped on the fly, since these cannot be used in index, so
> when you look for Foo::Bar you are actually looking for 'Foo && Bar'.

That's a limitation of swish-e - you can configure it to
index characters like $, !, ... as part of a "word", but
the characters >, <, *, and : cannot be indexed that way. So the
script silently strips ':' out, leaving the search term
as 'Apache' && 'Constants'. This should be mentioned
on the search page.
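
A rough sketch of the stripping step described above, in Python for
illustration only. The function name and the choice to replace each
unindexable character with a space are assumptions; this is not the
actual search script.

```python
# Characters that, per the note above, swish-e cannot index as part of a word.
UNINDEXABLE = "><*:"


def normalize_query(query: str) -> str:
    """Drop unindexable characters and AND together the words that remain.

    Hypothetical helper: it only mimics the behaviour described in this
    thread, where 'Foo::Bar' ends up as a search for 'Foo && Bar'.
    """
    # Map every unindexable character to a space so the pieces split apart.
    table = str.maketrans({ch: " " for ch in UNINDEXABLE})
    words = query.translate(table).split()
    return " && ".join(words)


print(normalize_query("Apache::Constants"))  # prints: Apache && Constants
```

Under these assumptions, a module name can only be matched as the
conjunction of its parts, never as one token.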

Another thing that was configured in is that words have
to be at least 3 characters long, which seems reasonable,
and there are also some stopwords that don't get indexed,
as they're too common. The list of stopwords is built
by hand - so far it only includes 'perl' and 'modperl'.
Also, the maximum number of hits is set at 30.
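
The three limits above could be pictured like this, again as a Python
sketch rather than the real swish-e configuration. The constant and
function names are hypothetical, and the case-insensitive stopword
comparison is an assumption.

```python
MIN_WORD_LENGTH = 3              # words shorter than this are not indexed
STOPWORDS = {"perl", "modperl"}  # hand-built list, as given above
MAX_HITS = 30                    # at most this many results are returned


def indexable_words(words):
    """Keep only the words that would make it into the index."""
    return [
        w for w in words
        if len(w) >= MIN_WORD_LENGTH and w.lower() not in STOPWORDS
    ]


def cap_hits(hits):
    """Truncate a result list to the configured maximum."""
    return hits[:MAX_HITS]
```

So a search for 'perl' alone would find nothing under these settings,
since that word never reaches the index.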

best regards,
randy
