On Mar 31, 2007, at 5:36 PM, Jeff Mallatt wrote:
> I'm getting some results that I don't understand from a search.
>
> index << {:uid => 'one', :title => 'Some Title', :content => 'my
> first text'}
> index << {:uid => 'two', :title => 'Some Title', :content => 'some
> second content'}
> index << {:uid => 'three', :title => 'Other Title', :content => 'my
> third text'}
>
> query(index, 'title:"Some"')
> query(index, 'title:"Title"')
> query(index, 'uid:"two"')
Nice one.
When people don't understand search results, it usually has to do with
stop words. The StandardAnalyzer, which parses both documents and(!)
queries, ignores every word on its stop word list. See
Ferret::Analysis::FULL_ENGLISH_STOP_WORDS for the complete list of
(English) stop words.
In the case of "title:Some", "Some" is removed by the analyzer, leaving
only "title:", i.e. an empty query, which (surprisingly) matches all
documents.
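
To make the mechanism concrete, here is a minimal sketch in plain Ruby. It is
not Ferret's code: analyze is a hypothetical stand-in for an analyzer, and
ILLUSTRATIVE_STOP_WORDS is a made-up subset, not the contents of
FULL_ENGLISH_STOP_WORDS. It only shows how a query term can vanish before
the search ever runs.

```ruby
# Toy stand-in for a stop-word-filtering analyzer; NOT Ferret's implementation.
# ILLUSTRATIVE_STOP_WORDS is a hypothetical subset chosen for this example.
ILLUSTRATIVE_STOP_WORDS = %w[a an and some the].freeze

# Lowercase the text, split it into word tokens, and drop every stop word.
def analyze(text, stop_words = ILLUSTRATIVE_STOP_WORDS)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| stop_words.include?(token) }
end

# The only term in the query is a stop word, so nothing is left to search for.
query_terms = analyze('Some')

# The same filter applied to a document field keeps the non-stop words.
title_terms = analyze('Some Title')

puts "query terms: #{query_terms.inspect}"
puts "title terms: #{title_terms.inspect}"
```

With this toy list the query ends up with no terms at all, while the title
field still keeps "title". What a real engine does with a term-less query is
a separate question.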
However, the same should happen with "content:some", but that query
returns only one document, which leaves me completely puzzled. This
just isn't consistent.
So I'm afraid I can't be of much help here, but I'm sure somebody
else will enlighten us. This may well be a bug, but even if it's
not, it's definitely not what anyone would reasonably expect.
--
@David: You should probably consider changing StandardAnalyzer so that
it doesn't use stop words by default. It confuses people because no one
would expect such a feature to be enabled by default. It just doesn't
follow the principle of least astonishment.
Even if people want to use stop words, they might not be happy with
the ones built into Ferret. It very much depends on the nature of the
content that is indexed, and instead of using a one-size-fits-all stop
word list, one is usually better off compiling a custom one for
any particular application.
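
One possible way to compile such a list, sketched in plain Ruby: treat any
term that shows up in a large share of your documents as a stop word
candidate. This is just a heuristic I'm assuming for illustration, not
something Ferret does for you; candidate_stop_words and the min_fraction
cut-off are hypothetical names, and the three sample strings are the content
fields from the quoted message.

```ruby
# Hypothetical helper: returns the terms that occur in at least
# `min_fraction` of the given documents, sorted alphabetically.
def candidate_stop_words(documents, min_fraction = 0.5)
  doc_freq = Hash.new(0)
  documents.each do |text|
    # uniq so that each term counts once per document (document frequency).
    text.downcase.scan(/[a-z0-9]+/).uniq.each { |term| doc_freq[term] += 1 }
  end
  threshold = documents.size * min_fraction
  doc_freq.select { |_term, count| count >= threshold }.keys.sort
end

docs = ['my first text', 'some second content', 'my third text']
puts candidate_stop_words(docs, 0.6).inspect
```

You would still want to review the candidates by hand before using them,
since a frequent term can be exactly what users search for. How the final
array gets passed to the analyzer is best checked in Ferret's analyzer
documentation.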
Cheers,
Andy