On 9/28/06, William Morgan <[EMAIL PROTECTED]> wrote:
> Hi Dave,
>
> Excerpts from David Balmain's mail of 26 Sep 2006 (PDT):
> > You need to downcase the term when you add it to a TermQuery. The
> > StandardAnalyzer downcases all text so you need to do the same with
> > any terms you add to any hand built queries.
>
> Thanks for the response. Downcasing the string passed into the TermQuery
> does, in fact, retrieve the document. BUT, I had used a
> WhitespaceAnalyzer with no downcasing on that field, so it should have
> preserved case in the index.
>
> In fact, some experimentation shows:
>
> > mid = "[EMAIL PROTECTED]"
> > i = Ferret::Index::Index.new
> > wsa = Ferret::Analysis::WhiteSpaceAnalyzer.new false
> > wsa.token_stream(:message_id, mid).next
> => token["[EMAIL PROTECTED]":0:26:1]
> > i.add_document({:message_id => mid}, wsa)
> > i.search(Ferret::Search::TermQuery.new(:message_id, mid))
> => #<struct Ferret::Search::TopDocs total_hits=0, hits=[], max_score=0.0>
> > i.search(Ferret::Search::TermQuery.new(:message_id, mid.downcase))
> => #<struct Ferret::Search::TopDocs total_hits=1, hits=[#<struct 
> Ferret::Search::Hit doc=0, score=0.3068528175354>], max_score=0.3068528175354>
>
> So it looks like WSA#token_stream does the right thing. Is it possible
> it's not actually being called at insertion time? Or am I
> misunderstanding something?
>
> --
> William <[EMAIL PROTECTED]>

Hi William,

OK, this is definitely a bug. I've already fixed it and it'll be out
in the next release. By the way, you probably already know this, but
you can set the analyzer used by the index:

    Ferret::Index::Index.new(:analyzer => wsa)

You probably have a good reason to be doing it the way you are but I
just wanted to check.
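
For anyone following along, here is a rough sketch of why the analyzer
used at insertion time matters for hand-built term queries. It is plain
Ruby and does not use Ferret at all: ToyIndex, its lowercase option, and
the message id are all made up for illustration, and this is not how
Ferret is implemented internally.

```ruby
# Hypothetical toy inverted index (NOT Ferret's implementation).
class ToyIndex
  def initialize(lowercase: false)
    @lowercase = lowercase
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @doc_count = 0
  end

  # Split on whitespace and optionally downcase, roughly what a
  # whitespace analyzer with a lowercase flag would do.
  def analyze(text)
    tokens = text.split
    @lowercase ? tokens.map(&:downcase) : tokens
  end

  # Analyze the text and record which document each term appears in.
  def add_document(text)
    doc_id = @doc_count
    @doc_count += 1
    analyze(text).uniq.each { |term| @postings[term] << doc_id }
    doc_id
  end

  # Exact term lookup. The term is NOT analyzed here, which mirrors
  # passing a raw string to a hand-built term query.
  def term_search(term)
    @postings.fetch(term, [])
  end
end

mid = "<Msg.ID@Example.COM>" # made-up mixed-case message id

preserving = ToyIndex.new(lowercase: false)
preserving.add_document(mid)

downcasing = ToyIndex.new(lowercase: true)
downcasing.add_document(mid)

p preserving.term_search(mid)          # case kept at index time
p preserving.term_search(mid.downcase)
p downcasing.term_search(mid)          # case dropped at index time
p downcasing.term_search(mid.downcase)
```

With the case-preserving index only the original string matches, and
with the downcasing index only the downcased string matches. The second
pattern is what you saw, which suggests a downcasing analyzer was being
applied at insertion time instead of the one you passed in.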

Cheers,
Dave
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
