On Wed, Aug 23, 2006 at 03:30:46AM +0900, David Balmain wrote:
> On 8/22/06, Benjamin Krause <[EMAIL PROTECTED]> wrote:
> >
> > >> Let's suppose I index a User on the phrase "Ruby on Rails."  If I then
> > >> search using User.find_by_contents("Ruby on Rails") I get no results,
> > >> since "on" is a common term and does not get indexed.  Of course,
> > >> User.find_by_contents("Ruby Rails") works just fine.
> > >
[..]
> 
> This shouldn't be necessary. What Jens said is correct. If you use the
> same analyzer in your indexer as you use in your query parser then a
> search for "Ruby on Rails" should work. If you use the Index::Index
> class this will be handled for you.

Since this problem has come up several times recently, I ran some tests.
I think I found a pattern that leads to the query being analyzed
incorrectly when using the Index::Index class:

  def test_stopwords
    i = Ferret::Index::Index.new(
            :occur_default => Ferret::Search::BooleanClause::Occur::MUST,
            :default_search_field => '*')
    d = Ferret::Document::Document.new

    # adding this additional field to the document leads to failure below
    # comment out this statement and all tests pass:
    d << Ferret::Document::Field.new('id', '1', 
                                     Ferret::Document::Field::Store::YES,
                                     Ferret::Document::Field::Index::UNTOKENIZED)

    d << Ferret::Document::Field.new('content', 'Move or shake', 
                                     Ferret::Document::Field::Store::NO,
                                     Ferret::Document::Field::Index::TOKENIZED,
                                     Ferret::Document::Field::TermVector::NO,
                                     false, 1.0)
    i << d
    hits = i.search 'move nothere shake'
    assert_equal 0, hits.size
    hits = i.search 'move shake'
    assert_equal 1, hits.size
    hits = i.search 'move or shake'
    assert_equal 1, hits.size # fails when id field is present
  end


The id field is constructed the same way we do it in aaf
(acts_as_ferret). I tried some variations of how the field is
constructed (another name, other flags), but as soon as the document has
more than one field, the last assertion fails.

Setting :default_search_field to 'content' instead of '*' makes all the
assertions pass, btw.
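
To make Dave's point about using the same analyzer on both sides
concrete, here is a minimal sketch in plain Ruby. It does not use Ferret
at all: STOP_WORDS, analyze and ToyIndex are names I made up for
illustration, and the stop word list is an assumption. It only shows why
a required term that was dropped at index time can never match unless
the query goes through the same analysis. It makes no claim about what
Ferret actually does internally for the '*' field.

```ruby
require 'set'

# Hypothetical stop word list; a real analyzer ships its own.
STOP_WORDS = Set.new(%w[a an and or on the])

# Lowercase the text, split it into word tokens, drop stop words.
def analyze(text)
  text.downcase.scan(/\w+/).reject { |t| STOP_WORDS.include?(t) }
end

# Toy index: each document is stored as the set of its analyzed terms.
class ToyIndex
  def initialize
    @docs = []
  end

  def <<(text)
    @docs << Set.new(analyze(text))
    self
  end

  # Every query term is required (comparable to Occur::MUST).
  # analyze_query decides whether the query is run through the same
  # analyzer that was used at index time.
  def search(query, analyze_query: true)
    terms = analyze_query ? analyze(query) : query.downcase.scan(/\w+/)
    @docs.each_index.select { |i| terms.all? { |t| @docs[i].include?(t) } }
  end
end

index = ToyIndex.new
index << 'Move or shake'

# Same analyzer on both sides: "or" is dropped from the query as well.
matched = index.search('move or shake')

# Query not analyzed: "or" becomes a required term that was never indexed.
unmatched = index.search('move or shake', analyze_query: false)
```

In this toy model, matched contains the document and unmatched is empty,
which is the same symptom as the failing assertion above.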

Dave, any suggestions?

Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       [EMAIL PROTECTED]
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk