Hello,

I'm using: Ruby 1.8.6, Rails 1.2.3, ferret 0.11.4, acts_as_ferret from
svn stable.

I've had quite a day wrestling with trying to remove the use of
stopwords.  The problem was that when searching for words like "no" or
"the", no results were found.  Along the way I ran into a confusing
behavior that took me some time to figure out, and I hope sharing it
saves someone else some time.

From searching around online and in the source code I came up with the
following config in my ActiveRecord model:

  acts_as_ferret({:fields => {:name        => {:boost => 10},
                              :type        => {:boost => 2},
                              :email       => {:boost => 10},
                              :bio         => {:store => :no},
                              :status_id   => {:boost => 1}},
                  :store_class_name => true,
                  :remote => true,
                  :ferret => { :analyzer => Ferret::Analysis::StandardAnalyzer.new([]) }
                  } )

With the StandardAnalyzer added (configured with an empty stopword
list), I do find results with "no" or "the".
The complicating factor is that, as you can see, I have a field
"status_id".  This field lets me filter for profiles that are
published or draft in my CMS.

Before I added the StandardAnalyzer, the status_id field worked fine
in queries like this:

a = Profile.find_by_contents("smith status_id:100")
a.total_hits
=> 2 # this is correct, only 2 are published

a = Profile.find_by_contents("smith")
a.total_hits
=> 4 # this is correct, there are 4 total

So, you can see that the status_id was automatically "AND"-ed to the
query word.

However, after adding the above StandardAnalyzer config, the status_id
was now "OR"-ed, like so:

a = Profile.find_by_contents("no")
a.total_hits
=> 5 # this is good

a = Profile.find_by_contents("no status_id:100")
a.total_hits
=> 208 # this is bad -- it's the same as if I only searched for status_id:100.

a = Profile.find_by_contents("smith status_id:100")
a.total_hits
=> 208 # this is just as bad -- it's the same as if I only searched for status_id:100.

The fix here is to add the AND keyword explicitly to the query:

a = Profile.find_by_contents("smith AND status_id:100")
a.total_hits
=> 2 # works just like before.
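
To avoid depending on the parser's default operator, the query string can be built with the AND keyword in code. This is a minimal sketch: and_query is a hypothetical helper, not part of Ferret or acts_as_ferret, and it only joins plain clauses (it does not escape or quote anything).

```ruby
# Hypothetical helper (not part of acts_as_ferret): joins query clauses
# with an explicit AND so the result does not depend on the parser default.
def and_query(*clauses)
  clauses.flatten.map { |c| c.to_s.strip }.reject { |c| c.empty? }.join(" AND ")
end

# Intended use, assuming the Profile model from this post:
#   Profile.find_by_contents(and_query("smith", "status_id:100"))
puts and_query("smith", "status_id:100")
```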

In fact, OR becomes the default search regardless of whether I use a
field in the query:

a = Profile.find_by_contents("smith jones")
a.total_hits
=> 5 # OR'ed results

a = Profile.find_by_contents("smith AND jones")
a.total_hits
=> 0

Again, before StandardAnalyzer, "AND" was the default so the first
"smith jones" query would have returned 0 as it should.

Any insight as to why this might be?  I would prefer AND to be the default.
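
One option that may be worth checking (untested here): Ferret's query parser has an :or_default option, and the :ferret hash appears to be the place where acts_as_ferret accepts it. A sketch of that part of the options, assuming the hash is passed through to the index unchanged; FERRET_OPTIONS is just an illustrative name.

```ruby
# Sketch only: the :ferret portion of the acts_as_ferret options.
# :or_default => false asks the query parser to require every term (AND);
# true would make terms optional (OR).
FERRET_OPTIONS = {
  :or_default => false
  # :analyzer => Ferret::Analysis::StandardAnalyzer.new([])  # as in the config above
}
```

This would then be given as :ferret => FERRET_OPTIONS in the acts_as_ferret call.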

Thanks,

Doug
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
