[Ferret-talk] A few questions: Tweaking StemFilter, indexes, ...

Carl Lerche Sun, 21 Jan 2007 09:11:14 -0800

Hello all,

I am new to the list, but I have been using ferret for a little bit
already. I would first like to thank Dave for all his work on ferret.


I had a few questions that I haven't been able to figure out after
messing around with ferret and going through the documentation.

StemFilter
----------

I am trying to improve the quality of my searches in the context of my
application's content. I have created an analyzer using the
following:

  StemFilter.new(
    StopFilter.new(
      LowerCaseFilter.new(StandardTokenizer.new(text)),
      @stop_words))

This has worked well so far. However, I would really like a search for
"plumber" to match "plumbing", ideally at a lower score than it would
match "plumbers". The problem is that "plumber" and "plumbers" are both
stemmed to "plumber", while "plumbing" is stemmed to "plumb", so they
don't match. Is there any way to tweak the filter to make these matches?
I would like related nouns and verbs to match each other (ideally with
a lower score than different conjugations of the same verb would get).
Another example would be "driving" and "driver".

Worst case scenario, I could probably preprocess the search queries to
expand "plumber" or "driving" into a query that includes both stems
(for example, expand the query for "plumber" to "plumber plumb").

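To make the preprocessing idea concrete, here is a rough sketch of what
I have in mind. It is plain Ruby and does not touch ferret itself. The
RELATED_STEMS table, the expand_query helper and the 0.5 boost are all
made up for illustration, and I am assuming the query parser accepts
"OR" and a "^boost" suffix on a term, which I have not verified against
the ferret version I am running.

```ruby
# Hypothetical, hand-maintained map from a query word to related stems.
# The entries and the default boost are illustrative assumptions only.
RELATED_STEMS = {
  "plumber"  => ["plumb"],
  "plumbers" => ["plumb"],
  "driving"  => ["driver"],
  "driver"   => ["drive"],
}.freeze

# Expand each word of a plain query into "(word OR related^boost)".
# Words without an entry in the map are passed through unchanged.
def expand_query(query, related = RELATED_STEMS, boost = 0.5)
  query.downcase.split.map do |word|
    extras = related.fetch(word, [])
    next word if extras.empty?

    alternatives = extras.map { |stem| "#{stem}^#{boost}" }
    "(#{([word] + alternatives).join(' OR ')})"
  end.join(" ")
end

puts expand_query("cheap plumber")
```

The lower boost on the added stem is meant to give the "plumbing"
matches a lower score than the "plumbers" matches, but maintaining the
table by hand is exactly the part I would like to avoid.
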
Indexes
-------

I was wondering how exactly indexes are implemented under the hood and
if there is a way to give hints to ferret as to how our queries will
be formed in order to optimize performance. Maybe I'm thinking of
ferret too much as a database, but I am not too familiar with what's
under ferret's hood.

The reason I ask is that for the project I am working on, I have huge
amounts of text to search, but each item also has a location
associated with it (longitude and latitude), and each query will only
want to search the text located in a specific area (point and radius).
I can add range clauses on the coordinates to the query and that will
work, but is that optimal? Hopefully I am making sense.
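
To show what I mean by the range approach, this is roughly how I turn a
point and radius into latitude/longitude ranges, with an exact distance
check for filtering the hits afterwards. It is only a sketch in plain
Ruby: bounding_box and distance_km are my own helper names, the box is
an approximation that ignores the poles and the 180 degree meridian,
and the Earth radius is the usual mean value. I am also assuming the
coordinates would have to be indexed in some fixed-width, sortable
string form for range clauses to compare correctly.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius

# Approximate lat/lon box around a circle of radius_km centred on a point.
# Not valid near the poles or across the 180 degree meridian.
def bounding_box(lat, lon, radius_km)
  dlat = (radius_km / EARTH_RADIUS_KM) * 180.0 / Math::PI
  dlon = dlat / Math.cos(lat * Math::PI / 180.0)
  { min_lat: lat - dlat, max_lat: lat + dlat,
    min_lon: lon - dlon, max_lon: lon + dlon }
end

# Great-circle distance (haversine), used to discard hits that fall in
# the corners of the box but outside the actual radius.
def distance_km(lat1, lon1, lat2, lon2)
  rad  = Math::PI / 180.0
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

box = bounding_box(45.5, -122.6, 25.0)
puts box.inspect
```

The four values in the box would become the bounds of the two range
clauses, and distance_km would be applied to each hit that comes back.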

Donations
---------

I was wondering if there is a page that lists the total amount of
donations so far?

Thanks,
-carl

-- 
EPA Rating: 3000 Lines of Code / Gallon (of coffee)
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
