Hello all, I am new to the list, but I have been using Ferret for a little while already. I would first like to thank Dave for all his work on Ferret.
I have a few questions that I haven't been able to figure out after experimenting with Ferret and going through the documentation.

StemFilter
----------
I am trying to improve the quality of searches over the content of my application. I have created an analyzer using the following:

    StemFilter.new(StopFilter.new(LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words))

This has worked pretty well so far. However, I would really like a search for "plumber" to match "plumbing", perhaps at a lower score than it would match "plumbers". The problem is that "plumber" and "plumbers" are both stemmed to "plumber", while "plumbing" is stemmed to "plumb", so they don't match. Is there any way to tweak the filter to make these matches? I would like nouns and verbs from the same root to match each other (ideally at a lower score than different conjugations of the same verb would match). Another example would be "driving" and "driver".

Worst case, I could preprocess the search queries to expand "plumber" or "driving" into a query that includes both stems (for example, expanding the query for "plumber" to "plumber plumb").

Indexes
-------
I was wondering how exactly indexes are implemented under the hood, and whether there is a way to give Ferret hints about how our queries will be formed in order to optimize performance. Maybe I'm thinking of Ferret too much as a database, but I am not very familiar with what's under its hood.

The reason I ask is that the project I am working on has huge amounts of text to search, but each item also has a location (longitude and latitude) associated with it, and each query only needs to search the text located within a specific area (a point and a radius). I can add range parameters to the query and that works, but is it optimal? Hopefully I am making sense.

Donations
---------
Is there a page that lists the total amount of donations so far?
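The worst-case workaround from the StemFilter question (expanding the query before it reaches Ferret) could look roughly like the Ruby sketch below. RELATED_TERMS and expand_query are hypothetical names, the table has to be maintained by hand, and the output assumes Ferret's query parser accepts Lucene-style `OR` and `term^boost` syntax, which is worth confirming in the QueryParser documentation. Because the table stores surface words, parsing the expanded query with the same StemFilter analyzer should still reduce "plumbing" to "plumb", giving the "plumber plumb" query described above.

```ruby
# Hypothetical, hand-maintained table: query word => related surface forms
# that the stemmer does not conflate with it (noun <-> verb forms).
RELATED_TERMS = {
  "plumber"  => ["plumbing"],
  "plumbing" => ["plumber"],
  "driver"   => ["driving"],
  "driving"  => ["driver"]
}.freeze

# Rewrites a plain query string so each word with related forms becomes
# "(word OR related^boost)". The lower boost is meant to rank matches on
# the related forms below matches on the word's own stem.
def expand_query(query, related = RELATED_TERMS, boost: 0.5)
  words = query.downcase.split.map do |word|
    extras = related.fetch(word, []).map { |term| "#{term}^#{boost}" }
    extras.empty? ? word : "(#{[word, *extras].join(' OR ')})"
  end
  words.join(" ")
end

puts expand_query("plumber")          # => (plumber OR plumbing^0.5)
puts expand_query("Driving lessons")  # => (driving OR driver^0.5) lessons
```

The boost value of 0.5 is an arbitrary starting point; it would need tuning against real results.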
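For the point-and-radius restriction in the Indexes question, one common technique (not a Ferret feature, just a general approach) is to convert the circle into a latitude/longitude bounding box, use the box for the range parameters, and then discard the few hits in the box corners with an exact great-circle distance check. The Ruby sketch below assumes a spherical Earth of radius 6371 km, does not wrap boxes that cross the ±180° meridian, and uses hypothetical method names. The `sortable` helper assumes the range query compares terms as strings, as Lucene-style indexes usually do, so coordinates are shifted to be non-negative and zero-padded to a fixed width; the same encoding would have to be used when indexing the location fields.

```ruby
EARTH_RADIUS_KM = 6371.0            # mean Earth radius; spherical model assumed
DEG_PER_RAD = 180.0 / Math::PI

# Returns [min_lat, max_lat, min_lng, max_lng] in degrees for a box that
# encloses every point within radius_km of (lat, lng). Boxes crossing the
# +/-180 meridian are not wrapped (a limitation of this sketch).
def bounding_box(lat, lng, radius_km)
  ang = radius_km / EARTH_RADIUS_KM # angular radius in radians
  min_lat = lat - ang * DEG_PER_RAD
  max_lat = lat + ang * DEG_PER_RAD
  if min_lat <= -90.0 || max_lat >= 90.0
    # A pole lies inside the circle, so every longitude is in range.
    return [[min_lat, -90.0].max, [max_lat, 90.0].min, -180.0, 180.0]
  end
  dlng = Math.asin(Math.sin(ang) / Math.cos(lat / DEG_PER_RAD)) * DEG_PER_RAD
  [min_lat, max_lat, lng - dlng, lng + dlng]
end

# Great-circle distance in km (haversine formula).
def haversine_km(lat1, lng1, lat2, lng2)
  p1 = lat1 / DEG_PER_RAD
  p2 = lat2 / DEG_PER_RAD
  dp = p2 - p1
  dl = (lng2 - lng1) / DEG_PER_RAD
  a = Math.sin(dp / 2)**2 + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Hypothetical encoding for string-compared range queries: shift to a
# non-negative value and zero-pad to a fixed width so string order matches
# numeric order (offset 90 for latitude, 180 for longitude).
def sortable(value, offset)
  format("%010.6f", value + offset)
end

center_lat, center_lng, radius = 40.0, -75.0, 10.0
min_lat, max_lat, min_lng, max_lng = bounding_box(center_lat, center_lng, radius)
puts "lat: #{sortable(min_lat, 90)} .. #{sortable(max_lat, 90)}"
puts "lng: #{sortable(min_lng, 180)} .. #{sortable(max_lng, 180)}"

# After the range-restricted search, drop hits that fall in the box corners.
hits = [[40.05, -75.05], [40.08, -74.89]]
inside = hits.select { |la, ln| haversine_km(center_lat, center_lng, la, ln) <= radius }
p inside # => [[40.05, -75.05]]
```

The second sample hit lies inside the box but about 13 km from the center, which illustrates why the exact distance check is still needed after the range query.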
Thanks,
-carl

--
EPA Rating: 3000 Lines of Code / Gallon (of coffee)
_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk

