Hi,

I've implemented synonym searching in my Rails application, and I have
an idea for extending it that I can't figure out how to implement. I'd
like to let the end user choose whether or not to search for the
synonyms of a word, preferably by extending the query language to parse
a construct similar to '%word1' and then have the word turned into an
OR list (i.e., word1|word2|word3|...).

Currently, the query parser calls SynonymTokenFilter for every token,
so synonyms are always expanded. Is there a way to expand only the
tokens the user has marked?

Here's an overview of what I've done so far:

My model classes in my rails app use acts_as_ferret with a call that  
looks like:

acts_as_ferret(
     :fields => [:body],
     :store_class_name => true,
     :ferret => {
         :or_default => false,
         :analyzer => SynonymAnalyzer.new(WordnetSynonymEngine.new, [])
     }
)


I created a SynonymAnalyzer and SynonymTokenFilter:

class SynonymAnalyzer < Ferret::Analysis::Analyzer
   include Ferret::Analysis

   def initialize(synonym_engine, stop_words = FULL_ENGLISH_STOP_WORDS, lower = true)
     @synonym_engine = synonym_engine
     @lower = lower
     @stop_words = stop_words
   end

   def token_stream(field, str)
     ts = StandardTokenizer.new(str)
     ts = LowerCaseFilter.new(ts) if @lower
     ts = StopFilter.new(ts, @stop_words)
     ts = SynonymTokenFilter.new(ts, @synonym_engine)
   end
end

class SynonymTokenFilter < Ferret::Analysis::TokenStream
   include Ferret::Analysis

   def initialize(token_stream, synonym_engine)
     @token_stream = token_stream
     @synonym_stack = []
     @synonym_engine = synonym_engine
   end

   def text=(text)
     @token_stream.text = text
   end

   def next
     return @synonym_stack.pop if @synonym_stack.size > 0

     # Queue synonyms for the next real token; they are emitted on later calls.
     token = @token_stream.next
     add_synonyms_to_stack(token) if token

     return token
   end

   private
   def add_synonyms_to_stack(token)
     synonyms = @synonym_engine.get_synonyms(token.text)

     return if synonyms.nil?

     synonyms.each do |s|
       @synonym_stack.push(
         Token.new(s, token.start, token.end, 0))
     end
   end
end

Finally, a WordnetSynonymEngine that queries the WordNet index I created:

class WordnetSynonymEngine
   include Ferret::Search

   def initialize(index_name = "wordnet")
     @searcher = Searcher.new("#{RAILS_ROOT}/index/#{ENV['RAILS_ENV']}/#{index_name}")
   end

   def get_synonyms(word)
     @searcher.search_each(TermQuery.new(:word, word)) do |doc_id, score|
       return @searcher[doc_id][:syn]
     end

     return nil
   end
end


It works great, except that I'd really like the ability to run tokens
through the SynonymTokenFilter only when they are prefixed with an
unescaped % sign.
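
To make the idea concrete, here is a rough sketch of the kind of rewriting I have in mind, done as a pre-processing step on the raw query string before it reaches the query parser. It is plain Ruby and does not touch Ferret at all. StubSynonymEngine and expand_synonym_markers are hypothetical names made up for illustration, and the stub just stands in for WordnetSynonymEngine. I'm also assuming the parser accepts a parenthesised '|' list as an OR group, which I haven't verified.

```ruby
# Hypothetical stand-in for WordnetSynonymEngine so the sketch runs without an index.
class StubSynonymEngine
  SYNONYMS = { "fast" => ["quick", "rapid"] }

  # Returns an array of synonyms, or nil when the word is unknown.
  def get_synonyms(word)
    SYNONYMS[word]
  end
end

# Hypothetical helper: rewrites each %word that is not preceded by a
# backslash into an OR group such as (word|syn1|syn2). Words without
# synonyms simply lose the % marker.
def expand_synonym_markers(query, engine)
  query.gsub(/(?<!\\)%(\w+)/) do
    word = Regexp.last_match(1)
    synonyms = engine.get_synonyms(word.downcase) || []
    terms = ([word] + synonyms).uniq
    terms.size > 1 ? "(#{terms.join('|')})" : word
  end
end

puts expand_synonym_markers("%fast car", StubSynonymEngine.new)
# prints "(fast|quick|rapid) car"
```

With something like this, the analyzer would presumably have to drop the SynonymTokenFilter so that unmarked words are not expanded as well. What I don't know is whether the same thing can be done inside the query parser itself.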

Also, if anyone is interested, I can post the code for turning the
WordNet Prolog database into a Ferret index (primarily a port of the
Java Lucene program that does the same thing to Ruby and Ferret).

Thanks,
Curtis
_______________________________________________
Ferret-talk mailing list
http://rubyforge.org/mailman/listinfo/ferret-talk
