Hi!

On Wed, Jan 17, 2007 at 01:05:14PM +0700, Stian Haklev wrote:
[..]
> but is there a way I could easily configure the
> "tokenizing" behaviour (let me know if my terminology is wrong) to
> split for example "applications/entries" into two words, searchable by
> themselves?

Your terminology is correct; the tokenizer is responsible for splitting
document content into single terms.
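
To illustrate the idea in plain Ruby (just a sketch, not Ferret's actual
API): a tokenizer that splits on runs of non-letter characters would break
"applications/entries" into the two terms you're after:

```ruby
# Plain-Ruby sketch of the tokenizing idea (not Ferret's API):
# split on runs of non-letter characters, drop empties, lower-case.
def tokenize(str)
  str.split(/[^A-Za-z]+/).reject(&:empty?).map(&:downcase)
end

tokenize("applications/entries")  # => ["applications", "entries"]
```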

You can get an idea of how this works at
http://ferret.davebalmain.com/api/classes/Ferret/Analysis.html

If you want to use a custom tokenizer you'll have to write your own
analyzer which then makes use of that tokenizer. Don't be afraid, this
is really easy:

class MyAnalyzer < Ferret::Analysis::Analyzer
  include Ferret::Analysis  # brings StandardTokenizer etc. into scope

  def token_stream(field, str)
    StemFilter.new(LowerCaseFilter.new(StandardTokenizer.new(str)))
  end
end
(from
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html)
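
If it helps to see what that tokenizer/filter chain does conceptually,
here's a toy plain-Ruby version (my own illustrative classes, not
Ferret's real ones): each filter wraps a token stream and transforms the
terms as they pass through.

```ruby
# Toy token stream classes mimicking the tokenizer + filter chain above.
# Illustration only; Ferret's real classes are different (and written in C).
class ToyTokenizer
  include Enumerable
  def initialize(str)
    @terms = str.scan(/[A-Za-z]+/)  # emit runs of letters as terms
  end
  def each(&block)
    @terms.each(&block)
  end
end

class ToyLowerCaseFilter
  include Enumerable
  def initialize(stream)
    @stream = stream  # the wrapped token stream
  end
  def each
    @stream.each { |term| yield term.downcase }
  end
end

chain = ToyLowerCaseFilter.new(ToyTokenizer.new("Applications/Entries"))
chain.to_a  # => ["applications", "entries"]
```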

Hope this gets you started.

Cheers,
Jens


-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       [EMAIL PROTECTED]
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk