According to the Analyzer doc and the StandardTokenizer doc:

  http://ferret.davebalmain.com/api/classes/Ferret/Analysis/Analyzer.html

  http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html

I ought to be able to construct a StandardTokenizer like this:

  t = StandardTokenizer.new(true) # true to downcase tokens

and then later:

  stream = t.token_stream(ignored_field_name, some_string)

to create a new TokenStream from some_string. This approach would be
valuable for my application, since I am analyzing many short strings;
avoiding rebuilding my five-deep analyzer chain for each short string
should be a nice savings.
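To make the pattern I'm after concrete, here is a minimal stand-in sketch of "build once, tokenize many". SketchTokenizer is a made-up class for illustration only; it is not Ferret's API, just the shape of reuse the Analyzer rdoc seems to promise:

```ruby
# Hypothetical stand-in -- NOT Ferret. The tokenizer is constructed
# once, and token_stream is then called once per string, so nothing
# in the chain is rebuilt for each short input.
class SketchTokenizer
  def initialize(lower = true)
    @lower = lower # set once at construction, reused for every string
  end

  # Returns a fresh list of tokens for each input string; the field
  # name is accepted only to mirror Analyzer#token_stream's signature.
  def token_stream(_field, text)
    tokens = text.scan(/\w+/)
    @lower ? tokens.map(&:downcase) : tokens
  end
end

t = SketchTokenizer.new(true)
p t.token_stream(:ignored, "Short String One")  # => ["short", "string", "one"]
p t.token_stream(:ignored, "Another Short One") # => ["another", "short", "one"]
```

The point is just that construction cost is paid once, while per-string work is limited to the token_stream call.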

Unfortunately, StandardTokenizer#initialize does not work as advertised.
It takes a string, not a boolean. So it does not support the reuse model
from the documentation cited above. If you have a look at the "source"
link on the StandardTokenizer documentation for "new":
  
http://ferret.davebalmain.com/api/classes/Ferret/Analysis/StandardTokenizer.html#

You'll see that the rdoc comment apparently lies :) That formal
parameter name that should hold "lower" is named "rstr". Fishy. A quick
look indicates that WhiteSpaceTokenizer has a similar mismatch with its
documentation.

Is there an idiomatic way to reuse analyzer chains?
-- 
Posted via http://www.ruby-forum.com/.
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
