On 9/6/06, Kent Sibilev <[EMAIL PROTECTED]> wrote:
> David Balmain wrote:
> > On 9/6/06, Kent Sibilev <[EMAIL PROTECTED]> wrote:
> > No analyzer currently emulates Lucene's StandardAnalyzer exactly.
> > You'd have to port it to Ruby, which shouldn't be too hard if you know
> > how to use racc. But it sounds to me like you don't need anything so
> > complex. If you are indexing code you might want to try using the
> > AsciiLetterAnalyzer.
>
> No, it doesn't do what I want. Looking at the code I'm slightly
> confused. The criterion is that if isalpha returns 0 then we have reached
> the end of a token. Does that mean the '_' character is considered
> alphanumeric?
irb(main):001:0> require 'rubygems'
irb(main):002:0> require 'ferret'
irb(main):004:0> i = Ferret::I.new(:analyzer =>
Ferret::Analysis::AsciiLetterAnalyzer.new)
irb(main):005:0> i << "test_case"
irb(main):006:0> i.search("case")
=> #<struct Ferret::Search::TopDocs total_hits=1, hits=[#<struct
Ferret::Search::Hit doc=0, score=0.191783010959625>],
max_score=0.191783010959625>
irb(main):007:0>
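So no, '_' is not considered alphanumeric (or, in this case, alpha, as
AsciiLetterAnalyzer won't match numbers). "test_case" is split into the
tokens "test" and "case", which is why the search for "case" gets a hit.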
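
To make the splitting rule concrete, here is a rough plain-Ruby sketch of
it. It does not use Ferret at all; ascii_letter_tokens is a made-up helper
name, and it only assumes that a token is a maximal run of ASCII letters
(any lowercasing the real analyzer may do is ignored here).

```ruby
# Hypothetical helper, not part of Ferret: approximates the splitting rule
# by treating every maximal run of ASCII letters as one token. Anything
# else ('_', digits, punctuation, whitespace) ends the current token.
def ascii_letter_tokens(text)
  text.scan(/[A-Za-z]+/)
end

p ascii_letter_tokens("test_case")   # => ["test", "case"]
p ascii_letter_tokens("run2_tests")  # => ["run", "tests"]
```

Under that assumption '_' and digits both act as token boundaries, which
matches the search result in the irb session.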
> > Or you could use the RegExpAnalyzer and describe
> > your tokens with a Ruby RegExp. Something like this:
> >
> > include Ferret
> > include Ferret::Analysis
> > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]+/))
> >
> > # or if you want case sensitive searches:
> > index = I.new(:analyzer => RegExpAnalyzer.new(/[A-Za-z0-9]+/, false))
> >
>
> It would be great if this code worked, but it segfaulted on me. I've
> glanced at the code and noticed that for this type of stream
>
> typedef struct RegExpTokenStream {
>     CachedTokenStream super;
>     VALUE rtext;
>     VALUE regex;
>     VALUE proc;
>     int curr_ind;
> } RegExpTokenStream;
>
>
> you initialize three VALUE objects but never mark them for the garbage
> collector. Eventually they are freed behind my back. What you should do
> is keep the type of the stream in the TokenStream structure and rework
> the frt_ts_mark method.
>
> Hope that helps,
> Kent
Actually, frt_rets_mark already marks the three VALUE objects
correctly. What would really help would be if you could give me an
example script that segfaults. If you can do this I'll fix it and get
a new gem out as soon as possible.
Cheers,
Dave