The best Analyzer documentation so far is Erik Hatcher's "Parser Rulez" article. The link is on the Resources page of Lucene's site.
Looking forward to the contribution.

Otis

--- karl wettin <[EMAIL PROTECTED]> wrote:

> Hello list,
>
> I'm Karl, and I just started testing Lucene the other day. It's a
> great core engine, but I feel there are some things missing that I'd
> be happy to contribute.
>
> I started by writing a simple N-gram classifier to detect the language
> of a text in order to automatically cluster documents by language. The
> algorithm is very similar to the "TextCat" C library.
>
> And then I thought, maybe it would be possible to use the same N-gram
> classifier to make an automatic stemmer that works on all languages.
> Hopefully I'll have something up and running for tests by next
> weekend.
>
> The same classifier could be used for a simple metaphone index.
>
> However, I need some help understanding the Analyzer. Where can I
> find some tutorials on how to write my own? I didn't check with
> Google; maybe I should before posting here. Since the stemmer (and
> metaphone) data would have to be indexed in their own field(?),
> querying the stemmed field would require one to stem the query too.
> Can I create a subclass of Query (or so), or do I need to create my
> own Query class that handles the stemming all the way for the user?
> The last option is my current approach, so I would appreciate some
> hints and pointers here.
>
> Great project!
>
> karl
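
For anyone following the thread, here is a minimal sketch of the kind of rank-based N-gram language classification that TextCat-style tools use. It is not Karl's code and does not use any Lucene API. The class name NGramLanguageGuesser, the constants MAX_N and PROFILE_SIZE, and the tiny training samples in the usage note are all assumptions made for illustration. The idea: rank the most frequent character N-grams of a training sample per language, do the same for the input text, and pick the language whose ranking is closest (the "out-of-place" distance).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a TextCat-style classifier; names and constants are assumptions.
final class NGramLanguageGuesser {
    private static final int MAX_N = 3;          // longest N-gram considered (assumed value)
    private static final int PROFILE_SIZE = 300; // N-grams kept per profile (assumed value)

    // language name -> (N-gram -> rank, where rank 0 is the most frequent N-gram)
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    /** Builds a ranked N-gram profile for the given text. */
    static Map<String, Integer> profile(String text) {
        // Lowercase and collapse every run of non-letters into a single '_' word boundary.
        String normalized = "_" + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", "_") + "_";
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= normalized.length(); i++) {
                counts.merge(normalized.substring(i, i + n), 1, Integer::sum);
            }
        }
        // Sort by descending count; break ties alphabetically so ranking is deterministic.
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int i = 0; i < sorted.size() && i < PROFILE_SIZE; i++) {
            ranks.put(sorted.get(i).getKey(), i);
        }
        return ranks;
    }

    /** Registers (or replaces) the profile for a language from a training sample. */
    public void train(String language, String sample) {
        profiles.put(language, profile(sample));
    }

    /** Out-of-place distance: sum of rank differences, with a fixed penalty for unseen N-grams. */
    static long distance(Map<String, Integer> document, Map<String, Integer> language) {
        long sum = 0;
        for (Map.Entry<String, Integer> e : document.entrySet()) {
            Integer rank = language.get(e.getKey());
            sum += (rank == null) ? PROFILE_SIZE : Math.abs(rank - e.getValue());
        }
        return sum;
    }

    /** Returns the trained language with the smallest distance, or null if none are trained. */
    public String guess(String text) {
        Map<String, Integer> document = profile(text);
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            long d = distance(document, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Usage would look like calling train("en", someEnglishText) and train("sv", someSwedishText), then guess(documentText). Real profiles need far larger training samples than a sentence or two. The resulting language label could then be stored in its own field at index time, which is one way to get the clustering by language described above.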