Alng these same lines, has anyone developed a Spanish filter? I have looked but have not turned anything up.
Justin > -----Original Message----- > From: Joshua O'Madadhain [mailto:[EMAIL PROTECTED]] > Sent: Thursday, November 21, 2002 4:00 PM > To: Lucene Users List > Cc: Joshua Rhys Taliesin O'Madadhain > Subject: Re: StandardFilter that works for French > > On Thu, 21 Nov 2002, Konrad Scherer wrote: > > > In French you have 6 words (me, te, se, le/la , ne, de) > where the e is > > replaced with an apostrophe when the following word starts > with a vowel. > > For example me aider becomes m'aider. Currently Lucene > indexes m'aider, > > s'aider, n'aider as different words when in fact they > should be analyzed as > > me aider, se aider, ne aider, etc. So I modified Standard > filter to send > > back these words as two words. I had to add a one Token > buffer. I toyed > > with modifying StandardTokenizer.jj but I was worried about > unintended > > changes in behavior. > > > > This change will not effect English indexing. The only > change I can think > > of is that a word like m'lord would be indexed as "me > lord". Still it might > > be better to make a French package and add this to a French Filter. > > There are a number of contractions in English that could be > affected if > you're using the apostrophe as a marker, e.g.: isn't, > wouldn't, I'd, he's, > hasn't. (Granted, these are often considered stop words.) > Thus, I think > that your idea of incorporating this change into a French > filter, rather > than modifying Standard filter, is a good idea. > > Joshua O'Madadhain > > [EMAIL PROTECTED] Per > Obscurius....www.ics.uci.edu/~jmadden > Joshua O'Madadhain: Information Scientist, Musician, > Philosopher-At-Tall > It's that moment of dawning comprehension that I live for. > -- Bill Watterson > My opinions are too rational and insightful to be those of > any organization. > > > > > -- > To unsubscribe, e-mail: > <mailto:[EMAIL PROTECTED]> > For additional commands, e-mail: > <mailto:[EMAIL PROTECTED]> > -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>