Hi,

I see a lot of threads about the apostrophe not being treated as a separator,
and a lot of French people complaining about it (I complain too, since I
am French ;) ).

My question is "what is the status of http://tinyurl.com/ynskw3 ?"

I think the patch given in that thread will work for both English and French
without disturbing the filtering of English words such as O'Reilly, since it
only looks at "m', t', s', n', l', d'" as the first letters of a token, which
I think does not occur in any English construction.
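
As a rough illustration of that rule (a sketch only, not the patch from the
thread; the class and method names below are made up and do not exist in
Lucene, and only the plain ASCII apostrophe is handled):

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of Lucene: strips a leading French elision
// (m', t', s', n', l', d') from a single token and leaves everything else alone.
final class ElisionSketch {
    // The six one-letter prefixes named in the message above.
    private static final Set<String> ELISIONS = Set.of("m", "t", "s", "n", "l", "d");

    static String stripElision(String token) {
        int apos = token.indexOf('\'');
        // Only a one-letter prefix directly before the apostrophe counts,
        // and something must follow the apostrophe.
        if (apos == 1 && token.length() > 2) {
            String prefix = token.substring(0, 1).toLowerCase(Locale.ROOT);
            if (ELISIONS.contains(prefix)) {
                return token.substring(2);
            }
        }
        // Anything else (O'Reilly, don't, aujourd'hui, plain words) is unchanged.
        return token;
    }
}
```

With this sketch, "l'avion" becomes "avion", while "O'Reilly" is returned
untouched because "o" is not one of the listed prefixes.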

So what is planned:
 1. having a FrenchStandardFilter and an EnglishStandardFilter, and removing
StandardFilter
 2. including the patch in StandardFilter
 3. having a EuropeanStandardFilter (with the most common rules of English,
French, German, Spanish, Italian, ...)
 4. doing nothing

Personally, I'd like a EuropeanStandardFilter (i.e. the 3rd point), which would
handle most of the cases, as I often find myself indexing French and English
documents (as well as some Spanish and Italian), and I do not mind losing
some terms (for example, if the document is English but a word has been lost
because of a very common Italian rule).

Thanks for taking the time to answer my question.

chris
-- 
View this message in context: 
http://www.nabble.com/Apostrophe-filtering-in-StandardFilter-tp15156768p15156768.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

