[ https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578195#action_12578195 ]
Hiroaki Kawai commented on LUCENE-1029:
---------------------------------------

I'd like to point out that we have another tool for this. :-)

java.text.Collator can collate text, and each instance is based on a Locale, wow! So if we use a collator, we might get better query results, i.e. less search noise; for example, German "ä" might match "ae".

I'd like to submit a patch later.

> Illegal character replacements in ISOLatin1AccentFilter
> -------------------------------------------------------
>
>                 Key: LUCENE-1029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Marko Asplund
>         Attachments: ISOLatin1AccentFilter-javadoc.patch
>
>
> The ISOLatin1AccentFilter class is responsible for replacing "accented
> characters in the ISO Latin 1 character set by their unaccented equivalent".
> Some of the replacements performed for Scandinavian characters (used e.g. in
> Finnish, Swedish, Danish, etc.) are illegal. The Scandinavian characters
> differ from the accented characters used in Latin-based languages such as
> French in that these characters (ä, ö, å) represent entirely independent
> sounds in the language and therefore cannot be represented with any other
> sound without a change of meaning. It is therefore illegal to replace these
> characters with any other character.
> This means, for example, that you can't change the Finnish word sää (weather)
> to saa (will have), because these are two entirely different words with
> different meanings. The same applies to the Scandinavian languages as well.
> There's no connection between the sounds represented by ä and a; ö and o; or
> å and a.
> In addition to the three characters mentioned above, Danish and Norwegian use
> other special characters such as ø and æ. It should be checked whether the
> replacement is legal for these characters.
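For readers unfamiliar with java.text.Collator, here is a minimal sketch of the locale-based comparison the comment refers to. It is not the promised patch and does not touch Lucene; the class name CollatorSketch and the helper matches() are made up for illustration. Whether a given locale treats "ä" as a variant of "a", as a separate letter, or as "ae" depends on the collation rules shipped with the JDK in use, so the sketch prints the results rather than assuming them.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorSketch {
    // Hypothetical helper: true when the locale's collator considers
    // a and b equal at the given strength (e.g. Collator.PRIMARY).
    static boolean matches(Locale locale, int strength, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        // Decompose precomposed characters such as U+00E4 into a base
        // letter plus a combining mark before comparing.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        // "ä" written as an escape to avoid source-encoding problems.
        String aUmlaut = "\u00e4";
        Locale swedish = Locale.forLanguageTag("sv-SE");

        // PRIMARY strength looks only at base-letter differences.
        System.out.println("de, primary, ä vs a:  "
                + matches(Locale.GERMAN, Collator.PRIMARY, aUmlaut, "a"));
        System.out.println("de, primary, ä vs ae: "
                + matches(Locale.GERMAN, Collator.PRIMARY, aUmlaut, "ae"));
        System.out.println("sv, primary, ä vs a:  "
                + matches(swedish, Collator.PRIMARY, aUmlaut, "a"));

        // TERTIARY strength also takes accent and case differences into account.
        System.out.println("de, tertiary, ä vs a: "
                + matches(Locale.GERMAN, Collator.TERTIARY, aUmlaut, "a"));
    }
}
```

Comparing the German and Swedish lines shows how far the locale's rules change the outcome, which is the property the issue description asks for: a folding that is acceptable for one language should not be forced on another.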