[ https://issues.apache.org/jira/browse/LUCENE-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534818 ]
Karl Wettin commented on LUCENE-1029:
-------------------------------------

>> With the accent filter, running the Swedish word "kön" through the
>> filter would create "kon". The first means "gender" and the second
>> "cow". That would not be acceptable.
>
> I am feeling lazy right now, but it seems to me you could find a
> similar rare stemming example (e.g. something that means something
> else in its stemmed form). The process is algorithmic after all, and
> there are many languages with plenty of words out there.

Just to point out, pretty much any short word (fewer than, say, 6 letters or so) in Swedish containing å, ä or ö would get a completely different meaning if you replace the letters.

> Illegal character replacements in ISOLatin1AccentFilter
> -------------------------------------------------------
>
>                 Key: LUCENE-1029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1029
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.2
>            Reporter: Marko Asplund
>
> The ISOLatin1AccentFilter class is responsible for replacing "accented
> characters in the ISO Latin 1 character set by their unaccented
> equivalent". Some of the replacements performed for Scandinavian
> characters (used e.g. in Finnish, Swedish, Danish etc.) are illegal.
> The Scandinavian characters differ from the accented characters used
> in e.g. Latin-based languages such as French in that these characters
> (ä, ö, å) represent entirely independent sounds in the language and
> therefore cannot be represented by any other sound without a change of
> meaning. It is therefore illegal to replace these characters with any
> other character.
> This means, for example, that you can't change the Finnish word sää
> (weather) to saa (will have), because these are two entirely different
> words with different meanings. The same applies to the Scandinavian
> languages as well. There's no connection between the sounds
> represented by ä and a, ö and o, or å and a.
> In addition to the three characters mentioned above, Danish and
> Norwegian use other special characters such as ø and æ. It should be
> checked if the replacement is legal for these characters.
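
The distinction drawn in the issue can be illustrated with a small standalone sketch. This is not the Lucene filter and not a proposed patch: the class name SelectiveAccentFolder and the method fold are hypothetical, and it relies only on java.text.Normalizer from the JDK (Java 9+ for Set.of). It assumes that only å, ä, ö (and their uppercase forms) should be protected; ø and æ are left out of the protected set because the issue only asks for them to be checked, and they have no canonical decomposition, so this approach does not alter them either way.

```java
import java.text.Normalizer;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
final class SelectiveAccentFolder {

    // Assumption: the letters the issue describes as independent sounds.
    private static final Set<Character> KEEP = Set.of(
            '\u00E5', '\u00E4', '\u00F6',   // å ä ö
            '\u00C5', '\u00C4', '\u00D6');  // Å Ä Ö

    // Strips combining accents from every character except those in KEEP.
    static String fold(String input) {
        // Compose first so that "o" + combining diaeresis is seen as ö.
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            if (KEEP.contains(c)) {
                out.append(c);              // protected letter, copy as-is
                continue;
            }
            // Decompose into base letter + combining marks, drop the marks.
            String parts = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
            for (int j = 0; j < parts.length(); j++) {
                char p = parts.charAt(j);
                if (Character.getType(p) != Character.NON_SPACING_MARK) {
                    out.append(p);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("k\u00F6n"));   // kön is left unchanged
        System.out.println(fold("caf\u00E9"));  // café becomes cafe
    }
}
```

Whether a fixed protected set is appropriate depends on the language of the indexed text: the same ö that must be kept for Swedish or Finnish might reasonably be folded elsewhere, so a real filter would probably need this to be configurable.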