http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14759
--- Comment #9 from David Cook <dc...@prosentient.com.au> ---
(In reply to Galen Charlton from comment #8)
> (In reply to David Cook from comment #7)
> > Do we really need to remove accents
> > for that?
>
> Per bug 7411, there was apparently an issue searching on usernames with
> diacritics, although in retrospect that may simply have been an issue with
> mismatched Unicode normalization forms -- impossible to tell now.
>
> The current patchset for bug 7679 also proposes to use Text::Unaccent, but
> I'm dubious about that one.

It's surprising that Text::Unaccent doesn't appear to be working correctly, since it uses iconv for the heavy lifting, and iconv seems to be pretty good when it comes to character conversions.

I can't speak to Hebrew or Greek (while I thought I wasn't bad with the modern Greek alphabet, I didn't know they used accents...), but Arabic is sure interesting.

So we have the following string:

  مُدَرِّسَة

If we run the following:

  echo "مُدَرِّسَة" | xxd -p

We get this hex:

  d985d98fd8afd98ed8b1d990d991d8b3d98ed8a90a

(The trailing 0a is the newline that echo appends.)

If we look at the first couple of byte pairs there using a UTF-8 table (http://www.utf8-chartable.de/unicode-utf8-table.pl):

  d985 = م = ARABIC LETTER MEEM
  d98f = ُ = ARABIC DAMMA

Together, these are written like مُ

However, if you add the letter "dal":

  d8af = د = ARABIC LETTER DAL

You'll get something like the following: مُد

We'd recognize that as the start of the string, which is displayed at the right-hand end because Arabic is written right to left: "مُدَرِّسَة"

I had forgotten that Hebrew only has consonants in its alphabet, and it appears Arabic is the same. So that "damma" indicates a vowel sound but isn't a letter per se. I'd say it's a diacritic, and this would agree: https://en.wikipedia.org/wiki/Arabic_diacritics#.E1.B8.8Cammah

So the output for "Strip NonspacingMark" looks good in the very first case at least:

  Strip NonspacingMark - مُدَرِّسَة => مدرسة

Although I don't know if it makes sense semantically, as I don't read Arabic.
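The byte-level reading above can be checked with a short script. What follows is only a sketch in Python, not the Perl code from any of the patches; the names `describe` and `strip_nonspacing_marks` are made up for illustration. It assumes that "Strip NonspacingMark" means dropping every code point whose Unicode general category is Mn, and the NFD/NFC normalization steps are an added assumption rather than something confirmed in this bug.

```python
import unicodedata

# The example string from above, spelled out as code points so the byte
# sequence is unambiguous (same order as in the xxd dump).
WORD = "\u0645\u064f\u062f\u064e\u0631\u0650\u0651\u0633\u064e\u0629"


def describe(text):
    """Return (UTF-8 hex, general category, Unicode name) per code point."""
    return [
        (ch.encode("utf-8").hex(), unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]


def strip_nonspacing_marks(text):
    """Decompose to NFD, drop every Mn code point, recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    # One line per code point: bytes, category (Lo = letter, Mn = mark), name.
    for hex_bytes, category, name in describe(WORD):
        print(hex_bytes, category, name)
    print(strip_nonspacing_marks(WORD))
```

The Arabic vowel marks are already separate combining code points, so they can be dropped directly. Accented Greek letters are usually stored precomposed (for example U+03AD, epsilon with tonos), which is why the sketch decomposes to NFD first; without that step there would be no separate mark to strip.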
If I understand correctly, you can omit vowel sounds from written Arabic and rely purely on context for meaning? (https://en.wikipedia.org/wiki/Arabic_alphabet#Vowels)

At a glance, Strip NonspacingMark looks OK for Greek too, as those diacritics appear to be there purely for pronunciation, like in languages written in the Roman alphabet. (https://en.wikipedia.org/wiki/Modern_Greek#Phonology_and_orthography)

--
Koha-bugs mailing list
Koha-bugs@lists.koha-community.org
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs