Pádraig Brady wrote:
> Note as well as folding case I think it might
> be useful to fold other forms like:
> Enclosed: \u24b6 -> A
> Stylistic: \uff21 -> A

These two transformations are already performed when you use
ulc_casecmp with the UNINORM_NFKD argument.

> Diacritics: À -> A

Very good point. Case-insensitive comparisons are used in contexts
where different people enter the same word / name / term. But in these
contexts, additional transformations need to be done, depending on
culture. I think Google's front end to the search engine does these
transformations. They are:
  - for French, to remove accents and diacritics,
  - for German, to transform umlauts (ü -> ue),
  - for Danish, probably to transform å -> aa,
  - and certainly much more for other languages (what is it for
    Chinese?).

> I.E. have more general function like:
> ulc_coll(fold={Case|Diacritics|Stylistic}, ...);

_coll or _cmp? _coll is used when people want to put lists of names
in order. The use case where diacritics are ignored is lookups, not
sorting.

Also, as mentioned above, I think which parts should be folded is
locale dependent. For French, it is ok to ignore diacritics when doing
caseless matching; for German, it is not.

Bruno

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
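
As a rough illustration of the folding discussed in this message, here is a minimal sketch in Python. It uses the standard unicodedata module as a stand-in; it is not libunistring's API and does not call ulc_casecmp. The names fold_compat_case, strip_marks, lookup_key and the per-language tables are hypothetical, made up for this example, and the Danish rule is only the "probably" guess from the list above.

```python
import unicodedata


def fold_compat_case(s: str) -> str:
    """NFKD-normalize, case-fold, and NFKD-normalize again.

    This approximates a caseless comparison under NFKD: enclosed and
    fullwidth forms decompose to their plain letters.
    """
    folded = unicodedata.normalize("NFKD", s).casefold()
    return unicodedata.normalize("NFKD", folded)


def strip_marks(s: str) -> str:
    """Decompose, then drop combining marks (accents, diacritics)."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical per-language rewrite tables (assumptions, not a standard).
GERMAN = str.maketrans({
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
})
DANISH = str.maketrans({"\u00e5": "aa", "\u00c5": "Aa"})


def lookup_key(s: str, lang: str) -> str:
    """Build a key for lookups (not for sorting), folded per language."""
    s = unicodedata.normalize("NFC", s)  # make the table keys match
    if lang == "de":
        s = s.translate(GERMAN)          # umlauts are rewritten, not dropped
    elif lang == "da":
        s = s.translate(DANISH)
    elif lang == "fr":
        s = strip_marks(s)               # accents may be ignored
    return fold_compat_case(s)


if __name__ == "__main__":
    print(fold_compat_case("\u24b6") == fold_compat_case("\uff21"))
    print(lookup_key("M\u00fcller", "de") == lookup_key("Mueller", "de"))
    print(lookup_key("\u00c9l\u00e8ve", "fr") == lookup_key("eleve", "fr"))
```

With these assumed rules, "Müller" and "Mueller" produce the same German key while "Muller" does not, which is the locale dependence described above: the French branch drops diacritics, the German branch rewrites them.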