bug#79702: request: flag for visually identical but different unicode characters

Dave via Bug reports for GNU grep Sun, 26 Oct 2025 06:54:46 -0700

Today I realized that some characters are visually identical yet have
different Unicode code points, so grep cannot match one against the
other.


Example #1:
احمدی

Example #2:
احمدى

The final letter (ى) looks exactly the same in both examples, yet the
first one is U+06CC and the second one is U+0649.
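
To make the difference visible, here is a minimal Python sketch that
prints the code point and Unicode name of the last letter of each word.
The helper name describe() and the variables word1/word2 are my own,
used only for illustration; the words are spelled with escapes so the
code points are unambiguous.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Example #1 ends in U+06CC, Example #2 ends in U+0649.
word1 = "\u0627\u062d\u0645\u062f\u06cc"
word2 = "\u0627\u062d\u0645\u062f\u0649"

for code_point, name in describe(word1[-1] + word2[-1]):
    print(code_point, name)
# U+06CC ARABIC LETTER FARSI YEH
# U+0649 ARABIC LETTER ALEF MAKSURA
```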

From the user's perspective, it's impossible to tell which code point
the word is using. In fact, these two words, even though they come from
different languages/keyboards, match perfectly on all the other letters;
only ی/ى prevents the match.

While not as important, this letter has other variants like ي (notice
the two dots below it, think an umlaut), which is U+064A. If you press
Ctrl + F in your browser, you'll notice that U+064A matches U+0649, but
this is not the default behavior in grep either.

I understand there's no straightforward solution for this, so I'm
thinking of an extra flag that converts all visually similar characters
to the same code point and then looks for matches. Thoughts?
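
As a rough illustration of what such a flag could do, here is a small
Python sketch, not a proposal for grep's actual implementation. The
fold table, fold() and grep_folded() are all hypothetical names of
mine, and the table covers only the three letters discussed above; a
real option would need a much larger, carefully chosen mapping.

```python
import re

# Hypothetical fold table: map the look-alike letters to one representative.
# Only the three code points mentioned in this report are covered.
FOLD = str.maketrans({
    "\u0649": "\u06cc",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u064a": "\u06cc",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
})


def fold(text):
    """Replace each look-alike character with its representative."""
    return text.translate(FOLD)


def grep_folded(pattern, lines):
    """Return the lines matching pattern after folding both sides."""
    regex = re.compile(fold(pattern))
    return [line for line in lines if regex.search(fold(line))]


word1 = "\u0627\u062d\u0645\u062f\u06cc"  # Example #1, ends in U+06CC
word2 = "\u0627\u062d\u0645\u062f\u0649"  # Example #2, ends in U+0649

# A plain search misses the other spelling; the folded search finds it.
print(re.search(word2, word1) is not None)      # False
print(grep_folded(word2, [word1, "unrelated"]) == [word1])  # True
```

Note that the original lines are returned unchanged; only the
comparison is done on the folded text.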
