Today I realized that there are characters that are visually
identical yet have different Unicode code points, so grep can't
match one against the other.

Example #1:
احمدی

Example #2:
احمدى

The final letter (ی/ى) looks exactly the same in both examples, yet
the first is U+06CC and the second is U+0649.
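
To make the difference visible, here is a small Python sketch that
prints the code point and Unicode name of the last letter of each
example. The helper name describe_last is only an illustration; the
two words are written with escapes so the code points are unambiguous.

```python
import unicodedata

# The two example words, spelled out as escapes.
# They differ only in the final letter: U+06CC vs. U+0649.
words = [
    "\u0627\u062D\u0645\u062F\u06CC",  # Example #1
    "\u0627\u062D\u0645\u062F\u0649",  # Example #2
]

def describe_last(word):
    """Return the code point and Unicode name of the last character."""
    ch = word[-1]
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

for w in words:
    print(describe_last(w))
```

The first line printed names the letter as FARSI YEH and the second as
ALEF MAKSURA, even though the two words compare equal on every other
letter.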

From the user's perspective, it's impossible to tell which code point
the word is using. In fact, these two words, even though they come from
different languages/keyboards, match perfectly on the other letters;
only the ی/ى escapes the match.

While not as important, this letter has other variants such as ي (notice
the two dots below it, think of an umlaut), which is U+064A. If you
press Ctrl + F in your browser, you'll notice that U+064A matches
U+0649. This is not the default behavior in grep either.

I understand there's no straightforward solution for this, so I'm
thinking of an extra flag that converts all visually similar
characters to the same code point and then looks for matches. Thoughts?
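
To illustrate what I have in mind, here is a rough Python sketch, not
a proposal for how grep should implement it. The table YEH_FOLD and the
functions fold and folded_search are hypothetical names, and the table
covers only the three letters mentioned above; a real one would need a
much larger mapping, perhaps derived from the Unicode confusables data.

```python
import re

# Hypothetical folding table: map the yeh variants discussed above
# (U+0649 and U+064A) to one representative, U+06CC.
YEH_FOLD = str.maketrans({"\u0649": "\u06CC", "\u064A": "\u06CC"})

def fold(text):
    """Replace each visually similar variant with its representative."""
    return text.translate(YEH_FOLD)

def folded_search(pattern, lines):
    """Return the lines that match pattern once both sides are folded."""
    rx = re.compile(fold(pattern))
    return [line for line in lines if rx.search(fold(line))]

lines = [
    "\u0627\u062D\u0645\u062F\u06CC",  # ends in U+06CC
    "\u0627\u062D\u0645\u062F\u0649",  # ends in U+0649
    "\u0627\u062D\u0645\u062F\u064A",  # ends in U+064A
]

# Without folding, only the first line matches the first spelling.
plain = [l for l in lines if re.search(lines[0], l)]

# With folding, all three spellings match.
folded = folded_search(lines[0], lines)
print(len(plain), len(folded))
```

Because the mapping is one code point to one code point, the folded
text has the same length as the original, so the matching line can
still be printed in its original spelling.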


