Today I realized that there are characters which are visually identical yet have different Unicode code points, so grep cannot match one against the other.

Example #1: احمدی
Example #2: احمدى

The final letter in both examples looks exactly the same, yet in the first one it is U+06CC and in the second one it is U+0649.

From the user's perspective, it's impossible to tell which code point the word is using. In fact these two words, even though they come from different languages/keyboards, match perfectly on all the other letters; only the ی/ى makes the match fail.

While not as important, this letter has other variants such as ي (notice the two dots below it, think of an umlaut), which is U+064A. If you press Ctrl+F in your browser, you'll notice that it matches U+064A against U+0649, but that is not the default behavior in grep either.

I understand there's no straightforward solution for this, so I'm thinking of an extra flag which converts all visually similar characters to the same code point and then looks for matches.

Thoughts?
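
To make the idea concrete, here is a rough sketch in Python of what such a flag might do. It is not how grep works today and not a proposed implementation. The fold table, the choice of U+06CC as the common target, and the names `fold` and `grep_folded` are all assumptions made up for illustration. As far as I know, standard Unicode normalization (NFC/NFKC) does not merge these three letters, which is why the table is written by hand.

```python
# Hypothetical fold table: map the yeh variants mentioned above to one
# code point. Picking U+06CC as the target is an arbitrary assumption.
FOLD_TABLE = str.maketrans({
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH          -> ARABIC LETTER FARSI YEH
})


def fold(text):
    """Replace visually similar characters with a single representative."""
    return text.translate(FOLD_TABLE)


def grep_folded(pattern, lines):
    """Return the lines that contain pattern once both sides are folded.

    This is a plain substring search, not a regex match.
    """
    needle = fold(pattern)
    return [line for line in lines if needle in fold(line)]


# The two example words, spelled with escapes so the difference is visible.
word_farsi_yeh = "\u0627\u062D\u0645\u062F\u06CC"     # ends in U+06CC
word_alef_maksura = "\u0627\u062D\u0645\u062F\u0649"  # ends in U+0649

if __name__ == "__main__":
    # Searching for one spelling finds the other once both are folded.
    print(grep_folded(word_alef_maksura, [word_farsi_yeh, "unrelated"]))
```

A real flag would need a far larger table (Unicode publishes confusables data that could serve as a starting point) and would have to fold the pattern and the input the same way before matching.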
