hi perl,

Perl wrote:
> For speed, I'd recommend caching a simplified 'eyedex' version of each
> person's username, either as a new column in their main record, or in a
> secondary list that is cross-indexed by the user_id.

thanks for the suggestion.

grouping visually similar characters (or chunks, more specifically, because i also want to catch things like "m" and "nn") and then replacing all occurrences of each chunk with one representation (e.g. turning all of 1, |, !, and l into 1, which gives the "eyedex" version) is a pretty straightforward approach, i guess. i'll probably try implementing it and see how well it does.
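a rough sketch of what i have in mind, in python just for illustration. the chunk table, the lowercasing step, and the names eyedex, build_index and find_lookalike are all my own assumptions here, not a tested or complete design. the table only holds the few chunks mentioned in this thread, and a real one would need many more.

```python
import re

# Hypothetical chunk table: each visually confusable chunk maps to one
# canonical representation. Only a handful of examples; not exhaustive.
CHUNKS = {
    "nn": "m",
    "vv": "w",
    "|": "1",
    "!": "1",
    "l": "1",
}

# Try longer chunks first so "nn" is matched before any single character.
_PATTERN = re.compile(
    "|".join(re.escape(c) for c in sorted(CHUNKS, key=len, reverse=True))
)


def eyedex(username):
    """Return a simplified form in which look-alike chunks are collapsed."""
    folded = username.lower()  # assumption: case is ignored when comparing
    return _PATTERN.sub(lambda m: CHUNKS[m.group(0)], folded)


def build_index(users):
    """Cache the eyedex form of every username, keyed back to the user_id.

    `users` is assumed to be a dict of user_id -> username.
    """
    return {eyedex(name): user_id for user_id, name in users.items()}


def find_lookalike(candidate, index):
    """Return the user_id whose name looks like `candidate`, or None."""
    return index.get(eyedex(candidate))


if __name__ == "__main__":
    index = build_index({1: "bill", 2: "wendy"})
    print(find_lookalike("bi!l", index))
    print(find_lookalike("VVendy", index))
    print(find_lookalike("dave", index))
```

the index plays the role of the secondary list cross-indexed by user_id, so checking a new name is a single lookup instead of a scan over every existing username.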

but after some more thinking yesterday, it occurred to me that this "typo attack" problem is actually much more difficult. it has to deal with different fonts and even glyphs/figures in general. for example, in some fonts "W" is visually similar to "VV" (two Vs). "1" (the digit one) and "l" (the lowercase L) are pretty distinct in some fonts, while in others they might be virtually indistinguishable to the average eye. then there's the problem of different faces (italic, bold, etc.), text styles (strikethrough, underline/overline), and different font sizes, which could complicate the matter even further.

--
dave

