fkoyer opened a new pull request, #19:
URL: https://github.com/apache/spamassassin/pull/19

Problems with old definitions:
* Tries to match UTF-8 and Latin-1 characters in the same expression. e.g. \<A\> includes the byte sequence for "ã" in both Latin-1 (\xE3) and UTF-8 (\xC3\xA3). This seems like a good thing at first, but it can cause false positives if the text is in UTF-8 and the pattern is looking for Latin-1, since \xE3 is also the lead byte of many three-byte UTF-8 sequences.
* Contains redundant characters. e.g. \xE3 appears multiple times in \<A\>
* Contains unnecessary characters. e.g. \xE3 also appears in \<V\> and \<Y\>
* Patterns are case-insensitive. e.g. \<I\> attempts to match lowercase L, but because it's case-insensitive, it also matches uppercase L
* Some look-alike characters aren't matched. e.g. \xEA\x93\xAE = LISU LETTER A (U+A4EE)

Changes:
* All byte sequences are UTF-8 only (no Latin-1)
* All patterns are case-sensitive
* Removed redundant and unnecessary characters
* Added additional look-alike characters
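
The false-positive and case-sensitivity problems described above can be reproduced with a small byte-level sketch. This is an illustration only: it is written in Python rather than SpamAssassin's rule syntax, and `OLD_A`, `NEW_A`, `OLD_I`, `NEW_I`, and `matches` are hypothetical, heavily simplified stand-ins, not the actual definitions touched by this pull request.

```python
import re

# Hypothetical, simplified stand-ins for an <A>-style look-alike class.
# These are NOT the real SpamAssassin definitions.
OLD_A = re.compile(rb"\xE3|\xC3\xA3")          # Latin-1 and UTF-8 "a with tilde" mixed
NEW_A = re.compile(rb"\xC3\xA3|\xEA\x93\xAE")  # UTF-8 only, plus LISU LETTER A (U+A4EE)

# Hypothetical stand-ins for an <I>-style class that targets lowercase L.
OLD_I = re.compile(rb"[l1|]", re.IGNORECASE)   # case-insensitive
NEW_I = re.compile(rb"[l1|]")                  # case-sensitive


def matches(pattern, text):
    """Return True if the byte pattern matches anywhere in the UTF-8 bytes of text."""
    return pattern.search(text.encode("utf-8")) is not None


hiragana_a = "\u3042"  # HIRAGANA LETTER A, UTF-8 bytes E3 81 82
a_tilde = "\u00e3"     # LATIN SMALL LETTER A WITH TILDE, UTF-8 bytes C3 A3
lisu_a = "\ua4ee"      # LISU LETTER A, UTF-8 bytes EA 93 AE

print(matches(OLD_A, hiragana_a))  # True: the Latin-1 byte E3 hits a UTF-8 lead byte
print(matches(NEW_A, hiragana_a))  # False: no Latin-1 alternative to misfire
print(matches(OLD_A, a_tilde), matches(NEW_A, a_tilde))  # True True
print(matches(OLD_A, lisu_a), matches(NEW_A, lisu_a))    # False True
print(matches(OLD_I, "L"), matches(NEW_I, "L"))          # True False
```

The first two lines of output show the Latin-1 alternative misfiring on unrelated UTF-8 text, and the last line shows how a case-insensitive pattern meant for lowercase L also accepts uppercase L.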
