Re: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
Johnson, Robert F wrote:
Hi,
I have been seeing a high rate of Japanese false positives since upgrading from SpamAssassin 2.64 on Red Hat 7.3 with MimeDefang 2.31 to SpamAssassin 3.01 on Red Hat Workstation 3.0, installed site-wide via MimeDefang 2.44. I am wondering whether this is due to the Red Hat 9.0 Unicode (UTF-8) problem. I had no issues with Japanese false positives in the RH 7.3-based environment. I've read a few articles on this issue, but need some help understanding the correct LANG configuration for SpamAssassin 3.01 on Red Hat Workstation 3.0 installed site-wide via MimeDefang 2.44.
I currently have the following set in /etc/sysconfig/i18n (we are US-based):
LANG=en_US
SUPPORTED=en_US
I compiled SpamAssassin from the tarball with LANG set to en_US (export LANG=en_US).
Are these settings correct? Could they be causing the Japanese false positives? Are there any other known issues that can cause Japanese false positives with SpamAssassin 3.01?
Thanks for any help!
Rob

Rob, just a couple of obvious questions. What are your ok_locales and ok_languages settings in your sa-mimedefang.cf file set to? What rules are the Japanese emails hitting when they're tagged as false positives?
I'm based in Japan, and I just recently upgraded to SA 3.01 with MD 2.49, using a MySQL-based Bayes database. I've been noticing some quirkiness with Japanese email as well, but haven't really pinned it down yet.
alan
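Rob's i18n file sets LANG=en_US, which names no codeset at all. Assuming the Red Hat 9 problem he has in mind is the one usually associated with a UTF-8 codeset in LANG (for example en_US.UTF-8), a first step is to see which form a given locale name takes. The Python sketch below is a hypothetical helper, not part of SpamAssassin or MimeDefang; it only parses the locale name and says nothing about how Perl or SpamAssassin will behave under it.

```python
import os


def locale_codeset(lang: str) -> str:
    """Return the codeset part of a locale name such as 'de_DE.UTF-8@euro'.

    Locale names have the form language[_territory][.codeset][@modifier];
    an empty string means no codeset was given (e.g. 'en_US').
    """
    without_modifier = lang.split("@", 1)[0]
    return without_modifier.partition(".")[2]


def is_utf8_locale(lang: str) -> bool:
    # 'UTF-8', 'utf8' and 'UTF8' are common spellings of the same codeset.
    return locale_codeset(lang).replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    lang = os.environ.get("LANG", "")
    print(f"LANG={lang!r} utf8={is_utf8_locale(lang)}")
```

For Rob's setting, is_utf8_locale("en_US") returns False, whereas is_utf8_locale("en_US.UTF-8") returns True.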
RE: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
-----Original Message-----
From: alan premselaar [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 30, 2004 5:55 PM
To: Johnson, Robert F
Cc: users@spamassassin.apache.org
Subject: Re: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0

Rob, just a couple of obvious questions. What are your ok_locales and ok_languages settings in your sa-mimedefang.cf file set to? What rules are the Japanese emails hitting when they're tagged as false positives?

[Johnson, Robert F] Thanks for your reply. I had ok_locales set to all but didn't have ok_languages explicitly set. I think that is OK, since the default value is supposed to be all.
Based on spot-checking a couple of dozen examples, I didn't see any significant pattern of out-of-the-box rules being involved; it was mostly SARE or WIKI rules. The most heavily implicated were the following (MANGLED and SARE_SUB_CASH_CHAR probably had the biggest impact):
SARE rules: SARE_SUB_CASH_CHAR, SARE_RAND_2
WIKI rules: MANGLED_LIST, MANGLED_LIPS, J_CHICKENPOX_12, J_CHICKENPOX_22, HTML_BACKHAIR_4
Out of the box: GAPPY_SUBJECT, FREE_SAMPLE, OBSCURED_EMAIL
Rob
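Rob's spot-check amounts to counting how often each rule appears across a sample of false positives. A minimal sketch of that tally, assuming each message's hits are available as the tests= field that SpamAssassin writes into its X-Spam-Status header (a MimeDefang setup may record hits differently); the function name and sample lines are hypothetical:

```python
import re
from collections import Counter

# Matches the rule list in a status line such as
# "Yes, score=7.1 required=5.0 tests=FOO,BAR autolearn=no version=3.0.1"
TESTS_FIELD = re.compile(r"tests=([\w,]+)")


def tally_rule_hits(status_lines):
    """Count, for each rule, how many messages hit it."""
    counts = Counter()
    for line in status_lines:
        # If a header was folded after a comma, rejoin the rule list.
        unfolded = re.sub(r",\s+", ",", line)
        match = TESTS_FIELD.search(unfolded)
        if not match:
            continue
        # A set counts a rule once per message; "none" means no rule fired.
        rules = {r for r in match.group(1).split(",") if r and r != "none"}
        counts.update(rules)
    return counts
```

Calling most_common(5) on the returned Counter lists the most heavily implicated rules first, which is the kind of ranking Rob describes.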
Re: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
Johnson, Robert F [EMAIL PROTECTED] writes:
WIKI rules: MANGLED_LIST, MANGLED_LIPS, J_CHICKENPOX_12, J_CHICKENPOX_22, HTML_BACKHAIR_4
The last of those is a default rule, but it has almost a zero score.
Out of the box: GAPPY_SUBJECT, FREE_SAMPLE, OBSCURED_EMAIL
The problem doesn't sound like it's SpamAssassin, despite the subject line of this email; rather, it's third-party rulesets.
Daniel
--
Daniel Quinlan
http://www.pathname.com/~quinlan/
Re: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
Daniel Quinlan wrote:
Out of the box: GAPPY_SUBJECT, FREE_SAMPLE, OBSCURED_EMAIL
The problem doesn't sound like it's SpamAssassin despite the subject line of this email, rather it's third-party rulesets.
Daniel

I hit GAPPY_SUBJECT and OBSCURED_EMAIL *A LOT*, and I don't have any third-party rulesets installed.
As a side note, I've recently been trying to update the JAPAN_UCE_SUBJECT rule, because another phrase has been showing up lately, and for some reason the rule hasn't been triggering on it. I think part of the problem is that I have to enter the phrase in the ISO-2022-JP charset and it contains at least two escape characters, so the regex might not be accurate. (Still working on that.)
alan
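Alan's suspicion about the escape characters can be illustrated outside SpamAssassin. A phrase encoded on its own in ISO-2022-JP is wrapped in shift sequences (ESC $ B ... ESC ( B), but when the same phrase sits in the middle of other Japanese text, the encoder switches character sets only once, so those escape bytes are not adjacent to the phrase. The Python sketch below compares a pattern built from the fully encoded phrase with one built from its JIS bytes alone. The phrase 未承諾広告 ("unsolicited advertisement") is only an example, the function names are hypothetical, and the sketch does not show how SpamAssassin itself decodes Subject headers.

```python
import re

# ISO-2022-JP shift sequences: ESC $ @ / ESC $ B select JIS X 0208,
# ESC ( B / ESC ( J switch back to ASCII / JIS-Roman.
SHIFT_SEQUENCES = re.compile(rb"\x1b(?:\$[@B]|\([BJ])")


def naive_pattern(phrase: str) -> re.Pattern:
    """Pattern from the phrase encoded on its own, escape sequences included."""
    return re.compile(re.escape(phrase.encode("iso2022_jp")))


def core_pattern(phrase: str) -> re.Pattern:
    """Pattern from the phrase's two-byte JIS codes only, escape sequences removed."""
    core = SHIFT_SEQUENCES.sub(b"", phrase.encode("iso2022_jp"))
    return re.compile(re.escape(core))


if __name__ == "__main__":
    # The phrase embedded between other Japanese characters.
    subject = "【未承諾広告】お知らせ".encode("iso2022_jp")
    print("naive:", naive_pattern("未承諾広告").search(subject) is not None)
    print("core: ", core_pattern("未承諾広告").search(subject) is not None)
```

The stripped pattern trades precision for robustness: without the shift sequences, the JIS byte pairs are ordinary printable ASCII bytes, so it could in principle match ASCII text or a misaligned byte pair, which a real rule would need to guard against.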
Re[2]: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
Hello Robert,
Tuesday, November 30, 2004, 9:25:52 PM, Daniel wrote:
DQ The problem doesn't sound like it's SpamAssassin despite the subject
DQ line of this email, rather it's third-party rulesets.
I agree.
DQ Johnson, Robert F [EMAIL PROTECTED] writes:
DQ SARE rules: SARE_SUB_CASH_CHAR, SARE_RAND_2
Can you email me a couple of examples that hit these rules, preferably in a zip or gz file? I maintain the Subject rules file for SARE, and would like to refine or rescore SARE_SUB_CASH_CHAR to help avoid your FPs. I'll also forward the info to the SARE ninja who maintains our Random rules file.
DQ WIKI rules: MANGLED_LIST, MANGLED_LIPS, J_CHICKENPOX_12, J_CHICKENPOX_22
All of these are language-related rules. They work well in English, may occasionally misfire on a non-English Western European language, and can readily misfire on any non-Latin/non-Romance language. If you regularly get non-spam in Japanese, you should probably drop the entire MANGLED and CHICKENPOX families. If you're using Tripwire, you should drop that too, since it can also misfire on Japanese non-spam.
Bob Menschel
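Following Bob's suggestion, one way to drop whole rule families without editing the third-party files is to give each of their rules a score of 0, since a zero score disables a rule in SpamAssassin; the override has to be read after the ruleset that sets the original score. The Python sketch below generates those lines. The helper names are hypothetical, the prefixes are guessed from the rule names quoted in this thread, and only the common rule-definition keywords are recognized, so the full family membership should be checked against the installed .cf files.

```python
import re

# Common rule-definition keywords only; other rule types are not covered.
RULE_DEF = re.compile(
    r"^[ \t]*(?:header|body|rawbody|full|uri|meta)[ \t]+(\w+)", re.MULTILINE
)


def rule_names_from_cf(cf_text):
    """Return the set of rule names defined in the text of a .cf file."""
    return set(RULE_DEF.findall(cf_text))


def disable_rule_families(rule_names, prefixes):
    """Return 'score NAME 0' lines for every rule whose name starts with
    one of the given family prefixes (a zero score disables the rule)."""
    prefixes = tuple(prefixes)
    return [
        f"score {name} 0"
        for name in sorted(set(rule_names))
        if name.startswith(prefixes)
    ]
```

Deleting the ruleset's .cf file is the simpler alternative when an entire file, rather than a subset of its rules, is to be dropped.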
Re[2]: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
-----Original Message-----
From: Robert Menschel [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 01, 2004 9:56 AM
To: Johnson, Robert F; users@spamassassin.apache.org
Subject: Re[2]: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0

Can you email me a couple of examples that hit these rules, preferably in a zip or gz file? I maintain the Subject rules file for SARE, and would like to refine/rescore SARE_SUB_CASH_CHAR to help avoid your FPs.
[...]
If you regularly get non-spam in Japanese, you should probably drop the entire MANGLED and CHICKENPOX families. If you're using Tripwire, you should drop that also since it too can misfire on Japanese non-spam.
Bob Menschel

[Johnson, Robert F] Bob,
Thanks for the reply. I will try to get some examples for your analysis; I may have to attempt a repro of the issue. I will let you know soon.
Could the SARE team provide a guideline regarding the best SARE and WIKI rule sets to use in an environment that supports the following languages?
Japanese, Korean, traditional and simplified Chinese, English, assorted European.
Maybe some sort of local-language compatibility matrix would be useful to many users. I would be happy to help put that together in any way I can.
Regards,
Rob
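Rob's proposed compatibility matrix could start as a simple mapping from rule family to the languages in which it has been reported to misfire. In this sketch the entries come only from this thread (Bob's remark that the MANGLED and CHICKENPOX families can misfire on any non-Latin language, his note about Tripwire, and Rob's Japanese false positives on SARE_SUB_CASH_CHAR); the structure and names are hypothetical, and a real matrix would need reports from people receiving mail in each language.

```python
# Hypothetical matrix: rule family -> languages in which it has been
# reported (in this thread) to misfire on legitimate mail.
MISFIRE_REPORTS = {
    "MANGLED": {"Japanese", "Korean", "Chinese (traditional)", "Chinese (simplified)"},
    "J_CHICKENPOX": {"Japanese", "Korean", "Chinese (traditional)", "Chinese (simplified)"},
    "Tripwire": {"Japanese"},
    "SARE_SUB_CASH_CHAR": {"Japanese"},
}


def families_to_review(site_languages, matrix=MISFIRE_REPORTS):
    """Return the rule families with a reported misfire in any of the site's languages."""
    site = set(site_languages)
    return sorted(family for family, languages in matrix.items() if languages & site)
```

An English-only site gets an empty list back, while a site that also receives Japanese mail gets every family in the table.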
Re[3]: Japanese False Positives with Spam Assassin 3.01 and RH WS 3.0
Hello Robert,
Wednesday, December 1, 2004, 10:29:12 AM, you wrote:
JRF Could the SARE team provide a guideline regarding the best SARE
JRF and WIKI rules sets to work in an environment that supports the
JRF following languages?
Unfortunately, the SARE team is heavily North American, and as such has not yet been able to develop good rules for non-European languages. The best we've done is to break a few of our English-specific rules out into *_eng.cf files, so that people who need to receive non-English email don't get burned by them. That move isn't 100% perfect, and we may need to add SARE_SUB_CASH_CHAR to that list if we can't fix your problem with that rule.
Bob Menschel
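Bob's *_eng.cf convention means a site that receives non-English mail can leave those files out when installing SARE rules. A minimal Python sketch of that selection over a list of rule-file names; the function name and the file names in the test are illustrative only:

```python
from fnmatch import fnmatchcase


def rule_files_for_multilingual_site(filenames):
    """Keep .cf rule files, skipping the English-specific *_eng.cf ones."""
    return [
        name
        for name in filenames
        if fnmatchcase(name, "*.cf") and not fnmatchcase(name, "*_eng.cf")
    ]
```

fnmatchcase is used instead of fnmatch so the result does not depend on the operating system's filename case rules.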