Re: I am getting all external domain emails subject tagged as SpamSpam
more logs

Oct 1 13:22:20 mail amavis[17226]: (17226-02) LMTP RCPT TO:u...@example.com ORCPT=rfc822;u...@example.com\r\n
Oct 1 13:22:20 mail amavis[17226]: (17226-02) LMTP 250 2.1.5 Recipient u...@example.com OK
Oct 1 13:22:20 mail amavis[17226]: (17226-02) LMTP::10024 /var/lib/amavis/tmp/amavis-20091001T131825-17226: mohsinaliz...@hotmail.com - moh...@example.com,u...@example.com SIZE=1911 Received: from mail.example.com ([127.0.0.1]) by localhost (mail.example.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP; Thu, 1 Oct 2009 13:22:20 +0600 (PKST)
Oct 1 13:22:20 mail amavis[17226]: (17226-02) Checking: k-6-c3dQQGNL mohsinaliz...@hotmail.com - moh...@example.com,u...@example.com
Oct 1 13:22:20 mail amavis[17226]: (17226-02) query_keys: u...@example.com, user@, example.com, .example.com, .com.pk, .pk, .
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup_hash(u...@example.com), no matches
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup (bypass_virus_checks) = undef, u...@example.com does not match
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup (bypass_header_checks) = true, u...@example.com matches, result=1, matching_key=(constant:1)
Oct 1 13:22:20 mail amavis[17226]: (17226-02) query_keys: u...@example.com, user@, example.com, .example.com, .com.pk, .pk, .
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup_hash(u...@example.com), no matches
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup (bypass_banned_checks) = undef, u...@example.com does not match
Oct 1 13:22:20 mail amavis[17226]: (17226-02) lookup (banned_filename), 1 matches for u...@example.com, results: (constant:DEFAULT)=DEFAULT
Oct 1 13:22:20 mail amavis[17226]: (17226-02) collect banned table[0]: u...@example.com, tables: DEFAULT=Amavis::Lookup::RE=ARRAY(0x8c680e8)
Oct 1 13:22:20 mail amavis[17226]: (17226-02) skip banned check for u...@example.com, same tables as previous, result =
Oct 1 13:22:20 mail amavis[17226]: (17226-02) p.path u...@example.com: P=p003,L=1,M=multipart/alternative | P=p001,L=1/1,M=text/plain,T=txt
Oct 1 13:22:20 mail amavis[17226]: (17226-02) skip banned check for u...@example.com, same tables as previous, result =
Oct 1 13:22:20 mail amavis[17226]: (17226-02) p.path u...@example.com: P=p003,L=1,M=multipart/alternative | P=p002,L=1/2,M=text/html,T=html
Oct 1 13:22:31 mail amavis[17226]: (17226-02) query_keys: u...@example.com, user@, example.com, .example.com, .com.pk, .pk, .
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup_hash(u...@example.com), no matches
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (bypass_virus_checks) = undef, u...@example.com does not match
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_tag2_level) = true, u...@example.com matches, result=4.31, matching_key=(constant:4.31)
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_tag3_level) = undef, u...@example.com does not match
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_kill_level) = true, u...@example.com matches, result=4.31, matching_key=(constant:4.31)
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (bypass_spam_checks) = true, u...@example.com matches, result=1, matching_key=(constant:1)
Oct 1 13:22:31 mail amavis[17226]: (17226-02) final_destiny PASS, recip u...@example.com
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (clean_quarantine_to) = true, u...@example.com matches, result=clean-quarantine, matching_key=(constant:clean-quarantine)
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup = undef, u...@example.com, no lookup tables
Oct 1 13:22:31 mail amavis[17226]: (17226-02) query_keys: u...@example.com, user@, example.com, .example.com, .com.pk, .pk, .
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup_hash(u...@example.com), no matches
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup_acl(u...@example.com) matches key example.com, result=1
Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (local_domains) = true, u...@example.com matches, result=1, matching_key=example.com
Oct 1 13:22:31 mail amavis[17226]: (17226-02) headers CLUSTERING: u...@example.com joining cluster
Oct 1 13:22:31 mail amavis[17226]: (17226-02) (about to connect to [127.0.0.1]:10025) FWD via SMTP: mohsinaliz...@hotmail.com - moh...@example.com,u...@example.com
Oct 1 13:22:31 mail amavis[17226]: (17226-02) sending RCPT TO:u...@example.com
Oct 1 13:22:31 mail amavis[17226]: (17226-02) response to RCPT TO for u...@example.com: 250 2.1.5 Ok
Oct 1 13:22:32 mail amavis[17226]: (17226-02) FWD via SMTP: mohsinaliz...@hotmail.com - moh...@example.com,u...@example.com, 250 2.6.0 Ok, id=17226-02, from MTA([127.0.0.1]:10025): 250 2.0.0 Ok: queued as E0EAD19B349
Oct 1 13:22:32 mail amavis[17226]: (17226-02) dsn: from MTA 250 Clean mohsinaliz...@hotmail.com - u...@example.com: on_succ=0, on_dly=1, on_fail=1, never=0, warn_sender=, DSN_passed_on=1
Oct 1 13:22:32 mail amavis[17226]: (17226-02) DSN: SUCC from MTA 250 Clean, no DSN requested:
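The per-recipient settings are easy to lose in a debug log like the one above. The following is a minimal sketch in Python that pulls the "lookup (name) = ..." entries into a small table. The regular expression is inferred only from the lines shown in this post and is an assumption, not an official amavisd-new log grammar; the names LOOKUP_RE and summarize_lookups are made up for illustration.

```python
import re

# Pattern inferred from the log excerpt in this post (an assumption, not a
# documented amavisd-new format).
LOOKUP_RE = re.compile(
    r"lookup \((?P<name>\w+)\) = (?P<value>true|undef), "
    r"(?P<recip>\S+) (?:matches, result=(?P<result>[^,\s]+)|does not match)"
)

def summarize_lookups(lines):
    """Map each lookup name to its last logged result (None when it did not match)."""
    summary = {}
    for line in lines:
        m = LOOKUP_RE.search(line)
        if m:
            summary[m.group("name")] = m.group("result")
    return summary

# Four lines copied from the log above.
sample = [
    "Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_tag2_level) = true, u...@example.com matches, result=4.31, matching_key=(constant:4.31)",
    "Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_tag3_level) = undef, u...@example.com does not match",
    "Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (spam_kill_level) = true, u...@example.com matches, result=4.31, matching_key=(constant:4.31)",
    "Oct 1 13:22:31 mail amavis[17226]: (17226-02) lookup (bypass_spam_checks) = true, u...@example.com matches, result=1, matching_key=(constant:1)",
]

summary = summarize_lookups(sample)
for name, result in summary.items():
    print(name, "=", result)
```

On these four lines the summary shows spam_tag2_level and spam_kill_level both at 4.31 and bypass_spam_checks matching with result 1, which are probably the settings worth comparing against the amavisd configuration.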
Re: SA 3.3.0 and sa-compile
I have the same problem. Is there any solution?

Regards
Zdenek Herman
zdenek.her...@ille.cz
tel: 777 730 218
http://www.cistaposta.cz

to...@starbridge.org wrote:
> Hi,
>
> I'm running SA 3.3.0 (3.3.0-alpha3-r808953) and I have a problem with compiled rules. sa-compile runs without errors, and SA seems to work fine when restarted, but some body rules are no longer detected.
>
> Example of a simple body rule (for testing):
>
> body TONIO_SPAM_TEST /toniospam/i
> describe TONIO_SPAM_TEST Mentions Generic toniospamtest
> score TONIO_SPAM_TEST 5
>
> If I comment out
> loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
> in v320.pre, the body rules work again.
>
> I've tested with SA 3.2.5 and it works fine with Rule2XSBody active.
> I've tried deleting the compiled rules and compiling again: same result.
>
> Some info on my environment:
> debian testing
> perl v5.10.0
> xsubpp version 2.200401 (from debian perl package)
> re2c version 0.13.5-1
>
> Thanks for your help
> Regards
> Tonio
>
> NB: sorry for this second post, but I made a mistake with the previous one (replying to another thread)
Re: .cn Oddity
On Thu, 1 Oct 2009, Warren Togami wrote:
> uri T_CN_URL /[^\/]+\.cn(?:$|\/|\?)/i
> describe T_CN_URL Contains a URL in the .cn domain
>
> uri T_CN_8_URL /[\/.]+\w{8}\.cn(?:$|\/|\?)/i
> describe T_CN_8_URL Contains a URL in the .cn domain of exactly 8 characters long
>
> http://ruleqa.spamassassin.org/20090930-r820211-n/T_CN_URL/detail
> Last night's masscheck. 63243 out of 124241 spam hits T_CN_URL, nearly 51%.
>
> 7263 T_CN_URL hits in 15517 spam corpus
> 7200 T_CN_8_URL hits in 15517 spam corpus
>
> Does this make any sense?
> This is funny. Could someone add this rule to the sandbox? I'm just curious.

I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Therapeutic Phrenologist - send email for affordable rate schedule.
---
Approximately 9051420 firearms legally purchased in the U.S. this year
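For reference, the hit rates quoted in this message can be worked out directly from the reported counts. This is only arithmetic on the numbers given above; the helper name pct is made up for the sketch.

```python
def pct(hits, total):
    """Percentage of hits in total, rounded to one decimal place."""
    return round(100.0 * hits / total, 1)

# Counts as reported in the message above.
masscheck_cn = pct(63243, 124241)  # T_CN_URL in the nightly masscheck
corpus_cn = pct(7263, 15517)       # T_CN_URL in the 15517-message spam corpus
corpus_cn8 = pct(7200, 15517)      # T_CN_8_URL in the same corpus
cn8_of_cn = pct(7200, 7263)        # T_CN_8_URL hits relative to T_CN_URL hits

print(masscheck_cn, corpus_cn, corpus_cn8, cn8_of_cn)
```

The last figure is the striking one: in that corpus the number of 8-character hits is almost as large as the number of .cn hits overall.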
Re: SA 3.3.0 and sa-compile
On Thu, 1 Oct 2009, Zdenek Herman wrote:
> I have same problem. Any solution ?
>
> to...@starbridge.org wrote:
>> I'm running SA 3.3.0 (3.3.0-alpha3-r808953) and I have a problem with compiled rules. sa-compile runs without errors, and SA seems to work fine when restarted, but some body rules are no longer detected.

A suggestion to both of you, based on sa-compile support requests seen earlier on the list: run sa-compile with the debug option turned on, publish the debugging output and intermediate files on a webserver somewhere, and post the URIs for that info here so they can be examined.

--
John Hardin
Re: SA 3.3.0 and sa-compile
On Thu, Oct 1, 2009 at 16:15, John Hardin jhar...@impsec.org wrote:
> A suggestion to both of you, based on sa-compile support requests seen earlier on the list: run sa-compile with the debug option turned on, publish the debugging output and intermediate files on a webserver somewhere, and post the URIs for that info here so they can be examined.

Even better: open a Bugzilla entry and do the same. That's how we track (possible) bugs and prioritize them.

--
--j.
Re: SA 3.3.0 and sa-compile
On Thu, 1 Oct 2009, Justin Mason wrote:
> even better: open a Bugzilla entry and do the same. That's how we track (possible) bugs and prioritize them.

And the Bugzilla entry could have the logs as attachments. I wasn't sure if it was appropriate to open a bug yet, but if Justin suggests it then I guess it is...

--
John Hardin
Re: .cn Oddity
John Hardin wrote:
> I note that neither is anchored at the beginning of the URI, so they may be hitting on .cn embedded somewhere within the path part. That doesn't explain 51%, though.

I run my own custom .cn TLD URI rule, and whilst it's right down in percentage terms at the moment, in the past it has certainly hit on around 50% plus of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.

uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
describe LOCAL_URI_CN contains link to Chinese tld
Re: SA 3.3.0 and sa-compile
Justin Mason wrote:
> even better: open a Bugzilla entry and do the same. That's how we track (possible) bugs and prioritize them.

Thanks for your answers. It's done:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6214
Re: .cn Oddity
On Thu, 1 Oct 2009, Ned Slider wrote:
> I run my own custom .cn tld URI rule, and whilst it's right down in percentage terms atm, in the past it has certainly hit on around 50% plus of all spam containing a URI. So depending on the corpus, I'm not surprised by the 51%.
>
> uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
> describe LOCAL_URI_CN contains link to Chinese tld

Yours may still hit .cn in the path part. May I suggest:

m;^https?://[^/?]+\.cn\b;

--
John Hardin
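The difference between the two patterns can be checked outside SpamAssassin. The sketch below uses Python's re module, which is close to but not identical to Perl's regex engine, and the sample URIs are hypothetical ones invented for illustration.

```python
import re

# Ned's rule body and John's suggested replacement, copied from the thread.
LOOSE = re.compile(r"https?://.{1,40}\.cn\b")
ANCHORED = re.compile(r"^https?://[^/?]+\.cn\b")

def hits(pattern, uri):
    """True if the pattern matches anywhere in the URI (unanchored search)."""
    return pattern.search(uri) is not None

# Hypothetical URIs, not taken from any corpus.
SAMPLES = [
    "http://abcdefgh.cn/index.html",       # .cn in the host part
    "http://example.com/files/report.cn",  # .cn only in the path part
    "http://example.cnn.com/",             # no .cn label at all
]

for uri in SAMPLES:
    print(uri, "loose:", hits(LOOSE, uri), "anchored:", hits(ANCHORED, uri))
```

The second sample is the case under discussion: the looser pattern matches because .{1,40} can run across the slash into the path, while [^/?]+ stops at the end of the host part.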
Re: .cn Oddity
On Thu, 01 Oct 2009 18:26:01 CEST, John Hardin wrote:
> m;^https?://[^/?]+\.cn\b;

Replace ; with /, no?

m/\bhttps?://[^/?]+\.cn\b/i

--
xpoint
Re: .cn Oddity
From: John Hardin jhar...@impsec.org
Sent: Thursday, 2009/October/01 09:26

> On Thu, 1 Oct 2009, Ned Slider wrote:
>> uri LOCAL_URI_CN m{https?://.{1,40}\.cn\b}
>> describe LOCAL_URI_CN contains link to Chinese tld
>
> Yours may still hit .cn in the path part. May I suggest:
>
> m;^https?://[^/?]+\.cn\b;

Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works, not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.

{^_-}
Re: .cn Oddity
On Thu, 1 Oct 2009, Benny Pedersen wrote:
> On Thu, 01 Oct 2009 18:26:01 CEST, John Hardin wrote:
>> m;^https?://[^/?]+\.cn\b;
>
> replace ; with / no ?
>
> m/\bhttps?://[^/?]+\.cn\b/i

No. The point of m; is that you can embed / in the RE without escaping it. You are changing the RE delimiters. m{...} is fine _if_ you don't use {m,n} syntax; if you do, it becomes confusing.

--
John Hardin
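To make the delimiter point concrete, here is a small Python sketch that counts how many characters in a pattern body would have to be backslash-escaped for a given delimiter. It is a simplified model of delimiter scanning written for this illustration (the function name unescaped_positions is made up), not a reimplementation of Perl's parser.

```python
def unescaped_positions(body, delim):
    """Indexes in `body` where `delim` appears without a preceding backslash."""
    positions = []
    i = 0
    while i < len(body):
        if body[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if body[i] == delim:
            positions.append(i)
        i += 1
    return positions

# Body of the pattern suggested in the thread, without its delimiters.
body = r"^https?://[^/?]+\.cn\b"

print(len(unescaped_positions(body, "/")))  # slashes that m/.../ would need escaped
print(len(unescaped_positions(body, ";")))  # semicolons that m;...; would need escaped
```

With / as the delimiter, the three slashes in the body would each need a backslash; with ; as the delimiter nothing needs escaping, which is the reason for writing m;...; in the first place.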
Re: SA 3.3.0 and sa-compile
On Thu, 01 Oct 2009 18:09:38 CEST, to...@starbridge.org wrote:
> thank for your answers. It's done:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6214

Also run

spamassassin 2>&1 -D -t <msg >output.log

and then run it again with the plugin disabled, to show that it works in that case (again saving the output to a log file). Add the output logs to the ticket.

--
xpoint
Re: Understanding the hostKarma Lists
From: Marc Perkel m...@perkel.com
Sent: Wednesday, 2009/September/30 16:41

> Blaine Fleming wrote:
>> Marc Perkel wrote:
>>> I like it.
>>>
>>> RCVD_IN_HOSTKARMA_BL
>>> RCVD_IN_HOSTKARMA_WL
>>> RCVD_IN_HOSTKARMA_YL
>>> RCVD_IN_HOSTKARMA_BR
>>>
>>> Let's go with it.
>>
>> Marc, have you updated your wiki to reflect the new rules? I think that will pretty well settle any debate or question people have.
>>
>> --Blaine
>
> Yes - the wiki is updated.

I installed it on my personal mail for testing, Marc. I forwarded an email that failed within minutes of installing it. The bozo was in the whitelist and hit quite a few rules, including a 5.0001 Bayes 99. It still got through with a 4.9 total because of the bogus whitelist rule hit and its bogus score.

"Whitelists aren't" is my rule.

{^_^}
Re: Understanding the hostKarma Lists
On 10/01/2009 12:42 PM, jdow wrote:
> I installed it on my personal mail for testing, Marc. I forwarded an email that failed within minutes of installing it. The bozo was in the whitelist and hit quite a few rules including a 5.0001 Bayes 99. It still got through with a 4.9 total because of the bogus whitelist rule hit and its bogus score.

SpamAssassin's default scores do not give big negative scores to any of the whitelist rules, for a good reason. They are mainly informational.

Warren
Re: .cn Oddity
On Thu, 1 Oct 2009, jdow wrote:
> Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD.

What works is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, have a good S/O, and get a high score.

I too am surprised that .cn TLDs appear in 51% of the spam corpus, but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in China, and I _do_ get a fair amount of bulk email from manufacturers in China purportedly looking for business partners.

--
John Hardin
Re: .cn Oddity
On 10/01/2009 01:05 PM, John Hardin wrote:
> I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in china, and I _do_ get a fair amount of bulk emails from manufacturers in china purportedly looking for business partners.

The oddity I was pointing out at the beginning of the thread is not the prevalence of .cn URIs, but rather that most of them appear to be exactly 8 characters long.

Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?

Warren
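What the T_CN_8_URL pattern does and does not match can be tried out with a short Python sketch. Python's re module stands in for Perl here, which is an approximation, and the sample URIs are hypothetical.

```python
import re

# Body of the T_CN_8_URL rule from the start of the thread; the /i flag
# becomes re.IGNORECASE and the escaped slashes are written plainly.
CN_8 = re.compile(r"[/.]+\w{8}\.cn(?:$|/|\?)", re.IGNORECASE)

def is_cn8(uri):
    """True if the URI contains an 8-character label directly before .cn."""
    return CN_8.search(uri) is not None

# Hypothetical URIs, invented for illustration.
samples = [
    "http://abcdefgh.cn/",             # 8-character label
    "http://www.abcdefgh.cn",          # 8-character label behind a subdomain
    "http://abcdefghi.cn/",            # 9 characters
    "http://abc.cn/",                  # 3 characters
    "http://example.com/abcdefgh.cn",  # 8 characters, but in the path part
]

for uri in samples:
    print(uri, is_cn8(uri))
```

Because the label must be preceded by a / or a ., longer labels do not match on their last 8 characters. The final sample shows that the rule, being unanchored, can also hit on a path component.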
Re: .cn Oddity
On 10/01/2009 01:16 PM, Warren Togami wrote:
> The Oddity I was pointing out at the beginning of the thread is not prevalence of .cn URI's, but rather most of them appear to be exactly 8 characters long.
>
> Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpa?

(And yes, I'm fully aware that even this narrowed rule is prejudiced and unsafe. This is partly out of curiosity, and partly wondering whether it could be made useful if combined with something else in a boolean meta rule.)

Warren
Re: Hostkarma: to be or not to be in SA defaults
SM wrote:
> Hi Marc,
> At 09:32 30-09-2009, Marc Perkel wrote:
>> I have a lot of mighty servers set up and have servers at 4 locations. I have 50mb bought and am using about 30 of it now. I am not sure what it takes to support a default SA inclusion. Does anyone know if what I described sounds like it is enough?
>
> They can still be a soft target. Most of the DNSBLs were unprepared to deal with denial of service attacks. Some of them have closed down after an attack. That can be a problem for users, as most people have a configure-and-forget setup or a default vendor setup.
>
> The bandwidth may be enough for current usage. The more mirrors you have, the better. If your DNSBL is effective, you might be able to get help with that. The problems with your setup are no worse than those of other resources that are commonly used by users from this mailing list.
>
> Someone pointed out that it's not a good idea to do more DNS lookups as it affects the performance of SpamAssassin. It does not matter whether your DNSBL is included in the default configuration, as people will use it if they believe that it is effective in stopping spam. If you are concerned about marketing, then it may matter to you. :-)
>
> Regards,
> -sm

I guess that if HOSTKARMA were included in the default build then I would need more mirrors to handle the load.
Re: Understanding the hostKarma Lists
Updated that as well.

R-Elists wrote:
> marc
>
> dont forget this one
>
> http://wiki.apache.org/spamassassin/MarcPerkelsExperiments
>
> - rh
>
>> From: Marc Perkel [mailto:m...@perkel.com]
>> snip
>> Yes - the wiki is updated.
Re: .cn Oddity
Warren Togami wrote:
> The Oddity I was pointing out at the beginning of the thread is not prevalence of .cn URI's, but rather most of them appear to be exactly 8 characters long.
>
> Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpa?

Warren,

Seems to hold true here to an extent. From my recent confirmed spam archive I see:

# cat spam* | grep '\.cn\b' | grep http | wc -l
1088
# cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l
908
# cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l
23

so about 85% of .cn URIs ((908 + 23) out of 1088) also match the {8}.cn pattern. Not quite as high as your findings, but very high nevertheless.
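The 85% figure can be reproduced from the three counts, and the grep pipeline itself is easy to mirror in Python. This is a sketch under assumptions: the helper names are made up, the demo lines are hypothetical, and, like the grep commands, it counts matching lines rather than individual URIs.

```python
import re

def count_lines(lines, pattern):
    """Count lines matching `pattern` that also contain 'http',
    roughly like: grep PATTERN | grep http | wc -l"""
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line) and "http" in line)

def share(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts reported in the message above.
total_cn, dot_8, slash_8 = 1088, 908, 23
print(share(dot_8 + slash_8, total_cn))

# Hypothetical lines, only to show the counting helper at work.
demo = [
    "visit http://abcdefgh.cn/ now",
    "see http://www.abcdefgh.cn/x",
    "http://shop.cn/",
    "no link here .cn",
]
print(count_lines(demo, r"\.cn\b"))
print(count_lines(demo, r"\.\w{8}\.cn\b"))
print(count_lines(demo, r"/\w{8}\.cn\b"))
```

The combined share works out to a little under 86%, in line with the roughly 85% quoted. The two 8-character patterns are disjoint (the label is preceded by either a dot or a slash), so adding their counts does not double-count a match, although a single line containing both forms would be counted once by each grep.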
Re: .cn Oddity
From: Warren Togami wtog...@redhat.com
Sent: Thursday, 2009/October/01 10:24

> (And yes I'm fully aware even this narrowed rule is prejudiced and unsafe. This is is partly out of curiosity, and also wondering if it could be made useful if meta booleaned with something else.)

I just had a thought, Warren. Look up Chinese numerology. 8 signifies wealth or sudden prosperity. Conversely, I suspect few Chinese names are four characters. Four is a pun on death. Some social sites might like 5 letters - me. 7 is right out, it's a vulgar word in Cantonese. 9 is also slang or vulgar in Cantonese.

I wonder how many companies that deal with China have figured out that an 888 toll free number is WONDERFUL: wealth, wealth, wealth. I understand numerology is quite important to the Chinese.

(Of course, I am not claiming to be an expert. The above is mostly Wikipoodle and surmise.)

{^_-}
Re: .cn Oddity
From: Ned Slider n...@unixmail.co.uk Sent: Thursday, 2009/October/01 10:48 Warren Togami wrote: On 10/01/2009 01:05 PM, John Hardin wrote: On Thu, 1 Oct 2009, jdow wrote: From: John Hardin jhar...@impsec.org Yours may still hit .cn in the path part. May I suggest: m;^https?://[^/?]+\.cn\b; Regardless of their correctness, would you care to expound on the success of these two rules, John? I like what works not political correctness. I think these are two interesting observations. Of course, they won't work very well for somebody doing business with China or embedded within the .cn TLD. what works is based on the accuracy of the corpora. If the corpora show lots of spam with .cn TLD URIs and little or no ham with such, then that rule will hit often, and have a good S/O, and get a high score. I too am surprised that .cn TLDs appear in 51% of the spam corpus but I haven't looked into it in any detail. I can certainly check it against my own corpora and see if it's reasonable - but then again, I don't do any business with anyone in china, and I _do_ get a fair amount of bulk emails from manufacturers in china purportedly looking for business partners. The Oddity I was pointing out at the beginning of the thread is not prevalence of .cn URI's, but rather most of them appear to be exactly 8 characters long. Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpa? Warren Warren, Seems to hold true here to an extent. From my recent confirmed spam archive I see: # cat spam* | grep '\.cn\b' | grep http | wc -l 1088 # cat spam* | grep '\.\w\{8\}\.cn\b' | grep http | wc -l 908 # cat spam* | grep '\/\w\{8\}\.cn\b' | grep http | wc -l 23 so 85% of .cn URIs also match the {8}.cn pattern. Not quite as high as your findings, but very high nevertheless. Based on my last note about Chinese numerology I bet if you have a large Chinese ham corpus you'd pick up on 8 as a magic number there, too. 
I am intrigued enough I'd LOVE to know if that's right. {^_^}
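For anyone who wants to repeat the count on their own corpus, here is a minimal sketch in Python. The first pattern is John Hardin's host-anchored suggestion carried over from Perl; the second pattern, the helper name `cn_stats`, and the sample URLs are made up for illustration and are not the actual T_CN_8_URL rule. The last lines only redo the arithmetic on Ned's three grep counts, assuming the two 8-character greps did not hit the same lines.

```python
import re

# John Hardin's suggestion, m;^https?://[^/?]+\.cn\b; in Python syntax.
# It only looks at the host part, so ".cn" in the path is ignored.
CN_HOST = re.compile(r"^https?://[^/?]+\.cn\b")

# Hypothetical pattern: the label directly before ".cn" is exactly
# 8 word characters, optionally preceded by other dot-separated labels.
CN_8 = re.compile(r"^https?://(?:[^/?.]+\.)*\w{8}\.cn\b")


def cn_stats(urls):
    """Return (number of .cn host URLs, number of those with an 8-char label)."""
    cn = [u for u in urls if CN_HOST.search(u)]
    eight = [u for u in cn if CN_8.search(u)]
    return len(cn), len(eight)


# Made-up sample URLs, not taken from any corpus.
sample = [
    "http://abcdefgh.cn/",                 # 8-char label
    "http://www.qwertyui.cn/offer?id=1",   # 8-char label behind "www."
    "http://short.cn/",                    # .cn host, but only 5 chars
    "http://example.com/mirror.cn/page",   # ".cn" only in the path
]
total, eight = cn_stats(sample)
print(total, eight)  # 3 2

# Ned's numbers: 1088 lines with a .cn URL, 908 + 23 with an 8-char label.
ned_share = (908 + 23) / 1088
print(f"{ned_share:.1%}")  # 85.6%
```

Note that Ned's greps count matching lines rather than individual URIs, so the percentage is an approximation either way.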
Re: .cn Oddity
On Thu, 1 Oct 2009, Warren Togami wrote:

The Oddity I was pointing out at the beginning of the thread is not prevalence of .cn URIs, but rather most of them appear to be exactly 8 characters long. Could someone please commit my T_CN_8_URL rule to the sandbox so we can see if that trend holds beyond my own corpora?

I've put a .CN 8 URI rule into my sandbox file but it may be a few days before it gets committed, my stuff is in flux right now...

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
USMC Rules of Gunfighting #9: Accuracy is relative: most combat shooting standards will be more dependent on pucker factor than the inherent accuracy of the gun.
---
Approximately 9055560 firearms legally purchased in the U.S. this year
Re: DNSWL and JMF White false positives, what to do exactly?
Karsten Bräckelmann wrote:
On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
Warren Togami wrote:

I scanned my spam folders and found a few false positives that hit on either DNSWL

FP with DNSWL? FP = False Positive = legitimate mail tagged as spam. DNSWL = Whitelist

False positive. Something, that matches (positive) the criterion for a certain test, but should not (false).

if your system adds points because of dnswl, you have a serious problem. .. or do you mean FN (false negative)?

Granted, the wording (FPs that hit ham rules) could need some polish, but I believe Warren was talking about spam that falsely hits ham rules.

you can certainly devise a system to detect alpha(foo) where alpha is a function mapping a Banach space to a Hilbert Space, and define what FP, FN, FX mean in the context you consider. you can also say let PI=69, ... . but conventions are here for a reason. they allow us to understand each other more easily. the fact that children of today can solve computation problems that great scientists of the old times couldn't handle is thanks to conventions (think of a/b * c/d = (a*c)/(b*d), which looks trivial today, but wasn't before). when talking about spam or intrusion detection, FN means missing and FP means false alarm. if we allow defining FN and FP differently, then we'll need to rewrite a lot of books, reports, articles, ...
Re: DNSWL and JMF White false positives, what to do exactly?
RW wrote:
On Wed, 30 Sep 2009 23:35:31 +0200 mouss mo...@ml.netoyen.net wrote:
Warren Togami wrote:

I scanned my spam folders and found a few false positives that hit on either DNSWL

FP with DNSWL? FP = False Positive = legitimate mail tagged as spam. DNSWL = Whitelist

The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.

spam too can be (re)defined. and actually any term. but it is assumed here that we talk about spam detection. so false negative means miss and false positive means false alarm. this is the common terminology inherited from intrusion detection. I used to have a clock that was anti-clockwise. but it was for fun. I always understood what clockwise meant.
Re: DNSWL and JMF White false positives, what to do exactly?
On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
Karsten Bräckelmann wrote:

False positive. Something, that matches (positive) the criterion for a certain test, but should not (false).

I stand by what I said.

you can certainly devise a system to detect alpha(foo) where alpha is a function mapping a Banach space to a Hilbert Space, and define what FP, FN, FX mean in the context you consider. you can also say let PI=69, ... . but conventions are here for a reason. they allow us to understand each other more easily. the fact that children of today can solve computation problems that great scientists of the old times couldn't handle is thanks to conventions (think of a/b * c/d = (a*c)/(b*d), which looks trivial today, but wasn't before). when talking about spam or intrusion detection, FN means missing and FP means false alarm. if we allow defining FN and FP differently, then we'll need to rewrite a lot of books, reports, articles, ...

IFF you are talking about the black box that spam detection is, that is true. If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to be that simple. However, it is not. You are looking at a single test, which -- if positive -- either is correct or wrong. Same for RCVD_IN_DNSWL. If it positively matches, it is either correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.

This concept is NOT specific to spam detection, or even computer science. As a matter of fact, when I first really grasped the concept, a medical scientist explained it to me.

Yes, a FP for a rule that identifies *ham* actually evaluated positive on a spam. It only appears to be spam centric on this list, because it is mainly dedicated to identifying spam, not ham.

You might want to ask wikipedia as well. And don't focus on the spam filtering *example*, which again exclusively talks about a rule identifying spam. Not ham.
-- char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1: (c=*++x); c128 (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: DNSWL and JMF White false positives, what to do exactly?
On Oct 1, 2009, at 18:36, Karsten Bräckelmann guent...@rudersport.de wrote:

Same for RCVD_IN_DNSWL. If it positively matches, it is either correct or wrong. A false positive is a match that is wrong. No matter the score you assign the test.

Like others have said, you can make the words mean whatever you want. However, if you want to be understood you need to speak the Lingua Franca. If you choose to use a term differently than everyone else you WILL be misunderstood and corrected. Saying everyone else is wrong isn't going to help.
Re: DNSWL and JMF White false positives, what to do exactly?
On Fri, 02 Oct 2009 00:14:52 +0200 mouss mo...@ml.netoyen.net wrote:
RW wrote:

The term false-positive can apply to any test. A test for ham that matches a spam is a false-positive, it's a matter of context.

spam too can be (re)defined. and actually any term. but it is assumed here that we talk about spam detection. so false negative means miss and false positive means false alarm. this is the common terminology inherited from intrusion detection.

The term comes from statistics, not intrusion detection. I don't know much about the latter; perhaps people in that field are a little sloppy in their usage, but more likely all the tests are expressed as tests for intrusion, so the same kind of issue doesn't arise.

The source of your confusion is that you are mixing up the terminology of the overall classification and individual test results. Think of it this way: in a fingerprint comparison the meanings of TP, TN, FP and FN are obvious and intrinsic to the test; it would be absurd to switch them around depending on whether it's evidence for the defence or prosecution.
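To make the per-test reading concrete, here is a rough sketch in Python that tallies TP/FP/TN/FN twice over the same messages: once for a single ham rule, once for the filter's final verdict. The rule name RCVD_IN_DNSWL is from the thread; the five records, the field names and the `confusion` helper are invented for illustration, and the sketch takes no side on which usage is the convention on this list.

```python
from collections import Counter


def confusion(records, predicted, actual):
    """Tally TP/FP/TN/FN for one binary test.

    predicted(r): did the test fire on record r?
    actual(r):    should it have fired on record r?
    """
    counts = Counter()
    for r in records:
        p, a = predicted(r), actual(r)
        # "T"/"F": was the outcome right?  "P"/"N": did the test fire?
        counts[("T" if p == a else "F") + ("P" if p else "N")] += 1
    return counts


# Made-up messages: true class, whether the whitelist rule hit, final verdict.
mail = [
    {"truth": "ham",  "dnswl_hit": True,  "verdict": "ham"},
    {"truth": "ham",  "dnswl_hit": False, "verdict": "ham"},
    {"truth": "spam", "dnswl_hit": True,  "verdict": "ham"},   # whitelisted spam
    {"truth": "spam", "dnswl_hit": False, "verdict": "spam"},
    {"truth": "ham",  "dnswl_hit": False, "verdict": "spam"},  # blocked ham
]

# RCVD_IN_DNSWL seen as a test for ham.
rule = confusion(mail,
                 lambda r: r["dnswl_hit"],
                 lambda r: r["truth"] == "ham")

# The whole filter seen as a test for spam.
overall = confusion(mail,
                    lambda r: r["verdict"] == "spam",
                    lambda r: r["truth"] == "spam")

print(dict(rule))
print(dict(overall))
```

In this toy data the third message is counted as an FP for the ham rule and, at the same time, as an FN for the filter as a whole, which is exactly the case the two sides of the thread are naming differently.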
Re: Do I need to do anything to maintain MySQL?
On 09/24/09 09:21, quoth Benny Pedersen:
On tor 24 sep 2009 04:57:57 CEST, Steven W. Orr wrote

Since I haven't *ever* touched this table for cleanup, the above described cron job will not delete any rows for that period of time.

you will have fewer problems with InnoDB than with MyISAM. here is my complete spamassassin sql setup, not showing the tables that are standard here

CREATE TABLE `awl` (
  `username` varchar(100) NOT NULL default '',
  `email` varchar(200) NOT NULL default '',
  `ip` varchar(10) NOT NULL default '',
  `count` int(11) default '0',
  `totscore` float default '0',
  `lastupdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY (`username`,`email`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `bayes_seen` (
  `id` int(11) NOT NULL default '0',
  `msgid` varchar(200) character set utf8 collate utf8_bin NOT NULL default '',
  `flag` char(1) NOT NULL default '',
  `lastupdate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`,`msgid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

these 2 tables will need to be expired from cron

CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL default '0',
  `token` char(5) NOT NULL default '',
  `spam_count` int(11) NOT NULL default '0',
  `ham_count` int(11) NOT NULL default '0',
  `atime` int(11) NOT NULL default '0',
  PRIMARY KEY (`id`,`token`),
  KEY `bayes_token_idx1` (`token`),
  KEY `bayes_token_idx2` (`id`,`atime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

the last table will expire in the standard way. this setup is working in 3.2.5 and it's not bogging down my mysql server

if you set lastupdate to now() in your existing db, all rows will be dated today even though they were not really added today, but the expire will still work okay later

I have all my SA tables up and running using InnoDB and using the above table definitions.
I just have one question: Will the cronjob that was described here earlier

#!/bin/sh
howfar='where lastupdate < date_sub(now(), interval 3 month)'
mysql -h localhost -u sa -pssaa spamassassin <<EOF
delete from awl $howfar ;
delete from bayes_seen $howfar ;
EOF

also clean up the bayes_token table, or is there another cron job I should use for that?

And, why is bayes_token.atime int(11) instead of timestamp NOT NULL default CURRENT_TIMESTAMP on update ? Is this a part of the design or is it more efficient?

TIA

--
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net