Re: Distributed Bayes DB?
Matthias Leisi wrote:
Hello List,
How would you set up a distributed Bayes DB? In this context, distributed means that I have four mailserver machines in parallel (all with equal MX priority) where I want to run SpamAssassin's Bayes filtering, without introducing a single point of failure (e.g. a central database). All servers should thus run with local Bayes DBs.

No, they shouldn't... there are better ways.

In order to avoid that they diverge too much, either
1) the files are copied from one machine to the others once a day (or twice, ...), or
2) the files are merged and re-distributed to all four machines once a day (or twice, ...).
Do you see additional options?

Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. Example with MySQL:
http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

SA 3.0.0 and higher supports generic SQL, as well as MySQL- and Postgres-optimized backends for Bayes storage. This is THE way to have multiple servers share a Bayes database, because it's what SQL was designed to do. Anything else is a hack at best.

See bayes_store_module and the bayes_sql_* options in the Conf manpage:
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html
Also see the SQL readme:
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

What is the best practice in that regard with SpamAssassin?

Using SQL is by far the best practice here.

Is it even possible to merge Bayes DBs (and if yes, how)?

No.

Btw., I would like a similar setup for the autowhitelist (AWL), where I think a simple file copy (i.e. option 1 above) is sufficient.

Ditto. See auto_whitelist_factory in the AWL plugin manpage (assuming SA 3.1.x):
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_AWL.html

Thanks for your input,
-- Matthias
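A minimal sketch of what the SQL suggestion amounts to on each of the four MXes: the same handful of config lines pointing Bayes (and, optionally, the AWL) at one SQL server. It is written as a small Python helper that renders those lines. The option names are the ones documented in the 3.1.x Conf and AWL plugin manpages linked above; the DSN, the database name `spamassassin`, the host `sqlhost` and the credentials are hypothetical placeholders. The SQL schema from the readme must exist before SA can use it, and the option names should be checked against the manpages of the version you actually run.

```python
"""Render local.cf lines that point SpamAssassin's Bayes store and AWL
at one shared MySQL database, so every MX uses the same data."""


def shared_sql_config(dsn: str, username: str, password: str) -> str:
    """Return config text; dsn/username/password are site-specific placeholders."""
    lines = [
        # Bayes tokens in MySQL instead of the default DB_File store
        "bayes_store_module Mail::SpamAssassin::BayesStore::MySQL",
        f"bayes_sql_dsn {dsn}",
        f"bayes_sql_username {username}",
        f"bayes_sql_password {password}",
        # AWL entries in the same SQL server (see auto_whitelist_factory)
        "auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList",
        f"user_awl_dsn {dsn}",
        f"user_awl_sql_username {username}",
        f"user_awl_sql_password {password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host, database and credentials; identical on all four MXes.
    print(shared_sql_config("DBI:mysql:spamassassin:sqlhost", "sa_user", "secret"), end="")
```

Because every MX gets byte-identical lines, all of them read and write one token store, and there is nothing left to copy or merge.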
Re: Distributed Bayes DB?
Matt Kettler wrote:
Do you see additional options?

Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. [..] Also see the SQL readme:
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

I already took a look at using SQL, but this quote:
| NB: This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.
stops me from using it. Unfortunately, I cannot run software officially considered beta on this system.

Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. Example with MySQL:
http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

I suppose that every message passed through SpamAssassin will issue at least one query and one update statement to the DB. How does a MySQL cluster perform with 500,000 messages per day, considering that replication must also take place?

What is the best practice in that regard with SpamAssassin?
Using SQL is by far the best practice here.

I do not see many mentions of the SQL approach, either because it is not used much or because it works so well?

Thanks,
-- Matthias
Re: Well, that didn't take very bloody long
From: Steve Lake [EMAIL PROTECTED]
OK, remember those "Name Wrote: :)" emails? They've completely changed. Now it's "hi username" instead. Joy, oh joy. Can anyone find any common elements in these emails? Because whoever this putz is, they're adapting a lot. They hit us, we adapt, they immediately change tactics and come at us again. Now with all the brilliant minds on this mailing list, we really should be able to find out who this putz is and nail all his stuff regardless of what tactic he switches to.

I believe the record will show that I more or less predicted this with the first postings of the "wrote" spam. Obvious single features that are easily changeable are lousy for use as rules. I figure they are digital prestidigitation: misdirecting your eye to where they want you to look so you don't notice the hard-to-change features.
{^_-}
RE: Distributed Bayes DB?
-----Original Message-----
From: Matthias Leisi [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 4:48 AM
To: users@spamassassin.apache.org
Subject: Re: Distributed Bayes DB?

Matt Kettler wrote:
Do you see additional options?
Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. [..]

Or just use MySQL with replication? Put MySQL on two servers and replicate the data using MySQL's built-in (not beta) database replication? For load balancing/failover, use something like CARP (on *BSD systems), or similar things on Linux? Use a load-balancing IP address as the target of the MX as well as the target for SQL? Can't Linux itself do IP clustering?

You could also contact me off-list for information on how we have this solved; we have systems that are doing 10 million emails per day.
OT : MailScanner
Hi,
I need some input from the experts. I am planning to switch to Postfix + MailScanner + SA + ClamAV, and I want to know one thing before doing that. I have Kaspersky Linux Edition. Can I create two antivirus scanning layers in MailScanner?

Warm Regards,
Suhas
System Administrator
QualiSpace - A QuantumPages Enterprise
An ICANN Accredited Domain Registrar
URL: http://www.qualispace.com
QualiSpace Community Discussion forum: http://forum.qualispace.com
Re: Distributed Bayes DB? (SQL usage)
Matthias Leisi wrote:
Matt Kettler wrote:
Do you see additional options?
Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. [..] Also see the SQL readme:
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

I already took a look at using SQL, but this quote:
| NB: This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.
stops me from using it. Unfortunately, I cannot run software officially considered beta on this system.

I think that documentation line is obsolete, and has probably been overlooked for a long time. SQL support has been in SA since 2004, and was touted as a major feature of SA 3.0.0.
http://mail-archives.apache.org/mod_mbox/spamassassin-announce/200409.mbox/browser

The 3.1.0 release announcement declared SQL to be THE preferred method for Bayes storage, even for single-box setups.
http://mail-archives.apache.org/mod_mbox/spamassassin-announce/200509.mbox/[EMAIL PROTECTED]
- added PostgreSQL, MySQL 4.1+, and local SDBM file Bayes storage modules. SQL storage is now recommended for Bayes, instead of DB_File. NDBM_File support has been dropped due to a major bug in that module.

That said, yes, they might change the schema or operation in a future version... but the same goes for DB files. It's happened once already. But this is not beta; it's the recommended configuration.

Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. Example with MySQL:
http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

I suppose that every message passed through SpamAssassin will issue at least one query and one update statement to the DB. How does a MySQL cluster perform with 500,000 messages per day, considering that replication must also take place?
*MUCH* faster than the default Berkeley DB does:
http://wiki.apache.org/spamassassin/BayesBenchmarkResults
MySQL with MyISAM tables completed the test in 56% of the time DBM took. Admittedly that's over lo (the loopback interface), not the wire, but you get the point. In general, SQL is more efficient and faster than the default Berkeley DB.

SDBM is faster still, but it's got some issues with the dump/restore process last I checked, so conversion to SDBM is not very practical. I'd consider SDBM neither well supported nor well tested, although I do use it on my boxes.

What is the best practice in that regard with SpamAssassin?
Using SQL is by far the best practice here.

I do not see many mentions of the SQL approach, either because it is not used much or because it works so well?

Erm, really? It seems to get talked about here a lot. And the official recommendation in the release announcement is hard to overlook.
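To put the 500,000 messages per day into perspective, here is a back-of-envelope sketch of the average SQL load. It assumes Matthias's own estimate of one query plus one update per message; the number of statements SpamAssassin actually issues per message depends on the backend and may well be higher, and real mail traffic is bursty, so peak load will exceed the daily average.

```python
"""Back-of-envelope SQL load for a Bayes database shared by several MXes."""

SECONDS_PER_DAY = 24 * 60 * 60


def average_sql_load(messages_per_day: int,
                     statements_per_message: int = 2) -> tuple[float, float]:
    """Return (messages/second, SQL statements/second) averaged over a day.

    statements_per_message=2 is the one-query-plus-one-update estimate from
    the thread; the real count depends on the backend and may be higher.
    """
    msgs_per_second = messages_per_day / SECONDS_PER_DAY
    return msgs_per_second, msgs_per_second * statements_per_message


if __name__ == "__main__":
    msgs, stmts = average_sql_load(500_000)
    print(f"{msgs:.2f} msg/s, {stmts:.2f} statements/s")
```

At 500,000 messages per day this works out to roughly 5.8 messages and 11.6 statements per second on average, a modest rate for an RDBMS; the number worth sizing for is the peak hour, not the average.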
Re: Distributed Bayes DB?
Michael Scheidell wrote:
-----Original Message-----
From: Matthias Leisi [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 4:48 AM
To: users@spamassassin.apache.org
Subject: Re: Distributed Bayes DB?

Matt Kettler wrote:
Do you see additional options?
Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. [..]

Or just use MySQL with replication? Put MySQL on two servers and replicate the data using MySQL's built-in (not beta) database replication?

Actually, his point wasn't that the SQL clustering was beta, but that the SQL readme on the wiki claims that SA's SQL Bayes backend is beta... But that's just an oops.
RE: OT : MailScanner
Yes, you can, and many of us MailScanner users do run two or more virus scanners. You should join the MailScanner users' mailing list; we are a helpful lot.
Phil

From: Suhas (QualiSpace) [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 10:17 AM
To: users@spamassassin.apache.org
Subject: OT : MailScanner

[..] Can I create two antivirus scanning layers in MailScanner?
Re: Distributed Bayes DB?
On 11.11.2006 at 10:48, Matthias Leisi wrote:
I already took a look at using SQL, but this quote:
| NB: This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.
stops me from using it. Unfortunately, I cannot run software officially considered beta on this system.

I suppose you could use something like NFS so that all systems share the same DB, config files, etc.

Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. Example with MySQL:
http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

I suppose that every message passed through SpamAssassin will issue at least one query and one update statement to the DB. How does a MySQL cluster perform with 500,000 messages per day, considering that replication must also take place?

How long is a piece of string? 500,000 queries per day shouldn't cause any problems for an RDBMS, but the architecture of such a system should be given a bit of consideration: connection pooling et al. There is in fact a mail system that uses PostgreSQL to store all the mails, if you want more information on requirements, speed, etc. I'm pretty sure you could run SpamAssassin on top of it.

What is the best practice in that regard with SpamAssassin?
Using SQL is by far the best practice here.

I do not see many mentions of the SQL approach, either because it is not used much or because it works so well?

Probably the former. And you're right not to use something like the SQL backend for a large-volume production system. Not because it's unreliable, but because it's still in development, and keeping the schema up to date could become a real headache. I suspect that at some point it might make sense to use something like SQLite for persistence (because it's relatively easy to distribute), which would make using alternative backends relatively easy.

Charlie
--
Charlie Clark
Helmholtzstr. 20
D-40215 Düsseldorf
Tel: +49-211-938-5360
GSM: +49-178-782-6226
Re: OT : MailScanner
Suhas (QualiSpace) wrote:
Hi, I need some input from the experts. I am planning to switch to Postfix + MailScanner + SA + ClamAV, and I want to know one thing before doing that. I have Kaspersky Linux Edition. Can I create two antivirus scanning layers in MailScanner?

A) It's probably better to ask over on the MailScanner list.
B) Yes. You can have multiple virus scanners with MailScanner. And I'm pretty sure you can use Kaspersky with it.
Re: Distributed Bayes DB?
Matthias Leisi wrote:
Matt Kettler wrote:
Do you see additional options?
Use a SQL server backend. If you must have a no-failure option for the Bayes DB, use a cluster of SQL servers. [..] Also see the SQL readme:
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

I already took a look at using SQL, but this quote:
| NB: This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.
stops me from using it. Unfortunately, I cannot run software officially considered beta on this system.

Like Matt mentioned, this is an oops. I've been using global SQL Bayes ever since the 3.0.0 release (about 2 years now), and the same for the AWL (which I later disabled for lack of janitor tools). It's rock stable and quite fast (though on a dedicated server). For redundancy, look at DRBL or something similar.
- dhawal
RE: Well, that didn't take very bloody long
But most of us aren't clever enough with Perl REs to construct the rule to go with it. So where's the rule to match, folks?
Cheers,
Phil

-----Original Message-----
From: Tony Finch [mailto:[EMAIL PROTECTED] On Behalf Of Tony Finch
Sent: Friday, November 10, 2006 9:49 PM
To: Steve Lake
Cc: users@spamassassin.apache.org
Subject: Re: Well, that didn't take very bloody long

On Fri, 10 Nov 2006, Steve Lake wrote:
OK, remember those "Name Wrote: :)" emails? They've completely changed. Now it's "hi username" instead. Joy, oh joy. Can anyone find any common elements in these emails? Because whoever this putz is, they're adapting a lot.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/90322

Tony.
--
f.a.n.finch [EMAIL PROTECTED] http://dotat.at/
VIKING: SOUTHERLY VEERING WESTERLY 6 TO GALE 8, OCCASIONALLY SEVERE GALE 9. HIGH. RAIN THEN SHOWERS. MODERATE OR GOOD.
Re: Distributed Bayes DB?
Charlie Clark wrote:
On 11.11.2006 at 10:48, Matthias Leisi wrote:
I already took a look at using SQL, but this quote:
| NB: This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.
stops me from using it. Unfortunately, I cannot run software officially considered beta on this system.

I suppose you could use something like NFS so that all systems share the same DB, config files, etc.

NFS would be HIGHLY not recommended.
http://article.gmane.org/gmane.mail.spam.spamassassin.general/72362/match=sql
In fact, I personally would suggest never using NFS for anything at all, and I'm shocked that you'd even consider using it for any production purpose.

Besides, the point here is to eliminate any single point of failure. NFS would offer no redundancy at all. If the server hosting the NFS share went down, the Bayes DB would be unavailable.

I do not see many mentions of the SQL approach, either because it is not used much or because it works so well?

Probably the former. And you're right not to use something like the SQL backend for a large-volume production system. Not because it's unreliable, but because it's still in development, and keeping the schema up to date could become a real headache.

But it's not still in development... It's the recommended configuration as of 3.1.0. SA's SQL support is solid. I personally don't use it, but many here do.
Re: Distributed Bayes DB?
Dhawal Doshy wrote:
[..] For redundancy, look at DRBL or something similar.

that should be DRBD
- dhawal
RE: Distributed Bayes DB?
-----Original Message-----
From: Matt Kettler [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 5:23 AM
To: Michael Scheidell
Cc: users@spamassassin.apache.org
Subject: Re: Distributed Bayes DB?

Actually, his point wasn't that the SQL clustering was beta, but that the SQL readme on the wiki claims that SA's SQL Bayes backend is beta... But that's just an oops.

I have asked, on this list and the amavisd list, at least twice, whether anyone has tried SA with NDB clusters. I have not gotten an answer. So, do you have this running? Does it work?
Re: FuzzyOcr problem (Re: Relay Checker plugin v0.2)
John Rudd wrote:
decoder wrote:
John Rudd wrote:
D.J. wrote:
On 11/10/06, Patrick Sneyers [EMAIL PROTECTED] wrote:
I get this warning:
plugin: failed to create instance of plugin Mail::SpamAssassin::Plugin::RelayChecker: Can't locate object method "new" via package "Mail::SpamAssassin::Plugin::RelayChecker" at (eval 26) line 1.
(This is my own build of SA 3.1.7 on Mac OS X Server 10.4 ppc)
It seems to work OK though:
* 3.0 RELAY_CHECKER RELAY: badrdns
(I lowered the score)
Patrick Sneyers
Belgium

I also received some weirdness. When linting in debug mode, I found the following lines that seem to indicate that RelayChecker isn't playing nicely with FuzzyOCR:

[28058] dbg: plugin: fixed relative path: /etc/mail/spamassassin/FuzzyOcr.pm
[28058] dbg: plugin: loading FuzzyOcr from /etc/mail/spamassassin/FuzzyOcr.pm
[28058] dbg: plugin: registered FuzzyOcr=HASH(0x9d04570)
[28058] dbg: plugin: FuzzyOcr=HASH(0x9d04570) implements 'parse_config'
[28058] dbg: FuzzyOcr: Option logfile = /home/amavis/.spamassassin/FuzzyOcr.log
[28058] dbg: FuzzyOcr: Found scan: $gocr -i $pfile
[28058] dbg: FuzzyOcr: Found scan: $gocr -l 180 -d 2 -i $pfile
[28058] dbg: FuzzyOcr: Found scan: $gocr -l 140 -d 2 -i $pfile
[28058] dbg: FuzzyOcr: Option threshold = 0.25
[28058] dbg: FuzzyOcr: Score{autodisable} = 10.01
[28058] dbg: FuzzyOcr: Option counts_required = 3
[28058] dbg: plugin: fixed relative path: /etc/mail/spamassassin/RelayChecker.pm
[28058] dbg: plugin: loading RelayChecker from /etc/mail/spamassassin/RelayChecker.pm
[28058] dbg: plugin: registered RelayChecker=HASH(0x9d94a80)
[28058] dbg: plugin: FuzzyOcr=HASH(0x9d04570) implements 'parse_config'
[28058] dbg: plugin: RelayChecker=HASH(0x9d94a80) implements 'parse_config'
[28058] dbg: FuzzyOcr: unknown Score: relaychecker_score
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_nordns
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_badrdns
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_baddns
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_ipinhostname
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_dynhostname
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_clienthostname
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_ip
[28058] dbg: FuzzyOcr: unknown Option: relaychecker_pass_auth

OK, that really doesn't look nice... Is the fault on our (FuzzyOcr's) side?

Yes.

If so, then maybe someone can explain to me what the correct way to fix this would be :)

When you encounter an option you don't own (i.e. it's not a FuzzyOcr option), parse_config should return 0.

If you could verify that this also applies to the latest development version (3.4.1), that would be nice.

Yup, I found this in your 3.4.1 code (my comments indicate the issues):

Thank you very much for the work, I will patch this into our SVN version and the 3.4.x devel branch right now.

Best regards,
Chris

sub parse_config {
    my ( $self, $opts ) = @_;

    # this is good: you're restricting yourself to ^focr_bin_ keys
    if ( $opts->{key} =~ /^focr_bin_/i ) {
        my $p = lc $opts->{key};
        $p =~ s/focr_bin_//;
        if (grep {m/$p/} @bin_utils) {
            $App{$p} = $opts->{value};
            debuglog("App{$p} = $App{$p}");
        } else {
            debuglog("unknown App: $opts->{key}");
        }
        # you should tell SA you processed this config option:
        #$self->inhibit_further_callbacks();
    }
    # this is bad: you're processing _score configs that may not belong to
    # FuzzyOcr. A better statement might be:
    #elsif (($opts->{key} =~ /^focr_/i) && ($opts->{key} =~ m/_score$/i)) {
    # that way you're only processing _score configs that belong to focr
    elsif ( $opts->{key} =~ m/_score$/i ) {
        my $o = lc $opts->{key};
        $o =~ s/focr_//;
        $o =~ s/_score//;
        if (grep {m/$o/} @pgm_scores) {
            $Score{$o} = $opts->{value};
            debuglog("Score{$o} = $Score{$o}");
        } else {
            debuglog("unknown Score: $opts->{key}");
        }
        # again, inhibit further callbacks here:
        #$self->inhibit_further_callbacks();
    }
    # same as above: now you're taking ANY key, from ANY plugin, and handling
    # it. Bad bad bad. This should be changed to:
    #elsif ($opts->{key} =~ /^focr_/i) {
    else {
        my $o = lc $opts->{key};
        $o =~ s/focr_//;
        if (grep {m/$o/} @pgm_opts) {
            if ($o eq 'scansets') {
                @scansets = (); # remove
                foreach my $s (split(',',$opts->{value})) {
                    $s =~ s/^\s*//;
                    $s =~ s/\s*$//;
                    push @scansets,$s;
                    debuglog("Found scan: $s");
                }
            } elsif ($o eq 'path_bin') {
                @paths = (); # remove
                foreach my $p
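The contract John describes (claim only your own keys, leave everything else for the next plugin) can be illustrated with a small Python analogy. This is not SpamAssassin's Perl plugin API: the names `make_prefixed_handler` and `dispatch` are hypothetical, `True`/`False` stand in for a parse_config return of 1/0, and stopping at the first claim stands in for calling `inhibit_further_callbacks()` as suggested in the review.

```python
"""Python analogy of the parse_config contract: each plugin claims only
keys with its own prefix and passes everything else on."""

from typing import Callable, Dict, List

Handler = Callable[[str, str], bool]


def make_prefixed_handler(prefix: str, store: Dict[str, str]) -> Handler:
    """Build a handler that owns only keys starting with `prefix`."""

    def handler(key: str, value: str) -> bool:
        key = key.lower()
        if not key.startswith(prefix):
            return False  # not ours: like returning 0, the next plugin gets a look
        store[key[len(prefix):]] = value
        return True  # ours: like returning 1 plus inhibit_further_callbacks()

    return handler


def dispatch(handlers: List[Handler], key: str, value: str) -> bool:
    """Offer one config line to each handler in order; stop at the first claim."""
    return any(handler(key, value) for handler in handlers)
```

With FuzzyOcr's handler first in the list, a `relaychecker_*` key now falls through to RelayChecker instead of being logged as an "unknown Option" by FuzzyOcr, which is exactly the symptom in the debug output above.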
Re: FuzzyOCR
sokka wrote:
Hi, can anyone post me a URL or PDF of clear documentation for FuzzyOcr?

The current URL for FuzzyOcr is http://fuzzyocr.own-hero.net/
The page (wiki) is still very much under construction, but you'll find installation instructions inside the tarball (you can try version 3.4.1 if you want; it performs better than the stable version 2.3b, it just hasn't been tested as long yet). Installation itself is not hard if you have all the dependencies installed :)
If you need further assistance, check out our list at http://lists.own-hero.net/mailman/listinfo/devel-spam
Once I get more time, I will also be able to do more work on the wiki :)

Best regards,
Chris

thanks in advance
Unsubscribe
unsubscribe
Re: Questions about FuzzyOCR
Pascal Maes wrote:
Version 2.3b

1) Here is the output of the scanner (gocr -i):

_ date Informations 9- 11-lO061O_30 Le __ek-end du 3-4r'11, les adresses de cou r_er jlectron_que des jtud_ants non ri_nscmts j _UCL ont jtj ddsact_vjes. La ra_son est pÄrement adm_n_strat_ve et I_je j Ia caNe j puce. Pour permeNre j ces jtud_ants de rjcupdrer leurs messaqes, nous avons fa_t en soNe qu'_Is pu_ssent encore accjder j leur boîte aux leNres jusqu'au l4.r l 1 ,/lo 06 . ANent_on, la consuttat_on se fera av_ un cI_ent de messager_e !Thunderb_rd. Eudora, Outlook.. .7 ou v_a le _IebMa_I ma_s plus v_a le poNa_I .

We get almost the same result with gocr -l 180 -d 2 -i

And FuzzyOcr says:

13 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
wexe in 3 lines
alert in 2 lines
alert in 2 lines
investor in 1 lines
trade in 3 lines
(11 word occurrences found)

But I don't find any of these words in the text above!

You can try lowering your fuzz from 0.3 to 0.2. I have no experience so far with how the plugin reacts to text in different languages, so this might produce false positives.

2) How can I remove an image which has been stored by mistake in the hash database?

In version 2.3b, this is not possible yet with a tool, unfortunately. But the database is only a text file, so you can simply search for the hash there and delete the line. Version 3.4.1 brings a tool that removes a given hash from the database, but I am still improving it a bit, so one can also pass it an image file to look for.

Best regards,
Chris

Thanks
-- Pascal
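Why lowering the fuzz helps can be sketched with a toy matcher. The thread does not say how FuzzyOcr scores a fuzzy match, so this assumes, purely for illustration, that a word counts as found when its Levenshtein distance divided by the spam word's length is at most the threshold. `wexe` is one of the words reported above; the OCR token `were` is a made-up example.

```python
"""Toy model of a fuzziness threshold: a match means the normalized edit
distance is <= threshold. An assumption, not FuzzyOcr's actual algorithm."""


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(ocr_word: str, spam_word: str, threshold: float) -> bool:
    """Hypothetical rule: distance normalized by the spam word's length."""
    return levenshtein(ocr_word, spam_word) / len(spam_word) <= threshold
```

Under this model one wrong character in a four-letter word gives 0.25, which matches at a fuzz of 0.3 but not at 0.2; short words in the spam list are therefore the easiest to hit by accident in noisy OCR output of foreign-language text.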
Re: Distributed Bayes DB?
Don't overrate Bayes. Don't focus solely on a bullet-proof, highly available, clustered or replicated database. If the Bayes database is gone, only one check is gone! All the others are still there. For my mail content, the real filtering power today comes from the network checks such as URL blocklists, content checksums (Razor/DCC) and open-relay blocklists. Focus on making these additional tests work.

For Bayes, use a central SQL database on one server that is used by all your MTAs, and keep it simple. Make a disaster recovery plan for the database machine and for rebuilding an empty SA Bayes database. This could be very fast. Don't back up the Bayes token data. You wrote that you expect 500,000 messages per day. If you use Bayes auto-learning, an empty central Bayes database is refilled to a usable state from current messages in only a few hours. This is probably faster than a cumbersome restore process.

regards,
Alex
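Alex's "only a few hours" can be checked with simple arithmetic. SpamAssassin does not use Bayes until enough spam and ham have been learned; the Conf manpage options are bayes_min_spam_num and bayes_min_ham_num, both 200 by default as far as I know (check your version). The fractions of mail that auto-learn actually picks up as spam or ham are site-specific assumptions in this sketch, not SpamAssassin values.

```python
"""Rough estimate of how soon auto-learning refills an empty Bayes database
until Bayes becomes active. Learn fractions are assumptions."""

MIN_SPAM = 200  # bayes_min_spam_num default (verify for your SA version)
MIN_HAM = 200   # bayes_min_ham_num default (verify for your SA version)


def hours_until_bayes_active(messages_per_day: float,
                             spam_learn_fraction: float,
                             ham_learn_fraction: float) -> float:
    """Fractions are the share of ALL mail that auto-learn learns as spam/ham."""
    if spam_learn_fraction <= 0 or ham_learn_fraction <= 0:
        raise ValueError("both learn fractions must be positive")
    per_hour = messages_per_day / 24
    spam_hours = MIN_SPAM / (per_hour * spam_learn_fraction)
    ham_hours = MIN_HAM / (per_hour * ham_learn_fraction)
    return max(spam_hours, ham_hours)  # both minimums must be reached
```

For example, with 500,000 messages per day and the hypothetical assumption that only 1% of all mail is auto-learned as ham, the ham minimum is reached in under an hour; ham is usually the bottleneck, so that fraction is the one worth measuring on your own logs.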
Re: current stock scams are easy to spot
Loren Wilton writes:
Well, that's all fine and dandy, but what do we do about them? Since we know they all have a common element, we need to figure out a way to stop them using that info.

Well, just from the description and knowing of the existence of header ALL, it would be pretty trivial to write about three rules involving a capturing clause to do the matching.

yep, agreed. (If they work really well but perform really badly, they can always be rewritten into an eval-rule plugin later.)

If someone *does* write this, please post 'em and I'll put them into my sandbox for testing.
--j.
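To show what "a capturing clause" over header ALL means, here is a hypothetical Python sketch (not the RCVD_FORGED_WROTE rule or any published rule). It assumes the new variant's Subject is literally "hi" followed by the local part of the To: address, which is all the thread says about it. One regex sees the whole header block, captures the local part from To:, and requires the Subject to repeat it via a backreference; Perl supports the same lookahead and backreference constructs, but test any real rule against your own corpus.

```python
"""Sketch of a capturing-clause check over the full header block: flag mail
whose Subject is 'hi <local part of the To address>'. Hypothetical pattern."""

import re

# Two lookaheads from the start let To: and Subject: appear in either order;
# \1 must repeat the local part captured from the To: line (case-insensitive).
HI_USERNAME = re.compile(
    r"\A(?=.*^To:[^\n]*?\b([\w.+-]+)@)"
    r"(?=.*^Subject:[ \t]*hi[ \t]+\1\b)",
    re.DOTALL | re.MULTILINE | re.IGNORECASE,
)


def looks_like_hi_username(headers: str) -> bool:
    """headers: the raw header block, one 'Name: value' per line."""
    return HI_USERNAME.search(headers) is not None
```

The trailing `\b` keeps "hi bobby" from matching a recipient called "bob"; whether the real spam uses exactly this form is an assumption.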
rule secrecy *again* (Re: Well, that didn't take very bloody long)
Loren Wilton writes:
OK, remember those "Name Wrote: :)" emails? They've completely changed. Now it's "hi username" instead. Joy, oh joy. Can anyone find any common elements in these emails? Because whoever this putz is, they're adapting a lot. They hit us, we adapt, they immediately change tactics and come at us again. Now with all the brilliant minds on this mailing list, we really should be able to find out who this putz is and nail all his stuff regardless of what tactic he switches to.

The reason they adapt is because there are detailed announcements on the mailing list of the things that are easy to spot. The guy sending these is on the list too, so as soon as the oversight or excessive cleverness is announced to the world, he knows what he has to fix.

ho hum... here we go again. :(

As I've noted several times recently, these *are* being caught by rules which were developed in the open: namely RCVD_FORGED_WROTE, which has been sitting in my sandbox for several weeks, was announced in a checkin message (with diffs!), and is currently live in both trunk and 3.1.x rule updates. The rule has been visible since:

r465179 | jm | 2006-10-18 10:11:15 +0100 (Wed, 18 Oct 2006) | 1 line
add rule to catch 'Subject: foo wrote:' stock spam

Take a look at the graph of hit-rates over time in everyone's corpora:
http://ruleqa.spamassassin.org/last-night/RCVD_FORGED_WROTE?s_detail=ons_g_over_time=1s_zero=onsrcpath=#over_time_anchor

There's been no change in hit-rates since 2006-10-18; in fact, in cthielen's and zmi's corpora, they rose *dramatically*.

Secrecy is *NOT* an essential element of rule development. It seems logical to think it is, but evidence repeatedly demonstrates otherwise. For some spammers, it may _help_, but not for all, so it's by no means essential. On the other hand, secrecy damages collaborative development, restricting rule refinement and improvement to a secret cabal. It's antithetical to open source development.
--j.
Re: Distributed Bayes DB?
First, thank you all for the suggestions relating to SQL. It seems SQL support is better than I expected, and I will give it a try.

Alex Woick wrote:
Don't overrate Bayes.

The system has been running without Bayes for roughly 3 years (with incremental SpamAssassin updates), with good results until now. However, that system, without the Bayes check, handled the recent increase in spam volumes with less success than other systems that do have Bayes checks enabled.

Don't focus solely on a bullet-proof, highly available, clustered or replicated database. If the Bayes database is gone, only one check is gone! All the others are still there.

That's a very good suggestion, since it seems like a bit of overkill to have additional database server machines for this simple task. Is it even necessary to have consistent shared storage amongst equal MXes, or would it be sufficient to let them run independently?

For Bayes, use a central SQL database on one server that is used by all your MTAs, and keep it simple. Make a disaster recovery plan for the database machine and for rebuilding an empty SA Bayes database. This could be very fast. Don't back up the Bayes token data. You wrote that

I don't worry too much about disaster recovery, more about avoiding a single point of failure, i.e. if one or two machines go up in smoke or are taken offline for maintenance, the remaining machines should continue just as before.

-- Matthias
Re: Distributed Bayes DB?
On 11.11.2006 at 11:47, Matt Kettler wrote:
I suppose you could use something like NFS so that all systems share the same DB, config files, etc.

NFS would be HIGHLY not recommended.
http://article.gmane.org/gmane.mail.spam.spamassassin.general/72362/match=sql
In fact, I personally would suggest never using NFS for anything at all, and I'm shocked that you'd even consider using it for any production purpose.

NFS or equivalent has its place and can be made safe enough if required, but I think other issues like concurrent access suggest that the SQL approach is the way to go.

Besides, the point here is to eliminate any single point of failure. NFS would offer no redundancy at all. If the server hosting the NFS share went down, the Bayes DB would be unavailable.

Agreed.

I do not see many mentions of the SQL approach, either because it is not used much or because it works so well?
Probably the former. And you're right not to use something like the SQL backend for a large-volume production system. Not because it's unreliable, but because it's still in development, and keeping the schema up to date could become a real headache.

But it's not still in development... It's the recommended configuration as of 3.1.0. SA's SQL support is solid. I personally don't use it, but many here do.

Yes, sorry, I should have read all e-mails relating to the thread first.

Charlie
user_prefs
I have searched for several hours and can't seem to find the answer to this. I've found close answers, but not complete ones. I have SA set up for individual users. When a new user is created, SA creates a new user_prefs file for them. This file contains two prefs: required_score 7 and rewrite_header subject SPAM. I am trying to find out if I can change some prefs so that the new user_prefs file will contain my prefs when it is newly created. I have changed prefs in user_prefs.template and that didn't make any difference. I assume this template is supposed to be used by SA to create the new user_prefs, but it doesn't seem so. Where can I add my own prefs so the newly created default user_prefs file is loaded with what I want? Thanks.

- /etc/mail/spamassassin/user_prefs.template: Default user preferences, for system admins to create, modify, and set defaults for users' preferences files. Takes precedence over the above prefs file, if it exists. Do not put system-wide settings in here; put them in a file in the "/etc/mail/spamassassin" directory ending in ".cf". This file is just a template, which will be copied to a user's home directory for them to change.
- $USER_HOME/.spamassassin/user_prefs: User preferences file. If it does not exist, one of the default prefs file from above will be copied here for the user to edit later, if they wish. Unless you're using spamd, there is no difference in interpretation between the rules file and the preferences file, so users can add new rules for their own use in the "~/.spamassassin/user_prefs" file, if they like. (spamd disables this for security and increased speed.)
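The manpage excerpt above describes a copy-if-missing step. A minimal Python sketch of that documented behaviour (a model of it, not SpamAssassin's own code; `ensure_user_prefs` is a hypothetical name) makes two consequences visible: an existing user_prefs is never overwritten, so template edits only reach users whose file is created afterwards, and the result depends on which template gets copied. Since the manpage mentions more than one default prefs file, it is worth checking which one your installation actually uses.

```python
"""Model of the documented copy-if-missing behaviour for user_prefs."""

import shutil
from pathlib import Path


def ensure_user_prefs(home: Path, template: Path) -> bool:
    """Copy template to <home>/.spamassassin/user_prefs only if it is missing.

    Returns True if a copy was made. Existing files are never touched, so
    later edits to the template only affect users created afterwards.
    """
    prefs = home / ".spamassassin" / "user_prefs"
    if prefs.exists():
        return False
    prefs.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, prefs)
    return True
```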
Re: Questions about FuzzyOCR
decoder wrote: Pascal Maes wrote: Version 2.3b
1) Here is the output of the scanner (gocr -i): _ date Informations 9- 11-lO061O_30 Le __ek-end du 3-4r'11, les adresses de cou r_er jlectron_que des jtud_ants non ri_nscmts j _UCL ont jtj ddsact_vjes. La ra_son est pÄrement adm_n_strat_ve et I_je j Ia caNe j puce. Pour permeNre j ces jtud_ants de rjcupdrer leurs messaqes, nous avons fa_t en soNe qu'_Is pu_ssent encore accjder j leur boîte aux leNres jusqu'au l4.r l 1 ,/lo 06 . ANent_on, la consuttat_on se fera av_ un cI_ent de messager_e !Thunderb_rd. Eudora, Outlook.. .7 ou v_a le _IebMa_I ma_s plus v_a le poNa_I . We get almost the same result with gocr -l 180 -d 2 -i, and FuzzyOCR says: 13 FUZZY_OCR BODY: Mail contains an image with common spam text inside Words found: wexe in 3 lines alert in 2 lines alert in 2 lines investor in 1 lines trade in 3 lines (11 word occurrences found) But I don't find any of these words in the text above!
You can try lowering your fuzz from 0.3 to 0.2. I haven't had any experience so far with how the plugin reacts to text in different languages, so this might produce false positives.
2) How do I remove an image which has been stored by mistake in the hash database?
In version 2.3b, this is not possible yet with a tool, unfortunately. But the database is only a text file, so you can simply search for the hash there and delete the line. Version 3.4.1 brings a tool that removes a given hash from the database, but I am still improving it a bit, so one can also pass it an image file to look for.
I must correct myself there: passing it an image is already supported :) Best regards, Chris
Thanks -- Pascal
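Chris's manual workaround for 2.3b (find the hash in the text database and delete its line) could be scripted roughly like this. It is a sketch that assumes one entry per line with the hash appearing somewhere in that line; the function name `remove_hash` and the database path you pass in are hypothetical. Matching is by substring, so pass the full hash, and you may want to back up the file first.

```python
import os
import shutil
import tempfile
from pathlib import Path


def remove_hash(db_path: Path, image_hash: str) -> int:
    """Delete every line of a text hash database that contains image_hash.

    Returns the number of lines removed. Assumes one entry per line.
    """
    lines = db_path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if image_hash not in line]
    removed = len(lines) - len(kept)
    if removed:
        # Write to a temporary file in the same directory, keep the
        # original permissions, then swap it in with an atomic rename.
        fd, tmp_name = tempfile.mkstemp(dir=db_path.parent)
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(kept)
        shutil.copymode(db_path, tmp_name)
        os.replace(tmp_name, db_path)
    return removed
```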
Re: question about bayes database
Matthias Haegele wrote: pinoyskull wrote: will it be OK if I have 1000+ spam learned and only 300+ ham learned, will it still be effective? Don't know. But I think it's better if you learn *all* spam and ham... that's my problem, spams overwhelmed ham on our server (If your spam-ham ratio is really that bad, perhaps you want to use some MTA-level antispam, or blacklists?) could you give me an example of an MTA-level antispam? I'm kinda new to this, thanks hth MH
RE: question about bayes database
-Original Message- From: pinoyskull [mailto:[EMAIL PROTECTED] Sent: Saturday, November 11, 2006 9:55 AM To: users@spamassassin.apache.org Subject: Re: question about bayes database (If your spam-ham ratio is really that bad, perhaps you want to use some MTA-level antispam, or blacklists?) could you give me an example of an MTA-level antispam? I'm kinda new to this, thanks Google for 'postfix+spam'
Re: Distributed Bayes DB?
Michael Scheidell wrote: -Original Message- From: Matt Kettler [mailto:[EMAIL PROTECTED] Sent: Saturday, November 11, 2006 5:23 AM To: Michael Scheidell Cc: users@spamassassin.apache.org Subject: Re: Distributed Bayes DB? Actually his point wasn't that SQL clustering was beta, but that the SQL Readme on the wiki claims that SA's SQL bayes backend is beta. But that's just an oops. I have asked, on this list and the amavisd list, at least twice, if anyone has tried SA with NDB clusters. I have not gotten an answer. So, do you have this running? Does it work? No, I do not. I don't even use SQL with SA. Is SQL clustering in MySQL a beta feature? My point wasn't to debate the merits of clustering vs. replication, just to point out that Matthias was under the impression that ANY use of SQL in SA was beta. I'd readily defer implementation details to anyone more versed in MySQL redundancy.
spam that only hits the BAYES_99 rule
Hi, I was getting hit by a great deal of spam that only hits the BAYES_99 rule, and maybe gets less than a point or so from elsewhere. But now I'm getting ones through that are basically only hitting BAYES_99 and nothing else: X-Spam-Score: 3.5 (***) BAYES_99 I tried to send the mail to this list to demonstrate the content, but it got bounced with a 12.9 spam score. I'm running sa-update weekly and rules_de_jour daily with a big set of rules, and I'm still missing loads of obvious spam, particularly messages with the title Re: + good and then a number appended to the end. The only thing I can think of at the moment is to reduce my required_hits to 3.5 or increase the score for BAYES_99 to 5, but I would prefer not to do the latter as I like a default and automatically updated installation. I would be grateful for any ideas on this... Thanks, Tom H
Re: OT : MailScanner
On Sat, November 11, 2006 11:16, Suhas (QualiSpace) wrote: Need some inputs from the experts. The experts are on the MailScanner mailing lists. I am planning to switch to postfix + mailscanner + sa + clamav. Just want to know one thing before doing that. I have kaspersky linux edition. Can I create two antivirus scanning layers in mailscanner? I don't know, since I only used MailScanner 3.x before someone told me to use amavisd-new with Postfix. Just one thing: why do you want MailScanner and not amavisd-new? -- This message was sent using 100% recycled spam mails.
Re: OT : MailScanner
Benny Pedersen wrote: On Sat, November 11, 2006 11:16, Suhas (QualiSpace) wrote: Need some inputs from the experts. The experts are on the MailScanner mailing lists. I am planning to switch to postfix + mailscanner + sa + clamav. Just want to know one thing before doing that. I have kaspersky linux edition. Can I create two antivirus scanning layers in mailscanner? I don't know, since I only used MailScanner 3.x before someone told me to use amavisd-new with Postfix. Just one thing: why do you want MailScanner and not amavisd-new? Switch from what? As for Benny's comment, it's nice to have a choice, isn't it? Amavisd-new doesn't seem to have quite the active development MailScanner does. Also, Amavis seems to be much more complicated to get going and to do nice things with rules (policy banks) than MailScanner's simple config and rules syntax. Just my take from a quick 5-minute wander through the amavisd-new docs. -- Martin Hepworth Senior Systems Administrator Solid State Logic Tel: +44 (0)1865 842300
Re: OT : MailScanner
Suhas (QualiSpace) wrote: Hi, Need some inputs from the experts. I am planning to switch to postfix + mailscanner + sa + clamav. Just want to know one thing before doing that. I have kaspersky linux edition. Can I create two antivirus scanning layers in mailscanner? Yes. I use 3 scanners with MailScanner: ClamAV, Command, and BitDefender. It works just fine; you just declare more than one scanner on your Virus Scanners config line in MailScanner.conf. Note, however, that MS will always scan with all of your available scanners. It won't stop after the first one finds a virus. This is handy because you can do a fair comparison of scanners, but it's not exactly efficient.
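The trade-off described above (every configured scanner runs, even after one has found a virus) can be illustrated with a small Python sketch. The `Scanner` callables and scanner names are hypothetical stand-ins; this is not how MailScanner is implemented, only a comparison of the two strategies.

```python
from typing import Callable, Dict, List

# A scanner takes the message bytes and returns True if it found a virus.
Scanner = Callable[[bytes], bool]


def scan_with_all(data: bytes, scanners: Dict[str, Scanner]) -> List[str]:
    """Run every scanner and report each one that flagged the data."""
    return [name for name, scan in scanners.items() if scan(data)]


def scan_until_first_hit(data: bytes, scanners: Dict[str, Scanner]) -> List[str]:
    """Stop as soon as one scanner flags the data (cheaper, less informative)."""
    for name, scan in scanners.items():
        if scan(data):
            return [name]
    return []
```

The first variant yields per-scanner detection data you can compare; the second saves work whenever an early scanner already catches the virus.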
Re: spam that only hits the BAYES_99 rule
Tom H wrote: Hi, I was getting hit by a great deal of spam that only hits the BAYES_99 rule, and maybe gets less than a point or so from elsewhere. But now I'm getting ones through that are basically only hitting BAYES_99 and nothing else: X-Spam-Score: 3.5 (***) BAYES_99 I tried to send the mail to this list to demonstrate the content, but it got bounced with a 12.9 spam score. I'm running sa-update weekly and rules_de_jour daily with a big set of rules, and I'm still missing loads of obvious spam, particularly messages with the title Re: + good and then a number appended to the end. The only thing I can think of at the moment is to reduce my required_hits to 3.5 or increase the score for BAYES_99 to 5, but I would prefer not to do the latter as I like a default and automatically updated installation. I would be grateful for any ideas on this... Sounds like the message contains a URI that is now listed in many of the SURBL and URIBL lists. It may be that it got listed after you received the spam, but do you have network tests enabled?
Re: is there a way to block email coming from
In my case the rule is designed to catch UK recruiters who are always contacting me. This isn't the only way I trap spam, obviously. Another thing I just realized is that this only looks at URIs in the email itself in order to determine if they reside in the UK, which is something different from RBL-type solutions. On Nov 10, 2006, at 8:54 PM, Benny Pedersen wrote: On Sat, November 11, 2006 02:31, Robert Nicholson wrote: header URICOUNTRY_GB eval:check_uricountry('URICOUNTRY_GB') What if a spammer sends mail from another IP outside GB? IMHO such rules only move the problem, they don't solve it :(
Is there a release date for 3.1.8?
When will the Shortcircuit feature be made available in a release?
Re: Creating a signature of an email
Sounds to me as if the iXhash mechanism might be what you need. The iXhash plugin you find on the SA wiki works on the body of a mail, removes (redundant) parts of it and computes a hash value from the rest. The results have been found to be quite a reliable indicator for spam mails. I feed two DNS zones with the input of several spamtraps (this is what the plugin queries against), but I see no reason why you shouldn't use a modified version that stores its hashes differently. You'd need a modified version of the plugin then as well, of course. Alternatively, you could use the relevant parts of the original procmail code to compute the hashes and check your incoming mails against that data. See http://www.ix.de for that. Knowing some German might help. The nice thing is that you can use the iXhash plugin alongside Razor, Pyzor and DCC. (I don't know if it's possible to use two pyzor servers from within SpamAssassin; I think if you set up your own server you automatically lose the capability to use the public one.) HTH Dirk On Sat, 11 Nov 2006 03:58:00 -0500 Paul Aviles [EMAIL PROTECTED] wrote: Hi there, is there a way to create a signature or rule more or less automatically based on new spam you get? I used MessageLabs in the past, and for new messages you got they asked you to forward the headers of the email to a particular account so that they could create a signature for those emails. Anything similar? Regards, Paul Aviles
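To illustrate the general idea Dirk describes (strip the parts of a body that vary between copies, then hash what is left), here is a Python sketch. It is explicitly not the iXhash algorithm: the normalisation steps (lowercasing, dropping digits and punctuation, collapsing whitespace) and the name `body_fingerprint` are assumptions chosen for illustration. The real computation is defined by the iXhash plugin and the original procmail code.

```python
import hashlib
import re


def body_fingerprint(body: str) -> str:
    """Hash a message body after stripping parts that often vary.

    Assumed normalisation (NOT iXhash's): lowercase, drop digits and
    punctuation, collapse runs of whitespace.
    """
    text = body.lower()
    text = re.sub(r"[0-9]", "", text)        # drop digits
    text = re.sub(r"[^\w\s]", "", text)      # drop punctuation
    text = re.sub(r"\s+", " ", text).strip() # collapse whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

A store of such fingerprints (fed from spamtraps, say) could then be checked against incoming mail. How aggressive the normalisation is decides the balance between catching variants and colliding with legitimate mail.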
Re: sa-update rules for SA 3.1.7 have been updated but they fail lint
Theo Van Dinter [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] On Fri, Nov 10, 2006 at 11:31:31PM -0500, Debbie D wrote: Is sa-update something built in or is it a plug-in?? It's a script that comes with 3.1. I ran sa-update and then service spamassassin restart, and was told spamassassin is an unknown service (duh, I knew that). Ok. Replace service spamassassin restart with the appropriate command for your machine. BUT... I see neither directory has updated files: /usr/share/spamassassin /etc/mail/spamassassin Correct. Now I ran sa-update -D :) and poking around more, I see it did bring down the latest cf files in /var/lib/spamassassin/3.001007/updates_spamassassin_org Yep. I have verified manually that at least one rule set has changed since I last upgraded on Oct 11th: 7733 Nov 10 22:53 25_uribl.cf 6738 Oct 11 22:35 /usr/share/spamassassin/25_uribl.cf Yep. 80_additional.cf is a new file too. So now my next question is: am I missing something here to have these downloaded rule sets take effect?? The FAQ says I should have to do nothing, but Nope. But somehow I don't think that's right.. I never told SA to look for rules in this new directory, and even if I did, it would be reading the rule sets twice and causing a huge load issue.. SA knows to look there by itself (see perldoc spamassassin), and it's not reading anything twice. SA uses the local state dir (/var/lib/spamassassin/...) instead of the default rules dir (/usr/share/spamassassin). OK thanks Theo.. what would be the best way for me to triple-verify that it is indeed picking up these new rules?? I'll set this up in cron today on a weekly basis, I think.. is that frequent enough?? And I assume as these folders start creating themselves with each new update, SA knows enough to look at the latest set only???
Re: sa-update rules for SA 3.1.7 have been updated but they fail lint
On Sat, Nov 11, 2006 at 03:08:08PM -0500, Debbie D wrote: OK thanks Theo.. what would be the best way for me to triple-verify that it is indeed picking up these new rules?? I'll set this up in cron today on a weekly basis, I think.. is that frequent enough?? spamassassin --lint -D will show what rule files are being used. Weekly is probably a good choice; daily is as frequent as I would suggest at the moment. And I assume as these folders start creating themselves with each new update, SA knows enough to look at the latest set only??? There's only one directory per SA version per channel. So yes. :) -- Randomly Selected Tagline: Hey, you're shaped like buddah, millions of people follow him! - The Drew Carey Show
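The directory Debbie found, /var/lib/spamassassin/3.001007/updates_spamassassin_org, suggests a simple naming pattern, which also fits Theo's "one directory per SA version per channel". The Python sketch below reproduces that path. The general rule it encodes (minor and patch zero-padded to three digits, dots in the channel name turned into underscores) and the name `update_dir` are inferred from this single example and should be treated as assumptions.

```python
def update_dir(version: str,
               channel: str = "updates.spamassassin.org",
               base: str = "/var/lib/spamassassin") -> str:
    """Build the sa-update rules directory for an SA version and channel."""
    major, minor, patch = (int(part) for part in version.split("."))
    # e.g. 3.1.7 -> "3.001007" (assumed: three-digit minor and patch)
    numeric = f"{major}.{minor:03d}{patch:03d}"
    # e.g. updates.spamassassin.org -> updates_spamassassin_org (assumed)
    return f"{base}/{numeric}/{channel.replace('.', '_')}"
```

Seeing this directory's files listed in the `spamassassin --lint -D` output is one way to confirm the updated rules are being read.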
Re: sa-update rules for SA 3.1.7 have been updated but they fail lint
--On Saturday, November 11, 2006 3:20 PM -0500 Theo Van Dinter [EMAIL PROTECTED] wrote: spamassassin --lint -D will show what rule files are being used. Weekly is probably a good choice, daily is as frequent as I would suggest at the moment. It uses DNS to detect new updates, doesn't it? So one could use a frequency as high as the record TTL at very low cost.
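On the DNS point: as far as I know, sa-update checks for new rules by looking up a TXT record whose name is the SA version with its components reversed, prepended to the channel name. The answer carries the latest rule revision, and a download only happens when that differs from what is installed. A small Python sketch of how that name is formed (the function name `update_query_name` is hypothetical):

```python
def update_query_name(version: str,
                      channel: str = "updates.spamassassin.org") -> str:
    """Return the DNS name whose TXT record sa-update is believed to query."""
    # Reverse the version components: 3.1.7 -> 7.1.3
    reversed_version = ".".join(reversed(version.split(".")))
    return f"{reversed_version}.{channel}"
```

For 3.1.7 this gives 7.1.3.updates.spamassassin.org, which you could inspect by hand with `dig +short txt 7.1.3.updates.spamassassin.org`. Because the check is one small DNS lookup that resolvers cache for the record's TTL, running it more often than the TTL mostly re-reads the cached answer.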
RE: Running spamc via postfix not as user nobody
-Original Message- From: Michael Frotscher [mailto:[EMAIL PROTECTED] Sent: Saturday, November 11, 2006 6:19 AM To: users@spamassassin.apache.org Subject: Running spamc via postfix not as user nobody spamassassin unix - n n - - pipe user=nobody argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi -f ${sender} ${recipient} Try the Postfix mailing list? What happens with this: user=${recipient} argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
Re: rule secrecy *again* (Re: Well, that didn't take very bloody long)
At 12:27 PM 11/11/2006 +, Justin Mason wrote: ho hum... here we go again. :( As I've noted several times recently -- these *are* being caught by rules which were developed in the open -- namely RCVD_FORGED_WROTE, which has been sitting in my sandbox for several weeks, was announced in a checkin message (with diffs!), and is currently live in both trunk and 3.1.x rule updates. Yeah, I pushed my updates for SA and now it seems that those spams aren't getting through anymore. Heh. I can't wait for this spam war to end so I can go back to my more laid-back 3-month cycle of updates instead of 3-4 times a day. :( Steven Lake Owner/Technical Writer Raiden's Realm www.raiden.net A friendly web community
RE: Distributed Bayes DB?
-Original Message- From: Dhawal Doshy [mailto:[EMAIL PROTECTED] Sent: Saturday, November 11, 2006 5:54 AM To: users@spamassassin.apache.org Subject: Re: Distributed Bayes DB? that should be DRBD Or even geom_gate and geom_mirror on *BSD
Re: Is there a release date for 3.1.8?
Robert Nicholson wrote: When will the Shortcircuit feature be made available in a release? The Shortcircuit plugin should be available in 3.2.0. Recent messages have suggested that this might be released before January.
sa-update
Hey all, I am trying to run SpamAssassin updates on a qmail toaster install (CentOS 4.4), but when I try, it throws this error:
[EMAIL PROTECTED] ~]# sa-update -D
Can't locate LWP/UserAgent.pm in @INC (@INC contains: /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/5.8.5/i386-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl) at /usr/bin/sa-update line 92.
BEGIN failed--compilation aborted at /usr/bin/sa-update line 92.
Anyone have any ideas? Thanks Q
RE: sa-update
Hey all, I am trying to run SpamAssassin updates on a qmail toaster install (CentOS 4.4), but when I try, it throws this error. [EMAIL PROTECTED] ~]# sa-update -D Can't locate LWP/UserAgent.pm [...] BEGIN failed--compilation aborted at /usr/bin/sa-update line 92. Anyone have any ideas? Thanks Q Install LWP? http://search.cpan.org/~gaas/libwww-perl-5.805/ Gary V
RE: sa-update
Install LWP? http://search.cpan.org/~gaas/libwww-perl-5.805/ Gary V On CentOS I think the package is perl-libwww-perl.
scoring question
Hi, I got the following in a message from our list management software:
X-Spam-Status: Yes, hits=9.7 tagged_above=0.0 required=6.3 tests=AWL, BAYES_20, NO_RELAYS
X-Spam-Level: *
X-Spam-Flag: YES
Basic configuration: Debian Sarge, Postfix, amavisd-new, spamassassin 3.001003, standard ruleset, plus updates from
- default channel
- saupdates.openprotect.com
The thing is, if I'm reading things correctly, the scores for the listed tests are:
AWL 1 (default)
50_scores.cf:score BAYES_20 0.0001 0.0001 -0.740 -0.740
50_scores.cf:score NO_RELAYS -0.001
Which should add up to .259 (net tests and Bayes turned on). So... why is this showing hits=9.7? What am I missing? Thanks very much, Miles
Re: scoring question
Miles Fidelman wrote: Hi, I got the following in a message from our list management software: X-Spam-Status: Yes, hits=9.7 tagged_above=0.0 required=6.3 tests=AWL, BAYES_20, NO_RELAYS X-Spam-Level: * X-Spam-Flag: YES Basic configuration: Debian Sarge, Postfix, amavisd-new, spamassassin 3.001003, standard ruleset, plus updates from the default channel and saupdates.openprotect.com. The thing is, if I'm reading things correctly, the scores for the listed tests are: AWL 1 (default)

Nope... the AWL has a variable score. It's the Automatic Whitelist, which is really more of a history-tracking score averager than anything else. It's only called AWL because its most common effect is to push down scores when a normally low-scoring sender sends a message that gets a high score. In this case, it went the other way: a sender that was high-scoring in the past sent a low-scoring message and got pushed up.

50_scores.cf:score BAYES_20 0.0001 0.0001 -0.740 -0.740 50_scores.cf:score NO_RELAYS -0.001 Which should add up to .259 (net tests and Bayes turned on). So... why is this showing hits=9.7? What am I missing?

See above: the variable score for the AWL would have been on the order of +9.45 or so. Apparently the past average for this sender is somewhere around +20, causing the AWL to add a lot to this message. The AWL score is based on the current pre-AWL score and the past average for that sender. Basically, the AWL always looks at the difference between the current score and the past average. It then adds half that difference in. See http://wiki.apache.org/spamassassin/AutoWhitelist
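The AWL description above ("adds half that difference in") can be written as a one-line formula and checked against Miles's numbers. This Python sketch assumes the default factor of 0.5 (the auto_whitelist_factor setting) and treats BAYES_20 (-0.740) and NO_RELAYS (-0.001) as the entire pre-AWL score, which is an assumption since the header only names the rules. Under those assumptions the AWL contributed about 10.44 rather than 9.45 (the 0.259 total counted the AWL as +1), and the implied past average is about 20, in line with the estimate above. The function names are hypothetical.

```python
def awl_adjustment(pre_awl_score: float, past_mean: float,
                   factor: float = 0.5) -> float:
    """Points the AWL adds: factor x (sender's past mean - current score)."""
    return (past_mean - pre_awl_score) * factor


def implied_past_mean(pre_awl_score: float, awl_points: float,
                      factor: float = 0.5) -> float:
    """Invert the formula to estimate the sender's historical mean score."""
    return pre_awl_score + awl_points / factor


pre_awl = -0.740 + -0.001        # BAYES_20 + NO_RELAYS (assumed complete)
awl_points = 9.7 - pre_awl       # final hits=9.7 minus the non-AWL part
mean = implied_past_mean(pre_awl, awl_points)
print(f"AWL added {awl_points:.2f}; implied past mean {mean:.2f}")
```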
Re: Is there a release date for 3.1.8?
Robert Nicholson wrote: When will the Shortcircuit feature be made available in a release? I doubt that will be in 3.1.8; it sounds more like something for the 3.2.0 release. Of course I could be wrong, but usually features that make a dramatic change in how SA handles things are not done in minor releases.
RE: Running spamc via postfix not as user nobody
On Sat, November 11, 2006 22:49, Michael Scheidell wrote: What happens with this: user=${recipient} argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi -f ${sender} ${recipient} Unix accounts with @ in them?
Re: Creating a signature of an email
On Sat, November 11, 2006 20:47, Dirk Bonengel wrote: The nice thing is that you can use the iXhash plugin alongside Razor, Pyzor and DCC. (I don't know if it's possible to use two pyzor servers from within SpamAssassin; I think if you set up your own server you automatically lose the capability to use the public one.) With more than one IP in the pyzor servers list, all IPs will be queried and reported to; at least it seems so here with my pyzord. Don't use pyzor discover, as that will remove your own server. That could be the same reason it's called servers and not server. To my knowledge, from the pyzor mailing list, there will be pyzord-to-pyzord digest exchange in a new version when it's ready; this will hopefully improve pyzor a lot.
Re: rule secrecy *again* (Re: Well, that didn't take very bloody long)
From: Justin Mason [EMAIL PROTECTED] Loren Wilton writes: Ok, remember that Name Wrote: :) emails? They've completely changed. Now it's hi username instead. Joy, oh joy. Can anyone find any common elements in these emails? Because whoever this putz is, they're adapting a lot. They hit us, we adapt, they immediately change tactics and come at us again. Now with all the brilliant minds on this mailing list, we really should be able to find out who this putz is and nail all his stuff regardless of what tactic he switches to. The reason they adapt is because there are detailed announcements on the mailing list of the things that are easy to spot. The guy sending these is on the list too, so as soon as the oversight or excessive cleverness is announced to the world, he knows what he has to fix. ho hum... here we go again. :( As I've noted several times recently -- these *are* being caught by rules which were developed in the open -- namely RCVD_FORGED_WROTE, which has been sitting in my sandbox for several weeks, was announced in a checkin message (with diffs!), and is currently live in both trunk and 3.1.x rule updates. The rule has been visible since: r465179 | jm | 2006-10-18 10:11:15 +0100 (Wed, 18 Oct 2006) | 1 line add rule to catch 'Subject: foo wrote:' stock spam Take a look at the graph of hit-rates over time in everyone's corpora: http://ruleqa.spamassassin.org/last-night/RCVD_FORGED_WROTE?s_detail=on&s_g_over_time=1&s_zero=on&srcpath=#over_time_anchor There's been no change in hit-rates since 2006-10-18 -- in fact, in cthielen and zmi's corpora, they rose *dramatically*. Secrecy is *NOT* an essential element of rule development. It seems logical to think it is, but evidence repeatedly demonstrates otherwise. Indeed - if you have a rule that depends on secrecy then it is too fragile to have a long life. Good rules have long usable lifetimes. {^_^}
Re: spam that only hits the BAYES_99 rule
From: Tom H [EMAIL PROTECTED] Hi, I was getting hit by a great deal of spam that only hits the BAYES_99 rule, and maybe gets less than a point or so from elsewhere. But now I'm getting ones through that are basically only hitting BAYES_99 and nothing else: X-Spam-Score: 3.5 (***) BAYES_99 I tried to send the mail to this list to demonstrate the content, but it got bounced with a 12.9 spam score. I'm running sa-update weekly and rules_de_jour daily with a big set of rules, and I'm still missing loads of obvious spam, particularly messages with the title Re: + good and then a number appended to the end. The only thing I can think of at the moment is to reduce my required_hits to 3.5 or increase the score for BAYES_99 to 5, but I would prefer not to do the latter as I like a default and automatically updated installation. I would be grateful for any ideas on this... Tom, my answer is a cheat. Simply raise the BAYES_99 score until you start seeing false positives from it. Then reduce the score a little. It appears that either BAYES_99 is pessimistic about its likelihood of being spam, or else one of my few negative scores has saved me from the expected potload of mismarked ham. I run at 5.0001. (The .0001 is just to be obnoxious about it.) {^_^}
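The "raise it until it false-positives, then back off" procedure can be expressed as a small search over a hand-labelled sample. In this Python sketch each message that hit BAYES_99 is represented as (points from all other rules, is_spam). The function name, the 0.1 step, the 10.0 ceiling and the 0.1 safety margin are illustrative assumptions, and the result is only as good as the sample you feed it.

```python
from typing import Iterable, Tuple


def max_safe_bayes99(messages: Iterable[Tuple[float, bool]],
                     threshold: float = 5.0,
                     step: float = 0.1,
                     limit: float = 10.0,
                     margin: float = 0.1) -> float:
    """Raise a candidate BAYES_99 score until some ham would reach the
    threshold, then back off by `margin` from the last safe value.

    messages: (points from all other rules, is_spam) for mail that hit BAYES_99.
    """
    hams = [other for other, is_spam in messages if not is_spam]
    best = 0.0
    for i in range(int(round(limit / step)) + 1):
        candidate = i * step
        if any(other + candidate >= threshold for other in hams):
            break  # first false positive: stop raising
        best = candidate
    return max(0.0, round(best - margin, 4))
```

With hams scoring 1.2 and -0.5 from other rules and the default threshold of 5.0, the first false positive appears at 3.8, so the sketch suggests 3.6.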
Re: When Bayes goes bad... How to fix?
I am still trying to figure out why Bayes is giving so many false positives.
0.000 0 3 0 non-token data: bayes db version
0.000 0 101467 0 non-token data: nspam
0.000 0 39694 0 non-token data: nham
0.000 0 181047 0 non-token data: ntokens
0.000 0 1163102355 0 non-token data: oldest atime
0.000 0 1163306671 0 non-token data: newest atime
0.000 0 1163306671 0 non-token data: last journal sync atime
0.000 0 1163275571 0 non-token data: last expiry atime
0.000 0 172800 0 non-token data: last expire atime delta
0.000 0 30379 0 non-token data: last expire reduction count
If I read that right, all of the tokens are from the 9th to the 11th. Is that right? In that case my suggestion to reduce the time is not going to help. But then why has Bayes locked onto so many bad tokens? I wish there were some way to debug this. Bob
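The atime values in that dump (which looks like `sa-learn --dump magic` output) are Unix timestamps, so Bob's reading can be checked by converting them. The Python sketch below does that in UTC; local times shift by your offset. As far as I understand SA's Bayes store, atime records when a token was last used rather than when it was learned, and a "last expire atime delta" of 172800 seconds is exactly two days.

```python
from datetime import datetime, timezone

# Values copied from the dump above.
MAGIC = {
    "oldest atime": 1163102355,
    "newest atime": 1163306671,
    "last expiry atime": 1163275571,
}


def as_utc(epoch: int) -> str:
    """Format a Unix timestamp as a UTC date and time string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC")


for name, value in MAGIC.items():
    print(f"{name:18} {as_utc(value)}")

span = MAGIC["newest atime"] - MAGIC["oldest atime"]
print(f"token atime span: {span} s = {span / 86400:.2f} days")
```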
question re. whitelist_from_rcvd
Hi, I'm trying to figure out how to whitelist control messages generated by our list manager (Sympa), which are generated on the localhost and sent to addresses on the localhost. In particular, here's a specific example:
From: [EMAIL PROTECTED]
Subject: SPAM*** Message diffusion
Date: November 11, 2006 10:22:05 AM EST
To: [EMAIL PROTECTED]
Return-Path: [EMAIL PROTECTED]
X-Original-To: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from localhost (localhost.localdomain [127.0.0.1]) by server1.neighborhoods.net (Postfix) with ESMTP id 5CDE2B6C2F0 for [EMAIL PROTECTED]; Sat, 11 Nov 2006 10:22:18 -0500 (EST)
Received: from server1.neighborhoods.net ([127.0.0.1]) by localhost (server1 [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 31180-01-2 for [EMAIL PROTECTED]; Sat, 11 Nov 2006 10:22:12 -0500 (EST)
Received: by server1.neighborhoods.net (Postfix, from userid 114) id 1A9BFB6C2F6; Sat, 11 Nov 2006 10:22:05 -0500 (EST)
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8;
Content-Transfer-Encoding: 8bit
Message-Id: [EMAIL PROTECTED]
X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at neighborhoods.net
X-Spam-Status: Yes, hits=9.7 tagged_above=0.0 required=6.3 tests=AWL, BAYES_20, NO_RELAYS
X-Spam-Level: *
X-Spam-Flag: YES
It's pretty clear that the entry in user_prefs would start with whitelist_from_rcvd [EMAIL PROTECTED] but what would I use as the domain part? Thanks very much, Miles