Re: Distributed Bayes DB?

2006-11-11 Thread Matt Kettler
Matthias Leisi wrote:
 Hello List,

 How would you set up a distributed Bayes DB?

 In this context, distributed means that I have four mailserver
 machines in parallel (all with equal MX priority) where I want to run
 SpamAssassin's Bayes filtering -- without introducing a single point of
 failure (eg a central database).

 All servers should thus run with local Bayes DBs.
No, they shouldn't. There are better ways.
  In order to avoid that
 they diverge too much,

 1) the files are copied from one machine to the others once a day (or
 twice, ...).

 2) the files are merged and re-distributed to all four machines once a
 day (or twice, ...).

 Do you see additional options? 
Use a SQL server backend. If you must have a no-failure option for the
bayes DB, use a cluster of SQL servers.

Example with mysql:

http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

SA 3.0.0 and higher supports generic SQL, as well as MySQL and Postgres
optimized backends for bayes storage. This is THE way to have multiple
servers share a bayes database, because it's what SQL was designed to
do. Anything else is a hack at best.

See bayes_store_module and the bayes_sql_* options in the conf manpage.

http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Conf.html

Also see the SQL readme:

http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes
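
A minimal sketch of what the SQL setup could look like in local.cf on each of the four scanners, assuming a MySQL server reachable at db.example.com with a database named sa_bayes. The host, database name, and credentials are placeholders, not values from this thread:

```
# local.cf: sketch only; host, database name, and credentials are placeholders
bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn       DBI:mysql:sa_bayes:db.example.com
bayes_sql_username  sa_user
bayes_sql_password  change-me
# Optional: make every scanning user share one Bayes data set
bayes_sql_override_username  shared
```

Every scanner pointing at the same DSN is what makes the database shared; the SQL readme linked above covers creating the tables.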

 What is the best practice in that
 regard with Spamassassin? 
Using SQL is by far the best practice here.
 Is it even possible to merge Bayes DBs (and if
 yes, how)?
   
No.
 Btw., I would like a similar setup for the Autowhitelist/AWL where I
 think a simple filecopy (ie option 1 above) is sufficient.
   
Ditto: use SQL here too. See auto_whitelist_factory in the AWL plugin
manpage (assuming SA 3.1.x):

http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_AWL.html
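
As a rough sketch, the AWL equivalent in local.cf might look like this, assuming the same kind of MySQL server as for Bayes. The host, database name, and credentials are placeholders:

```
# local.cf: sketch only; host, database name, and credentials are placeholders
auto_whitelist_factory  Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn            DBI:mysql:sa_awl:db.example.com
user_awl_sql_username   sa_user
user_awl_sql_password   change-me
user_awl_sql_table      awl
```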
 Thanks for your input,
 -- Matthias

   



Re: Distributed Bayes DB?

2006-11-11 Thread Matthias Leisi

Matt Kettler wrote:

 Do you see additional options? 
 Use a SQL server backend. If you must have a no-failure option for the
 bayes DB, use a  cluster of SQL servers.
 [..]

 Also see the SQL readme:
 
 http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes

I already took a look at using SQL, but this quote:

| NB:  This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.

stops me from using it. Unfortunately, I can not run software officially
considered Beta on this system.


 Use a SQL server backend. If you must have a no-failure option for the
 bayes DB, use a  cluster of SQL servers.

 Example with mysql:

 http://www.howtoforge.com/loadbalanced_mysql_cluster_debian

I suppose that every message passed through SpamAssassin will issue at
least one query and one update statement to the DB. How does a MySQL
cluster perform with 500'000 messages per day, considering that
replication must also take place?


 What is the best practice in that
 regard with Spamassassin? 

 Using SQL is by far the best practice here.

I do not see many mentions of the SQL approach - either because it is
not used much or because it works so well?

Thanks,
-- Matthias



Re: Well, that didn't take very bloody long

2006-11-11 Thread jdow

From: Steve Lake [EMAIL PROTECTED]
Ok, remember that Name Wrote: :) emails?  They've completely 
changed.  Now it's hi username instead.  Joy, oh joy.  Can anyone find 
any common elements in these emails because whoever this putz is, they're 
adapting a lot.  They hit us, we adapt, they immediately change tactics and 
come at us again.  Now with all the brilliant minds on this mailing list, 
we really should be able to find out who this putz is and nail all his 
stuff regardless of what tactic he switches to.


I believe the record will show that I more or less predicted this with
the first postings of the wrote spam.

Obvious single features that are easily changeable are lousy for use
as rules. I figure they are digital prestidigitation: they misdirect your
eye to where the spammer wants you to look, so you don't notice the
hard-to-change features.

{^_-}


RE: Distributed Bayes DB?

2006-11-11 Thread Michael Scheidell
 -Original Message-
 From: Matthias Leisi [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 4:48 AM
 To: users@spamassassin.apache.org
 Subject: Re: Distributed Bayes DB?
 
 
 
 Matt Kettler wrote:
 
  Do you see additional options?
  Use a SQL server backend. If you must have a no-failure 
 option for the 
  bayes DB, use a  cluster of SQL servers. [..]

Or just use MySQL with replication? Put MySQL on two servers and replicate
the data using the built-in (not beta) database replication in MySQL?
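
A minimal sketch of the my.cnf side of such a two-server replication pair, assuming classic one-way MySQL replication. The server IDs and log name are illustrative, and the replica still has to be pointed at the master separately:

```ini
# master: my.cnf (sketch only)
[mysqld]
server-id = 1
log-bin   = mysql-bin

# replica: my.cnf (sketch only)
[mysqld]
server-id = 2
```

With one-way replication, writes have to go to the master, so the Bayes DSN on every scanner would need to follow whichever server currently holds that role.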

For load balancing/failover, use something like CARP (on *BSD systems), or
something similar on Linux?
Use a load-balanced IP address as the target of the MX as well as the target
for SQL?

Can't Linux itself do IP clustering?

You could also contact me off-list for information on how we have solved
this; we have systems that are doing 10 million emails per day.




OT : MailScanner

2006-11-11 Thread Suhas (QualiSpace)

Hi,

Need some inputs from the experts.

I am planning to switch to postfix + mailscanner + sa + clamav. Just want to
know one thing before doing that. I have kaspersky linux edition. Can I
create two antivirus scanning layers in mailscanner?

Warm Regards,

Suhas
System Administrator
QualiSpace - A QuantumPages Enterprise
An ICANN Accredited Domain Registrar
URL: http://www.qualispace.com
QualiSpace Community Discussion forum: http://forum.qualispace.com









Re: Distributed Bayes DB? (SQL usage)

2006-11-11 Thread Matt Kettler
Matthias Leisi wrote:
 Matt Kettler wrote:

   
 Do you see additional options? 
   
 Use a SQL server backend. If you must have a no-failure option for the
 bayes DB, use a  cluster of SQL servers.
 [..]

 Also see the SQL readme:

 http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes
 

 I already took a look at using SQL, but this quote:

 | NB:  This should be considered BETA, and the interface, schema, or
 | overall operation of SQL support may change at any time with future
 | releases of SA.

 stops me from using it. Unfortunately, I can not run software officially
 considered Beta on this system.
   
I think that documentation line is obsolete, and has probably been
overlooked for a long time.

SQL support has been in SA since 2004, and was touted as a major feature
of SA 3.0.0.

http://mail-archives.apache.org/mod_mbox/spamassassin-announce/200409.mbox/browser

The 3.1.0 release announcement declared SQL to be THE preferred method
for bayes storage, even for single-box setups.

http://mail-archives.apache.org/mod_mbox/spamassassin-announce/200509.mbox/[EMAIL
 PROTECTED]

-

- added PostgreSQL, MySQL 4.1+, and local SDBM file Bayes storage modules. SQL
  storage is now recommended for Bayes, instead of DB_File. NDBM_File support
  has been dropped due to a major bug in that module.
-



That said, yes, they might change the schema or operation in a future
version.. But the same goes for DB files. It's happened once already..

But this is not beta, it's the recommended configuration.


   
 Use a SQL server backend. If you must have a no-failure option for the
 bayes DB, use a  cluster of SQL servers.

 Example with mysql:

 http://www.howtoforge.com/loadbalanced_mysql_cluster_debian
 

 I suppose that every message passed through SpamAssassin will issue at
 least one query and one update statement to the DB. How does a MySQL
 cluster perform with 500'000 messages per day, considering that
 replication must also take place?
   
*MUCH* faster than the default Berkeley DB does:

http://wiki.apache.org/spamassassin/BayesBenchmarkResults

MySQL with MYISAM tables completed the test in 56% of the time DBM took.

Admittedly that's over the loopback interface (lo), not the wire, but you
get the point. In general, SQL is more efficient and faster than the
default Berkeley DB.
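
To put the 56% figure in more familiar terms, here is a small sketch that converts a time ratio into a speedup. The helper name is hypothetical; only the 0.56 ratio comes from the benchmark page cited above:

```python
# Hypothetical helper: convert "B took `ratio` of the time A took" into a speedup.
def speedup_from_time_ratio(ratio: float) -> float:
    """Return how many times faster the quicker backend is."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


mysql_vs_dbm = 0.56  # from the thread: MySQL/MyISAM took 56% of DBM's time
speedup = speedup_from_time_ratio(mysql_vs_dbm)
time_saved = 1.0 - mysql_vs_dbm

print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.0%}")
```

So 56% of the time is roughly a 1.8x speedup on that benchmark, measured locally rather than across a network.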

SDBM is faster still, but it's got some issues with the dump/restore
process last I checked, so conversion to SDBM is not very practical. I'd
consider SDBM not well supported nor well tested, although I do use it
on my boxes.



   
 What is the best practice in that
 regard with Spamassassin? 
   
 Using SQL is by far the best practice here.
 

 I do not see many mentions of the SQL approach - either because it is
 not used much or because it works so well?

   
Erm, really?  It seems to get talked about here a lot. And the official
recommendation in the release announcement is hard to overlook.




Re: Distributed Bayes DB?

2006-11-11 Thread Matt Kettler
Michael Scheidell wrote:
 -Original Message-
 From: Matthias Leisi [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 4:48 AM
 To: users@spamassassin.apache.org
 Subject: Re: Distributed Bayes DB?



 Matt Kettler wrote:

 
 Do you see additional options?
 
 Use a SQL server backend. If you must have a no-failure 
   
 option for the 
 
 bayes DB, use a  cluster of SQL servers. [..]
   

 Or just use mysql with replication? Put mysql on two servers, replicate
 the data using built in (not beta) data base replication in mysql?

Actually his point wasn't that the SQL clustering was beta, but that the SQL
Readme on the wiki claims that SA's SQL bayes backend is beta. But
that's just an oops.


RE: OT : MailScanner

2006-11-11 Thread Randal, Phil








Yes you can, and many of us MailScanner users do run two or more virus
scanners.

You should join the MailScanner users' mailing list; we're a helpful lot.

Phil

From: Suhas (QualiSpace) [mailto:[EMAIL PROTECTED]
Sent: Saturday, November 11, 2006 10:17 AM
To: users@spamassassin.apache.org
Subject: OT : MailScanner

Hi,

Need some inputs from the experts.

I am planning to switch to postfix + mailscanner + sa + clamav. Just want to
know one thing before doing that. I have kaspersky linux edition. Can I
create two antivirus scanning layers in mailscanner?

Warm Regards,

Suhas
System Administrator
QualiSpace - A QuantumPages Enterprise
An ICANN Accredited Domain Registrar
URL: http://www.qualispace.com
QualiSpace Community Discussion forum: http://forum.qualispace.com










Re: Distributed Bayes DB?

2006-11-11 Thread Charlie Clark


Am 11.11.2006 um 10:48 schrieb Matthias Leisi:



I already took a look at using SQL, but this quote:

| NB:  This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.

stops me from using it. Unfortunately, I can not run software officially
considered Beta on this system.


I suppose you could use something like NFS so that all systems share  
the same DB, config files, etc.




Use a SQL server backend. If you must have a no-failure option for the
bayes DB, use a cluster of SQL servers.

Example with mysql:

http://www.howtoforge.com/loadbalanced_mysql_cluster_debian


I suppose that every message passed through SpamAssassin will issue at
least one query and one update statement to the DB. How does a MySQL
cluster perform with 500'000 messages per day, considering that
replication must also take place?


How long is a piece of string? 500,000 queries per day shouldn't  
cause any problems for an RDBMS but the architecture of such a system  
should be given a bit of consideration - connection pooling et al.
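
For a sense of scale, here is a back-of-the-envelope sketch of the average load. The two-statements-per-message figure is the assumption made earlier in the thread (a real Bayes scan may touch the database more often), and the peak factor is purely hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(per_day: int) -> float:
    """Average events per second for a daily total spread evenly over 24 hours."""
    return per_day / SECONDS_PER_DAY


messages_per_day = 500_000    # figure from the thread
statements_per_message = 2    # thread's assumption: one query plus one update
peak_factor = 10              # hypothetical: busiest second is 10x the average

avg_msgs = average_rate(messages_per_day)
avg_stmts = avg_msgs * statements_per_message
peak_stmts = avg_stmts * peak_factor

print(f"{avg_msgs:.1f} msg/s, {avg_stmts:.1f} stmt/s on average, "
      f"{peak_stmts:.0f} stmt/s at the assumed peak")
```

Under these assumptions the average works out to only a handful of messages per second, which is why the connection handling matters more than raw query volume.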


There is in fact a mail system that uses PostgreSQL to store all the
mails. If you want more information on requirements, speed, etc. I'm
pretty sure you could run SpamAssassin on top of it.





What is the best practice in that
regard with Spamassassin?


Using SQL is by far the best practice here.


I do not see many mentions of the SQL approach - either because it is
not used much or because it works so well?


Probably the former. And you're right not to use something like the  
SQL backend for a large volume production system. Not because it's  
unreliable but because it's still in development and keeping the  
schema up to date could become a real headache.


I suspect that at some point it might make sense to use something  
like SQLite for persistence (because it's relatively easy to  
distribute) which would make using alternative backends relatively easy.


Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226





Re: OT : MailScanner

2006-11-11 Thread John Rudd

Suhas (QualiSpace) wrote:

Hi,

 

Need some inputs from the experts. 

 


I am planning to switch to postfix + mailscanner + sa + clamav. Just want to
know one thing before doing that. I have kaspersky linux edition. Can I
create two antivirus scanning layers in mailscanner?



A) probably better to ask over on the mailscanner list

B) Yes.  You can have multiple virus scanners with mailscanner.  And I'm 
pretty sure you can use kaspersky with it.


Re: Distributed Bayes DB?

2006-11-11 Thread Dhawal Doshy

Matthias Leisi wrote:

Matt Kettler wrote:

Do you see additional options? 

Use a SQL server backend. If you must have a no-failure option for the
bayes DB, use a  cluster of SQL servers.
[..]

Also see the SQL readme:

http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes


I already took a look at using SQL, but this quote:

| NB:  This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.

stops me from using it. Unfortunately, I can not run software officially
considered Beta on this system.


Like Matt mentioned, this is an oops. I've been using global SQL Bayes
ever since the 3.0.0 release (about 2 years now); same for AWL (which I
later disabled for lack of janitor tools).


It's rock stable and quite fast (though on a dedicated server).. for 
redundancy look at DRBL or something similar.


- dhawal


RE: Well, that didn't take very bloody long

2006-11-11 Thread Randal, Phil
But most of us aren't clever enough with Perl RE's to construct the rule
to go with it.

So where's the rule to match, folks?

Cheers,

Phil

-Original Message-
From: Tony Finch [mailto:[EMAIL PROTECTED] On Behalf Of Tony Finch
Sent: Friday, November 10, 2006 9:49 PM
To: Steve Lake
Cc: users@spamassassin.apache.org
Subject: Re: Well, that didn't take very bloody long

On Fri, 10 Nov 2006, Steve Lake wrote:

 Ok, remember that Name Wrote: :) emails?  They've completely
 changed.  Now it's hi username instead.  Joy, oh joy.  Can anyone find any
 common elements in these emails because whoever this putz is, they're
 adapting a lot.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/90322

Tony.
-- 
f.a.n.finch  [EMAIL PROTECTED]  http://dotat.at/
VIKING: SOUTHERLY VEERING WESTERLY 6 TO GALE 8, OCCASIONALLY SEVERE GALE
9.
HIGH. RAIN THEN SHOWERS. MODERATE OR GOOD.


Re: Distributed Bayes DB?

2006-11-11 Thread Matt Kettler
Charlie Clark wrote:

 Am 11.11.2006 um 10:48 schrieb Matthias Leisi:


 I already took a look at using SQL, but this quote:

 | NB:  This should be considered BETA, and the interface, schema, or
 | overall operation of SQL support may change at any time with future
 | releases of SA.

 stops me from using it. Unfortunately, I can not run software officially
 considered Beta on this system.

 I suppose you could use something like NFS so that all systems share
 the same DB, config files, etc.
NFS would be HIGHLY not-recommended.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/72362/match=sql

In fact, I personally would suggest never using NFS for anything at all,
and I'm shocked that you'd even consider using it for any production
purpose.

Besides, the point here is to eliminate any single-point-of-failure. NFS
would offer no redundancy at all. If the server hosting the NFS share
went down, the bayes DB would be unavailable.



 I do not see many mentions of the SQL approach - either because it is
 not used much or because it works so well?

 Probably the former. And you're right not to use something like the
 SQL backend for a large volume production system. Not because it's
 unreliable but because it's still in development and keeping the
 schema up to date could become a real headache.
But it's not still in development.. It's the recommended configuration
as of 3.1.0.

SA's SQL support is solid. I personally don't use it, but many here do.



Re: Distributed Bayes DB?

2006-11-11 Thread Dhawal Doshy

Dhawal Doshy wrote:

Matthias Leisi wrote:

Matt Kettler wrote:

Do you see additional options? 

Use a SQL server backend. If you must have a no-failure option for the
bayes DB, use a  cluster of SQL servers.
[..]

Also see the SQL readme:

http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes


I already took a look at using SQL, but this quote:

| NB:  This should be considered BETA, and the interface, schema, or
| overall operation of SQL support may change at any time with future
| releases of SA.

stops me from using it. Unfortunately, I can not run software officially
considered Beta on this system.


Like Matt mentioned, this is an oops. I've been using global SQL Bayes
ever since the 3.0.0 release (about 2 years now); same for AWL (which I
later disabled for lack of janitor tools).


It's rock stable and quite fast (though on a dedicated server).. for 
redundancy look at DRBL or something similar.

  that should be DRBD

- dhawal


RE: Distributed Bayes DB?

2006-11-11 Thread Michael Scheidell
 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 5:23 AM
 To: Michael Scheidell
 Cc: users@spamassassin.apache.org
 Subject: Re: Distributed Bayes DB?

 Actually his point wasn't the SQL clustering was beta, but 
 that the SQL Readme on the wiki claims that SA's SQL bayes 
 backend is beta.. But that's just an oops.
 

I have asked, on this list and amavisd list, at least twice, if anyone
has tried SA with NDB clusters.

I have not gotten an answer.

So, do you have this running? Does it work?

 


Re: FuzzyOcr problem (Re: Relay Checker plugin v0.2)

2006-11-11 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
John Rudd wrote:
 decoder wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 John Rudd wrote:
 D.J. wrote:
 On 11/10/06, Patrick Sneyers [EMAIL PROTECTED] wrote:
 I get this warning: plugin: failed to create instance of plugin
 Mail::SpamAssassin::Plugin::RelayChecker: Can't locate object
 method "new" via package
 "Mail::SpamAssassin::Plugin::RelayChecker" at (eval 26) line 1.


 (This is my own build of SA 3.1.7 on Mac OS X Server 10.4 ppc)

 It seems to work OK though: *  3.0 RELAY_CHECKER RELAY: badrdns
  (I lowered the score)

 Patrick Sneyers Belgium

 I also received some weirdness.  When linting in debug mode, I
 found the following lines that seem to indicate that RelayChecker
 isn't playing nicely with FuzzyOCR:

 [28058] dbg: plugin: fixed relative path: /etc/mail/spamassassin/FuzzyOcr.pm
 [28058] dbg: plugin: loading FuzzyOcr from /etc/mail/spamassassin/FuzzyOcr.pm
 [28058] dbg: plugin: registered FuzzyOcr=HASH(0x9d04570)
 [28058] dbg: plugin: FuzzyOcr=HASH(0x9d04570) implements 'parse_config'
 [28058] dbg: FuzzyOcr: Option logfile = /home/amavis/.spamassassin/FuzzyOcr.log
 [28058] dbg: FuzzyOcr: Found scan: $gocr -i $pfile
 [28058] dbg: FuzzyOcr: Found scan: $gocr -l 180 -d 2 -i $pfile
 [28058] dbg: FuzzyOcr: Found scan: $gocr -l 140 -d 2 -i $pfile
 [28058] dbg: FuzzyOcr: Option threshold = 0.25
 [28058] dbg: FuzzyOcr: Score{autodisable} = 10.01
 [28058] dbg: FuzzyOcr: Option counts_required = 3
 [28058] dbg: plugin: fixed relative path: /etc/mail/spamassassin/RelayChecker.pm
 [28058] dbg: plugin: loading RelayChecker from /etc/mail/spamassassin/RelayChecker.pm
 [28058] dbg: plugin: registered RelayChecker=HASH(0x9d94a80)
 [28058] dbg: plugin: FuzzyOcr=HASH(0x9d04570) implements 'parse_config'
 [28058] dbg: plugin: RelayChecker=HASH(0x9d94a80) implements 'parse_config'
 [28058] dbg: FuzzyOcr: unknown Score: relaychecker_score
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_nordns
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_badrdns
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_baddns
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_ipinhostname
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_dynhostname
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_clienthostname
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_skip_ip
 [28058] dbg: FuzzyOcr: unknown Option: relaychecker_pass_auth

 Ok that really doesn't look nice... is the fault on our (FuzzyOcr's)
 side?

 Yes.

 If so, then maybe someone can explain to me what the correct way
 would be to fix this :)

 When you encounter an option you don't own (ie. it's not a
 FuzzyOcr option), then parse_config should return 0.


 If you could verify that this also applies to the latest development
 version (3.4.1), then that would be nice


 Yup, I found this in your 3.4.1 code (my comments indicate the issues):
Thank you very much for the work, I will patch this into our SVN
version and the 3.4.x devel branch right now.

Best regards

Chris

 sub parse_config {
     my ( $self, $opts ) = @_;

     # this is good: you're restricting yourself to ^focr_bin_ keys

     if ( $opts->{key} =~ /^focr_bin_/i ) {
         my $p = lc $opts->{key};
         $p =~ s/focr_bin_//;
         if (grep {m/$p/} @bin_utils) {
             $App{$p} = $opts->{value};
             debuglog("App{$p} = $App{$p}");
         } else {
             debuglog("unknown App: $opts->{key}");
         }
         # you should tell SA you processed this config option:
         #$self->inhibit_further_callbacks();
     }

     # this is bad: you're processing _score configs that may not belong to
     # FuzzyOcr.  A better statement might be:
     #elsif (($opts->{key} =~ /^focr_/i) && ($opts->{key} =~ m/_score$/i)) {
     # that way you're only processing _score configs that belong to focr

     elsif ( $opts->{key} =~ m/_score$/i ) {
         my $o = lc $opts->{key};
         $o =~ s/focr_//;
         $o =~ s/_score//;
         if (grep {m/$o/} @pgm_scores) {
             $Score{$o} = $opts->{value};
             debuglog("Score{$o} = $Score{$o}");
         } else {
             debuglog("unknown Score: $opts->{key}");
         }
         # again, inhibit further callbacks here:
         #$self->inhibit_further_callbacks();
     }

     # same as above: now you're taking ANY key, from ANY plugin, and
     # handling it.  Bad bad bad.  This should be changed to:
     #elsif ($opts->{key} =~ /^focr_/i) {

     else {
         my $o = lc $opts->{key};
         $o =~ s/focr_//;
         if (grep {m/$o/} @pgm_opts) {
             if ($o eq 'scansets') {
                 @scansets = (); # remove
                 foreach my $s (split(',',$opts->{value})) {
                     $s =~ s/^\s*//; $s =~ s/\s*$//;
                     push @scansets,$s;
                     debuglog("Found scan: $s");
                 }
             } elsif ($o eq 'path_bin') {
                 @paths = (); # remove
                 foreach my $p

Re: FuzzyOCR

2006-11-11 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
sokka wrote:
 Hi,

 Can anyone post me URL or PDF of clear documentation of the
 FuzzyOcr ?
The current URL for FuzzyOcr is http://fuzzyocr.own-hero.net/

The page (wiki) is still very much under construction, but you'll find
installation instructions inside the tarball (you can try version
3.4.1 if you want; it performs better than the stable version 2.3b,
it just hasn't been tested for as long yet). Installation itself is not
hard if you have all the dependencies installed :) If you need further
assistance, check out our list at
http://lists.own-hero.net/mailman/listinfo/devel-spam

Once I get more time, I will also be able to do more work on the wiki :)


Best regards,

Chris

 thanks in advance

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
iD8DBQFFVbEaJQIKXnJyDxURAkYrAJ4/ObuZsaThvCh13jBycDpMZrUpqQCgsdO6
UmIM0FUXykERwXZTIN7wLPo=
=dtEH
-END PGP SIGNATURE-



Unsubscribe

2006-11-11 Thread Eric Carlson
unsubscribe


Re: Questions about FuzzyOCR

2006-11-11 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
 
Pascal Maes wrote:

 Version 2.3b


 1) Here is the output of the scanner (gocr -i) :

 _

 date Informations



 9- 11-lO061O_30   Le __ek-end du 3-4r'11, les adresses de cou
 r_er jlectron_que des jtud_ants non ri_nscmts j _UCL ont jtj
 ddsact_vjes. La ra_son est pÄrement adm_n_strat_ve et I_je j Ia
 caNe j puce. Pour permeNre j ces jtud_ants de rjcupdrer leurs
 messaqes, nous avons fa_t en soNe qu'_Is pu_ssent encore accjder j
 leur boîte aux leNres jusqu'au l4.r l 1 ,/lo 06 . ANent_on, la
 consuttat_on se fera av_ un cI_ent de messager_e !Thunderb_rd.
 Eudora, Outlook.. .7 ou v_a le _IebMa_I ma_s plus v_a le poNa_I .


 We get almost the same result with gocr -l 180 -d 2 -i

 And FuzzyOCr says :

 13 FUZZY_OCR  BODY: Mail contains an image with common
 spam text inside Words found: wexe in 3 lines alert in 2 lines
 alert in 2 lines investor in 1 lines trade in 3 lines (11
 word occurrences found)

 But I don't find any of these words in the text above !

You can try lowering your fuzz from 0.3 to 0.2. I don't have any
experience yet with how the plugin reacts to text in different
languages, so text like this might produce false positives.

 2) How do I remove an image which has been stored by mistake in the hash
 database ?
In version 2.3b, this is not possible yet with a tool, unfortunately.
But the database is only a textfile, so you can simply search the hash
there and delete the line. Version 3.4.1 brings a tool that removes a
given hash from the database, but I am still improving it a bit, so
one can also pass it an image file to look for.

Best regards,

Chris

 Thanks -- Pascal




-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.2 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
iD8DBQFFVbIjJQIKXnJyDxURAkYjAJ9iFDj2oFrY+mVMyEBvEusYxxBxFQCgjZoM
SJny4nTsw1G3XgGqBOVl7S8=
=5S1J
-END PGP SIGNATURE-



Re: Distributed Bayes DB?

2006-11-11 Thread Alex Woick
Don't overrate Bayes. Don't focus solely on a bullet-proof highly 
available clustered or replicated database. If the Bayes database is 
gone, only one check is gone! All the others are still there.


For my mail content, the real filtering power today comes from the
network checks such as URL blocklists, content checksums (razor/dcc) and
open-relay block lists. Focus on making these additional tests work.


For Bayes, use a central SQL database on one server that is used by all
your MTAs, and keep it simple. Make a disaster recovery concept for the
database machine and for the rebuild of an empty SA Bayes database. This
could be very fast. Don't back up the Bayes token data. You wrote that
you expect 500.000 messages per day. If you use Bayes auto-learning, an
empty central Bayes database is refilled to a usable state from current
messages in only a few hours. This is probably faster than a cumbersome
restore process.
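
A minimal sketch of the auto-learning settings this relies on, assuming SA 3.1.x with its auto-learn threshold plugin loaded. The threshold values shown are, as far as I know, the stock defaults and would normally be tuned per site:

```
# local.cf: sketch only; thresholds shown are believed to be the stock defaults
use_bayes                           1
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  0.1
bayes_auto_learn_threshold_spam     12.0
```

With these in place, an emptied database starts refilling from live traffic without any manual training step.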


regards,
Alex


Re: current stock scams are easy to spot

2006-11-11 Thread Justin Mason

Loren Wilton writes:
  Well, that's all fine and dandy, but what do we do about them? 
  Since we know they all have a common element, we need to figure out a way 
  to stop them using that info.
 
  Well, just from the description and knowing the existence of header ALL, 
 it would be pretty trivial to write about three rules involving a capturing 
 clause to do the matching.

yep, agreed.  (If they work really well but perform really badly, they
can always be rewritten into an eval-rule plugin later.)

If someone *does* write this, please post 'em and I'll put them into my
sandbox for testing.

--j.


rule secrecy *again* (Re: Well, that didn't take very bloody long)

2006-11-11 Thread Justin Mason

Loren Wilton writes:
  Ok, remember that Name Wrote: :) emails?  They've completely 
  changed.  Now it's hi username instead.  Joy, oh joy.  Can anyone find 
  any common elements in these emails because whoever this putz is, they're 
  adapting a lot.  They hit us, we adapt, they immediately change tactics 
  and come at us again.  Now with all the brilliant minds on this mailing 
  list, we really should be able to find out who this putz is and nail all 
  his stuff regardless of what tactic he switches to.
 
 The reason they adapt is because there are detailed announcements on the 
 mailing list of the things that are easy to spot.  The guy sending these is 
 on the list too, so as soon as the oversight or excessive cleverness is 
 announced to the world, he knows what he has to fix.

ho hum... here we go again. :(

As I've noted several times recently -- these *are* being caught by rules
which were developed in the open -- namely RCVD_FORGED_WROTE, which has
been sitting in my sandbox for several weeks, was announced in a checkin
message (with diffs!), and is currently live in both trunk and 3.1.x
rule updates.

The rule has been visible since:

  r465179 | jm | 2006-10-18 10:11:15 +0100 (Wed, 18 Oct 2006) | 1 line

  add rule to catch 'Subject: foo wrote:' stock spam

Take a look at the graph of hit-rates over time in everyone's corpora:

http://ruleqa.spamassassin.org/last-night/RCVD_FORGED_WROTE?s_detail=on&s_g_over_time=1&s_zero=on&srcpath=#over_time_anchor

There's been no change in hitrates since 2006-10-18 -- in fact, in
cthielen and zmi's corpora, they rose *dramatically*.

Secrecy is *NOT* an essential element of rule development.  It seems
logical to think it is, but evidence repeatedly demonstrates otherwise.

For some spammers, it may _help_ -- but not for all, so it's by no means
essential.  On the other hand, secrecy damages collaborative development,
restricting rule refinement and improvement to a secret cabal.  It's
antithetical to open source development.

--j.


Re: Distributed Bayes DB?

2006-11-11 Thread Matthias Leisi

First, thank you all for the suggestions relating to SQL. It seems SQL
support is better than I expected and I will give it a try.

Alex Woick wrote:
 Don't overrate Bayes. 

The system has been running without Bayes for roughly 3 years (with
incremental Spamassassin updates), and with good results until now.

However that system without the Bayes check handled the recent increase
in spam volumes with less success than other systems that do have Bayes
checks enabled.

 Don't focus solely on a bullet-proof highly
 available clustered or replicated database. If the Bayes database is
 gone, only one check is gone! All the others are still there.

That's a very good suggestion, since it seems like a bit of an overkill
to have additional database server machines for this simple task.

Is it even necessary to have a consistent shared storage amongst equal
MXes or would it be sufficient to let them run independently?

 For Bayes, use a central SQL database on one server that is used by all
 your MTA's, and keep it simple. Make a disaster recovery concept for the
 database machine and for the rebuild of an empty SA Bayes database. This
 could be very fast. Don't backup the Bayes token data. You wrote that

I don't worry too much about disaster recovery, more about avoiding a
single point of failure, i.e. if one or two machines go up in smoke or
are taken offline for maintenance, the remaining machines should
continue just as before.

-- Matthias





Re: Distributed Bayes DB?

2006-11-11 Thread Charlie Clark


Am 11.11.2006 um 11:47 schrieb Matt Kettler:


I suppose you could use something like NFS so that all systems share
the same DB, config files, etc.

NFS would be HIGHLY not-recommended.

http://article.gmane.org/gmane.mail.spam.spamassassin.general/72362/match=sql

In fact, I personally would suggest never using NFS for anything at all,
and I'm shocked that you'd even consider using it for any production
purpose.


NFS or equivalent has its place and can be made safe enough if  
required but I think other issues like concurrent access suggest that  
the SQL approach is the way to go.


Besides, the point here is to eliminate any single-point-of-failure. NFS
would offer no redundancy at all. If the server hosting the NFS share
went down, the bayes DB would be unavailable.


Agreed.

I do not see many mentions of the SQL approach - either because it is
not used much or because it works so well?


Probably the former. And you're right not to use something like the
SQL backend for a large volume production system. Not because it's
unreliable but because it's still in development and keeping the
schema up to date could become a real headache.

But it's not still in development.. It's the recommended configuration
as of 3.1.0.

SA's SQL support is solid. I personally don't use it, but many here do.


Yes, sorry I should have read all e-mails relating to the thread first.

Charlie
--
Charlie Clark
Helmholtzstr. 20
Düsseldorf
D- 40215
Tel: +49-211-938-5360
GSM: +49-178-782-6226





user_prefs

2006-11-11 Thread twofers
I have searched for several hours and can't seem to find the answer to this. I've found close answers, but not complete.

I have SA set up as individual users. When a new user is created SA creates a new user_prefs file for them. This file contains two prefs: required_score 7 and rewrite_header subject SPAM.

I am trying to find out if I can change some prefs so that the new user_prefs file will contain my prefs when it is newly created.

I have changed prefs in user_prefs.template and that didn't make any difference. I assume this template is supposed to be used by SA to create the new user_prefs, but it doesn't seem so.

Where can I add my own prefs so the newly created default user_prefs file is loaded with what I want?

Thanks.

 - /etc/mail/spamassassin/user_prefs.template: Default user preferences, for system admins to create, modify, and set defaults for users' preferences files. Takes precedence over the above prefs file, if it exists. Do not put system-wide settings in here; put them in a file in the "/etc/mail/spamassassin" directory ending in ".cf". This file is just a template, which will be copied to a user's home directory for them to change.
 - $USER_HOME/.spamassassin/user_prefs: User preferences file. If it does not exist, one of the default prefs file from above will be copied here for the user to edit later, if they wish. Unless you're using spamd, there is no difference in interpretation between the rules file and the preferences file, so users can add new rules for their own use in the "~/.spamassassin/user_prefs" file, if they like. (spamd disables this for security and increased speed.)


Re: Questions about FuzzyOCR

2006-11-11 Thread decoder
decoder wrote:
 Pascal Maes wrote:
 Version 2.3b


 1) Here is the output of the scanner (gocr -i) :

 _

 date Informations



 9- 11-lO061O_30   Le __ek-end du 3-4r'11, les adresses de cou
  r_er jlectron_que des jtud_ants non ri_nscmts j _UCL ont jtj
 ddsact_vjes. La ra_son est pÄrement adm_n_strat_ve et I_je j Ia
 caNe j puce. Pour permeNre j ces jtud_ants de rjcupdrer leurs
 messaqes, nous avons fa_t en soNe qu'_Is pu_ssent encore accjder
 j leur boîte aux leNres jusqu'au l4.r l 1 ,/lo 06 . ANent_on, la
  consuttat_on se fera av_ un cI_ent de messager_e !Thunderb_rd.
 Eudora, Outlook.. .7 ou v_a le _IebMa_I ma_s plus v_a le poNa_I .



 We get almost the same result with gocr -l 180 -d 2 -i

 And FuzzyOCr says :

 13 FUZZY_OCR  BODY: Mail contains an image with
 common spam text inside Words found: wexe in 3 lines alert in
 2 lines alert in 2 lines investor in 1 lines trade in 3
 lines (11 word occurrences found)

 But I don't find any of these words in the text above !

 You can try lowering your fuzz from 0.3 to 0.2. I don't have any
 experience so far with how the plugin reacts to text in different
 languages, so this might produce false positives.
 2) How to remove an image which has been stored by mistake in the
 hash database ?
 In version 2.3b, this is not possible yet with a tool,
 unfortunately. But the database is only a textfile, so you can
 simply search the hash there and delete the line. Version 3.4.1
 brings a tool that removes a given hash from the database, but I am
 still improving it a bit, so one can also pass it an image file to
 look for.
I must correct myself there, passing it an image is already supported :)

Best regards,

Chris


 Best regards,

 Chris
 Thanks -- Pascal







Re: question about bayes database

2006-11-11 Thread pinoyskull

Matthias Haegele wrote:

pinoyskull schrieb:
will it be ok if i have 1000+ spam learned and only 300+ ham learned, 
will it still be effective?


Don't know. But I think it's better if you learn *all* spam and ham ...

that's my problem, spams overwhelmed ham on our server

(If your spam-ham-ratio is really that bad perhaps you want to use 
some MTA-level antispam, or blacklists?)


could you give me an example of a MTA-level antispam, im kinda new to 
this, thanks




hth
MH






RE: question about bayes database

2006-11-11 Thread Michael Scheidell


 -Original Message-
 From: pinoyskull [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 9:55 AM
 To: users@spamassassin.apache.org
 Subject: Re: question about bayes database
 

  (If your spam-ham-ratio is really that bad perhaps you want to use
  some MTA-level antispam, or blacklists?)
 
 could you give me an example of a MTA-level antispam, im kinda new to 
 this, thanks

Google for 'postfix+spam'


Re: Distributed Bayes DB?

2006-11-11 Thread Matt Kettler
Michael Scheidell wrote:
 -Original Message-
 From: Matt Kettler [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 5:23 AM
 To: Michael Scheidell
 Cc: users@spamassassin.apache.org
 Subject: Re: Distributed Bayes DB?
 

   
 Actually his point wasn't the SQL clustering was beta, but 
 that the SQL Readme on the wiki claims that SA's SQL bayes 
 backend is beta.. But that's just an oops.

 

 I have asked, on this list and amavisd list, at least twice, if anyone
 has tried SA with NDB clusters.

 I have not gotten an answer.

 So, do you have this running? Does it work?
   
No I do not.. I don't even use SQL with SA. Is SQL Clustering in mysql 
a beta feature?

My point wasn't to debate the merits of clustering vs replication, just
to point out that Matthias was under the impression that ANY use of SQL
in SA was beta..

I'd readily defer implementation details to anyone else more versed in
MySQL redundancy..



   

   



spam that only hits the BAYES_99 rule

2006-11-11 Thread Tom H

Hi,

I was getting hit by a great deal of spam that only hits the BAYES_99
rule, and maybe gets less than a point or so from elsewhere.
But now I'm getting ones through that are basically only hitting the
BAYES_99 and nothing else;

X-Spam-Score: 3.5 (***) BAYES_99

I tried to send the mail to this list to demonstrate the content but got 
bounced with 12.9 spam score.


I'm running sa-update weekly, and rules_de_jour daily with a big set of 
rules, and I'm still not hitting loads of obvious spam. Particularly 
those with the title Re: + good and then a number appended to the end.


The only thing I can think of at the moment is to reduce my 
required_hits to 3.5 or increase the score for BAYES_99 to 5, but I 
would prefer not to do the latter as I like a default and automatically 
updated installation.


I would be grateful for any ideas on this...

Thanks,

Tom H










Re: OT : MailScanner

2006-11-11 Thread Benny Pedersen

On Sat, November 11, 2006 11:16, Suhas \(QualiSpace\) wrote:

 Need some inputs from the experts.

experts is on mailscanner mail lists

 I am planning to switch to postfix + mailscanner + sa + clamav. Just want to
 know one thing before doing that. I have kaspersky linux edition. Can I
 create two antivirus scanning layers in mailscanner?

don't know since i have only used mailscanner 3.x before one told me to use
amavisd-new with postfix

just one thing why do you want mailscanner and not amavisd-new ?

-- 
This message was sent using 100% recycled spam mails.




Re: OT : MailScanner

2006-11-11 Thread Martin Hepworth

Benny Pedersen wrote:

On Sat, November 11, 2006 11:16, Suhas \(QualiSpace\) wrote:


Need some inputs from the experts.


experts is on mailscanner mail lists


I am planning to switch to postfix + mailscanner + sa + clamav. Just want to
know one thing before doing that. I have kaspersky linux edition. Can I
create two antivirus scanning layers in mailscanner?


don't know since i have only used mailscanner 3.x before one told me to use
amavisd-new with postfix

just one thing why do you want mailscanner and not amavisd-new ?




switch from what?

as for Benny's comment, it's nice to have a choice isn't it.

Amavisd-new doesn't seem to have quite the active development 
MailScanner does. Also Amavis seems to be much more complicated to get 
going and do nice things with rules (policy banks) than MailScanner's 
simple config and rules syntax. Just my take from a quick 5 minute 
wander through the amavisd-new docs.


--
Martin Hepworth
Senior Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300




Re: OT : MailScanner

2006-11-11 Thread Matt Kettler
Suhas (QualiSpace) wrote:

 Hi,

  

 Need some inputs from the experts.

  

 I am planning to switch to postfix + mailscanner + sa + clamav. Just
 want to know one thing before doing that. I have kaspersky linux
 edition. Can I create two antivirus scanning layers in mailscanner?

Yes. I use 3 scanners with MailScanner.. ClamAV, Command, and
BitDefender. Works just fine, you just declare more than one scanner on
your Virus Scanners config line in MailScanner.conf.

Note however that MS will always scan with all of your available
scanners. It won't stop after the first one finds a virus. This is handy
because you can do a fair comparison of scanners, but not exactly efficient.




Re: spam that only hits the BAYES_99 rule

2006-11-11 Thread Matt Kettler
Tom H wrote:
 Hi,

 I was getting hit by a great deal of spam that only hits the BAYES_99
 rule, and maybe gets less than a point or so from elsewhere.
 But now I'm getting ones through that are basically only hitting the
 BAYES_99 and nothing else;

 X-Spam-Score: 3.5 (***) BAYES_99

 I tried to send the mail to this list to demonstrate the content but
 got bounced with 12.9 spam score.

 I'm running sa-update weekly, and rules_de_jour daily with a big set
 of rules, and I'm still not hitting loads of obvious spam.
 Particularly those with the title Re: + good and then a number
 appended to the end.

 The only thing I can think of at the moment is to reduce my
 required_hits to 3.5 or increase the score for BAYES_99 to 5, but I
 would prefer not to do the latter as I like a default and
 automatically updated installation.

 I would be grateful for any ideas on this...
Sounds like the message contains a URI that is now listed in many of the
SURBL and URIBL lists.

 It may be that this got listed after you got the spam, but do you have
network tests enabled?











Re: is there a way to block email coming from

2006-11-11 Thread Robert Nicholson
In my case the rule is designed to catch UK recruiters who are always  
contacting me.


This isn't the only way I trap spam obviously.

Another thing I just realized is that this only looks for URI's in  
the email itself in order to determine if they reside in the UK.  
Something different from RBL type solutions.


On Nov 10, 2006, at 8:54 PM, Benny Pedersen wrote:



On Sat, November 11, 2006 02:31, Robert Nicholson wrote:


header URICOUNTRY_GB eval:check_uricountry('URICOUNTRY_GB')


what if a spammer sends mails from another ip outside GB ?

imho such rules only changes the problem, not solving it :(

--
This message was sent using 100% recycled spam mails.



Is there a release date for 3.1.8?

2006-11-11 Thread Robert Nicholson

When will the Shortcircuit feature be made available in a release?


Re: Creating a signature of an email

2006-11-11 Thread Dirk Bonengel
Sounds to me as if the iXhash mechanism might be what you need.
The iXhash plugin you find on the SA wiki works on the body of a mail, removes 
(redundant) parts of it and computes a hash value from the rest.  The results 
have been found to be quite a reliable indicator for spam mails. I feed two DNS 
zones with the input of several spamtraps (this is what the plugins queries 
against), but I see no reason why you shouldn't use a modified version that 
stores its hashes differently. You'd need a modified version of the plugin then 
as well, of course.

Alternatively you could use the relevant parts of the original procmail code to 
compute the hashes and check your incoming mails against that data. See 
http://www.ix.de for that. Knowing some German might help. 

The fine thing is that you can use the iXhash plugin along razor, pyzor and 
dcc. (I don't know if it's possible to use two pyzor servers from within 
spamassassin, I think if you set up your own server you automatically lose the 
capabilty to use the public one).
 
HTH 

Dirk

On Sat, 11 Nov 2006 03:58:00 -0500
Paul Aviles [EMAIL PROTECTED] wrote:

 Hi there, is there a way to create a signature or rule more or less
 automatically based on new spam you get? I used MessageLabs in the past and
 for those new messages you got they asked to forward the headers of the
 email to a particular account so that they could create a signature for
 those emails.  
  
 Anything similar?
  
 Regards,
  
 Paul Aviles
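
The idea Dirk describes (strip the parts of a body that vary between copies, then hash what is left) might be sketched roughly as below. This is not the actual iXhash algorithm: the normalisation steps, the choice of MD5, and the function name `body_fingerprint` are all assumptions made for illustration.

```python
import hashlib
import re


def body_fingerprint(body: str) -> str:
    """Hash a mail body after removing parts that vary between copies.

    Illustrative only: the real iXhash plugin uses its own normalisation rules.
    """
    text = body.lower()
    # Assumption: digits and whitespace are the "redundant" parts to drop,
    # since bulk mail often varies numbers and spacing between copies.
    text = re.sub(r"[\d\s]+", "", text)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two copies that differ only in case, spacing and numbers.
a = body_fingerprint("Buy   NOW!\nOffer 12345 ends soon")
b = body_fingerprint("buy now! offer 99 ends\tsoon")
print(a == b)  # True: both reduce to the same normalised text
```

Storing such fingerprints in a local table instead of a DNS zone would be one way to get the "modified version that stores its hashes differently" mentioned above.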


Re: sa-update rules for SA 3.1.7 have been updated but they fail lint

2006-11-11 Thread Debbie D

Theo Van Dinter [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]

On Fri, Nov 10, 2006 at 11:31:31PM -0500, Debbie D wrote:
 Is sa-update something built in or is it an plug-in??

It's a script that comes with 3.1.

 I ran sa-update && service spamassassin restart
 and was told spamassassin is an unknown service (dur I knew that)

Ok.  replace service spamassassin restart with the appropriate command 
for
your machine.

 BUT.. I see neither directory has updated files:
 /usr/share/spamassassin
 /etc/mail/spamassassin

Correct.

 Now I ran sa-update -D

:)

 and poking more I see it did bring down the latest cf files in
 /var/lib/spamassassin/3.001007/updates_spamassassin_org

Yep.

 I have verified manually that at least one rule set has changed since I 
 last
 upgraded on Oct 11th..
 7733 Nov 10 22:53 25_uribl.cf
 6738 Oct 11 22:35 /usr/share/spamassassin/25_uribl.cf

Yep.  80_additional.cf is a new file too.

 So now my next question is.. am I missing something here to have these
 downloaded rule sets in effect?? The FAQ say I should have to do nothing 
 but

Nope.

 but somehow I don't think that's right.. I never told SA to look for 
 rules
 in this new directory and even if I did then it would be reading the rule
 sets twice and causing a huge load issue..

SA knows to look there by itself (see perldoc spamassassin), and it's not
reading anything twice.  SA uses the local state dir
(/var/lib/spamassassin/...) instead of the default rules dir
(/usr/share/spamassassin).


OK thanks Theo..  what would be the best way for me to triple verify that 
it is indeed picking up these new rules?? I'll set this to cron today on a weekly 
basis I think.. is that frequent enough??

And I assume as these folders start creating themselves with the new updates 
SA knows enough to look at the latest set only???






Re: sa-update rules for SA 3.1.7 have been updated but they fail lint

2006-11-11 Thread Theo Van Dinter
On Sat, Nov 11, 2006 at 03:08:08PM -0500, Debbie D wrote:
 OK thanks Theo..  what would be the best way for me to triple verify that 
 it is indeed picking up these new rules?? I'll set this to cron today on a weekly 
 basis I think.. is that frequent enough??

spamassassin --lint -D will show what rule files are being used.  Weekly is
probably a good choice, daily is as frequent as I would suggest at the moment.

 And I assume as these folders start creating themselves with the new updates 
 SA knows enough to look at the latest set only???

There's only one directory per SA version per channel.  So yes. :)
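
A small sketch of how the update directory seen earlier in this thread (/var/lib/spamassassin/3.001007/updates_spamassassin_org) relates to the SA version and channel. The mapping is inferred from that one path; the function name `sa_update_dir` and its defaults are assumptions, not part of SpamAssassin itself.

```python
def sa_update_dir(version: str,
                  channel: str = "updates.spamassassin.org",
                  state_dir: str = "/var/lib/spamassassin") -> str:
    """Build the per-version, per-channel directory for downloaded rules.

    Assumption: version x.y.z becomes x.00y00z and dots in the channel
    name become underscores, as the path quoted in this thread suggests.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    version_dir = f"{major}.{minor:03d}{patch:03d}"
    channel_dir = channel.replace(".", "_")
    return f"{state_dir}/{version_dir}/{channel_dir}"


print(sa_update_dir("3.1.7"))
# /var/lib/spamassassin/3.001007/updates_spamassassin_org
```

Since each version and channel pair maps to exactly one directory, a new update replaces the old files rather than piling up next to them.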

-- 
Randomly Selected Tagline:
Hey, you're shaped like buddah, millions of people follow him!
  - The Drew Carey Show




Re: sa-update rules for SA 3.1.7 have been updated but they fail lint

2006-11-11 Thread Kenneth Porter
--On Saturday, November 11, 2006 3:20 PM -0500 Theo Van Dinter 
[EMAIL PROTECTED] wrote:



spamassassin --lint -D will show what rule files are being used.
Weekly is probably a good choice, daily is as frequent as I would suggest
at the moment.


It uses DNS to detect new updates, doesn't it? So one could use a frequency 
as high as the record TTL at very low cost.





RE: Running spamc via postfix not as user nobody

2006-11-11 Thread Michael Scheidell
 -Original Message-
 From: Michael Frotscher [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 6:19 AM
 To: users@spamassassin.apache.org
 Subject: Running spamc via postfix not as user nobody
 spamassassin unix -     n       n       -       -       pipe
   user=nobody argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi
   -f ${sender} ${recipient}
 

Try postfix mailing list?

What happens with this:
   user=${recipient} argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi -f
${sender}  ${recipient}



Re: rule secrecy *again* (Re: Well, that didn't take very bloody long)

2006-11-11 Thread Steve Lake

At 12:27 PM 11/11/2006 +, Justin Mason wrote:

ho hum... here we go again. :(

As I've noted several times recently -- these *are* being caught by rules
which were developed in the open -- namely RCVD_FORGED_WROTE, which has
been sitting in my sandbox for several weeks, was announced in a checkin
message (with diffs!), and is currently live in both trunk and 3.1.x
rule updates.


Yeah, I pushed my updates for SA and now it seems that those spams 
aren't getting through anymore.  heh.  I can't wait for this spam war to 
end so I can go back to my more laid back 3 month cycle of updates instead 
of 3-4x's a day.  :(



Steven Lake
Owner/Technical Writer
Raiden's Realm
www.raiden.net
A friendly web community




RE: Distributed Bayes DB?

2006-11-11 Thread Michael Scheidell

 -Original Message-
 From: Dhawal Doshy [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, November 11, 2006 5:54 AM
 To: users@spamassassin.apache.org
 Subject: Re: Distributed Bayes DB?
   that should be DRBD
 

Or even geom_gate and geom_mirror on *BSD
 


Re: Is there a release date for 3.1.8?

2006-11-11 Thread Stuart Johnston

Robert Nicholson wrote:

When will the Shortcircuit feature be made available in a release?


The Shortcircuit plugin should be available in 3.2.0.  Recent messages 
have suggested that this might be released before January.


sa-update

2006-11-11 Thread Kyle Quillen


Hey all,

I am trying to run spamassassin updates on a qmail toaster install
centos 4.4 but when I try it throws me this
error.

[EMAIL PROTECTED] ~]# sa-update -D
Can't locate LWP/UserAgent.pm in @INC (@INC contains:
/usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.5
/usr/lib/perl5/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/5.8.5
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5
/usr/lib/perl5/site_perl/5.8.4
/usr/lib/perl5/site_perl/5.8.3
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl/5.8.1
/usr/lib/perl5/site_perl/5.8.0
/usr/lib/perl5/site_perl
/usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
/usr/lib/perl5/vendor_perl/5.8.4
/usr/lib/perl5/vendor_perl/5.8.3
/usr/lib/perl5/vendor_perl/5.8.2
/usr/lib/perl5/vendor_perl/5.8.1
/usr/lib/perl5/vendor_perl/5.8.0
/usr/lib/perl5/vendor_perl)
at /usr/bin/sa-update line 92.
BEGIN failed--compilation aborted at /usr/bin/sa-update line 92.


Anyone Have any ideas?

Thanks
Q




RE: sa-update

2006-11-11 Thread Gary V

Hey all,

I am trying to run spamassassin updates on a qmail toaster install
centos 4.4 but when I try it throws me this
error.

[EMAIL PROTECTED] ~]# sa-update -D
Can't locate LWP/UserAgent.pm

[...]

BEGIN failed--compilation aborted at /usr/bin/sa-update line 92.


Anyone Have any ideas?

Thanks
Q



Install LWP?
http://search.cpan.org/~gaas/libwww-perl-5.805/

Gary V





RE: sa-update

2006-11-11 Thread Gary V


Install LWP?
http://search.cpan.org/~gaas/libwww-perl-5.805/

Gary V



on Centos I think it's perl-libwww-perl





scoring question

2006-11-11 Thread Miles Fidelman

Hi,

I got the following in a message from our list management software:

*X-Spam-Status: * Yes, hits=9.7 tagged_above=0.0 required=6.3 tests=AWL, 
BAYES_20, NO_RELAYS

*X-Spam-Level: * *
*X-Spam-Flag: * YES

Basic configuration:
Debian Sarge
Postfix
amavisd-new
spamassassin 3.001003
standard ruleset, plus updates from
- default channel
- saupdates.openprotect.com

The thing is, that if I'm reading things correctly, the scores for the 
listed tests are:

AWL 1 (default)
50_scores.cf:score BAYES_20 0.0001 0.0001 -0.740 -0.740
50_scores.cf:score NO_RELAYS -0.001

Which should add up to .259 (net tests and Bayes turned on).

So... why is this showing hits=9.7?  What am I missing?

Thanks very much,

Miles





Re: scoring question

2006-11-11 Thread Matt Kettler
Miles Fidelman wrote:
 Hi,

 I got the following in a message from our list management software:

 *X-Spam-Status: * Yes, hits=9.7 tagged_above=0.0 required=6.3
 tests=AWL, BAYES_20, NO_RELAYS
 *X-Spam-Level: * *
 *X-Spam-Flag: * YES

 Basic configuration:
 Debian Sarge
 Postfix
 amavisd-new
 spamassassin 3.001003
 standard ruleset, plus updates from
 - default channel
 - saupdates.openprotect.com

 The thing is, that if I'm reading things correctly, the scores for the
 listed tests are:
 AWL 1 (default)
Nope... the AWL has a variable score. It's the Automatic whitelist
which is really more of a History-tracking score averager than
anything else. It's only called AWL because its most common effect is to
push down scores when a normally low-scoring sender sends a message that
gets a high score. In this case, it went the other way. A sender that
was high-scoring in the past sent a low scoring message and got pushed up.
 50_scores.cf:score BAYES_20 0.0001 0.0001 -0.740 -0.740
 50_scores.cf:score NO_RELAYS -0.001

 Which should add up to .259 (net tests and Bayes turned on).

 So... why is this showing hits=9.7?  What am I missing?
See above, the variable score for the AWL would have been on the order
of +10.44 or so (9.7 minus the -0.741 from the other two rules).

Apparently the past average for this sender is somewhere around +20,
causing the AWL to add a lot to this message.

The AWL score is based on the current pre-awl score, and the past
average for that sender.

Basically the AWL always looks at the difference between the current
score, and the past average. It then adds half that difference in.

See  http://wiki.apache.org/spamassassin/AutoWhitelist
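
The arithmetic behind this can be worked through in a few lines. This is only a sketch of the "add half the difference" description above; the function name `awl_adjustment` and the `factor` parameter are illustrative names, and 0.5 is simply the "half" mentioned in the explanation.

```python
def awl_adjustment(pre_awl_score: float, past_mean: float, factor: float = 0.5) -> float:
    """Return what the AWL adds: a fraction of (past average - current score)."""
    return (past_mean - pre_awl_score) * factor


# Scores of the two non-AWL rules from the message headers.
pre_awl = -0.740 + -0.001            # BAYES_20 + NO_RELAYS
final_score = 9.7                    # "hits" reported in X-Spam-Status

# Work backwards: what did the AWL contribute, and what past average does that imply?
adjustment = final_score - pre_awl
implied_mean = pre_awl + adjustment / 0.5

print(round(adjustment, 3))    # 10.441
print(round(implied_mean, 3))  # 20.141
```

So a final score of 9.7 from two slightly negative rules is consistent with a sender whose past average sits at about +20.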








Re: Is there a release date for 3.1.8?

2006-11-11 Thread Matt Kettler
Robert Nicholson wrote:
 When will the Shortcircuit feature be made available in a release?

I doubt that will be in 3.1.8.. sounds more like something for the 3.2.0
release.

Of course I could be wrong, but usually features that make a dramatic
change in how SA handles things are not done in minor releases.



RE: Running spamc via postfix not as user nobody

2006-11-11 Thread Benny Pedersen
On Sat, November 11, 2006 22:49, Michael Scheidell wrote:

 What happens with this:
user=${recipient} argv=/usr/bin/spamc -e /usr/sbin/sendmail -oi -f
 ${sender}  ${recipient}


unix accounts with @ in ?

-- 
This message was sent using 100% recycled spam mails.





Re: Creating a signature of an email

2006-11-11 Thread Benny Pedersen

On Sat, November 11, 2006 20:47, Dirk Bonengel wrote:

 The fine thing is that you can use the iXhash plugin along razor, pyzor and
 dcc. (I don't know if it's possible to use two pyzor servers from within
 spamassassin, I think if you set up your own server you automatically lose the
 capabilty to use the public one).

with more than one ip in the pyzor servers list, all ips will be queried and
reported to, at least it seems so here on my pyzord

don't use pyzor discover, that will remove your own server

could be the same reason it's called servers not server. to my knowledge from
the pyzor mailing list there will be pyzord to pyzord digest exchange in a new
version when ready, this will hopefully improve pyzor a lot

-- 
This message was sent using 100% recycled spam mails.




Re: rule secrecy *again* (Re: Well, that didn't take very bloody long)

2006-11-11 Thread jdow

From: Justin Mason [EMAIL PROTECTED]


Loren Wilton writes:

 Ok, remember that Name Wrote: :) emails?  They've completely
 changed.  Now it's hi username instead.  Joy, oh joy.  Can anyone find
 any common elements in these emails because whoever this putz is, they're
 adapting a lot.  They hit us, we adapt, they immediately change tactics
 and come at us again.  Now with all the brilliant minds on this mailing
 list, we really should be able to find out who this putz is and nail all
 his stuff regardless of what tactic he switches to.

The reason they adapt is because there are detailed announcements on the
mailing list of the things that are easy to spot.  The guy sending these is
on the list too, so as soon as the oversight or excessive cleverness is
announced to the world, he knows what he has to fix.


ho hum... here we go again. :(

As I've noted several times recently -- these *are* being caught by rules
which were developed in the open -- namely RCVD_FORGED_WROTE, which has
been sitting in my sandbox for several weeks, was announced in a checkin
message (with diffs!), and is currently live in both trunk and 3.1.x
rule updates.

The rule has been visible since:

 r465179 | jm | 2006-10-18 10:11:15 +0100 (Wed, 18 Oct 2006) | 1 line

 add rule to catch 'Subject: foo wrote:' stock spam

Take a look at the graph of hit-rates over time in everyone's corpora:

http://ruleqa.spamassassin.org/last-night/RCVD_FORGED_WROTE?s_detail=on&s_g_over_time=1&s_zero=on&srcpath=#over_time_anchor

There's been no change in hitrates since 2006-10-18 -- in fact, in
cthielen and zmi's corpora, they rose *dramatically*.

Secrecy is *NOT* an essential element of rule development.  It seems
logical to think it is, but evidence repeatedly demonstrates otherwise.


Indeed - if you have a rule that depends on secrecy then it is too
fragile to have a long life. Good rules have long usable lifetimes.

{^_^} 



Re: spam that only hits the BAYES_99 rule

2006-11-11 Thread jdow

From: Tom H [EMAIL PROTECTED]


Hi,

I was getting hit by a great deal of spam that only hits the BAYES_99
rule, and maybe gets less than a point or so from elsewhere.
But now I'm getting ones through that are basically only hitting the
BAYES_99 and nothing else;

X-Spam-Score: 3.5 (***) BAYES_99

I tried to send the mail to this list to demonstrate the content but got 
bounced with 12.9 spam score.


I'm running sa-update weekly, and rules_de_jour daily with a big set of 
rules, and I'm still not hitting loads of obvious spam. Particularly 
those with the title Re: + good and then a number appended to the end.


The only thing I can think of at the moment is to reduce my 
required_hits to 3.5 or increase the score for BAYES_99 to 5, but I 
would prefer not to do the latter as I like a default and automatically 
updated installation.


I would be grateful for any ideas on this...


Tom, my answer is a cheat. Simply raise Bayes 99 score until you start
seeing false positives from it. Then reduce the score a little. It appears
that either Bayes 99 is pessimistic of its likelihood of being spam or
else one of my few negative scores has saved me from the expected potload
of mismarked ham. I run at 5.0001. (The .0001 is just to be obnoxious
about it.)

{^_^}
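
A tiny sketch of why raising the BAYES_99 score changes the outcome for a message that hits nothing else. The 3.5 and 5.0001 values come from this thread; the threshold of 5.0 is assumed to be the stock setting, and `total_score` is just an illustrative helper.

```python
def total_score(hits, scores):
    """Sum the scores of every rule that fired on a message."""
    return sum(scores[name] for name in hits)


required = 5.0                      # assumed stock threshold
hits = ["BAYES_99"]                 # the only rule the spam triggers

stock_scores = {"BAYES_99": 3.5}
raised_scores = {"BAYES_99": 5.0001}

print(total_score(hits, stock_scores) >= required)   # False: slips through
print(total_score(hits, raised_scores) >= required)  # True: tagged as spam
```

The trade-off is that Bayes alone can then condemn a message, which is why the advice is to back the score off again once false positives appear.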


Re: When Bayes goes bad... How to fix?

2006-11-11 Thread Bob Proulx
I am still trying to figure out why Bayes is giving so many false
positives.

0.000  0  3  0  non-token data: bayes db version
0.000  0 101467  0  non-token data: nspam
0.000  0  39694  0  non-token data: nham
0.000  0 181047  0  non-token data: ntokens
0.000  0 1163102355  0  non-token data: oldest atime
0.000  0 1163306671  0  non-token data: newest atime
0.000  0 1163306671  0  non-token data: last journal sync atime
0.000  0 1163275571  0  non-token data: last expiry atime
0.000  0 172800  0  non-token data: last expire atime delta
0.000  0  30379  0  non-token data: last expire reduction count

If I read that right then all of the tokens are from the 9th to the
11th.  Is that right?  In that case my suggestion to reduce the time
is not going to help.  But then why has Bayes locked on to so many
bad tokens?  I wish there were some way to debug this.

Bob
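
One way to check that reading of the dump is to convert the atime values, which are Unix timestamps, into dates. A minimal sketch; the helper name `atime_to_utc` is made up here, and the output is in UTC, so local dates may differ by a day.

```python
from datetime import datetime, timezone


def atime_to_utc(atime: int) -> str:
    """Format a Unix timestamp from the Bayes dump as a UTC date and time."""
    return datetime.fromtimestamp(atime, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


oldest = atime_to_utc(1163102355)   # "oldest atime" from the dump
newest = atime_to_utc(1163306671)   # "newest atime" from the dump
span_days = (1163306671 - 1163102355) / 86400

print(oldest)               # 2006-11-09 19:59:15
print(newest)               # 2006-11-12 04:44:31
print(round(span_days, 2))  # 2.36
```

The token window is therefore a little over two days, in line with the "last expire atime delta" of 172800 seconds (exactly two days).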


question re. whitelist_from_rcvd

2006-11-11 Thread Miles Fidelman

Hi,

I'm trying to figure out how to whitelist control messages generated by 
our list manager (Sympa) - which are generated on the localhost and sent 
to addresses on the localhost.


In particular, here's a specific example:

*From: * [EMAIL PROTECTED]
*Subject: * SPAM*** Message diffusion*
*Date: * November 11, 2006 10:22:05 AM EST
*To: * [EMAIL PROTECTED]
*Return-Path: * [EMAIL PROTECTED]

*X-Original-To: * [EMAIL PROTECTED]
*Delivered-To: * [EMAIL PROTECTED]
*Received: * from localhost (localhost.localdomain [127.0.0.1]) by 
server1.neighborhoods.net (Postfix) with ESMTP id 5CDE2B6C2F0 for 
[EMAIL PROTECTED]; Sat, 11 Nov 2006 10:22:18 -0500 (EST)
*Received: * from server1.neighborhoods.net ([127.0.0.1]) by localhost 
(server1 [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 31180-01-2 
for [EMAIL PROTECTED]; Sat, 11 Nov 2006 10:22:12 -0500 (EST)
*Received: * by server1.neighborhoods.net (Postfix, from userid 114) id 
1A9BFB6C2F6; Sat, 11 Nov 2006 10:22:05 -0500 (EST)

*Mime-Version: * 1.0
*Content-Type: * text/plain; charset=utf-8;
*Content-Transfer-Encoding: * 8bit
*Message-Id: * [EMAIL PROTECTED]
*X-Virus-Scanned: * by amavisd-new-20030616-p10 (Debian) at 
neighborhoods.net
*X-Spam-Status: * Yes, hits=9.7 tagged_above=0.0 required=6.3 tests=AWL, 
BAYES_20, NO_RELAYS

*X-Spam-Level: * *
*X-Spam-Flag: * YES
*Status:** *

It's pretty clear that the entry in user_prefs would start with

whitelist_from_rcvd [EMAIL PROTECTED]

but what would I use as the domain part? 


Thanks very much,

Miles