sa ignoring whitelist_from in user_prefs

2007-02-10 Thread Rich Winkel
For a particular user, I'm finding no correlation between his whitelist_from's
in user_prefs and the whitelist status as reported in incoming messages.
I see messages with no USER_IN_WHITELIST when both the From and From: addresses
match a whitelist_from line in the user_prefs file.  I also see messages
with USER_IN_WHITELIST but that userid is NOT listed in the user_prefs.

What could cause this??

Thanks,
Rich


whitelist problems

2007-02-10 Thread urgrue

I'm having a whitelist-related problem.
-a lot of spam comes through with WHITELISTED in the headers, yet i can 
never find the senders, IPs, etc of said messages in any whitelists, 
including the auto-whitelist.

-auto-whitelist is in use although I've disabled it everywhere.

I'm running spamassassin 3.1.7 through amavis on redhat.
Besides /etc/amavisd.conf, /etc/mail/spamassassin/*, amavis user 
home/.spamassassin/*, where else could there possibly be another 
whitelist? I've searched through all the conf files for alternate places 
but couldn't find any.
Why is auto-whitelist in use? I've explicitly disabled it in 
/etc/amavisd.conf, /etc/mail/spamassassin/v310.pre, 
/etc/mail/spamassassin/local.cf. amavis user 
home/.spamassassin/user_prefs is empty. But still the auto-whitelist 
file continues to grow and mails occasionally get AWL in their headers.





Re: IADB, 70_iadb.cf and multiple A records returned

2007-02-10 Thread Raul Dias
On Sat, 2007-02-10 at 00:00 -0500, Theo Van Dinter wrote:
  If the last one is true, is the ^ $ really necessary? 
 [...]
  If it really is a RE, what prevents '127.0.0.1' from matching
  127.0.0.10? Or 127.1.0.1 from matching 127.120.1.1?
 
 You answered your own question. :)

OK, this answers the first one.
This also implies that the sub-test value is always an RE and needs to
be properly delimited.

So, in the following cases:
header RCVD_IN_SBL       eval:check_rbl_sub('sblxbl', '127.0.0.2')
header RCVD_IN_MAPS_RSS  eval:check_rbl_sub('rblplus', '4')
(the last one is really commented out)

If Spamhaus decides to expand the return codes and 127.0.0.20 becomes valid
for something like "this IP has an opt-in list", this rule would be
broken, right? (Sure, we don't expect this change to happen.)


-Raul Dias
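
The matching behaviour discussed in this exchange can be illustrated with a
short sketch. It uses Python's re module rather than SpamAssassin's own Perl
matching, so it shows general regex behaviour only, not SA internals. The
function name sub_test_matches is made up for the illustration.

```python
import re

def sub_test_matches(pattern, returned_ip):
    """Return True if the pattern matches anywhere in the returned address."""
    return re.search(pattern, returned_ip) is not None

# Unanchored and unescaped: found inside longer addresses, and each '.'
# matches any single character.
loose = '127.0.0.2'
# Anchored with ^ and $ and with the dots escaped: matches one address only.
strict = r'^127\.0\.0\.2$'

for ip in ('127.0.0.2', '127.0.0.20', '127.0.0.21'):
    print(ip, sub_test_matches(loose, ip), sub_test_matches(strict, ip))

# The second case from the question: '127.1.0.1' used as an unanchored
# pattern also hits 127.120.1.1, because the second '.' matches the '2'.
print(sub_test_matches('127.1.0.1', '127.120.1.1'))
```

Under these assumptions the anchored, escaped form is the only one that keeps
matching a single return code if new codes such as 127.0.0.20 appear.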



pyzor error

2007-02-10 Thread Webmaster
pyzor stopped working on my fedora core 5 system. 
I get the following error:


Traceback (most recent call last):
 File "/usr/bin/pyzor", line 3, in ?
   import pyzor.client
ImportError: No module named pyzor.client

The contents of /usr/bin/pyzor are:

#!/usr/bin/python

import pyzor.client
pyzor.client.run()

What's going on and how do I fix this?
Thank you in advance!


Re: pyzor error

2007-02-10 Thread Ed Kasky

At 06:04 AM Saturday, 2/10/2007, you wrote:

pyzor stopped working on my fedora core 5 system. I get the following error:

Traceback (most recent call last):
 File /usr/bin/pyzor, line 3, in ?
   import pyzor.client
ImportError: No module named pyzor.client

The contents of /usr/bin/pyzor are:

#!/usr/bin/python

import pyzor.client
pyzor.client.run()

What's going on and how do I fix this?
Thank you in advance!


Did you check the wiki first? It's a wealth of information:

If you get the following error message, define PYTHONPATH to point at 
($HOME/lib/python):


Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named pyzor.client


Ed

. . . . . . . . . . . . . . . . . .
Randomly Generated Quote (1017 of 1172):
Those who put out the people's eyes, reproach them
for their blindness.  -John Milton, poet (1608-1674)



Re: pyzor error

2007-02-10 Thread Webmaster
Yes. That error message is slightly different, but in any case I do not 
understand what 'define PYTHONPATH to point at ($HOME/lib/python)' means 
(what/where/how).
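
For background on the question above: PYTHONPATH is an environment variable
holding extra directories that Python adds to its module search path
(sys.path) at startup. The following is a minimal diagnostic sketch in
current Python (not the Python 2 of the original system). It assumes nothing
about where pyzor was actually installed, and module_available is a
hypothetical helper name.

```python
import importlib.util
import os
import sys

def module_available(name):
    """Return True if Python can locate the named module on its search path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. 'pyzor' for 'pyzor.client')
        # cannot be found at all.
        return False

if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "(not set)"))
    print("pyzor.client importable:", module_available("pyzor.client"))
    for directory in sys.path:
        print("  searched:", directory)
```

If the module is reported as missing, the wiki's advice amounts to setting
the variable in the environment that runs pyzor (for a Bourne-style shell,
export PYTHONPATH=$HOME/lib/python) so that the directory containing the
pyzor package appears in the searched list.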


- Original Message - 
From: Ed Kasky [EMAIL PROTECTED]

To: Webmaster [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Saturday, February 10, 2007 9:19 AM
Subject: Re: pyzor error



At 06:04 AM Saturday, 2/10/2007, you wrote -=
pyzor stopped working on my fedora core 5 system. I get the following 
error:


Traceback (most recent call last):
 File /usr/bin/pyzor, line 3, in ?
   import pyzor.client
ImportError: No module named pyzor.client

The contents of /usr/bin/pyzor are:

#!/usr/bin/python

import pyzor.client
pyzor.client.run()

What's going on and how do I fix this?
Thank you in advance!


Did you check the wiki first? It's a wealth of information:

If you get the following error message, define PYTHONPATH to point at 
($HOME/lib/python):


Traceback (most recent call last):
  File stdin, line 1, in ?
ImportError: No module named pyzor.client


Ed

. . . . . . . . . . . . . . . . . .
Randomly Generated Quote (1017 of 1172):
Those who put out the people's eyes, reproach them
for their blindness.  -John Milton, poet (1608-1674)









Re: whitelist problems

2007-02-10 Thread Matt Kettler
urgrue wrote:
 I'm having a whitelist-related problem.
 -a lot of spam comes through with WHITELISTED in the headers, yet i
 can never find the senders, IPs, etc of said messages in any
 whitelists, including the auto-whitelist.
 -auto-whitelist is in use although I've disabled it everywhere.
The auto-whitelist has nothing to do with anything that says WHITELISTED.

The  auto-whitelist will show up as a rule named AWL. Nothing else.

That said, can you be VERY specific about what your headers say?

Does it say USER_IN_WHITELIST?

If so, check your whitelist_from and whitelist_from_rcvd entries.

In particular, make sure you didn't do anything like the common mistake
of whitelist_from [EMAIL PROTECTED]. Any spammer can trivially forge a
From: or Return-Path header, and forging your own domain in these fields
is a common tactic because spammers know many people make this mistake.
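
A rough model of why this matters, written in Python as an illustration only.
It is not SpamAssassin's implementation; the addresses, the relay host name
and both function names are invented, and the relay check is a simplified
reading of what whitelist_from_rcvd adds on top of whitelist_from.

```python
from fnmatch import fnmatchcase

def whitelisted_by_from(sender, patterns):
    """Model of whitelist_from: only the claimed sender address is checked."""
    sender = sender.lower()
    return any(fnmatchcase(sender, p.lower()) for p in patterns)

def whitelisted_by_from_rcvd(sender, relay_rdns, entries):
    """Model of whitelist_from_rcvd: the address must match AND the relay's
    reverse-DNS name must be the given domain or a host inside it."""
    sender = sender.lower()
    relay = relay_rdns.lower()
    for pattern, domain in entries:
        domain = domain.lower()
        if fnmatchcase(sender, pattern.lower()) and (
            relay == domain or relay.endswith("." + domain)
        ):
            return True
    return False

# A spammer forging the local domain while relaying through an unrelated host.
forged_sender = "boss@example.com"
spammer_relay = "dsl-1-2-3-4.isp.example.net"

print(whitelisted_by_from(forged_sender, ["*@example.com"]))
print(whitelisted_by_from_rcvd(
    forged_sender, spammer_relay, [("*@example.com", "example.com")]))
```

In this model the forged sender passes the address-only check but fails once
the relay also has to belong to the whitelisted domain.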


 I'm running spamassassin 3.1.7 through amavis on redhat.
 Besides /etc/amavisd.conf, /etc/mail/spamassassin/*, amavis user
 home/.spamassassin/*, where else could there possibly be another
 whitelist? I've searched through all the conf files for alternate
 places but couldn't find any.
 Why is auto-whitelist in use? I've explicitly disabled it in
 /etc/amavisd.conf, /etc/mail/spamassassin/v310.pre,
 /etc/mail/spamassassin/local.cf. amavis user
 home/.spamassassin/user_prefs is empty. But still the auto-whitelist
 file continues to grow and mails occasionally get AWL in their headers.
Well, disabling the loadplugin should do it. That said, did you restart
amavis after doing so? (these files only get parsed when a SA instance
starts up, and amavis keeps its own perl-API based copy of SA in order
to avoid the waste of calling out to external commands like spamc or
spamassassin)







Startting spamassassin

2007-02-10 Thread Mário Gamito

Hi,

I've just installed spamassassin.

It's been a long time since I installed my last mail server, and I 
never used version 3.


OK, I've compiled it and copied spamd to /etc/init.d.

If I just run ./spamd start, it runs as root and ties up the terminal.

So I'm running ./spamd -u qscand start & .

Is there any place where I can configure qscand to be the user 
that spamassassin runs as by default?


Any help would be appreciated.

Warm Regards,
Mário Gamito


Re: Startting spamassassin

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 16:44:16 +, Mário Gamito [EMAIL PROTECTED]
wrote:

Hi,

I've just installed spamassassin.

I'ts been a long time since i've installed the last mail server and i 
never used version 3.

Ok, i've compiled it and copied spamd to /etc/init.d

If i just run ./spamd start, it will run as root and stucks the terminal.

So, i'm running ./spamd -u qscand start .

Is there any place where i can configure the user qscan to be the user 
that spamassassin runs by default ?

Any help would be appreciated.

Warm Regards,
Mário Gamito

From past experience it's usually easier to lob SA in from rpm/yum. I
run it here on 3 servers and (knock on wood), this approach has yet to
cause a problem.

It's worth noting that one of the mail programs (whose name escapes
me) installs SA; I pull that off as part of my setup since I don't use
nix as a workstation so it has no reason to run a mail client.

After install it's just a matter of running setup from the command line and
enabling spamassassin (if it hasn't already been enabled).

This will of course depend very much on exactly what flavour of nix
you are running, your mailserver and various other things. I use
CentOS and have been very pleased with it.

Let me know if you need a step by step guide; I have one kicking about
here somewhere from the 'old days' of FC3.

Hope that helps.

Nigel


Re: Startting spamassassin

2007-02-10 Thread Mário Gamito

Hi,

I already have spamassassin 100% installed on a Linux server.
I just want to know how to run it as user qscand without having to type 
./spamd -u qscand start &, so I can start it at boot time.


Regards,
Mário Gamito

Nigel Frankcom wrote:

On Sat, 10 Feb 2007 16:44:16 +, Mário Gamito [EMAIL PROTECTED]
wrote:


Hi,

I've just installed spamassassin.

I'ts been a long time since i've installed the last mail server and i 
never used version 3.


Ok, i've compiled it and copied spamd to /etc/init.d

If i just run ./spamd start, it will run as root and stucks the terminal.

So, i'm running ./spamd -u qscand start .

Is there any place where i can configure the user qscan to be the user 
that spamassassin runs by default ?


Any help would be appreciated.

Warm Regards,
Mário Gamito


From past experience it's usually easier to lob SA in from rpm/yum. I
run it here on 3 servers and (knock on wood), this approach has yet to
cause a problem.

It's worth noting that one of the mail programs (who's name escapes
me) installs SA; I pull that off as part of my setup since I don't use
nix as a workstation so it has no reason to run a mail client.

After install it's just a matter of running setup from the cl and
enabling spamassassin (if it hasn't already been enabled).

This will of course depend very much on exactly what flavour of nix
you are running, your mailserver and various other things. I use
CentOS and have been very pleased with it.

Let me know if you need a step by step guide; I have one kicking about
here somewhere from the 'old days' of FC3.

Hope that helps.

Nigel





Re: Startting spamassassin

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 17:12:24 +, Mário Gamito [EMAIL PROTECTED]
wrote:

Hi,

I have spamassassin already 100% installed in a Linux server.
I just want to know how to run it as user qscand without having to type 
./spamd -u qscand start , so i can start it at boot time.

Regards,
Mário Gamito

Nigel Frankcom wrote:
 On Sat, 10 Feb 2007 16:44:16 +, Mário Gamito [EMAIL PROTECTED]
 wrote:
 
 Hi,

 I've just installed spamassassin.

 I'ts been a long time since i've installed the last mail server and i 
 never used version 3.

 Ok, i've compiled it and copied spamd to /etc/init.d

 If i just run ./spamd start, it will run as root and stucks the terminal.

 So, i'm running ./spamd -u qscand start .

 Is there any place where i can configure the user qscan to be the user 
 that spamassassin runs by default ?

 Any help would be appreciated.

 Warm Regards,
 Mário Gamito
 
 From past experience it's usually easier to lob SA in from rpm/yum. I
 run it here on 3 servers and (knock on wood), this approach has yet to
 cause a problem.
 
 It's worth noting that one of the mail programs (who's name escapes
 me) installs SA; I pull that off as part of my setup since I don't use
 nix as a workstation so it has no reason to run a mail client.
 
 After install it's just a matter of running setup from the cl and
 enabling spamassassin (if it hasn't already been enabled).
 
 This will of course depend very much on exactly what flavour of nix
 you are running, your mailserver and various other things. I use
 CentOS and have been very pleased with it.
 
 Let me know if you need a step by step guide; I have one kicking about
 here somewhere from the 'old days' of FC3.
 
 Hope that helps.
 
 Nigel
 

I'm assuming you are running this under qmail? If I recall correctly
www.qmailrocks.org has a decent section on getting SA working with
qmail; though it's been a while since I tried.

Hope that helps

Kind regards

Nigel


Re: whitelist problems

2007-02-10 Thread John D. Hardin
On Sat, 10 Feb 2007, Matt Kettler wrote:

 In particular, make sure you didn't do anything like the common
 mistake of whitelist_from [EMAIL PROTECTED]. Any spammer can
 trivially forge a From: or Return-Path header, and forging your
 own domain in these fields is a common tactic because spammers
 know many people make this mistake.

Perhaps --lint should warn about whitelist_from being used at all...

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  End users want eye candy and the ooo's and hhh's experience
  when reading mail. To them email isn't a tool, but an entertainment
  form. -- Steve Lake
---
 2 days until Abraham Lincoln's and Charles Darwin's 198th Birthdays



Re: Startting spamassassin

2007-02-10 Thread Matt Kettler
Mário Gamito wrote:
 Hi,

 I've just installed spamassassin.

 I'ts been a long time since i've installed the last mail server and i
 never used version 3.

 Ok, i've compiled it and copied spamd to /etc/init.d
Don't do that. spamd isn't an init script. It's the daemon executable
itself (a Perl program). It belongs in /usr/sbin or similar.

I would *STRONGLY* suggest using make install to put spamd where it belongs.

If you need an init script, look in the spamd directory. There are
several init scripts in there you can work from. (they all end in
rc-script.sh)



 If i just run ./spamd start, it will run as root and stucks the
 terminal.
Well, don't pass the word start to spamd for starters. That's
something you pass to an init script.


 So, i'm running ./spamd -u qscand start & .
You got the -u part right... But don't use &, use the -d option instead.
However, all of this really should be rolled up in one of the provided
init scripts.


 Is there any place where i can configure the user qscan to be the user
 that spamassassin runs by default ?
Yes, the -u parameter to spamd. But you first need to get your system
sorted out into something resembling sanity.



A New Approach: Find the Ham

2007-02-10 Thread Dan
I've developed a new approach to scoring that I want to 1) share with  
everyone and 2) make into a working system that's as accurate as what  
I've already built, but easier to use.  First, the theory:




SITUATION
In the beginning, all email was ham.  When spam came along, we left  
the ham alone and targeted the annoyance (spam).


ASSUMPTION
All messages are ham unless x,y,z score says they're spam.

APPROACH
Block nothing, then create rules to catch what you don't want.  ie,  
build tests that target the spam, then score the millions of ways  
spam can occur.


RESULT
Huge time spent tuning and retuning weights, catching everything in  
sight (including much ham).




NEW SITUATION
Ham is now the tiniest minority of all email.

NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.

NEW APPROACH
Block everything, then create rules to not catch what you do want.   
ie, build tests that target the spam (keeping all the tests you've  
already built), then score the thousands of ways ham triggers on  
those tests.


NEW RESULT
Spend less time and energy while catching more of what you do want  
and less of what you don't.




CHALLENGE
All filtering software is written to score for results that equal  
spam - catch the bad


SOLUTION
Make filtering software score for results that equal ham - uncatch  
the good.



Your thoughts?

Dan


BTW, is there a better forum for this level of question?
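
One possible reading of the two approaches described above, sketched in
Python to make the contrast concrete. This is an interpretation and not the
poster's actual system: the rule names, the scores and the ham threshold are
all invented, and only the 5.0 spam threshold echoes a common SpamAssassin
setting.

```python
def classify_default_ham(rule_scores, hits, spam_threshold=5.0):
    """Conventional model: start from zero, add the score of every rule hit,
    and call the message spam only if the total reaches the threshold."""
    total = sum(rule_scores.get(name, 0.0) for name in hits)
    return "spam" if total >= spam_threshold else "ham"

def classify_default_spam(ham_scores, hits, ham_threshold=5.0):
    """Inverted model: the message is spam unless the evidence that it is
    ham reaches the threshold."""
    total = sum(ham_scores.get(name, 0.0) for name in hits)
    return "ham" if total >= ham_threshold else "spam"

# Hypothetical scores for illustration only.
rule_scores = {"HTML_ONLY": 1.5, "FORGED_RCVD": 3.0, "KNOWN_CORRESPONDENT": -4.0}
ham_scores = {"KNOWN_CORRESPONDENT": 4.0, "REPLY_TO_OWN_MESSAGE": 3.0}

no_hits = []
friendly = ["KNOWN_CORRESPONDENT", "REPLY_TO_OWN_MESSAGE"]

print(classify_default_ham(rule_scores, no_hits))
print(classify_default_spam(ham_scores, no_hits))
print(classify_default_spam(ham_scores, friendly))
```

The message that hits no rules is the interesting case: the conventional
model delivers it, while the inverted model blocks it until some ham evidence
is found.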



RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Dan [mailto:[EMAIL PROTECTED]
 
 I've developed a new approach to scoring that I want to 1) share with  
 everyone and 2) make into a working system thats as accurate as what  
 I've already built, but easier to use.  First, the theory:
 
 
 
 SITUATION
 In the beginning, all email was ham.  When spam came along, we left  
 the ham alone and targeted the annoyance (spam).
 
 ASSUMPTION
 All messages are ham unless x,y,z score says they're spam.
 
 APPROACH
 Block nothing, then create rules to catch what you don't want.  ie,  
 build tests that target the spam, then score the millions of ways  
 spam can occur.
 
 RESULT
 Huge time spent tuning and retuning weights, catching everything in  
 sight (including much ham).
 
 
 
 NEW SITUATION
 Ham is now the tiniest minority of all email.
 
 NEW ASSUMPTION
 All messages are spam unless x,y,z score says they're ham.
 
 NEW APPROACH
 Block everything, then create rules to not catch what you do want.   
 ie, build tests that target the spam (keeping all the tests you've  
 already built), then score the thousands of ways ham triggers on  
 those tests.
 
 NEW RESULT
 Spend less time and energy while catching more of what you do want  
 and less of what you don't.
 
 
 
 CHALLENGE
 All filtering software is written to score for results that equal  
 spam - catch the bad
 
 SOLUTION
 Make filtering software score for results that equal ham - uncatch  
 the good.
 
 
 Your thoughts?

How can this method take less time and energy? Aren't you going to build a 
mirror image of the current method? Wouldn't your rules be like 
the current ones, but negated?

Giampaolo

 
 Dan
 
 
 BTW, is there a better forum for this level of question?
 



Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 20:52:17 +0100, Giampaolo Tomassoni
[EMAIL PROTECTED] wrote:

From: Dan [mailto:[EMAIL PROTECTED]
 
 I've developed a new approach to scoring that I want to 1) share with  
 everyone and 2) make into a working system thats as accurate as what  
 I've already built, but easier to use.  First, the theory:
 
 
 
 SITUATION
 In the beginning, all email was ham.  When spam came along, we left  
 the ham alone and targeted the annoyance (spam).
 
 ASSUMPTION
 All messages are ham unless x,y,z score says they're spam.
 
 APPROACH
 Block nothing, then create rules to catch what you don't want.  ie,  
 build tests that target the spam, then score the millions of ways  
 spam can occur.
 
 RESULT
 Huge time spent tuning and retuning weights, catching everything in  
 sight (including much ham).
 
 
 
 NEW SITUATION
 Ham is now the tiniest minority of all email.
 
 NEW ASSUMPTION
 All messages are spam unless x,y,z score says they're ham.
 
 NEW APPROACH
 Block everything, then create rules to not catch what you do want.   
 ie, build tests that target the spam (keeping all the tests you've  
 already built), then score the thousands of ways ham triggers on  
 those tests.
 
 NEW RESULT
 Spend less time and energy while catching more of what you do want  
 and less of what you don't.
 
 
 
 CHALLENGE
 All filtering software is written to score for results that equal  
 spam - catch the bad
 
 SOLUTION
 Make filtering software score for results that equal ham - uncatch  
 the good.
 
 
 Your thoughts?

How can this method spend less time and energy? Aren't you going to build a 
mirrored method with respect to the actual one? Your rules wouldn't be like 
the actual ones, but negated?

Giampaolo

 
 Dan
 
 
 BTW, is there a better forum for this level of question?
 

Dan has a good point; on the surface at least. spam now accounts for
80%+ of all mail, so why are we concentrating on that?

At least the point is worth debate (IMHO).

Can it be done? Even I can see that it can, given the right impetus.
Though perhaps too many companies are making a good $/£/Y off
anti-spam systems based on, around or directly using SA.

Be interesting to see where this thread goes.

Kind regards

Nigel


Re: A New Approach: Find the Ham

2007-02-10 Thread Tom Allison



CHALLENGE
All filtering software is written to score for results that equal  
spam - catch the bad


SOLUTION
Make filtering software score for results that equal ham - uncatch  
the good.



Your thoughts?


How can this method spend less time and energy? Aren't you going to build a 
mirrored method with respect to the actual one? Your rules wouldn't be like the actual 
ones, but negated?

Giampaolo


Dan


BTW, is there a better forum for this level of question?






This would be easier to filter.
It would also be more amenable to a statistical approach than a regex approach.

Personally, I think HTML email should be outright discarded from the start.
If you look at the argument presented by the OP, it reinforces the idea
that most ASCII is ham and most HTML is spam.  Therefore, reject delivery of
all HTML-based email.  Or to be more specific: reject any MIME type of
multipart/alternative content or HTML-only content.  That would remove
probably 90% of the spam in one shot.
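
To show what such a check would actually look at, here is a small sketch
using Python's standard email package. It only inspects the top-level
Content-Type of a message; the function name and the sample messages are made
up, and it illustrates the test itself rather than endorsing the policy.

```python
from email import message_from_string

def is_html_or_alternative(raw_message):
    """Return True if the top-level MIME type is text/html or
    multipart/alternative, the two cases the proposal would reject."""
    msg = message_from_string(raw_message)
    return msg.get_content_type() in ("text/html", "multipart/alternative")

plain = "From: a@example.com\nSubject: hi\nContent-Type: text/plain\n\nhello\n"
html = "From: a@example.com\nSubject: hi\nContent-Type: text/html\n\n<p>hello</p>\n"
alternative = (
    "From: a@example.com\nSubject: hi\n"
    'Content-Type: multipart/alternative; boundary="b"\n\n'
    "--b\nContent-Type: text/plain\n\nhello\n"
    "--b\nContent-Type: text/html\n\n<p>hello</p>\n"
    "--b--\n"
)

for name, raw in (("plain", plain), ("html", html), ("alternative", alternative)):
    print(name, is_html_or_alternative(raw))
```

Note that the multipart/alternative sample carries a perfectly readable
plain-text part and would still be rejected by this test.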


Re: A New Approach: Find the Ham

2007-02-10 Thread Miles Fidelman

Dan wrote:
I've developed a new approach to scoring that I want to 1) share with 
everyone and 2) make into a working system thats as accurate as what 
I've already built, but easier to use.  First, the theory:


NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.

NEW APPROACH
Block everything, then create rules to not catch what you do want.  
ie, build tests that target the spam (keeping all the tests you've 
already built), then score the thousands of ways ham triggers on those 
tests.
It strikes me that the hardest part of this approach is filtering out 
too much ham.  At least for me, it's more important to make sure that 
people reach me, than to filter out all spam.  If we take the approach 
that everything is to be filtered out, except x,y,z - then the risk of 
filtering out too much seems pretty high.


RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Tom Allison [mailto:[EMAIL PROTECTED]
 
  CHALLENGE
  All filtering software is written to score for results that equal  
  spam - catch the bad
 
  SOLUTION
  Make filtering software score for results that equal ham - uncatch  
  the good.
 
 
  Your thoughts?
  
  How can this method spend less time and energy? Aren't you 
 going to build a mirrored method with respect to the actual 
 one? Your rules wouldn't be like the actual ones, but negated?
  
  Giampaolo
  
  Dan
 
 
  BTW, is there a better forum for this level of question?
 
  
  
 
 This would be easier to filter.
 It would also be more adaptive to a statistical approach than a 
 regex approach.
 
 Personally, I think HTML email should be outright discarded from 
 the start.
 If you look at this arguement presented by the OP then it 
 reinforces the idea 
 that most ascii is ham and most html is spam.  Therefore, reject 
 delivery of all 
 html based email.  Or to be more succinct -- reject any MIME type 
 of alternative 
 content or html only content.  That would remove probably 90% of 
 the spam in one 
 shot.

Sending plain-text e-mails may well suit your habits and those of your 
contacts, but rejecting HTML would result in trashing a lot of ham on larger userbases.

Giampaolo



Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue
One consideration is that spam getting through is never more than an 
annoyance. Ham getting caught can be a big problem. So any kind of deny 
by default system has to deal with how to respond to people sending you 
mail that gets trapped and provide a way for the sender to get 
approval.  How does one join the global whitelist and how does one 
prevent spammers from joining it?


I don't think spam will ever go away until sending email costs money, via 
some kind of global digital stamp system. Which, frankly, I would 
welcome with open arms, but it will probably never happen.



Dan has a good point; on the surface at least. spam now accounts for
80%+ of all mail, so why are we concentrating on that?

At least the point is worth debate (IMHO).

Can it be done? Even I can see that it can, given the right impetus.
Though perhaps too many companies are making a good $/£/Y off
anti-spam systems based on, around or directly using SA.

Be interesting to see where this thread goes.

Kind regards

Nigel
  




Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue




This would be easier to filter.
It would also be more adaptive to a statistical approach than a regex 
approach.


Personally, I think HTML email should be outright discarded from the 
start.
If you look at this arguement presented by the OP then it reinforces 
the idea that most ascii is ham and most html is spam.  Therefore, 
reject delivery of all html based email.  Or to be more succinct -- 
reject any MIME type of alternative content or html only content.  
That would remove probably 90% of the spam in one shot.


Yeah, for about a week. Obviously they wont keep sending HTML mail if 
everyone is blocking it, right?


Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman
[EMAIL PROTECTED] wrote:

Dan wrote:
 I've developed a new approach to scoring that I want to 1) share with 
 everyone and 2) make into a working system thats as accurate as what 
 I've already built, but easier to use.  First, the theory:

 NEW ASSUMPTION
 All messages are spam unless x,y,z score says they're ham.

 NEW APPROACH
 Block everything, then create rules to not catch what you do want.  
 ie, build tests that target the spam (keeping all the tests you've 
 already built), then score the thousands of ways ham triggers on those 
 tests.
It strikes me that the hardest part of this approach is filtering out 
too much ham.  At least for me, it's more important to make sure that 
people reach me, than to filter out all spam.  If we take the approach 
that everything is to be filtered out, except x,y,z - then the risk of 
filtering out too much seems pretty high.

These are my local stats... I'd far rather those numbers were the
other way round.

Even if Dan is wrong, at least he's thinking.

http://www.blue-canoe.com/stats/index.php?D1=11

What do Theo, Matt & Co have to say? They've been doing this a lot
longer than us.

Kind regards


RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Miles Fidelman [mailto:[EMAIL PROTECTED]
 
 Dan wrote:
  I've developed a new approach to scoring that I want to 1) share with 
  everyone and 2) make into a working system thats as accurate as what 
  I've already built, but easier to use.  First, the theory:
 
  NEW ASSUMPTION
  All messages are spam unless x,y,z score says they're ham.
 
  NEW APPROACH
  Block everything, then create rules to not catch what you do want.  
  ie, build tests that target the spam (keeping all the tests you've 
  already built), then score the thousands of ways ham triggers on those 
  tests.
 It strikes me that the hardest part of this approach is filtering out 
 too much ham.  At least for me, it's more important to make sure that 
 people reach me, than to filter out all spam.  If we take the approach 
 that everything is to be filtered out, except x,y,z - then the risk of 
 filtering out too much seems pretty high.

I definitely agree with you.

By the way, if Dan really brought a new perspective to us (i.e. a new way to 
detect ham), what would stop us from integrating it into SA?

I would like to see this new perspective, however...

Giampaolo



Re: A New Approach: Find the Ham

2007-02-10 Thread Dan

Clarifications:

1) I'm not talking about generating new rules.  Rules stay the same.   
I'm describing a new scoring process only.


2) This would not be a replacement to SA, but an improvement.  Just a  
new way to process results already generated by SA.  Ideally, this  
would be a replacement for weights and metas.


Dan



How can this method spend less time and energy? Aren't you going  
to build a mirrored method with respect to the actual one? Your  
rules wouldn't be like the actual ones, but negated?


Giampaolo


Dan has a good point; on the surface at least. spam now accounts for
80%+ of all mail, so why are we concentrating on that?

At least the point is worth debate (IMHO).

Can it be done? Even I can see that it can, given the right impetus.
Though perhaps too many companies are making a good $/£/Y off
anti-spam systems based on, around or directly using SA.

Be interesting to see where this thread goes.

Kind regards

Nigel




Re: A New Approach: Find the Ham

2007-02-10 Thread Mark Samples
Is that the same as whitelisting? Maybe I do not understand, but a very
rigorous approach would be a whitelist methodology in which, once a new
account is created, the user sends email to everyone they want to
communicate with, and it 'autowhitelists' those addresses. You can then
only receive from those you communicate with (or want to), i.e. the user
has to authorize the receipt of a message into the whitelist (that way the
email address owner is solely responsible for what they receive).

The main problem (although someone may be able to come up with an
appropriate compromise) is that if everyone were using this methodology,
how would one ever receive email?

Nonetheless, since there is less ham than spam nowadays, it makes more
sense to do what you are saying and deal with only the traffic the user
wishes to see instead of that which they don't. The actual programming
needed to deal with this would seem less stressful on machine resources
as well, i.e. fewer resources would be consumed dealing with less incoming
crap (er, mail, I mean). Stop it at the connection... maybe a ulog plugin?
Just a thought.
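
The scheme described above, and the bootstrap problem it raises, can be
sketched in a few lines of Python. The class and the addresses are invented
for the illustration; it models only the accept/reject decision, not any mail
delivery.

```python
class CorrespondentWhitelist:
    """Accept mail only from addresses the account owner has written to."""

    def __init__(self):
        self._known = set()

    def record_outgoing(self, recipient):
        """Whitelist an address automatically when the owner sends to it."""
        self._known.add(recipient.strip().lower())

    def accepts(self, sender):
        """Return True if the sender is someone the owner has written to."""
        return sender.strip().lower() in self._known

alice = CorrespondentWhitelist()
bob = CorrespondentWhitelist()

# Alice writes to Bob first, so Bob's address lands on Alice's whitelist...
alice.record_outgoing("bob@example.org")

# ...but Bob has never written to Alice, so her first message is refused.
print(bob.accepts("alice@example.com"))
print(alice.accepts("bob@example.org"))
```

If both sides use the scheme, the first message in either direction is always
refused, which is the "how would one ever receive email?" problem.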

Miles Fidelman wrote:


Dan wrote:

I've developed a new approach to scoring that I want to 1) share with 
everyone and 2) make into a working system thats as accurate as what 
I've already built, but easier to use.  First, the theory:


NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.

NEW APPROACH
Block everything, then create rules to not catch what you do want.  
ie, build tests that target the spam (keeping all the tests you've 
already built), then score the thousands of ways ham triggers on 
those tests.


It strikes me that the hardest part of this approach is filtering out 
too much ham.  At least for me, it's more important to make sure that 
people reach me, than to filter out all spam.  If we take the approach 
that everything is to be filtered out, except x,y,z - then the risk of 
filtering out too much seems pretty high.






Re: whitelist problems

2007-02-10 Thread urgrue



> The auto-whitelist has nothing to do with anything that says WHITELISTED.
>
> The auto-whitelist will show up as a rule named AWL. Nothing else.
>
> That said, can you be VERY specific about what your headers say?
>
> Does it say USER_IN_WHITELIST?
>
> If so, check your whitelist_from and whitelist_from_rcvd entries.

It says, precisely:
X-Spam-Status: No, hits=- tagged_above=-.0 required=5.0 WHITELISTED

So if it's not whitelist_from or the AWL, what can it be?

> Well, disabling the loadplugin should do it. That said, did you restart
> amavis after doing so? (these files only get parsed when a SA instance
> starts up, and amavis keeps its own perl-API based copy of SA in order
> to avoid the waste of calling out to external commands like spamc or
> spamassassin)

AWL was never enabled, and I did restart amavis many times. The 
loadplugin line is commented out and I restarted amavis just now to make 
absolutely sure, deleted the auto-whitelist file, but back it came. I 
don't get it.


  





  




Re: whitelist problems

2007-02-10 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 10:34:35PM +0200, urgrue wrote:
 It says, precisely:
 X-Spam-Status: No, hits=- tagged_above=-.0 required=5.0 WHITELISTED
 
 So if its not whitelist_from or the AWL, what can it be?

That's not an SA header, so I'm guessing you call SA from a third party
daemon.  I'd look there.

 AWL was never enabled, and I did restart amavis many times. The 

Aha.  amavis...

-- 
Randomly Selected Tagline:
A closed mouth gathers no foot.




Re: A New Approach: Find the Ham

2007-02-10 Thread Dan

On Feb 10, 2007, at 12:14, Miles Fidelman wrote:

Dan wrote:
I've developed a new approach to scoring that I want to 1) share  
with everyone and 2) make into a working system thats as accurate  
as what I've already built, but easier to use.  First, the theory:


NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.

NEW APPROACH
Block everything, then create rules to not catch what you do  
want.  ie, build tests that target the spam (keeping all the tests  
you've already built), then score the thousands of ways ham  
triggers on those tests.
It strikes me that the hardest part of this approach is filtering  
out too much ham.  At least for me, it's more important to make  
sure that people reach me, than to filter out all spam.  If we take  
the approach that everything is to be filtered out, except x,y,z -  
then the risk of filtering out too much seems pretty high.


Actually, [unparalleled] accuracy is built into this approach.   
Currently, a ham gets caught and you either take out the rule that  
caught it or make a whitelist entry.


Lots of ongoing work = little cumulative return

With Find the Ham, whitelisting is almost obsolete.  When you find an  
FP, you make an exception for the specific profile, i.e. the exact  
permutation of tests/rules that caught the message, so that this  
specific combination doesn't catch any more mail.  The rules stay at  
full strength for every other permutation and no whitelist is needed.
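
A minimal sketch of such a profile exception, in Python rather than in SA's own rule language. The rule names and the helpers `add_exception` and `is_caught` are hypothetical, and "caught if any rule fired" is a simplifying assumption standing in for real scoring.

```python
# Exact combinations of rule hits that were confirmed as false positives.
ham_profiles = set()

def add_exception(rules_hit):
    # Record the exact set of rules that caught a known false positive.
    ham_profiles.add(frozenset(rules_hit))

def is_caught(rules_hit):
    # A message is caught if any rule fired, unless this exact
    # combination of rules was previously marked as ham.
    hits = frozenset(rules_hit)
    return bool(hits) and hits not in ham_profiles

add_exception(["RULE_A", "RULE_C"])
print(is_caught(["RULE_A", "RULE_C"]))  # the exempted profile
print(is_caught(["RULE_A"]))            # RULE_A on its own still catches
```

Using a `frozenset` makes the exception independent of the order in which the rules fired, while any other combination containing the same rules is unaffected.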


This training process is the best part of the whole approach.  It  
begins with a huge number of FPs, but significant improvements take  
only a few weeks.  After a few months (depending on the diversity of  
your ham), FPs are very rare.


Little ongoing work = huge cumulative return


Dan


How to Scan just incoming not outcoming emails?

2007-02-10 Thread correiob

Hi:

I have a CentOS Linux server running Apache, Sendmail, SpamAssassin and 
MailScanner. This server handles both POP and SMTP for all of my 
customers' mailboxes.


Currently, SpamAssassin on this server filters both the emails being 
received and the emails being sent. This puts a really heavy load on 
my server.


I don't think I have either the need or the obligation to spam-filter 
the emails those mailboxes are sending. That is a task for the users 
on their own desktops and networks. I only have to filter what is 
being received.


So, my question is: is it possible to configure Sendmail / SpamAssassin 
to filter only incoming emails? If so, please tell me what to do. But 
please explain it like a cooking recipe, because I am not very 
experienced with operating systems. Thanks a lot.


Mario./


--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.441 / Virus Database: 268.17.33/678 - Release Date: 9/2/2007 16:06









Re: How to Scan just incoming not outcoming emails?

2007-02-10 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 07:42:55PM -0300, correiob wrote:
 So, my question is: is it possible to set Sendmail / Spam Assassin in 
 order filters just the receiving emails? If so, please, tell me what 
 to do. But, please, tell me like a cooking recipe, because I am not 
 quite experienced with operating systems. Thanks a lot.

Probably, but it's not a SpamAssassin question.  You need to look at your
setup and change it so that it only sends the mails you want scanned to SA.
According to your mail, you're using MailScanner, so you can look at their
docs/ask them, for more information.

-- 
Randomly Selected Tagline:
Clones are people two.




Re: How to Scan just incoming not outcoming emails?

2007-02-10 Thread Evan Platt

At 02:42 PM 2/10/2007, correiob wrote:

Hi:

I have a Centos Linux, running Apache, Sendmail, Spam Assassin and 
MailScanner. This Server is POP as well as SMTP for all the 
mailboxes of my customers.


Actually, the SpamAssassin at this Server filters both, the emails 
that are being received and the emails that are being sent. This is 
giving my Server a really heavy load.


I think I don't have neither the need (nor the obligation) to filter 
agains spam the emails those mailboxes are sending. This is a task 
up to the users at their own desktops and networks. But I have to 
filter just what is being received.


So, my question is: is it possible to set Sendmail / Spam Assassin 
in order filters just the receiving emails? If so, please, tell me 
what to do. But, please, tell me like a cooking recipe, because I am 
not quite experienced with operating systems. Thanks a lot.


This is more of a sendmail question, so if no one here can answer, 
you may look there.


I run Sendmail with postfix and spamassassin and only my incoming 
mails are scanned. And as SpamAssassin only does what it's told, 
you've somehow told sendmail to scan outgoing e-mails.


Perhaps something in your sendmail config file? 



Re: A New Approach: Find the Ham

2007-02-10 Thread Raul Dias
 NEW SITUATION
 Ham is now the tiniest minority of all email.
 
 NEW ASSUMPTION
 All messages are spam unless x,y,z score says they're ham.
 
 NEW APPROACH
 Block everything, then create rules to not catch what you do want.   
 ie, build tests that target the spam (keeping all the tests you've  
 already built), then score the thousands of ways ham triggers on  
 those tests.
 
 NEW RESULT
 Spend less time and energy while catching more of what you do want  
 and less of what you don't.
 
 
 
 CHALLENGE
 All filtering software is written to score for results that equal  
 spam - catch the bad
 
 SOLUTION
 Make filtering software score for results that equal ham - uncatch  
 the good.
 
 
 Your thoughts?


Here is my $0.02.

I already have a similar approach.  My problem is that 80% of the
messages are in pt_BR, which makes many of the SA rules that target
English ineffective.

There is a large grey area that contains too much spam (FN) and ham (FP).

So, my approach is to quarantine mail for some users at scores as low
as 4.0 (or even less).

This mail is moved to an IMAP folder and then manually sorted into ham
and spam folders.  This lets rules be created to catch spam, but also
to catch ham (which is harder and more dangerous ground).
If necessary, whitelists and blacklists are created, but only as a last
resort, as they are not an affordable/scalable solution.

The spam and ham folders are then used for training with sa-learn, and
the ham is given back to the user if necessary.

This approach has a drawback: explicit authorization from the user is
necessary (in my view).  So a user who wants to help may choose either
to have their mail quarantined and then returned, or to have their mail
(above a 4.0 score) analysed but not quarantined (only a copy is kept,
so nothing needs to be given back).

A good side of this is that not many users need to let their mail be
analysed.  The rules improve for everyone based on a few users.

Bayes also plays a more important role than in an English environment,
because of the lack of good rules in the native language.

Site-wide Bayes is missed (per-user is used); it would help separate
the grey area even more for non-monitored or low-volume users.

On the scripting side I use Mail::IMAPClient, and I urge anyone writing
their own scripts to stay away from Mail::Box.


-Raul Dias



Re: IADB, 70_iadb.cf and multiple A records returned

2007-02-10 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 10:09:35AM -0300, Raul Dias wrote:
 This also implies that the sub-test values is always a RE and needs to
 be proper delimeted.

If you read perldoc Mail::SpamAssassin::Conf, specifically the
check_rbl_sub() section, it'll explain what the subtests can be.  It can
be several things, including an RE.

-- 
Randomly Selected Tagline:
No, I'm not interested in developing a powerful brain.  All I'm after is
 just a mediocre brain, something like the president of American Telephone
 and Telegraph Company.
-- Alan Turing on the possibilities of a thinking
   machine, 1943.




Re: How to Scan just incoming not outcoming emails?

2007-02-10 Thread correiob

Hi, Evan / Theo:

Well, as far as I understand, Sendmail / MailScanner are responsible 
for sending SpamAssassin the emails to be filtered, so I have to 
configure Sendmail / MailScanner to send only incoming emails to SA, 
right?


Thanks a lot.

Mario./




Re: A New Approach: Find the Ham

2007-02-10 Thread Mathieu Bouchard

On Sat, 10 Feb 2007, Dan wrote:


With Find the Ham, whitelisting is almost obsolete.  When you find an FP,


How do you ever find FPs if you have so many TP to sort through that it's 
not even worth sorting through FP+TP to find the FP ? IMHO, that'd be why 
we assume that mails are ham rather than assume that they are spam.


 _ _ __ ___ _  _ _ ...
| Mathieu Bouchard - tél:+1.514.383.3801 - http://artengine.ca/matju
| Freelance Digital Arts Engineer, Montréal QC Canada

Re: IADB, 70_iadb.cf and multiple A records returned

2007-02-10 Thread Raul Dias
On Sat, 2007-02-10 at 16:53 -0500, Theo Van Dinter wrote:
 On Sat, Feb 10, 2007 at 10:09:35AM -0300, Raul Dias wrote:
  This also implies that the sub-test values is always a RE and needs to
  be proper delimeted.
 
 If you read perldoc Mail::SpamAssassin::Conf, specifically the
 check_rbl_sub() section, it'll explain what the subtests can be.  It can
 be several things, including an RE.
 

Yes, I read that.  The question is: what makes it an RE, if not the
delimiter?

As we discussed earlier, the ^ $ is necessary to avoid matching other
numbers, which can only happen if the value is an RE.

So:
1 - '^127.0.0.1$' matches only 127.0.0.1, and that's an RE.
2 - '127.0.0.1' might match 127.0.0.12 (if it is considered an RE).

If 2 is false, then 1 is unnecessary, right?
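
The two cases can be checked with a small regex experiment. The sketch below uses Python's `re` module, which behaves like Perl for these simple patterns; it only illustrates regex semantics and says nothing about whether check_rbl_sub() treats a given sub-test as an RE (that is covered in perldoc Mail::SpamAssassin::Conf). The candidate strings are made up for the demonstration.

```python
import re

candidates = ["127.0.0.1", "127.0.0.12", "127x0.0.1"]

unanchored = re.compile(r"127.0.0.1")      # no anchors, '.' matches any character
anchored = re.compile(r"^127\.0\.0\.1$")   # anchored, with the dots escaped

for ip in candidates:
    # Columns: candidate, unanchored match?, anchored match?
    print(ip, bool(unanchored.search(ip)), bool(anchored.search(ip)))
```

Treated as an RE, the unanchored pattern matches all three candidates, while the anchored and escaped one matches only 127.0.0.1. Anchoring without escaping the dots, as in '^127.0.0.1$', would still accept a string such as 127x0.0.1, although that is unlikely to matter for DNS replies.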

-Raul Dias




Re: whitelist problems

2007-02-10 Thread Matt Kettler
John D. Hardin wrote:
 On Sat, 10 Feb 2007, Matt Kettler wrote:

   
 In particular, make sure you didn't do anything like the common
 mistake of whitelist_from [EMAIL PROTECTED]. Any spammer can
 trivially forge a From: or Return-Path header, and forging your
 own domain in these fields is a common tactic because spammers
 know many people make this mistake.
 

 Perhaps --lint should warn about whitelist_from being used at all...
   

Or at least warn if you haven't set an option:

 yes_i_understand_whitelist_from_sucks 1

After all, generating a lint warning will mess up folks who use RDJ..
there should be a way to disable the warning for people who really have
no other option. (otherwise they'd just remove the damn thing.)



Re: A New Approach: Find the Ham

2007-02-10 Thread Dan


On Feb 10, 2007, at 14:38, Mathieu Bouchard wrote:
How do you ever find FPs if you have so many TP to sort through  
that it's not even worth sorting through FP+TP to find the FP ?  
IMHO, that'd be why we assume that mails are ham rather than assume  
that they are spam.


I haven't found FP reviewing to be a big deal.  In my latest SA based  
configuration, for example, I organize captures according to the  
quantity of tests a given message fails.  The more tests are  
involved, the less a message needs to be double checked.
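
A rough sketch of that ordering, in Python with made-up message ids and test names (nothing here comes from a real SA configuration):

```python
# Hypothetical captured messages: (message id, names of the tests it failed).
captured = [
    ("msg-1", ["TEST_A", "TEST_B", "TEST_C", "TEST_D"]),
    ("msg-2", ["TEST_A"]),
    ("msg-3", ["TEST_B", "TEST_C"]),
]

def review_order(messages):
    # Fewest failed tests first: by the reasoning above, these are the
    # captures that most need a second look.
    return sorted(messages, key=lambda item: len(item[1]))

for msg_id, tests in review_order(captured):
    print(msg_id, len(tests))
```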


So as with other particulars, ease of use will depend on how well the  
approach is implemented.


Dan




Re: whitelist problems

2007-02-10 Thread Matt Kettler
urgrue wrote:

 The auto-whitelist has nothing to do with anything that says
 WHITELISTED.

 The  auto-whitelist will show up as a rule named AWL. Nothing else.

 That said, can you be VERY specific about what your headers say?

 Does it say USER_IN_WHITELIST?

 If so, check your whitelist_from and whitelist_from_rcvd entries.
   

 It says, precisely:
 X-Spam-Status: No, hits=- tagged_above=-.0 required=5.0 WHITELISTED

 So if its not whitelist_from or the AWL, what can it be?
That didn't come from SpamAssassin. There's no such thing as
WHITELISTED in SA, so that must be an amavis thing.

Also, the lack of any score is a good hint that SA didn't do it. All of
SA's whitelists are just score modifiers.

SpamAssassin is always more specific than just WHITELISTED. AFAIK,
this is a complete list of whitelist rules for SA:

USER_IN_WHITELIST
USER_IN_DEF_WHITELIST
SUBJECT_IN_WHITELIST
USER_IN_DKIM_WHITELIST
USER_IN_DEF_DKIM_WL
USER_IN_SPF_WHITELIST
USER_IN_DK_WHITELIST
USER_IN_DEF_DK_WL
USER_IN_DEF_SPF_WL
USER_IN_WHITELIST_TO
USER_IN_MORE_SPAM_TO
USER_IN_ALL_SPAM_TO
AWL

Note that there are some other things with negative scores, like Bonded
Sender, Habeas COI, and Hashcash, but none of these has a rule name even
remotely related to the word "white".
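
As a quick way to apply the list above, a status header can be checked for any of those rule names. This is a hedged Python sketch: the helper `sa_whitelist_hits` is hypothetical, and the second header in the example is an illustrative SA-style header made up for the demonstration, not one taken from this thread.

```python
import re

SA_WHITELIST_RULES = {
    "USER_IN_WHITELIST", "USER_IN_DEF_WHITELIST", "SUBJECT_IN_WHITELIST",
    "USER_IN_DKIM_WHITELIST", "USER_IN_DEF_DKIM_WL", "USER_IN_SPF_WHITELIST",
    "USER_IN_DK_WHITELIST", "USER_IN_DEF_DK_WL", "USER_IN_DEF_SPF_WL",
    "USER_IN_WHITELIST_TO", "USER_IN_MORE_SPAM_TO", "USER_IN_ALL_SPAM_TO",
    "AWL",
}

def sa_whitelist_hits(status_header):
    # Pull out upper-case rule-like tokens and keep only exact matches
    # against the rule names listed above.
    tokens = re.findall(r"[A-Z][A-Z0-9_]+", status_header)
    return sorted(t for t in tokens if t in SA_WHITELIST_RULES)

reported = "X-Spam-Status: No, hits=- tagged_above=-.0 required=5.0 WHITELISTED"
made_up = "X-Spam-Status: No, score=-100.0 required=5.0 tests=AWL,USER_IN_WHITELIST"

print(sa_whitelist_hits(reported))  # no SA whitelist rule names present
print(sa_whitelist_hits(made_up))
```

The bare word WHITELISTED is not in the set, so the header reported in this thread yields no hits.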





How to use eval: methods without calling check?

2007-02-10 Thread Robert Nicholson
I'd like to programmatically call the methods SA uses to check for  
8-bit charsets and the like, but I personally do not care to make use  
of the rules engine at all. Do I need an instance of PerMsgStatus  
fully set up before I can call eval: methods programmatically?


For instance I already have

use Mail::SpamAssassin;

# The variables on the right-hand side are defined elsewhere in my script.
my $spamtest = Mail::SpamAssassin->new({
    PREFIX             => $PREFIX,
    DEF_RULES_DIR      => $DEF_RULES_DIR,
    LOCAL_RULES_DIR    => $LOCAL_RULES_DIR,
    LOCAL_STATE_DIR    => $LOCAL_STATE_DIR,
    userprefs_filename => "$PREFIX/.spamassassin/user_prefs",
    userstate_dir      => "$PREFIX/.spamassassin",
    debug              => $debugLevel,
});

but I do not want to have to call check().

I'm looking to call things like check_for_faraway_charset and  
check_for_faraway_charset_in_headers.


Charset dealing in SA

2007-02-10 Thread Raul Dias

I am writing some rules containing accented characters, which are
outside ASCII.

In my case the rules are in ISO-8859-1, and I am sure they will match
equivalent ISO-8859-1 messages.

However, how will they behave against a different charset (e.g. UTF-8)
in the message body?

Does SA do anything about this issue, like converting everything to
UTF-8 before running the REs?

This is something to consider with most non-English languages.

Or is this something I shouldn't worry about?
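
To make the concern concrete, the sketch below shows only that the same accented word has different bytes in ISO-8859-1 and in UTF-8, so a byte-level pattern written in one encoding does not match text in the other. It is a Python illustration with an arbitrarily chosen word, and it makes no claim about what SA does internally.

```python
import re

word = "promoção"
latin1_bytes = word.encode("iso-8859-1")
utf8_bytes = word.encode("utf-8")

# A byte-level pattern as it would look if written with ISO-8859-1 accents.
pattern = re.compile(re.escape(latin1_bytes))

print(latin1_bytes)
print(utf8_bytes)
print(bool(pattern.search(latin1_bytes)))  # same encoding as the pattern
print(bool(pattern.search(utf8_bytes)))    # same word, different bytes
```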


-Raul Dias



bad OCR with some GIF images

2007-02-10 Thread Spamy.cz - Maxim Cerny
Hello,

I'm using SA 3.1.7 with FuzzyOCR 3.5.1. This month I started having
trouble with some GIF spams. The OCR can't recognize the text and
outputs only a few letters. Has anybody seen this?


Max

[EMAIL PROTECTED] f]# spamassassin --debug FuzzyOcr  Přep\:\ Now\ this\ is\
clearly\ not\ re.eml  /dev/null
[21573] dbg: FuzzyOcr: focr_bin_helper: 'pnmnorm,pnminvert,pamthreshold,ppmtopgm,pamtopnm'
[21573] info: FuzzyOcr: Adding 5 new helper apps
[21573] dbg: FuzzyOcr: focr_bin_helper: 'tesseract'
[21573] info: FuzzyOcr: Adding 1 new helper apps
[21573] info: FuzzyOcr: Starting preprocessor parser for file /etc/mail/spamassassin/FuzzyOcr.preps...
[21573] dbg: FuzzyOcr: line: preprocessor normalize {
[21573] dbg: FuzzyOcr: line: command = pnmnorm
[21573] dbg: FuzzyOcr: line: }
[21573] dbg: FuzzyOcr: line: preprocessor invert {
[21573] dbg: FuzzyOcr: line: command = pnminvert
[21573] dbg: FuzzyOcr: line: }
[21573] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
[21573] dbg: FuzzyOcr: line: command = ppmtopgm
[21573] dbg: FuzzyOcr: line: }
[21573] dbg: FuzzyOcr: line: preprocessor pamtopnm {
[21573] dbg: FuzzyOcr: line: command = pamtopnm
[21573] dbg: FuzzyOcr: line: }
[21573] dbg: FuzzyOcr: line: preprocessor pamthreshold {
[21573] dbg: FuzzyOcr: line: command = pamthreshold
[21573] dbg: FuzzyOcr: line: args = -simple -threshold 0.5
[21573] dbg: FuzzyOcr: line: }
[21573] dbg: FuzzyOcr: line: preprocessor maketiff {
[21573] dbg: FuzzyOcr: line: command = pnmtotiff
[21573] dbg: FuzzyOcr: line: args = -color -truecolor
[21573] dbg: FuzzyOcr: line: }
[21573] info: FuzzyOcr: Starting scanset parser for file /etc/mail/spamassassin/FuzzyOcr.scansets...
[21573] dbg: FuzzyOcr: line scanset ocrad {
[21573] dbg: FuzzyOcr: line command = $ocrad
[21573] dbg: FuzzyOcr: line args = -s5 $input
[21573] dbg: FuzzyOcr: line }
[21573] dbg: FuzzyOcr: line scanset ocrad-invert {
[21573] dbg: FuzzyOcr: line command = $ocrad
[21573] dbg: FuzzyOcr: line args = -s5 -i $input
[21573] dbg: FuzzyOcr: line }
[21573] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert {
[21573] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[21573] dbg: FuzzyOcr: line command = $ocrad
[21573] dbg: FuzzyOcr: line args = -s5 -i $input
[21573] dbg: FuzzyOcr: line }
[21573] dbg: FuzzyOcr: line scanset ocrad-decolorize {
[21573] dbg: FuzzyOcr: line preprocessors = ppmtopgm, pamthreshold, pamtopnm
[21573] dbg: FuzzyOcr: line command = $ocrad
[21573] dbg: FuzzyOcr: line args = -s5 $input
[21573] dbg: FuzzyOcr: line }
[21573] dbg: FuzzyOcr: line scanset gocr {
[21573] dbg: FuzzyOcr: line command = $gocr
[21573] dbg: FuzzyOcr: line args = -i $input
[21573] dbg: FuzzyOcr: line }
[21573] dbg: FuzzyOcr: line scanset gocr-180 {
[21573] dbg: FuzzyOcr: line command = $gocr
[21573] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
[21573] dbg: FuzzyOcr: line }
[21573] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
[21573] info: FuzzyOcr: Searching in: /usr/local/bin
[21573] info: FuzzyOcr: Searching in: /usr/bin
[21573] info: FuzzyOcr: Using gifsicle = /usr/bin/gifsicle
[21573] dbg: FuzzyOcr: Using giffix = /bin/giffix
[21573] dbg: FuzzyOcr: Using giftext = /bin/giftext
[21573] dbg: FuzzyOcr: Using gifinter = /bin/gifinter
[21573] info: FuzzyOcr: Using giftopnm = /usr/bin/giftopnm
[21573] info: FuzzyOcr: Using jpegtopnm = /usr/bin/jpegtopnm
[21573] info: FuzzyOcr: Using pngtopnm = /usr/bin/pngtopnm
[21573] info: FuzzyOcr: Using bmptopnm = /usr/bin/bmptopnm
[21573] info: FuzzyOcr: Using tifftopnm = /usr/bin/tifftopnm
[21573] info: FuzzyOcr: Using ppmhist = /usr/bin/ppmhist
[21573] info: FuzzyOcr: Using pamfile = /usr/bin/pamfile
[21573] info: FuzzyOcr: Using ocrad = /usr/bin/ocrad
[21573] dbg: FuzzyOcr: Using gocr = /usr/local/bin/gocr
[21573] info: FuzzyOcr: Using pnmnorm = /usr/bin/pnmnorm
[21573] info: FuzzyOcr: Using pnminvert = /usr/bin/pnminvert
[21573] info: FuzzyOcr: Using pamthreshold = /usr/bin/pamthreshold
[21573] info: FuzzyOcr: Using ppmtopgm = /usr/bin/ppmtopgm
[21573] info: FuzzyOcr: Using pamtopnm = /usr/bin/pamtopnm
[21573] info: FuzzyOcr: Using tesseract = /usr/bin/tesseract
[21573] dbg: FuzzyOcr: Threshold[max_hash] = 5
[21573] dbg: FuzzyOcr: Threshold[c] = 5
[21573] dbg: FuzzyOcr: Threshold[s] = 0.01
[21573] dbg: FuzzyOcr: Threshold[w] = 0.01
[21573] dbg: FuzzyOcr: Threshold[h] = 0.01
[21573] dbg: FuzzyOcr: Threshold[cn] = 0.01
[21573] dbg: FuzzyOcr: focr_add_score = 1
[21573] dbg: FuzzyOcr: focr_autodisable_negative_score = -8
[21573] dbg: FuzzyOcr: focr_autodisable_score = 1000
[21573] dbg: FuzzyOcr: focr_autosort_buffer = 10
[21573] dbg: FuzzyOcr: focr_autosort_scanset = 1
[21573] dbg: FuzzyOcr: focr_base_score = 5
[21573] dbg: FuzzyOcr: focr_corrupt_score = 2.5
[21573] dbg: FuzzyOcr: focr_corrupt_unfixable_score = 5
[21573] dbg: FuzzyOcr: focr_counts_required = 2
[21573] dbg: FuzzyOcr: focr_db_hash = /etc/mail/spamassassin/FuzzyOcr.db
[21573] dbg: FuzzyOcr: focr_db_max_days = 35
[21573] dbg: FuzzyOcr: 

Re: How to use eval: methods without calling check?

2007-02-10 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 06:30:50PM -0600, Robert Nicholson wrote:
 I'd like to programatically call the methods SA uses to check for  
 8bit charsets and the like but I personally do not care to make use  
 of the rules engine at all. Do I need an instance of PerMsgStatus  
 fully setup before I can call eval: methods programatically?

Generally speaking, yes.

 I'm looking to call things like check_for_faraway_charset,  
 check_for_faraway_charset_in_headers 

If you look at the code, those functions clearly want a PMS object.

-- 
Randomly Selected Tagline:
Check book: a book with a unhappy ending.




Re: How to use eval: methods without calling check?

2007-02-10 Thread Robert Nicholson

This appears to be working

sub handle_potential_faraway
{
  my $mail = shift(@_);

  # Read the configuration, then build a PerMsgStatus object
  # for this message without calling check().
  $spamtest->init(1);
  my $msg = Mail::SpamAssassin::PerMsgStatus->new($spamtest, $mail);

  # Ignore the mail as soon as one of the faraway-charset tests fires.
  if ($msg->check_for_faraway_charset())
  {
    ignore_mail($mail);
  } elsif ($msg->check_for_faraway_charset_in_headers())
  {
    ignore_mail($mail);
  } elsif ($msg->html_charset_faraway())
  {
    ignore_mail($mail);
  } elsif ($msg->check_for_mime('mime_faraway_charset'))
  {
    ignore_mail($mail);
  }
}





Re: A New Approach: Find the Ham

2007-02-10 Thread Burak Ueda
Good point, but it will cause trouble UNLESS we find a way to recognize 
ham 100% of the time. And it must be exactly 100% (99% won't be enough).
As other users said, with the current system, if we can filter 70-80% of 
the spam, the remaining 20-30% will only be an annoyance, but ham will 
be delivered.


But with the new approach, even if spam is stopped 100%, just 1% of 
undelivered ham will cause a lot of trouble.
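
To put rough numbers on that trade-off, here is a small worked example in Python. The daily volume and the ham share are purely hypothetical assumptions, and 75% is simply taken as the midpoint of the 70-80% range mentioned above.

```python
daily_messages = 10_000   # hypothetical volume
ham_fraction = 0.10       # hypothetical share of legitimate mail

ham = round(daily_messages * ham_fraction)
spam = daily_messages - ham

# Current approach as described above: 75% of spam filtered, all ham delivered.
spam_delivered = round(spam * (1 - 0.75))

# New approach as described above: all spam stopped, 1% of ham lost.
ham_lost = round(ham * 0.01)

print(spam_delivered, ham_lost)
```

Under these assumptions the current approach lets a few thousand spam messages through per day, while the new approach silently loses a handful of legitimate messages per day, and it is the latter that users tend not to forgive.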


Just my 1 Yen  :-)




Dan wrote:
I've developed a new approach to scoring that I want to 1) share with 
everyone and 2) make into a working system thats as accurate as what 
I've already built, but easier to use.  First, the theory:




SITUATION
In the beginning, all email was ham.  When spam came along, we left 
the ham alone and targeted the annoyance (spam).


ASSUMPTION
All messages are ham unless x,y,z score says they're spam.

APPROACH
Block nothing, then create rules to catch what you don't want.  ie, 
build tests that target the spam, then score the millions of ways spam 
can occur.


RESULT
Huge time spent tuning and retuning weights, catching everything in 
sight (including much ham).




NEW SITUATION
Ham is now the tiniest minority of all email.

NEW ASSUMPTION
All messages are spam unless x,y,z score says they're ham.

NEW APPROACH
Block everything, then create rules to not catch what you do want.  
ie, build tests that target the spam (keeping all the tests you've 
already built), then score the thousands of ways ham triggers on those 
tests.


NEW RESULT
Spend less time and energy while catching more of what you do want and 
less of what you don't.




CHALLENGE
All filtering software is written to score for results that equal spam 
- catch the bad


SOLUTION
Make filtering software score for results that equal ham - uncatch 
the good.



Your thoughts?

Dan


BTW, is there a better forum for this level of question?






Re: How to Scan just incoming not outcoming emails?

2007-02-10 Thread qqqq

| So, my question is: is it possible to set Sendmail / Spam Assassin in 
| order filters just the receiving emails? If so, please, tell me what 
| to do. But, please, tell me like a cooking recipe, because I am not 
| quite experienced with operating systems. Thanks a lot.
| 
| Mario./

Call SA from Procmail rather than from MailScanner.

This is what I do.

B


What does this mean? FROM_ENDS_IN_NUMS From: ends in numbers

2007-02-10 Thread dreambat
Hi,

 I got a spam report back from a test mailing with a rule I had never seen before:
FROM_ENDS_IN_NUMS From: ends in numbers 

 There wasn't a number in the From or reply address of the email, so I just don't get this. 

 Thanks 

 RC


 



Query regarding whitelist_to

2007-02-10 Thread sushma


hi,

 	When spam is addressed to a list of users and one of those users is 
in whitelist_to, the score becomes negative, so all the other users also 
get that spam. How can I avoid this?


Regards
sushma




Re: Query regarding whitelist_to

2007-02-10 Thread Theo Van Dinter
On Sun, Feb 11, 2007 at 11:36:56AM +, sushma wrote:
   Spam mail originated to list of user, if one user in whitelist_to 
 then score will be neagtive so all other user also get that spam mail. how 
 to aviod this.

If you scan your mails site-wide (ie: once per message), then you can't avoid
this.  Your best bet is to remove the whitelist_to and then at delivery for
the specific user/address, don't filter based on the scan results.

Otherwise, switch to per-user filtering and set whitelist_to for the
appropriate user.
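
The first option (scan once site-wide, then decide per recipient at delivery) could look roughly like the following. This is a Python sketch of the decision logic only; the address set and the function `deliver_to_inbox` are hypothetical and do not correspond to any SpamAssassin or MTA API.

```python
# Hypothetical addresses that should receive everything, regardless of score.
UNFILTERED_RECIPIENTS = {"abuse@example.com"}

def deliver_to_inbox(recipient, is_spam):
    # The message was scanned once, site-wide; the per-recipient decision
    # is made only here, at delivery time.
    if recipient.lower() in UNFILTERED_RECIPIENTS:
        return True          # ignore the scan result for this address
    return not is_spam       # everyone else honours the scan result

for rcpt in ["abuse@example.com", "user@example.com"]:
    print(rcpt, deliver_to_inbox(rcpt, is_spam=True))
```

Because the exemption is applied per recipient rather than through whitelist_to, the score of the message itself is not lowered for the other recipients.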

-- 
Randomly Selected Tagline:
Human beings, who are almost unique in having the ability to learn
 from the experience of others, are also remarkable for their apparent
 disinclination to do so.- Douglas Adams




Re: spamassassin learning method

2007-02-10 Thread Rizal Ferdiyan

John D. Hardin wrote:

On Sat, 10 Feb 2007, Rizal Ferdiyan wrote:

  
I want to create spamassassin learning 
method, if my client find any spam for their email they can forward it 



The act of forwarding completely changes the message.
  

Yes, I know that. The email will have forwarding headers added.
The best way is for them to move the message to a folder that you have 
access to. What is the mail server that the messages eventually end up 
on? Sendmail with standard mbox/maildir? Exchange?
  
My SMTP proxy server serves many client mail servers. My clients build 
their own servers, so those servers use one of two mailbox formats, 
mbox or maildir. But I don't have access to the clients' mail servers. 
That's why I want my clients to forward spam to an account that I 
create, for example [EMAIL PROTECTED], because I can't move spam on 
their servers. Any idea how to solve this problem?



--
Best Regards,
-Rizal Ferdiyan