sa-learn --ham ground rules

2008-02-07 Thread Gene Heskett
Greetings;

About an hour ago, based on some comments that the bayes database needed to be
trained on ham as well as spam, and because it seemed to be forgetting some
of the stuff I'd fed it as spam, I rewrote that filter rule in kmail to
launch it with one of my sorted directories from a mailing list as the
argument.  Syntax otherwise the same as the sa-learn-spam filter.

sa-learn --spam can process a message in 5 to 10 seconds or so, so if I've
dropped 20 doofus mails in the spam directory and fire it off, it's done
and kmail is back among the living in 2-3 minutes.

But feeding it a 'ham' directory with about 7k messages in it turned
sa-learn into a 100% CPU hog, incrementing the processed-message count only
about every 3 to 5 minutes. I couldn't kill it; it kept coming back, and I
must have fed it a kill -9 50 times.  Finally, one of the kills killed X too!
But no console came back, so I had to hit the reset button.  The reboot was
like molasses in January, so I did a power down: same story.  Same story 3
times running, so I went and made a sandwich while it sat powered down.  Then
the reboot was normal up to e2fscking a 372GB drive I use for amanda, the
backup proggy.  That hung, with no indication of progress, for about 20
minutes, no marching or anything.  But it finally fell through and
completed the bootup, and it is running normally now, but it has taken the
majority of an hour to do this.

So what is the maximum number of files in a directory that one can feed to 
sa-learn --ham and expect it to achieve normal speed?  I vaguely recall 
feeding it my corpus of another folder it was having trouble with a year ago,
the linux-usb list, with 600 to 1k messages in it, and it was finished in an
hour that time.

The command that kmail issues to it is:
sa-learn --ham /root/Mail/(foldername)/cur

Where foldername is whatever mailing list I want to tell it is ham.

Is this correct?  I've had it set up that way for 2 or 3 years at least and
till now it hasn't been that much of a problem.
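Nobody in the thread gives a hard per-directory limit. One workaround worth trying, offered only as an untested sketch, is to feed the folder to sa-learn in smaller batches instead of one 7k-message run, so that each invocation is short and the job can be stopped cleanly between batches. It assumes only that sa-learn accepts several message files as arguments. The batch size of 200 and the helper names are hypothetical, and nothing here shows that batching is faster overall.

```python
import os
import subprocess


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def learn_ham_in_batches(maildir_cur, batch_size=200, dry_run=True):
    """Run `sa-learn --ham` over the files in a maildir 'cur' directory,
    passing at most batch_size files per invocation.

    batch_size=200 is a hypothetical default, not a documented limit.
    With dry_run=True the command lines are only built and returned."""
    files = sorted(
        os.path.join(maildir_cur, name)
        for name in os.listdir(maildir_cur)
        if os.path.isfile(os.path.join(maildir_cur, name))
    )
    commands = []
    for chunk in batches(files, batch_size):
        cmd = ["sa-learn", "--ham", *chunk]
        commands.append(cmd)
        if not dry_run:
            # check=True stops at the first batch that sa-learn reports as failed.
            subprocess.run(cmd, check=True)
    return commands
```

Run it once with dry_run=True to see how the folder would be split (for example on a hypothetical /root/Mail/linux-usb/cur), then again with dry_run=False to actually do the training.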

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
"What a wonder is USENET; such wholesale production of conjecture from
such a trifling investment in fact."
-- Carl S. Gutekunst


Re: score

2008-02-07 Thread Matt Kettler

Andrea Bencini wrote:

> I installed postfix-2.4.5-2.fc8, amavisd-new-2.5.2-2.fc8 and
> spamassassin-3.2.3-2.fc8.
> They are running.
> I would like to test spam detection by changing scores in local.cf.
> My local.cf is:
>
> report_safe 0
> use_bayes 1
> use_bayes_rules 1
> skip_rbl_checks 0
> bayes_path /var/spool/amavisd/.spamassassin/bayes
> score FREE_PORN 1000
> score LIVE_PORN 1100
>
> Now I send an e-mail containing the words "porno" and "sex" in the
> message body.

FREE_PORN triggers on a two-word phrase such as "free porn", not on
"porno" alone.

The regex for the rule is:
/\bfree (?:porn|xxx|adult)/i

So it will catch "free" followed immediately (after one space) by porn,
xxx or adult, in any letter case.

LIVE_PORN works in a similar fashion.
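To see why "porno" and "sex" on their own do not fire the rule, you can try the regex above in Python's re module, which handles \b, (?:...) and case-insensitive matching the same way Perl does for this pattern. Treat it as an approximation only: SpamAssassin runs body rules against its own rendered version of the message body, not against the raw text.

```python
import re

# The FREE_PORN body regex quoted above; Perl's /i flag becomes re.IGNORECASE.
FREE_PORN = re.compile(r"\bfree (?:porn|xxx|adult)", re.IGNORECASE)


def hits_free_porn(text):
    """True if the FREE_PORN pattern matches anywhere in `text`."""
    return FREE_PORN.search(text) is not None


for sample in [
    "porno and sex",           # single words only: no match
    "Get FREE porn now",       # "free" + space + "porn": match
    "free pornography",        # no trailing \b, so a longer word still matches
    "carefree adult content",  # \b stops "free" matching inside "carefree"
]:
    print(hits_free_porn(sample), repr(sample))
```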




Re: Getting ? in spam scores.

2008-02-07 Thread Karsten Bräckelmann
On Thu, 2008-02-07 at 14:57 -0800, fchan wrote:
> I'm getting spam scores (ie No, hits=? required=?) from certain types 
> of spam messages. Most of them have phishing links but I have no 
> problem with most messages with links. I'm using qmail with 
> qmail-scanner-2.01st on RedHat Linux ES5.

> Wed, 06 Feb 2008 09:16:41 PST:18972: clamdscan: finished scan in 0.011407 secs
> Wed, 06 Feb 2008 09:17:26 PST:18972: SA: finished scan in 45.026522 
> secs - hits=?/?

Does that mean qmail-scanner forced further processing due to the
timeout, without actually waiting for SA to finish? The 45.03 second scan
time matches your 45 second spamc timeout almost exactly. (Despite the
"finished scan" wording suggesting success...)

> Wed, 06 Feb 2008 09:17:26 PST:18972: p_s: finished scan in 0.020737 secs
> Wed, 06 Feb 2008 09:17:26 PST:18972: ini_sc: finished scan of 
> "/var/spool/qmailscan/tmp/s1.molsci.org120231820076418972"
> 
> I have set the timeout on qmail-scanner for spamc to 45 seconds. Why are
> (what I guess are) links causing this?

Are you positive this is related to links?  SA looks up the URIs it finds
in a message in URI blacklists, and those lookups are DNS queries. Is it
possible you have a DNS issue by any chance?


> Does spamassassin normally take 
> this long to scan or should I set it longer? Am I missing some 
> setting/plugin that is causing this?

No, generally, SA should have been done within that timeout. Depending
on the machine, network and so on, the average scanning times reported
here are only a few seconds.

However, single messages occasionally taking 90 seconds or more have
been reported, too. IMHO, this is nothing to worry about unless your
average is really high. In that case, you might have DNS issues, or you
may be feeding really large mail through SA, etc.

  guenther


-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Body vs headers

2008-02-07 Thread Karsten Bräckelmann
On Thu, 2008-02-07 at 20:15 +0100, Per Jessen wrote:
> Paul Douglas Franklin of Yakima UGM wrote:
> 
> > I have noticed that spammers are putting dead giveaways into some of
> > the headers which are not checked with the body rules.  Specifically,
> > I received an email with a sender name that was obviously spam.  
> 
> How did you determine that the sender name made the email "obviously
> spam" ?

Well, a few typical examples I've seen in the past couple hours (email
addresses munged):

 Cilais <[EMAIL PROTECTED]>
 Ciails <[EMAIL PROTECTED]>
 Amazing Watches <[EMAIL PROTECTED]>
 Most Trusted Replica <[EMAIL PROTECTED]>
 Cartier Replica <[EMAIL PROTECTED]>

However, even though spammers seem to shift some "body" into the
user-visible From header, as far as I am concerned, I don't really see a
need to make SA treat the real-name part as body. The Subject tends to
hold the same [1] info. As does the body.

All those examples are really big scorers anyway: scores of 16+, Bayes
confidence of 99%, and they hit at least one known-to-be-good
blacklist (IP and URI).

  guenther


[1] Well, or similar. I have seen advertisement for Replica Watches
(Subject) with a From of Replica Purses. And vice versa. ;)




Getting ? in spam scores.

2008-02-07 Thread fchan

Hi,
I'm getting spam scores (ie No, hits=? required=?) from certain types 
of spam messages. Most of them have phishing links but I have no 
problem with most messages with links. I'm using qmail with 
qmail-scanner-2.01st on RedHat Linux ES5.


Below is excerpt of a message that I get with this:
Wed, 06 Feb 2008 09:16:40 PST:18972: +++ starting debugging for 
process 18972 (ppid=18967) by uid=509
Wed, 06 Feb 2008 09:16:41 PST:18972: c_a_g: found URL in message - 
maybe phishy - better scan it
Wed, 06 Feb 2008 09:16:41 PST:18972: w_c: Total time between DATA 
command and "." was 0.000115 secs
Wed, 06 Feb 2008 09:16:41 PST:18972: w_c: elapsedscoe time from start 
0.000123 secs
Wed, 06 Feb 2008 09:16:41 PST:18972: g_e_h: 
return-path='[EMAIL PROTECTED]', 
recips='[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED],[EMAIL PROTECTED]'
Wed, 06 Feb 2008 09:16:41 PST:18972: from='"Veronica Salas" 
<[EMAIL PROTECTED]>', subj='Customer alert!', via SMTP from 83.23.38.47

Wed, 06 Feb 2008 09:16:41 PST:18972: clamdscan: finished scan in 0.011407 secs
Wed, 06 Feb 2008 09:17:26 PST:18972: SA: finished scan in 45.026522 
secs - hits=?/?

Wed, 06 Feb 2008 09:17:26 PST:18972: p_s: finished scan in 0.020737 secs
Wed, 06 Feb 2008 09:17:26 PST:18972: ini_sc: finished scan of 
"/var/spool/qmailscan/tmp/s1.molsci.org120231820076418972"


I have set the timeout on qmail-scanner for spamc to 45 seconds. Why are
(what I guess are) links causing this? Does spamassassin normally take
this long to scan, or should I set it longer? Am I missing some
setting/plugin that is causing this?


Thank you for your assistance,
Frank


Re: No URIBL after upgrade to 3.2.4

2008-02-07 Thread Christopher Bort

On 02/04/08 16:33, [EMAIL PROTECTED] (Daryl C. W. O'Shea) wrote:


> Christopher Bort wrote:
>> I have recently upgraded a SpamAssassin installation from
>> 3.2.1 to 3.2.4. Since then URIBL hits have dropped to nearly
>> zero, where before there were several hundred per day.
>> Immediately after the upgrade, there were a handful of hits on
>> URIBL_BLACK, but I have not seen any at all in the last few
>> days. No configs have been changed with the upgrade, but
>> sa-update is run nightly via cron.
>
> Run a message that you expect a URIBL hit on through spamassassin -D
> and look at the debug output to find out what is going on.


Curious. A message that was run through SA by my mail server's 
helper without hits on any URIBL or RAZOR2 rules gets hits on 
multiple URIBL and RAZOR2 rules when fed to spamassassin -D 
manually. I will look into the possibility that the server's 
helper program is either misconfigured or is doing something 
wrong. Whatever it is doesn't seem terribly consistent at this 
point, though, because I continue to get plenty of RAZOR2 hits 
and a small handful of URIBL hits. At any rate, it warrants 
further investigation...


Thank you for your help.

--
Christopher Bort
<[EMAIL PROTECTED]>




Re: upgrading is just like installing

2008-02-07 Thread Karsten Bräckelmann
On Wed, 2008-02-06 at 12:32 +0800, [EMAIL PROTECTED] wrote:
> Let's record my 3.23 to 3.24 upgrade attempt.
> 
> I untar Mail-SpamAssassin-3.2.4 and of course read the file entitled
> UPGRADE.
> 
> In it I find "Note for Users Upgrading to SpamAssassin 3.2.0".
> But I am upgrading to 3.24.

No, you are not. You are upgrading to 3.2.4. Please note that these are
version numbers, not floats: read as a version, "3.24" would be minor
version 24, which is massively larger than 2...

Anyway, the UPGRADE file and its information are targeted at upgrades
across minor (or even major) versions, e.g. from 3.1.x (or older) to
3.2.x. It mentions incompatibilities or general issues that may need
attention when doing such an upgrade.

Generally, no such issues exist when upgrading micro versions, which is what you
in fact did.
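A small illustration of the point, assuming plain dotted numeric version strings: compared component by component, "3.24" would be minor version 24 and sort far above 3.2.4, while 3.2.3 to 3.2.4 only changes the third (micro) component. The helper names are hypothetical.

```python
def parse_version(v):
    """Split a dotted version string into integers: "3.2.4" -> (3, 2, 4)."""
    return tuple(int(part) for part in v.split("."))


def upgrade_kind(old, new):
    """Name the first component that differs: major, minor or micro."""
    names = ("major", "minor", "micro")
    for i, (a, b) in enumerate(zip(parse_version(old), parse_version(new))):
        if a != b:
            return names[i] if i < len(names) else "other"
    return "none"


print(parse_version("3.24") > parse_version("3.2.4"))  # minor 24 vs minor 2
print(upgrade_kind("3.2.3", "3.2.4"))
```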


> OK, now after reading the whole INSTALL file, I come to the conclusion
> that to upgrade, one just acts like one never installed SpamAssassin
> in the first place, and apparently the new SpamAssassin will just
> install on top of the old one (with cruft surely accumulating in the
> corners, I bet.)

Unless your configure or build options change, there should be no cruft.
However, you likely just overwrote your changes to local.cf and friends,
if any.

> OK... it all worked out. I even remembered to do sa-update, a step
> mentioned in here in the newsgroup, not in the highly cluttered
> INSTALL file.

I guess that is because it is entirely optional. You do not need to run
sa-update when installing.

  guenther





Re: Body vs headers

2008-02-07 Thread Per Jessen
Paul Douglas Franklin of Yakima UGM wrote:

> I have noticed that spammers are putting dead giveaways into some of
> the headers which are not checked with the body rules.  Specifically,
> I received an email with a sender name that was obviously spam.  

How did you determine that the sender name made the email "obviously
spam" ?


/Per Jessen, Zürich



Re: score

2008-02-07 Thread Theo Van Dinter
On Thu, Feb 07, 2008 at 07:39:00PM +0100, Andrea Bencini wrote:
> Now I send an e-mail containing the words "porno" and "sex" in the
> message body.
> 
> Why are there no FREE_PORN and LIVE_PORN scores?

Because the single words "porno" and "sex" don't trigger the rules.  There are
few rules that trigger on a single word, and in this case, neither porn nor
sex is necessarily indicative of spam.

Single word spaminess is handled best by Bayes, and the mail received a
BAYES_99 rule hit.

-- 
Randomly Selected Tagline:
"The porcupine with the sharpest quills gets stuck on a tree more often."




score

2008-02-07 Thread Andrea Bencini

I installed postfix-2.4.5-2.fc8, amavisd-new-2.5.2-2.fc8 and
spamassassin-3.2.3-2.fc8.
They are running.
I would like to test spam changing "score" in local.cf.
My local.cf is:

report_safe 0
use_bayes 1
use_bayes_rules 1
skip_rbl_checks 0
bayes_path /var/spool/amavisd/.spamassassin/bayes
score FREE_PORN 1000
score LIVE_PORN 1100

Now I send an e-mail containing the words "porno" and "sex" in the
message body.

I receive the e-mail via postfix/amavisd and in the message header there are
X-Spam-Flag: NO
X-Spam-Score: 3.181
X-Spam-Level: ***
X-Spam-Status: No, score=3.181 tagged_above=0 required=5 tests=[AWL=-0.320,
BAYES_99=3.5, STOX_REPLY_TYPE=0.001]

Why are there no FREE_PORN and LIVE_PORN scores?

Thanks
Andrea



Re: Problems with CHARSET_FARAWAY_HEADER & UNWANTED_MESSAGE_BODY (was Re: Japanese emails being triggered as Spam incorrectly...)

2008-02-07 Thread Karsten Bräckelmann
On Wed, 2008-02-06 at 20:00 +1000, David Hobley wrote:
> I have been trying to work out what is the core issue here, but I am
> still stumped. Can anyone offer any suggestions?

What didn't you like about the reply I sent within a couple of hours of
your original post a week ago?
  http://mail-archives.apache.org/mod_mbox/spamassassin-users/200801.mbox/[EMAIL PROTECTED]

I told you how to fix your problem *and* pointed to the relevant
documentation -- which you apparently did not bother to read carefully
while "struggling" with this issue for days...

  guenther





Re: URIDNSBL Question

2008-02-07 Thread Theo Van Dinter
On Thu, Feb 07, 2008 at 12:42:06PM -0500, [EMAIL PROTECTED] wrote:
> Does anyone know where this plugin has the DNS servers set so you can  
> change them?

The plugin doesn't set DNS servers.  It queries the servers as listed in
resolv.conf, same as everything else.
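So, to see which servers those are, look at the nameserver lines in /etc/resolv.conf. Here is a minimal sketch that lists them. It assumes the standard resolv.conf syntax, with one "nameserver ADDRESS" per line and lines starting with # or ; treated as comments. It does not reproduce every detail of how the resolver library used by SpamAssassin reads the file.

```python
def nameservers(resolv_conf_text):
    """Return nameserver addresses from resolv.conf-style text, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # blank line or comment
        fields = stripped.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

For example: with open("/etc/resolv.conf") as f: print(nameservers(f.read())).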

-- 
Randomly Selected Tagline:
"And the No. 1 response that you'll need to memorize if you plan to bet
 your business on Windows 2000: 'You want fries with that?'"
 - Nicholas Petreley




URIDNSBL Question

2008-02-07 Thread robb
Does anyone know where this plugin has the DNS servers set so you can  
change them?




Re: x-cr-hashedpuzzle

2008-02-07 Thread Matus UHLAR - fantomas
> Justin Mason wrote:
> >I've been thinking about this.  It might be useful to offer a plugin
> >implementing this hashcash, since it'd offer a good way to come up
> >with an unforgeable FORGED_MUA_OUTLOOK rule.
> >However, we'd have to be sure that the CSRI algorithm really is
> >sufficiently open, and not patent-encumbered, since this *is* MS we're
> >talking about :(

On 06.02.08 18:38, mouss wrote:
> and I'll add a few points:

> - If BCC results in separate mail, then an MSA will get more mail. Not
> critical, but this is not optimal. Also, there will be multiple
> responses from the MSA. Given that it is already confusing when an MSA
> rejects a few recipients (domain doesn't exist, etc.), this will only
> add to the confusion ("should I resend to everybody or only to a few
> people?").

This can result in the MUA eating more CPU when using Bcc:, and thus in
lower Bcc: usage. Of course, it will cause the same problem for bots, but
which bots use Bcc? IMHO this would cause more problems for users using Bcc
than bots...
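To make the CPU argument concrete, here is a generic hashcash-style puzzle. It is explicitly not Microsoft's CSRI / x-cr-hashedpuzzle algorithm, whose details are not given in this thread. The sender searches for a counter whose SHA-1 hash starts with a given number of zero bits. That takes about 2**bits hashes on average, while checking a stamp takes a single hash. If a stamp is required per recipient, the sender's work grows with the number of recipients. The "recipient:counter" stamp format and the helper names are simplifications invented for this sketch.

```python
import hashlib
from itertools import count


def leading_zero_bits_ok(stamp, bits):
    """True if SHA-1(stamp) begins with at least `bits` zero bits."""
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits) == 0


def mint_stamp(resource, bits=12):
    """Try counters until "resource:counter" satisfies the puzzle.
    Expected work is about 2**bits hashes; verification is one hash."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits_ok(stamp, bits):
            return stamp


def stamps_for(recipients, bits=12):
    """One puzzle per recipient, so total work scales with len(recipients)."""
    return {rcpt: mint_stamp(rcpt, bits) for rcpt in recipients}
```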
-- 
Matus UHLAR - fantomas, [EMAIL PROTECTED] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Due to unexpected conditions Windows 2000 will be released
in first quarter of year 1901


Re: libspamc.so and bayes

2008-02-07 Thread Michael Parker

On Feb 6, 2008, at 4:49 AM, Or Goshen wrote:
> Is it possible to use libspamc.so to tell spamd that a message is
> either spam or ham?
>
> I.e., imitate "sa-learn --spam/--ham" using libspamc.so.
>
> There doesn't seem to be any documentation about the library; all I
> could find are comments in the header file, which weren't really
> helpful.

Yes, look at message_tell:

/* Process the message through the spamd tell command, making as many
 * connection attempts as are implied by the transport structure. To make
 * this do failover, more than one host is defined, but if there is only
 * one there, no failover is done.
 */
int message_tell(struct transport *tp, const char *username, int flags,
                 struct message *m, int msg_class,
                 unsigned int tellflags, unsigned int *didtellflags);


You can find example usage in the spamc command itself, which exposes
this through its -L and -C options (from the spamc man page):

   -L learn type, --learntype=type
   Send message to spamd for learning.  The "learn type" can be either
   spam, ham or forget.  The exitcode for spamc will be set to 5 if
   the message was learned, or 6 if it was already learned.

   Note that the "spamd" must run with the "--allow-tell" option for
   this to work.

   -C report type, --reporttype=type
   Report or revoke a message to one of the configured collaborative
   filtering databases.  The "report type" can be either report or
   revoke.

   Note that the "spamd" must run with the "--allow-tell" option for
   this to work.
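If shelling out is acceptable, the same learning can be driven through the spamc binary instead of linking libspamc directly. Here is a minimal Python sketch. It assumes spamc is on the PATH and spamd was started with --allow-tell. It relies only on the -L option and the exit codes 5 and 6 quoted above, plus spamc's -u username option. Any other exit code is reported as unexpected rather than interpreted. The helper names are hypothetical.

```python
import subprocess

# Exit codes documented above for spamc -L / --learntype.
LEARN_EXIT_CODES = {5: "learned", 6: "already learned"}
LEARN_TYPES = ("spam", "ham", "forget")


def spamc_learn_cmd(learn_type, username=None):
    """Build the spamc command line for learning one message."""
    if learn_type not in LEARN_TYPES:
        raise ValueError(f"learn type must be one of {LEARN_TYPES}")
    cmd = ["spamc", "-L", learn_type]
    if username:
        cmd += ["-u", username]  # which user's data is used depends on spamd's setup
    return cmd


def describe_learn_exit(code):
    """Map a spamc -L exit code to a short description."""
    return LEARN_EXIT_CODES.get(code, f"unexpected exit code {code}")


def learn(message_bytes, learn_type, username=None):
    """Pipe one raw message into spamc -L and describe the outcome."""
    result = subprocess.run(
        spamc_learn_cmd(learn_type, username),
        input=message_bytes,
        capture_output=True,
    )
    return describe_learn_exit(result.returncode)
```

For example, with a hypothetical file msg.eml: with open("msg.eml", "rb") as f: print(learn(f.read(), "ham")).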


Michael

Re: SpamAssassin with Ubuntu and Windows

2008-02-07 Thread Rubin Bennett
You have to tell spamd to listen on the IP address of the Ethernet port
on your Linux box and not just on the loopback address. For example, this
spamd only listens on 127.0.0.1:

netstat -lpn | grep 783
tcp        0      0 127.0.0.1:783           0.0.0.0:*               LISTEN      15865/spamd -q -x -

I believe it's the -i (--listen-ip) flag that tells spamd to listen on your Ethernet
adapter's address.
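Before changing spamd's options, it can help to check from the Windows side whether anything is reachable on the spamd port at all. Here is a minimal Python sketch. 783 is spamd's default port, and the address in the usage line is a hypothetical placeholder. A successful TCP connection is necessary but not sufficient. spamd also serves only the clients allowed by its -A / --allowed-ips option (127.0.0.1 by default), and a firewall between the two machines can still block the port.

```python
import socket


def spamd_reachable(host, port=783, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example: print(spamd_reachable("192.0.2.10")).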

Rubin

On Thu, 2008-02-07 at 06:46 -0800, dedisoft wrote:
> Dear all,
> 
> I've a Ubuntu 7.1 server edition with SpamAssassin 3.2.4 daemon.
> 
> I want to use the spamc command to check for spam from my Windows mail
> server.
> 
> When I try to test it, spamc tells me that it can't connect to the Ubuntu
> server.
> 
> How can I do this ?
> 
> Thanks a lot for your help.
-- 
Rubin Bennett
RB Technologies
http://thatitguy.com
[EMAIL PROTECTED]
(802)223-4448

"They that can give up essential liberty to obtain a little
temporary security deserve neither liberty nor safety"
  --Benjamin Franklin, Historical Review of Pennsylvania, 1759




SpamAssassin with Ubuntu and Windows

2008-02-07 Thread dedisoft

Dear all,

I've a Ubuntu 7.1 server edition with SpamAssassin 3.2.4 daemon.

I want to use the spamc command to check for spam from my Windows mail
server.

When I try to test it, spamc tells me that it can't connect to the Ubuntu
server.

How can I do this ?

Thanks a lot for your help.



Re: report before Received ...

2008-02-07 Thread Matt Kettler

Valentin. wrote:

> In old version:
>
> X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on ...
>
> In new version:
> /usr/bin/spamc
>
> How do I print the report after all the Received headers?

Doing that breaks DKIM, so this is a very intentional change that is
not readily reversed. If you really want to change it, you'll have to
hack the code that inserts the headers.





report before Received ...

2008-02-07 Thread Valentin.

In old version:

/usr/bin/spamc



Re: Rules statistics and custom values

2008-02-07 Thread Matt Kettler

Francesco Abeni wrote:

> Good morning everyone.
>
> I'm successfully using SpamAssassin to filter spam. It is already good,
> but there are still false positives. I checked their Spam Level one by
> one and got a maximum of 6.7. So I increased "required hits" from 5.0
> to 7.0 to eliminate false positives, but of course this means that
> more spam is not identified.
>
> What I'd like to do is compare ham and spam messages with similar Spam
> Level, to check which rules fire in one case or the other, so I can
> manually adjust the scores of these rules if there is an evident
> pattern.
>
> My question is, does anyone know of a tool that can give me this
> "rules usage" on a folder of messages?

Mass-check does this. Combined with the hit-frequencies tool, it is how
the STATISTICS-set*.txt files are generated (see them in the rules
subdirectory of the tarball).


http://wiki.apache.org/spamassassin/MassCheck
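Mass-check is the proper tool. If you only want a quick look at messages that have already been scanned, a rough alternative is to count the rule names in their X-Spam-Status headers. Here is a minimal Python sketch. It assumes the messages sit in a Maildir (with cur/new/tmp) and carry one of two header styles: the amavisd-new style shown in Andrea's message (tests=[AWL=-0.320, ...]) or SpamAssassin's own comma-separated tests= list. It is not a substitute for mass-check and hit-frequencies, and the helper names are hypothetical.

```python
import mailbox
import re
from collections import Counter


def rule_names(status_header):
    """Extract rule names from an X-Spam-Status header value."""
    flat = " ".join(str(status_header).split())  # unfold continuation lines
    flat = re.sub(r",\s+", ",", flat)  # rejoin lists folded after a comma
    m = re.search(r"tests=\[([^\]]*)\]", flat)  # amavisd-new style
    if m is None:
        m = re.search(r"tests=(\S+)", flat)  # SpamAssassin style
    if m is None:
        return []
    names = []
    for item in m.group(1).split(","):
        name = item.split("=", 1)[0].strip()  # drop "=score" if present
        if name and name != "none":
            names.append(name)
    return names


def tally(headers):
    """Count how often each rule appears across many headers."""
    counts = Counter()
    for header in headers:
        counts.update(rule_names(header))
    return counts


def tally_maildir(path):
    """Tally rule hits over every message in a Maildir folder."""
    box = mailbox.Maildir(path, factory=None, create=False)
    return tally(msg.get("X-Spam-Status", "") for msg in box)
```

Comparing tally_maildir(ham_path).most_common(20) with the same call on a spam folder gives a crude side-by-side view of which rules fire where.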





Rules statistics and custom values

2008-02-07 Thread Francesco Abeni

Good morning everyone.

I'm successfully using SpamAssassin to filter spam. It is already good,
but there are still false positives. I checked their Spam Level one by
one and got a maximum of 6.7. So I increased "required hits" from 5.0 to
7.0 to eliminate false positives, but of course this means that more
spam is not identified.


What I'd like to do is compare ham and spam messages with similar Spam
Level, to check which rules fire in one case or the other, so I can
manually adjust the scores of these rules if there is an evident
pattern.


My question is, does anyone know of a tool that can give me this "rules
usage" on a folder of messages?


--
Francesco Abeni
[EMAIL PROTECTED]
tel. 328 317 85 48
skype f.abeni


Re: flooded with jr* spam

2008-02-07 Thread Per Jessen
Michael W Cocke wrote:

> 
> They use DHCP.  Netops has to trace it, and I seem to be about 5Kth on
> the list. Ironic as hell, considering the effort I put into
> avoiding MIT netops about 20 years ago.

But you should be able to run tcpdump locally on your own machine?
Unless the address changes rapidly, you can catch one such ICMP packet
and then report the IP to your netops guys.


/Per Jessen, Zürich



libspamc.so and bayes

2008-02-07 Thread Or Goshen
Hi

Is it possible to use libspamc.so to tell spamd that a message is either
spam or ham?
I.e., imitate "sa-learn --spam/--ham" using libspamc.so.

There doesn't seem to be any documentation about the library; all I could
find are comments in the header file, which weren't really helpful.

   Or