Problems on Solaris x86

2006-08-13 Thread Pascal Maes

Hello,

I have installed MailScanner (4.55.10-3) on a Solaris 10 (x86) box.
MailScanner is using SpamAssassin 3.1.4.

I'm also using Postfix, and MailScanner runs as the user postfix.

In debug mode, MailScanner works fine.
When I run spamassassin -D --lint (as user postfix), everything works
too.


But when I launch MailScanner in "normal" mode (with fork), the call to

$self->do_full_eval_tests($priority, \$fulltext);

never finishes.

In MailScanner, we have

$MailScanner::SA::SAspamtest = new Mail::SpamAssassin(\%settings);
$MailScanner::SA::SAspamtest->compile_now();

It is this last call that never finishes, unless the line
$self->do_full_eval_tests($priority, \$fulltext);
is commented out.


Everything works fine with the same config on a Linux box and on a
Solaris 9 SPARC box.



Any ideas?



--
Pascal





SARE sa-update channels available!

2006-08-13 Thread Daryl C. W. O'Shea

Hello all,

For those of you interested in SpamAssassin's sa-update, I've created
sa-update channels for all of the rules found at the SpamAssassin Rules
Emporium website (http://www.rulesemporium.com/rules.htm).

Brief directions for use are as follows:

- download the channels' GPG key from:

http://daryl.dostech.ca/sa-update/sare/GPG.KEY

- import that key into sa-update's keyring:

sa-update --import GPG.KEY

- add the channels you want to a channel file (text file):

updates.spamassassin.org
70_sare_adult.cf.sare.sa-update.dostech.net
70_sare_spoof.cf.sare.sa-update.dostech.net

etc...

- run sa-update -- tell it to use your channel file and to trust the
  channels' GPG key:

sa-update --channelfile your-channel-file.txt --gpgkey 856AA88A
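If you subscribe to many SARE rulesets, it can help to keep the channel list in a script and regenerate the file from it. The sketch below is a minimal Python helper, assuming Python 3.9+; the channel names, the file name and the 856AA88A key ID come from the directions above, while the helper function names are hypothetical. It only produces the file text and the command line; it does not run sa-update itself.

```python
from pathlib import Path

# Channels from the directions above; further SARE rulesets follow the same
# "<ruleset>.sare.sa-update.dostech.net" naming pattern.
CHANNELS = [
    "updates.spamassassin.org",
    "70_sare_adult.cf.sare.sa-update.dostech.net",
    "70_sare_spoof.cf.sare.sa-update.dostech.net",
]

# GPG key ID that sa-update is told to trust for these channels.
SARE_KEY_ID = "856AA88A"


def channel_file_text(channels: list[str]) -> str:
    """Return the channel file contents: one channel name per line."""
    return "".join(f"{name}\n" for name in channels)


def sa_update_argv(channel_file: Path, key_id: str = SARE_KEY_ID) -> list[str]:
    """Build the sa-update command line used in the directions above."""
    return ["sa-update", "--channelfile", str(channel_file), "--gpgkey", key_id]


if __name__ == "__main__":
    path = Path("your-channel-file.txt")
    print(channel_file_text(CHANNELS), end="")
    # Write the text with path.write_text(...) and, after
    # `sa-update --import GPG.KEY`, pass this list to subprocess.run().
    print(" ".join(sa_update_argv(path)))
```

Keeping the list in one place also makes it easy to see which SARE files to delete from the local site directory, as noted below.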


Slightly more verbose directions are available here:

http://daryl.dostech.ca/sa-update/sare/sare-sa-update-howto.txt


Also note that you'll want to remove any of the SARE rulesets updated
above from your local site directory (often /etc/mail/spamassassin/) to
keep them from overriding the ones installed by sa-update.


Regards,

Daryl



SPF softfail when mail has been forwarded from another domain

2006-08-13 Thread Andreas Pettersson

Hi all.

I've noticed a problem. We receive a few legitimate mails that have travelled 
through a forwarder, and that causes problems for the SPF check.
Since a mail claiming to be from Hotmail clearly doesn't arrive 
directly from one of the machines listed in Hotmail's SPF record, 
SPF_SOFTFAIL adds another 1.4 points.
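The reason forwarding trips SPF is that the check is made against the IP address of the host that connected to you, not against the server that originally sent the mail. The Python sketch below is a deliberately simplified model (Python 3.9+, only IP-range mechanisms plus a trailing "all" qualifier, no DNS lookups). The authorized range is a hypothetical documentation prefix, not Hotmail's real SPF record, and the two IPs are placeholders for a Hotmail server and the forwarder.

```python
import ipaddress

# Hypothetical SPF policy for the sender's domain: one authorized range
# (a documentation prefix, NOT Hotmail's real record) ending in "~all".
AUTHORIZED_NETS = [ipaddress.ip_network("192.0.2.0/24")]
ALL_QUALIFIER = "~"  # "~all" -> softfail, "-all" -> fail

QUALIFIER_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(connecting_ip: str,
               authorized=AUTHORIZED_NETS,
               all_qualifier: str = ALL_QUALIFIER) -> str:
    """Evaluate the simplified policy for the host that connected to us."""
    addr = ipaddress.ip_address(connecting_ip)
    if any(addr in net for net in authorized):
        return "pass"
    # No mechanism matched, so the trailing "all" decides the result.
    return QUALIFIER_RESULTS[all_qualifier]


if __name__ == "__main__":
    # Delivered directly by one of the sender's (hypothetical) servers.
    print(spf_result("192.0.2.25"))    # pass
    # Same message, but handed to us by a forwarder outside that range.
    print(spf_result("198.51.100.7"))  # softfail
```

The message is identical in both cases; only the connecting host changes, which is exactly what happens when mail passes through a forwarder.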


What can I do to prevent this from happening?
Is there a generic solution, or do I need to know which servers 
I might receive forwarded mail from?


I'm running SA 3.1.3 on FreeBSD.
Below is a snip of a mail that got hit by softfail because of forwarding.


Regards,
Andreas




Received: from mail.forwardingdomain.com
 by mail.mydomain.com with smtp
 (envelope-from <[EMAIL PROTECTED]>)
 for [EMAIL PROTECTED]; Fri, 11 Aug 2006 14:54:13 +0200
Received: (qmail 13341 invoked by uid 729); 11 Aug 2006 12:54:00 -
Delivered-To: [EMAIL PROTECTED]
Received: (qmail 13326 invoked from network); 11 Aug 2006 12:53:59 -
Received: from bay0-omc3-s32.bay0.hotmail.com
 by mail.forwardingdomain.com with SMTP; 11 Aug 2006 12:53:59 -
Received: from hotmail.com by bay0-omc3-s32.bay0.hotmail.com;
 Fri, 11 Aug 2006 05:53:57 -0700
Received: from mail pickup service by hotmail.com;
 Fri, 11 Aug 2006 05:53:57 -0700
Received: from 64.4.19.200 by by109fd.bay109.hotmail.msn.com with HTTP;
 Fri, 11 Aug 2006 12:53:54 GMT
X-Originating-IP: [zz.zz.zz.zz]
X-Originating-Email: [EMAIL PROTECTED]
X-Sender: [EMAIL PROTECTED]
From: "User" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]




Re: SPF softfail when mail has been forwarded from another domain

2006-08-13 Thread Loren Wilton
> I've noticed a problem. We receive a few legit mails that have travelled 
> through a forwarder. That causes some problems for the SPF check.
> Since the mail claiming to be from hotmail clearly doesn't arrive directly 
> from one of the machines listed in hotmail's spf record, the SPF_SOFTFAIL 
> kicks in another 1.4 points.
>
>
> What can I do to prevent this from happening?


What you've described is the basic problem with SPF. It works fine as long 
as mail isn't forwarded and doesn't otherwise come from unauthorized sources, 
like the salesman closing a deal at the corner wireless hotspot and 
sending it in directly from his laptop.


There are only three things you can do if this is causing you a problem:
1. Disable SPF checks.
2. Reduce the score on some or all of the SPF checks.
3. Whitelist, or otherwise give a positive adjustment to, specific senders.

None of those are particularly attractive, but you might have to do one 
of them.


Now, there is another consideration. The SPF check is only adding 1.4 
points. If your threshold is the default 5 points, the mail needs to hit a 
few other rules before it is classified as spam. If you have lowered the 
threshold to something like 2.0, well, there's your problem. The SPF rules 
(and all the other rules) were scored for a threshold of 5 points. If you 
use a lower threshold, you should reduce all of the rule scores 
proportionally. Since that is a big job, it is simpler to just leave the 
threshold at 5.
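The proportional rescaling Loren describes is plain arithmetic: multiply every score by new_threshold / 5. The Python sketch below, assuming Python 3.9+, applies it to the 1.4-point SPF_SOFTFAIL hit from this thread and formats the result as `score` lines; the function names are hypothetical, and a real rescale would have to cover every rule, which is the "big job" mentioned above.

```python
DEFAULT_THRESHOLD = 5.0  # the threshold the stock rule scores were tuned for


def rescale_scores(scores: dict[str, float], new_threshold: float,
                   old_threshold: float = DEFAULT_THRESHOLD) -> dict[str, float]:
    """Scale each score so it keeps the same weight relative to the threshold."""
    factor = new_threshold / old_threshold
    return {name: round(score * factor, 3) for name, score in scores.items()}


def score_lines(scores: dict[str, float]) -> list[str]:
    """Format scores as 'score RULE VALUE' config lines."""
    return [f"score {name} {value:g}" for name, value in sorted(scores.items())]


if __name__ == "__main__":
    rescaled = rescale_scores({"SPF_SOFTFAIL": 1.4}, new_threshold=2.0)
    print(rescaled)               # {'SPF_SOFTFAIL': 0.56}
    print(score_lines(rescaled))  # ['score SPF_SOFTFAIL 0.56']
```

Either way the hit is worth 28% of the threshold (1.4 / 5 = 0.56 / 2), which is the point: a lowered threshold without lowered scores makes every rule relatively heavier.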


   Loren



Razor vs Pyzor

2006-08-13 Thread David Baron
Which is best and what do these actually offer over spamassassin's own 
rulesets?


Re: SPF softfail when mail has been forwarded from another domain

2006-08-13 Thread Andreas Pettersson

Loren Wilton wrote:

>> I've noticed a problem. We receive a few legit mails that have 
>> travelled through a forwarder. That causes some problems for the SPF 
>> check.
>> Since the mail claiming to be from hotmail clearly doesn't arrive 
>> directly from one of the machines listed in hotmail's spf record, the 
>> SPF_SOFTFAIL kicks in another 1.4 points.
>>
>>
>> What can I do to prevent this from happening?
>
>
>
> What you've described is the basic problem with SPF.  It works fine as 
> long as things don't get forwarded, or otherwise come from 
> unauthorized sources, like the salesman closing a deal down at the 
> corner wireless hotspot and sending the deal in directly from his laptop.
>
>
> There are only three things you can do if this is causing you a problem:
> 1 Disable SPF checks
> 2 Reduce the score on some or all of the SPF checks
> 3 Whitelist or otherwise provide a positive adjustment for specific 
> senders.
>
>
> None of those are particularly attractive things to do.  However, you 
> might have to do one of them.
>
>
> Now, there is another consideration.  The SPF check is only adding 1.4 
> points.  If your limit is the default 5 points, then you need to hit a 
> few other rules before the mail becomes a spam.  If you have taken the 
> threshold down to something like 2.0, well, there's your problem.  
> The SPF rules (and all the rules) were scored for a threshold of 5 
> points.  If you are using a lower threshold you should reduce all of 
> the rule scores proportionally. Since that is a big job, it is simpler 
> to just leave the threshold at 5.
>
>
>    Loren


Thanks for an excellent answer, Loren.
I have kept the threshold at 5 points, so there's still a pretty comfortable 
margin. But as long as users continue to write subjects in caps with 
exclamation marks (like "IMPORTANT!!!"), combined with HTML-only messages, 
RFC-Ignorant-listed senders and GIF attachments, there's also a risk of 
false positives (FPs).


Looking at the 3rd option, what would be an effective way to whitelist 
(or subtract some score from) specific relays?



Regards,
Andreas



Re: SPAM: Increase in targeted spams

2006-08-13 Thread Michael Scheidell
John Rudd wrote:
>
> On Aug 12, 2006, at 7:42 AM, Michael Scheidell wrote:
>
>>> It is very easy to unsubscribe at
>>> genutrust.com/trust . It would be impossible to get all the
>>
>> Even easier to add scores to SA rules so that thousands of users don't
>> have to individually unsubscribe from your partners lists.
>>
>> Also, violations of whois registration rules (surprise) phone number is
>> invalid and email bounces.
>
>
> Have they been reported to RFC-Ignorant?  They have an RBL for people
> with bogus whois data.  And SA already has rules for RFCI listed hosts.
>
yes


-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131



RE: Razor vs Pyzor

2006-08-13 Thread Michael Scheidell


> -Original Message-
> From: David Baron [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, August 13, 2006 6:12 AM
> To: users@spamassassin.apache.org
> Subject: Razor vs Pyzor
> 
> 
> Which is best and what do these actually offer over spamassassin's own 
> rulesets?
> 
Pyzor was a fork of Razor.

Rumor has it that Pyzor may be more CPU-intensive or slower than Razor.

Both take extra CPU cycles, but we use Razor and believe it is worth it.

> 


Re: SPF softfail when mail has been forwarded from another domain

2006-08-13 Thread Benny Pedersen
On Sun, August 13, 2006 10:46, Andreas Pettersson wrote:

> What can I do to prevent this from happening?

Generically, there are two solutions:

1: stop using forwarding
2: set up trusted_networks to include the IP of the forwarding MTA

Either should help with your problem.

forwarding really sucks
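Benny's second option works by changing which relay the SPF check looks at. In a simplified model (an assumption; SpamAssassin's real trust-path logic handles more cases), the relays are walked from the most recent hop backwards, hops inside trusted_networks are skipped, and the first untrusted relay is the one whose IP is checked against the sender's SPF record. The Python sketch below (Python 3.9+) uses hypothetical documentation IPs for the Hotmail server and the forwarder; in SpamAssassin itself the corresponding change would be a `trusted_networks` line in local.cf listing the forwarder's IP.

```python
import ipaddress
from typing import Optional


def first_untrusted_relay(relay_ips: list[str],
                          trusted_networks: list[str]) -> Optional[str]:
    """Return the first relay IP outside trusted_networks, newest hop first.

    relay_ips lists the hosts that handed the message on, starting with the
    one that connected to our own MX (i.e. read from the top Received
    header downwards).
    """
    trusted = [ipaddress.ip_network(net) for net in trusted_networks]
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in trusted):
            return ip
    return None  # every hop was trusted


if __name__ == "__main__":
    # Hypothetical path: Hotmail server -> forwarder -> our MX.
    path = ["198.51.100.7",  # forwarder (connected to our MX)
            "192.0.2.25"]    # Hotmail's outbound server
    print(first_untrusted_relay(path, []))                   # 198.51.100.7
    print(first_untrusted_relay(path, ["198.51.100.7/32"]))  # 192.0.2.25
```

With the forwarder trusted, the IP that gets checked is the Hotmail server's, which Hotmail's record does list. Only trust forwarders you actually know, since everything they relay is then taken at face value.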

-- 
Benny



users@spamassassin.apache.org

2006-08-13 Thread Michael Di Martino
> Which is best and what do these actually offer over spamassassin's own 
> rulesets?
> 

So how does razor differ from SA's ruleset?
Regards,
Michael Di Martino
Director of MIS
The telx Group
Office: 212 480 3300  X.2022
Cell: 646 207 6603
[EMAIL PROTECTED]


Re: users@spamassassin.apache.org

2006-08-13 Thread Justin Mason

Michael Di Martino writes:
> > Which is best and what do these actually offer over spamassassin's own 
> > rulesets?
> > 
> 
> So how does razor differ from SA's ruleset?

it's entirely different -- it's a hash-sharing system, with parts
similar to SURBL.  Hard to tell, really, though, as it's proprietary
and secret ;)

--j.


RE: Registrar RBL: nomination and scoring

2006-08-13 Thread John D. Hardin
On Sat, 12 Aug 2006, Rob McEwen wrote:

> >I'm not sure zone transfers will be feasible, since the registrar
> >determination will be made dynamically.
> 
> I think, to prevent processing overloads, you might want to cache
> results at least for a period of minutes and not recalculate
> results for every query. I'm sure this isn't something that
> changes that much minute to minute.

But of course! I was thinking of a TTL on the order of a week.

> There still remains the question about what **exactly** should the
> numerator and the denominator be when calculating that percentage?
> Any ideas yet?

Not from me.

It might be useful to bring this up on news.admin.net-abuse.email
(n.a.n.e) and see what the denizens there have to say.

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  They [the Republicans] have written a new constitution for Iraq
  and ignored the Constitution here at home.
 -- Julian Bond, www.tompaine.com
---



Re: Registrar RBL: nomination and scoring

2006-08-13 Thread John D. Hardin
On Sun, 13 Aug 2006, Benny Pedersen wrote:

> On Sun, August 13, 2006 02:11, John D. Hardin wrote:
> > On Sat, 12 Aug 2006, John Rudd wrote:
> >
> > 127.0.0.1 ... 127.0.0.100 perhaps? How would a rule to score points
> > based on the returned IP look? Can/does SA cache the returned IP and
> > test it in multiple rules without making multiple DNS queries?
> 
> Yes, I have created an example.cf for SA.

Good.

...is there any way to write a rule that mathematically bases the
score points on the IP returned, without having 100 rules (one for
each score point)?
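One way to get a score that scales with the answer, instead of writing 100 rules, is to decode the last octet of the 127.0.0.N reply as the percentage and map it to points with a formula. The Python sketch below (Python 3.9+) does this outside SpamAssassin; the linear mapping and the `max_score` and `min_percent` parameters are hypothetical choices, and it makes no claim that SpamAssassin's rule syntax can express such a formula directly.

```python
import ipaddress
from typing import Optional

LOOPBACK_24 = ipaddress.ip_network("127.0.0.0/24")


def percent_from_answer(answer: str) -> Optional[int]:
    """Decode a 127.0.0.N reply (N = 1..100) into a spammer percentage."""
    addr = ipaddress.ip_address(answer)
    if addr not in LOOPBACK_24:
        return None
    last_octet = addr.packed[-1]
    return last_octet if 1 <= last_octet <= 100 else None


def score_from_percent(percent: Optional[int], max_score: float = 5.0,
                       min_percent: int = 0) -> float:
    """Scale linearly: 100% earns max_score; below min_percent earns nothing."""
    if percent is None or percent < min_percent:
        return 0.0
    return round(max_score * percent / 100, 2)


if __name__ == "__main__":
    pct = percent_from_answer("127.0.0.60")  # registrar rated 60% spammers
    print(pct, score_from_percent(pct))      # 60 3.0
    # John Rudd's conservative policy: only act at 99% or higher.
    print(score_from_percent(pct, min_percent=99))  # 0.0
```

Since a single DNS answer carries the whole percentage, one query per domain suffices no matter how many score bands are derived from it.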




Re: users@spamassassin.apache.org

2006-08-13 Thread Theo Van Dinter
On Sun, Aug 13, 2006 at 09:08:50AM -0400, Michael Di Martino wrote:
> So how does razor differ from SA's ruleset?

Razor compares MIME part hashes and URI domain hashes to a central
database where people have reported that "this is spam".

SA's ruleset looks for spammy components of messages, including calling
Razor and a bunch of other network-based services which help determine
ham vs spam.

-- 
Randomly Generated Tagline:
"Hoping the problem magically goes away by ignoring it is the 'Microsoft
 approach to programming' and should never be allowed."  - Linus Torvalds




Re: users@spamassassin.apache.org

2006-08-13 Thread John D. Hardin
On Sun, 13 Aug 2006, Michael Di Martino wrote:

> > Which is best and what do these actually offer over spamassassin's own 
> > rulesets?
> 
> So how does razor differ from SA's ruleset?

The basic difference is that SA rules try to analyze the message to
determine "does this message look like spam?" Razor et al. are
checksum tests with result sharing, so razor answers the question "has
anyone else already seen this exact message part (body, image
attachment, etc.) and determined that it is spam?"
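The checksum-sharing idea described above can be sketched in a few lines of Python (3.9+). This is an illustration only: it uses plain SHA-1 over exact bytes and an in-memory set standing in for the central database, whereas Razor has its own signature schemes and network protocol. The class and function names are hypothetical.

```python
import hashlib


def part_digest(part: bytes) -> str:
    """Fingerprint one message part (body text, image attachment, ...)."""
    return hashlib.sha1(part).hexdigest()


class SharedSpamHashes:
    """In-memory stand-in for a central 'reported as spam' database."""

    def __init__(self) -> None:
        self._reported: set[str] = set()

    def report(self, part: bytes) -> None:
        """A user reports this part as spam; only its hash is shared."""
        self._reported.add(part_digest(part))

    def seen_as_spam(self, part: bytes) -> bool:
        return part_digest(part) in self._reported


def message_hits(parts: list[bytes], db: SharedSpamHashes) -> bool:
    """True if anyone has already reported any part of this message."""
    return any(db.seen_as_spam(p) for p in parts)


if __name__ == "__main__":
    db = SharedSpamHashes()
    db.report(b"Cheap watches, click here")
    print(message_hits([b"Hi Bob", b"Cheap watches, click here"], db))  # True
    # An exact hash misses even a one-character change.
    print(message_hits([b"Cheap watches, click here!"], db))           # False
```

The last line shows the weakness of exact matching, which is one reason services of this kind may use fuzzier signatures than a plain cryptographic hash.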

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The fetters imposed on liberty at home have ever been forged out
  of the weapons provided for defense against real, pretended, or
  imaginary dangers from abroad.   -- James Madison, 1799
---



Re: users@spamassassin.apache.org

2006-08-13 Thread David Baron
On Sunday 13 August 2006 18:44, Theo Van Dinter wrote:
> On Sun, Aug 13, 2006 at 09:08:50AM -0400, Michael Di Martino wrote:
> > So how does razor differ from SA's ruleset?
>
> Razor compares MIME part hashes and URI domain hashes to a central
> database where people have reported that "this is spam".
>
> SA's ruleset looks for spammy components of messages, including calling
> Razor and a bunch of other network-based services which help determine
> ham vs spam.

So one does not need to actually use Razor explicitly?


Re: users@spamassassin.apache.org

2006-08-13 Thread Loren Wilton

> So one does not need to actually use Razor explicitly?


One does not need to use Razor at all. It is a network test, and you can 
run with network tests disabled. You can also run with network tests 
enabled but specifically disable Razor. And I'm sure there are many admins 
who do this for one reason or another.


You can think of spamassassin as having two, or maybe three, kinds of tests 
for detecting spam.


It has a bunch of local rules, which are more or less simple expressions 
looking for particular patterns in the message.


It has Bayes (which is optional) that collects words and phrases from the 
mail messages and correlates them with similar things from other mails that 
have been received in the past.  If a lot of tokens look like the kind of 
stuff in spam, Bayes suggests this might be spam.  If a lot of tokens look 
like past ham, Bayes suggests this message might be ham.


And finally it has network tests.  It collects bits and pieces from the 
arriving mail messages and queries a bunch of internet spam databases, 
asking each one "does this look like spam".  Each database, if it answers at 
all, will basically say yes or no.  These answers then get added to the spam 
score for the message.


Now, how do these internet databases decide if the stuff they see is spam? 
Lots of different ways.  There is no requirement that any two do even 
slightly the same thing.
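Whichever kind of test fires (local rule, Bayes or network test), the results are combined the same way: each hit adds its score, which may be negative when a test points towards ham, and the message is tagged as spam once the total reaches the required score. A minimal Python sketch, assuming Python 3.9+; the rule names and scores are hypothetical placeholders, not real SpamAssassin values.

```python
# Hypothetical rule scores, one per kind of test (NOT real SpamAssassin values).
SCORES = {
    "HYPO_LOCAL_PATTERN": 1.0,  # local rule: a pattern in the message text
    "HYPO_BAYES_SPAMMY": 2.5,   # Bayes: tokens resemble past spam
    "HYPO_BAYES_HAMMY": -1.5,   # Bayes: tokens resemble past ham
    "HYPO_NET_DB_YES": 2.0,     # network test: a database answered "spam"
}


def classify(hits: list[str], scores: dict[str, float],
             required: float = 5.0) -> tuple[float, bool]:
    """Sum the scores of the tests that fired and compare with the threshold."""
    total = round(sum(scores.get(name, 0.0) for name in hits), 3)
    return total, total >= required


if __name__ == "__main__":
    print(classify(["HYPO_LOCAL_PATTERN", "HYPO_BAYES_SPAMMY",
                    "HYPO_NET_DB_YES"], SCORES))                   # (5.5, True)
    print(classify(["HYPO_LOCAL_PATTERN", "HYPO_BAYES_HAMMY"], SCORES))  # (-0.5, False)
```

No single test decides the outcome; a network "yes" only matters in combination with whatever else fired.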


   Loren



Re: Registrar RBL: nomination and scoring

2006-08-13 Thread David Cary Hart
On Sat, 12 Aug 2006 17:11:34 -0700 (PDT), "John D. Hardin"
<[EMAIL PROTECTED]> opined:
> On Sat, 12 Aug 2006, John Rudd wrote:
> 
> > If someone does make a Registrar RBL and a Name Server RBL (both
> > of which are good ideas), _PLEASE_ do something like this:
> > 
> > a) have two lists for each RBL, one which has the above "kill the
> > bystanders" point of view, and one which is much more conservative
> > in its listing policies.
> 
> By listing policies I suppose you mean how offensive a registrar has
> to be to be put on the list. Can anyone suggest guidelines to use to
> make this decision?
>  
> > b) have an RBL which returns different values for different
> > confidence levels.  Something like a percentage of known spammers
> > are on that specific provider.  So, if a registrar is 60% spammers
> > and 40% bystanders, it will return "60"... and I can choose to
> > only block those who have a 99% or higher rating, or something.
> > This would also, hopefully, allow SA to give different score
> > values to different ratings.
> 
> 127.0.0.1 ... 127.0.0.100 perhaps? How would a rule to score points
> based on the returned IP look? Can/does SA cache the returned IP and
> test it in multiple rules without making multiple DNS queries?
> 

I actually considered doing this. However:

1. Maintenance is problematic.

2. Creating a consistent policy for listing and removal is
nearly impossible. Ultimately, the whole thing becomes very
arbitrary. 

3. It requires data that is unavailable. Unless one considers the
total of domains registered or served then the signal:noise becomes
incalculable. I would also note that there is no standardization of
whois data.

4. If you compare this to our PRC or Korea lists, a user can evaluate
whether or not they receive any valid email from those countries and
score accordingly.

5. I believe that our "quarantine" policy provides a real incentive
for administrators to lock down their servers. Yet that knowingly
creates a certain amount of ham. However there is a consistent and
pragmatic methodology associated with delisting.

-- 
Our DNSRBL - Eliminate Spam at the Source: http://www.TQMcube.com
   Don't Subsidize Criminals: http://boulderpledge.org


Re: Registrar RBL: nomination and scoring

2006-08-13 Thread John D. Hardin
On Sun, 13 Aug 2006, David Cary Hart wrote:

> > > b) have an RBL which returns different values for different
> > > confidence levels.
> > 
> > 127.0.0.1 ... 127.0.0.100 perhaps? How would a rule to score points
> > based on the returned IP look?
> 
> I actually considered doing this. However:
> 
> 1. Maintenance is problematic.
>
> 2. Creating a consistent policy for listing and removal is
> nearly impossible. Ultimately, the whole thing becomes very
> arbitrary. 

Not necessarily. 

Registrars' Terms of Service should be publicly available for review;
standards for ToS treatment of spammer behavior should be fairly easy
to develop and apply.

Registrars' responsiveness to complaints should be fairly easy to
track as well, and standards for that should also be possible.

Meta-question: *how much* responsibility for the domain-owner's
behavior does the registrar actually or reasonably bear? What form
does that responsibility take?

There might even be a consideration of how complete and accurate the
registrar's whois data is. A factor might be the registrar having lots
of obviously-bogus domain registration data that they are unwilling to
pursue correcting with the domain owners. Having correct domain owner
contact information is, after all, one of the responsibilities of a
legitimate registrar (modulo privacy issues - but if it's visible it
should be correct!).

> 3. It requires data that is unavailable. Unless one considers the
> total of domains registered or served then the signal:noise becomes
> incalculable.

True. However there are other factors (as noted above) that can be
used as a basis for a judgement that doesn't rely on knowing those
bits of data.

Remember, this rates the *registrar*, not the domains.

> I would also note that there is no standardization of whois data.

Also true, but for this the only whois data we need is the name of the
domain's registrar. We don't need to deal with the myriad of different
ways the registrars can present (or obscure) the actual registration
data.
 
> 4. If you compare this to our PRC or Korea lists, a user can
> evaluate whether or not they receive any valid email from those
> countries and score accordingly.

Agreed. The spam-friendliness of the registrar should only be a
component of the spam/ham decision, not the entire decision.

> 5. I believe that our "quarantine" policy provides a real incentive
> for administrators to lock down their servers. Yet that knowingly
> creates a certain amount of ham. However there is a consistent and
> pragmatic methodology associated with delisting.

"delisting" in this case would involve the registrar responding
promptly and effectively to complaints about the domains registered
with them, and having a ToS agreement that is not friendly to spam
behavior, and enforcing accurate domain ownership data.




Re: Registrar RBL: nomination and scoring

2006-08-13 Thread David Cary Hart
On Sun, 13 Aug 2006 10:26:28 -0700 (PDT), "John D. Hardin"
<[EMAIL PROTECTED]> opined:
> 
> Registrars' Terms of Service should be publicly available for
> review; standards for ToS treatment of spammer behavior should be
> fairly easy to develop and apply.
> 
> Registrars' responsiveness to complaints should be fairly easy to
> track as well, and standards for that should also be possible.
> 
> Meta-question: *how much* responsibility for the domain-owner's
> behavior does the registrar actually or reasonably bear? What form
> does that responsibility take?

And how much are you willing to pay for a domain?

I don't disagree with any of this. In fact, this could be a very
powerful economic boycott, which is why I thought about it. I am only
pointing out the administrative difficulties.

How would you suggest the query mechanism work? Most whois servers
impose some sort of volume limitation; many are extremely slow.

Therefore, this probably warrants an RHSBL with the registrar in the
TXT record. In turn, that requires getting a listing of all domains
registered by a listed registrar.

How do you keep up with transfers?

If someone can figure out the mechanics, I have a volunteer (working
on her MBA) who is great at crafting policy. I also have the mirrors
and structure. I am willing to add the zone. My first listing would
be Gandi.



Re: Improved OCR Plugin with approximate matching

2006-08-13 Thread decoder

decoder wrote:
> Hello there,
>
> I have improved the original OcrPlugin (found at
> http://wiki.apache.org/spamassassin/OcrPlugin), so it contains
> fuzzy matching. Like that, mistakes made by the OCR recognition or
> intentional obfuscations in the text don't make the recognition
> impossible. This is being done with a relative distance calculation
>  between the pattern (word from a given word list) and a line in
> the recognized input. Also, the plugin uses dynamic scoring (more
> matched words means more score, this can be adjusted in the
> source).
>
> You can find a full description and an example in the wiki under:
>
> http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
>
>
> Ideas for improvements or criticism are always welcome :)
>
>
> Best regards,
>
>
> Chris
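The "relative distance" idea can be illustrated with an edit distance divided by the pattern length. The Python sketch below (Python 3.9+) is not FuzzyOcr's code, which is a Perl plugin whose exact formula may differ: it compares each word-list entry with every token of the OCR output, counts the entry as matched when the relative distance is at most a hypothetical 0.3, and awards a hypothetical 0.5 points per matched word to mimic the dynamic scoring.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def relative_distance(pattern: str, token: str) -> float:
    """Edit distance scaled by pattern length (0.0 means identical)."""
    return levenshtein(pattern.lower(), token.lower()) / max(len(pattern), 1)


def ocr_score(words: list[str], ocr_lines: list[str],
              max_distance: float = 0.3,
              points_per_word: float = 0.5) -> tuple[float, list[str]]:
    """Score grows with the number of word-list entries found in the OCR text."""
    tokens = [tok for line in ocr_lines for tok in line.split()]
    matched = sorted(w for w in words
                     if any(relative_distance(w, t) <= max_distance for t in tokens))
    return len(matched) * points_per_word, matched


if __name__ == "__main__":
    lines = ["Buy V1AGRA now", "cheap c1alis here"]
    print(ocr_score(["viagra", "cialis", "mortgage"], lines))
    # (1.0, ['cialis', 'viagra'])
```

Dividing by the pattern length keeps one wrong character from counting as much in a long word as in a short one, so OCR slips and deliberate obfuscations like "V1AGRA" still match.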

Hello there,

I've just released version 2.1c, which fixes a problem when using
SpamAssassin + MailScanner (the score was always 1.0).

Thanks to Howard Kash for the bug report and patch.

Other (minor) changes:

- Fixed a typo (treshold -> threshold); if you use this variable
in your config, you need to fix it there too.
- Removed the '-' from jpegtopnm arguments to provide backwards
compatibility with older netpbm versions (as someone else mentioned here before)

The updated version can be found at the usual download URL (see the
SpamAssassin wiki under FuzzyOcr).


Best regards

Christian



Re: Registrar RBL: nomination and scoring

2006-08-13 Thread John D. Hardin
On Sun, 13 Aug 2006, David Cary Hart wrote:

> I don't disagree with any of this. In fact, this could be a very
> powerful economic boycott which is why I thought about it. I am
> only pointing out the administrative difficulties.
> 
> How would you suggest the query mechanism works? Most whois
> servers impose some sort of volume limitation; many are extremely
> slow.

There is caching. It shouldn't do a whois query for a given domain
more than once per TTL (which I default to a week). However the
initial surge of checking common domains may hit throttling.

Also, it doesn't need to go out to the actual registrar for all the
details, it just captures the registrar name from the root whois
query.

However, *most* domains won't be hosted by spam-friendly registrars,
and if whois gives you the finger this will return NXDOMAIN, so the
worst you'll get is a false negative response for a while, until a
definitive response *is* received.
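The caching behaviour described above (one whois lookup per domain per TTL, and an NXDOMAIN-style "unknown" answer when whois refuses to respond) might look like this in Python (3.9+). The `lookup` callable and the `WhoisUnavailable` exception are hypothetical placeholders for whatever actually queries the root whois server; nothing here is taken from John's beta code.

```python
import time
from typing import Callable, Optional

ONE_WEEK = 7 * 24 * 3600  # default TTL, in seconds


class WhoisUnavailable(Exception):
    """Hypothetical error raised when a whois server throttles or times out."""


class RegistrarCache:
    """Cache domain -> registrar answers for `ttl` seconds."""

    def __init__(self, lookup: Callable[[str], str], ttl: float = ONE_WEEK,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def registrar(self, domain: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(domain)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no whois query
        try:
            name = self._lookup(domain)
        except WhoisUnavailable:
            # Treat as "not listed" (NXDOMAIN) and do not cache, so the next
            # query retries; the worst case is a temporary false negative.
            return None
        self._entries[domain] = (now + self._ttl, name)
        return name
```

Not caching failures keeps a throttled whois server from pinning a domain as "unknown" for a whole week.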
 
> Therefore, this probably warrants an RHSBL with the registrar in the
> text record. In turn, that requires getting a listing of all
> domains registered by a listed registrar.

That's the sticking point. How and where do you obtain that
information? Do you have to become a registrar?
 
> How do you keep up with transfers?

If it's dynamically collected then transfers don't make sense. Sure,
you'll capture the known domains (ones that somebody has asked about
within the last $TTL seconds), but the unknown ones will all return
NXDOMAIN, leading to FNs.

Being able to download the domain->registrar information en masse
makes it *much* simpler, you can just reformat it as a zone file and
publish it. But then you lose the percentile support that the dynamic
server provides.
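If the domain-to-registrar data could be downloaded in bulk, turning it into an RHSBL zone is mostly reformatting. A Python sketch, assuming Python 3.9+: the listed-registrar set, the 127.0.0.2 answer and the one-week TTL are hypothetical choices. Names are written relative, so the zone's `$ORIGIN` (the RHSBL's own domain) gets appended, and the TXT record carries the registrar name as David suggested.

```python
ONE_WEEK = 604800  # record TTL in seconds (hypothetical choice)


def rhsbl_records(domain_to_registrar: dict[str, str],
                  listed_registrars: set[str],
                  ttl: int = ONE_WEEK) -> list[str]:
    """Emit A and TXT zone-file lines for domains held by listed registrars."""
    lines = []
    for domain in sorted(domain_to_registrar):
        registrar = domain_to_registrar[domain]
        if registrar not in listed_registrars:
            continue  # not listed: no record, so lookups get NXDOMAIN
        name = domain.rstrip(".").lower()  # relative to the zone's $ORIGIN
        lines.append(f"{name}\t{ttl}\tIN\tA\t127.0.0.2")
        lines.append(f'{name}\t{ttl}\tIN\tTXT\t"registrar={registrar}"')
    return lines


if __name__ == "__main__":
    data = {"spammy.example": "HYPOTHETICAL-REGISTRAR-A",
            "innocent.example": "HYPOTHETICAL-REGISTRAR-B"}
    print("\n".join(rhsbl_records(data, {"HYPOTHETICAL-REGISTRAR-A"})))
```

As noted above, a static zone like this gives up the percentage answers that the dynamic server can return.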

> If someone can figure out the mechanics, I have a volunteer
> (working on her MBA) who is great at crafting policy. I also have
> the mirrors and structure. I am willing to add the zone. My first
> listing would be Gandi.

I have a first cut beta available right now, if you want to try it
out. It's still rough so you have to edit the source to configure it,
but I'd be willing to get some feedback (apart from "OH MY GOD that's
hideous code! My eyes! AUGH!"). Contact me off-list if you're
interested.




Fwd: Report

2006-08-13 Thread Robert Nicholson
Why isn't

score MICROSOFT_EXECUTABLE 20

bumping the score up on these mails that have .exe attachments?

Begin forwarded message:

From: "Microsoft Internet Message Delivery System" <[EMAIL PROTECTED]>
Date: August 13, 2006 2:41:15 PM CDT
To: "Network Client" <[EMAIL PROTECTED]>
Subject: Report
X-Spam-Dcc: : grub.camros.com 1113; Body=1 Fuz1=1
X-Spam-Checker-Version: SpamAssassin 3.1.1 (2006-03-10) on grub.camros.com
X-Spam-Status: No, score=0.0 required=0.6 tests=BAYES_50,HTML_MESSAGE,
 MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI autolearn=ham version=3.1.1
Received: (qmail 386 invoked from network); 13 Aug 2006 19:41:10 -
Received: from smtp-2.orange.nl (193.252.22.242)
 by 64.34.193.12 with SMTP; 13 Aug 2006 19:41:10 -
Received: from jbqw (p0615.nas3-asd6.dial.wanadoo.nl [62.234.218.107])
 by mwinf6104.orange.nl (SMTP Server) with SMTP id 11FDB1C00088;
 Sun, 13 Aug 2006 21:41:15 +0200 (CEST)
X-Me-Uuid: [EMAIL PROTECTED]
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="ssyybkmmzsq"
Message-Id: <[EMAIL PROTECTED]>
X-Accept-Flag: Sender is Unknown
Lines: 2387

Re: Report

2006-08-13 Thread Loren Wilton



Because MICROSOFT_EXECUTABLE didn't hit on that message?
 
Because MICROSOFT_EXECUTABLE was a 2.x rule that was deleted in 3.0 and you 
are running 3.1.1?
 
        Loren





Re: Report

2006-08-13 Thread Robert Nicholson
Are you saying that 25_antivirus.cf doesn't have MICROSOFT_EXECUTABLE in 3.1.1?

On Aug 13, 2006, at 3:10 PM, Loren Wilton wrote:
> Because MICROSOFT_EXECUTABLE didn't hit on that message?
>
> Because MICROSOFT_EXECUTABLE was a 2.x rule that was deleted in 3.0 and you
> are running 3.1.1?
>
>         Loren

Re: Report

2006-08-13 Thread Michele Neylon :: Blacknight.ie
Robert Nicholson wrote:
> Are you saying that 25_antivirus.cf doesn't have MICROSOFT_EXECUTABLE in
> 3.1.1?
> 

That requires an extra plugin from what I can see:

# Requires the Mail::SpamAssassin::Plugin::AntiVirus plugin be loaded.
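For anyone wondering what a rule like MICROSOFT_EXECUTABLE is meant to catch: Windows executables begin with the two bytes "MZ". The Python sketch below (Python 3.9+) walks a message's MIME parts with the standard `email` package and flags any non-text part whose decoded content starts with those bytes, or any part whose filename ends in `.exe`. It is a rough stand-alone approximation for illustration, not the AntiVirus plugin's code, and it does not load or configure anything in SpamAssassin.

```python
from email.message import EmailMessage, Message


def has_windows_executable(msg: Message) -> bool:
    """Flag parts that look like Windows/DOS executables (MZ header or .exe name)."""
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        filename = (part.get_filename() or "").lower()
        if filename.endswith(".exe"):
            return True
        if part.get_content_maintype() != "text":
            payload = part.get_payload(decode=True) or b""
            if payload.startswith(b"MZ"):
                return True
    return False


if __name__ == "__main__":
    msg = EmailMessage()
    msg["Subject"] = "Report"
    msg.set_content("See attachment.")
    # Hypothetical attachment: just the first bytes of an MZ header, no real program.
    msg.add_attachment(b"MZ\x90\x00\x03\x00", maintype="application",
                       subtype="octet-stream", filename="report.bin")
    print(has_windows_executable(msg))  # True (MZ header despite the .bin name)
```

Checking the content as well as the name matters because, as in the forwarded example, such attachments do not always announce themselves with an .exe extension.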



-- 
Mr Michele Neylon
Blacknight Solutions
Quality Business Hosting & Colocation
http://www.blacknight.ie/
Tel. 1850 927 280
Intl. +353 (0) 59  9183072
Direct Dial: +353 (0)59 9183090
Fax. +353 (0) 59  9164239


Re: LOG: Re: Report

2006-08-13 Thread Robert Nicholson
Do I have to specifically enable that plugin? I have that installed.

On Aug 13, 2006, at 3:22 PM, Michele Neylon :: Blacknight.ie wrote:

Accepting to folder lists/unix/spamassassin-users
From: "Michele Neylon :: Blacknight.ie" <[EMAIL PROTECTED]>
Date: August 13, 2006 3:22:04 PM CDT
To: Robert Nicholson <[EMAIL PROTECTED]>, users@spamassassin.apache.org
Subject: Re: Report

Robert Nicholson wrote:
> Are you saying that 25_antivirus.cf doesn't have MICROSOFT_EXECUTABLE in
> 3.1.1?

That requires an extra plugin from what I can see:

# Requires the Mail::SpamAssassin::Plugin::AntiVirus plugin be loaded.

Re: SARE sa-update channels available!

2006-08-13 Thread DAve

Daryl C. W. O'Shea wrote:

Hello all,

For those of you interested in SpamAssassin's sa-update, I've created
sa-update channels for all of the rules found at the SpamAssassin Rules
Emporium website (http://www.rulesemporium.com/rules.htm).


Ya stole my thunder. I just came in from running a chainsaw all day and 
was beginning to work on that again. If you are interested, I'd be happy 
to mirror for you.


Two things I saw, maybe you covered them, maybe you don't care.

One, I had two URL vars in my script. A URL hitting my site so I could 
download rules as often as I wanted, and another URL that hit 
rulesemporium. Use the wrong URL too often and you get the following 
instead of a rules file,


AUTOBAN: Over 500 *.cf requests in 48 hours period - Check your CRON
CONTACT: [EMAIL PROTECTED]

So checking for updates too often can cause you to create a big pile of 
channel files that will not lint. Sorry Chris, I was trying to do 
laundry and code at the same time. I knew better too, which was why I 
had two URLs in the script.
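The autoban above (over 500 *.cf requests in 48 hours) suggests guarding any fetch script with a client-side limit. A minimal sketch in Python, assuming a hypothetical `SlidingWindowLimiter` that your own cron job would consult before each download; neither sa-update nor rulesemporium provides anything like it, and the 500/48-hour defaults are simply taken from the AUTOBAN message:

```python
from collections import deque


class SlidingWindowLimiter:
    """Permit at most `limit` requests in any rolling `window` of seconds."""

    def __init__(self, limit: int = 500, window: float = 48 * 3600) -> None:
        self.limit = limit
        self.window = window
        self._stamps = deque()  # timestamps of recently allowed requests

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False  # this fetch would exceed the limit; skip it
```

Running with some margin (say `limit=400`) is probably wiser than sitting right at the ban threshold.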


Two, the GPG key really only says the rules are valid from your server, 
it doesn't guarantee the rules are valid SARE rules. Not sure how to 
handle that, or if users/authors will even care. Possibly authors would 
be willing to tar, gzip, and sign their rules if they were provided an 
upload facility.


Just some thoughts. Thanks for taking the time to do this, I think it 
will be welcomed once the word gets out.


DAve








--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Re: Registrar RBL: nomination and scoring

2006-08-13 Thread John Rudd


On Aug 13, 2006, at 8:41 AM, John D. Hardin wrote:




There still remains the question about what **exactly** should the
numerator and the denominator be when calculating that percentage?
Any ideas yet?


Not from me.



I don't know either.  I base the general idea on the IronPort "Sender 
Base Reputation Score", but that's not an open content thing.  You can 
browse their database, but it won't tell you the actual -10 
(overwhelmingly likely to be a spam sender) to +10 (pure innocent 
angels of email) rating unless you've got a license.  You can set the 
IronPort box to whatever threshold you want for blocking sending hosts.



I like the idea of an RBL that gives ratings instead of binary values.
That's why I thought of it being a "confidence percentage" instead of 
just a "yes, we have them listed in the zone".  How to build that 
confidence rating is another matter entirely.


SBRS is a cross section of data sources and data items, whereas what 
we're talking about here is a single data item (whether or not we can 
trust a host based upon who its domain registrar is).  So it's not like 
we can start out by pulling data from multiple zones and building up a 
number based on how much we trust each zone and how many zones someone 
is listed in.  The only other thought I have, which is not going to be 
an immediate result, is simply to have people give feedback, over time, 
about different hosts ... and then have that feed into a database which 
tracks hosts and registrars to build up that confidence rating over 
time.
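One possible answer to the numerator/denominator question, sketched in Python under loudly stated assumptions: numerator = user reports calling a host from a given registrar a spam source, denominator = all reports about hosts from that registrar, with no rating published until a minimum number of reports exists. `RegistrarFeedback` and the 20-report floor are hypothetical; nothing like this exists in SpamAssassin.

```python
from collections import Counter
from typing import Optional


class RegistrarFeedback:
    """Hypothetical tally of user feedback about hosts, keyed by registrar."""

    def __init__(self, min_reports: int = 20) -> None:
        self.min_reports = min_reports  # below this, publish no rating at all
        self.spam_reports = Counter()
        self.total_reports = Counter()

    def record(self, registrar: str, is_spam: bool) -> None:
        self.total_reports[registrar] += 1
        if is_spam:
            self.spam_reports[registrar] += 1

    def confidence_percent(self, registrar: str) -> Optional[int]:
        """Spam reports as a percentage of all reports, or None if too few."""
        total = self.total_reports[registrar]  # Counter returns 0 if unseen
        if total < self.min_reports:
            return None
        return round(100 * self.spam_reports[registrar] / total)
```

The minimum-report floor is the design choice that matters most here: without it, a single complaint would give a registrar a 100% rating.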


Sorry, my idea is only half-baked so far :-}



Re: SARE sa-update channels available!

2006-08-13 Thread Daryl C. W. O'Shea

On 8/13/2006 4:49 PM, DAve wrote:

Daryl C. W. O'Shea wrote:


Hello all,

For those of you interested in SpamAssassin's sa-update, I've created
sa-update channels for all of the rules found at the SpamAssassin Rules
Emporium website (http://www.rulesemporium.com/rules.htm).



Ya stole my thunder. I just came in from running a chainsaw all day and 
was beginning to work on that again. If you are interested, I'd be happy 
to mirror for you.


Sorry about that.  I've actually had this running for about a month and
I got all my chainsaw work done last week while waiting five days for
power to be restored.  I wanted to fully test it and talk to some of the
folks from SARE before I made it public.

Judging by the traffic stats I was provided with, I think I should be
able to handle the traffic for a while anyway.  I do plan on writing
some code to efficiently update channel mirrors in a timely manner
though, so once that's done I'll be sure to let you know.



Two things I saw, maybe you covered them, maybe you don't care.

One, I had two URL vars in my script. A URL hitting my site so I could 
download rules as often as I wanted, and another URL that hit 
rulesemporium. Use the wrong URL too often and you get the following 
instead of a rules file,


AUTOBAN: Over 500 *.cf requests in 48 hours period - Check your CRON
CONTACT: [EMAIL PROTECTED]

So checking for updates too often can cause you to create a big pile of 
channel files that will not lint. Sorry Chris, I was trying to do 
laundry and code at the same time. I knew better too, which was why I 
had two URLs in the script.


Covered, thanks for pointing it out though.


Two, the GPG key really only says the rules are valid from your server, 
it doesn't guarantee the rules are valid SARE rules. Not sure how to 
handle that, or if users/authors will even care. Possibly authors would 
be willing to tar, gzip, and sign their rules if they were provided an 
upload facility.


I suppose they could.  It'd be a little more work for the channel users
though, having to import each key and include them in a trusted gpgkey
file.  Additionally it would require documentation to be updated for
every new ruleset, saying what key it uses.

I think that I'm well enough known to a lot of SA users that it won't be
an issue (heck, I could post crappy rules to the users' list that a lot
of people would probably blindly use).  Of course, I'd listen to
anyone's concerns otherwise.

Also, FWIW, I won't be modifying the rulesets.  Not even the
70_sare_whitelist_spf.cf file, which currently can't be updated (the
channel update will fail) because the file doesn't pass a --lint test if
the SPF plugin isn't enabled.  I've sent mail to Bob about this.  I'm
hoping that he adds the missing ifplugin lines soon.  See SA bug 5044.
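The missing guard amounts to wrapping the ruleset in an `ifplugin`/`endif` pair, so its rules are skipped (and `--lint` passes) when the SPF plugin isn't loaded. A minimal sketch of that transformation as a hypothetical Python helper; as Daryl says, the change itself belongs in the author's file, not in the channel:

```python
def wrap_in_ifplugin(cf_text: str, plugin: str) -> str:
    """Return cf_text guarded by an ifplugin/endif pair.

    Rules inside the guard are only parsed when `plugin` is loaded, so
    directives that only that plugin understands no longer break --lint.
    """
    body = cf_text.strip("\n")
    return f"ifplugin {plugin}\n\n{body}\n\nendif\n"
```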


Just some thoughts. Thanks for taking the time to do this, I think it 
will be welcomed once the word gets out.


No problem.  Thanks for the comments.


Daryl



Re: Registrar RBL: nomination and scoring

2006-08-13 Thread jdow

From: "John D. Hardin" <[EMAIL PROTECTED]>


On Sun, 13 Aug 2006, Benny Pedersen wrote:


On Sun, August 13, 2006 02:11, John D. Hardin wrote:
> On Sat, 12 Aug 2006, John Rudd wrote:
>
> 127.0.0.1 ... 127.0.0.100 perhaps? How would a rule to score points
> based on the returned IP look? Can/does SA cache the returned IP and
> test it in multiple rules without making multiple DNS queries?

Yes, I have created an example.cf for SA.


Good.

...is there any way to write a rule that mathematically bases the
score points on the IP returned, without having 100 rules (one for
each score point)?


Of course - look at the Bayes rules and "eval".
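Outside SpamAssassin's rule syntax, the arithmetic being asked about is just a linear scale. A sketch in Python, assuming the hypothetical 127.0.0.1 ... 127.0.0.100 encoding discussed above (last octet = confidence percentage) and an arbitrary `max_score`; wiring this into SA would take a plugin eval function, which this sketch does not attempt:

```python
import ipaddress

LOOPBACK_NET = ipaddress.IPv4Network("127.0.0.0/24")


def score_from_answer(answer: str, max_score: float = 5.0) -> float:
    """Scale a 127.0.0.N answer, read as an N% confidence, to a rule score."""
    addr = ipaddress.IPv4Address(answer)
    if addr not in LOOPBACK_NET:
        raise ValueError(f"not a 127.0.0.x answer: {answer}")
    percent = int(addr) & 0xFF  # last octet carries the confidence
    if not 1 <= percent <= 100:
        raise ValueError(f"confidence out of range: {percent}")
    return max_score * percent / 100
```

With the default `max_score`, 127.0.0.50 would score 2.5 points and 127.0.0.100 the full 5.0.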

{^_-}


Re: Report

2006-08-13 Thread jdow

SpamAssassin is not an anti-virus tool.
{^_^}
- Original Message - 
From: "Robert Nicholson" <[EMAIL PROTECTED]>


Are you saying that 25_antivirus.cf doesn't have MICROSOFT_EXECUTABLE
in 3.1.1?


On Aug 13, 2006, at 3:10 PM, Loren Wilton wrote:


Because MICROSOFT_EXECUTABLE didn't hit on that message?

Because MICROSOFT_EXECUTABLE was a 2.x rule that was deleted in 3.0
and you are running 3.1.1?




Re: Registrar RBL: nomination and scoring

2006-08-13 Thread John D. Hardin
On Sun, 13 Aug 2006, John Rudd wrote:

> I like the idea of an RBL that gives ratings instead of binary values.
> That's why I thought of it being a "confidence percentage" instead
> of just a "yes, we have them listed in the zone".  How to build
> that confidence rating is another matter entirely.

There's another option: develop a set of registrar behavior criteria
(e.g. "does not have a strong anti-spam AUP", "does not respond to
complaints", "does not enforce AUP", etc.) and assign bits to those
criteria. There wouldn't be a confidence score per se, but a bitmapped
report of why they are considered spam-friendly. If you don't want to
judge on a particular criterion, mask it out of your subtest.

It's also much less subjective.
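A sketch of the bitmapped idea in Python. The bit assignments (`NO_STRONG_AUP` and friends) are hypothetical, since no such zone exists; the point is that one DNS answer can carry several independent criteria, and each site picks the ones it cares about with a mask:

```python
import ipaddress

# Hypothetical bit assignments for the criteria listed above.
NO_STRONG_AUP = 1 << 0         # "does not have a strong anti-spam AUP"
IGNORES_COMPLAINTS = 1 << 1    # "does not respond to complaints"
DOES_NOT_ENFORCE_AUP = 1 << 2  # "does not enforce AUP"


def criteria_bits(answer: str) -> int:
    """Return the criteria bitmap carried in the last octet of 127.0.0.x."""
    addr = ipaddress.IPv4Address(answer)
    if addr not in ipaddress.IPv4Network("127.0.0.0/24"):
        raise ValueError(f"not a 127.0.0.x answer: {answer}")
    return int(addr) & 0xFF


def listed_for(answer: str, mask: int) -> bool:
    """True if any criterion selected by `mask` is set; the rest are ignored."""
    return (criteria_bits(answer) & mask) != 0
```

For example, a site that doesn't want to judge on AUP strength would test with `IGNORES_COMPLAINTS | DOES_NOT_ENFORCE_AUP`.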

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The fetters imposed on liberty at home have ever been forged out
  of the weapons provided for defense against real, pretended, or
  imaginary dangers from abroad.   -- James Madison, 1799
---



Re: SARE sa-update channels available!

2006-08-13 Thread Loren Wilton

channel update will fail) since the file doesn't pass a --lint test if
the SPF plugin isn't enabled.  I've sent mail to Bob about this.  I'm
hoping that he adds the missing ifplugin lines soon.  See SA bug 5044.


Bob's been REAL busy lately on his day job, but we're hoping he will get a 
little breathing time sometime soon.


   Loren



Re: Registrar RBL: nomination and scoring

2006-08-13 Thread jdow

From: "John Rudd" <[EMAIL PROTECTED]>


On Aug 13, 2006, at 8:41 AM, John D. Hardin wrote:




There still remains the question about what **exactly** should the
numerator and the denominator be when calculating that percentage?
Any ideas yet?


Not from me.



I don't know either.  I base the general idea on the IronPort "Sender 
Base Reputation Score", but that's not an open content thing.  You can 
browse their database, but it won't tell you the actual -10 
(overwhelmingly likely to be a spam sender) to +10 (pure innocent 
angels of email) rating unless you've got a license.  You can set the 
IronPort box to whatever threshold you want for blocking sending hosts.


I wonder what the reputation of homelinux.org is these days.
(I just posted a couple "rules" to the FC mailing list about them.
A spam was relayed through them to the list followed by two shills
who copied the entire message and complained at the bottom "pro
forma." This is not the first time this has happened.)

{^_^}


Re: SARE sa-update channels available!

2006-08-13 Thread DAve

Daryl C. W. O'Shea wrote:

On 8/13/2006 4:49 PM, DAve wrote:

Daryl C. W. O'Shea wrote:


Hello all,

For those of you interested in SpamAssassin's sa-update, I've created
sa-update channels for all of the rules found at the SpamAssassin Rules
Emporium website (http://www.rulesemporium.com/rules.htm).



Ya stole my thunder. I just came in from running a chainsaw all day 
and was beginning to work on that again. If you are interested, I'd be 
happy to mirror for you.


Sorry about that.  I've actually had this running for about a month and
I got all my chainsaw work done last week while waiting five days for
power to be restored.  I wanted to fully test it and talk to some of the
folks from SARE before I made it public.


Chainsaws, couldn't live without 'em. I hope all you lost were trees.



Judging by the traffic stats I was provided with, I think I should be
able to handle the traffic for a while anyway.  I do plan on writing
some code to efficiently update channel mirrors in a timely manner
though, so once that's done I'll be sure to let you know.



Two things I saw, maybe you covered them, maybe you don't care.

One, I had two URL vars in my script. A URL hitting my site so I could 
download rules as often as I wanted, and another URL that hit 
rulesemporium. Use the wrong URL too often and you get the following 
instead of a rules file,


AUTOBAN: Over 500 *.cf requests in 48 hours period - Check your CRON
CONTACT: [EMAIL PROTECTED]

So checking for updates too often can cause you to create a big pile 
of channel files that will not lint. Sorry Chris, I was trying to do 
laundry and code at the same time. I knew better too, which was why I 
had two URLs in the script.


Covered, thanks for pointing it out though.


Two, the GPG key really only says the rules are valid from your 
server, it doesn't guarantee the rules are valid SARE rules. Not sure 
how to handle that, or if users/authors will even care. Possibly 
authors would be willing to tar, gzip, and sign their rules if they 
were provided an upload facility.


I suppose they could.  It'd be a little more work for the channel users
though, having to import each key and include them in a trusted gpgkey
file.  Additionally it would require documentation to be updated for
every new ruleset, saying what key it uses.


We were thinking of going another way with that. We didn't consider the 
possibility of providing the author's key. Good point, we will make sure 
we don't.




I think that I'm well enough known to a lot of SA users that it won't be
an issue (heck, I could post crappy rules to the users' list that a lot
of people would probably blindly use).  Of course, I'd listen to
anyone's concerns otherwise.

Also, FWIW, I won't be modifying the rulesets.  Not even the
70_sare_whitelist_spf.cf file, which currently can't be updated (the
channel update will fail) because the file doesn't pass a --lint test if
the SPF plugin isn't enabled.  I've sent mail to Bob about this.  I'm
hoping that he adds the missing ifplugin lines soon.  See SA bug 5044.


Just some thoughts. Thanks for taking the time to do this, I think it 
will be welcomed once the word gets out.


No problem.  Thanks for the comments.


We might start using your channel until we get ours working the way we 
want:^)  Possibly instead of mirroring you, we could go ahead and offer 
a full set of files providing two independent sources. Just for 
availability's sake.


DAve

PS. If I could have any plugin for SA, it would be a Snopes plugin. Scan 
my inbox, check the message against snopes and score accordingly. I 
don't need another story sent to me by family about people bolting JATO 
packs to their cars or David Bowie and Mick Jagger sleeping together.





Re: Report

2006-08-13 Thread Robert Nicholson

Could it be because they use the following Content-Type?

Content-Type: audio/x-wav; name="hwrs.exe"

Disguising a .exe as a WAV?
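Whatever SA itself does with it, the declared type is easy to see past. A minimal Python sketch using the standard `email` package: it flags MIME parts whose filename (from `Content-Disposition`, or, as in this message, the `Content-Type` `name=` parameter) ends in an executable extension, and reports whether the decoded payload starts with the `MZ` header of a Windows executable. The extension list is an illustrative assumption, and this is not the AntiVirus plugin's code.

```python
from email import message_from_string

# Illustrative list of Windows executable extensions (an assumption).
EXE_SUFFIXES = (".exe", ".scr", ".pif", ".com", ".bat")


def find_executables(raw: str) -> list:
    """Return (filename, declared content type, starts with b'MZ') tuples
    for each MIME part whose filename looks like a Windows executable."""
    msg = message_from_string(raw)
    hits = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # get_filename() checks Content-Disposition filename= and falls
        # back to the Content-Type name= parameter.
        name = part.get_filename()
        if not name or not name.lower().endswith(EXE_SUFFIXES):
            continue
        payload = part.get_payload(decode=True) or b""
        hits.append((name, part.get_content_type(), payload.startswith(b"MZ")))
    return hits
```

For the message above this would report `hwrs.exe` with a declared type of `audio/x-wav`, regardless of what the sender claimed it was.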

On Aug 13, 2006, at 5:17 PM, jdow wrote:


SpamAssassin is not an anti-virus tool.
{^_^}
- Original Message - From: "Robert Nicholson"  
<[EMAIL PROTECTED]>


Are you saying that 25_antivirus.cf doesn't have  
MICROSOFT_EXECUTABLE in 3.1.1?

On Aug 13, 2006, at 3:10 PM, Loren Wilton wrote:

Because MICROSOFT_EXECUTABLE didn't hit on that message?

Because MICROSOFT_EXECUTABLE was a 2.x rule that was deleted in  
3.0 and you are running 3.1.1?




Re: Razor vs Pyzor

2006-08-13 Thread John Andersen
On Sunday 13 August 2006 02:12, David Baron wrote:
> Which is best, and what do these actually offer over SpamAssassin's own
> rulesets?

The intent of Razor is to use hashes of the body to identify spam by 
comparing it to previously reported spam. Spam previously trapped by
other means is reported to Razor. Originally, this was intended to be done
by manual review, but in truth, most of the spam hashes fed to Razor are
submitted automatically, by SpamAssassin and other tools.  So Razor ends up
being the recipient of a lot of intelligence from a wide variety of other tools.

If Razor says it's spam, you can give it a very high score.  In my case a high
Razor score alone is enough to send mail to /dev/null.
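The report-then-look-up idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in: Razor's actual signature schemes are more elaborate than a SHA-1 of whitespace-normalized text, and `ReportedBodies` is a hypothetical name.

```python
import hashlib
import re


def body_signature(body: str) -> str:
    """Hash of the body after collapsing whitespace and case (simplified)."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class ReportedBodies:
    """Hypothetical catalogue of signatures of previously reported spam."""

    def __init__(self) -> None:
        self._signatures = set()

    def report(self, body: str) -> None:
        # A reporter (human or automated tool) submits only the signature.
        self._signatures.add(body_signature(body))

    def is_known_spam(self, body: str) -> bool:
        return body_signature(body) in self._signatures
```

The normalization step is what lets trivially reformatted copies of the same spam still match; real systems go much further than this.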

-- 
_
John Andersen




autolearn never learn

2006-08-13 Thread Beast

local.cf:

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.2
bayes_auto_learn_threshold_spam 12.0

spam:
-
X-Spam-Status: Yes, score=17.9 required=5.2 tests=ADVANCE_FEE_1,ADVANCE_FEE_2,
 ADVANCE_FEE_3,ADVANCE_FEE_4,BAYES_99,DEAR_FRIEND,HTML_00_10,
 HTML_MESSAGE,SARE_MSGID_LONG40,SUB_HELLO autolearn=no version=3.1.4

From my understanding, SA should automatically learn any mail with a
score > 12 as spam and any mail with a score < 0.2 as ham. Am I correct?



--beast



Re: autolearn never learn

2006-08-13 Thread Theo Van Dinter
On Mon, Aug 14, 2006 at 11:21:00AM +0700, Beast wrote:
> From my understanding, SA should automatically learn any mail which has 
> score > 12 as spam and < 0.2 as a ham. Am I correct?

http://wiki.apache.org/spamassassin/AutolearningNotWorking
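Without repeating the wiki page, here is a simplified Python model of the checks that SpamAssassin's AutoLearnThreshold documentation describes, as I understand them: Bayes rules (and other rules flagged `learn`, `userconf` or `noautolearn`) don't count toward the autolearn score, and learning as spam additionally needs at least 3 points from header rules and 3 from body rules. Autolearn also uses a non-Bayes score set, so its total can differ from the 17.9 shown in X-Spam-Status. `RuleHit`, the two-bucket header/body split and the exact comparison operators are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RuleHit:
    name: str
    score: float         # score from the non-Bayes score set (assumed given)
    area: str            # "header" or "body" (simplified two-bucket split)
    learn: bool = False  # True for Bayes-style rules, which are ignored


def autolearn_decision(hits, ham_threshold=0.2, spam_threshold=12.0,
                       min_points_per_area=3.0):
    """Return "ham", "spam" or "no" for a list of RuleHit objects."""
    counted = [h for h in hits if not h.learn]
    total = sum(h.score for h in counted)
    header = sum(h.score for h in counted if h.area == "header")
    body = sum(h.score for h in counted if h.area == "body")
    if total < ham_threshold:
        return "ham"
    # Assumed comparison: at or above the spam threshold, plus the
    # header/body minimums noted for bayes_auto_learn_threshold_spam.
    if (total >= spam_threshold and header >= min_points_per_area
            and body >= min_points_per_area):
        return "spam"
    return "no"
```

The thresholds default to the values in the local.cf above. In this model, a message that scores well over 12 mostly from body rules still gets `autolearn=no` if its header rules contribute less than 3 points.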

-- 
Randomly Generated Tagline:
"In politics, absurdity is not a handicap." - NB




bayes not run on some mail

2006-08-13 Thread Beast

Hi,

For some (spam) mail that SA did not catch, it seems that Bayes is 
not applied to this mail.


X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 SARE_SPEC_ROLEX Rolex watch spam
X-Spam-Status: No, score=1.7 required=5.2 tests=HTML_MESSAGE,SARE_SPEC_ROLEX
autolearn=no version=3.1.4

Is the Bayes check not run on every mail?


--beast



Re: bayes not run on some mail

2006-08-13 Thread jdow

From: "Beast" <[EMAIL PROTECTED]>


Hi,

For some (spam) mail that SA did not catch, it seems that Bayes is 
not applied to this mail.


X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 SARE_SPEC_ROLEX Rolex watch spam
X-Spam-Status: No, score=1.7 required=5.2 tests=HTML_MESSAGE,SARE_SPEC_ROLEX
autolearn=no version=3.1.4

Is the Bayes check not run on every mail?


It is not run if you have not yet learned from at least 200 each of
spam and ham messages. You do not learn from all messages because the
scores are "indicative" rather than "certain" with regards to estimating
ham or spam properties. If you collect a random bunch of 200 or more
ham messages and 200 or more known spam messages and manually train
with them via sa-learn you can get Bayes working sooner.

{^_^}


Re: bayes not run on some mail

2006-08-13 Thread Beast

jdow wrote:

From: "Beast" <[EMAIL PROTECTED]>


Hi,

For some (spam) mail that SA did not catch, it seems that Bayes is 
not applied to this mail.


X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 SARE_SPEC_ROLEX Rolex watch spam
X-Spam-Status: No, score=1.7 required=5.2 tests=HTML_MESSAGE,SARE_SPEC_ROLEX
autolearn=no version=3.1.4

Is the Bayes check not run on every mail?


It is not run if you have not yet learned from at least 200 each of
spam and ham messages. You do not learn from all messages because the
scores are "indicative" rather than "certain" with regards to estimating
ham or spam properties. If you collect a random bunch of 200 or more
ham messages and 200 or more known spam messages and manually train
with them via sa-learn you can get Bayes working sooner.


It actually has a large enough learned corpus. I have been running this for
more than a year with manual training (trained daily by a human). Bayes was
working for most mail, but not for all mail.


[EMAIL PROTECTED] ~]# spamassassin --lint -D 2>&1 |  grep 'corpus size'
[12081] dbg: bayes: corpus size: nspam = 34035, nham = 7399

I will turn on autolearn, mostly because I need to feed more ham to SA
(so far I only feed ham for false positives, of which there are very few
each day, and I think that is not enough for SA).
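jdow's 200/200 rule can be checked directly against the dbg line above. A small Python sketch, assuming the minimums are SpamAssassin's `bayes_min_spam_num` and `bayes_min_ham_num`, both of which I believe default to 200; `bayes_ready` is a hypothetical helper:

```python
import re

# Matches the "corpus size" line printed by `spamassassin --lint -D`.
CORPUS_RE = re.compile(r"corpus size: nspam = (\d+), nham = (\d+)")


def bayes_ready(dbg_output: str, min_spam: int = 200, min_ham: int = 200) -> bool:
    """True if the reported corpus meets both minimum training counts."""
    match = CORPUS_RE.search(dbg_output)
    if match is None:
        raise ValueError("no 'corpus size' line found")
    nspam, nham = (int(g) for g in match.groups())
    return nspam >= min_spam and nham >= min_ham
```

By this check a corpus of 34035 spam and 7399 ham is far past the minimum, so corpus size alone shouldn't explain the missing BAYES_* hit.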



--beast



Re: bayes not run on some mail

2006-08-13 Thread Nigel Frankcom
On Mon, 14 Aug 2006 12:21:16 +0700, Beast <[EMAIL PROTECTED]> wrote:

>Hi,
>
> For some (spam) mail that SA did not catch, it seems that Bayes is 
>not applied to this mail.
>
>X-Spam-Report:
> * 0.0 HTML_MESSAGE BODY: HTML included in message
> * 1.7 SARE_SPEC_ROLEX Rolex watch spam
>X-Spam-Status: No, score=1.7 required=5.2 tests=HTML_MESSAGE,SARE_SPEC_ROLEX
> autolearn=no version=3.1.4
>
>Is the Bayes check not run on every mail?
>
>
>--beast


Are you using SQL for Bayes? I seem to recall a flock switch for use
with flat file bayes - though I wouldn't bet too much on my memory at
this time of the morning :-D

If you are using SQL, check the logs and see if you are maxing the
concurrent connections. Not sure how you get that from the command line, but it's
easy enough to grab with the SQL Admin tool.

Nigel


Re: bayes not run on some mail

2006-08-13 Thread Beast

Nigel Frankcom wrote:

On Mon, 14 Aug 2006 12:21:16 +0700, Beast <[EMAIL PROTECTED]> wrote:

  

Hi,

For some (spam) mail that SA did not catch, it seems that Bayes is 
not applied to this mail.


X-Spam-Report:
* 0.0 HTML_MESSAGE BODY: HTML included in message
* 1.7 SARE_SPEC_ROLEX Rolex watch spam
X-Spam-Status: No, score=1.7 required=5.2 tests=HTML_MESSAGE,SARE_SPEC_ROLEX
autolearn=no version=3.1.4

Is the Bayes check not run on every mail?


--beast




Are you using SQL for Bayes? I seem to recall a flock switch for use
with flat file bayes - though I wouldn't bet too much on my memory at
this time of the morning :-D

If you are using SQL, check the logs and see if you are maxing the
concurrent connections. Not sure how you get that from the command line, but it's
easy enough to grab with the SQL Admin tool.
  


It is not. I have this in local.cf, and I don't plan to use SQL as a backend:

bayes_path   /var/spamassassin/bayes
bayes_file_mode 0770

--beast