Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
> That is the output of --dump magic? I haven't ever seen it formatted that
> nicely. I assume you skipped the first line, but the expire atime delta is
> also missing. So where did you get this from? Not directly from sa-learn
> --dump magic, I'd say. You are running SA through some interface? You
> should have said something about the whereabouts of your installation.

You are right, I am using MailWatch. I just posted that output to make it 
easy to see the actual dates without having to convert them. Here is the 
actual output:

# /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
0.000  0          3  0  non-token data: bayes db version
0.000  0      49740  0  non-token data: nspam
0.000  0      47167  0  non-token data: nham
0.000  0     123325  0  non-token data: ntokens
0.000  0 1107319073  0  non-token data: oldest atime
0.000  0 1110636450  0  non-token data: newest atime
0.000  0 1108137790  0  non-token data: last journal sync atime
0.000  0 1108129534  0  non-token data: last expiry atime
0.000  0     804361  0  non-token data: last expire atime delta
0.000  0       3475  0  non-token data: last expire reduction count

> Ok. Get the values. Then learn a message to it. Make sure it says that it
> actually learned, then check the values again. Did either the spam or ham
> count increase by one, or not?

No, it didn't. This is exactly the point I mentioned. But as I said earlier, 
sa-learn claims it has learned, even from the web interface:
SA Learn: Learned from 1 message(s) (1 message(s) examined).

> Ok, this finally looks a bit suspicious. No sync and no expire for a month.
> If it doesn't sync you don't get new tokens. Check in your bayes directory
> how big your bayes_journal is. I'd think it's quite big. Do a sync now.
> (Please don't do it via an interface, do it on the command line.) What's
> the output? Is the journal gone and the number of tokens increased now? If
> so, you need to investigate why it doesn't sync anymore. Also do an expire
> then.
This is getting more suspicious: there is no bayes_journal file!
# ll /var/spool/MailScanner/bayes/
total 11780
drwxrwxrwx  2 root nobody 4096 Mar 14 00:22 .
drwxr-xr-x  4 root nobody 4096 Mar 13 11:55 ..
-rw-rw-rw-  1 root nobody 1236 Mar 14 00:22 bayes.mutex
-rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen
-rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks
I can assure you no one has touched anything inside this directory. If this 
is the reason for the problems I've been facing, is there a way to recreate 
the file without losing my current data? (Perhaps by copying the above files 
somewhere, executing sa-learn --clear, and restoring the above files some 
time later? Something like the sketch below.)
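
Paths here are as on my system; the backup location is just an example and 
I haven't tested any of this:

# cp -a /var/spool/MailScanner/bayes /root/bayes-backup
# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --clear
(... some time later ...)
# cp -a /root/bayes-backup/. /var/spool/MailScanner/bayes/

Or, if sa-learn --backup and --restore work on this installation (I haven't 
tried them), a text dump might be safer than copying the db files directly:

# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --backup > /root/bayes.txt
# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --restore /root/bayes.txt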

Thanks for your help



Re: Bayes DB does not grow anymore

2005-03-13 Thread Kai Schaetzl
GRP Productions wrote on Sun, 13 Mar 2005 22:54:22 +0200:

> Perhaps I have not been clear enough. It's not only that the files' size is 
> constant. I am pasting the output of dump magic,

That is the output of --dump magic? I haven't ever seen it formatted that 
nicely. I assume you skipped the first line, but the expire atime delta is 
also missing. So where did you get this from? Not directly from sa-learn 
--dump magic, I'd say. You are running SA through some interface? You should 
have said something about the whereabouts of your installation.

> and I have to explain that
> the nham and nspam values are the same for many days now.

Ok. Get the values. Then learn a message to it. Make sure it says that it 
actually learned, then check the values again. Did either the spam or ham 
count increase by one, or not?
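
For example, something like this (using the prefs file from your 
MailScanner setup; adjust the paths to your installation):

# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic | egrep 'nspam|nham'
# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --spam /path/to/message
# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic | egrep 'nspam|nham'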

> work fine. If I send to myself a message from Yahoo, with subject 'Viagra 
> sex teen " and other nice words, I certainly do not want it to pass. 
> Bayes classifies it as 50% spam.  I tried to sa-learn --forget, and then 
> re-learn, still is BAYES_50.

Again, this is NOT how Bayes works. You can't feed it one message and then 
expect it to flag that message as spam next time. Bayes does not work like 
this!
And the fact that it classifies that message as 50%, meaning it cannot 
determine whether it's ham or spam, just says that the tokens in the db are 
not good enough for that message. Or maybe it contains enough hammy tokens; 
whatever.
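
If you want to see which BAYES rule a message actually hits, run it through 
spamassassin directly, e.g. (the message path is just an example):

# spamassassin -t < message.txt | grep -i bayes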

> Number of Spam Messages: 49,740 
> Number of Ham Messages: 47,167 
> Number of Tokens: 123,325 
> Oldest Token: Wed, 2 Feb 2005 06:37:53 +0200 
> Newest Token: Sat, 12 Mar 2005 16:07:30 +0200 

Says it added/changed a token yesterday.

> Last Journal Sync: Fri, 11 Feb 2005 18:03:10 +0200 
> Last Expiry: Fri, 11 Feb 2005 15:45:34 +0200 
> Last Expiry Reduction Count: 3,475 tokens

Ok, this finally looks a bit suspicious. No sync and no expire for a month. If 
it doesn't sync you don't get new tokens. Check in your bayes directory how big 
your bayes_journal is. I'd think it's quite big. Do a sync now. (Please don't 
do it via an interface, do it on the command line.) What's the output? Is the 
journal gone and the number of tokens increased now? If so, you need to 
investigate why it doesn't sync anymore. Also do an expire then.
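
On the command line that would be something like this (with the prefs file 
from your MailScanner setup; adjust the path if yours differs):

# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --sync
# sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --force-expire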


Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread jdow
From: "List Mail User" <[EMAIL PROTECTED]>

> >   ...The person with two clocks is never really sure of
> > the current time.
> 
> OT, but... (the above is *not* a good quote, though it sounds nice.)
> To be `sure' of the time, you need at least three clocks (look at the
> documentation for ntp/ntpd).

And even that is a gross oversimplification. (And I COULD, at one time at
least, set up my system to be approximately 1 second off by picking the
wrong ntp servers. It seems GTEI.NET's time server used for their DNS
machines and first-hop routers was off, considerably.)

Reading the ntp/ntpd documentation is desirable in any case if one is
interested in precision time keeping.

{^_-}



Re: Tests results are different?

2005-03-13 Thread jdow
From: "David Suen" <[EMAIL PROTECTED]>

> Hi all, I installed SpamAssassin (3.0.2) on my linux box and it looks like
> it does not work (probably due to my configuration). I tried RTFM but still
> cannot make it work.
> 
> Situation:
> I use spamassassin with simscan + qmail. My problem is that when I use
> spamassassin (not spamc) + sample-spam.txt it gives me the correct result
> (and I tested another real spam with the 'keyword' "viagra"). However,
> when I use spamc it keeps saying the score is 0.0 (for all the emails).
> 
> Header using spamc:
> X-Spam-Status: spam=No, score=0.0 required=4.2 tests= none
> autolearn=unavailable
> 
> Header using spamassassin: X-Spam-Level: 
> X-Spam-Status: spam=Yes, score=4.3 required=4.2 tests= BIZ_TLD=0.527,
> DATE_IN_PAST_06_12=0.211,DRUGS_ERECTILE=0.026,DRUG_ED_CAPS=1.535,
> EXTRA_MPART_TYPE=0.222,SUBJECT_DRUG_GAP_VIA=1.77 autolearn=no
> 

That suggests something like "spamd is not running." For systems like
RedHat and Mandrake the incantation would be "service spamassassin
restart". (That stops it if it is running, then starts it.)

Or, if spamd is running, a firewall setting may be blocking it, it may not
be running on the local machine, or it may be running on a nonstandard
port number. The latter two conditions require that the address and
port options of spamc be set to the correct values to make the TCP
connection.
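
Something along these lines, for example (the hostname is made up; 783 is 
the usual spamd port):

# spamc -d mailhost.example.com -p 783 < sample-spam.txt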

{^_^}



Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
> This doesn't prove anything. sa-learn --dump magic shows you what's inside.
> Also, Bayes is not a checksum system like Razor; that's its strength. If
> you learn something to it, that means it extracts tokens (short pieces)
> from the message and adjusts its internal probability of them being ham or
> spam by a certain factor. Or, if it doesn't know that token yet, it adds
> it.
> That the size doesn't grow can have several reasons, f.i. expiry or the
> fact that the db format seems to have some "air" in it, so that it grows
> in jumps and not continually.
Perhaps I have not been clear enough. It's not only that the files' size is 
constant. I am pasting the output of dump magic, and I have to explain that 
the nham and nspam values have been the same for many days now. This is not 
normal, since we are talking about a very busy server (more than 4,000 
messages per day). This has not always been the case; it used to work fine. 
If I send myself a message from Yahoo with the subject "Viagra sex teen" 
and other nice words, I certainly do not want it to pass. Bayes classifies 
it as 50% spam. I tried sa-learn --forget and then re-learned; it is still 
BAYES_50. The nham and nspam values used to increase very rapidly 
(sometimes by 200-300 per day). No errors are produced. I wouldn't have 
noticed the problem, but fortunately during the last few days more spam 
than usual started getting through. Also, I tried to force an expiration 
many times, but as you can see the expiration did not take place. It's 
definitely not a file permission issue.

Thanks
Number of Spam Messages:      49,740
Number of Ham Messages:       47,167
Number of Tokens:             123,325
Oldest Token:                 Wed, 2 Feb 2005 06:37:53 +0200
Newest Token:                 Sat, 12 Mar 2005 16:07:30 +0200
Last Journal Sync:            Fri, 11 Feb 2005 18:03:10 +0200
Last Expiry:                  Fri, 11 Feb 2005 15:45:34 +0200
Last Expiry Reduction Count:  3,475 tokens



[RD] evilnumbers update & changes

2005-03-13 Thread Matt Yackley

Hi all,

I've released a new version of evilnumbers, and there are several changes 
in the new version.

Ruleset name change:
In order to get this old setup in line with current SARE standards, the 
name of the ruleset has changed from evilnumbers.cf to 70_sare_evilnum*.cf

Multiple files:
The set has been split into three different files:
70_sare_evilnum0.cf = hits 0 ham during SARE masschecks
70_sare_evilnum1.cf = hits a few ham, but most folks consider these messages spam
70_sare_evilnum2.cf = hit 0 spam & ham during last masscheck, but may come back

RulesDuJour:
A new version of RDJ will be released soon to handle these changes, but 
here is a manual fix.

In your RDJ or MyRDJ config file locate the evilnumbers entry and change the
following lines.
ADD = OLD_CF_FILES[8]="evilnumbers.cf"
CHANGE = CF_FILES[8]="70_sare_evilnum0.cf"
CHANGE = CF_URLS[8]="http://www.rulesemporium.com/rules/70_sare_evilnum0.cf";

Info on adding files 1 & 2 to RDJ
http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets

Language files:
If you use a local language file 98_text_**_evilnumbers.cf, please delete 
this file. The structure of the rules may change soon; if/when that happens 
I'll release updated language files.

Cheers,
matt










Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
It would probably help if I explained that I brought up two
different but related ideas in quick succession:

1.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL that mentioned SURBL URIs.

2.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL regardless of whether
those domains were already listed in SURBLs or not.

The latter may actually be more useful since it's broader and
more inclusive.  We could easily intersect them against SURBLs
ourselves if it were useful for other applications.

I believe this could be a valuable new data source.  It's true
that Spamhaus and others probably already have this data
internally but we don't.  ;-)  It's also possibly true that
existing trap based lists like ob.surbl.org and jp.surbl.org
may already have similar data in them.  As Paul notes there
is probably a lot of overlap between the various datasets
being used or proposed.

I'd probably ask for messages sent through XBL and list.dsbl.org
listed hosts since both lists are pretty reliable.  Completeness
of compromised host detection is probably non-essential for this
application.  The resulting dataset would be so large that missing
some fraction of zombies probably would not affect the end result
very much.  The sites of the biggest spammers would tend to
bubble to the top of a volume-ranked list.

Jeff C.
--
"If it appears in hams, then don't list it."



Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 8:07:47 AM, List User wrote:
> I think it would be useful, *but* Spamhaus is very good at adding
> IPs of sites that exploit the XBL - So you would see a significant overlap.
> [...]  And, SURBLs are RHS lists, so you will catch IP jumping that
> the SBL often misses (for a little while).

Yep slightly different but related tools.  As you note there are
advantages to the different list types.

> I don't believe that you will find `spamvertised' domains using
> exploited machines one day, and valid mailers later - Just a `new' exploited
> machine that hasn't made its way onto the lists yet (like IP jumping, being
> a RHS list is an advantage here too).

Exactly.  Any site advertised many times through zombie-delivered
spams is likely to belong to spammers and not whitehats.
Whitehats probably tend not to use zombies.

> Also, it wouldn't take a "major" joe job (or whatever the name for
> chafe that isn't personally directed would be - remember "joe job" refers
> to a specific spammer who was pissed at being thrown off joe.com).  You
> would just have to maintain a whitelist like you do now for people like
> w3c.org who are always being abused (or the phishing spam target companies,
> whose own pictures and logos usually appear, or newspapers and magazines
> who end up in 419s).

Yes our whitelist always applies, and additional processing and
testing would be done on the raw data before it was deemed usable.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread List Mail User
...
On Sun, 13 Mar 2005 05:29:04 -0800, Jeff Chan wrote:
>On Sunday, March 13, 2005, 5:12:30 AM, Jeff Chan wrote:
>> On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
>>> Does anyone have or know about a list of spam-advertised URIs
>>> where the spam they appeared in was sent through open relays,
>>> zombies, open proxies, etc.  In other words does anyone know
>>> of a list of spamvertised web sites or their domains that's
>>> been cross referenced to exploited hosts?
>
>>> We could use that information as a valuable tool for getting
>>> more records into SURBLs.
>
>> One fairly easy way for anyone running a large SpamAssassin
>> installation to help us get this data would be to simply grep
>> for "XBL" and "SURBL" rules hitting the same message and report
>> out the URI domains from those messages.
>
>> Perhaps some kind person could write a reporting function in
>> SpamAssassin for this?
>
>Hmm, perhaps if we could extract *all* URI domains from messages
>sent through XBLed senders and then prioritize them, say by frequency
>of appearance, we could create a new SURBL list of spamvertised
>domains sent through exploited hosts.  That would pretty directly
>address the use of zombies, etc. and put a penalty on using them
>to advertise sites.  Even with volume weighting such
>a list of sites could be attacked by a major joe job unless we took
>additional countermeasures, but does anyone else think this might
>be a useful type of data source for SURBLs?
>
>Jeff C.
>--
>"If it appears in hams, then don't list it."
>
>
Jeff,

I think it would be useful, *but* Spamhaus is very good at adding
IPs of sites that exploit the XBL - So you would see a significant overlap.
That said, the SURBL policy of attempting for zero FPs means that a SURBL
would probably have a higher score than the SBL gets (I think for 3.0.2 the
rule RCVD_IN_SBL is scored as "0 1.050 0 0.107" and the rule URIBL_SBL is
scored as "0 0.629 0 0.996"), so you could probably add a "safer" two points
or even more.  And, SURBLs are RHS lists, so you will catch IP jumping that
the SBL often misses (for a little while).

I don't believe that you will find `spamvertised' domains using
exploited machines one day, and valid mailers later - Just a `new' exploited
machine that hasn't made its way onto the lists yet (like IP jumping, being
a RHS list is an advantage here too).  The best current example I know of
is f2m.idv.tw-munged (who, unlike most spammers, has a multi-year period on
their registration, doesn't change domain names, does IP jump, and does use
exploited machines).  Today, they are on all five SURBLs and in the SBL
(and on my servers, they also get another 1.5 points for rfci.whois and
rfci.abuse URI rules).

Also, it wouldn't take a "major" joe job (or whatever the name for
chafe that isn't personally directed would be - remember "joe job" refers
to a specific spammer who was pissed at being thrown off joe.com).  You
would just have to maintain a whitelist like you do now for people like
w3c.org who are always being abused (or the phishing spam target companies,
whose own pictures and logos usually appear, or newspapers and magazines
who end up in 419s).

Sounds good,

Paul Shupak
[EMAIL PROTECTED]


Re: [SURBL-Discuss] Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 7:31:01 AM, Raymond Dijkxhoorn wrote:
>> I'm not asking for trap data.  I'm asking to look for XBL hits,
>> then take the URIs from messages that hit XBL.  In other words
>> I want to get the sites that are being advertised through
>> exploited hosts.
>>
>> Nothing to do with traps or SBL.  ;-)

> If you can get a feed, why limit this to hosts found inside XBL?

This is not for a spam feed specifically.  It's to get data about
what sites are spam advertised through compromised hosts.  XBL
happens to be a good, reliable list of compromised hosts.  Other
lists like list.dsbl.org may be ok too, but those are the only
two RBLs I have a lot of confidence in.  The goal would not be to
get all data but to get all reliable data.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 5:36:55 AM, Raymond Dijkxhoorn wrote:
> Hi!

>>> Perhaps some kind person could write a reporting function in
>>> SpamAssassin for this?

>> Hmm, perhaps if we could extract *all* URI domains from messages
>> sent through XBLed senders and then prioritize them, say by frequency
>> of appearance, we could create a new SURBL list of spamvertised
>> domains sent through exploited hosts.  That would pretty directly
>> address the use of zombies, etc. and put a penalty on using them
>> to advertise sites.  Even with volume weighting such
>> a list of sites could be attacked by a major joe job unless we took
>> additional countermeasures, but does anyone else think this might
>> be a useful type of data source for SURBLs?
[...]

> Spamtraps are bad news if you use them 1:1, you need to parse out a LOT. 
> Did you run polluted spamtraps? I have been running two proxypots, I still 
> might have some tars, and most of it was really useless. What helps more 
> is wider coverage. I'd rather see some automated system like the spamcop 
> setup, so people can report and we auto-parse it with Joe's tool, for 
> example. With a larger footprint we also get spam earlier. It's not like 
> they first send to the spamtraps and then to 'real' users alone.

> I understand you want to cover new areas, but please don't rely on other 
> RBLs too much; I think waiting with our own checks does much more in the 
> end. If SBL picks it up we can pick it up faster. But we also want to pick 
> up ones NOT listed by any RBL, don't we?

I think you're not understanding what I'm asking for.  :-)

I'm not asking for trap data.  I'm asking to look for XBL hits,
then take the URIs from messages that hit XBL.  In other words
I want to get the sites that are being advertised through
exploited hosts.

Nothing to do with traps or SBL.  ;-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread List Mail User
>   ...The person with two clocks is never really sure of
> the current time.

OT, but... (the above is *not* a good quote, though it sounds nice.)
To be `sure' of the time, you need at least three clocks (look at the
documentation for ntp/ntpd).

>
> ...
...

Paul Shupak
[EMAIL PROTECTED]


Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Kai Schaetzl
Jeff Chan wrote on Sun, 13 Mar 2005 05:12:30 -0800:

> One fairly easy way for anyone running a large SpamAssassin 
> installation to help us get this data would be to simply grep 
> for "XBL" and "SURBL" rules hitting the same message and report 
> out the URI domains from those messages.
>

I have a large corpus of spam and ham via quarantining in MailScanner. 
Unfortunately, MailScanner doesn't alter the quarantined messages, so I 
would need to have a tool scan the saved score data in the MailWatch db 
and then scan each corresponding message for URIs (and wouldn't know which 
one of them matched).
So, depending on how you run SA, it's not that easy to get at this data. 
Wouldn't it be possible to have an option in SA that adds the matching URI 
to the score (URI_SURBL_domain.com) or saves it in a "summary"? Wouldn't a 
statistics module for SA make sense anyway?

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Tests results are different?

2005-03-13 Thread David Suen
Hi all, I installed SpamAssassin (3.0.2) on my linux box and it looks like
it does not work (probably due to my configuration). I tried RTFM but still
cannot make it work.

Situation:
I use spamassassin with simscan + qmail. My problem is that when I use
spamassassin (not spamc) + sample-spam.txt it gives me the correct result
(and I tested another real spam with the 'keyword' "viagra"). However,
when I use spamc it keeps saying the score is 0.0 (for all the emails).

Header using spamc:
X-Spam-Status: spam=No, score=0.0 required=4.2 tests= none
autolearn=unavailable

Header using spamassassin: X-Spam-Level: 
X-Spam-Status: spam=Yes, score=4.3 required=4.2 tests= BIZ_TLD=0.527,
DATE_IN_PAST_06_12=0.211,DRUGS_ERECTILE=0.026,DRUG_ED_CAPS=1.535,
EXTRA_MPART_TYPE=0.222,SUBJECT_DRUG_GAP_VIA=1.77 autolearn=no


local.cf:

rewrite_subject 1
report_header 1
report_safe 2
required_hits 4.2
add_header all Status spam=_YESNO_, score=_SCORE_ required=_REQD_ tests=
_TESTSSCORES(,)_ autolearn=_AUTOLEARN_



Do you guys have any idea why spamc said the score is 0 for all emails?

If you need more information I am happy to provide if necessary.


Thanks

David




Re: What's wrong with my test?

2005-03-13 Thread Kai Schaetzl
 wrote on Sun, 13 Mar 2005 01:57:18 -0300:

> So, I redirect those messages to me and I received them as no spam again!
>

As I just wrote to "GRP Productions": Bayes doesn't work this way.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Re: Bayes DB does not grow anymore

2005-03-13 Thread Kai Schaetzl
GRP Productions wrote on Sun, 13 Mar 2005 11:21:12 +0200:

> for some days now my bayesian DB does not seem to grow. Its size remains 
> stable. It is read with no problems by SA 3.0.2, but nothing new is written. 
> I send an email to myself, it is classified as BAYES_50. I sa-learn it as spam, 
> send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).
>

This doesn't prove anything. sa-learn --dump magic shows you what's inside. 
Also, Bayes is not a checksum system like Razor; that's its strength. If you 
learn something to it, that means it extracts tokens (short pieces) from 
the message and adjusts its internal probability of them being ham or spam 
by a certain factor. Or, if it doesn't know that token yet, it adds it.
That the size doesn't grow can have several reasons, f.i. expiry or the fact 
that the db format seems to have some "air" in it, so that it grows in jumps 
and not continually.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org





Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 5:12:30 AM, Jeff Chan wrote:
> On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
>> Does anyone have or know about a list of spam-advertised URIs
>> where the spam they appeared in was sent through open relays,
>> zombies, open proxies, etc.  In other words does anyone know
>> of a list of spamvertised web sites or their domains that's
>> been cross referenced to exploited hosts?

>> We could use that information as a valuable tool for getting
>> more records into SURBLs.

> One fairly easy way for anyone running a large SpamAssassin
> installation to help us get this data would be to simply grep
> for "XBL" and "SURBL" rules hitting the same message and report
> out the URI domains from those messages.

> Perhaps some kind person could write a reporting function in
> SpamAssassin for this?

Hmm, perhaps if we could extract *all* URI domains from messages
sent through XBLed senders and then prioritize them, say by frequency
of appearance, we could create a new SURBL list of spamvertised
domains sent through exploited hosts.  That would pretty directly
address the use of zombies, etc. and put a penalty on using them
to advertise sites.  Even with volume weighting such
a list of sites could be attacked by a major joe job unless we took
additional countermeasures, but does anyone else think this might
be a useful type of data source for SURBLs?

Jeff C.
--
"If it appears in hams, then don't list it."



Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
> Does anyone have or know about a list of spam-advertised URIs
> where the spam they appeared in was sent through open relays,
> zombies, open proxies, etc.  In other words does anyone know
> of a list of spamvertised web sites or their domains that's
> been cross referenced to exploited hosts?

> We could use that information as a valuable tool for getting
> more records into SURBLs.

One fairly easy way for anyone running a large SpamAssassin
installation to help us get this data would be to simply grep
for "XBL" and "SURBL" rules hitting the same message and report
out the URI domains from those messages.

Perhaps some kind person could write a reporting function in
SpamAssassin for this?
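
With spamd logging to syslog, a crude version of that grep might look like 
this (the log path and exact rule names are assumptions; they vary by 
setup):

# grep 'spamd: result' /var/log/maillog | grep RCVD_IN_XBL | grep URIBL_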

Jeff C.
--
"If it appears in hams, then don't list it."



OT: "Spyware Assassin" suspended

2005-03-13 Thread Jeff Chan

  
http://news.yahoo.com/news?tmpl=story&ncid=738&e=1&u=/nm/20050311/tc_nm/tech_spyware_dc

FTC Says Anti-Spyware Vendor Shut Down

Fri Mar 11, 4:31 PM ET
[...]

The makers of Spyware Assassin tried to scare consumers into
buying software through pop-up ads and e-mail that warned their
computers had been infected with malicious monitoring software,
the Federal Trade Commission said. 

Free spyware scans offered by Spokane, Washington-based
MaxTheater Inc. turned up evidence of spyware even on machines
that were entirely clean, and its $29.95 Spyware Assassin program
did not actually remove spyware, the FTC said. 

A U.S. court has ordered the company and its owner, Thomas
Delanoy, to suspend its activities until a court hearing on
Tuesday. The company could be required to give back all the money
it made from selling Spyware Assassin. 
__

Comment:  LOL!

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread Jens Ahlin
> Jens Ahlin wrote:
>> When trying to build rpm using rpmbuild -tb
>> Mail-SpamAssassin-3.0.2.tar.gz
>> fails with
>> error: Failed build dependencies:
>> perl(Digest::SHA1) is needed by spamassassin-3.0.2-1
>> perl(HTML::Parser) is needed by spamassassin-3.0.2-1
>>
>> After installing these modules using CPAN rpmbuild still fails with the
>> same error.
>
> You are mixing CPAN installations and RPM installations.  RPM does not
> know about CPAN.  Once you have made the decision to install from CPAN
> you are committed to installing everything from CPAN.  You should
> either install everything with CPAN or install everything with RPM.
> Don't mix them.  The person with two clocks is never really sure of
> the current time.
>
> Since you have been installing perl modules by CPAN you should
> probably just continue and install spamassassin by CPAN too.  Which
> means you don't need to build an rpm package.  But if you want rpm to
> know that you have those perl modules installed then you need to
> install them by RPM.
>
>> Installing these modules from RPMS solves the problem. Why doesn't
>> rpmbuild find the modules installed using CPAN?
>
> Your question is rather like Bilbo Baggins asking "What do I have in
> my pocket?"  How would RPM know what you have installed by CPAN?
>
> Bob
>
>

Thanks for the clarification.

 Jens



Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
Hello,
for some days now my bayesian DB does not seem to grow. Its size remains
stable. It is read with no problems by SA 3.0.2, but nothing new is written. 
I send an email to myself, it is classified as BAYES_50. I sa-learn it as spam, 
send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).

I use SpamAssassin 3.0.2. No configuration change has been done recently. It 
used to work fine.
I've tried --sync, --force-expire, but no luck.
Any help would be appreciated
Thanks
Greg




Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread Matthias Keller
Bob Proulx wrote:
> Jens Ahlin wrote:
>> When trying to build rpm using rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz
>> fails with
>> error: Failed build dependencies:
>>    perl(Digest::SHA1) is needed by spamassassin-3.0.2-1
>>    perl(HTML::Parser) is needed by spamassassin-3.0.2-1
>> After installing these modules using CPAN rpmbuild still fails with the
>> same error.
>
> You are mixing CPAN installations and RPM installations.  RPM does not
> know about CPAN.  Once you have made the decision to install from CPAN
> you are committed to installing everything from CPAN.  You should
> either install everything with CPAN or install everything with RPM.
> Don't mix them.  The person with two clocks is never really sure of
> the current time.
> Since you have been installing perl modules by CPAN you should
> probably just continue and install spamassassin by CPAN too.  Which
> means you don't need to build an rpm package.  But if you want rpm to
> know that you have those perl modules installed then you need to
> install them by RPM.

Hi
If you really want to keep everything updated with CPAN except for SA, 
you can always do a forced install to ignore dependencies or - better - 
edit the spec file not to depend on those modules which aren't RPMized, 
and just make sure yourself that they're up to date via CPAN... 
(Makes distributing that RPM afterwards very hard, though...)
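
The forced-install route would be something like this (--nodeps skips the 
dependency check entirely, so make sure the CPAN modules really are in 
place first):

# rpmbuild -tb --nodeps Mail-SpamAssassin-3.0.2.tar.gz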

Matt


Re: rule for mail not to me

2005-03-13 Thread Vicki Brown
At 20:15 -0800 03/06/2005, Vicki Brown wrote:
>I can create a user rule for mail not addressed (To or Cc) to me
>
>  header CF_NOT_FOR_ME  ToCc !~ /[EMAIL PROTECTED]/
>  score CF_NOT_FOR_ME 4.0
>  describe CF_NOT_FOR_ME  Neither To nor Cc me
>
>However, the still-not-addressed user scores bug prevents me from setting the
>score any higher than 1 for these.
>   http://bugzilla.spamassassin.org/show_bug.cgi?id=4121
>


Many thanks to the SpamAssassin development team for fixing bug 4121!

However, I'm still interested in knowing:

>is there a magic variable for "my" address that would allow me to set up a
>general site-wide rule of this type?

-- 
Vicki Brown  ZZZ
Journeyman Sourceror:  zz  |\ _,,,---,,_ Code, Docs, Process,
Scripts & Philtres  zz /,`.-'`'-.  ;-;;,_   Perl, WWW, Mac OS X
http://cfcl.com/vlb   |,4-  ) )-,_. ,\ ( `'-'   SF Bay Area, CA  USA
___  '---''(_/--'  `-'\_)  ___


Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread Bob Proulx
Jens Ahlin wrote:
> When trying to build rpm using rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz
> fails with
> error: Failed build dependencies:
> perl(Digest::SHA1) is needed by spamassassin-3.0.2-1
> perl(HTML::Parser) is needed by spamassassin-3.0.2-1
> 
> After installing these modules using CPAN rpmbuild still fails with the
> same error.

You are mixing CPAN installations and RPM installations.  RPM does not
know about CPAN.  Once you have made the decision to install from CPAN
you are committed to installing everything from CPAN.  You should
either install everything with CPAN or install everything with RPM.
Don't mix them.  The person with two clocks is never really sure of
the current time.

Since you have been installing perl modules by CPAN you should
probably just continue and install spamassassin by CPAN too.  Which
means you don't need to build an rpm package.  But if you want rpm to
know that you have those perl modules installed then you need to
install them by RPM.
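
You can see what rpmbuild actually checks against: the build dependencies 
are resolved from the RPM database, not from the perl library path.  For 
example (module name taken from the error above):

# rpm -q --whatprovides 'perl(Digest::SHA1)'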

> Installing these modules from RPMS solves the problem. Why doesn't rpmbuild
> find the modules installed using CPAN?

Your question is rather like Bilbo Baggins asking "What do I have in
my pocket?"  How would RPM know what you have installed by CPAN?

Bob


Re: What's wrong with my test?

2005-03-13 Thread shirlei
Citando Matt Kettler <[EMAIL PROTECTED]>:

> At 11:57 PM 3/12/2005, you wrote:
> >So, I redirect those messages to me and I received them as no spam again!
> 
> Define exactly what you mean by "redirect those messages". What specific 
> actions did you do?
   I used the redirect tool from webmail (Horde).

> 
> Also, tell us a bit about how your mail gets scanned for spam. What tools 
> do you use? procmailrc? a milter? etc.

  I'm just testing and learning about SpamAssassin. It wasn't me who 
installed and configured all the tools, but it uses qmail. I connected to 
the server as user spamd and executed the command I mentioned in my last 
message. Did I miss something? What other information is important to know?








Re: DCC in Spamassassin

2005-03-13 Thread Bill Randle
On Sat, 2005-03-12 at 19:07 -0800, Norman Zhang wrote:
> > I also uncommented the DCCIFD_ARGS line.
> 
> # used to start dccifd
> #   a common value is
> #   DCCIFD_ARGS="-SHELO -Smail_host -SSender -SList-ID"
> DCCIFD_ARGS=
> 
> My DCCIFD_ARGS is empty. Should I add the options shown above?

I would.

> > Was there a dccd file created in /etc/init.d as part of the
> > installation process for dcc? It starts dccd, grey, and dccifd. Here's
> 
> # ls -l /etc/rc.d/init.d/
> -rwx------  1 root root  1406 Mar  1 18:44 amavisd*
> -rwx------  1 root root  1101 Jan 28 08:26 clamd*
> -rwx------  1 root root  3266 Sep 28 03:32 dccd*
> -rwx------  1 root root  1219 Jan 28 08:26 freshclam*
> 
> Someone pointed out to me I should look for rcdcc, but I only have
> 
> # slocate cdcc
> /usr/bin/cdcc
> 
> Should I use this instead? BTW do I need to set use_dcc = 0 if I want to 
> use dccifd?

I don't have rcdcc either. With SpamAssassin, use dccifd as previously
mentioned. Once you've edited the dcc_conf file to enable DCCIFD, start it
using the init program:
# /etc/init.d/dccd start

I see you also have amavisd installed. If you run spamassassin from
amavisd you will need to reload it as well:
# amavisd reload

This will force it to re-read the spamassassin config files and pick up
your use_dcc change (should be set to 1) and look for the dccifd socket.

-Bill




What's wrong with my test?

2005-03-13 Thread shirlei


Hi everyone!
Probably I'm asking a stupid question... but anyway, here it goes:
I saved in a folder some messages that I received classified as non-spam. So I
ran this command:
sa-learn --spam 
and I got the following:
Learned from 2 message(s) (3 message(s) examined).

So I redirected those messages to myself and I received them as non-spam 
again! Am I doing something wrong? What is wrong with my test?
Thanks for your attention.
bye



RE: DCC in Spamassassin

2005-03-13 Thread Greg Allen
My dcc on RedHat 8.0 is located in

/var/dcc  directory

You should see DCCIFD there as well.

I had to change a line in the /var/dcc/dcc_conf to

DCCIFD_ENABLE=ON

Also, make sure you install the latest DCC with DCCIFD from the Rhyolite
website (if you do not have it).

Put the path to DCC in your local spamassassin config file just to be safe
(local.cf):

dcc_home /var/dcc

Do all of this and spamassassin should automatically use DCCIFD instead of
dccproc when it detects it.
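
A quick way to verify is to run a message through spamassassin with debug 
output and look for the dccifd check (the sample message path is just an 
example):

spamassassin -D -t < sample-spam.txt 2>&1 | grep -i dcc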







-Original Message-
From: Norman Zhang [mailto:[EMAIL PROTECTED]
Sent: Saturday, March 12, 2005 10:07 PM
To: users@spamassassin.apache.org
Subject: Re: DCC in Spamassassin


> I also uncommented the DCCIFD_ARGS line.

# used to start dccifd
#   a common value is
#   DCCIFD_ARGS="-SHELO -Smail_host -SSender -SList-ID"
DCCIFD_ARGS=

My DCCIFD_ARGS is empty. Should I add the options shown above?

> Was there a dccd file created in /etc/init.d as part of the
> installation process for dcc? It starts dccd, grey, and dccifd. Here's

# ls -l /etc/rc.d/init.d/
-rwx------  1 root root  1406 Mar  1 18:44 amavisd*
-rwx------  1 root root  1101 Jan 28 08:26 clamd*
-rwx------  1 root root  3266 Sep 28 03:32 dccd*
-rwx------  1 root root  1219 Jan 28 08:26 freshclam*

Someone pointed out to me I should look for rcdcc, but I only have

# slocate cdcc
/usr/bin/cdcc

Should I use this instead? BTW do I need to set use_dcc = 0 if I want to
use dccifd?

Regards,
Norman Zhang



Re: DCC in Spamassassin

2005-03-13 Thread Matt Kettler
At 10:07 PM 3/12/2005, Norman Zhang wrote:
> Someone pointed out to me I should look for rcdcc, but I only have
> # slocate cdcc
> /usr/bin/cdcc

I'm not too familiar with the scripts that come with DCC for this; I just 
wrote my own init script to start dccifd.

> Should I use this instead? BTW do I need to set use_dcc = 0 if I want to 
> use dccifd?

No. SA will use neither dccproc nor dccifd if you do that. The use_dcc 
option completely enables or completely disables all of DCC at once.
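
So for dccifd the local.cf side would just be something like this (dcc_home 
as in Greg's earlier message; adjust to your install):

use_dcc 1
dcc_home /var/dcc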




Re: Header Tagging with # instead of *

2005-03-13 Thread John Andersen
On Saturday 12 March 2005 02:47 pm, jdow wrote:
> The canonical way to do it is something like:
>
> rewrite_header Subject *SPAM* _SCORE(00)_ **
>
> That gives headers that look like:
> Subject: *SPAM* 027.3 ** spoo is best for slow sex

The OP was interested in header tagging (hence the subject of the
thread), not munging the subject line.
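
For header-only tagging, something like this in local.cf should do it (a 
sketch; the header name is arbitrary):

add_header all Score _SCORE_ (_REQD_ required)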

-- 
_
John Andersen




Re: SA addr tests need to be updated

2005-03-13 Thread List Mail User
>...
>Date: Sat, 12 Mar 2005 18:46:52 -0500
>From: "Eric A. Hall" <[EMAIL PROTECTED]>
>User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
>X-Accept-Language: en-us, en
>MIME-Version: 1.0
>To: users@spamassassin.apache.org
>Subject: Re: SA addr tests need to be updated
>References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> 
><[EMAIL PROTECTED]>
>...
>
>After considering all the discussion, I've filed these three bugs:
>
> 4188--RCVD_HELO_IP_MISMATCH should check address literals (this was
>   argued against by Justin, but I'm convinced it's spam-sign)
>
> 4186--RCVD_NUMERIC_HELO does not test "reserved" addresses (they are
>   still 'numeric' and aren't hostnames, and should still hit)
>
> 4187--RCVD_ILLEGAL_IP does not fire in all cases (reserved, malformed,
>   and literals should all be tested, but aren't)
>
>The rest of it can stay where it is and still be useful
>
>Thanks
>
>-- 
>Eric A. Hallhttp://www.ehsco.com/
>Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/
>
 
Eric,

I know what I say certainly holds no authority, but I clearly agree
with 4186 and 4187.  And if you mean "literals" unqualified by brackets, I
not only agree with 4188, but would argue that it and the others should be
promoted to DSN_-style rules, and that finding unbracketed numeric
HELO/EHLOs anywhere in the received chain is an *excellent* spam-sign
(especially when forged one or two levels below the "relay" machine).
For 4186 and 4187, it would seem that brackets are irrelevant - you are
correct that all cases should be tested.

The only exception I would make, if they were DSN_* rules, would be
a "-notfirsthop" qualifier for RFC1918 IP hosts and rule #4186, since they
are so common for internal corporate networks running DHCP.
Paul Shupak
[EMAIL PROTECTED]


Re: DCC in Spamassassin

2005-03-13 Thread Norman Zhang
> I also uncommented the DCCIFD_ARGS line.

# used to start dccifd
#   a common value is
#   DCCIFD_ARGS="-SHELO -Smail_host -SSender -SList-ID"
DCCIFD_ARGS=

My DCCIFD_ARGS is empty. Should I add the options shown above?

> Was there a dccd file created in /etc/init.d as part of the
> installation process for dcc? It starts dccd, grey, and dccifd. Here's

# ls -l /etc/rc.d/init.d/
-rwx------  1 root root  1406 Mar  1 18:44 amavisd*
-rwx------  1 root root  1101 Jan 28 08:26 clamd*
-rwx------  1 root root  3266 Sep 28 03:32 dccd*
-rwx------  1 root root  1219 Jan 28 08:26 freshclam*

Someone pointed out to me I should look for rcdcc, but I only have

# slocate cdcc
/usr/bin/cdcc

Should I use this instead? BTW, do I need to set use_dcc = 0 if I want to 
use dccifd?

Regards,
Norman Zhang