Re: per-user bayes

2020-12-08 Thread micah anderson
Kris Deugau  writes:

> There will only be one database and set of tables, but one of the fields 
> in each table is the user identifier.  Fair warning - if you go full 
> per-user on a large system, this will MASSIVELY balloon the size of your 
> Bayes database, and most users will idle below the learning thresholds 
> for quite a long time.

Can you give an idea of the size calculation? I'm wanting to do this,
but I need to figure out how much space I need to allocate per user!

Thanks for the clarifications, this is super helpful.

-- 
micah


per-user bayes

2020-12-07 Thread micah anderson


Hi all,

I've got a site-wide bayes mysql setup. It keeps getting poisoned
quickly, because the user patterns are far too divergent from each
other. One person's spam is another person's ham, nobody is happy.

A per-user setup would let each user do their own thing, but I don't see
how I can do that, because our system doesn't have individual system
users and I don't see any options in the bayes sql configuration for
this, or any way to have per-user tables.

There is this bayes_sql_override_username configuration option, but this
is a configuration option that I can only set once, and is not
dynamic. There is this hint in the documentation that you can also use
this config option to trick sa-learn to learn data as a specific user,
but there is not much more information.

Has someone out there done this, and can show how you have done it?

At this point my options are to turn down the score for bayes, so it has
less of an impact, maybe turn off bayes auto-learning, or just simply
disable bayes altogether.

thanks for any information

-- 
micah


Re: Happy Thanksgiving and Announcing the Apache SpamAssassin Channel for the KAM Rule Set

2020-11-26 Thread micah anderson


Great to hear, congrats on making this a channel! A very nice
thanksgiving treat.

"Kevin A. McGrail"  writes:

> Morning all,
> I wanted to share the news from 
> https://mcgrail.com/newsmanager/news_article.cgi?template=news.template_id=11
>  
> with you all.  We'll also have a mailing list up soon too.
> Thanks to the sponsors and to Georgia Smith and Karsten Bräckelmann who 
> worked hard on setting up the infrastructure for this.
>
> Happy Thanksgiving,
> KAM
>
>
>   Announcing the Apache SpamAssassin Channel for the KAM Rule Set
>
> Nov 26, 2020
> Happy Thanksgiving,
>
> The McGrail Foundation is proud to announce the immediate availability 
> of the channel for the KAM rule set.
>
> The rule set has been free and available to improve Apache SpamAssassin 
> installations for going on 17 years now. It includes rules for common 
> spam as well as contributed rules plus tweaks to help make things faster 
> and more efficient with the stock rules without lowering the efficacy.
>
> The KAM rule set is authored by Kevin A. McGrail with contributions from 
> Joe Quinn, Karsten Bräckelmann, Bill Cole, and Giovanni Bechis. It is 
> maintained by The McGrail Foundation.
>
> The KAM channel is made possible with the support of hosting from Linode 
> and help from PCCC & cPanel. More information about our sponsors can be 
> found at our Sponsor's Page at https://mcgrail.com/template/sponsors
>
> To enable the KAM rule set via an sa-update channel see the channel page 
> at https://mcgrail.com/template/kam.cf_channel
>
> -- 
> Kevin A. McGrail
> kmcgr...@apache.org
>
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
-- 
micah


Invaluement sendgrid list

2020-10-13 Thread micah anderson


Hi all,

I've been trying the
https://www.invaluement.com/spdata/sendgrid-id-dnsbl.txt list but
lately, I've been getting 'Couldn't connect to server' errors, fairly
regularly. The site says:

'can set them up for frequent downloads (every minute!) using CURL or
WGET - only using the setting that only downloads when the server
versions are newer.'

I am doing that, once per minute... are others having this issue?
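
For reference, the "only download when newer" behaviour boils down to a
conditional HTTP GET. A minimal Python sketch of building such a request
follows, assuming the local copy's modification time is a good enough
marker; the function name conditional_request is made up for
illustration, and this does nothing about the connection errors
themselves:

```python
import os
from email.utils import formatdate
from urllib.request import Request

def conditional_request(url, local_path):
    """Build a GET request that asks the server to skip the body if unchanged."""
    req = Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates are always expressed in GMT
        req.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    return req
```

The returned request can be handed to urllib.request.urlopen; as far as
I know an unchanged file comes back as a 304, which urlopen reports as an
HTTPError that the caller has to catch.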

thanks

-- 
micah


RE: Amazon, dhl, fedex, etc. phishing

2020-08-24 Thread micah anderson
John Hardin  writes:

> On Mon, 24 Aug 2020, Marc Roos wrote:
>
>> You should use spf for this.
>
> Duh.
>
> +1
>
> whitelist_auth  *@amazon.com
> blacklist_from  *@amazon.com
> whitelist_auth  *@*.amazon.com
> blacklist_from  *@*.amazon.com

I do not understand this, how does this work?

-- 
micah


A new high score!

2020-08-24 Thread micah anderson


What is the highest score you've seen a spam get? I think I just broke
my own high score, with a spam that managed to pile up 64 points.

I'm sure you all have seen much higher!

-- 
micah


Amazon, dhl, fedex, etc. phishing

2020-08-24 Thread micah anderson


We are regularly getting phishes claiming to be from dhl, fedex, usps,
amazon, netflix, spotify that fake the From (eg. amazon  wants
to send me an amadon-legit.pdf). Usually these are previously unknown to
pyzor, dcc, rbls, and domain reputation doesn't really exist[0].

I'm wondering if anyone has made a rule that checks whether the From
contains amazon but the domain is not amazon.com/.ca/.jp (all their TLDs),
and scores those up. If the mail also wants to drop a psd, or a tar.xz,
or a png, or a pdf or whatever, then light them on fire.
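
To make the idea concrete, here is a rough Python sketch of the check
(not a SpamAssassin rule). The BRAND_DOMAINS map is hypothetical and
incomplete, and brand_mismatch is a made-up name; it only looks at the
From header and ignores attachments:

```python
from email.utils import parseaddr

# Hypothetical, incomplete map of brand name -> domains it really sends from.
BRAND_DOMAINS = {
    "amazon": {"amazon.com", "amazon.ca", "amazon.co.jp"},
}

def brand_mismatch(from_header, brands=BRAND_DOMAINS):
    """Return brands named anywhere in the From header whose address domain is not theirs."""
    _, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    haystack = from_header.lower()
    hits = []
    for brand, domains in brands.items():
        if brand not in haystack:
            continue
        # accept the listed domain itself or any subdomain of it
        legit = any(domain == d or domain.endswith("." + d) for d in domains)
        if not legit:
            hits.append(brand)
    return hits
```

A non-empty result would be the "score them up" case; an attachment
check could then be layered on top of it.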

thanks!

-- 
micah

0. this rule does fire, and is helpful, but not always:
FROM_FMBLA_NEWDOM From domain was registered in last 7 days


Re: Constructive solution to the blacklist thread

2020-07-24 Thread micah anderson
Noel Butler  writes:

[weird rant deleted]

> There are 192 _other_ countries in the world, the USA is united states

There are 194 other countries in the world.

-- 
micah


Re: Constructive solution to the blacklist thread

2020-07-23 Thread micah anderson


BLM thanks Eric Broch for his continued support.

If you pass on your address, I'll be sure to tell them to send you a
postcard in thanks for your donation.

Eric Broch  writes:

> Political correctness, BLM and Antifa (LGBTQ) as well as feminism (and 
> many other agendas) are being used as battering rams to destroy western 
> culture and usher in Marxist global governance. The real agenda isn't 
> "getting along" it's quite the opposite.
>
> On 7/23/2020 4:41 PM, Antony Stone wrote:
>> On Thursday 23 July 2020 at 22:44:51, Michael Orlitzky wrote:
>>
>>> The Apache foundation has some cash laying around. Make whatever wording
>>> changes you like, but **at the same time**, donate a meaningful amount
>>> of money to a cause like the ACLU or the defense/medical funds for the
>>> protestors.
>> Don't you have that the wrong way around?
>>
>> All these IT companies, groups and foundations who are changing their
>> wording to make the world a better place are doing what the ACLU has been
>> trying to do for years, so surely the ACLU should be funding the IT support
>> people who have to deal with the extra workload of managing these changes?
>>
>> The oppressed societal groups get the improvement they've been waiting for,
>> the ACLU doesn't have to work so hard, and the IT support staff get
>> compensated for the extra work they have to do for the benefit of society.
>>
>> Of course, that model all breaks down if you don't really believe that these
>> changes are going to make the world a better place, or that the oppressed
>> societal groups are not in fact going to be better off as a result of
>> changing the word black to block in an email filtering system, but nobody
>> really thinks that, do they?
>>
>> Note for those challenged by sarcasm or irony: I do not agree with the change
>> and I do not think it will have the effects it is being done in the name of.
>>
>>
>> Antony.
>>
-- 
micah


Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-14 Thread micah anderson
Eric Broch  writes:

> As I've pointed out in previous posts the proponents are under a delusion.

It is fascinating that the person who cried about ad hominem attacks so
much resorts to the very same.

Every time Eric Broch writes to me off-list, or on list about this
subject, I donate another $10 to a cultural marxist organization in his
name. Thanks Eric for your continued support of BLM!

-- 
micah


Re: IMPORTANT NOTICE FOR PEOPLE RUNNING TRUNK re: [Bug 7826] Improve language around whitelist/blacklist and master/slave

2020-07-12 Thread micah anderson
Eric Broch  writes:

> 2) You accuse "the right wing[er]" of making this issue political when 
> we've/I've done no such thing.

hilariously, you then go on to do exactly that:

> The maintainers of the list have listened to those who've turned
> something benign (whitelist/blacklist) into something political and
> are now groveling to the political Marxists.

Maybe you don't see it, but your war against the imaginary conspiracy
theory of cultural marxism is not at all benign, or apolitical. Play the
victim all you want, but invoking the spectre of "cultural Marxism" to
account for things you disapprove of is just proving the original
poster's point.

> Where does it stop. No one has answered my question. Now that
> whitelist/blacklist are gone why isn't Apache on the chopping block?
> What's next?

Depends if you want to haul out the frankfurt school, Marcuse, and
Adorno and the proletariat's desire to revolt, mix in a little bit of
Freud and claim that a mysterious group is using insidious forms of
psychological manipulation to chemtrail the 9/11 inside job. Clearly the
renaming of whitelist/blacklist is a Soros paid for plot intended to
destroy traditional Christian values and overthrow free enterprise, just
look at Clinton's emails...obvious link to pizza gate, and
Benghazi...who knows where you are going to stop this regurgitated drool
you had brainwashed into you, but...

Personally, I think it needs to stop here, the theory of cultural
Marxism is blatantly antisemitic, drawing on the idea of Jews as a fifth
column bringing down western civilisation from within, a racist trope
that has a longer history than Marxism. Like the Protocols of the Elders
of Zion, the theory was fabricated to create and perpetuate a culture
war (William Lind).

So where does it stop and what is next? It needs to stop right
here. Spewing antisemitic bile on this mailing list is exactly what
needs to be next.

If this guy isn't spam, I don't know what is.

plonk



Re: Slipping through the cracks

2020-06-19 Thread micah anderson
John Hardin  writes:

> On Fri, 19 Jun 2020, micah anderson wrote:
>
>> So, what can I do to tweak these rules to score things up more,
>> specifically the rules that provide a low false positive rate[1]. This
>> seems something that should be done programmatically, and not
>> manually. It seems like what 'masscheck' maybe does generically for all
>> rules for all installations, but can I use that to just adjust our rules
>> for our particular breed of spam that comes through?
>
> How about: analyze your spamtrap for recent source IP addresses on a 
> quick schedule (hourly?) and drive a local DNSBL from IPs seen more than 
> 2-3 times in the last 24-48 hours?

Interesting possibility... but if I look at the current batch that made
it through, I see:

1. amazon aws
2. gmail (amusingly saying my amazon prime membership is going to
expire)
3. mailchimp
4. yahoo.com

all of those would not be good to block :(

It's not always like that, but it does happen.
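
For reference, the counting part of that suggestion could look roughly
like this in Python. It is only a sketch: the (ip, timestamp) input
format, the function name and the skip set are hypothetical, and it does
not cover pulling IPs out of the spamtrap or publishing the DNSBL zone.
The skip set exists because of shared senders like the ones listed above:

```python
from collections import Counter
from datetime import datetime, timedelta

def repeat_offenders(sightings, now, window_hours=48, min_hits=3, skip=frozenset()):
    """Return IPs seen at least min_hits times within the last window_hours.

    sightings: iterable of (ip_string, datetime) pairs from the spamtrap.
    skip: IPs that must never be listed (e.g. large shared mail providers).
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter(
        ip for ip, seen in sightings
        if seen >= cutoff and ip not in skip
    )
    return sorted(ip for ip, n in counts.items() if n >= min_hits)
```

Run hourly, the returned list would be what feeds the local DNSBL; the
window and threshold defaults are just the upper ends of the ranges
suggested above.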

-- 
micah


Slipping through the cracks

2020-06-19 Thread micah anderson


Hi folks,

I've spent a lot of time tuning our spamassassin setup over the
years. Channels, RBLs, pyzor, DCC, bayes, KAM rules, some home spun
rules, etc... and things do work fairly well, the catch rate is very high,
but the ones that get through are the ones that are designed to get
around the defenses before they are shut down. I get the feeling the
scores from many rules are too low, and I'm looking for the right way to
move forward.

The reason I say this is because I've got a spamtrap account, which is
comprised of several addresses that are heavily targeted by spam lists,
and these accounts seem to get the fast flux, rapid zone updates and ip
reputation burns (and other techniques) that are used to do initial spam
flooding before they are picked up by things. Once pyzor, dcc, and the
RBLs pick these up, they are usually scored high enough to get flagged
for everyone else, but without the RBLs, the scoring is too low to meet
that[0]. Of course I "learn" these messages when they come in.

I've been trying to analyze which techniques they use, to try and come
up with rules that will stop them, but so far it has been hard to come
up with something manually. I've taken several of these that got through
and later, after a day, checked them with network tests, and they are
all scored very high by the various lists, fuzzers, and checksums. Often
these don't even hit RBLs at first... and the ones that do aren't
hitting enough of them to be caught. Usually, though, if an RBL is hit
the message gets marked as spam, because most of the time several of the
RBLs fire at once... but if they are not on RBLs, they don't get
flagged as spam by the regular rules.

So, what can I do to tweak these rules to score things up more,
specifically the rules that provide a low false positive rate[1]? This
seems like something that should be done programmatically, and not
manually. It seems like what 'masscheck' maybe does generically for all
rules for all installations, but can I use that to just adjust our rules
for our particular breed of spam that comes through?

Thanks for any ideas,
micah


0. with some notable exceptions, like KAM_DMARC_REJECT and
HELO_DYNAMIC_SPLIT_IP

1. KAM_DMARC_STATUS and HTML_NO_CHARSET are possible ones; also, mails
that do not have a To: only get a score of 0.1

-- 
micah


homograph spam

2020-06-17 Thread micah anderson


Are there any plugins or techniques that can deal with UTF-8 homographs?
In particular, I'm seeing a lot of attempts to get past filters that
would match on a word like 'amazon', but do not catch it because the 'm'
has been replaced by the UTF-8 version of 'm' that looks identical.

I understand that UTF-8 From and Subject are legitimate, so I do not
want to just block those, but it seems like we should look for typical
homographs in the middle of words and add a weighted score for these.

I do have 'normalize_charset 1' set here.
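
To make the "homographs in the middle of words" idea concrete, here is a
rough Python sketch (not a SpamAssassin plugin). It flags any word whose
letters come from more than one script, using the first word of each
character's Unicode name as a stand-in for its script. That shortcut and
both function names are assumptions of this sketch, not an established
method:

```python
import re
import unicodedata

def scripts_in(word):
    """Return the set of script labels (first word of the Unicode name) of the letters in word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def mixed_script_words(text):
    """Return the words in text that mix letters from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]
```

A Subject written entirely in one non-Latin script is not flagged, only
words that mix scripts, so legitimate UTF-8 headers should mostly pass.
Each flagged word could then add a small weighted score.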

-- 
micah


Re: Technically not spam

2020-05-31 Thread micah anderson
"@lbutlr"  writes:

> Squirrelmail is not supported and I would definitely not recommend
> anyone run it, especially since you have to run a version of PHP that
> hasn’t been supported in 4 years and has known exploits that will
> never be fixed.

I don't want to disagree with you, because I agree... except to point
out that the statement about old PHP being required is not true, you can
run squirrelmail with php7.3.

-- 
micah


Re: pyzor

2020-05-31 Thread micah anderson
Matus UHLAR - fantomas  writes:

>>> On 31.05.20 10:51, Noel Butler wrote:
>>>>Anyone else noticed it seems to scoring much much higher FP's in past
>>>>few weeks?
>>>>
>>>>Ima disable the damn thing I think.
>
>>Matus UHLAR - fantomas  writes:
>>> not here.
>
> On 31.05.20 08:15, micah anderson wrote:
>>here either. I've been noticing quite good results with pyzor
>>actually, and have thought it should be scored higher.
>>
>>I have seen messages reported 89 times, anyone seen more?
>
> how do you check this?

add_header all Pyzor _PYZOR_

-- 
micah


Re: pyzor

2020-05-31 Thread micah anderson
Matus UHLAR - fantomas  writes:

> On 31.05.20 10:51, Noel Butler wrote:
>>Anyone else noticed it seems to scoring much much higher FP's in past
>>few weeks?
>>
>>Ima disable the damn thing I think.
>
> not here.

here either. I've been noticing quite good results with pyzor
actually, and have thought it should be scored higher.

I have seen messages reported 89 times, anyone seen more?

-- 
micah


Re: shortcircuit internal mail

2020-05-20 Thread micah anderson


Thanks for the reply.

John Hardin  writes:

> On Tue, 19 May 2020, micah anderson wrote:
>
>> The final stage I thought would be short-circuited, because it was
>> relayed through our internal network, and we already do spam filtering
>> at the list server stage, we don't want to do it again.
>
> Nope. SA scans whatever you give it to scan, and that is driven by the 
> MTA. All you can do in SA is tune the scoring behavior.

Indeed, you are right. I had a fundamental misunderstanding in the
architecture.

>> Is there a way I can actually short-circuit this?

One way, which isn't particularly great, is to do something like this:

# if it comes from our list server, we don't want to scan it again
describe __LOCAL_OUR_LISTS  Was delivered to our lists
priority __LOCAL_OUR_LISTS  -100
header __LOCAL_OUR_LISTS    Delivered-To =~ /\@lists\.example\.com/
shortcircuit __LOCAL_OUR_LISTS on

Of course someone can forge the Delivered-To; there are some other
list-specific headers that could be checked as well.

> Configure the second internal MTA to entirely skip passing the message to 
> SA for messages received from the first internal-only MTA, which has 
> already scanned them.
>
> You'll need to provide more-specific information about which MTA you're 
> using before we can provide more-specific advice than that.

That is an interesting idea, I'm running postfix, and doing the
following in master.cf right now:

dovecot  unix  -   n   n   -  -   pipe
  flags=DRhu user=mail argv=/usr/bin/spamc --connect-retries=1 -H -d 10.0.1.90
  -s 1024 -t 100 -u ${recipient} -e /usr/lib/dovecot/dovecot-lda -f ${sender}
  -d ${user}@${domain}

and dovecot is a virtual_transport.

> Also be aware: "short-circuit" in the SA context doesn't *quite* mean what 
> you're asking.

Yeah, I am aware... it still fires up all of spamassassin and begins
processing, but at least with the priority level high, it should
determine things quickly and bail out.

-- 
micah


shortcircuit internal mail

2020-05-19 Thread micah anderson


Hi,

I've already got short-circuit set up, and it works, but not for mail
that goes like this:

gmail user sends to a mailing list on a mailing list server we
host, that server does some spamassassin scanning, and if it passes it
then delivers to our users subscribed to that mailing list, which is
sent via our internal mx server and then to our internal storage server,
where spamassassin scans it again.

The final stage I thought would be short-circuited, because it was
relayed through our internal network, and we already do spam filtering
at the list server stage, we don't want to do it again.

I've set: add_header all RelaysUntrusted _RELAYSUNTRUSTED_

and see that the final SA looks at the message that is delivered and
sees that it is coming from gmail, so internal_networks,
trusted_networks, and whitelist_to do not apply.

Is there a way I can actually short-circuit this?

This is what I have configured for short-circuit:

ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
#
#   default: strongly-whitelisted mails are *really* whitelisted now, if the
#   shortcircuiting plugin is active, causing early exit to save CPU load.
#   Uncomment to turn this on
#
shortcircuit USER_IN_WHITELIST       on
shortcircuit USER_IN_DEF_WHITELIST   on
shortcircuit USER_IN_ALL_SPAM_TO     on
shortcircuit SUBJECT_IN_WHITELIST    on

# the opposite; blacklisted mails can also save CPU
shortcircuit USER_IN_BLACKLIST       on
shortcircuit USER_IN_BLACKLIST_TO    on
shortcircuit SUBJECT_IN_BLACKLIST    on

#   if you have taken the time to correctly specify your "trusted_networks",
#   this is another good way to save CPU
#
shortcircuit ALL_TRUSTED on

score ALL_TRUSTED -5

# simple, non-network-based whitelists, locally-generated messages,
# messages via a trusted relay chain, simple
meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -30

meta SC_SPAM (USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_SPAM -950
shortcircuit SC_SPAM spam
score SC_SPAM 20

# slower, network-based whitelisting -- need to enable DKIM/SPF stuff before we
# can short circuit here
meta SC_NET_HAM (USER_IN_DKIM_WHITELIST||USER_IN_DK_WHITELIST||USER_IN_SPF_WHITELIST||USER_IN_DEF_DK_WL||USER_IN_DEF_DKIM_WL||USER_IN_DEF_SPF_WL)
priority SC_NET_HAM -500
shortcircuit SC_NET_HAM ham
score SC_NET_HAM -20

# bounce messages: always ignored if the vbounce plugin is active
priority ANY_BOUNCE_MESSAGE -700
shortcircuit ANY_BOUNCE_MESSAGE spam
score ANY_BOUNCE_MESSAGE 20

# ClamAV support: no need to scan viruses/malware
priority CLAMAV -900
shortcircuit CLAMAV spam
score CLAMAV 20

endif # Mail::SpamAssassin::Plugin::Shortcircuit


-- 
micah


Re: spamc learning/reporting

2020-05-18 Thread micah anderson
RW  writes:

>> 2. I cannot pass -C report and -L spam at the same time. If I do, I
>> get this message:
>> 
>> spamc: Learning excludes reporting to collaborative filtering
>> databases
>> 
>> and an exit code 64, which is:
>> 
>> EX_USAGE  64  command line usage error
>> 
>> however, there is nothing in the manual that says these cannot both be
>> passed, and it seems like I should be able to do both at once, instead
>> of having to invoke spamc twice, once to adjust the bayes, and once to
>> report to pyzor/razor.
>
> With  'spamassassin -r', reporting implies automatically training
> Bayes (controlled by bayes_learn_during_report). IIWY I'd check that -C
> report doesn't do the same thing.

But `spamassassin -r` is different than `spamc -C report` isn't it?

I've been staring at the spamc code, but I'm not skilled enough here to
understand if -C report means it also learns.

I'd really like to know if I'm feeding the bayes database, or just
pyzor.


-- 
micah


spamc learning/reporting

2020-05-16 Thread micah anderson


Hi,

I noticed a few oddities with 'spamc':

1. I cannot pass a full email address to -u, if I pass 'user' it works,
but if I pass 'u...@example.com' it fails. How do people handle this
with multiple domains?

2. I cannot pass -C report and -L spam at the same time. If I do, I get
this message:

spamc: Learning excludes reporting to collaborative filtering databases

and an exit code 64, which is:

EX_USAGE  64  command line usage error

however, there is nothing in the manual that says these cannot both be
passed, and it seems like I should be able to do both at once, instead
of having to invoke spamc twice, once to adjust the bayes, and once to
report to pyzor/razor.


-- 
micah


Re: spamtrap strategies

2020-05-16 Thread micah anderson
RW  writes:

>> I'm wanting to setup a spam trap, that should receive nothing but
>> actual spam, and feed that into spamassassin in some way. I'm
>> wondering the best way to automate feeding that data back to the
>> system.
>> 
>> Would it be best used for bayes tuning? It seems not, because it would
>> be 100% spam. 
>
> As long as there is ham from other sources and it doesn't ruin
> token retention, it shouldn't be a problem. Ideally you would only
> feed spam that doesn't reach BAYES_99 and is low-scoring.

That is the problem, our bayes database is not well fed. It's a global
database, and even with trusted 'feeders', it would drift the wrong way
because usually people only trained with spam that did not get caught,
and didn't feel comfortable using their ham.

I've considered the idea of creating per-user bayes dbs, but then I
couldn't use a spam-trap's caught spam to train all of those dbs,
because I wouldn't really have a clear idea of whether those individual
bayes dbs were getting any ham.

>> Would it be better to use it for mass-check and contribute some to
>> the overall rule scoring?
>
> If you use it for Bayes or mass-checks I'd suggest not relaxing any
> pre-SpamAssassin checks. Some people do that to keep the numbers up,
> but optimizing around spam that doesn't reach SpamAssassin seems like a
> bad idea to me.

Each of the mails is 100% spam, so what I'd like to do is have an
automated way to tune my rule scoring, or improve/add rules based on
what gets sent there.

If I have to inspect each message by hand, and manually craft
rules, then it doesn't seem like this will scale very well at all.

-- 
micah


spamtrap strategies

2020-05-15 Thread micah anderson


Hi all,

I'm wanting to set up a spam trap, that should receive nothing but actual
spam, and feed that into spamassassin in some way. I'm wondering the
best way to automate feeding that data back to the system.

Would it be best used for bayes tuning? It seems not, because it would
be 100% spam. Would it be better to use it for mass-check and contribute
some to the overall rule scoring? Or would it be better to just build
some kind of RBL out of whatever it receives?

Thanks for any ideas/suggestions!

-- 
micah


Re: google as biggest botnet, no kidding

2020-05-12 Thread micah anderson
Riccardo Alfieri  writes:

> Yes, we are seeing an awful lot of phishing sites hosted under 
> https://firebasestorage.googleapis.com
>
> I'd say that 99% of them can be catched by a simple regex though, but I 
> don't know how common those firebasestorage URLs are in normal emails.. 
> I personally have still to see a legit one.

We receive a *huge* amount of phishing attempts from firebasestorage. My
regular routine is to wake up, and report these to google safebrowsing,
but it doesn't seem to have much of an effect.

There *are* occasional, like 1%, false positives... but something needs
to happen here.
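
As an illustration of the "simple regex" approach, here is a minimal
Python sketch that pulls firebasestorage links out of a message body.
The pattern is a guess written for this example, not the regex referred
to above, and firebase_links is a made-up name:

```python
import re

# Illustrative pattern only: any http(s) URL on the firebasestorage host,
# stopping at whitespace, quotes or angle brackets.
FIREBASE_URL = re.compile(
    r"https?://firebasestorage\.googleapis\.com/[^\s\"'<>]*",
    re.IGNORECASE,
)

def firebase_links(body):
    """Return every firebasestorage URL found in the message body."""
    return FIREBASE_URL.findall(body)
```

Given the occasional false positives, a match is probably better used to
add score than to reject outright.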

-- 
micah


Spoofed From: names

2020-04-09 Thread micah anderson


Hi,

What is the current state of the art for dealing with mail that tricks
people using the "Name" part of the From? For example:

From: "supp...@example.com"

The "Real Name" part is used to put a fake email address of the actual
domain (example.com would be my domain, or gmail.com or something other
than air-compressor.ml).

This has come up before[0], but at the time generic solutions seemed
problematic due to various false positives, or missing features in
spamassassin itself. I'm wondering what the current state is now.

I can do a relatively easy meta-rule for my domain, something like this,
but I'm not sure how well this would work, or if there are better
methods now:

# display name (in quotes) contains an address at our domain
header __LOCAL_FROM_QUOTE_ISUS      From =~ /\".*\@example\.com\"/
# the actual address (in angle brackets) is not at our domain
header __LOCAL_FROM_CONTAIN_NOTUS   From !~ /<.*\@example\.com>/
meta TRICKY_FROM        ((( __LOCAL_FROM_QUOTE_ISUS ) + ( __LOCAL_FROM_CONTAIN_NOTUS )) > 1)
describe TRICKY_FROM    From has example.com in quotes, but not in path
score TRICKY_FROM       5



0. https://www.mail-archive.com/users@spamassassin.apache.org/msg100800.html
-- 
micah


Re: Spamhaus Technology contributions to SpamAssassin

2019-07-03 Thread micah anderson
Giovanni Bechis  writes:

> On 7/3/19 7:11 PM, Riccardo Alfieri wrote:
>> On 03/07/19 17:59, atat wrote:
>> 
>>> You say in documentation:
>>>
>>>  You should also drop, by default, all Office documents with macros.
>>>
>>> What plugin / method do You reccomend for that ?
>> 
>> I'm no expert in detecting macros, but there at least two ways of doing that 
>> that comes to mind:
>> 
>> - Clamav with the option OLE2BlockMacros

Reading up on OLE2BlockMacros in clamav, I'm very confused by
https://www.mail-archive.com/clamav-users@lists.clamav.net/msg42671.html

Specifically:

Setting 'OLE2BlockMacros Yes' effectively causes
'Heuristics.OLE2.ContainsMacros' to be returned, and disables all
official and unofficial signatures.

When 'OLE2BlockMacros Yes' this causes 'Heuristics.OLE2.ContainsMacros'
to be returned first and all other signatures that are not against
uncompressed macros are ignored. You only get one signature back and
that is the first one hit, which may be a 'soft' signature ie one you
mightn't discard an email on, such as Heuristics.OLE2.ContainsMacros,
even though 'hard' signatures official or unofficial might also have hit
if they had been run later.

> This has been superseded by 
> https://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/OLEMacro.pm
> the plugin is for trunk but it works out of the box in 3.4.3rc3 as well (some 
> work is needed to let it work on 3.4.2)

Can't these be blocked at the MTA level to be much more CPU friendly?

-- 
micah


Re: Scoring by registrar?

2019-07-01 Thread micah anderson
Sean Lynch  writes:

>>Having such a list would be very helpful for dealing with fast flux.
>
> SA already has this. It used fresh.fmb.la to detect domains registered within 
> the past couple of weeks.

It does? Do I need to enable something to get that?
-- 
micah


Re: Scoring by registrar?

2019-07-01 Thread micah anderson
Grant Taylor  writes:

>> A very large number (nearly all, in fact) of the spams I receive these 
>> days involve domains registered with Namecheap. I've received hundreds 
>> of spams involving .icu domains from what appear to be the same spammer. 
>> I also receive a large number of scams impersonating Bitmain, again 
>> using domains involving Namecheap.
>
> Is Namecheap just the registrar?  Or are they also hosting the DNS service?

As a Namecheap customer, you are making me want to move. That is good,
but it's also something you should consider, before you block the entire
registrar: there are a significant number of non-spamming Namecheap
customers that you would be cutting off if you did this. I understand
you want to put pressure on Namecheap, but the flip side of that is you
will be cutting yourself off from those domains in the process.

>> While Namecheap does suspend at least some domains within days of their 
>> being used in a campaign, it's clear that these are being treated as 
>> single-use domains, so this has very little impact on the spammers.

This sounds like Fast Flux - and it is not something that happens only
on Namecheap.

> I think there are also lists of domains that have been recently 
> registered.  Which might help if the single use domains were recently 
> registered.

Having such a list would be very helpful for dealing with fast flux.

-- 
micah


Re: multiplying in rules

2018-11-20 Thread micah anderson
"Bill Cole"  writes:

> On 20 Nov 2018, at 13:53, John Hardin wrote:
>
>> On Tue, 20 Nov 2018, micah anderson wrote:
> [...]
>>>> What it does do is prevent compiled rules from being installed. But 
>>>> as I
>>>> said it's the decimal fractions that cause it to fail and the above
>>>> rule doesn't need to contain decimal fractions.
>>>
>>> How can I do it without the fractions?
>>
>> Multiply everything by 10:  (__rulename * 4) ...etc... > 10
>
> Or replace every decimal fraction with an integer division, so '0.4' 
> becomes '(4 / 10)'

oh, of course. I was thinking that these amounts contributed to the
score, but they do not. Thanks for wiping away the grime from my brain.
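
For the archive, a quick sanity check of the multiply-by-10 trick,
written in Python rather than SpamAssassin config. It assumes each
sub-rule evaluates to 0 or 1 inside the meta expression, and confirms
that the integer form fires on exactly the same combinations as the 0.4
form:

```python
from itertools import product

def fires_decimal(hits):
    # original form: 0.4 per sub-rule hit, threshold 1
    return sum(0.4 * h for h in hits) > 1

def fires_integer(hits):
    # everything multiplied by 10: 4 per sub-rule hit, threshold 10
    return sum(4 * h for h in hits) > 10

# every combination of the four sub-rules hitting (1) or not (0)
combos = list(product((0, 1), repeat=4))
assert all(fires_decimal(c) == fires_integer(c) for c in combos)

# number of sub-rule hits at which the rule fires
firing_counts = sorted({sum(c) for c in combos if fires_integer(c)})
print(firing_counts)  # prints [3, 4]
```

So the rule can be written with (4 * __MAILBOX) and so on and a
threshold of > 10, and it still fires when at least three of the four
sub-rules hit.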


-- 
micah


Re: multiplying in rules

2018-11-20 Thread micah anderson
RW  writes:

> On Tue, 20 Nov 2018 12:53:18 -0500
> micah anderson wrote:
>
>> RW  writes:
>> 
>> > On Tue, 20 Nov 2018 12:38:24 -0500
>> > micah anderson wrote:
>> >  
>> >> I was doing multiplication in rules to add scores, like this:
>> >> 
>> >> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> >> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 *
>> >> __LOCAL_LIMIT))  
>> >> > 1)  
>> >> 
>> >> but now when I run spamassassin --lint, I'm told things like this:
>> >> 
>> >> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4  
>> >
>> > It's the decimal fractions. 
>> >
>> >> What should I do to fix that?  
>> >
>> > It should be fixed in the next release.  
>> 
>> ok, but until then, is the only option for me to disable these rules?
>> These are particularly important rules for stopping phishing attacks,
>> so I'd like to not disable them, but find some other kind of work
>> around!
>
> I don't believe it prevents the rule from working.

It prevents sa-compile from running because spamassassin --lint fails.

> What it does do is prevent compiled rules from being installed. But as I
> said it's the decimal fractions that cause it to fail and the above
> rule doesn't need to contain decimal fractions.

How can I do it without the fractions?

I've applied the patch from the repo to make it work.
-- 
micah


Re: multiplying in rules

2018-11-20 Thread micah anderson
RW  writes:

> On Tue, 20 Nov 2018 12:38:24 -0500
> micah anderson wrote:
>
>> I was doing multiplication in rules to add scores, like this:
>> 
>> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT))
>> > 1)
>> 
>> but now when I run spamassassin --lint, I'm told things like this:
>> 
>> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4
>
> It's the decimal fractions. 
>  
>> What should I do to fix that?
>
> It should be fixed in the next release.

ok, but until then, is the only option for me to disable these rules?
These are particularly important rules for stopping phishing attacks, so
I'd like to not disable them, but find some other kind of work around!


-- 
micah


multiplying in rules

2018-11-20 Thread micah anderson


I was doing multiplication in rules to add scores, like this:

meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 * __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT)) > 1)

but now when I run spamassassin --lint, I'm told things like this:

Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4

What should I do to fix that?

Thanks!

-- 
micah


Re: Current update channels

2018-09-20 Thread micah anderson
"Kevin A. McGrail"  writes:

> There are people asking me to put KAM.cf under the default sa-update
> crypto signature.  Technically, it's easy.  But it would have to be
> carefully considered as it's not a project ruleset.  Thoughts on that?

I would be interested in KAM as part of an update channel, it would make
updates more frequent. The only thing is I have to adjust KAM each time
I update it. For example, the political spam section is a bit dated and
has caused some frustrations for people.

-- 
micah


Re: Understanding ruleQA results

2018-08-14 Thread micah anderson
John Hardin  writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> John Hardin  writes:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>
> OK, I can see about adding some mobile MUA exclusions. Any FP headers you 
> can provide (directly) will be helpful. Go ahead and sanitize the 
> recipient info, I don't think that would be relevant to tuning this one.

I put 4 of the messages here:

https://pastebin.com/YuPtBQXN

thanks for your help!

micah


Re: Understanding ruleQA results

2018-08-14 Thread micah anderson
John Hardin  writes:

> On Tue, 14 Aug 2018, RW wrote:
>
>> On Tue, 14 Aug 2018 13:24:47 -0700 (PDT)
>> John Hardin wrote:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>>>
>>
>>>> I searched my pile of mail that I have from two ice ages ago, and I
>>>> did find 6 messages that were hits of this rule, one of them was
>>>> spam, five of them were this person trying to contact me.
>>>
>>> ...without a subject?
>>>
>>>>> Do you happen to be seeing FPs with this rule?
>>>>
>>>> Yes, its why I am investigating it. I think it is common for people
>>>> who are sending mail from their mobiles, where they use it more
>>>> like a quick chat instead of a 'regular mail'
>>>>
>>>> In fact, this person used:
>>>> X-Mailer: iPad Mail (15F79)
>>>
>>> OK, I can see about adding some mobile MUA exclusions. Any FP headers
>>> you can provide (directly) will be helpful. Go ahead and sanitize the
>>> recipient info, I don't think that would be relevant to tuning this
>>> one.

I'll provide some pastebin links in a separate email.

>> I don't know that this is particularly specific to mobile, lots of
>> people send emails with an empty subject.
>>
>> It sounds like the main cause would be a signature that contains the
>> senders name as the only thing in a line. That'll be why all the
>> FPs mentioned above came from the same person.

Yes, this person has as their signature their name on one line, and
their From: has that same name listed.

> Question: were those messages scored as spam?

yes, they were, will include the reports in the off-list email.

-- 
micah


Re: Understanding ruleQA results

2018-08-14 Thread micah anderson
John Hardin  writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> but how can I tell how many messages are part of the corpus?
>
> As RW said, hover over the percentages.

Thanks.

>> Also, the percentages seem very low: 1.5192% Spam, and .0005%
>> Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
>> what do I know... which is why I'm asking.
>
> It's not so much the raw amount of spam it hits, it's that it hits spam 
> that few other rules hit, or that it hits spam that other rules hit but 
> that doesn't score high enough with those other rules.
>
> You also want to look at the score-map section when evaluating a rule.

Is there an explanation of the score-map section somewhere?

For this one it says:

  scoremap  ham:  0  33.33%     1 *
  scoremap  ham:  1  66.67%     2 **
  scoremap spam:  1   0.08%    15 
  scoremap spam:  3   0.61%   121 
  scoremap spam:  4  90.24% 17791 
  scoremap spam:  5   2.69%   531 *
  scoremap spam:  6   4.54%   896 *
  scoremap spam:  7   1.10%   217 
  scoremap spam:  8   0.26%    52 
  scoremap spam:  9   0.40%    79 
  scoremap spam: 10   0.01%     2 
  scoremap spam: 11   0.05%     9 
  scoremap spam: 14   0.01%     2 

What are these columns and how can I interpret it?
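
One thing that can be checked from the numbers themselves: the
percentage column looks like each row's count divided by the total
count for that class (ham or spam). What the first number means is not
confirmed here; reading it as a score bucket is only a guess. A small
Python sketch of that check, with the spam rows copied from the
scoremap and a made-up function name:

```python
# Spam rows copied from the scoremap above: (first column, hit count).
SPAM_ROWS = [
    (1, 15), (3, 121), (4, 17791), (5, 531), (6, 896), (7, 217),
    (8, 52), (9, 79), (10, 2), (11, 9), (14, 2),
]

def share_per_bucket(rows):
    # Percentage of all hits in this class that fall into each row.
    total = sum(count for _, count in rows)
    return {bucket: round(100.0 * count / total, 2) for bucket, count in rows}

shares = share_per_bucket(SPAM_ROWS)
print(shares[4])
```

The recomputed shares match the percentages printed by ruleQA, so the
last numeric column is a message count and the percentage is that
count's share of the class.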

> It's not so much the raw amount of spam it hits, it's that it hits spam 
> that few other rules hit, or that it hits spam that other rules hit but 
> that doesn't score high enough with those other rules.

I searched my pile of mail that I have from two ice ages ago, and I did
find 6 messages that were hits of this rule, one of them was spam, five
of them were this person trying to contact me. 

> Do you happen to be seeing FPs with this rule?

Yes, that's why I am investigating it. I think it is common for people
who are sending mail from their mobiles, where they use it more like a
quick chat than 'regular mail'.

In fact, this person used:
X-Mailer: iPad Mail (15F79)


-- 
micah


Understanding ruleQA results

2018-08-14 Thread micah anderson


Hi,

I'm trying to understand the ruleQA results because I want to track
down how spammy the rule FRNAME_IN_MSG_NO_SUBJ really is.

I load the latest rules: 
http://ruleqa.spamassassin.org/20180813-r1837926-n/FRNAME_IN_MSG_NO_SUBJ/detail?s_corpus=1_g_over_time=1#overtime

and I see the S/O value is 1.0, which means a rule that hits only on
spam (a rule that only hits on ham is 0.0, a rule that doesn't tell you
anything is 0.5)... but how can I tell how many messages are part of
the corpus?

Also, the percentages seem very low: 1.5192% Spam, and .0005%
Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
what do I know... which is why I'm asking.
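
For reference, a minimal Python sketch of how an S/O figure can be
derived from the two percentages above. It assumes S/O is the spam hit
rate divided by the sum of the spam and ham hit rates; the function
name and the handling of a rule with no hits are assumptions made for
this sketch, not taken from the ruleQA code.

```python
def spam_over_overall(spam_pct, ham_pct):
    """Share of a rule's hit rate that comes from spam (1.0 = spam only)."""
    total = spam_pct + ham_pct
    if total == 0:
        # No hits at all: treated as uninformative (assumption for this sketch).
        return 0.5
    return spam_pct / total

print(round(spam_over_overall(1.5192, 0.0005), 3))
```

With 1.5192% spam and 0.0005% ham the ratio is just under 1 and rounds
to 1.000, so a displayed 1.0 does not have to mean zero ham hits.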

thanks!


-- 
micah


Re: SA MySQL DB maintenance

2018-07-17 Thread micah anderson
"Kevin A. McGrail"  writes:

> I think Bayes should be in redis though not SQL.

Curious to know why you think that?


Re: MISSING_SUBJECT

2018-06-14 Thread micah anderson
John Hardin  writes:

> On Tue, 12 Jun 2018, micah anderson wrote:
>
>> I had a message marked with:
>>
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>>
>> It did not have a subject, but it did have content (although only
>> encrypted)
>
> It may not be considering an encrypted message part to be a text body 
> part. What was the MIME type of that part?

pgp/mime

-- 
micah


Re: MISSING_SUBJECT

2018-06-13 Thread micah anderson
Matus UHLAR - fantomas  writes:

> On 12.06.18 19:37, micah anderson wrote:
>>2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>>Subject:
>>
>>It did not have a subject, but it did have content (although only
>>encrypted) it also hit:
>>
>>*  1.8 MISSING_SUBJECT Missing Subject: header
>>
>>which makes sense, because the mail did not have one, but have you
>>looked in your Spam folder lately? All spam has a subject, pretty much
>>always an informal survey of my trash heap showed 4 messages out of
>>400 did not have a Subject, and two of them were repeats.
>
> and what is your point?

The point is EMPTY_MESSAGE scores even though it did have content. But I
guess the point is that it had no 'text' parts, because the content was
only pgp/mime?

-- 
micah


Re: MISSING_SUBJECT

2018-06-12 Thread micah anderson
Reindl Harald  writes:

> Am 13.06.2018 um 01:37 schrieb micah anderson:
>> I had a message marked with:
>> 
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>> 
>> It did not have a subject, but it did have content (although only
>> encrypted) it also hit:
>> 
>> *  1.8 MISSING_SUBJECT Missing Subject: header
>> 
>> which makes sense, because the mail did not have one, but have you
>> looked in your Spam folder lately? All spam has a subject, pretty much
>> always
>
> no - there is ton of junk without a subject and sometimes even floods
> with no subject and no body at all

I believe you, however the message was not empty, it had encrypted
contents (and in fact was scored -1 because of that).


MISSING_SUBJECT

2018-06-12 Thread micah anderson



I had a message marked with:

2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
Subject:

It did not have a subject, but it did have content (although only
encrypted). It also hit:

*  1.8 MISSING_SUBJECT Missing Subject: header

which makes sense, because the mail did not have one. But have you
looked in your Spam folder lately? Pretty much all spam has a subject:
an informal survey of my trash heap showed that only 4 messages out of
400 did not have a Subject, and two of them were repeats.

-- 
micah


Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle bayes

2015-09-23 Thread micah anderson

Hi,

I'm getting these errors in my log files, quite regularly:

Sep 23 21:58:16 towhee spamd[25561]: Issuing rollback() due to DESTROY without 
explicit disconnect() of DBD::mysql::db handle bayes:0.0.0.0 at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1590,  line 2.

It appears that bayes is working, because I see logs like this:

Sep 23 22:02:19 towhee spamd[10768]: spamd: result: . -1 - 
AM_TRUNCATED,BAYES_00,CK_419SIZE,ENV_FROM_DIFF0,FORWARD_RELAY,HAS_REPLY_TO,HTML_MESSAGE,IP_REPEATING,MISSING_MID,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SUBJ_DATE
 
scantime=0.7,size=11555,uid=65534,required_score=5.0,rhost=0.0.0.0,raddr=0.0.0.0,rport=37464,mid=(unknown),bayes=0.000147,autolearn=disabled,shortcircuit=no

Line 1590 is in the sub learner_new, but I have set in local.cf:

local.cf:bayes_auto_learn 0
local.cf:bayes_learn_to_journal 0

It seems like the database is working fine...

any ideas?

thanks!
micah



trusted networks getting marked as spam

2014-10-24 Thread micah anderson

Hi,

I've got some machines that are running logcheck, which periodically
sends us mail with reports. Sometimes those mails contain spammy
content, because they quote mail server logs or web logs.

I don't want spamassassin to deal with these messages, I want them to
come through no matter what. I don't want them to contribute to bayes
scoring and I don't want them ever to end up as Spam.

Unfortunately, they do end up as Spam, mostly it seems because URIBL
scores are hitting before the SHORTCIRCUIT/ALL_TRUSTED stuff fires, so
for example:

X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07)
X-Spam-Flag: YES
X-Spam-Status: Yes, score=8.1 required=6.0 tests=ALL_TRUSTED,SHORTCIRCUIT,
URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL shortcircuit=ham
autolearn=disabled version=3.4.0

I've got the IP in trusted_networks and internal_networks, and I've got
a couple of shortcircuit rules, as follows:

# simple, non-network-based whitelists, locally-generated messages,
# messages via a trusted relay chain, simple
meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

meta SC_SPAM (USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_SPAM -950
shortcircuit SC_SPAM spam
score SC_SPAM 20

shortcircuit ALL_TRUSTED on

yet, the high scoring due to the URIBLs caused this to get classified as
Spam.

How can I get around that?

Thanks!
micah



update channel list

2012-01-18 Thread Micah Anderson

I've had the following channel list for a while:

updates.spamassassin.org
sought.rules.yerp.org
khop-bl.sa.khopesh.com
khop-blessed.sa.khopesh.com
khop-general.sa.khopesh.com
khop-sc-neighbors.sa.khopesh.com

but I suspect that some of these are no longer good. I was hoping folks
out there might be able to make some suggestions for improvements?

thanks,
micah

-- 





sa-learn --force-expire taking hours

2010-10-26 Thread Micah Anderson

I was investigating this morning why a number of spam messages were
coming through and found that they weren't scoring on bayes, because it
was unavailable. The database connection was working fine, but I noticed
that the nightly sa-learn --sync --force-expire had been running since
3am, which was 4 and a half hours ago:

root     26302  0.0  0.0   2440   892 ?        Ss   03:00   0:00 /bin/sh -c sa-learn --sync --force-expire >/dev/null 2>&1
root     26305  0.0  0.0  35492  2528 ?        S    03:00   0:04 /usr/bin/perl -T -w /usr/bin/sa-learn --sync --force-expire

I connected to the database and did a 'show processlist\g' and found a
number of really long running processes:

| Id     | User    | Host            | db    | Command | Time   | State        | Info
|  66652 | spamass | 127.0.0.1:55248 | bayes | Query   | 355113 | Sending data | SELECT count(*)
   FROM bayes_token
  WHERE id = '5'
AND ati | 

a bunch of NULL processes (what are these?):

| 463898 | spamass | 127.0.0.1:41393 | bayes | Sleep   |  10592 |              | NULL

and a handful of 'rollback' processes:

| 474169 | spamass | 127.0.0.1:35973 | bayes | Query   |   1078 | NULL         | rollback

Plus the various bayes processes that I expect, a sampling of which is below:

| 474756 | spamass | 127.0.0.1:34141 | bayes | Query   |    472 | end          | UPDATE bayes_token SET atime = '1288102083' WHERE id = '5' AND token IN ('???-6','??,'R???','Xt | 
| 475050 | spamass | 127.0.0.1:48442 | bayes | Query   |      5 | Updating     | UPDATE bayes_vars
  SET spam_count = spam_count + '1'
 WHERE id = '5'| 
| 475089 | spamass | 127.0.0.1:48669 | bayes | Query   |      0 | statistics   | SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
 FROM bayes_token
Any ideas what could be going on, or steps I could take to troubleshoot
this?

Thanks!
micah

-- 





Re: Bayes timeouts and database handle being DESTROY'd without explicit disconnect

2010-10-26 Thread Micah Anderson
Dominic Benson domi...@lenny.cus.org writes:

> On 19 Oct 2010, at 17:05, Micah Anderson wrote:
>
>> Hello,
>>
>> I'm running a busy mail server. We've got a bayes database on its own
>> server, with InnoDB tables.
>
> What is your total DB size / server RAM? Could you include a snapshot of the
> output of top from the DB server? I would guess that your problem is
> indexing/tuning or server capacity MySQL side rather than in SA, but without
> more data it is just a guess.

The database size is 2.74 GB.

$ free
             total       used       free     shared    buffers     cached
Mem:       8055876    6872740    1183136          0     584032    5403916
-/+ buffers/cache:     884792    7171084
Swap:      1959912     569432    1390480
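
For anyone reading the numbers: the "-/+ buffers/cache" row is derived
from the Mem: row by treating buffers and page cache as reclaimable. A
short Python sketch of that arithmetic, with the kB values copied from
the free output above (the function name is made up for illustration):

```python
# Values in kB, copied from the Mem: row of the free output above.
total, used, free, shared, buffers, cached = (
    8055876, 6872740, 1183136, 0, 584032, 5403916
)

def without_cache(used_kb, free_kb, buffers_kb, cached_kb):
    # Memory used by processes alone, and memory available once
    # buffers and page cache are counted as reclaimable.
    really_used = used_kb - buffers_kb - cached_kb
    really_free = free_kb + buffers_kb + cached_kb
    return really_used, really_free

print(without_cache(used, free, buffers, cached))
```

So most of the 8 GB on this box is page cache rather than memory held
by processes.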

top - 07:26:39 up 10 days, 20:37,  1 user,  load average: 9.24, 6.80, 6.15
Tasks:  24 total,   2 running,  22 sleeping,   0 stopped,   0 zombie
Cpu(s): 83.3%us, 16.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   8055876k total,  6890032k used,  1165844k free,   584364k buffers
Swap:  1959912k total,   569432k used,  1390480k free,  5405264k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
10744 mysql     20   0  655m 110m 5500 S  190  1.4   9296:14 mysqld
10765 stunnel4  20   0  123m 109m 1416 S    2  1.4 179:38.73 stunnel4
    1 root      20   0  1984  636  548 S    0  0.0   2:40.15 init
  397 bind      20   0 82856  23m 2632 S    0  0.3   0:46.72 named
 1812 root      20   0  3120 1176  772 S    0  0.0   0:15.04 syslog-ng
 3551 messageb  20   0  2488  648  488 S    0  0.0   0:00.00 dbus-daemon
 3610 nobody    20   0  6368 2668  888 S    0  0.0   0:11.94 nagios-statd
 4828 root      20   0  5484 1824 1476 S    0  0.0   0:09.44 master
10707 root      20   0  3784 1276 1076 S    0  0.0   0:00.02 mysqld_safe
10745 root      20   0  2892  608  532 S    0  0.0   0:00.00 logger
10760 stunnel4  20   0  3836  688  348 S    0  0.0   1:25.14 stunnel4
10761 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.94 stunnel4
10762 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.24 stunnel4
10763 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.45 stunnel4
10764 stunnel4  20   0  3836  692  352 S    0  0.0   1:20.77 stunnel4
11311 root      20   0  2044  888  704 S    0  0.0   0:09.02 cron
15444 postfix   20   0  5496 1788 1452 S    0  0.0   0:00.00 pickup

I'm averaging around 150 mysql threads, with peaks during peak mail
times. 

>> and a few of these, although not that many:
>>
>> Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from
>> bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060) still
>> Active at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722
>
> Try an EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = some value;
> as you know it to be slow it might give a clue where to look to improve
> performance. Or try turning the general query log on for a while and see what
> queries are taking up time. MonYog is quite a nice frontend to this, but you
> can do it by hand fairly simply.

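mysql> EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = 5;
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
| id | select_type | table        | type | possible_keys     | key               | key_len | ref   | rows | Extra |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
|  1 | SIMPLE      | bayes_expire | ref  | bayes_expire_idx1 | bayes_expire_idx1 | 2       | const |  198 |       |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
1 row in set (0.00 sec)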

Note, this might be related to the post I made today about sa-learn
--force-expire taking hours...

micah



Bayes timeouts and database handle being DESTROY'd without explicit disconnect

2010-10-19 Thread Micah Anderson

Hello,

I'm running a busy mail server. We've got a bayes database on its own
server, with InnoDB tables. 

I'm seeing a number of these entries in my log files and am struggling
to determine what could be causing them and how to fix them:

Oct 19 07:02:10 spamd3 spamd[27474]: learn: exceeded time limit in pms learn
Oct 17 06:30:12 spamd3 spamd[25651]: plugin: eval failed: bayes: (in learn) 
__alarm__ignore__(15190)
Oct 17 06:30:42 spamd3 spamd[25598]: plugin: eval failed: bayes: (in learn) 
child processing timeout at /usr/sbin/spamd line 1283, GEN1295 line 185.

I get quite a few of these:

Oct 19 07:02:19 spamd3 spamd[18746]: Issuing rollback() for database handle 
being DESTROY'd without explicit disconnect() at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1516, GEN19133 line 2.

and a few of these, although not that many:

Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from 
bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060)still Active 
at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722

Oct 19 05:33:13 spamd3 spamd[1630]: bayes: db_seen corrupt: value='1287482415' 
for 5d6fb52248450ee7528848c3a78b5a0650a24...@sa_generated, ignored at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 397, GEN18675 line 
112.

thanks for any insights!
micah




Re: dcc: [26896] terminated: exit 241

2010-04-22 Thread Micah Anderson
Ted Mittelstaedt t...@ipinc.net writes:

> Actually it's not even that.  The notion that Debian spent effort
> detecting and removing DCC source is rather farfetched.

Sorry, but you are pretty off here. Debian does this all the time. I'm
an official Debian Developer and I have personally been involved in
doing this a few times.

> Because Linux distros are so large, many freely available
> commercially-licensed apps - such as device drivers - some of which
> also do not carry your allowed to distribute this licenses, get
> sucked up into the distributions.

Unless you can find an example, you are making a specious argument. Do
you know the process to get software into Debian?

> Some of this happens by users contributing them and not reading the
> licensing closely enough, but quite a lot of it happens by commercial
> companies deliberately inserting their stuff in the distros.

First, 'users' do not contribute applications to Debian, that isn't how
it works. Secondly, even if an official Debian Developer (who actually
is the only person permitted to contribute things to the Debian archive)
happens to do as you assert and not read the licensing, then the Debian
FTP-masters, whose role it is to specifically determine if the Debian
Developer did their due diligence in checking the license restrictions,
would reject that package.

I guess the fact that I had to explain this answers my previous
question: you do not understand how software gets into Debian. I would
advise you to educate yourself before making arguments that by their
very nature demonstrate your misunderstanding; it weakens your argument.

[snip]

> It's also generally understood that if a commercial app seller
> doesen't like it they have the right to complain and get an immediate
> cessation of inclusion of their apps in a distro.  That is why I
> suspect happened here.

Sorry, but if a DFSG-licensed application is put in Debian, no
commercial app seller has any right to complain and get an immediate
cessation of inclusion of their apps in a distro. It doesn't work that
way.

> Distributed Checksum Clearinghouse quite obviously feels that they have
> captured enough fishes in the ocean and are making plenty of money now
> and so do not require all of the free advertising that inclusion of
> their source in Debian gives them.  Quite obviously they complained
> and their stuff was withdrawn as a result.

Your conclusions are amazing, but that does not make them any more
right.

micah



spamc randomization

2010-04-21 Thread Micah Anderson

I'm using the --randomize option to spamc, along with the -d switch
given a hostname which resolves to multiple IP addresses. 

Does --randomize get passed the full set of IPs that are resolved from
the -d hostname, and then randomize those IPs? In other words, can you
have one hostname (say 'spamd') which resolves to multiple IPs, which
are then handed to --randomize to be picked from? That seems to be how
it is described, but I could be misinterpreting it.

The description of the --randomize option in the man page says, 'the IP
addresses returned for the hosts given by the -d switch', and the -d
switch says you can do this:

   If host resolves to multiple addresses, then spamc will
   fail-over to the other addresses, if the first one cannot be
   connected to.  It will first try all addresses of one host
   before it tries the next one in the list.  


I'm also a little unclear what the --randomize man section means when it
says it will try only three times, though. Say the hostname 'spamd'
resolves to four IP addresses: 192.168.1.2, 192.168.1.3, 192.168.1.4,
192.168.1.5. After -d resolves that hostname into those IPs, they are
passed to the --randomize function, and one of those four is picked. If
the first one doesn't respond, it tries another one; if that fails, it
tries a final one and then gives up (not trying all four)?

Did I read this right? I appreciate any second eyes on my interpretation
here. 
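
To make that reading concrete, here it is as a small Python sketch.
This is only the interpretation of the man page described above, not
how spamc is actually implemented; pick_server, can_connect and
max_tries are made-up names, and the three-try limit is taken from the
man page wording.

```python
import random

def pick_server(addresses, can_connect, max_tries=3, rng=random):
    """Shuffle the resolved addresses, then try at most max_tries of them."""
    candidates = list(addresses)
    rng.shuffle(candidates)
    for address in candidates[:max_tries]:
        if can_connect(address):
            return address
    return None  # gave up without necessarily trying every address

addresses = ["192.168.1.2", "192.168.1.3", "192.168.1.4", "192.168.1.5"]
# Hypothetical situation: only one of the four hosts is up.
print(pick_server(addresses, lambda ip: ip == "192.168.1.5"))
```

If this reading is right, then with four addresses and only one of them
up, a message can fail even though a working spamd exists, because the
working address may be the one that is never tried.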

thanks,
micah







Re: How do I filter out phishing email?

2010-04-21 Thread Micah Anderson
Jari Fredriksson ja...@iki.fi writes:

> On 14.4.2010 18:57, yongke wrote:
>
>> Well, we send emails on behalf of clients, and so we are trying catch
>> phishing spam before they are sent out.  Since the email aren't sent yet, we
>> had to generate a mock email for SA.  The header in the example is what we
>> THINK the headers will be when they are actually sent out.
>>
>> When you tried it with your SA, I assume you didn't change any headers?  If
>> that's the case, then it should still work.  I guess I didn't setup SA
>> correctly?
>
> I did not change anything. And I think I have pretty default scores on
> the rules.
>
> I have following rule sets in my channels:
>
> 90_2tld.cf.sare.sa-update.dostech.net

In a previous thread[0], it was mentioned that you should not be using the
above channel (or 90_3tld.cf) because these files have been merged into
3.3.1 and are released as 20_aux_tlds.cf

micah


0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/127703 



Re: dcc: [26896] terminated: exit 241

2010-04-21 Thread Micah Anderson
Michael Scheidell scheid...@secnap.net writes:

> On 4/15/10 5:35 PM, Micah Anderson wrote:
>
>> The Distributed Checksum Clearinghouse source carries a license that is
>> free to organizations that do not sell filtering devices or services
>> except to their own users and that participate in the global DCC
>> network. . . you may not redistribute modified, fixed, or improved
>> versions of the source or binaries. You also can't call it your own or
>> blame anyone for the results of using it.
>
> Which seems silly for debian to remove it, since many of the
> blacklists in SA are by default, licensed similar (free for non
> commercial use, paid if  xxx queries).  maybe debian should look
> through and remove ALL 'dual licensed' software, and when you install
> SA from the RPM's, disable the dual licensed RBL's.

You misunderstand Debian's role and license guidelines. Debian is a
software distributor, and as such it is not silly for Debian to stop
distributing software (ie. dcc) when distributing that software violates
its rules. The blacklists enabled in SA by default are not software,
they are simply hostnames that the Spamassassin software
uses. Configured hostnames are not distribution restricted, and arguably
not even 'software'. There is no software distribution restriction
involved in having those blacklists enabled in SA that violates Debian's
software distribution terms. The software that is distributed is
Spamassassin, which has a fully compliant Debian software distribution
license, not the blacklists that are enabled by default in Spamassassin.

The blacklists do have a restricted use license, but that is something
else altogether.

'dcc', on the other hand, is software, and it carries a license which
restricts its distribution. Debian, as a software distributor, therefore
has to decide, based on its own policy, whether it is willing to accept
such a distribution restriction. Debian has the DFSG, its guidelines for
what is acceptable for distribution, and the license that 'dcc' carries
does not satisfy those criteria.

> Or, hey, lets pretend the people installing debian are smart enough to
> be able to make up their own mind if they fit the free license model.

People are free to do that. Debian won't distribute it for those people,
but people are free to put whatever they like on their systems.

> it IS a good service, and SA 3.3x supports the reputation query
> directly now in the commercial license.
> Some things to understand,  (normal language vs legal talk)

I believe it is a good service. If I could get updated software, with
security upgrades, from Debian, I would use it.

micah




Re: dcc: [26896] terminated: exit 241

2010-04-15 Thread Micah Anderson
Michael Scheidell scheid...@secnap.net writes:

> On 4/12/10 4:55 PM, Micah Anderson wrote:
>> I'm getting a lot of these log entries ever since I've upgraded:
>>
>> Apr  9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241
>
> what version of dcc are you running?

This is version '1.2.74-4' from Debian... but now looking closer, it
seems as if dcc was removed after Debian Etch. It seems that it was
removed because the upstream authors changed its license to non-free
(according to Debian's DFSG) in version 1.30. This also means that it
has not been available in Ubuntu either since Dapper.


The Distributed Checksum Clearinghouse source carries a license that is
free to organizations that do not sell filtering devices or services
except to their own users and that participate in the global DCC
network. . . you may not redistribute modified, fixed, or improved
versions of the source or binaries. You also can't call it your own or
blame anyone for the results of using it.

So I guess I just will remove dcc, that is a shame, it seems like a good
service.


> what did you upgrade?

Sorry, I upgraded from Debian etch to Debian Lenny, along with that came
an upgrade to spamassassin.

micah



-- 
It is no measure of health to be well adjusted to a profoundly sick society. 
- J Krishnamurti 



Re: New log errors on upgrading

2010-04-15 Thread Micah Anderson
Mark Martinec mark.martinec...@ijs.si writes:

>> More new errors that I am getting from an upgrade to spamassassin 3.3:
>
> 3.3.0 ?

Good question... indeed the version is 3.3.0.

>> Use of uninitialized value $start_time in addition (+) at
>> /usr/sbin/spamd line 1382, GEN2073
>
> That was fixed in 3.3.1 .

Great, I didn't see that in the changelog, but I'm sure it was. I will
update before I bug you further about these! :)

>> and also the following:
>>
>> spf: lookup failed: Can't locate object method new_from_string via
>> package Mail::SPF::Mech::All at /usr/share/perl5/Mail/SPF/Record.pm
>> line 227.
>>
>> I'm using libmail-spf-perl version: 2.005-1
>>
>> Might this be fixed in a newer perl version?
>
> No idea. Try Mail-SPF-v2.007, the 2.005 is three years old.

I am now running v2.007 to see if that fixes it; I suspect it will. If
it does I will make sure the Debian package gets that noted so others
won't run into this.

thanks for your answers,
micah



dcc: [26896] terminated: exit 241

2010-04-12 Thread Micah Anderson

I'm getting a lot of these log entries ever since I've upgraded:

Apr  9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241

Obviously this is related to dcc, but I am not finding anything about
what 'exit 241' is, and how I can adjust things so I no longer get them
(or maybe they are normal and I need to start ignoring them?)

Does anyone have a clue about these? thanks!
micah


-- 
It is no measure of health to be well adjusted to a profoundly sick society. 
- J Krishnamurti 



New log errors on upgrading

2010-04-12 Thread Micah Anderson

More new errors that I am getting from an upgrade to spamassassin 3.3:

Use of uninitialized value $start_time in addition (+) at
/usr/sbin/spamd line 1382, GEN2073

and also the following:

spf: lookup failed: Can't locate object method new_from_string via
package Mail::SPF::Mech::All at /usr/share/perl5/Mail/SPF/Record.pm
line 227.

I'm using libmail-spf-perl version: 2.005-1

Might this be fixed in a newer perl version?

Micah




meaning of child cleanup

2010-04-01 Thread Micah Anderson

Since upgrading to the new spamassassin, I'm seeing the following two
log entries related to cleanup of child PIDs:

1. Apr  1 08:26:38 spamd2 spamd[396]: spamd: handled cleanup of child
pid [31720] due to SIGCHLD: INTERRUPTED, signal 2 (0002)

2. Mar 28 18:00:15 spamd2 spamd[17562]: spamd: handled cleanup of child
pid [391] due to SIGCHLD: exit 0

If I were to guess, the second one seems to be when things are acting
right, the first one seems problematic, and I'm trying to determine what
causes it. The logs for that process aren't particularly interesting,
they are just like any others, with various prefork childstate entries:

Mar 28 06:25:35 spamd2 spamd[396]: prefork: child states: II
Mar 28 06:25:36 spamd2 spamd[396]: prefork: child states: IB

but nothing particularly egregious looking. 

Can someone help me clarify what causes an INTERRUPTED signal? Should I
worry about it? Should I ignore it in logcheck?
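
One way to read the two log lines, as a minimal sketch: assuming the "(0002)" in the first entry is the raw POSIX wait status of the child, it can be decoded with the standard wait macros. The function name below is hypothetical, and this only shows which signal the number maps to, not who sent it.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status roughly the way a parent might log it."""
    if os.WIFSIGNALED(status):
        # The child was terminated by a signal; look up its symbolic name.
        signum = os.WTERMSIG(status)
        return f"killed by signal {signum} ({signal.Signals(signum).name})"
    if os.WIFEXITED(status):
        # The child called exit() or returned from main normally.
        return f"exit {os.WEXITSTATUS(status)}"
    return f"other status {status:#06x}"


# Assumption: 0x0002 is the status behind "INTERRUPTED, signal 2 (0002)".
print(describe_wait_status(0x0002))
# Assumption: 0 is the status behind "exit 0".
print(describe_wait_status(0x0000))
```

On Linux, signal 2 is SIGINT, the same signal a Ctrl-C or "kill -INT" delivers, so the first entry would mean the child was interrupted from outside rather than exiting on its own.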

thanks!
micah



-- 
It is no measure of health to be well adjusted to a profoundly sick society. 
- J Krishnamurti 



Re: Botnet plugin still relevant?

2010-03-22 Thread micah anderson
On Wed, 17 Mar 2010 14:45:53 -0700, John Rudd jr...@ucsc.edu wrote:
 Some people need to put in some alternate values for DNS timeouts, but
 if you've got a local caching name server, you typically don't need
 that.
 
 There aren't any actual bugs in it that I'm aware of, so I haven't
 released a new version.  As I see it, there isn't a need (and that is
 a somewhat controversial statement with some of the more opinionated
 people around here).
 
 I do still see some things that get nailed by it ... but there's lots
 of those same hosts that get caught by the Spamhaus PBL.  So, it kind
 of depends on what you're doing with PBL and/or Zen, as to whether or
 not you need Botnet.   But, there are still plenty of things coming
 from that class of hosts, so if you don't use one, I'd definitely
 recommend using the other.

Yeah, I've been having problems recently which I think are related to me
using both Zen/PBL and the Botnet plugin, with Botnet weighted at a score
of 5. Even if I lowered it to 3 it would still be too much.

Many users are complaining and when I finally get some useful messages
with headers to analyze I am finding something like the following:

X-Spam-Report: 
*  3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
*  [213.6.61.151 listed in zen.dnsbl]
*  1.0 RCVD_IN_BRBL RBL: Received via relay listed in Barracuda RBL
*  [213.6.61.151 listed in b.barracudacentral.org]
*  1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
*  [213.6.61.151 listed in bb.barracudacentral.org]
*  0.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
*  [213.6.61.151 listed in dnsbl.sorbs.net]
*  0.8 SPF_NEUTRAL SPF: sender does not match SPF record (neutral)
*  5.0 BOTNET Relay might be a spambot or virusbot
*  [botnet0.8,ip=213.6.61.151,rdns=a61-151.adsl.paltel.net,maildomain=palnet.com,client,ipinhostname,clientwords]
*  1.0 RDNS_DYNAMIC Delivered to internal network by host with
*  dynamic-looking rDNS

This brings it over the 8 threshold, although it is a legitimate email
from a user who has unfortunately been saddled with a dynamic IP that
previously was used by a spammer. No amount of explanation to these
users about this is going to assuage their feelings, and there isn't
really anything that can be done by them. They can complain to their ISP
I guess, they could also find another ISP, but these are not
particularly productive steps towards resolving this problem.
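
To make the arithmetic in that report concrete, here is a small sketch that adds up the scores exactly as listed above and compares them with the threshold of 8. The helper name is made up for illustration; the scores and rule names are copied from the report.

```python
from decimal import Decimal

# (score, rule) pairs copied from the X-Spam-Report above.
REPORT = [
    ("3.3", "RCVD_IN_PBL"),
    ("1.0", "RCVD_IN_BRBL"),
    ("1.4", "RCVD_IN_BRBL_LASTEXT"),
    ("0.0", "RCVD_IN_SORBS_DUL"),
    ("0.8", "SPF_NEUTRAL"),
    ("5.0", "BOTNET"),
    ("1.0", "RDNS_DYNAMIC"),
]
THRESHOLD = Decimal("8")


def total(report, skip=(), override=None):
    """Sum the scores, optionally skipping rules or overriding their score."""
    override = override or {}
    return sum(
        (Decimal(override.get(name, score)) for score, name in report
         if name not in skip),
        Decimal("0"),
    )


as_scored = total(REPORT)
botnet_at_3 = total(REPORT, override={"BOTNET": "3.0"})
without_botnet = total(REPORT, skip={"BOTNET"})

for label, value in [("as scored", as_scored),
                     ("BOTNET lowered to 3", botnet_at_3),
                     ("BOTNET removed", without_botnet)]:
    print(f"{label}: {value} (spam: {value >= THRESHOLD})")
```

With these numbers the message only drops under the threshold when BOTNET is removed entirely, which matches the observation that lowering it to 3 would not be enough.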

I'm interested in other suggestions that I can offer people as alternatives,
but until then I think I may need to remove Botnet from the equation.

micah




Re: Low scores

2010-03-17 Thread micah anderson
On Fri, 12 Mar 2010 15:44:21 -1000, Julian Yap julianok...@gmail.com wrote:
 On Thu, Mar 11, 2010 at 7:58 AM, micah anderson mi...@riseup.net wrote:
 
  On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap julianok...@gmail.com
  wrote:
   Just wanted to add that this particular line is incorrect:
   meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||
   USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||
   USER_IN_BLACKLIST)
  
   That will have Blacklisted email filters classified as ham.
 
  Interesting, thanks for the reply from an old thread.
 
  I got this list from:
  http://wiki.apache.org/spamassassin/ShortcircuitingRuleset which seems
  to be something that Justin Mason put together. I have CC'd Justin on
  this email.


  Which has the difference of also including SUBJECT_IN_WHITELIST, and
  SUBJECT_IN_BLACKLIST... but now I am wondering if this is the right
  thing to do.

I actually removed the SUBJECT_IN rules as this makes it so any
individual user who can whitelist/blacklist a subject can shortcircuit
for everyone.

  I'm very curious about resolving this, it does seem like a bad setup and
  it is being taken as gospel from the spamassassin wiki, but perhaps
  there is something that we are not understanding here that Justin can
  clarify?
 
 
 I'm pretty sure yours is wrong.  You need to take out the rules which
 apply to Spam in spam short circuiting.

I agree with you, it's amazing that this has been wrong on the wiki since
2007! I went to go update the wiki today, and found that you had just
done it. Thanks for doing that!

Micah




Botnet plugin still relevant?

2010-03-17 Thread Micah Anderson

Hi,

I've been using the Botnet plugin version 0.8 for some time now, and the
plugin itself has been around since 2003 or so. I'm just curious to test
the waters and see what others think about the relevance in 2010 of
this plugin. Does it still contribute in positive ways to your setup? I
do not see a newer version of the plugin since 2007, is there a newer
version than 0.8?

Did you do any configuration of it beyond its defaults? Does the
proliferation of individuals on dynamically assigned cable/dsl modems
cause the plugin to misfire too often?

I've had a number of complaints somewhat recently about the last point,
and I don't have much of a solution to the situation where a user is
stuck with the dynamically assigned IP that previously a spammer was
occupying, except to explain that is the situation and eventually it
will change.

thanks for any thoughts or experiences with this plugin!

micah

ps. I notice it is not listed on
http://wiki.apache.org/spamassassin/CustomPlugins and I wonder the
reason why?



sa-update channels

2010-03-17 Thread Micah Anderson

I'm trying to find out what the current state of the art is for plugins
and channel updates.

What are people using nowadays? I just reviewed my plugins and ended up
deleting Freemail because it has been pulled into SpamAssassin core;
removed the postcards plugin because the original source is now 404 and
it is a very old rule; removed the iXhash plugin because it was spewing
a lot of perl errors and I was not seeing a lot of hits.

I've still got 20_sought_fraud, Botnet, and PDFinfo... but nothing
beyond that. 

For channels I've been using:

updates.spamassassin.org
sought.rules.yerp.org
saupdates.openprotect.com 

But I wonder if the last two are still relevant, or if there are other
lists to use instead?

Thanks for any advice,
micah




Re: Low scores

2010-03-11 Thread micah anderson
On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap julianok...@gmail.com wrote:
 Just wanted to add that this particular line is incorrect:
 meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||
 USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||
 USER_IN_BLACKLIST)
 
 That will have Blacklisted email filters classified as ham.

Interesting, thanks for the reply from an old thread. 

I got this list from:
http://wiki.apache.org/spamassassin/ShortcircuitingRuleset which seems
to be something that Justin Mason put together. I have CC'd Justin on
this email.

This list specifies that this was a good shortcircuit rule to have first
because these are non-network-based whitelists, locally-generated
messages, messages via a trusted relay chain, simple non-network based
blacklists.

Mine now reads:

meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||SUBJECT_IN_WHITELIST||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST||SUBJECT_IN_BLACKLIST)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

Which has the difference of also including SUBJECT_IN_WHITELIST, and
SUBJECT_IN_BLACKLIST... but now I am wondering if this is the right
thing to do.
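
A rough model of the concern, as a sketch only: the meta rule is a plain OR over its terms, so any blacklist hit makes SC_HAM true and the message is shortcircuited as ham. The Python below just mimics that boolean logic; the HAM_ONLY split is my assumption about what a corrected list might look like, not something taken from the wiki.

```python
# Terms of the SC_HAM meta rule as it reads in this message.
ORIGINAL_SC_HAM = [
    "USER_IN_WHITELIST", "USER_IN_DEF_WHITELIST", "USER_IN_ALL_SPAM_TO",
    "SUBJECT_IN_WHITELIST", "NO_RELAYS", "ALL_TRUSTED",
    "USER_IN_BLACKLIST_TO", "USER_IN_BLACKLIST", "SUBJECT_IN_BLACKLIST",
]

# Assumption: the blacklist-type terms are the ones that do not belong in a
# ham shortcircuit.
SPAM_SIDE = {"USER_IN_BLACKLIST_TO", "USER_IN_BLACKLIST", "SUBJECT_IN_BLACKLIST"}
HAM_ONLY = [rule for rule in ORIGINAL_SC_HAM if rule not in SPAM_SIDE]


def meta_or(rules, hits):
    """True if any of the listed rules hit, like an (A||B||C) meta rule."""
    return any(rule in hits for rule in rules)


blacklisted_message = {"USER_IN_BLACKLIST"}
print("original list treats it as ham:", meta_or(ORIGINAL_SC_HAM, blacklisted_message))
print("ham-only list treats it as ham:", meta_or(HAM_ONLY, blacklisted_message))
```

If that reading is right, the blacklist terms would presumably move to a separate meta rule that shortcircuits as spam.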

I'm very curious about resolving this, it does seem like a bad setup and
it is being taken as gospel from the spamassassin wiki, but perhaps
there is something that we are not understanding here that Justin can
clarify?

micah




Re: two databases

2009-06-05 Thread Micah Anderson
Michael Grant michael.gr...@gmail.com writes:

 I did not realize one could store the bayes scores in sql.

 So I'd store the bayes scores on a third server and let both mxes use
 the same database.

I did this: I put my bayes in mysql and pointed two different spamd
machines at it, but I had severe problems that I could not resolve. I
posted to the list[0] about the problems.

The basic problem was that as soon as I fired up the second server it
immediately started blocking on the bayes work. Average scantimes went from
1-2 seconds up to 35+ and the max children got eaten up by blocking on
the bayes work, to the point where it was pointless because too many
processes were blocked. Disabling the bayes_sql stuff on one of the
machines dropped the scantimes back to their expected average of 1-2
seconds (but of course none of the BAYES tests will fire and
autolearning fails).

My mysql server is its own machine, it was local to the first spamd
(local LAN) and remote to the second (over the net). I eliminated any
hostname lookup problems, obviously couldn't eliminate network latency,
but that shouldn't have caused such a severe result. I'm running with
InnoDB tables, so I shouldn't have any row-level locking issues... in
any case I might have had some issues because my MySQL database needed
to be optimized, but I was not able to determine how and now I just run
one of the spamd's without bayes, which is not too bad because my bayes
database seems to be totally worthless at the moment. :P

micah

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673



Bayes learning trusted networks mailing list email

2009-06-05 Thread Micah Anderson

I get a significant amount of spam that comes through mailing lists that
I am legitimately subscribed to, either they are the administration
emails asking me if I want to approve the email or not, or they are
messages that make it through the list.

These messages are either hitting ALL_TRUSTED, because they come from
mailing lists on my networks, or are tagged with a clear
untrusted-relays list. In other words, I've got my trusted_networks set up
so that SA knows about networks that I trust to be sending legitimate
email (they are not spam originators). Obviously spam still gets through,
but it comes from hops previous to these networks. If I understand
things properly, because I've got these set up in my trusted_networks,
these previous hops will be checked in RBLs, so the spam is more
detectable. For example, the debian servers do send some spam to me, but
the Received: headers in the emails are correct, so if the server's
address is in trusted_networks, then SA will look up the address debian
got the email from in RBLs.
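
My understanding of that behaviour, as a simplified sketch: walk the relay chain from the newest hop backwards and pick the first address that is not inside a trusted network; that is the one worth looking up in RBLs. This is not how SA is implemented (it parses Received: headers and has more settings than this); the function name and the documentation-range addresses are made up for illustration.

```python
import ipaddress


def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the newest relay that is outside every trusted network.

    relay_ips is ordered newest hop first, like reading Received: headers
    from the top of a message. Returns None if every hop is trusted.
    """
    networks = [ipaddress.ip_network(net) for net in trusted_networks]
    for ip in relay_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in net for net in networks):
            return ip
    return None


# Hypothetical chain: trusted list server <- outside sender <- older hop.
trusted = ["192.0.2.0/24"]
chain = ["192.0.2.10", "203.0.113.7", "198.51.100.99"]
print(first_untrusted_relay(chain, trusted))

# If every hop is trusted there is nothing external to check.
print(first_untrusted_relay(["192.0.2.10"], trusted))
```

In the second case nothing external is left to look up, which is roughly the situation where ALL_TRUSTED fires.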

What I am unsure of is if I am poisoning my bayes by reporting these
messages that make it through as spam. Should I be just deleting them?
The tokens that are legitimate that will end up as collateral damage are
going to be the list footers, the list administration messages, and
potentially other pieces.

I'm hoping I can identify why my bayes database is so bad (it thinks
everything is BAYES_00 now), and if this is why I will want to change my
training behavior.

thanks,
micah



FreeMail.bl installation instructions

2009-06-05 Thread Micah Anderson

The FreeMail.pm installation instructions are a little thin:

### Install:
#
# Please add loadplugin to init.pre (so it's loaded before cf files!):
#
# loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm

My understanding, and please correct me if I am wrong, is that you
actually need to do this:

# 1. Install FreeMail.pm in /etc/spamassassin
#
# 2. Add the following loadplugin to init.pre:
#
# loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm
#
# 3. Download http://sa.hege.li/FreeMail.cf to /etc/spamassassin
#
# 4. Download http://sa.hege.li/freemail_domains.cf to /etc/spamassassin

I knew about the FreeMail.cf because I've used SA plugins before, but I
had no idea about the domain list. Might be good to make these
instructions a little more explicit, so that others will also win.

Micah



Re: two databases

2009-06-05 Thread Micah Anderson
* Michael Grant michael.gr...@gmail.com [2009-06-05 10:26-0400]:
 On Fri, Jun 5, 2009 at 16:08, Micah Anderson mi...@riseup.net wrote:
  Michael Grant michael.gr...@gmail.com writes:
 
  I did not realize one could store the bayes scores in sql.
 
  So I'd store the bayes scores on a third server and let both mxes use
  the same database.
 
  I did this, but my bayes in mysql and pointed two different spamd
  machines at it, but I had severe problems that I could not resolve. I
  posted to the list[0] about the problems.
 
  The basic problem was that as soon as I fired up the second server it
  immediately starts blocking on the bayes work. Average scantimes go from
  1-2 seconds up to 35+ and the max children get eaten up by blocking on
  the bayes work to the point where its pointless because too many
  processes are blocked. Disabling the bayes_sql stuff on one of the
  machines dropped the scantimes back to their expected average of 1-2
  seconds (but of course none of the BAYES tests will fire and
  autolearning fails).
 
  My mysql server is its own machine, it was local to the first spamd
  (local LAN) and remote to the second (over the net). I eliminated any
  hostname lookup problems, obviously couldn't eliminate network latency,
  but that shouldn't have caused such a severe result. I'm running with
  InnoDB tables, so I shouldn't have any row-level locking issues... in
  any case I might have had some issues because my MySQL database needed
  to be optimized, but I was not able to determine how and now I just run
  one of the spamd's without bayes, which is not too bad because my bayes
  database seems to be totally worthless at the moment. :P
 
  micah
 
  0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673
 
 
 
 Wow.  I did not get around to setting this up yet.  But on the MySQL
 front, did you try enabling the query cache by adding this to the
 mysql command line?
 
 --maximum-query_cache_size=1M

I presume this setting is the same in my.cnf:
query_cache_limit   = 1048576

I don't recall all the things I tried, but it seems worth trying again,
this time with a fresh approach. 

 Also, a tool I used a lot to help debug this sort of issue was mytop.

I've never had too much luck with mytop, but I have found the
tuning-primer.sh to work well: http://www.day32.com/MySQL/

micah




Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Micah Anderson
Adam Katz antis...@khopis.com writes:

 Micah Anderson wrote:
 Also, to see how experienced your Bayes knowledge is - use $ sa-learn
 --dump magic
 
 This shows me that I have no idea what these magic things are :) Does
 this tell you anything useful? 
 
 0.000  0          3  0  non-token data: bayes db version
 0.000  0    6798614  0  non-token data: nspam
 0.000  0   19136753  0  non-token data: nham
 0.000  0 1063157695  0  non-token data: ntokens
 0.000  0 1241301616  0  non-token data: oldest atime
 0.000  0 1241416889  0  non-token data: newest atime
 0.000  0          0  0  non-token data: last journal sync atime
 0.000  0 1241344830  0  non-token data: last expiry atime
 0.000  0      43200  0  non-token data: last expire atime delta
 0.000  0     496607  0  non-token data: last expire reduction count

 Eh?  Last journal sync atime is Jan 1 1970?
 Try running:   sa-learn --sync

Doesn't seem to change the 'last journal sync atime' from 0.

 If that helps, put it in your nightly SpamAssassin cron job
 (and/or revisit your custom teaching scripts).

In fact, I've been running that from cron every night. 

I'm using a mysql DB and I've got the following set in my local.cf:

# We want to expire via cronjob, rather than having one of our spamd
# children do it. 
bayes_auto_expire  0

# no affect
bayes_learn_to_journal 0

 A quick primer (since this doesn't really exist anywhere...):  The
 three zeroed columns are always zero.

 bayes db version is self-explanatory.
 nspam is the number of spam messages on record.  bayes needs 200.

Should be fine: 6798649

 nham is the number of ham messages on record.  bayes needs 200.

Also should be fine: 19160960

 ntokens is the number of 'words' noted in the system.

lots of tokens: 1065483803

 oldest atime is the oldest access time of the oldest token (I think).

I've got 1241474416 which would be Mon May  4 15:00:16 PDT 2009
which is just yesterday... that doesn't seem right that this would be
the oldest access time, especially for 1065483803 tokens!
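
For anyone else squinting at these numbers: the atime values look like plain Unix epoch seconds, so they can be converted with a few lines of Python. This is only a sketch for reading the dump, assuming the values are seconds since 1970 in UTC; the function name is made up.

```python
from datetime import datetime, timedelta, timezone


def atime_to_utc(seconds: int) -> str:
    """Render an epoch timestamp from the dump as a UTC date string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Oldest atime quoted above (15:00:16 PDT is 22:00:16 UTC).
print("oldest atime:      ", atime_to_utc(1241474416))
# A value of 0 is the epoch itself, hence the "Jan 1 1970" remark.
print("last journal sync: ", atime_to_utc(0))
# The expire atime delta is a duration, not a date.
print("expire atime delta:", timedelta(seconds=43200))
```

The delta of 43200 seconds is 12 hours, which, if I read it right, would fit with the oldest atime being so recent.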

 the rest of the times should be self-explanatory.
 last expire reduction count is the number of tokens removed from the
 last expiration run (I think).

Ok, that seems to be counting, so something is being expired:

0.000  0     840628  0  non-token data: last expire reduction count

This is all very interesting info, I appreciate the
explanation. However, my original question still stands.

micah



Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Micah Anderson
Karsten Bräckelmann guent...@rudersport.de writes:

 This shows me that I have no idea what these magic things are :) Does
 this tell you anything useful? 

 0.000  0    6798614  0  non-token data: nspam
 0.000  0   19136753  0  non-token data: nham

 That's quite a lot of ham compared to the spam... Does that really
 reflect your mail instream?

I would suspect not, since we probably get more spam than
non-spam. However, perhaps the spamassassin autolearning caused this?

Perhaps the DB is so out of whack, I should just reset it from scratch
and try it again. It's a lot of data to lose and I am not sure exactly
the right way to do that... so I'd be somewhat reluctant to do so. Might
be better if I could clean it out some.

 19 M hams learned and an SQL Bayes storage backend. Site wide. Do you
 trust your users? Any chance some of them are training badly? At worst

No, I don't trust my users. In fact because of that we moved from doing
site-wide training to selected users who can demonstrate that they
understand how to train. Perhaps these numbers are legacy from before we
switched to this method.

thanks,
micah



Re: bayes training doesn't seem to have any affect

2009-05-04 Thread Micah Anderson
Dave Walker davewal...@ubuntu.com writes:

 Micah Anderson wrote:
 I got a phish message that was understood by bayes as:

 -2.6 BAYES_00   BODY: Bayesian spam probability is 0 to 1%
 [score: 0.]

 So I trained with spamc -L spam but even after that I am still getting
 BAYES_00. Shouldn't the training have bumped that score up?

 Thanks for any info,

 In order for Bayes to actually make a difference, it needs plenty of
 training.  It's disabled by default in most installs - unless you have
 at least 200 of both spam and ham taught.  This needs to be done
 manually, unless you have autolearn enabled.

Yeah, I've been running this bayes db for a couple years now, so I am
sure I've passed the 200 mark :)

I'm wondering if my bayes DB is too poisoned now and maybe needs to be
reset?

 To see what is really going on run $ spamassassin -D
 /path/to/the/email > /dev/null, and see if you can learn anything as to
 why it's not working as expected.

Indeed, when I do this, I find these bayes related log entries:

[13244] dbg: bayes: corpus size: nspam = 6798614, nham = 19136735
[13244] dbg: bayes: tok_get_all: token count: 175
[13244] dbg: bayes: score = 0

 Also, to see how experienced your Bayes knowledge is - use $ sa-learn
 --dump magic

This shows me that I have no idea what these magic things are :) Does
this tell you anything useful? 

0.000  0          3  0  non-token data: bayes db version
0.000  0    6798614  0  non-token data: nspam
0.000  0   19136753  0  non-token data: nham
0.000  0 1063157695  0  non-token data: ntokens
0.000  0 1241301616  0  non-token data: oldest atime
0.000  0 1241416889  0  non-token data: newest atime
0.000  0          0  0  non-token data: last journal sync atime
0.000  0 1241344830  0  non-token data: last expiry atime
0.000  0      43200  0  non-token data: last expire atime delta
0.000  0     496607  0  non-token data: last expire reduction count

micah



Local rules math problem

2009-05-02 Thread Micah Anderson

I've got a couple custom meta rules, that don't seem to be applying how
I expected them to.

When I run a message that should hit on these rules I get:

[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_USERNAME == got hit: Username:
[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_PASSWORD == got hit: Password:
[14109] dbg: rules: ran header rule __LOCAL_REPLYTO_NOTUS == got hit: negative match

Which results in the rule LOCAL_PHISH_FROMREPLY getting set with score
0.1, which is great, that is what I expect. However, there is a rule that
builds on that which doesn't fire: the LOCAL_PHISHER_USERPASS rule, which
does the math to add LOCAL_PHISH_FROMREPLY to __LOCAL_PHISHER_PASSWORD and
__LOCAL_PHISHER_USERNAME to get over a score of 1. Even though those
rules fire, the addition doesn't seem to get over 1 and thus the
meta rule doesn't fire...

what am I missing here?

body __LOCAL_PHISHER_PASSWORD   /Password(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i

header __LOCAL_RETURN_PATH_ISUS Return-Path =~ /\...@ourdomain\.net/
header __LOCAL_FROM_ISUS        From =~ /\...@ourdomain\.net/
header __LOCAL_REPLYTO_EXISTS   exists:Reply-To
header __LOCAL_REPLYTO_NOTUS    Reply-to !~ /\...@ourdomain\.net/
meta LOCAL_PHISH_FROMREPLY      (( __LOCAL_RETURN_PATH_ISUS || __LOCAL_FROM_ISUS ) && ( __LOCAL_REPLYTO_EXISTS && __LOCAL_REPLYTO_NOTUS ))
score LOCAL_PHISH_FROMREPLY     0.1

body __LOCAL_PHISHER_USERNAME   /User(\s)?(n|N)ame(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i
meta LOCAL_PHISHER_USERPASS     ((( 0.2 * __LOCAL_PHISHER_USERNAME ) + ( 0.4 * __LOCAL_PHISHER_PASSWORD ) + ( 0.4 * LOCAL_PHISH_FROMREPLY)) > 1)
describe LOCAL_PHISHER_USERPASS Typical phish: asks for username and password, we dont do that
score LOCAL_PHISHER_USERPASS    10.5
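
Working the weights through by hand, as a sketch: if each sub-rule contributes 1 when it hits and 0 otherwise (my assumption about how rule names behave inside a meta expression, as opposed to contributing their score), then the largest possible weighted sum is exactly 1, and a test for "over 1" can never pass. Exact fractions are used below so that floating-point rounding does not muddy the result.

```python
from fractions import Fraction

# Weights copied from the LOCAL_PHISHER_USERPASS meta rule.
WEIGHTS = {
    "__LOCAL_PHISHER_USERNAME": Fraction("0.2"),
    "__LOCAL_PHISHER_PASSWORD": Fraction("0.4"),
    "LOCAL_PHISH_FROMREPLY": Fraction("0.4"),
}


def weighted_sum(hits):
    """Sum the weights of the rules that hit (assumed: hit = 1, miss = 0)."""
    return sum((w for name, w in WEIGHTS.items() if name in hits), Fraction(0))


best_case = weighted_sum(set(WEIGHTS))  # every sub-rule fires
print("best case sum:", best_case)
print("over 1?       ", best_case > 1)
print("at least 1?   ", best_case >= 1)
print("over 0.9?     ", best_case > Fraction("0.9"))
```

Under that assumption, comparing against a value just below 1 (or using >=) would let the meta rule fire when all three sub-rules hit.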

thanks,
micah



bayes training doesn't seem to have any affect

2009-05-02 Thread Micah Anderson

I got a phish message that was understood by bayes as:

-2.6 BAYES_00   BODY: Bayesian spam probability is 0 to 1%
[score: 0.]

So I trained with spamc -L spam but even after that I am still getting
BAYES_00. Shouldn't the training have bumped that score up?

Thanks for any info,
micah



Re: hostkarma junkemailfilter

2008-11-20 Thread Micah Anderson
Benny Pedersen [EMAIL PROTECTED] writes:

 On Tue, November 18, 2008 22:16, Henrik K wrote:

 postfwd and trusted_networks msa_networks is what I use here, so minimal
 dns lookups are needed also. facebook has a random helo so it needs to be
 whitelisted hard in postfwd and in spamassassin. I have contacted facebook
 about it, but the problem might still be there

 i like your postfwd config

Where is this postfwd config you refer to? I would like to see this.

micah



Re: Funds / Award release scams poor scoring

2008-11-18 Thread Micah Anderson
mouss [EMAIL PROTECTED] writes:

 Henrik K wrote:
 On Mon, Nov 10, 2008 at 08:49:00AM +0100, mouss wrote:
 Henrik K wrote:
 On Mon, Nov 10, 2008 at 12:25:42PM +0530, ram wrote:
 The number of DNSWL_LOW and DNSWL_MED misfires have gone up especially
 in last two days. Even Marc's JMF_W misfires. 

 What it means is these are good mailservers who normally relay ham and
 have some weak links ( weak password etc ) that just got exposed
 What method are they using to relay through master.debian.org? I can't
 figure out how these mail from yahoo etc can end up relaying through there
 in this case.

 they simply post to the list. if the list is not open, they
 subscribe first.

 Ah right, I was looking at it a bit wrong.. it's silly that the original
 recipient is nowhere to be found in headers.


 Now that you say it, I don't see any list headers! so it looks like a
 bug somewhere...

No, I receive email at [EMAIL PROTECTED], so it doesn't need to go
through a debian list to get to me.

micah



Distributing the processing load

2008-11-18 Thread Micah Anderson

Our poor spamassassin machine is not able to keep up with the mail
load. We are constantly getting "prefork: server reached --max-children
setting, consider raising it" errors, and our max-children are already
set at the max that this machine can handle (50).

Since we are using spamc/spamd I figured that it would be trivial to
set up a second spamd on another machine and then the load could be
split. I accomplished this by setting my mailfilter to use '-d spamd'
and configured the spamd host in my DNS to be a round-robin between the
two participating IPs. However, this seems to only work as a
'fail-over', and not a load-balancer, as the spamc man page says:

   If host resolves to multiple addresses, then spamc will
   fail-over to the other addresses, if the first one cannot be
   connected to.  It will first try all addresses of one host
   before it tries the next one in the list.  

In fact, looking at my logs, one of the spamd machines is only
processing requests for one of the three mail servers, the other
requests are going to the other spamd. Likely this is because they all
looked up the address, and then have it cached?

I am using -x, and the man page says that the fail-over behaviour is
"incompatible with -x; if that switch is used, fail-over will not occur."
That's fine, I'm not particularly interested in fail-over, but rather
load-balancing. Is there any way to do this without having to set up my
different mail servers to query different spamds?
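
To illustrate the difference I am after, here is a toy simulation: a client that always takes the first resolved address sends everything to one spamd, while a client that picks an address at random per request spreads the load. The addresses and function names are hypothetical. The spamc man page also describes a -H / --randomize switch for randomizing the addresses returned for the -d host, though I have not checked how it behaves together with -x.

```python
import random
from collections import Counter

# Hypothetical addresses behind a round-robin "spamd" DNS name.
SPAMD_ADDRESSES = ["192.0.2.11", "192.0.2.12"]


def pick_first(addresses, rng):
    """Fail-over style: always use the first address, others are spares."""
    return addresses[0]


def pick_random(addresses, rng):
    """Load-balancing style: choose any address for each request."""
    return rng.choice(addresses)


def simulate(picker, requests=1000, seed=0):
    """Count how many requests each spamd address would receive."""
    rng = random.Random(seed)
    return Counter(picker(SPAMD_ADDRESSES, rng) for _ in range(requests))


print("first address only:", dict(simulate(pick_first)))
print("random per request:", dict(simulate(pick_random)))
```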

Thanks for any ideas,
micah



hostkarma junkemailfilter

2008-11-16 Thread Micah Anderson

Over at another post about Phishing[0], Brent suggested adding
hostkarma.junkemailfilter to my RBL list, which I have done... However
it seems to hit a lot of spams, giving them a -5 score. I've either got
this configured backwards, or this isn't working very well because it
whitelists too much actual spam. I copied the examples[1] directly from
their wiki...

Does anyone have any experience with these? I'm removing the JMF-WHITE
because it's not helping at all, but I wonder if others have experience?

header __RCVD_IN_JMF eval:check_rbl('JMF-lastexternal','hostkarma.junkemailfilter.com.')
describe __RCVD_IN_JMF Sender listed in JunkEmailFilter
tflags __RCVD_IN_JMF net
 
header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
tflags RCVD_IN_JMF_W net nice
score RCVD_IN_JMF_W -5
 
header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
tflags RCVD_IN_JMF_BL net
score RCVD_IN_JMF_BL 3.0
 
header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
tflags RCVD_IN_JMF_BR net
score RCVD_IN_JMF_BR 1.0
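
For reference, this is how I understand the lookup behind those rules, as a sketch of the usual DNSBL convention rather than of SA's internals: the relay's IPv4 octets are reversed and queried under the list's zone, and the returned 127.0.0.x address says which sub-list matched. The code-to-list mapping is taken from the rules above; the function names and the example address are made up, and no DNS query is actually performed.

```python
import ipaddress

# Return codes as used by the check_rbl_sub rules above.
JMF_CODES = {
    "127.0.0.1": "JMF-WHITE",
    "127.0.0.2": "JMF-BLACK",
    "127.0.0.4": "JMF-BROWN",
}


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def classify(answers):
    """Map the A records returned by the list to the sub-lists they mean."""
    return sorted(JMF_CODES[a] for a in answers if a in JMF_CODES)


print(dnsbl_query_name("192.0.2.1", "hostkarma.junkemailfilter.com."))
print(classify(["127.0.0.2"]))
print(classify(["127.0.0.1"]))
```

Seen this way, a -5 on spam means the list returned 127.0.0.1 for the last external relay, so the relay itself is whitelisted even though the message was spam.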

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625
1. http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

micah



Re: Funds / Award release scams poor scoring

2008-11-12 Thread Micah Anderson
* Justin Mason [EMAIL PROTECTED] [2008-11-12 05:20-0500]:
 
 John Hardin writes:
  On Sun, 9 Nov 2008, Micah Anderson wrote:
  
   Does anyone have any rules to catch these, or suggestions of scores to
   tweak to make these hit better?  I am running clamav-milter with the
   sanesecurity add-ons, but these are still making it through.
  
  Check out the sought-fraud ruleset.
  
  http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf
  
  (I don't know if it's in sa-update yet - Justin?)
 
 That's in sa-update since last night; it's now bundled in the main
 sought ruleset channel, as well.

Which channels specifically? Do you mean to say that it is in both:

updates.spamassassin.org
sought.rules.yerp.org

now?

Thanks!
Micah




Re: Overriding user prefs in local.cf

2008-11-12 Thread Micah Anderson
Matt Kettler [EMAIL PROTECTED] writes:

 Micah Anderson wrote:
 I set some 'add_header' options in my global local.cf and could not
 figure out why they were not being applied. It turns out that because I
 am using SQL user_prefs, any add_header lines I put in local.cf are just
 ignored (even though I have no global or individual add_header lines
 configured in my sql table).
   
 That's strange. They should only be ignored if the user prefs contains a
 clear_headers, or if it has an add_header for the exact same header.

 Does your user_prefs or global contain a clear_headers command?

No, that's why I was confused as well. My global prefs don't exist in SQL
at all, and my user prefs do not contain either an add_header or
clear_headers command.

 Is there any documentation that details which options that I might
 configure in local.cf that are overridden by user prefs simply existing?
   
 There are none that are cleared simply by the merits of user_prefs
 existing. An empty prefs is the same as no prefs.

Ok, that's how I expected things to work, clearly something else is going
on then.

thanks,
micah



Hard money conference spam

2008-11-11 Thread Micah Anderson

I'm getting probably 4-5 of these a day, the messages vary, so they
aren't the same, but they aren't firing on any specific rules related to
their 'hard money conference/webinar/seminar' etc. Does anyone have any
customized rules for these? I've been training my bayes on them, and it's
starting to pick them up (at BAYES_40 now), but it could use some more
specific rules:


Content analysis details:   (5.1 points, 8.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 FH_XMAIL_RND_833       Special X-Mailer Version
-0.2 BAYES_40               BODY: Bayesian spam probability is 20 to 40%
                            [score: 0.2305]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 1.0 RCVD_IN_BRBL           RBL: Received via relay listed in Barracuda RBL
                            [66.29.0.197 listed in b.barracudacentral.org]
 1.0 RCVD_IN_JMF_BR         RBL: Sender listed in JMF-BROWN
                            [66.29.0.197 listed in hostkarma.junkemailfilter.com]
 1.1 URIBL_RHS_DOB          Contains an URI of a new domain (Day Old Bread)
                            [URIs: hardmoney-event.com]

Return-Path: [EMAIL PROTECTED]
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd1.riseup.net
X-Spam-Level: ***
X-Spam-Status: No, score=3.9 required=5.0 tests=FH_XMAIL_RND_833,
RCVD_IN_JMF_BR,URIBL_BLACK,URIBL_RHS_DOB autolearn=no version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (egret-vpn.riseup.net [10.8.0.3])
by cormorant.riseup.net (Postfix) with ESMTP id 602201C38CA8
for [EMAIL PROTECTED]; Mon, 10 Nov 2008 23:23:26 -0800 (PST)
Received: from ip197.rutcommercial.com (ip197.rutcommercial.com [66.29.0.197])
by mx1.riseup.net (Postfix) with SMTP id 10F4757002B
for [EMAIL PROTECTED]; Mon, 10 Nov 2008 23:23:10 -0800 (PST)
Date: Tue, 11 Nov 2008 02:10:03 -0500
From: Larry Rivera [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: thursday's hard money 
MIME-Version: 1.0
X-Mailer: oer v8.3.3.1000.10001079
Reply-To: [EMAIL PROTECTED]
Message-Id: [EMAIL PROTECTED]
Content-Type: text/plain;
charset=iso-8859-1
X-Virus-Scanned: ClamAV 0.94/8607/Mon Nov 10 21:55:28 2008 on mx1.riseup.net
X-Virus-Status: Clean
Content-Length: 528

Hard Money National Event takes place on November 13th.

follow the following steps to register:

1. Visit our website  http://hardmoney-event.com
2. click attend a seminar and register for the event.
3. We will confirm your registration the same day.
4. call us at 858-736-7788 for additional information.

If you wish to opt out of future messages, please go to
http://hardmoney-event.com/uns/ or, send us a letter to PBMSII, 5580 la jolla 
blvd #153 La Jolla, Ca 92037



. 








Re: SURBL Usage Policy change

2008-11-11 Thread Micah Anderson
Jeff Chan [EMAIL PROTECTED] writes:

I think that SURBL is a valuable service, and I understand how it is
difficult to maintain such a service without resources.

 The funding is, by design, very moderate and will provide much needed
 support to sustain this initiative.

However, I believe that for non-profit organizations the funding model
is not moderate at all. Perhaps this is because of the unfortunate
decision to put non-profits into the same category as governments, which
typically are able to bring in much larger amounts of money. Or perhaps
it is a short-sighted view that non-profits all fall into the same
category of large, well-funded non-profits. While there are some that do
have resources available to them, a large majority of non-profits are
deeply struggling with resources and honestly I cannot imagine any being
able to afford the subscription rates that are listed for
non-profits/governments. I'm on the board of directors and am an
executive for three different non-profit organizations, and although
they all would be eager to contribute to SURBL, none of them could
possibly meet the funding bar that has been set.

The SURBL FQS is great, and it is appreciated that you have thought of
small charitable/non-profits with low email volume. However, I think you
are missing that there are small charitable/non-profits that can do this
volume on an extremely tight budget.

Micah



Re: Hard money conference spam

2008-11-11 Thread Micah Anderson
Rob McEwen [EMAIL PROTECTED] writes:

 Micah,

 In addition to the barracuda RBL, this IP is also listed on ivmSIP
 (since 10/21/08) and ivmSIP/24

Can you provide me with the local.cf details to be able to add the
ivm RBLs?

 Additionally, the domain hardmoney-event DOT com is blacklisted on
 both ivmURI and URIBL.COM

 At the very least, you should add uribl.com to your filtering since that
 list is free. Scoring with URIBL for this would have easily put that
 message over the top for you.

I understood URIBL to be enabled by default in SA and updated via
sa-update. In fact, I've got:

/var/lib/spamassassin/3.002005/updates_spamassassin_org/25_uribl.cf

 SHORT ANSWER: Start using uribl.com's URI blacklist

Am I not using it already? Maybe I'm not, and the 25_uribl.cf doesn't
include it? If so, I would really like to know about this.

Thanks!
Micah



Freemail config: dup unknown type freemail_re, Regexp

2008-11-11 Thread Micah Anderson

I recently added the FreeMail plugin, and although it appears to be
working, when I start SpamAssassin, I receive this message in my log:

Nov 11 06:45:48 spamd2 spamd[29934]: config: dup unknown type freemail_re, Regexp

I've put FreeMail.pm in /etc/spamassassin and created FreeMail.cf
as described, and it appears to be working, as I am seeing some
messages get tagged by it.

Are the regexps in plugins installed this way compiled by sa-compile,
or do they stand separately?

Thanks,
micah



Re: Checking for SPF DKIM Checks

2008-11-11 Thread Micah Anderson
mouss [EMAIL PROTECTED] writes:

 Francis Russell wrote:
   Even with the default DKIM scores, I finding I am getting spam that are
   DKIM_VERIFIED causing the score to dip below zero and let the message
   through, for example:
  
   http://micah.riseup.net/1
  
   that's spam relayed by a debian list. definitely a different beast...

 I interpret those headers as spam being sent to a Debian e-mail
 address, then forwarded to a personal address.

That is a correct interpretation. I get most of my spam this way.

 That's what I meant. Maybe I use the term relay too liberally?
 anyway, such spam is harder to stop unless you add the list relays to
 your trusted_networks.

This is the part of SA that I have the hardest time understanding: the
trusted_networks and internal_networks settings. I've read all the posts
that try to clarify it and I still can't keep it straight :)

How would adding a list relay to my trusted_networks actually make
stopping spam easier? Doesn't that make it a network whose mail I should
spend less time doing SA processing on, because I 'trust' it?
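
For reference, a minimal local.cf sketch of the two settings. In
SpamAssassin, "trusted" means only that a relay is believed to write
truthful Received headers, not that mail passing through it is ham.
Listing a forwarder there lets the network tests look past it to the host
that handed the message to the forwarder. The addresses are illustrative
assumptions (an internal MX at 10.8.0.3 and a list forwarder at
70.103.162.29), not a recommended configuration.

```
# Our own MX/relay hosts. Every internal_networks entry should also
# appear in trusted_networks.
internal_networks 10.8.0.3

# Relays whose Received headers we believe: our own hosts plus
# (assumption) the list forwarder.
trusted_networks  10.8.0.3
trusted_networks  70.103.162.29
```

With the forwarder trusted, checks that look at the last external relay,
such as the "lastexternal" RBL lookups, evaluate the host that delivered
to the forwarder rather than the forwarder itself.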

micah



Re: Barracuda RBL

2008-11-11 Thread Micah Anderson
Sujit Acharyya-Choudhury [EMAIL PROTECTED] writes:

 Thanks Henrik.  However, I am not using SVN 3.3 so the rule on its own
 will be useful.

I'm using:

# Add a rule to give the Barracuda RBL a +1 score. This is a really good
# RBL, but we were having false positives when using it to block at
# the SMTP level, so using it in a weighted SpamAssassin rule is
# better because we can benefit from it without being strict.
header RCVD_IN_BRBL eval:check_rbl('brbl-lastexternal', 'b.barracudacentral.org.', '127.0.0.2')
describe RCVD_IN_BRBL Received via relay listed in Barracuda RBL
score RCVD_IN_BRBL 1.0
tflags RCVD_IN_BRBL net

micah



Overriding user prefs in local.cf

2008-11-11 Thread Micah Anderson

I set some 'add_header' options in my global local.cf and could not
figure out why they were not being applied. It turns out that because I
am using SQL user_prefs, any add_header lines I put in local.cf are just
ignored (even though I have no global or individual add_header lines
configured in my sql table).

Is there any documentation that details which local.cf options are
overridden simply because user prefs exist?

I know I can set a @GLOBAL pref with these add_header lines if I wish,
and I can set them for my user, but I thought that by setting them in my
local.cf they would be honored globally as well, as certain other things
that are set there are honored globally. I'm not sure which are and
which are not.

micah




Re: Funds / Award release scams poor scoring

2008-11-10 Thread Micah Anderson
* Justin Mason [EMAIL PROTECTED] [2008-11-10 05:30-0500]:
 
 John Hardin writes:
  On Sun, 9 Nov 2008, Micah Anderson wrote:
   Does anyone have any rules to catch these, or suggestions of scores to
   tweak to make these hit better?  I am running clamav-milter with the
   sanesecurity add-ons, but these are still making it through.
  
  Check out the sought-fraud ruleset.
  
  http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf
  
  (I don't know if it's in sa-update yet - Justin?)
 
 I thought it was, but it seems I never made that part of the publishing
 process active ;)  I'll do that.

Does this mean it will show up in the regular updates.spamassassin.org
channel? Or is there another that I should follow?

Thanks!
micah




Re: Phishing rules?

2008-11-09 Thread Micah Anderson
Sahil Tandon [EMAIL PROTECTED] writes:

 Joseph Brennan [EMAIL PROTECTED] wrote:

 We get some legitimate email from @live.com users.

 But they don't set a Reply-to header.  That's the test.

 But that wasn't his question; he asked whether any legitimate mail flows
 from live.com.  That was my answer. :)

You are technically correct, but Joseph's message made clear the
information that I was not aware of, which was quite helpful and
technically better.

Micah



Re: Checking for SPF DKIM Checks

2008-11-09 Thread Micah Anderson
Byung-Hee HWANG [EMAIL PROTECTED] writes:

 mouss wrote:
 [...]
 let's start with DKIM.
 
 do you have
 loadplugin Mail::SpamAssassin::Plugin::DKIM

 + i'm use with following rule ;;
 score DKIM_VERIFIED   -45.3

Even with the default DKIM scores, I am finding that I get spam that is
DKIM_VERIFIED, which causes the score to dip below zero and lets the
message through, for example:

http://micah.riseup.net/1

I am thinking of actually increasing the score because of this.

micah



Re: Phishing rules?

2008-11-09 Thread Micah Anderson
Joseph Brennan [EMAIL PROTECTED] writes:

 /Dear .{0,12}(web ?mail|columbia\.edu)/i

 /Password.{0,10}\([\s\.\*\_]+\)/

 /you must reply to this email/i

 Reply-to =~ /[EMAIL PROTECTED]/

I created a meta-rule out of these (with a score of 8), and then ran
spamassassin -D < phish to see how it worked. It matched the meta-rule
flawlessly, but the phish ended up with only a 5.4 score due to BAYES_00
dragging it down. That was surprising to me, so I started to wonder if
my bayes DB was poisoned.

I ran some stats, and the results seem to indicate a healthy bayes
database (unless I am reading this wrong)... A side note: it's
interesting that only 9% of our email is spam, which seems low,
but maybe clamav-milter+rbls are blocking the remaining 40%?

Email:  2379392  Autolearn: 1075396  AvgScore:  -6.32  AvgScanTime:  5.96 sec
Spam:    227816  Autolearn:  114079  AvgScore:  14.75  AvgScanTime:  4.23 sec
Ham:    2151576  Autolearn:  961317  AvgScore:  -8.56  AvgScanTime:  6.15 sec

Time Spent Running SA:  3941.26 hours
Time Spent Processing Spam:  267.76 hours
Time Spent Processing Ham:  3673.50 hours

TOP SPAM RULES FIRED
----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------
   1  HTML_MESSAGE              154522    54.03    67.83   52.57
   2  BAYES_99                  134531     6.09    59.05    0.48
   3  BOTNET                    133687     8.90    58.68    3.63
   4  RDNS_NONE                 102255    10.19    44.88    6.51
   5  URIBL_JP_SURBL             98879     4.94    43.40    0.87
   6  MIME_HTML_ONLY             87518     7.62    38.42    4.36
   7  URIBL_OB_SURBL             76624     3.98    33.63    0.84
   8  DCC_CHECK                  74600     8.51    32.75    5.94
   9  URIBL_AB_SURBL             59890     2.72    26.29    0.23
  10  URIBL_SC_SURBL             53911     2.51    23.66    0.27
  11  RCVD_IN_BL_SPAMCOP_NET     43120     2.43    18.93    0.68
  12  URIBL_WS_SURBL             38251     1.79    16.79    0.21
  13  URIBL_RHS_DOB              36565     2.17    16.05    0.70
  14  BAYES_50                   35322     3.93    15.50    2.71
  15  HTML_IMAGE_ONLY_16         33887     1.68    14.87    0.28
  16  HTML_SHORT_LINK_IMG_2      33118     1.56    14.54    0.19
  17  HTML_IMAGE_RATIO_02        32757     2.93    14.38    1.72
  18  URIBL_SBL                  30456     1.80    13.37    0.57
  19  RAZOR2_CHECK               27722     2.55    12.17    1.53
  20  RAZOR2_CF_RANGE_51_100     26856     2.41    11.79    1.41
----------------------------------------------------------------

TOP HAM RULES FIRED
----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------
   1  BAYES_00                 2002969    84.67     5.15   93.09
   2  HTML_MESSAGE             1131073    54.03    67.83   52.57
   3  UNPARSEABLE_RELAY         760567    32.93    10.12   35.35
   4  DKIM_SIGNED               693328    29.74     6.26   32.22
   5  DKIM_VERIFIED             531590    22.67     3.38   24.71
   6  ALL_TRUSTED               173612     7.30     0.05    8.07
   7  USER_IN_WHITELIST         155704     6.54     0.00    7.24
   8  RDNS_NONE                 140127    10.19    44.88    6.51
   9  DCC_CHECK                 127844     8.51    32.75    5.94
  10  RCVD_IN_DNSWL_LOW         101863     4.31     0.34    4.73
  11  MIME_HTML_ONLY             93817     7.62    38.42    4.36
  12  RCVD_IN_DNSWL_MED          90038     3.81     0.31    4.18
  13  WHOIS_NETSOLPR             87575     3.72     0.38    4.07
  14  MIME_QP_LONG_LINE          82804     4.49    10.52    3.85
  15  BOTNET                     78052     8.90    58.68    3.63
  16  BAYES_50                   58286     3.93    15.50    2.71
  17  FUZZY_AMBIEN               53284     2.28     0.38    2.48
  18  SARE_SUB_ENC_UTF8          50533     2.14     0.17    2.35
  19  SARE_MILLIONSOF            42268     1.84     0.67    1.96
  20  FORGED_YAHOO_RCVD          38762     1.74     1.16    1.80
----------------------------------------------------------------


Then I looked to see what bayes did with the message, but I do not
understand how to read the output. Can someone explain this to me and
give me an idea why BAYES_00 fired when we've been feeding every one of
these spams to bayes to train on it?

$ spamassassin -D bayes < phish
[9595] dbg: bayes: using username: @GLOBAL
[9595] dbg: bayes: database connection established
[9595] dbg: bayes: found bayes db 

Re: Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson
John Hardin [EMAIL PROTECTED] writes:

 On Sun, 9 Nov 2008, Micah Anderson wrote:

 Does anyone have any rules to catch these, or suggestions of scores to
 tweak to make these hit better?  I am running clamav-milter with the
 sanesecurity add-ons, but these are still making it through.

 Check out the sought-fraud ruleset.

 http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf

I am pulling the sought.rules.yerp.org channel. I thought that this was
the same, but diff'ing these shows a lot of differences.

 (I don't know if it's in sa-update yet - Justin?)

Would be nice if I could pull these in via sa-update!

micah



Re: Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson
Chris [EMAIL PROTECTED] writes:

 On Sunday 09 November 2008 2:33 pm, Micah Anderson wrote:

  2.5 CTYME_IXHASH   BODY: iXhash found @ ixhash.junkemailfilter.com

This one is interesting to me, when I pump these messages through spamc
-R I get:

-5.0 RCVD_IN_JMF_W  RBL: Sender listed in JMF-WHITE
   [70.103.162.29 listed in hostkarma.junkemailfilter.com]

Because I added the hostkarma.junkemailfilter RBLs, as described here:
http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625

Getting -5 on these kind of sucks, but yours doesn't look like an RBL
check, and is scoring it up. What test is that?

 Above are how these scored on my stand-alone box. You may want to run the 
 Freemail plugin, SA-Grey plugin. Are you running Razor? 

The rest of my tests were the same as yours, with the exception of the
Freemail and SA-Grey plugins, which I do not have. I'll track those
down. I am running Razor: the first message gets +0.5 from
RAZOR2_CHECK, and the 4th message gets 0.5 RAZOR2_CHECK + 1.5
RAZOR2_CF_RANGE_E4_51_100 + 0.5 RAZOR2_CF_RANGE_51_100.

Micah



Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson

I'm getting a number of these types of emails getting through SA with
either negative scores, or very low scores. This is surprising to me as
these are pretty classic spams. I suspect that some of the low scores
are due to being DKIM signed.

Does anyone have any rules to catch these, or suggestions of scores to
tweak to make these hit better?  I am running clamav-milter with the
sanesecurity add-ons, but these are still making it through.

Here are five different ones, all of which got through in the last 24
hours:

http://micah.riseup.net/1
http://micah.riseup.net/2
http://micah.riseup.net/3
http://micah.riseup.net/4
http://micah.riseup.net/5

Thanks






Re: Phishing rules?

2008-11-09 Thread Micah Anderson
Joseph Brennan [EMAIL PROTECTED] writes:


 /Dear .{0,12}(web ?mail|columbia\.edu)/i

 /Password.{0,10}\([\s\.\*\_]+\)/

 /you must reply to this email/i

 Reply-to =~ /[EMAIL PROTECTED]/

I'm new at writing custom rules, so I am trying to figure out the best
way to do this. Would it be better to make a different rule for each one
of these, or would it be better to make a meta-rule? My guess is it's
better to make a meta-rule, but that means that each rule must hit in
order to get the larger score, versus some of the individual rules
hitting and adding up to the larger score. The meta-rule seems good
because it describes a full profile phishing email that must be met, but
it seems bad because one tweak of the phish would result in the
meta-rule not matching overall. I suppose this is the point of the
arithmetic meta-rule possibility, however I'm puzzled at the best
mechanism to choose. Any advice would be appreciated.
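
One way to get something between "every rule must hit" and "each rule
scores alone" is an arithmetic meta-rule over unscored sub-rules. A
minimal sketch, using the three body patterns quoted above; the rule
names and the 3.0 score are placeholders, and the Reply-to test is left
out because its pattern is redacted above.

```
# Sub-rules: names starting with "__" are evaluated but carry no score.
body  __LOCAL_PHISH_DEAR    /Dear .{0,12}(web ?mail|columbia\.edu)/i
body  __LOCAL_PHISH_PASSWD  /Password.{0,10}\([\s\.\*\_]+\)/
body  __LOCAL_PHISH_REPLY   /you must reply to this email/i

# Fires when at least two of the three sub-rules hit, so a single
# tweak to the phish does not defeat the whole rule.
meta      LOCAL_PHISH_2OF3  (__LOCAL_PHISH_DEAR + __LOCAL_PHISH_PASSWD + __LOCAL_PHISH_REPLY >= 2)
describe  LOCAL_PHISH_2OF3  At least two webmail-phish phrases in body
score     LOCAL_PHISH_2OF3  3.0
```

A second meta-rule with ">= 3" and a higher score could sit alongside
this one if a full match should count for more.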

Once I figure out the best way to match these, I need a good way to
determine what I should score them. The rule-writing documentation
suggests starting at 0.1 and then moving it up as you test it, and
suggests extreme caution scoring a custom rule over 1; however, it seems
like these would be better scored higher than that.

 The first of course is partly local to us.  Another useful local rule
 is to check for the uri of your own webmail.

Yeah, I'll make a uri rule for that and probably add that to the
meta-rule.

Thanks for any advice,
micah



bayes SQL delays

2008-11-02 Thread Micah Anderson

I have spamd set up to use bayes in a mysql database, and it works fine. I've
turned off auto-expiry and instead run a cronjob to expire in the middle
of the night (removes about 40k tokens on a run). I've made the DB
InnoDB so it can handle locking better. I've got mysql-based user prefs
coming from the same database server, and that works (not everyone wants
bayes). Autolearning is working, I chew through a lot of mail every day,
and in general everything seems fine.
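
For readers unfamiliar with this kind of setup, a local.cf sketch of
SQL-backed bayes with auto-expiry turned off might look like the
following. The DSN, database name, and credentials are placeholders, not
the values used here.

```
# Store bayes tokens in MySQL instead of local DB files.
bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn       DBI:mysql:spamassassin:db.example.org
bayes_sql_username  sa_user
bayes_sql_password  changeme

# Do not expire tokens opportunistically during scans; expiry is run
# separately from cron (for example with "sa-learn --force-expire").
bayes_auto_expire   0
```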

Except that my spamd server is overloaded, so I need a second one. So I
set up another spamd instance, with the exact same configuration as the
first, fired it up, and it immediately started blocking on the bayes
work. Average scantimes go from 1-2 seconds up to 35+, and the max
children get eaten up by blocking on the bayes work to the point where
it's pointless because too many processes are blocked. If I disable the
bayes_sql stuff in my local.cf, scantimes drop back to their expected
average of 1-2 seconds, but of course none of the BAYES tests will fire
and autolearning fails.

What gives?



Re: Phishing rules?

2008-11-02 Thread Micah Anderson
Joseph Brennan [EMAIL PROTECTED] writes:

 Reply-to: [EMAIL PROTECTED]


 First pass:

 header LOCAL_REPLYTO_LIVE Reply-to =~ /[EMAIL PROTECTED]/
 score LOCAL_REPLYTO_LIVE8.0

 Maybe scoring 8.0 for one thing scares you, but I haven't seen this
 fp in a couple of months.

Is live.com a legitimate email sender? It looks Microsoft related. If I
set it to 8, then any mail from that address is sure to get caught as
spam, which may not be the right thing depending on other potential
legitimate addresses sending from that domain.

Or perhaps nothing but spam comes from live.com? I don't know anything
about it.

micah



Re: Phishing rules?

2008-11-02 Thread Micah Anderson
SM [EMAIL PROTECTED] writes:

 At 07:56 01-11-2008, Micah Anderson wrote:
Here is an example one I received recently, note the hideously low bayes
score on this one, caused it to autolearn as ham even, grr.

 [snip]

X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
 autolearn=ham version=3.2.5

 The sender is whitelisted by www.dnswl.org.

Yeah, because this one was forwarded through debian.org, which is
legitimate. The spam originator was not debian.org, but debian.org is
the one in dnswl.org.

Received: from master.debian.org (master.debian.org [70.103.162.29])
 by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
 for [EMAIL PROTECTED]; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)

 The mail is coming through debian.org.  Do you want to blacklist that host?

No, I do not. 




Re: Phishing rules?

2008-11-02 Thread Micah Anderson
Karsten Bräckelmann [EMAIL PROTECTED] writes:

 On Sat, 2008-11-01 at 11:30 -0400, Micah Anderson wrote:
 Joseph Brennan [EMAIL PROTECTED] writes:

  Do you mean attempts to get your users to send their passwords,
  or fake mail pretending to be from banks?
 
 I mean attempts to get my users to send their passwords, are these not
 called phishing?

 An important bit of information, missing from the OP. :)  Targeted
 attacks at your users, so the general phishing BLs don't really apply.

 Anyway, can't you educate your users, that

 (a) Any administrative email will be sent from an official, well known,
 internal address? That means *not* an arbitrary address. Yes, sorry,
 the obvious...
 (b) They will *never* ever be asked for a password by mail. Period.
 Again, obvious...

We've been telling our users this for years, but there is always someone
who doesn't listen, or forgets, or something. I don't know. I find it
absolutely incredible that anyone would fall for any of these, yet I am
the one who has to clean up the mess :P

 Then block internal / administrative From addresses coming from any
 external SMTP.

Yeah, that's done. They don't get by faking our From, but the body is
constructed in a way to mislead and impersonate our staff or whatever,
usually by threatening people that their account will be closed unless
they reply.

 This is not a technical way to stopping these, but an educational
 approach to prevent the most dumb and gross social engineering. At least
 the second one actually should be well-known, and I've seen ISPs
 pointing it out frequently...

Thanks, but we've done all these, and continue to do them, they are
another plank in the various mechanisms that we must employ.

micah



Re: Phishing rules?

2008-11-01 Thread Micah Anderson
Randy [EMAIL PROTECTED] writes:

 Micah Anderson wrote:
 Sadly, I do not have an example I can share at the moment, as I
 typically delete them in a rage after training my bayes filter on
 them. However, I am looking for any suggestions of other things I can
 turn on... in particular, are there rules that people have created that
 look for certain keywords where the body is asking for your
 account/password information?
   
 Report these and maybe they will add something that catches them. If
 one wanted to, they can get any mail the want through your filters if
 they are good and don't use things that trigger the rules.

Report them where exactly?

Here is an example I received recently. Note the hideously low bayes
score on this one, which even caused it to autolearn as ham, grr.


From [EMAIL PROTECTED] Fri Oct 31 20:00:45 2008
Return-Path: [EMAIL PROTECTED]
X-OfflineIMAP-x792266711-4c6f63616c-494e424f58: 1225549253-0134941395044-v6.0.3
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd2.riseup.net
X-Spam-Level: 
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
autolearn=ham version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (unknown [10.8.0.3])
by cormorant.riseup.net (Postfix) with ESMTP id 58BFA19581F7
for [EMAIL PROTECTED]; Fri, 31 Oct 2008 20:00:40 -0700 (PDT)
Received: from master.debian.org (master.debian.org [70.103.162.29])
by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
for [EMAIL PROTECTED]; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)
Received: from cat.cybersurf.net ([209.197.145.185] helo=cat.cia.com)
by master.debian.org with esmtp (Exim 4.63)
(envelope-from [EMAIL PROTECTED])
id 1Kw6j8-0003iT-Ix
for [EMAIL PROTECTED]; Sat, 01 Nov 2008 03:00:38 +
Received: from reef.cybersurf.com ([209.197.145.198])
by cat.cia.com with esmtp (Exim 4.50)
id 1Kw6iz-0002Li-Pg; Fri, 31 Oct 2008 21:00:29 -0600
Received: from apache by reef.cybersurf.com with local (Exim 4.44)
id 1Kw6j0-0006W5-UJ; Fri, 31 Oct 2008 20:00:30 -0700
Received: from 196-207-0-227.netcomng.com (196-207-0-227.netcomng.com 
[196.207.0.227]) 
by webmail.3web.com (IMP) with HTTP 
for [EMAIL PROTECTED]; Sat,  1 Nov 2008 14:00:30 +1100
Message-ID: [EMAIL PROTECTED]
Date: Sat,  1 Nov 2008 14:00:30 +1100
From: WEBMAIL Help Desk [EMAIL PROTECTED]
Reply-to: [EMAIL PROTECTED]
Subject: WEBMAIL Help Desk
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.1
X-Originating-IP: 196.207.0.227
To: undisclosed-recipients:;
X-Virus-Scanned: ClamAV 0.94/8552/Fri Oct 31 18:14:36 2008 on mx1.riseup.net
X-Virus-Status: Clean
Status: RO
Content-Length: 1427
Lines: 38


Dear Webmail User,
This message was sent automatically by a program on Webmail which
periodically checks the size of inboxes, where new messages are
received.
The program is run weekly to ensure no one's inbox grows too large. If
your inbox becomes too large, you will be unable to receive new email.
Just before this message was sent, you had 18 Megabytes (MB) or more of
messages stored in your inbox on your Webmail. To help us re-set your
SPACE on our database prior to maintain your INBOX, you must reply to
this e-mail and enter your

Current User name ()
and Password(   ).

You will continue to receive this warning message periodically if your
inbox size continues to be between 18 and 20 MB. If your inbox size
grows to 20 MB, then a program on Bates Webmai
will move your oldest email to a
folder in your home directory to ensure that you will continue to be
able to receive incoming email. You will be notified by email that this
has taken place. If your inbox grows to 25 MB, you will be unable to
receive new email as it will be returned to the sender.
After you read a message, it is best to REPLY and SAVE it to another
folder.

Thank you for your cooperation.
WEBMAIL Help Desk






---
3webXS HiSpeed Dial-up...surf up to 5x faster than regular dial-up alone... 
just $14.90/mo...visit www.get3web.com for details





Re: Phishing rules?

2008-11-01 Thread Micah Anderson
Karsten Bräckelmann [EMAIL PROTECTED] writes:

 On Thu, 2008-10-30 at 15:56 -0400, Micah Anderson wrote:
 I keep getting hit by phishing attacks, and they aren't being stopped by
 anything I've thrown up in front of them:
 
 postfix is doing:
  reject_rbl_client   b.barracudacentral.org,
  reject_rbl_client   zen.spamhaus.org,
  reject_rbl_client   list.dsbl.org,
 
 I've got clamav pulling signatures updated once a day from sanesecurity
 (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
 securesiteinfo) and Malware Black List, MSRBL (images, spam).

 I'd increase this, at least for the SaneSecurity phish sigs. They are
 being updated much more frequently.

Thanks for the pointer. For some reason I thought I had read on the
SaneSecurity site that you shouldn't pull more than once a day, but
after you mentioned it I went and read again, and they ask that you don't
pull more frequently than once an hour... so I've changed that cronjob,
which should help.

 I've got spamassassin 3.2.5 with URIBL plugin loaded (which I understand
 pulls in the 25_uribl.cf automatically, right? Or do I need to configure

 Yes, unless you disable network tests in general. Should be easy to
 answer yourself if they are working, just by grepping for the rule names
 defined in 25_uribl.cf.

Network tests aren't disabled, and yeah I am seeing those rules occur in
some of my headers of mail that I can search through, so I think that
they are working. I've increased my overall URIBL scoring to 2.5 from
the default.
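
A sketch of what such score overrides look like in local.cf. Which rules
were raised is not stated; the three below are names that appear
elsewhere in this archive, picked only as examples.

```
# Override the default scores of individual URIBL rules (example names).
score URIBL_BLACK     2.5
score URIBL_JP_SURBL  2.5
score URIBL_SBL       2.5
```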

 Sadly, I do not have an example I can share at the moment, as I
 typically delete them in a rage after training my bayes filter on
 them. However, I am looking for any suggestions of other things I can
 turn on... in particular, are there rules that people have created that
 look for certain keywords where the body is asking for your
 account/password information?

 So you've pretty much thrown everything at it you could find... ;)  And
 they are still slipping through? How many are we talking here? Compared
 to the total number of spam / phish?

 Also, how many are being caught? Strikes me as odd that you don't have a
 sample but yet sound like every single one is slipping by.

These are hard for me to answer as I am not doing any analysis of how
many are caught. In the last week, I've gotten four of them through, and
I've received reports from a number of users that they too have received
them.

I've just sent a sample to the list however. 

 I guess, I would start verifying that all the above actually is working.
 Most notably the SaneSecurity phish sigs. ClamAV should catch the lions
 share, by far, assuming it comes before SA in your chain.

Yeah, I'm using the clamav-milter, so those get rejected really early
on.

Thanks for the ideas,
Micah



Re: Phishing rules?

2008-11-01 Thread Micah Anderson
Joseph Brennan [EMAIL PROTECTED] writes:

 Micah Anderson [EMAIL PROTECTED] wrote:

 I keep getting hit by phishing attacks, and they aren't being stopped by
 anything I've thrown up in front of them:

 Do you mean attempts to get your users to send their passwords,
 or fake mail pretending to be from banks?

I mean attempts to get my users to send their passwords, are these not
called phishing?

micah



Re: Phishing rules?

2008-11-01 Thread Micah Anderson
Brent Clark [EMAIL PROTECTED] writes:

 Hiya

 See SA examples

 http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

 Also add hostkarma.junkemailfilter.com to you DNSBL.

Thanks, I'll add this to my local.cf and see how it goes.

 Another thing I do find is useful is adding additional higher valued
 MX records.

 http://www.junkemailfilter.com/spam/support.html

I don't really like the idea of adding some other site's MX to my DNS, so
I think I'll pass on this one.

thanks for the suggestions!
micah


