Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Matus UHLAR - fantomas

On 30.11.11 00:17, Alex wrote:

I have two fedora15 boxes that process mail for a few domains, and
recently set up bayes in mysql for each of them. The servers are in
geographically different locations, a few hops from each other. Since
they both process mail for the same domains, I thought it made sense
to share the database between them.

What's the best way to do this? Set one as a master and the other as a
slave, or perhaps replication between them?

I also thought about something like drbd, but that seems a bit
excessive for just a database.


I think this is question for MySQL mailing list, not for SA.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Support bacteria - they're the only culture some people have. 


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Robert Schetterer
On 30.11.2011 09:06, Matus UHLAR - fantomas wrote:
> On 30.11.11 00:17, Alex wrote:
>> I have two fedora15 boxes that process mail for a few domains, and
>> recently set up bayes in mysql for each of them. The servers are in
>> geographically different locations, a few hops from each other. Since
>> they both process mail for the same domains, I thought it made sense
>> to share the database between them.
>>
>> What's the best way to do this? Set one as a master and the other as a
>> slave, or perhaps replication between them?
>>
>> I also thought about something like drbd, but that seems a bit
>> excessive for just a database.

Don't use drbd with a MySQL store; you don't need it.
> 
> I think this is question for MySQL mailing list, not for SA.
> 
You can use e.g. master-master replication (which I do), but be aware
you might get duplicates in the Bayes store; these can be ignored.

But I am told PostgreSQL is better at replication.

-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: A SpamAssassin Crash Course for Admins

2011-11-30 Thread Patrick Ben Koetter
Dorian,

* Dorian Chan :
> Hello again,
> I've attached version 2.0 with this email (it's the clean version without
> all the comments :) ). I've pretty much finished up the definitions and
> some cleaning up. Again, I would really enjoy feedback!

I've attached an edited version that puts SA in context with other
filtering methods.

p@rick

-- 
state of mind ()
Digitale Kommunikation

http://www.state-of-mind.de

Franziskanerstraße 15  Telefon +49 89 3090 4664
81669 München  Telefax +49 89 3090 4666

Amtsgericht München  Partnerschaftsregister PR 563



SpamAssassinPatrick.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document


Re: A SpamAssassin Crash Course for Admins

2011-11-30 Thread spamassassin
On 30/11/11 07:17, Ted Mittelstaedt wrote:

>>> I've attached version 2.0 with this email (it's the clean version without 
>>> all the comments :) ). I've pretty much finished up the definitions and 
>>> some cleaning up. Again, I would really enjoy feedback!
>>
>> Everywhere you say "SpamAssassin" you should probably be saying "Apache 
>> SpamAssassin."
>>
> 
> And instead of saying "Linux" you should say GNU/Linux, and instead of 
> saying Ford you should say Ford Motor Company, and instead of saying
> Coke you should say Coca Cola, and instead of saying.
> 
> Never thought I'd see the day when branding became this important in the 
> Free Software arena... :-(

It's not always just branding. It's also about giving proper attribution.
Organisations and people should be credited appropriately for their
contributions. It's the respectful thing to do. "GNU/Linux" is the best
example of this IMO.

At least you said "free software arena" and not "open source world" ;)

-- 
Mike Cardwell https://grepular.com/  https://twitter.com/mickeyc
Professional  http://cardwellit.com/ http://linkedin.com/in/mikecardwell
PGP.mit.edu   0018461F/35BC AF1D 3AA2 1F84 3DC3 B0CF 70A5 F512 0018 461F





T_DKIM_INVALID DKIM-Signature header exists but is not valid

2011-11-30 Thread Jari Fredriksson

I have set up DKIM on our corporate mail hosted by GMail. Google
assigned a TXT record and our DNS-provider set it.

I send a mail from GMail from the Web UI and from my Thunderbird to
myself, and SA always triggers that rule.

What does it mean? Why is it not valid?

-- 

There are more things in heaven and earth,
Horatio, than are dreamt of in your philosophy.
-- Wm. Shakespeare, "Hamlet"





Re: Martin Gregorie's portmanteau rule building script

2011-11-30 Thread Martin Gregorie
On Tue, 2011-11-29 at 14:22 -0800, Adam Katz wrote:

> If you want to fork the thread into a tangent, please change the subject
> so other responses to it don't follow you.  Also, don't respond to the
> parts of the thread you are not forking; those belong in another message
> in the original thread.
> 
That wasn't my intention. I *thought* I was merely adding an aside to
say "if you really want rules with lots of alternates, here's a tool
that can help" because I think we've all struggled with rules that
straggle off the right edge of the page in many editors. I know vi/vim
will wrap those lines, but a lot of people dislike vi.

> You might want to consider Regexp::Assemble for your tool, though that
> would require using perl.  This would cause your man page's example rule
> to result in something like this:
> 
>    body __AU0 /(?i-xsm:\balt[123]\b)/
> 
> rather than your script's *much* slower:
> 
>    body __AU0 /\b(alt1|alt2|alt3)\b/i
> 
Interesting idea. Currently my system's performance seems 'adequate',
considering I'm running SA on an 866 MHz P3 box with 512 MB RAM:

             Min                   Avg     Max
Scan times:  0.9 (3401 bytes)      4.0     128.3 (72858 bytes)
Msg sizes:   2258 (1.8 secs)       10474   507533 (6.2 secs)
Messages:    2032

What sort of speed-up would Regexp::Assemble provide? 
How would that compare with compiling the portmanteau.cf file?
 

Martin




Rules for opt-in mailing list

2011-11-30 Thread Paul Houselander

Hi

Bit of an unusual question, but I've been getting an increasing number of
questions about why SpamAssassin didn't classify an email as spam.

When I look at the mail it's normally an opt-in mailing list of some kind,
and therefore SpamAssassin is correct in not classifying it as spam.

I have had numerous conversations with users explaining that opt-in mailing
lists are not spam (if you don't want it, unsubscribe from it). However, it's
getting so frequent now that I was wondering if anyone had created a set of
rules that would fire on the characteristics of mailing lists, e.g.
unsubscribe links in the email, CAN-SPAM mentioned in the body, etc.

Then when someone complains I'll enable the rules to stop them bothering me.

If not, I'll look at writing some myself. If anyone has suggestions on
what to look for on opt-in lists, please let me know.


Thanks

Paul




Re: Rules for opt-in mailing list

2011-11-30 Thread Martin Gregorie
On Wed, 2011-11-30 at 12:40 +, Paul Houselander wrote:
> Hi
> 
> Bit of an unusal question but ive been getting increasing questions of 
> why spamassasin didnt classify an email as spam.
> 
> When I look at the mail its normally an opt-in mailing list of some kind 
> and therefore spamassasin is correct in not classifying it as spam.
> 
There are a number of persistent commercial spammers (usually American,
e.g. the Bonnier Corporation) that disguise their spam as newsletters and
have an 'unsubscribe' URL that doesn't do anything except, probably,
validate addresses on their spamming list. Then there are other
homegrown menaces, e.g. BT and its spammer^h^h^h^h^h^h^h advertiser
tractionplatform.com, that also ignore unsubscribe URLs or don't provide
them but at least don't try to make their junk look like newsletters.

If asking to be unsubscribed doesn't work or the option isn't provided I
simply add their URLs to a private blacklist. This is implemented as an
SA rule that adds a large enough score to deep-six the mail immediately.
Where they use a different mail address for spam, I just add that since
they are often outfits I'll use again in future, but just don't need the
UCE stream from them. Ebuyer is a recent offender of this type.

I've noticed a rise in UCE recently, often from online shops I've bought
something from and that didn't have a check-box for newsletter
acceptance, sometimes because I first used them a very long time ago.

HTH
 
Martin




Re: Rules for opt-in mailing list

2011-11-30 Thread Michael Scheidell

Hi

Bit of an unusal question but ive been getting increasing questions of 
why spamassasin didnt classify an email as spam.


When I look at the mail its normally an opt-in mailing list of some 
kind and therefore spamassasin is correct in not classifying it as spam.


I was on ICSA's anti-spam consortium, trying to create a 'specification'
for anti-spam systems so they could certify them (I quit after Verizon
bought them).

Six hours of the first eight-hour meeting went on trying to define 'spam'
(because one of the specs was a minimum capture rate and a maximum FP rate).

Gotta define spam first!

UCE? Bulk? What?
'Spam is email you didn't want'.

We decided it is UNSOLICITED COMMERCIAL EMAIL.

You are right, though: if this is CONFIRMED OPT-IN, then the user asked
for it. It is BULK, it might be COMMERCIAL, but it is not UNSOLICITED.

It's not spam.

'OPT-OUT' (or opt-in where someone other than the user opted you in, like
the list manager) IS SPAM.

But that doesn't solve your problem.

We tell users not to click on opt-out buttons because it confirms their
email address, unless they remember opting in :-).



I have had numerous conversations with users explaining opt-in mailing 
lists are not spam - if you dont want it unsubscribe to it, however 
its getting so frequent now I was wondering if anyone had created a 
set of rules that would fire on the characteristics of mailling lists? 
e.g. unsubscribe links in the email, CANSPAM mentioned in body etc...


SA has tests for lots of unsubscribe/opt-out links, but it uses
them to trigger 'spam', not to see who is sending CAN-SPAM-compliant email.


And guess what: a fully legal 'opt-out' email list that is CAN-SPAM
compliant, with full physical address, unsubscribe instructions, and a
truthful subject line, can still be spam if the user did not opt in
themselves.


Then when someone complains ill enable the rules to stop them 
bothering me.


If not ill look at writing some myself, if anyone has suggestions on 
what to look for on opt-in lists please let me know.

Some of the PAID reputation lists have 'credits' for opt-in lists; look
at some of the 'nice' rules for hints.

(YMMV: the sender is paying someone else to let their email in because
they feel it is likely to be caught by SA otherwise.)

I mentioned DCC in an earlier email about the FreeBSD SA update.
DCC goes the other way, sort of: it will set higher scores on BULK
email (yes, even bulk email you opted in to).

If you use the built-in SA credits and offset them with the DCC bulk
scores, it still would not help you, because if the list owner has a good
IP reputation and your user opted in, the IP reputation RBLs would still
be giving them credit.


real answer?

get smarter users!

you can make something foolproof, but not idiot proof.

PS: publish an SLA. Offer accuracy SLAs on 'BUSINESS CRITICAL EMAIL',
not just email.


SA will most likely score that joke your brother-in-law sent as spam.
Is that SPAM?
It is certainly bulk, and it has lots of 'cruft' in it by the time it has
been forwarded to him by 20 people.

did you want it? no.
is it COMMERCIAL? no.

is it SPAM?
heck yes, I didn't want it :-)

--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
SECNAP Network Security Corporation

   * Best Mobile Solutions Product of 2011
   * Best Intrusion Prevention Product
   * Hot Company Finalist 2011
   * Best Email Security Product
   * Certified SNORT Integrator

__
This email has been scanned and certified safe by SpammerTrap(r). 
For Information please see http://www.spammertrap.com/
__  
 


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Michael Scheidell

Hi all,

I have two fedora15 boxes that process mail for a few domains, and
recently set up bayes in mysql for each of them. The servers are in
geographically different locations, a few hops from each other. Since
they both process mail for the same domains, I thought it made sense
to share the database between them.

What's the best way to do this? Set one as a master and the other as a
slave, or perhaps replication between them?


Easy:
Set the master on mx1 and the slave on mx2.
The master is in charge of adding to the db and expiring; the slave can
read it.
Problem: mx2 will get mostly spam, since spammers hit mx2 first, so your
'spam' hits will be lower than you thought.


Hard:
Master/master.
You have replication issues, especially when the same spammer sends 500
emails to mx1 and the same 500 to mx2.

Only run the manual expire via cron job on the master.

Try this patch (it changes "insert into bayes_seen" to "insert ignore into";
YMMV, use at your own risk; if your HP printer catches fire because of it,
it's not my fault):


cd /usr/local/lib/perl5/site_perl/${pv}/Mail/SpamAssassin/BayesStore


   sed -i '' -e '/INSERT INTO bayes_seen/s/INTO/IGNORE INTO/' MySQL.pm

(hey SA folks.. any reason not to just put that into 3.4.0?  won't hurt 
anything, will it?)
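
To preview what the one-liner changes before touching an installed module, here is a minimal Python sketch of the same substitution. It assumes the module contains a statement beginning with `INSERT INTO bayes_seen`; the sample SQL strings below are made up for illustration and are not copied from MySQL.pm. Note also that `sed -i ''` is the BSD form; GNU sed (as on Fedora) expects `-i` without the separate empty argument.

```python
def add_ignore(source: str) -> str:
    """Mimic the sed one-liner: on every line that contains
    'INSERT INTO bayes_seen', replace the first 'INTO' with 'IGNORE INTO'."""
    out = []
    for line in source.splitlines(keepends=True):
        if "INSERT INTO bayes_seen" in line:
            line = line.replace("INTO", "IGNORE INTO", 1)
        out.append(line)
    return "".join(out)


# Hypothetical sample text, NOT the real contents of MySQL.pm.
sample = (
    'my $sql = "INSERT INTO bayes_seen (id, msgid, flag) VALUES (?,?,?)";\n'
    'my $other = "INSERT INTO bayes_token (id, token) VALUES (?,?)";\n'
)

patched = add_ignore(sample)
print(patched)
```

In MySQL, `INSERT IGNORE` downgrades a duplicate-key error to a warning and skips the row, which is why it suits the case where both masters record the same message. The sketch is also safe to run twice, since an already patched line no longer contains the search string.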



--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
 


Re: Rules for opt-in mailing list

2011-11-30 Thread Axb

On 2011-11-30 13:40, Paul Houselander wrote:

Hi

Bit of an unusal question but ive been getting increasing questions of
why spamassasin didnt classify an email as spam.

When I look at the mail its normally an opt-in mailing list of some kind
and therefore spamassasin is correct in not classifying it as spam.

I have had numerous conversations with users explaining opt-in mailing
lists are not spam - if you dont want it unsubscribe to it, however its
getting so frequent now I was wondering if anyone had created a set of
rules that would fire on the characteristics of mailling lists? e.g.
unsubscribe links in the email, CANSPAM mentioned in body etc...

Then when someone complains ill enable the rules to stop them bothering me.

If not ill look at writing some myself, if anyone has suggestions on
what to look for on opt-in lists please let me know.


I assume your question was not supposed to trigger a discussion of what
is spam, and "this company spammed me", and, and, and...

Basically there are no generic rules for opt-in spam, because "your spam
is my ham".

What you require to catch your spam is up to you to decide.

The easiest is to blacklist the sender (though it may change if they use
tagged senders).

Next choice: URLs in uri rules. These are fast and efficient.
(SA 3.4, coming when it's ready, will allow you to blacklist URI
hosts without having to write extra high-scored URI rules.)

Next: look at the bulk mailer's X- headers. Most bulk tools have consistent
headers which let you tag every msg from a "blaster" type with little
effort.

The next obvious path is to use meta rules with a collection of traits which
will get rid of your so-called "opt-in" mail.

It's all up to you to make the choice which fits you best for your traffic
and your user base... mainly because "your spam is my ham".
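
The "collection of traits" idea can be sketched outside SpamAssassin as follows. This is only an illustration of the logic a meta rule would combine, not SA rule syntax; the trait names, the choice of traits, and the "two out of three" cut-off are all assumptions made for the example.

```python
from email import message_from_string


def list_traits(raw: str) -> list[str]:
    """Return the (hypothetical) mailing-list traits found in a raw message."""
    msg = message_from_string(raw)
    traits = []
    if msg.get("List-Unsubscribe"):
        traits.append("list_unsubscribe_header")
    if (msg.get("Precedence") or "").strip().lower() in ("bulk", "list"):
        traits.append("precedence_bulk")
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""
    if "unsubscribe" in body.lower():
        traits.append("unsubscribe_in_body")
    return traits


def looks_like_list_mail(raw: str, needed: int = 2) -> bool:
    """Fire only when several traits agree, as a meta rule would."""
    return len(list_traits(raw)) >= needed


sample = (
    "From: news@example.com\n"
    "To: user@example.org\n"
    "Subject: Weekly offers\n"
    "List-Unsubscribe: <mailto:unsub@example.com>\n"
    "Precedence: bulk\n"
    "\n"
    "Great deals this week. To unsubscribe, click here.\n"
)

print(list_traits(sample))
print(looks_like_list_mail(sample))
```

Requiring more than one trait keeps a single stray "unsubscribe" in ordinary correspondence from triggering the match.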


Axb




Re: T_DKIM_INVALID DKIM-Signature header exists but is not valid

2011-11-30 Thread Mark Martinec
> I have set up DKIM on our corporate mail hosted by GMail. Google
> assigned a TXT record and our DNS-provider set it.
> 
> I send a mail from GMail from the Web UI and from my Thunderbird to
> myself, and SA always triggers that rule.
> 
> What does it mean? Is not valid?

Whatever mail reached the Mail::DKIM module failed DKIM verification.
Either the message was changed on its way, or there is a problem
reaching your public key in DNS, or the key is incorrect, or there is
something broken in the software.

You will have to provide more details to get a more useful reply.

At least you need to obtain the complete intact message as it landed
in the mailbox. The versions of SpamAssassin and the Mail::DKIM module may
be helpful. If you'd prefer not to disclose a message publicly, you may
send it to me off-list; make sure it remains intact by sending it as an
attachment. A message from your corporate mail account (supposedly
signed) directly to me would be helpful too.
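
One of the causes listed above, a problem reaching the public key in DNS, can be checked by hand once you know which record the verifier looks up. DKIM publishes the key as a TXT record at `<selector>._domainkey.<domain>`, taken from the `s=` and `d=` tags of the DKIM-Signature header. The sketch below only derives that name; the header value, selector, and domain are invented for the example, and it does no DNS lookup or signature checking.

```python
def parse_dkim_tags(header_value: str) -> dict[str, str]:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        if "=" not in part:
            continue
        name, value = part.split("=", 1)
        # Remove folding whitespace that may appear inside values.
        tags[name.strip()] = "".join(value.split())
    return tags


def key_record_name(header_value: str) -> str:
    """Name of the DNS TXT record that should hold the public key."""
    tags = parse_dkim_tags(header_value)
    return f"{tags['s']}._domainkey.{tags['d']}"


# Made-up header value for illustration only.
sample = (
    "v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=google; "
    "h=from:to:subject; bh=abc=; b=xyz="
)

print(key_record_name(sample))
```

Querying the printed name for a TXT record (for instance with `dig`) shows whether the key that the DNS provider set is actually visible.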

  Mark


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Matus UHLAR - fantomas

I have two fedora15 boxes that process mail for a few domains, and
recently set up bayes in mysql for each of them. The servers are in
geographically different locations, a few hops from each other. Since
they both process mail for the same domains, I thought it made sense
to share the database between them.

What's the best way to do this? Set one as a master and the other as a
slave, or perhaps replication between them?


On 30.11.11 08:23, Michael Scheidell wrote:

set master on mx1, slave on mx2.
master is in charge of adding to db, and expiring, and slave can read it.
problem:  mx2 will get mostly spam, since spammers hit mx2 first, you 
'spam' hits will be lower then you thought.


Well, in this case spamc on the slave can preferably query spamd on the master first:
- no problem when master/mx1 is up
- minor performance delay when mx1/master is down

  (even good to slow down spammers).
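
The "master first, then fall back" ordering can be sketched like this. It is an illustration of the idea, not how spamc implements it; the host names are hypothetical, and the probe assumes spamd listens on its default port 783. The connect timeout is where the "minor performance delay" comes from when the master is down.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 783, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(hosts: Iterable[str],
                    try_connect: Callable[[str], bool]) -> Optional[str]:
    """Try hosts in the given order and return the first one that answers."""
    for host in hosts:
        if try_connect(host):
            return host
    return None


# Demo with a fake probe so the example runs without any network access.
up = {"mx1.example.com": False, "mx2.example.com": True}
chosen = first_reachable(["mx1.example.com", "mx2.example.com"],
                         lambda h: up[h])
print(chosen)
```

In real use `tcp_probe` would be passed as `try_connect`; the fake probe above only stands in for it.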

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
There's a long-standing bug relating to the x86 architecture that
allows you to install Windows.   -- Matthew D. Fuller


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Walter Hurry
On Wed, 30 Nov 2011 09:11:49 +0100, Robert Schetterer wrote:

> On 30.11.2011 09:06, Matus UHLAR - fantomas wrote:
>> On 30.11.11 00:17, Alex wrote:
>>> I have two fedora15 boxes that process mail for a few domains, and
>>> recently set up bayes in mysql for each of them. The servers are in
>>> geographically different locations, a few hops from each other. Since
>>> they both process mail for the same domains, I thought it made sense
>>> to share the database between them.
>>>
>>> What's the best way to do this? Set one as a master and the other as a
>>> slave, or perhaps replication between them?
>>>
>>> I also thought about something like drbd, but that seems a bit
>>> excessive for just a database.
> 
> dont use drbd with mysql store, you dont need it
>> 
>> I think this is question for MySQL mailing list, not for SA.
>> 
>  you can use i.e master-master replication ( which i do ), but be aware
> you might get doubles with bayes store, this should be ignored
> 
> but i am told PostgreSQL is better in replacation stuff

Why replicate? Why not just share the same database?



Re: T_DKIM_INVALID DKIM-Signature header exists but is not valid

2011-11-30 Thread Jari Fredriksson
On 30.11.2011 16:37, Mark Martinec wrote:
>> I have set up DKIM on our corporate mail hosted by GMail. Google
>> assigned a TXT record and our DNS-provider set it.
>>
>> I send a mail from GMail from the Web UI and from my Thunderbird to
>> myself, and SA always triggers that rule.
>>
>> What does it mean? Is not valid?
> 
> Whatever of mail reaches Mail::DKIM module failed DKIM verification.
> Either the message was changed on its way, or there is a problem
> reaching your public key in DNS or it is incorrect, or there is something
> broken in the software.
> 
> You will have to provide more details to get a more useful reply.
> 
> At least you need to obtain the complete intact message as it landed
> in the mailbox. Versions of SpamAssassin and Mail::DKIM module may
> be helpful. If you'd prefer not to disclose a message publically, you may
> send it to me off-list - make sure it remains intact, append it as an
> attachment. A message from your corporate mail account (supposedly
> signed) directly to me would be helpful too.
> 
>   Mark

Thanks! I tried to find the DKIM-version, but could not find it. I had
no debian package for that, so it must have been from some old
CPAN-installation. I installed Debian DKIM-package, and now the message
appears to be valid!

My stupid error.

Sorry.

Thanks!


-- 

You definitely intend to start living sometime soon.





Re: A SpamAssassin Crash Course for Admins

2011-11-30 Thread Kevin A. McGrail

On 11/30/2011 4:32 AM, spamassas...@lists.grepular.com wrote:

"GNU/Linux" is the best example of this IMO.
IMO, that is the most controversial example you could have picked.  I
believe Debian and the FSF are the only people that recognize that branding
for Linux.  I'm not arguing one side or the other, but no one disputes that
SpamAssassin is under the ASF's umbrella.  And since this is a document
basically about spam on behalf of the project, getting our own name
right in the document makes sense without worrying about kowtowing to the
capitalist pigs that rule the world ;-)


Regards,
KAM


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Nigel Frankcom
On Wed, 30 Nov 2011 15:14:33 + (UTC), Walter Hurry
 wrote:

>On Wed, 30 Nov 2011 09:11:49 +0100, Robert Schetterer wrote:
>
>> On 30.11.2011 09:06, Matus UHLAR - fantomas wrote:
>>> On 30.11.11 00:17, Alex wrote:
>>>> I have two fedora15 boxes that process mail for a few domains, and
>>>> recently set up bayes in mysql for each of them. The servers are in
>>>> geographically different locations, a few hops from each other. Since
>>>> they both process mail for the same domains, I thought it made sense
>>>> to share the database between them.
>>>>
>>>> What's the best way to do this? Set one as a master and the other as a
>>>> slave, or perhaps replication between them?
>>>>
>>>> I also thought about something like drbd, but that seems a bit
>>>> excessive for just a database.
>> 
>> dont use drbd with mysql store, you dont need it
>>> 
>>> I think this is question for MySQL mailing list, not for SA.
>>> 
>>  you can use i.e master-master replication ( which i do ), but be aware
>> you might get doubles with bayes store, this should be ignored
>> 
>> but i am told PostgreSQL is better in replacation stuff
>
>Why replicate? Why not just share the same database?

No failover with shared. Distributed adds redundancy.

KR

Nigel


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread John Hardin

On Wed, 30 Nov 2011, Walter Hurry wrote:


On Wed, 30 Nov 2011 09:11:49 +0100, Robert Schetterer wrote:


On 30.11.2011 09:06, Matus UHLAR - fantomas wrote:

On 30.11.11 00:17, Alex wrote:

I have two fedora15 boxes that process mail for a few domains, and
recently set up bayes in mysql for each of them. The servers are in
geographically different locations, a few hops from each other. Since
they both process mail for the same domains, I thought it made sense
to share the database between them.

What's the best way to do this? Set one as a master and the other as a
slave, or perhaps replication between them?

I also thought about something like drbd, but that seems a bit
excessive for just a database.


dont use drbd with mysql store, you dont need it


I think this is question for MySQL mailing list, not for SA.


 you can use i.e master-master replication ( which i do ), but be aware
you might get doubles with bayes store, this should be ignored

but i am told PostgreSQL is better in replacation stuff


Why replicate? Why not just share the same database?


Latency and reliability of the link between the "geographically separate 
locations". Replication is typically robust in the face of unreliable 
networking and high latency and will recover from outages, and most people 
don't want their mail delivery system doing WAN database queries with the 
associated (assumed) high latency, limited bandwidth and risk of 
service interruption that is inherent in a WAN.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  There is no better measure of the unthinking contempt of the
  environmentalist movement for civilization than their call to turn
  off the lights and sit in the dark. -- Sultan Knish
---
 15 days until Bill of Rights day


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Walter Hurry
On Wed, 30 Nov 2011 08:25:43 -0800, John Hardin wrote:

> On Wed, 30 Nov 2011, Walter Hurry wrote:

>> Why replicate? Why not just share the same database?
> 
> Latency and reliability of the link between the "geographically separate
> locations". Replication is typically robust in the face of unreliable
> networking and high latency and will recover from outages, and most
> people don't want their mail delivery system doing WAN database queries
> with the associated (assumed) high latency, limited bandwidth and risk
> of service interruption that is inherent in a WAN.

Good points. Actually I missed the "geographically separate locations" 
bit - mea culpa.




Re: Rules for opt-in mailing list

2011-11-30 Thread Benny Pedersen

On Wed, 30 Nov 2011 12:40:02 +, Paul Houselander wrote:

Then when someone complains ill enable the rules to stop them 
bothering me.


Let users write their own rules in their user config if they do not want
to unsubscribe. If you are helpful and write the rules for them, they will
later complain about missing mailing list mails.


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Benny Pedersen

On Wed, 30 Nov 2011 08:23:59 -0500, Michael Scheidell wrote:

   sed -i '' -e '/INSERT INTO bayes_seen/s/INTO/IGNORE INTO/' MySQL.pm


(hey SA folks.. any reason not to just put that into 3.4.0?  won't
hurt anything, will it?)


or simply just

ALTER TABLE  `bayes_seen` ENGINE = INNODB




Re: How long can a rule be?

2011-11-30 Thread Sergio
Thank you Adam,
I have been working hard in learning a lot of things about antispam rules
and I appreciate all the inputs that the list is giving to me.

I use MailScanner to check on my emails and I have not yet found a way to
train Bayes, I will check on that.

In the meantime, I have learned not to check in "ALL" headers. I have
redefined my first rules and now have a better approach to what I am
doing. I still need a lot more input from experts :)

Regards,

Sergio

On Tue, Nov 29, 2011 at 2:21 PM, Adam Katz  wrote:

> Summary for the impatient:
> Do not write rules like this.
> Instead, train Bayes, make sure you're using DNSBLs.
>
> On 11/25/2011 09:49 AM, Sergio wrote:
> > I wrote all the HELO spammers that SA didn't caught
> ...
> > header   CHARLY_RULE1 ALL =~ /(...)/i
> > describe CHARLY_RULE1 Charly Spammers
> > score    CHARLY_RULE1 11
>
> Given the description in your email, that should probably be:
>
> header   CHARLY_RULE1 X-Spam-Relays-Untrusted =~ / helo=(?:...) /i
> describe CHARLY_RULE1 A custom list of uncaught relay HELOs
> score    CHARLY_RULE1 4
>
> You should be *very* careful about scoring any individual rule at or
> above the spam flagging threshold (default is 5, do not lower).  There
> is almost always a better (and safer!) solution.
>
> > My concern is, is too much for just one rule or the rule can grow
> > without limit?
>
> Let's just say you don't need to worry about that.  We have several 150+
> character rules on SA's trunk and I've seen rules with regexp lengths in
> the thousands (not that that's necessarily a good thing, but it does
> work, albeit slowly).
>
>
> Still, this seems like a really bad idea; one hammy HELO in there and
> the whole thing starts hurting.  I think you'll be *far* better served
> by training bayes.
>
> You should also double check to ensure your DNS lookups are properly
> configured and plugins like Razor are turned on.  We don't have the best
> of resources to walk you through this, but you can start with
> http://wiki.apache.org/spamassassin/DnsBlocklists#Questions_And_Answers
>
>


Re: Martin Gregorie's portmanteau rule building script

2011-11-30 Thread Adam Katz
On 11/30/2011 03:59 AM, Martin Gregorie wrote:
> On Tue, 2011-11-29 at 14:22 -0800, Adam Katz wrote:
>> You might want to consider Regexp::Assemble for your tool, though
>> that would require using perl. This would cause your man page's
>> example rule to result in something like this:
>> 
>>    body __AU0 /(?i-xsm:\balt[123]\b)/
>>
>> rather than your script's *much* slower:
>>
>>    body __AU0 /\b(alt1|alt2|alt3)\b/i
>>
> Interesting idea. Currently my system's performance seems 'adequate',
> considering I'm running SA on an 866 MHz P3 box with 512 MB RAM:
>
>              Min                   Avg     Max
> Scan times:  0.9 (3401 bytes)      4.0     128.3 (72858 bytes)
> Msg sizes:   2258 (1.8 secs)       10474   507533 (6.2 secs)
> Messages:    2032
> 
> What sort of speed-up would Regexp::Assemble provide? 
> How would that compare with compiling the portmanteau.cf file?

Great question.  I do not have an answer.

How much optimization does re2c provide?  I am under the impression all
it does is convert text-based PCREs to C/C++ code of some sort, which
fully(?) mimics the original regexp's logic, implying that optimization
before compilation matters a lot.

I popped into irc://freenode.net#regex to ask, but this is apparently
too archaic a question.  Maybe somebody will have an answer in time.  (I
am not motivated enough to create an impromptu benchmark suite myself.)
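
For anyone who does want to measure it, an impromptu benchmark could be structured like the sketch below. It uses Python's `re` module, which is a different engine from Perl's and from re2c-compiled rules, so the timings it prints say nothing about SpamAssassin itself; it only shows the method: first confirm that both forms match the same text, then time each one. The test string is invented.

```python
import re
import timeit

# The two rule bodies from the thread, with the /i flag expressed in Python.
ALTERNATION = re.compile(r"\b(alt1|alt2|alt3)\b", re.IGNORECASE)
CHAR_CLASS = re.compile(r"\balt[123]\b", re.IGNORECASE)


def matches(pattern: re.Pattern, text: str) -> list[str]:
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in pattern.finditer(text)]


# Made-up input containing hits and near misses.
text = "no hit here, then Alt2 and alt3 but not alt4 or salt1 " * 200

# Step 1: the two forms must be equivalent before speed matters.
same = matches(ALTERNATION, text) == matches(CHAR_CLASS, text)
print("equivalent:", same)

# Step 2: time each form on the same input.
t_alt = timeit.timeit(lambda: matches(ALTERNATION, text), number=50)
t_cls = timeit.timeit(lambda: matches(CHAR_CLASS, text), number=50)
print(f"alternation: {t_alt:.4f}s  char class: {t_cls:.4f}s")
```

A comparable Perl harness run against real mail, with and without compiled rules, would be needed to answer the question actually asked here.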





Re: How long can a rule be?

2011-11-30 Thread John Hardin

On Wed, 30 Nov 2011, Sergio wrote:


I use MailScanner to check on my emails and I have not yet found a way to
train Bayes, I will check on that.


That's going to be critical.


On the mean time, I have learned not to check in "ALL" headers, I have
redefined my first rules and now I have seen a better approach on what I am
doing, still need a lot more input from experts, :)


Avoid "poison pill" rules.
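
A small sketch of why such rules are risky, assuming the usual model in which rule scores are summed and compared with the threshold (5 by default, as noted earlier in the thread). The rule `CHARLY_RULE1` and its scores of 11 and 4 come from the thread; `HYPOTHETICAL_HAM_SIGN` and its score are invented for the example.

```python
THRESHOLD = 5.0  # default spam threshold mentioned in the thread


def classify(hits: dict[str, float],
             threshold: float = THRESHOLD) -> tuple[float, bool]:
    """Sum the scores of the rules that hit and compare with the threshold."""
    total = sum(hits.values())
    return total, total >= threshold


# A legitimate message that happens to match the custom HELO rule.
with_poison_pill = {"HYPOTHETICAL_HAM_SIGN": -2.0, "CHARLY_RULE1": 11.0}
with_modest_score = {"HYPOTHETICAL_HAM_SIGN": -2.0, "CHARLY_RULE1": 4.0}

print(classify(with_poison_pill))
print(classify(with_modest_score))
```

With the rule scored at 11, the single hit decides the outcome regardless of other evidence; at 4, other rules still have a say.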

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The Tea Party wants to remove the Crony from Crony Capitalism.
  OWS wants to remove Capitalism from Crony Capitalism.
-- Astaghfirullah
---
 15 days until Bill of Rights day


Re: Bayes database in mysql on multiple servers

2011-11-30 Thread Michael Scheidell

On Wed, 30 Nov 2011 08:23:59 -0500, Michael Scheidell wrote:


   sed -i '' -e '/INSERT INTO bayes_seen/s/INTO/IGNORE INTO/' MySQL.pm

(hey SA folks.. any reason not to just put that into 3.4.0?  won't
hurt anything, will it?)


or simply just

ALTER TABLE  `bayes_seen` ENGINE = INNODB


No, that won't do anything (I use engine = innodb). What does InnoDB have
to do with replication collisions?


Nothing.  Nothing at all.


--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259