Re: configure question

2015-01-17 Thread Daniel Staal
--As of January 17, 2015 4:20:36 PM -0700, Michael Williamson is alleged to 
have said:



to both /etc/mail/spamassassin/local.cf and
/home/username/.spamassassin/user_prefs,
I check the file permissions to be readable by all. I restart it

 # service spamassassin restart


--As for the rest, it is mine.

That's calling some script from /etc/rc.d/init.d, if I remember CentOS 
correctly.  Would you be able to look at/post that script?  I suspect that 
it's probably setting the location of the config files via options, so if 
we can figure out what it's doing then we can figure out what needs to be 
changed.


Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---


Re: regex: chars to escape bsides @

2015-01-05 Thread Daniel Staal
--As of January 5, 2015 4:38:03 PM -0800, John Hardin is alleged to have 
said:



On Mon, 5 Jan 2015, Bowie Bailey wrote:


On 1/5/2015 4:13 PM, John Hardin wrote:

 On Mon, 5 Jan 2015, Bowie Bailey wrote:

  You can avoid having to escape the slash (/) by using a different
  separator for the regex.  This can avoid leaning toothpick
  syndrome.

  For example:
m#http://match/this/url/#

 Ouch. # won't work for that (in SA at least) as it comments out the
 rest of the RE.


Ack!  Forgot about that minor difference with SA.  # is my general go-to
character for that in normal Perl scripts.

This should illustrate the same point with the minor improvement of
actually  *working* in SA:
  m^http://match/this/url/^


I tend to avoid using symbols that are syntactically significant in REs
for that purpose. In your example, you can't then anchor the RE at the
beginning of the URL because ^ has been repurposed as the RE delimiter.


--As for the rest, it is mine.

Since we've already established this is Perl...

I like to use braces.  Perl handles them (and brackets or parens) 
specially: Open with the opening brace and you close with the closing 
brace.  I think Perl will parse for balance as well, but I haven't checked 
at the moment.


 m{http://match/this/url}

In general though I do tend to stick with slashes unless it's going to be a 
problem; it's just more common and easier for people to recognize.


Daniel T. Staal



Re: BAYES_999=0.2 how to set this score higher?

2014-11-04 Thread Daniel Staal
--As of November 4, 2014 10:39:56 AM -0800, motty cruz is alleged to have 
said:



Hello, I would like to set the BAYES_999 score higher than 0.2; I am
searching for the file but I can't find it in
/usr/local/etc/mail/spamassassin (I am using FreeBSD)


--As for the rest, it is mine.

Another poster already answered your question, but you should also keep in 
mind that BAYES_999 is an *additive* score - Anything that hits it hits 
BAYES_99 as well, so really the score for BAYES_999 should just be the 
*additional* amount of likelihood that such a mail is spam.
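
To make the additive point concrete, here is a minimal sketch in Python
(it is not SpamAssassin code).  The 0.2 for BAYES_999 comes from the
question; the 3.5 for BAYES_99 is only an assumed value for illustration.

```python
# Illustrative scores only: 0.2 for BAYES_999 is from the original question,
# while the 3.5 for BAYES_99 is an assumed value for this sketch.
scores = {"BAYES_99": 3.5, "BAYES_999": 0.2}


def bayes_contribution(hits):
    """Sum the scores of every Bayes rule a message hit."""
    return sum(scores[rule] for rule in hits)


# A message that hits BAYES_999 also hits BAYES_99, so both scores count.
only_99 = bayes_contribution(["BAYES_99"])
both = bayes_contribution(["BAYES_99", "BAYES_999"])
print(only_99, both)
```

Raising BAYES_999 therefore raises the total on top of whatever BAYES_99
already contributes, rather than replacing it.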


Daniel T. Staal



Re: procmail (was Re: Spam messages bypassing SA)

2014-10-27 Thread Daniel Staal
--As of October 27, 2014 8:29:52 PM +0100, Robert Schetterer is alleged to 
have said:



by the way

http://www.exploit-db.com/exploits/34896/

always have a shellshock patched system these days with postfix/procmail


--As for the rest, it is mine.

Interesting.  I dug a bit further out of curiosity.

Postfix is irrelevant in this - Procmail is what needs to be looked at. 
More specifically, the rules that are being used; running procmail in and 
of itself doesn't allow this to be exploited, it's only if you have a 
procmail rule that sticks info into the environment (not uncommon) that it 
happens.


The default shell is the recipient's login shell - though that can be 
overridden in procmailrc.


I wouldn't rule out other LDAs having similar problems without proof 
- but it's something to be aware of.


Daniel T. Staal



Re: .link TLD spammer haven?

2014-10-24 Thread Daniel Staal
--As of October 25, 2014 12:45:31 AM +0200, Reindl Harald is alleged to 
have said:




On 25.10.2014 at 00:42, RW wrote:

On Fri, 24 Oct 2014 21:31:51 +0200
Reindl Harald wrote:


On 24.10.2014 at 21:20, Quanah Gibson-Mount wrote:

--On Thursday, October 23, 2014 11:56 PM +0100 Martin Gregorie

Thanks for that. I've now installed it and have been running tests
against my spam corpus to make sure that this subrule:

 uri  __MG_LTD1   /\.link/i

was now working correctly. It's hit all the stuff I thought it
should, but my subrule turned out to be deficient because it will
also hit any URI containing .linkedin, so anybody who has copied
it should rewrite that rule so it looks like this:

 uri  __MG_LTD1   /(\.link$|\.link\/)/i



Even with that change, it always hits mail from linkedin


logical, the second part of the or is not terminated and defeats the
first one and so the whole purpose of the or


In the second part the \.link has to followed by a '/'


thanks, i stand corrected

but then it should not catch linkedin


If it does it's behaving oddly.  Still, I might try this instead:

 uri  __MG_LTD1   /\.link\b/i

That should be faster and more general than the second one above, and 
shouldn't grab linkedin either.  (Unless of course they've decided to set 
up a .link address...)
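
A quick way to check the difference is to run both patterns over a few
URIs.  This is a minimal sketch using Python's re module rather than SA
itself, on the assumption that \b behaves the same way as in Perl for
this case; the URIs are made up for the example.

```python
import re

# \b matches between a word character and a non-word character
# (or the end of the string), so ".link" followed by "e" does not qualify.
old_rule = re.compile(r"\.link", re.IGNORECASE)
new_rule = re.compile(r"\.link\b", re.IGNORECASE)

# Hypothetical URIs, made up for this sketch.
uris = [
    "http://example.link/offer",
    "http://example.link",
    "http://www.linkedin.com/in/someone",
]

for uri in uris:
    print(uri, bool(old_rule.search(uri)), bool(new_rule.search(uri)))
```

Note that \b is also satisfied before a '.' or a '-', so a host such as
foo.link.example.com would still hit the rule.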


Daniel T. Staal



Re: Spam messages autolearned as ham

2014-09-25 Thread Daniel Staal
--As of September 25, 2014 11:13:16 AM -0400, Deeztek Support is alleged to 
have said:




You *did* keep your initial Bayes training corpora, right?



I have an account that I have used to sign up for everything under the
sun over the past 10 years. It's a goldmine for spam. I figured I'd use
that to train the Bayes.


--As for the rest, it is mine.

If it's not the same types of spam as your main mail accounts, it's pretty 
much useless for bayes training.  Check.  ;)


Also: Make sure you train enough ham.  Bayes needs to learn what's 
*different* about spam and ham.


Daniel T. Staal



Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Daniel Staal
--As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged 
to have said:



This incidence is part of the initial round of IANA accepting generic
TLDs. There's hundreds in this wave, and some are abused early. This is
moonshine registration, nothing like new TLDs being accepted in the
coming years.

Or is it? Will new generic TLDs in the future be abused like that, too?
How frequently will that happen? Is it worth being able to react to it
quickly? How long will URIBLs take to list them? How long will it take
for the average MUA to even linkify them?

Opinions? Discussion in here, or should I move this to dev?


--As for the rest, it is mine.

New TLDs will always be abused...

Anyway, personal opinion: Spamassassin is currently structured to have code 
and rules as separate things.  Putting this in the code blurs that - it's a 
rule.  Unless there is a major performance penalty, I would move it to be 
with the rest of the rules.  It should make maintenance easier and clearer.


Daniel T. Staal



Re: drop of score after update tonight

2014-08-25 Thread Daniel Staal
--As of August 25, 2014 7:06:32 PM +0200, Reindl Harald is alleged to have 
said:



masscheck tries to ensure spams score at least 5 points, but doesn't
care beyond that


yes, but given that the intention is to flag message above
5 with [SPAM] and reject messages above 7 which is the
intention running SA as milter the reduced score matters


--As for the rest, it is mine.

Who sets that policy?  Is it something you could think about changing (if 
it's a problem)?


Did the percentage of spam flagged vs. rejected change overall?  Every time 
the rules update some rules will be scored higher and some lower, so 
figuring out each individual case is going to be pointless, but if the 
overall percentages remain stable your system hasn't actually changed how 
it operates.


Daniel T. Staal



Re: drop of score after update tonight

2014-08-25 Thread Daniel Staal
--As of August 25, 2014 7:49:39 PM +0200, Reindl Harald is alleged to have 
said:





On 25.08.2014 at 19:35, Daniel Staal wrote:

--As of August 25, 2014 7:06:32 PM +0200, Reindl Harald is alleged to
have said:


masscheck tries to ensure spams score at least 5 points, but doesn't
care beyond that


yes, but given that the intention is to flag message above
5 with [SPAM] and reject messages above 7 which is the
intention running SA as milter the reduced score matters


Who sets that policy?  Is it something you could think about
changing (if it's a problem)?


finally i do that - which values needs to be found out and honestly
seeing that change i am unsure how to set score limits for both
(flag and reject) to prevent too many messages passing through
and at the same time if such a large change happens introduce
false positives from one day to another


Based on a quick check of my email, if you consider 'flagged' as non-spam 
(but possible), then I'd probably set flag at 3 or 4, and reject (as spam) 
at 5.  Personally I use a 'probably spam' and 'definitely spam' system 
(both are set aside), with cutoffs at 5 and 10, respectively.
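
As a minimal sketch of that two-level scheme (plain Python, not part of
SA or any milter): the cutoffs of 5 and 10 are the ones mentioned above,
while the function name and the inclusive comparisons are assumptions
made for the example.

```python
def classify(score, probable=5.0, definite=10.0):
    """Sort a SpamAssassin score into one of three buckets."""
    if score >= definite:
        return "definitely spam"
    if score >= probable:
        return "probably spam"
    return "ham"


# Scores chosen to sit on either side of the cutoffs.
for score in (4.9, 5.1, 5.3, 7.5, 12.0):
    print(score, classify(score))
```

With these cutoffs 7.5 and 5.3 land in the same bucket, while 5.1 and
4.9 do not; only a move across a cutoff changes what happens to the mail.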


But part of the point is that 7.5 to 5.3 is *not* a large change, as far as 
spamassassin is concerned.  5.1 to 4.9 would be a large change. ;)


I have rarely ever had a false positive with spamassassin - I get maybe 
two-three a year.  I get that in false negatives a day, when things are 
working well.  (Which amounts to about 1% of the spam I get as false 
negative.)



i admit not have that much experience but want to avoid
major mistakes in the setup as good as possible before
going live


My advice: Don't over-think it.  Spamassassin normally does a good job, 
with base settings and things turned on.  Train your bayes well, and watch 
for new things, but in general don't try messing with a lot of settings 
unless you have problems with a live mail stream.



Did the percentage of spam flagged vs. rejected change overall?


i am at early testing of SA and there is no active mail flow
since i am about to finish admin backends and how to generate
config files for SA/ClamAV/Postfix which is now at a nearly
well, for my private domain as public test good enough


Every time the rules update some rules will be scored higher and
some lower, so figuring out each individual case is going to be
pointless, but if the overall percentages remain stable your system
hasn't actually changed how it operates


as said - i am about implement SA, saw the message from the
update cronjob the first time for some days and looked a
bit deeper if things changed


And I think you ended up over-thinking it.  It was marked as spam before, 
it's marked as spam now.  Some other emails would probably have scored 
higher than they used to.  We've actually had a long break in updates - 
usually they are multiple times a week, if not every day, but it's been 
around a month since they last updated.  Rules probably changed scores more 
than normal - but it still scored the mail as spam.


Daniel T. Staal



Re: [Spam] Re: Bayes training via inotify (incron)

2014-08-24 Thread Daniel Staal

--As of August 25, 2014 4:00:15 AM +, Eric Wong is alleged to have said:


Daniel Staal dst...@usa.net wrote:

Good points, but inotify might still be overkill.  `ls maildir/cur/
| grep ',.*S'` will give you all messages that have been seen in the
mailbox, so you can run on a periodic schedule fairly easily.  I'm
not sure whether you need the immediate notification inotify gives.


I used to use `find' in a similar way you use `ls', but that redundantly
trains old ham messages.  That's slow for large ham folders, but fine
for spam, though (combined with `rm').

But maybe training ham is overkill?  I'm not sure about that.


I've never actually found it worth the effort to set up, personally.  I 
archive the old spam into another folder, but basically the same idea.


You could use `ls -t` + `head` to only get new files...  (I was mostly 
pointing out that the info is in the filename.)  Or you could resort to a 
script I wrote ages ago that simplifies some of that.  ;)


https://github.com/DanStaal/Arcfind

(I really should finish cleaning it up for CPAN at some point...)


inotify won't work for me - I'm on a BSD where inotify doesn't exist
- but it's an interesting approach.


Yeah, I wonder if there's something like incron for kqueue, since I
know kqueue supports FS notifications.


You probably could do it using famd, I think...  (Though it's a bit less 
widespread.)


Daniel T. Staal



Re: Bayes training via inotify (incron)

2014-08-22 Thread Daniel Staal
--As of August 23, 2014 3:22:13 AM +0200, Karsten Bräckelmann is alleged 
to have said:



On Fri, 2014-08-22 at 17:32 -0700, Ian Zimmerman wrote:

Isn't inotify a bit of overkill for this?  If you have a dedicated
maildir for training, you know that anything in maildir/new is, uh,
new.  So you process it and move it to maildir/cur.  What am I missing?


The new/ directory is for delivery, messages moved will end up in cur/.

Training on messages in new/ means training solely on classification.
These messages have not been seen by a human, and he's most likely not
even aware there's new mail at all.

Messages moved (copied) into dedicated (ham|spam) learning folders will
be placed in cur/.

Thus, training on content in dedicated learning folders' new/ dirs won't
work, because human reviewed mail does not go there. And training on
new/ dirs in general is like overriding all of the precaution measures
of SA auto-learning, and blindly train anything and everything above or
below the required_score threshold.


Besides, moving messages from new/ to cur/ is the IMAP server's duty. No
third-party script should ever mess with that.


--As for the rest, it is mine.

Good points, but inotify might still be overkill.  `ls maildir/cur/ | grep 
',.*S'` will give you all messages that have been seen in the mailbox, so 
you can run on a periodic schedule fairly easily.  I'm not sure whether you 
need the immediate notification inotify gives.
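
The same idea can be sketched in Python for a periodic job.  This is an
illustration only: it assumes the usual Maildir naming of
"unique:2,FLAGS", the function name is hypothetical, and the file names
in the demo are made up.  It is a little stricter than the grep, since
it only looks at the flags after ":2,".

```python
import os
import tempfile


def seen_messages(cur_dir):
    """Return the names in a Maildir cur/ directory that carry the S (seen) flag."""
    seen = []
    for name in sorted(os.listdir(cur_dir)):
        # Maildir names look like "unique:2,FLAGS"; the flags follow ":2,".
        _, sep, flags = name.rpartition(":2,")
        if sep and "S" in flags:
            seen.append(name)
    return seen


# Small demo on a throwaway directory (Unix-style file names assumed).
with tempfile.TemporaryDirectory() as cur:
    for name in ("1408750000.M1.host:2,S",
                 "1408750001.M2.host:2,",
                 "1408750002.M3.host:2,RS"):
        open(os.path.join(cur, name), "w").close()
    print(seen_messages(cur))
```

The returned names could then be handed to whatever training command is
in use.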


That said: It's still an interesting and possibly useful approach.  My 
current system is that I have a 'misfiled spam' folder, and I train on 
everything in it every night.  (And auto clean it out every night as well.) 
I let autolearn take care of normal ham.  (The occasional misfiled ham I've 
always handled manually, as they are so few it's never been worth 
automating.)


inotify won't work for me - I'm on a BSD where inotify doesn't exist - but 
it's an interesting approach.


Daniel T. Staal



Re: Second step with SA

2014-08-15 Thread Daniel Staal
--As of August 15, 2014 1:23:37 PM +0200, Antony Stone is alleged to have 
said:

On Friday 15 August 2014 at 13:05:26 (EU time), Timothy Murphy wrote:


1) What is the simplest way to reject mail in chinese, russian
and turkish?


http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#language_options


snip


I guess 1% of email from Brazil might be legit,
but losing it is a small sacrifice.
I guess I could look at the sites - there may be only a couple.
What is the easiest way to define email from a given site as spam?


http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options


Both of these links are out of date.  For the whitelist/blacklist it probably 
doesn't matter too much, but the language option in the first has been 
discontinued entirely.


The correct links for the current version of Spamassassin are:
http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#language_options
http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options

Daniel T. Staal



Re: Opinions needed on what to consider spam

2014-08-13 Thread Daniel Staal
--As of August 13, 2014 11:25:26 AM -0400, David F. Skoll is alleged to 
have said:



I believe that unsubscribing is safe.  If the list owner is legitimate,
unsubscribing will work.  If the list owner is a spammer, he/she already
has your email address and I don't believe spammers track the validity
of addresses anyway.  (Safe doesn't mean effective, of course!)

The only case in which unsubscribing is dangerous is if you
unsubscribe from a previously-unknown address.  That'll get you added
to spammers' lists.


--As for the rest, it is mine.

There is a third case I've seen on occasion, that hasn't been discussed: 
Unsubscribe via web.  Many legitimate sites use it - to unsubscribe you 
click a link and go to a web site, which gives some option to unsubscribe. 
(Often from multiple lists, or something similar.)


But these are *not* safe if the mail isn't 'legitimate': I have also seen 
the link go to a site filled with malware; the unsubscribe link then is the 
real attack.


I'm still split on unsubscribe-via-email, but I don't consider it actively 
hazardous.  Unsubscribe-via-web can be.


Daniel T. Staal



Re: Spam Assassin - does it work or not?

2014-08-11 Thread Daniel Staal
--As of August 11, 2014 10:00:34 AM -0400, David F. Skoll is alleged to 
have said:



On Mon, 11 Aug 2014 06:45:24 -0700
Andy a...@opticaltoys.com wrote:


If I'm sounding like a leech, that's because in this case I would very
much like to be.  :o)


I have fired paying customers for behaving like you.  It's even worse
to abuse a community of free software users and authors.

Paid spam filtering is cheap.  If the spam filtering you receive from
your hosting provider is inadequate, either switch providers or pay
for spam filtering from someone else.


--As for the rest, it is mine.

He's being polite, and trying to understand if he's getting good service 
from his hosting provider, while dealing with a product that's above his 
technical level.  I really don't see what the problem is.


Daniel T. Staal



Re: I need professional help

2014-07-13 Thread Daniel Staal
--As of July 13, 2014 7:56:38 PM +0200, Antony Stone is alleged to have 
said:



On Sunday 13 July 2014 at 19:52:57, Pat Traynor wrote:


On Sun, 13 Jul 2014, Antony Stone wrote:
 Have you been able to identify whether the unsolicited mail which has
 been thus detected is:

 - genuine email (possibly of a marketing variety, but still
 deliberately sent) from your hosting customers

It's absolutely not from MY customers.  I don't let anyone relay their
outgoing email through me.


On Sunday 13 July 2014 at 16:35:14, Pat Traynor wrote:


I run a web server, and for many of my hosting customers, I'll forward
their email to other mail servers.


Now I'm confused.


--As for the rest, it is mine.

That's incoming mail - mail to known and enumerated email addresses.

He'll forward mail *to* his customers, but not *from* his customers.

To the original poster: If you want to hire someone, might I suggest a site 
like oDesk?

https://www.odesk.com/

Daniel T. Staal



Re: More text/plain questions

2014-07-07 Thread Daniel Staal
--As of July 7, 2014 5:20:01 PM -0400, Kevin A. McGrail is alleged to have 
said:



On 7/7/2014 5:09 PM, Philip Prindeville wrote:

On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail kmcgr...@pccc.com wrote:


On 7/7/2014 2:28 AM, John Wilcock wrote:

On 05/07/2014 19:08, Philip Prindeville wrote:

As for encoding a cyrillic small a: there are many ways to do this.
iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this
would be very efficient—there are just too many charsets possible.

Normalising the input message to UTF-8 before body checks would help
somewhat with that. I seem to remember there's been talk of doing this.


Yes, or utf-16...  I think that will be necessary to keep SA effective
in the modern world sooner than later.


Okay, but… if the message body is non-ASCII and the CTE is 8bit or
base64 and no explicit charset has been given, how do you know which
translation to perform?

I get a lot of Han SPAM in GB2312 where the charset is never specified
(apparently it’s a national default in China, despite the requirements
stated in RFC-2045 and -2046).

Sorry, I haven't even started delving into the devilish details but I
know it's looming as a needed feature.


--As for the rest, it is mine.

Just to start the discussion: I'd say default to UTF-8 if not otherwise 
specified and can't be worked out.  (How hard to work on 'working it out' 
is a question, of course.)  It's the growing standard, as far as I can tell.


Even if it's wrong in a particular case, it would probably be useful: It 
would give rule writers something to work with.
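
A minimal sketch of that fallback, in Python rather than SA's own Perl:
it is not how SpamAssassin handles charsets, and the function name and
sample strings are assumptions made for the example.  It trusts a
declared charset when one is given and usable, and otherwise decodes as
UTF-8, replacing bytes that don't fit.

```python
def decode_body(raw, declared_charset=None):
    """Decode raw body bytes, falling back to UTF-8 when no usable charset is given."""
    if declared_charset:
        try:
            return raw.decode(declared_charset)
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong charset label: fall through to the default
    # errors="replace" keeps going on invalid bytes, so rules still get some text.
    return raw.decode("utf-8", errors="replace")


# Correctly guessed UTF-8, a declared GB2312 body, and an undeclared one.
print(decode_body("привет".encode("utf-8")))
print(decode_body("你好".encode("gb2312"), "gb2312"))
print(decode_body("你好".encode("gb2312")))
```

In the last call the guess is wrong and the result contains replacement
characters, but the message is still normalised to one encoding that
rules can be written against.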


Daniel T. Staal



Re: getting tons of SPAM

2014-07-01 Thread Daniel Staal
--As of July 1, 2014 7:39:43 PM -0500, Steve Bergman is alleged to have 
said:



On 07/01/2014 05:07 PM, motty cruz wrote:

If it needs to be *instant*, have them visit a web page to enter service
requests.



Because there's no way that web-based email forms can be abused.

Please. The whole delay thing is about the ridiculous greylisting kluge.
There are plenty of other spam avoidance kluges which don't involve
significant delay. I really can't believe what I'm hearing here. It has
little to nothing to do with reality. Spam is a problem. But you don't
have to make your users wait hours for important emails by making your
mail servers play hard to get games with each other.

This is just silly.

If I forwarded this conversation to my email users, they'd be ROTFL over
what the experts are saying about the tool they use daily.

It has problems. But long delays would be unacceptable. And http can't
really replace all its functionality. Web email forms are slow,
limiting, and annoying.


--As for the rest, it is mine.

95+% of the time, email is immediate, true.  But it is not uncommon for 
mail to be delayed for hours or days either, even without greylisting.  It 
happens in the wild all the time, even (especially...) with the big 
providers.  Email is also not 100% reliable: It is a best-effort service 
and can and does drop messages on occasion.  (With varying degrees of 
notification: By the spec, notification should always happen, but 
experience says that causes backscatter, so it's not always by the spec.)


If you need an immediate, reliable communication method email will appear 
to work - but will randomly fail, and there will be *nothing you can do 
about it.*  If that's what your users are expecting you are doing a 
*disservice* to your users, because it *won't work.*


There are solutions that will, which have higher overhead costs than email. 
A password-protected web form is better - it won't fail silently.  Or there 
are specialist messaging protocols.  But if your users are expecting email 
to be that solution you are going to give yourself headaches.


Now, if 'most of the time' immediate communication is enough, that's fine. 
It may not be worth it for you to implement a higher reliability protocol - 
they cost time and money.  (I used to work for a company whose sole product 
was a 100% reliable communication protocol.)  But don't complain when it 
fails, because it will, and both you and the users need to expect that.


Daniel T. Staal



Re: getting tons of SPAM

2014-07-01 Thread Daniel Staal
--As of July 1, 2014 9:40:05 PM -0500, Steve Bergman is alleged to have 
said:



95+% of the time, email is immediate, true.


More like 99%+ of the time. When it's not, I hear about it.


But it is not uncommon for
mail to be delayed for hours or days either,


It's uncommon enough that when it does happen I get a phone call about a
user not being able to receive email.


It's common enough that I saw it every day in my last job.  99.9% of the 
time the users didn't notice, or care.  On the other hand there were the 
times I had to show them the log files showing exactly when we got and sent 
the message, and had to have a talk about expectations.  (Nearly always the 
message had gone through our system in seconds.)



even without greylisting.


Greylisting is an ugly hack that I'm hesitant to even dignify by having
the topic of serious conversation.


I won't defend it.  I've never used it.  ;)


I'm not at all sure what you're talking about regarding email vs web form
reliability. What are the links in that chain?

The email client can malfunction in some way. But then again, so can a
browser. The sending server can malfunction in some way. But so can the
web proxy. The WAN link can go down on the sending side. But then, that
can happen with both web and email. The receiving side's WAN can go down
too. But in the case of a mail server it tries and tries and tries to get
the message through as quickly as possible. The browser and proxy server
certainly don't. They just drop it if anything goes wrong.


I only said that it won't fail silently: If you are depending on it for 
immediate communications, you'll know when you didn't get that, while with 
email it'll be hidden.


Maybe 'better' wasn't the right word: It's a trade off.  If you want the 
message to go through, email is set up to keep trying.  If you want the 
message to go *now*, the web form will tell you if it did (making the 
assumption that the form returns a 'message delivered' screen once it has 
delivered the message), and the user can try for another form of 
communication if it fails.



You tell me that email is unreliable. And yet anyone can see that it *is*
quite reliable, until you, as a mail admin, foolishly introduce the
self-DOSing technique of greylisting, and fall on your own sword.

You can go on about how it makes sense to fall on your sword. But I'm a
realist, and not buying it.


As I said: I've never used greylisting.  I have seen mail queues regularly 
holding messages for hours or days.  Email is fairly reliable - but I 
wouldn't let a user treat it as 100% reliable and immediate, because I know 
it isn't.  Better a few hard conversations about expectations and options 
than lost business due to using the wrong tool for the job.



I'll also be typing this post up, putting a stamp on it, and mailing it.
It might reach you there faster. ;-)


Not faster, but probably more reliable.  ;)


How many people here actually use greylisting and don't get complaints?

Our ISP, who previously handled our email certainly didn't introduce any
noticeable delays. And nobody ever got a noticeable amount of spam, or
reported to me a missed or late email.


Then they didn't notice them.  In the normal course of things, most mail 
gets through in seconds, and most of the delays are in the range of minutes 
to hours - short enough that people don't see them unless they are paying 
close attention.  (And they may not be checking mail that often anyway.)



Amazing, IMO. But it was obviously done without the ridiculous and
unacceptable practice of greylisting.

I want to achieve the results that Windstream does.


You probably can.  ;)  But I'm sure Windstream didn't get you every piece 
of mail immediately after it was sent - just as soon as they could after 
they got it.  I'm not even saying I like greylisting - I'm just saying you 
should work to set user expectations to reality, which is that email 
sometimes takes time to get delivered and (rarely) gets lost.  If something 
is absolutely time-critical, they should treat email as a backup, not the 
primary form of communication.  If it can spare an hour or two on occasion, 
email's fine.


Daniel T. Staal



Re: SA without procmail?

2014-06-20 Thread Daniel Staal
--As of June 20, 2014 2:05:04 PM +0100, Timothy Murphy is alleged to have 
said:




On Thursday, June 19, 2014 11:52:59 PM Ian Zimmerman wrote:




Axb Dovecot's Sieve is your friend. (replaces procmail)




Not really, not in this context. OP is using procmail merely as a LDA.
And in that capacity, it is replaced by the LDA that comes with dovecot.
On my debian system, it is /usr/lib/dovecot/dovecot-lda.




Thanks for the response. (I am the OP.)
Did you mean that procmail _can_ be replaced by dovecot-lda,
or that that is done _automatically_?


Can be, as you are seeing.


On my CentOS-6.5 system, I have /usr/libexec/dovecot/dovecot-lda
but I don't see any evidence that it is replacing procmail .
I get procmail by appending
mailbox_command = /usr/bin/procmail -f- -a $USER
to /etc/postfix/main.cf .
Is there something similar I could append instead to use dovecot-lda?

Incidentally, nobody really answered my original query -
I don't see why SA couldn't divert spam to a spam-folder,
instead of adding a header?
That would seem much simpler to me.


Mostly because it's designed as a filter: It's not operating on a file, 
it's operating on a message.  That message may be from a file, from the 
mail system, or from some mail store.  It might be going to any of the 
above.  'Divert spam to a spam-folder' can mean a *lot* of different 
things, under different circumstances.  It can mean writing a file to a 
folder, it can mean appending to a file, it can mean inserting into a 
database, etc.  And what if (like me) you want some spam in one folder and 
some in another?  Or something else?


At the end of the day, delivering mail to the user is the job of the LDA, 
and it's best to let it do its job.  SA does the 'simple' thing and 
provides the LDA with the information it needs.  Or whatever part of the 
mail system it's talking to - SA doesn't have to be used on an end system 
either, it can be part of a filter in the middle of processing/forwarding 
mail.


Writing to a spam-folder makes one use-case simpler, but only the one, and 
it makes many others harder.  (And you'd still have to work out what is 
meant by 'folder'.)  Making it an option just makes SA more complicated, 
especially if you try to cover all possible cases.


As a filter SA is simple to use, implement, and deploy.  It's usable in a 
wide variety of situations, including ones the devs never thought of or 
hear about.  Writing to a folder would be limiting and complex.
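
The split described above (SA tags, the delivery side decides) can be sketched from the delivery side.  This is a minimal illustration in Perl, not something taken from the thread: the sub name pick_folder, the folder names and the cut-off of 10 are all assumptions; only the X-Spam-Status header that SA adds is taken as given.

```perl
use strict;
use warnings;

# Hypothetical delivery-side helper: SA only tags the message; the LDA
# (or a script it calls) decides where the message goes.
sub pick_folder {
    my ($status, $high_cutoff) = @_;
    $high_cutoff = 10 unless defined $high_cutoff;   # assumed cut-off

    # No header, or not flagged as spam: normal delivery.
    return 'INBOX' unless defined $status && $status =~ /^\s*Yes\b/i;

    # Older SA releases wrote "hits=" where newer ones write "score=".
    my ($score) = $status =~ /\b(?:score|hits)=(-?\d+(?:\.\d+)?)/;
    return 'checkspam' unless defined $score;

    return $score >= $high_cutoff ? 'highspam' : 'checkspam';
}

# Made-up header values, for illustration only.
print pick_folder('Yes, score=14.2 required=5.0 autolearn=spam'), "\n";
print pick_folder('Yes, score=6.1 required=5.0 autolearn=no'), "\n";
print pick_folder('No, score=0.3 required=5.0 autolearn=ham'), "\n";
```

Because the policy lives outside SA, changing "one spam folder" into "two folders by score" only touches this helper, which is the flexibility the filter design is meant to keep.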


Daniel T. Staal



Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r

2014-06-15 Thread Daniel Staal


Please try to keep responses on-list; other people may have better answers 
than I do.  ;)


--As of June 16, 2014 10:58:29 AM +1000, Tom Robinson is alleged to have 
said:



On 14/06/14 05:22, Christoph (Stucki) von Stuckrad wrote:

Hi! - and Sorry, all my tries to post did bounce.
Seemingly our updated mailsystem changed something.

So directly to you and maybe somebody can post it,
if it's useful.

On Fri, 13 Jun 2014, Tom Robinson wrote:
...[errormessage]...

and Daniel Staal
...[tried splitting into four lines, which will work]...

Try changing


227 {printf "%s", ((localtime $twas) =~ s/... //r =~ s/:.. / /r);}


to:   {printf("%s", ((localtime $twas) =~ s/... //r =~ s/:.. / /r));}

There exist cases of ambiguity calling a function with a list of
parameters which themselves are lists and need (...).

Then you fix the interpretation of the parameter list by the extra pair
of (...) around ALL the parameters of printf.

Hope this helps, as I have not tested it, but experienced the
same problem many times in debug prints :-)

Hi Christoph,

Thanks for looking at this.

I tried your suggestion but it didn't help. :-\

I also tried Daniel's suggestion:

{
   my $temp = localtime $twas;
   $temp =~ s/:.. / /;
   $temp =~ s/... //;
   printf "%s", $temp;
}

Which does allow the script to run.

Daniel, you said that your fix *should* be equivalent. How will I know?

Does any one else use this script? Where I can log a bug report?


Well, to be absolutely certain, you'd need to run it through B::Deparse 
and read the output, but I don't think you'll need to go quite that far...


Mostly that was me saying 'I'm coding in the email client - no suitability 
for anything is guaranteed'.  I don't see any reason why it should be 
different in any cases - but I haven't researched all possible cases.  The 
only real difference is that I am using a temporary variable - and even 
there I suspect Deparse would show that Perl is using one anyway.


If you do file a bug report someplace, mention that they should take a look 
at the POSIX module and strftime - I have the *strong* suspicion that the 
whole convoluted mess could be replaced with one function call.
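
As a rough illustration of the strftime idea (a sketch, not the sa-heatu author's code): the chained substitutions reduce the scalar localtime string to month, day, hours:minutes and year, which appears to correspond to the format '%b %e %H:%M %Y'.  The sub name short_time is hypothetical, and %e (space-padded day of month) is assumed to be supported by the platform's C library, as it is on Linux and the BSDs.

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# scalar localtime always uses English names; pin LC_TIME so %b does too.
setlocale(LC_TIME, 'C');

# Hypothetical replacement for the two chained substitutions:
# month, space-padded day, hours and minutes, year.
sub short_time {
    my ($epoch) = @_;
    return strftime('%b %e %H:%M %Y', localtime $epoch);
}

print short_time(time), "\n";
```

This also runs on Perls that predate the /r modifier, since no substitution is involved at all.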


Daniel T. Staal



Rule Update!

2014-06-15 Thread Daniel Staal


I just wanted to say that my sa-update cronjob finally succeeded in 
updating the rules tonight.  Congrats and thanks to everyone who's been 
working on getting the update server back up and running; it appears you've 
succeeded.  ;)


Daniel T. Staal



Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r

2014-06-13 Thread Daniel Staal

--As of June 13, 2014 8:21:50 AM -0400, Joe Quinn is alleged to have said:


On 6/12/2014 10:27 PM, Tom Robinson wrote:

Hi,

Sorry to bother you with this. As referenced on the Apache SpamAssassin
Wiki for AutoWhiteList
(https://wiki.apache.org/spamassassin/AutoWhitelist) I downloaded the
Truxoft version of the sa-heatu utility
(http://truxoft.com/resources/sa-heatu.v4.02.tar.gz ) but when I run it
I get these errors:

Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near "s/... //r"
Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near "s/:.. / /r"
syntax error at /usr/local/bin/sa-heatu line 227, near "s/... //r"
Execution of /usr/local/bin/sa-heatu aborted due to compilation errors.

I'm running a CentOS 5.10, 32bit system.

My version of perl is:
# perl -version
This is perl, v5.8.8 built for i386-linux-thread-multi
---8---snip*---

I fetched a version of sa-heatu from git hub as well but it is the same
file (diff shows no differences and I get the same errors when running).

Here is a snippet of the code in context:

224 if ($count && ($opt_verbose || ($opt_verboseHits && $count>$opt_verboseHits) || ($opt_showUpdates && $prtu))) {
225     printf $fmt, $totscore/$count, $totscore, $count, $email, $ip, $reason;
226     if (!$opt_NoTimes && (($twas||0)!=0))
227         {printf "%s", ((localtime $twas) =~ s/... //r =~ s/:.. / /r);}  # don't include d-o-w, and drop seconds as that implies precision
228 }
Not being a perl expert I'm not sure exactly what is wrong here. Can
anyone please help determine the issue?

Kind regards,
Tom


/r is not a valid regex modifier, and gets parsed as a bareword - see
http://perldoc.perl.org/perlre.html#Modifiers


--As for the rest, it is mine.

That's not it: /r is a valid *substitution* modifier:

http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators

I'm not sure what the problem is though.  It *looks* ok to me.  It might be 
worth breaking line 227 into four lines just to see if that can show the 
problem better.


It should be equivalent to:

{
   my $temp = localtime $twas;
   $temp =~ s/:.. / /;
   $temp =~ s/... //;
   printf "%s", $temp;
}

(Note the /r is needed in the original because `localtime $twas` isn't 
something you can assign to.)


I'm not entirely certain on which order the strung-together substitutions 
are evaluated, or if it matters.
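
A possible explanation that the thread itself does not state: the /r modifier was added in Perl 5.14, and the reporter's perl is v5.8.8, which would read the trailing r as a bareword.  On the ordering question, =~ is left-associative, so the leftmost substitution runs first, although for these two patterns either order gives the same text.  Below is a minimal sketch that avoids /r and so should also compile on older Perls; the sub name short_time_compat is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: same result as the chained s///r expression,
# but written without /r so it does not depend on Perl 5.14 or later.
sub short_time_compat {
    my ($epoch) = @_;
    my $text = localtime $epoch;   # scalar context, e.g. "Fri Jun 13 08:21:50 2014"
    $text =~ s/... //;             # drop the day of week
    $text =~ s/:.. / /;            # drop the seconds
    return $text;
}

print short_time_compat(time), "\n";
```

The temporary variable is what makes /r unnecessary: the substitutions modify a copy rather than the value returned by localtime.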


Daniel T. Staal



Re: Bareword found where operator expected at /usr/local/bin/sa-heatu line 227, near s/... //r (fwd)

2014-06-13 Thread Daniel Staal


Got this off-list, might be helpful.

-- Forwarded Message --
Date: June 13, 2014 9:22:21 PM +0200
From: Christoph (Stucki) von Stuckrad stu...@math.fu-berlin.de
To: Tom Robinson tom.robin...@motec.com.au, Daniel Staal dst...@usa.net
Subject: Re: Bareword found where operator expected at 
/usr/local/bin/sa-heatu line 227, near s/... //r


Hi! - and Sorry, all my tries to post did bounce.
Seemingly our updated mailsystem changed something.

So directly to you and maybe somebody can post it,
if it's useful.

On Fri, 13 Jun 2014, Tom Robinson wrote:
...[errormessage]...

and Daniel Staal
...[tried splitting into four lines, which will work]...

Try changing


227 {printf "%s", ((localtime $twas) =~ s/... //r =~ s/:.. / /r);}


to:   {printf("%s", ((localtime $twas) =~ s/... //r =~ s/:.. / /r));}

There exist cases of ambiguity calling a function with a list of
parameters which themselves are lists and need (...).

Then you fix the interpretation of the parameter list by the extra pair
of (...) around ALL the parameters of printf.

Hope this helps, as I have not tested it, but experienced the
same problem many times in debug prints :-)

Stucki


--
Christoph von Stuckrad  * * |nickname |Mail stu...@mi.fu-berlin.de \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(Mo.,Mi.):+49 30 838-75 459|
Mathematik  Informatik EDV |\ *|if online|  (Di,Do,Fr):+49 30 77 39 6600|
Takustr. 9 / 14195 Berlin   * * |on IRCnet|Fax(home):   +49 30 77 39 6601/


-- End Forwarded Message --



Re: Operations on headers in UTF-8

2014-06-11 Thread Daniel Staal
--As of June 11, 2014 4:25:31 AM +0200, Karsten Bräckelmann is alleged to 
have said:



On Tue, 2014-06-10 at 21:22 -0400, Daniel Staal wrote:

--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged
to  have said:
 Worse, enabling charset normalization completely breaks UTF-8 chars
 in the regex. At least in my ad-hoc --cf command line testing.

--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might
have an effect


Possibly.


enabling normalization is probably setting the internal utf8
flag on incoming text, which could change the semantics of the regex
matching.


Nope. *digging into code*

This option mainly affects rendered textual parts and headers, treating
them with Encode::Detect. More complex than just setting an internal
flag. What exactly made the ad-hoc regex rules fail is beyond the scope
of tonight's code-diving.


Right.  And as a side-effect, Encode::Detect (as documented in Encode) is 
probably setting the utf8 flag on the Perl string.


Note I mean internal to *perl*, not one of the modules or code.  The utf8 
flag affects what semantics perl uses when it compares strings, including 
in regexes.
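
To make the point about the internal flag concrete, here is a small sketch (an illustration only, not SpamAssassin code; the sub name looks_like_word is made up).  It assumes Perl's default regex rules, that is, no `use feature 'unicode_strings'`, no `use locale`, and no /u or /a modifier.  Under those rules, the same character is expected to match \w only once the string has been upgraded.

```perl
use strict;
use warnings;

# The same character, once as a Latin-1 byte string (utf8 flag off)
# and once upgraded to Perl's internal UTF-8 form (utf8 flag on).
my $flag_off = "\xE9";        # e with acute accent
my $flag_on  = "\xE9";
utf8::upgrade($flag_on);

# Default regex rules: the flag on the target string decides whether
# code points 128-255 get Unicode semantics.
sub looks_like_word { return $_[0] =~ /\w/ ? 1 : 0 }

printf "equal as strings:     %s\n", ($flag_off eq $flag_on ? 'yes' : 'no');
printf "flag off matches \\w:  %s\n", (looks_like_word($flag_off) ? 'yes' : 'no');
printf "flag on matches \\w:   %s\n", (looks_like_word($flag_on)  ? 'yes' : 'no');
```

The two strings compare equal, so the difference is invisible to ordinary string code; it only shows up in operations whose semantics depend on the flag.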



If that's the case, it raises the question of if we want Spamassassin to
require Perl 5.12 (which includes that feature) - the current base
version  is 5.8.1.  Unicode support has been evolving in Perl; 5.8
supports it  generally, but there were bugs.  I think 5.12 got most of
them, but I'm not  sure.  (And of course it's not the current version of
Perl.)


The normalize_charset option requires Perl 5.8.5.

All the ad-hoc rule testing in this thread has been done with SA 3.3.2
on Perl 5.14.2 (debian 7.5). So this is not an issue of requiring a more
recent Perl version.


`use feature 'unicode_strings'`, as a feature, only tangentially cares 
about what version of Perl you are running.  Yes, you need a new enough 
version to use it, but since features are not enabled by default any effect 
they might have doesn't occur unless they are requested.
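
A short sketch of that scoping behaviour, under the assumption of Perl 5.12 or later (needed to load the feature) and no locale pragma.  The sub names are hypothetical.  uc() is used rather than a regex because, as far as I know, the feature only started to cover regex matching in 5.14.

```perl
use strict;
use warnings;

# Compiled without the feature: uc() on a non-UTF-8 string only
# touches ASCII letters.
sub upper_plain { return uc $_[0] }

{
    # The feature is lexically scoped: it applies only to code
    # compiled inside this block.
    use feature 'unicode_strings';
    sub upper_unicode { return uc $_[0] }
}

my $e_acute = "\xE9";
printf "without feature: %02X\n", ord(upper_plain($e_acute));
printf "with feature:    %02X\n", ord(upper_unicode($e_acute));
```

The same input gives different results depending only on where the calling code was compiled, which is why merely running a newer Perl changes nothing until the feature is requested.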



While of course something to potentially improve on itself, the topic of
charset normalization is just a by-product explaining the original
issue: Header rules and string encoding, with a grain of charset
encoding salt.


True.  I was just thinking aloud as it were, and wondering if an 
explanation could be found for breaking UTF-8 strings in the regex.


Daniel T. Staal



Re: Operations on headers in UTF-8

2014-06-10 Thread Daniel Staal
--As of June 11, 2014 2:45:25 AM +0200, Karsten Bräckelmann is alleged to 
have said:



Worse, enabling charset normalization completely breaks UTF-8 chars
in the regex. At least in my ad-hoc --cf command line testing.


--As for the rest, it is mine.

This sounds like something where `use feature 'unicode_strings'` might have 
an effect - enabling normalization is probably setting the internal utf8 
flag on incoming text, which could change the semantics of the regex 
matching.


If that's the case, it raises the question of if we want Spamassassin to 
require Perl 5.12 (which includes that feature) - the current base version 
is 5.8.1.  Unicode support has been evolving in Perl; 5.8 supports it 
generally, but there were bugs.  I think 5.12 got most of them, but I'm not 
sure.  (And of course it's not the current version of Perl.)


Daniel T. Staal



Spamd not scoring messages

2014-05-23 Thread Daniel Staal

--As of May 22, 2014 3:04:04 PM +0200, Tom Hendrikx is alleged to have said:


Hi,

After checking the results of sa-update and doing some manual dns
queries, it seems that last rule updates were done more than a month
ago. This used to be an almost daily process, even when there were only
score changes due to masschecks.

Any specific reason for no new updates? Something we can assist with?


--As for the rest, it is mine.

This actually brings up an issue I've been tracking and trying to isolate. 
I still haven't isolated it, but I'll bring it up here in case anyone can 
help.


My system only restarts spamd when the rules have been updated.  This break 
has brought to light an issue where spamd - after running over 24 hours - 
stops actually *scoring* messages.  It still logs that it's *processing* 
them, but no score is applied - either in header or in logging.


As I've said, I'm still trying to isolate the exact causes: other activity 
on the box seems to be involved, so there may be load issues, or something 
weird going on with FreeBSD's Jail system.  (The issue seems to go away if 
I stop my CPAN smoker jail, though I'm not 100% sure of that.  In theory 
there should be no way the two processes interact - they aren't even using 
the same perl or kernel.)  The exact time to failure is also in question - 
it doesn't seem to happen in under 24 hours, but how long over that is a 
question.  (This is complicated by the fact that spamd has occasionally 
restarted itself inside the testing period.)


I *do* know it isn't a 3.4 issue - it was occurring before I upgraded.

I'll admit I haven't been working too hard on isolating it - mostly just as 
I do other things on the box I've been noticing if the behavior changes. 
It's even possible that some of what I think is 'normal' uncaught spam is 
part of this - my main notice is the mornings when I wake up and find 50+ 
spam emails in my inboxes.


Daniel T. Staal



Re: Unexpected missing rule name, failure of spams/spamd to output X-Spam headers

2014-05-23 Thread Daniel Staal
--As of May 23, 2014 11:23:44 PM +0100, Martin Gregorie is alleged to have 
said:



This morning SA 3.3.2 was working as expected on my SA test box when I
amended a rule to recognise a new spam variant. The test box is running
a fully patched (as of last Friday) copy of Fedora 20. Then I did my
normal weekly yum upgrade. Shortly after that I got some new spam which
I ran a test on using my normal spamc/spamd test system on the SA test
box. To my surprise, no X-Spam headers at all were added to it.


--As for the rest, it is mine.

Two quick questions: Does it happen to *every* message passed to spamc, and 
does restarting spamd solve it?


This sounds similar to the behavior I was mentioning in a post earlier, and 
am having trouble tracking down.  Restarting mitigates in my case.


Daniel T. Staal



Re: sa-learn from a cronjob?

2014-04-20 Thread Daniel Staal
--As of April 20, 2014 12:14:37 PM -0700, Dan Mahoney, System Admin is 
alleged to have said:



Most of my users aren't command-line friendly.  I'd like to basically
have my IMAP server default to handing out two imap mailboxes that get
auto-crontabbed to training bayes.

Ideally, I'd also like to make it so that things dropped in the
learn_spam folder are deleted, and stuff in the learn_ham folder
(mistake-based training) are de-tagged and moved back to the inbox.
Alternatively, a single learned folder would do.

Perl's Mail::Box seems like a heavy tool for this simple task.  Does
anyone else have any recommendations?


--As for the rest, it is mine.

You might find this script helpful:
https://github.com/DanStaal/Arcfind

I wrote it ages ago for my own use to help in doing basically what you are 
asking for.  I found that my IMAP server had a bad habit of auto-deleting 
newly emptied directories, so I wanted to always leave at least one message 
in the 'learn as spam' folder.


I use it with Maildir folders: the invocation is usually along the lines of 
'mv `arcfind /mail/source/dir/cur/` /mail/dest/dir/cur/'


It doesn't feed to spamassassin itself, but a separate cronjob of 
'sa-learn' works just fine.


Daniel T. Staal

(I'm planning on putting it on CPAN as well, though I'm still considering 
the name and I need to fix some of the docs.  The main page README is 
correct, I just don't have the module versions documented fully yet.)




Re: BAYES_999 strange behavior

2014-02-20 Thread Daniel Staal
--As of February 20, 2014 1:56:18 PM -0500, Kevin A. McGrail is alleged to 
have said:



People have hard_coded BAYES_999 entries as well.   I recommend
forwarding the announcement from John to the other mailing lists you are
aware of these discussions.


--As for the rest, it is mine.

I intend to, as soon as I'm sure what's going to happen.  ;)  I just don't 
want people who've fixed their scores to be penalized.  I know that doesn't 
help people who copied your block re-defining the rules entirely, but 
nothing really helps them.  (Besides telling them not to do that unless 
they know what they are doing.)


Daniel T. Staal



Re: BAYES_999 of score 1.0 (default)

2014-02-17 Thread Daniel Staal

--As of February 17, 2014 2:54:11 PM +, RW is alleged to have said:


On Mon, 17 Feb 2014 09:09:33 -0500
Kevin A. McGrail wrote:


On 2/17/2014 8:43 AM, Matus UHLAR - fantomas wrote:
 Hello,

 seems after last rule update we've got new rule BAYES_999 in
 72_scores.cf but
 without score (and thus default 1.0) in 50_scores.cf.

 ... a mistake happened apparently?

I'll look and see.  I've never tried to promote a bayes rule so it
might need to bypass sandbox.


I have spam that's already hitting BAYES_999 with the default 1.0 score.


--As for the rest, it is mine.

Same here - it's causing a fair amount of FNs, as I have BAYES_99 set with 
a 4.7 score, so this is lowering the spam score for a lot of mail.


Daniel T. Staal



Re: New expensive Regexps

2014-02-06 Thread Daniel Staal
--As of February 6, 2014 5:32:47 PM -0800, Dave Warren is alleged to have 
said:



On 2014-02-06 17:17, John Hardin wrote:

On Thu, 6 Feb 2014, Kevin A. McGrail wrote:


I've discussed it with Alex a bit but one of my next ideas for the
Rules QA process is the following:

- we measure and report on metrics for the rules that are promoted
such as rank (existing), computational expense, time spent on rule.


I assume meta rules would combine the expense of their components?

Sounds interesting!



How about if one or more components were called by more than one
meta-rule? It's perhaps not entirely fair to divide it evenly, since that
might imply that removing the metarule would kill off that CPU usage.

Perhaps documenting the cost of the individual components, summing them,
with a flag to indicate that some or all of the components are shared?
That sounds overly complex, but it at least gives the enterprising rule
author or server administrator the ability to understand what is
happening.


--As for the rest, it is mine.

I would probably give the meta-rule no cost - add up the cost of the 
components if you want it.  (With the understanding that all no-cost rules 
are meta rules.)


Another option would be to give meta rules *negative* cost - the number is 
the size of the cost of the sub-rules, the negative indicates that it is a 
meta rule.
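
Both options can be sketched in a few lines.  Everything here is invented for illustration: the rule names, the cost figures (think of them as milliseconds per message) and the sub names are assumptions, not real rules or measurements.

```perl
use strict;
use warnings;

# Made-up per-rule costs and made-up meta rule definitions.
# RULE_B is deliberately shared by both meta rules.
my %cost = (RULE_A => 1.5, RULE_B => 0.25, RULE_C => 4);
my %meta = (META_AB => [qw(RULE_A RULE_B)], META_BC => [qw(RULE_B RULE_C)]);

# Option 1: a meta rule has no cost of its own; sum its components on demand.
sub component_cost {
    my ($name) = @_;
    my $total = 0;
    $total += $cost{$_} for @{ $meta{$name} };
    return $total;
}

# Option 2: report the same sum negated, so the sign marks a meta rule.
sub reported_cost {
    my ($name) = @_;
    return exists $meta{$name} ? -component_cost($name) : $cost{$name};
}

my @names = (keys %cost, keys %meta);
printf "%-8s %6.2f\n", $_, reported_cost($_) for sort @names;
```

Note that the shared RULE_B is counted in both sums, so adding up the meta rule figures overstates total CPU use; that is the sharing caveat raised earlier in the thread.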


Just thoughts on options.

Daniel T. Staal



RE: Rules always triggering.

2007-01-13 Thread Daniel Staal
--As of January 13, 2007 7:17:46 AM -0500, Dave Koontz is alleged to have 
said:



Just a wild stab here, run a lint check on all your rules.  I once fat
fingered a rule in my local.cf file and got similar hit results as you are
describing here.


--As for the rest, it is mine.

I fixed a couple of things, but the issue is still there.  Current lint 
output:


[24241] warn: config: failed to parse line, skipping: auto_learn 1
[24241] warn: config: failed to parse line, skipping: safe_reporting 0
[24241] warn: config: failed to parse line, skipping: use_terse_report 0
[24241] warn: config: failed to parse line, skipping: subject_tag *** Warning: Junk Mail ***
[24241] warn: config: failed to parse line, skipping: rewrite_subject 0
[24241] warn: config: warning: score set for non-existent rule FAKE_HELO_YAHOO
[24241] warn: config: warning: score set for non-existent rule HABEAS_SWE
[24241] warn: config: warning: score set for non-existent rule FAKE_HELO_USA_NET
[24241] warn: lint: 8 issues detected, please rerun with debug enabled for more information


(Yes, I've built this config over a long period of time...)

I'm liking the idea that this is an issue with Perl on Darwin expecting a 
different line ending.  I just need to figure out how to _verify_ that.
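
One way to check, sketched here as an assumption-laden example rather than a known fix: read the rule or message files in raw mode and count which line endings they actually contain.  The sub names are hypothetical, and whether line endings are really the cause is not established in this thread.

```perl
use strict;
use warnings;

# Count each line-ending style in a chunk of raw bytes.
sub count_line_endings {
    my ($data) = @_;
    my %seen = (CRLF => 0, LF => 0, CR => 0);
    while ($data =~ /(\r\n|\n|\r)/g) {
        my $eol = $1;
        $seen{ $eol eq "\r\n" ? 'CRLF' : $eol eq "\n" ? 'LF' : 'CR' }++;
    }
    return \%seen;
}

# Read a file without any newline translation and report what it contains.
sub report_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                        # slurp mode
    my $counts = count_line_endings(scalar <$fh>);
    close $fh;
    printf "%s: CRLF=%d LF=%d CR=%d\n", $path, @{$counts}{qw(CRLF LF CR)};
    return $counts;
}

# Usage: perl this_script.pl file1 file2 ...
report_file($_) for @ARGV;
```

A file that reports only CR counts would look like a single line to anything splitting on "\n", which would be consistent with rules matching everything.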


Daniel T. Staal



Re: Rules always triggering.

2007-01-12 Thread Daniel Staal
--As of January 12, 2007 12:40:00 PM -0800, John D. Hardin is alleged to 
have said:



On Fri, 12 Jan 2007, Daniel T. Staal wrote:


I am scanning mail via a procmail recipe

Anything in that configuration that you can think of that would
mess up those headers?  I can post a set if you would like.


There are procmail flags that allow passing only the message body text
to the filter program.

What's the procmail rule that you're using to call spamc?


--As for the rest, it is mine.

:0fw
| spamc

Daniel T. Staal



Re: Rules always triggering.

2007-01-12 Thread Daniel Staal
--As of January 12, 2007 7:08:18 PM -0600, Shane Williams is alleged to 
have said:



System is Darwin, running Postfix.  The sign-up message for this list got
those rules triggered.  (_Everything_ triggers them.)


This is just a guess, but is it possible that OS X's use of carriage
returns is making the message look to spamassassin as if it's a single
line of text?


--As for the rest, it is mine.

I said Darwin, not OS X, though I recognize it is a small distinction.  ;)

The mail files are all saved to my Maildir folders with unix line endings. 
In general Darwin handles files in the format it receives them, and 
unix-tools create unix-files.


...But it does raise the question of what _Perl_ thinks the line endings 
are...  Hmm.


Daniel T. Staal



Skipping Resent-From for blacklist.

2006-10-18 Thread Daniel Staal


I've got a problem where one of the message boards I was on has been 
hijacked and taken over by spammers.  They are sending out short 
notifications of new board topics, all of which contain nothing but spam.


Bayes hits these, but at the moment nothing else is.  They do have the nice 
distinguishing characteristic though that they are being sent *from* the 
message board's email address.  So, time for a blacklist.


Which would be nice, except that all of my mail is forwarded from a 
commercial service to my personal mailserver.  When this happens, the 
mailservice puts in a 'Resent-From' header, with my own public address. 
This obviously is screwing up my attempts to blacklist these spams.  (Since 
if SA sees a 'Resent-From' it ignores all other froms in the headers for 
basic whitelists and blacklists.)


Any ideas on how to get around this, using SA?  (I can of course just 
filter those using procmail, but I'd rather SA did my spam filtering.)


Daniel T. Staal



Re: sa-learn and Caught spams

2006-10-01 Thread Daniel Staal
--As of September 28, 2006 11:05:35 AM -0700, Kelson is alleged to have 
said:



Daniel Staal wrote:

Depends on the setup.  For instance, given the explanations above, I'll
start a system to automatically learn from my 'checkspam' folder, but
not my 'highspam' folder.  I have procmail automatically sort my spam by
score, so I can pay extra attention to low-scoring spam.  (Which is more
likely to be ham which was misplaced than the high-scoring spam.)

So, since I *already* have them separated out, I can avoid the
double-check.  ;)


But the final score alone doesn't determine whether something gets
autolearned.

As Matt pointed out, there are a number of different factors, including
the mix of head/body tests and the current Bayes score -- and it acts on
what the score would have been if Bayes had been disabled.

So unless you've filtered on the autolearn=(ham|spam|no) tag in the
X-Spam-Status header, you could be missing some high-scoring spam that
hasn't already been learned.

You could probably filter your training folder to remove any messages
where X-Spam-Status contains autolearn=spam (assuming, of course, that
your server takes full control of that header).  That should be
relatively fast and cut down on the resources used to identify duplicates.


--As for the rest, it is mine.

Just as an update, since I'm seeing something interesting...

As an experiment, I set procmail to copy all the 'highspam' that I get that 
*doesn't* get autolearned to a separate folder, and have been attempting to 
train on that folder daily.
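
For anyone wanting to try the same experiment, here is a minimal sketch of the check involved, in Python rather than procmail. It assumes the usual SpamAssassin 3.x X-Spam-Status layout, with an "autolearn=" field, and that the header has already been unfolded into one string. The function names are mine and purely illustrative; this is not the recipe I actually run.

```python
import re

# Matches the autolearn field, e.g. "autolearn=no" or "autolearn=spam".
_AUTOLEARN = re.compile(r"\bautolearn=(\w+)")


def autolearn_tag(status_header):
    """Return the autolearn value from an X-Spam-Status header, or None."""
    match = _AUTOLEARN.search(status_header)
    return match.group(1) if match else None


def needs_manual_training(status_header):
    """True when the message was not already autolearned as spam.

    A missing tag is treated as "not learned" (an assumption of this sketch).
    """
    return autolearn_tag(status_header) != "spam"
```

Messages for which needs_manual_training() is true would be the ones copied to the separate training folder.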


I say 'attempting' because despite these being *only* the emails that had 
'autolearn=no' and were definitely spam, in three days sa-learn has yet to 
find any useful tokens in any of these messages.  Generally, upon 
examination, these messages are already receiving bayes scores of 99% or 
better, so it appears that the tokens found are already fully scored. 
(Though not all of them have had such high bayes scores.)


I'll be keeping it up for a while; three days isn't much of a test, after 
all.  But at this point it appears extra training on messages with scores 
over 10 (my high-spam cut-off) doesn't actually do anything.  All relevant 
tokens are already learned, at least in a fully-trained and well-tuned 
system.


Among spam emails scored less than 10, a number of messages each day do 
have useful tokens, on my system.  Which is to be expected, after all.


Just thought this might be of interest.

Daniel T. Staal



Re: SpamAssassin MX Gateway Server

2006-09-30 Thread Daniel Staal
--As of September 30, 2006 12:32:41 PM -0500, Russ B. is alleged to have 
said:



Basically, anything that arrives over 15 in score, will have that
SPAM-STATUS header embedded, so it does NOT run SpamAssassin on this
server, and just puts it in the Caught-Spam. If it has LOWER than a score
of 15 from the MX, then the MX server didn't put a header on it, so it's
processed here and filed here.

Why do that? Because my users on the sendmail server farm have a whole
variety of score choices they are using, so I want their specific score to
be utilized - but by making the score on the MX 15, I'm saving the
sendmail server from a WHOLE LOT of processing, and nobody's going to have
a default score over 15... so that's a safe number?


--As for the rest, it is mine.

Just as a thought: Since you are running procmail on them anyway, it should 
be possible to have a script in there that reads the desired score and uses 
the score count SpamAssassin embeds in the 'X-Spam-Level:' header to filter.


It wouldn't reformat the mail (at least not without a lot of work), but you 
could at least file it differently...
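
A rough sketch of that comparison, in Python for illustration. It assumes X-Spam-Level holds one '*' per whole point of score (the default star character), so it only carries the integer part of the score. Fractional user thresholds are rounded up here, which means a score between 6.5 and 7 would slip past a 6.5 threshold. The function names and the way the per-user threshold is passed in are hypothetical.

```python
import math


def star_count(level_header):
    """Count the stars in an X-Spam-Level header value."""
    return level_header.strip().count("*")


def over_user_threshold(level_header, user_threshold):
    """True when the star count reaches the user's desired score.

    Stars only record whole points, so the threshold is rounded up.
    """
    return star_count(level_header) >= math.ceil(user_threshold)
```

A procmail recipe could call something like this with the user's stored preference and file the message accordingly.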


Daniel T. Staal



Re: sa-learn and Caught spams

2006-09-27 Thread Daniel Staal

--As of September 27, 2006 5:43:28 PM -0700, Kelson is alleged to have said:


Daniel T. Staal wrote:

True.  So...  Optimal is obviously to train, once and correctly, on all
messages.  Sending a message through that has been trained will consume
*some* resources, but less than one that still needs to be learned.

So the exact balance is a complicated question.  ;)


I just train on everything.  If it's already learned from a message, it
takes a few resources for it to recognize that, but almost certainly less
time than it would have taken me to separate them out.


--As for the rest, it is mine.

Depends on the setup.  For instance, given the explanations above, I'll 
start a system to automatically learn from my 'checkspam' folder, but not 
my 'highspam' folder.  I have procmail automatically sort my spam by score, 
so I can pay extra attention to low-scoring spam.  (Which is more likely to 
be ham which was misplaced than the high-scoring spam.)


So, since I *already* have them separated out, I can avoid the 
double-check.  ;)
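
To make the sorting concrete, here is a minimal sketch of the decision in Python (my real setup is a procmail recipe, which this does not reproduce). It assumes the SpamAssassin 3.x X-Spam-Status format, starting with "Yes" or "No" and containing a "score=" field. The folder names match mine, but the high_cutoff default is just a placeholder parameter.

```python
import re

# Matches the score field, e.g. "score=12.3" or "score=-1.0".
_SCORE = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)")


def pick_folder(status_header, high_cutoff=10.0):
    """Choose a folder name from an unfolded X-Spam-Status header value."""
    if not status_header.lstrip().lower().startswith("yes"):
        return "inbox"  # not flagged as spam at all
    match = _SCORE.search(status_header)
    if match is None:
        return "checkspam"  # flagged but unparseable: look at it by hand
    score = float(match.group(1))
    return "highspam" if score >= high_cutoff else "checkspam"
```

Only the 'checkspam' folder would then be fed to sa-learn after review.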


Anyway, I just knew that there was an automatic system, and at the very 
least there is *some* load to re-learning, even if a full analysis is 
skipped.  It would be interesting to see how much it actually is, compared 
to an easy filter.  If I find time, I may try to figure out a good test.


Daniel T. Staal
