Re: SA works great!

2014-08-31 Thread Reindl Harald

On 31.08.2014 at 02:15, Ted Mittelstaedt wrote:
 Yes, it does work great when you have the bayes filter turned on and you take
 the time to feed it.  And that means you have to feed the learner both ham
 and spam and set up reliable sources for those.
 
 Unfortunately if Bayes is not turned on, it does not catch more than
 around 60-70% of spam.  As a Spamassassin user & server admin, I would
 really like to see that improve.

60-70% without training is great

keep in mind that the first 90% of incoming mail is eaten by RBLs,
so that 60% is only of the remaining 10% :-)
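
To put rough numbers on that (a sketch only: the 90% and 60-70% are the figures quoted in this thread, not measurements, and the function name is made up for illustration), the combined catch rate of RBLs plus untrained SA works out as follows:

```shell
# Combined catch rate when a pre-filter stage (RBLs) stops pre_pct percent
# of all incoming mail and SA then catches sa_pct percent of the remainder.
# Both arguments are whole percentages.
overall_catch_rate() {
    awk -v pre="$1" -v sa="$2" \
        'BEGIN { printf "%d%%\n", pre + (100 - pre) * sa / 100 }'
}

overall_catch_rate 90 60   # prints 96%
overall_catch_rate 90 70   # prints 97%
```

So under these assumptions only about 3 or 4 of every 100 spam delivery attempts would reach a user, even with an untrained SA.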

i think it's impossible to improve that much out-of-the-box because
that would make it too sensitive, while the bayes also has the ham side of
your communication to base its decisions on

i am coming from a commercial device that tries to block 100%, and there
it ends in zero-hour blocklists listing domains even if they are only
linked on the youtube page of the blocked facebook notification

so i am glad that i have to do some training myself instead of living in
fear of false positives, which do much more harm

 On 8/30/2014 2:41 PM, Reindl Harald wrote:
 after two days running SA for the first two test-domains with a
 well trained bayes for the global milter-user: impressive!

 the little crap that makes it through postscreen RBL scoring is detected

 0.000  0  3  0  non-token data: bayes db version
 0.000  0   1389  0  non-token data: nspam
 0.000  0   1350  0  non-token data: nham
 0.000  0 257152  0  non-token data: ntokens

 Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for 
 sa-milt:189 in 0.6 seconds, 2454 bytes.
 Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
 BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS

 scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl,bayes=0.842503,autolearn=disabled

 Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: 
 milter-reject: END-OF-MESSAGE from
 snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; 
 from=jenniferje...@hotmail.com  to=***





bayes scoring too low

2014-08-31 Thread Reindl Harald
i guess the scores need adjusting depending on the block score
this was one of the typical "enhance your penis" mails

score BAYES_95  0  0  3.2  3.0
score BAYES_99  0  0  3.8  3.5

X-Spam-Status: No, score=4.4, tag-level=4.5, block-level=8.5
X-Spam-Report:
 *  0.5 CUST_DNSBL_8 RBL: ix.dnsbl.manitu.net   
 *  [192.157.213.199 listed in ix.dnsbl.manitu.net] 
 *  0.3 CUST_DNSBL_15 RBL: spam.dnsbl.sorbs.net 
 *  [192.157.213.199 listed in spam.dnsbl.sorbs.net]
 * -0.0 RCVD_IN_MSPIKE_H4 RBL: Very Good reputation (+4)
 * [192.157.213.199 listed in wl.mailspike.net] 
 *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100% 
 * [score: 1.]  
 * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record  
 *  0.0 HTML_MESSAGE BODY: HTML included in message 
 *  0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted 
 *  Colors in HTML  
 *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%  
 *  [score: 1.] 
 *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
 *  background  
 * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders





Re: bayes scoring too low

2014-08-31 Thread Axb

On 08/31/2014 11:41 AM, Reindl Harald wrote:

i guess the scores need adjusting depending on the block score
this was one of the typical "enhance your penis" mails

score BAYES_95  0  0  3.2  3.0
score BAYES_99  0  0  3.8  3.5


you missed:
+ 0.2 BAYES_999



X-Spam-Status: No, score=4.4, tag-level=4.5, block-level=8.5
X-Spam-Report:
  *  0.5 CUST_DNSBL_8 RBL: ix.dnsbl.manitu.net  
  *  [192.157.213.199 listed in ix.dnsbl.manitu.net]
  *  0.3 CUST_DNSBL_15 RBL: spam.dnsbl.sorbs.net
  *  [192.157.213.199 listed in spam.dnsbl.sorbs.net]   
  * -0.0 RCVD_IN_MSPIKE_H4 RBL: Very Good reputation (+4)   
  * [192.157.213.199 listed in wl.mailspike.net]
  *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
  * [score: 1.] 
  * -0.0 SPF_HELO_PASS SPF: HELO matches SPF record 
  *  0.0 HTML_MESSAGE BODY: HTML included in message
  *  0.0 T_KAM_HTML_FONT_INVALID BODY: Test for Invalidly Named or Formatted
  *  Colors in HTML 
  *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100% 
  *  [score: 1.]
  *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to   
  *  background 
  * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders


Are you using RAZOR & PYZOR?

Can you post this sample to pastebin?



Re: bayes scoring too low

2014-08-31 Thread Axb

On 08/31/2014 11:58 AM, Reindl Harald wrote:

Are you using RAZOR & PYZOR?


https://bugzilla.redhat.com/show_bug.cgi?id=1127650
perl-Razor-Agent - Only used for the not enabled by default Razor plugin

so i guess no


get the source from http://razor.sourceforge.net/
I don't recommend installing via some rpm.

same with Pyzor
http://www.pyzor.org
latest release has quite a few important bugfixes.


Can you post this sample to pastebin?


i don't have accounts on any one-click-hoster hence attached as ZIP


pff..  since when does one need an account at pastebin.com?


the main question is whether i should raise the scores on a machine
with a very well trained bayes, and whether they are only so low to
prevent false positives in badly trained environments


Bayes scores are *not* set to be a sole indicator of spam/ham.
They're supposed to be yet another indicator.





Re: bayes scoring too low

2014-08-31 Thread Reindl Harald

On 31.08.2014 at 12:20, Axb wrote:
 On 08/31/2014 11:58 AM, Reindl Harald wrote:
 Are you using RAZOR & PYZOR?

 https://bugzilla.redhat.com/show_bug.cgi?id=1127650
 perl-Razor-Agent - Only used for the not enabled by default Razor plugin

 so i guess no
 
 get the source from http://razor.sourceforge.net/
 I don't recommend installing via some rpm.
 same with Pyzor
 http://www.pyzor.org
 latest release has quite a few important bugfixes.

i will keep both in mind

if it does come to an rpm, it will be one from my own rpmbuilder :-)

 Can you post this sample to pastebin?

 i don't have accounts on any one-click-hoster hence attached as ZIP
 
 pff..  since when does one need an account at pastebin.com?

honestly, i never had a need for pastebin in 11 years of working
as sysadmin / developer, and on most mailing lists you
see angry responses for linking to external resources

looks like i will start using it for the SA list in the future

 the main question is whether i should raise the scores on a machine
 with a very well trained bayes, and whether they are only so low to
 prevent false positives in badly trained environments
 
 Bayes scores are *not* set to be a sole indicator of spam/ham.
 They're supposed to be yet another indicator

that was my guess, and it still holds if i give BAYES_99 7.0
and reject via milter above 8.5
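
For illustration, such an override could be written as below. This is only a sketch: the helper name and the scratch file are made up, and 7.0 is simply the value mentioned above. With the sample report from earlier in this thread (score 4.4, of which 3.5 came from BAYES_99), a 7.0 score would lift that message to 7.9, above the 4.5 tag level but still below the 8.5 block level.

```shell
# Hypothetical helper: writes a Bayes score override to the given .cf file.
# A single value after the rule name applies to all four score sets.
write_bayes_override() {
    cat > "$1" <<'EOF'
# trust a well-trained Bayes more than the stock scores do
score BAYES_99 7.0
EOF
}

# Demo against a scratch file. On a real system the target would be a
# file in the site config directory (the path varies by distribution),
# followed by "spamassassin --lint" to check the syntax.
cf=$(mktemp)
write_bayes_override "$cf"
cat "$cf"
```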

there are some internal DNSWLs in the mix here with different
trust levels, and the bayes is trained only by myself for
all users, since in the past people tended to feed their
spam bayes with newsletters they had subscribed to and, for
whatever reason, marked them as spam instead of unsubscribing

frankly, even members of my own family called me by phone
asking "can't you block those mails?", and after my "did
you subscribe there?" and their "yes" came an angry "then unsubscribe
there instead of making me damage the detection for others"

so in the future users will send me spam which made it through
the filter as an attachment; after review i move it to the global
train folder, and i add the junk arriving at one of my 8 accounts
as spam, with my non-sensitive communication as ham

where users generally fail is in adding enough of their ham to
the mix; most never reach the 200 messages at all
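
The 200 here is SpamAssassin's default minimum of learned ham and learned spam messages (bayes_min_ham_num and bayes_min_spam_num) before Bayes results are used at all. A small sketch for checking a database against that threshold, assuming the "sa-learn --dump magic" output format shown earlier in this thread; the function name is made up:

```shell
# Reads "sa-learn --dump magic" output on stdin and reports whether both
# message counts reach the minimum (default 200). Exit status 0 = ready.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 + 0 }
        $NF == "nham"  { nham  = $3 + 0 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min + 0 && nham >= min + 0)
        }'
}

# Sample lines copied from the dump earlier in this thread; on a live
# system this would be: sa-learn --dump magic | bayes_ready
bayes_ready <<'EOF'
0.000  0   1389  0  non-token data: nspam
0.000  0   1350  0  non-token data: nham
EOF
```

The exit status makes it usable in a cron job or a per-user report, for example to remind users who have not yet supplied enough ham.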





Re: SA works great!

2014-08-31 Thread Ted Mittelstaedt



On 8/31/2014 2:21 AM, Reindl Harald wrote:


On 31.08.2014 at 02:15, Ted Mittelstaedt wrote:

Yes, it does work great when you have the bayes filter turned on and you take
the time to feed it.  And that means you have to feed the learner both ham
and spam and set up reliable sources for those.

Unfortunately if Bayes is not turned on, it does not catch more than
around 60-70% of spam.  As a Spamassassin user & server admin, I would
really like to see that improve.


60-70% without training is great

keep in mind that the first 90% of incoming mail is eaten by RBLs,
so that 60% is only of the remaining 10% :-)

i think it's impossible to improve that much out-of-the-box because
that would make it too sensitive, while the bayes also has the ham side of
your communication to base its decisions on



Google does it.  It's not impossible.


i am coming from a commercial device that tries to block 100%, and there
it ends in zero-hour blocklists listing domains even if they are only
linked on the youtube page of the blocked facebook notification

so i am glad that i have to do some training myself instead of living in
fear of false positives, which do much more harm



My experience is that the commercial providers like Gmail are now
so aggressive that false positives are VERY common on their systems;
this leads to people nowadays quite commonly saying "check your
spam folder" on their websites and such that send feedback messages.

Out of the box the default decision point of 5 is too high anyway.

I think the emphasis on avoiding false positives in the stock
(non-Bayes) distribution is far too high.  I suspect that over
the years many good rule submissions have been ignored because
incidence of false positives with them was too high for the
SA maintainers.

For a newbie to SA it is disheartening to install SA and not
get 90% with a 2% false positive, out of the box, but rather get
50% with a 0% false positive.  And I think the mistake the
maintainers are making is over-reliance on bayes.

At the least the SA maintainers should maintain a separate
highly aggressive rule distro that was optional that would
give us a much higher success rate with a corresponding
slight increase in false positives.

Their design approach has been to rely on Bayes to be trained to go from 
50% capture out of box with 0% FP to 80-90% capture with 0% FP.


But, the design approach could easily be relying on Bayes to go
from 90% capture with 5% FP out of the box, to 90% capture with
0% FP with Bayes, and the emphasis being on training Bayes on ham,
not spam.

Note I am pulling the percentages out of my ass, but I think you
get the idea.

Ted


On 8/30/2014 2:41 PM, Reindl Harald wrote:

after two days running SA for the first two test-domains with a
well trained bayes for the global milter-user: impressive!

the little crap that makes it through postscreen RBL scoring is detected

0.000  0  3  0  non-token data: bayes db version
0.000  0   1389  0  non-token data: nspam
0.000  0   1350  0  non-token data: nham
0.000  0 257152  0  non-token data: ntokens

Aug 30 23:34:19 localhost spamd[4882]: spamd: identified spam (8.9/4.5) for 
sa-milt:189 in 0.6 seconds, 2454 bytes.
Aug 30 23:34:19 localhost spamd[4882]: spamd: result: Y 8 -
BAYES_80,CUST_DNSBL_15,CUST_DNSWL_2,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLYTO,FREEMAIL_REPLYTO_END_DIGIT,HTML_MESSAGE,MALFORMED_FREEMAIL,MISSING_HEADERS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,REPLYTO_WITHOUT_TO_CC,RP_MATCHES_RCVD,SPF_PASS

scantime=0.6,size=2454,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=51671,mid=snt152-w505982b05a6fbba5c49ad2b1...@phx.gbl,bayes=0.842503,autolearn=disabled

Aug 30 23:34:19 localhost postfix/cleanup[6195]: 3hlrXp5S3dz1w: milter-reject: 
END-OF-MESSAGE from
snt004-omc1s37.hotmail.com[65.55.90.48]: 5.7.1 Blocked by SpamAssassin; 
from=jenniferje...@hotmail.com   to=***




Re: SA works great!

2014-08-31 Thread Reindl Harald

On 31.08.2014 at 16:08, Ted Mittelstaedt wrote:
 On 8/31/2014 2:21 AM, Reindl Harald wrote:

 On 31.08.2014 at 02:15, Ted Mittelstaedt wrote:
 Yes, it does work great when you have the bayes filter turned on and you
 take the time to feed it.  And that means you have to feed the learner
 both ham and spam and set up reliable sources for those.

 Unfortunately if Bayes is not turned on, it does not catch more than
 around 60-70% of spam.  As a Spamassassin user & server admin, I would
 really like to see that improve.

 60-70% without training is great

 keep in mind that the first 90% of incoming mail is eaten by RBLs,
 so that 60% is only of the remaining 10% :-)

 i think it's impossible to improve that much out-of-the-box because
 that would make it too sensitive, while the bayes also has the ham side of
 your communication to base its decisions on

 
 Google does it.  It's not impossible.

Google has a lot more data and power to feed a global bayes,
and even then they fail, as you say yourself in the next paragraph

i don't care for the 5 spam messages
i care for the eaten important one

 i am coming from a commercial device that tries to block 100%, and there
 it ends in zero-hour blocklists listing domains even if they are only
 linked on the youtube page of the blocked facebook notification

 so i am glad that i have to do some training myself instead of living in
 fear of false positives, which do much more harm
 
 My experience is that the commercial providers like Gmail are now
 so aggressive that false positives are VERY common on their systems,
 this leads to people nowadays quite commonly saying "check your
 spam folder" on their websites and such that send feedback messages.

which defeats the purpose of a spam filter, and the whole idea
of a junk folder is broken - i need a content filter running
reliably before-queue so that i never see the real crap, plus some [SPAM]
tagged messages which are moved by hand to ham/spam to train bayes

 Out of the box the default decision point of 5 is too high anyway.
 
 I think the emphasis on avoiding false positives in the stock
 (non-Bayes) distribution is far too high. I suspect that over
 the years many good rule submissions have been ignored because
 incidence of false positives with them was too high for the
 SA maintainers.

if you have users to support there is nothing worse than
a false positive - 10 junk mails slipping through are not as bad
as having a user complain that he doesn't get legit mail
and is tired of trying to explain to his customers how they could
make it through the filter

 For a newbie to SA it is disheartening to install SA and not
 get 90% with a 2% false positive, out of the box, but rather get
 50% with a 0% false positive.  And I think the mistake the
 maintainers are making is over-reliance on bayes.

no - as i showed in another thread today, the opposite is true:
the bayes could and should have more impact

but that can't be the default, because no software can know
how good the bayes data (ham and spam) really are, and if it
is trained by a noob who fires every newsletter into spam it does
damage - mine is trustworthy because i know what i am doing in
that context

the most important thing in training a bayes is to know which
messages you should strictly avoid feeding in

 At the least the SA maintainers should maintain a separate
 highly aggressive rule distro that was optional that would
 give us a much higher success rate with a corresponding
 slight increase in false positives.

here i agree - maybe with a meta-rule or similar which has
its own score in local.cf - but i still think you
need to know what you are doing, because such a meta value
also makes compromises; in my case i trust my bayes
almost unconditionally but would not give other default
rules the same power

 Their design approach has been to rely on Bayes to be trained to go from 50% 
 capture out of box with 0% FP to 80-90% capture with 0% FP.

easily said

spammers are not dumb and follow SA updates too
how long do you think such a default would survive in the wild?

 But, the design approach could easily be relying on Bayes to go
 from 90% capture with 5% FP out of the box, to 90% capture with
 0% FP with Bayes, and the emphasis being on training Bayes on ham,
 not spam.

5% false positives out of the box is just unacceptable

the content filter should only be the last defense anyway,
with your 90% of spam eaten by postscreen and DNSBL scores
combined with a postfix PTR regex rejecting dialup networks

with the PTR check alone you get rid of around 80% of
botnet junk without anything else

 Note I am pulling the percentages out of my ass, but I 
 think you get the idea.

i get the idea, and a few years ago i thought the same way

but looking at how much support time was wasted on angry customers not
getting important mail (including myself), and how little
time it takes each user to just delete his 10 daily
spam mails, never facing the other thousands already blocked,
my attitude in that context changed dramatically

that's also why 

Re: Give a penalty to messages with non latin UTF-8 characters?

2014-08-31 Thread Ian Zimmerman
On Sat, 30 Aug 2014 06:44:39 -0600,
LuKreme krem...@kreme.com wrote:

LuKreme I would welcome rules that would reliably penalize messages
LuKreme that use chinese, japanese, korean, thai, or any other
LuKreme characters in the UTF-8 address space that I don’t read. I
LuKreme would put them in user_prefs.

Doesn't ok_languages and ok_locales do the job?  It does for me.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: bayes scoring too low

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 12:20:41 +0200,
Axb axb.li...@gmail.com wrote:

Axb Bayes scores are *not* set to be a sole indicator of spam/ham.
Axb They're supposed to be yet another indicator.

FWIW, I use both Razor and Pyzor, and there are times when they seem to
be just asleep.  Or maybe a particular kind of spam defeats their hash
protection methods.  Then for some hours I get repeated cases like
Harald's - positive BAYES_999 but nothing much else.  It is quite
frustrating.

I started using the KAM rules and they seem to push most such messages
over - but then _they_ include rules with 5+ scores ...

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sat, 30 Aug 2014 19:59:53 -0600,
LuKreme krem...@kreme.com wrote:

RW This may run into shell argument limits if you have to learn a lot
RW of spam. Consider piping the output of find to xargs, or using -exec
RW ...{} + in find.

LuKreme Yes, I tried to do that, but as I said in my first post, if I
LuKreme do the find as part of the sa-learn command, then it stall when
LuKreme the find command returns null.

xargs (the GNU one at least) has an option to not run the inferior when
there are no args to give it.

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: SA works great!

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 16:55:50 +0200,
Axb axb.li...@gmail.com wrote:

Axb During the last +-4 years, scores have been set by the masscheck GA
Axb system.  IF more ppl would contribute with masschecks and rules,
Axb detection could be better, but the lack of volunteers doing this
Axb shows that apparently what SA does is good enough or there is
Axb little interest in commitment.

So, how do I take part in masscheck?

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: bayes scoring too low

2014-08-31 Thread Reindl Harald


On 31.08.2014 at 23:06, Ian Zimmerman wrote:
 On Sun, 31 Aug 2014 12:20:41 +0200,
 Axb axb.li...@gmail.com wrote:
 
 Axb Bayes scores are *not* set to be a sole indicator of spam/ham.
 Axb They're supposed to be yet another indicator.
 
 FWIW, I use both Razor and Pyzor, and there are times when they seem to
 be just asleep.  Or maybe a particular kind of spam defeats their hash
 protection methods.  Then for some hours I get repeated cases like
 Harald's - positive BAYES_999 but nothing much else.  It is quite
 frustrating.

nope - there is nothing frustrating

set the bayes scores higher if you trust them; i have been staring
at my maillogs for some hours, and without Razor and Pyzor
the results are *impressive*

in combination with postscreen and PTR checks, with SA as last
defense, 1 out of 1000 delivery attempts gets through to a
user; as far as i see there are no false positives and a handful
of spam makes it through - trying to eliminate that would
introduce false positives, which would be bad

after 8 years using a commercial spam firewall which also
uses SA among a lot of other *real crap*, and after
switching a domain with some thousand valid RCPTs, i hold my
breath and ask myself why i did not make that switch long ago






Re: SA works great!

2014-08-31 Thread Axb

On 08/31/2014 10:54 PM, Ian Zimmerman wrote:

On Sun, 31 Aug 2014 16:55:50 +0200,
Axb axb.li...@gmail.com wrote:

Axb During the last +-4 years, scores have been set by the masscheck GA
Axb system.  IF more ppl would contribute with masschecks and rules,
Axb detection could be better, but the lack of volunteers doing this
Axb shows that apparently what SA does is good enough or there is
Axb little interest in commitment.

So, how do I take part in masscheck?



Please see

http://wiki.apache.org/spamassassin/NightlyMassCheck




Re: sa-learn and find

2014-08-31 Thread LuKreme

On 31 Aug 2014, at 14:46 , Ian Zimmerman i...@buug.org wrote:

 On Sat, 30 Aug 2014 19:59:53 -0600,
 LuKreme krem...@kreme.com wrote:
 
 RW This may run into shell argument limits if you have to learn a lot
 RW of spam. Consider piping the output of find to xargs, or using -exec
 RW ...{} + in find.
 
 LuKreme Yes, I tried to do that, but as I said in my first post, if I
 LuKreme do the find as part of the sa-learn command, then it stall when
 LuKreme the find command returns null.
 
 xargs (the GNU one at least) has an option to not run the inferior when
 there are no args to give it.

The interior is the find:

This was my original command:

sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

Which stalls if find returns nothing. I am not seeing how xargs would help this.

(FreeBSD xargs never runs the command if the input is empty)

-- 
'I really should talk to him, sir. He's had a near-death experience!'
'We all do. It's called living.'



Re: Give a penalty to messages with non latin UTF-8 characters?

2014-08-31 Thread LuKreme

On 31 Aug 2014, at 14:38 , Ian Zimmerman i...@buug.org wrote:

 Doesn't ok_languages and ok_locales do the job?  It does for me.

Not with UTF-8 encoding, that setting only seems to apply to old-style character
declarations.

-- 
showing snuffy is when Sesame Street jumped the shark



Re: SA works great!

2014-08-31 Thread Bob Proulx
Ted Mittelstaedt wrote:
 Reindl Harald wrote:
  i think it's impossible to improve that much out-of-the-box because
  that would make it too sensitive while the bayes has the ham side of
  your communication too for decisions
 
 Google does it.  It's not impossible.

But not out of the box.  Google is at long term steady-state and
can't really compare to a fresh installation of any spam filter.

Plus Google can undeliver a message from your Inbox if you have not
read it yet.  Say a spammer slowly sends sneaky spam to 10,000 people.
After the first dozen report the message as spam then the next 9988
have the message undelivered from their Inbox over to the Junk folder.
That is a powerful feature but one I have never implemented for
myself.

Bob


Re: SA works great!

2014-08-31 Thread LuKreme

On 31 Aug 2014, at 08:08 , Ted Mittelstaedt t...@ipinc.net wrote:
 Google does it.  It's not impossible.

[snip]

 My experience is that the commercial providers like Gmail are now
 so aggressive that false positives are VERY common on their systems,
 this leads to people nowadays quite commonly saying "check your
 spam folder" on their websites and such that send feedback messages.

These two statements do not go together.


-- 
People only think for themselves if you tell them to.



Re: sa-learn and find

2014-08-31 Thread Ian Zimmerman
On Sun, 31 Aug 2014 17:37:50 -0600,
LuKreme krem...@kreme.com wrote:

Ian xargs (the GNU one at least) has an option to not run the inferior
Ian when there are no args to give it.

LuKreme The interior is the find:

_Inferior_ which is GNU speak for subprocess.  I should have tried to
be less concise :-)

 sa-learn --ham -u ${i} `find /home/${i}/Maildir/.notspam -type f -mtime -7`

find /home/${i}/Maildir/.notspam -type f -mtime -7 | xargs -r sa-learn --ham -u ${i}

LuKreme (FreeBSD xargs never runs the command if the input is empty)

You may not need -r then.
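
A slightly more defensive sketch of the same pipeline, for what it is worth. Assumptions on my part: -print0 and -0 are additions (supported by both GNU and FreeBSD find/xargs) so unusual file names survive, and the LEARN variable and function name are made up; LEARN defaults to a dry run that only echoes the command. The original backtick form presumably stalls because sa-learn with no file arguments waits on standard input.

```shell
# Dry run by default: set LEARN=sa-learn to really learn.
LEARN="${LEARN:-echo sa-learn}"

learn_recent_ham() {
    # $1 = user name, $2 = Maildir folder holding ham
    # -r: do not run the command at all when find matches nothing
    #     (GNU xargs; accepted and ignored by FreeBSD xargs).
    find "$2" -type f -mtime -7 -print0 |
        xargs -0 -r $LEARN --ham -u "$1"
}

# Demo on a scratch directory
demo=$(mktemp -d)
learn_recent_ham alice "$demo"      # empty folder: prints nothing
touch "$demo/msg1"
learn_recent_ham alice "$demo"      # prints the sa-learn command line
rm -rf "$demo"
```

In the loop from the original post the call would become something like learn_recent_ham "${i}" "/home/${i}/Maildir/.notspam".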

-- 
Please *no* private copies of mailing list or newsgroup messages.
Local Variables:
mode:claws-external
End:


Re: Outlook, we do love to hate you....

2014-08-31 Thread Jason Haar
On 01/09/14 04:33, Dave Warren wrote:

 As I understand that, that's specifically for messages that originated
 within Exchange itself and had no SMTP transmission or RFC5321 or 5322
 components in the first place. This dates back to Exchange's history,
 at which point it wasn't primarily a SMTP server, SMTP was just one
 possible transport.

Ah - no. I sorta thought of that. Nope - it stripped existing Received
headers out. Stoopid, stooopud, stped


 If Exchange sends the message via SMTP, or exposes it via IMAP, it
 constructs something more standards compliant, it's only when you
 export directly from Outlook that you get this mess.

Yes - it's probably a MAPI thing (not IMAP). I bet Received headers are
kept in some MAPI metadata blob and don't follow the main message blob
when drag-n-dropped into an IMAP folder. Still - no excuse for such
heinous behaviour.


-- 
Cheers

Jason Haar
Corporate Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



random low contrast text with bayes

2014-08-31 Thread Eric Shubert
I've seen an uptick of spam lately with random low contrast (hidden) 
text. This appears to be lowering bayes probabilities.


I'd like to strip low contrast text from messages before they're learned 
by sa-learn in order to combat this.


1) does anyone have some guidance for building such a filter?

2) Is there perhaps a better way of dealing with this type of spam?

Thanks.

--
-Eric 'shubes'



Re: random low contrast text with bayes

2014-08-31 Thread John Hardin

On Sun, 31 Aug 2014, Eric Shubert wrote:

I've seen an uptick of spam lately with random low contrast (hidden) text. 
This appears to be lowering bayes probabilities.


Learn them as spam. That will tend to eliminate that effect.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  It is criminal to teach a man not to defend himself when he is the
  constant victim of brutal attacks.  -- Malcolm X (1964)
---
 822 days since the first successful private support mission to ISS (SpaceX)


A rule for Phil

2014-08-31 Thread Luciano Rinetti

I need a rule that, when a message is sent to p...@example.com
and the Subject contains CV or Curriculum, scores the message with -9,
and a rule that, when a message is sent to p...@example.com
and the Subject doesn't contain CV or Curriculum, scores the message
with 7.
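
One possible shape for such a pair of rules, as a sketch rather than a tested ruleset. Assumptions: recipient@example.com is a placeholder (the real address is elided above), the rule names and the helper function are made up, and the recipient is matched against the To/Cc headers, so mail that reaches the mailbox via Bcc or a list would not match.

```shell
# Hypothetical helper: writes the two rules to the given .cf file.
write_cv_rules() {
    cat > "$1" <<'EOF'
# sub-rules (names starting with __ carry no score of their own)
header __TO_RCPT   ToCc =~ /\brecipient\@example\.com\b/i
header __SUBJ_CV   Subject =~ /\b(?:CV|Curriculum)\b/i

meta     RCPT_CV     (__TO_RCPT && __SUBJ_CV)
describe RCPT_CV     To recipient, Subject has CV
score    RCPT_CV     -9

meta     RCPT_NO_CV  (__TO_RCPT && !__SUBJ_CV)
describe RCPT_NO_CV  To recipient, Subject lacks CV
score    RCPT_NO_CV  7
EOF
}

# Demo against a scratch file; a real deployment would write into the
# site config directory and then run "spamassassin --lint".
rules=$(mktemp)
write_cv_rules "$rules"
cat "$rules"
```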


Regards



Re: random low contrast text with bayes

2014-08-31 Thread Eric Shubert

On 08/31/2014 10:26 PM, John Hardin wrote:

On Sun, 31 Aug 2014, Eric Shubert wrote:


I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.


Learn them as spam. That will tend to eliminate that effect.



Been doing that (learning them) for quite a while. I've had that 
mechanism set up for several years now, and it's working fairly well 
(after I adjusted the scoring upwards for bayes rules).


It appears to me that the hidden text is being randomly generated. Even 
saw a random function of some sort in there. I presume it's been 
designed to 'poison' bayes by virtue of the random text (and a sizable 
amount of it).


Thanks.
--
-Eric 'shubes'