Re: HitFreqsRuleTiming and SpamAssassin 3.2.5

2013-05-27 Thread Alexandre Boyer
hi,

with the debug output, do you see your module loaded?

A few hints: check permissions on the .pm itself, then check permissions
of the user to write in this directory.

You may want to try a call on the CLI with the flag --cf="loadplugin
HitFreqsRuleTiming /etc/mail/spamassassin/HitFreqsRuleTiming.pm". This
will load SA with the specified plugin and you should see it loading
with -D.

Cheers,

Alex, from osmosed.
Bow before me, for I am root.


On 27/05/13 11:03 AM, Rejaine Monteiro wrote:
> Hi all,
>
> I'm using SpamAssassin version 3.2.5 running on Perl version 5.8.8
>
> I copied HitFreqsRuleTiming.pm to
> /etc/mail/spamassassin/HitFreqsRuleTiming.pm and added the following line
> to the /etc/mail/spamassassin/init.pre file:
>
> loadplugin HitFreqsRuleTiming /etc/mail/spamassassin/HitFreqsRuleTiming.pm
>
> I execute "spamassassin --lint -t -D" which correctly said:
>
>  spamassassin --lint -t < test.msg
>
> But I do not find any 'timing.log' file in my current directory or
> anywhere on my system, even running as root...
>
> Did I miss something?
>
>


signature.asc
Description: OpenPGP digital signature


Re: Need rule to catch lots of font changes

2013-04-19 Thread Alexandre Boyer
Hi,

your meta is wrong.

It should be:

meta  LOC_MULT_BR  __LOC_BR > 10

Note that this will not match "just" 10 instances of the tag: with "> 10"
it needs more than ten hits, that is, at least eleven. Use ">= 10" if you
want "at least ten".

If you want exactly 10, you have to do something like:

meta  LOC_MULT_BR  __LOC_BR == 10

Never done that; maybe you need to do "greater than 9 and smaller than 11"
instead.
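
To make the counting semantics concrete, here is a small Python sketch (not SpamAssassin code) that models a "tflags multiple maxhits=N" sub-rule as a hit counter capped at N and then applies the greater-than, at-least and equality comparisons. The function name, the `<br>` pattern and the sample body are assumptions made for illustration only.

```python
import re

def capped_hits(pattern, text, maxhits):
    """Count regex matches, capped at maxhits (models 'tflags multiple maxhits=N')."""
    count = 0
    for _ in re.finditer(pattern, text, re.IGNORECASE):
        count += 1
        if count >= maxhits:
            break
    return count

# Hypothetical body with exactly 10 <br> tags.
body = "<br>" * 10
hits = capped_hits(r"<br>", body, maxhits=11)

print(hits > 10)    # False: "> 10" needs an 11th hit
print(hits >= 10)   # True: "at least ten"
print(hits == 10)   # True: "exactly ten"
```

An equality test only makes sense while maxhits is larger than the number being compared, since the counter stops at maxhits.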

Alex, from prypiat.
Yes, I recycle.


On 13-04-18 07:32 PM, Alex wrote:
> Hi all,
>
>
> just write a single detection rule for FONT face= (rawbody or
> uri_detail) and use tflag multiple.
>
> Then meta this with a counter.
>
> eg:
> rawbody  __BLAH  /
> tflags  __BLAH  multiple maxhits=21
> meta  MULTPL_FONTS  __BLAH > 20
> score  MULTPL_FONTS  5.0
> describe MULTPL_FONTS  At least 20 FONT tags found
>
>
> I'm trying to adapt this to work with multiple <br> tags, but I must
> be doing something wrong. I've tried changing it to match just 10
> instances of <br>, just for testing. Here's what I have:
>
> rawbody  __LOC_BR  //
> tflags  __LOC_BR  multiple maxhits=11
> meta  LOC_MULT_BR > 10
> score  LOC_MULT_BR 2.0
> describe LOC_MULT_BR At least 10 br tags found
>
> Here is the body example I'm working with:
>
>  href=3D"http://www.paren=
> ts-partage.org/components/com_content/bestinfo.php?tkogwruam714qhdgbfo
> ">htt=
> p://www.parents-partage.org/components/com_content/bestinfo.php?tkogwruam71=
> 
> 4qhdgbfo > r><=
> br>=
>  >__=
> __The stresses.. They just don't care. They're like you on
> Sunday m=
> orning. -- Jerry Griffin
> 
>
> Any idea why this doesn't work as expected? I've pasted an example here:
>
> http://pastebin.com/qprT2Rze
>
> Thanks for any ideas.
> Alex
>
>
>
>  
>
>
>
>
>
> Best regards,
>
> Alex, from prypiat.
> Yes, I recycle.
>
>
> On 13-04-14 08:46 PM, Marc Perkel wrote:
> > Anyone want to write a rule to catch this? Lots of font and color
> > changes.
> >
> > 
> > treatment for the summer holidays.
> > http://jmb.tw/16xul";>Achieve all your goals and this
> video
> > will
> > help you.
> >  color="#e4f4f2">One
> >  color="#e4fcf9">day
> >  > color="#e0fffb">a  > size="+2" color="#e8fffc">younger colleague,  face="Tahoma,
> > Geneva, sans-serif" size="-3" color="#f0fffd">one  > face="Courier, monospace"
> > size="5" color="#ecfbf9">of  > size="3" color="#e0fefa">my most  > color="#e0fdf9">intimate
> >  > color="#f8fffe">friends,  > size="-3" color="#f6fdfc">who had visited  > face="Arial, Helvetica, sans-serif" size="1"
> color="#f0fefc">the
> >  color="#ecfaf8">patient-  > face="Century Gothic, Times New Roman"
> > size="1" color="#e8f6f4">Irma-  size="1"
> > color="#e4f2f0">and  > size="-2" color="#e8fdfa">her
> >  color="#e4f9f6">
> > 
> > 
> >
> >
>
>




Re: Need rule to catch lots of font changes

2013-04-17 Thread Alexandre Boyer
Hi there,

just write a single detection rule for FONT face= (rawbody or
uri_detail) and use tflag multiple.

Then meta this with a counter.

eg:
rawbody  __BLAH  /
tflags  __BLAH  multiple maxhits=21
meta  MULTPL_FONTS  __BLAH > 20
score  MULTPL_FONTS  5.0
describe MULTPL_FONTS  At least 20 FONT tags found

Best regards,

Alex, from prypiat.
Yes, I recycle.


On 13-04-14 08:46 PM, Marc Perkel wrote:
> Anyone want to write a rule to catch this? Lots of font and color
> changes.
>
> 
> treatment for the summer holidays.
> http://jmb.tw/16xul";>Achieve all your goals and this video
> will
> help you.
> One
> day
>  color="#e0fffb">a  size="+2" color="#e8fffc">younger colleague, one  face="Courier, monospace"
> size="5" color="#ecfbf9">of  size="3" color="#e0fefa">my most  color="#e0fdf9">intimate
>  color="#f8fffe">friends,  size="-3" color="#f6fdfc">who had visited  face="Arial, Helvetica, sans-serif" size="1" color="#f0fefc">the
> patient-  face="Century Gothic, Times New Roman"
> size="1" color="#e8f6f4">Irma-  color="#e4f2f0">and  size="-2" color="#e8fdfa">her
> 
> 
> 
>
>





Re: Yahoo single-link spam common elements

2013-03-01 Thread Alexandre Boyer
The famous 5 recipients...

I had a (very) few exceptions while having the very same pattern in the
body: 4 recipients instead of 5, and sometimes one of the 5 had no
To:address, just a To:name, which was harder to count...

I removed the rule similar to your __RP_D_00040 from my systems to avoid
false negatives.

And no FP for a long time on this rule (this is an old bot, first seen
last summer, though probably older and unnoticed until then).
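
For illustration, the recipient-count sub-rule can be checked outside SpamAssassin with a few lines of Python. This is only a sketch: the pattern is David's, written here with a non-capturing group, and the addresses and the helper name are made up.

```python
import re

# David's sub-rule pattern, written with a non-capturing group (?:...)
FIVE_AT = re.compile(r"(?:@.*?){5}")

def has_five_recipients(to_header):
    """True when the To header contains at least five '@' characters."""
    return FIVE_AT.search(to_header) is not None

five = ", ".join(f"user{i}@example.com" for i in range(5))
four = ", ".join(f"user{i}@example.com" for i in range(4))
# Five recipients, but one listed by display name only (no address)
name_only = four + ", Some Name"

print(has_five_recipients(five))       # True
print(has_five_recipients(four))       # False
print(has_five_recipients(name_only))  # False: only four '@' signs
```

The last two cases are the exceptions described above: the body pattern is the same, but counting "@" in To: misses them.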

Alex, from prypiat.
Yes, I recycle.


On 13-03-01 02:45 PM, David F. Skoll wrote:
> On Fri, 01 Mar 2013 14:39:09 -0500
> Alexandre Boyer  wrote:
>
>> Pretty the same as what David suggests :-)
> My latest attempt is this:
>
> header   __RP_D_00040_1 From:addr =~ /yahoo/i
> header   __RP_D_00040_2 To =~ /(?:@.*?){5}/
> body __RP_D_00040_3 /http.{0,200}\d{1,2}:\d{1,2}:\d{1,2}/
> meta RP_D_00040 __RP_D_00040_1 &&__RP_D_00040_2 &&__RP_D_00040_3
> describe RP_D_00040 Yahoo single-line URL spam
>
> I'm a little worried about potential FPs, but we'll see how it goes.
>
> Regards,
>
> David.





Re: Rule to check To and/or CC headers

2013-03-01 Thread Alexandre Boyer
Okay...

Didn't catch that.

Not a bad idea, but it cannot be a decision-making criterion. And it needs
a plugin.

I thought about that already but didn't have time to code it. And I
don't remember who on this list raised objections that it would not be
such a good idea.

Plus: SA does not have access to SMTP info. That's a pity, but you can't
have the real thing; you rely on headers.

Alex, from prypiat.
Yes, I recycle.


On 13-03-01 02:38 PM, Dave Warren wrote:
> On 3/1/2013 11:26, Alexandre Boyer wrote:
>> There is no silly question. Just noobs. FYI: most of the time, I'm a
>> noob.
>>
>> I do not understand your question: To or Cc headers are recipients. Do
>> you want to compare the name portion to the address portion?
>>
>> eg: To: "Alex Boyer" 
>>
>> If Alex matches the local part in the address, then it's OK? else
>> it's not?
>>
>> This would be a bad idea.
>>
>> Whatever... the important thing is : if you want to compare with SA, you
>> have to write a plugin. There is no way one can compare, say, a
>> signature in the body and the ToCc headers without a module.
>>
>> Or if there is a way, I am willing to learn it!
>>
>
>
> I suspect he wants to check to see if the RCPT TO (the actual
> recipient) is named in the TO or CC headers, and score based on this
> decision.
>





Re: Yahoo single-link spam common elements

2013-03-01 Thread Alexandre Boyer
Right: the suggested pattern is working great, but there are some
variants as KAM says.

However I sense that these are not the same bots. The one with the "date
in body" is always the same (the spammer only changed the date format).

I heard about a cross-site botnet exploit on Yahoo! and third-party
websites, but did not dig into that.

Here is what is working fine for me:

body   __AJB_DATE_IN_BODY  m'\d{1,2}/\d{1,2}/\d{4}\s(\d{1,2}:){2}\d{2} [AP]M'
uri    __AJB_RANDOMURI     m'/[a-z]{2,10}/[a-z1-9]{1,30}(\.[a-z1-9]{1,10}\?[a-z1-9]{1,30}|[\=\&][a-z1-9]{1,30})'
meta   AJB_YAHOO_BOT       AJB_REALYAHOO && HTML_MESSAGE && __AJB_DATE_IN_BODY && __AJB_RANDOMURI
score  AJB_YAHOO_BOT       10.0
meta   AJB_REALYAHOO       __AJB_FROM_YAHOO && __RCVD_YAHOO
header __AJB_FROM_YAHOO    From:addr =~ /\@yahoo\.c(a|om)/i
header __RCVD_YAHOO        Received =~ m'\.yahoo\.c(a|om) .+ by \S+\.zerospam\.ca'm
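
A quick way to sanity-check the two body-side patterns before deploying them is to run them against sample text. The Python sketch below copies the date and URI regexes from the rules above; the helper names and the sample strings are assumptions, not real spam samples.

```python
import re

# Patterns copied from __AJB_DATE_IN_BODY and __AJB_RANDOMURI
DATE_IN_BODY = re.compile(r"\d{1,2}/\d{1,2}/\d{4}\s(\d{1,2}:){2}\d{2} [AP]M")
RANDOM_URI = re.compile(
    r"/[a-z]{2,10}/[a-z1-9]{1,30}"
    r"(\.[a-z1-9]{1,10}\?[a-z1-9]{1,30}|[\=\&][a-z1-9]{1,30})"
)

def has_date_stamp(body):
    """True when the body carries an 'M/D/YYYY H:MM:SS AM|PM' style timestamp."""
    return DATE_IN_BODY.search(body) is not None

def has_random_uri(uri):
    """True when the URI path looks like /dir/name.ext?token or /dir/name=token."""
    return RANDOM_URI.search(uri) is not None

# Hypothetical inputs
print(has_date_stamp("Bob 2/28/2013 10:05:47 AM"))          # True
print(has_date_stamp("See you on 2/28/2013 at the office"))  # False
print(has_random_uri("http://example.com/ab/cd.php?xyz"))    # True
print(has_random_uri("http://example.com/index.html"))       # False
```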


Pretty much the same as what David suggests :-)

I also noticed that the To: and Reply-To: headers and the name in the
signature in the body match. I wanted to code a plugin, but the previous
rules are doing the job so...

Alex, from prypiat.
Yes, I recycle.


On 13-03-01 12:49 PM, Kevin A. McGrail wrote:
> On 3/1/2013 12:43 PM, David F. Skoll wrote:
>> These are the common elements as far as I can see in the text/plain part
>> of the spam:
>>
>> 1) The URL always matches this regex:
>>
>> http://\S+/\S+\.\S+\?
>>
>> In other words, there's always a dot in the URL (not counting the dots
>> in the domain name itself) and a question mark.
>>
>> 2) The URL is then followed by possible whitespace and the name or
>> address
>> of the sender.
>>
>> 3) This is followed by more possible whitespace and then the date and
>> time in a format that matches this regex:
>>
>>\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2} [AP]M
>>
>> Can others confirm this pattern?
> I can confirm this is ONE of the patterns we've seen but we have seen
> other variations.
>
> For example, here's one from yesterday that you'll note forges my
> brother as the sender:
>
> Return-Path: 
> Received: from nm7.bullet.mail.gq1.yahoo.com
> (nm7.bullet.mail.gq1.yahoo.com [98.136.218.72])
> by intel1.peregrinehw.com (8.14.5/8.14.5) with SMTP id r1SI2WHg008621
> for ; Thu, 28 Feb 2013 13:02:33 -0500
> Received: from [98.137.12.61] by nm7.bullet.mail.gq1.yahoo.com with
> NNFMP; 28 Feb 2013 18:02:31 -
> Received: from [208.71.42.212] by tm6.bullet.mail.gq1.yahoo.com with
> NNFMP; 28 Feb 2013 18:02:31 -
> Received: from [127.0.0.1] by smtp223.mail.gq1.yahoo.com with NNFMP;
> 28 Feb 2013 18:02:31 -
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com;
> s=s1024; t=1362074551;
> bh=O2aFzcTOvDvCQALZoONOlZmCJiqlFu6WnhUAJG1clGI=;
> h=X-Yahoo-Newman-Id:Message-ID:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Received:From:Reply-To:Subject:Date:To;
> b=5sIC6wpAChfKFdhlWmr4OhjWCpNoMhTdxsbWPAIXYyD3f+O4QKMatwXxL7uvHeFc5TD//q4hW0HQDVJ+f/XJq71XHuBeWLySuYceP9ZP5gMRMnAR8uM9o9rWw0vnwSd7+3H3ff1rCd2FunGswYwlNAG5yz79uYE7xe+sXw5qs3c=
>
> X-Yahoo-Newman-Id: 533489.47072...@smtp223.mail.gq1.yahoo.com
> Message-ID: <533489.47072...@smtp223.mail.gq1.yahoo.com>
> X-Yahoo-Newman-Property: ymail-3
> X-YMail-OSG: jRlM9PUVM1m1fvPhWPzSnQEReLcFyK.eiCoVEK16XkMJTsp
>  FUuOvETyd8ee4KmT2FuoE1n9krae3pEbGP2MbvtNXR6sdYnhJIxvfdiuEtob
>  wr1ipSssPLDugG_B3KfoWpLJZs0YjG5TMqqVzDGih3D11pGQfAY6w.mgoOWY
>  Vemeo4DqHYY8XYokWdUpIh65s1dDZlNaYvlqfF1MZudo2pV6wlPm_rMDWHvP
>  DNawGoHaZr3qyELnp7ElYqt8BCCs0hushH3dTtn.mVpUMrTv3GzPnkMMGCvR
>  O9U8mO_UIFwTMrWvkkzLcMKqdKdukq8.cPSh8VY5TRg_Xih7mDsVxksEIVcE
>  OCOEMbBw9uApP4oRpc.pBlu9eDntaPpiUUPhpb9xxkQw4lcLJkx0RTt0GYD3
>  uAMLNtukwnvce54PkLZl3JrIDGhvQuhKnZxYyRsne49aNjP11_3wZUo8wlvg
>  guHiLuHcqkFb6lusTYz41fCHrSJ6VTYxwqlQcA0DioWPWPDZmkjLtrc2aER1
>  MbKjYki6ceeLXQT21DGdb9Gui.eE43RA2Ix6qqTYRddM-
> X-Yahoo-SMTP: bHYtILuswBDzs9L.FhYpFEHr7NQ0kndD9GjKbx8-
> Received: from localhost (rasiel_mongado29@200.121.59.161 with login)
> by smtp223.mail.gq1.yahoo.com with SMTP; 28 Feb 2013 10:02:31
> -0800 PST
> From: TOBY MCGRAIL 
> Reply-To: TOBY MCGRAIL 
> Subject: KEVIN
> Date: Thu, 28 Feb 2013 10:05:47 -0800 (PST)
> To: Kevin 
>
> kevin, hey. look what I found!   
> http://www.deguciumd-munged.lt/answerbabykevingreen/
>
>
> regards,
> KAM




Re: Rule to check To and/or CC headers

2013-03-01 Thread Alexandre Boyer
Hello,

There is no silly question. Just noobs. FYI: most of the time, I'm a noob.

I do not understand your question: To or Cc headers are recipients. Do
you want to compare the name portion to the address portion?

eg: To: "Alex Boyer" 

If Alex matches the local part in the address, then it's OK? else it's not?

This would be a bad idea.

Whatever... the important thing is: if you want to compare with SA, you
have to write a plugin. There is no way one can compare, say, a
signature in the body and the ToCc headers without a module.

Or if there is a way, I am willing to learn it!

Alex, from prypiat.
Yes, I recycle.


On 13-03-01 01:48 PM, Anthony Hoppe wrote:
> Hey All,
>
> I'm just starting to dive into advanced custom SA rules, so forgive me if 
> this is a silly question.  Is it possible to construct a rule that looks at 
> the To and/or CC field and compares it to the recipient?  I know this can be 
> dangerous as legitimate email can be BCCed, but I think being able to use 
> this to add just a teeny bit to the spam score will help complement some of 
> our current rules.
>
> Thanks!
>
> ~ Anthony
>
>
> Anthony Hoppe 
> IS Support Technician II 
> Menlo Park City School District 
> (650) 321-7140 ext. 5272 
> aho...@mpcsd.org 





Re: help

2013-02-26 Thread Alexandre Boyer
The answer is 42.

Alex, from prypiat.
Yes, I recycle.


On 13-02-25 01:07 PM, Chris Hunt wrote:





Re: blocking sender name

2013-02-24 Thread Alexandre Boyer
Hi there,

Specifically checking name is:

header  LOL  From:name =~ /AndyTheCoach/

Meta this with the excellent suggestion from Martin (header
MSGID_BLOCKER Message-ID =~ /AndyNgPC/) to minimize false positive risk.

Best regards,

Alex, from osmosed.
Bow before me, for I am root.


On 24/02/13 07:53 AM, Martin Gregorie wrote:
> On Sun, 2013-02-24 at 19:20 +0800, Nicholas C. wrote:
>> Hi,
>>
>> There are a few senders whose addresses I have already blocked, but I am
>> still getting spammed by them.
>>
>> Example below. Is there a way to block the sender name, AndyTheCoach
>> instead?
>>
> header NAMEBLOCKER From =~ /AndyTheCoach/
>
> or, if it's always from the same PC, this would work regardless of the
> sender name
>
> header MSGID_BLOCKER Message-ID =~ /AndyNgPC/ 
>
> or, if the return path is always the same,
>
> header RETURN_BLOCKER Return-Path =~ /Andyngkf\@singnet\.com\.sg/
>
> or you can use all three and or them together with a meta rule.
>  
>
> Martin
>
>
>





Re: URIDNSBL: how to query certain lists only?

2013-01-07 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 13-01-07 04:18 AM, Fabio Sangiovanni wrote:
> Hi,
>
> thanks to everybody for your answers.
>
> On 04/Jan/2013, at 18:12, Kris Deugau wrote:
>> Mmmm, the problem the OP was asking about is "how do I make sure that
>> only the specific URIBLs I want are active, no matter what may be added
>> upstream?".
>>
>> IIRC this was asked a while ago but I don't recall any answer better
>> than "watch the updates closely and disable any new ones when you see
>> them".  I think the reasoning was that new DNSBLs are not casually added
>> the way new regex or non-DNS rules are, and there's usually some warning on
>> the users and/or dev lists, so you can preemptively add "score NEW_URIBL
>> 0" to your local.cf or local rules channel.
> Yes, that's exactly my problem, and unfortunately this is the only solution I 
> came up with, too.
> The introduction of symbolic name wildcards here would be of great help. Has 
> this ever been considered?
> One could set one line as:
>
> score URIBL_* 0
>
> and add specific scores for desired lists after that.
> This would imply the definition of a standard naming for rules, but as far as 
> I can see that's quite in place already.
>
>
>> If you're redefining the tests anyway (to use local datafeed versions of
>> any given URIBL) I would recommend putting them in a custom local TLD
>> that won't resolve globally, to make sure you really *are* using your
>> local copies.
>>
>> -kgd
> I have a local bind on each MTA, which acts as a cache and forwards queries to 
> another bind on our LAN, which in turn forwards to rbldnsd (updated daily from 
> datafeed services).
> We'll consider the local TLD as a further measure.
>
> One slightly OT question: as far as postfix is concerned, how could it help 
> with checks against URIDNSBLs? I'm not aware of any method to make postfix 
> scan the body of the message and look for URIs. At best, postfix can query 
> DNSBLs using client IPs and envelope sender/recipient domains, but that's out 
> of the scope of my need…am I missing something?

Postfix is not good at inspecting the content of emails, especially on
systems with a lot of traffic (it would mean body_checks and regexes). That
job belongs to content filters.

I use (zen|dbl).spamhaus.org at the pre-data level, which cuts *a lot* of
processing for very few FPs. As you have your own DNS, you could rsync the
Spamhaus zone and use your DNS for queries. It's a lot faster, and your
SA instance will also appreciate it :-)

>
> Thanks to everyone for your help!
>
> Fabio
>





Re: URIDNSBL: how to query certain lists only?

2013-01-04 Thread Alexandre Boyer
Hi there,

Why don't you perform those checks at the pre-data level, within postfix?

It's faster and saves a lot of processing at the content analysis stage.

The way you are doing it is the way I would do it, but someone on the list
might have a better way.

Alex, from N7.

Hello list,

I'm a relatively new user of Spamassassin.
My setup is a postfix + amavisd-new + spamassassin stack, with amavisd-new
acting as before-queue filter. My use case is filtering submissions by
untrusted users (customers of the company I work for); sasl authentication
is mandatory.
I'm trying to set URIDNSBL rules in such a way that only certain dns lists
are queried (Spamhaus DBL and SURBL; we have a datafeed subscription with
them).
What I did was to look at
/var/lib/spamassassin/3.003002/updates_spamassassin_org/25_uribl.cf and set
my local.cf as follows:

[...]
score URIBL_SBL 0
score URIBL_SBL_A 0
score URIBL_DBL_SPAM 0
score URIBL_DBL_REDIR 0
score URIBL_DBL_ERROR 0
score URIBL_SC_SURBL 0
score URIBL_WS_SURBL 0
score URIBL_PH_SURBL 0
score URIBL_AB_SURBL 0
score URIBL_JP_SURBL 0
score URIBL_BLACK 0
score URIBL_GREY 0
score URIBL_RED 0
score URIBL_BLOCKED 0

urirhsbl  URIBL_SURBL     multi.surbl.org.   A
body      URIBL_SURBL     eval:check_uridnsbl('URIBL_SURBL')
describe  URIBL_SURBL     Contains an URL listed in the SURBL blocklist
tflags    URIBL_SURBL     net
reuse     URIBL_SURBL
score     URIBL_SURBL     3

urirhsbl  URIBL_DBL_SPAM  dbl.spamhaus.org.  A
body      URIBL_DBL_SPAM  eval:check_uridnsbl('URIBL_DBL_SPAM')
describe  URIBL_DBL_SPAM  Contains an URL listed in the DBL blocklist
tflags    URIBL_DBL_SPAM  net domains_only
score     URIBL_DBL_SPAM  3
[...]

I *intentionally* want to check aggregate lists instead of single ones and
reassign scores.

Everything works ok, except for the fact that queries are performed to
dob.sibl.support-intelligence.net as well. The matching rule is obviously
URIBL_RHS_DOB in 72_active.cf, and adding "score URIBL_RHS_DOB 0" to
local.cf solved the issue.

So my problem is: if I understand the 72_active.cf rule generation process
correctly, new URIBL_* rules could end up appearing in 72_active.cf at any
time through sa-update.
How can I configure SpamAssassin to permanently use just the URIBL_* rules
I want? Do I have to check 72_active.cf from time to time and see if
something has been added? That would be quite painful!
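
One way to take the pain out of the manual check is to generate the "score ... 0" lines automatically after each update. The Python sketch below is only an illustration of that idea: the helper name, the allowlist and the sample rules text are hypothetical, and the list of directive names is an assumption based on the URIDNSBL plugin that should be checked against your installed version.

```python
# Directive names that define URI DNS blocklist rules. This list is an
# assumption based on the URIDNSBL plugin; adjust it for your version.
URIBL_DIRECTIVES = ("uridnsbl", "uridnssub", "urirhsbl", "urirhssub")

def disable_lines(cf_text, keep):
    """Return 'score NAME 0' lines for every URI blocklist rule not in keep."""
    names = []
    for line in cf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in URIBL_DIRECTIVES:
            if parts[1] not in keep and parts[1] not in names:
                names.append(parts[1])
    return [f"score {name} 0" for name in names]

# Hypothetical excerpt of an updated rules file
sample = """\
urirhssub  URIBL_BLACK    multi.uribl.com.  A  2
urirhsbl   URIBL_RHS_DOB  dob.sibl.support-intelligence.net.  A
urirhsbl   URIBL_SURBL    multi.surbl.org.  A
"""
for out in disable_lines(sample, keep={"URIBL_SURBL"}):
    print(out)
```

The output could be written to a separate .cf file that is loaded after the update channel, followed by a "spamassassin --lint" run to confirm the result still parses.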

Thanks a lot for your support!

Fabio


Re: Scoring Yahoo mail from certain continents/countries ?

2012-12-09 Thread Alexandre Boyer
Hi there Frederic,

I think a geoip module exists. I saw that somewhere. Just take a look
for it.

But I think this is a bad idea. You are right about the analysis, but
geoip filtering is not efficient and may lead to FPs.

Take extra care with the rules you are going to build around it. You may
also take a look at Bayes (train your filter) and AWL.

Of course, it all depends on the size of your system.

Best,

Alex, from osmosed.
Bow before me, for I am root.


On 09/12/12 05:16 AM, Frederic De Mees wrote:
> Dear list,
>
> Here is the context.
> The French-speaking countries receive tons of e-mails, mostly fraud
> attempts, fake lotteries, originating from West-Africa and sent by
> Yahoomail users.
> Often those messages contain big attachments. The payload (text of the
> message) is embedded in a 1MB jpeg with fake certificates of a lawyer,
> a logo, or whatever.
>
> Spamassassin misses 100% of them because:
> - the sender IP (Yahoo) is genuine and has a good reputation
> - the analysis of the message text shows nothing bad, as the millions
> of euros are in the picture attachment
> - due to the message size, the analysis is skipped anyway.
>
> If no customer of the mail server in question expects any mail from any
> Yahoo user in Africa, a simple 'header_checks' Postfix directive like
> this will match such messages if their sender IP starts with 41.
> /^Received: from .41\..*web.*mail.*yahoo\.com via HTTP/i
>
> I admit this is rough albeit effective. On one side, not all Africa is
> 41. On the other side, I do not want to block all 41.
>
> I would have loved to do it with SA.
> This means that the line
> "Received: from [ip.add.res.ss].*web.*mail.*yahoo\.com via HTTP"
> should be detected and analysed.
> The ip address should be extracted.
> The whois of the address should be queried.
> The country code of the IP address would return certain number of SA
> points from a list of "Yahoousers bad countries" I would manage.
>
> Have I dreamed ?
>
> Frédéric
> Brussels
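
The steps Frederic lists (find the Yahoo webmail Received line, extract the IP, map it to a country, return a score) can be sketched in a few lines of Python. This is a rough illustration only, not a SpamAssassin plugin: the capture group is an addition to his header_checks pattern, and the lookup table, the "XX" country code, the score and the sample header are all hypothetical. A real setup would query a GeoIP database instead of a hand-made table.

```python
import ipaddress
import re

# Matches Yahoo webmail Received lines and captures the bracketed client IP.
RECEIVED_RE = re.compile(
    r"^Received: from \[(\d{1,3}(?:\.\d{1,3}){3})\].*web.*mail.*yahoo\.com via HTTP",
    re.IGNORECASE,
)

# Hypothetical lookup table: network -> (country code, score to add).
BAD_NETWORKS = {
    ipaddress.ip_network("41.0.0.0/8"): ("XX", 2.0),
}

def yahoo_geo_score(received_line):
    """Return the score for a Yahoo webmail submission, or 0.0 if none applies."""
    match = RECEIVED_RE.search(received_line)
    if not match:
        return 0.0
    try:
        client_ip = ipaddress.ip_address(match.group(1))
    except ValueError:
        # Octets out of range, e.g. 999.1.1.1
        return 0.0
    for network, (_country, score) in BAD_NETWORKS.items():
        if client_ip in network:
            return score
    return 0.0

# Hypothetical header line built to fit the pattern
line = "Received: from [41.1.2.3] by web1234.mail.example.yahoo.com via HTTP"
print(yahoo_geo_score(line))
```

Returning a small score rather than rejecting outright keeps the check in line with the caution above about geoip-based false positives.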





Re: Gappy subject misses

2012-12-04 Thread Alexandre Boyer
Hi,

I've had fairly good results with this rule:

header  __AJB_OBFU_PR0N_SUBJ  Subject =~ /[\:\;\/\`\(\)\{\}~\#\&\"\%\$\_][a-z0-9][\:\;\`\(\)\/\{\}\_\~\#\&\"\%\$]/im

It's really basic and deserves a rework.
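
To see what the pattern does and does not catch, it can be tried in Python against a couple of the subjects from Tom's list (quoted below). This is a sketch for experimenting, not part of the rule set; the helper name and the two ordinary subjects are made up for contrast.

```python
import re

# The subject pattern from the rule: punctuation, one alphanumeric, punctuation
OBFU = re.compile(
    r'[\:\;\/\`\(\)\{\}~\#\&\"\%\$\_][a-z0-9][\:\;\`\(\)\/\{\}\_\~\#\&\"\%\$]',
    re.IGNORECASE | re.MULTILINE,
)

def is_gappy(subject):
    """True when a single letter or digit is wrapped in the listed punctuation."""
    return OBFU.search(subject) is not None

print(is_gappy("Un{d}r_es ,s -in {g"))      # True, via '{d}'
print(is_gappy('P "O/R N= F "lLM'))         # True, via '"O/'
print(is_gappy("Meeting moved to 10:30"))   # False
print(is_gappy("Please pick option (a)"))   # True: a false positive
```

The last case shows why the rule is best used as a sub-test inside a meta rather than scored on its own.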

Best,

Alex, from prypiat.
Yes, I recycle.


On 12-12-04 06:57 AM, Tom Hendrikx wrote:
> Hi,
>
> I'm currently seeing an increasing number of subjects like the ones
> below that are not being detected by SA. Looking through the existing
> rules (i'm still running v3.3.1) I'm seeing both the GAPPY_SUBJECT and
> the SERGIO_SUBJECT_VIAGRA01 approaches that are interested in this kind
> of stuff.
>
> I tried to adapt GAPPY_SUBJECT but it went over my head unfortunately,
> and ended up writing variants of SERGIO_SUBJECT_VIAGRA01 for several sex
> related strings. But being afraid to end up with (another ever
> expanding) list of phrases in rules: is there a better way to catch
> these? Maybe someone is able to refactor GAPPY_SUBJECT into something
> that hits on the example below too?
>
> Examples:
>
> Subject: S _C H0^0 &L (G. l ^RL S ( P0 |RN_
> Subject: H!AR -D C O !R &E`
> Subject: Un{d}r_es ,s -in {g
> Subject: P-0 :R |N . V I)D .E OS {
> Subject: P "O/R N= F "lLM
> Subject: B &AN +G l_N$G _
> Subject: G.r_ a|n.n|y P `o,r|n.
> Subject: S =E ^X/ V l D|EO (
> Subject: P{O{R N  M;O}V^I(E _S !
> Subject: B l )G ;C O {C K. S !
> Subject: Ba }n .gl&n-g
> Subject: S ;c{h\o "o /l_ g ;i ,rl Por {n ^
>
> --
> Kind regards,
>   Tom





Re: Message not scanned- Size?

2012-12-03 Thread Alexandre Boyer
Hi,

I guess you can raise the size cutoff: the -s flag when calling spamc
seems to be it.

I use amavisd-new to feed SA; it does the same thing, and I had to raise
my threshold too in order to analyze bigger emails.

Best,

Alex, from prypiat.
Yes, I recycle.


On 12-12-03 06:25 AM, Joseph Acquisto wrote:
> A message slipped through untouched.  Obvious spam from "Minister of Finance" 
> with many attachments.
>
> /var/log/mail shows a message skipped "spamc[7262]: skipped message, greater 
> than max message
> size (512000 bytes)" at the time this came thru.
>
> I'd guess that was it.  Unusual, but any way to prevent that in future?
>
> joe a.
>
>





Re: FROM_MISSP_* causing FPs

2012-12-03 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 12-12-03 02:04 AM, John Wilcock wrote:
> Le 30/11/2012 18:18, John Hardin a écrit :
>>>header __AJB_HAS_XEROX    X-Mailer =~ /WorkCentre \d{3,5}/
>>>header __AJB_XEROX_SUBJ   Subject =~ /Scan from a Xerox/
>>
>> Thanks! I will add those to my sandbox.
>>
>> Question: how often do you see that subject _without_ that X-Mailer?
>
> Whenever someone legitimately forwards a scanned document (which is
> quite a common occurrence in offices that have such scanner/copiers).
> Also worth noting that the default subject depends on the copier's
> locale, and can be changed anyway.

Right. But the thing is: spammers won't bother with that; they tend to mimic
the default title to lure unaware or imprudent end users. Hence the
(relative) usefulness of a meta that includes the default title ;-)

To answer John Hardin's question: I will have to query my logs. I don't
have much time right now, but I will answer your question someday :-D

>
> PS: do you want genuine scans from other types of networked copier? I
> can forward a Rex Rotary example offlist if that would be useful.
>
> John.
>





Re: FROM_MISSP_* causing FPs

2012-11-30 Thread Alexandre Boyer
Take care with Xerox versions, it just changed.

I mentioned this in my reply to Kris.

I do not trust PHP Mailers, as PHP is wrong by design.

Alex, from prypiat.
Yes, I recycle.


On 12-11-30 10:17 AM, Kris Deugau wrote:
> John Hardin wrote:
>> On Thu, 29 Nov 2012, Kris Deugau wrote:
>>
>>> I've just had another couple of reports of false positives due to hits
>>> on one or more of the FROM_MISSP_* rules.
>>>
>>> Curious coincidence:  Almost all of the reports to date have involved
>>> webform email for real estate companies.  Most of the rest have involved
>>> scan-to-email multifunction devices - mostly Xerox used by real
>>> estate companies.  O_o
>> Is there any possibility of getting user agent headers for these FPs? If
>> a particular piece of legit software always does this then obviously
>> those rules should ignore such messages.
> The most recent scan-to-email had:
>
> X-Mailer: WorkCentre 7428
>
> and another couple of older ones showed:
>
> X-Mailer: WorkCentre 7435
>
> Walking Xerox's list of scan-to-email-capable devices will probably turn
> up another couple of possible model numbers.
>
> Digging back in the FP archive, there are a handful of webform messages
> with "X-Mailer: PHP4", and none of the rest have even that much.
>
> None of the ones I've had reported are from desktop or mobile MUAs.
>
> Two are from airlines - one an ezine from Air Canada, the other a
> receipt/itinerary from Air Creebec.
>
> -kgd





Re: FROM_MISSP_* causing FPs

2012-11-30 Thread Alexandre Boyer
Hi Kevin,

You are right, and I know what you mean by "a LOT"; I see them too :-)

But very few of them fake the X-Mailer header. I can't recall
seeing one, in fact.

Note: I corrected my __AJB_HAS_XEROX this very morning to:

header   __AJB_MAILER_XEROX   X-Mailer =~ /^WorkCentre .{3,6}/

I noticed false positives because of the declared Mailer version
(instead of "WorkCentre 1234", it's now "WorkCentre /4.03"). I really
like version numbers that stay consistent over time. This shows how much
thought developers gave to things in the very first place.
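
The effect of the change can be checked by running both patterns over the X-Mailer values quoted in this thread. A minimal Python sketch, using only the two regexes and the three header values that appear in these messages:

```python
import re

OLD = re.compile(r"WorkCentre \d{3,5}")   # original __AJB_HAS_XEROX
NEW = re.compile(r"^WorkCentre .{3,6}")   # corrected pattern

# X-Mailer values quoted in this thread
VALUES = ("WorkCentre 7428", "WorkCentre 7435", "WorkCentre /4.03")

for value in VALUES:
    print(value, bool(OLD.search(value)), bool(NEW.search(value)))
```

The old pattern misses the "/4.03" form because it requires digits right after the space; the new one accepts any three or more characters there, which makes it a much looser test.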

Also, some Xerox machines do add some interesting headers:

X-Xerox-Source-IP: 192.168.2.130
X-Xerox-Source-Name: redac...@example.com
X-Xerox-DeviceType: Phaser 3635MFP
X-Xerox-DeviceName: XRXAADEF46B
X-Xerox-Mail-Id: 1100856957-758036596-000571194682402-758036596-535680529

I'm building rules with those, as I never saw such faked headers in
spams spoofing the Subject: Scan from a Xerox, but in the case of
forwarded scans, I keep my meta with Thread related rules.

Regards,

Alex, from prypiat.
Yes, I recycle.


On 12-11-30 09:54 AM, Kevin A. McGrail wrote:
> On 11/30/2012 8:15 AM, Alexandre Boyer wrote:
>> As a Mailer agent, I also spotted the Xerox WorkCentre to have a
>> dirty behavior.
>>
>> As I had the very same problem as Kris, I personally did not disable
>> those rules but built some metas based on X-Mailer and Subject tests:
>>
>> header __AJB_HAS_XEROX    X-Mailer =~ /WorkCentre \d{3,5}/
>> header __AJB_XEROX_SUBJ   Subject =~ /Scan from a Xerox/
>>
>> I meta those sub-tests with FROM_MISSP_* and I compensate for the
>> scores. As I use some KHOP rules, I also meta this with KHOP_THREADED
>> as well as with some Thread related rules to avoid blocking forwarded
>> scans.
>>
>> I did not do deep research; I could probably customize
>> __AJB_HAS_XEROX to match specific versions of this "broken" agent,
>> but this works well as it is. As they say: "first make it work, then
>> make it better." But when it works, I usually have something else to
>> do than make it better.
>>
>> Works pretty well indeed.
> Adding to the mix, I see a LOT of phishing attempts with Scan from XYZ...
>
> Regards,
> KAM




Re: FROM_MISSP_* causing FPs

2012-11-30 Thread Alexandre Boyer
As a Mailer agent, I also spotted the Xerox WorkCentre to have a dirty
behavior.

As I had the very same problem as Kris, I personally did not disable
those rules but built some metas based on X-Mailer and Subject tests:

header __AJB_HAS_XEROX    X-Mailer =~ /WorkCentre \d{3,5}/
header __AJB_XEROX_SUBJ   Subject =~ /Scan from a Xerox/

I meta those sub-tests with FROM_MISSP_* and I compensate for the
scores. As I use some KHOP rules, I also meta this with KHOP_THREADED as
well as with some Thread related rules to avoid blocking forwarded scans.

I did not do deep research; I could probably customize
__AJB_HAS_XEROX to match specific versions of this "broken" agent, but
this works well as it is. As they say: "first make it work, then make it
better." But when it works, I usually have something else to do than
make it better.

Works pretty well indeed.

Alex, from prypiat.
Yes, I recycle.


On 12-11-29 08:35 PM, Michael Orlitzky wrote:
> On 11/29/2012 05:43 PM, John Hardin wrote:
>> On Thu, 29 Nov 2012, Kris Deugau wrote:
>>
>>> I've just had another couple of reports of false positives due to hits
>>> on one or more of the FROM_MISSP_* rules.
>>>
>>> Curious coincidence:  Almost all of the reports to date have involved
>>> webform email for real estate companies.  Most of the rest have involved
>>> scan-to-email multifunction devices - mostly Xerox used by real
>>> estate companies.  O_o
>> Is there any possibility of getting user agent headers for these FPs? If a 
>> particular piece of legit software always does this then obviously those 
>> rules should ignore such messages.
>>
> I had one guy actually read the rejection message and contact
> postmaster@ about this.
>
> His sig shows:
>
>   Sent from my MOTOROLA ATRIX™ 2 on AT&T
>
> And the headers:
>
>   X-Spam-Flag: NO
>   X-Spam-Score: 4.224
>   X-Spam-Level: 
>   X-Spam-Status: No, score=4.224 required=5 tests=[FREEMAIL_FROM=0.001,
>   FROM_MISSP_EH_MATCH=2.499, FROM_MISSP_FREEMAIL=1.723,
>   HTML_MESSAGE=0.001] autolearn=disabled
>   From: "u...@example.com"
>   X-Mailer: Motorola android mail 1.0
>
> It was relayed through AOL, who you'd think would clean that up. This
> particular model also base64 encodes the entire message...




Re: Claims manager / LOTTO_AGENT

2012-11-08 Thread Alexandre Boyer
Hello there,

Well, if you feel uncomfortable with running mass-check and sending data
(not the emails themselves, just the rules they hit, as Darxus is
pointing out), you may want to override the score for those rules in
your local.cf.

You may even write your own rules to compensate for those false positives.

If you can't contribute to SA by giving feedback via the mass-check,
then do what you need to do on your side. Everybody here will be glad to
help ;)

Alex, from prypiat.
Yes, I recycle.


On 12-11-07 11:02 PM, Michael Orlitzky wrote:
> On 11/07/2012 10:36 PM, dar...@chaosreigns.com wrote:
>> On 11/07, Michael Orlitzky wrote:
>>> Sorry, I was a little rude. But saying that she shouldn't put her job
>>> title anywhere in an email, ever, is ridiculous. 
>> Certainly.
>>
>>> The inputs (spam, ham)
>>> to the classifier are assumed god-given; and the classification needs to
>>> reflect the data, not the other way around.
>> If "the classifier" is spamassassin, and "The inputs" are the spam
>> and ham data provided via masscheck, then... the scores provided via
>> sa-update *do* reflect the data.  So I'm not sure what you mean.
>>
>> The ideal rule scores are chosen to cause one false positive (ham flagged
>> as spam) in every 2,500 hams, while maximizing the number of spams
>> correctly flagged as spams.  With so few hams hitting this rule in the
>> masscheck corpora, we're way below that threshold based on the data we
>> have.
>>
> I wrote that before I saw your clarification, sorry again for coming off
> as a jerk. Ignore it.
>
>
>>> This is my fault, of course, but I'm not allowed to mass-check this
>>> stuff. It's ongoing legal correspondence.
>> Er, what?  You're not allowed to provide a list of which rules hit each
>> of your emails?  Or you're not allowed to run a program on your emails
>> that isn't spamassassin?  Or did I just not put "This does not require
>> sending us your email" in bold enough times on the masscheck page?
>>
> This is a client of ours (a law firm) and not the company that I work
> for. *I* know there's probably nothing sensitive in there, but just to
> cover my ass I'd need to get permission to send the results off-site.
> From their perspective, it's just simpler to say no: it's not worth the
> time or effort to even think about if there's a minute chance of it
> coming back to bite them legally.





Re: HK_LOTTO hitting ham from the UK national lottery

2012-10-31 Thread Alexandre Boyer
Hello,

Well, as far as I know, if your SA instance restarts after sa-update, it
should find the most recent, up-to-date ruleset.

Did you restart your instance? If you use amavis, restart it as well.

You may want to remove the old (theoretically unused) rulesets in
/var/lib/spamassassin in order to keep only the most up-to-date one.

If this does not work, review your configuration. I don't know, maybe
/var/lib/spamassassin/3.003001 is hardcoded somewhere?
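
In case the directory names look cryptic: as far as I know, the
sub-directories under /var/lib/spamassassin are named after the SA
version in Perl's numeric form, and a given SA only reads the one that
matches its own version. A small Python sketch of the decoding (the
function name is mine, nothing shipped with SA):

```python
def decode_sa_version(dirname: str) -> str:
    """Turn a Perl-style numeric version such as '3.003001' into '3.3.1'.

    Assumes the usual layout: major, then three digits of minor,
    then three digits of patch level.
    """
    major, rest = dirname.split(".", 1)
    rest = rest.ljust(6, "0")          # tolerate a shortened fraction
    minor, patch = int(rest[:3]), int(rest[3:6])
    return f"{int(major)}.{minor}.{patch}"


if __name__ == "__main__":
    for name in ("3.003001", "3.003002"):
        print(name, "->", decode_sa_version(name))
```

So a 3.003001 directory belongs to a 3.3.1 install and would simply be
ignored by a 3.3.2 one.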

Alex, from prypiat.
Yes, I recycle.


On 12-10-31 03:16 AM, Niamh Holding wrote:
> Hello Niamh,
>
> Tuesday, October 30, 2012, 7:18:23 PM, you wrote:
>
> NH> However it seems spamassassin is using this rule from the older
> NH> /var/lib/spamassassin/3.003001
>
> No, it's there in 3.32 as well... I was grepping in the wrong place!
>





Re: HK_LOTTO hitting ham from the UK national lottery

2012-10-30 Thread Alexandre Boyer
This tends to prove that you do not run sa-update on your installation.

$ grep -r HK_LOTTO /usr/share/spamassassin/
/usr/share/spamassassin/50_scores.cf:score HK_LOTTO 3.599 2.755 2.993 3.599

You may either use sa-update (the score is lowered to 1) or override the
score in your personal ruleset.
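
About the four numbers on that score line: per the
Mail::SpamAssassin::Conf documentation they are the four score sets
(0: no Bayes, no network tests; 1: network tests only; 2: Bayes only;
3: both). A rough Python sketch of which value applies; the helper name
is made up for illustration:

```python
def pick_score(score_line: str, use_bayes: bool, use_net: bool) -> float:
    """Return the score that applies for a given Bayes/network setup.

    A line with a single value uses it for every score set; a line with
    four values is indexed by score set 0..3.
    """
    fields = score_line.split()
    if fields[0] != "score" or len(fields) not in (3, 6):
        raise ValueError(f"not a usable score line: {score_line!r}")
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        return values[0]
    score_set = (2 if use_bayes else 0) + (1 if use_net else 0)
    return values[score_set]


if __name__ == "__main__":
    line = "score HK_LOTTO 3.599 2.755 2.993 3.599"
    for bayes in (False, True):
        for net in (False, True):
            print(bayes, net, pick_score(line, bayes, net))
```

The "3.6 HK_LOTTO" in your report is the 3.599 of score set 0 or 3,
rounded for display.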


Alex, from prypiat.
Yes, I recycle.


On 12-10-30 02:46 PM, Niamh Holding wrote:
> Hello John,
>
> Tuesday, October 30, 2012, 6:37:11 PM, you wrote:
>
> JH> score HK_LOTTO_NAME   0.998 0.998 0.998 0.998
>
> That is not the test I named-
>
> *  3.6 HK_LOTTO HK_LOTTO
>





Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-28 Thread Alexandre Boyer
Alex, from Nexus7.
Boyaah!
On 28 Oct 2012 14:16, "John Hardin" wrote:
>
> On Sun, 28 Oct 2012, Alexandre Boyer wrote:
>
>> On 26 Oct 2012 11:06, "Axb" wrote:
>>>
>>>
>>> That is all done on SA servers - all you need to do is upload your
>>> masscheck logs.
>>
>> I understood that. I however need to rescore my ruleset because the
>> setup I inherited was 1) not updated with sa-update and 2) manually
>> maintained (with, for example, lots of personal rules that essentially
>> do the same as the SA rules added over time).
>
>
> Apologies if I am misinterpreting what you are saying here, but I thought
> this should be made clear:
>
> Please DO NOT run a masscheck using any local rules.
>

Of course, this must be advertised as often as possible. I'm personally
using mass-check both on a standard SA install (updated via sa-update) and
on my own ruleset, in order to use hit-frequencies and fp-fn-statistics.

I was aware of which results were relevant to the SA project, but it's
very good to spell things out :-)

> Contributing to masscheck does not involve any configuration changes to
> your local production SA install. You need to be running masscheck using
> pristine sources updated from SVN, with no local extras added in.
> Masschecks should not be run using your production SA installation.
>
> The only locally-sourced things you provide to your masscheck environment
> are (1) your trusted networks setup so that masscheck knows the topology of
> the mail system the corpora are from, and (2) your hand-vetted ham and spam
> corpora.
>
> --
>  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
>  jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
>  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> ---
>   ...the Fates notice those who buy chainsaws...
>   -- www.darwinawards.com
> ---
>  3 days until Halloween

I won't be able to upload a ham corpus, as those mails contain private data.
I could contribute to your personal effort in two ways: helping you to get
better rules for those languages (French being my mother tongue, I could
help you with Latin languages) and providing you with samples.


Re: Masscheck Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-28 Thread Alexandre Boyer

Alex, from osmose.
Bow before me, for I am root.

On 12-10-26 12:18 PM, dar...@chaosreigns.com wrote:
> On 10/26, Alexandre Boyer wrote:
>> Well, discouraged was implicit (as is the fact that every admin is
> I don't think there's anything implicit about it being discouraged to use a
> threshold below 5.  There are lots of local changes which are far less
> likely to cause problems, and encouraged.
>
>> The SA rules scores are computed based on the mass-checks, from the
>> project and, to some extent, from contributors. A good question is: how
>> many contributors really give a feedback on the mass-checks?
> This is public information, although not very explicit.
> On http://ruleqa.spamassassin.org/ look in the green box, it lists all the
> corpora included:
>
>   axb-coi-bulk
>   axb-fraud
>   axb-generic
>   axb-ham-misc
>   axb-sa-users
>   axb-woas
>   bb-guenther_fraud
>   bb-jhardin
>   bb-jhardin_fraud
>   bb-jm
>   bb-kmcgrail
>   bb-zmi
>   bpoliakoff
>   danmcdonald
>   darxus
>   grenier
>   jarif
>   kpg-gah
>   mas
>   zmi
>
> The ones starting with "bb-" are uploaded emails, instead of running
> masscheck locally, it's run centrally.  Other than that, the prefixes are
> each different contributors.  So:
>
> axb, guenther, jhardin, jm, kmcgrail, zmi, bpoliakoff, danmcdonald,
> darxus, grenier, kpg-gah, kpg, mas, zmi.
>
> 14 masscheck contributors.  We'd probably benefit a lot by significantly
> increasing that, which is why I mention it somewhat often.
>
>> This is something I do not know, but the fewer they are, the greater the
>> bias is. Bias in spam and ham samples. Emails reaching my servers are
>> different from yours and from each and every SA users.
> Absolutely.
>
>> Unless everybody on earth run a nightly mass-check and report results to
>> SA project for it to compute a "world wide" scoring, there is a bias. At
>> least this is my understanding, may be I'm wrong, please correct me if so.
> No, you're totally right.  We do what we can with what we have, and I think
> we do pretty darn good.  But we could do better with more data.  
>
>> For example, I'm in the process of learning to use mass-check to
>> contribute back to SA (which implies a lot of hard work, simply to build
>> and maintain valid ham/spam corpora, use mass-check, then hit-freq, then
>> fp-fn-stat, I'm not even close to understand how to compute a re-score.
> I don't know what fp-fn-stat is.  You don't need to compute a re-score -
> that's part of what is done with your masscheck data after you upload it.

I replied to Axb about this; it's my problem, nothing to do with SA
itself ;)

fp-fn-statistics is a script in sa-trunk/masses that tells you how
many FPs and FNs you get, given your mass-check data and
hit-frequencies. You can then choose which score set and which
threshold to use.

Very useful indeed, especially for those who have tons of personal rules.
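
To show the kind of answer I get out of it, here is a simplified Python
illustration. This is not the real script (which works from the
mass-check logs and a score set); the function and the scores are made
up, and I assume a message is flagged when its score reaches the
threshold:

```python
def fp_fn_counts(ham_scores, spam_scores, threshold=5.0):
    """Count false positives and false negatives at a given threshold.

    ham_scores / spam_scores are the total scores of hand-vetted ham and
    spam messages; a message is flagged when its score >= threshold.
    """
    false_pos = sum(1 for s in ham_scores if s >= threshold)
    false_neg = sum(1 for s in spam_scores if s < threshold)
    return false_pos, false_neg


if __name__ == "__main__":
    # Made-up scores, only to show the effect of the threshold.
    ham = [0.1, 1.2, 2.3, 4.2, 4.6]
    spam = [3.9, 4.4, 5.7, 8.0, 12.3]
    for threshold in (4.0, 5.0):
        fp, fn = fp_fn_counts(ham, spam, threshold)
        print(f"threshold {threshold}: {fp} FP, {fn} FN")
```

With these toy numbers a threshold of 4 trades one FN less for two FPs
more, which is exactly the trade-off I need to measure on my own
ruleset.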

>
> There's a relatively recently created mailing list specifically for helping
> people with this stuff, to which I believe you automatically get subscribed
> when you get a masscheck account:
> http://wiki.apache.org/spamassassin/MailingLists#RuleQA

I will subscribe sooner or later. It all depends on my other problems,
you know, keeping dozens of servers in operation. Not a big deal, but
time consuming.

>
> If you're having difficulty with it, the docs probably need improvement, so
> do let us know.

Up to the mass-check, I've got nothing to complain about. The doc is
pretty clear (though an introduction to SA in general and mass-check in
particular would not hurt). On the other hand, doc is missing for the next
steps (when one wants or needs to use the other scripts in sa-trunk/masses),
forcing you to read the code and guess each script's purpose and
the order in which they should be used.

>
>
> Your mention of fp-fn-stat makes me think you may have veered a little too
> far from https://wiki.apache.org/spamassassin/NightlyMassCheck

I will certainly stick to this for my (later) contribution to SA.

>
>> with this, I'm not sure my contribution would be sufficient to make SA
>> scores to be closer to my email traffic reality.
> I think it would.  For example, I'm sure, from what you've posted, that you
> have enough examples of hams that hit DEAR_SOMETHING that the score of it
> would drop significantly.
>
>> Do you have any stat about how many contributors are giving a feedback
>> on the masscheck? and about their geographical location? I'm just asking
>> because I was not able to find this kind of information anywhere.

Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-28 Thread Alexandre Boyer
Alex, from Nexus7.
Boyaah!
On 26 Oct 2012 11:06, "Axb" wrote:
>
> On 10/26/2012 04:47 PM, Alexandre Boyer wrote:
>>
>> For example, I'm in the process of learning to use mass-check to
>> contribute back to SA (which implies a lot of hard work, simply to build
>> and maintain valid ham/spam corpora, use mass-check, then hit-freq, then
>> fp-fn-stat, I'm not even close to understand how to compute a re-score.
>
>
>  You don't need to compute the rescore, etc, etc
> That is all done on SA servers - all you need to do is upload your
> masscheck logs.
>

I understood that. I however need to rescore my ruleset because the setup I
inherited was 1) not updated with sa-update and 2) manually maintained
(with, for example, lots of personal rules that essentially do the same as
the SA rules added over time).

As a brutal reset is out of the question, I need to do things step by step,
rescoring being one of them, before I get my threshold back to 5 and
sa-update enabled.

All this being my own private problem, nothing to do with our off topic
exchange :-)

>
>>
>> Do you have any stat about how many contributors are giving a feedback
>> on the masscheck? and about their geographical location? I'm just asking
>> because I was not able to find this kind of information anywhere.
>
>
>
> http://ruleqa.spamassassin.org/
>
> the blocks on the right side indicate the different corpus (and user
> handles they belong to)

Around 10 corpora. Are those corpora used to run the SA mass-check on SA
servers, or does it also include what I will send one day (my mc logs)?

Is there any way to get a geographical mapping? Or do you think this is
not relevant?

If so, this is a really small and biased corpus (in terms of statistical
analysis). I just can't wait to see my results with my French, German,
Russian and Chinese ham and spam messages (for ex. with all those FPs with
all the FUZZY_* rules, all being rescored and/or rewritten in my ruleset
:-)  )


Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-26 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 12-10-25 03:04 PM, dar...@chaosreigns.com wrote:
> On 10/25, Bowie Bailey wrote:
>> On 10/25/2012 10:47 AM, Simon Loewenthal wrote:
>>> *  2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
>>>
>>> Does anyone know the rationale behind this, or is our user base simply 
>>> communicating on a higher level?  :)  I imagine the rationale is sound, but 
>>> I do not know what it is.
>> The rationale is simple.  The masscheck finds that this rule hits
>> more spam than ham, so it gets a higher score.
> It's slightly more complicated than that.  It's that this score results in
> the maximum spams flagged as spam without exceeding 1 false positive in
> 2,500 non-spams.
>
> A fun example is SUBJ_YOUR_DEBT, which was getting a score of 3.0 while
> hitting more non-spam than spam.  I guess it got disabled somehow.
>
>
> But more importantly, it's because we do not have the rule
> hit statistics from your email to include them in optimal score
> generation because you're not submitting those stats via masscheck:
> https://wiki.apache.org/spamassassin/NightlyMassCheck
>
>
> RuleQA results for that rule are here:
> ruleqa.spamassassin.org/?daterev=20121020&rule=DEAR_SOMETHING
>
>   MSECS   SPAM%     HAM%     S/O    RANK   SCORE  NAME            WHO/AGE
>   0       0.6160    0.2324   0.726  0.63   2.00   DEAR_SOMETHING
>
> It hits 0.6% of spam, and 0.2% of non-spam (ham).
>
>
> On 10/25, Alexandre Boyer wrote:
>> Simon, I had some FPs because of this rule and because my threshold is
>> lower than 5.
> If you could just append "and I know this is highly discouraged"
> any time you say that, you might reduce my need to point it out to
> avoid you causing other people to think that might be a good idea.
> Scores are generated with a threshold of 5. It's often recommended to
> use a threshold above 5 for an extra safety measure.  Do you even have a
> guess what rate of false positives you're causing with a lower threshold?
> I don't.
>
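
A side note on the RuleQA figures quoted above: as far as I understand
hit-frequencies, the S/O value is the spam hit rate divided by the sum
of the spam and ham hit rates. A quick Python check with the SPAM% and
HAM% figures from that line (variable names are mine):

```python
spam_pct = 0.6160   # SPAM% for DEAR_SOMETHING
ham_pct = 0.2324    # HAM% for DEAR_SOMETHING

# S/O: share of the rule's (rate-normalised) hits that are spam.
s_o = spam_pct / (spam_pct + ham_pct)
print(f"S/O = {s_o:.3f}")
```

That gives the 0.726 reported by RuleQA. A rule close to 0.5 says
almost nothing, a rule close to 1.0 hits nearly only spam.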

Well, discouraged was implicit (as is the fact that every admin is
responsible for their own config) but I will remember to state this
disclaimer should I mention this point again.

I know that my threshold is not good; I'm not satisfied with it but I
inherited this config from previous admins and all my maps (with
personal rules and score overrides) are "computed" with a threshold of 4.

It's not what I want, but I have to make do with it. Maybe one day
I will get SA to work as it should. To answer your question, I don't
have so many FPs because SA is not the only engine used on my systems;
I have a bunch of other filters running. My catch rate is around 99.8%
and my FP rate is between 0.0001% and 0.03% (this is computed by an
independent source, but I have approximately the same internal stats on
my ham/spam feeds) on approximately 1.5 - 2 million messages a day.
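
For comparison with the "1 false positive in 2,500 non-spams" goal you
mentioned, a back-of-the-envelope check in Python. Take it with a grain
of salt: that goal is for SA alone at threshold 5, while my figures come
from the whole filter chain, so the two are not strictly comparable:

```python
target_fp_rate = 1 / 2500              # scoring goal: 1 FP per 2,500 hams
print(f"target: {target_fp_rate:.2%}")

my_fp_rates_pct = (0.0001, 0.03)       # the range I quoted, in percent
for pct in my_fp_rates_pct:
    print(f"{pct}% below target: {pct / 100 < target_fp_rate}")
```

Both ends of my range sit under that 0.04% target, the upper one not by
much.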

Going a little bit more off topic now:

The SA rule scores are computed based on the mass-checks, from the
project and, to some extent, from contributors. A good question is: how
many contributors really give feedback on the mass-checks?

This is something I do not know, but the fewer they are, the greater the
bias is. Bias in spam and ham samples. Emails reaching my servers are
different from yours and from those of each and every SA user.

Unless everybody on earth runs a nightly mass-check and reports results to
the SA project for it to compute a "world wide" scoring, there is a bias.
At least this is my understanding; maybe I'm wrong, please correct me if so.

For example, I'm in the process of learning to use mass-check to
contribute back to SA. This implies a lot of hard work simply to build
and maintain valid ham/spam corpora, use mass-check, then hit-freq, then
fp-fn-stat; I'm not even close to understanding how to compute a re-score,
and the doc is, when available, not really clear about this. But even
with this, I'm not sure my contribution would be sufficient to bring SA
scores closer to my email traffic reality.

Do you have any stats about how many contributors are giving feedback
on the masscheck, and about their geographical location? I'm just asking
because I was not able to find this kind of information anywhere.

>> I just added a score override to lower it but this rule still hits a lot
>> of spam (419 scams essentially).
> Yup, nothing wrong with customizing your rules to suit the email you get
> better.  At least in the direction of reducing false positives.  
>


Re: Question about rule: 2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'

2012-10-25 Thread Alexandre Boyer
Hi all,

Simon, I had some FPs because of this rule and because my threshold is
lower than 5.

I just added a score override to lower it, but this rule still hits a lot
of spam (419 scams essentially).

You may want to fine tune the score according to your specific FPs.

Regards,

Alex, from prypiat.
Yes, I recycle.


On 12-10-25 10:57 AM, Bowie Bailey wrote:
> On 10/25/2012 10:47 AM, Simon Loewenthal wrote:
>> Evening all,
>>
>> A great majority of our ham starts with Dear Sir/ Dear Madam / Dear Bob.
>>
>> Therefore I've always wondered why this this is scored so highly:
>>
>> *  2.0 DEAR_SOMETHING BODY: Contains 'Dear (something)'
>>
>>
>> Does anyone know the rationale behind this, or is our user base simply
>> communicating on a higher level?  :)  I imagine the rationale is
>> sound, but I do not know what it is.
>
> The rationale is simple.  The masscheck finds that this rule hits more
> spam than ham, so it gets a higher score.
>


Re: spamd not staying up

2012-10-20 Thread Alexandre Boyer
Hi there,

This suggestion should be considered a last resort.

I used monit in the past and saw very nasty behavior, such as multiple
instances of the same process running. Maybe monit is better now.

Debugging using your logs and knowledge is the first thing you should do.
Try to find what your real problem is and where it comes from, instead
of patching without knowing.

The point from Duane is interesting: did you check that? Also check your
messages log (there should be an equivalent to dmesg under FreeBSD) to
find kernel warnings, hardware problems, Perl troubles or such things.
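
As a starting point for the log digging, here is a small Python sketch
that pulls the spamd shutdown lines out of a mail log. The sample lines
only illustrate the usual syslog layout; the log path and the exact
wording on your FreeBSD box may differ, and the function name is mine:

```python
import re

# Matches syslog-style spamd lines, e.g. "... mail spamd[25346]: <message>"
SPAMD_LINE = re.compile(r"spamd\[(\d+)\]: (.*)")


def spamd_shutdowns(lines):
    """Yield (pid, message) for spamd lines that mention a kill or shutdown."""
    for line in lines:
        m = SPAMD_LINE.search(line)
        if m and re.search(r"killed|shutting down", m.group(2)):
            yield int(m.group(1)), m.group(2).strip()


if __name__ == "__main__":
    sample = [
        "Oct  9 08:18:25 mail spamd[25346]: spamd: server killed by SIGTERM, shutting down",
        "Oct  9 08:18:31 mail spamd[28878]: spamd: server started on port 783/tcp",
    ]
    for pid, msg in spamd_shutdowns(sample):
        print(pid, msg)
```

If every exit shows a SIGTERM, something on the box is killing spamd on
purpose (a cron job, log rotation, a package script) and that is where
I would look next.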

Regards

Alex, from osmose.
Bow before me, for I am root.

On 12-10-20 02:33 AM, Robert Schetterer wrote:
> Am 19.10.2012 22:21, schrieb Ted Mittelstaedt:
>> Hi All,
>>
>>   Last month I put in a new mailserver, here are the specs:
>>
>> FreeBSD 8.3 amd64bit
>> 8GB ram
>> 2TB mirrored disk space
>> dual Xeon E5310s
>> Intel motherboard
>>
>> top output:
>>
>> last pid: 82946;  load averages:  0.71,  0.74,  0.65up 5+06:39:07
>> 13:12:33
>> 94 processes:  1 running, 93 sleeping
>> CPU:  0.3% user,  0.0% nice,  0.5% system,  0.0% interrupt, 99.2% idle
>> Mem: 395M Active, 6058M Inact, 1061M Wired, 376M Cache, 827M Buf, 25M Free
>> Swap: 4096M Total, 548K Used, 4095M Free
>>
>> The problem is that for seemingly no reason every once in a while
>> spamd will exit.  It seems to be graceful exit since it deletes it's
>> pid file - whereas if I do a kill -9 it will not delete the pid file.
>>
>> I ended up writing a script that checks once an hour for the existence
>> of spamd in the process table and if it's not there it restarts it.
>>
>> Over the last month it did it on
>> 10/3
>> 10/4
>> 10/5
>> 10/8
>> 10/11
>> 10/15
>> 10/17
>> 10/18 twice
>>
>> there is no defined time it does it.  Sometimes in the afternoon,
>> sometimes in the morning.  This is under Perl 5.14.2.  The core ram
>> consumed by the process does not appear to be increasing over time
>> that it is running.
>>
>> Any suggestions?
>>
>> Ted
>>
> as workaround you can use monit
> for monitor and restart, also useful for other services
>
> you might have to fit example to your needs and distro
>
> i.e
>
> http://mmonit.com/wiki/Monit/ConfigurationExamples#spamd
>
> check process spamd with pidfile /var/run/spamd.pid
>    group mail
>    start program = "/etc/init.d/spamd start"
>    stop  program = "/etc/init.d/spamd stop"
>    if 5 restarts within 5 cycles then timeout
>    if cpu usage > 99% for 5 cycles then alert
>    if mem usage > 99% for 5 cycles then alert
>    depends on spamd_bin
>    depends on spamd_rc
>
>  check file spamd_bin with path /usr/local/bin/spamd
>    group mail
>    if failed checksum then unmonitor
>    if failed permission 755 then unmonitor
>    if failed uid root then unmonitor
>    if failed gid root then unmonitor
>
>  check file spamd_rc with path /etc/init.d/spamd
>    group mail
>    if failed checksum then unmonitor
>    if failed permission 755 then unmonitor
>    if failed uid root then unmonitor
>    if failed gid root then unmonitor
>
>






Re: I thought this message was rather spammy

2012-10-17 Thread Alexandre Boyer

On 12-10-17 02:32 PM, Ned Slider wrote:
> On 17/10/12 18:51, Alexandre Boyer wrote:
>> Right, but you have the content on the other link:
>>
>> http://igor.chudov.com/tmp/spam013.trace.txt
>>
>>
>> It scores 5.7 and should be blocked.
>>
>
> The message scored 2.3 when it was originally received.
>
> It only scored 5.7 when it was later reevaluated by SA at which point
> a URI is now hitting 2 URIBLs and thus increasing the score. These
> URIs have obviously just been added to the blacklists since this spam
> run started.
>
> Greylisting can often help here as even though it won't block the
> message it can delay the message long enough for offending IPs or URIs
> to get added to blacklists.
>

Totally agree with this analysis. :-)

Greylisting is still a powerful anti-spam tool and easy to set-up.
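
For those who have never looked at how it works, here is a toy Python
sketch of the idea. It is a generic illustration, not the code of any
particular greylisting daemon; the class name and the 300 second delay
are my own assumptions:

```python
import time


class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay      # seconds a new triplet must wait
        self.first_seen = {}

    def check(self, ip, sender, rcpt, now=None):
        now = time.time() if now is None else now
        key = (ip, sender, rcpt)
        first = self.first_seen.setdefault(key, now)
        # Unknown or too-early triplets get a temporary failure (4xx);
        # a legitimate MTA retries later and is then accepted.
        return "accept" if now - first >= self.min_delay else "defer"


if __name__ == "__main__":
    gl = Greylist(min_delay=300)
    triplet = ("192.0.2.1", "a@example.org", "b@example.net")
    for t in (0, 60, 600):
        print(t, gl.check(*triplet, now=t))
```

The first two attempts are deferred and the retry after ten minutes is
accepted. That delay is what gives the blacklists time to catch up, as
Ned describes.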

Alex, from prypiat.
Yes, I recycle.




Re: I thought this message was rather spammy

2012-10-17 Thread Alexandre Boyer
Right, but you have the content on the other link:

http://igor.chudov.com/tmp/spam013.trace.txt


It scores 5.7 and should be blocked.

Igor, what's the threshold of your SA installation?

Alex, from prypiat.
Yes, I recycle.


On 12-10-17 01:44 PM, John Hardin wrote:
> On Wed, 17 Oct 2012, Igor Chudov wrote:
>
>> Here's the spam message: http://igor.chudov.com/tmp/spam013.txt
>
> No permissions to view that.
>


Re: Can't locate Bignum.pm

2012-10-09 Thread Alexandre Boyer
Hi there,

If you're asking a question, I guess you wonder why you are seeing this
in your logs.

The answer is simple: your system lacks a Perl module.

Install it with your distribution package manager or directly via the CPAN.
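
If you are not sure which module to ask for, the name is in the error
itself: the path after "Can't locate" maps to the module name by
swapping "/" for "::". A small Python sketch (the helper name is mine,
and the sample line is shortened from your log):

```python
import re


def missing_perl_module(log_line: str):
    """Extract the module name from a Perl "Can't locate X/Y.pm in @INC" error."""
    m = re.search(r"Can't locate (\S+)\.pm in @INC", log_line)
    return m.group(1).replace("/", "::") if m else None


if __name__ == "__main__":
    line = "spamd[28878]: Can't locate Crypt/OpenSSL/Bignum.pm in @INC (@INC contains: ...)"
    print(missing_perl_module(line))
```

Here that gives Crypt::OpenSSL::Bignum, which is the name to give to
CPAN. The package name in your distribution will depend on its own
naming scheme.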

If you are not asking any question, then ignore this answer and try to
clarify your point ;-)

Best regards,

Alex, from prypiat.
Yes, I recycle.


On 12-10-09 09:52 AM, Niamh Holding wrote:
> Hello
>
> maillog is showing-
>
> Oct  9 08:18:25 mail spamd[25346]: spamd: server killed by SIGTERM, shutting 
> down 
> Oct  9 08:18:25 mail spamd[28876]: logger: removing stderr method 
> Oct  9 08:18:26 mail spamd[28878]: Can't locate Crypt/OpenSSL/Bignum.pm in 
> @INC (@INC contains: /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi 
> /usr/lib/perl5/site_perl/5.8.6 
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi 
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi 
> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi 
> /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 
> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl 
> /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi 
> /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi 
> /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi 
> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi 
> /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl/5.8.5 
> /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 
> /usr/lib/perl5/vendor_perl /usr/lib/perl5/5.8.6/i386-linux-thread-multi 
> /usr/lib/perl5/5.8.6) at 
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi/Crypt/OpenSSL/RSA.pm 
> line 17. 
> Oct  9 08:18:31 mail spamd[28878]: spamd: server started on port 783/tcp 
> (running version 3.3.2) 
> Oct  9 08:18:31 mail spamd[28878]: spamd: server pid: 28878
>





Re: Pyzor?

2012-10-06 Thread Alexandre Boyer
Alex, from Nexus7.
Boyaah!
On 6 Oct 2012 06:37, "Arthur Dent" wrote:
>
> On Sat, 2012-10-06 at 12:25 +0200, Axb wrote:
> > On 10/06/2012 12:14 PM, Arthur Dent wrote:
> > > I am trying to improve the performance of SA on my small home server. I
> > > use the sought rules, but thought I would also include Razor and Pyzor. I
> > > am no stranger to the command line and not afraid of compiling from
> > > source (and that is what I did in the past when installing Razor/Pyzor)
> > > but I noticed that Pyzor was available in the Fedora17 repos - so I yum
> > > installed it...
> > >
> > > I put "pyzor_options --homedir /home/mark/.pyzor"
> > > in /etc/mail/spamassassin/local.cf
> > >
> > > I ran "pyzor --homedir ~/.pyzor discover" and I restarted SA.
> > >
> > > To test it I used the incantation recommended on the Wiki:
> > > $ echo "test" | spamassassin -D pyzor 2>&1 | less
> > >
> > > And this is what I get:
> > >
> > > ==8<===
> > > Oct  6 11:11:46.956 [10904] dbg: pyzor: network tests on, attempting Pyzor
> > > Oct  6 11:11:52.055 [10904] dbg: pyzor: pyzor is available: /bin/pyzor
> > > Oct  6 11:11:52.056 [10904] dbg: pyzor: opening pipe: /bin/pyzor --homedir /home/mark/.pyzor check < /tmp/.spamassassin10904BmyCb9tmp
> > > Oct  6 11:11:52.344 [10904] dbg: pyzor: [10906] finished: exit 1
> > > Oct  6 11:11:52.345 [10904] dbg: pyzor: check failed: no response
> > > X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mydomain.org
> > > [snip rest...]
> > > ==8<===
> > >
> > > What have I done wrong?
> > >
> > > (BTW I also yum installed Razor and that seems to work OK according to a
> > > similar test).
> > >
> > > Thanks in advance
> >
> > are you running SA as user "mark"?
>
> Yes I am...
>
> >
> > pyzor ping
> > that should create  ~/.pyzor/servers
>
> I thought that was what "pyzor --homedir ~/.pyzor discover" did?
> I already have ~/.pyzor/servers:
>
> $ ll ~/.pyzor/
> total 4
> -rw---. 1 mark mark 23 Oct  6 00:08 servers
>
> $ cat ~/.pyzor/servers
> public.pyzor.org:24441
>
> But I will try the ping command:
> $ pyzor ping
> public.pyzor.org:24441  (200, 'OK')
>
> Hmm... seems OK...
>
> > and you should be ready to go
>
> OK let's try again:
>
> ==8<===
> Oct  6 11:33:41.959 [11067] dbg: pyzor: network tests on, attempting Pyzor
> Oct  6 11:33:58.504 [11067] dbg: pyzor: pyzor is available: /bin/pyzor
> Oct  6 11:33:58.506 [11067] dbg: pyzor: opening pipe: /bin/pyzor --homedir /home/mark/.pyzor check < /tmp/.spamassassin11067FvGe8ltmp
> Oct  6 11:33:58.608 [11067] dbg: pyzor: [11069] finished: exit 1
> Oct  6 11:33:58.609 [11067] dbg: pyzor: check failed: no response
> X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mydomain.org
>

mydomain.org: Are you redacting this for the list?

Just a question, I'm not familiar with *zors.

> ==8<===
>
> No change...
>
> Could it be a problem at their end?
>
> Thanks
>
> Mark
>
>
>


Re: Words with embedded symbols

2012-10-05 Thread Alexandre Boyer
Try my regex ( /[:;`(){}~#&"%$_][a-z][:;`(){}_~#&"%$]/im ) in a subject
header check, and meta this with something like __HAS_ANY_URI and/or
SUBJ_ALL_CAPS.

You may also want to raise your scores for the URIBL rules.

And train your Bayesian filter with those spam messages. BAYES_00 means
they are considered hammy.

If you do not train your Bayesian filter, disable it at once (add
"use_bayes 0" and "bayes_auto_learn 0" to your local.cf).
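
To make the BAYES_00 remark concrete: each BAYES_nn rule covers a range
of the spam probability computed by the Bayes classifier. A Python
sketch of that mapping, with the ranges as I remember them from the
stock ruleset (check 23_bayes.cf on your own system; the function name
is mine):

```python
# Upper bound of each range (spam probability) and the rule that fires.
BAYES_BUCKETS = [
    (0.01, "BAYES_00"), (0.05, "BAYES_05"), (0.20, "BAYES_20"),
    (0.40, "BAYES_40"), (0.60, "BAYES_50"), (0.80, "BAYES_60"),
    (0.95, "BAYES_80"), (0.99, "BAYES_95"), (1.00, "BAYES_99"),
]


def bayes_rule(probability: float) -> str:
    """Return the BAYES_nn rule name for a spam probability in [0, 1]."""
    for upper, name in BAYES_BUCKETS:
        if probability < upper:
            return name
    return "BAYES_99"


if __name__ == "__main__":
    for p in (0.005, 0.5, 0.999):
        print(p, bayes_rule(p))
```

So a spam hitting BAYES_00 means Bayes gives it less than 1% chance of
being spam, which is why training on those messages matters.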

These are only my opinions, I apologize for not being able to provide
more support (busy day at work).

Alex, from prypiat.
Yes, I recycle.


On 12-10-05 01:17 PM, Cathryn Mataga wrote:
> Thanks for the comments. I'll see if I can cook something up here.
> Someone asked to see the
> actual messages.
>
> I collected 4 of these messages and put them at this link.
>
> http://www.mataga.net/mataga/spam.txt
>


Re: Words with embedded symbols

2012-10-05 Thread Alexandre Boyer
Hello,


On 12-10-05 08:43 AM, Martin Gregorie wrote:
> On Thu, 2012-10-04 at 20:56 -0700, Cathryn Mataga wrote:
>> I'm getting a lot of SPAM with words written like this. These are pretty
>> horrible, and I don't like
>> getting them every day.
>>
>> A:N ;A %L"
>> P:O ~R %N ( P &lCT U #R&E /
>>
>> Is there a way to make a rule for strings of characters that would
>> ignore non-alpha characters embedded
>> in the string?

Not in a rule, but you may want to code a plugin that would first get
rid of all non-word characters, then guess if some parts of the
resulting (possibly very long) string match sex-related words, then guess
if the mail is a sexy joke or a pure annoyance.

Let's say that it would be difficult and painful for an uncertain result.

> Try this:
>
> describe MG_TWOLETTER_OBFUSCATION Two letter obfuscation (X:X X :X))
> body     MG_TWOLETTER_OBFUSCATION /[A-Z]\W[A-Z] \W[A-Z]\W[A-Z]/
> score    MG_TWOLETTER_OBFUSCATION 5.0

It works, but matches only the second line ("T U #R&E "). If the spam
Cathryn receives always consists of those two lines, this rule is
effective enough.

But if you see some variations (different words and obfuscation, case
sensitivity etc.), you may want to work on the regex a little more.

Based on Martin's work, here is an example:

body      ME_SPAM    /[:;`(){}~\#&"%$][a-z][:;`(){}~\#&"%$]/im

Use tflags multiple and if you count, say, 2 to 4 of them, flag the
mail as spam.

You also may want to meta this with some header checks to avoid false
positives (is it HTML_ONLY, FREEMAIL_FROM, !SPF_PASS etc. ?).
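
To compare the two patterns on the two lines Cathryn posted, here is a
quick Python check. Python's re is close enough to Perl for these
patterns, but this only illustrates the matching, not SA itself; the
helper name is mine:

```python
import re

# The two sample lines Cathryn posted.
SAMPLE = ['A:N ;A %L"', "P:O ~R %N ( P &lCT U #R&E /"]

# Martin's pattern and my variant. In a .cf file the '#' must be
# written '\#'; in Python it needs no escaping.
MARTIN = re.compile(r"[A-Z]\W[A-Z] \W[A-Z]\W[A-Z]")
MINE = re.compile(r'[:;`(){}~#&"%$][a-z][:;`(){}~#&"%$]', re.I | re.M)


def count_hits(pattern, body):
    """Emulate "tflags multiple": count every non-overlapping hit."""
    return len(pattern.findall(body))


if __name__ == "__main__":
    for line in SAMPLE:
        print(repr(line), bool(MARTIN.search(line)), MINE.findall(line))
    print("hits over both lines:", count_hits(MINE, "\n".join(SAMPLE)))
```

Martin's pattern only fires on the second line (on "T U #R&E"), while
mine fires once per line, so a count of 2 is reached on this sample.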

Bayesian learning should also be pretty helpful if you use it.
> It matches the data you posted and does not match anything else in my
> spam corpus, so it's quite specific to that type of spam, on the
> contents of my mail stream anyway, but of course ymmv.
>
>
> Martin
>
>




Alex, from prypiat.
Yes, I recycle.




Re: short prolific spam

2012-10-02 Thread Alexandre Boyer
Hi there,

first, your threshold is high. You may want to lower it a little bit.

Then, if it's always the same phrase, write a rule for it:

body    __AYOY           /HELLO dude/

Then meta this with other things you may see a lot in those spams:

meta    ME_SPAM          RCVD_IN_SORBS_WEB && __AYOY
score   ME_SPAM          2.0
meta    ME_DCC           DCC_CHECK && __AYOY
score   ME_DCC           2.0
meta    ME_GETRIDOFIT    ME_SPAM && ME_DCC
score   ME_GETRIDOFIT    2.0

You may also prefer to work with specific headers and this kind of
thing, but the basic idea is there.
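
To see what these rules would do to the example you posted (score 6.3,
required 7.0), here is a small Python sketch of the meta logic. It only
mimics the boolean combination and the score addition; it is not how SA
evaluates rules internally, and the function name is mine:

```python
def evaluate(hits, body, base_score):
    """Mimic the three meta rules for one message.

    hits: names of the stock rules that already fired
    body: message body text
    base_score: score before the extra rules
    """
    ayoy = "HELLO dude" in body                     # __AYOY (no score of its own)
    me_spam = "RCVD_IN_SORBS_WEB" in hits and ayoy  # ME_SPAM
    me_dcc = "DCC_CHECK" in hits and ayoy           # ME_DCC
    me_getridofit = me_spam and me_dcc              # ME_GETRIDOFIT
    extra = 2.0 * sum([me_spam, me_dcc, me_getridofit])
    return base_score + extra


if __name__ == "__main__":
    hits = {"BAYES_50", "DCC_CHECK", "HTML_MESSAGE", "RCVD_IN_SORBS_WEB"}
    total = evaluate(hits, "HELLO dude", 6.3)
    print(f"new score: {total:.1f} (required 7.0)")
```

All three metas fire on your sample, adding 6 points, so it ends up
around 12.3 and well over your threshold. A mail that only contains the
phrase, without the SORBS or DCC hits, gets nothing extra.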

Hope this helps.

Alex, from prypiat.
Yes, I recycle.


On 12-10-01 12:53 PM, JP Kelly wrote:
> I am getting a bunch of particularly annoying spam which always has a short 
> html body message similar to:
>
> HELLO dude
>
> Any ideas how to combat this spam?
>
> Here is an example:
>
> X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smallgod.net
> X-Spam-Status: No, score=6.3 required=7.0 tests=BAYES_50,DCC_CHECK,
>   HTML_MESSAGE,RCVD_IN_SORBS_WEB autolearn=no version=3.3.1
> X-Spam-LocalCF: procByLocalCf
> X-Spam-Report: 
>   *  1.5 RCVD_IN_SORBS_WEB RBL: SORBS: sender is an abusable web server
>   *  [87.204.239.251 listed in dnsbl.sorbs.net]
>   *  0.4 HTML_MESSAGE BODY: HTML included in message
>   *  0.5 BAYES_50 BODY: Bayes spam probability is 40 to 60%
>   *  [score: 0.5616]
>   *  3.9 DCC_CHECK Detected as bulk mail by DCC (dcc-servers.net)
> X-Spam-jpkvideo: jpkPrefUsed
> Received: (qmail 32278 invoked from network); 1 Oct 2012 09:44:49 -0700
> Received: from mx2.smallgod.net (72.10.53.122)
>  by mail.smallgod.net with (DHE-RSA-AES256-SHA encrypted) SMTP; 1 Oct 2012 
> 09:44:49 -0700
> Received: (qmail 3393 invoked from network); 1 Oct 2012 09:44:49 -0700
> Received: from 87-204-239-251.ip.netia.com.pl (87.204.239.251)
>  by mx2.smallgod.net with SMTP; 1 Oct 2012 09:44:48 -0700
> Received: from apache by sterkinekor.com with local (Exim 4.67)
>   (envelope-from )
>   id KI47ED-67D0M2-W8
>   for ; Mon, 1 Oct 2012 21:14:48 +0430
> To: 
> Subject: hello
> X-PHP-Script: sterkinekor.com/sendmail.php for 87.204.239.251
> From: "Octavio Herron" 
> X-Sender: "Octavio Herron" 
> X-Mailer: PHP
> X-Priority: 1
> MIME-Version: 1.0
> Content-Type: multipart/alternative;
>   boundary="05070400107050204080604"
> Message-Id: <56t5o1-w4822o...@sterkinekor.com>
> Date: Mon, 1 Oct 2012 21:14:48 +0430
>
> This is a multi-part message in MIME format.
> --05070400107050204080604
> Content-Type: text/plain; charset="us-ascii"; format=flowed
>
> HELLO dude
>
> --05070400107050204080604
> Content-Type: text/html; charset="iso-8859-2"
>
> 
> 
>  
>
>  
>  
>
> HELLO dude
> 
>  
> 
>
> --05070400107050204080604--


Re: HTML link regex

2012-09-28 Thread Alexandre Boyer
Great, thanks, will do that today.

Alex, from osmose.
Bow before me, for I am root.

On 12-09-27 07:04 PM, dar...@chaosreigns.com wrote:
> On 09/27, Alexandre Boyer wrote:
>> I met you earlier on the IRC channel, remember?
> Yup.
>
>> Anyway, I would be glad to submit my rules (corrected by Bowie Bailey).
>> I indeed asked how one could do that.
> Open a bug:  https://issues.apache.org/SpamAssassin/
>
> Include the rule(s) and request that they be added to ruleqa.
>
>
> Just came across an old related bug:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=4372
>






Re: HTML link regex

2012-09-27 Thread Alexandre Boyer
Hi there Darxus !

I met you earlier on the IRC channel, remember?

Anyway, I would be glad to submit my rules (corrected by Bowie Bailey).
I indeed asked how one could do that.

Should I start a sandbox? I'm familiar with some aspects of SA, but
giving back to the project is still new to me.

And I would like to develop this part a lot :-)

Starting with connecting to the IRC channel from time to time :-D

Alex, from osmose.
Bow before me, for I am root.

On 12-09-27 06:41 PM, dar...@chaosreigns.com wrote:
> On 09/25, John Hardin wrote:
>> This topic comes up regularly enough that it should be a FAQ.
> Yeah.  I haven't read this thread enough to know if it's been said, but
> here's a previous thread on the subject:
>
> http://spamassassin.1065346.n5.nabble.com/antiphishing-td52027i20.html
>
> And the existing rules:  ruleqa.spamassassin.org/?rule=%2Fspoofed_url
>
>   MSECS   SPAM%    HAM%    S/O   RANK  SCORE  NAME                WHO/AGE
>       0  1.9104  0.4468  0.810   0.55   0.01  T_SPOOFED_URL_HOST
>       0  1.9456  0.5844  0.769   0.53   0.01  T_SPOOFED_URL
>       0  2.0437  3.6954  0.356   0.37  (n/a)  __SPOOFED_URL_HOST
>       0  2.0917  4.0246  0.342   0.36  (n/a)  __SPOOFED_URL
>
>
> Although, as John mentioned, this wasn't targeting specific domains.  If
> rules that you come up with do actually work for you, please submit them
> for inclusion in spamassassin QA, to see if they work well enough to
> include in future sa-updates.
>






Re: HTML link regex

2012-09-27 Thread Alexandre Boyer
Alex, from Nexus7.
Boyaah!
On 27 Sept. 2012 14:34, "Bowie Bailey" wrote:
>
>
> On 9/27/2012 1:48 PM, Alexandre Boyer wrote:
>>
>> Alex, from prypiat.
>> Yes, I recycle.
>>
>>
>> On 12-09-27 11:09 AM, Bowie Bailey wrote:
>>>
>>> On 9/27/2012 10:41 AM, Alexandre Boyer wrote:
>>>>
>>>> Hello all,
>>>>
>>>> Here is a small ruleset that I'm working with. I added it to our
>>>> local ruleset in prod:
>>>>
>>>>  # BAD LINKS N-NG ;-)
>>>>  # Canada Post
>>>>  uri_detail   AJB_CANPOST_BADLINK   raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
>>>>  describe     AJB_CANPOST_BADLINK   Found a mismatch between href and anchored text pretending to link to www.canadapost.ca
>>>>  score        AJB_CANPOST_BADLINK   1.0
>>>>  meta         AJB_CANPOST_PHISH_BADTRACKNUM   Z_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
>>>>  describe     AJB_CANPOST_PHISH_BADTRACKNUM   Mismatch between href and anchored + unofficial tracking number from CanadaPost
>>>>  score        AJB_CANPOST_PHISH_BADTRACKNUM   2.0
>>>>  # youtube
>>>>  uri_detail   AJB_UTUBE_BADLINK   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
>>>>  describe     AJB_UTUBE_BADLINK   Found a mismatch between href and anchored text pretending to link to www.youtube.com
>>>>  score        AJB_UTUBE_BADLINK   0.5
>>>>  # because of link trackers (from massmailer for example), we must
>>>>  # meta this with other rulz to be sure we face our fake youtube botnet
>>>>  meta         AJB_FK_UTUBE_BOTNET   Z_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
>>>>  describe     AJB_FK_UTUBE_BOTNET   mismatch between href and anchored + empty subject = botnet
>>>>  score        AJB_FK_UTUBE_BOTNET   5.5
>>>>  # TODO: check if we could work with DKIM, exists:List-Unsubscribe,
>>>>  # SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
>>>>  # in order to avoid FPs from MassMailers.
>>>> Note the TODO ;-)
>>>
>>> Don't know if it makes much difference in this case, but...
>>>
>>> (?:https?:\/\/(?:www\.)?|www\.)
>>
>> Should catch:
>> http://
>> https://
>> http://www.
>> https://www.
>> www.
>>
>>> can be simplified to:
>>>
>>> (?:https?:\/\/|www\.)
>>>
>> While this catches:
>> http://
>> https://
>> www.
>>
>> Covering less. It's may be overkill, but my regex has one and only
>> purpose: match any kind of "valid" web link, as per common user
>> experience (ie. "as seen on TV").
>>
>> The spammer will try to lure the common user by mimic what the common
>> user is habituated to see, no?
>
>
> Check again.  "http://www." and "https://www." are caught by the "www."
> pattern.  Matching the "https?://" as well is not needed. That's why I
> mentioned anchoring.  If you were anchoring the front of the regexp, you
> would need this match.  Since you are not, the extra specificity is not
> needed.  My regexp matches exactly the same strings as yours.
>
>

Oops, that kind of anchoring... I thought you were pointing at the type.

You're definitely right, sorry for the misunderstanding.

I will update my rules with your simpler regex :-)
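
For the record, here is a quick Python check of Bowie's point. It uses
Python's re rather than SA's Perl engine, which I assume behaves the same
for these simple constructs, and the sample strings are my own. Both
patterns are searched unanchored, the way a rule would use them.

```python
import re

ORIGINAL = re.compile(r'(?:https?://(?:www\.)?|www\.)youtube\.')
SIMPLIFIED = re.compile(r'(?:https?://|www\.)youtube\.')

SAMPLES = [
    "http://youtube.com/",
    "https://youtube.com/",
    "http://www.youtube.com/",
    "https://www.youtube.com/",
    "www.youtube.com",
    "youtube.com",             # no scheme, no www
    "see http://example.com",  # unrelated link
]


def same_verdict(text):
    """True when both unanchored searches agree on match / no match."""
    return bool(ORIGINAL.search(text)) == bool(SIMPLIFIED.search(text))


for sample in SAMPLES:
    print(repr(sample),
          bool(ORIGINAL.search(sample)),
          bool(SIMPLIFIED.search(sample)))
```

The matched substring can differ (the simplified pattern starts at "www."
inside "http://www."), but since nothing is captured, only the yes/no
verdict matters to the rule.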

Alex, sometimes not focused on the right thing ;)

>>
>>> Since you're not anchoring the front of the regexp or trying to
>>> capture the match, the results will be the same.
>>>
>> Not capturing because not using thereafter. On a small system, this
>> makes no difference. On large systems (millions+ emails filtered a day),
>> this is probably making a difference. I take a guess here, I don't want
>> to prove this on my own systems :-)
>
>
> Right.  No need to capture here or in most SA rules.  I only mentioned it
> since there would be a difference between your original regexp and my
> suggestion if you were doing some capturing.
>
> As I said, it may not make any real difference here, I was simply
> pointing out the possible simplification of the regexp.
>
> --
> Bowie


Re: HTML link regex

2012-09-27 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 12-09-27 11:09 AM, Bowie Bailey wrote:
> On 9/27/2012 10:41 AM, Alexandre Boyer wrote:
>> Hello all,
>>
>> Here is a small ruleset that I'm working with. I added it to our
>> local ruleset in prod:
>>
>> # BAD LINKS N-NG ;-)
>> # Canada Post
>> uri_detail   AJB_CANPOST_BADLINK   raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
>> describe     AJB_CANPOST_BADLINK   Found a mismatch between href and anchored text pretending to link to www.canadapost.ca
>> score        AJB_CANPOST_BADLINK   1.0
>> meta         AJB_CANPOST_PHISH_BADTRACKNUM   Z_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
>> describe     AJB_CANPOST_PHISH_BADTRACKNUM   Mismatch between href and anchored + unofficial tracking number from CanadaPost
>> score        AJB_CANPOST_PHISH_BADTRACKNUM   2.0
>> # youtube
>> uri_detail   AJB_UTUBE_BADLINK   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
>> describe     AJB_UTUBE_BADLINK   Found a mismatch between href and anchored text pretending to link to www.youtube.com
>> score        AJB_UTUBE_BADLINK   0.5
>> # because of link trackers (from massmailer for example), we must
>> # meta this with other rulz to be sure we face our fake youtube botnet
>> meta         AJB_FK_UTUBE_BOTNET   Z_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
>> describe     AJB_FK_UTUBE_BOTNET   mismatch between href and anchored + empty subject = botnet
>> score        AJB_FK_UTUBE_BOTNET   5.5
>> # TODO: check if we could work with DKIM, exists:List-Unsubscribe,
>> # SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
>> # in order to avoid FPs from MassMailers.
>>
>> Note the TODO ;-)
>
> Don't know if it makes much difference in this case, but...
>
> (?:https?:\/\/(?:www\.)?|www\.)

Should catch:
http://
https://
http://www.
https://www.
www.

>
> can be simplified to:
>
> (?:https?:\/\/|www\.)
>

While this catches:
http://
https://
www.

Covering less. It may be overkill, but my regex has one and only one
purpose: match any kind of "valid" web link, as per common user
experience (i.e. "as seen on TV").

The spammer will try to lure the common user by mimicking what the common
user is used to seeing, no?

> Since you're not anchoring the front of the regexp or trying to
> capture the match, the results will be the same.
>

I'm not capturing because I don't use the match afterwards. On a small
system, this makes no difference. On large systems (millions of emails
filtered a day), it probably does. I'm guessing here; I don't want
to prove this on my own systems :-)

Alex.


Re: HTML link regex

2012-09-27 Thread Alexandre Boyer
Hello all,

Here is a small ruleset that I'm working with. I added it to our local
ruleset in prod:

# BAD LINKS N-NG ;-)
# Canada Post
uri_detail   AJB_CANPOST_BADLINK   raw !~ /canadapost\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)canadapost\./ type =~ /^a$/
describe     AJB_CANPOST_BADLINK   Found a mismatch between href and anchored text pretending to link to www.canadapost.ca
score        AJB_CANPOST_BADLINK   1.0
meta         AJB_CANPOST_PHISH_BADTRACKNUM   Z_CANPOST_BADLINK && !Z_CANPOST_TRACKNUM
describe     AJB_CANPOST_PHISH_BADTRACKNUM   Mismatch between href and anchored + unofficial tracking number from CanadaPost
score        AJB_CANPOST_PHISH_BADTRACKNUM   2.0
# youtube
uri_detail   AJB_UTUBE_BADLINK   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
describe     AJB_UTUBE_BADLINK   Found a mismatch between href and anchored text pretending to link to www.youtube.com
score        AJB_UTUBE_BADLINK   0.5
# because of link trackers (from massmailer for example), we must
# meta this with other rulz to be sure we face our fake youtube botnet
meta         AJB_FK_UTUBE_BOTNET   Z_UTUBE_BADLINK && Z_EMPTY_SUBJ && MIME_HTML_ONLY
describe     AJB_FK_UTUBE_BOTNET   mismatch between href and anchored + empty subject = botnet
score        AJB_FK_UTUBE_BOTNET   5.5
# TODO: check if we could work with DKIM, exists:List-Unsubscribe,
# SPF_PASS, RCVD_IN_RP_SAFE, RCVD_IN_RP_CERTIFIED and others
# in order to avoid FPs from MassMailers.

Note the TODO ;-)

I will work on this later on.

A question here: as per
http://wiki.apache.org/spamassassin/RuleSandboxes#Editing_Another_Developer.27s_Sandbox,
one can create a sandbox to submit rules.

Is it open to anybody? Should I do this to submit those rules (and more,
some I already have on my side and future ones)?

Could anyone from the list give me a primer about this process? I'm
interested (with my boss's approval) in becoming a regular contributor but
this part of the SA project is a little cryptic to me right now.

Do not hesitate to contact me off-list if necessary.

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 11:03 AM, Bowie Bailey wrote:
> On 9/26/2012 10:45 AM, Alexandre Boyer wrote:
>> Hi all,
>>
>> Me happy :-D
>>
>> It works as expected for simple rules.
>>
>> For example, to get rid off my problem with youtube links I had this
>> simple rule:
>>
>> uri_detail   Z_URIDETAIL_UTUBE_SPOOF   raw !~ /youtube\./ text =~ /(https?://)?(www\.)?youtube\./ type =~ /^a$/
>> score        Z_URIDETAIL_UTUBE_SPOOF   10.0
>>
>
> The alternatives on the text regexp are irrelevant.  An equivalent
> simpler regexp would be:
>
> text =~ /youtube\./
>
> Any optional text at the beginning or end of a (non-anchored) regexp
> should be left off unless you are trying to capture it for later use.
>


Re: How to check from that is not on the header?

2012-09-26 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 11:09 AM, Sergio wrote:
> Hi all,
> how may I can check a FROM different to the one on the headers?
>
> I have seen that some emails on the FROM on the header has something
> different than the FROM on the email, as an example:

You are talking about the envelope From versus the From: header (the
"body From").

The envelope From is given during the SMTP transaction (MAIL FROM). The
From: header is part of the message headers, so it is sent inside the
DATA command and is easily spoofed.
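
To make the difference concrete, here is a small Python sketch using the
standard email module. The message is made up: the local parts of both
addresses are hypothetical, and I assume your MTA records the envelope
sender in a Return-Path header (yours may use X-Envelope-From or nothing
at all).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message. Return-Path stands in for the envelope sender,
# assuming the receiving MTA writes it there at delivery time.
RAW = """\
Return-Path: <bounce-123@mail62.us1.rsgsv.net>
From: "Cucupons.com" <news@cucupons.com>
Subject: test

body
"""

msg = message_from_string(RAW)
envelope_from = parseaddr(msg.get("Return-Path", ""))[1]
header_from = parseaddr(msg.get("From", ""))[1]


def domain(addr):
    """Return the lowercased part after the last @."""
    return addr.rpartition("@")[2].lower()


print("envelope:", domain(envelope_from))
print("header:  ", domain(header_from))
```

A rule on From only ever sees the second one, which is why a pattern for
rsgsv.net does not fire there.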

>
> FROM THE HEADERS:
> Received: from (127.0.0.1) by mail62.us1.rsgsv.net
>  (PowerMTA(TM) v3.5r16) id hcc8go0lj3g4
> for  >; Wed, 26 Sep 2012 14:28:26
> + (envelope-from
>  >)
> Subject:
> =?utf-8?Q?Masaje=20de=20Reflexolog=C3=ADa=20de=20pies=20con=20sales=20minerales=20relajantes=20y=20aromaterapia?=
> _*From: =?utf-8?Q?Cucupons.com?=  >*_
> Reply-To: =?utf-8?Q?Cucupons.com?=  >
>
> But the FROM that I want to block is the one that comes on the email:
> FROM:
> bounce-mc.us4_7776669.128085-Aileen.Miffs=anyemail@mail62.us1.rsgsv.net
> 
>
>
> I have the following rule:
>
> header  BLACKLIST_R  From =~ /rsgsv\.net/i
> score   BLACKLIST_R  5.0
>

You may either do this:

header  BL_FROM_rsgsv  Received =~ /rsgsv\.net/i
score  BL_FROM_rsgsv  5.0

But you are subject to FPs if that domain sends you a legitimate
email some day.

Note that you may also look at an X-Envelope-From header, depending on
your MTA and how and when it records it in the headers.

Or you may choose to work on the From: header:
header  BL_FROM_rsgsv  From:addr =~ /cucupons\.com/i
score  BL_FROM_rsgsv  5.0

But as this part of the mail is spoofable, you are likely to miss
some spam.


> But at the time of checking, it checks "cucupons.com
> " and the rule fails.
>
> What I have to use in order to check the FROM that comes on the email
> instead of the FROM that is on the headers?
>
> TIA.
>
> Sergio


Re: HTML link regex

2012-09-26 Thread Alexandre Boyer

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 11:03 AM, Bowie Bailey wrote:
> On 9/26/2012 10:45 AM, Alexandre Boyer wrote:
>> Hi all,
>>
>> Me happy :-D
>>
>> It works as expected for simple rules.
>>
>> For example, to get rid off my problem with youtube links I had this
>> simple rule:
>>
>> uri_detail   Z_URIDETAIL_UTUBE_SPOOF   raw !~ /youtube\./ text =~ /(https?://)?(www\.)?youtube\./ type =~ /^a$/
>> score        Z_URIDETAIL_UTUBE_SPOOF   10.0
>>
>
> The alternatives on the text regexp are irrelevant.  An equivalent
> simpler regexp would be:
>
> text =~ /youtube\./

I'm trying to anticipate possible issues with next-generation spam
that tries to bypass my check. And this rule is, in my opinion,
intended to detect this particular case of spoofing. That is, an HTML
email trying to obfuscate the real link and luring the user with a fake
youtube link.

Hence the new iteration of my regex (see below).

>
> Any optional text at the beginning or end of a (non-anchored) regexp
> should be left off unless you are trying to capture it for later use.
>

Right. Good point, thank you.

New regex:

uri_detail   Z_URIDETAIL_UTUBE_SPOOF   raw !~ /youtube\./ text =~ /(?:https?:\/\/(?:www\.)?|www\.)youtube\./ type =~ /^a$/
score        Z_URIDETAIL_UTUBE_SPOOF   10.0
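
For testing the idea outside SA, here is a rough Python imitation of what
the rule checks: an <a> whose href does not mention youtube while its
visible text looks like a youtube link. This is only a sketch of the
logic, not how the URIDetail plugin works internally; the class and
function names are mine, and the two sample anchors are simplified from
the examples in this thread.

```python
import re
from html.parser import HTMLParser

# Same patterns as the rule: "raw" (href) and "text" (anchor text).
HREF_IS_YOUTUBE = re.compile(r'youtube\.')
TEXT_CLAIMS_YOUTUBE = re.compile(r'(?:https?://(?:www\.)?|www\.)youtube\.')


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text)))
            self._href = None


def spoofed_youtube_links(html):
    """Return anchors whose text claims youtube but whose href does not."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.anchors
            if not HREF_IS_YOUTUBE.search(href)
            and TEXT_CLAIMS_YOUTUBE.search(text)]


FAKE = ('<a href="http://www.probono.fr/95280_pdf">'
        'http://www.youtube.com/watch?v=3VvOFqaHbL5</a>')
LEGIT = ('<a href="http://www.youtube.com/watch?v=3VvOFqaHbL5">'
         'personal videos</a>')

print(spoofed_youtube_links(FAKE))
print(spoofed_youtube_links(LEGIT))
```

The first sample is reported, the second is not, which is the behaviour I
expect from the rule.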




Re: HTML link regex

2012-09-26 Thread Alexandre Boyer
Hi all,

Me happy :-D

It works as expected for simple rules.

For example, to get rid of my problem with youtube links I had this
simple rule:

uri_detail   Z_URIDETAIL_UTUBE_SPOOF   raw !~ /youtube\./ text =~ /(https?://)?(www\.)?youtube\./ type =~ /^a$/
score        Z_URIDETAIL_UTUBE_SPOOF   10.0

This is working great on my small corpus of FNs and FPs. Debug mode gives
very interesting information, letting you know precisely what is
compiled and what matches.

Note the regex in the "text" part, which should prevent a false positive
on a link like:

Hey my friend, check out my <a href="http://www.youtube.com/watch?v=3VvOFqaHbL5&">personal videos</a>


For the moment, faked links are detected and valid ones are not. It even
works when adding a valid link to a spammy e-mail, which gives me great
hope for "basic" detection of phishes (bank accounts).

I will develop some rules for my particular spams, including banks.
Those could be added to the SA ruleset, I think, to catch very short spams
(one-liners with a fake URI).

I will also try to see if more complex rules are working. Of course,
I'll let the list know if I'm successful or not.

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 06:48 AM, Axb wrote:
> On 09/26/2012 12:46 PM, Martin Gregorie wrote:
>> On Wed, 2012-09-26 at 12:05 +0200, Axb wrote:
>>> have you looked at the URIDetail plugin ?
>>>
>> I didn't know it existed until now, but it looks useful. It looks as if
>> it can easily solve the OP's problem too.
>>
>> Martin
>
>
> If you could create a couple of (working) sample rules, we could add
> to SA documentation and to the SA ruleset (for masscheck)
>
> Thanx
>
> Axb
>


Re: HTML link regex

2012-09-26 Thread Alexandre Boyer
I found a couple of examples with uri_detail checks (instead of uri
checks) that are written in a very similar way to what John suggested.

I will test this today.

Having written two plugins already (that is, just starting to
understand how the PMS works ;) ), I knew that one could work with the
hash of hashes uri_details provided by the PMS.

But seeing nowhere a rule of that kind, I never tried it before.

This will be done today, tested with my own samples, and I will later
post my results on this thread.

Thanks for all of your suggestions, I really appreciate it.

Alex, from prypiat.
Yes, I recycle.


On 12-09-26 06:48 AM, Axb wrote:
> On 09/26/2012 12:46 PM, Martin Gregorie wrote:
>> On Wed, 2012-09-26 at 12:05 +0200, Axb wrote:
>>> have you looked at the URIDetail plugin ?
>>>
>> I didn't know it existed until now, but it looks useful. It looks as if
>> it can easily solve the OP's problem too.
>>
>> Martin
>
>
> If you could create a couple of (working) sample rules, we could add
> to SA documentation and to the SA ruleset (for masscheck)
>
> Thanx
>
> Axb
>


HTML link regex

2012-09-25 Thread Alexandre Boyer
Hi list,

I'm receiving a lot of spam of a very particular sort.

It's essentially FREEMAIL_FROM and the body only contains a fake Youtube
link like:

<a href="http://www.probono.fr/95280_pdf">http://www.youtube.com/watch?v=3VvOFqaHbL5&feature=g-vrec&feature=g-vrec</a>


I ended with a regex for this kind of thing:

full   AJB_UTUBE_BADLINK   m'\shref=.{0,3}(https?://)?(www\.)?(?!youtube)[^\.]+\.[^>]+>(https?://)?(www\.)?youtube\.'mi
score  AJB_UTUBE_BADLINK   0 # 3.0

I've been poking around with negative/positive lookaheads/lookbehinds,
full or rawbody rules.

I have some samples for my tests (FPs and FNs); sometimes it matches,
often it does not... quite inconsistent and a large source of FPs.

I sense that this simple regex could be adapted to many domains (like
bank phishes, ebay phishes, ups phishes etc.) but it's not working as
is. It could eventually be turned into a plugin of some sort, though that
would require much more work and thought.

What do you think of this? Do I miss something obvious in my regex?

-- 
Alex, from prypiat.
Yes, I recycle.



Re: Rules Needed to verify bank fraud

2012-08-24 Thread Alexandre Boyer
Yep, you are damn right. I work in a company where I maintain a list for
Canadian banks and more. It's a pain, but it's effective.

If a few of us who maintain such lists contributed, it would help greatly.

Alex, from osmose.
Bow before me, for I am root.


On 12-08-24 02:03 PM, Matt Garretson wrote:
> In my experience, banks and financial institutions tend to be among the
> worst offenders against sane bulk mailing practices.  SPF or DKIM will
> be broken or inconsistently applied, and sender/relay domains seem to
> vary with the weather.  I think it will be tough to nail down all the
> valid domains a bank might use to contact its clients.  You'd think that
> banks would care enough to do things right, but in many cases they
> really seem not to.
>
> The general technique proposed is effective, but I think that trying to
> create and maintain a list like this for more than a handful of banks
> will be a hassle at best, and will be highly prone to false positives.
>
> It might still be worth trying, but I just wanted to vent my pessimism. :)
>





Re: Rules Needed to verify bank fraud

2012-08-23 Thread Alexandre Boyer
That's my opinion too.

Therefore the community will have to contribute to the list and decide
which domains to add or not.

Alex, from osmose.
Bow before me, for I am root.


On 12-08-23 07:20 PM, Jason Haar wrote:
> Great idea - but don't under-estimate the amount of work. Someone
> thought there'd be "only" 20-30 domains to be covered - but I'd say
> that's actually 20-30 domains PER COUNTRY.
>
> Here in New Zealand we get a lot of phishing attacks using New Zealand
> banks - just like you get spam referring to your own country banks.
> However, it appears almost none of the NZ banks have heard of SPF. Of
> the first three I could think of, only one had a SPF record - and it
> looks like they've outsourced email too (I can't believe any financial
> institution would outsource email! Gahhh!)
>
>





Re: HEADS UP: DBSL.org is returning positive replies

2012-08-10 Thread Alexandre Boyer
That's right.

Excuse me for using this thread, but I have a short question about scoring.

When I want to prevent a rule from being used, I set its score to 0:

score  RULE  0

Is the method asked by Brent working too?

Alex, from prypiat.
Yes, I recycle.


On 12-08-10 04:29 PM, dar...@chaosreigns.com wrote:
> On 08/10, Brent Gardner wrote:
>>> As of today, dsbl.org is returning positive replies
>> Is this enough to keep it from being used?
>>
>> meta RCVD_IN_DSBL (0)
> Not necessary, this blacklist is not used in spamassassin because it has
> been dead for years.
>
> I believe the warning was posted primarily for people who were using this
> BL at their MTA (mail server software).  Or possibly ancient versions of SA
> (before 3.3.x) which haven't been getting updates for years and you
> shouldn't be running anyway.
>


Re: HEADS UP: DBSL.org is returning positive replies

2012-08-10 Thread Alexandre Boyer
Did you mean:

score  RCVD_IN_DSBL  0

?

Alex, from prypiat.
Yes, I recycle.


On 12-08-10 04:00 PM, Brent Gardner wrote:
> On 08/10/2012 04:46 AM, Axb wrote:
>> DSBL.org was shut down 4 years ago but apparently there's still ppl
>> sending lookups.
>>
>> As of today, dsbl.org is returning positive replies
>>
>> Enjoy the support case party!
>>
>> https://twitter.com/#!/search/?q=DSBL&src=typd
>>
>>
>> Axb
> Is this enough to keep it from being used?
>
> meta RCVD_IN_DSBL (0)
>
>
> Brent Gardner
>
>


Re: SELL CVV GOOD ALL COUNTRY,Transfer WU,SHIP LAPTOP( DELL, TOSHIBA,..) IPAD2,IPHONE

2012-06-09 Thread Alexandre Boyer
+1

Alex, from osmose.
Bow before me, for I am root.


On 12-06-09 03:29 AM, Niamh Holding wrote:
> Hello best_sellercvv,
>
> Saturday, June 9, 2012, 7:00:35 AM, you wrote:
>
> b> Hi every customer
>
> Oh the irony to see the spamassassin list spammed :)
>


