Re: Very strange SA result!

2015-12-03 Thread Bill Cole

On 3 Dec 2015, at 9:36, Joe Quinn wrote:


On 12/3/2015 9:23 AM, Jari Fredriksson wrote:

On 3.12.2015 16.11, Kevin A. McGrail wrote:

You are using KAM.cf which isn't a project ruleset.

Please report the issue and a spample at
https://raptor.pccc.com/raptor.cgim?template=report_problem

We can likely look at it quickly and adjust.  However, the fact that SPF
failed makes me lean towards the fact that the rule fired correctly...


Regards,
KAM



There seems to be something in the SPF detection. SPF claims that 
paypal is not allowed (by their SPF record) to send mail via my email 
relay. That relay IS in my trusted_networks. What am I missing now?


br. jarif

Probably this bug, which we are still working out a good solution for:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7182

The SPF RFC has a "MUST" limit of 10 DNS lookups per SPF check, which 
Paypal has exceeded before. The reasoning given is resistance to 
denial-of-service attacks via DNS traffic, which makes it a tricky fix. 
We'll discuss the KAM.cf issue privately, and bring it back on-list in 
dev@ if it turns up new information on this issue.


Not in this case. Note that the URL in the SPF_FAIL line indicates 
emea.e.paypal.com as the sender domain. Not a complex record.
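
The fields in that link can be pulled apart mechanically. A minimal Python sketch, assuming the openspf.net "Why" link keeps its query as ';'-separated key=value pairs (the helper name parse_openspf_why is made up for illustration; the URL is copied from the X-Spam-Report in this thread):

```python
from urllib.parse import urlsplit, unquote

# URL copied from the SPF_FAIL line of the X-Spam-Report in this thread.
WHY_URL = (
    "http://www.openspf.net/Why?s=mfrom;"
    "id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d"
    "%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi"
)


def parse_openspf_why(url):
    """Split the ';'-separated query of a 'Why' link into a dict (hypothetical helper)."""
    fields = {}
    for part in urlsplit(url).query.split(";"):
        key, _, value = part.partition("=")
        fields[key] = unquote(value)  # turns %40 back into '@'
    return fields


fields = parse_openspf_why(WHY_URL)
sender_domain = fields["id"].rpartition("@")[2]

print("identity checked:", fields["s"])
print("sender domain:   ", sender_domain)
print("IP checked:      ", fields["ip"])
print("receiving host:  ", fields["r"])
```

The sender domain comes out as emea.e.paypal.com, and the IP that SPF was evaluated against is 212.16.98.57, not the 96.47.30.215 address that the DNSWL and Mailspike lines of the same report mention.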


Re: Very strange SA result!

2015-12-03 Thread Jari Fredriksson


I was now trying to debug with spamassassin -D to find out why SPF 
fails, but could not. It just works now, and I did not even restart 
spamd in between...


Some temporary hiccup? Ah well...

Sorry for the noise.

On 3.12.2015 16:07, Jari Fredriksson wrote:

KAM_PAYPAL1 rampant paypal phishing scams

Aarghs!

I found out a mail from paypal as follows:

X-Spam-Status: Yes, score=7.8 required=5.0 
tests=BAYES_00,DKIM_SIGNED,


DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,

RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no 
version=3.4.1

X-Spam-Orig-To: 
X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at 
http://www.dnswl.org/, no

*  trust
*  [96.47.30.215 listed in list.dnswl.org]
*  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
*  [URIs: ed4.net]
* -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
*  [96.47.30.215 listed in wl.mailspike.net]
*  0.6 URG_BIZ BODY: Contains urgent matter
* -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM 
white-list
* -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay 
domain

*  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
*  [SPF failed: Please see

http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]
*  1.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or 
identical to

*   background
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from 
author's

*   domain
* -0.0 DKIM_VERIFIED No description available.
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not 
necessarily

*  valid
* -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
*   16 KAM_PAYPAL1 rampant paypal phishing scams
*  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
*  information
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
gamecock.fredriksson.dy.fi


--
jarif.bit


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Kris Deugau
Charles Sprickman wrote:

> I had a look at that page - I use mysql to store the data, have multiple 
> spamd boxes, and spamc on the inbound servers passing mail to spamd once all 
> the “front door” checks are done.  In that config, I end up with unique 
> per-user bayes tokens.  I’m looking to just pool everyone together, but don’t 
> see an obvious way to do that.  It seems like folks in this thread are 
> however doing that somehow (perhaps just because they are using a milter or 
> similar).

Really short answer:

bayes_sql_override_username spamassassin

man Mail::SpamAssassin::Conf (IIRC) for details.  That directive
overrides the spamd per-user behaviour for Bayes, putting it all in one
basket.

-kgd


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Kris Deugau
Jari Fredriksson wrote:
> On 12/03/2015 02:29 AM, Charles Sprickman wrote:
>> I had a look at that page - I use mysql to store the data, have
>> multiple spamd boxes, and spamc on the inbound servers passing mail to
>> spamd once all the “front door” checks are done.  In that config, I
>> end up with unique per-user bayes tokens.  I’m looking to just pool
>> everyone together, but don’t see an obvious way to do that.  It seems
>> like folks in this thread are however doing that somehow (perhaps just
>> because they are using a milter or similar).


> I have a similar setup. I use "spamc -d spamd -u spam ..." and I think
> that -u spam is all it takes to make it site wide. Not very complex?

That'll give you a global Bayes, but it also eliminates any per-user
settings you might want to keep using, because you're not letting spamd
"know" about any different users.

We use these settings (watch for word wrap):

---

bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn DBI:mysql:spamassassin:[ip]
bayes_sql_username spamassassin
bayes_sql_password 
bayes_sql_override_username spamassassin

# awl
auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:spamassassin:[ip]
user_awl_sql_username spamassassin
user_awl_sql_password 
user_awl_sql_table awl

# userprefs.  worksforme(TM)
user_scores_dsn   DBI:mysql:spamassassin:[ip]
user_scores_sql_username  spamassassin
user_scores_sql_password  
# Need a custom query to do domainwide settings.  Default does not have the third WHERE clause
user_scores_sql_custom_query  SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' OR username = concat('@~',_DOMAIN_) ORDER BY username ASC

# don't pass mail through unscanned if there's an error trying to get userprefs.
# Note that "no userpref entries" is NOT an error.
# Note also this requires a custom patch
user_scores_fallback_to_global  1

---

to have per-user AWL, SQL userprefs including domainwide settings, and
global Bayes all in the same MySQL database.

A read of the fine manual page (Mail::SpamAssassin::Conf) will usually
turn up all the necessary directives for whatever you're trying to do.

-kgd
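
To see how that custom userprefs query layers global, domain-wide and per-user rows, here is a small Python sketch using an in-memory SQLite table. It is only an illustration under stated assumptions: the table name userpref, the sample rows and the helper load_prefs are made up, SQLite's || stands in for MySQL's concat(), and it assumes that a later row for the same preference overrides an earlier one.

```python
import sqlite3

# Same shape as the custom query above, with _TABLE_, _USERNAME_ and _DOMAIN_
# filled in by hand. SQLite's || replaces MySQL's concat().
QUERY = (
    "SELECT preference, value FROM userpref "
    "WHERE username = :user OR username = '@GLOBAL' "
    "OR username = '@~' || :domain ORDER BY username ASC"
)


def load_prefs(conn, user):
    """Collect prefs for one address; later rows win (assumption)."""
    domain = user.split("@", 1)[1]
    prefs = {}
    for preference, value in conn.execute(QUERY, {"user": user, "domain": domain}):
        prefs[preference] = value
    return prefs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO userpref VALUES (?, ?, ?)",
    [
        # Hypothetical sample data.
        ("@GLOBAL", "required_score", "5.0"),
        ("@GLOBAL", "report_safe", "0"),
        ("@~example.com", "required_score", "4.0"),
        ("alice@example.com", "required_score", "3.0"),
    ],
)

for address in ("alice@example.com", "bob@example.com", "carol@other.test"):
    print(address, load_prefs(conn, address))
```

With byte-wise ordering, '@GLOBAL' sorts before '@~example.com', which sorts before an address starting with a lowercase letter, so the most specific row is read last. Collation on a real MySQL install may differ, so it is worth checking the ordering there.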


Re: Very strange SA result!

2015-12-03 Thread Joe Quinn

On 12/3/2015 9:23 AM, Jari Fredriksson wrote:

On 3.12.2015 16.11, Kevin A. McGrail wrote:

You are using KAM.cf which isn't a project ruleset.

Please report the issue and a spample at
https://raptor.pccc.com/raptor.cgim?template=report_problem

We can likely look at it quickly and adjust.  However, the fact that SPF
failed makes me lean towards the fact that the rule fired correctly...

Regards,
KAM



There seems to be something in the SPF detection. SPF claims that 
paypal is not allowed (by their SPF record) to send mail via my email 
relay. That relay IS in my trusted_networks. What am I missing now?


br. jarif

Probably this bug, which we are still working out a good solution for:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7182

The SPF RFC has a "MUST" limit of 10 DNS lookups per SPF check, which 
Paypal has exceeded before. The reasoning given is resistance to 
denial-of-service attacks via DNS traffic, which makes it a tricky fix. 
We'll discuss the KAM.cf issue privately, and bring it back on-list in 
dev@ if it turns up new information on this issue.
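
For anyone wanting a rough feel for the lookup limit, here is a minimal Python sketch that counts the terms in a single SPF record which trigger DNS lookups under RFC 7208 (include, a, mx, ptr, exists, and the redirect modifier). It only inspects one record string and does not follow includes or query DNS, so the real total for a domain can be higher. The function name and the example record are hypothetical.

```python
# Mechanisms that cost a DNS lookup under RFC 7208 section 4.6.4.
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}


def count_dns_lookup_terms(record):
    """Count lookup-causing terms in one SPF record string (no recursion)."""
    count = 0
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        term = term.lstrip("+-~?").lower()  # drop the qualifier, if any
        if term.startswith("redirect="):
            count += 1
            continue
        # Strip any ":domain" argument and "/cidr" suffix to get the name.
        name = term.split(":", 1)[0].split("/", 1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


# Hypothetical record, not PayPal's.
example = "v=spf1 include:a.example include:b.example a mx ip4:192.0.2.0/24 -all"
print(count_dns_lookup_terms(example))
```

Each include: costs one lookup itself plus whatever the included record needs, which is how large senders run past 10.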


Re: Very strange SA result!

2015-12-03 Thread Jari Fredriksson

On 3.12.2015 16.11, Kevin A. McGrail wrote:

You are using KAM.cf which isn't a project ruleset.

Please report the issue and a spample at
https://raptor.pccc.com/raptor.cgim?template=report_problem

We can likely look at it quickly and adjust.  However, the fact that SPF
failed makes me lean towards the fact that the rule fired correctly...

Regards,
KAM



There seems to be something in the SPF detection. SPF claims that paypal 
is not allowed (by their SPF record) to send mail via my email relay. 
That relay IS in my trusted_networks. What am I missing now?


br. jarif




On 12/3/2015 9:07 AM, Jari Fredriksson wrote:


KAM_PAYPAL1 rampant paypal phishing scams

Aarghs!

I found out a mail from paypal as follows:

X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,

KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,

URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no
version=3.4.1
X-Spam-Orig-To: 
X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at
http://www.dnswl.org/, no
*  trust
*  [96.47.30.215 listed in list.dnswl.org]
*  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
*  [URIs: ed4.net]
* -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
*  [96.47.30.215 listed in wl.mailspike.net]
*  0.6 URG_BIZ BODY: Contains urgent matter
* -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM
white-list
* -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover
relay domain
*  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
*  [SPF failed: Please see
http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]
*  1.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or
identical to
*   background
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from
author's
*   domain
* -0.0 DKIM_VERIFIED No description available.
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not
necessarily
*  valid
* -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
*   16 KAM_PAYPAL1 rampant paypal phishing scams
*  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
*  information
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
gamecock.fredriksson.dy.fi







--
jarif.bit


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread RW
On Wed, 2 Dec 2015 17:14:22 +
Sebastian Arcus wrote:

> On 02/12/15 12:55, Reindl Harald wrote:
> >
> >
> > Am 02.12.2015 um 12:51 schrieb Sebastian Arcus:  
> >> I hope I'm not exceeding the patience of the list by posting a
> >> third question in two days :-)
> >>
> >> I realise the above question is a "soft" question, probably
> >> without a definite "yes" or "no" answer.

Very true.

> > additionally we share our bayes with another company which pulls
> > the dumps if the hash file is different every 30 minutes
> >
> > we as well as the other company does mail hosting on ISP level and
> > the results on both sides are perfect - we share even scorings, 
> > whitelists, custom body/subject-rules and the summary is: at least
> > in the same country sharing spamfilter configurations works like a
> > charme  
> 
> Perfect - that's exactly the sort of real-life based advice I was 
> looking for. Many thanks!

It's not really surprising that the diverse mail of 2 similar ISPs is
similar for Bayes, especially with the headers removed. Whether your
ham looks like your client's ham is an entirely different matter. If
the ham isn't similar then using your ham-heavy database is likely to
be sub-optimal. 


There's also the ham:spam ratio - at one point you quoted a figure of
12000:300. An imbalance is not intrinsically wrong, but it could cause
problems if you transplant it into a system where new training occurs at
a very different ratio. Any new tokens that appear in the second system
are heavily skewed to being treated as spammy. What's particularly bad
is if you strip headers in your corpus and then the client goes on to
train without stripping them, then neutral tokens that got stripped
enter the database as heavily spammy.
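
A rough illustration of the skew, in Python. This uses a simplified frequency-ratio estimate in the style of classic Bayesian filters, not SpamAssassin's actual combining code, and the function name is made up. The 12000:300 figure is the one quoted above.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token estimate: compare hit rates per trained message."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    return spam_rate / (spam_rate + ham_rate)


# A new token seen once in spam and once in ham after the transplant.
skewed = token_spam_probability(1, 1, n_spam=300, n_ham=12000)
balanced = token_spam_probability(1, 1, n_spam=300, n_ham=300)

print(f"12000:300 database: {skewed:.3f}")
print(f"300:300 database:   {balanced:.3f}")
```

With 12000 ham and 300 spam already counted, one spam hit weighs 40 times as much as one ham hit, so a token that is actually neutral lands near 0.98 instead of 0.5.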


Re: Very strange SA result!

2015-12-03 Thread Kevin A. McGrail

You are using KAM.cf which isn't a project ruleset.

Please report the issue and a spample at 
https://raptor.pccc.com/raptor.cgim?template=report_problem


We can likely look at it quickly and adjust.  However, the fact that SPF 
failed makes me lean towards the fact that the rule fired correctly...


Regards,
KAM

On 12/3/2015 9:07 AM, Jari Fredriksson wrote:


KAM_PAYPAL1 rampant paypal phishing scams

Aarghs!

I found out a mail from paypal as follows:

X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE, 


KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ, 

URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no 
version=3.4.1

X-Spam-Orig-To: 
X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at 
http://www.dnswl.org/, no

*  trust
*  [96.47.30.215 listed in list.dnswl.org]
*  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
*  [URIs: ed4.net]
* -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
*  [96.47.30.215 listed in wl.mailspike.net]
*  0.6 URG_BIZ BODY: Contains urgent matter
* -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM 
white-list
* -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover 
relay domain

*  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
*  [SPF failed: Please see 
http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]

* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]
*  1.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or 
identical to

*   background
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from 
author's

*   domain
* -0.0 DKIM_VERIFIED No description available.
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not 
necessarily

*  valid
* -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
*   16 KAM_PAYPAL1 rampant paypal phishing scams
*  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
*  information
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
gamecock.fredriksson.dy.fi




--
*Kevin A. McGrail*
CEO

Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422

http://www.pccc.com/

703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-798-0171 (wireless)
kmcgr...@pccc.com <mailto:kmcgr...@pccc.com>



Very strange SA result!

2015-12-03 Thread Jari Fredriksson


KAM_PAYPAL1 rampant paypal phishing scams

Aarghs!

I found out a mail from paypal as follows:

X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.1
X-Spam-Orig-To: 
X-Spam-Report:
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
*  trust
*  [96.47.30.215 listed in list.dnswl.org]
*  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
*  [URIs: ed4.net]
* -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
*  [96.47.30.215 listed in wl.mailspike.net]
*  0.6 URG_BIZ BODY: Contains urgent matter
* -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
* -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
*  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
*  [SPF failed: Please see http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]
*  1.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
*   background
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
*   domain
* -0.0 DKIM_VERIFIED No description available.
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
*  valid
* -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
*   16 KAM_PAYPAL1 rampant paypal phishing scams
*  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
*  information
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on
gamecock.fredriksson.dy.fi

--
jarif.bit
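
To see how much of the 7.8 comes from a single rule, the per-rule lines of the report can be added up. A minimal Python sketch, assuming each rule line has the form "* <score> <RULE_NAME> ..." and that continuation lines carry no leading number. The scores shown in a report are rounded, so on other messages the sum may be off by a tenth. Descriptions are trimmed here for brevity.

```python
import re
from decimal import Decimal

# Rule lines copied from the X-Spam-Report above (descriptions shortened).
REPORT = """\
* -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
*  trust
*  [96.47.30.215 listed in list.dnswl.org]
*  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
* -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
*  0.6 URG_BIZ BODY: Contains urgent matter
* -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
* -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
*  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
* -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  1.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar
*  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
* -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature
* -0.0 DKIM_VERIFIED No description available.
* -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
*  0.1 DKIM_SIGNED Message has a DKIM or DK signature
* -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
*   16 KAM_PAYPAL1 rampant paypal phishing scams
*  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
*  information
"""

# "* <score> <RULE_NAME>": continuation lines do not start with a number.
RULE_LINE = re.compile(r"^\*\s+(-?\d+(?:\.\d+)?)\s+([A-Z][A-Z0-9_]*)\b")


def rule_scores(report):
    """Map rule name to its displayed score."""
    scores = {}
    for line in report.splitlines():
        match = RULE_LINE.match(line)
        if match:
            scores[match.group(2)] = Decimal(match.group(1))
    return scores


scores = rule_scores(REPORT)
total = sum(scores.values(), Decimal("0"))
print("total:", total)
print("without KAM_PAYPAL1:", total - scores["KAM_PAYPAL1"])
```

The 16 points from KAM_PAYPAL1 outweigh everything else, including the -7.5 from USER_IN_DEF_DKIM_WL. SPF_FAIL itself contributes 0.0 in this report.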


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Reindl Harald



Am 03.12.2015 um 12:41 schrieb Jeroen de Neef:

I'd like to teach my bayes correctly especially since I don't get a lot
of emails, thanks to Reindl's list I will ignore those headers from now on.
But I don't want it to learn that the /*spam*/ in the subject
means that it is spam or ham, is there a way I can remove it before
throwing it at the bayesian filter? Perhaps an extra line in the config
or a bash script?


just add a replace in the php-script i posted, at the point before it 
verifies the new content against the old one to decide if the file needs 
to be rewritten

for such cleanups and anonymizing i use separate scripts to keep the code 
clean, one of them also reads the postfix configuration and replaces own 
domains and email-addresses with "m...@example.com"

"I will ignore those headers from now on" - the ignore configuration is 
not enough, hence the formail script to strip the headers completely from 
the samples

the Received header is a special case - if the samples don't have any 
Received header you get *completely* different bayes results compared 
with an always identical one, hence i strip them all and finally add a 
generic one on top of the file

that also leads to a dramatically reduced token count because in the 
end you have only one token for Received with the same date, time, host
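
The cleanup described above (strip headers, put one identical Received line on top, drop the subject tag) can be sketched in Python with the standard email module. This is a stand-in for illustration, not the php/formail script from the thread. The header list, the generic Received value and the exact subject tag are all hypothetical and need adjusting to the local setup.

```python
from email import message_from_string

# Hypothetical values: adjust to the headers and tag actually in use.
STRIP_HEADERS = ("Received", "X-Spam-Status", "X-Spam-Level", "X-Spam-Report")
GENERIC_RECEIVED = (
    "from localhost (localhost [127.0.0.1]) by mail.example.com; "
    "Thu, 01 Jan 2015 00:00:00 +0000"
)
SUBJECT_TAG = "**spam** "


def clean_sample(raw):
    """Return a training sample with noisy headers removed."""
    msg = message_from_string(raw)
    for name in STRIP_HEADERS:
        del msg[name]  # removes every occurrence, silently if absent
    subject = msg["Subject"]
    if subject is not None and str(subject).startswith(SUBJECT_TAG):
        msg.replace_header("Subject", str(subject)[len(SUBJECT_TAG):])
    # One identical Received header on top of every sample.
    return "Received: " + GENERIC_RECEIVED + "\n" + msg.as_string()


sample = (
    "Received: from a.example by b.example; Thu, 3 Dec 2015 10:00:00 +0100\n"
    "Received: from c.example by a.example; Thu, 3 Dec 2015 09:59:59 +0100\n"
    "Subject: **spam** Hello\n"
    "From: someone@example.com\n"
    "\n"
    "body text\n"
)
cleaned = clean_sample(sample)
print(cleaned)
```

The cleaned text can then be written to a file and fed to sa-learn in the usual way.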






Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Jeroen de Neef
Hello all,

I'd like to teach my bayes correctly especially since I don't get a lot of
emails, thanks to Reindl's list I will ignore those headers from now on.
But I don't want it to learn that the **spam** in the subject means
that it is spam or ham, is there a way I can remove it before throwing it
at the bayesian filter? Perhaps an extra line in the config or a bash
script?

Kind regards,

Jeroen

2015-12-03 11:00 GMT+01:00 Reindl Harald :

>
>
> Am 03.12.2015 um 10:47 schrieb Sebastian Arcus:
>
>> On 03/12/15 01:40, Reindl Harald wrote:
>>
>>>
>>>
>>> Am 03.12.2015 um 01:14 schrieb Alex:
>>>
>>>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren  wrote:
>>>>
>>>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>>>
>>>>>> Perfect - that's exactly the sort of real-life based advice I was
>>>>>> looking for. Many thanks!
>>>>>
>>>>> I run a small shared hosting environment, with a global bayes for
>>>>> all users as not enough users are ready/willing/able to take the
>>>>> time to sort ham (although more will press "this is spam") and in
>>>>> general, the results work out well enough.
>>>>
>>>> A portion of the bayes database is the header information from the
>>>> email. What does it mean for those headers that contain info specific
>>>> to a particular domain or site when it's transferred to another domain
>>>> or site where those specifics will be different?
>>>>
>>>
>>> see attached php/formail-script and list of ignored/stripped headers
>>>
>>> we strip a large portion of headers including especially the Received
>>> headers with "formail" and prepend a generic one on top of all
>>> samples before training them
>>>
>> Does that mean that transferring  bayes databases between sites without
>> stripping the headers wouldn't work - or it is just more effective if
>> one strips the headers?
>>
>
> it worked without stripping them for around 6 months
> but it works better now
>
> see the 77.72% BAYES_00, which would be higher, but some trained ham is
> shortcircuited and so doesn't touch bayes at all
>
> "SPAMMY" means >= BAYES_60 in the stats
>
> BAYES_00      3914   77.72 %
> BAYES_05        87    1.72 %
> BAYES_20       134    2.66 %
> BAYES_40       108    2.14 %
> BAYES_50       288    5.71 %
> BAYES_60        61    1.21 %
> BAYES_80        45    0.89 %
> BAYES_95        34    0.67 %
> BAYES_99       365    7.24 %
> BAYES_999      319    6.33 %
>
> DELIVERED     6609   95.18 %
> DNSWL         6249   90.00 %
> SPF           4586   66.05 %
> SPF/DKIM WL   1880   27.07 %
> SHORTCIRCUIT  1900   27.36 %
>
> BLOCKED        515    7.41 %
> SPAMMY         505    7.27 %   98.05 % (OF TOTAL BLOCKED)
>
>
>
>


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Jari Fredriksson

On 12/03/2015 01:38 PM, Jari Fredriksson wrote:

On 12/03/2015 02:29 AM, Charles Sprickman wrote:
I had a look at that page - I use mysql to store the data, have 
multiple spamd boxes, and spamc on the inbound servers passing mail 
to spamd once all the “front door” checks are done.  In that config, 
I end up with unique per-user bayes tokens.  I’m looking to just pool 
everyone together, but don’t see an obvious way to do that.  It seems 
like folks in this thread are however doing that somehow (perhaps 
just because they are using a milter or similar).


Thanks,

Charles




I have a similar setup. I use "spamc -d spamd -u spam ..." and I think 
that -u spam is all it takes to make it site wide. Not very complex?


br. jarif

Btw, I really have a similar setup, as host name "spamd" points to 
haproxy having multiple back ends on it for spamassassin.


br. jarif


Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Jari Fredriksson

On 12/03/2015 02:29 AM, Charles Sprickman wrote:

I had a look at that page - I use mysql to store the data, have multiple spamd 
boxes, and spamc on the inbound servers passing mail to spamd once all the 
“front door” checks are done.  In that config, I end up with unique per-user 
bayes tokens.  I’m looking to just pool everyone together, but don’t see an 
obvious way to do that.  It seems like folks in this thread are however doing 
that somehow (perhaps just because they are using a milter or similar).

Thanks,

Charles




I have a similar setup. I use "spamc -d spamd -u spam ..." and I think 
that -u spam is all it takes to make it site wide. Not very complex?


br. jarif



Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Reindl Harald



Am 03.12.2015 um 10:47 schrieb Sebastian Arcus:

On 03/12/15 01:40, Reindl Harald wrote:



Am 03.12.2015 um 01:14 schrieb Alex:

On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren  wrote:

On 2015-12-02 09:14, Sebastian Arcus wrote:


Perfect - that's exactly the sort of real-life based advice I was
looking for. Many thanks!


I run a small shared hosting environment, with a global bayes for all
users as not enough users are ready/willing/able to take the time to
sort ham (although more will press "this is spam") and in general, the
results work out well enough.


A portion of the bayes database is the header information from the
email. What does it mean for those headers that contain info specific
to a particular domain or site when it's transferred to another domain
or site where those specifics will be different?


see attached php/formail-script and list of ignored/stripped headers

we strip a large portion of headers including especially the Received
headers with "formail" and prepend a generic one on top of all
samples before training them

Does that mean that transferring  bayes databases between sites without
stripping the headers wouldn't work - or it is just more effective if
one strips the headers?


it worked without stripping them for around 6 months
but it works better now

see the 77.72% BAYES_00, which would be higher, but some trained ham is 
shortcircuited and so doesn't touch bayes at all


"SPAMMY" means >= BAYES_60 in the stats

BAYES_00      3914   77.72 %
BAYES_05        87    1.72 %
BAYES_20       134    2.66 %
BAYES_40       108    2.14 %
BAYES_50       288    5.71 %
BAYES_60        61    1.21 %
BAYES_80        45    0.89 %
BAYES_95        34    0.67 %
BAYES_99       365    7.24 %
BAYES_999      319    6.33 %

DELIVERED     6609   95.18 %
DNSWL         6249   90.00 %
SPF           4586   66.05 %
SPF/DKIM WL   1880   27.07 %
SHORTCIRCUIT  1900   27.36 %

BLOCKED        515    7.41 %
SPAMMY         505    7.27 %   98.05 % (OF TOTAL BLOCKED)







Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Sebastian Arcus

On 03/12/15 01:40, Reindl Harald wrote:



Am 03.12.2015 um 01:14 schrieb Alex:

On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren  wrote:

On 2015-12-02 09:14, Sebastian Arcus wrote:


Perfect - that's exactly the sort of real-life based advice I was 
looking for. Many thanks!


I run a small shared hosting environment, with a global bayes for all 
users as not enough users are ready/willing/able to take the time to 
sort ham (although more will press "this is spam") and in general, the 
results work out well enough.


A portion of the bayes database is the header information from the
email. What does it mean for those headers that contain info specific
to a particular domain or site when it's transferred to another domain
or site where those specifics will be different?


see attached php/formail-script and list of ignored/stripped headers

we strip a large portion of headers including especially the Received 
headers with "formail" and prepend a generic one on top of all 
samples before training them

Does that mean that transferring bayes databases between sites without 
stripping the headers wouldn't work - or is it just more effective if 
one strips the headers?




Re: Is it worth transferring bayes data between different sites?

2015-12-03 Thread Sebastian Arcus


On 03/12/15 00:29, Charles Sprickman wrote:

Reindl Harald  wrote:



Am 02.12.2015 um 21:50 schrieb Charles Sprickman:

Reindl Harald  wrote:


Am 02.12.2015 um 12:51 schrieb Sebastian Arcus:

I hope I'm not exceeding the patience of the list by posting a third
question in two days :-)

I realise the above question is a "soft" question, probably without a
definite "yes" or "no" answer. I am hoping that people with experience
of using SA in various environments might be able to throw in some
opinions. Based on the documentation, it is clearly possible to transfer
a bayes database from one install to another - specially if it is a
sitewide database. What I was wondering is if it is worth doing so from
a results point of view

we use our global bayes on the incoming MX and share it with our submission 
servers to stop outgoing spam from hacked accounts

This is a bit OT, but I have had a hard time finding how to setup a global 
bayes DB rather than having everything done on a per-user basis.  Looking 
around the SA wiki, I don’t see global DBs addressed.  Any tips?

https://wiki.apache.org/spamassassin/SiteWideBayesSetup

in case you are running spamass-milter that's even the logical default because 
your milter is running as its own user, with its own .spamassassin directory 
in the userhome which contains the db

I had a look at that page - I use mysql to store the data, have multiple spamd 
boxes, and spamc on the inbound servers passing mail to spamd once all the 
“front door” checks are done.  In that config, I end up with unique per-user 
bayes tokens.  I’m looking to just pool everyone together, but don’t see an 
obvious way to do that.  It seems like folks in this thread are however doing 
that somehow (perhaps just because they are using a milter or similar).

In case it helps: I use SA with exim - and Exim talks over Unix sockets 
to the spamd daemon. I've used the instructions at the wiki page above to 
set up the sitewide bayes database - but I don't use MySQL - and it all 
seems to work as expected.