Re: Very strange SA result!
On 3 Dec 2015, at 9:36, Joe Quinn wrote:

> On 12/3/2015 9:23 AM, Jari Fredriksson wrote:
>> On 3.12.2015 16.11, Kevin A. McGrail wrote:
>>> You are using KAM.cf which isn't a project ruleset. Please report the issue and a spample at https://raptor.pccc.com/raptor.cgim?template=report_problem
>>>
>>> We can likely look at it quickly and adjust. However, the fact that SPF failed makes me lean towards the rule having fired correctly...
>>>
>>> Regards,
>>> KAM
>>
>> There seems to be something in the SPF detection. SPF claims that paypal is not allowed (by their SPF record) to send mail via my email relay. That relay IS in my trusted_networks.
>>
>> What am I missing now?
>>
>> br. jarif
>
> Probably this bug, which we are still working out a good solution for:
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7182
>
> The SPF RFC has a "MUST" constraint on 10 lookups per SPF check, which Paypal has broken before. The reasoning given is resistance to denial of service attacks via DNS traffic, which makes it a tricky fix.
>
> We'll discuss the KAM.cf issue privately, and bring it back on-list in dev@ if it turns up new information on this issue.

Not in this case. Note that the URL in the SPF_FAIL line indicates emea.e.paypal.com as the sender domain. Not a complex record.
Re: Very strange SA result!
I was now trying to debug with spamassassin -D to find out why SPF fails, but could not. It just works now, and I did not even restart spamd in between... Some temporary hiccup?

Ah well... Sorry for the noise.

On 3.12.2015 16:07, Jari Fredriksson wrote:
> KAM_PAYPAL1 rampant paypal phishing scams
>
> Aarghs!
>
> I found a mail from paypal scored as follows:
>
> X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
>     DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
>     KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
>     RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
>     URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.1
> X-Spam-Orig-To:
> X-Spam-Report:
>     * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
>     *      trust
>     *      [96.47.30.215 listed in list.dnswl.org]
>     *  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
>     *      [URIs: ed4.net]
>     * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
>     *      [96.47.30.215 listed in wl.mailspike.net]
>     *  0.6 URG_BIZ BODY: Contains urgent matter
>     * -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
>     * -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
>     *  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
>     *      [SPF failed: Please see http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
>     * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>     *      [score: 0.]
>     *  1.0 HTML_MESSAGE BODY: HTML included in message
>     *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
>     *      background
>     *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>     * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
>     *      domain
>     * -0.0 DKIM_VERIFIED No description available.
>     * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
>     *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
>     *      valid
>     * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
>     *   16 KAM_PAYPAL1 rampant paypal phishing scams
>     *  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
>     *      information
> X-Spam-Level: ***
> X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on gamecock.fredriksson.dy.fi

--
jarif.bit
Re: Is it worth transferring bayes data between different sites?
Charles Sprickman wrote:
> I had a look at that page - I use mysql to store the data, have multiple
> spamd boxes, and spamc on the inbound servers passing mail to spamd once all
> the “front door” checks are done. In that config, I end up with unique
> per-user bayes tokens. I’m looking to just pool everyone together, but don’t
> see an obvious way to do that. It seems like folks in this thread are
> however doing that somehow (perhaps just because they are using a milter or
> similar).

Really short answer:

bayes_sql_override_username spamassassin

man Mail::SpamAssassin::Conf (IIRC) for details. That directive overrides the spamd per-user behaviour for Bayes, putting it all in one basket.

-kgd
Re: Is it worth transferring bayes data between different sites?
Jari Fredriksson wrote:
> On 12/03/2015 02:29 AM, Charles Sprickman wrote:
>> I had a look at that page - I use mysql to store the data, have
>> multiple spamd boxes, and spamc on the inbound servers passing mail to
>> spamd once all the “front door” checks are done. In that config, I
>> end up with unique per-user bayes tokens. I’m looking to just pool
>> everyone together, but don’t see an obvious way to do that. It seems
>> like folks in this thread are however doing that somehow (perhaps just
>> because they are using a milter or similar).
>
> I have a similar setup. I use "spamc -d spamd -u spam ..." and I think
> that -u spam is all it takes to make it site wide. Not very complex?

That'll give you a global Bayes, but it also eliminates any per-user settings you might want to keep using, because you're not letting spamd "know" about any different users.

We use these settings (watch for word wrap):

---
bayes_store_module            Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn                 DBI:mysql:spamassassin:[ip]
bayes_sql_username            spamassassin
bayes_sql_password
bayes_sql_override_username   spamassassin

# awl
auto_whitelist_factory        Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                  DBI:mysql:spamassassin:[ip]
user_awl_sql_username         spamassassin
user_awl_sql_password
user_awl_sql_table            awl

# userprefs. worksforme(TM)
user_scores_dsn               DBI:mysql:spamassassin:[ip]
user_scores_sql_username      spamassassin
user_scores_sql_password
# Need a custom query to do domainwide settings. Default does not have the third WHERE clause
user_scores_sql_custom_query  SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' OR username = concat('@~',_DOMAIN_) ORDER BY username ASC
# don't pass mail through unscanned if there's an error trying to get userprefs.
# Note that "no userpref entries" is NOT an error.
# Note also this requires a custom patch
user_scores_fallback_to_global 1
---

to have per-user AWL, SQL userprefs including domainwide settings, and global Bayes all in the same MySQL database.

A read of the fine manual page (Mail::SpamAssassin::Conf) will usually turn up all the necessary directives for whatever you're trying to do.

-kgd
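To see which rows the custom userprefs query picks up, here is a rough sketch in Python. It is only an illustration under assumptions: SQLite stands in for MySQL (so `||` replaces `concat()`), the table name `userpref`, the sample users and the sample values are made up, and the `username` column is added to the SELECT purely for display.

```python
import sqlite3

# In-memory stand-in for the MySQL userprefs table (names and rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO userpref VALUES (?, ?, ?)",
    [
        ("alice@example.com", "required_score", "4.0"),  # per-user row
        ("@~example.com", "required_score", "6.0"),      # domain-wide row
        ("@GLOBAL", "required_score", "5.0"),            # site-wide row
        ("bob@example.org", "required_score", "3.0"),    # unrelated user
    ],
)


def prefs_for(username: str):
    """Return the rows matched by the three WHERE clauses, in query order."""
    domain = username.split("@", 1)[1]
    return conn.execute(
        "SELECT username, preference, value FROM userpref "
        "WHERE username = ? OR username = '@GLOBAL' OR username = '@~' || ? "
        "ORDER BY username ASC",
        (username, domain),
    ).fetchall()


rows = prefs_for("alice@example.com")
for row in rows:
    print(row)
```

With these sample rows the global entry sorts first, the domain entry second and the user's own entry last, presumably so that the most specific setting is the last one read. A username that starts with a digit would sort ahead of the `@` rows, so the ordering trick depends on what the usernames look like.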
Re: Very strange SA result!
On 12/3/2015 9:23 AM, Jari Fredriksson wrote:
> On 3.12.2015 16.11, Kevin A. McGrail wrote:
>> You are using KAM.cf which isn't a project ruleset. Please report the issue and a spample at https://raptor.pccc.com/raptor.cgim?template=report_problem
>>
>> We can likely look at it quickly and adjust. However, the fact that SPF failed makes me lean towards the rule having fired correctly...
>>
>> Regards,
>> KAM
>
> There seems to be something in the SPF detection. SPF claims that paypal is not allowed (by their SPF record) to send mail via my email relay. That relay IS in my trusted_networks.
>
> What am I missing now?
>
> br. jarif

Probably this bug, which we are still working out a good solution for:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7182

The SPF RFC has a "MUST" constraint on 10 lookups per SPF check, which Paypal has broken before. The reasoning given is resistance to denial of service attacks via DNS traffic, which makes it a tricky fix.

We'll discuss the KAM.cf issue privately, and bring it back on-list in dev@ if it turns up new information on this issue.
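For reference, the limit counts the terms that cause DNS queries: the `include`, `a`, `mx`, `ptr` and `exists` mechanisms plus the `redirect` modifier (RFC 7208, section 4.6.4). A minimal sketch of counting them in a single record follows. The record below is made up and is not PayPal's, and a real checker would also have to fetch every included or redirected record and add its terms to the total, which this sketch does not do.

```python
# Terms that cost a DNS lookup under RFC 7208 section 4.6.4.
LOOKUP_TERMS = {"include", "a", "mx", "ptr", "exists", "redirect"}
SPF_LOOKUP_LIMIT = 10


def count_lookup_terms(record: str) -> int:
    """Count DNS-querying terms in one SPF record (no recursion, no DNS)."""
    count = 0
    for term in record.split()[1:]:      # skip the "v=spf1" version tag
        name = term.lstrip("+-~?")       # drop the qualifier, if any
        # The term name ends at the first ':', '=' or '/' (e.g. "mx/24").
        for sep in ":=/":
            name = name.split(sep, 1)[0]
        if name.lower() in LOOKUP_TERMS:
            count += 1
    return count


# Hypothetical record for illustration only.
example = "v=spf1 mx a:mail.example.com include:_spf.example.net ip4:192.0.2.0/24 -all"
used = count_lookup_terms(example)
print(f"{used} of {SPF_LOOKUP_LIMIT} lookups used by this record alone")
```

Because every `include` pulls in another record with its own terms, a record that looks short at the top level can still exceed the limit once the whole tree is evaluated.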
Re: Very strange SA result!
On 3.12.2015 16.11, Kevin A. McGrail wrote:
> You are using KAM.cf which isn't a project ruleset. Please report the issue and a spample at https://raptor.pccc.com/raptor.cgim?template=report_problem
>
> We can likely look at it quickly and adjust. However, the fact that SPF failed makes me lean towards the rule having fired correctly...
>
> Regards,
> KAM

There seems to be something in the SPF detection. SPF claims that paypal is not allowed (by their SPF record) to send mail via my email relay. That relay IS in my trusted_networks.

What am I missing now?

br. jarif

> On 12/3/2015 9:07 AM, Jari Fredriksson wrote:
>> KAM_PAYPAL1 rampant paypal phishing scams
>>
>> Aarghs!
>>
>> I found a mail from paypal scored as follows:
>>
>> X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
>>     DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
>>     KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
>>     RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
>>     URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.1
>> X-Spam-Orig-To:
>> X-Spam-Report:
>>     * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
>>     *      trust
>>     *      [96.47.30.215 listed in list.dnswl.org]
>>     *  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
>>     *      [URIs: ed4.net]
>>     * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
>>     *      [96.47.30.215 listed in wl.mailspike.net]
>>     *  0.6 URG_BIZ BODY: Contains urgent matter
>>     * -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
>>     * -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
>>     *  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
>>     *      [SPF failed: Please see http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
>>     * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>>     *      [score: 0.]
>>     *  1.0 HTML_MESSAGE BODY: HTML included in message
>>     *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
>>     *      background
>>     *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>>     * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
>>     *      domain
>>     * -0.0 DKIM_VERIFIED No description available.
>>     * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
>>     *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
>>     *      valid
>>     * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
>>     *   16 KAM_PAYPAL1 rampant paypal phishing scams
>>     *  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
>>     *      information
>> X-Spam-Level: ***
>> X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on gamecock.fredriksson.dy.fi

--
jarif.bit
Re: Is it worth transferring bayes data between different sites?
On Wed, 2 Dec 2015 17:14:22 + Sebastian Arcus wrote:

> On 02/12/15 12:55, Reindl Harald wrote:
> > On 02.12.2015 at 12:51, Sebastian Arcus wrote:
> >> I hope I'm not exceeding the patience of the list by posting a
> >> third question in two days :-)
> >>
> >> I realise the above question is a "soft" question, probably
> >> without a definite "yes" or "no" answer.

Very true.

> > additionally we share our bayes with another company which pulls
> > the dumps if the hash file is different every 30 minutes
> >
> > both we and the other company do mail hosting on ISP level and
> > the results on both sides are perfect - we share even scorings,
> > whitelists, custom body/subject-rules and the summary is: at least
> > in the same country sharing spamfilter configurations works like a
> > charm
>
> Perfect - that's exactly the sort of real-life based advice I was
> looking for. Many thanks!

It's not really surprising that the diverse mail of 2 similar ISPs is similar for Bayes, especially with the headers removed. Whether your ham looks like your client's ham is an entirely different matter. If the ham isn't similar then using your ham-heavy database is likely to be sub-optimal.

There's also the ham:spam ratio - at one point you quoted a figure of 12000:300. An imbalance is not intrinsically wrong, but it could cause problems if you transplant it into a system where new training occurs at a very different ratio. Any new tokens that appear in the second system are heavily skewed towards being treated as spammy.

What's particularly bad is if you strip headers in your corpus and the client then goes on to train without stripping them, because neutral tokens that were stripped then enter the database as heavily spammy.
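A rough illustration of that skew, using the 12000:300 figure quoted in the thread. The formula is a simplified per-token estimate (spam frequency divided by the sum of spam and ham frequencies); it is an assumption made for illustration and not SpamAssassin's exact computation, which applies further adjustments.

```python
def token_spam_probability(spam_hits, ham_hits, total_spam, total_ham):
    """Simplified per-token estimate: spam frequency as a share of both frequencies."""
    spam_freq = spam_hits / total_spam   # fraction of spam messages with the token
    ham_freq = ham_hits / total_ham      # fraction of ham messages with the token
    return spam_freq / (spam_freq + ham_freq)


# Message totals carried over with the transplanted database (from the thread).
TOTAL_HAM, TOTAL_SPAM = 12000, 300

# A new, genuinely neutral token: seen once in ham and once in spam at the new site.
p = token_spam_probability(1, 1, TOTAL_SPAM, TOTAL_HAM)
print(f"{p:.3f}")  # prints 0.976
```

With balanced totals the same token would come out at 0.5. The imported 40:1 imbalance is what pushes a token seen equally often in both classes close to 1.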
Re: Very strange SA result!
You are using KAM.cf which isn't a project ruleset. Please report the issue and a spample at https://raptor.pccc.com/raptor.cgim?template=report_problem

We can likely look at it quickly and adjust. However, the fact that SPF failed makes me lean towards the rule having fired correctly...

Regards,
KAM

On 12/3/2015 9:07 AM, Jari Fredriksson wrote:
> KAM_PAYPAL1 rampant paypal phishing scams
>
> Aarghs!
>
> I found a mail from paypal scored as follows:
>
> X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
>     DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
>     KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
>     RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
>     URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.1
> X-Spam-Orig-To:
> X-Spam-Report:
>     * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
>     *      trust
>     *      [96.47.30.215 listed in list.dnswl.org]
>     *  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
>     *      [URIs: ed4.net]
>     * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
>     *      [96.47.30.215 listed in wl.mailspike.net]
>     *  0.6 URG_BIZ BODY: Contains urgent matter
>     * -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
>     * -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
>     *  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
>     *      [SPF failed: Please see http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
>     * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
>     *      [score: 0.]
>     *  1.0 HTML_MESSAGE BODY: HTML included in message
>     *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
>     *      background
>     *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
>     * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
>     *      domain
>     * -0.0 DKIM_VERIFIED No description available.
>     * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
>     *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
>     *      valid
>     * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
>     *   16 KAM_PAYPAL1 rampant paypal phishing scams
>     *  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
>     *      information
> X-Spam-Level: ***
> X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on gamecock.fredriksson.dy.fi

--
Kevin A. McGrail
CEO
Peregrine Computer Consultants Corporation
3927 Old Lee Highway, Suite 102-C
Fairfax, VA 22030-2422
http://www.pccc.com/
703-359-9700 x50 / 800-823-8402 (Toll-Free)
703-798-0171 (wireless)
kmcgr...@pccc.com
Very strange SA result!
KAM_PAYPAL1 rampant paypal phishing scams

Aarghs!

I found a mail from paypal scored as follows:

X-Spam-Status: Yes, score=7.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,
    DKIM_VALID,DKIM_VALID_AU,DKIM_VERIFIED,HTML_FONT_LOW_CONTRAST,HTML_MESSAGE,
    KAM_PAYPAL1,MIME_HTML_ONLY,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,
    RCVD_IN_MSPIKE_WL,RP_MATCHES_RCVD,SPF_FAIL,T_FILL_THIS_FORM_SHORT,URG_BIZ,
    URIBL_GREY,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.1
X-Spam-Orig-To:
X-Spam-Report:
    * -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
    *      trust
    *      [96.47.30.215 listed in list.dnswl.org]
    *  0.4 URIBL_GREY Contains an URL listed in the URIBL greylist
    *      [URIs: ed4.net]
    * -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3)
    *      [96.47.30.215 listed in wl.mailspike.net]
    *  0.6 URG_BIZ BODY: Contains urgent matter
    * -7.5 USER_IN_DEF_DKIM_WL From: address is in the default DKIM white-list
    * -1.4 RP_MATCHES_RCVD Envelope sender domain matches handover relay domain
    *  0.0 SPF_FAIL SPF: sender does not match SPF record (fail)
    *      [SPF failed: Please see http://www.openspf.net/Why?s=mfrom;id=fdybuw6-6w2q86-ll1e2s-7aamagp-b95mhd-h-m2-20151203-1d62cdfd8632d%40emea.e.paypal.com;ip=212.16.98.57;r=gamecock.fredriksson.dy.fi]
    * -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1%
    *      [score: 0.]
    *  1.0 HTML_MESSAGE BODY: HTML included in message
    *  0.0 HTML_FONT_LOW_CONTRAST BODY: HTML font color similar or identical to
    *      background
    *  0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
    * -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's
    *      domain
    * -0.0 DKIM_VERIFIED No description available.
    * -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature
    *  0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily
    *      valid
    * -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders
    *   16 KAM_PAYPAL1 rampant paypal phishing scams
    *  0.0 T_FILL_THIS_FORM_SHORT Fill in a short form with personal
    *      information
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on gamecock.fredriksson.dy.fi

--
jarif.bit
Re: Is it worth transferring bayes data between different sites?
On 03.12.2015 at 12:41, Jeroen de Neef wrote:
> I'd like to teach my bayes correctly especially since I don't get a lot of emails. Thanks to Reindl's list I will ignore those headers from now on.
> But I don't want it to learn that the **spam** in the subject means that it is spam or ham, is there a way I can remove it before throwing it at the bayesian filter?
> Perhaps an extra line in the config or a bash script?

just add a replace to the php-script i posted, before it verifies the new content against the old one to decide if the file needs to be rewritten

for such cleanups and anonymizing i use separate scripts to keep the code clean, one of them also reads the postfix configuration and replaces own domains and email-addresses with "m...@example.com"

"I will ignore those headers from now on" - the ignore configuration is not enough, hence the formail script to strip the headers completely from the samples

the Received header is a special case - if the samples don't have any Received header you get *completely* different bayes results compared with an always identical one, hence i strip them all and add a generic one at the top of the file

that also leads to a dramatically reduced token number because in the end you have only one token for Received with the same date, time, host
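The cleanup described above could look roughly like the following. This is a Python stand-in and not the PHP/formail script from the thread, which is not reproduced here; the header list, the generic Received line and the exact subject tag are all assumptions chosen for illustration.

```python
from email import message_from_string

# Hypothetical choices: the thread does not show the real list or placeholder line.
STRIP_HEADERS = ["Received", "X-Spam-Status", "X-Spam-Report", "Delivered-To"]
GENERIC_RECEIVED = (
    "from localhost (localhost [127.0.0.1]) by localhost; "
    "Thu, 1 Jan 2015 00:00:00 +0000"
)
SUBJECT_TAG = "**spam**"


def clean_sample(raw: str) -> str:
    """Strip site-specific headers and the subject tag from one training sample."""
    msg = message_from_string(raw)
    for name in STRIP_HEADERS:
        del msg[name]                    # removes every occurrence; no error if absent
    subject = msg["Subject"]
    if subject is not None and SUBJECT_TAG in subject:
        msg.replace_header("Subject", subject.replace(SUBJECT_TAG, "").strip())
    # Put one identical Received line on top of every sample.
    return f"Received: {GENERIC_RECEIVED}\n" + msg.as_string()


sample = (
    "Received: from a.example by b.example; Thu, 3 Dec 2015 12:00:00 +0100\n"
    "Received: from c.example by a.example; Thu, 3 Dec 2015 11:59:58 +0100\n"
    "Subject: **spam** Cheap pills\n"
    "From: someone@example.com\n"
    "\n"
    "body text\n"
)
print(clean_sample(sample))
```

The cleaned text would then be fed to sa-learn in place of the raw message, so that every sample contributes the same single Received line and no tagged subject.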
Re: Is it worth transferring bayes data between different sites?
Hello all,

I'd like to teach my bayes correctly especially since I don't get a lot of emails. Thanks to Reindl's list I will ignore those headers from now on.
But I don't want it to learn that the **spam** in the subject means that it is spam or ham, is there a way I can remove it before throwing it at the bayesian filter?
Perhaps an extra line in the config or a bash script?

Kind regards,
Jeroen

2015-12-03 11:00 GMT+01:00 Reindl Harald:
> On 03.12.2015 at 10:47, Sebastian Arcus wrote:
>> On 03/12/15 01:40, Reindl Harald wrote:
>>> On 03.12.2015 at 01:14, Alex wrote:
>>>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren wrote:
>>>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>>>> Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!
>>>>>
>>>>> I run a small shared hosting environment, with a global bayes for all users as not enough users are ready/willing/able to take the time to sort ham (although more will press "this is spam") and in general, the results work out well enough.
>>>>
>>>> A portion of the bayes database is the header information from the email. What does it mean for those headers that contain info specific to a particular domain or site when it's transferred to another domain or site where those specifics will be different?
>>>
>>> see attached php/formail-script and list of ignored/stripped headers
>>>
>>> we strip a large portion of headers including especially the Received headers with "formail" and prepend a generic one on top of all samples before training on them
>>
>> Does that mean that transferring bayes databases between sites without stripping the headers wouldn't work - or it is just more effective if one strips the headers?
>
> it worked without stripping them for around 6 months
> but it works better now
>
> see the 77.72% BAYES_00 which would be more but some trained ham is
> shortcircuited and so never touches bayes at all
>
> "SPAMMY" means >= BAYES_60 in the stats
>
> BAYES_00       3914   77.72 %
> BAYES_05         87    1.72 %
> BAYES_20        134    2.66 %
> BAYES_40        108    2.14 %
> BAYES_50        288    5.71 %
> BAYES_60         61    1.21 %
> BAYES_80         45    0.89 %
> BAYES_95         34    0.67 %
> BAYES_99        365    7.24 %
> BAYES_999       319    6.33 %
>
> DELIVERED      6609   95.18 %
> DNSWL          6249   90.00 %
> SPF            4586   66.05 %
> SPF/DKIM WL    1880   27.07 %
> SHORTCIRCUIT   1900   27.36 %
>
> BLOCKED         515    7.41 %
> SPAMMY          505    7.27 %   98.05 % (OF TOTAL BLOCKED)
Re: Is it worth transferring bayes data between different sites?
On 12/03/2015 01:38 PM, Jari Fredriksson wrote:
> On 12/03/2015 02:29 AM, Charles Sprickman wrote:
>> I had a look at that page - I use mysql to store the data, have multiple spamd boxes, and spamc on the inbound servers passing mail to spamd once all the “front door” checks are done. In that config, I end up with unique per-user bayes tokens. I’m looking to just pool everyone together, but don’t see an obvious way to do that. It seems like folks in this thread are however doing that somehow (perhaps just because they are using a milter or similar).
>>
>> Thanks,
>> Charles
>
> I have a similar setup. I use "spamc -d spamd -u spam ..." and I think that -u spam is all it takes to make it site wide.
>
> Not very complex?
>
> br. jarif

Btw, I really do have a similar setup: the host name "spamd" points to an haproxy with multiple spamassassin back ends behind it.

br. jarif
Re: Is it worth transferring bayes data between different sites?
On 12/03/2015 02:29 AM, Charles Sprickman wrote:
> I had a look at that page - I use mysql to store the data, have multiple spamd boxes, and spamc on the inbound servers passing mail to spamd once all the “front door” checks are done. In that config, I end up with unique per-user bayes tokens. I’m looking to just pool everyone together, but don’t see an obvious way to do that. It seems like folks in this thread are however doing that somehow (perhaps just because they are using a milter or similar).
>
> Thanks,
> Charles

I have a similar setup. I use "spamc -d spamd -u spam ..." and I think that -u spam is all it takes to make it site wide.

Not very complex?

br. jarif
Re: Is it worth transferring bayes data between different sites?
On 03.12.2015 at 10:47, Sebastian Arcus wrote:
> On 03/12/15 01:40, Reindl Harald wrote:
>> On 03.12.2015 at 01:14, Alex wrote:
>>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren wrote:
>>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>>> Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!
>>>>
>>>> I run a small shared hosting environment, with a global bayes for all users as not enough users are ready/willing/able to take the time to sort ham (although more will press "this is spam") and in general, the results work out well enough.
>>>
>>> A portion of the bayes database is the header information from the email. What does it mean for those headers that contain info specific to a particular domain or site when it's transferred to another domain or site where those specifics will be different?
>>
>> see attached php/formail-script and list of ignored/stripped headers
>>
>> we strip a large portion of headers including especially the Received headers with "formail" and prepend a generic one on top of all samples before training on them
>
> Does that mean that transferring bayes databases between sites without stripping the headers wouldn't work - or it is just more effective if one strips the headers?

it worked without stripping them for around 6 months
but it works better now

see the 77.72% BAYES_00 which would be more but some trained ham is shortcircuited and so never touches bayes at all

"SPAMMY" means >= BAYES_60 in the stats

BAYES_00       3914   77.72 %
BAYES_05         87    1.72 %
BAYES_20        134    2.66 %
BAYES_40        108    2.14 %
BAYES_50        288    5.71 %
BAYES_60         61    1.21 %
BAYES_80         45    0.89 %
BAYES_95         34    0.67 %
BAYES_99        365    7.24 %
BAYES_999       319    6.33 %

DELIVERED      6609   95.18 %
DNSWL          6249   90.00 %
SPF            4586   66.05 %
SPF/DKIM WL    1880   27.07 %
SHORTCIRCUIT   1900   27.36 %

BLOCKED         515    7.41 %
SPAMMY          505    7.27 %   98.05 % (OF TOTAL BLOCKED)
Re: Is it worth transferring bayes data between different sites?
On 03/12/15 01:40, Reindl Harald wrote:
> On 03.12.2015 at 01:14, Alex wrote:
>> On Wed, Dec 2, 2015 at 6:34 PM, Dave Warren wrote:
>>> On 2015-12-02 09:14, Sebastian Arcus wrote:
>>>> Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!
>>>
>>> I run a small shared hosting environment, with a global bayes for all users as not enough users are ready/willing/able to take the time to sort ham (although more will press "this is spam") and in general, the results work out well enough.
>>
>> A portion of the bayes database is the header information from the email. What does it mean for those headers that contain info specific to a particular domain or site when it's transferred to another domain or site where those specifics will be different?
>
> see attached php/formail-script and list of ignored/stripped headers
>
> we strip a large portion of headers including especially the Received headers with "formail" and prepend a generic one on top of all samples before training on them

Does that mean that transferring bayes databases between sites without stripping the headers wouldn't work - or is it just more effective if one strips the headers?
Re: Is it worth transferring bayes data between different sites?
On 03/12/15 00:29, Charles Sprickman wrote:
> Reindl Harald wrote:
>> On 02.12.2015 at 21:50, Charles Sprickman wrote:
>>> Reindl Harald wrote:
>>>> On 02.12.2015 at 12:51, Sebastian Arcus wrote:
>>>>> I hope I'm not exceeding the patience of the list by posting a third question in two days :-)
>>>>>
>>>>> I realise the above question is a "soft" question, probably without a definite "yes" or "no" answer. I am hoping that people with experience of using SA in various environments might be able to throw in some opinions.
>>>>>
>>>>> Based on the documentation, it is clearly possible to transfer a bayes database from one install to another - especially if it is a sitewide database. What I was wondering is if it is worth doing so from a results point of view
>>>>
>>>> we use our global bayes on the incoming MX and share it with our submission servers to stop outgoing spam from hacked accounts
>>>
>>> This is a bit OT, but I have had a hard time finding how to setup a global bayes DB rather than having everything done on a per-user basis. Looking around the SA wiki, I don’t see global DBs addressed. Any tips?
>>
>> https://wiki.apache.org/spamassassin/SiteWideBayesSetup
>>
>> in case you are running spamass-milter that's even the logical default because your milter is running as its own user, with its own .spamassassin directory in the userhome which contains the db
>
> I had a look at that page - I use mysql to store the data, have multiple spamd boxes, and spamc on the inbound servers passing mail to spamd once all the “front door” checks are done. In that config, I end up with unique per-user bayes tokens. I’m looking to just pool everyone together, but don’t see an obvious way to do that. It seems like folks in this thread are however doing that somehow (perhaps just because they are using a milter or similar).

In case it helps: I use SA with Exim, and Exim talks over Unix sockets to the spamd daemon. I've used the instructions at the wiki page above to set up the sitewide bayes database - but I don't use MySQL - and it all seems to work as expected.