Re: How do I reenable AWL on spamassassin 3.3 after upgrade from 3.1

2012-08-02 Thread Adam Katz
>> Den 2012-07-26 17:26, Nißl Reinhard skrev:
>>> reading the manuals, I've discovered that the AWL plugin isn't 
>>> loaded anymore in spamassassin 3.3. Therefore I put the
>>> following lines into local.cf:

> On Fri, 27 Jul 2012 02:57:26 +0200 Benny Pedersen wrote:
>> oh no, do not put loadplugin into *.cf files, it's wrong by design,
>> but so much wiki and bad behavior still continues

Speaking Hawaiian? (wiki == "quick")  Or does the wiki actually suggest
this behavior?

On 07/26/2012 06:14 PM, RW wrote:
> It seems inelegant, but is there a practical reason why this
> shouldn't be done? Some optional plugins such as Botnet and iXhash
> load themselves from their own .cf files.

Yes, there is a practical reason.

In short, .pre files are read before .cf files, allowing all rules
access to all plugins.  If a plugin was loaded in a .cf file, it would
not be available to .cf files that load earlier.



There is a very careful ordering to the loading of files to ensure that
newer versions and overrides are correctly loaded.  Others should
correct me if I have this wrong*

1. Load /etc/spamassassin/*.pre
2. Load the rules directory:
   (a) If /var/lib/spamassassin/[version] exists, load its *.pre files
       and then its *.cf files
   (b) Otherwise, load *.pre then *.cf in /usr/share/spamassassin
3. Load /etc/spamassassin/*.cf
4. Parse in order of loading
5. If an "include" line is encountered, interrupt everything and
   (a) load the named file
   (b) parse the named file

Files within a directory are sourced by asciibetical order (same as
`ls`).  Sub-directories are NOT examined.  Each individual file is read
from top to bottom, pausing for "include" directives as noted above
(this is how the updates area can have a hierarchy).

This lets /etc/spamassassin/local.cf (or wherever your system puts it)
run last, thus allowing you to trump scores and definitions.

Because of the loading order, third party plugins and configs whose
installations suggest /etc/spamassassin should have file names that
asciibetically precede local.cf, ideally starting with two digits and an
underscore, mimicking the SA upstream (e.g. 20_drugs.cf).
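
To make the ordering concrete, here is a minimal Python sketch of steps 1 to 3 from the list above. It is only a toy model under my own assumptions, not SpamAssassin's actual loader: it sorts names in plain byte order, ignores "include" lines, and the function names (sorted_by_ext, load_order, demo) and the temporary directory layout are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


def sorted_by_ext(directory: Path, ext: str) -> List[Path]:
    """Files directly inside `directory` ending in `ext`, in byte order.

    Sub-directories are not descended into, as described above.
    """
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file() and p.suffix == ext]
    return sorted(files, key=lambda p: p.name.encode())


def load_order(site: Path, updates: Path, defaults: Path) -> List[Path]:
    """Toy model of steps 1-3; "include" handling (step 5) is left out."""
    rules = updates if updates.is_dir() else defaults  # step 2: (a), else (b)
    return (
        sorted_by_ext(site, ".pre")       # step 1
        + sorted_by_ext(rules, ".pre")    # step 2
        + sorted_by_ext(rules, ".cf")
        + sorted_by_ext(site, ".cf")      # step 3
    )


def demo() -> List[str]:
    """Build a throwaway layout and return the file names in load order."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        site, defaults = root / "etc", root / "share"
        updates = root / "var"            # deliberately not created
        for directory, names in (
            (site, ["local.cf", "iXhash.cf", "external.cf", "v310.pre", "init.pre"]),
            (defaults, ["20_drugs.cf"]),
        ):
            directory.mkdir()
            for name in names:
                (directory / name).touch()
        return [p.name for p in load_order(site, updates, defaults)]


print(demo())
```

With this layout the sketch lists init.pre, v310.pre, 20_drugs.cf, external.cf, iXhash.cf and finally local.cf, so local.cf gets the last word and anything meant to be overridden by it needs a name that sorts earlier.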


Getting back to your question, this means that if Botnet or iXhash are
depended on before they are loaded, the dependent rule won't load
correctly.  The default install of iXhash doesn't have a problem here
because it's a self-contained item, so it loads the plugin and then uses
it later on in the same file.

Relying on that is not advisable, though.  If you later add rules for
that plugin, say rules querying the third-party iXhash repository from
Spam-Eating Monkey in external.cf, they won't work because external.cf
is read before iXhash.cf, which is where the iXhash plugin gets loaded.
Furthermore, loading a plugin from local.cf prevents third-party
sa-update channels from using it, since they are loaded in step 2 while
local.cf is loaded in step 3.
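
The failure mode can be sketched with a small Python model. Assumptions on my part: the dependent rules are wrapped in "ifplugin ... endif" blocks, the plugin module name is only illustrative, the z_local.pre file name is hypothetical, and skipped_blocks is a made-up helper rather than anything SpamAssassin ships. It walks the files in load order and reports every ifplugin block that is reached before the matching loadplugin line.

```python
def skipped_blocks(files):
    """Return [(file, plugin)] for each `ifplugin` block reached too early.

    `files` is a list of (name, text) pairs, already in load order.
    """
    loaded = set()
    skipped = []
    for name, text in files:
        for line in text.splitlines():
            words = line.split()
            if len(words) < 2:
                continue
            if words[0] == "loadplugin":
                loaded.add(words[1])
            elif words[0] == "ifplugin" and words[1] not in loaded:
                skipped.append((name, words[1]))
    return skipped


# Illustrative module name, not checked against any particular iXhash release.
PLUGIN = "Mail::SpamAssassin::Plugin::iXhash"

# external.cf sorts before iXhash.cf, so its block is parsed first.
CF_ONLY = [
    ("external.cf", f"ifplugin {PLUGIN}\n# extra rules here\nendif"),
    ("iXhash.cf", f"loadplugin {PLUGIN}\nifplugin {PLUGIN}\n# stock rules\nendif"),
]

# Same files, but with the plugin loaded from a (hypothetical) .pre file.
WITH_PRE = [("z_local.pre", f"loadplugin {PLUGIN}")] + CF_ONLY

print(skipped_blocks(CF_ONLY))
print(skipped_blocks(WITH_PRE))
```

The first call reports the block in external.cf as skipped; the second reports nothing, because every .pre file is read before any .cf file.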

It also makes maintenance (and troubleshooting) harder, though
SpamAssassin will take it.  (There are lots of things SA can do that are
ill advised, like meta rules that use the ternary operator.)



The .pre files that live in /etc are kind of stuck named like that
(including init.pre, which is essentially v300.pre) due to their
location (otherwise, upgrading would require wiping them, which is taboo
in the Unix world).  I'd suggest installing an empty "local.pre" were it
not for the fact that this would come /before/ the others.  Maybe a
"z_local.pre" file?


* Footnote:  Methodology.

This should reveal the load order (but not the parse order):

spamassassin --lint -D config 2>&1 | grep -Eo '/.*\.(pre|cf)$' | uniq





Re: Bayes causes long scan times once every 24 hours

2012-08-02 Thread RW
On Thu, 2 Aug 2012 07:37:02 +
Daniel Lemke wrote:

> We have a strange problem with our Bayes filter here: it looks like
> the learning and/or the journal sync regularly causes high scan times
> of about 90 seconds. It occurs once every day at nearly the same time
> (around 8.04pm).
> 
> This is an extract from spamd logs when the problem just occurred:
> http://pastebin.com/iutSEajZ
> 
> Debug log for the corresponding period of time can be found here:
> http://download.jam-software.de/SaWin/spamdDebugLog-Jul26.txt
> 
> We already tried disabling Bayes autolearn and setting
> bayes_journal_max_size to 0, without success. This is kind of curious
> as it occurs every day at the same time, no matter how often journal
> or token sync was done before. So the token or journal file sizes
> apparently don't matter.

If it weren't for its happening at the same time, it would almost
certainly be due to auto-expiry. The standard advice would be to turn off
auto-expiry and expire by running sa-learn --force-expire from cron (or
the Windows equivalent). If you aren't already doing this, it's worth
ruling it out.
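
For the "Windows equivalent", one option is a small Python wrapper started by a scheduled task. This is only a sketch under my own assumptions: sa-learn is on the PATH of the account running the task, auto-expiry has been switched off with "bayes_auto_expire 0" in local.cf, and run_expiry is a hypothetical helper name.

```python
import subprocess
import time


def run_expiry(command=("sa-learn", "--force-expire")):
    """Run the expiry command once and return (exit_code, seconds_taken).

    Raises FileNotFoundError if the command is not on the PATH.
    """
    started = time.monotonic()
    result = subprocess.run(list(command), check=False)
    return result.returncode, time.monotonic() - started
```

Calling run_expiry() from the scheduled script at a quiet hour and logging the two returned values shows how long an expiry run really takes, which helps to confirm or rule out expiry as the cause of the slow scans.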


RE: Bayes causes long scan times once every 24 hours

2012-08-02 Thread Daniel Lemke
> -----Original Message-----
> From: Axb [mailto:axb.li...@gmail.com]
> Sent: Thursday, August 02, 2012 9:53 AM
> To: users@spamassassin.apache.org
> Subject: Re: Bayes causes long scan times once every 24 hours
> Shots in the dark:
>
> - do you have an AV doing a scheduled scan during that period? (or some other
> scheduled job which could be blocking files)
>

No AV - but a backup...
Good point, will check that out! :)

Regards
Daniel






JAM Software GmbH
Managing Director: Joachim Marder
Am Wissenschaftspark 26 * 54296 Trier * Germany
Phone: +49 (0)651-145 653 -0 * Fax: +49 (0)651-145 653 -29
Commercial register number HRB 4920 (AG Wittlich) http://www.jam-software.com


Re: Bayes causes long scan times once every 24 hours

2012-08-02 Thread Axb

On 08/02/2012 09:37 AM, Daniel Lemke wrote:

> We have a strange problem with our Bayes filter here: it looks like the learning
> and/or the journal sync regularly causes high scan times of about 90 seconds.
> It occurs once every day at nearly the same time (around 8.04pm).
>
> This is an extract from spamd logs when the problem just occurred:
> http://pastebin.com/iutSEajZ
>
> Debug log for the corresponding period of time can be found here:
> http://download.jam-software.de/SaWin/spamdDebugLog-Jul26.txt
>
> We already tried disabling Bayes autolearn and setting bayes_journal_max_size
> to 0, without success.
> This is kind of curious as it occurs every day at the same time, no matter how
> often journal or token sync was done before.
> So the token or journal file sizes apparently don't matter.
>
> Bayes files are stored using the default Berkeley DB; the system is Windows 2008
> 64-bit.
>
> Does anybody have an idea what may be the cause for this?


- Your Bayes DB size shouldn't be an issue - we can rule that out.
- No apparent Bayes DB lock issues


Shots in the dark:

- do you have an AV doing a scheduled scan during that period? (or some other
scheduled job which could be blocking files)


- what happens if you put ".spamassassin/bayes_*" outside of the Windows
system32 path? Maybe within your SA dir structure?




probably unhelpful :)

Axb






Bayes causes long scan times once every 24 hours

2012-08-02 Thread Daniel Lemke
We have a strange problem with our Bayes filter here: it looks like the learning
and/or the journal sync regularly causes high scan times of about 90 seconds.
It occurs once every day at nearly the same time (around 8.04pm).

This is an extract from spamd logs when the problem just occurred:
http://pastebin.com/iutSEajZ

Debug log for the corresponding period of time can be found here:
http://download.jam-software.de/SaWin/spamdDebugLog-Jul26.txt

We already tried disabling Bayes autolearn and setting bayes_journal_max_size
to 0, without success.
This is kind of curious as it occurs every day at the same time, no matter how 
often journal or token sync was done before.
So the token or journal file sizes apparently don't matter.

Bayes files are stored using the default Berkeley DB; the system is Windows 2008
64-bit.

Does anybody have an idea what may be the cause for this?






