Re: Shortcircuit scoring problem (3.2.5)
Felix, thank you for the information.

Guenther, yes, you are right, but that is no reason for the plug-in and its options not to work. These options seem convenient and they are described in the documentation, but they did not work in 3.2.3 and still do not work now.

-- View this message in context: http://www.nabble.com/Shortcurcuit-scoring-problem-%283.2.5%29-tp19414806p19513260.html Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Shortcircuit scoring problem (3.2.5)
Hello,

I am using version 3.2.5 and, to reduce spam and the SA footprint, I turned on the shortcircuit plug-in. I use the standard 60_shortcircuit.cf. At first I enabled only the whitelist/blacklist rules, and this works great. Now, due to the increasing amount of spam, I decided to turn on the bayes shortcircuit rules.

The Shortcircuit doc (http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html) says that in the case of "spam" it will "override the default score of this rule with the score from shortcircuit_spam_score", but this was not true: the plug-in does not change the score, which remains 3.5. That is lower than my threshold, yet further checking stops, so the situation is worse than without this rule. I have started receiving much more spam (with a score of 3.5 from bayes).

So I explicitly added "shortcircuit_spam_score 100" and "shortcircuit_ham_score 100" to this file, but this did not help. The result was the same; it looks like these options do not work at all.

I suppose it is possible to change the bayes_99 score in the appropriate file, but I do not think that would be the correct fix.

Thanks.

-- View this message in context: http://www.nabble.com/Shortcurcuit-scoring-problem-%283.2.5%29-tp19414806p19414806.html
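The arithmetic behind this report can be illustrated with a toy model. The sketch below is not SpamAssassin code: the `scan` function, the two extra rule names and their scores, and the threshold of 5.0 are all assumptions made for illustration (the post only says that 3.5 is below the poster's threshold). It compares a full scan, a short-circuit that keeps the rule's default score (the behaviour reported above), and a short-circuit that overrides the score (the behaviour the documentation describes).

```python
# Toy model (NOT SpamAssassin code): effect of a short-circuit rule on the
# final score, with and without a score override.

def scan(rule_hits, shortcircuit_rule=None, override_score=None):
    """Sum rule scores in order and stop at the short-circuit rule.

    rule_hits: list of (rule_name, score) pairs in evaluation order.
    override_score: score used for the short-circuit rule instead of its
    default; models the documented shortcircuit_spam_score behaviour.
    """
    total = 0.0
    for name, score in rule_hits:
        if name == shortcircuit_rule:
            total += score if override_score is None else override_score
            break  # rules after this point are never evaluated
        total += score
    return total

THRESHOLD = 5.0  # assumed threshold for this example

# BAYES_99 at 3.5 comes from the post; the other two rules are hypothetical.
hits = [
    ("BAYES_99", 3.5),
    ("HYPOTHETICAL_BODY_RULE", 2.0),
    ("HYPOTHETICAL_URI_RULE", 1.5),
]

full_scan = scan(hits)                                      # all rules run
reported = scan(hits, "BAYES_99")                           # score not overridden
documented = scan(hits, "BAYES_99", override_score=100.0)   # score overridden

for label, value in [("full scan", full_scan),
                     ("reported", reported),
                     ("documented", documented)]:
    print(f"{label:10s} score={value:6.1f} spam={value >= THRESHOLD}")
```

In this model the reported behaviour is the worst of the three: the scan stops early, but the message keeps a score below the threshold and is delivered.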
Re: Suggestion to developers
Matt Kettler-3 wrote:
>
> Sure, some messages will bail out faster, but most messages will take
> much longer to scan. How is that better?
>
> I don't debate that the basic idea of having SA do this "automagically"
> would be a great thing. However, the reality of doing it efficiently is
> much trickier than you think.
>
> At one point, one idea was to run all the negative scoring rules, and
> then run the positive scoring ones, and bail out if the score went over
> the spam threshold during the positive phase.
>
> The end result of that test was abysmally slow, due to having to scan
> the message in two passes (negative header, negative body, positive
> header, positive body).
>

I trust you. And any reordering may well affect performance (the original ruleset is carefully tuned). Unfortunately, I don't know the order in which rules are processed (is it the load order set by the leading numbers in the config filenames?). But I see that shortcircuit does reorder some rules (bayes, whitelists and a few others) and nothing dramatic happens. What is more, this plug-in is recommended for use (in properly set up installations).

Of course, if we consider the abstract case where negative rules may fire in the body as well as in the header, in unpredictable quantity and order, and reordering is impossible, this idea has no right to live. But in reality almost all negative rules concern the header, with a single exception: bayes. And that test (bayes) is moved to the top by shortcircuit (before all header tests) without harming performance. I think this situation (all negatives come from the header) will be preserved in future versions of SA, because of the nature of email messages.

So I think it is possible to check the accumulated score after [prioritized rules + header rules] (and during body rules), without any sorting if sorting is undesirable.
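The proposal in the last paragraph can be sketched as a single pass with a cut-off that only becomes active once the header phase is complete. This is a minimal illustration, not how SpamAssassin is implemented: the `classify` function, the rule names, the scores, and the threshold of 5.0 are hypothetical, and the sketch relies on the post's assumption that every negative rule lives in the header phase, so the score can only rise during the body phase.

```python
# Sketch (NOT SpamAssassin code): one pass, score checked after the header
# phase and after every body-rule hit. Assumes all negative-scoring rules
# are header rules, as the post argues.

def classify(header_rules, body_rules, message, required_score=5.0):
    """Return (is_spam, score, rules_evaluated).

    Each rule is a (name, score, predicate) tuple; predicate takes the message.
    """
    score = 0.0
    evaluated = 0

    # Phase 1: run every header rule, including all negative ones.
    for _name, points, test in header_rules:
        evaluated += 1
        if test(message):
            score += points
    if score >= required_score:
        return True, score, evaluated

    # Phase 2: body rules are positive only, so bailing out is safe.
    for _name, points, test in body_rules:
        evaluated += 1
        if test(message):
            score += points
            if score >= required_score:
                return True, score, evaluated

    return False, score, evaluated


# Hypothetical rules for the example.
header_rules = [
    ("WHITELISTED_SENDER", -100.0, lambda m: m["from"] == "friend@example.com"),
    ("SHOUTING_SUBJECT", 3.0, lambda m: m["subject"].isupper()),
]
body_rules = [
    ("BUY_NOW", 3.0, lambda m: "buy now" in m["body"]),
    ("CLICK_HERE", 3.0, lambda m: "click here" in m["body"]),
    ("SLOW_RULE", 1.0, lambda m: len(m["body"]) > 10),
]

stranger = {"from": "x@example.net", "subject": "CHEAP PILLS",
            "body": "buy now click here"}
friend = dict(stranger, **{"from": "friend@example.com"})

spam_result = classify(header_rules, body_rules, stranger)
ham_result = classify(header_rules, body_rules, friend)
print(spam_result, ham_result)
```

The spam example stops after three of the five rules, while the whitelisted sender is still scanned in full and stays ham, because the negative rule has already been applied before any cut-off is considered.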
-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12674988
Re: Suggestion to developers
Matt Kettler-3 wrote:
>
>> 1. Using this method, admin must understand that the fate of every message
>> (for all users) will depend from the single rule.
> Not if you set it up properly.. You can have multiple rules run with a
> very early priority (low number), then have another one run with a
> semi-early priority which does shortcircuiting. All of the "very early"
> rules will be involved in the decision to shortcircuit or not.
>

Yes, but low-numbered rules may not generate any points, and the decision may still depend on a single rule. This does not change anything. What is more (see (2), with which you agreed), in the default configuration that rule will be bayes, which generates only 3.5 points (not counting white/black lists, because in most cases they will not be set up properly). And I think the number of people who do not wish to reorder the standard rules will be much greater than the number of "semi-professional" admins.

Matt Kettler-3 wrote:
>
>> 2. I suspect that not every admin could be smart enough or have enough time
>> to develop his own rulesets with shortcircuit involved to get really good
>> and reliable results. But, he could be able to turn some option in config
>> file and restart SA.
>>
> Agreed.
>> 3. Method proposed by me is not mutually exclusive with shortcircuit. They
>> could work together.
>>
> Yes, but the method you proposed is only feasible using these tools
> anyway. SA can't "auto-sort" the rules in any reasonable way without
> severely degrading performance, or risking serious miscategorization
> problems.
>

But, as we can see, an option named "priority" exists. That means SA really does some kind of sorting. And, theoretically, a user can assign any priority to any rule and SA will still work as a stable product, won't it? The sort order could be: negative rules first, then sorted positive common rules. Any user-defined rules, if they exist, should be checked after the negative ones and before the positive ones. Of course, sorting should be performed once, at load time. Or such a cut-off may work without any sorting; sorting is optional. The standard priorities could be enough, if they are set up.

Matt Kettler-3 wrote:
>
> Trust me, the topic isn't new, and shortcircuit/priority is about the
> best you can do. You have to make those manual decisions.
>
> Now, it's possible for the devs to be the deciders, not the end-admins,
> but someone has to manually prioritize.
>

Thank you. I just want to draw the developers' attention to this problem. Every other message here is about performance.

-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12653743
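The ordering suggested in this message (negative rules, then user-defined rules, then the common positive rules) can be sketched as a one-time sort at load time. This is an illustration only: the `Rule` class, the `load_order` function and the rule names are hypothetical, and the choice to sort the common positive rules from highest to lowest score is an assumption, since the message does not say which key the "sorted" positive rules should use.

```python
# Sketch (NOT SpamAssassin code): compute the evaluation order once, when the
# rules are loaded. Order: negative rules, user-defined rules, then common
# positive rules sorted by score (highest first; this key is an assumption).
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    score: float
    user_defined: bool = False


def load_order(rules):
    """Return the rules in evaluation order; the input list is not modified."""
    def group(rule):
        if rule.score < 0:
            return 0  # negative rules always run first
        if rule.user_defined:
            return 1  # then the admin's own rules
        return 2      # then the common positive rules

    def key(rule):
        g = group(rule)
        # Only the common positive group is sorted by score; sorted() is
        # stable, so the other groups keep their load order.
        return (g, -rule.score if g == 2 else 0.0)

    return sorted(rules, key=key)


# Hypothetical rules in arbitrary load order.
rules = [
    Rule("COMMON_LOW", 0.5),
    Rule("WHITELIST", -100.0),
    Rule("LOCAL_RULE", 2.0, user_defined=True),
    Rule("COMMON_HIGH", 3.5),
    Rule("HAM_HINT", -1.0),
]

ordered_names = [rule.name for rule in load_order(rules)]
print(ordered_names)
```

Because the sort runs once at load time, its cost does not depend on the number of messages scanned; whether the resulting order is actually faster per message is the point Matt disputes above.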
Re: Suggestion to developers
Matt Kettler-3 wrote:
>
> SA 3.2.x already does this, you just need to know how. Read the docs on
> the shortcircuit plugin, and the "priority" option for rules:
>
> Shortcircuit allows you to define when to "bail out"
> http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
>

Thank you for the very useful information. This method and plug-in could really make checking faster. But I have to say:

1. Using this method, the admin must understand that the fate of every message (for all users) will depend on a single rule. In some cases this does not look like enough, especially when the system is used by multiple users whose typical message content differs widely. So, in the default configuration, bayes may generate false positives.
2. I suspect that not every admin is smart enough or has enough time to develop his own rulesets with shortcircuit to get really good and reliable results. But he would be able to turn on an option in the config file and restart SA.
3. The method I proposed is not mutually exclusive with shortcircuit. They could work together.

Thanks.

-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12651905
RE: Suggestion to developers
> The most effective way I've found to lower the SA footprint is to limit
> the mail that gets to it by using some triage on the MTA side. SA as a
> standalone tool might benefit from some kind of triage functionality to
> kill messages immediately as per a "blacklist" rule. The blacklist
> rule(s) would be run against the messages before the normal ruleset was
> applied. If any of the blacklist rules were triggered, the message
> would be dropped without further scanning.
>

I am not sure that messages are dropped after a positive blacklist check. As far as I can see, SA just adds 100 points to the message and continues checking. And I am not sure about the order in which rules are checked.

-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12638431
RE: Suggestion to developers
Of course, this would not be simple to implement, but I think that, as SA becomes heavier, the developers will be forced to find ways of "scissoring". To preserve negative scores, SA could run those rules first. And, while sorting, SA should take into account possible dependencies between rules: read all rules from all config files and build a forest of rule trees. I think SA does this anyway, and all custom rules will be included in the set of rules in memory. The sort order, for simplicity, could be from rules with a high score to ones with a low score. Even this could help greatly.

Skip Brott wrote:
>
> In order to implement something like this, you would need to know the order
> of rules processing (which perhaps there is one - but I don't know it). You
> would need to be careful if you have rules which will assign negative scores
> which typically do so after other rules have already given positive ones.
> Every SA implementation would be unique, so SA would have to be modified to
> run some specific rule sets first before any others (maybe it does now?)
> and you would then want to make certain your custom scores go into those
> files. In my own implementation, I put my custom rules into a unique .cf
> file which I have created so I can distinguish it from other rule sets. The
> "out-of-the-box" SA wouldn't run this file first (unless SA can be modified
> to read a designated file before it reads others).
>
> -----Original Message-----
> From: Crocomoth [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 12, 2007 9:42 AM
> To: users@spamassassin.apache.org
> Subject: Suggestion to developers
>
>
> SpamAssassin is a really great product.
> But, it is perl-based and checks every message with a lot of (all) rules (,
> always!).
> Volume of spam is constantly increasing, as well as CPU and memory load that
> SA creates on servers.
> As a SA user, I would be happy to have the following possibility in the next
> version:
> 1. Add an option which will allow to limit number of rules run against every
> message. I.e., if the limit of spam points is reached to required_score,
> stop further checking and process the message as a spam.
> I think, not all users really interested in gathering all statistics about
> all spam messages.
> 2. According to (1), it makes sense to sort all rules from lightweight to
> heavyweight (including ones which require internet queries) and make
> checking in this order.
>
> This could allow to lower SA footprint.
> Thanks.
>
> --
> View this message in context:
> http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12637043
>

-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12638411
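The ordering described at the top of this reply (negative rules first, dependencies respected, otherwise high score to low score) amounts to a topological sort with a score-based tie-break. The sketch below shows one way to do that with Kahn's algorithm and a heap. It is not SpamAssassin code: the `order_rules` function, the rule names, the scores and the dependency table are hypothetical, and it assumes every dependency is itself a known rule.

```python
# Sketch (NOT SpamAssassin code): order rules so that dependencies run before
# the rules that need them; among rules that are ready, negative rules go
# first and the rest go from high score to low score.
import heapq


def order_rules(scores, depends_on):
    """scores: {rule: score}; depends_on: {rule: set of rules that must run first}.

    Assumes every rule named in depends_on also appears in scores.
    """
    remaining = {rule: set(depends_on.get(rule, ())) for rule in scores}
    dependents = {rule: [] for rule in scores}
    for rule, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(rule)

    def entry(rule):
        # False sorts before True, so negative rules come first; then higher
        # scores first; the name breaks ties deterministically.
        return (scores[rule] >= 0, -scores[rule], rule)

    ready = [entry(rule) for rule, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        rule = heapq.heappop(ready)[2]
        order.append(rule)
        for child in dependents[rule]:
            remaining[child].discard(rule)
            if not remaining[child]:
                heapq.heappush(ready, entry(child))

    if len(order) != len(scores):
        raise ValueError("dependency cycle among rules")
    return order


# Hypothetical rule set: META_AB can only run after A_SUB and B_SUB.
scores = {"A_SUB": 0.5, "B_SUB": 1.0, "META_AB": 4.0,
          "BIG": 3.0, "SMALL": 0.1, "WHITELIST": -100.0}
depends_on = {"META_AB": {"A_SUB", "B_SUB"}}

rule_order = order_rules(scores, depends_on)
print(rule_order)
```

Note that the highest-scoring rule here, META_AB, cannot run first because it waits for its two low-scoring sub-rules, which is one reason a pure sort by score is not enough.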
Suggestion to developers
SpamAssassin is a really great product. But it is Perl-based and checks every message against a lot of rules (all of them, always!). The volume of spam is constantly increasing, as are the CPU and memory load that SA creates on servers. As an SA user, I would be happy to have the following in the next version:

1. Add an option to limit the number of rules run against every message. That is, if the accumulated spam score reaches required_score, stop further checking and process the message as spam. I think not all users are really interested in gathering full statistics about every spam message.
2. Given (1), it makes sense to sort all rules from lightweight to heavyweight (including ones which require internet queries) and run the checks in this order.

This could lower the SA footprint.

Thanks.

-- View this message in context: http://www.nabble.com/Suggestion-to-developers-tf4429767.html#a12637043
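The two suggestions can be sketched together as follows. This is a toy illustration, not SpamAssassin code: the `scan_with_cutoff` function, the rule names, the scores and the "cost" figures are hypothetical, and the sketch assumes only positive-scoring rules. With negative rules in the set, stopping early could mark as spam a message that a later negative rule would have rescued.

```python
# Sketch (NOT SpamAssassin code): run rules from cheapest to most expensive
# and stop as soon as the accumulated score reaches required_score.
# Assumes every rule has a positive score.

def scan_with_cutoff(rules, message, required_score=5.0):
    """Return (is_spam, score, cost_spent).

    Each rule is a dict with "name", "score", "cost" (an abstract cost unit)
    and "test" (a predicate taking the message).
    """
    score = 0.0
    cost_spent = 0
    for rule in sorted(rules, key=lambda r: r["cost"]):
        cost_spent += rule["cost"]
        if rule["test"](message):
            score += rule["score"]
        if score >= required_score:
            break  # the verdict is already "spam"; skip the costlier rules
    return score >= required_score, score, cost_spent


# Hypothetical rules; the network lookup is by far the most expensive.
rules = [
    {"name": "NETWORK_LOOKUP", "score": 2.0, "cost": 50,
     "test": lambda m: m["listed"]},
    {"name": "BODY_PHRASE", "score": 2.5, "cost": 2,
     "test": lambda m: "buy now" in m["body"]},
    {"name": "SUBJECT_CAPS", "score": 3.0, "cost": 1,
     "test": lambda m: m["subject"].isupper()},
]

spam = {"subject": "CHEAP PILLS", "body": "buy now", "listed": True}
ham = {"subject": "Lunch on Friday?", "body": "see you there", "listed": False}

spam_result = scan_with_cutoff(rules, spam)
ham_result = scan_with_cutoff(rules, ham)
print(spam_result, ham_result)
```

In this model the saving applies only to messages that reach the threshold early; a clean message still pays for every rule, including the network lookup.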