Re: Shortcircuit scoring problem (3.2.5)

2008-09-16 Thread Crocomoth

Felix,
Thank you for the information.

guenther,
Yes, you are right, but that does not excuse plug-ins and options that do
not work.
These options seem convenient and are described in the documentation, but
they did not work in 3.2.3 and still do not work.

-- 
View this message in context: 
http://www.nabble.com/Shortcurcuit-scoring-problem-%283.2.5%29-tp19414806p19513260.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Shortcircuit scoring problem (3.2.5)

2008-09-10 Thread Crocomoth

Hello,

I am using version 3.2.5 and, to reduce spam and the SA footprint, I turned
on the shortcircuit plug-in.
I use the standard 60_shortcircuit.cf.
At first I enabled only the whitelist/blacklist rules, and this works
great.
Now, because of the increasing amount of spam, I decided to turn on the
bayes shortcircuit rules as well.
The Shortcircuit doc
(http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html)
says that in the "spam" case it will "override the default score of this rule
with the score from shortcircuit_spam_score", but this turned out not to be
true. The plug-in does not change the score: it remains 3.5, which is lower
than my threshold. Because the shortcircuit also stops further checking, the
situation is worse than without this rule: I have started receiving much more
spam (with a score of 3.5 from bayes).
I then added "shortcircuit_spam_score 100" and "shortcircuit_ham_score 100"
to this file explicitly, but this did not help. The result was the same, so
it looks like these options do not work at all.
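
To make the arithmetic concrete, here is a minimal Python sketch of the two behaviours described above. It is an illustrative model, not SpamAssassin code. The function name `final_score`, the threshold of 5.0 and the two extra rules are assumptions made up for the example; only the 3.5 bayes score and the value 100 come from this message.

```python
# Illustrative model only; this is not how SpamAssassin is implemented.
REQUIRED_SCORE = 5.0            # assumed threshold (the message only says it is above 3.5)
SHORTCIRCUIT_SPAM_SCORE = 100.0  # value set explicitly in the message


def final_score(hits, shortcircuit_rule, override):
    """Sum rule scores in order, stopping at the shortcircuit rule.

    hits: list of (rule_name, score) in evaluation order.
    override: if True, the shortcircuit rule's score is replaced by
    SHORTCIRCUIT_SPAM_SCORE (the documented behaviour); if False, its
    default score is kept (the behaviour reported in this message).
    """
    total = 0.0
    for name, score in hits:
        if name == shortcircuit_rule:
            total += SHORTCIRCUIT_SPAM_SCORE if override else score
            break  # rules after the shortcircuit never run
        total += score
    return total


# Hypothetical message: bayes fires first, two made-up rules would fire later.
hits = [
    ("BAYES_99", 3.5),
    ("HYPOTHETICAL_BODY_RULE", 2.0),
    ("HYPOTHETICAL_URI_RULE", 1.5),
]

observed = final_score(hits, "BAYES_99", override=False)
documented = final_score(hits, "BAYES_99", override=True)
full_scan = sum(score for _, score in hits)

print(observed, documented, full_scan)
```

In this model the shortcircuit without the override ends below the assumed threshold, while both the documented override and a full scan end above it, which matches the symptom of more spam getting through.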

I could change the bayes_99 score in the appropriate file, but I do not
think that would be the correct fix.

Thanks.




Re: Suggestion to developers

2007-09-14 Thread Crocomoth


Matt Kettler-3 wrote:
> 
> Sure, some messages will bail out faster, but most messages will take
> much longer to scan. How is that better?
> 
> I don't debate that the basic idea of having SA do this "automagically"
> would be a great thing. However, the reality of doing it efficiently is
> much trickier than you think.
> 
> At one point, one idea was to run all the negative scoring rules, and
> then run the positive scoring ones, and bail out if the score went over
> the spam threshold during the positive phase.
> 
> The end result of that test was abysmally slow, due to having to scan
> the message in two passes (negative header, negative body, positive
> header, positive body).
> 

I trust you.
And, probably, any reordering may affect performance (the original ruleset
is carefully tuned).
Unfortunately, I don't know the order in which rules are processed (is it
the load order set by the leading numbers in the config file names?).
But I see that shortcircuit does reorder some rules (bayes, whitelists and
some others) and nothing dramatic happens. What is more, this plug-in is
recommended for use (in properly set up installations).

Of course, in the abstract case where negative rules may occur in the body
as well as in the header, in unpredictable quantity and order, and
reordering is impossible, this idea has no right to live.
But in reality we see that almost all negative rules concern the header,
with one exception: bayes.
And this test (bayes) is moved to the top by shortcircuit (before all header
tests), and this does not harm performance.
I think this situation (all negatives come from the header) will be
preserved in future versions of SA, because of the nature of email messages.
So I think it is possible to check the accumulated points after
[prioritized rules + header rules] (and during the body rules), without any
sorting if sorting is undesirable.
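
The check described above can be sketched in Python as follows. This is only a model of the idea, not SpamAssassin internals: the function `scan`, the phase layout and all rule names except bayes are hypothetical, and it assumes, as argued above, that no negative-scoring rule runs in the body phase.

```python
# Sketch only: phases and rule names are hypothetical, not SpamAssassin internals.
def scan(phases, required_score):
    """Run rules phase by phase with an early exit.

    phases: list of (phase_name, check_each_rule, [(rule, score), ...]).
    The running total is compared with required_score at the end of each
    phase, and also after every rule when check_each_rule is True.
    Returns the total and the list of rules actually evaluated.
    """
    total = 0.0
    evaluated = []
    for _phase_name, check_each_rule, rules in phases:
        for rule, score in rules:
            total += score
            evaluated.append(rule)
            if check_each_rule and total >= required_score:
                return total, evaluated
        if total >= required_score:
            return total, evaluated
    return total, evaluated


phases = [
    # No per-rule check here: a negative header rule may still follow.
    ("prioritized", False, [("BAYES_99", 3.5)]),
    ("header", False, [("H_POS", 3.0), ("H_NEG", -2.0)]),
    # Assumed to contain only positive rules, so checking per rule is safe.
    ("body", True, [("B1", 1.0), ("B2", 4.0)]),
]

total, evaluated = scan(phases, required_score=5.0)
print(total, evaluated)
```

Here the running total passes 5.0 in the middle of the header phase, but the negative header rule is still applied because the first check comes only at the end of that phase. The scan then stops inside the body phase and B2 is never evaluated.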




Re: Suggestion to developers

2007-09-13 Thread Crocomoth


Matt Kettler-3 wrote:
> 
>> 1. Using this method, admin must understand that the fate of every
>> message (for all users) will depend on a single rule.
> Not if you set it up properly. You can have multiple rules run with a
> very early priority (low number), then have another one run with a
> semi-early priority which does shortcircuiting. All of the "very early"
> rules will be involved in the decision to shortcircuit or not.
> 

Yes, but low-numbered rules may not generate any points, and the decision may
still depend on a single rule. This does not change anything. What is more
(see (2), with which you agreed), in the default configuration this will be
bayes, which generates only 3.5 points (not counting white/black lists,
because in most cases they will not be set up properly).
And I think the number of people who do not wish to reorder the standard
rules will be much greater than the number of "semi-professional" admins.
 

Matt Kettler-3 wrote:
> 
>> 2. I suspect that not every admin could be smart enough or have enough
>> time to develop his own rulesets with shortcircuit involved to get
>> really good and reliable results. But, he could be able to turn some
>> option in config file and restart SA.
> Agreed.
>> 3. Method proposed by me is not mutually exclusive with shortcircuit.
>> They could work together.
> Yes, but the method you proposed is only feasible using these tools
> anyway. SA can't "auto-sort" the rules in any reasonable way without
> severely degrading performance, or risking serious miscategorization
> problems.
> 

But, as we can see, an option named "priority" exists.
That means SA really does some kind of sorting.
And, theoretically, a user can assign any priority to any rule and SA will
still work as a stable product, won't it?
The sort order could be: negative rules first, then the common positive
rules, sorted. Any user-defined rules, if they exist, should be checked
after the negative ones and before the positive ones. Of course, the sorting
should be performed once, at load time.

Or such a cut-off could work without any sorting; that part is optional. The
standard priorities could be enough, if they are set up.
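
As a rough Python illustration of the load-time ordering proposed above (negative rules, then user-defined rules, then the common positive rules): the function `load_order`, the rule names and the choice to sort positive rules from high score to low are all assumptions for the example, not SpamAssassin behaviour.

```python
# Sketch only: a one-time sort at load, with hypothetical rule names.
def load_order(rules, user_defined):
    """Return rule names in the proposed evaluation order.

    rules: dict mapping rule name -> score.
    user_defined: set of rule names that come from the admin's own files.
    Order: negative rules, then user-defined rules, then the remaining
    positive rules. Within a group, higher scores come first (assumed).
    """
    def key(name):
        score = rules[name]
        if name in user_defined:
            group = 1
        elif score < 0:
            group = 0
        else:
            group = 2
        return (group, -score, name)

    return sorted(rules, key=key)


rules = {
    "POS_SMALL": 0.5,
    "NEG_WHITELIST": -100.0,
    "MY_RULE": 1.0,
    "POS_BIG": 3.5,
}
order = load_order(rules, user_defined={"MY_RULE"})
print(order)
```

The sort runs once when the rules are loaded, so it adds nothing to the per-message cost in this model.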


Matt Kettler-3 wrote:
> 
> Trust me, the topic isn't new, and shortcircuit/priority is about the
> best you can do. You have to make those manual decisions.
> 
> Now, it's possible for the devs to be the deciders, not the end-admins,
> but someone has to manually prioritize.
> 

Thank you.
I just want to draw the developers' attention to this problem.
Every other message here is about performance.




Re: Suggestion to developers

2007-09-13 Thread Crocomoth


Matt Kettler-3 wrote:
> 
> SA 3.2.x already does this, you just need to know how. Read the docs on
> the shortcircuit plugin, and the "priority" option for rules:
> 
> Shortcircuit allows you to define when to "bail out"
> http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
> 

Thank you for the very useful information.
This method and plug-in could really make checking faster.
But I have to say:
1. With this method, the admin must understand that the fate of every message
(for all users) will depend on a single rule. In some cases that does not
seem to be enough, especially when the system is used by multiple users whose
typical message content differs widely. So bayes may generate false
positives in the default configuration.
2. I suspect that not every admin is smart enough or has enough time to
develop his own rulesets with shortcircuit involved and get really good and
reliable results. But he would be able to turn on an option in a config file
and restart SA.
3. The method I proposed is not mutually exclusive with shortcircuit. They
could work together.

Thanks.




RE: Suggestion to developers

2007-09-12 Thread Crocomoth



> The most effective way I've found to lower the SA footprint is to limit
> the mail that gets to it by using some triage on the MTA side.  SA as a
> standalone tool might benefit from some kind of triage functionality to
> kill messages immediately as per a "blacklist" rule.  The blacklist
> rule(s) would be run against the messages before the normal ruleset was
> applied.  If any of the blacklist rules were triggered, the message
> would be dropped without further scanning.  
> 

I am not sure that messages which hit a blacklist rule will be dropped.
As far as I can see, SA just adds 100 points to the message and continues
checking.
And I am not sure about the order of rules in the checking process.




RE: Suggestion to developers

2007-09-12 Thread Crocomoth

Of course, this would not be simple to implement, but I think that as SA
becomes heavier, the developers will be forced to find ways of "scissoring".
To preserve negative scores, SA could run those rules first.
And, while sorting, SA should take into account possible dependencies
between rules: read all rules from all config files and build a forest of
rule trees. I think SA does this anyway, and all custom rules will be
included in the set of rules in memory.
The sort order, for simplicity, could be from rules with a high score to
ones with a low score.
And even this could help greatly.


Skip Brott wrote:
> 
> In order to implement something like this, you would need to know the
> order of rules processing (perhaps there is one, but I don't know it).
> You would need to be careful if you have rules which will assign negative
> scores which typically do so after other rules have already given
> positive ones. Every SA implementation would be unique, so SA would have
> to be modified to run some specific rule sets first before any others
> (maybe it does now?) and you would then want to make certain your custom
> scores go into those files.  In my own implementation, I put my custom
> rules into a unique .cf file which I have created so I can distinguish it
> from other rule sets. The "out-of-the-box" SA wouldn't run this file
> first (unless SA can be modified to read a designated file before it
> reads others).
> 
> -----Original Message-----
> From: Crocomoth [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 12, 2007 9:42 AM
> To: users@spamassassin.apache.org
> Subject: Suggestion to developers
> 
> 
> SpamAssassin is a really great product.
> But, it is perl-based and checks every message with a lot of (all) rules
> (, always!).
> Volume of spam is constantly increasing, as well as CPU and memory load
> that SA creates on servers.
> As a SA user, I would be happy to have the following possibility in the
> next version:
> 1. Add an option which will allow to limit number of rules run against
> every message. I.e., if the limit of spam points is reached to
> required_score, stop further checking and process the message as a spam.
> I think, not all users really interested in gathering all statistics about
> all spam messages.
> 2. According to (1), it makes sense to sort all rules from lightweight to
> heavyweight (including ones which require internet queries) and make
> checking in this order.
> 
> This could allow to lower SA footprint.
> Thanks.
> 




Suggestion to developers

2007-09-12 Thread Crocomoth

SpamAssassin is a really great product.
But it is Perl-based and always checks every message against all of its
(many) rules.
The volume of spam is constantly increasing, as are the CPU and memory load
that SA creates on servers.
As an SA user, I would be happy to have the following in the next version:
1. An option that limits the number of rules run against every message.
I.e., if the accumulated spam points reach required_score, stop further
checking and process the message as spam.
I think not all users are really interested in gathering full statistics
about all spam messages.
2. Given (1), it makes sense to sort all rules from lightweight to
heavyweight (including ones that require internet queries) and run the
checks in this order.
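
A minimal Python sketch of the two points above, assuming each rule carries a relative cost: the function `scan_cheapest_first`, the cost figures and the rule names are hypothetical and are not part of SpamAssassin. The sketch ignores negative-scoring rules, which could lower the total after the cut-off.

```python
# Sketch only: rule names and cost figures are made up for illustration.
def scan_cheapest_first(rules, required_score):
    """Run rules from lightweight to heavyweight, stopping at required_score.

    rules: list of (name, cost, check), where check() returns the rule's
    score for the current message (0.0 if the rule does not hit).
    Returns the accumulated score and the names of the rules that ran.
    """
    total = 0.0
    ran = []
    for name, _cost, check in sorted(rules, key=lambda r: r[1]):
        total += check()
        ran.append(name)
        if total >= required_score:
            break  # enough points: skip the remaining, more expensive rules
    return total, ran


rules = [
    ("HYPOTHETICAL_DNS_RULE", 50, lambda: 1.0),     # would need a network query
    ("HYPOTHETICAL_HEADER_RULE", 1, lambda: 2.5),   # cheap header match
    ("HYPOTHETICAL_BODY_RULE", 5, lambda: 3.0),     # body regular expression
]

total, ran = scan_cheapest_first(rules, required_score=5.0)
print(total, ran)
```

With these made-up numbers the header and body rules already reach the threshold, so the rule that needs a network query is never run.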

This could lower the SA footprint.
Thanks.
