On 2022-02-08 at 07:46:17 UTC-0500 (Tue, 8 Feb 2022 13:46:17 +0100)
Axb <[email protected]>
is rumored to have said:

> On 2/8/22 11:33, Kevin A. McGrail wrote:
>> Auto learning is something that should never have existed. All it does is
>> reinforce misclassification and slowly spirals the database into having
>> wrong answers be more wrong.
>
> I don't agree - I've been running autolearn for years and my bayes results 
> have always been solid.
> (and I'm speaking of a global bayes redis DB in a 200k user setup)

With substantially smaller systems (my own personal server and those I manage 
for my employer) I have the same benign experience. I don't think we should 
disable auto-learn by default *in any way* without actual research and hard 
data beyond anecdotal experience.


> Where I see potential is in optimizing auto expiration when using a file 
> based DB. Very often DB is locked and tokens cannot be expired which leads to 
> what you call "reinforce misclassification". If tokens are expired regularly, 
> skewing is very improbable.
> Thankfully, using Redis, it's way more controllable.

I think that's also not a problem for systems that are not persistently loaded 
with in-process mail.

All we see as SA maintainers are our own systems and cases that people are 
having problems with. I don't think we really know whether auto-learn works 
well generally or why/how it breaks when it does.

>> Since we don't seem to have consensus on changing the default does anybody
>> object to a pre-file that disables it? That would be more clearly
>> documented, and people will look at the pre-file for V4.
>
> I'm -1 for disabling, one way or another.

Same. It would substantially change how people's existing stable systems 
operate.

I'm less averse to tweaking default auto-learning parameters. In ALL cases 
where I use auto-learn I have reduced both thresholds, so I learn as ham ONLY 
mail with negative scores (< -0.1, so effectively at least 2 ham-signs...) and 
learn as spam substantially more than just the absurdly spammy stuff. This 
sacrifices some overall effectiveness in theory but I think it also helps make 
Bayes less brittle. I have NOT done rigorous testing to prove that.
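
To make the effect of that tweak concrete, here is a minimal Python sketch of a two-threshold auto-learn decision. It is an illustration only, not SpamAssassin's code: the function name and the exact boundary comparisons are assumptions, and SpamAssassin applies further conditions that this sketch leaves out. The defaults shown (0.1 and 12.0) are the documented defaults of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam. The -0.1 ham threshold comes from the paragraph above; the 8.0 spam threshold is a hypothetical value chosen for the example, since no number was given.

```python
from typing import Optional

# Documented SpamAssassin defaults for the two auto-learn thresholds.
DEFAULT_NONSPAM_THRESHOLD = 0.1
DEFAULT_SPAM_THRESHOLD = 12.0

# Tightened values in the spirit of the paragraph above.
TUNED_NONSPAM_THRESHOLD = -0.1  # from the message: learn ham only below -0.1
TUNED_SPAM_THRESHOLD = 8.0      # HYPOTHETICAL: the message gives no number


def autolearn_decision(score: float,
                       nonspam_threshold: float,
                       spam_threshold: float) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn) for a message score.

    Simplified sketch: the comparisons (< for ham, >= for spam) are an
    assumption, and no other learning conditions are modelled.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


if __name__ == "__main__":
    for score in (-0.5, 0.0, 9.5, 15.0):
        default = autolearn_decision(
            score, DEFAULT_NONSPAM_THRESHOLD, DEFAULT_SPAM_THRESHOLD)
        tuned = autolearn_decision(
            score, TUNED_NONSPAM_THRESHOLD, TUNED_SPAM_THRESHOLD)
        print(f"score={score:5.1f}  default={default}  tuned={tuned}")
```

With these numbers, a message scoring exactly 0.0 is learned as ham under the defaults but left alone under the tuned thresholds, while a message scoring 9.5 is left alone under the defaults but learned as spam under the tuned ones.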

I believe that SA has reached the point of broad use where we should be making 
substantial change decisions based on hard data rather than anecdote and lore.

-- 
Bill Cole
[email protected] or [email protected]
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
