On 2022-02-08 at 07:46:17 UTC-0500 (Tue, 8 Feb 2022 13:46:17 +0100) Axb <[email protected]> is rumored to have said:
> On 2/8/22 11:33, Kevin A. McGrail wrote:
>> Auto learning is something that should never have existed. All it does is
>> reinforce misclassification and slowly spirals the database into having
>> wrong answers be more wrong.
>
> I don't agree - I've been running autolearn for years and my bayes results
> have always been solid.
> (and I'm speaking of a global bayes redis DB in a 200k user setup)

With substantially smaller systems (my own personal server and those I
manage for my employer) I have the same benign experience.

I don't think we should disable auto-learn by default *in any way*
without actual research and hard data beyond anecdotal experience.

> Where I see potential is in optimizing auto expiration when using a file
> based DB. Very often DB is locked and tokens cannot be expired which leads to
> what you call "reinforce misclassification". If tokens are expired regularly,
> skewing is very improbable.
> Thankfully, using Redis, it's way more controllable.

I think that's also not a problem for systems that are not persistently
loaded with in-process mail.

All we see as SA maintainers are our own systems and cases that people
are having problems with. I don't think we really know whether
auto-learn works well generally or why/how it breaks when it does.

>> Since we don't seem to have consensus on changing the default does anybody
>> object to a pre-file that disables it? That would be more clearly
>> documented and people will look at the pre-file for V4.
>
> I'm -1 for disabling, one way or another.

Same. It would substantially change how people's existing stable systems
operate.

I'm less averse to tweaking default auto-learning parameters. In ALL
cases where I use auto-learn I have reduced both thresholds, so I learn
as ham ONLY mail with negative scores (< -0.1, so effectively at least 2
ham-signs...) and learn as spam substantially more than just the
absurdly spammy stuff.
This sacrifices some overall effectiveness in theory but I think it also
helps make Bayes less brittle. I have NOT done rigorous testing to prove
that.

I believe that SA has reached the point of broad use where we should be
making substantial change decisions based on hard data rather than
anecdote and lore.

-- 
Bill Cole
[email protected] or [email protected]
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
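
For readers who want to see what lowering both thresholds does, here is a minimal Python sketch, not SpamAssassin's actual code. It models only the score comparison governed by the `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` options and ignores the additional conditions SA applies before learning. The names `Thresholds` and `autolearn_decision` are hypothetical, the defaults shown (0.1 and 12.0) are the documented ones as far as I know, and the tightened spam value of 8.0 is an assumed illustration, since the message above gives only the ham threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the two auto-learn score cut-offs."""
    ham_below: float         # learn as ham when score is below this
    spam_at_or_above: float  # learn as spam when score reaches this


# Documented SpamAssassin defaults (assumed here, verify against your version).
DEFAULTS = Thresholds(ham_below=0.1, spam_at_or_above=12.0)

# Tightened values in the spirit of the message above: ham only for
# negative scores; the spam value 8.0 is an illustrative assumption.
TIGHTENED = Thresholds(ham_below=-0.1, spam_at_or_above=8.0)


def autolearn_decision(score: float, t: Thresholds) -> Optional[str]:
    """Return "ham", "spam", or None (do not learn) for a message score."""
    if score < t.ham_below:
        return "ham"
    if score >= t.spam_at_or_above:
        return "spam"
    return None


if __name__ == "__main__":
    for score in (-1.5, 0.0, 5.0, 9.0, 15.0):
        print(score,
              autolearn_decision(score, DEFAULTS),
              autolearn_decision(score, TIGHTENED))
```

With these assumed numbers, a message scoring exactly 0.0 is learned as ham under the defaults but skipped under the tightened thresholds, while one scoring 9.0 is skipped under the defaults but learned as spam once the spam threshold is lowered.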
