Re: Bayes problems and German Spam

2005-05-16 Thread Duncan Hill
On Monday 16 May 2005 12:15, Ronan McGlue typed:

> I too have all net tests enabled and have started from a fresh, clean
> new database on Friday, and already I'm seeing the German spams hit
> BAYES_00...
> I don't want to switch autolearning off because, well, I find it
> incredibly useful. I have spam/ham thresholds at 10/0 respectively and
> all appears well aside from the German bunch of spams...

The prolocation rulesets based on subject seem to be working quite well here.
They're slowly dragging Bayes up - currently at _44 after ~50 spams that
scored nice and high.  At least one server had worked out on its own that the 
spams should be _99.
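"_44" and "_99" here are shorthand for the BAYES_xx rule that fired, i.e. which band the Bayes spam probability fell into. A minimal Python sketch of that banding follows. The band boundaries are an assumption modelled on the 3.0-era rule names seen in this thread (check the 23_bayes.cf shipped with your own version), and `bayes_rule_for` is a hypothetical helper.

```python
# Hypothetical band table: (lower bound inclusive, upper bound exclusive, rule name).
# Boundaries are ASSUMED, modelled on SpamAssassin 3.0-era BAYES_xx rule names;
# verify them against the 23_bayes.cf of your installation.
BAYES_BANDS = [
    (0.00, 0.01, "BAYES_00"),
    (0.01, 0.10, "BAYES_01"),
    (0.10, 0.20, "BAYES_10"),
    (0.20, 0.30, "BAYES_20"),
    (0.30, 0.40, "BAYES_30"),
    (0.40, 0.44, "BAYES_40"),
    (0.44, 0.50, "BAYES_44"),
    (0.50, 0.56, "BAYES_50"),
    (0.56, 0.60, "BAYES_56"),
    (0.60, 0.70, "BAYES_60"),
    (0.70, 0.80, "BAYES_70"),
    (0.80, 0.90, "BAYES_80"),
    (0.90, 0.99, "BAYES_90"),
    (0.99, 1.00, "BAYES_99"),
]


def bayes_rule_for(probability: float) -> str:
    """Return the (assumed) BAYES_xx rule name for a spam probability in [0, 1]."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for low, high, name in BAYES_BANDS:
        if low <= probability < high:
            return name
    # probability == 1.0 falls outside the half-open bands; treat it as the top band.
    return BAYES_BANDS[-1][2]


if __name__ == "__main__":
    for p in (0.003, 0.47, 0.995):
        print(p, bayes_rule_for(p))
```

Read this way, sitting at _44 means Bayes is close to undecided about those messages rather than actively calling them ham.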


Re: Bayes problems and German Spam

2005-05-16 Thread Ronan McGlue
Simon Byrnand wrote:
> At 09:53 16/05/2005, Jo wrote:
>> Simon Byrnand wrote:
>>> Hi All,
>>> After going from 2.64 to 3.0.3 I thought Bayes was working much
>>> better - previously certain classes of spam were being consistently
>>> reported as ham, scoring BAYES_00 no matter what I did, or how much
>>> manual training I did. (Autolearning enabled)
>>>
>>> After upgrading to 3.0.3 and clearing the Bayes database everything
>>> seemed fine for a week or so, now it's back to its old habits :(
>>>
>>> Particularly frustrating is the complete inability of sa-learn to
>>> correct the thinking of Bayes - all the recent flood of German spams
>>> are scoring BAYES_00, and DESPITE the fact that I have manually
>>> learnt well over two dozen of these as spam (which includes all the
>>> variations of them I've seen so far) new copies of identical spams
>>> STILL score BAYES_00. WHY?
>>>
>>> If the autolearn system can't be overridden with some manual
>>> learning, it makes it more or less useless :(
>>>
>>> A few other spams that were previously getting BAYES_99 are now down
>>> to BAYES_00 for no apparent reason. It's highly unlikely that they
>>> were autolearnt as ham, as they hit several other tests too. It seems
>>> that Bayes is still exploitable... :(
>>>
>>> Any suggestions?
>>> Regards,
>>> Simon
>>
>> Clear your Bayes database and start all over again. Switch off
>> auto-learning and rely purely on manual learning in a feedback loop.
>> Grab a mailbox of known ham and another folder of known spam.
>> Preferably use a thousand of each.
>
> Hmm, not very practical when the system has several thousand
> users/mailboxes. There is no way I would be able to keep current with
> manual learning just based on my own personal mailbox... (and I can
> hardly go poking around in other people's mailboxes to gather ham/spam
> to learn)
>
>> If you ever switch on autolearning again, set the threshold at -0.2
>> for ham and 10 or 15 for spam.
>
> Are there even any negative scores in 3.0.3? I thought negative scores
> were pretty much eliminated in recent versions, so with -0.2 it would
> never learn any ham.
>
>> Enable network tests; razor2, pyzor and dcc work wonders on the site I
>> administer.
>
> Already have all network tests enabled, always have done.
> Regards,
> Simon

I too have all net tests enabled and have started from a fresh, clean
new database on Friday, and already I'm seeing the German spams hit
BAYES_00...
I don't want to switch autolearning off because, well, I find it
incredibly useful. I have spam/ham thresholds at 10/0 respectively and
all appears well aside from the German bunch of spams...

I don't know what else I can do...
*clutches at straws*
Is there a way to tie in a positive net test, say multi.surbl.org, to
sway Bayes? Generally, if SURBL reports spam you can guarantee that all
the other rules are surplus to requirements... IMHO

ronan
--

Regards
Ronan McGlue
Info. Services
QUB
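One way to act on the SURBL idea without changing how Bayes itself works is to pick out messages that hit a SURBL rule and feed them to `sa-learn --spam` by hand. A minimal Python sketch, assuming messages are stored one per file and carry SpamAssassin's `X-Spam-Status` header in the 3.0-style `... tests=RULE1,RULE2 autolearn=...` form. The helper names are hypothetical, and matching on a `_SURBL` suffix is an assumption about how the URIBL rules are named in your ruleset.

```python
import re
from email import message_from_string

# ASSUMPTION: SURBL-based rules in the installed ruleset end in "_SURBL"
# (e.g. URIBL_SC_SURBL); adjust the pattern to match your rule names.
SURBL_RULE = re.compile(r"_SURBL$")


def rules_hit(raw_message: str) -> list[str]:
    """Extract rule names from the tests= field of X-Spam-Status (assumed 3.0-style format)."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    # The header may be folded over several lines; rejoin items split after commas.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"\btests=(\S+)", status)
    if not match or match.group(1) == "none":
        return []
    return [name for name in match.group(1).split(",") if name]


def hit_surbl(raw_message: str) -> bool:
    """True if any SURBL-style rule fired on this message."""
    return any(SURBL_RULE.search(name) for name in rules_hit(raw_message))


def select_for_spam_training(paths: list[str]) -> list[str]:
    """Return the files (one message per file) that look worth passing to sa-learn --spam."""
    chosen = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            if hit_surbl(handle.read()):
                chosen.append(path)
    return chosen
```

The selected paths can then be handed to `sa-learn --spam`. The trade-off is that a SURBL false positive becomes a spam-training error, so this trusts the blocklist as much as the question above suggests.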


Re: Bayes problems and German Spam

2005-05-15 Thread Simon Byrnand
At 09:53 16/05/2005, Jo wrote:
> Simon Byrnand wrote:
>> Hi All,
>> After going from 2.64 to 3.0.3 I thought Bayes was working much better -
>> previously certain classes of spam were being consistently reported as
>> ham, scoring BAYES_00 no matter what I did, or how much manual training I
>> did. (Autolearning enabled)
>>
>> After upgrading to 3.0.3 and clearing the Bayes database everything
>> seemed fine for a week or so, now it's back to its old habits :(
>>
>> Particularly frustrating is the complete inability of sa-learn to correct
>> the thinking of Bayes - all the recent flood of German spams are scoring
>> BAYES_00, and DESPITE the fact that I have manually learnt well over two
>> dozen of these as spam (which includes all the variations of them I've
>> seen so far) new copies of identical spams STILL score BAYES_00. WHY?
>>
>> If the autolearn system can't be overridden with some manual learning, it
>> makes it more or less useless :(
>>
>> A few other spams that were previously getting BAYES_99 are now down to
>> BAYES_00 for no apparent reason. It's highly unlikely that they were
>> autolearnt as ham, as they hit several other tests too. It seems that
>> Bayes is still exploitable... :(
>>
>> Any suggestions?
>> Regards,
>> Simon
>
> Clear your Bayes database and start all over again. Switch off
> auto-learning and rely purely on manual learning in a feedback loop. Grab
> a mailbox of known ham and another folder of known spam. Preferably use a
> thousand of each.

Hmm, not very practical when the system has several thousand
users/mailboxes. There is no way I would be able to keep current with
manual learning just based on my own personal mailbox... (and I can hardly
go poking around in other people's mailboxes to gather ham/spam to learn)

> If you ever switch on autolearning again, set the threshold at -0.2 for
> ham and 10 or 15 for spam.

Are there even any negative scores in 3.0.3? I thought negative scores
were pretty much eliminated in recent versions, so with -0.2 it would never
learn any ham.

> Enable network tests; razor2, pyzor and dcc work wonders on the site I
> administer.

Already have all network tests enabled, always have done.
Regards,
Simon
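The disagreement over a -0.2 ham threshold comes down to a simple comparison: autolearn takes a message as ham only if it scores below the ham threshold (`bayes_auto_learn_threshold_nonspam`), and as spam only if it scores above the spam threshold (`bayes_auto_learn_threshold_spam`). A simplified Python sketch with made-up scores shows why a negative ham threshold learns very little ham when few rules score negatively. It deliberately omits the extra conditions real SpamAssassin applies (for example, it does not count the Bayes rules' own scores when deciding), the exact boundary handling is assumed, and the function names are hypothetical.

```python
from typing import Optional


def autolearn_decision(score: float, ham_threshold: float,
                       spam_threshold: float) -> Optional[str]:
    """Simplified autolearn rule: 'ham', 'spam', or None (not learned).

    Boundary handling (< vs <=, >= vs >) is an assumption; real SpamAssassin
    also applies further checks that are not modelled here.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


def summarise(scores: list[float], ham_threshold: float,
              spam_threshold: float) -> dict[str, int]:
    """Count how many messages a threshold pair would autolearn as ham or spam."""
    counts = {"ham": 0, "spam": 0, "not learned": 0}
    for score in scores:
        decision = autolearn_decision(score, ham_threshold, spam_threshold)
        counts[decision if decision else "not learned"] += 1
    return counts


if __name__ == "__main__":
    # Made-up, illustrative scores: most ham scores small positive values, few go negative.
    sample = [0.0, 0.3, 0.8, 1.2, -0.5, 2.1, 0.0, 11.0, 16.4, 25.3]
    print("0.1/10 :", summarise(sample, 0.1, 10.0))
    print("-0.2/15:", summarise(sample, -0.2, 15.0))
```

With these invented scores the stricter pair learns one ham message instead of three, which is the effect described above: the fewer negative-scoring rules fire, the less ham a negative threshold ever sees.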


Re: Bayes problems and German Spam

2005-05-15 Thread Jo
Simon Byrnand wrote:
> Hi All,
> After going from 2.64 to 3.0.3 I thought Bayes was working much better -
> previously certain classes of spam were being consistently reported
> as ham, scoring BAYES_00 no matter what I did, or how much manual
> training I did. (Autolearning enabled)
>
> After upgrading to 3.0.3 and clearing the Bayes database everything
> seemed fine for a week or so, now it's back to its old habits :(
>
> Particularly frustrating is the complete inability of sa-learn to
> correct the thinking of Bayes - all the recent flood of German spams
> are scoring BAYES_00, and DESPITE the fact that I have manually learnt
> well over two dozen of these as spam (which includes all the
> variations of them I've seen so far) new copies of identical spams
> STILL score BAYES_00. WHY?
>
> If the autolearn system can't be overridden with some manual learning,
> it makes it more or less useless :(
>
> A few other spams that were previously getting BAYES_99 are now down
> to BAYES_00 for no apparent reason. It's highly unlikely that they
> were autolearnt as ham, as they hit several other tests too. It seems
> that Bayes is still exploitable... :(
>
> Any suggestions?
> Regards,
> Simon

Clear your Bayes database and start all over again. Switch off
auto-learning and rely purely on manual learning in a feedback loop.
Grab a mailbox of known ham and another folder of known spam.
Preferably use a thousand of each. If you ever switch on autolearning
again, set the threshold at -0.2 for ham and 10 or 15 for spam.
Enable network tests; razor2, pyzor and dcc work wonders on the site I
administer.

Good luck,
Jo
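Before and after a retraining run like this, it helps to check how many messages the database has actually learned of each kind; a heavily lopsided ham count is one thing to look for after a long stretch of autolearning. `sa-learn --dump magic` prints these counters. The Python sketch below parses that output. The column layout it expects (count in the third column, label after `non-token data:`) is an assumption based on typical output, the 200-message minimum mirrors what I understand the defaults to be but should be checked against your configuration, and the helper names are hypothetical.

```python
import re

# ASSUMED line shape from `sa-learn --dump magic`, e.g.
# "0.000          0        123          0  non-token data: nspam"
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data:\s*(.+?)\s*$"
)


def parse_magic(output: str) -> dict[str, int]:
    """Map each 'non-token data' label (nspam, nham, ntokens, ...) to its count."""
    values = {}
    for line in output.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def training_summary(output: str, minimum: int = 200) -> dict:
    """Report spam/ham counts, ham-per-spam ratio, and whether both meet an assumed minimum."""
    values = parse_magic(output)
    nspam = values.get("nspam", 0)
    nham = values.get("nham", 0)
    return {
        "nspam": nspam,
        "nham": nham,
        "ham_per_spam": (nham / nspam) if nspam else float("inf"),
        "meets_minimum": nspam >= minimum and nham >= minimum,
    }
```

Capture the output with `sa-learn --dump magic > magic.txt` (as the user whose database is in question) and pass the file's contents to `training_summary`.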


Bayes problems and German Spam

2005-05-15 Thread Simon Byrnand
Hi All,
After going from 2.64 to 3.0.3 I thought Bayes was working much better - 
previously certain classes of spam were being consistently reported as ham, 
scoring BAYES_00 no matter what I did, or how much manual training I did. 
(Autolearning enabled)

After upgrading to 3.0.3 and clearing the Bayes database everything seemed 
fine for a week or so, now it's back to its old habits :(

Particularly frustrating is the complete inability of sa-learn to correct 
the thinking of Bayes - all the recent flood of German spams are scoring 
BAYES_00, and DESPITE the fact that I have manually learnt well over two 
dozen of these as spam (which includes all the variations of them I've seen 
so far) new copies of identical spams STILL score BAYES_00. WHY?

If the autolearn system can't be overridden with some manual learning, it 
makes it more or less useless :(

A few other spams that were previously getting BAYES_99 are now down to 
BAYES_00 for no apparent reason. It's highly unlikely that they were 
autolearnt as ham, as they hit several other tests too. It seems that Bayes 
is still exploitable... :(

Any suggestions?
Regards,
Simon
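One plausible reason two dozen manual `sa-learn --spam` runs may not move the verdict: each token's spam probability is estimated from how often it appeared in learned spam versus learned ham, each normalised by the number of messages of that kind. If autolearning had already recorded the same tokens in many ham messages, a few dozen spam examples barely shift the estimate. The Python sketch below uses Gary Robinson's f(w) estimator, which SpamAssassin's Bayes code is based on. The strength and prior constants (`s`, `x`) and all the counts are illustrative assumptions, not SpamAssassin's actual values or data from this thread, and the sketch covers only the per-token step, not how many tokens are combined into one score.

```python
def robinson_fw(spam_count: int, ham_count: int, nspam: int, nham: int,
                s: float = 0.1, x: float = 0.5) -> float:
    """Robinson's f(w): a token's spam probability, shrunk towards the prior x.

    spam_count / ham_count: learned spam / ham messages containing the token.
    nspam / nham: total learned spam / ham messages.
    s (strength of the prior) and x (prior probability) are ILLUSTRATIVE values.
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # token never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Hypothetical counts: a token autolearned in 300 of 5000 ham messages,
    # then 24 spams containing it learned manually, then 300 spams.
    before = robinson_fw(spam_count=0, ham_count=300, nspam=1000, nham=5000)
    after_24 = robinson_fw(spam_count=24, ham_count=300, nspam=1024, nham=5000)
    after_300 = robinson_fw(spam_count=300, ham_count=300, nspam=1300, nham=5000)
    print(f"{before:.3f} {after_24:.3f} {after_300:.3f}")
```

With these invented counts the token stays on the ham side after 24 manual spams and only crosses 0.5 once it has been learned from hundreds of them, which fits Jo's advice to train on large, clean corpora rather than a couple of dozen samples.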