Re: SA: lottery message scored hammy by bayes

2009-08-27 Thread Dennis German

I am not sure whether Bayes is autolearning.
I am on a shared hosting service (midphase),
which uses cPanel and has exim do the SpamAssassin processing.
They use my scores but ignore other commands.
When I get a message I think I shouldn't have, I
save it and run spamc on it, redirecting the output to a .out file, in order to
see the X-Spam-Report (which is not included in ham!)

My user_prefs file is always available at
http://www.Real-World-Systems.com/mail/user_prefs.html


I have not manually trained bayes.
Thanks



John Hardin wrote:
> On Tue, 25 Aug 2009, Dennis German wrote:
>
>> email with this content:
>>
>> CONGRATULATION ...
>>
>> received these scores
>>
>> X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
>>    SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528
>>
>> Does this indicate that bayes needs tuning/learning?
>
> Can you paste the output from sa-learn --dump magic ?
>
> It probably indicates that Bayes has been mistrained: somebody is
> training spammy messages as ham.
>
> How do you do your Bayes training? Autolearning, or purely manual, or
> some combination?
>
> How many messages are getting inappropriate Bayes scores? If a lot are,
> you'll probably want to turn off autolearning (if you're using it) until
> you analyze the problem. You may need to wipe your Bayes database and
> start fresh if the problem is bad enough.
>
> If you're using autolearning, what are your learning thresholds?
>
> If you're manually training, do you keep your corpora so that you can
> review and correct errors? If so, review your ham corpora and see if any
> spams have crept in. If they have, retrain them as spam; SA will forget
> that they were hammy.




Re: lottery message scored hammy by bayes

2009-08-26 Thread Karsten Bräckelmann
On Tue, 2009-08-25 at 20:59 -0400, Dennis German wrote:
> email with this content:

Do *NOT* paste spam samples to the list. Use a pastebin or upload them
to your own server and provide a link instead.


> X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
>  SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528

That MISSING_HEADERS score is custom, and *way* off-base IMHO.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Auto-Learn Thresholds (was: lottery message scored hammy by bayes)

2009-08-26 Thread Karsten Bräckelmann
On Tue, 2009-08-25 at 22:13 -0400, Alex wrote:
>> If you're using autolearning, what are your learning thresholds?
>
> What do you recommend for thresholds? I'm considering using
> autolearning, but very concerned about corrupting the database. I
> think I would use something like +15 for spam.

I generally recommend the defaults, unless you *do* know you need
something else. That's why they are defaults.

That's <= 0.1 for ham and >= 12.0 for spam. Keep in mind these scores
are calculated using a non-Bayes score set, so they generally differ
from the overall score of the message. Also, this does not take various
specific rules' scores into account, like Bayes and AWL. Plus there are
some more esoteric constraints.

See the docs. [1]
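The threshold decision described above can be sketched roughly like this. It is a simplified model, not the AutoLearnThreshold plugin's code: it assumes you already have the message's learning score (computed with the non-Bayes score set, leaving out Bayes, AWL and other ignored rules), it uses the default values of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam quoted above, and it ignores the more esoteric constraints the docs mention. The function name autolearn_decision is hypothetical, and the boundary handling (<= / >=) follows the wording above; the plugin's exact comparisons may differ.

```python
from typing import Optional

# Default thresholds as quoted above
# (bayes_auto_learn_threshold_nonspam / bayes_auto_learn_threshold_spam).
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(learning_score: float,
                       ham_threshold: float = HAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Return "ham", "spam", or None (not auto-learned) for a learning score.

    Simplified sketch: learning_score is assumed to be computed already
    from the non-Bayes score set, without Bayes, AWL or noautolearn rules.
    """
    if learning_score <= ham_threshold:
        return "ham"
    if learning_score >= spam_threshold:
        return "spam"
    return None  # anything between the two thresholds is simply not learned


if __name__ == "__main__":
    for score in (-1.5, 0.1, 2.3, 11.9, 12.0, 25.0):
        print(score, autolearn_decision(score))
```

Anything that lands between the two thresholds is never auto-learned at all, which is why a borderline score is not, by itself, a risk to the database.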


> There are FNs on occasion in the 2.x range with low bayes numbers (or
> BAYES_50) that I wouldn't want to be tagged as ham. Should that be a
> concern?

No.  Bayes auto-learning is *not* self-feeding.

Any overall score of about 2 (with Bayes) is *very* unlikely to cross
either threshold when using the respective non-Bayes score-set.

Moreover, your concern is with Bayes probability <= 50%, and thus a
negative score for the BAYES hit. This hit is not considered for
auto-learning, though, so as a first rule of thumb, subtract that score
again, which yields a slightly higher score. Still nowhere even close
to the thresholds.
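The rule of thumb above (drop the BAYES_* hit, then compare against the thresholds) can be approximated from an X-Spam-testscores style header such as the one in this thread. This is a rough sketch under stated assumptions: it treats every rule whose name starts with BAYES_ as the Bayes hit, and it reuses the scores shown in the header, which come from the Bayes-enabled score set rather than the non-Bayes set auto-learning actually uses, so the result is only an estimate. The helper names are hypothetical.

```python
def parse_testscores(header_value: str) -> dict[str, float]:
    """Parse 'RULE=score,RULE=score,...' into a dict (the header may be folded)."""
    scores: dict[str, float] = {}
    # Unfold continuation lines by collapsing all whitespace runs.
    for item in " ".join(header_value.split()).split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        name, _, value = item.partition("=")
        scores[name.strip()] = float(value)
    return scores


def estimate_without_bayes(scores: dict[str, float]) -> float:
    """Rule of thumb: the total score with every BAYES_* hit subtracted again."""
    return sum(v for k, v in scores.items() if not k.startswith("BAYES_"))


if __name__ == "__main__":
    header = ("BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,\n"
              "   SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528")
    scores = parse_testscores(header)
    print(f"with Bayes:    {sum(scores.values()):.3f}")
    print(f"without Bayes: {estimate_without_bayes(scores):.3f}")
```

For the lottery message this gives about 7.73 with the Bayes hit and 10.33 without it, so even that message stays below the 12.0 spam threshold under this approximation.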


> Even mail that has been whitelisted could also contain spam, so would
> a ham threshold of like -100 work, or present the same problem?

60_whitelist.cf:  tflags USER_IN_WHITELIST  userconf nice noautolearn

Again, as per the docs [1], whitelisting will not be considered for the
decision whether to auto-learn or not.
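The effect of the noautolearn flag can be sketched as follows: rules whose tflags include noautolearn (such as USER_IN_WHITELIST in the 60_whitelist.cf line above) are dropped before the learning score is summed, so even a huge whitelist score does not drag a message toward the ham threshold. The tflags parsing is a simplified assumption (one rule per line, whitespace-separated flags, an optional grep-style "file.cf:" prefix), the -100 score in the example is illustrative, and both function names are hypothetical.

```python
def noautolearn_rules(cf_lines: list[str]) -> set[str]:
    """Collect rule names whose tflags line lists 'noautolearn'."""
    rules: set[str] = set()
    for line in cf_lines:
        tokens = line.split("#", 1)[0].split()  # drop comments, split on spaces
        if "tflags" not in tokens:
            continue
        i = tokens.index("tflags")
        # tokens[i + 1] is the rule name; everything after it is a flag.
        if i + 1 < len(tokens) and "noautolearn" in tokens[i + 2:]:
            rules.add(tokens[i + 1])
    return rules


def learning_score(hits: dict[str, float], excluded: set[str]) -> float:
    """Sum rule scores, skipping rules flagged noautolearn."""
    return sum(score for rule, score in hits.items() if rule not in excluded)


if __name__ == "__main__":
    excluded = noautolearn_rules(
        ["60_whitelist.cf:  tflags USER_IN_WHITELIST  userconf nice noautolearn"])
    hits = {"USER_IN_WHITELIST": -100.0, "SUBJ_ALL_CAPS": 3.1}  # illustrative
    print(excluded, learning_score(hits, excluded), learning_score(hits, set()))
```

Without the exclusion this example would sum to about -96.9 and fall far below the ham threshold; with it, only 3.1 remains, which is not learned either way.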

  guenther


[1] http://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html




Re: lottery message scored hammy by bayes

2009-08-26 Thread Benny Pedersen

On Wed 26 Aug 2009 15:23:01 CEST, Karsten Bräckelmann wrote:


>> X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
>> SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528
>
> That MISSING_HEADERS score is custom, and *way* off-base IMHO.


The question is rather which header is missing.

--
xpoint



lottery message scored hammy by bayes

2009-08-25 Thread Dennis German

email with this content:

CONGRATULATION YOUR EMAIL ADDRESS HAS WON YOU THE 2010 FIFA WORLDCUP LOTTERY
OPEN THE ATTACHMENT AND VIEW THE PROFILE OF YOUR WINNING FUND, ALSO CONTACT
YOUR CLAIM AGENT

received these scores

X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
   SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528

Does this indicate that bayes needs tuning/learning?

Thank you



Re: lottery message scored hammy by bayes

2009-08-25 Thread John Hardin

On Tue, 25 Aug 2009, Dennis German wrote:


> email with this content:
>
> CONGRATULATION YOUR EMAIL ADDRESS HAS WON YOU THE 2010 FIFA WORLDCUP LOTTERY
> OPEN THE ATTACHMENT AND VIEW THE PROFILE OF YOUR WINNING FUND, ALSO CONTACT
> YOUR CLAIM AGENT
>
> received these scores
>
> X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
>    SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528
>
> Does this indicate that bayes needs tuning/learning?


Can you paste the output from sa-learn --dump magic ?
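For reference, the counts most worth looking at in sa-learn --dump magic output are the nspam and nham lines. A minimal sketch for pulling them out is below; it assumes the usual layout in which each magic line ends with "non-token data: <name>" and the value sits in the third whitespace-separated column, so check it against your own output. The sample text is made up for illustration, and parse_magic / dump_magic are hypothetical helpers (the latter needs shell access and sa-learn on PATH).

```python
import subprocess

MARKER = "non-token data:"


def parse_magic(text: str) -> dict[str, int]:
    """Map each 'non-token data: <name>' entry to its integer value.

    Assumes the value is the third whitespace-separated column, as in the
    usual sa-learn --dump magic layout; verify against your own output.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        fields, sep, name = line.partition(MARKER)
        if not sep:
            continue  # token lines and blank lines are ignored
        cols = fields.split()
        if len(cols) >= 3:
            values[name.strip()] = int(cols[2])
    return values


def dump_magic() -> str:
    """Run 'sa-learn --dump magic' and return its standard output."""
    result = subprocess.run(["sa-learn", "--dump", "magic"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample for illustration only; not output from this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0       4200          0  non-token data: nham
"""

if __name__ == "__main__":
    magic = parse_magic(SAMPLE)
    print("spam learned:", magic.get("nspam"), "ham learned:", magic.get("nham"))
```

Comparing nspam and nham is a quick first look at how the database has actually been trained.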

It probably indicates that Bayes has been mistrained: somebody is
training spammy messages as ham.


How do you do your Bayes training? Autolearning, or purely manual, or some 
combination?


How many messages are getting inappropriate Bayes scores? If a lot are, 
you'll probably want to turn off autolearning (if you're using it) until 
you analyze the problem. You may need to wipe your Bayes database and 
start fresh if the problem is bad enough.


If you're using autolearning, what are your learning thresholds?

If you're manually training, do you keep your corpora so that you can
review and correct errors? If so, review your ham corpora and see if any
spams have crept in. If they have, retrain them as spam; SA will forget that
they were hammy.
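Retraining the stray spams comes down to feeding them to sa-learn --spam. The sketch below only builds the command line, plus a wrapper that runs it; it assumes sa-learn is on PATH and that each path is a single saved message (or a directory of them), and the file names in any example are hypothetical.

```python
import subprocess


def retrain_as_spam_cmd(paths: list[str]) -> list[str]:
    """Build the sa-learn command line that relearns the given files as spam."""
    if not paths:
        raise ValueError("no messages to retrain")
    return ["sa-learn", "--spam", *paths]


def retrain_as_spam(paths: list[str]) -> str:
    """Run sa-learn --spam on the files and return its summary output.

    Per the advice above, relearning a message as spam makes SA forget
    that it was previously learned as ham.
    """
    result = subprocess.run(retrain_as_spam_cmd(paths),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping command construction separate from execution makes it easy to print and double-check the list of misfiled messages before anything is relearned.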


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If someone has a gun and is trying to kill you, it would be
  reasonable to shoot back with your own gun.
  -- the Dalai Lama, May 15, 2001
---
 Today: the 1930th anniversary of the destruction of Pompeii


Re: lottery message scored hammy by bayes

2009-08-25 Thread MySQL Student
Hi,

> If you're using autolearning, what are your learning thresholds?

What do you recommend for thresholds? I'm considering using
autolearning, but very concerned about corrupting the database. I
think I would use something like +15 for spam.

There are FNs on occasion in the 2.x range with low bayes numbers (or
BAYES_50) that I wouldn't want to be tagged as ham. Should that be a
concern?

Even mail that has been whitelisted could also contain spam, so would
a ham threshold of like -100 work, or present the same problem?

Thanks,
Alex


Re: lottery message scored hammy by bayes

2009-08-25 Thread Benny Pedersen

On Wed 26 Aug 2009 02:59:06 CEST, Dennis German wrote:


> X-Spam-testscores: BAYES_00=-2.599,HTML_MESSAGE=0.001,MISSING_HEADERS=5.7,
>    SUBJ_ALL_CAPS=3.1,UPPERCASE_75_100=1.528
>
> Does this indicate that bayes needs tuning/learning?


If you want Bayes to know it's spam, yes. Remember to train every such
email as spam, not only this message, if you get more than one. The more
spam you train, the better Bayes learns that you don't want it treated as
ham.


The same goes for ham the other way around, but don't train too much if
you're unsure about a message; if unsure, do it anyway :)


Missing headers seem bad. Are you sure the message is a complete RFC 822
message?
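To answer the "which header is missing" question, a saved copy of the message can be checked with Python's standard email parser. The list of headers checked here is an assumption (a handful of commonly expected ones), not SpamAssassin's definition of MISSING_HEADERS; which header that rule actually tests should be confirmed in its rule definition. missing_headers is a hypothetical helper.

```python
from email import message_from_bytes
from email.policy import default

# Assumed set of commonly expected headers; this is NOT SpamAssassin's
# definition of MISSING_HEADERS, just a starting point for inspection.
EXPECTED_HEADERS = ("From", "To", "Date", "Subject", "Message-ID")


def missing_headers(raw: bytes, expected=EXPECTED_HEADERS) -> list[str]:
    """Return the expected header names that are absent from the message."""
    msg = message_from_bytes(raw, policy=default)
    # Header lookups are case-insensitive, so "Message-Id" also matches.
    return [name for name in expected if msg.get(name) is None]


if __name__ == "__main__":
    sample = b"From: sender@example.com\nSubject: CONGRATULATION\n\nbody\n"
    print(missing_headers(sample))
```

Run against the raw saved message (headers included), the result shows at a glance whether the message reached SpamAssassin as a complete RFC 822 message.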
