[SAtalk] Dump bayes db please explain the columns

2004-01-19 Thread Mrvka Andreas
hi,

i've made a dump of my bayes db but i don't
know exactly the columns.

please explain them.

thanks.
Andrew




---
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
___
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk


Re: [SAtalk] Dump bayes db please explain the columns

2004-01-19 Thread Pedro Sam
On January 19, 2004 04:25 am, Mrvka Andreas wrote:
> hi,
>
> i've made a dump of my bayes db but i don't
> know exactly the columns.
>
> please explain them.

I ran into the same problem, and was unable to find any documentation ... but 
here is my guess of what the columns mean:

sa-learn --dump data | sort -n > /tmp/asdf

I sorted the output for a reason:

0.995 10  0 1074305205  U*p6618-qp2sam
0.995 10  0 1074305205  sk:p6618-q
0.995 10  0 1074308138  270
0.995 10  0 1074308138  avoiding
0.995 10  0 1074308138  elsewhere
0.995 10  0 1074310744  Forfeiture
0.995 10  0 1074310744  Notify
0.995 10  0 1074310744  g2.gif

1st: low equals hammy, high equals spammy
2nd: roughly equal # of occurrence of that particular token learnt as spam
3rd: roughly equal # of occurrence of that particular token learnt as ham
4th: # of seconds since 1970 ... (a Unix tradition of measuring time in # of 
seconds since 1970)
5th: the token itself.

Disclaimer: I'm not a developer ...

Pedro

-- 
If I had any humility I would be perfect.
-- Ted Turner




Re: [SAtalk] Dump bayes db please explain the columns

2004-01-19 Thread mrv
hi,

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Monday, 19 January 2004 11:20

I ran into the same problem, and was unable to find any
documentation ... but
here is my guess of what the columns mean:

sa-learn --dump data | sort -n > /tmp/asdf

I sorted the output for a reason:

0.995 10  0 1074305205  U*p6618-qp2sam
0.995 10  0 1074305205  sk:p6618-q
0.995 10  0 1074308138  270
0.995 10  0 1074308138  avoiding
0.995 10  0 1074308138  elsewhere
0.995 10  0 1074310744  Forfeiture
0.995 10  0 1074310744  Notify
0.995 10  0 1074310744  g2.gif

1st: low equals hammy, high equals spammy
2nd: roughly equal # of occurrence of that particular token
 learnt as spam
3rd: roughly equal # of occurrence of that particular token
 learnt as ham
4th: # of seconds since 1970 ... (a Unix tradition of
 measuring time in # of seconds since 1970)
5th: the token itself.

Disclaimer: I'm not a developer ...

Pedro


i agree with you, i would call the 2nd and 3rd columns 'multiplier'.

if a token is matched as spam
(e.g.: 0.995 10  0 1074310744  Notify)

then i multiply the 2nd column (10) by 0.995
and add the result to the overall bayes value.

if the token is matched as ham, i multiply by the 3rd column
instead and subtract the result from the bayes value.

do you know what i mean?

i hope it works the way i described,

AND that i can add / remove tokens if i wish.

regards
Andrew






Re: [SAtalk] Dump bayes db please explain the columns

2004-01-19 Thread Theo Van Dinter
On Mon, Jan 19, 2004 at 05:19:46AM -0500, Pedro Sam wrote:
> I ran into the same problem, and was unable to find any documentation ... but
> here is my guess of what the columns mean:

yeah, the dumps weren't meant for general viewing (more of a debug thing),
so they're not very verbose.

> 0.995 10  0 1074305205  U*p6618-qp2sam
> 0.995 10  0 1074305205  sk:p6618-q
> 0.995 10  0 1074308138  270
> 0.995 10  0 1074308138  avoiding
> 0.995 10  0 1074308138  elsewhere
> 0.995 10  0 1074310744  Forfeiture
> 0.995 10  0 1074310744  Notify
> 0.995 10  0 1074310744  g2.gif
>
> 1st: low equals hammy, high equals spammy

probability of spam.  0.5 is unknown, below is hammy, above is spammy.

> 2nd: roughly equal # of occurrence of that particular token learnt as spam

actually exactly equal.

> 3rd: roughly equal # of occurrence of that particular token learnt as ham

also exactly equal.

> 4th: # of seconds since 1970 ... (a Unix tradition of measuring time in # of
> seconds since 1970)

yep.  it's atime, aka the last time the token was accessed (either
learning or scanning).  generally useless except during expiry.
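
To read the atime column as a calendar date, the epoch value can be converted directly. This is a minimal Python sketch, not part of SpamAssassin; the helper name atime_to_utc is made up for illustration, and it assumes the column is plain seconds since 1970 as described above:

```python
from datetime import datetime, timezone


def atime_to_utc(atime: int) -> str:
    """Format a token's atime (seconds since 1970-01-01 UTC) as a readable string."""
    stamp = datetime.fromtimestamp(atime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


# atime taken from the "Notify" line quoted earlier in this thread
print(atime_to_utc(1074310744))
```

For the quoted tokens this gives a date two days before the thread, which fits tokens that were last touched during recent learning or scanning.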

> 5th: the token itself.

yep.

-- 
Randomly Generated Tagline:
If the future navigation system [for interactive networked services on
 the NII] looks like something from Microsoft, it will never work.
 (Chairman of Walt Disney Television & Telecommunications)




Re: [SAtalk] Dump bayes db please explain the columns

2004-01-19 Thread Matt Kettler
At 04:25 AM 1/19/2004, Mrvka Andreas wrote:
> hi,
>
> i've made a dump of my bayes db but i don't
> know exactly the columns.
> please explain them.
>
> thanks.
> Andrew

Let's use this fictitious example line:

0.029  0  2 1071094490  word

The above line indicates:
 0.029       the calculated spam probability for this token is 0.029 (i.e. 2.9%)
 0           this token has been seen 0 times in spam training
 2           this token has been seen 2 times in nonspam training
 1071094490  a timestamp, used during expiry so that the oldest tokens
             are the ones pushed out
 word        the token itself
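
The five columns above can be pulled apart mechanically. This is a rough Python sketch for reading one line of the dump; the names TokenRow and parse_dump_line are invented for this example and are not part of SpamAssassin, and it assumes every data line has exactly the five whitespace-separated columns shown in this thread:

```python
from typing import NamedTuple


class TokenRow(NamedTuple):
    prob: float       # calculated spam probability (0.5 = unknown)
    spam_count: int   # times seen in spam training
    ham_count: int    # times seen in nonspam training
    atime: int        # last access time, seconds since 1970
    token: str        # the token itself


def parse_dump_line(line: str) -> TokenRow:
    # Split on runs of whitespace, at most four times, so that a token
    # containing spaces would stay in one piece as the fifth field.
    prob, nspam, nham, atime, token = line.split(None, 4)
    return TokenRow(float(prob), int(nspam), int(nham), int(atime), token.rstrip())


row = parse_dump_line("0.029  0  2 1071094490  word")
print(row.token, "looks hammy" if row.prob < 0.5 else "looks spammy or unknown")
```

Lines that do not follow the five-column layout will raise a ValueError here, so a real script would want to skip or report them.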


