[SAtalk] Dump bayes db please explain the columns
Hi,

I've made a dump of my Bayes DB, but I don't know exactly what the columns mean. Please explain them.

Thanks,
Andrew

---
The SF.Net email is sponsored by EclipseCon 2004 Premiere Conference on Open Tools Development and Integration. See the breadth of Eclipse activity. February 3-5 in Anaheim, CA. http://www.eclipsecon.org/osdn
___
Spamassassin-talk mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
Re: [SAtalk] Dump bayes db please explain the columns
On January 19, 2004 04:25 am, Mrvka Andreas wrote:
> hi, i've made a dump of my bayes db but i don't know exactly the columns.
> please explain them.

I ran into the same problem and was unable to find any documentation, but here is my guess at what the columns mean:

    sa-learn --dump data | sort -n > /tmp/asdf

I sorted the output for a reason:

    0.995 10 0 1074305205 U*p6618-qp2sam
    0.995 10 0 1074305205 sk:p6618-q
    0.995 10 0 1074308138 270
    0.995 10 0 1074308138 avoiding
    0.995 10 0 1074308138 elsewhere
    0.995 10 0 1074310744 Forfeiture
    0.995 10 0 1074310744 Notify
    0.995 10 0 1074310744 g2.gif

1st: low equals hammy, high equals spammy
2nd: roughly equal to the # of occurrences of that particular token learnt as spam
3rd: roughly equal to the # of occurrences of that particular token learnt as ham
4th: # of seconds since 1970 (the Unix tradition of measuring time)
5th: the token itself

Disclaimer: I'm not a developer ...

Pedro

-- 
If I had any humility I would be perfect. -- Ted Turner
Re: [SAtalk] Dump bayes db please explain the columns
Hi,

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Monday, 19 January 2004 11:20

> I ran into the same problem, and was unable to find any documentation ...
> but here is my guess of what the columns mean:
>
> sa-learn --dump data | sort -n > /tmp/asdf
>
> I sorted the output for a reason:
>
> 0.995 10 0 1074305205 U*p6618-qp2sam
> 0.995 10 0 1074305205 sk:p6618-q
> 0.995 10 0 1074308138 270
> 0.995 10 0 1074308138 avoiding
> 0.995 10 0 1074308138 elsewhere
> 0.995 10 0 1074310744 Forfeiture
> 0.995 10 0 1074310744 Notify
> 0.995 10 0 1074310744 g2.gif
>
> 1st: low equals hammy, high equals spammy
> 2nd: roughly equal # of occurrence of that particular token learnt as spam
> 3rd: roughly equal # of occurrence of that particular token learnt as ham
> 4th: # of seconds since 1970 ... (a Unix tradition of measuring time in # of seconds since 1970)
> 5th: the token itself.
>
> Disclaimer: I'm not a developer ...
>
> Pedro

I agree with you. I would call the 2nd and 3rd columns "multipliers". If a token is matched as spam (e.g. 0.995 10 0 1074310744 Notify), then I multiply the 2nd column (10) by 0.995 and add the result to the running Bayes value. If the token is matched as ham, I multiply by the 3rd column instead and subtract the result from the Bayes value. Do you know what I mean?

I hope it works the way I described, and that I can add or remove tokens if I wish.

Regards,
Andrew
Re: [SAtalk] Dump bayes db please explain the columns
On Mon, Jan 19, 2004 at 05:19:46AM -0500, Pedro Sam wrote:
> I ran into the same problem, and was unable to find any documentation ...
> but here is my guess of what the columns mean:

Yeah, the dumps weren't meant for general viewing (more of a debug thing), so they're not very verbose.

> 0.995 10 0 1074305205 U*p6618-qp2sam
> 0.995 10 0 1074305205 sk:p6618-q
> 0.995 10 0 1074308138 270
> 0.995 10 0 1074308138 avoiding
> 0.995 10 0 1074308138 elsewhere
> 0.995 10 0 1074310744 Forfeiture
> 0.995 10 0 1074310744 Notify
> 0.995 10 0 1074310744 g2.gif
>
> 1st: low equals hammy, high equals spammy

Probability of spam. 0.5 is unknown, below is hammy, above is spammy.

> 2nd: roughly equal # of occurrence of that particular token learnt as spam

Actually exactly equal.

> 3rd: roughly equal # of occurrence of that particular token learnt as ham

Also exactly equal.

> 4th: # of seconds since 1970 ... (a Unix tradition of measuring time in # of seconds since 1970)

Yep. It's atime, aka the last time the token was accessed (either learning or scanning). Generally useless except during expiry.

> 5th: the token itself.

Yep.

-- 
Randomly Generated Tagline: If the future navigation system [for interactive networked services on the NII] looks like something from Microsoft, it will never work. (Chairman of Walt Disney Television Telecommunications)
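For anyone who wants to read such a dump from a script, here is a minimal Python sketch based only on the column meanings confirmed above. The names DumpEntry, parse_dump_line and leaning are hypothetical, not part of SpamAssassin, and the sketch assumes the five whitespace-separated fields shown in this thread.

```python
from typing import NamedTuple


class DumpEntry(NamedTuple):
    """One line of the dump (hypothetical helper type, not a SpamAssassin API)."""
    prob: float       # 1st column: probability of spam
    spam_count: int   # 2nd column: times the token was learnt as spam
    ham_count: int    # 3rd column: times the token was learnt as ham
    atime: int        # 4th column: last access time, seconds since 1970
    token: str        # 5th column: the token itself


def parse_dump_line(line: str) -> DumpEntry:
    # Split at most four times so the fifth field stays in one piece
    # even if it happens to contain spaces (an assumption, for safety).
    prob, nspam, nham, atime, token = line.split(None, 4)
    return DumpEntry(float(prob), int(nspam), int(nham), int(atime), token.rstrip("\n"))


def leaning(entry: DumpEntry) -> str:
    # Per the reply above: 0.5 is unknown, below is hammy, above is spammy.
    if entry.prob > 0.5:
        return "spammy"
    if entry.prob < 0.5:
        return "hammy"
    return "unknown"


entry = parse_dump_line("0.995 10 0 1074310744 Notify")
print(entry.token, leaning(entry))
```

The counts are kept as plain integers because, as noted above, they are exact occurrence counts rather than weights.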
Re: [SAtalk] Dump bayes db please explain the columns
At 04:25 AM 1/19/2004, Mrvka Andreas wrote:
> hi, i've made a dump of my bayes db but i don't know exactly the columns.
> please explain them. thanks. Andrew

Let's use this fictitious example line:

    0.029 0 2 1071094490 word

The above line indicates:

0.029       the calculated spam probability for this token is 0.029 (aka 2.9%)
0           this token has been seen 0 times in spam training
2           this token has been seen 2 times in nonspam training
1071094490  a timestamp, used when doing expiry so that the oldest tokens are the ones pushed out
word        the token itself
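The reading of the fictitious line above can be reproduced with a few lines of Python. This is only an illustrative sketch: describe is a hypothetical helper name, and rendering the timestamp in UTC is an assumption made here for readability.

```python
from datetime import datetime, timezone


def describe(line: str) -> str:
    """Turn one dump line into a sentence (hypothetical helper, not part of sa-learn)."""
    # The five columns: probability, spam count, ham count, timestamp, token.
    prob, nspam, nham, atime, token = line.split(None, 4)
    # The 4th column is seconds since 1970; UTC is chosen here as an assumption.
    when = datetime.fromtimestamp(int(atime), tz=timezone.utc)
    return (
        f"{token.strip()}: {float(prob):.1%} spam probability, "
        f"seen {int(nspam)}x in spam and {int(nham)}x in ham, "
        f"timestamp {when:%Y-%m-%d %H:%M:%S} UTC"
    )


print(describe("0.029 0 2 1071094490 word"))
# word: 2.9% spam probability, seen 0x in spam and 2x in ham, timestamp 2003-12-10 22:14:50 UTC
```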