Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Sunday, August 15, 2004, 9:31:38 AM, you wrote:

DJ> Some time around 08/15/2004 09:23:46, I think I heard MikeD (3) say:

>> Hello Andre, Sunday, August 15, 2004, 6:44:17 AM, you wrote:

AW> Have you deleted your spam and non-spam dictionary files when you
AW> upgraded?

>> Funny, that. When I first upgraded I did not and it seemed to work
>> fine ... until I rebooted.

DJ> Strange... rebooting shouldn't affect anything...

Well, I am guessing that because I had been running the old version of BayesIt earlier in the day, it continued to use that until I rebooted. It is the only thing that I can think of that makes sense.

>> After that, yes, I deleted all the dict files I could find.
>> Apparently there were two sets, one from the old version and one
>> set from the new.

DJ> I had to do the same thing when upgrading from v0.4gm to
DJ> v0.5.4 because I was having problems.

>> I then re-trained it on the accumulated spam and ham folders I have
>> with about 2,000 messages each. BTW, if I gave BayesIt all 2,000
>> messages at once to chew on, it would hang. If I gave it in chunks
>> it seemed to work OK <shrug>

DJ> Hum... after deleting the dict files, I trained normally with
DJ> lots of spam/non-spam messages (I'm pretty sure it was more than
DJ> 2,000) without a problem. So I don't know what could have
DJ> happened in your case (?)

DJ> I personally find BayesIt extremely powerful, accurate, and
DJ> fast (I come from POPFile, with an accuracy of 99.6 % which
DJ> required a LOT of manual tuning, had quite some false positives,
DJ> and was VERY slow...), but what it misses it *really* misses (0%,
DJ> as opposed to some mid-way value).

I have used several 0.4 versions and they worked great, so I am guessing that I just need to 'fix' a setting somewhere ... or at least I hope that is it <g>

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000

Current version is 2.12.00 | 'Using TBUDL' information: http://www.silverstones.com/thebat/TBUDLInfo.html
Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Sunday, August 15, 2004, 10:20:52 AM, you wrote:

DJ> Some time around 08/15/2004 09:24:56, I think I heard MikeD (3) say:

DJ> I was too. I just upgraded yesterday to 0.5.9 and I haven't
DJ> noticed a difference. It does provide a white/black list, which I
DJ> don't care to use because it defeats the purpose of a Bayesian
DJ> filter (there's huge discussion -- more like religious wars --
DJ> about this on the POPFile list hehe). Also, the kludges.txt file
DJ> doesn't seem to be implemented either (ignore list for headers).

>> That's too bad <sigh>

DJ> I just learned (by re-reading a babelfished translation of
DJ> the Russian BayesIt page) that the kludges file (whitelist of
DJ> kludges) does seem to work, except I misunderstood it. I thought
DJ> it worked like POPFile's ignore list, which ignores the
DJ> specified tokens when computing the probability of a message. But
DJ> it is not a list of just tokens, it is a list of header names
DJ> that will be ignored. For example, if you put in the list:

DJ>     message-id
DJ>     x-mailer
DJ>     subject

DJ> It will ignore the values of headers that start with those
DJ> strings. This is very useful, though.

DJ> I wonder, is the ignore list in the black/white list rules
DJ> window what I confused the kludges list for? i.e. is it akin to
DJ> the POPFile ignore list? Anybody know?

Hmmm ... does it just ignore those 'lines' in the header? If so, I don't think that will be a problem for me. My Kludges contains:

    x-spam-checker-version
    x-spam-level
    x-spam-report
    x-spam-status
    x-uidl

And I don't think any of those are causing a problem.

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000
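To make the kludges behaviour described above concrete, here is a minimal Python sketch of prefix-based header filtering. It is only an illustration of the idea as DZ-Jay describes it (headers whose names start with a listed string are skipped); the function name and the case-insensitive matching are assumptions, not BayesIt's actual implementation.

```python
def strip_ignored_headers(headers, kludges):
    """Return the headers whose names do NOT start with any kludge prefix.

    headers: list of (name, value) pairs.
    kludges: iterable of header-name prefixes (hypothetical kludges.txt lines).
    Matching is case-insensitive here; that is an assumption.
    """
    prefixes = tuple(k.strip().lower() for k in kludges if k.strip())
    return [
        (name, value)
        for name, value in headers
        if not name.lower().startswith(prefixes)
    ]


# The kludges list quoted in the message above.
kludges = [
    "x-spam-checker-version",
    "x-spam-level",
    "x-spam-report",
    "x-spam-status",
    "x-uidl",
]

headers = [
    ("X-Spam-Level", "***"),
    ("Subject", "hello"),
    ("X-UIDL", "abc123"),
    ("From", "someone@example.com"),
]

kept = strip_ignored_headers(headers, kludges)
```

With this list, only the `Subject` and `From` headers would still contribute tokens, which fits MikeD's guess that his entries should be harmless.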
Re[2]: Help with BayesIt tuning
Hello George,

Sunday, August 15, 2004, 11:35:29 AM, you wrote:

GM> DZ-Jay wrote:

DJ> Some time around 08/15/2004 11:13:49, I think I heard Stuart Cuddy say:

>> What is Graham? What is Spam-grade?

DJ> AFAIK, spam-grade would be the probability of it being spam, and
DJ> Graham, I suppose, means the probability of it being not-spam (I
DJ> suppose, non-spam-grade ham-grade graham ?)

GM> It might be coincidence, but Paul Graham has written much about
GM> Bayesian filtering. I'd guess it has something to do with his
GM> methodology. Even if I'm wrong, there's some interesting reading at:
GM> http://www.paulgraham.com/antispam.html

Yes, Paul uses a slightly modified algorithm from the original Bayes. So does that mean it is calculating using both algorithms to create two values?

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000
Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Sunday, August 15, 2004, 1:47:45 PM, you wrote:

>> It might be coincidence, but Paul Graham has written much about
>> Bayesian filtering. I'd guess it has something to do with his
>> methodology. Even if I'm wrong, there's some interesting reading at:
>> http://www.paulgraham.com/antispam.html

DJ> Thanx for the info... that would make more sense, although
DJ> how come the spam-grade and graham values coincide in all messages
DJ> without exception? I guess I'll ask Alexey about it. In the
DJ> meantime, I'll check out the link you sent :)

Does Alexey not frequent this list? It would sure be helpful if he could answer directly. Does anyone know how we can continue this conversation directly with him?

-- 
Stuart mailto:[EMAIL PROTECTED]
Using The Bat! v2.13 Lucky Beta/5 on Windows 98 4.10 Build A
Re[2]: Help with BayesIt tuning
Hello Andre,

Sunday, August 15, 2004, 6:44:17 AM, you wrote:

AW> Hello MikeD,
AW> On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW> Have you deleted your spam and non-spam dictionary files when you
AW> upgraded?

Funny, that. When I first upgraded I did not and it seemed to work fine ... until I rebooted.

After that, yes, I deleted all the dict files I could find. Apparently there were two sets, one from the old version and one set from the new.

I then re-trained it on the accumulated spam and ham folders I have with about 2,000 messages each. BTW, if I gave BayesIt all 2,000 messages at once to chew on, it would hang. If I gave it in chunks it seemed to work OK <shrug>

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000
Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Sunday, August 15, 2004, 8:12:23 AM, you wrote:

DJ> Some time around 08/14/2004 22:24:58, I think I heard MikeD (3) say:

>> What settings are you using? Under the old version (0.4gm) I had it
>> trained and was getting most spam caught, no false positives with a
>> Move message setting of 10. Now I have gone down as low as 1 and as
>> high as 99 without success.

DJ> I started with the move message setting at 40 and continued
DJ> to lower it without noticing any effect. That's when I checked
DJ> the BAYESIT.LOG file and realized that all messages are marked
DJ> with either 100/99 % or 0% probability, which means that no matter
DJ> how low I set the parameter, it will continue working the same. I
DJ> don't understand how come there is no gray area, with messages
DJ> marked with a, say, 30% probability, etc. I do not get any false
DJ> positives at all, but I do get about 4% of false negatives...

At the moment, everything in the log is .99. Nothing has any other value. Does that sound right?

>> BTW, I am using the 0.5.5 version that came with 2.12. Should I be
>> using the newer version that I saw mentioned?

DJ> I was too. I just upgraded yesterday to 0.5.9 and I haven't
DJ> noticed a difference. It does provide a white/black list, which I
DJ> don't care to use because it defeats the purpose of a Bayesian
DJ> filter (there's huge discussion -- more like religious wars --
DJ> about this on the POPFile list hehe). Also, the kludges.txt file
DJ> doesn't seem to be implemented either (ignore list for headers).

That's too bad <sigh>

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000
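DZ-Jay's point about the threshold can be checked with a few lines of Python. The sketch below assumes, purely for illustration, that a message is moved when its score in percent is greater than or equal to the "Move message" setting; the thread does not state the plugin's exact comparison, and the function name and sample scores are hypothetical.

```python
def moved_count(scores_percent, threshold_percent):
    """Count messages whose score meets the threshold.

    Assumption: a message is moved when score >= threshold.
    """
    return sum(1 for s in scores_percent if s >= threshold_percent)


# Hypothetical log: every message scored either 0 % or 99/100 %.
scores_percent = [0, 99, 99, 0, 100, 0, 99]

# Try the thresholds mentioned in the thread.
counts = {t: moved_count(scores_percent, t) for t in (1, 10, 40, 99)}
```

Every threshold from 1 to 99 moves the same four messages, so when the scores have no middle ground the setting cannot change the outcome.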
Re[2]: Help with BayesIt tuning
Sunday, August 15, 2004, 7:44:17 AM, you wrote:

AW> Hello MikeD,
AW> On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW> Have you deleted your spam and non-spam dictionary files when you
AW> upgraded?

What are their names and where are they?
Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Sunday, August 15, 2004, 9:25:14 AM, you wrote:

DJ> However, I recently noticed why some obviously spam messages
DJ> are given a probability of 0%: Apparently the analysis engine is
DJ> regarding a few empty tokens with a value of 0%, which
DJ> unspamifies the final value, for example, in my log file, I get
DJ> this in some messages:

I am not seeing the empty tokens, but the following message is being received without being caught. I sent it again to myself about 5 or 6 times and marked it as junk each time. The values do not seem to change at all.

What is Graham? What is Spam-grade?

    [EMAIL PROTECTED]
    Graham: 7.59688e-029
    Spam-grade: 7.59688e-029
    Value for The Bat!: 0
    : ---
    biz: 0.01
    --: 0.0212766
    size: 0.01
    Advance: 0.01
    H this: 0.058463
    partners: 0.01
    Today: 0.01
    H PLease: 0.01
    H de: 0.0359281
    Career: 0.01
    text: 0.01
    experience: 0.0133407
    aol: 0.01
    Verdana: 0.01
    past: 0.01

-- 
Stuart mailto:[EMAIL PROTECTED]
Using The Bat! v2.13 Lucky Beta/5 on Windows 98 4.10 Build A
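One way to read the log above is to feed the fifteen per-token values into the combining formula from Paul Graham's "A Plan for Spam", which multiplies the token probabilities and normalises against the product of their complements. The Python sketch below does that; whether BayesIt computes its "Graham" value exactly this way is an assumption, and the function name is made up for illustration.

```python
from math import prod


def graham_combine(probs):
    """Combine per-token spam probabilities as in Graham's 'A Plan for Spam':

        P = prod(p) / (prod(p) + prod(1 - p))
    """
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)


# The fifteen token values from the log excerpt, in order.
token_probs = [
    0.01, 0.0212766, 0.01, 0.01, 0.058463,
    0.01, 0.01, 0.01, 0.0359281, 0.01,
    0.01, 0.0133407, 0.01, 0.01, 0.01,
]

score = graham_combine(token_probs)
print(f"{score:.5e}")
```

Under that assumption the result comes out very close to the logged 7.59688e-029, which suggests the "Graham" line is this combined score rather than a separate ham probability. It also shows why the scores pile up at the extremes: fifteen values near 0.01 multiply to something vanishingly small, so marking the message as junk a few times will not move the total until the individual token values themselves shift well away from 0.01.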
Re[2]: Help with BayesIt tuning
Sunday, August 15, 2004, 11:11:00 AM, you wrote:

DJ> Some time around 08/15/2004 10:52:14, I think I heard Pete Holsberg say:

>> Sunday, August 15, 2004, 7:44:17 AM, you wrote:

AW> Hello MikeD,
AW> On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW> Have you deleted your spam and non-spam dictionary files when you
AW> upgraded?

>> What are their names and where are they?

DJ> Their names are spamdict.* and nspamdict.* and they are located in a directory
DJ> called base within the BayesIt working directory, which is normally either:

DJ>     <TB! installation dir>\BayesIt\base
DJ> or
DJ>     <TB! installation dir>\MAIL\BayesIt\base

??? Mine are in

    C:\Documents and Settings\pjh\Application Data\BayesIt\base

TB is in C:\Program Files\The Bat!\thebat.exe and BayesIt is in C:\Program Files\BayesIt under Windows 2000.

Is this significant?
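Since the dictionaries can evidently live in more than one place, a small script can help locate every set before deleting anything. This Python sketch only searches directories you pass in for the `spamdict.*` and `nspamdict.*` names mentioned above; the function name is hypothetical, and the example path is the one Pete reports for his own machine, so substitute your own.

```python
from pathlib import Path


def find_dictionaries(candidate_dirs):
    """Return all spamdict.* and nspamdict.* files found in the given dirs.

    Directories that do not exist are skipped silently.
    """
    found = []
    for directory in candidate_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for pattern in ("spamdict.*", "nspamdict.*"):
            found.extend(sorted(base.glob(pattern)))
    return found


if __name__ == "__main__":
    # Example location taken from the message above; adjust for your system.
    candidates = [
        r"C:\Documents and Settings\pjh\Application Data\BayesIt\base",
    ]
    for path in find_dictionaries(candidates):
        print(path)
```

Listing the files first makes it easier to spot a leftover set from an older version, which is what MikeD ran into.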
Re[2]: Help with BayesIt tuning
Saturday, August 14, 2004, 12:27:41 PM, you wrote:

AW> Hello DZ-Jay,
AW> On 14 Aug 2004 at 11:30:32 -0400 GMT [17:30 CEST] you wrote:

DJ> Yes, I am aware of its definition, but what I don't understand
DJ> is what would be the effect of changing it to, say, 1.2 from 1.5
DJ> (apart from the academic answer of making non-spam tokens a bit less
DJ> heavy). How does the plugin use this value?

AW> Assume a word occurs equally often in spam and non-spam mails. If you
AW> set the value to 1 the word will get a spam probability of 0.5. If you
AW> set it to a higher value the word will get something lower than 0.5.
AW> Words in non-spam mails just count more and you can set just how much
AW> more.

Where do you do the setting???
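Andre's explanation can be turned into a short worked example. The Python sketch below uses a Graham-style per-token formula in which the non-spam frequency is multiplied by a weight before the ratio is taken. The function and parameter names are illustrative, and the exact formula BayesIt uses is not given in the thread, so treat this as one plausible reading.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham, ham_weight=1.0):
    """Spam probability of one token with non-spam occurrences weighted.

    spam_count / ham_count: how often the token appears in each corpus.
    n_spam / n_ham: number of messages in each corpus.
    ham_weight: multiplier for non-spam evidence (1.0 means no bias).
    Assumes the token appears at least once in one of the corpora.
    """
    spam_freq = spam_count / n_spam
    ham_freq = ham_weight * ham_count / n_ham
    return spam_freq / (spam_freq + ham_freq)


# A word seen equally often in 100 spam and 100 non-spam messages.
for weight in (1.0, 1.2, 1.5):
    p = token_spam_probability(10, 10, 100, 100, ham_weight=weight)
    print(f"weight {weight}: {p:.4f}")
```

For this evenly split word, a weight of 1 gives 0.5, 1.2 gives about 0.45, and 1.5 gives 0.4. Lowering the value from 1.5 to 1.2 therefore nudges borderline tokens back toward the spam side.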
Re[2]: Help with BayesIt tuning
Saturday, August 14, 2004, 2:37:03 PM, you wrote:

DJ> Some time around 08/14/2004 12:34:07, I think I heard Pete Holsberg say:

>> Where do you do the setting???

DJ> In a file called ADVANCED.INI in the BayesIt working directory, or
DJ> in the TB! installation directory.

Not found anywhere on either HD! Can it be created manually??
Re[2]: Help with BayesIt tuning
Hello All,

I have been following this thread since I have been having some problems too. I was using the old version (0.4gm) until I upgraded to the current version of TB. The settings I used to use don't seem to work any more: either everything is filtered as junk or nothing is. I trained it with about 2000 spam and 2000 ham messages and still no joy. I have tried low and high threshold numbers without much difference.

Is there a good getting started file somewhere that I have just missed?

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000
Re[2]: Help with BayesIt tuning
Hello DZ-Jay,

Saturday, August 14, 2004, 5:31:59 PM, you wrote:

DJ> Some time around 08/14/2004 15:47:24, I think I heard MikeD say:

>> The settings I used to use don't seem to work any more and I either
>> get everything filtered as junk or nothing is filtered as junk. I
>> trained it with about 2000 spam and 2000 ham messages and still no
>> joy. I have tried low threshold numbers and high without much
>> difference.

DJ> That's pretty much what I get: messages are either
DJ> COMPLETELY spam (99 or 100 % probability) or COMPLETELY not-spam
DJ> (0% probability). Although mine seems to catch most (~97%) of
DJ> spam, out of a few hundred emails daily, so it's not that bad. And
DJ> that's with the default settings. I'm trying to tune it to get it
DJ> a bit higher in accuracy, if possible, but can't seem to get much
DJ> help on this subject :(

What settings are you using? Under the old version (0.4gm) I had it trained and was getting most spam caught, no false positives with a Move message setting of 10. Now I have gone down as low as 1 and as high as 99 without success.

BTW, I am using the 0.5.5 version that came with 2.12. Should I be using the newer version that I saw mentioned?

-- 
Best regards,
MikeD mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build 3000