Re[2]: Help with BayesIt tuning

2004-08-16 Thread MikeD (3)
Hello DZ-Jay,

Sunday, August 15, 2004, 9:31:38 AM, you wrote:

DJ Some time around 08/15/2004 09:23:46, I think I heard MikeD (3) say:
 Hello Andre,

 Sunday, August 15, 2004, 6:44:17 AM, you wrote:

AW Have you deleted your spam and non-spam dictionary files when you
AW upgraded?

 Funny, that.  When I first upgraded I did not and it seemed to work
 fine ... until I rebooted.

DJ Strange... rebooting shouldn't affect anything...

Well I am guessing that because I had been running the old version of
Bayesit earlier in the day, that it continued to use that until I
rebooted.  It is the only thing that I can think of that makes sense.

 After that, yes, I deleted all the dict files I could find.
 Apparently there were two sets, one from the old version and one set
 from the new.

DJ I had to do the same thing when upgrading from v0.4gm to
DJ v0.5.4 because I was having problems.

 I then re-trained it on the accumulated spam and ham folders I have
 with about 2,000 messages each.  BTW, if I gave Bayesit all 2,000
 messages at once to chew on, it would hang.  If I gave it in
 chunks it seemed to work OK *shrug*

DJ Hum... after deleting the dict files, I trained normally with
DJ lots of spam/non-spam messages (I'm pretty sure it was more than
DJ 2,000) without a problem.  So I don't know what could have
DJ happened in your case (?)

DJ I personally find BayesIt extremely powerful, accurate, and
DJ fast (I come from POPFile, with an accuracy of 99.6 % which
DJ required a LOT of manual tuning, had quite some false positives,
DJ and was VERY slow...), but what it misses it *really* misses (0%,
DJ as opposed to some mid-way value).

I have used several 0.4 versions and they worked great, so I am
guessing that I just need to 'fix' a setting somewhere ... or at least
I hope that is it *g*

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 



Current version is 2.12.00 | 'Using TBUDL' information:
http://www.silverstones.com/thebat/TBUDLInfo.html


Re[2]: Help with BayesIt tuning

2004-08-16 Thread MikeD (3)
Hello DZ-Jay,

Sunday, August 15, 2004, 10:20:52 AM, you wrote:

DJ Some time around 08/15/2004 09:24:56, I think I heard MikeD (3) say:
DJ I was too.  I just upgraded yesterday to 0.5.9 and I haven't
DJ noticed a difference.  It does provide a white/black list, which I
DJ don't care to use because it defeats the purpose of a Bayesian
DJ filter (there's huge discussion -- more like religious wars --
DJ about this on the POPFile list hehe).  Also, the kludges.txt file
DJ doesn't seem to be implemented either (ignore list for headers).

 That's too bad *sigh*

DJ I just learned (by re-reading a babelfished translation of
DJ the Russian BayesIt page) that the kludges file (whitelist of
DJ kludges) does seem to work, except I misunderstood it.  I thought
DJ it worked like POPFile's ignore list, which ignores the
DJ specified tokens when computing the probability of a message.  But
DJ it is not a list of just tokens, it is a list of header names
DJ that will be ignored, for example, if you put in the list:

DJ message-id
DJ x-mailer
DJ subject

DJ It will ignore the values of headers that start with those
DJ strings.  This is very useful, though.
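
As a rough illustration of the prefix matching described above, the kludges list can be pictured as a filter applied to the headers before they are tokenized. This is not BayesIt's actual code: the function name and the sample headers are made up, and case-insensitive matching is an assumption.

```python
def drop_kludged_headers(headers, kludges):
    """Return only the headers whose names do not start with a kludge prefix.

    headers: list of (name, value) pairs.
    kludges: iterable of header-name prefixes, e.g. the lines of a kludges file.
    Matching is case-insensitive here, which is an assumption.
    """
    prefixes = tuple(k.strip().lower() for k in kludges if k.strip())
    return [(name, value) for name, value in headers
            if not name.lower().startswith(prefixes)]


# Invented sample headers, for illustration only.
sample_headers = [
    ("Message-ID", "abc123@example.com"),
    ("X-Mailer", "The Bat! (v2.12.00)"),
    ("Subject", "Cheap watches"),
    ("From", "someone@example.com"),
]
kept = drop_kludged_headers(sample_headers, ["message-id", "x-mailer", "subject"])
print(kept)
```

With the three entries from the example list, only the From header would remain for analysis.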

DJ I wonder, is the ignore list in the black/white list rules
DJ window what I confused the kludges list for? i.e. is it akin to
DJ the POPFile ignore list?  Anybody know?

Hmmm ... does it just ignore those 'lines' in the header?  If so, I
don't think that will be a problem for me.  My Kludges contains:

x-spam-checker-version
x-spam-level
x-spam-report
x-spam-status
x-uidl

And I don't think any of those are causing a problem.

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 





Re[2]: Help with BayesIt tuning

2004-08-16 Thread MikeD (3)
Hello George,

Sunday, August 15, 2004, 11:35:29 AM, you wrote:

GM DZ-Jay wrote:

DJ Some time around 08/15/2004 11:13:49, I think I heard Stuart Cuddy say:

What is Graham?
What is Spam-grade?

DJ AFAIK, spam-grade would be the probability of it being spam, and
DJ Graham, I suppose, means the probability of it being not-spam (I
DJ suppose, non-spam-grade -> ham-grade -> graham?)

GM It might be coincidence, but Paul Graham has written much about
GM Bayesian filtering.  I'd guess it has something to do with his
GM methodology.  Even if I'm wrong, there's some interesting reading at:

GM http://www.paulgraham.com/antispam.html


Yes, Paul uses a slightly modified algorithm from the original Bayes.
So does that mean it is calculating using both algorithms to create
two values?

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 





Re[2]: Help with BayesIt tuning

2004-08-16 Thread Stuart Cuddy
Hello DZ-Jay,
Sunday, August 15, 2004, 1:47:45 PM, you wrote:

 It might be coincidence, but Paul Graham has written much about
 Bayesian filtering.  I'd guess it has something to do with his
 methodology.  Even if I'm wrong, there's some interesting reading at:

 http://www.paulgraham.com/antispam.html

DJ Thanx for the info... that would make more sense, although
DJ how come the spam-grade and graham values coincide in all messages
DJ without exception?  I guess I'll ask Alexey about it.  In the
DJ meantime, I'll check out the link you sent :)

Does Alexey not frequent this list?  It would sure be helpful if he
could answer directly.

Does anyone know how we can continue this conversation directly with
him?


-- 
 Stuart  mailto:[EMAIL PROTECTED]
Using The Bat! v2.13 Lucky Beta/5 on Windows 98 4.10 Build   A 





Re[2]: Help with BayesIt tuning

2004-08-15 Thread MikeD (3)
Hello Andre,

Sunday, August 15, 2004, 6:44:17 AM, you wrote:

AW Hello MikeD,

AW On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW Have you deleted your spam and non-spam dictionary files when you
AW upgraded?

Funny, that.  When I first upgraded I did not and it seemed to work
fine ... until I rebooted.

After that, yes, I deleted all the dict files I could find.
Apparently there were two sets, one from the old version and one set
from the new.

I then re-trained it on the accumulated spam and ham folders I have
with about 2,000 messages each.  BTW, if I gave Bayesit all 2,000
messages at once to chew on, it would hang.  If I gave it in
chunks it seemed to work OK *shrug*

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 





Re[2]: Help with BayesIt tuning

2004-08-15 Thread MikeD (3)
Hello DZ-Jay,

Sunday, August 15, 2004, 8:12:23 AM, you wrote:

DJ Some time around 08/14/2004 22:24:58, I think I heard MikeD (3) say:
 What settings are you using?  Under the old version (0.4gm) I had it
 trained and was getting most spam caught, no false positives with a
 Move message setting of 10.  Now I have gone down as low as 1 and as
 high as 99 without success.

DJ I started with the move message setting at 40 and continued
DJ to lower it without noticing any effect.  That's when I checked
DJ the BAYESIT.LOG file and realized that all messages are marked
DJ with either 100/99 % or 0% probability, which means that no matter
DJ how low I set the parameter, it will continue working the same.  I
DJ don't understand how come there is no gray area, with messages
DJ marked with a, say, 30% probability, etc.  I do not get any false
DJ positives at all, but I do get about 4%  of false negatives...
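
The effect described above can be reproduced with a toy example. This is only a sketch: it assumes a message is moved when its score is at or above the "move message" setting, and the scores are invented to mimic a log in which everything is 0%, 99% or 100%.

```python
def moved_messages(scores, threshold):
    """Indices of messages whose spam score (0-100) meets the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


# Invented scores with no gray area, like the log described above.
scores = [0, 100, 99, 0, 0, 100, 99, 0]

# Try several "move message" settings and compare which messages get moved.
results = {t: moved_messages(scores, t) for t in (1, 10, 40, 99)}
for threshold, moved in results.items():
    print(threshold, moved)
```

Every setting from 1 to 99 moves the same four messages here, so the threshold only starts to matter once some scores fall between the extremes.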

At the moment, everything in the log is .99.  Nothing has any other
value.  Does that sound right?

 BTW, I am using the 0.5.5 version that came with 2.12.  Should I be
 using the newer version that I saw mentioned?

DJ I was too.  I just upgraded yesterday to 0.5.9 and I haven't
DJ noticed a difference.  It does provide a white/black list, which I
DJ don't care to use because it defeats the purpose of a Bayesian
DJ filter (there's huge discussion -- more like religious wars --
DJ about this on the POPFile list hehe).  Also, the kludges.txt file
DJ doesn't seem to be implemented either (ignore list for headers).

That's too bad *sigh*


-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 





Re[2]: Help with BayesIt tuning

2004-08-15 Thread Pete Holsberg
Sunday, August 15, 2004, 7:44:17 AM, you wrote:

AW Hello MikeD,

AW On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW Have you deleted your spam and non-spam dictionary files when you
AW upgraded?

What are their names and where are they?


-- 





Re[2]: Help with BayesIt tuning

2004-08-15 Thread Stuart Cuddy
Hello DZ-Jay,
Sunday, August 15, 2004, 9:25:14 AM, you wrote:

DJ However, I recently noticed why some obviously spam messages
DJ are given a probability of 0%:  Apparently the analysis engine is
DJ regarding a few empty tokens with a value of 0%, which
DJ unspamifies the final value, for example, in my log file, I get
DJ this in some messages:

I am not seeing the empty tokens, but the following message is being
received without being caught.  I sent it again to myself about 5 or
6 times and marked it as junk each time.  The values do not seem to
change at all.

What is Graham?
What is Spam-grade?
   

[EMAIL PROTECTED]
Graham:  7.59688e-029
Spam-grade:  7.59688e-029
Value for The Bat!: 0
: ---
biz:  0.01
--:  0.0212766
size:  0.01
Advance:  0.01
H this:  0.058463
partners:  0.01
Today:  0.01
H PLease:  0.01
H de:  0.0359281
Career:  0.01
text:  0.01
experience:  0.0133407
aol:  0.01
Verdana:  0.01
past:  0.01
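
For what it is worth, the numbers in this log are consistent with the combining rule from Paul Graham's essay "A Plan for Spam": multiply the token probabilities, multiply their complements, and divide the first product by the sum of the two. The sketch below applies that rule to the fifteen token values listed above. That BayesIt computes its value this way is an assumption, and the function name is made up, but the result comes out at roughly 7.6e-29, close to the logged figure.

```python
import math

# Token probabilities copied from the log above, in the order listed.
token_probs = [
    0.01, 0.0212766, 0.01, 0.01, 0.058463,
    0.01, 0.01, 0.01, 0.0359281, 0.01,
    0.01, 0.0133407, 0.01, 0.01, 0.01,
]


def graham_combine(probs):
    """Combine per-token spam probabilities into one message probability."""
    spam = math.prod(probs)                  # evidence that the message is spam
    ham = math.prod(1.0 - p for p in probs)  # evidence that it is not
    return spam / (spam + ham)


combined = graham_combine(token_probs)
print(f"{combined:.5e}")
```

With fifteen tokens all near 0.01 the product collapses toward zero, which fits the all-or-nothing scores reported elsewhere in this thread.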

-- 
 Stuart  mailto:[EMAIL PROTECTED]
Using The Bat! v2.13 Lucky Beta/5 on Windows 98 4.10 Build   A 





Re[2]: Help with BayesIt tuning

2004-08-15 Thread Pete Holsberg
Sunday, August 15, 2004, 11:11:00 AM, you wrote:

DJ Some time around 08/15/2004 10:52:14, I think I heard Pete Holsberg say:
 Sunday, August 15, 2004, 7:44:17 AM, you wrote:

AW Hello MikeD,

AW On 14 Aug 2004 at 14:47:24 -0500 GMT [21:47 CEST] you wrote:

AW Have you deleted your spam and non-spam dictionary files when you
AW upgraded?

 What are their names and where are they?

DJ Their names are spamdict.* and nspamdict.* and they are located in a directory
DJ called base within the BayesIt working directory, which is normally either:

DJ TB! installation dir\BayesIt\base
DJ or
DJ TB! installation dir\MAIL\BayesIt\base


??? Mine are in C:\Documents and Settings\pjh\Application Data\BayesIt\base

TB is in C:\Program Files\The Bat!\thebat.exe and BayesIt is in
C:\Program Files\BayesIt under Windows 2000.

Is this significant?
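
Since the dictionaries evidently end up in different places depending on the setup, a small script can help locate them before anything is deleted. This is a sketch only: the file patterns spamdict.* and nspamdict.* come from the message quoted above, the helper name is made up, and the directories at the bottom are placeholders to replace with your own.

```python
from pathlib import Path


def find_dict_files(base_dirs):
    """List BayesIt dictionary files (spamdict.*, nspamdict.*) in the given dirs."""
    found = []
    for base in base_dirs:
        base = Path(base)
        if not base.is_dir():
            continue  # skip locations that do not exist on this machine
        for pattern in ("spamdict.*", "nspamdict.*"):
            found.extend(sorted(base.glob(pattern)))
    return found


if __name__ == "__main__":
    # Placeholder locations; substitute the real BayesIt "base" directories.
    candidates = [
        r"C:\Program Files\The Bat!\BayesIt\base",
        r"C:\Program Files\The Bat!\MAIL\BayesIt\base",
    ]
    for path in find_dict_files(candidates):
        print(path)
```

The script only lists files; deleting them is left as a deliberate manual step.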

-- 





Re[2]: Help with BayesIt tuning

2004-08-14 Thread Pete Holsberg
Saturday, August 14, 2004, 12:27:41 PM, you wrote:

AW Hello DZ-Jay,

AW On 14 Aug 2004 at 11:30:32 -0400 GMT [17:30 CEST] you wrote:

DJ Yes, I am aware of its definition, but what I don't understand
DJ is what would be the effect of changing it to, say, 1.2 from 1.5
DJ (apart from the academic answer of making non-spam tokens a bit less
DJ heavy).  How does the plugin use this value?

AW Assume a word occurs equally often in spam and non-spam mails. If you
AW set the value to 1 the word will get a spam probability of 0.5. If you
AW set it to a higher value the word will get something lower than 0.5.
AW Words in non-spam mails just count more and you can set just how much
AW more.
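
To put numbers on that explanation: in Paul Graham's scheme a token's spam probability is its relative frequency in spam divided by the sum of that frequency and a weighted non-spam frequency. The sketch below assumes BayesIt applies its setting as that weight; the function and parameter names are invented for illustration, and the counts are made up.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham, ham_weight=1.5):
    """Graham-style token probability with a weight on non-spam occurrences.

    ham_weight is a hypothetical stand-in for the setting under discussion.
    """
    spam_freq = spam_count / n_spam
    ham_freq = ham_weight * ham_count / n_ham
    return spam_freq / (spam_freq + ham_freq)


# A word seen equally often in 2000 spam and 2000 non-spam messages.
for weight in (1.0, 1.2, 1.5):
    p = token_spam_probability(50, 50, 2000, 2000, ham_weight=weight)
    print(weight, round(p, 3))
```

Under these assumptions an evenly split word scores 0.5 at a weight of 1 and 0.4 at 1.5, so lowering the value from 1.5 to 1.2 nudges such words up to about 0.45.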

Where do you do the setting???



-- 





Re[2]: Help with BayesIt tuning

2004-08-14 Thread Pete Holsberg
Saturday, August 14, 2004, 2:37:03 PM, you wrote:

DJ Some time around 08/14/2004 12:34:07, I think I heard Pete Holsberg say:

 Where do you do the setting???

DJ In a file called ADVANCED.INI in the BayesIt working directory, or 
DJ in the TB! installation directory.

Not found anywhere on either HD!

Can it be created manually??

-- 





Re[2]: Help with BayesIt tuning

2004-08-14 Thread MikeD
Hello All,

I have been following this thread since I have been having some
problems too.  I was using the old version (0.4gm) until I upgraded to
the current version of TB.

The settings I used to use don't seem to work any more and I either
get everything filtered as junk or nothing is filtered as junk.  I
trained it with about 2000 spam and 2000 ham messages and still no
joy.  I have tried low and high threshold numbers without much
difference.

Is there a good getting started file somewhere that I have
just missed?

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000





Re[2]: Help with BayesIt tuning

2004-08-14 Thread MikeD (3)
Hello DZ-Jay,

Saturday, August 14, 2004, 5:31:59 PM, you wrote:

DJ Some time around 08/14/2004 15:47:24, I think I heard MikeD say:
 The settings I used to use don't seem to work any more and I either
 get everything filtered as junk or nothing is filtered as junk.  I
 trained it with about 2000 spam and 2000 ham messages and still no
 joy.  I have tried low and high threshold numbers without much
 difference.

DJ That's pretty much what I get:  messages are either
DJ COMPLETELY spam (99 or 100 % probability) or COMPLETELY not-spam
DJ (0% probability).  Although mine seems to catch most (~97%) of
DJ spam, out of a few hundred emails daily, so it's not that bad.  And
DJ that's with the default settings.  I'm trying to tune it to get it
DJ a bit higher in accuracy, if possible, but can't seem to get much
DJ help on this subject :(

What settings are you using?  Under the old version (0.4gm) I had it
trained and was getting most spam caught, no false positives with a
Move message setting of 10.  Now I have gone down as low as 1 and as
high as 99 without success.

BTW, I am using the 0.5.5 version that came with 2.12.  Should I be
using the newer version that I saw mentioned?

-- 
Best regards,
 MikeD  mailto:[EMAIL PROTECTED]
Using The Bat! v2.12.00 on Windows ME 4.90 Build  3000
 


