Re[2]: problems with bayesian filter

2003-03-31 Thread Alexey N. Vinogradov
Hello, rhabib001. 
You wrote in mid:[EMAIL PROTECTED]


ryc One thing I don't understand about the log is that if multiple
ryc messages are downloaded, the log doesn't reflect this.  Is this a bug?

As I understand it, The Bat! can run several instances of the
filtering procedure at the same time. For this reason it gives every
call a number, usually starting from 1001. That is why I also write
this unique number into the log (it is unique within the current
The Bat! session). You can see these numbers at the beginning of every
line logged during a mail check. Lines logged for global plugin
functions like getname or getversion have no such number. Your log
combines the logs of the first version and the latest one, so there
are no such numbers at all at the beginning of the log.
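
To see which log lines belong to which message, the lines can be
grouped by that call number. Below is only a minimal sketch in Python.
The exact log format is an assumption here (an integer call number,
then whitespace, then the message text), the sample lines are made up
for illustration, and group_by_call is a hypothetical helper, not part
of the plugin.

```python
import re
from collections import defaultdict

# ASSUMPTION: a per-call line looks like "<call number> <message text>".
# The real plugin log may use a different layout.
CALL_LINE = re.compile(r"^(\d+)\s+(.*)$")

def group_by_call(lines):
    """Group log lines by their leading call number.

    Lines without a leading number (for example, global plugin
    functions) are collected under the key None.
    """
    groups = defaultdict(list)
    for line in lines:
        match = CALL_LINE.match(line)
        if match:
            groups[int(match.group(1))].append(match.group(2))
        else:
            groups[None].append(line)
    return dict(groups)

# Made-up sample: two filtering calls interleaved in one log.
sample = [
    "getversion called",
    "1001 checking message A",
    "1002 checking message B",
    "1001 message A done",
]
grouped = group_by_call(sample)
for call_id, entries in grouped.items():
    print(call_id, entries)
```

With several messages filtered at once, the lines of different calls
can be interleaved in the file, so grouping by number makes each
message's filtering readable on its own.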

 If any of them includes spam, then the real problem is in your
 regarding base - either you confused the spam and non-spam corpora
 when you created your regarding base, or your regarding base is not
 large enough yet.

ryc I have trained the good dictionary on 745 letters, but the spam
ryc dictionary on only 35 letters.  Could this be the problem (I have
ryc attached the regard.rdb file).

This is a feature of the method itself - you can examine it from a
mathematical viewpoint - the sizes of the spam and non-spam bases
(counted in letters) ought to be equal. Simply speaking, your base
knows very well what is not spam, but has only a relatively hazy idea
of what is spam. You need more spam for it to work, but that is a
general problem with this method of filtering! You can, of course,
download a spam base from somewhere, but the problem is that spam
differs from country to country. The main strength of this method is
that all users' regarding bases are different, because their ratings
also include knowledge of each user's own private mail. So it is very
hard for spammers to cheat many such filters simultaneously. On the
other hand, the spam base does not seem to differ that much from user
to user, because spam is a mass mailing. So you can ask a friend to
send you a lot of (real) spam and build a better base. Or you can take
just some of your good letters and build a new base with roughly equal
quantities of spam and non-spam.
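
The last option, taking only part of the good letters, could be
sketched roughly like this in Python. This is an illustration under
assumptions, not how the plugin builds its base: balance_corpora is a
hypothetical helper, and the letters are stand-in strings using the
745 and 35 counts from your message.

```python
import random

def balance_corpora(good, spam, seed=0):
    """Return (good_subset, spam_subset) of equal size.

    The larger corpus is cut down by random sampling to the size of
    the smaller one. The fixed seed only makes the choice repeatable.
    """
    rng = random.Random(seed)
    size = min(len(good), len(spam))
    return rng.sample(good, size), rng.sample(spam, size)

# Stand-in letters: 745 good and 35 spam, as in the question.
good_letters = [f"good-{i}" for i in range(745)]
spam_letters = [f"spam-{i}" for i in range(35)]

good_subset, spam_subset = balance_corpora(good_letters, spam_letters)
print(len(good_subset), len(spam_subset))  # 35 35
```

The price is that most of the good letters go unused, so collecting
more real spam is still the better way if you can do it.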

-- 
Sincerely,
 Alexey.
Using TB 1.63b7 on WinXP SP1 Corp + MUI RU, spelling by ORFO2002
   mailto:[EMAIL PROTECTED]



Current version is 1.62 | Using TBDEV information:
http://www.silverstones.com/thebat/TBUDLInfo.html


Re[2]: problems with bayesian filter

2003-03-30 Thread Alexey N. Vinogradov
Hello, rhabib001. 
You wrote in mid:[EMAIL PROTECTED]


ryc I have trained the good dictionary on 745 letters, but the spam
ryc dictionary on only 35 letters.  Could this be the problem (I have
ryc attached the regard.rdb file).

Wow!!! I will look at your file, but please, in future, send all big
attachments DIRECTLY to me, to my private address. This time you sent
it to everybody on this list. I think somebody will be angry with you
for that :)

I will answer a bit later...


-- 
Sincerely,
 Alexey.
Using TB 1.63b7 on WinXP SP1 Corp + MUI RU, spelling by ORFO2002
   mailto:[EMAIL PROTECTED]


