On Wed, 28 Feb 2007 16:03:08 +0100 Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:
> Tomasz Kojm wrote:
> > On Wed, 28 Feb 2007 15:21:52 +0100
> > Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:
> >
> >>>> I've noticed it too, in my port I have changed it to:
> >>>>
> >>>> if(!(iscntrl(buf[i]) || isprint(buf[i])) || !internat[buf[i] & 0xff])
> >>> This one is much worse because it will lead to many false negatives with
> >>> HTML and mail files.
> >>>
> >> yes so I've never posted it as official patch,
> >> btw I do the check for whole magic buffer (150?) to be more reliable
> >> also I've noticed the internat table is quite different from the one in
> >> file (magic) utility.
> >
> > In your case checking more data will only increase the chance for a false
> > negative. After your change the first condition (i.e. !(iscntrl(buf[i]) ||
> > isprint(buf[i]))) will disqualify LOTS (more than 100 for sure) of
> > characters which can be valid international chars.
> >
>
> So what can we use as a better (or at least optimal) way to guess the
> kind of data (rather than having an always true/false check)? isprint

First of all, you should drop your change, which is erroneous. For now I'd
strongly suggest classifying all unknown data as CL_TYPE_UNKNOWN_TEXT. We
will address this issue in the near future and, depending on the results of
regression testing, decide which way to go.

-- 
Tomasz Kojm <[EMAIL PROTECTED]>
http://www.ClamAV.net/gpg/tkojm.gpg
0DCA5A08407D5288279DB43454822DC8985A444B
Wed Feb 28 16:40:04 CET 2007
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
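
For anyone who wants to check the "more than 100" figure, here is a minimal sketch in C that counts how many byte values the first half of the quoted condition throws away. The helper names rejected_by_ctype() and count_rejected_bytes() are made up for this illustration and are not part of ClamAV. The sketch assumes the process is still in the default "C" locale and deliberately leaves out the internat[] lookup.

```c
#include <ctype.h>

/* Hypothetical helper, not ClamAV code: the first half of the quoted
 * condition. Returns nonzero when the byte is neither a control nor a
 * printable character in the current locale. */
static int rejected_by_ctype(unsigned char c)
{
    return !(iscntrl(c) || isprint(c));
}

/* Count how many of the 256 possible byte values the ctype test rejects. */
static int count_rejected_bytes(void)
{
    int rejected = 0;

    for (int c = 0; c < 256; c++) {
        if (rejected_by_ctype((unsigned char)c))
            rejected++;
    }
    return rejected;
}
```

In the "C" locale on an ASCII-based libc such as glibc, iscntrl() and isprint() together cover 0x00 to 0x7f only, so count_rejected_bytes() should come to 128: every byte with the high bit set is rejected before any international table is consulted. That includes bytes such as 0xE9, which is a perfectly ordinary letter in ISO-8859-1 text.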