On Wed, 28 Feb 2007 16:03:08 +0100
Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Tomasz Kojm wrote:
> > On Wed, 28 Feb 2007 15:21:52 +0100
> > Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:
> > 
> >>>> I've noticed it too, in my port I have changed it to:
> >>>>
> >>>> if(!(iscntrl(buf[i]) || isprint(buf[i])) || !internat[buf[i] & 0xff])
> >>> This one is much worse because it will lead to many false negatives with
> >>> HTML and mail files.
> >>>
> >> Yes, so I've never posted it as an official patch.
> >> BTW, I do the check over the whole magic buffer (150?) to be more reliable.
> >> I've also noticed that the internat table is quite different from the one in
> >> the file (magic) utility.
> > 
> > In your case checking more data will only increase the chance for a false
> > negative. After your change the first condition (i.e. !(iscntrl(buf[i]) ||
> > isprint(buf[i]))) will disqualify LOTS (more than 100 for sure) of
> > characters which can be valid international chars.
> > 
> 
> So what can we use as a better (or at least optimal) way to guess the
> kind of data (rather than having an always true/false check)? isprint

First of all, you should drop your change, which is erroneous. For now I'd
strongly suggest classifying all unknown data as CL_TYPE_UNKNOWN_TEXT.

We will address this issue in the near future and depending on the results of
regression testing decide which way to go.
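
To make the objection to the first condition concrete, here is a minimal C
sketch that counts how many of the 256 byte values are rejected by
!(iscntrl(c) || isprint(c)). The helper names rejected_by_ctype and
count_rejected_bytes are made up for illustration and are not ClamAV code,
and the internat table is deliberately left out. It assumes the default "C"
locale (no setlocale() call); the exact count depends on the C library and
locale, so no figure is hard-coded here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not from ClamAV): the first half of the quoted
 * condition, !(iscntrl(c) || isprint(c)), applied to a single byte.
 * The argument is an unsigned char because passing a negative value
 * other than EOF to the <ctype.h> functions is undefined behaviour. */
int rejected_by_ctype(unsigned char c)
{
    return !(iscntrl(c) || isprint(c));
}

/* Count how many of the 256 possible byte values the check rejects. */
size_t count_rejected_bytes(void)
{
    size_t rejected = 0;
    int c;

    for (c = 0; c < 256; c++) {
        if (rejected_by_ctype((unsigned char)c))
            rejected++;
    }
    return rejected;
}
```

Calling count_rejected_bytes() from a small main() and printing the result
shows how many byte values never reach the internat lookup at all. In the
"C" locale the rejected values are typically those above 0x7f, which is
exactly where international characters live.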

-- 
   oo    .....         Tomasz Kojm <[EMAIL PROTECTED]>
  (\/)\.........         http://www.ClamAV.net/gpg/tkojm.gpg
     \..........._         0DCA5A08407D5288279DB43454822DC8985A444B
       //\   /\              Wed Feb 28 16:40:04 CET 2007
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
