At 17:47 12.01.00 -0600, you wrote:
>At 6:49 PM +0100 1/12/00, Marc Pohl wrote:
>>I reviewed the source code for htdig-3.2.0b1-dev-010900 this weekend 
>>and discovered that there could be similar errors in 
>>htword/WordType.cc because of signed char to int casts. Exactly the 
>>same error cannot happen because the iscntrl() is in the else branch 
>>of IsStrictChar() in 3.2.
>
>Could you also post your original patch to 3.1.4 with diff -c as 
>well? I'd like to have it on the [EMAIL PROTECTED] lists because I 
>think it will help some of these recent questions about indexing and 
>searching foreign characters.
>
>>My proposed patch is the following snippet, introducing two new 
>>member functions to WordType, instead of calling isdigit() and 
>>iscntrl() directly.
>
>This looks fine to me. Since it's a bug-fix, unless I hear screams of 
>protest, it's going in sometime tomorrow.
>
>-Geoff
>

Hello Geoff,

Yesterday I found a small potential problem in the patched code.
At the beginning of the initialisation of WordType there is the line
chrtypes[0] = 0;
Because we never call iscntrl(0), this line must be
chrtypes[0] = WORD_TYPE_CONTROL;

In my tests this makes no difference, but I don't think I have any
unwanted #0 bytes in my HTML docs.
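
To illustrate the point, here is a minimal sketch of such a lookup table. It is not the actual WordType code: only the name WORD_TYPE_CONTROL is taken from the patch, while the flag values, WORD_TYPE_DIGIT, init_chrtypes(), is_control() and is_digit() are assumptions made up for this example. It also assumes that the initialisation loop starts at index 1, which is why entry 0 has to be set by hand.

```cpp
#include <cctype>

// Hypothetical flag values; only the name WORD_TYPE_CONTROL appears in the patch.
enum { WORD_TYPE_DIGIT = 0x01, WORD_TYPE_CONTROL = 0x02 };

// One entry per possible byte value.
static unsigned char chrtypes[256];

// Fill the table once. Assumption: the loop starts at 1, so iscntrl(0) is
// never called and entry 0 must be marked as a control character explicitly.
void init_chrtypes()
{
    chrtypes[0] = WORD_TYPE_CONTROL;
    for (int i = 1; i < 256; i++)
    {
        chrtypes[i] = 0;
        if (std::isdigit(i))
            chrtypes[i] |= WORD_TYPE_DIGIT;
        if (std::iscntrl(i))
            chrtypes[i] |= WORD_TYPE_CONTROL;
    }
}

// Lookups index the table with an unsigned char, so bytes above 127
// can never turn into a negative index.
bool is_control(char c)
{
    return (chrtypes[static_cast<unsigned char>(c)] & WORD_TYPE_CONTROL) != 0;
}

bool is_digit(char c)
{
    return (chrtypes[static_cast<unsigned char>(c)] & WORD_TYPE_DIGIT) != 0;
}
```

With chrtypes[0] = 0 instead, is_control('\0') in this sketch would return false, which is the case I mean above.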

Marc


And here is my patch against version 3.1.4:

*** WordList.cc.orig    Fri Dec 10 01:28:44 1999
--- WordList.cc Thu Jan 13 20:23:29 2000
***************
*** 108,125 ****
  
      while (word && *word)
      {
!       if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
        {
            alpha = 1;
            // break;   /* Can't stop here, there may still be control chars! */
        }
!       else if (allow_numbers && isdigit(*word))
        {
          alpha = 1;
          // break;     /* Can't stop here, there may still be control chars! */
        }
  //    if (*word >= 0 && *word < ' ')
!       else if (iscntrl(*word))
        {
            control = 1;
            break;
--- 108,125 ----
  
      while (word && *word)
      {
!       if (HtIsStrictWordChar((unsigned char)*word) && !isdigit((unsigned char)*word))
        {
            alpha = 1;
            // break;   /* Can't stop here, there may still be control chars! */
        }
!       else if (allow_numbers && isdigit((unsigned char)*word))
        {
          alpha = 1;
          // break;     /* Can't stop here, there may still be control chars! */
        }
  //    if (*word >= 0 && *word < ' ')
!       else if (iscntrl((unsigned char)*word))
        {
            control = 1;
            break;


I hope that my email program will not mangle that ;-)
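
For anyone wondering why the casts matter, here is a small standalone sketch. The helper names are hypothetical and not part of ht://Dig. On platforms where plain char is signed, a byte such as 0xE4 (a-umlaut in ISO-8859-1) is promoted to a negative int, and passing a negative value other than EOF to isdigit() or iscntrl() is undefined behaviour. Casting to unsigned char first keeps the argument in the range 0..255.

```cpp
#include <cctype>

// Value the <cctype> functions would receive if a plain char is passed
// directly. Negative for bytes above 127 when char is signed.
int promoted_without_cast(char c)
{
    return c;
}

// Value they receive after the cast used in the patch: always 0..255.
int promoted_with_cast(char c)
{
    return static_cast<unsigned char>(c);
}

// Hypothetical wrappers that apply the cast in one place.
bool safe_isdigit(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

bool safe_iscntrl(char c)
{
    return std::iscntrl(static_cast<unsigned char>(c)) != 0;
}
```

Whether promoted_without_cast('\xE4') is negative depends on the signedness of char on the platform; promoted_with_cast('\xE4') is 228 either way.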



 -----------------------------------------------------

Marc Pohl
                                 Westdeutscher Rundfunk
Tel.:  +49 221  220 8618         OSC/Videotextredaktion
FAX:   +49 221  220 3882         D-50600 Koeln
Email: [EMAIL PROTECTED]



------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
