Hi, folks.  I stumbled onto a bug in WordList::valid_word() on Friday,
and after looking into it, one thing led to another, so I've made some
fairly significant changes to this function.

As a result, could all of you who are testing the 3.1.2 pre-release
that Geoff announced last week please try this patch, or the latest
htdig3-1-x CVS source tree, to make sure I didn't break something else?

My concern is the switch to iscntrl(), which I think is a better test for
control characters than the previous *word < ' ' test, which misses some
(DEL, 0x7f, for instance).
However, on systems with broken locales this could potentially lead to
further problems with indexing other languages, because the whole upper
half of the character set may be treated as control.  As a solution to
that, I've also added else clauses so that if HtIsStrictWordChar() accepts
the character, it won't test to see if iscntrl() would reject it.  I also
realised that the earlier switch from isalpha() to HtIsStrictWordChar()
would allow digits, even if allow_numbers was false, so I added an extra
test to prevent that.  I'd appreciate extra eyeballs looking this over.
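
For anyone who wants to poke at the logic outside of ht://Dig, here is a
minimal standalone sketch of the loop as patched.  Several things in it are
assumptions and not the real code: is_strict_word_char() is a hypothetical
stand-in for HtIsStrictWordChar() (which also honours the configured
locale and extra word characters), the final return expression is a guess
because the hunk doesn't show the end of the function, and the casts to
unsigned char before the <cctype> calls are an addition the patch itself
doesn't make.  old_control_test() mirrors the test being replaced.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical stand-in for HtIsStrictWordChar(); an assumption only.
bool is_strict_word_char(unsigned char c)
{
    return std::isalnum(c) != 0;
}

// The test the patch replaces: only catches codes 0 through 31.
bool old_control_test(char c)
{
    return c >= 0 && c < ' ';
}

// Sketch of the patched loop.  The return expression is assumed; the
// real function performs further checks that are not shown in the hunk.
bool valid_word_sketch(const char *word, bool allow_numbers)
{
    bool alpha = false;
    bool control = false;

    for (; word && *word; ++word)
    {
        unsigned char c = static_cast<unsigned char>(*word);

        if (is_strict_word_char(c) && !std::isdigit(c))
            alpha = true;       // no break: keep looking for control chars
        else if (allow_numbers && std::isdigit(c))
            alpha = true;       // digits only count when allow_numbers is set
        else if (std::iscntrl(c))
        {
            control = true;     // one control character rejects the word
            break;
        }
    }
    return alpha && !control;
}
```

In the default "C" locale, a word made up only of digits should be rejected
by this sketch unless allow_numbers is set, and a word containing DEL should
be rejected even though the old comparison would have let it through.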

Thanks!

--- htcommon/WordList.cc.old    Tue Mar 23 17:17:31 1999
+++ htcommon/WordList.cc        Mon Apr 19 15:47:34 1999
@@ -107,17 +107,18 @@ int WordList::valid_word(char *word)
 
     while (word && *word)
     {
-      if (HtIsStrictWordChar((unsigned char)*word))
+      if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
        {
            alpha = 1;
-           break;
+           // break;   /* Can't stop here, there may still be control chars! */
        }
-      if (allow_numbers && isdigit(*word))
+      else if (allow_numbers && isdigit(*word))
        {
          alpha = 1;
-         break;
+         // break;     /* Can't stop here, there may still be control chars! */
        }
-      if (*word >= 0 && *word < ' ')
+//    if (*word >= 0 && *word < ' ')
+      else if (iscntrl(*word))
        {
            control = 1;
            break;


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
