Hi,

For the last few weeks I have wondered why htdig rejects any word containing the
German u-umlaut (char 252) on my Solaris server. All locale settings were correct,
and the same configuration runs on a Linux box without any problems.

Today I discovered the reason: WordList::valid_word() is not 8-bit clean on
Sun Solaris 2.6!
(iscntrl((char)252) returns 1, but iscntrl((unsigned char)252) returns 0: the plain
char is sign-extended to a negative int before the call.)
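
To illustrate the cause, here is a minimal sketch in C. The helper names
ctype_arg() and byte_iscntrl() are made up for this mail and are not part of
ht://Dig. It assumes a platform where plain char is signed, so that byte 252 is
stored as -4; the <ctype.h> functions are only defined for EOF and the values
0..UCHAR_MAX, so the byte has to be converted first.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: map a byte held in a plain char to the
 * 0..UCHAR_MAX value that the <ctype.h> functions require.
 * Where char is signed, byte 252 is stored as -4, and passing -4
 * straight to iscntrl() is undefined behaviour. */
static int ctype_arg(char c)
{
    int v = (unsigned char)c;
    assert(v >= 0 && v <= UCHAR_MAX);
    return v;
}

/* Hypothetical wrapper: an iscntrl() that is safe for 8-bit bytes.
 * Returns 1 for control characters and 0 otherwise. */
static int byte_iscntrl(char c)
{
    return iscntrl(ctype_arg(c)) != 0;
}
```

What byte_iscntrl() returns for byte 252 still depends on the active locale;
the point is only that the argument now lies in the range the library
expects.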


The patch to htdig-3.1.4/htcommon/WordList.cc is simple:

111c111
<       if (HtIsStrictWordChar((unsigned char)*word) && !isdigit(*word))
---
>       if (HtIsStrictWordChar((unsigned char)*word) && !isdigit((unsigned char)*word))
116c116
<       else if (allow_numbers && isdigit(*word))
---
>       else if (allow_numbers && isdigit((unsigned char)*word))
122c122
<       else if (iscntrl(*word))
---
>       else if (iscntrl((unsigned char)*word))
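
The same idea can be sketched as a standalone loop. This is only a simplified
stand-in and NOT the real WordList::valid_word(): the function name
looks_like_word() and its rules are assumptions for illustration (any
non-digit, non-control byte counts as a word character, which is much cruder
than HtIsStrictWordChar). It converts each byte to unsigned char once, so no
individual ctype call can miss the cast.

```c
#include <ctype.h>

/* Hypothetical, simplified word check (not the ht://Dig implementation).
 * Returns 0 if the word contains a control character, otherwise 1 if it
 * contains at least one accepted character, else 0. */
static int looks_like_word(const char *word, int allow_numbers)
{
    int accepted = 0;

    for (; *word != '\0'; word++) {
        /* Convert once; every ctype call below sees 0..UCHAR_MAX. */
        unsigned char ch = (unsigned char)*word;

        if (iscntrl(ch))
            return 0;               /* control characters are never valid */
        if (isdigit(ch)) {
            if (allow_numbers)
                accepted++;         /* digits count only when allowed */
        } else {
            accepted++;             /* assumption: anything else is a word char */
        }
    }
    return accepted > 0;
}
```

With this shape, a word such as "f\xfc" "r" (f, byte 252, r) is handled the
same way whether plain char is signed or unsigned on the platform.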

Marc


Marc Pohl, Online-Service-Center, Westdeutscher Rundfunk, D-50600 Koeln
[EMAIL PROTECTED], +49 221 220 8618,  http://www.wdr.de/



