At 9:38 AM +0900 3/23/00, Oskar Bartenstein wrote:
>Boils down to 2 questions (sorry I never looked at the source code):
> - is htdig 8-bit clean?
> - are htdig words and dictionaries sequences of bytes?
>If both are yes, then I would guess the core is ok,
>and we only have to look at how to use it properly.
>Hope I did not overlook a parsing issue.
It is 8-bit clean, but it treats a character as synonymous with 8
bits. Many parts of the code (the String class in particular) assume
that a character is exactly 1 byte and carry on regardless. In many
encodings this is *not* the case, and so you're stuck.
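
To make the byte-versus-character point concrete, here is a small
Python sketch. It is not ht://Dig code; the function names are made up
for illustration, and EUC-JP is just one example of a multi-byte
encoding. It shows that the byte count and the character count differ,
and that cutting a string at an arbitrary byte offset can split a
character in half.

```python
from typing import Tuple


def byte_and_char_lengths(text: str, encoding: str) -> Tuple[int, int]:
    """Return (number of bytes, number of characters) for text in encoding."""
    return len(text.encode(encoding)), len(text)


def truncate_bytes(text: str, encoding: str, n_bytes: int) -> bytes:
    """Cut the encoded form at a byte offset, as byte-oriented code would."""
    return text.encode(encoding)[:n_bytes]


def is_valid(raw: bytes, encoding: str) -> bool:
    """Report whether raw decodes cleanly in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    word = "日本語"  # three characters
    print(byte_and_char_lengths(word, "euc_jp"))
    print(byte_and_char_lengths(word, "utf-8"))
    # Cutting on a character boundary is fine; cutting mid-character is not.
    print(is_valid(truncate_bytes(word, "euc_jp", 4), "euc_jp"))
    print(is_valid(truncate_bytes(word, "euc_jp", 3), "euc_jp"))
```

With a single-byte encoding such as Latin-1 the two lengths always
match, which is why code written with that assumption appears to work
until it meets a multi-byte encoding.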
>A correct HTML page includes info about its encoding, therefore
>htdig on the receiving end can convert it to any code it likes.
Yes, provided that it has code to convert from one encoding into
another. :-) This is the crux of the problem. Currently ht://Dig
assumes the host system has working locale support and is getting the
pages in the default encoding of the system. If they're not, it
assumes they are anyway. :-) It makes no attempt to convert character
encodings.
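
For illustration only, the missing conversion step could look roughly
like the Python sketch below. Nothing here is ht://Dig code:
`decode_page`, its parameters, and the Latin-1 fallback are assumptions
chosen to mirror the single-byte default described above. The idea is
to honour the charset a page declares and fall back to a default only
when the declaration is missing or unknown.

```python
import codecs
from typing import Optional


def decode_page(raw: bytes,
                declared_charset: Optional[str] = None,
                fallback: str = "latin-1") -> str:
    """Decode page bytes using the declared charset (hypothetical helper).

    Falls back to `fallback` when no charset is declared or when the
    declared name is not a codec known to this Python installation.
    """
    charset = declared_charset or fallback
    try:
        codecs.lookup(charset)  # raises LookupError for unknown names
    except LookupError:
        charset = fallback
    # errors="replace" keeps indexing going on malformed input instead
    # of aborting; that is a design choice, not a requirement.
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    japanese = "日本語".encode("euc_jp")
    print(decode_page(japanese, "EUC-JP"))
    print(decode_page(b"caf\xe9"))  # no declaration: treated as Latin-1
```

Once everything is decoded into one internal representation, the word
parsing and dictionary code only needs to handle that one form.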
Basically, if you have a Latin-1 encoding for your character set,
you're OK. That's the limit of the current i18n.
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/