At 1:54 PM +0900 3/23/00, Oskar Bartenstein wrote:
>Yes, in general a character is not a byte. Still, I don't see,
>at least for clean encodings like EUC, where this difference
>should break the workings of htdig?

No, 8-bit encodings will probably work fine. But is there actually an 
8-bit encoding for Chinese?

>  > >A correct HTML page includes info about its encoding, therefore
>  > >htdig on the receiving end can convert it to any code it likes.
>  >
>  > Yes, provided that it has code to convert from one encoding into
>  > another. :-) This is the crux of the problem.
>
>I would use an external converter. There is good code, e.g.
>nkf, tcs, many others. See http://ftp.monash.edu.au/pub/nihongo/

I would disagree--I'd use library code like iconv(), which is in recent 
versions of glibc. This has recently been packaged with some 
other nice UTF8/Unicode support into a separate, platform-independent 
library.
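
As a minimal sketch of the in-process, library-style conversion argued
for above: the example below uses Python's built-in codecs as a stand-in
for iconv(), not iconv() itself. The function name convert_encoding and
the sample string are hypothetical, chosen only for illustration.

```python
def convert_encoding(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Re-encode a byte string from one charset to another.

    Hypothetical helper: it decodes to Unicode text first, then encodes
    to the target charset, which is the same two-step idea a conversion
    library applies internally.
    """
    text = data.decode(source)   # bytes -> characters
    return text.encode(target)   # characters -> bytes


# Three Japanese characters: two bytes each in EUC-JP, three each in UTF-8.
euc = "日本語".encode("euc_jp")
utf8 = convert_encoding(euc, "euc_jp")
print(len(euc), len(utf8), len(utf8.decode("utf-8")))
```

The byte counts differ between the two encodings while the character
count stays the same, which is the character-versus-byte distinction
discussed at the top of this thread.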

>A person who carefully serves an international audience will include
>something like this example for EUC:
><META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">
>to allow a browser to display the page properly.

Sure, but as you say, this requires the HTML.cc parser to read those 
tags. :-) It also needs to recognize the same charset information when 
it comes from the server itself, in the HTTP Content-Type header.
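
A rough sketch of that lookup, assuming the server-supplied Content-Type
header is checked first and the META tag is the fallback. This is not
the HTML.cc code; the names detect_charset and charset_from_content_type
are hypothetical, and the regular expressions are deliberately simple.

```python
import re
from typing import Optional

# Matches "charset=..." inside a Content-Type value (header or META CONTENT).
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

# Matches a <META HTTP-EQUIV="Content-Type" ...> tag in the raw page bytes.
_META_RE = re.compile(
    rb"<meta[^>]+http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*>",
    re.IGNORECASE,
)


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset label from a Content-Type value, if any."""
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def detect_charset(http_content_type: Optional[str], body: bytes) -> Optional[str]:
    """Prefer the server's header; otherwise look for a META tag near the top."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    match = _META_RE.search(body[:4096])  # only scan the start of the page
    if match:
        return charset_from_content_type(match.group(0).decode("ascii", "replace"))
    return None


page = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">'
print(detect_charset(None, page))
print(detect_charset("text/html; charset=KOI8-R", page))
```

A label such as x-euc-jp may still need to be mapped to whatever name
the conversion library actually recognizes; that mapping is left out
here.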

>3 - Determine if the cgi input is understood by htsearch as it is,
>     or also needs special attention?

I don't know; it would require considerable testing.

I guess my point is that you can push htdig into other character 
sets, but this isn't the best solution all-around. I've seen it work 
on an 8-bit Korean charset (I don't remember what that was), but it 
should really have built-in charset conversion and full 
wide-character support. This would help considerably, especially in a few 
languages (Russian springs to mind) where people serve the same page 
in multiple character sets.
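
To illustrate the Russian case with a small sketch: the same word
encoded in KOI8-R and in Windows-1251 yields different bytes, so a
byte-oriented index would treat the two copies of a page as different
words. Decoding to characters before indexing makes them match. The
helper name to_index_text and the sample word are hypothetical, and
Python's codecs stand in for whatever conversion code htdig would use.

```python
def to_index_text(raw: bytes, charset: str) -> str:
    """Decode page bytes to characters and lowercase them for indexing."""
    return raw.decode(charset).lower()


word = "Поиск"                     # sample Russian word ("search")
koi8 = word.encode("koi8_r")       # one copy of the page, served as KOI8-R
cp1251 = word.encode("cp1251")     # another copy, served as Windows-1251

print(koi8 == cp1251)              # raw bytes differ between the charsets
print(to_index_text(koi8, "koi8_r") == to_index_text(cp1251, "cp1251"))
```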

Regards,

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
