At 1:54 PM +0900 3/23/00, Oskar Bartenstein wrote:
>Yes, in general a character is not a byte. I still don't see,
>at least for clean encodings like EUC, where this difference
>should break the workings of htdig?
No, 8-bit encodings will probably work fine. But is there actually an
8-bit encoding for Chinese?
> > >A correct HTML page includes info about its encoding, therefore
> > >htdig on the receiving end can convert it to any code it likes.
> >
> > Yes, provided that it has code to convert from one encoding into
> > another. :-) This is the crux of the problem.
>
>I would use an external converter. There is good code, e.g.
>nkf, tcs, many others. See http://ftp.monash.edu.au/pub/nihongo/
I would disagree--I'd use library code like iconv() that's in later
versions of glibc. This has recently been packaged with some
other nice UTF8/Unicode support into a separate, platform-independent
library.
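
To make the library approach concrete, here is a minimal sketch of
in-process conversion. It is not ht://Dig code: it uses Python's
built-in codecs as a stand-in for iconv(), and the function name
to_utf8 and the sample word are assumptions made up for the example.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode raw bytes from source_charset and re-encode them as UTF-8."""
    text = raw.decode(source_charset)   # bytes -> Unicode characters
    return text.encode("utf-8")         # Unicode characters -> UTF-8 bytes


# Sample input: a two-character Japanese word encoded as EUC-JP.
sample_word = "日本"
euc_bytes = sample_word.encode("euc_jp")

utf8_bytes = to_utf8(euc_bytes, "euc_jp")

# A character is not a byte: the character count and the two byte
# counts all differ for the same word.
print(len(sample_word), len(euc_bytes), len(utf8_bytes))
```

The point of doing this inside the indexer rather than through an
external converter is that every document ends up in one internal
encoding before word splitting, with no extra process per page.
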
>A person who carefully serves an international audience will include
>something like this example for EUC:
><META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">
>to allow a browser to display the page properly.
Sure, but as you say, this requires the HTML.cc parser to read those
tags. :-) It also needs to recognize the same charset information when
it comes from the server itself in the HTTP Content-Type header.
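
As a rough sketch of what "reading those tags" involves (again in
Python, not the actual HTML.cc parser), the code below pulls the
charset out of either an HTTP Content-Type value or a META tag. The
function names and the regular expressions are assumptions for
illustration; a real parser would handle more attribute orderings.
Preferring the server's header over the META tag is a choice made
here, following the precedence given in the HTML 4 specification.

```python
import re
from typing import Optional

# Matches the charset parameter inside a Content-Type value.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

# Matches <META HTTP-EQUIV="Content-Type" CONTENT="..."> and captures CONTENT.
_META_RE = re.compile(
    r"<meta\s+http-equiv\s*=\s*[\"']?content-type[\"']?"
    r"\s+content\s*=\s*[\"']([^\"']*)[\"']",
    re.IGNORECASE,
)


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter, or None if absent."""
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def detect_charset(http_content_type: Optional[str], html_head: str) -> Optional[str]:
    """Prefer the server's header, then fall back to the META tag."""
    if http_content_type:
        found = charset_from_content_type(http_content_type)
        if found:
            return found
    match = _META_RE.search(html_head)
    return charset_from_content_type(match.group(1)) if match else None


page = '<META HTTP-EQUIV="Content-Type" CONTENT="text/html;CHARSET=x-euc-jp">'
print(detect_charset(None, page))
print(detect_charset("text/html; charset=KOI8-R", page))
```

Labels such as x-euc-jp may still need to be mapped to whatever name
the conversion library expects before they can be used.
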
>3 - Determine if the cgi input is understood by htsearch as it is,
> or also needs special attention?
I don't know; it would require considerable testing.
I guess my point is that you can push htdig into other character
sets, but this isn't the best solution all-around. I've seen it work
on an 8-bit Korean charset (I don't remember what that was), but it
should really have built-in charset conversion and full
wide-character support. This would help considerably, especially in a few
languages (Russian springs to mind) where people serve the same page
in multiple character sets.
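
The Russian case can be illustrated with a small sketch (Python,
with a hypothetical index_key helper that is not part of ht://Dig).
The same word served in KOI8-R and in Windows-1251 is two different
byte strings, so an index keyed on raw bytes treats it as two
different words; decoding first gives a single key.

```python
def index_key(raw: bytes, charset: str) -> str:
    """Decode a word from its page's charset and normalize its case."""
    return raw.decode(charset).lower()


word = "Привет"

# The same word as it would arrive from two copies of the same page.
koi8_bytes = word.encode("koi8_r")
cp1251_bytes = word.encode("cp1251")

# Raw bytes differ between the two encodings...
print(koi8_bytes == cp1251_bytes)

# ...but the decoded, normalized keys are identical.
print(index_key(koi8_bytes, "koi8_r") == index_key(cp1251_bytes, "cp1251"))
```
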
Regards,
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/