Important work in this area is being done by the W3C Internationalization
activity; see
http://www.w3.org/International/
A quick look at the site will show the work being done on indexing and
string matching with Unicode.
PS. Unicode has to be the way to proceed.
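
On the request quoted below for example code that parses double-byte
text: here is a minimal sketch in Python, not anything taken from htdig.
It assumes the document's encoding is already known (for example from an
HTTP header or a META tag), decodes the bytes to Unicode, and then splits
the text into index terms. Runs of CJK characters are indexed as
overlapping two-character terms, which is a simple stand-in for real
Chinese word detection and not the mechanism asked for in the thread.
The names is_cjk and tokenize and the code point ranges are my own
assumptions for illustration.

```python
def is_cjk(ch):
    """Rough CJK test; these ranges are an assumption, not a complete list."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
    )


def tokenize(raw, encoding):
    """Decode raw bytes to Unicode, then split into index terms.

    Non-CJK letters and digits are grouped into lowercased words.
    CJK runs become overlapping bigrams (a single character stays as is).
    """
    text = raw.decode(encoding)
    tokens = []
    run = []            # characters of the run being collected
    run_is_cjk = False  # kind of the current run

    def flush():
        # Emit the terms for the current run and reset it.
        if not run:
            return
        s = "".join(run)
        if run_is_cjk:
            if len(s) == 1:
                tokens.append(s)
            else:
                tokens.extend(s[i:i + 2] for i in range(len(s) - 1))
        else:
            tokens.append(s.lower())
        run.clear()

    for ch in text:
        if is_cjk(ch):
            kind = True
        elif ch.isalnum():
            kind = False
        else:
            flush()     # whitespace or punctuation ends the run
            continue
        if run and kind != run_is_cjk:
            flush()     # switching between CJK and non-CJK text
        run_is_cjk = kind
        run.append(ch)
    flush()
    return tokens


if __name__ == "__main__":
    # The same code handles Chinese, Japanese, Korean and accented Latin text
    # once the bytes have been decoded.
    print(tokenize("中文搜索 test".encode("gb2312"), "gb2312"))
    print(tokenize("日本語".encode("shift_jis"), "shift_jis"))
    print(tokenize("한국어".encode("euc_kr"), "euc_kr"))
    print(tokenize("Café".encode("latin-1"), "latin-1"))
```

Because everything after the decode step works on Unicode characters
rather than bytes, nothing in it is specific to one legacy character set,
which is the point of going the Unicode route.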
> -----Original Message-----
> From: Geoff Hutchison [SMTP:[EMAIL PROTECTED]]
> Sent: Wednesday, October 28, 1998 5:29 AM
> To: Jeff Breidenbach
> Cc: [EMAIL PROTECTED]
> Subject: Re: htdig: Chinese Support
>
> At 10:25 PM -0500 10/27/98, Jeff Breidenbach wrote:
> >>Processing single double-byte characters is not enough; we MUST implement a
> >>mechanism to detect Chinese words.
>
> >Are you saying Chinese support would be specific to Chinese and
> >not automatically support Korean, Japanese, and other two byte (or
> >Unicode) character sets?
>
> I'm certainly not familiar enough with Chinese or storage of Chinese
> documents to know. It seems to imply a requirement to "detect Chinese
> words." Granted, changes to the String class and other double-byte
> character changes, etc., that I mentioned earlier would make supporting
> other languages much easier.
>
> If supporting Unicode would help (or obsolete) any changes needed for a
> specific character set, I think that's the best bet. We currently have
> problems with some accented high-byte characters too.
>
> Does anyone know of programs that parse double-byte text files? It need not
> be HTML or SGML, but it would be useful to look at example code.
>
>
> -Geoff Hutchison
> Williams Students Online
> http://wso.williams.edu/
>
>
> ----------------------------------------------------------------------
> To unsubscribe from the htdig mailing list, send a message to
> [EMAIL PROTECTED] containing the single word "unsubscribe" in
> the body of the message.