[EMAIL PROTECTED] wrote:
>> * UTF-8/Unicode support                                         ?
>> * Character-Set translation                                     ?
>  The two are indeed related. The iconv/iconvdata functions and tables
> that come with glibc-2.1 provide the most complete set I've found. Here is
> the list of conversion tables available:

The two *are* related, but they need not be. If it seems like a good
idea, it's easy enough to add character-set translation using the
HtWordCodec code. Basically, it could load a translation table from disk
and "encode" from one set to another.
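
As a rough sketch of the table-driven idea (this is not the HtWordCodec
API; the table format, the function names load_table and translate, and
the sample data are all made up for illustration), the table could be a
text file of "source byte, target code point" pairs:

```python
import io

def load_table(stream):
    """Parse lines like '0xA4 0x20AC  # comment' into {byte: code point}.

    The file format is an assumption for this sketch, not an ht://Dig one.
    """
    table = {}
    for line in stream:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        src, dst = line.split()[:2]
        table[int(src, 16)] = int(dst, 16)
    return table

def translate(data, table):
    """Map each source byte through the table.

    Bytes with no entry keep their numeric value as the code point.
    """
    return "".join(chr(table.get(b, b)) for b in data)

# Hypothetical excerpt: two ISO-8859-15 bytes that differ from ISO-8859-1.
SAMPLE = """\
0xA4 0x20AC  # EURO SIGN
0xBD 0x0153  # LATIN SMALL LIGATURE OE
"""

table = load_table(io.StringIO(SAMPLE))  # a real table would come from open(path)
result = translate(b"c\xbdur 5\xa4", table)
```

Keeping the table in a data file means a new character set only needs a
new table, not new code.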

>   I think the internal charset of choice must be UTF8 because it is ascii
> compatible and uses 8 bits chars instead of 16 bits chars.

Actually, Andrew had a good point about this. The Java code keeps
everything as 16-bit internally, presumably as Unicode. This requires
some minimal translation from ASCII, but avoids the complexity of UTF-8
encoding.
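
To make the trade-off concrete, here is a small sketch (the function
names are hypothetical, and it assumes code points up to U+FFFF outside
the surrogate range). Going from ASCII to 16-bit units is a plain
widening, while UTF-8 needs a variable-length encoder:

```python
def ascii_to_16bit(data):
    """Widen ASCII bytes to 16-bit units: each value is carried over as-is."""
    if any(b > 0x7F for b in data):
        raise ValueError("input is not pure ASCII")
    return list(data)

def utf8_encode(cp):
    """Encode one code point (up to U+FFFF, no surrogates) as UTF-8 bytes."""
    if cp < 0 or cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point outside the range handled by this sketch")
    if cp < 0x80:    # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:   # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

units = ascii_to_16bit(b"dig")   # one fixed-width unit per character
euro = utf8_encode(0x20AC)       # three bytes for one character
```

With fixed 16-bit units the Nth character is at index N; with UTF-8 a
string routine has to scan from the start to find it.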

>   The work involved to use this is merely porting. At present the iconv
> functions of glibc are compiled within glibc. They must be ported to
> a separate library for portability along with all the string manipulation
> routines that are able to deal with UTF8. This requires some work but no
> need to actually write code, i.e. no hard debugging process.

I wouldn't mind having the glibc code available in a separate library
for Unicode. However, there's also a library and a set of utilities
under a BSD license:
http://www.whizkidtech.net/i18n/

I just don't know how good it is--I don't have the time to take more
than a cursory look. It looks pretty good. But unless someone tells me
"I took a look and I think it's doable," it's going to fall by the
wayside.

-- 
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/
