Geoff Hutchison writes:
 > 
 > The two *are* related, but need not be related. If it seems like a good
 > idea, it's easy enough to put in a character-set translation using the
 > HtWordCodec code. Basically, it could load a translation table from disk
 > and "encode" from one set to another.
 > 

 Right. If we port the relevant glibc code, we get both at the same
time.
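
 To make the table-driven idea quoted above concrete, here is a minimal
sketch in Python. It is not the HtWordCodec interface and not glibc code:
the table format (one "byte codepoint" pair per line, in hex) and the
function names load_table, to_wide and to_utf8 are assumptions made up
for illustration. The three sample entries are the ISO-8859-15 values for
those bytes; every other byte is assumed to map as in ISO-8859-1.

```python
# Hypothetical table format: "byte codepoint" in hex, one pair per line.
# Only bytes that differ from ISO-8859-1 are listed (values from ISO-8859-15).
TABLE_TEXT = """\
# byte  code point
0xA4 0x20AC
0xBC 0x0152
0xBD 0x0153
"""


def load_table(text):
    """Build a 256-entry list mapping each input byte to a code point."""
    # Start from the identity mapping (byte value == code point).
    table = list(range(256))
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        byte_hex, codepoint_hex = line.split()
        table[int(byte_hex, 16)] = int(codepoint_hex, 16)
    return table


def to_wide(data, table):
    """Translate a byte string into a list of wide code points."""
    return [table[b] for b in data]


def to_utf8(data, table):
    """Translate a byte string to UTF-8 by going through the wide form."""
    return "".join(chr(cp) for cp in to_wide(data, table)).encode("utf-8")
```

 In a real setup the table text would be read from a file on disk rather
than held in a string constant.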

 > Actually Andrew had a good point about this. The Java code keeps
 > everything as 16-bit internally, assumedly as Unicode. This requires
 > some minimal translation from ASCII, but avoids the complexity of UTF-8
 > encoding.

 Again, the glibc code has a whole set of wide-character handling
routines.  In fact the conversion tables all convert to a wide
charset. Having UTF-8 as an external charset (used when storing
strings in databases and when reading documents) would allow a smooth
migration. Let's say that the word database handling code supports
UTF-8 but nothing else does.  We are 100% sure it won't break
anything, since plain ASCII is already valid UTF-8. We can slowly
migrate each set of functions to support UTF-8 without advertising it,
because it does not work as a whole yet. Then at some point all
functions/data will support UTF-8, and people who have old databases
containing ASCII will not need to migrate to a wide charset format. I
think it is for all these reasons that Perl (for instance) chose UTF-8
as its internal character set for the next version. In short I suggest:

          . Using glibc i18n code -> standalone lib
          . Internal charset is wide (16 bits) UTF-16
          . External charset is UTF-8
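
 The split between the two charsets can be sketched as follows, in
Python rather than with the glibc routines. The function names
external_to_internal and internal_to_external are hypothetical, and the
internal form is modelled as a plain list of 16-bit code units; this only
shows the round trip, not how the word database would actually do it.

```python
def external_to_internal(stored):
    """Decode external UTF-8 bytes into a list of 16-bit UTF-16 code units."""
    raw = stored.decode("utf-8").encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def internal_to_external(units):
    """Encode a list of 16-bit UTF-16 code units back to external UTF-8 bytes."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le").encode("utf-8")


# A word stored by an old, ASCII-only database is already valid UTF-8,
# so it can be read and written back without any conversion step.
old_word = b"htdig"
assert internal_to_external(external_to_internal(old_word)) == old_word
```

 Characters outside ASCII take more than one byte externally but still
round-trip through the same two functions.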

 > I wouldn't mind the glibc code available in a separate library for
 > Unicode. However, there's also 
 > a library and a set of utilities under BSD license:
 > http://www.whizkidtech.net/i18n/

 I had a look at this when searching for i18n support. It only provides
a small subset of the functions provided by glibc, as far as i18n support
is concerned. Besides, it does not conform to the standard (?) iconv
functions.
 The other alternative is to use the i18n functions of mozilla. They
are even more versatile than the glibc functions. The only problem is that
extracting them from the mozilla environment to get a standalone library
seems (at least to me :-) to be a nightmare. See
http://www.mozilla.org/docs/refList/i18n/i18n-guidelines.html for more.

 > I just don't know how good it is--I don't have the time to take more
 > than a cursory look. It looks pretty good. But unless someone tells me
 > "I took a look and I think it's doable," it's going to fall by the
 > wayside.

 i18n is definitely a very important issue. I can't handle it before the
end of September.

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/

