Hi,

       Well, it may not be *your* conclusion, but at least it's mine :-)

       http://www.alphaworks.ibm.com/tech/icu/

       Considering that simplicity is a must, I'd say that this
disqualifies the IBM ICU library. Although very complete, it's a typical
proprietary-turned-open-source product: it's far too complex and takes
days to master. The documentation is nice, but it runs to 100
pages. Doing something simple is complex. It's also quite likely that
we won't get the usual free software feedback when submitting bugs or
suggesting directions. I doubt every developer on htdig will spend
three days reading the documentation before typing a Unicode string
compare. (Just remember the pain of moving to automake & libtool...)

       http://cvs.gnome.org/bonsai/rview.cgi?cvsroot=/cvs/gnome&dir=libunicode

       The alternative, then, is libunicode. Yes, but it will require work
before we are able to use it. It's a C library that does not yet contain all
the functions needed for string manipulation. However, as Geoff said, Unicode
integration in htdig is planned not for this version but for the next one,
and there is a good chance that libunicode will be more mature by then. To
speed things up, my company will probably hire someone to work on it.

       http://www.gnu.org/software/libc/libc.html

       The third choice was glibc-2.1 iconv/wchar. The bottleneck is that
it would take a fair amount of work to turn it into a standalone library.
The code is more mature than libunicode, but by doing this we would in effect
create an alternative libunicode, which seems like a silly thing to do.
Looking more closely at libunicode, it appears that Tom Tromey works closely
with glibc-2.1 where possible. The fact that he works for Cygnus support
leads me to think that he had good reasons to create libunicode instead of
porting the existing glibc-2.1 functions, although he has not yet explained
why he did so.
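
       On a system that already has glibc's iconv, converting text into
Unicode needs only the standard iconv_open/iconv/iconv_close calls; the
porting work above is about making that available everywhere htdig runs.
A minimal sketch, assuming an iconv that recognizes the names "UTF-8" and
"ISO-8859-1" (glibc does). The helper latin1_to_utf8 is hypothetical, and it
covers charset conversion only, not the string manipulation functions.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of htdig or glibc): convert `len` bytes of
 * ISO-8859-1 text in `in` to UTF-8 in `out`, whose capacity is `outcap`.
 * On success the output is NUL-terminated and the number of UTF-8 bytes
 * written (excluding the NUL) is returned; on any error, -1 is returned. */
static long latin1_to_utf8(const char *in, size_t len, char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);
    if (outcap == 0)
        return -1;                      /* no room even for the NUL */

    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;                      /* conversion not supported here */

    char *inp = (char *)in;             /* iconv() takes char **; input is not modified */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = outcap - 1;        /* reserve one byte for the NUL */

    /* ISO-8859-1 -> UTF-8 is stateless, so no final flush call is needed. */
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (r == (size_t)-1)
        return -1;                      /* e.g. E2BIG: output buffer too small */

    *outp = '\0';
    return (long)(outp - out);
}
```

For example, the ISO-8859-1 bytes "caf\xE9" ("café") come out as the five
UTF-8 bytes "caf\xC3\xA9".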

       In short, I advocate that we bet on libunicode. If everyone
agrees, I'll integrate libunicode into the CVS tree. I won't touch the tree
in the meantime, don't worry :-)

       The other big issue regarding Unicode is the regexp library. There is
a big effort in Perl (nearly done, as far as I know). I have no idea where
the rx library stands on this. I just checked, and rx-1.5, although old,
seems to be the latest release. Is it possible to take advantage of the work
done in perl-5.006? Does anyone have news?

       A minor issue that will become a big one is charset support in
databases. There is no problem with Berkeley DB, since the application
provides the comparison functions. If we're planning to use an SQL database,
that database will have to support Unicode too: at least UTF-8 or UTF-16,
so that string comparisons are done the right way and the comparison
operators in SELECT statements work. It doesn't matter if it supports only
a few charsets, as long as it supports one charset that can represent all
the others. I've posted a request about this through our support contract
with MySQL. Is anyone aware of the status of Postgres?
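
       To illustrate the kind of comparison function an application could
supply for Unicode keys, here is a minimal C sketch that orders UTF-8
strings by code point. The names utf8_decode and utf8_cmp are hypothetical;
wrapping it in Berkeley DB's comparison callback is not shown, because that
callback's exact signature depends on the DB version. Plain code point order
is not locale-aware collation (it does not sort "é" next to "e"), and the
decoder does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence at s (len bytes available).
 * Stores the code point in *cp and returns the bytes consumed, or 0 if the
 * sequence is malformed or truncated. Overlong forms are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0) return 0;
    unsigned char c = s[0];
    size_t n;
    if (c < 0x80) { *cp = c; return 1; }              /* ASCII */
    else if ((c & 0xE0) == 0xC0) { n = 2; *cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 3; *cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 4; *cp = c & 0x07; }
    else return 0;                                    /* invalid lead byte */
    if (len < n) return 0;                            /* truncated */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;          /* bad continuation */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Hypothetical comparator: returns <0, 0 or >0 like strcmp, ordering two
 * UTF-8 byte strings by code point; the shorter string sorts first on a tie. */
static int utf8_cmp(const unsigned char *a, size_t alen,
                    const unsigned char *b, size_t blen)
{
    size_t i = 0, j = 0;
    while (i < alen && j < blen) {
        uint32_t ca, cb;
        size_t na = utf8_decode(a + i, alen - i, &ca);
        size_t nb = utf8_decode(b + j, blen - j, &cb);
        if (na == 0 || nb == 0) {
            /* Malformed input: fall back to raw byte order for the rest. */
            size_t ra = alen - i, rb = blen - j;
            int r = memcmp(a + i, b + j, ra < rb ? ra : rb);
            if (r != 0) return r;
            return (ra > rb) - (ra < rb);
        }
        if (ca != cb) return ca < cb ? -1 : 1;
        i += na;
        j += nb;
    }
    /* One string is a prefix of the other (or they are equal). */
    return (alen - i > 0) - (blen - j > 0);
}
```

For well-formed UTF-8 this gives the same order as a byte-wise memcmp,
which is one reason UTF-8 is convenient as the single charset a database
stores; the decode step is where a real collation (case folding, accents)
would plug in.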

        Thanks for listening,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/

