My suggestion is to take the glib/gtk approach: use UTF-8 everywhere and have the API accept char*, i.e. there is no typedef for Unicode character strings. If this is not acceptable because of speed (which is its only tradeoff), then use UCS-4 internally and provide two external interfaces, one for UCS-4 and one for UTF-8. For backwards compatibility you can provide your own ISO-8859-8 to UTF-8 conversion functions. I suggest that you don't add an iconv dependency, but instead let the user take care of character set conversions, which you don't really care about.
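As a rough sketch of what such a backwards-compatibility function could look like: the name iso8859_8_to_utf8 and its return convention are made up for illustration and are not part of libhspell. It assumes that only ASCII and the Hebrew letters matter, so it maps bytes 0xE0-0xFA to U+05D0-U+05EA and replaces every other non-ASCII byte with '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an existing libhspell function).
 * Converts a NUL-terminated ISO-8859-8 string to UTF-8.
 * Returns 0 on success, -1 if dst is too small for the result plus NUL.
 * Only ASCII and the Hebrew letters (0xE0..0xFA) are converted;
 * any other byte is replaced with '?' (a simplifying assumption).
 */
int iso8859_8_to_utf8(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c >= 0xE0 && c <= 0xFA) {
            /* Hebrew letter: alef (0xE0) is U+05D0, tav (0xFA) is U+05EA. */
            unsigned cp = 0x05D0u + (c - 0xE0u);

            if (out + 2 >= dst_size)
                return -1;      /* need 2 bytes and room for the NUL */
            /* Two-byte UTF-8 form: 110xxxxx 10xxxxxx */
            dst[out++] = (char)(0xC0u | (cp >> 6));
            dst[out++] = (char)(0x80u | (cp & 0x3Fu));
        } else {
            if (out + 1 >= dst_size)
                return -1;      /* need 1 byte and room for the NUL */
            dst[out++] = (c < 0x80) ? (char)c : '?';
        }
    }

    if (out >= dst_size)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

With something like this in place, the old ISO-8859-8 entry points can become thin wrappers that convert and then call the UTF-8 char* functions, so no iconv dependency is needed inside the library.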
Regards,
Dov

2012/3/12 Elazar Leibovich <elaz...@gmail.com>

> The simplest option is, to accept StringPiece-like structure (pointer to
> buffer + size), and encoding, then to convert the data internally to your
> encoding (say, ISO-8859-8, replacing illegal characters with whitespace),
> and convert the other output back.
>
> Do you mind using iconv-like library?
>
>
> On Mon, Mar 12, 2012 at 3:05 PM, Nadav Har'El <n...@math.technion.ac.il> wrote:
>
>> Hi, I have a question that I was sort of sad that I couldn't readily
>> find the answer to...
>>
>> Let's say I want to create a C API (a C library), with functions which
>> take strings as arguments. What am I supposed to use if I want these
>> strings to be in any language? Obviously the answer is "Unicode", but that
>> doesn't really answer the question... How is Unicode used in C?
>>
>> As far as I can see, there are two major approaches to this problem.
>>
>> One approach, used in the Win32 C APIs on MS-Windows, and also in Java and
>> other languages, is to use "wide characters" - characters of 16 or 32 bit
>> size, and strings are an array of such characters.
>>
>> The second approach, proposed by Plan 9, is to use UTF-8.
>>
>> I personally like better the UTF-8 approach, because it naturally fits
>> with C's "char *" type and with Linux's system calls (which take char*,
>> not any sort of wide characters), but I'm completely unsure that this is
>> what users actually want. If not, then I wonder, why?
>>
>> Some background on this question: People have been complaining for years
>> that Hspell, and in particular the libhspell functions, use ISO-8859-8
>> instead of "unicode". But if one wants to add unicode to libhspell, what
>> should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?
>>
>> Thanks,
>> Nadav.
>>
>> --
>> Nadav Har'El                        |                 Monday, Mar 12 2012,
>> n...@math.technion.ac.il             |-----------------------------------------
>> Phone +972-523-790466, ICQ 13349191 |We could wipe out world hunger if we knew
>> http://nadav.harel.org.il           |how to make AOL's Free CD's edible!
>>
>> _______________________________________________
>> Linux-il mailing list
>> Linux-il@cs.huji.ac.il
>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il