What's the advantage of using UCS-4 internally, especially if the program needs to save memory (embedded devices are pretty common these days)?
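To put rough numbers on the memory question, here is a minimal C sketch comparing how many bytes a string of code points takes in UTF-8 versus UCS-4, which always spends 4 bytes per character. The helper names `utf8_len`, `utf8_bytes` and `ucs4_bytes` are hypothetical, not from any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one Unicode code point in UTF-8. */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;   /* ASCII */
    if (cp < 0x800)   return 2;   /* includes Hebrew, U+0590..U+05FF */
    if (cp < 0x10000) return 3;   /* rest of the BMP */
    return 4;                     /* supplementary planes, up to U+10FFFF */
}

/* Hypothetical helper: UTF-8 storage for n code points. */
size_t utf8_bytes(const uint32_t *s, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_len(s[i]);
    return total;
}

/* Hypothetical helper: UCS-4 storage for n code points (fixed width). */
size_t ucs4_bytes(size_t n)
{
    return n * sizeof(uint32_t);
}
```

For "שלום" (U+05E9 U+05DC U+05D5 U+05DD) this gives 8 bytes in UTF-8 against 16 in UCS-4; for the ASCII word "hspell" it is 6 against 24. The speed side of the trade-off is that in UCS-4 the n-th character is simply `s[n]`, while in UTF-8 finding it means scanning from the start of the string.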
Ely

2012/3/12 Dov Grobgeld <dov.grobg...@gmail.com>

> My suggestion is to go the glib/gtk approach: use UTF-8 everywhere and
> have the API accept char*, i.e. there is no typedef for Unicode character
> strings. If this is not acceptable because of speed (speed is its only
> tradeoff), then use UCS-4 internally and provide two external interfaces,
> one for UCS-4 and one for UTF-8. For backwards compatibility you can
> provide your own ISO-8859-8 to UTF-8 conversion functions. I suggest that
> you don't add an iconv dependency but let the user take care of character
> set conversions, which you don't really care about.
>
> Regards,
> Dov
>
> 2012/3/12 Elazar Leibovich <elaz...@gmail.com>
>
>> The simplest option is to accept a StringPiece-like structure (pointer to
>> buffer + size) and an encoding, then to convert the data internally to
>> your encoding (say, ISO-8859-8, replacing illegal characters with
>> whitespace), and convert the output back.
>>
>> Do you mind using an iconv-like library?
>>
>>
>> On Mon, Mar 12, 2012 at 3:05 PM, Nadav Har'El
>> <n...@math.technion.ac.il> wrote:
>>
>>> Hi, I have a question that I was sort of sad I couldn't readily
>>> find the answer to...
>>>
>>> Let's say I want to create a C API (a C library), with functions which
>>> take strings as arguments. What am I supposed to use if I want these
>>> strings to be in any language? Obviously the answer is "Unicode", but
>>> that doesn't really answer the question... How is Unicode used in C?
>>>
>>> As far as I can see, there are two major approaches to this problem.
>>>
>>> One approach, used in the Win32 C APIs on MS-Windows, and also in Java
>>> and other languages, is to use "wide characters": characters of 16 or
>>> 32 bits, with strings being arrays of such characters.
>>>
>>> The second approach, proposed by Plan 9, is to use UTF-8.
>>>
>>> I personally prefer the UTF-8 approach, because it naturally fits
>>> with C's "char *" type and with Linux's system calls (which take char*,
>>> not any sort of wide characters), but I'm not at all sure that this is
>>> what users actually want. If not, then I wonder why.
>>>
>>> Some background on this question: people have been complaining for
>>> years that Hspell, and in particular the libhspell functions, use
>>> ISO-8859-8 instead of "Unicode". But if one wants to add Unicode to
>>> libhspell, what should it be? UTF-8? Wide chars (UTF-16 or UTF-32)?
>>>
>>> Thanks,
>>> Nadav.
>>>
>>> --
>>> Nadav Har'El                        | Monday, Mar 12 2012,
>>> n...@math.technion.ac.il            |-----------------------------------------
>>> Phone +972-523-790466, ICQ 13349191 | We could wipe out world hunger if we knew
>>> http://nadav.harel.org.il           | how to make AOL's Free CD's edible!
>>>
>>> _______________________________________________
>>> Linux-il mailing list
>>> Linux-il@cs.huji.ac.il
>>> http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
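Dov's backward-compatibility suggestion (a small ISO-8859-8 to UTF-8 converter shipped with the library, with no iconv dependency) could look roughly like the sketch below. The name `iso8859_8_to_utf8` and its calling convention are hypothetical; the byte-to-code-point mapping is meant to follow the Unicode Consortium's ISO-8859-8 mapping table, and bytes that table leaves undefined become U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one ISO-8859-8 byte to a Unicode code point.
   Undefined bytes become U+FFFD REPLACEMENT CHARACTER. */
static uint32_t iso8859_8_to_ucs4(unsigned char c)
{
    if (c < 0xA0)                 /* ASCII and C1 controls: identity */
        return c;
    if (c >= 0xE0 && c <= 0xFA)   /* Hebrew letters alef..tav */
        return 0x05D0 + (uint32_t)(c - 0xE0);
    switch (c) {
    case 0xA1: return 0xFFFD;     /* undefined */
    case 0xAA: return 0x00D7;     /* multiplication sign */
    case 0xBA: return 0x00F7;     /* division sign */
    case 0xDF: return 0x2017;     /* double low line */
    case 0xFD: return 0x200E;     /* left-to-right mark */
    case 0xFE: return 0x200F;     /* right-to-left mark */
    }
    if (c <= 0xBE)                /* remaining Latin-1 symbols: identity */
        return c;
    return 0xFFFD;                /* undefined: 0xBF..0xDE, 0xFB, 0xFC, 0xFF */
}

/* Encode a BMP code point as UTF-8; returns the number of bytes written.
   Every ISO-8859-8 character maps into the BMP, so 3 bytes suffice. */
static size_t utf8_encode_bmp(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical converter: `out` must hold at least 3 * len bytes.
   Returns the number of UTF-8 bytes written; no NUL terminator is added. */
size_t iso8859_8_to_utf8(const char *in, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += utf8_encode_bmp(iso8859_8_to_ucs4((unsigned char)in[i]),
                             (unsigned char *)out + n);
    return n;
}
```

For example, the ISO-8859-8 bytes F9 EC E5 ED ("שלום") come out as the 8 UTF-8 bytes D7 A9 D7 9C D7 95 D7 9D. A caller-supplied buffer keeps the library free of allocation policy, which seems in the spirit of leaving charset handling to the user.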