Hi Boosters,

I have uploaded the first version of my UTF library to the Boost files section. You can find it here:

http://groups.yahoo.com/group/boost/files/utf/

A couple of months ago, I posted a message to gauge interest in such a library and got just one answer, from Vladimir Prus (hi, Vladimir!). I hope that a complete, working library will attract more attention.

What you will find in the library:

* codecvt facets for the following external encodings: UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. The facets are templated, in order to avoid any reference to the platform wchar_t type (if present).

The internal encoding can be either UTF-16 or UTF-32. A convenience interface is provided to automatically select the internal encoding according to the size (2 or 4 bytes) of the character type used internally.
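For illustration only, selecting the internal encoding from the size of the character type could be sketched as below. This is not the library's actual interface: the names utf16_internal, utf32_internal, internal_encoding_for_size and internal_encoding are hypothetical, and the sketch assumes 8-bit bytes and a compiler that supports static_assert.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tags for the two internal encodings named in the text.
struct utf16_internal { static const int bits = 16; };
struct utf32_internal { static const int bits = 32; };

// The primary template is left undefined on purpose: a character type
// whose size is neither 2 nor 4 bytes is rejected at compile time.
template <std::size_t Size> struct internal_encoding_for_size;
template <> struct internal_encoding_for_size<2> { typedef utf16_internal type; };
template <> struct internal_encoding_for_size<4> { typedef utf32_internal type; };

// Maps a character type to an internal encoding using only sizeof, so
// wchar_t, uint16_t and uint32_t are all handled by the same rule.
template <typename CharT>
struct internal_encoding : internal_encoding_for_size<sizeof(CharT)> {};

// Compile-time checks of the selection rule.
static_assert(internal_encoding<std::uint16_t>::type::bits == 16,
              "2-byte character types select UTF-16");
static_assert(internal_encoding<std::uint32_t>::type::bits == 32,
              "4-byte character types select UTF-32");
```

With a rule like this, internal_encoding<wchar_t>::type would name UTF-16 on platforms where wchar_t is 2 bytes and UTF-32 where it is 4 bytes.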

The facets correctly handle the following Unicode features:

- all 17 character planes
- non-characters (U+XFFFE, U+XFFFF, U+FDD0 - U+FDEF)
- UTF-16 surrogate pairs (both externally and internally)
- UTF-8 non-shortest forms (externally)
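As a rough illustration of what the features in this list involve, the following sketch shows the underlying code-point arithmetic. It is not taken from the library; all function names are hypothetical, and what a facet does on detecting such input (report an error, substitute a character, and so on) is not covered.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t code_point;

// The 66 non-characters: U+FDD0..U+FDEF plus the last two code points
// of each of the 17 planes (U+xFFFE and U+xFFFF).
inline bool is_noncharacter(code_point cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF) ||
           (cp <= 0x10FFFF && (cp & 0xFFFE) == 0xFFFE);
}

// Code points reserved for UTF-16 surrogates; never valid on their own.
inline bool is_surrogate(code_point cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Split a supplementary code point (U+10000..U+10FFFF) into a pair.
inline void to_surrogates(code_point cp, std::uint16_t& high, std::uint16_t& low) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;
    high = static_cast<std::uint16_t>(0xD800 + (cp >> 10));
    low  = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));
}

// Recombine a high/low surrogate pair into one code point.
inline code_point from_surrogates(std::uint16_t high, std::uint16_t low) {
    return 0x10000 + ((static_cast<code_point>(high) - 0xD800) << 10)
                   + (static_cast<code_point>(low) - 0xDC00);
}

// Minimum number of UTF-8 bytes for a code point. A decoded sequence
// that used more bytes than this is a non-shortest form.
inline int utf8_shortest_length(code_point cp) {
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}
```

For example, the two bytes C0 AF decode to U+002F, whose shortest form is a single byte, so that sequence is a non-shortest form.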

* a convenience interface to autodetect the correct facet according to the file signature (BOM)

* a comprehensive test suite (with Jamfile)

* a little example (with Jamfile)
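To make the BOM autodetection item in the list above more concrete, here is a minimal sketch of signature detection for the five external encodings. It is not the library's interface; the enum and the function detect_bom are hypothetical names, and the byte patterns are the standard Unicode signatures.

```cpp
#include <cassert>
#include <cstddef>

enum encoding {
    enc_unknown, enc_utf8, enc_utf16le, enc_utf16be, enc_utf32le, enc_utf32be
};

// Inspect the first bytes of a file and return the encoding named by
// its signature, or enc_unknown if no signature is present.
inline encoding detect_bom(const unsigned char* p, std::size_t n) {
    assert(p != 0 || n == 0);
    // The UTF-32 signatures are tested first because FF FE 00 00
    // (UTF-32LE) starts with the UTF-16LE signature FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return enc_utf32le;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return enc_utf32be;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return enc_utf8;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return enc_utf16le;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return enc_utf16be;
    return enc_unknown;
}
```

One known ambiguity remains with this ordering: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE.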

What you won't find in the library:

* documentation :-( I'm working on it!!! I swear. Give me some more time! (and a little feedback)

* facets for UCS-2 or UCS-4 (these encodings are very similar to UTF-16 and UTF-32 but are *not* the same!)

* facets that use UTF-8 internally (this is too complex and won't work portably, believe me!)

Compatibility

The test suite and the example have been tested with VS.NET, using both the native STL and STLport 4.5.3. However, STLport has major bugs in the codecvt interface and in the basic_filebuf implementation, so in order to compile and run the wchar_t and uint16_t tests you need to apply a patch that is provided with the library and *rebuild* STLport. The uint32_t test won't compile in any case, due to an incomplete implementation of the entire locale suite (I am going to contact Boris Fomitchev to see how we can make a patch).
The test suite will compile and run correctly even in the presence of the /Zc:wchar_t option (that's why there are both a wchar_t and a uint16_t test in the first place).

The facets that use UTF-16 internally were a major challenge. I provided two different implementations. The default one is a "compatibility" implementation that should work with most STL implementations (including VS.NET and STLport, both of which have a minor flaw :-( ). The other one should be a little faster, but I don't know with how many compilers it will work. The alternative implementation can be selected in the file config.hpp. In that file you can also find a #define that should be changed if your implementation correctly implements Library Issue 75 about the prototype of the function do_length().
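For readers unfamiliar with Library Issue 75: its resolution changed the state parameter of codecvt::length() and do_length() from a const reference to a non-const reference. A configuration switch for this could look roughly like the sketch below. The macro names and the class byte_counting_codecvt are hypothetical, since the actual name of the #define in config.hpp is not given here, and the trivial facet only stands in for a real UTF facet.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical switch: 1 if the standard library declares do_length()
// with a non-const state reference (Issue 75 resolved), 0 otherwise.
#define UTF_SKETCH_ISSUE_75_RESOLVED 1

#if UTF_SKETCH_ISSUE_75_RESOLVED
#  define UTF_SKETCH_LENGTH_STATE std::mbstate_t&
#else
#  define UTF_SKETCH_LENGTH_STATE const std::mbstate_t&
#endif

// Trivial char-to-char facet: every external byte counts as one
// internal character. Only the do_length() prototype matters here.
class byte_counting_codecvt : public std::codecvt<char, char, std::mbstate_t> {
protected:
    virtual int do_length(UTF_SKETCH_LENGTH_STATE, const char* from,
                          const char* end, std::size_t max) const {
        assert(from <= end);
        std::size_t avail = static_cast<std::size_t>(end - from);
        return static_cast<int>(avail < max ? avail : max);
    }
};

// Calls the public length() member, which forwards to do_length().
inline int sketch_length(const char* s, std::size_t n, std::size_t max) {
    byte_counting_codecvt facet;
    std::mbstate_t state = std::mbstate_t();
    return facet.length(state, s, s + n, max);
}
```

If the switch does not match the library's declaration, the function silently hides the base-class virtual instead of overriding it, which is why such a setting has to be chosen per implementation.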

I hope you enjoy this library and find it useful. Depending on the feedback I receive, I will go on to write proper documentation with a view to a formal submission.

Thanks in advance for your time and help,

Alberto Barbati


