Vladimir Prus wrote:
First interpretation is that you're interested in support for
different Unicode encodings, via appropriate facets. Then
Alberto Barbati is the last person who touched this matter,
in
    news://news.gmane.org:119/aq72e4$pog$[EMAIL PROTECTED]

I assume he's holding a lock on implementation work. Alberto,
did you get anywhere?
Yes, despite the clear lack of interest from Boosters in this issue, I'm still working on it (but I don't have any "lock" ;) ).

I had a few problems with the interpretation of the standard, but thanks to a few guys from comp.std.c++ I can now say that I have a working implementation of facets to convert from UTF-8/16/32 (external) to UTF-16/32 (internal), with endian variants, for a total of 10 facets. The implementation passes a basic suite of tests on VS.NET with both the native STL and STLport.

The facets conform to the Unicode 3.2 requirements on non-characters, use of surrogates and non-shortest UTF-8 sequences. After a private discussion with a field expert, I decided to drop the UCS-2 facets, so surrogate support is no longer optional. I also decided to drop facets with UTF-8 as the internal encoding because they are not very useful and the current wording of the C++ standard de facto disallows a portable implementation :(. I hope the LWG will consider clarifying the issue.
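
To illustrate the kind of checks mentioned above (this is only a rough sketch, not the code of the facets themselves), a standalone UTF-8 to UTF-32 decoder that rejects non-shortest sequences and encoded surrogates might look like the following. The name decode_utf8 and its interface are hypothetical, and the non-character rules are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only.
// Appends the decoded code points to `out`; returns false on ill-formed input.
bool decode_utf8(const std::string& in, std::vector<std::uint32_t>& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        std::uint32_t lowest = 0;  // smallest code point allowed for this length
        if (b0 < 0x80)                { cp = b0;        len = 1; lowest = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; lowest = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; lowest = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; lowest = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (in.size() - i < len) return false;  // truncated sequence
        assert(i + len <= in.size());

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < lowest) return false;                    // non-shortest form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate in UTF-8
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range

        out.push_back(cp);
        i += len;
    }
    return true;
}
```

For example, the two-byte sequence C0 80 (an overlong encoding of U+0000) and the three-byte sequence ED A0 80 (the surrogate U+D800) are both rejected, while E2 82 AC decodes to U+20AC.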

My next steps would be to polish the code, write the docs and prepare a more complete test suite. If everything goes well, I think I could submit the library for review by the end of the month.

Second interpretation is conversion between all the 8-bit encodings
out there. E.g. from koi8-r to windows-1251. Since there's GNU
iconv already, I'd rather see a tiny wrapper over it. (GNU iconv works
on Windows, too).
Here things become more complex. UTF conversions are just algorithmic stuff, easy to do. Other conversions, like koi8-r to windows-1251, require look-up tables, and simply gathering the data for all of them would be equivalent to rewriting a part of ICU, which is a huge piece of work.
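
To make the difference concrete, here is a rough sketch of what a table-driven conversion looks like. It is an illustration only: the function names are hypothetical, and the KOI8-R table is a three-entry excerpt, where a real converter would need all 128 high-half entries for every supported encoding. The windows-1251 side happens to be partly algorithmic because that code page stores U+0410..U+044F contiguously at 0xC0..0xFF; its remaining high-half entries are not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Excerpt only: three KOI8-R high-half entries mapped to Unicode.
const std::map<unsigned char, std::uint32_t>& koi8r_excerpt() {
    static const std::map<unsigned char, std::uint32_t> table = {
        {0xC0, 0x044E},  // CYRILLIC SMALL LETTER YU
        {0xC1, 0x0430},  // CYRILLIC SMALL LETTER A
        {0xC2, 0x0431},  // CYRILLIC SMALL LETTER BE
    };
    return table;
}

// Partial Unicode -> windows-1251 mapping: ASCII plus the basic
// Cyrillic block U+0410..U+044F, which sits at 0xC0..0xFF.
bool unicode_to_cp1251(std::uint32_t cp, unsigned char& out) {
    if (cp < 0x80) { out = static_cast<unsigned char>(cp); return true; }
    if (cp >= 0x0410 && cp <= 0x044F) {
        const std::uint32_t offset = cp - 0x0410;
        assert(offset < 0x40);
        out = static_cast<unsigned char>(0xC0 + offset);
        return true;
    }
    return false;  // not covered by this sketch
}

// Converts byte by byte through Unicode; returns false on any byte
// that the excerpt tables cannot map.
bool koi8r_to_cp1251(const std::string& in, std::string& out) {
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = b;  // ASCII range is identical in both encodings
        if (b >= 0x80) {
            const auto it = koi8r_excerpt().find(b);
            if (it == koi8r_excerpt().end()) return false;
            cp = it->second;
        }
        unsigned char converted = 0;
        if (!unicode_to_cp1251(cp, converted)) return false;
        out.push_back(static_cast<char>(converted));
    }
    return true;
}
```

The code itself is trivial; the cost is in collecting and verifying the table data for each encoding pair.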

The idea of wrapping ICU is very interesting. However, the Boost policy explicitly disallows dependencies on external libraries, so this solution is out of the question. Moreover, the only things ICU is missing are the conversion facets. I don't see any reason to wrap anything else. Unfortunately, as I said before, not all conversions can be portably expressed as a facet with the current C++ standard, so even writing wrapper facets makes little sense.

Alberto Barbati
