After Matt's mega-refactoring of the Fl_Text_{Buffer,Display,Editor}
classes, and relative simplification of the code, I've been able to
isolate one of the major problems remaining for the UTF-8 interface.

That problem relates to the display of characters from the CP1252 code
page commonly used on Windows, as shown by the screenshots that Albrecht
made available in the FLTK-1.3/src/misc directory, and which I then
raised as an STR in http://www.fltk.org/str.php?L2348
"test/editor fails to display misc/cp1252.txt and can hang"

The upshot is that handling strings containing CP1252 C1 codes, and
other bytes with the top bit set, requires some major changes to the
Fl_Text_{Buffer,Display} code: bytes have to be copied from the ring
buffer into a straight char array so that fl_utf8decode(), et al, can
work on an unbroken consecutive sequence of bytes.

In addition, the low level fl_draw(s, n, x, y) and fl_width(s, n)
functions call routines in src/xutf8/utf8Wrap.c that only handle pure
UTF-8 byte sequences. These routines are at a lower level than
fl_utf8decode(), so it seems inappropriate to pollute them by making
them CP1252-aware. To get round this, I've used a new function in
Fl_Text_Display that expands CP1252 strings into pure UTF-8 strings
before calling the fl_draw() and fl_width() functions.

Very recently in http://www.fltk.org/str.php?L2348 I argued that we
should continue to offer the hybrid UTF-8/CP1252 capability so that
users could load files into their FLTK application without the risk
that they would be silently converted by the framework. But now that I
have seen that supporting CP1252 in Fl_Text_* involves an awful lot of
otherwise unnecessary copying of strings, I worry that it could affect
the "Fast" part of the Fast Light ToolKit name. Therefore I am now
leaning more towards a pure UTF-8 future, but this could have major
implications for FLTK-1.1 users porting their own text handling widgets
to 1.3.0.
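
To make the cost of the expansion step described above more concrete,
here is a minimal sketch of what such a helper might look like. It is
not the function actually added to Fl_Text_Display: the name
expand_cp1252_to_utf8() and the helpers valid_utf8_len() and
append_utf8() are hypothetical, and the code assumes the bytes have
already been copied out of the text buffer into one contiguous array.
Well-formed UTF-8 sequences are passed through unchanged; any other
byte with the top bit set is treated as CP1252 and re-encoded.

```cpp
#include <cstddef>
#include <string>

// CP1252 bytes 0x80..0x9F mapped to Unicode code points. The five bytes
// that CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to
// the C1 control code with the same value.
static const unsigned short kCp1252C1[32] = {
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence
// starting at p, or 0 if the bytes at p are not valid UTF-8.
static std::size_t valid_utf8_len(const unsigned char* p,
                                  const unsigned char* end) {
  unsigned char c = p[0];
  std::size_t n;
  unsigned cp, min;
  if (c < 0x80) return 1;
  if (c >= 0xC2 && c <= 0xDF)      { n = 2; cp = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)     { n = 3; cp = c & 0x0F; min = 0x800; }
  else if (c >= 0xF0 && c <= 0xF4) { n = 4; cp = c & 0x07; min = 0x10000; }
  else return 0;                       // stray continuation or bad lead byte
  if (static_cast<std::size_t>(end - p) < n) return 0;   // truncated
  for (std::size_t i = 1; i < n; i++) {
    if ((p[i] & 0xC0) != 0x80) return 0;                 // not a continuation
    cp = (cp << 6) | (p[i] & 0x3F);
  }
  // Reject overlong forms, surrogates and values beyond U+10FFFF.
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
  return n;
}

// Hypothetical helper: append cp (always below 0x10000 here) as UTF-8.
static void append_utf8(std::string& out, unsigned cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Expand a hybrid UTF-8/CP1252 byte string into pure UTF-8.
std::string expand_cp1252_to_utf8(const char* src, std::size_t len) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(src);
  const unsigned char* end = p + len;
  std::string out;
  out.reserve(len);
  while (p < end) {
    std::size_t n = valid_utf8_len(p, end);
    if (n) {                           // already valid UTF-8: copy as-is
      out.append(reinterpret_cast<const char*>(p), n);
      p += n;
    } else {                           // top-bit byte that is not UTF-8
      unsigned cp = (*p <= 0x9F) ? kCp1252C1[*p - 0x80] : *p;
      append_utf8(out, cp);
      ++p;
    }
  }
  return out;
}
```

The result would then be handed to fl_draw() or fl_width() in place of
the original bytes. Note that every string drawn or measured pays for
one extra scan and one extra copy, which is exactly the overhead that
worries me.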

Before we release FLTK-1.3.0 and commit to keeping the same character
set support until at least the next major release after that, we need
to decide which of [at least] three options we support:

1. We decide that FLTK-1.3.0 will be the first release that will
   support pure UTF-8 only, and that CP1252 data in files, etc. will
   be converted to pure UTF-8 during input, with or without warning.

2. We decide that FLTK-1.3.0 will continue to support a hybrid system
   of UTF-8 plus CP1252, using the array copying techniques described
   above, or similar.

3. We decide that FLTK-1.3.0 will continue to support a hybrid system
   of UTF-8 plus CP1252, but we avoid the array copying by extending
   the low level fl_draw() and fl_width() functions to handle CP1252.

Comments? Other options that I have missed?

D.
