[fltk.development] RFC: Pure UTF-8 or Hybrid CP1252 ?

Duncan Gibson Fri, 19 Nov 2010 15:50:05 -0800

After Matt's mega-refactoring of the Fl_Text_{Buffer,Display,Editor}
classes, and relative simplification of the code, I've been able to
isolate one of the major problems remaining for the UTF-8 interface.


That problem relates to the display of characters from the CP1252
code page commonly used on Windows, as shown by the screenshots that
Albrecht made available in the FLTK-1.3/src/misc directory, and then
raised as an STR by me in http://www.fltk.org/str.php?L2348
"test/editor fails to display misc/cp1252.txt and can hang"

The upshot of all of this is that handling strings containing CP1252
C1 codes, and other bytes with the top bit set, requires some major
changes to the Fl_Text_{Buffer,Display} code: bytes have to be copied
from the ring buffer into a straight char array to allow the use of
fl_utf8decode(), et al, on an unbroken consecutive sequence of bytes.
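
To make the copying step concrete, here is a minimal sketch of it. The
RingBuffer struct and linearize() are hypothetical names invented for
illustration and are not the actual Fl_Text_Buffer internals; the only
point is that a multi-byte sequence straddling the wrap point becomes
contiguous after the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical ring buffer (not the real Fl_Text_Buffer layout): logical
// position 0 lives at physical index `head`, wrapping at data.size().
struct RingBuffer {
  std::vector<char> data;
  std::size_t head;
};

// Copy `len` logical bytes starting at logical position `start` into a
// contiguous, NUL-terminated array, so a decoder that expects consecutive
// bytes never sees a sequence split by the wrap point.
std::vector<char> linearize(const RingBuffer& rb, std::size_t start,
                            std::size_t len) {
  const std::size_t cap = rb.data.size();
  assert(len <= cap);
  std::vector<char> out(len + 1, '\0');
  if (len == 0) return out;
  const std::size_t first = (rb.head + start) % cap;  // physical start index
  const std::size_t tail = cap - first;               // bytes before the wrap
  const std::size_t n1 = len < tail ? len : tail;
  std::memcpy(out.data(), rb.data.data() + first, n1);
  std::memcpy(out.data() + n1, rb.data.data(), len - n1);  // wrapped part
  return out;
}
```

Every call allocates and copies, which is the per-access cost that comes
with this approach.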

In addition, the low level fl_draw(s, n, x, y) and fl_width(s, n)
functions call routines in src/xutf8/utf8Wrap.c that only handle
pure UTF-8 byte sequences. These routines are at a lower level than
fl_utf8decode() and it therefore seems inappropriate to pollute them
to be CP1252-aware. To get round this, I've used a new function in
Fl_Text_Display that expands CP1252 strings into pure UTF-8 strings
before calling the fl_draw() and fl_width() functions.
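
For illustration, an expansion step of this kind could look roughly
like the sketch below. This is not the function added to
Fl_Text_Display; expand_cp1252() and its helpers are hypothetical
names, and the UTF-8 check is deliberately simplified (it does not
reject overlong forms or surrogates). Structurally valid UTF-8
sequences pass through untouched, and any other byte with the top bit
set is treated as CP1252 and re-encoded.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Unicode code points for CP1252 bytes 0x80..0x9F. The five bytes CP1252
// leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) keep their C1 value.
static const unsigned kCp1252[32] = {
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Length of a structurally valid multi-byte UTF-8 sequence at s[i], else 0.
static std::size_t utf8_seq_len(const std::string& s, std::size_t i) {
  const unsigned char c = static_cast<unsigned char>(s[i]);
  std::size_t n = 0;
  if (c >= 0xC2 && c <= 0xDF) n = 2;
  else if (c >= 0xE0 && c <= 0xEF) n = 3;
  else if (c >= 0xF0 && c <= 0xF4) n = 4;
  else return 0;
  if (i + n > s.size()) return 0;  // truncated sequence
  for (std::size_t k = 1; k < n; ++k) {
    const unsigned char t = static_cast<unsigned char>(s[i + k]);
    if ((t & 0xC0) != 0x80) return 0;  // not a continuation byte
  }
  return n;
}

// Append code point cp (U+0080..U+FFFF is all this sketch needs) as UTF-8.
static void append_utf8(std::string& out, unsigned cp) {
  assert(cp >= 0x80 && cp <= 0xFFFF);
  if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Expand a hybrid UTF-8/CP1252 byte string into pure UTF-8.
std::string expand_cp1252(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size();) {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) { out += in[i++]; continue; }        // plain ASCII
    const std::size_t n = utf8_seq_len(in, i);
    if (n) { out.append(in, i, n); i += n; continue; }  // already UTF-8
    append_utf8(out, c <= 0x9F ? kCp1252[c - 0x80] : c);
    ++i;
  }
  return out;
}
```

Note the inherent ambiguity of any hybrid scheme: the CP1252 bytes
C3 A9 are also the valid UTF-8 encoding of U+00E9, so a sketch like
this always resolves such cases in favour of UTF-8.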

Very recently in http://www.fltk.org/str.php?L2348 I argued that we
should continue to offer the hybrid UTF-8/CP1252 capability so that
users could load files into their FLTK application without the risk
that they would be silently converted by the framework.

But now that I have seen that supporting CP1252 in Fl_Text_* involves
an awful lot of otherwise unnecessary string copying, I worry that it
could affect the "Fast" part of the Fast Light ToolKit name.
Therefore I am now leaning more towards a pure UTF-8 future, but this
could have major implications for FLTK-1.1 users porting their own
text handling widgets to 1.3.0.

Before we release FLTK-1.3.0 and commit to keeping the same character
set support until at least the next major release after that, we need
to decide which of [at least] three options we support:

1. We decide that FLTK-1.3.0 will be the first release that will
   support pure UTF-8 only, and that CP1252 data in files, etc. will
   be converted to pure UTF-8 during input, with or without warning.

2. We decide that FLTK-1.3.0 will continue to support a hybrid system
   of UTF-8 plus CP1252, using the array copying techniques described
   above, or similar.

3. We decide that FLTK-1.3.0 will continue to support a hybrid system
   of UTF-8 plus CP1252, but we avoid the array copying by extending
   the low level fl_draw() and fl_width() functions to handle CP1252.
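
For option 1, converting "with warning" implies some check for whether
the input is already pure UTF-8. A minimal sketch of such a check is
below; first_non_utf8() is a hypothetical helper, not an existing FLTK
function, and it tests structure only (overlong forms and surrogates
are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the offset of the first byte that does not begin a structurally
// valid UTF-8 sequence, or std::string::npos if the whole string is UTF-8.
std::size_t first_non_utf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t n;
    if (c < 0x80) n = 1;                        // ASCII
    else if (c >= 0xC2 && c <= 0xDF) n = 2;     // 2-byte lead
    else if (c >= 0xE0 && c <= 0xEF) n = 3;     // 3-byte lead
    else if (c >= 0xF0 && c <= 0xF4) n = 4;     // 4-byte lead
    else return i;                              // stray or invalid byte
    if (i + n > s.size()) return i;             // truncated sequence
    for (std::size_t k = 1; k < n; ++k) {
      const unsigned char t = static_cast<unsigned char>(s[i + k]);
      if ((t & 0xC0) != 0x80) return i;         // missing continuation byte
    }
    i += n;
  }
  assert(i == s.size());  // every step consumed one whole sequence
  return std::string::npos;
}
```

An application could call this once when loading a file and warn the
user, or convert silently, only when the result is not npos.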

Comments? Other options that I have missed?

D.