On 2016-12-15 20:38-0000 Phil Rosenberg wrote:

> Hi All
> I've just posted a bug to the bug tracker regarding the buffer.
>
> I just wanted to send an email out to say that I don't particularly
> intend to fix it before the freeze as I don't think there is time. But
> I wanted to stick it on the tracker so it didn't get forgotten.
>
> The problem is that text is written and read in "device native"
> encoding (i.e. Unicode or ascii) so if a buffer is copied from a
> Unicode device to an ascii one then the plreplot call will hit a
> problem with buffer reading and call plexit. The solution would be to
> always write Unicode or to add a flag to indicate the encoding in the
> buffer. If anyone has a preference then let me know.

Hi Phil:

Thanks for bringing this potentially nasty issue to our attention.  I
agree with your judgement that it should be tackled after the
release.

To answer your question for when you do work on this: if I recall
correctly, user-specified strings, which are assumed to be in the
UTF-8 encoding of unicode (which includes ascii), are uniformly
translated to a modified UCS4 (32-bit) unicode encoding for internal
PLplot use.  The modification is that 32-bit FCI (font
characterization integer) words can be embedded in the middle of the
UCS4 array to change some font characteristic in mid string.  I also
recall (but haven't checked to be sure) that right now, if the device
cannot handle unicode, we store the text information in an entirely
different array that is a quarter of the size.
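
To make the UTF-8 to UCS4 step concrete, here is a minimal sketch in
C.  It is not the PLplot implementation: the function names and the
SKETCH_FCI_MARK constant are hypothetical, and marking an FCI word by
its top bit is an assumption made only for this illustration.  The
decoder also does not reject overlong UTF-8 sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a 32-bit word with the top bit set is a
 * font-change (FCI) word rather than a glyph. */
#define SKETCH_FCI_MARK 0x80000000u

/* Hypothetical helper: nonzero if the word is an FCI, not a glyph. */
int sketch_is_fci(uint32_t word)
{
    return (word & SKETCH_FCI_MARK) != 0;
}

/* Decode a NUL-terminated UTF-8 string into 32-bit code points.
 * Returns the number of code points written, or -1 if the input is
 * malformed or the output array is too small. */
long sketch_utf8_to_ucs4(const char *utf8, uint32_t *out, size_t out_len)
{
    const unsigned char *p = (const unsigned char *) utf8;
    size_t n = 0;

    assert(utf8 != NULL && out != NULL);
    while (*p != 0) {
        uint32_t cp;
        int extra; /* number of continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */
        p++;

        while (extra-- > 0) {
            /* A NUL terminator also fails this test, so we never read
             * past the end of a truncated sequence. */
            if ((*p & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (uint32_t) (*p & 0x3F);
            p++;
        }

        if (n >= out_len)
            return -1;
        out[n++] = cp;
    }
    return (long) n;
}
```

Every glyph occupies exactly one 32-bit slot regardless of how many
bytes it took in UTF-8, which is what leaves room for an FCI word to
sit in the same array.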

But I would strongly prefer that we instead move to always using the
modified UCS4 array for text, with the ascii-only devices using a core
function to convert that array back to an ascii string (with
appropriate filtering when the UCS4 array contains an FCI or a 32-bit
representation of a non-ascii glyph) for their own text plotting.  The
big advantage of this approach is that our text handling automatically
becomes much simpler and much less confusing to understand.  The
downside, of course, is that UCS4 is 4 times larger than the
equivalent ascii (for those cases when our users stick to the pure
ascii subset of UTF-8).  However, I think pure ascii use is becoming
rarer and rarer (especially for scientific plotting, where many
Unicode math symbols are typically used in labels).  Instead of using
our own (incredibly inefficient) "#[nnn]" unicode escape, which keeps
users' strings strictly in ascii, users now tend to just cut and paste
the UTF-8 math symbol they need into their PLplot text string, with
some very nice results for our modern (unicode-aware) devices.
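
As a rough sketch of the kind of core function I have in mind for the
ascii-only devices (the function name, the SKETCH_FCI_MARK constant,
and the choice of '?' as the substitute character are all hypothetical
and only for illustration; none of this is existing PLplot API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: a 32-bit word with the top bit set is a
 * font-change (FCI) word rather than a glyph. */
#define SKETCH_FCI_MARK 0x80000000u

/* Convert a modified UCS4 array to a NUL-terminated ascii string.
 * FCI words are dropped and non-ascii glyphs are replaced by '?'.
 * Output is truncated to fit out_size (including the terminator).
 * Returns the number of characters written, excluding the NUL. */
size_t sketch_ucs4_to_ascii(const uint32_t *in, size_t in_len,
                            char *out, size_t out_size)
{
    size_t i, n = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (i = 0; i < in_len && n + 1 < out_size; i++) {
        if (in[i] & SKETCH_FCI_MARK)
            continue; /* an ascii-only device cannot act on a font change */
        out[n++] = (in[i] < 0x80) ? (char) in[i] : '?';
    }
    out[n] = '\0';
    return n;
}
```

Whether a non-ascii glyph should be replaced or silently dropped is a
policy decision; the point is only that the filtering would live in
one core function rather than in each ascii-only device.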

So assuming that is a correct summary of the present text situation,
and we also decide to make the simplifying code modification I
strongly suggest above, then the modified UCS4 array, which would
always contain the text information, is obviously what should be
written to the buffer.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
