On 2016-12-15 20:38-0000 Phil Rosenberg wrote:

> Hi All
> I've just posted a bug to the bug tracker regarding the buffer.
>
> I just wanted to send an email out to say that I don't particularly
> intend to fix it before the freeze as I don't think there is time. But
> I wanted to stick it on the tracker so it didn't get forgotten.
>
> The problem is that text is written and read in "device native"
> encoding (i.e. Unicode or ascii) so if a buffer is copied from a
> Unicode device to an ascii one then the plreplot call will hit a
> problem with buffer reading and call plexit. The solution would be to
> always write Unicode or to add a flag to indicate the encoding in the
> buffer. If anyone has a preference then let me know.
Hi Phil:

Thanks for bringing this potentially nasty issue to our attention, and
I agree with your judgement that this is an issue that should be
tackled after the release.

To answer your question for when you do work on this: if I recall
correctly, user-specified strings, which are assumed to be written in
the UTF-8 encoding of unicode (which includes ascii), are uniformly
translated to a modified UCS4 (32-bit) unicode encoding for internal
PLplot use. The modification is that 32-bit FCI (font characterization
integer) words can be embedded in the middle of the UCS4 array to
change some font characteristic in mid string.

I also recall (but haven't checked to be sure) that right now, if the
device cannot handle unicode, we store the text information in an
entirely different array which is a quarter of the size. I would
strongly prefer instead that we move to always using the modified UCS4
array for text, with the ascii-only devices using a core function to
convert that back to an ascii string for their own text plotting (with
appropriate filtering when the UCS4 array contains an FCI or a 32-bit
representation of a non-ascii glyph). The big advantage of this
approach is that our text handling becomes much simpler and much less
confusing to understand.

The downside, of course, is that UCS4 is 4 times larger than the
equivalent ascii (for those cases when our users stick to the pure
ascii subset of UTF-8). However, I think pure ascii use is becoming
rarer and rarer, especially for scientific plotting where many Unicode
math symbols are typically used in labels. So instead of using our own
(incredibly inefficient) "#[nnn]" unicode encoding that keeps users'
strings strictly in ascii, users now tend to just cut and paste the
UTF-8 math symbol they need into their PLplot text string, with some
very nice results for our modern (unicode-aware) devices.
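The conversion back to ascii described above could look roughly like
the following. This is only a minimal sketch under stated assumptions,
not existing PLplot code: the function name sketch_ucs4_to_ascii and
the macro SKETCH_FCI_MARK are hypothetical, FCI words are assumed to be
recognizable by their top bit (as with PL_FCI_MARK in plplot.h), and
replacing non-ascii glyphs with '?' is just one possible filtering
policy.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: an FCI word is distinguished from a glyph code point by
 * having its top bit set (as with PL_FCI_MARK in plplot.h). */
#define SKETCH_FCI_MARK 0x80000000u

/* Hypothetical helper, not an existing PLplot function.
 * Converts a modified UCS4 array to a NUL-terminated ascii string:
 *   - FCI words are dropped (they change the font, they are not glyphs),
 *   - non-ascii code points are replaced by '?',
 *   - a zero word is treated as the end of the text.
 * Returns the number of characters written, excluding the NUL. */
size_t sketch_ucs4_to_ascii(const uint32_t *ucs4, size_t n_ucs4,
                            char *out, size_t out_size)
{
    size_t i, j = 0;

    if (out == NULL || out_size == 0)
        return 0;

    /* Stop early if the output buffer is full (room is kept for NUL). */
    for (i = 0; i < n_ucs4 && j + 1 < out_size; i++) {
        uint32_t word = ucs4[i];

        if (word == 0)
            break;                      /* end of text */
        if (word & SKETCH_FCI_MARK)
            continue;                   /* FCI word: filter it out */

        out[j++] = (word < 0x80u) ? (char) word : '?';
    }
    out[j] = '\0';
    return j;
}
```

With a helper along these lines an ascii-only device would never need
its own text array: it would read the same modified UCS4 array as
every other device and filter it at plot time.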
So assuming that is a correct summary of the present text situation, and we also decide to make the simplifying code modification I strongly suggest above, then obviously the modified UCS4 array that will then always contain the text information should be included in the buffer.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and
Astronomy, University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________