Re: [Plplot-devel] plbuffer bug

2016-12-15 Thread p.d.rosenberg
Hi Alan, Jim
I don't think the size impact of storing everything as a UCS4 string is a big
deal. Most of the text stored will be single-character points, and we already
have overheads for storing the position, rotation, style, font, etc. So for a
1-byte ASCII character we probably already store 20 or so bytes; adding
another 3 is not an issue.
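
As a rough check on that arithmetic, here is a minimal C sketch. The 20-byte
overhead is only my estimate from above, not a measurement, and
RECORD_OVERHEAD_BYTES and text_record_size are hypothetical names rather than
anything in the PLplot source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fixed cost per text record (position, rotation, style, font, ...).
 * This is the rough 20-byte estimate from the message, not a measured value. */
#define RECORD_OVERHEAD_BYTES 20u

/* Size in bytes of one buffered text record holding n_chars characters,
 * each stored in bytes_per_char bytes (1 for ASCII, 4 for UCS4). */
static size_t text_record_size(size_t n_chars, size_t bytes_per_char)
{
    assert(bytes_per_char == 1 || bytes_per_char == 4);
    return RECORD_OVERHEAD_BYTES + n_chars * bytes_per_char;
}
```

Under that assumption, text_record_size(1, 1) gives 21 bytes and
text_record_size(1, 4) gives 24, so a single-character label grows by 3 bytes.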

From: Alan W. Irwin
Sent: 15 December 2016 22:59
To: Phil Rosenberg
Cc: plplot-devel@lists.sourceforge.net
Subject: Re: [Plplot-devel] plbuffer bug

On 2016-12-15 20:38- Phil Rosenberg wrote:

> Hi All
> I've just posted a bug to the bug tracker regarding the buffer.
>
> I just wanted to send an email out to say that I don't particularly
> intend to fix it before the freeze as I don't think there is time. But
> I wanted to stick it on the tracker so it didn't get forgotten.
>
> The problem is that text is written and read in "device native"
> encoding (i.e. Unicode or ascii) so if a buffer is copied from a
> Unicode device to an ascii one then the plreplot call will hit a
> problem with buffer reading and call plexit. The solution would be to
> always write Unicode or to add a flag to indicate the encoding in the
> buffer. If anyone has a preference then let me know.

Hi Phil:

Thanks for bringing this potentially nasty issue to our attention,
and I agree with your judgement this is an issue that should be
tackled after the release.

To answer your question for when you do work on this, if I recall
correctly, user-specified strings which are assumed to be written in
the UTF-8 encoding (which includes ascii) of unicode are uniformly
translated to a modified UCS4 (32-bit) unicode encoding for internal
PLplot use where the modifications include the possibility of
embedding 32-bit FCI (font characterization integer) words (to change
some font characteristic in mid string) in the middle of the modified
UCS4 arrays.  I also recall (but haven't checked to be sure) that
right now, if the device cannot handle Unicode, we store the text
information in an entirely different array which is 4 times smaller.

But I would strongly prefer instead that we move to always using the
modified UCS4 array for text with the ascii-only devices using a core
function to convert that back to an ascii string (with appropriate
filtering when the UCS4 array contains an FCI or a 32-bit
representation of a non-ascii glyph) for their own text plotting.  The
big advantage of this approach is our text handling then automatically
becomes greatly simplified/much less confusing to understand.  The
downside, of course, is UCS4 is 4 times larger than the equivalent
ascii (for those cases when the pure ascii subset of UTF-8 is used by
our users).  However, I think pure ascii use is becoming rarer and
rarer (especially for scientific plotting where many Unicode math
symbols are typically used in labels).  So instead of using our own
(incredibly inefficient) "#[nnn]" unicode encoding that keeps users'
strings strictly in ascii, users now tend to just cut and paste the
UTF-8 math symbol they need into their PLplot text string with
some very nice results for our modern (unicode-aware) devices.

So assuming that is a correct summary of the present text situation,
and we also decide to make the simplifying code modification I
strongly suggest above, then obviously the modified UCS4 array that
will then always contain the text information should be included in
the buffer.

Alan
__
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__

Linux-powered Science
__

--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel


Re: [Plplot-devel] plbuffer bug

2016-12-15 Thread Jim Dishaw


> On Dec 15, 2016, at 5:59 PM, Alan W. Irwin wrote:
> 
>> On 2016-12-15 20:38- Phil Rosenberg wrote:
>> 
>> The problem is that text is written and read in "device native"
>> encoding (i.e. Unicode or ascii) so if a buffer is copied from a
>> Unicode device to an ascii one then the plreplot call will hit a
>> problem with buffer reading and call plexit. The solution would be to
>> always write Unicode or to add a flag to indicate the encoding in the
>> buffer. If anyone has a preference then let me know.
> 
> Hi Phil:
> 
> Thanks for bringing this potentially nasty issue to our attention,
> and I agree with your judgement this is an issue that should be
> tackled after the release.
> 
> To answer your question for when you do work on this, if I recall
> correctly, user-specified strings which are assumed to be written in
> the UTF-8 encoding (which includes ascii) of unicode are uniformly
> translated to a modified UCS4 (32-bit) unicode encoding for internal
> PLplot use where the modifications include the possibility of
> embedding 32-bit FCI (font characterization integer) words (to change
> some font characteristic in mid string) in the middle of the modified
> UCS4 arrays.  I also recall (but haven't checked to be sure) that
> right now, if the device cannot handle Unicode, we store the text
> information in an entirely different array which is 4 times smaller.
> 
> But I would strongly prefer instead that we move to always using the
> modified UCS4 array for text with the ascii-only devices using a core
> function to convert that back to an ascii string (with appropriate
> filtering when the UCS4 array contains an FCI or a 32-bit
> representation of a non-ascii glyph) for their own text plotting.  The
> big advantage of this approach is our text handling then automatically
> becomes greatly simplified/much less confusing to understand.  The
> downside, of course, is UCS4 is 4 times larger than the equivalent
> ascii (for those cases when the pure ascii subset of UTF-8 is used by
> our users).  However, I think pure ascii use is becoming rarer and
> rarer (especially for scientific plotting where many Unicode math
> symbols are typically used in labels).  So instead of using our own
> (incredibly inefficient) "#[nnn]" unicode encoding that keeps users'
> strings strictly in ascii, users now tend to just cut and paste the
> UTF-8 math symbol they need into their PLplot text string with
> some very nice results for our modern (unicode-aware) devices.
> 
> So assuming that is a correct summary of the present text situation,
> and we also decide to make the simplifying code modification I
> strongly suggest above, then obviously the modified UCS4 array that
> will then always contain the text information should be included in
> the buffer.
> 

Oops. I didn't think of that when I implemented the buffer a while back. It
would probably be best to have an encoding flag because, if the library is
compiled without Unicode support, I think forcing Unicode could cause problems.
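
To make the flag idea concrete, here is a minimal sketch in C of a tagged text
record. The layout (one tag byte, a 4-byte length, then the payload) and every
name in it are assumptions for illustration only; the real plbuffer format is
not shown in this thread and may well differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding tags; not taken from the PLplot source. */
enum text_encoding { ENC_ASCII = 1, ENC_UCS4 = 2 };

/* Assumed record layout: [1-byte tag][4-byte payload length][payload]. */
#define HEADER_BYTES (1u + sizeof(uint32_t))

/* Write one tagged text record into buf.
 * Returns the number of bytes written, or 0 if the record does not fit. */
static size_t write_text(uint8_t *buf, size_t cap, enum text_encoding enc,
                         const void *payload, uint32_t nbytes)
{
    assert(buf != NULL && payload != NULL);
    if (cap < HEADER_BYTES || nbytes > cap - HEADER_BYTES)
        return 0;
    buf[0] = (uint8_t) enc;
    memcpy(buf + 1, &nbytes, sizeof nbytes);   /* host byte order */
    memcpy(buf + HEADER_BYTES, payload, nbytes);
    return HEADER_BYTES + nbytes;
}

/* Read one record back. The reader trusts the stored tag, not the encoding
 * of whichever device is replaying the buffer.
 * Returns the number of bytes consumed, or 0 if the record is malformed. */
static size_t read_text(const uint8_t *buf, size_t len, enum text_encoding *enc,
                        const uint8_t **payload, uint32_t *nbytes)
{
    assert(buf != NULL && enc != NULL && payload != NULL && nbytes != NULL);
    if (len < HEADER_BYTES)
        return 0;
    if (buf[0] != ENC_ASCII && buf[0] != ENC_UCS4)
        return 0;                              /* unknown tag */
    memcpy(nbytes, buf + 1, sizeof *nbytes);
    if (*nbytes > len - HEADER_BYTES)
        return 0;                              /* truncated payload */
    *enc = (enum text_encoding) buf[0];
    *payload = buf + HEADER_BYTES;
    return HEADER_BYTES + *nbytes;
}
```

With something like this, an ascii device replaying a buffer copied from a
Unicode device would see ENC_UCS4 and could convert or skip the text instead
of misreading it and ending up in plexit.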

My goof, so I can try my hand at a patch, but it will be after the freeze. 



Re: [Plplot-devel] plbuffer bug

2016-12-15 Thread Alan W. Irwin
On 2016-12-15 20:38- Phil Rosenberg wrote:

> Hi All
> I've just posted a bug to the bug tracker regarding the buffer.
>
> I just wanted to send an email out to say that I don't particularly
> intend to fix it before the freeze as I don't think there is time. But
> I wanted to stick it on the tracker so it didn't get forgotten.
>
> The problem is that text is written and read in "device native"
> encoding (i.e. Unicode or ascii) so if a buffer is copied from a
> Unicode device to an ascii one then the plreplot call will hit a
> problem with buffer reading and call plexit. The solution would be to
> always write Unicode or to add a flag to indicate the encoding in the
> buffer. If anyone has a preference then let me know.

Hi Phil:

Thanks for bringing this potentially nasty issue to our attention,
and I agree with your judgement this is an issue that should be
tackled after the release.

To answer your question for when you do work on this, if I recall
correctly, user-specified strings which are assumed to be written in
the UTF-8 encoding (which includes ascii) of unicode are uniformly
translated to a modified UCS4 (32-bit) unicode encoding for internal
PLplot use where the modifications include the possibility of
embedding 32-bit FCI (font characterization integer) words (to change
some font characteristic in mid string) in the middle of the modified
UCS4 arrays.  I also recall (but haven't checked to be sure) that
right now, if the device cannot handle Unicode, we store the text
information in an entirely different array which is 4 times smaller.
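
To illustrate the UTF-8 to 32-bit step described above, here is a minimal
sketch in C. It is not the PLplot core routine; utf8_to_ucs4 is a hypothetical
name, it leaves out FCI embedding entirely, and to stay short it does not
reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a NUL-terminated UTF-8 string into 32-bit code points.
 * Returns the number of code points written, or (size_t) -1 if the input
 * is malformed or `out` (capacity `cap`) is too small.
 * Simplification: overlong encodings and surrogates are not rejected. */
static size_t utf8_to_ucs4(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*p) {
        uint32_t cp;
        int extra;   /* number of continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t) -1;      /* stray continuation or invalid lead */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return (size_t) -1;   /* truncated or broken sequence */
            cp = (cp << 6) | (uint32_t) (*p & 0x3F);
            p++;
        }

        if (n == cap)
            return (size_t) -1;       /* output array full */
        out[n++] = cp;
    }
    return n;
}
```

For example, the two-character string "A" followed by a Greek alpha (bytes
0x41 0xCE 0xB1) decodes to the two words 0x41 and 0x3B1, so pure ascii input
simply ends up widened to one 32-bit word per character.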

But I would strongly prefer instead that we move to always using the
modified UCS4 array for text with the ascii-only devices using a core
function to convert that back to an ascii string (with appropriate
filtering when the UCS4 array contains an FCI or a 32-bit
representation of a non-ascii glyph) for their own text plotting.  The
big advantage of this approach is our text handling then automatically
becomes greatly simplified/much less confusing to understand.  The
downside, of course, is UCS4 is 4 times larger than the equivalent
ascii (for those cases when the pure ascii subset of UTF-8 is used by
our users).  However, I think pure ascii use is becoming rarer and
rarer (especially for scientific plotting where many Unicode math
symbols are typically used in labels).  So instead of using our own
(incredibly inefficient) "#[nnn]" unicode encoding that keeps users'
strings strictly in ascii, users now tend to just cut and paste the
UTF-8 math symbol they need into their PLplot text string with
some very nice results for our modern (unicode-aware) devices.
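
A minimal sketch in C of the kind of core conversion function I have in mind
for the ascii-only devices follows. The name ucs4_to_ascii and the fallback
character are hypothetical, and the sketch assumes an FCI word can be
recognised by its top bit being set (which no Unicode code point uses); that
should be checked against the actual FCI definition before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: FCI words have the top bit set. Hypothetical constant name. */
#define FCI_MARK 0x80000000u

/* Convert a modified-UCS4 array of `len` words to a NUL-terminated ASCII
 * string. FCI words are dropped; non-ASCII code points are replaced by
 * `fallback`. `out` must have room for len + 1 bytes.
 * Returns the number of characters written, excluding the NUL. */
static size_t ucs4_to_ascii(const uint32_t *in, size_t len,
                            char *out, char fallback)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (in[i] & FCI_MARK)
            continue;                 /* font change: nothing to draw */
        out[n++] = (in[i] < 0x80) ? (char) in[i] : fallback;
    }
    out[n] = '\0';
    return n;
}
```

With a '?' fallback, an array holding an FCI word, 'x', a Greek alpha and 'y'
would come back as the three-character string "x?y". Whether to substitute or
silently drop non-ascii glyphs is a design choice each device could make.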

So assuming that is a correct summary of the present text situation,
and we also decide to make the simplifying code modification I
strongly suggest above, then obviously the modified UCS4 array that
will then always contain the text information should be included in
the buffer.

Alan


[Plplot-devel] plbuffer bug

2016-12-15 Thread Phil Rosenberg
Hi All
I've just posted a bug to the bug tracker regarding the buffer.

I just wanted to send an email out to say that I don't particularly
intend to fix it before the freeze as I don't think there is time. But
I wanted to stick it on the tracker so it didn't get forgotten.

The problem is that text is written and read in "device native"
encoding (i.e. Unicode or ascii) so if a buffer is copied from a
Unicode device to an ascii one then the plreplot call will hit a
problem with buffer reading and call plexit. The solution would be to
always write Unicode or to add a flag to indicate the encoding in the
buffer. If anyone has a preference then let me know.

Phil
