On 2010-12-24 13:05+0100 Arjen Markus wrote:

> Hi Alan,
>
> On 2010-12-21 20:03, Alan W. Irwin wrote:
>
>> 
>> I will stop now and not comment more on the Tcl case, because I think
>> it is essential to focus on C for now and one of the cairo or qt
>> devices.
>> 
>> Of course, as I have stated before the psc device driver is not useful
>> for diagnosis of encoding issues because everything exotic such as the
>> Mandarin Peace word ends up as blanks in any case because the standard
>> Type 1 fonts that the psc device uses have an extremely limited glyph
>> set that does not include Mandarin glyphs or any other non-English
>> glyphs besides Greek (for mathematical purposes).
>> 
>
> I have checked the C and F95 examples 24 with the wxWidgets device on
> Windows: they work very nicely, except that my PC does not have all the
> required fonts. With Tcl I recognise the odd sequences I also see in
> the source code (viewed from a cp1252 perspective).

Thus, it appears from your results that C and Fortran on Windows
simply accept the byte sequences in strings without altering them,
while your hypothesis is that Tcl does not follow that simple model.
Instead, it implicitly transforms all strings from the (presumed)
system encoding to UTF-8, which messes up the byte sequences.  Your
proposed cure is to take all strings input to PLplot from Tcl and
apply the inverse transformation by calling Tcl_UtfToExternalDString
with a NULL encoding. From the man page, what that will do is to
convert a (presumed) UTF-8 string to the (presumed) system encoding.

If your hypothesis is correct, then your proposed cure might indeed
work on all platforms.  Certainly on Linux, the implicit
transformation is UTF-8 to UTF-8 or the identity transform (which is
why Tcl works right now for example 24 on Linux), and your inverse
transformation would also be the identity transformation on Linux and
should therefore also work on that platform.
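
To make the hypothesis concrete, here is a minimal Python sketch of it.
It is only a simulation: Python's codecs stand in for whatever Tcl
actually does, the function names implicit_read and inverse are
hypothetical, and the Greek capital omega is just a convenient
non-ascii test character.

```python
# Simulation of the hypothesis only; Python codecs stand in for Tcl's
# converters, and both function names are hypothetical.

def implicit_read(source_bytes: bytes, system_encoding: str) -> str:
    """Model the presumed implicit step: the bytes in the source file
    are interpreted as text in the system encoding."""
    return source_bytes.decode(system_encoding)


def inverse(text: str, system_encoding: str) -> bytes:
    """Model the proposed cure (Tcl_UtfToExternalDString with a NULL
    encoding): the text is converted back to the system encoding."""
    return text.encode(system_encoding)


# Bytes of a non-ascii character as stored in a UTF-8 source file.
utf8_source = "\u03a9".encode("utf-8")  # Greek capital omega, b"\xce\xa9"

# Windows-like case: system encoding is cp1252.
garbled = implicit_read(utf8_source, "cp1252")
restored = inverse(garbled, "cp1252")
print(repr(garbled))            # two cp1252 characters, not one omega
print(restored == utf8_source)  # True: the original bytes come back

# Linux-like case: system encoding is UTF-8, so both steps are identities.
unchanged = implicit_read(utf8_source, "utf-8")
print(inverse(unchanged, "utf-8") == utf8_source)  # True
```

If Tcl really behaves this way, the strings handed to PLplot after the
inverse step would contain the original UTF-8 bytes on both platforms.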

However, I am concerned with the following issues.

1. All PLplot API arguments that are strings are assumed to be in
UTF-8.  Thus, the call to Tcl_UtfToExternalDString with NULL has to be
made in the Tcl bindings for _every_ function in the PLplot API that
has an input string.

2.  Does the implicit transformation work for arbitrary UTF-8 (e.g.,
arbitrary series of 8-bit bytes), or are there some 8-bit bytes which
cannot be validly interpreted as cp1252 or which have special
meanings?

3.  Is Tcl_UtfToExternalDString with NULL encoding the exact inverse
of the implicit transformation?

All of these issues can be dealt with.  Obviously some care in the Tcl
bindings should take care of issue 1.  To alleviate concern about
issues 2 and 3 completely, it would be a good idea to put together a
complete test of all 256 possible 8-bit byte values.  I am
thinking along the lines of generating a file from C with a string of
all possibilities from 255 down to zero (to terminate the string).
Then, using an editor, copy that exact string of 256 bytes into Tcl
source code, which automatically puts that string through the implicit
transformation.  Then use Tcl_UtfToExternalDString with NULL to
transform that string before calling a C programme that simply outputs
the string.  Finally, compare that output file with the original file
of 256 characters to see if you get all 256 characters back in their
original form.
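
The idea of that test can be sketched in Python without involving Tcl
at all.  This is only a stand-in under stated assumptions: Python's
cp1252 codec is assumed to model the system encoding, the function
name round_trip_failures is hypothetical, and Tcl's own conversion
tables may treat some bytes differently, so the real test through the
Tcl bindings is still needed.

```python
# Stand-in for the proposed byte test; Python's codecs are an assumption,
# not Tcl's actual converters.

def round_trip_failures(system_encoding: str = "cp1252") -> list:
    """Return the byte values that do not survive
    bytes -> text (system encoding) -> UTF-8 -> text -> bytes."""
    failures = []
    # 255 down to 1; zero is left out because it terminates a C string.
    for value in range(255, 0, -1):
        original = bytes([value])
        try:
            text = original.decode(system_encoding)   # implicit transformation
            internal = text.encode("utf-8")            # presumed internal form
            recovered = internal.decode("utf-8").encode(system_encoding)
        except UnicodeError:
            failures.append(value)                     # no valid mapping
            continue
        if recovered != original:
            failures.append(value)                     # mapping is not reversible
    return sorted(failures)


failures = round_trip_failures()
print([hex(value) for value in failures])
```

With Python's cp1252 codec the five byte values 0x81, 0x8d, 0x8f, 0x90
and 0x9d have no mapping and are reported as failures, which is exactly
the kind of problem issue 2 is concerned with.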

However, to avoid this work it would be better to convince Tcl not to
do the implicit string transformation in the first place.  The way
this is handled in Python is to put the following comment in the
first or second line of the Python script, which declares that the
whole Python source file is encoded in UTF-8:

# -*- coding: utf-8 -*-

We do that for the following Python examples:

softw...@raven> grep  coding: examples/python/xw??.py
examples/python/xw18.py:# -*- coding: utf-8; -*-
examples/python/xw24.py:# -*- coding: utf-8; -*-
examples/python/xw26.py:# -*- coding: utf-8; -*-
examples/python/xw33.py:# -*- coding: utf-8; -*-

I think Tcl may have something equivalent in the

encoding system utf-8

command.  According to the documentation, users are discouraged from
using that command because it affects everything, including system
calls. For example, puts would output strings in UTF-8 encoding rather
than the actual system encoding (e.g., cp1252 on your platform) on
Windows machines.  But is that actually an issue for the above examples?
First, we don't interact with the operating system (e.g., with puts) as
far as I know with those examples, and UTF-8 and cp1252 coincide in
any case for ascii strings.  Anyhow, if "encoding system utf-8" works
for those examples, I think we should use it rather than the more
difficult steps outlined above. Of course, we should inform Tcl users
browsing our example code via a comment in those examples that PLplot
requires utf-8 system encoding for all non-ascii input strings.
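
The statement that UTF-8 and cp1252 coincide for ascii strings can be
checked with a short Python sketch.  Python's codecs are again assumed
to be a fair model of the two encodings, and the variable names are
hypothetical.

```python
# Check that the two encodings agree on the ascii range (byte values 0-127)
# and differ for a non-ascii character.

ascii_bytes = bytes(range(128))
ascii_agrees = ascii_bytes.decode("cp1252") == ascii_bytes.decode("utf-8")
print(ascii_agrees)  # True

# A non-ascii character (e with acute accent) is stored differently.
e_acute = "\u00e9"
non_ascii_agrees = e_acute.encode("cp1252") == e_acute.encode("utf-8")
print(non_ascii_agrees)  # False: one byte in cp1252, two bytes in UTF-8
```

So only the non-ascii strings in the examples would be affected by the
choice of system encoding.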

>
> Over the weekend I won't be able to do anything - holiday
> obligations :).
>
> I wish you all a merry Christmas and a happy New Year.

I wish everybody here a "Merry Christmas" and "Happy New Year" as well.

Enjoy your holidays, Arjen, and I hope when you come back you will
find the "encoding system utf-8" solution works without issues for the
affected examples.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
