On 2010-12-21 11:21+0100 Arjen Markus wrote:

> Hi Alan,
>
> I think you misunderstand the issue. The problem is that a file
> consists of bytes. In the old days, each byte corresponded to a
> single character, but with the advent of UTF-8 and the like a single
> character may be represented by one, two or more bytes. What a program
> will do with these bytes depends on the assumption about the
> character encoding.
>
> For Tcl programs the following happens:
> - Based on the system encoding, all sequences of bytes are translated
>  into equivalent UTF-8 characters.
> - If the system encoding is NOT UTF-8, the internal resulting sequence
>  may not be the same as in the file. For instance, on Windows "cp1252"
>  is one way to connect the bytes above 127 to characters such as
>  A-umlaut. So a byte that represents A-umlaut according to the cp1252
>  encoding is translated to the UTF-8 sequence of bytes that represents
>  that very same character. In other words: it is a completely different
>  sequence of bytes.
> - Right now we pass that _internal_ sequence of bytes to the PLplot C
>  library - and assume that it was the original sequence of bytes.
>  But that is only true if the system encoding is UTF-8.
>  The code I propose as an alternative reverses the translation.
>
> Bytes lower/equal 127 represent exactly the same characters in cp1252
> and UTF-8 (by design), so most examples are not affected by this
> distinction.
>
> (I agree this is highly confusing - but if you simply think of
> bytes separated from characters it becomes a bit easier)

I agree it is highly confusing and difficult to describe clearly.
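
To make the byte/character distinction above concrete, here is a
minimal sketch. It uses Python purely as an illustration; it is not
Tcl or PLplot code, and the variable names and the choice of cp1252 as
the system encoding are assumptions taken from the A-umlaut example in
the quoted text.

```python
# Illustration only: Python stands in for the translation steps
# described in the quoted text.
SYSTEM_ENCODING = "cp1252"  # assumed Windows system encoding

file_bytes = b"\xc4"  # the single byte that cp1252 maps to A-umlaut

# Step 1: on input, the bytes are interpreted using the system encoding.
text = file_bytes.decode(SYSTEM_ENCODING)

# Step 2: the same character held as UTF-8 is a different byte sequence.
internal_bytes = text.encode("utf-8")

# Step 3: reversing the translation recovers the original file bytes.
restored_bytes = internal_bytes.decode("utf-8").encode(SYSTEM_ENCODING)

print(file_bytes.hex(), internal_bytes.hex(), restored_bytes.hex())
# prints: c4 c384 c4

# Bytes <= 127 are identical in both encodings, so plain ASCII text
# is unaffected by the translation.
ascii_bytes = b"Peace"
ascii_roundtrip = ascii_bytes.decode(SYSTEM_ENCODING).encode("utf-8")
print(ascii_roundtrip == ascii_bytes)
```

The one-byte file content becomes a two-byte internal sequence, which
is why passing the internal bytes along unchanged only works when the
system encoding is already UTF-8.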

When I look at the Peace words in the actual files x24.tcl (and
x24c.c) with the system tools available to me (the emacs editor in my
case), it is clear the bytes in those files can only be interpreted
properly with a UTF-8 encoding.  Please use your own system analysis
tools to confirm that conclusion so that at least our analysis is
starting at the same point.  In other words, if you had some system
tool there that assumed the Peace words in x24.tcl were cp1252, then
the result would be displayed as gibberish or blank. Only if you
interpret with the UTF-8 encoding _and_ have the Mandarin fonts
installed would the Mandarin Peace word be rendered properly as
happens for me with the emacs editor.  Does that also happen for you
with whatever file display tool you have access to that is capable of
understanding UTF-8 encoded files?
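
As a rough illustration of the check described above, the following
Python sketch decodes the same bytes under both assumptions. The
sample string is an assumption (the two-character Mandarin word for
peace, written with escapes), not text read from x24.tcl, and Python
is only a stand-in for whatever tool is available.

```python
# Assumed sample: the Mandarin Peace word, U+548C U+5E73.
peace = "\u548c\u5e73"

# What a UTF-8 encoded file would contain for that word.
utf8_bytes = peace.encode("utf-8")
print(utf8_bytes.hex())  # prints: e5928ce5b9b3

# A tool that assumes UTF-8 recovers the two original characters.
as_utf8 = utf8_bytes.decode("utf-8")

# A tool that assumes cp1252 treats every byte as its own character,
# producing six unrelated Latin characters and symbols.
as_cp1252 = utf8_bytes.decode("cp1252")

print(len(as_utf8), len(as_cp1252))  # prints: 2 6
print(as_utf8 == peace, as_cp1252 == peace)  # prints: True False
```

Note that the cp1252 decoding does not fail here; it simply yields the
wrong characters, which is the gibberish a cp1252-assuming tool would
display.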

I acknowledge that Tcl often does things in a very complex way, so I
would advise forgetting Tcl for the moment and instead looking at the
example 24 results from C. Does the x24c executable produce the same
result as http://plplot.sourceforge.net/examples-data/demo24/x24.01.png on
your system when you use the pngcairo or pngqt device drivers?

If so, that confirms you have the proper system fonts installed, and
then there is some hope of getting the same good result with Tcl.  On
the other hand, if you cannot make the C example give a good example
24 result on Windows with the cairo or qt devices, then there is
little hope for Tcl.

I will stop now and not comment more on the Tcl case, because I think
it is essential to focus on C for now and one of the cairo or qt
devices.

Of course, as I have stated before, the psc device driver is not useful
for diagnosing encoding issues. Everything exotic, such as the Mandarin
Peace word, ends up as blanks in any case, because the standard Type 1
fonts that the psc device uses have an extremely limited glyph set that
does not include Mandarin glyphs or any other non-English glyphs besides
Greek (for mathematical purposes).

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
