Hello,

I have found an easy way to solve the UTF-8 problem:
source -encoding utf-8 (instead of cp1252). But it works
only with Tcl 8.5 and later, as the [source] command was
extended with that option in 8.5.

I will either have to find an alternative or
make sure this option is not used with Tcl 8.4 and older.
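For illustration, here is a minimal Python analogue of the effect. It assumes that Python's utf-8 and cp1252 codecs behave like Tcl's for these bytes, and "é" stands in for the non-ASCII text in example 24. The same UTF-8 source bytes turn into mojibake when decoded as cp1252, which is what [source] does by default on Windows. They decode to the intended character when read as UTF-8, which is what source -encoding utf-8 does:

```python
# Python analogue of reading a UTF-8 encoded script with the wrong encoding.
# Assumption: Python's cp1252 codec matches Tcl's for the bytes used here.

# UTF-8 bytes for "é" (U+00E9) as they sit in a UTF-8 encoded source file.
source_bytes = "é".encode("utf-8")           # b'\xc3\xa9'

# Like [source] with the default cp1252 system encoding on Windows:
as_cp1252 = source_bytes.decode("cp1252")     # two characters: 'Ã©'

# Like [source -encoding utf-8] (Tcl 8.5 and later):
as_utf8 = source_bytes.decode("utf-8")        # one character: 'é'

print(repr(as_cp1252), repr(as_utf8))
```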

Regards,

Arjen

On 2010-12-28 11:10, Arjen Markus wrote:
> Hi Alan,
> 
> Tcl has two options to do what you suggest:
> - Set the system encoding to cp1252
> - Source the file using -encoding cp1252
> 
> (A third is to adapt the C side of the Plplot API.)
> 
> I will experiment with both.
> 
> Regards,
> 
> Arjen
> 
> On 2010-12-24 22:20, Alan W. Irwin wrote:
>> On 2010-12-24 13:05+0100 Arjen Markus wrote:
>>
>>> Hi Alan,
>>>
>>> On 2010-12-21 20:03, Alan W. Irwin wrote:
>>>
>>>>
>>>> I will stop now and not comment more on the Tcl case, because I think
>>>> it is essential to focus on C for now and one of the cairo or qt
>>>> devices.
>>>>
>>>> Of course, as I have stated before, the psc device driver is not useful
>>>> for diagnosing encoding issues, because everything exotic, such as the
>>>> Mandarin word for "peace", ends up as blanks in any case. The standard
>>>> Type 1 fonts that the psc device uses have an extremely limited glyph
>>>> set that includes no Mandarin glyphs and no non-English glyphs other
>>>> than Greek (for mathematical purposes).
>>>>
>>>
>>> I have checked the C and F95 versions of example 24 with the wxWidgets
>>> device on Windows: they work very nicely, except that my PC does not have
>>> all the required fonts. With Tcl I recognise the odd byte sequences I also
>>> see in the source code (viewed from a cp1252 perspective).
>>
>> Thus, it appears from your results that C and Fortran on Windows
>> simply accept the byte sequences in strings without altering them,
>> while your hypothesis is that Tcl does not follow that simple model.
>> Instead, it applies an implicit transformation to all strings, from
>> the (presumed) system encoding to UTF-8, which messes up the byte
>> sequences. Your proposed cure is to take all strings passed to PLplot
>> from Tcl and apply the inverse transformation with a call to
>> Tcl_UtfToExternalDString with a NULL encoding. According to the man
>> page, that converts a (presumed) UTF-8 string to the (presumed)
>> system encoding.
>>
>> If your hypothesis is correct, then your proposed cure might indeed
>> work on all platforms.  Certainly on Linux, the implicit
>> transformation is UTF-8 to UTF-8, i.e., the identity transformation
>> (which is why Tcl currently works for example 24 on Linux), and your
>> inverse transformation would also be the identity transformation on
>> Linux and should therefore also work on that platform.
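A minimal Python sketch of the hypothesis above. Python codecs stand in for Tcl's, the function names implicit_transform and inverse_transform are hypothetical, and the second one only mimics what the man page says Tcl_UtfToExternalDString does with a NULL (system) encoding. With cp1252 the internal bytes are mangled but the inverse restores them; with utf-8 both steps are the identity:

```python
# Sketch of the hypothesised Tcl behaviour, using Python codecs as stand-ins.
# Assumed system encodings: cp1252 on Windows, utf-8 on Linux.

def implicit_transform(raw: bytes, system_encoding: str) -> bytes:
    """Hypothetical: read bytes as the system encoding, keep them as UTF-8."""
    return raw.decode(system_encoding).encode("utf-8")


def inverse_transform(internal: bytes, system_encoding: str) -> bytes:
    """Hypothetical stand-in for Tcl_UtfToExternalDString with NULL encoding:
    convert the internal UTF-8 back to the system encoding."""
    return internal.decode("utf-8").encode(system_encoding)


raw = "é".encode("utf-8")  # bytes as written in the UTF-8 example file

for enc in ("cp1252", "utf-8"):
    internal = implicit_transform(raw, enc)
    restored = inverse_transform(internal, enc)
    # cp1252: internal is mangled (4 bytes) but restored matches raw.
    # utf-8:  both steps leave the bytes unchanged.
    print(enc, internal, restored == raw)
```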
>>
>> However, I am concerned with the following issues.
>>
>> 1. All PLplot API arguments that are strings are assumed to be in
>> UTF-8.  Thus, the call to Tcl_UtfToExternalDString with NULL has to be
>> made in the Tcl bindings for _every_ function in the PLplot API that
>> has an input string.
>>
>> 2.  Does the implicit transformation work for arbitrary UTF-8 (e.g.,
>> arbitrary series of 8-bit bytes), or are there some 8-bit bytes which
>> cannot be validly interpreted as cp1252 or which have special
>> meanings?
>>
>> 3.  Is Tcl_UtfToExternalDString with NULL encoding the exact inverse
>> of the implicit transformation?
>>
>> All of these issues can be dealt with.  Careful work on the Tcl
>> bindings should handle issue 1, and to remove any concern about
>> issues 2 and 3, it would be a good idea to put together a complete
>> test of all 256 possible 8-bit byte values.  I am thinking along the
>> lines of generating a file from C containing a string of all byte
>> values from 255 down to 0 (the 0 terminates the string).  Then use an
>> editor to copy that exact 256-byte string into Tcl source code, which
>> automatically puts the string through the implicit transformation.
>> Then use Tcl_UtfToExternalDString with NULL to transform that string
>> before calling a C programme that simply outputs the string.  Finally,
>> compare that output file with the original 256-byte file to see
>> whether all 256 bytes come back in their original form.
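The round-trip test could be prototyped in Python before writing the C and Tcl pieces. This is a sketch only: Python's codecs stand in for Tcl's, the helper names are hypothetical, and each byte is checked on its own rather than through a file written from C:

```python
# Round-trip test over all 256 byte values, with Python codecs standing in
# for Tcl (an assumption, not Tcl's actual conversion tables).

# All 256 byte values, from 255 down to 0 (the 0 plays the C terminator).
ORIGINAL = bytes(range(255, -1, -1))


def implicit_transform(raw: bytes, enc: str) -> bytes:
    # Presumed Tcl behaviour: read as the system encoding, keep as UTF-8.
    return raw.decode(enc).encode("utf-8")


def inverse_transform(internal: bytes, enc: str) -> bytes:
    # Stand-in for Tcl_UtfToExternalDString with a NULL (system) encoding.
    return internal.decode("utf-8").encode(enc)


def lost_bytes(data: bytes, enc: str) -> list:
    """Return the byte values that do not survive both transformations."""
    lost = []
    for b in data:
        raw = bytes([b])
        try:
            restored = inverse_transform(implicit_transform(raw, enc), enc)
        except UnicodeError:
            restored = None  # byte has no mapping in this encoding
        if restored != raw:
            lost.append(b)
    return lost


lost = lost_bytes(ORIGINAL, "cp1252")
print(len(ORIGINAL) - len(lost), "of", len(ORIGINAL), "bytes survive")
print("lost:", [hex(b) for b in lost])
```

With Python's cp1252 codec, 251 of the 256 bytes come back unchanged; the five lost ones (0x81, 0x8D, 0x8F, 0x90 and 0x9D) have no cp1252 mapping at all, which bears on issue 2. Tcl's own cp1252 table may treat those bytes differently, so the real test still has to be run through Tcl.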
>>
>> However, to avoid this work it would be better to convince Tcl not to
>> do the implicit string transformation in the first place.  The way
>> this is handled in Python is to put the following comment in the
>> first or second line of the script, declaring that the whole Python
>> source file is encoded in UTF-8:
>>
>> # -*- coding: utf-8 -*-
>>
>> We do that for the following Python examples:
>>
>> softw...@raven> grep  coding: examples/python/xw??.py
>> examples/python/xw18.py:# -*- coding: utf-8; -*-
>> examples/python/xw24.py:# -*- coding: utf-8; -*-
>> examples/python/xw26.py:# -*- coding: utf-8; -*-
>> examples/python/xw33.py:# -*- coding: utf-8; -*-
>>
>> I think Tcl may have something equivalent in the
>>
>> encoding system utf-8
>>
>> command.  The documentation discourages users from using that
>> command because it affects everything, including system calls.  For
>> example, puts would output strings in UTF-8 encoding rather than the
>> actual system encoding on Windows machines (e.g., cp1252 on your
>> platform).  But is that actually an issue for the above examples?
>> First, as far as I know we don't interact with the operating system
>> (e.g., with puts) in those examples, and UTF-8 and cp1252 coincide in
>> any case for ASCII strings.  Anyhow, if "encoding system utf-8" works
>> for those examples, I think we should use it rather than the more
>> difficult steps outlined above.  Of course, we should inform Tcl users
>> browsing our example code, via a comment in those examples, that PLplot
>> requires the utf-8 system encoding for all non-ASCII input strings.
>>
>>>
>>> Over the weekend I won't be able to do anything - holiday
>>> obligations :).
>>>
>>> I wish you all a merry Christmas and a happy New Year.
>>
>> I wish everybody here a "Merry Christmas" and "Happy New Year" as well.
>>
>> Enjoy your holidays, Arjen, and I hope when you come back you will
>> find the "encoding system utf-8" solution works without issues for the
>> affected examples.
>>
>> Alan
>> __________________________
>> Alan W. Irwin
>>
>> Astronomical research affiliation with Department of Physics and Astronomy,
>> University of Victoria (astrowww.phys.uvic.ca).
>>
>> Programming affiliations with the FreeEOS equation-of-state implementation
>> for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
>> package (plplot.org); the libLASi project (unifont.org/lasi); the Loads of
>> Linux Links project (loll.sf.net); and the Linux Brochure Project
>> (lbproject.sf.net).
>> __________________________
>>
>> Linux-powered Science
>> __________________________
>>
> 
 

_______________________________________________
Plplot-devel mailing list
Plplot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/plplot-devel
