Re: Accented characters in scripts (and whereis iconv.h?)

2001-07-23 Thread Chris Nandor

At 09:33 -0400 2001.07.23, Steve Torrence wrote:
> I would just like to know what is the most common way of handling
> this. It's hard to believe the Perl script authors working on
> scripts for languages that contain these extended characters are
> going through a lot of trouble putting the accented characters in
> their scripts. It seems that a tool like bbedit would be able to
> take a script that was written using the extended characters and
> convert the text to something compatible with perl.

I am not sure what you are asking.  Most people don't use non-ASCII
characters.  When they do, they usually use Latin-1 or Unicode.  If you use
MacRoman or Windows character sets instead, then you need to convert.

So what is it you want to handle?  If it is BBEdit or Terminal showing the
right characters, then you need to either map the characters from Latin-1
to MacRoman, or change the behavior of the programs so they show the
characters in the character set you want (in BBEdit or Terminal, this
might be as simple as changing the font; I use ProFont, and it has a
companion ProFontIsoLatin1).

> I know that the few scripts I have found don't use either of the 2
> coding methods Will mentioned. They seem to substitute one extended
> character for another which Perl seems to convert to the correct
> character when sending the html to the browser.

Again, perl does not convert any characters.  It just sends a byte of data.
How that byte is rendered is not determined by perl, it is determined by
the rendering process (BBEdit, Terminal, Internet Explorer, etc.).

For example, if I send the byte 0xC4, Terminal (using MacRoman) will show
ƒ (that italicized lower-case f used often for folders).  But Internet
Explorer (using the default of Latin-1) will show Ä (capital A with an
umlaut).  perl doesn't care.  It is just data to perl (unless you are in a
regex, or using isprint(), etc.).  perl does no conversion or translation.
It just sends a single byte that is rendered by the displaying process in
different ways.
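
To make the 0xC4 example concrete, here is a minimal sketch that interprets
the same byte under both character sets.  It assumes the Encode module, which
ships with perl 5.8 and later and is not part of the perls discussed in this
thread; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte of data; perl itself attaches no character set to it.
my $byte = "\xC4";

# Interpret the same byte the way each rendering process would.
my $as_macroman = decode('MacRoman',   $byte);
my $as_latin1   = decode('ISO-8859-1', $byte);

printf "MacRoman: U+%04X\n", ord $as_macroman;   # U+0192, the florin sign
printf "Latin-1:  U+%04X\n", ord $as_latin1;     # U+00C4, A with umlaut
```

The byte never changes; only the table used to look it up does.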

So you need to know not how perl will print the data, but how a particular
process will render it.  If you are printing data in Latin-1, then you need
to make sure the rendering process displays it in Latin-1.  For most
applications (web, etc.), this is the default, and therefore not a problem.
You might want to therefore adjust the behavior of your apps to use Latin-1
instead of MacRoman.

-- 
Chris Nandor                      [EMAIL PROTECTED]    http://pudge.net/
Open Source Development Network   [EMAIL PROTECTED]    http://osdn.com/



Re: Accented characters in scripts (and whereis iconv.h?)

2001-07-23 Thread Christian Smith

On Monday, July 23, 2001 at 09:33, [EMAIL PROTECTED] (Steve Torrence) wrote:

> I would just like to know what is the most common way of handling
> this. It's hard to believe the Perl script authors working on
> scripts for languages that contain these extended characters are
> going through a lot of trouble putting the accented characters in
> their scripts. It seems that a tool like bbedit would be able to
> take a script that was written using the extended characters and
> convert the text to something compatible with perl.

The way I see it, you have three options. I've listed them in what I would
think of as increasing order of desirability.

1) Add a content-type meta tag like this

<meta http-equiv="content-type" content="text/html; charset=x-mac-roman">

which will tell the browser to treat the text as MacRoman. This will cause
the text to render properly in any browser which can handle the conversion
correctly.


2) Save the files in ISO Latin-1 encoding and convert to/from MacRoman as
necessary when editing. You'll need something like the Midex plug-in for
BBEdit to do the character set conversions.

http://www.barebones.com/support/bbedit/bbedit-plugins.html


3) Convert the 8-bit characters in your files to HTML entity codes. That
is, é becomes &eacute; etc. You can use the Translate command in BBEdit to
do this. (Markup:Utilities:Translate)


Actually, there is another option. You could write the script in MacRoman
and then, before spitting out the HTML, convert the text to ISO Latin-1 on
the fly. I guess I'd place this somewhere between 2 and 3 above.
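
A rough sketch of that on-the-fly conversion, assuming the Encode module from
perl 5.8 or later (newer than the perls in this thread). The helper name
macroman_to_latin1 is made up for illustration, and characters that exist in
MacRoman but not in Latin-1 are replaced with a substitution character by
default.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode MacRoman bytes as ISO Latin-1 bytes.
sub macroman_to_latin1 {
    my ($bytes) = @_;
    my $text = decode('MacRoman', $bytes);   # bytes -> Perl characters
    return encode('ISO-8859-1', $text);      # characters -> Latin-1 bytes
}

# "Você" as it would sit in a MacRoman-saved script (ê is byte 0x90).
my $body = "Voc\x90";

# Declare the charset so the browser does not have to guess.
print "Content-type: text/html; charset=ISO-8859-1\n\n";
print macroman_to_latin1($body), "\n";
```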

-- 
Christian Smith  |  [EMAIL PROTECTED]  |  http://web.barebones.com

He who dies with the most friends... Is still dead!



Re: Accented characters in scripts (and whereis iconv.h?)

2001-07-21 Thread will

> I'm new to Perl and am using BBEdit to edit scripts. I just can't
> figure out how to easily type in characters with accents. For
> example to get usuário I have to put in  usu·rios for Você I have to
> type in VocÍ

if it's just a question of entering the character, the ascii floater
in bbedit will probably be easiest. window menu -> ascii table,
double click for the %mumble or apple-doubleclick for the character.

but remember that the upper part of the mac ascii table isn't the
same as on other platforms. have a look at perldoc perlport (or
http://www.perldoc.com/perl5.6/pod/perlport.html ) for lots of stiff
warnings about platform variation and some useful advice.

bbedit also has a similar floater for html entities, which is
probably all you need. even better, the translate command will
automatically turn your accented characters into the entity
equivalents, eg é -> &eacute;
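
for anyone who wants to do that translation from a script rather than from
bbedit, here is a rough sketch. it is not what bbedit does internally: it
assumes the Encode module (perl 5.8 or later), the sub name is made up, and
it emits numeric references such as &#233; rather than named entities such
as &eacute;, which browsers treat the same way.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn MacRoman bytes into pure-ASCII HTML text.
sub macroman_to_entities {
    my ($bytes) = @_;
    my $text = decode('MacRoman', $bytes);   # bytes -> Perl characters
    # Replace every non-ASCII character with a numeric character reference.
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

print macroman_to_entities("caf\x8E"), "\n";   # prints caf&#233;
```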

out of curiosity, i did some fiddling about with this:

#!/usr/bin/perl

print "Content-type:text/plain\n\n";
print "ñö háblà espanõl\n";

and found that the only place it works as expected is on the os x
command line, either locally or on a remote server. through a browser
it always comes up as '-s hýbl– espanðl', whatever the native set is
on the server or client. it seems that perl on os x uses the mac
roman character set until run under cgi, at which point it prints
7bit ascii. You can confirm this by running:

#!/usr/bin/perl
print("$_ = " . chr($_) . "\n") for (32..255);

which produces different results in the terminal and the browser.

if this proves to be a problem, and you must have raw accented text
inside your script, you can either print through a translation table
from mac ascii to the standard/other version, or (more portable) do
it all in unicode. the recommended way to do either of these seems to
be the Text::Iconv module, but i couldn't get it to make, as os x
doesn't provide the iconv.h that it wraps around. does anyone know a
fix?

thanks

will


ps. i'd be grateful if people would correct any misconceptions here.



--
pgpkey: http://www.spanner.org/keys/will.txt