Hi

I have been exploring CGI.pm and am of course interested in the HTML
escaping procedure.

perldoc CGI thrwos up this

"By default, all HTML that is emitted by the form-generating functions
is passed through a function called escapeHTML():

       $escaped_string = escapeHTML("unescaped string");
           Escape HTML formatting characters in a string.

       Provided that you have specified a character set of ISO-8859-1
(the default), the standard HTML escaping rules will be used.  The "<"
character becomes "&lt;", ">" becomes "&gt;", "&" becomes "&amp;", and
the quote character
       becomes "&quot;".  In addition, the hexadecimal 0x8b and 0x9b
characters, which some browsers incorrectly interpret as the left and
right angle-bracket characters, are replaced by their numeric
character entities ("&#8249"
       and "&#8250;").  If you manually change the charset, either by
calling the charset() method explicitly or by passing a -charset
argument to header(), then all characters will be replaced by their
numeric entities, since
       CGI.pm has no lookup table for all the possible encodings.

       The automatic escaping does not apply to other shortcuts, such
as h1().  You should call escapeHTML() yourself on untrusted data in
order to protect your pages against nasty tricks that people may enter
into guestbooks, etc..
       To change the character set, use charset().  To turn
autoescaping off completely, use autoEscape(0):"

and I need to ask some questions about it.

I'm using the OO form in case that makes any difference.  Assuming a
'my $qry = new CGI;'

The first sentence
"By default, all HTML that is emitted by the form-generating functions
is passed through a function called escapeHTML(): "
I'm slightly confused by the term 'form-generating' ... does this
specifically mean functions such as start_form, checkbox_group, submit
and end_form and to the exclusion of functions such as $qry->p(...) ?
Or does it include everything uttered between $qry->start_form and
$qry->end_form which might include a $qry->div() or $qry->p() ?

The statement later in "The automatic escaping does not apply to other
shortcuts, such as h1().  You should call escapeHTML() yourself on
untrusted data in order to protect your pages against nasty tricks
that people may enter into guestbooks, etc.." seems to indicate that
escaping does not happen and I am tempted to consider "form-generating
functions" as those that generate form elements such as radio boxes,
pop-up lists, submit buttons and so on.

I think I have understood that if I change my default language to
UTF-8 then something like "<" will be translated into a numeric code
rather that &lt;  But that this will only occur in form-generating
functions.  Odd how I find just writing the problem out sometimes
clarifies things.

I'm mostly working in ISO-8859-1 but would like to 'upgrade' to
UTF-8.  I have the routine
 $rslt =~ s/([^\w\s])/sprintf ("&#%d;", ord ($1))/ge;
to escape output before commitiing it to the web page.  Any
enlightenment as to how to ensure the ord function works in a charset
dependent way would be gratefully received.

Regards

L.





-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/


Reply via email to