In mailing lists, please write your reply below the quotation, and cut
the quotation to the minimum required for context. Thanks!
E R wrote 2007-10-15 17:01 (-0500):
> As a follow-up, does anyone have any suggestions about optimizing a
> routine such as this: sub escapeHTML {
Probably the best optimization is to use the freely available
HTML::Entities module, which is usually installed alongside LWP (it is
part of the HTML-Parser distribution).
> $x =~ s/&/&amp;/g; $x =~ s/</&lt;/g;
Use a single regex, because each separate s///g has to scan the entire
string again.
See HTML::Entities for inspiration if you don't want to use the module
(e.g. if you don't want the full spectrum of entities that it supports).
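A minimal sketch of the single-regex approach, assuming you only care
about the four characters that matter in HTML text and attribute values.
The names escape_html and %entity are made up for this example; they are
not from the original routine or from HTML::Entities.

```perl
use strict;
use warnings;

# Hypothetical lookup table: character => replacement entity.
my %entity = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# Illustrative sub name, not the original poster's escapeHTML.
sub escape_html {
    my ($text) = @_;
    # One pass over the string: every special character is looked up
    # in the table and replaced where it is found.
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_html('a < b && "c" > d'), "\n";
```

Because the string is scanned only once, an already inserted "&amp;" is
never looked at again, so the order of the replacements stops mattering.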
> Encode::encode("iso-8859-1", $x);
It's very probably better to standardize on UTF-8 for your output. Doing
that now saves a lot of trouble later, when you need it. And sooner or
later, you will.
> Basically I'm concerned about the overhead to constantly look up the
> encoder sub for every fragment of HTML I need to escape.
Encode your output once, when outputting. PerlIO layers help to automate
this and save a lot of development time:
binmode STDOUT, ":encoding(UTF-8)";
print $foo; # automatically encoded!
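To see what such a layer does, here is a small sketch that writes through
:encoding(UTF-8) to an in-memory filehandle instead of STDOUT, so the
resulting bytes can be inspected. The variable names and the sample
string are only illustrative.

```perl
use strict;
use warnings;

# Character string containing U+00E9 (e with acute) and U+20AC (euro sign).
my $text = "caf\x{e9} \x{20ac}5";

# In-memory file with an encoding layer: everything printed to $out
# is encoded to UTF-8 bytes and appended to $bytes.
my $bytes = '';
open my $out, '>:encoding(UTF-8)', \$bytes or die "open failed: $!";
print {$out} $text;
close $out or die "close failed: $!";

# The character count and the byte count differ once non-ASCII is involved.
printf "%d characters in, %d bytes out\n", length $text, length $bytes;
```

The rest of the program keeps working with character strings; only the
filehandle knows about the encoding.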
--
Met vriendelijke groet, Kind regards, Korajn salutojn,
Juerd Waalboer: Perl hacker <[EMAIL PROTECTED]> <http://juerd.nl/sig>
Convolution: ICT solutions and consultancy <[EMAIL PROTECTED]>