A casual user won't understand that documentation... Hell, I'm not even 
sure I completely understand the implications of it and when to use/not 
use escape_html based on it...  I think an example is called for, but 
not in the POD...  Maybe in the Guide?

  Issac

Eric Cholet wrote:

> --On Sunday, March 24, 2002 21:57:54 +0000 [EMAIL PROTECTED] wrote:
>
>> dougm       02/03/24 13:57:53
>>
>>   Modified:    .        Changes STATUS
>>                src/modules/perl Util.xs
>>                t/net/perl util.pl
>>   Log:
>>   Submitted by:   Geoff Young <[EMAIL PROTECTED]>
>>   Reviewed by:    dougm
>>   properly escape highbit chars in Apache::Utils::escape_html
>
>
> This is uncool for those of us using a non-ASCII encoding and sending
> out lots of characters with the 8th bit set, e.g. in a French page
> many accented characters will be replaced by 6-byte sequences.
> If I'm sending out "Content-type: text/html; charset=ISO-8859-1",
> and calling escape_html to escape '<', '>' and the like, I'm going
> to be serving quite a lot more bytes than before this patch.
>
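To put a number on that overhead, here is a minimal Perl sketch. It assumes the patched escape_html turns each high-bit byte into a decimal numeric reference of the form &#NNN;, which matches the 6-byte figure above. `escape_highbit` is a hypothetical stand-in and is not the actual Util.xs code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched escape_html (NOT the real XS code):
# escapes the markup characters, then every byte with the 8th bit set as a
# decimal numeric reference &#NNN; (an assumption about the output format).
sub escape_highbit {
    my ($bytes) = @_;
    my %markup = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $bytes =~ s/([<>&"])/$markup{$1}/g;                     # markup first...
    $bytes =~ s/([\x80-\xFF])/sprintf('&#%d;', ord $1)/ge;  # ...then high-bit bytes
    return $bytes;
}

# "Noel a Geneve" with accents, as ISO-8859-1 bytes: one byte per accented letter.
my $latin1  = "No\x{EB}l \x{E0} Gen\x{E8}ve";
my $escaped = escape_highbit($latin1);

printf "%d bytes before, %d bytes after: %s\n",
    length($latin1), length($escaped), $escaped;
# prints: 13 bytes before, 28 bytes after: No&#235;l &#224; Gen&#232;ve
```

In this sketch, each accented letter grows from 1 byte to 6, so a page dense with accents can end up several times larger.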
> However escape_html() has no clue as to what the character set is,
> and whether it has been correctly specified in the Content-Type.
> It has also been mentioned here that escape_html is only valid for
> single-byte encodings.
>
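This sketch shows why a single-byte assumption breaks UTF-8. A byte-oriented escaper sees each byte of a multi-byte sequence on its own. As above, `escape_highbit_bytes` is hypothetical, and it assumes escape_html works byte by byte and emits &#NNN;, as the doc patch below implies. Encode is a core module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical byte-wise escaper (same assumption as above): each byte with
# the 8th bit set becomes its own &#NNN; reference.
sub escape_highbit_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/sprintf('&#%d;', ord $1)/ge;
    return $bytes;
}

# "cafe" with an acute e, in UTF-8, is five bytes: U+00E9 is the pair 0xC3 0xA9.
my $utf8    = encode('UTF-8', "caf\x{E9}");
my $escaped = escape_highbit_bytes($utf8);
print "$escaped\n";    # caf&#195;&#169;

# A browser reads &#NNN; as a Unicode code point, not as a byte, so it shows
# two characters, U+00C3 and U+00A9, where only one was intended.
(my $rendered = $escaped) =~ s/&#(\d+);/chr $1/ge;
binmode STDOUT, ':encoding(UTF-8)';
print "$rendered\n";   # cafÃ©
```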
> So this patch does the right thing when escaping the odd 8-bit char in
> a mostly ASCII output, but users of other charsets should be warned
> not to use it. I use HTML::Entities::encode($_[0], '<>&"') myself.
>
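The approach Eric describes escapes only the characters that matter to HTML markup. Accented letters pass through, and the charset declared in the Content-Type tells the browser how to read them. Here is a dependency-free sketch of that idea; `escape_markup_only` is a hypothetical name. With HTML::Parser installed, `HTML::Entities::encode_entities($text, '<>&"')` should produce the same result for these four characters.

```perl
use strict;
use warnings;

# Escape only the characters significant in markup and double-quoted
# attribute values; bytes with the 8th bit set pass through untouched, so
# the page must declare its charset (e.g. "text/html; charset=ISO-8859-1").
my %MARKUP = (
    '<' => '&lt;',
    '>' => '&gt;',
    '&' => '&amp;',
    '"' => '&quot;',
);

sub escape_markup_only {
    my ($text) = @_;
    $text =~ s/([<>&"])/$MARKUP{$1}/g;   # one pass, so no double-escaping
    return $text;
}

my $in  = "No\x{EB}l & <Gen\x{E8}ve>";   # ISO-8859-1 bytes
my $out = escape_markup_only($in);
printf "%d bytes before, %d bytes after\n", length($in), length($out);
# prints: 15 bytes before, 25 bytes after
```

Only the three markup characters grow here. The two accented bytes are unchanged, unlike with the high-bit escaping above.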
> Therefore I propose a doc patch to clear this up:
>
> Index: Util.pm
> ===================================================================
> RCS file: /home/cvs/modperl/Util/Util.pm,v
> retrieving revision 1.8
> diff -u -r1.8 Util.pm
> --- Util.pm    4 Mar 2000 20:55:47 -0000    1.8
> +++ Util.pm    25 Mar 2002 18:19:37 -0000
> @@ -68,6 +68,13 @@
>
>  my $esc = Apache::Util::escape_html($html);
>
> +This function is unaware of its argument's character set and encoding.
> +It assumes a single-byte encoding and escapes all characters with the
> +8th bit set. Do not use it with multi-byte encodings such as UTF-8.
> +When using a single-byte non-ASCII encoding such as ISO-8859-1,
> +consider specifying the character set in the Content-Type header,
> +and using HTML::Entities to avoid unnecessary escaping.
> +
>  =item escape_uri
>
>  This function replaces all unsafe characters in the $string with their
>
>
> -- 
> Eric Cholet
>
>