[EMAIL PROTECTED] said:
> CGI::Util has a couple of functions, escape() and unescape(), which
> URL-encode/decode strings. Unfortunately I lose the utf8 flag on my scalar
> when I encode then decode using those functions (see below). Should
> unescape() be setting the utf8 flag? Or is there no way for unescape() to
> know that it should set the utf8 flag?
Looking at the source for CGI::Util, it appears that disabling the utf8
flag is intended as a feature, not a bug:
# URL-encode data
sub escape {
  shift() if @_ > 1 and ( ref($_[0]) || (defined $_[1] && $_[0] eq $CGI::DefaultClass));
  my $toencode = shift;
  return undef unless defined($toencode);
  # force bytes while preserving backward compatibility -- dankogai
  $toencode = pack("C*", unpack("C*", $toencode));
  if ($EBCDIC) {
    $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
  } else {
    $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",ord($1))/eg;
  }
  return $toencode;
}
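
That pack/unpack line is what actually drops the flag: it rebuilds the string as plain octets. A quick sketch of the effect (the string here is just an example; exactly which octets you end up with can vary between Perl versions, but the utf8 flag comes back off either way):

use Encode qw(is_utf8);

my $str = "caf\x{e9}";                       # character string
utf8::upgrade($str);                         # force the internal utf8 flag on
my $copy = pack("C*", unpack("C*", $str));   # rebuild as plain octets

print is_utf8($str)  ? 1 : 0, "\n";          # 1
print is_utf8($copy) ? 1 : 0, "\n";          # 0, the flag is gone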
Seeing how this and the "unescape" function are set up, I would guess that
there is no way for "unescape" to "know" when a given input string should
be decoded as utf8 data. Only the calling app can know that, and it should
apply the conversion to the output of "unescape". CGI::Util is way too
"general purpose" to make assumptions about character encodings.
Since Dan Kogai is a frequent contributor to this list, he might have more
to say on this.
David Graff