[EMAIL PROTECTED] said:
> CGI::Util has a couple of functions, escape() and unescape(), which
> URL-encode/decode strings. Unfortunately I lose the utf8 flag on my scalar
> when I encode then decode using those functions (see below). Should
> unescape() be setting the utf8 flag? Or is there no way for unescape() to
> know that it should set the utf8 flag?
Looking at the source for CGI::Util, it appears that disabling the utf8
flag is intended as a feature, not a bug:
# URL-encode data
sub escape {
  shift() if @_ > 1 and ( ref($_[0]) || (defined $_[1] && $_[0] eq $CGI::DefaultClass));
  my $toencode = shift;
  return undef unless defined($toencode);
  # force bytes while preserving backward compatibility -- dankogai
  $toencode = pack("C*", unpack("C*", $toencode));
  if ($EBCDIC) {
    $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
  } else {
    $toencode=~s/([^a-zA-Z0-9_.-])/uc sprintf("%%%02x",ord($1))/eg;
  }
  return $toencode;
}
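
That pack/unpack line is what actually drops the flag: it rebuilds the string as plain octets. A quick sketch of the effect (the string here is just an example; exactly which octets you end up with can vary between Perl versions, but the utf8 flag comes back off either way):

use Encode qw(is_utf8);

my $str = "caf\x{e9}";                       # character string
utf8::upgrade($str);                         # force the internal utf8 flag on
my $copy = pack("C*", unpack("C*", $str));   # rebuild as plain octets

print is_utf8($str)  ? 1 : 0, "\n";          # 1
print is_utf8($copy) ? 1 : 0, "\n";          # 0, the flag is gone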
Seeing how this and the "unescape" function are set up, I would guess that
there is no way for "unescape" to "know" when a given input string should
be decoded as utf8 data. Only the calling app can know that, and it should
apply the conversion to the output of "unescape". CGI::Util is way too
"general purpose" to make assumptions about character encodings.
Since Dan Kogai is a frequent contributor to this list, he might have more
to say on this.
David Graff