Re: pugs CGI.pm

Nathan Gray Wed, 13 Apr 2005 10:22:14 -0700

On Wed, Apr 13, 2005 at 09:52:41AM -0400, Stevan Little wrote:
> On Apr 13, 2005, at 9:20 AM, BÁRTHÁZI András wrote:
> >As Pugs works in UTF-8, my page is coded in UTF-8, too (and there are 
> >some other reasons, too). When I try to send an accented character to 
> >the server as parameter, for example the euro character, I get back an 
> >UTF-8 coded character:
> >
> > ...?test=%E2%82%AC
> >
> >It's OK, but when my code (and CGI.pm as well) tries to decode it, it 
> >gives back three characters and not just one.
> >
> >The problem is with this line in sub url_decode():
> >
> > $decoded ~~ s:perl5:g/%([\da-fA-F][\da-fA-F])/{chr(hex($1))}/;
> >
> >Have any idea, how to solve it? I think I should transform this code 
> >to recognize multi-bytes, decode the character value, and after it use 
> >chr on this value. Or is there a way to do it by not creating 
> >character by chr(), but a byte with another function?
> 
> To be honest, my experience with multi-byte character sets is very 
> limited (my first real exposure is on the Pugs project). However, I 
> think/hope that maybe the chr() builtin will eventually be able to 
> handle multi-bytes itself. In the (non-working) port of CGI-Lite 
> (http://tpe.freepan.org/repos/iblech/CGI-Lite/lib/CGI/Lite.pm), I saw 
> code which did this:
> 
>      /%(<[\da-fA-F]>**{2})/{chr :16($1)}/
> 
> Of course it was followed by this comment "# XXX -- correct?" so it may 
> not be anything official yet.


The trick is that URL encoding encodes bytes, not characters:

  http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars

So in the regex we have to determine whether we are decoding a
single-byte or multi-byte character.
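
To make the bytes-versus-characters point concrete, here is a minimal
sketch in Python (not Pugs code, since the Perl 6 syntax here is still in
flux). It assumes the query string is UTF-8, as in the example above; the
function name url_decode is borrowed from the thread, but the body is
only an illustration. It turns each %XX into one byte and interprets the
result as UTF-8 only at the end, so the per-character question never
comes up inside the regex:

```python
import re


def url_decode(text: str) -> str:
    """Percent-decode `text`, assuming the escaped bytes are UTF-8."""
    # Work on a byte string so that each %XX escape becomes exactly one byte.
    raw = text.encode("utf-8")
    raw = re.sub(
        rb"%([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )
    # Only now interpret the whole byte string as UTF-8 characters.
    # Invalid UTF-8 raises UnicodeDecodeError here.
    return raw.decode("utf-8")


if __name__ == "__main__":
    decoded = url_decode("%E2%82%AC")
    print(len(decoded), hex(ord(decoded)))
```

With the input from the original report, %E2%82%AC decodes to a single
character (U+20AC, the euro sign) rather than three.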

Both

  s:perl5:g/%([\da-fA-F][\da-fA-F])/{chr(hex($1))}/
  
and 
      /%(<[\da-fA-F]>**{2})/{chr :16($1)}/
 
read in a single byte and pass it to chr().  I do not have enough
experience with multi-byte characters to know when a byte can be
recognized as the first byte of a multi-byte character, and thus grab
the next byte before passing to chr().
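
For what it is worth, in UTF-8 the high bits of the first byte say how
many bytes the character occupies, and continuation bytes always look
like 10xxxxxx. A rough sketch in Python, with a hypothetical helper name
and without the stricter validity checks a real decoder needs (overlong
forms, the unused lead bytes 0xC0, 0xC1 and 0xF5 and above):

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, given its first byte."""
    if lead & 0x80 == 0x00:   # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead & 0xE0 == 0xC0:   # 110xxxxx: start of a two-byte character
        return 2
    if lead & 0xF0 == 0xE0:   # 1110xxxx: start of a three-byte character
        return 3
    if lead & 0xF8 == 0xF0:   # 11110xxx: start of a four-byte character
        return 4
    # 10xxxxxx is a continuation byte; anything else is not valid UTF-8.
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 character")


if __name__ == "__main__":
    # 0xE2 is the first byte of the euro sign from the example above.
    print(utf8_sequence_length(0xE2))
```

So on seeing %E2 the decoder would know to grab two more %XX escapes
before building the character.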

-kolibrie
