On 6 Aug 2007 at 12:50, Rob Dixon wrote:

> Dermot Paikkos wrote:
> >
> > CGI;
> > MIME::Lite;
> >
> > I am trying to take the input from a text field on an HTML page and
> > send it as an email. The text contains a UK sterling £ sign. It looks
> > fine in the HTML page, but when I send the mail or output the text
> > to STDERR, it gets transformed into this: £
> >
> > Here are a few of the lines from the script:
> >
> > my $str = $q->param('pro');
> > my ($p) = ($str =~ /THIS IS GOING TO COST (.*)320/);
> > my $o = ord($p);
> > my ($hex) = unpack( 'H', $p);
> > print STDERR "Text=",$q->param('pro')," \"$p\" $hex $o\n";
> >
> > And this is the output:
> > Text=THIS IS GOING TO COST £320 "£" c 194
> >
> > I am a bit lost by this, as I thought CGI did the heavy lifting with
> > character encoding. Can anyone give me some pointers?
>
> Hey Dermot
>
> I think you are grabbing two characters from the text instead of one.
> Your ord() is looking only at the first byte (and your unpack only at the
> first four bits!) and HTML entity &#194; is capital A circumflex. Quite
> what it's doing in there I don't know, but try using just /(.)320/ as your
> regex (it's not optional and you don't want more than one). You should get
> a character code of 163 for the pound sign.
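A minimal sketch of why the greedy `(.*)` captured two characters, assuming the browser submitted the pound sign UTF-8 encoded and CGI.pm handed it back as undecoded bytes. The `$raw` string is a hypothetical stand-in for `$q->param('pro')`, not real form data. In UTF-8, £ (U+00A3) is the byte pair C2 A3, so an undecoded string holds two "characters" where one was typed. Note that the 163 from `/(.)320/` here is the trailing byte A3, which only happens to coincide with the code point of £; for most other non-ASCII characters (for example é, bytes C3 A9, code point E9) the two differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $q->param('pro'): an undecoded value in which
# the pound sign is still the two UTF-8 bytes C2 A3.
my $raw = "THIS IS GOING TO COST \xC2\xA3320";

# The original greedy capture picks up both bytes of the encoded pound sign.
my ($greedy) = ($raw =~ /THIS IS GOING TO COST (.*)320/);

# 'H' yields only the high nibble of the first byte; 'H*' yields every byte.
my ($nibble) = unpack('H', $greedy);
my $all_hex  = unpack('H*', $greedy);

# The suggested single-character capture: just the byte before "320".
my ($single) = ($raw =~ /(.)320/);

printf "greedy: %d chars, ord %d, 'H' -> %s, 'H*' -> %s\n",
    length $greedy, ord $greedy, $nibble, $all_hex;
printf "single: ord %d\n", ord $single;
```

This prints `greedy: 2 chars, ord 194, 'H' -> c, 'H*' -> c2a3` followed by `single: ord 163`.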

Thanx for the tip, Rob, and you're right that my regex was too greedy. I
now have this:

my $str = $q->param('pro');
my $length = length($str);
my ($p1,$p2) = ($str =~ /(.)(.)320/);
my $o1 = ord($p1);
my $o2 = ord($p2);
my ($hex1) = unpack( 'H', $p1);
my ($hex2) = unpack( 'H', $p2);
print STDERR "Project=",$q->param('pro')," \"$p1\" \"$p2\" $hex1 $hex2 $o1 $o2 $length\n";


Which outputs this:
Text=THIS IS GOING TO COST £320 "Â" "£" c a 194 163 27

Interestingly, I count 26 characters in the field prior to submitting,
but length is reported as 27 once it is in the CGI.
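The 26 versus 27 count is consistent with `length` counting bytes rather than characters. A minimal sketch using the core Encode module, again assuming a UTF-8 submission; `$bytes` is a hypothetical stand-in for the undecoded form value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for the undecoded form value: 26 characters were
# typed, but the pound sign arrives as the two UTF-8 bytes C2 A3.
my $bytes = "THIS IS GOING TO COST \xC2\xA3320";

# Decoding turns the byte pair back into the single character U+00A3.
my $chars = decode('UTF-8', $bytes);

printf "length before decode: %d, after decode: %d\n",
    length $bytes, length $chars;
```

This prints `length before decode: 27, after decode: 26`.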

So the character is there, but there seems to be some misinterpretation
of the space before it as &#194;. If I copy the data from the field into a
text/hex editor, it's shown as x20 (space).
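A per-byte hex dump of the end of the string may make the source of the 194 clearer. In this sketch (the `$raw` value is a hypothetical stand-in for the submitted field), the space is still 20, and the extra character is C2, the first byte of the UTF-8 encoding of £, rather than a reinterpretation of the space.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the undecoded form value.
my $raw = "THIS IS GOING TO COST \xC2\xA3320";

# Dump the last six bytes, one two-digit hex group per byte.
my @tail = unpack('(H2)*', substr($raw, -6));
print join(' ', @tail), "\n";
```

This prints `20 c2 a3 33 32 30`.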

UTF-8: The referring page has this in the head:
        <meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
which I think should make it legitimate UTF-8. I had tried using
charset => 'utf-8' in start_html prior to reading... but wait, there
is a charset method, and $q->charset('utf-8') gives me the desired result.
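For comparison, a minimal sketch of the explicit route with the core Encode module: decode the form bytes on the way in, work on characters, and encode on the way out (a mail body, or STDERR). This is an alternative, not the fix described above. `$form_bytes` is a hypothetical stand-in for `$q->param('pro')`; CGI.pm also documents a `-utf8` import pragma (`use CGI qw(-utf8);`) that returns parameters already decoded. A mail body produced this way would still need its Content-Type to declare `charset=UTF-8`.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for $q->param('pro') as raw UTF-8 bytes.
my $form_bytes = "THIS IS GOING TO COST \xC2\xA3320";

# On the way in: bytes -> characters.
my $text = decode('UTF-8', $form_bytes);

# Working on characters, the pound sign is one code point (U+00A3).
my ($price) = $text =~ /COST (\S+)/;

# On the way out: characters -> UTF-8 bytes, e.g. for a mail body.
my $mail_body = encode('UTF-8', "Quoted price: $price\n");

# For STDERR, an output layer performs the same encoding automatically.
binmode STDERR, ':encoding(UTF-8)';
print STDERR "Text=$text\n";
```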

So thanx W.Mumia.
Dp.




--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/

