Does LWP know anything (or need to know anything) about Unicode?

Rick Measham Sun, 10 Oct 2004 17:21:16 -0700

G'day Unicode Gurus and other assorted members of the perl Unicode
community.


I have a script that attempts to collect translations from Babelfish.
I've posted it below.

It uses LWP::UserAgent to have Babelfish turn an English phrase into
Japanese (or any other language Babelfish supports).*

However, once I get the translation out of the page it appears to be
full of null bytes. I've tried various things like Unicode::String or
Encode, but to no avail. 
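One possible explanation, offered only as an assumption since the post
doesn't say which encoding the page used: if the text were UTF-16,
null bytes are exactly what undecoded ASCII characters look like. This
minimal sketch with made-up sample text shows the effect and how
Encode's decode removes it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample: the ASCII text "Hi" encoded as UTF-16LE.
# Each ASCII character becomes its own byte followed by a null byte.
my $bytes = encode('UTF-16LE', 'Hi');

# Dump the raw bytes in hex: prints "48 00 69 00"
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";

# Decoding with the matching encoding yields a character string
# with no nulls in it.
my $text = decode('UTF-16LE', $bytes);
print length($text), "\n";    # 2 characters, not 4 bytes
```

The point is that decode needs the encoding the bytes were actually
written in; guessing the wrong one gives garbage or the same nulls.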

The script below just does the grab-and-extract. No Unicode stuff.
Please tell me what I should be doing at what point to be able to
extract the correct information.

* Please note: I'm not expecting a great translation, so don't bother
pointing out that German for "Report a bug" is "Tell about a cockroach".
I just need something that I can use until a translator has done a real
translation.


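#!/usr/bin/perl
use strict;
use warnings;

use URI::Escape;
use LWP::UserAgent;

# Join the command-line words into one phrase and URL-escape it
my $escape = uri_escape(join(' ', @ARGV));

my $ua = LWP::UserAgent->new;

# Ask Babelfish for an English-to-Japanese (en_ja) translation
my $response = $ua->get("http://babelfish.altavista.com/tr?trtext=$escape&lp=en_ja");

my $result;
if ($response->is_success) {
        $result = $response->content;    # raw body bytes, not decoded
} else {
        die $response->status_line;
}

# Pull the translated text out of the result cell (/s lets . match newlines)
my ($translation) = $result =~ m{\Q<td bgcolor=white class=s><div style=padding:10px;>\E(.+?)\Q</div>\E}s;
die "No translation found in page\n" unless defined $translation;

print $translation ."\n"
    . length($translation) ."\n"
    . ord(substr($translation,0,1)) ."\n";

__END__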
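A sketch of the usual pattern, under the assumption that the server
names the page's charset in its Content-Type header: decode the raw
bytes into characters before matching, and tell Perl how to encode
STDOUT. The decode_body helper below is hypothetical (it is not part of
LWP), and the sample bytes are made up so the sketch runs offline:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not an LWP API: decode a raw HTTP body using
# the charset named in its Content-Type header.
sub decode_body {
    my ($content_type, $bytes) = @_;
    # Pull "charset=..." out of e.g. "text/html; charset=UTF-8";
    # fall back to ISO-8859-1, the HTTP/1.1 default for text/*.
    my ($charset) = ($content_type || '') =~ /charset\s*=\s*"?([\w.:-]+)"?/i;
    $charset ||= 'ISO-8859-1';
    # decode() dies if Encode doesn't recognise the charset name
    return decode($charset, $bytes);
}

# Offline demo: the Japanese word for "Japanese", encoded as UTF-8
# and labelled as such (made-up data, not a real Babelfish reply).
my $body = encode('UTF-8', "\x{65E5}\x{672C}\x{8A9E}");
my $text = decode_body('text/html; charset=UTF-8', $body);

# Print characters, not bytes, by giving STDOUT an output encoding.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n", length($text), "\n";    # 3 characters
```

In the script above this would mean something like
`$result = decode_body(scalar $response->header('Content-Type'), $response->content);`
before the regex match. Newer libwww-perl releases also offer
`$response->decoded_content`, which does the charset decoding itself; if
the page declares its charset only in a <meta> tag, the header may not
carry it and the charset has to be taken from the HTML instead.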
