G'day Unicode Gurus and other assorted members of the Perl Unicode community.
I have a script that attempts to collect translations from Babelfish; I've posted it below. It uses LWP::UserAgent to turn an English phrase into Japanese (or any other language supported by Babelfish).* However, once I extract the translation from the page, it appears to be full of null bytes. I've tried various things, such as Unicode::String and Encode, but to no avail.

The script below just does the grab-and-extract, with no Unicode handling. Please tell me what I should be doing, and at what point, to extract the correct information.

* Please note: I'm not expecting a great translation, so don't bother pointing out that the German for "Report a bug" is "Tell about a cockroach". I just need something I can use until a translator has done a real translation.

#!/usr/bin/perl
use strict;
use warnings;
use URI::Escape;
use LWP::UserAgent;

# Build the query string from the words given on the command line.
my $escape = uri_escape(join(' ', @ARGV));

my $ua = LWP::UserAgent->new;
my $response = $ua->get("http://babelfish.altavista.com/tr?trtext=$escape&lp=en_ja");

# Keep the raw response body; no decoding is attempted here.
die $response->status_line unless $response->is_success;
my $result = $response->content;

# Pull the translated text out of the result cell.
my ($translation) = $result =~ /\Q<td bgcolor=white class=s><div style=padding:10px;>\E(.+?)\Q<\/div>\E/;
die "No translation found in the page\n" unless defined $translation;

# Print the text, its length, and the code of its first character.
print $translation . "\n" . length($translation) . "\n" . ord(substr($translation, 0, 1));
__END__
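
In case it helps narrow things down, here is a minimal sketch of how the raw bytes could be inspected before any decoding is attempted. The hex_dump helper and the sample string are my own stand-ins, not real Babelfish output, and UTF-16LE is only a guess prompted by the null bytes; I don't know that it is the encoding the page actually uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the first $max bytes of a string as hex pairs.
sub hex_dump {
    my ($bytes, $max) = @_;
    $max = 16 unless defined $max;
    return join ' ', map { sprintf('%02x', ord($_)) } split(//, substr($bytes, 0, $max));
}

# Stand-in data (an assumption, not a captured response):
# the two characters "Hi" as they would look in UTF-16LE.
my $sample = "H\0i\0";

# Every second byte is a null, which is the pattern I seem to be seeing.
print hex_dump($sample), "\n";

# If the guess about the encoding is right, decoding gives two characters
# rather than four bytes.
my $text = decode('UTF-16LE', $sample);
print length($text), "\n";
```

Running hex_dump on $translation in the script above would show whether the nulls fall on every second byte, which would at least say whether a 16-bit encoding is plausible.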