Christophe,

Thanks for your mail. So far Encode::Guess does not guess the encoding of a filehandle, not because it is impossible but because Encode::Guess takes a very conservative, even paranoid, approach. For example, an ambiguous guess raises an error instead of falling back to a preferred encoding.
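
To make the "conservative" part concrete, here is a minimal sketch of what an ambiguous guess looks like. The sample bytes and the extra "latin1" suspect are my own illustration, not taken from your files. When guess_encoding cannot settle on one candidate, it returns a plain message string rather than an encoding object, which is why the result should be tested with ref.

```perl
use strict;
use warnings;
use Encode::Guess;

# Illustrative sample: "cafe" with an e-acute, encoded as UTF-8.
my $octets = "caf\xC3\xA9";

# Adding latin1 as a suspect makes the guess ambiguous, because these
# bytes are also a valid (if odd-looking) Latin-1 string.
my $enc = guess_encoding($octets, "latin1");

if (ref $enc) {
    print "guessed: ", $enc->name, "\n";
}
else {
    # $enc is an error message here, not an encoding object.
    print "no single guess: $enc\n";
}
```

This is also why adding latin-1 as a suspect alongside UTF-8 rarely helps: almost any byte string is valid Latin-1.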

One way to guess the encoding of a filehandle goes like this.

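use Encode::Guess; # exports guess_encoding

sub open_and_guess {
  my $filename = shift;
  # open in raw mode so we see the bytes exactly as they are on disk
  open my $fh, "<:raw", $filename or return; # or die if you like
  my $head = <$fh>; # first line only; may not work for UTF-(16|32)
  my $enc  = guess_encoding($head);
  ref $enc or die $enc; # on failure $enc is an error message; or return $enc
  my $encname = $enc->name;
  seek $fh, 0, 0; # rewind to the start of the file
  binmode $fh, ":encoding($encname)"; # decode from here on
  return $fh;
}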

Here we open, guess, and reopen, but this does not work in the general case: not all files are seekable and reopenable (e.g. pipes and sockets).
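
For handles that cannot be rewound, one alternative is to read everything into memory first and decode the buffer instead of switching layers. The following is only a sketch under that assumption (the input must fit in memory). The name slurp_and_decode is hypothetical, and the in-memory filehandle merely stands in for a pipe or socket.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper: read all octets from an already opened handle,
# guess their encoding, and return the decoded text.
sub slurp_and_decode {
    my ($fh) = @_;
    binmode $fh;                          # make sure we read raw octets
    my $octets = do { local $/; <$fh> };  # slurp everything at once
    defined $octets or return;
    my $enc = guess_encoding($octets);
    ref $enc or die "Cannot guess: $enc"; # message string on failure
    return $enc->decode($octets);
}

# Stand-in for a non-seekable source: an in-memory handle over UTF-8 bytes.
open my $in, "<", \"caf\xC3\xA9\n" or die $!;
my $text = slurp_and_decode($in);
printf "%d characters decoded\n", length $text;
```

Guessing from the whole input rather than the first line also gives Encode::Guess more evidence to work with.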

To learn more about Encode::Guess, see

http://search.cpan.org/~dankogai/Encode-2.12/lib/Encode/Guess.pm
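
On the UTF-16LE/BE and UTF-32 question specifically: the object returned by guess_encoding knows its own name, so you can build the layer string from $enc->name instead of testing the class. Here is a rough sketch. The helper name layer_for and the Latin-1 fallback are assumptions modelled on your code, and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;

# Hypothetical helper: map a sample of raw octets to a PerlIO layer string.
sub layer_for {
    my ($octets) = @_;
    my $enc = guess_encoding($octets);
    # On failure guess_encoding returns a message string, not an object;
    # fall back to Latin-1 as in the original code.
    return ":encoding(iso-8859-1)" unless ref $enc;
    return ":encoding(" . $enc->name . ")";
}

print layer_for(encode("UTF-16",   "hi")), "\n";  # sample with a BOM
print layer_for(encode("UTF-16LE", "hi")), "\n";  # sample without a BOM
print layer_for("caf\xE9"), "\n";                 # neither ASCII nor valid UTF-8
```

When a BOM is present the name is the generic one, and the matching :encoding layer works out the byte order from the BOM itself. Without a BOM the guess relies on a heuristic and can still come back ambiguous.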

Yours,

Dan the Encode Maintainer


On Oct 24, 2005, at 19:12 , HERMIER Christophe wrote:

Hello,

I am using the Encode::Guess module to detect the encoding of a file before opening it. Basically, I expect the files to be in one of several Unicode encodings or in Latin-1.

What I want to get is an encoding string to pass to "open".
My code goes like this:

my $codage      = guess_encoding ( $debut );

if ( UNIVERSAL::isa ( $codage, "Encode::utf8" ) )
{
        $codage = ":utf8";
}
elsif ( UNIVERSAL::isa ( $codage, "Encode::Unicode" ) )
{
        $codage = ":encoding(utf-16)";
}
else
{
        $codage = ":encoding(iso-8859-1)";
}


The problem is with the "Encode::Unicode" case: I don't know whether it is UTF-16LE or UTF-16BE,
and it could even be UTF-32.

Is there a way to find out?


BTW, I checked your homepage (http://www.dan.co.jp/) first, but it does not seem to work.

Regards,
Christophe.



