However, I recently noticed that if I upload a file with non-ASCII characters in its name, a plain Encode::decode_utf8 on the name does not give me a string I can write to a file correctly. The strange thing is that if I save the uploaded file under that name, the file name is preserved. If I write just the name to a file with binmode :bytes and then cat that file to the terminal, it displays correctly. However, if I open the :bytes file in BBEdit set to render UTF-8 (no BOM), the characters don't display correctly. If I write the filename to a file with binmode :utf8, I think I get double encoding, and nothing displays it correctly.
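To make the double-encoding suspicion concrete, here is a minimal sketch that is separate from the attached script. It assumes a made-up sample (the two UTF-8 bytes of "é") rather than the real uploaded filename, and written_bytes is a hypothetical helper that writes through an in-memory filehandle so the resulting bytes can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 bytes of "é" (U+00E9), still undecoded.
my $bytes = "\xc3\xa9";

# Write $string through an in-memory filehandle opened with $layer
# and return the bytes that were actually written, as hex.
sub written_bytes {
    my ($layer, $string) = @_;
    my $out = '';
    open(my $fh, ">$layer", \$out) or die "open: $!";
    print {$fh} $string;
    close($fh) or die "close: $!";
    return join ' ', map { sprintf '%02x', ord } split //, $out;
}

# Undecoded bytes through a byte-oriented layer: written unchanged.
print written_bytes(':raw', $bytes), "\n";                    # c3 a9

# Undecoded bytes through :utf8: each byte is treated as a Latin-1
# character and encoded again, which is the double encoding.
print written_bytes(':utf8', $bytes), "\n";                   # c3 83 c2 a9

# Decoded characters through :utf8: encoded exactly once.
print written_bytes(':utf8', decode('UTF-8', $bytes)), "\n";  # c3 a9
```

If the output of the second case matches what ends up in the file, the string reaching the :utf8 layer was probably still raw bytes rather than decoded characters.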
If I call decode_utf8($filename, Encode::FB_CROAK) (or Encode::FB_WARN), the utf8 flag on the string gets set and is_utf8 returns true. Without the check argument, it doesn't. So it seems there are characters in the string that cause decode_utf8 to return early. Either way, the filename string is not valid UTF-8.
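For anyone who wants to compare results, a strict decode can be checked in isolation along these lines. This is only a sketch: the two sample byte strings are invented (they are not the real uploaded filename), and strict_decode and hex_dump are hypothetical helper names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Strictly decode a byte string as UTF-8.
# Returns the character string, or undef if the bytes are malformed.
sub strict_decode {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode may modify its argument when CHECK is set
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
}

# Show each byte (or character code) of a string as two-digit hex.
sub hex_dump {
    return join ' ', map { sprintf '%02x', ord } split //, $_[0];
}

my $good = "\xc3\xa9";    # assumed sample: valid UTF-8 for U+00E9
my $bad  = "\xe9";        # assumed sample: lone Latin-1 byte, not valid UTF-8

for my $sample ($good, $bad) {
    my $chars = strict_decode($sample);
    if (defined $chars) {
        printf "%s: decoded, %d character(s), utf8 flag %s\n",
            hex_dump($sample), length($chars), is_utf8($chars) ? 'on' : 'off';
    }
    else {
        printf "%s: not valid UTF-8\n", hex_dump($sample);
    }
}
```

Running the real filename bytes through hex_dump before and after decoding should show whether the upload delivers well-formed UTF-8 at all, independent of what the terminal or editor renders.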
Other form fields will give me the expected utf-8 string when I call decode_utf8 (in warn/cat/terminal, http/filename output). So, I'm wondering what the difference could be... a CGI module issue? multipart/form-data POST issue?
I've attached a web script example below that will hopefully come through okay for anyone who's interested. Try a filename with some non-ASCII characters and see what happens.
Any insights would be appreciated.
Thanks,
Andrew
#!/usr/bin/perl

use strict;
use utf8;
use Encode qw(is_utf8 decode_utf8);
use CGI qw(:cgi uploadInfo);
use IO::File;

binmode(*STDOUT, ":utf8");

print "Content-Type: text/html; charset=utf-8\n\n";
print <<HTML;
<html>
<head>
<title>test</title>
</head>
<body>
HTML

if($ENV{REQUEST_METHOD} eq 'POST') {

    my $buf;
    my $rfh = upload('file');
    my $f = param('file');
    my $n = $f;
    $f = decode_utf8($f);
    ($f) = $f =~ m/([^\/\\]+)$/;

    my $fh = new IO::File;
    $fh->open('> test.txt');
    binmode($fh, ':bytes'); # :utf8 ?
    print $fh $f, "\n"; # just write the filename into this file
    $fh->close;
    $fh->open("> $f"); # save the file itself with its original name
    binmode($fh, ':bytes');
    while(read($rfh, $buf, 1024)) {
        print $fh $buf;
    }
    $fh->close;

    print 'file name = ', $f, ' / ';
    $f =~ s/./sprintf("0x%02x ", ord($&))/eg; # check char codes
    print $f, '<br /><br />';

    my $vals = uploadInfo(param('file'));
    for(keys %{$vals}) {
        print $_, ': ', decode_utf8($vals->{$_}), '<br />';
    }

    print '<br />text field t1 = ', decode_utf8(param('t1')), '<br />';
    print 'text field t2 = ', decode_utf8(param('t2')), '<br />';
}

print <<HTML;
<br />
<form name="test" method="post" action="test.cgi" enctype="multipart/form-data">
<input type="text" name="t1" value="å †és†" /><br />
<input type="text" name="t2" value="ånøthé® †és†" /><br />
<input type="file" name="file" /><br /><br />
<input type="submit" />
</form>
</body>
</html>
HTML