> When reading file names containing e.g. umlauts from a directory, either via
> readdir() or glob(), and storing them in a DB, these strings are not
> correctly returned from the DB. This does not happen when the strings are
> ordinary Perl strings.
I'm pretty sure this is because of a known limitation in Perl: readdir(),
glob() and the like return file names as raw bytes and never mark them as
UTF-8, even when the names on disk are UTF-8 encoded. To illustrate, I
modified the original script a bit and added:
  use utf8;
  use Data::Peek;
Then I looked at the same file name twice: once written directly in the
script, and once as returned by glob(). Note the difference in the
output of DDump($file):
  SV = PV(0x8c0f0e8) at 0x8cd6430
    REFCNT = 2
    FLAGS = (POK,pPOK,UTF8)
    PV = 0x8cd9bf8 "./files/K\303\266ln"\0 [UTF8 "./files/K\x{f6}ln"]
    CUR = 14
    LEN = 16

  SV = PV(0x8c0f0d0) at 0x8cd62c8
    REFCNT = 2
    FLAGS = (POK,pPOK)
    PV = 0x8cd9c38 "./files/K\303\266ln"\0
    CUR = 13
    LEN = 16
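A minimal, self-contained sketch of the same comparison, using only core Perl (File::Temp, and utf8::is_utf8() in place of Data::Peek's DDump). It assumes a Linux-style filesystem that stores file names as raw bytes; the temporary directory and the file name are hypothetical stand-ins for the original script's ./files directory, and readdir() is used in place of glob() since both hand back names as undecoded bytes:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # source literals below are decoded as UTF-8
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory holding one file whose name is
# the UTF-8 encoding of "Köln", i.e. the bytes K \303 \266 l n.
my $dir          = tempdir(CLEANUP => 1);
my $encoded_name = "K\303\266ln";          # raw bytes, as stored on disk
open my $fh, '>', "$dir/$encoded_name" or die "create: $!";
close $fh;

# The literal: under "use utf8" this is 4 characters (like the first dump).
my $literal = "Köln";

# The directory entry: readdir() returns the raw bytes, no UTF8 flag.
opendir my $dh, $dir or die "opendir: $!";
my ($from_disk) = grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

printf "literal:   is_utf8=%d length=%d\n",
    utf8::is_utf8($literal) ? 1 : 0, length $literal;
printf "from disk: is_utf8=%d length=%d\n",
    utf8::is_utf8($from_disk) ? 1 : 0, length $from_disk;
print $literal eq $from_disk ? "equal\n" : "not equal\n";
```

The byte string from disk is five characters long and compares unequal to the four-character literal, which is the same mismatch the two dumps show.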
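The first one, which Perl recognizes as a UTF-8 string, goes into
and comes out of the database just fine. The second (via glob)
does not. Ideally Perl would be smart enough to turn the UTF8 flag
on for such file names, but it does not, and I'm not sure there is
anything DBD::Pg could sensibly do about it. One solution to the
problem at hand is to decode the string yourself before handing it
off to the database, assuming the names on disk are UTF-8 encoded,
like so:

  utf8::decode($file);

Note that utf8::upgrade($file) is not enough: it only changes the
internal representation, so the bytes \303\266 would become the two
characters \x{c3}\x{b6} ("Ã¶") rather than the single character
\x{f6} ("ö").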
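A sketch comparing what utf8::upgrade and utf8::decode each do to such a name. The variable $from_glob is a hypothetical stand-in holding the same bytes as the glob() result in the dump above, and the sketch assumes the names on disk are UTF-8 encoded:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for a name returned by glob()/readdir(): the raw
# UTF-8 bytes of "Köln", with no UTF8 flag (same bytes as the dump above).
my $from_glob = "K\303\266ln";
my $literal   = "Köln";

# utf8::upgrade only changes the internal representation; the characters
# are still "K", "\xC3", "\xB6", "l", "n" (i.e. "KÃ¶ln"), so nothing is fixed.
my $upgraded = $from_glob;
utf8::upgrade($upgraded);

# utf8::decode interprets the octets as UTF-8 and yields the 4 characters
# of "Köln", which is what the literal holds.
my $decoded = $from_glob;
utf8::decode($decoded) or die "not valid UTF-8\n";

printf "upgraded: length=%d equal=%s\n",
    length $upgraded, ($upgraded eq $literal ? 'yes' : 'no');
printf "decoded:  length=%d equal=%s\n",
    length $decoded, ($decoded eq $literal ? 'yes' : 'no');
```

Where malformed names should raise an error instead of being passed through, Encode's decode('UTF-8', $octets, Encode::FB_CROAK) is the more common idiom.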
--
Greg Sabino Mullane [email protected]
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201308292202
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8