On Fri, Oct 7, 2011 at 4:39 AM, Igor Dovgiy <ivd.pri...@gmail.com> wrote:
>> $VAR1 = {
>>     'Subject' => "\x{fffd}\x{fffd}my subject",
>>     'CreationDate' => 'D:20111006161347+02\'00\'',
>>     'Producer' => "\x{fffd}\x{fffd}LibreOffice 3.3",
>>     'Creator' => "\x{fffd}\x{fffd}Writer",
>>     'Author' => "\x{fffd}\x{fffd}Marcos Rebelo",
>>     'Title' => "\x{fffd}\x{fffd}my title",
>>     'Keywords' => "\x{fffd}\x{fffd}my keywords"
>> };

*snip*

>> How can I clean the hash?

I know next to nothing about Unicode programming (in any language),
but it seems to always be the same prefix. Printing this out in
Windows' cmd shell seems to yield the same prefix that I see in UTF-8
files with a BOM (byte-order mark). Oddly, your data seems to have two
of them, which I can't explain, but I digress. Could you not just
remove those two characters with a s///?

    my $info = $pdf->info();

    for my $key (keys %{$info})
    {
        next if ref $info->{$key};

        $info->{$key} =~ s/^\x{fffd}+//;
    }

(Untested)

Note that I didn't bother traversing beyond the first level of the
data structure, but you may want to if the data can be more complex
than that... I don't know.

I'm sure that this is a bad way to handle Unicode, but perhaps it will
be "good enough" for now. Maybe look here for some possibly better
advice:

http://ahinea.com/en/tech/perl-unicode-struggle.html

-- 
Brandon McCaig <bamcc...@gmail.com> <bamcc...@castopulence.org>
Castopulence Software <https://www.castopulence.org/>
Blog <http://www.bamccaig.com/>
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/