On Fri, Oct 7, 2011 at 4:39 AM, Igor Dovgiy <ivd.pri...@gmail.com> wrote:
>> $VAR1 = {
>>          'Subject' => "\x{fffd}\x{fffd}my subject",
>>          'CreationDate' => 'D:20111006161347+02\'00\'',
>>          'Producer' => "\x{fffd}\x{fffd}LibreOffice 3.3",
>>          'Creator' => "\x{fffd}\x{fffd}Writer",
>>          'Author' => "\x{fffd}\x{fffd}Marcos Rebelo",
>>          'Title' => "\x{fffd}\x{fffd}my title",
>>          'Keywords' => "\x{fffd}\x{fffd}my keywords"
>>        };
*snip*
>> How can I clean the hash?
>>

I know next to nothing about Unicode programming (in any
language), but every value seems to start with the same prefix,
\x{fffd}\x{fffd}. (U+FFFD is the Unicode "replacement character",
which a decoder substitutes for bytes it cannot decode.) When I
print it in Windows' cmd shell, it looks like the prefix I see in
UTF-8 files with a BOM (byte-order mark). Oddly, your data seems
to have two of them, which I can't explain, but I digress. Could
you not just remove those two characters with s///?

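# Assumes info() returns a hash reference, as in your Dumper output.
my $info = $pdf->info();

for my $key (keys %{$info})
{
    # Skip nested structures and undefined values.
    next if !defined $info->{$key} || ref $info->{$key};

    # Remove any run of U+FFFD at the start of the value.
    $info->{$key} =~ s/^\x{fffd}+//;
}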

(Untested)
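Since the loop above is untested, here is a self-contained sketch
that runs essentially the same substitution against a copy of the
hash from your message, so it can be checked without a PDF at hand.
It assumes the values really contain U+FFFD characters, as the
Data::Dumper output suggests, and it doesn't use any PDF module.

```perl
use strict;
use warnings;

# A copy of the hash from the question. "\x{fffd}" is U+FFFD,
# the Unicode replacement character.
my %info = (
    'Subject'      => "\x{fffd}\x{fffd}my subject",
    'CreationDate' => 'D:20111006161347+02\'00\'',
    'Producer'     => "\x{fffd}\x{fffd}LibreOffice 3.3",
    'Creator'      => "\x{fffd}\x{fffd}Writer",
    'Author'       => "\x{fffd}\x{fffd}Marcos Rebelo",
    'Title'        => "\x{fffd}\x{fffd}my title",
    'Keywords'     => "\x{fffd}\x{fffd}my keywords",
);

# Essentially the loop from above, on a plain hash instead of a
# hash reference.
for my $key (keys %info)
{
    # Skip nested structures and undefined values.
    next if !defined $info{$key} || ref $info{$key};

    # Remove any run of U+FFFD at the start of the value.
    $info{$key} =~ s/^\x{fffd}+//;
}

# Print the cleaned hash in a stable (sorted) order.
print "$_ => $info{$_}\n" for sort keys %info;
```

Values without the prefix, such as CreationDate, pass through
unchanged because the pattern is anchored with ^.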

Note that I didn't bother traversing beyond the first level of
the data structure; you may want to if the data can be more
complex than that. I don't know whether it can.
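If the data can be nested, a recursive version might look like the
following. It is a minimal sketch: strip_fffd_prefix and
clean_value are hypothetical names (not part of any PDF module), it
only descends into hash and array references, and the nested
'Keywords' array in the usage part is made-up data for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: walk hashes and arrays and strip leading
# U+FFFD characters from every defined string, in place.
sub strip_fffd_prefix
{
    my ($data) = @_;

    if (ref $data eq 'HASH')
    {
        # values() returns aliases to the hash values, so edits
        # made through clean_value() change the hash itself.
        clean_value($_) for values %{$data};
    }
    elsif (ref $data eq 'ARRAY')
    {
        clean_value($_) for @{$data};
    }

    return $data;
}

# Hypothetical helper: $_[0] is an alias to the caller's element,
# so modifying it modifies the original structure.
sub clean_value
{
    if (ref $_[0])
    {
        strip_fffd_prefix($_[0]);    # recurse into nested data
    }
    elsif (defined $_[0])
    {
        $_[0] =~ s/^\x{fffd}+//;     # plain string: strip the prefix
    }
}

# Usage with made-up nested data.
my $info = {
    'Title'    => "\x{fffd}\x{fffd}my title",
    'Keywords' => [ "\x{fffd}\x{fffd}one", "\x{fffd}two" ],
    'Missing'  => undef,
};

strip_fffd_prefix($info);
print "Title: $info->{'Title'}\n";                # prints "Title: my title"
print "Keywords: @{ $info->{'Keywords'} }\n";     # prints "Keywords: one two"
```

Working through Perl's aliasing (foreach and @_) edits the values
in place, so no copy of the structure is built.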

I'm sure that this is a bad way to handle Unicode, but perhaps it
will be "good enough" for now.

You might find better advice here:

http://ahinea.com/en/tech/perl-unicode-struggle.html


-- 
Brandon McCaig <bamcc...@gmail.com> <bamcc...@castopulence.org>
Castopulence Software <https://www.castopulence.org/>
Blog <http://www.bamccaig.com/>
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/

