Hi, (here's the long story)
Printing the string yields the correct result; the problem comes afterwards.

I used this code inside a Dancer route handler. When I just printed the string to a file or to the screen, everything worked great. But when I returned it to the browser, I got the wrong encoding. Moreover, if I wrote it to a file and then used the 'send_file' method to send that file, everything was OK (correct encoding).

So that got me thinking it's a Dancer issue, which led me to sawyer. He explained that Dancer tries to detect the encoding of strings, and if a string is not UTF-8 it will encode it to UTF-8. He suggested I try decoding my string before returning it to Dancer, which worked very well.

We ended up wondering why Dancer failed to detect that my string was already UTF-8 encoded. I got the string from a MongoDB query, and then used lib::XML to create a sitemap with it. I tried to reproduce the problem, but found that if I declare the string in my Perl code everything works, so it's probably related to the MongoDB query (perhaps Mongo returns just the bytes, so the string wasn't marked as UTF-8, and then Dancer failed to detect that it was already encoded).

Around this step I was happy to have a working sitemap.xml for my website (mobileweb.ynonperek.com/sitemap.xml) and moved on :)

Cheers,
Ynon

On 12 October 2012 09:10, Gaal Yahas <[email protected]> wrote:

> Hold on. The string you already had, the dump of which you gave us, was
> already okay, or close enough to it. What happens if you try just
> printing it (not with Data::Dumper)?
>
> I'm asking because I don't see any UTF-8 specifically, I just see a bunch
> of code points. The string is "הצגת-מפ", which you can easily see by
> looking up some characters in a Unicode table. You didn't show us any
> evidence of UTF-8 overencoding; if there was some, we'd be seeing the
> values 0xd7 0x94 etc. (the UTF-8 encoding of the abstract code point
> U+05d4).
>
> I think it's Dumper that was escaping things because it wasn't sure your
> terminal could display them or whatever. Just try "print $buf".
>
> On Fri, Oct 12, 2012 at 12:40 AM, ynon perek <[email protected]> wrote:
>
>> Hi All,
>> Thanks for all the help.
>>
>> Problem was in fact the opposite - double encoding (turned out both
>> lib::XML and Dancer encode to utf-8...)
>>
>> I ended up using decode('utf-8') on the data before passing it on, and
>> this solved the issue (so now I have encode -> decode -> encode chain...
>> which is why abstractions are evil).
>>
>> Have a great weekend,
>> Ynon
>>
>> On 11 October 2012 18:49, Meir Guttman <[email protected]> wrote:
>>
>>> Hey Gaal,
>>>
>>> I would look up Data::Dumper::AutoEncode (
>>> http://search.cpan.org/~bayashi/Data-Dumper-AutoEncode-0.102/lib/Data/Dumper/AutoEncode.pm).
>>> You can then use 'eDumper' rather than Dumper to actually see letters. This
>>> package also enables you to use any encoding you want. (The default, though,
>>> is utf8.)
>>>
>>> Meir
>>>
>>> From: [email protected] [mailto:[email protected]] On Behalf Of Gaal Yahas
>>> Sent: Thursday, 11 October 2012 17:03
>>> To: Perl in Israel
>>> Subject: Re: [Israel.pm] Encoding Question
>>>
>>> U+05d4 is HEBREW LETTER HE etc. -- your buffer is already in Unicode.
>>>
>>> On Thu, Oct 11, 2012 at 4:51 PM, ynon perek <[email protected]> wrote:
>>>
>>> Hi All,
>>>
>>> Quick encoding question: I have a text string that I think is in
>>> cp1255, because when I print it with Data::Dumper I get:
>>>
>>> \x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}
>>>
>>> But, when I try to decode it using:
>>>
>>> my $decoded = decode('CP1255', $text);
>>>
>>> I get this error:
>>>
>>> Wide character in subroutine entry at
>>> /Users/ynonperek/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/darwin-2level/Encode.pm
>>> line 174, <DATA> line 16.
>>>
>>> Ideas ?
>>>
>>> --
>>> Writing lectures? Speaking in front of an audience? My blog, Learning to
>>> Speak <http://publicspeakr.blogspot.com/>, is written especially for you.
>>>
>>> --
>>> Gaal Yahas <[email protected]>
>>> http://gaal.livejournal.com/
>
> --
> Gaal Yahas <[email protected]>
> http://gaal.livejournal.com/

--
Writing lectures? Speaking in front of an audience? My blog, Learning to
Speak <http://publicspeakr.blogspot.com/>, is written especially for you.
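
A minimal sketch of the double encoding discussed in this thread, using only the core Encode module. It is an illustration under assumptions, not the actual route handler: the variable names are hypothetical, the string is the one from the Data::Dumper output, and the second encode() call merely stands in for any layer that encodes data which is already UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Character string: the code points shown in the Data::Dumper output
# (six Hebrew letters and one hyphen).
my $chars = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# First encode: UTF-8 bytes, as a serializer might return them (assumption).
my $bytes = encode('UTF-8', $chars);

# Second encode applied to those bytes: every byte >= 0x80 is treated as a
# Latin-1 character and encoded again. This is the double encoding.
my $double = encode('UTF-8', $bytes);

# The workaround from the thread: decode the bytes back to characters
# before handing them to a layer that will encode them.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));

printf "chars=%d bytes=%d double=%d fixed=%d\n",
    length($chars), length($bytes), length($double), length($fixed);
# prints "chars=7 bytes=13 double=25 fixed=13"
```

The lengths make the problem visible without depending on how a terminal renders Hebrew: each two-byte letter grows to four bytes when encoded twice, while the decode-then-encode path gives back the original 13 bytes.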
_______________________________________________ Perl mailing list [email protected] http://mail.perl.org.il/mailman/listinfo/perl
