Hi Gaal,

I am sorry to say, but I am not familiar with MongoDB, only with MySQL.

In MySQL you have to specify which encoding the text is stored in, which 
encoding your current input uses, etc., although you can set a default 
encoding, usually UTF-8. In particular, when you use DBI and create a 
“connection” to the DB, the “connect attributes” must, among other things, 
enable UTF-8, like this:

my %conn_attrs = (RaiseError        => 1,
                  PrintError        => 0,
                  AutoCommit        => 1,
                  mysql_enable_utf8 => 1);

Discovering this was rather frustrating and took a long time!

Maybe there is a similar attribute in MongoDB?

I am afraid that this is the only help I can provide...

Meir

 

From: [email protected] [mailto:[email protected]] On Behalf Of 
ynon perek
Sent: Friday, 12 October 2012 12:56
To: Perl in Israel
Subject: Re: [Israel.pm] Encoding Question

 

Hi,

 

(here's the long story)

 

Printing the string yields the correct result; the problem shows up afterwards.

 

I used this code inside a Dancer route handler. When I just printed the 
string to a file or to the screen, everything worked great.

 

But, when I returned it to the browser, I got the wrong encoding.

Moreover, if I wrote it into a file and then used the 'send_file' method to 
send the file, everything was OK (correct encoding).

 

So that got me thinking it's a Dancer issue, which led me to sawyer. He 
explained that Dancer tries to detect the encoding of strings, and if it's not 
UTF-8 it will encode it to UTF-8.

He suggested I try decoding my string before returning it to Dancer, which 
worked very well.

 

We ended up wondering why Dancer failed to detect that my string was already 
UTF-8 encoded.

I got the string from a MongoDB query, and then used XML::LibXML to create a 
sitemap with it.

 

I tried to reproduce, but found that if I declare the string in my perl code 
everything works, so it's probably related to the MongoDB query (perhaps mongo 
returns just the bytes, so it wasn't marked as utf-8 and then Dancer failed to 
detect that it was already encoded).

 

Around this step I was happy to have a working sitemap.xml for my website 
(mobileweb.ynonperek.com/sitemap.xml) and moved on :)

 

Cheers,

  Ynon

 

 

 

 

On 12 October 2012 09:10, Gaal Yahas <[email protected]> wrote:

Hold on. The string you already had, the dump of which you gave us, was already 
okay, or close enough to it. What happens if you try just printing it (not 
with Data::Dumper)?

I'm asking because I don't see any UTF-8 specifically, I just see a bunch of 
code points. The string is "הצגת-מפ", which you can easily see by looking up 
some characters in a Unicode table. You didn't show us any evidence of UTF-8 
overencoding; if there was some, we'd be seeing the values 0xd7 0x94 etc. (the 
UTF-8 encoding of the abstract code point U+05d4).
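
To make the distinction concrete, here is a minimal sketch using only the core Encode module. It lists the code points of the dumped string and, separately, the bytes you would see if the first character were UTF-8 encoded. The helper names code_points and utf8_bytes are made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The string from the Data::Dumper output, written as code points.
my $text = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# Hypothetical helper: list the code points of a character string.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# Hypothetical helper: list the bytes of the string's UTF-8 encoding.
sub utf8_bytes {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf '0x%02x', ord($_) } split //, $bytes;
}

print code_points($text), "\n";
print utf8_bytes(substr($text, 0, 1)), "\n";   # 0xd7 0x94
```

If the buffer had held UTF-8 bytes rather than characters, the dump would have shown values like the second line, not \x{5d4}.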

 

I think it's Dumper that was escaping things because it wasn't sure your 
terminal could display them or whatever. Just try "print $buf".

 

 

On Fri, Oct 12, 2012 at 12:40 AM, ynon perek <[email protected]> wrote:

Hi All,

Thanks for all the help. 

 

The problem was in fact the opposite: double encoding (it turned out both 
XML::LibXML and Dancer encode to UTF-8...)

 

I ended up using decode('utf-8') on the data before passing it on, and this 
solved the issue (so now I have encode -> decode -> encode chain... which is 
why abstractions are evil).
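
A minimal sketch of that chain, using only the core Encode module and a single Hebrew letter as stand-in data (the variable names are illustrative and nothing here calls Dancer or the XML library): encoding bytes that are already UTF-8 doubles their length, while decoding first lets the next encode produce the right bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{5d4}";                  # one character: HEBREW LETTER HE
my $bytes = encode('UTF-8', $chars);    # two bytes: 0xd7 0x94

# Encoding the bytes again treats each byte as a Latin-1 character,
# so two bytes become four: the double-encoding symptom.
my $double = encode('UTF-8', $bytes);

# Decoding first turns the bytes back into characters, so a later
# encode (for example by a framework) yields the original bytes.
my $fixed = encode('UTF-8', decode('UTF-8', $bytes));

printf "%d %d %d\n", length($bytes), length($double), length($fixed);
```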

 

Have a great weekend, 

  Ynon

 

On 11 October 2012 18:49, Meir Guttman <[email protected]> wrote:

Hey Gaal,

I would look up Data::Dumper::AutoEncode 
(http://search.cpan.org/~bayashi/Data-Dumper-AutoEncode-0.102/lib/Data/Dumper/AutoEncode.pm).
You can then use ‘eDumper’ rather than Dumper to actually see the letters. This 
package also lets you use any encoding you want. (The default, though, is 
utf8.)

Meir

 

From: [email protected] [mailto:[email protected]] On Behalf Of 
Gaal Yahas
Sent: Thursday, 11 October 2012 17:03
To: Perl in Israel
Subject: Re: [Israel.pm] Encoding Question

 

U+05d4 is HEBREW LETTER HE etc. -- your buffer is already in Unicode.
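
A small sketch of why the decode call fails, assuming only the core Encode module: decode() expects bytes, so handing it a string that already holds characters above 0xFF makes it die (the exact message varies between Encode versions). Going the other way, encode() to cp1255 and back, works fine.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Already a character string, exactly as Data::Dumper showed it.
my $text = "\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}";

# decode() wants bytes; wide characters make it die, so trap the error.
my $decoded = eval { decode('CP1255', $text) };
my $error   = $@;

# encode() turns the characters into cp1255 bytes (one byte each),
# and decoding those bytes gives the original string back.
my $cp1255     = encode('CP1255', $text);
my $round_trip = decode('CP1255', $cp1255);

print "decode failed: ", ($error ? "yes" : "no"), "\n";
print "round trip ok: ", ($round_trip eq $text ? "yes" : "no"), "\n";
```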

On Thu, Oct 11, 2012 at 4:51 PM, ynon perek <[email protected]> wrote:

Hi All,

 

Quick encoding question: I have a text string that I think is in cp1255, 
because when I print it with Data::Dumper I get:

 

\x{5d4}\x{5e6}\x{5d2}\x{5ea}-\x{5de}\x{5e4}




But, when I try to decode it using:

 

my $decoded = decode('CP1255', $text);

 

I get this error:

 
 
Wide character in subroutine entry at 
/Users/ynonperek/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/darwin-2level/Encode.pm
 line 174, <DATA> line 16.

Ideas ?

 

-- 


Write talks? Speak in front of an audience? My blog  <http://publicspeakr.blogspot.com/> 
Learning to Speak is written especially for you.

 


_______________________________________________
Perl mailing list
[email protected]
http://mail.perl.org.il/mailman/listinfo/perl





 

-- 
Gaal Yahas <[email protected]>
http://gaal.livejournal.com/

