Re: Is perl unicode or not?

Nick Ing-Simmons Sun, 13 Oct 2002 07:11:48 -0700

Nadim <[EMAIL PROTECTED]> writes:
>Hi, I guess the following question has already been answered somewhere but I 
>couldn't find it in the archive. A search feature for the archive would be 
>great. After reading 20-30 mails I started to get tired of it. Has anyone 
>seen a mail-archive crawler which can do a search? (That sounds almost 
>like a fun script to write.)
>
>More seriously. I am using 5.6.3 on windows from activestate. I do the 
>following.


I don't think you are. As far as I am aware there is only perl5.6.1;
there isn't a .3 subversion yet.


>>>>>>>
>my $ole_object = ..... ;
>my $unicode_string = $ole_object->GetUnicodeString() ;

OLE objects are a Win32 thing. You would be better off asking on 
one of the Win32 aware ActiveState lists. We would at least need 
to know how you created $ole_object so we can lookup the code 
that gets the string.

>
>print length($unicode_string), "\n" ;
># prints 17, which is the length of the unicode string

Cool - but are you sure you got the real string?

>
>use bytes () ;
>print bytes::length($unicode_string), "\n" ;
># prints 17, wow, the string is japanese I expect 34

The bytes:: hackery is _very_ confusing to all concerned.
It returns the number of bytes the string happens to occupy in perl's 
internal encoding. That may be either iso-8859-1 or UTF-8. If the original 
"japanese" happened to be all iso-8859-1 characters, even though it used to 
be 2-bytes/char, it will be held (normally) by perl as 1-byte per-char.
You will also get 1-byte/char if (as I suspect is happening here)
->GetUnicodeString has converted things it does not understand to '?'.
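
To make the difference concrete, here is a minimal sketch (assuming 
perl5.8 or later; the variable names and sample strings are made up for 
illustration) that compares length() with bytes::length():

```perl
use strict;
use warnings;
use bytes ();   # loads bytes::length without switching on byte semantics

# Every character is below 0x100, so perl normally keeps 1 byte per character.
my $latin = "caf\x{e9}";

# Characters above 0xFF force the UTF-8 internal form.
my $kana = "\x{30c6}\x{30b9}\x{30c8}";   # three katakana characters

# What a failed conversion might hand back: plain ASCII question marks.
my $placeholder = '?' x 17;

for my $pair ([latin => $latin], [kana => $kana], [placeholder => $placeholder]) {
    my ($name, $string) = @$pair;
    printf "%-12s %2d chars, %2d bytes\n",
        $name, length($string), bytes::length($string);
}
# latin         4 chars,  4 bytes
# kana          3 chars,  9 bytes
# placeholder  17 chars, 17 bytes
```

Note that real Japanese text held as UTF-8 comes out at 3 bytes per 
character here, not 2, so 17 bytes for 17 characters points towards 
something other than Japanese being in the string.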

>
>print $unicode_string ;
># prints ??????????????? on the console

Hmm - as perl5.6 does not have "smart" Unicode IO (perl5.8 does), 
this suggests that the string is actually '?' x 17 - i.e. you got "junk"
back from the OLE call.
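
One way to check that suspicion is to look at the code points rather than 
at what the console shows. A rough sketch, where code_points() is a 
hypothetical helper and the string is a stand-in for whatever the OLE 
call really returned:

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $string;
}

# Stand-in for the value returned by the OLE call (an assumption).
my $unicode_string = '?' x 17;

print code_points($unicode_string), "\n";

# U+003F is a literal question mark; a string made only of those is junk.
if ($unicode_string =~ /\A\?+\z/) {
    print "string contains nothing but '?' characters\n";
}
```

If the dump shows code points above U+00FF then the string is fine and 
only the output step is losing information.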


>
>print FILE $unicode_string ;
># prints ??????????????? in the file
><<<<<<

Likewise.

>
>What the script is to do is:
>1/ get a unicode string from an Ole object

   Contact an OLE expert and find out if that works.

>2/ read a unicode string from a file

   For perl5.6 the file has to be in UTF-8 and you need to do some hackery
   (which was so horrible I can't recall it).
   For perl5.8 this is easy - it was a major goal of perl5.8.
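
   For the perl5.8 case, a minimal sketch using an :encoding layer on 
   open(). It assumes the file is UTF-8; the temporary file and its 
   contents exist only to make the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for the real input file (an assumption).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh) or die "Cannot close $path: $!";

# Write one line of UTF-8 so there is something to read back.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} "\x{30c6}\x{30b9}\x{30c8}\n";
close($out) or die "Cannot close $path: $!";

# The layer decodes the bytes in the file into characters as they are read.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
my $line = <$in>;
close($in) or die "Cannot close $path: $!";
chomp $line;

print length($line), "\n";   # counts characters, not bytes
```

   For a file in some other encoding the name inside :encoding(...) would 
   change accordingly.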
   
>3/ compare both strings and act upon the comparison

   Once you have two Unicode strings this is easy.
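
   A small sketch of why it is easy, assuming perl5.8 or later: eq 
   compares characters, so it does not matter how each string happens 
   to be stored internally. The variable names are hypothetical.

```perl
use strict;
use warnings;
use bytes ();   # only for bytes::length, to expose the internal storage

my $from_ole  = "caf\x{e9}";   # normally held as 1 byte per character
my $from_file = "caf\x{e9}";
utf8::upgrade($from_file);     # force the UTF-8 internal form

# Same characters, different storage.
printf "bytes: %d vs %d\n",
    bytes::length($from_ole), bytes::length($from_file);

print $from_ole eq $from_file ? "strings match\n" : "strings differ\n";
```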
   
>
>if the string I get from ole _is_ unicode (and it seems so) 

What leads you to that conclusion?

>how can I 
>flatten it to binary? I tried with unpack without success.
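
If "flatten to binary" means getting 2 bytes per character, one possible 
approach on perl5.8 is the Encode module. This is only a sketch: the 
choice of UTF-16LE is an assumption about what the receiving side wants, 
and the sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample character string (an assumption): three katakana characters.
my $chars = "\x{30c6}\x{30b9}\x{30c8}";

# Turn characters into a byte string, 2 bytes each for these code points.
my $octets = encode('UTF-16LE', $chars);

printf "%d chars -> %d bytes\n", length($chars), length($octets);
print unpack('H*', $octets), "\n";   # hex dump of the raw bytes
```

By the same arithmetic a 17 character string would give the 34 bytes 
the original question expected, provided all its characters fit in 
16 bits.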



>
>Nadim.
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
