I am working on Windows XP. If I leave the active code page at its
default, dir shows question marks for the file names. If I change the
code page to 862, for example, I get accented Latin characters.
However, no matter what I do in Perl, I get real question marks back. I
know that because I dumped the values with ord(): each one is ASCII 63.
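
A minimal sketch of such an ord() check, assuming the files are in the
current directory. The helper name codes() is made up for illustration
and is not from any module:

```perl
#! /usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: return the character codes of a string,
# separated by single spaces.
sub codes {
    my ($name) = @_;
    return join ' ', map { ord($_) } split //, $name;
}

# Assumption: the directory to inspect is the current one.
opendir my $dh, '.' or die "opendir: $!";
foreach my $name (readdir $dh) {
    # One line of codes per directory entry.
    print codes($name), "\n";
}
closedir $dh;
```

A name that comes back as a run of 63s has already been replaced by
literal question marks before the script sees it, so no change to the
console code page can bring the original characters back.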
Peter
On Thu, 2005-02-24 at 15:23 +0100, Guido Flohr wrote:
> Hi,
>
> sorry, my original reply (see below) went to the sender, not to the list.
>
> Peter Gordon wrote:
> > I am using ActiveState Perl 5.008006.
> >
> > I am trying on Hebrew filenames at the moment, but the program will need
> > to run on all languages.
>
> The language does not matter; the charset does. Hebrew can be encoded
> in Unicode/UTF-8, iso-8859-8 or cp-whatever. You really have to find
> out which charset your file system uses.
>
> > I tried "use bytes" and still get back question marks.
>
> What is "back" and what are the "question marks"? Do you see "back" (the
> output of your script) in your terminal window/DOS box or in an output
> file? And are there really question marks or are they not displayed
> correctly?
>
> Does your script throw warnings? Do you "use warnings"?
>
> > That's all the information that I have.
>
> The information about the charset used in your input data is required. A
> simple way to find that out goes like this:
>
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use bytes;
>
> opendir DIR, "/path/to/dir" or die "opendir: $!";
> my @files = readdir DIR;
> closedir DIR;
>
> # Write the raw file names, byte for byte, into an HTML list.
> open HANDLE, ">filelist.html" or die "open filelist.html: $!";
> print HANDLE "<html><body><ul>\n";
> foreach (@files) {
>     print HANDLE "<li>$_</li>\n";
> }
> print HANDLE "</ul></body></html>\n";
> close HANDLE or die "close filelist.html: $!";
> __END__
>
> Provided that you have changed the path argument to opendir in line
> 7, this will create a "filelist.html" in the current directory. Open
> that file in a browser and then change the encoding to some Western
> European charset like iso-8859-1 or windows-1252. In Mozilla this is
> View->Character Encoding->...
>
> If you see question marks here, then they are real, i.e. something
> (readdir, the OS?) has converted the input to question marks. Otherwise
> you should see accented Western European characters instead of Hebrew.
>
> Now change the encoding to utf-8/Unicode. Question marks? Then it is
> _not_ Unicode.
>
> Change it to some Hebrew character set. You see Hebrew? Then you have
> an 8 bit Hebrew character set, probably IBM-862 or ISO-8859-8.
>
> Both utf-8 and 8 bit character sets only show question marks or empty
> boxes? Then your font probably lacks the Hebrew glyphs.
>
> You can make the test again with "use utf8" and compare the results.
>
> What is your script supposed to do? If you just want to pass data from
> here to there, you have no problem. But if you want to process it
> together with data from other languages, you have to make sure that all
> data is converted to Unicode internally.
>
> Guido
>
> My original reply below:
>
> >>>The problem is that filenames, when using opendir, are returned as
> >>>question marks. In the DOS box I have set the codepage to 862. So DIR
> >>>returns accented characters, but Perl still returns question marks. I
> >>>have also set "use utf8", but that didn't help either.
> >>
> >>Are the filenames really in UTF-8? If not, you would need "use bytes"
> >>instead of "use utf8". If that does not help, you should give more
> >>detailed information: Which Perl version? Which character sets are
> >>actually used in the filenames?
> >>
> >>
> >>>So the problem I have is how to proceed. Should I give up with Perl and
> >>>use Java or C? Any suggestions gratefully received.
> >>
> >>Do you want to blackmail us? ;-)
> >>
> >>Regards,
> >>Guido
>
>
--
Peter Gordon
Phone: +972 544 438029
Email: [EMAIL PROTECTED]
Web: www.pg-consultants.com