I am working on XP. If I leave the active code page as default, when I
do dir, I get question marks for the file name. If I change the code
page to 862, for example, I get accented Latin characters.

However, no matter what I do in Perl, I get real question marks back. I
know that because I dumped the values with ord(): each one is ASCII 63.
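
A minimal sketch of that check, for anyone who wants to reproduce it. The
helper name dump_ords is made up for illustration, and the directory
defaults to the current one unless a path is given on the command line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a name as space-separated decimal values, one per character.
# (dump_ords is a hypothetical helper name, not part of any module.)
sub dump_ords {
    my ($name) = @_;
    return join ' ', map { ord } split //, $name;
}

# Assumption: the directory to inspect is the first argument, else ".".
my $dir = @ARGV ? $ARGV[0] : '.';

opendir(my $dh, $dir) or die "opendir $dir: $!";
for my $name (readdir $dh) {
    # A name that really consists of question marks prints as "63 63 63 ...".
    print dump_ords($name), "\n";
}
closedir $dh;
```

If a name comes out as a run of 63s, the question marks are already in the
data that readdir returned and are not a display problem.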


On Thu, 2005-02-24 at 15:23 +0100, Guido Flohr wrote:
> Hi,
> sorry, my original reply (see below) went to the sender, not to the list.
> Peter Gordon wrote:
> > I am using ActiveState Perl 5.008006.
> > 
> > I am trying on Hebrew filenames at the moment, but the program will need
> > to run on all languages.
> The language does not matter, it is the charset.  Hebrew can be coded in 
> Unicode/UTF-8 or iso-8859-8 or cp-whatever.  You really have to find out 
> which charset your file system uses.
> > I tried "use bytes" and still get back question marks. 
> What is "back" and what are the "question marks"? Do you see "back" (the 
> output of your script) in your terminal window/DOS box or in an output 
> file? And are there really question marks or are they not displayed 
> correctly?
> Does your script throw warnings? Do you "use warnings"?
> > That's all the information that I have.
> The information about the charset used in your input data is required. A 
> simple way to find that out goes like this:
> #! /usr/bin/perl
> use strict;
> use warnings;
> use bytes;
> opendir DIR, "/path/to/dir" or die "opendir: $!";
> my @files = readdir DIR;
> open HANDLE, ">filelist.html" or die "open filelist.html: $!";
> print HANDLE "<html><body><ul>\n";
> foreach (@files) {
>       print HANDLE "<li>$_</li>\n";
> }
> print HANDLE "</ul></body></html>\n";
> close HANDLE or die "close filelist.html: $!";
> __END__
> Provided that you have changed the path argument to opendir, this
> will create a "filelist.html" in the current directory.  Open 
> that file in a browser and then change the encoding to some western 
> european charset like iso-8859-1 or windows-1252.  In Mozilla this is 
> View->Character Encoding->...
> When you see question marks here, then they are real, i.e. something 
> (readdir, the OS?) has converted the input to question marks.  Otherwise 
> you should see accented western european characters instead of Hebrew.
> Now change the encoding to utf-8/Unicode.  Question marks? Then it is 
> _not_ Unicode.
> Change it to some Hebrew character set.  You see Hebrew? Then you have 
> an 8 bit Hebrew character set, probably IBM-862 or ISO-8859-8.
> Both utf-8 and 8 bit character sets only show question marks or empty 
> boxes? Then your font probably lacks the Hebrew glyphs.
> You can make the test again with "use utf8" and compare the results.
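
The comparison described above can also be sketched without a browser,
assuming the Encode module (core since Perl 5.8) is available. The helper
name decode_as and the three sample bytes are illustrative only; real input
would be the raw bytes of a file name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Decode raw bytes under a given charset and return the resulting
# code points as "U+XXXX" strings. (decode_as is a hypothetical helper.)
sub decode_as {
    my ($charset, $bytes) = @_;
    my $text = decode($charset, $bytes);
    return join ' ', map { sprintf 'U+%04X', ord } split //, $text;
}

# Illustrative sample bytes, not taken from a real directory listing.
my $bytes = "\x80\x81\x82";

# Try each candidate charset and compare the code points by eye.
for my $charset (qw(cp862 iso-8859-8 iso-8859-1 UTF-8)) {
    printf "%-11s %s\n", $charset, decode_as($charset, $bytes);
}
```

Code points in the range U+05D0 to U+05EA are Hebrew letters, so whichever
charset yields them is a plausible candidate for the file names. If the
bytes are all 0x3F, every charset gives U+003F and the question marks are
real.
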
> What is your script supposed to do? If you just want to pass data from 
> here to there, you have no problem.  But if you want to process it 
> together with data from other languages, you have to make sure that all 
> data is converted to Unicode internally.
> Guido
> My original reply below:
> >>>The problem is, that filenames, when using opendir, are returned as
> >>>question marks. In the DOS box I have set the codepage to 862. So DIR
> >>>returns accented characters, but Perl still returns question marks. I
> >>>have also set "use utf8", but that didn't help either.
> >>
> >>Are the filenames really in UTF-8? If not, you would need "use bytes" 
> >>instead of "use utf8".  If that does not help, you should give more 
> >>detailed information: Which Perl version? Which character sets are 
> >>actually used in the filenames?
> >>
> >>
> >>>So the problem I have is how to proceed. Should I give up with Perl and
> >>>use Java or C? Any suggestions gratefully received.
> >>
> >>Do you want to blackmail us? ;-)
> >>
> >>Regards,
> >>Guido
