I am working on Windows XP. If I leave the active code page at its default and run dir, I get question marks for the file names. If I change the code page to 862, for example, I get accented Latin characters.
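The check on the Perl side can be sketched roughly as follows. This is only a minimal sketch under assumptions, not my exact script: the helper name dump_bytes is made up for illustration, and the directory defaults to the current one unless a path is given as the first argument. It prints the ord() value of every byte in each name that readdir returns, so a real question mark shows up as 63 regardless of what the console font or code page displays.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the ord() value
# of every byte in a file name as a list of integers.
sub dump_bytes {
    my ($name) = @_;
    return map { ord($_) } split //, $name;
}

# Assumption: the directory to inspect is the first argument,
# falling back to the current directory.
my $dir = @ARGV ? $ARGV[0] : '.';

opendir my $dh, $dir or die "opendir $dir: $!";
for my $name (readdir $dh) {
    next if $name eq '.' or $name eq '..';
    # One line per file: the numeric value of each byte in the name.
    print join(' ', dump_bytes($name)), "\n";
}
closedir $dh;
```

Looking at numbers rather than glyphs takes the display out of the picture: a name made up entirely of 63s contains literal question marks, while values above 127 would mean the bytes survived and only the rendering is wrong.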
However, no matter what I do in Perl, I get real question marks back. I know that because I dumped the values with ord(). Each one is ASCII 63.

Peter

On Thu, 2005-02-24 at 15:23 +0100, Guido Flohr wrote:
> Hi,
>
> sorry, my original reply (see below) went to the sender, not to the list.
>
> Peter Gordon wrote:
> > I am using ActiveState Perl 5.008006.
> >
> > I am trying on Hebrew filenames at the moment, but the program will need
> > to run on all languages.
>
> The language does not matter, it is the charset. Hebrew can be coded in
> Unicode/UTF-8 or iso-8859-8 or cp-whatever. You really have to find out
> which charset your file system uses.
>
> > I tried "use bytes" and still get back question marks.
>
> What is "back" and what are the "question marks"? Do you see "back" (the
> output of your script) in your terminal window/DOS box or in an output
> file? And are there really question marks or are they not displayed
> correctly?
>
> Does your script throw warnings? Do you "use warnings"?
>
> > That's all the information that I have.
>
> The information about the charset used in your input data is required. A
> simple way to find that out goes like this:
>
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use bytes;
>
> opendir DIR, "/path/to/dir" or die "opendir: $!";
> my @files = readdir DIR;
>
> open HANDLE, ">filelist.html" or die "open filelist.html: $!";
> print HANDLE "<html><body><ul>\n";
> foreach (@files) {
>     print HANDLE "<li>$_</li>\n";
> }
> print HANDLE "</ul></body></html>\n";
> __END__
>
> Provided that you have changed the path argument to opendir in line
> 7 this will create a "filelist.html" in the current directory. Open
> that file in a browser and then change the encoding to some western
> european charset like iso-8859-1 or windows-1252. In Mozilla this is
> View->Character Encoding->...
>
> When you see question marks here, then they are real, i.e. something
> (readdir, the OS?) has converted the input to question marks. Otherwise
> you should see accented western european characters instead of Hebrew.
>
> Now change the encoding to utf-8/Unicode. Question marks? Then it is
> _not_ Unicode.
>
> Change it to some Hebrew character set. You see Hebrew? Then you have
> an 8 bit Hebrew character set, probably IBM-862 or ISO-8859-8.
>
> Both utf-8 and 8 bit character sets only show question marks or empty
> boxes? Then your font probably lacks the Hebrew glyphs.
>
> You can make the test again with "use utf8" and compare the results.
>
> What is your script supposed to do? If you just want to pass data from
> here to there, you have no problem. But if you want to process it
> together with data from other languages, you have to make sure that all
> data is converted to Unicode internally.
>
> Guido
>
> My original reply below:
>
> >>>The problem is, that filenames, when using opendir, are returned as
> >>>question marks. In the DOS box I have set the codepage to 862. So DIR
> >>>returns accented characters, but Perl still returns question marks. I
> >>>have also set "use utf8", but that didn't help either.
> >>
> >>Are the filenames really in UTF-8? If not, you would need "use bytes"
> >>instead of "use utf8". If that does not help, you should give more
> >>detailed information: Which Perl version? Which character sets are
> >>actually used in the filenames?
> >>
> >>
> >>>So the problem I have is how to proceed. Should I give up with Perl and
> >>>use Java or C? Any suggestions gratefully received.
> >>
> >>Do you want to blackmail us? ;-)
> >>
> >>Regards,
> >>Guido
>
>
-- 
Peter Gordon
Phone: +972 544 438029
Web: www.pg-consultants.com