Just a quick reply (it's bedtime over here): there may be two problems. The first is that the mail program put an unwanted line break after the =~ part; just remove it, it should all be one line. The second: you'll need a fairly recent version of perl for it to work. What do you get when you do perl --version? I guess for UTF-8 to work, it should be at least 5.8.0. Your basic idea of the usage is right (I'm not a Windows person, but I assume it should be the same): save the script as utf2tex.pl, make it executable and call it as utf2tex.pl FILENAME.txt.
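
To make the usage concrete, here is a minimal sketch of what a script of this kind could look like. It is not the utf2tex.pl sent earlier in the thread (that script is not reproduced here); the hash name %latin_for and the helper transliterate are made up for illustration, and the mapping is simply the table from the quoted message. It assumes Perl 5.8.0 or later, because it relies on the :encoding(UTF-8) I/O layer.

```perl
#!/usr/bin/perl
# Sketch only: an illustration, not the original utf2tex.pl from this thread.
use 5.008;    # I/O layers such as :encoding(UTF-8) need Perl 5.8.0 or later
use strict;
use warnings;

# Mapping copied from the table in the quoted message
# (Unicode code point => Latin transcription).
my %latin_for = (
    "\x{0627}" => 'A',    # alif
    "\x{0628}" => 'b',    # ba
    "\x{062C}" => 'j',    # jim
    "\x{062F}" => 'd',    # dal
    "\x{0647}" => 'h',    # ha
    "\x{0648}" => 'w',    # waw
    "\x{0632}" => 'z',    # zay
);

# Hypothetical helper: replace every mapped character and
# leave all other characters (including newlines) untouched.
sub transliterate {
    my ($text) = @_;
    $text =~ s/(.)/exists $latin_for{$1} ? $latin_for{$1} : $1/gse;
    return $text;
}

# When a file name is given, decode the file as UTF-8 and print
# the transliterated text to standard output.
if (@ARGV) {
    my ($file) = @ARGV;
    open(my $in, '<:encoding(UTF-8)', $file)
        or die "Cannot open $file: $!";
    binmode(STDOUT, ':encoding(UTF-8)');
    while (my $line = <$in>) {
        print transliterate($line);
    }
    close($in);
}
```

On Windows this would be called the way Idris tried it, for example perl utf2tex.pl unicode-utf.txt > output.txt, where output.txt is just a placeholder name. Keeping the mapping in one hash means the reverse conversion only needs the hash inverted.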
I guess it would be easiest to convert the utf to ascii directly - that would mean you could later convert it back. I have a set of scripts that do just that -- convert babel Greek into utf-8 and back. If you need more help, I'll look into it tomorrow!

Best

Thomas

On Sat, 2004-06-05 at 23:33, Idris Samawi Hamid wrote:
> On Sat, 05 Jun 2004 22:41:39 +0200, Thomas A. Schmitz
> <[EMAIL PROTECTED]> wrote:
>
> > Idris,
> >
> > I know a bit of perl and would love to help. However, I fear that
> > sending us your stuff via mail will be a bit difficult because the utf-8
> > characters get transformed into gibberish.
>
> Thnx 4 such a speedy reply! I don't think you are getting gibberish
> though; you should be getting the extended ascii representation. So the
> letter alif (hex 0627) should look like this:
>
> Ø§
>
> Do you get a forward-slashed circle and a section symbol? If so, that's
> the ascii representation I'm trying to convert to the letter `A'.
>
> Here are the codes you want:
>
> Ø§ [0627] => A
>
> Ø¨ [0628] => b
>
> Ø¬ [062C] => j
>
> Ø¯ [062F] => d
>
> Ù‡ [0647] => h
>
> Ùˆ [0648] => w
>
> Ø² [0632] => z
>
> Let me explain my situation more clearly:-)
>
> I have a unicode editor, Unitype Global Writer. I save a unicode document
> as a utf *.txt file. When I open that saved file in my TeX editor
> (WinEdt), it comes out as extended ascii (that's the "gibberish"). So what
> I wanted to do was convert the ascii "gibberish" to my Latin
> transcription. It seems that what you are suggesting is to use the hex
> representation and convert the unicode txt file into a Latin transcription
> file directly and bypass the gibberish.
>
> On your perl file, can you give me an example of how to use it? I tried
> (in windows, with name
> utf2tex.pl and unicode text in unicode-utf.txt) and get
>
> =========================
>
> perl utf2tex.pl unicode-utf.txt
> Unknown discipline class ':utf8' at C:/Perl/lib/open.pm line 18.
> BEGIN failed--compilation aborted at utf2tex.pl line 4.
> =========================
>
> from your script I tried, e.g.
>
> ============================
> $_ =~
> s/\x{0627}/\x{0041}/esg;
> # from alif to `A'
> ============================
>
> Your guidance will be greatly appreciated!
>
> Thnx a million!
> Idris

_______________________________________________
ntg-context mailing list
[EMAIL PROTECTED]
http://www.ntg.nl/mailman/listinfo/ntg-context