Re: Reading ASCII file with some codes above 127 (exten ascii)

Paul Wed, 23 May 2012 12:13:35 -0700

On Wednesday, 23 May 2012 at 19:01:53 UTC, Graham Fawcett wrote:

On Wednesday, 23 May 2012 at 18:43:04 UTC, Paul wrote:
On Wednesday, 23 May 2012 at 18:04:56 UTC, Graham Fawcett wrote:
On Wednesday, 23 May 2012 at 15:48:20 UTC, Paul wrote:
On Monday, 14 May 2012 at 12:58:20 UTC, Graham Fawcett wrote:
On Sunday, 13 May 2012 at 21:03:45 UTC, Paul wrote:
I am reading a file that has a few extended ASCII codes (e.g. degree symbol). Depending on how I read the file in and what I do with it, the error shows up at different points. I'm pretty sure it all boils down to these extended ASCII codes.
Can I just tell dmd that I'm reading a Latin1 or ISO8859-1 file? I've messed with the std.encoding module but really can't figure out what I need to do.
There must be a simple solution to this.
This seems to work:


import std.stdio, std.file, std.encoding;

void main()
{
    auto latin = cast(Latin1String) read("/tmp/hi.8859");
    string s;
    transcode(latin, s);
    writeln(s);
}


Graham
I thought I was in good shape with your above suggestion. It does help me read and process text. But when I go to print it out I have problems.
Here is my input file:
°F

Here is my code:
import std.stdio;
import std.string;
import std.file;
import std.encoding;

// Main function
void main(){
 auto fout = File("out.txt","w");
 auto latinS = cast(Latin1String) read("in.txt");
 string uniS;
 transcode(latinS, uniS);
 foreach(line; uniS.splitLines()){
    transcode(line, latinS);
    fout.writeln(line);
    fout.writeln(latinS);
 }
}

Here is the output:
Â°F
[cast(immutable(Latin1Char))176,cast(immutable(Latin1Char))70]
If I print the Unicode string I get an extra weird character.
If I print the Unicode string retranslated to Latin1, I get weird pseudo-code.
Can you help?
I tried the program and it seemed to work for me.
What program are you using to read "out.txt"? Are you sure it supports UTF-8, and knows to open the file as UTF-8? (This looks suspiciously like a tool's attempt to misinterpret a UTF-8 string as Latin-1.)
If you're on a Unix system, what does "file in.txt out.txt" report?
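To illustrate the misinterpretation suspected here, a minimal sketch follows. The helper name misreadAsLatin1 is made up for this example; it takes the UTF-8 bytes of a string and decodes them as if each byte were one Latin-1 character, which should reproduce the extra character seen in out.txt.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

// Hypothetical helper: treat the UTF-8 bytes of `utf8` as Latin-1
// (one character per byte) and return the result as a UTF-8 string.
string misreadAsLatin1(string utf8)
{
    string result;
    transcode(cast(Latin1String) utf8, result);
    return result;
}

void main()
{
    // In a D string the degree sign (U+00B0) is stored as two UTF-8 bytes.
    string degF = "\u00B0F";
    writeln(cast(immutable(ubyte)[]) degF); // [194, 176, 70]

    // Byte 194 is "Â" in Latin-1, and byte 176 is "°".
    writeln(misreadAsLatin1(degF));         // Â°F
}
```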
Graham
Hmmm.  I'm not communicating well.
I want to read and write ASCII. The only reason I'm converting to Unicode is because D needs it (as I understand).
Yes if I open Â°F in notepad++ and tell notepad++ that it isUTF-8, it shows °F.
I want to:
1) Read an ascii file that may have codes above 127.
2) Convert to unicode so D funcs like .splitLines() can work with it.
3) Convert back to ascii so that stuff like °F writes out as it was read in.
If I open in.txt and out.txt in an ascii editor, °F should look the same in both files with the editor encoding the files as ANSI/ASCII. I thought my program was doing just that.
Thanks for your assistance.
To make sure we're on the same page -- ASCII is a 7-bit encoding, and any character above 127 is by definition not an ASCII character. At that point we're talking about an encoding other than ASCII, such as UTF-8 or Latin-1.
If you're reading a file that has bytes > 127, you really have no choice but to specify (assume?) an encoding, Latin-1 for example. There's no guarantee your input file is Latin-1, though, and garbage-in will result in garbage-out.
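As a rough way to see whether a file really is plain ASCII, one could count the bytes above 127. This is only a sketch: countNonAscii is a hypothetical helper, and the sample bytes stand in for the contents of in.txt, assuming it was saved as Latin-1. A nonzero count says the data is not ASCII, but it cannot say which encoding was actually used.

```d
import std.algorithm : count;
import std.stdio : writeln;

// Hypothetical helper: number of bytes outside the 7-bit ASCII range.
size_t countNonAscii(const(ubyte)[] data)
{
    return data.count!(b => b > 127);
}

void main()
{
    // Assumed sample: "°F" plus a newline, as it would be stored in Latin-1.
    ubyte[] sample = [0xB0, 0x46, 0x0A];
    writeln(countNonAscii(sample), " byte(s) above 127");
}
```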
So I think what you're trying to do is

1. read a Latin-1 file, into unicode (internally in D)
2. do splitLines(), etc., generating some result
3. Convert the result back to latin-1, and output it.

Is that right?
Graham


Exactly.
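The three steps confirmed above could be sketched roughly as follows. This is not a tested answer from the thread; roundTrip is a hypothetical helper, and main writes its own sample in.txt (assumed to be Latin-1) so the example is self-contained. The main difference from the earlier program is that the Latin-1 result is written as raw bytes with std.file.write; passing a Latin1String to writeln prints the array's element-by-element representation instead.

```d
import std.encoding : Latin1String, transcode;
import std.file : read, write;
import std.string : splitLines;

// Hypothetical helper: decode Latin-1, process line by line, re-encode.
Latin1String roundTrip(Latin1String input)
{
    // 1. Latin-1 -> UTF-8, so that string functions can work on the text.
    string text;
    transcode(input, text);

    // 2. Work on the text. Here each line is copied through unchanged,
    //    which also normalizes every line ending to "\n".
    string result;
    foreach (line; text.splitLines())
        result ~= line ~ "\n";

    // 3. UTF-8 -> Latin-1 for output.
    Latin1String output;
    transcode(result, output);
    return output;
}

void main()
{
    // Assumed sample input: the Latin-1 bytes for "°F" and a newline.
    immutable(ubyte)[] sample = [0xB0, 0x46, 0x0A];
    write("in.txt", sample);

    auto input = cast(Latin1String) read("in.txt");
    // Write the raw Latin-1 bytes rather than formatting them as text.
    write("out.txt", roundTrip(input));
}
```

With this approach out.txt should contain the same single byte 176 for the degree sign as in.txt, so an editor set to ANSI/Latin-1 would show °F for both files.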
