Re: Reading ASCII file with some codes above 127 (exten ascii)

Regan Heath Fri, 25 May 2012 02:09:00 -0700

On Wed, 23 May 2012 22:02:25 +0100, Paul <[email protected]> wrote:

This works, though it's ugly:
    foreach(line; uniS.splitLines()) {
       transcode(line, latinS);
       fout.writeln((cast(char[]) latinS));
    }
The Latin1String type, at the storage level, is a ubyte[]. By casting to char[], you can get a similar-to-string thing that writeln() can handle.
Graham
Awesome!  What a lesson! Thank you!
So if anyone is following this thread, here's my code now. This reads a text file (encoded in Latin1, which is basic ascii with extended ascii codes), allows D to work with it in unicode, and then spits it back out as Latin1.
I wonder about the speed between this method and Era's home-spun solution?
import std.stdio;
import std.string;
import std.file;
import std.encoding;

// Main function
void main(){
     auto fout = File("out.txt","w");
     auto latinS = cast(Latin1String) read("in.txt");
     string uniS;
     transcode(latinS, uniS);
     foreach(line; uniS.splitLines()){
        transcode(line, latinS);
        fout.writeln((cast(char[]) latinS));
     }
}

The only thing which would worry me about this code is the cast(char[]) in the final writeln.. I know some parts of phobos verify the char data is correct UTF-8 and this line casts latin-1 to char[] which can potentially create invalid UTF-8 data. That said, I had a really quick look at the phobos code for File.writeln and I'm not sure whether this function does any UTF-8 validation. I would be happier if the latin-1 was written as a stream of bytes with no assumed interpretation, IMO.
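
Something along these lines is what I have in mind; treat it as a rough sketch rather than tested advice. It keeps the same read/transcode steps as the code above but hands the Latin-1 data to File.rawWrite as ubyte data instead of casting it to char[]. The helper names latin1Bytes and copyAsLatin1 are made up for illustration, and the file is opened in "wb" mode with an explicit "\n" on the assumption that plain LF line endings are wanted.

```d
import std.encoding : Latin1String, transcode;
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

// Hypothetical helper: transcode a UTF-8 string to Latin-1 and expose the
// result as plain bytes, so nothing downstream treats it as UTF-8.
immutable(ubyte)[] latin1Bytes(string s)
{
    Latin1String latin;
    transcode(s, latin);
    return cast(immutable(ubyte)[]) latin;
}

// Hypothetical helper: read a Latin-1 file, work on it as UTF-8 line by
// line, and write each line back out as raw Latin-1 bytes.
void copyAsLatin1(string inPath, string outPath)
{
    // "wb" so that no newline translation touches the bytes.
    auto fout = File(outPath, "wb");
    auto latinS = cast(Latin1String) read(inPath);
    string uniS;
    transcode(latinS, uniS);

    foreach (line; uniS.splitLines())
    {
        // rawWrite emits the buffer as-is, with no text interpretation.
        fout.rawWrite(latin1Bytes(line));
        fout.rawWrite("\n");
    }
}

void main(string[] args)
{
    // Usage (assumed): prog in.txt out.txt
    if (args.length == 3)
        copyAsLatin1(args[1], args[2]);
}
```

The only cast left is from Latin1Char data to ubyte data, which does not claim the bytes are UTF-8, so any validation that might exist on the char[] path never comes into play.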


R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
