On Wed, 23 May 2012 22:02:25 +0100, Paul <[email protected]> wrote:
This works, though it's ugly:


    foreach(line; uniS.splitLines()) {
       transcode(line, latinS);
       fout.writeln((cast(char[]) latinS));
    }

The Latin1String type, at the storage level, is a ubyte[]. By casting to char[], you can get a similar-to-string thing that writeln() can handle.

Graham

Awesome! What a lesson! Thank you!

So if anyone is following this thread, here's my code now. It reads a text file encoded in Latin-1 (plain ASCII plus the extended characters in the upper half of the byte range), lets D work with it as Unicode, and then writes it back out as Latin-1.

I wonder about the speed between this method and Era's home-spun solution?

import std.stdio;
import std.string;
import std.file;
import std.encoding;

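// Read a Latin-1 file, work with it as UTF-8, write it back as Latin-1.
void main(){
     auto fout = File("out.txt","w");
     // Treat the raw bytes of the input file as Latin-1 text.
     auto latinS = cast(Latin1String) read("in.txt");
     string uniS;
     // Latin-1 -> UTF-8, so the usual string functions can be used.
     transcode(latinS, uniS);
     foreach(line; uniS.splitLines()){
        // UTF-8 -> Latin-1, one line at a time.
        transcode(line, latinS);
        // The cast only makes the Latin-1 bytes acceptable to writeln().
        fout.writeln((cast(char[]) latinS));
     }
}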

The only thing that would worry me about this code is the cast(char[]) in the final writeln. I know some parts of Phobos verify that char data is valid UTF-8, and this line casts Latin-1 to char[], which can produce invalid UTF-8 data. That said, I had a really quick look at the Phobos code for File.writeln and I'm not sure whether it does any UTF-8 validation. I would be happier if the Latin-1 were written as a stream of bytes with no assumed interpretation, IMO.
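
A minimal sketch of the byte-stream approach, assuming File.rawWrite is acceptable for the output and that the file names are the same as in the code above. toLatin1Bytes is a hypothetical helper name used only for illustration; it is not part of Phobos. The Latin-1 data is handed over as ubyte data, so no char[] ever holds non-UTF-8 content:

```d
import std.encoding : Latin1String, transcode;
import std.file : read;
import std.stdio : File;
import std.string : splitLines;

// Hypothetical helper (not in Phobos): re-encode one UTF-8 line as
// Latin-1 and return it as plain bytes rather than as char data.
immutable(ubyte)[] toLatin1Bytes(string line)
{
    Latin1String latin;
    transcode(line, latin);
    return cast(immutable(ubyte)[]) latin;
}

void main()
{
    // "wb" so that no newline translation is applied to the output.
    auto fout = File("out.txt", "wb");
    auto latinS = cast(Latin1String) read("in.txt");
    string uniS;
    transcode(latinS, uniS);
    foreach (line; uniS.splitLines())
    {
        // rawWrite copies the bytes as they are, with no text interpretation.
        fout.rawWrite(toLatin1Bytes(line));
        // splitLines() drops the terminator, so add one back (assumed "\n").
        fout.rawWrite("\n");
    }
}
```

Note that this always writes "\n" as the line terminator, whatever the input file used, so it may not be a byte-for-byte round trip.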

R

