Re: basic question about text file encodings

Adam D. Ruppe via Digitalmars-d-learn Thu, 16 Apr 2015 12:45:29 -0700

On Thursday, 16 April 2015 at 19:22:41 UTC, Laeeth Isharc wrote:

What is the best way to figure out and then decode a file of unknown encoding to dchar?

You generally can't, though some statistical analysis can sometimes help. The encoding needs to be known through some other means to have a correct conversion.
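A minimal sketch of the crudest such guess, assuming the only candidates are UTF-8 and Windows-1252 (an assumption, not something you can know in general): if the bytes validate as UTF-8, take them as UTF-8, otherwise fall back to Windows-1252. `guessAndDecode` is a hypothetical name. Plain ASCII passes either way, and some Windows-1252 text happens to be valid UTF-8 (the bytes C3 A9 are "Ã©" in Windows-1252 but "é" in UTF-8), so this is a guess, not detection.

```d
import std.encoding : transcode, Windows1252String;
import std.utf : validate, UTFException;

/// Hypothetical helper. Heuristic only: ASSUMES the data is either UTF-8
/// or Windows-1252, which you cannot know in general.
string guessAndDecode(immutable(ubyte)[] data)
{
    auto asUtf8 = cast(string) data;
    try
    {
        validate(asUtf8); // throws UTFException on malformed UTF-8
        return asUtf8;
    }
    catch (UTFException)
    {
        // Not well-formed UTF-8: fall back to the assumed Windows-1252.
    }

    string utf8;
    transcode(cast(Windows1252String) data, utf8);
    return utf8;
}
```

Trying UTF-8 first is the usual order, since Windows-1252 text with non-ASCII bytes tends not to form valid UTF-8 by accident, while the reverse check (is it valid Windows-1252?) accepts almost anything.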

How was the file generated? If it came from Excel, it might be in the Windows-1252 encoding. You can try my characterencodings.d module:


https://github.com/adamdruppe/arsd/blob/master/characterencodings.d

This is a standalone file: just download it, add it to your build, and do:



string utf8 = convertToUtf8Lossy(your_data, "windows-1252");

and it will work, though it might drop a character if it doesn't know how to convert it (hence "Lossy" in the name). There's also a `convertToUtf8` function, which never drops characters it doesn't know.


Then examine the string and see if it looks right to you.



Alternatively, with Phobos only, you can try:

import std.encoding;

string utf8;
transcode(cast(Windows1252String) your_data, utf8); // Windows-1252 bytes -> UTF-8

Both my module and the Phobos module expect your input data to be immutable(ubyte)[], so you might need to cast to that.
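For the cast, a minimal Phobos-only sketch, assuming the file is known (not guessed) to be Windows-1252; the helper name `readWindows1252File` is hypothetical. `std.file.read` returns `void[]`, so the bytes are reinterpreted as immutable(ubyte)[] and then as Windows-1252 code units before `transcode`.

```d
import std.encoding : transcode, Windows1252String;
import std.file : read;

/// Hypothetical helper: read a file ASSUMED to be Windows-1252, return UTF-8.
string readWindows1252File(string path)
{
    // std.file.read returns void[]. The buffer is freshly allocated and
    // nothing else references it, so treating it as immutable is safe here.
    auto bytes = cast(immutable(ubyte)[]) read(path);

    string utf8;
    transcode(cast(Windows1252String) bytes, utf8);
    return utf8;
}
```

Usage would be along the lines of `auto text = readWindows1252File("export.csv");` where the path is just a placeholder.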

The Phobos module is great if you know the encoding at compile time and it is one of the few encodings it supports.

My module is a bit better at handling data whose encoding is only known at runtime (I wrote it to support website and email screen scraping).
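To illustrate the compile-time point: with the Phobos string types, one way to honor an encoding name that only arrives at runtime (say, from an HTTP header) is to map it onto a compile-time type yourself. A minimal sketch; `decodeNamed` and the exact set of accepted names are assumptions, not part of either library.

```d
import std.encoding : transcode, AsciiString, Latin1String, Windows1252String;
import std.uni : toLower;

/// Hypothetical helper: dispatch a runtime encoding name to one of the
/// compile-time string types in std.encoding. Unknown names throw.
string decodeNamed(immutable(ubyte)[] data, string encodingName)
{
    string utf8;
    switch (encodingName.toLower)
    {
        case "utf-8":
            utf8 = cast(string) data; // reinterpreted only, not validated here
            break;
        case "us-ascii":
            transcode(cast(AsciiString) data, utf8);
            break;
        case "iso-8859-1", "latin1":
            transcode(cast(Latin1String) data, utf8);
            break;
        case "windows-1252":
            transcode(cast(Windows1252String) data, utf8);
            break;
        default:
            throw new Exception("unsupported encoding: " ~ encodingName);
    }
    return utf8;
}
```

Every new encoding means another case, which is the kind of bookkeeping a runtime-oriented module handles for you.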
