On 8/7/2011 3:21 AM, Jonathan M Davis wrote:
> On Sunday 07 August 2011 14:08:06 Dmitry Olshansky wrote:
>> On 07.08.2011 12:09, Mehrdad wrote:
>>> A readText() function that would read a text file (**and** autodetect
>>> its encoding from its BOM) would be of great help.
>> Well, the name is here; dunno if it meets your expectations:
>> http://d-programming-language.org/phobos/std_file.html#readText
> ...
> What Mehrdad wants is a way to read in a file with an encoding other than
> UTF-8, UTF-16, or UTF-32, have it autodetect the encoding by reading the file's
> BOM, and then convert it to whatever encoding the character type that
> readText is instantiated with uses.

Yeah, although I don't mean anything /other/ than those; I only care about Unicode. But I think the encoding should be auto-detected from the BOM, not chosen based on the template parameter.
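Auto-detecting the encoding from the BOM comes down to matching a handful of byte prefixes. Here is a minimal sketch in D, assuming the file has already been read into a `ubyte[]`; the names `BomEncoding`, `Bom`, and `detectBom` are hypothetical and not part of Phobos. One caveat: a UTF-16 LE file whose first character is U+0000 is indistinguishable from a UTF-32 LE BOM.

```d
import std.algorithm.searching : startsWith;

/// Encodings that a byte-order mark can announce (hypothetical type).
enum BomEncoding { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Result of BOM sniffing: the announced encoding and how many bytes to skip.
struct Bom
{
    BomEncoding encoding;
    size_t length;
}

/// Inspects the first bytes of `data` for a Unicode byte-order mark.
Bom detectBom(const(ubyte)[] data)
{
    // The UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE),
    // so the 4-byte marks must be tested before the 2-byte ones.
    if (data.startsWith([0x00, 0x00, 0xFE, 0xFF])) return Bom(BomEncoding.utf32be, 4);
    if (data.startsWith([0xFF, 0xFE, 0x00, 0x00])) return Bom(BomEncoding.utf32le, 4);
    if (data.startsWith([0xEF, 0xBB, 0xBF]))       return Bom(BomEncoding.utf8, 3);
    if (data.startsWith([0xFE, 0xFF]))             return Bom(BomEncoding.utf16be, 2);
    if (data.startsWith([0xFF, 0xFE]))             return Bom(BomEncoding.utf16le, 2);
    // No mark: the caller has to fall back to a default (e.g. UTF-8).
    return Bom(BomEncoding.none, 0);
}
```

The caller would then skip `length` bytes and decode the rest according to `encoding`, regardless of which character type the result is eventually converted to.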


On 8/7/2011 6:21 AM, Andrei Alexandrescu wrote:
> I think we could and should change readText to do the BOM trick. It's been on my mind forever.

I /do/ have an implementation, but it's (1) Windows-only, (2) hastily written (no real error checking), (3) broken for UTF-16 BE (although it works for LE), and (4) limited to returning UTF-8. It's a starting point, though. An added bonus is that it examines the file's contents as well as its BOM, so the heuristic is rather good.

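    // Windows-only: IsTextUnicode is exported by advapi32.
    import core.sys.windows.windows : BOOL;
    import std.algorithm : startsWith;
    import std.file : read;
    import std.utf : toUTF8;

    pragma(lib, "advapi32.lib");
    extern(Windows) BOOL IsTextUnicode(const(void)* lpv, int iSize, int* lpiResult);

    string readText(const(char)[] name)
    {
        auto data = cast(ubyte[]) read(name);
        int test = 0xFFFF; // in: request every test; out: which tests passed
        if (IsTextUnicode(data.ptr, cast(int) data.length, &test))
        {
            // 0x0088 = IS_TEXT_UNICODE_SIGNATURE | IS_TEXT_UNICODE_REVERSE_SIGNATURE,
            // i.e. a UTF-16 BOM is present. Still assumes LE; BE is not handled.
            return toUTF8(cast(wchar[])(test & 0x0088 ? data[2 .. $] : data));
        }
        // Otherwise treat the data as UTF-8, stripping a UTF-8 BOM if present.
        return toUTF8(cast(char[])(data.startsWith([0xEF, 0xBB, 0xBF]) ? data[3 .. $] : data));
    }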
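Limitation (3) comes from `cast(wchar[])`, which reinterprets the bytes in the machine's native order (little-endian on x86), so a big-endian file comes out byte-swapped. One portable way around it, sketched here on the assumption that the BOM has already been stripped, is to assemble each UTF-16 code unit from its two bytes explicitly. The helper name `utf16BytesToUtf8` is hypothetical; only `enforce`, `validate`, and `toUTF8` come from Phobos.

```d
import std.exception : enforce;
import std.utf : toUTF8, validate;

/// Decodes BOM-less UTF-16 bytes in the given byte order to UTF-8 (hypothetical helper).
string utf16BytesToUtf8(const(ubyte)[] bytes, bool bigEndian)
{
    enforce(bytes.length % 2 == 0, "UTF-16 data must have an even number of bytes");
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref unit; units)
    {
        // Pick the high and low byte according to the requested byte order,
        // independent of the host machine's endianness.
        const hi = bigEndian ? bytes[2 * i] : bytes[2 * i + 1];
        const lo = bigEndian ? bytes[2 * i + 1] : bytes[2 * i];
        unit = cast(wchar)((hi << 8) | lo);
    }
    validate(units); // throws UTFException on unpaired surrogates
    return toUTF8(units);
}
```

In the Windows version, this could be called with `bigEndian = true` when `IsTextUnicode` reports `IS_TEXT_UNICODE_REVERSE_SIGNATURE` (0x0080).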
