On 8/7/2011 3:21 AM, Jonathan M Davis wrote:
On Sunday 07 August 2011 14:08:06 Dmitry Olshansky wrote:
On 07.08.2011 12:09, Mehrdad wrote:
A readText() function that would read a text file (**and** autodetect
its encoding from its BOM) would be of great help.
Well the name is here, dunno if it meets your expectations:
http://d-programming-language.org/phobos/std_file.html#readText
...
What Mehrdad wants is a way to read in a file with an encoding other than
UTF-8, UTF-16, or UTF-32, have it autodetect the encoding by reading the file's
BOM, and then convert it to whatever encoding the character type that
readText is using requires.
Yeah, although I don't mean anything /other/ than those -- I only care
about Unicode, but I think it should be auto-detected, not based on the
template parameter.
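
To make the "auto-detected" part concrete, here is a minimal sketch of BOM-only detection. The names Encoding, Detected and detectBOM are hypothetical (they are not part of Phobos), and falling back to UTF-8 when no BOM is present is an assumption, not something the thread settled on.

```d
import std.algorithm : startsWith;

/// Hypothetical names for illustration; not part of Phobos.
enum Encoding { utf8, utf16le, utf16be, utf32le, utf32be }

struct Detected
{
    Encoding encoding; // assumed to be utf8 when no BOM is found
    size_t bomLength;  // number of leading bytes to skip
}

Detected detectBOM(const(ubyte)[] data)
{
    static immutable ubyte[] bomUtf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] bomUtf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] bomUtf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] bomUtf16le = [0xFF, 0xFE];
    static immutable ubyte[] bomUtf16be = [0xFE, 0xFF];

    // UTF-32 LE is tested before UTF-16 LE because the UTF-16 LE BOM
    // is a prefix of the UTF-32 LE BOM.
    if (data.startsWith(bomUtf32le)) return Detected(Encoding.utf32le, 4);
    if (data.startsWith(bomUtf32be)) return Detected(Encoding.utf32be, 4);
    if (data.startsWith(bomUtf8))    return Detected(Encoding.utf8, 3);
    if (data.startsWith(bomUtf16le)) return Detected(Encoding.utf16le, 2);
    if (data.startsWith(bomUtf16be)) return Detected(Encoding.utf16be, 2);
    return Detected(Encoding.utf8, 0);
}
```

One known ambiguity with this ordering: a UTF-16 LE file whose first character is U+0000 would be reported as UTF-32 LE. That is rare in text files, but a stricter version might look at more of the data.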
On 8/7/2011 6:21 AM, Andrei Alexandrescu wrote:
I think we could and should change readText to do the BOM trick. It's
been on my mind forever.
I /do/ have an implementation, but it (1) is Windows-only, (2) is
hastily written (no error checking or whatever), (3) doesn't work
for UTF-16 BE (although it works for LE), and (4) only returns the
result in UTF-8.
It's a starting point, though. An added bonus is the fact that it
actually looks at the file data as well, so the heuristic is rather nice.
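import std.algorithm : startsWith;
import std.utf : toUTF8;
import file = std.file;
import core.sys.windows.windows : BOOL;

pragma(lib, "advapi32.lib");
extern(Windows) BOOL IsTextUnicode(in void* pBuffer, int cb, int* lpi);

string readText(const(char)[] name)
{
    auto data = cast(ubyte[]) file.read(name);
    int test = 0xFFFF; // in: tests to run; out: tests that passed
    if (IsTextUnicode(data.ptr, cast(int) data.length, &test))
    {
        // 0x0008 / 0x0080: a (reversed) UTF-16 BOM was found; skip its 2 bytes.
        // Big-endian data is not byte-swapped here, so only LE works.
        return (cast(wchar[]) ((test & 0x0088) ? data[2 .. $] : data)).toUTF8();
    }
    else
    {
        // Otherwise treat the data as UTF-8 and skip the 3-byte BOM if present.
        immutable ubyte[] bom = [0xEF, 0xBB, 0xBF];
        return (cast(char[]) (data.startsWith(bom) ? data[3 .. $] : data)).toUTF8();
    }
}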
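
For the UTF-16 BE limitation mentioned above, one portable option is to assemble the code units byte by byte instead of casting the buffer to wchar[]. This is only a sketch: utf16ToUTF8 is a hypothetical helper name, it assumes the BOM has already been stripped, and silently dropping a trailing odd byte is an arbitrary choice that a real implementation would probably turn into an error.

```d
import std.utf : toUTF8;

/// Hypothetical helper: converts raw UTF-16 bytes (BOM already removed)
/// to a UTF-8 string, independent of the host's byte order.
string utf16ToUTF8(const(ubyte)[] bytes, bool bigEndian)
{
    // Assumption: a trailing odd byte is ignored rather than reported.
    auto units = new wchar[bytes.length / 2];
    foreach (i, ref u; units)
    {
        immutable ubyte first = bytes[2 * i];
        immutable ubyte second = bytes[2 * i + 1];
        // Big-endian stores the high byte first, little-endian the low byte.
        u = cast(wchar) (bigEndian ? (first << 8) | second
                                   : (second << 8) | first);
    }
    return toUTF8(units);
}
```

Because nothing here depends on IsTextUnicode or on casting the file buffer, the same function could serve both the LE and BE branches on any platform.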