On Sunday 07 August 2011 14:08:06 Dmitry Olshansky wrote:
> On 07.08.2011 12:09, Mehrdad wrote:
> > A readText() function that would read a text file (**and** autodetect
> > its encoding from its BOM) would be of great help.
>
> Well the name is here, dunno if it meets your expectations:
> http://d-programming-language.org/phobos/std_file.html#readText

D (and Phobos in general) assumes that char is UTF-8, wchar is UTF-16, and dchar is UTF-32. You're going to get an exception thrown pretty quickly if you try to use those types with values that don't match those encodings. As such, readText assumes that the file is in the encoding of the character type that it's instantiated with. So, if you try to read in a file which doesn't match the encoding of the character type that you're using (which is char by default), you're going to get a UtfException.

What Mehrdad wants is a way to read in a file with an encoding other than UTF-8, UTF-16, or UTF-32, have it autodetect the encoding by reading the file's BOM, and then convert it to whatever encoding the character type that readText is instantiated with uses. readText doesn't currently do anything of the sort.

At this point, dealing with anything which has an encoding other than UTF-8, UTF-16, or UTF-32 is problematic in D. std.encoding helps, but it's not necessarily all that good (Andrei considers it a failed experiment which either needs to be redesigned or removed). So, one of the things that still needs to be figured out for Phobos is how to better handle encodings other than UTF-8, UTF-16, and UTF-32. For the most part, other encodings are likely to be dealt with only when reading or writing I/O, while UTF-8, UTF-16, and UTF-32 are dealt with inside of D programs, but we still need to fix things so that we can readily deal with I/O that isn't UTF-8, UTF-16, or UTF-32.

- Jonathan M Davis
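
For illustration, the BOM check that readText does not perform could look roughly like the sketch below. It is not part of Phobos: Bom, hasPrefix, and detectBom are hypothetical names made up for this example, and the sketch only inspects the leading bytes of data that has already been read (for instance with std.file.read). It does no transcoding at all.

```d
import std.stdio : writeln;

/// Hypothetical enum naming the encodings that a BOM can signal.
enum Bom { none, utf8, utf16le, utf16be, utf32le, utf32be }

/// Hypothetical helper: true if `data` begins with the bytes in `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

/// Hypothetical helper: inspects the leading bytes of a file for a BOM.
Bom detectBom(const(ubyte)[] data)
{
    // UTF-32 is tested before UTF-16 because FF FE 00 00 also starts with
    // FF FE. That pattern is ambiguous in principle (it could be a UTF-16LE
    // BOM followed by U+0000); this sketch simply prefers UTF-32LE.
    if (hasPrefix(data, [0xFF, 0xFE, 0x00, 0x00])) return Bom.utf32le;
    if (hasPrefix(data, [0x00, 0x00, 0xFE, 0xFF])) return Bom.utf32be;
    if (hasPrefix(data, [0xEF, 0xBB, 0xBF]))       return Bom.utf8;
    if (hasPrefix(data, [0xFF, 0xFE]))             return Bom.utf16le;
    if (hasPrefix(data, [0xFE, 0xFF]))             return Bom.utf16be;
    return Bom.none; // no BOM: the encoding has to be known some other way
}

void main()
{
    // UTF-8 BOM followed by the bytes of "hi".
    ubyte[] sample = [0xEF, 0xBB, 0xBF, 0x68, 0x69];
    writeln(detectBom(sample));
}
```

A BOM only distinguishes the Unicode encodings from one another, so a file in some other encoding would come back as Bom.none here and still needs a separate mechanism.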