On Tuesday, 17 January 2017 at 11:40:15 UTC, Nestor wrote:
Thanks, but unfortunately this function does not produce proper UTF-8 strings; in fact, the output even starts with the BOM. It also doesn't handle CRLF, and even for LF-terminated lines it only seems to work for the first line.

I thought you wanted to get the contents line by line, which would then remain UTF-16. Translating between the two types shouldn't be hard: `to!string`, or a foreach appending each character's code units to a char buffer, should convert to UTF-8.
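For instance, a minimal sketch of the `to!string` route (assuming the line has already been read into a `wstring`; the variable names are just for illustration):

```d
import std.conv : to;

void main()
{
    // A UTF-16 line, as it might come out of a line-reading function
    wstring line16 = "héllo wörld"w;

    // to!string re-encodes the UTF-16 code units as UTF-8
    string line8 = line16.to!string;
    assert(line8 == "héllo wörld");
}
```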

Skipping the BOM is just a matter of skipping the first two bytes identifying it...
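Something like this, on the raw bytes (this sketch assumes a little-endian UTF-16 file, which is what the BOM check here looks for, and a little-endian host for the final cast):

```d
void main()
{
    // Raw bytes as std.file.read would return them: BOM + "AB" in UTF-16LE
    ubyte[] bytes = [0xFF, 0xFE, 0x41, 0x00, 0x42, 0x00];

    // The UTF-16LE BOM is the two bytes 0xFF 0xFE (0xFE 0xFF for big-endian)
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // What remains is the actual UTF-16 payload
    auto text = cast(wchar[]) bytes;
    assert(text == "AB"w);
}
```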

I guess I have to code encoding detection, buffered reading, and transcoding by hand; the only problem is that the result could be sub-optimal, which is why I was looking for a built-in solution.

Maybe. Honestly I'm not nearly as familiar with the library or its functions as I would love to be, so home-made solutions often seem easier until I learn the lingo. A disadvantage of being self-taught.
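For what it's worth, `std.encoding` may already cover the transcoding part; a sketch (the encoding detection and buffered reading would still be up to you):

```d
import std.encoding : transcode;

void main()
{
    // UTF-16 text, e.g. after BOM stripping and byte-order handling
    wstring utf16 = "line one\r\nline two"w;

    // transcode re-encodes between two of the supported string encodings
    string utf8;
    transcode(utf16, utf8);
    assert(utf8 == "line one\r\nline two");
}
```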
