On Tuesday, 17 January 2017 at 11:40:15 UTC, Nestor wrote:
Thanks, but unfortunately this function does not produce proper UTF-8 strings; in fact, the output even starts with the BOM. It also doesn't handle CRLF, and even for LF-terminated lines it only seems to work for the first line.

I thought you wanted to get the contents line by line, which would then remain UTF-16. Translating between the two types shouldn't be hard: `to!string`, or a foreach appending each character's code units to a char buffer, should convert to UTF-8.
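For instance, a minimal sketch of the `to!string` route (assuming the line has already been read into a `wstring`; the variable names are just for illustration):

```d
import std.conv : to;

void main()
{
    // A UTF-16 line, as it might come out of a line-reading function
    wstring line16 = "héllo wörld"w;

    // to!string re-encodes the UTF-16 code units as UTF-8
    string line8 = line16.to!string;
    assert(line8 == "héllo wörld");
}
```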

Skipping the BOM is just a matter of skipping the first two bytes identifying it...
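Something like this, on the raw bytes (this sketch assumes a little-endian UTF-16 file, which is what the BOM check here looks for, and a little-endian host for the final cast):

```d
void main()
{
    // Raw bytes as std.file.read would return them: BOM + "AB" in UTF-16LE
    ubyte[] bytes = [0xFF, 0xFE, 0x41, 0x00, 0x42, 0x00];

    // The UTF-16LE BOM is the two bytes 0xFF 0xFE (0xFE 0xFF for big-endian)
    if (bytes.length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        bytes = bytes[2 .. $];

    // What remains is the actual UTF-16 payload
    auto text = cast(wchar[]) bytes;
    assert(text == "AB"w);
}
```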

I guess I have to code encoding detection, buffered reading, and transcoding by hand; the only problem is that the result could be sub-optimal, which is why I was looking for a built-in solution.

Maybe. Honestly I'm not nearly as familiar with the library or its functions as I would love to be, so home-made solutions often seem easier until I learn the lingo. A disadvantage of being self-taught.
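For what it's worth, `std.encoding` may already cover the transcoding part; a sketch (the encoding detection and buffered reading would still be up to you):

```d
import std.encoding : transcode;

void main()
{
    // UTF-16 text, e.g. after BOM stripping and byte-order handling
    wstring utf16 = "line one\r\nline two"w;

    // transcode re-encodes between two of the supported string encodings
    string utf8;
    transcode(utf16, utf8);
    assert(utf8 == "line one\r\nline two");
}
```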
