On Saturday, 28 January 2017 at 15:40:24 UTC, Nestor wrote:
On Friday, 27 January 2017 at 04:26:31 UTC, Era Scarecrow wrote:
Skipping the BOM is just a matter of skipping the first two bytes identifying it...

AFAIK in some cases the BOM takes up to 4 bytes (for UTF-32), so when the input encoding is unknown one must perform some kind of detection in order to apply the correct transcoding later. I thought that by now dmd had this functionality built in and exposed, since the compiler itself seems to do it for source code units.

In UTF-8 files the BOM is 3 bytes long (EF BB BF).
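
To make the detection step concrete, here is a minimal, language-neutral sketch of BOM sniffing. It is written in Python rather than D purely for illustration and is not what dmd or Phobos does internally. The helper names `detect_bom` and `decode_with_bom` are hypothetical. The key point is that the longer UTF-32 signatures have to be tested before the UTF-16 ones, because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs

# Candidate BOMs, longest first. Order matters: the UTF-32LE BOM
# (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Skip the BOM, if any, and decode the rest with the detected encoding.

    The fallback encoding for BOM-less input is an assumption of this sketch.
    """
    name, skip = detect_bom(data)
    return data[skip:].decode(name or default)
```

For example, `decode_with_bom(b"\xff\xfeh\x00i\x00")` skips 2 bytes and decodes the rest as UTF-16LE. One caveat: a UTF-16LE file whose first character after the BOM is U+0000 would be misread as UTF-32LE by this approach, so the result is a heuristic rather than a guarantee.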