Re: Challenge: write a really really small front() for UTF8

dennis luehring Mon, 24 Mar 2014 23:42:02 -0700

On 24.03.2014 17:44, Andrei Alexandrescu wrote:

On 3/24/14, 5:51 AM, w0rp wrote:

On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:

Here's a baseline: http://goo.gl/91vIGc. Destroy!


Andrei


Before we roll this out, could we discuss a strategy/guideline with
regard to detecting and handling invalid UTF sequences?

Having a fast "front" is fine and all, but if it means your program
asserting in release (or worse, silently corrupting memory) just
because the client was trying to read a bad text file, I'm unsure this
is acceptable.


I would strongly advise to at least offer an option


Options are fine for functions etc. But front would need to find an
all-around good compromise between speed and correctness.

Andrei


b"\255".decode("utf-8", errors="strict") # UnicodeDecodeError
b"\255".decode("utf-8", errors="replace") # replacement character used
b"\255".decode("utf-8", errors="ignore") # Empty string, invalid sequence removed.

I think there should be a base range for UTF-8 iteration with a policy-based error extension (like in Python), plus some variants that derive from this base UTF-8 range with different error behavior. One of these becomes the Phobos standard, chosen via a default parameter, so it is still switchable.
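
To make the idea concrete, here is a rough sketch in Python (not D, and not a proposal for the actual Phobos API). The names iter_code_points, on_error, strict, replace and ignore are made up for illustration. The base iteration is written once; the error behavior is a policy passed as a parameter, with one policy picked as the default. Validation is delegated to Python's own strict decoder, and on a bad byte the sketch simply advances by one byte, which is an assumption and may differ from what a real implementation would choose.

```python
from typing import Callable, Iterator

# Error policies (hypothetical names). Each receives the raw data and the
# offset of the offending byte, and returns the text to emit in its place.
def strict(data: bytes, pos: int) -> str:
    raise ValueError(f"invalid UTF-8 sequence at byte offset {pos}")

def replace(data: bytes, pos: int) -> str:
    return "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def ignore(data: bytes, pos: int) -> str:
    return ""  # emit nothing, i.e. drop the bad byte

def _sequence_length(lead: int) -> int:
    """Expected length of a UTF-8 sequence from its lead byte, 0 if invalid."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or a byte that never starts a sequence

def iter_code_points(
    data: bytes,
    on_error: Callable[[bytes, int], str] = replace,  # the "standard" default
) -> Iterator[str]:
    """Base UTF-8 iteration; the error behavior is supplied by on_error."""
    pos = 0
    while pos < len(data):
        n = _sequence_length(data[pos])
        ch = None
        if n:
            try:
                # The strict decoder rejects truncated, overlong and
                # surrogate sequences for us.
                ch = data[pos:pos + n].decode("utf-8")
            except UnicodeDecodeError:
                ch = None
        if ch is None:
            out = on_error(data, pos)
            if out:
                yield out
            pos += 1  # assumption: resynchronize one byte at a time
        else:
            yield ch
            pos += n
```

Usage would look like "".join(iter_code_points(raw)) for the default behavior, or iter_code_points(raw, on_error=strict) when the caller wants bad input to fail loudly. In D the policy would more likely be a template parameter with a default, so the choice costs nothing at run time.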
