On 24.03.2014 17:44, Andrei Alexandrescu wrote:
On 3/24/14, 5:51 AM, w0rp wrote:
On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
Here's a baseline: http://goo.gl/91vIGc. Destroy!

Andrei

Before we roll this out, could we discuss a strategy/guideline for
detecting and handling invalid UTF sequences?

Having a fast "front" is fine and all, but if it means your program
asserting in release (or worse, silently corrupting memory) just
because the client was trying to read a bad text file, I'm unsure this
is acceptable.

I would strongly advise to at least offer an option

Options are fine for functions etc. But front would need to find an
all-around good compromise between speed and correctness.

Andrei


b"\255".decode("utf-8", errors="strict") # UnicodeDecodeError
b"\255".decode("utf-8", errors="replace") # replacement character used
b"\255".decode("utf-8", errors="ignore") # empty string, invalid sequence removed

I think there should be a base range for UTF-8 iteration with policy-based error handling (like in Python), plus some variants that derive from this base UTF-8 range with different error behavior. One of these would become the Phobos standard, chosen via a default parameter, so it is still switchable.
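
To make the idea concrete, here is a minimal sketch, written in Python like the quoted example rather than in D. The names iter_code_points, strict_code_points and lossy_code_points are hypothetical and only illustrate the shape; this is not a proposed Phobos API. It assumes the error policy is passed as a parameter with a default, and that the variants are just the base iterator with a different policy bound.

```python
import codecs
from functools import partial
from typing import Iterator


def iter_code_points(data: bytes, errors: str = "replace") -> Iterator[str]:
    """Lazily yield decoded characters from UTF-8 bytes.

    `errors` is the policy: "strict" raises UnicodeDecodeError,
    "replace" yields U+FFFD, "ignore" drops the invalid bytes.
    The default ("replace") is an assumption made for this sketch.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    # Feed one byte at a time so a character is produced as soon as
    # its sequence is complete, similar to popping the front of a range.
    for i in range(len(data)):
        yield from decoder.decode(data[i:i + 1])
    # Flush: a truncated trailing sequence is handled by the policy here.
    yield from decoder.decode(b"", final=True)


# Variants derived from the base iterator with a fixed policy.
strict_code_points = partial(iter_code_points, errors="strict")
lossy_code_points = partial(iter_code_points, errors="ignore")

if __name__ == "__main__":
    bad = b"a\xadb"  # 0xAD is the same byte as b"\255" above
    print(list(iter_code_points(bad)))
    print(list(lossy_code_points(bad)))
```

The point of the design is that callers who do nothing get one well-defined behavior, while callers who care about speed or correctness pick another policy without needing a different iteration type.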

