[Issue 14519] Get rid of unicode validation in string processing

2022-12-17 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Iain Buclaw  changed:

   What|Removed |Added

   Priority|P1  |P4

--


[Issue 14519] Get rid of unicode validation in string processing

2021-11-07 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Vladimir Panteleev  changed:

   What|Removed |Added

  Component|dmd |druntime

--


[Issue 14519] Get rid of unicode validation in string processing

2021-11-07 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #39 from Vladimir Panteleev  ---
*** Issue 22473 has been marked as a duplicate of this issue. ***

--


[Issue 14519] Get rid of unicode validation in string processing

2019-08-15 Thread d-bugmail--- via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Walter Bright  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/show_bug.cgi?id=20134

--


[Issue 14519] Get rid of unicode validation in string processing

2016-05-20 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #38 from Martin Nowak  ---
(In reply to Vladimir Panteleev from comment #36)
> Question, is there any overhead in actually verifying the validity of UTF-8
> streams, or is all overhead related to error handling (i.e. inability to be
> nothrow)?

I think it's fairly measurable b/c you need to add lots of additional checks
and branches (though highly predictable ones).
While my initial decode implementation
https://github.com/MartinNowak/phobos/blob/1b0edb728c/std/utf.d#L577-L651 has
been transmogrified into 200 lines in the meantime
https://github.com/dlang/phobos/blob/acafd848d8/std/utf.d#L1167-L1369, you can
still use it to benchmark validation.
I ran a lot of benchmarks when introducing that function, and the code path
for decoding just remains slow, even with the throwing code path moved out of
the normal control flow.
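
To make the extra checks concrete, here is a minimal sketch that contrasts std.utf.decode with a non-validating decoder. The name decodeUnchecked is hypothetical (it is not part of Phobos), and it assumes the input is already well-formed UTF-8: it performs no checks for stray continuation bytes, overlong encodings, surrogates, or truncated sequences. It is not a benchmark, only an illustration of which branches validation adds.

```d
import std.utf : decode;

/// Hypothetical helper, not in Phobos: decodes one code point starting at
/// s[i] and advances i. Assumes s is well-formed UTF-8; malformed input
/// silently yields garbage instead of an exception.
dchar decodeUnchecked(const(char)[] s, ref size_t i) pure nothrow @nogc @safe
{
    immutable uint b0 = s[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;                                   // 1 byte (ASCII)
    immutable uint b1 = s[i++] & 0x3F;
    if (b0 < 0xE0)
        return cast(dchar) (((b0 & 0x1F) << 6) | b1);            // 2 bytes
    immutable uint b2 = s[i++] & 0x3F;
    if (b0 < 0xF0)
        return cast(dchar) (((b0 & 0x0F) << 12) | (b1 << 6) | b2); // 3 bytes
    immutable uint b3 = s[i++] & 0x3F;
    return cast(dchar) (((b0 & 0x07) << 18) | (b1 << 12) | (b2 << 6) | b3); // 4 bytes
}

void main()
{
    // One code point of each encoded length: 1, 2, 3 and 4 bytes.
    string s = "a\u00E9\u20AC\U0001F600";
    size_t i = 0, j = 0;
    while (i < s.length)
    {
        immutable checked = decode(s, i);          // validates, may throw UTFException
        immutable fast = decodeUnchecked(s, j);    // trusts the input
        assert(checked == fast && i == j);
    }
}
```

On valid input both decoders agree; the difference is only in what happens (and what it costs) when the input is not valid.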

--


[Issue 14519] Get rid of unicode validation in string processing

2016-05-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Jack Stouffer  changed:

   What|Removed |Added

 CC||j...@jackstouffer.com

--- Comment #37 from Jack Stouffer  ---
This entire discussion is moot unless you get Andrei on board with a breaking
change to a very fundamental part of the language.

--


[Issue 14519] Get rid of unicode validation in string processing

2015-08-18 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Vladimir Panteleev  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/show_bug.cgi?id=14919

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #36 from Vladimir Panteleev  ---
Question, is there any overhead in actually verifying the validity of UTF-8
streams, or is all overhead related to error handling (i.e. inability to be
nothrow)?

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #35 from Jonathan M Davis  ---
(In reply to Martin Nowak from comment #32)
> Summary:
> 
> We should adopt a new model of Unicode validation.
> The current one, where every string-processing function decodes Unicode
> characters and performs validation, causes too much overhead.
> A better alternative would be to perform Unicode validation once when
> reading raw data (ubyte[]) and then assume any char[]/wchar[]/dchar[] is a
> valid Unicode string.
> Invalid encodings introduced by string-processing algorithms are programming
> bugs and thus do not warrant runtime checks in release builds.

Exactly.

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #34 from Vladimir Panteleev  ---
(In reply to Martin Nowak from comment #31)
> BTW, this is what I already wrote in comment 23. Not sure why you only
> partially quoted my answer to suggest a contradiction.

Err, well, to be fair, you did not state this clearly in comment 23, which is
why I asked for a clarification. I was not trying to maliciously nitpick your
words, just trying to understand your point.

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #33 from Sobirari Muhomori  ---
Removing autodecoding is good, but this issue is about making autodecode
@nothrow @nogc.
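
For reference, a minimal sketch of the non-throwing style of decoding, using std.utf.byDchar, which by default substitutes std.utf.replacementDchar (U+FFFD) for malformed sequences rather than throwing. The helper name decodeLossy is hypothetical, and the sketch deliberately makes no claim about which attributes the compiler infers for it.

```d
import std.algorithm.searching : canFind;
import std.array : array;
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: decodes s to code points, substituting U+FFFD for
/// malformed input instead of throwing.
dchar[] decodeLossy(const(char)[] s)
{
    return s.byDchar.array;
}

void main()
{
    // 'a', a lone continuation byte (malformed on its own), then 'b'.
    immutable(ubyte)[] raw = [0x61, 0x80, 0x62];
    auto decoded = decodeLossy(cast(string) raw);

    assert(decoded[0] == 'a');
    assert(decoded.canFind(replacementDchar));
    assert(decoded[$ - 1] == 'b');
}
```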

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #32 from Martin Nowak  ---
Summary:

We should adopt a new model of Unicode validation.
The current one, where every string-processing function decodes Unicode
characters and performs validation, causes too much overhead.
A better alternative would be to perform Unicode validation once when reading
raw data (ubyte[]) and then assume any char[]/wchar[]/dchar[] is a valid
Unicode string.
Invalid encodings introduced by string-processing algorithms are programming
bugs and thus do not warrant runtime checks in release builds.

Also see

https://github.com/D-Programming-Language/druntime/pull/1279
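
A minimal sketch of the "validate once at the boundary" model described above, assuming the raw data arrives as immutable(ubyte)[]. The helper name toValidatedString is hypothetical; std.utf.validate is the existing Phobos function that throws UTFException on malformed input.

```d
import std.utf : validate, UTFException;

/// Hypothetical boundary helper: checks raw bytes once, after which the
/// result can be treated as a valid UTF-8 string by downstream code.
string toValidatedString(immutable(ubyte)[] raw)
{
    auto s = cast(string) raw;   // reinterpret the bytes, no copy
    validate(s);                 // throws UTFException if s is not valid UTF-8
    return s;
}

void main()
{
    import std.exception : assertThrown;

    immutable(ubyte)[] good = [0x68, 0x69];   // "hi"
    immutable(ubyte)[] bad  = [0x68, 0xFF];   // 0xFF never occurs in UTF-8

    assert(toValidatedString(good) == "hi");
    assertThrown!UTFException(toValidatedString(bad));
}
```

Under this model, everything downstream of toValidatedString would be free to skip validation; only the cast from ubyte[] carries the cost.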

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #31 from Martin Nowak  ---
(In reply to Martin Nowak from comment #30)
> Well, b/c they contain delimited binary and ASCII data, you'll have to find
> those delimiters, then validate and cast the ASCII part to a string, and can
> then use std.string functions.

BTW, this is what I already wrote in comment 23. Not sure why you only
partially quoted my answer to suggest a contradiction.
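
The delimiter-then-validate approach quoted above might look roughly like this. The function name asciiField, the NUL delimiter, and the sample packet layout are all assumptions made for illustration; only the ASCII part is checked and cast, and the binary remainder is never treated as text.

```d
import std.algorithm.searching : all, countUntil;
import std.exception : enforce;
import std.string : strip;

/// Hypothetical helper: returns the bytes before `delimiter` as a string,
/// after checking that they are plain ASCII (and therefore valid UTF-8).
string asciiField(immutable(ubyte)[] packet, ubyte delimiter)
{
    immutable end = packet.countUntil(delimiter);        // -1 if not found
    auto field = end < 0 ? packet : packet[0 .. end];
    enforce(field.all!(b => b < 0x80), "non-ASCII byte in text field");
    return cast(string) field;                           // validated above
}

void main()
{
    // " ok " followed by a NUL delimiter and two binary bytes.
    immutable(ubyte)[] packet = [0x20, 0x6F, 0x6B, 0x20, 0x00, 0xFF, 0xFE];

    auto text = asciiField(packet, 0);
    assert(text.strip == "ok");   // std.string functions work on the result
}
```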

--


[Issue 14519] Get rid of unicode validation in string processing

2015-07-17 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14519

Martin Nowak  changed:

   What|Removed |Added

    Summary|[Enh] foreach on strings should return replacementDchar rather than throwing|Get rid of unicode validation in string processing

--