[Issue 14529] New: Bug in Regex insensitive match
https://issues.dlang.org/show_bug.cgi?id=14529

Issue ID: 14529
Summary: Bug in Regex insensitive match
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Severity: major
Priority: P1
Component: Phobos
Assignee: nob...@puremagic.com
Reporter: kasamia.o.kasa...@gmail.com

The following code describes the problem:

    import std.stdio;
    import std.regex;

    void main()
    {
        auto ctr = ctRegex!(r"^[CF]$", "i");
        foreach (line; stdin.byLine)
        {
            foreach (m; line.matchAll(ctr))
            {
                writeln("match: ", m.hit);
            }
        }
    }

The simple regex should match C, c, F and f, but only C, c and F match. If you switch the order inside the character class to [FC], only F, f and C match, but not c. It seems that something goes wrong with the last character in the class. The same problem happens when using a run-time regex object too.
--
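A minimal regression check for the behaviour described above, written as a D unittest. It uses the run-time `regex` constructor (the report says the same failure occurs with it) together with `matchFirst`; the helper name `matchesCF` is hypothetical. On an affected Phobos version, the assertion for the last member of the class would be expected to fail.

```d
import std.regex : matchFirst, regex;

/// True if `s` matches the report's pattern ^[CF]$ with the "i" (case-insensitive) flag.
/// `matchesCF` is a hypothetical helper used only for this sketch.
bool matchesCF(string s)
{
    auto re = regex(r"^[CF]$", "i");
    return !matchFirst(s, re).empty;
}

unittest
{
    // Every member of the class must match in both cases, including the last one.
    foreach (s; ["C", "c", "F", "f"])
        assert(matchesCF(s), "expected a match for " ~ s);

    // Characters outside the class, or more than one character, must not match.
    assert(!matchesCF("x"));
    assert(!matchesCF("CF"));
}
```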
[Issue 14528] New: GIT HEAD: can't pass protected member to template by alias
https://issues.dlang.org/show_bug.cgi?id=14528

Issue ID: 14528
Summary: GIT HEAD: can't pass protected member to template by alias
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Keywords: rejects-valid
Severity: regression
Priority: P1
Component: DMD
Assignee: nob...@puremagic.com
Reporter: thecybersha...@gmail.com

This regression is an exacerbation of issue 13744 for protected members.

    // f.d
    void tpl(alias a)() { a(); }

    // c.d
    import f;
    class C
    {
        protected static void m() {}
        void fun() { tpl!m(); }
    }

Introduced in https://github.com/D-Programming-Language/dmd/pull/4558
--
[Issue 13433] Request: Clock.currTime option to use CLOCK_REALTIME_COARSE / CLOCK_REALTIME_FAST
https://issues.dlang.org/show_bug.cgi?id=13433

--- Comment #14 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/8e29e0621b074a8d368b4d7d344281adb7a91e54
Add ClockType enum to core.time for issue# 13433.

This adds an enum for indicating which type of clock to use when it's appropriate for a time function to have multiple options for the source clock.

In the case of MonoTime, to make that work cleanly, the implementation of MonoTime has become MonoTimeImpl, templated on ClockType, and MonoTime has become an alias to MonoTimeImpl!(ClockType.normal).

In the case of SysTime (in a separate PR), the ClockType will be a default template argument to Clock.currTime, and SysTime itself will be unaffected: in MonoTime's case the clock that it came from is integral to the type, whereas in SysTime's case it doesn't matter after the SysTime has been initialized.

https://github.com/D-Programming-Language/druntime/commit/bcfc36b3ca5a229c751c972c607fee57d4febcb2
Merge pull request #990 from jmdavis/13433

Add ClockType enum to core.time for issue# 13433.
--
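A short sketch of how the templated MonoTimeImpl described in the commit can be used. It assumes core.time as shipped in later druntime releases; which ClockType members are usable depends on the platform, so the coarse-clock test is restricted to Linux. The helper `timeIt` is hypothetical.

```d
import core.time : ClockType, Duration, MonoTime, MonoTimeImpl;

// As described in the commit message: MonoTime is the normal-clock instantiation.
static assert(is(MonoTime == MonoTimeImpl!(ClockType.normal)));

/// Times `work` with the monotonic clock selected by `ct` (hypothetical helper).
Duration timeIt(ClockType ct = ClockType.normal)(scope void delegate() work)
{
    auto start = MonoTimeImpl!ct.currTime;
    work();
    // Subtracting two MonoTimeImpl values of the same clock type yields a Duration.
    return MonoTimeImpl!ct.currTime - start;
}

version (linux) unittest
{
    // On Linux, ClockType.coarse selects a cheaper, lower-resolution clock.
    auto d = timeIt!(ClockType.coarse)(() {});
    assert(d >= Duration.zero);
}
```

Because the clock type is part of MonoTimeImpl's type, timestamps taken from different clocks cannot be subtracted from each other by accident.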
[Issue 14527] New: [Enh] Instrument calls to operator new with -profilenew compiler switch
https://issues.dlang.org/show_bug.cgi?id=14527

Issue ID: 14527
Summary: [Enh] Instrument calls to operator new with -profilenew compiler switch
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: DMD
Assignee: nob...@puremagic.com
Reporter: bugzi...@digitalmars.com

Throwing the -profilenew switch to the compiler will cause file, line, and function data to be added to the call. druntime's default behavior with this will be to report every location that allocates memory and how much memory it allocates. The user will be able to provide their own logging capability by overriding the default functions in druntime.

An initial implementation: https://github.com/D-Programming-Language/dmd/pull/4621
--
[Issue 13867] Overriding a method from an extern(C++) interface requires extern(C++) on the method definition
https://issues.dlang.org/show_bug.cgi?id=13867

nick changed:

What    |Removed     |Added
Severity|enhancement |normal
--
[Issue 12803] __traits(getFunctionAttributes) is not documented
https://issues.dlang.org/show_bug.cgi?id=12803

github-bugzi...@puremagic.com changed:

What      |Removed |Added
Status    |NEW     |RESOLVED
Resolution|---     |FIXED
--
[Issue 12803] __traits(getFunctionAttributes) is not documented
https://issues.dlang.org/show_bug.cgi?id=12803

--- Comment #2 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/dlang.org

https://github.com/D-Programming-Language/dlang.org/commit/884f46101ea0cdb611cc3c5c43a1961202818980
Fix issue 12803

https://github.com/D-Programming-Language/dlang.org/commit/6be29ea15f5666361c3974bc014d1fad8f19d28a
Merge pull request #984 from nomad-software/issue_12803

Issue 12803 - __traits(getFunctionAttributes) is not documented
--
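For reference, a small sketch of what `__traits(getFunctionAttributes, ...)` does: it yields a compile-time sequence of attribute strings for a function, which can be collected into an array. The function `square` is a hypothetical example, and the checks only test membership rather than relying on the order of the strings.

```d
import std.algorithm.searching : canFind;

/// A function with explicitly declared attributes (hypothetical example).
int square(int x) pure nothrow @nogc @safe
{
    return x * x;
}

/// The trait yields a sequence of strings; wrapping it in [...] makes an array.
enum string[] squareAttrs = [__traits(getFunctionAttributes, square)];

unittest
{
    assert(squareAttrs.canFind("pure"));
    assert(squareAttrs.canFind("nothrow"));
    assert(squareAttrs.canFind("@nogc"));
    assert(squareAttrs.canFind("@safe"));
}
```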
[Issue 13374] Wrong template overload resolution when passing function to alias/string parameter
https://issues.dlang.org/show_bug.cgi?id=13374

Vladimir Panteleev changed:

What    |Removed |Added
See Also|        |https://issues.dlang.org/show_bug.cgi?id=14520
--
[Issue 14520] [REG2.067.0] string/alias template overload
https://issues.dlang.org/show_bug.cgi?id=14520

Vladimir Panteleev changed:

What    |Removed |Added
Keywords|        |rejects-valid
--
[Issue 14520] [REG2.067.0] string/alias template overload
https://issues.dlang.org/show_bug.cgi?id=14520

Vladimir Panteleev changed:

What    |Removed |Added
See Also|        |https://issues.dlang.org/show_bug.cgi?id=13374

--- Comment #1 from Vladimir Panteleev ---
This bug is the reincarnation of issue 13374 (the reduced code is different, but the original code was broken once again). The full timeline:

- v2.060  : works
- v2.061  : broken (https://github.com/D-Programming-Language/dmd/pull/599)
- v2.066.1: fixed (https://github.com/D-Programming-Language/dmd/pull/3897)
- v2.067.0: broken (https://github.com/D-Programming-Language/dmd/pull/4375)
--
[Issue 14497] Disassembly view
https://issues.dlang.org/show_bug.cgi?id=14497

--- Comment #2 from Manu ---
Yeah, requiring that the program link is annoying, and if the program is big (mine are), then the build times can get long, and iteration is slow.

Short of source, at the very least there need to be symbol names at the head of each block of code. It must be easier to populate the assembly with symbol-name headers than with full source? As long as you can identify the start and end of the function you're interested in, that will give an 80% solution satisfying the majority of simple cases.

Do the GNU tools make this easier? I imagine there must be tools in the GCC/Clang (GDC/LDC?) suites that do the full job? It might be easier to start there? That would also be useful in that you could disassemble non-x86 architectures too.
--
[Issue 14526] New: GetOptException DDOC needs cleanup
https://issues.dlang.org/show_bug.cgi?id=14526

Issue ID: 14526
Summary: GetOptException DDOC needs cleanup
Product: D
Version: D2
Hardware: All
URL: http://dlang.org/phobos/std_getopt.html#.GetOptException
OS: All
Status: NEW
Keywords: ddoc
Severity: trivial
Priority: P1
Component: Phobos
Assignee: nob...@puremagic.com
Reporter: briancsch...@gmail.com

http://dlang.org/phobos/std_getopt.html#.GetOptException

It needs $(UL) and $(LI) macros. It also needs to list the other conditions under which it is thrown.
--
[Issue 14525] New: Cannot access help information from getopt if a required parameter is not given
https://issues.dlang.org/show_bug.cgi?id=14525

Issue ID: 14525
Summary: Cannot access help information from getopt if a required parameter is not given
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P1
Component: Phobos
Assignee: nob...@puremagic.com
Reporter: briancsch...@gmail.com

http://forum.dlang.org/thread/tjraqgbvwsqgynmzj...@forum.dlang.org

The problem is that getopt would need to a) throw an exception and b) return a valid GetoptResult so that the program can print help information when the exception is thrown. Obviously it can't do both, so we need to find some other solution. Maybe include the options array in the GetOptException?
--
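Until something like that exists, one possible workaround is sketched below: do not mark the option as `config.required`, so `getopt` always returns a `GetoptResult`, and check for the missing value by hand. This is an assumption-laden sketch, not the fix proposed in the issue; `parseArgs` and the `--file` option are hypothetical.

```d
import std.getopt : defaultGetoptPrinter, getopt;

/// Parses `args` for a --file/-f option (hypothetical helper and option name).
/// The option is deliberately NOT marked config.required, so getopt returns a
/// GetoptResult even when --file is missing, and help can still be printed.
bool parseArgs(string[] args, out string file)
{
    auto result = getopt(args, "file|f", "Input file (required)", &file);
    if (result.helpWanted || file.length == 0)
    {
        // Print the generated option list and report failure to the caller.
        defaultGetoptPrinter("Usage: prog --file <path>", result.options);
        return false;
    }
    return true;
}
```

The trade-off is that the "required" check is no longer declared next to the option, so it has to be kept in sync manually.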
[Issue 14524] New: Right clicking in solution explorer to add folders does not work as expected
https://issues.dlang.org/show_bug.cgi?id=14524

Issue ID: 14524
Summary: Right clicking in solution explorer to add folders does not work as expected
Product: D
Version: D2
Hardware: x86_64
OS: Windows
Status: NEW
Severity: normal
Priority: P1
Component: VisualD
Assignee: nob...@puremagic.com
Reporter: philip.daniels1...@gmail.com

I can right-click in the Solution Explorer to create a folder, but this appears to just be a "solution folder". No actual folder is created on disk, which means that the next step (creating a source file in that folder) fails.

The only way that seems to work is to create the folder in Visual D, flip to Windows Explorer and create the backing folder on disk, then flip back to Visual D to add a new file, being careful to specify "Create in" correctly at the bottom of the Add New Item dialog box.

Basically, it should work the way it does in C#:

- Right Click -> Add New Folder creates a new folder in Solution Explorer and on disk.
- Right Click Folder -> Add New Item opens the Add New Item dialog box with the Location synced to the folder.
--
[Issue 14497] Disassembly view
https://issues.dlang.org/show_bug.cgi?id=14497

Rainer Schuetze changed:

What|Removed |Added
CC  |        |r.sagita...@gmx.de

--- Comment #1 from Rainer Schuetze ---
Interesting idea. That would really be helpful for immediately seeing what the optimizer is able/unable to do.

I sometimes use the "Compile and Debug" command with unittests and the option "-main", though that still needs you to set a breakpoint somewhere and wait for the debugger to start.

Getting the disassembly of an object file with obj2asm/dumpbin is possible, but syncing it with the source requires reading the debug information. This can be rather annoying.
--
[Issue 14523] New: New Windows Application uses incorrect initialization/termination code
https://issues.dlang.org/show_bug.cgi?id=14523

Issue ID: 14523
Summary: New Windows Application uses incorrect initialization/termination code
Product: D
Version: D2
Hardware: x86_64
OS: Windows
Status: NEW
Severity: minor
Priority: P1
Component: VisualD
Assignee: nob...@puremagic.com
Reporter: philip.daniels1...@gmail.com

When you create a new Windows Application, it adds two lines to winmain.d:

    Runtime.initialize(&exceptionHandler);
    Runtime.terminate(&exceptionHandler);

These invocations are deprecated. The correct ones seem to be:

    Runtime.initialize();
    Runtime.terminate();

Ref: see http://wiki.dlang.org/D_for_Win32 and http://forum.dlang.org/thread/mailman.199.1389129967.15871.digitalmar...@puremagic.com
--
[Issue 14215] invalid import in core.sys.linux.stdio
https://issues.dlang.org/show_bug.cgi?id=14215

Joakim changed:

What      |Removed |Added
Status    |NEW     |RESOLVED
Resolution|---     |FIXED

--- Comment #1 from Joakim ---
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/68b9e9ce325d58812e73ca8f0cd268c05f651d2e
Remove references to importing core.stdc.stddef for size_t or ptrdiff_t

https://github.com/D-Programming-Language/druntime/commit/18d57ffe3eed8674ca2052656bb3f410084379f6
Merge pull request #1232 from joakim-noah/size_t

Fix 14215 - Unnecessary imports of core.stdc.stddef for size_t and ptrdiff_t
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #22 from Marc Schütz ---
(In reply to Vladimir Panteleev from comment #20)
> (In reply to Marc Schütz from comment #18)
> > Data with other (or unknown) encodings needs to be stored in `ubyte[]`.
>
> Have you tried using ubyte[] to process ASCII text? It's horrible, you have
> to cast at every step, and nothing in std.string works even when it should.

For ASCII text, char[] is okay, since UTF-8 is a superset of ASCII. But you're right about other encodings. That's why those need to be converted "at the border": to UTF-8 when read from a file or stdin, main() args, or environment variables, and from UTF-8 to whatever is needed on writing. Internally, text needs to be UTF-x encoded. This is the only sane way to handle different text encodings, IMO.
--
[Issue 14277] Compile-time array casting error - ugly error report
https://issues.dlang.org/show_bug.cgi?id=14277

--- Comment #4 from Ketmar Dark ---
This also ruins things like `typeof(smth).stringof[$-2..$] == "[]"`, for example, so it's unusable.
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #21 from Sobirari Muhomori ---
(In reply to Vladimir Panteleev from comment #16)
> > Global opt-in for foreach is not feasible.
>
> I agree - some libraries will expect one thing, and others another.

Libraries don't determine which data the program operates on; that depends on the program and its environment. An encoding mismatch also has large-scale consequences: the program crashes or corrupts data. Libraries don't decide how to behave in such cases; it's a property of the program as a whole. Since libraries can't decide how to behave in such cases, they shouldn't, and thus they can't have different expectations on this matter. It's a per-program aspect.
--
[Issue 14473] Remove deprecated HTML tags from ddoc output
https://issues.dlang.org/show_bug.cgi?id=14473

--- Comment #1 from Gary Willoughby ---
Discussion regarding this issue: http://forum.dlang.org/thread/fmgylnkatvuuoeosc...@forum.dlang.org
--
[Issue 12803] __traits(getFunctionAttributes) is not documented
https://issues.dlang.org/show_bug.cgi?id=12803

Gary Willoughby changed:

What    |Removed |Added
Keywords|        |pull
CC      |        |d...@nomad.so

--- Comment #1 from Gary Willoughby ---
https://github.com/D-Programming-Language/dlang.org/pull/984
--
[Issue 13440] Keyed array literal is not documented
https://issues.dlang.org/show_bug.cgi?id=13440

Gary Willoughby changed:

What|Removed |Added
CC  |        |d...@nomad.so

--- Comment #1 from Gary Willoughby ---
The documentation exists here: http://dlang.org/arrays.html#static-init-static
--
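For readers looking for the syntax, a small sketch of keyed initializers for static arrays, following the rule on that page: a keyed element sets the given index, an unkeyed element goes to the index after the previous one, and unlisted elements get the element type's `.init`. The variable names are hypothetical.

```d
/// a[1] is set explicitly; the unkeyed 3 follows at index 2; a[0] is int.init (0).
immutable int[3] keyed = [1: 2, 3];

/// Keys need not be in order; every index without a key is default-initialized.
immutable int[5] sparse = [4: 40, 1: 10];

unittest
{
    assert(keyed == [0, 2, 3]);
    assert(sparse == [0, 10, 0, 0, 40]);
}
```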
[Issue 14522] Postfix array declaration examples should be removed from arrays.html
https://issues.dlang.org/show_bug.cgi?id=14522

Vladimir Panteleev changed:

What|Removed |Added
CC  |        |thecybersha...@gmail.com

--- Comment #1 from Vladimir Panteleev ---
That code generates 6 warnings and 2 errors. Kill it with fire.
--
[Issue 14522] New: Postfix array declaration examples should be removed from arrays.html
https://issues.dlang.org/show_bug.cgi?id=14522

Issue ID: 14522
Summary: Postfix array declaration examples should be removed from arrays.html
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: websites
Assignee: nob...@puremagic.com
Reporter: d...@nomad.so

Postfix array declaration examples should be removed from the following page, as they are heavily discouraged:

http://dlang.org/arrays.html
--
[Issue 14328] The terms "lvalue" and "rvalue" should be added to the glossary
https://issues.dlang.org/show_bug.cgi?id=14328

Gary Willoughby changed:

What    |Removed |Added
Keywords|        |pull
CC      |        |d...@nomad.so

--- Comment #1 from Gary Willoughby ---
https://github.com/D-Programming-Language/dlang.org/pull/983
--
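As an illustration of what such glossary entries would describe, a minimal sketch: an lvalue has an address and can be modified, so it can bind to a `ref` parameter, while an rvalue (a literal or a temporary result) cannot. This assumes the default compiler behaviour without preview switches that let rvalues bind to `ref`; `increment` is a hypothetical function.

```d
/// Takes its argument by `ref`, so only lvalues can be passed (hypothetical example).
void increment(ref int x)
{
    ++x;
}

unittest
{
    int n = 1;
    increment(n); // n is an lvalue: it names addressable storage
    assert(n == 2);

    // 1 and n + 1 are rvalues: temporaries that cannot bind to `ref int`.
    static assert(!__traits(compiles, increment(1)));
    static assert(!__traits(compiles, increment(n + 1)));
}
```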
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #20 from Vladimir Panteleev ---
(In reply to Marc Schütz from comment #18)
> Data with other (or unknown) encodings needs to be stored in `ubyte[]`.

Have you tried using ubyte[] to process ASCII text? It's horrible, you have to cast at every step, and nothing in std.string works even when it should.
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #19 from Vladimir Panteleev ---
(In reply to Vladimir Panteleev from comment #16)
> (In reply to Walter Bright from comment #15)
> > It still allocates memory. But it's worth thinking about. Maybe assert()?
>
> Sure.

Wait, now I'm not sure. For some reason I was thinking of assert(false), which always stops execution. But continuing after encountering invalid UTF-8 in release mode might have bad outcomes as well. The problem is that it's impossible to achieve 100% coverage and make sure that all Unicode-handling code in your program also handles invalid UTF-8 in a good way. Thus, an invalid-UTF-8 handling problem might not be caught in testing but might cause an unpleasant situation in release mode (depending on what happens next, after the assert is NOT thrown).

I don't feel too strongly about this, though; I think programs that operate on important data shouldn't run with -release anyway.
--
[Issue 14521] New: Glossary page needs updating
https://issues.dlang.org/show_bug.cgi?id=14521

Issue ID: 14521
Summary: Glossary page needs updating
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P1
Component: websites
Assignee: nob...@puremagic.com
Reporter: d...@nomad.so

On the glossary page, not all of the item headings have anchors. Also, there is an 'input range' entry which lists the wrong interface. This should be removed and replaced with a general 'range' item, which would explain ranges in general and not just one type of range.
--
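For comparison with the glossary entry, a sketch of the input range interface as Phobos checks it: a type with `empty`, `front` and `popFront` satisfies `std.range.primitives.isInputRange`. The `CountDown` type is a hypothetical example.

```d
import std.range.primitives : isInputRange;

/// Minimal input range yielding n, n-1, ..., 1 (hypothetical example type).
struct CountDown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// empty, front and popFront are exactly what make a type an input range.
static assert(isInputRange!CountDown);
```

Forward, bidirectional and random-access ranges add `save`, `back`/`popBack` and indexing on top of this, which is why a general 'range' entry could introduce the whole family at once.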
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

Marc Schütz changed:

What|Removed |Added
CC  |        |schue...@gmx.net

--- Comment #18 from Marc Schütz ---
(In reply to Walter Bright from comment #15)
> If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D
> never are executed. But if A does not throw, then B.C.D guaranteed to be
> getting valid UTF, but they still pay the penalty of the compiler thinking
> they can allocate memory and throw.

When `assert()` is used, whatever cost there is will of course disappear with `-release`. And IMO asserting is the right thing to do. Quoting the spec [1]:

"char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 format. dchar[] strings are in UTF-32 format."

Note how it says "are in UTF-x format", not "should be". Therefore, a `string` not containing UTF-8 is by definition a bug. Data with other (or unknown) encodings needs to be stored in `ubyte[]`.

[1] http://dlang.org/arrays.html#strings
--
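A sketch of the "convert and validate at the border" approach argued for above: raw input is kept as `ubyte` data and turned into a `string` only after a single call to `std.utf.validate`, which throws `UTFException` on ill-formed UTF-8. The helper name `toValidatedString` is hypothetical, and this is one possible shape, not an API proposed in the thread.

```d
import std.utf : validate;

/// Converts raw input bytes to a string at the program boundary, validating
/// UTF-8 once so later code can assume well-formed text (hypothetical helper).
string toValidatedString(immutable(ubyte)[] raw)
{
    auto s = cast(string) raw;
    validate(s); // throws std.utf.UTFException on ill-formed UTF-8
    return s;
}
```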
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #17 from Vladimir Panteleev ---
Let's see if I understand the situation correctly... let's say we have a chain:

    str.a.b.c

So, str is a UTF-8 string, and a, b and c are range algorithms (they use .front/.popFront and provide .front/.popFront themselves).

If a/b/c don't throw anything themselves, the nothrow attribute will be inferred from the .front/.popFront of the range in front of them (the range they consume), right?

That means that if str.front can throw, c can't be nothrow. But if str.front is nothrow, then c CAN be nothrow.

But what if we do this:

    str.forceDecode.a.b.c

forceDecode doesn't use str.front: it reads str directly, code unit by code unit, and inserts replacement characters where it sees errors. This allows a, b and c to be nothrow.

Unless I'm wrong, I think this idea could work for opt-in replacement-character substitution. Following the 90/10 law, it should be easy to insert "forceDecode" in the few relevant places as indicated by a profiler.

Does this proposal make sense?
--
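The proposed forceDecode is hypothetical, but later Phobos versions provide an adaptor with similar behaviour, `std.utf.byDchar`, which decodes from code units and yields `replacementDchar` (U+FFFD) for ill-formed sequences instead of throwing. A small sketch, assuming that behaviour; `substitutions` is a hypothetical helper.

```d
import std.algorithm.searching : count;
import std.utf : byDchar, replacementDchar;

/// Decodes `s` with U+FFFD substitution (no UTFException is thrown) and counts
/// how many replacement characters were produced (hypothetical helper).
size_t substitutions(string s)
{
    return s.byDchar.count(replacementDchar);
}
```

For example, a lone continuation byte such as `"a\x80b"` yields one substitution, while well-formed text yields none.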
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #16 from Vladimir Panteleev ---
(In reply to Walter Bright from comment #15)
> It still allocates memory. But it's worth thinking about. Maybe assert()?

Sure.

> > I did not mean Unicode normalization - it was a joke (std.algorithm will
> > "normalize" invalid UTF characters to the replacement character). But since
> > .front on strings autodecodes, feeding a string to any generic range
> > function in std.algorithm will cause auto-decoding (and thus, character
> > substitution).
>
> That can be fixed as I suggested.

Sorry, I'm not following. Which suggestion here will fix what, and in what way?

> Global opt-in for foreach is not feasible.

I agree - some libraries will expect one thing, and others another.

> However, one can add an algorithm
> "validate" which throws on invalid UTF, and put that at the start of a
> pipeline, as in:
>
> text.validate.A.B.C.D;

This is part of a solution. There also needs to be a way to ensure that validate was called, which is the hard part.

> You brought up guessing the encoding of XML text by reading the start of it:
> "what if it was some 8-bit encoding that only LOOKED like valid UTF-8?"

No, that's not what I meant. UTF-8 and the old 8-bit encodings (ISO 8859-*, Windows-125*) both use the high bit of the byte for non-ASCII characters. Consider a program that expects a UTF-8 document but is actually fed one in an 8-bit encoding: it is possible (although unlikely) that text which is actually in an 8-bit encoding is successfully interpreted as a valid UTF-8 stream. Thus, invalid UTF-8 can indicate a problem with the entire document, and not just with the immediate sequence of bytes.

> If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D
> never are executed. But if A does not throw, then B.C.D guaranteed to be
> getting valid UTF, but they still pay the penalty of the compiler thinking
> they can allocate memory and throw.

OK, so you're saying that we can somehow automatically remove the cost of handling invalid UTF-8 if we know that the UTF-8 we're getting is valid? I don't see how this would work in practice, or how it would provide a noticeable benefit in practice. Since the cost of removing a code path is negligible, I assume you're talking about exception frames, but I still don't see how this applies. Could you elaborate, or is this improvement a theory for now?

Besides, won't A's output be a range of dchar, so B, C and D will not autodecode with or without this change?
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #15 from Walter Bright ---
(In reply to Vladimir Panteleev from comment #14)
> If I understand correctly, throwing Error instead of Exception will also
> solve the performance issues

It still allocates memory. But it's worth thinking about. Maybe assert()?

> Ditto, but the @nogc aspect can also be solved with the refcounted
> exceptions spec, which will fix the problem in general.

We'll see. That's still a ways off.

> > 2. Same thing. (Running normalization on passwords? What the hell?)
>
> I did not mean Unicode normalization - it was a joke (std.algorithm will
> "normalize" invalid UTF characters to the replacement character). But since
> .front on strings autodecodes, feeding a string to any generic range
> function in std.algorithm will cause auto-decoding (and thus, character
> substitution).

That can be fixed as I suggested.

> > The replacement char thing was not invented by me, it is commonplace as
> > users don't like their documents being wholly rejected for one or two bad
> > encodings.
>
> I know, I agree it's useful, but it needs to be opt-in.

Global opt-in for foreach is not feasible. However, one can add an algorithm "validate" which throws on invalid UTF, and put that at the start of a pipeline, as in:

    text.validate.A.B.C.D;

> > I know that many programs try to guess the encoding of random text they get.
> > Doing this by only reading a few characters, and assuming the rest, is a
> > strange method if one cares about the integrity of the data.
>
> I don't see how this is relevant, sorry.

You brought up guessing the encoding of XML text by reading the start of it: "what if it was some 8-bit encoding that only LOOKED like valid UTF-8?"

> > Having to constantly re-sanitize data, at every step in the pipeline, is
> > going to make D programs uncompetitive speed-wise.
>
> I don't understand what you mean by this. You could say that any way to
> handle invalid UTF can be seen as a way of sanitizing data: there will
> always be a code path for what to do when invalid UTF is encountered. I
> would interpret "no sanitization" as not handling invalid UTF in any way
> (i.e. treating it in an undefined way).

If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D are never executed. But if A does not throw, then B.C.D are guaranteed to be getting valid UTF, yet they still pay the penalty of the compiler thinking they can allocate memory and throw.
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #14 from Vladimir Panteleev ---
(In reply to Walter Bright from comment #13)
> Vladimir, you bring up good points. I'll try to address them. First off, why
> do this?
>
> 1. much faster

If I understand correctly, throwing Error instead of Exception will also solve the performance issues.

> 2. string processing can be @nogc and nothrow. If you follow external
> discussions on the merits of D, the "D is no good because Phobos requires
> the GC" ALWAYS comes up, and sucks all the energy out of the conversation.

Ditto, but the @nogc aspect can also be solved with the refcounted exceptions spec, which will fix the problem in general.

> So, on to your points:
>
> 1. Replacement only happens when doing a UTF decoding. S+R doesn't have to
> do conversion, and that's one of the things I want to fix in std.algorithm.
> The string fixes I've done in std.string avoid decoding as much as possible.

Inevitably, it is still very easy to accidentally use something that auto-decodes. There is no way to statically make sure that you don't (except for using a non-string type for text, which is impractical), and with this proposed change, there will be no run-time way to handle this either.

> 2. Same thing. (Running normalization on passwords? What the hell?)

I did not mean Unicode normalization - it was a joke (std.algorithm will "normalize" invalid UTF characters to the replacement character). But since .front on strings autodecodes, feeding a string to any generic range function in std.algorithm will cause auto-decoding (and thus, character substitution).

> The replacement char thing was not invented by me, it is commonplace as
> users don't like their documents being wholly rejected for one or two bad
> encodings.

I know, I agree it's useful, but it needs to be opt-in.

> I know that many programs try to guess the encoding of random text they get.
> Doing this by only reading a few characters, and assuming the rest, is a
> strange method if one cares about the integrity of the data.

I don't see how this is relevant, sorry.

> Having to constantly re-sanitize data, at every step in the pipeline, is
> going to make D programs uncompetitive speed-wise.

I don't understand what you mean by this. You could say that any way to handle invalid UTF can be seen as a way of sanitizing data: there will always be a code path for what to do when invalid UTF is encountered. I would interpret "no sanitization" as not handling invalid UTF in any way (i.e. treating it in an undefined way).
--
[Issue 14470] Reuse of object memory: new emplace overload
https://issues.dlang.org/show_bug.cgi?id=14470

Walter Bright changed:

What   |Removed     |Added
CC     |            |bugzi...@digitalmars.com
Version|unspecified |D2
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #13 from Walter Bright ---
Vladimir, you bring up good points. I'll try to address them. First off, why do this?

1. much faster
2. string processing can be @nogc and nothrow. If you follow external discussions on the merits of D, the "D is no good because Phobos requires the GC" ALWAYS comes up, and sucks all the energy out of the conversation.

So, on to your points:

1. Replacement only happens when doing a UTF decoding. S+R (search and replace) doesn't have to do conversion, and that's one of the things I want to fix in std.algorithm. The string fixes I've done in std.string avoid decoding as much as possible.
2. Same thing. (Running normalization on passwords? What the hell?)

The replacement char thing was not invented by me; it is commonplace, as users don't like their documents being wholly rejected for one or two bad encodings.

I know that many programs try to guess the encoding of random text they get. Doing this by only reading a few characters, and assuming the rest, is a strange method if one cares about the integrity of the data.

Having to constantly re-sanitize data, at every step in the pipeline, is going to make D programs uncompetitive speed-wise.
--
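To illustrate the point that search does not need decoding, a sketch that searches over raw UTF-8 code units via `std.utf.byCodeUnit`. Because nothing is decoded, ill-formed bytes elsewhere in the haystack are neither rejected nor replaced. `containsRaw` is a hypothetical helper, and this is not the std.algorithm change Walter refers to.

```d
import std.algorithm.searching : canFind;
import std.utf : byCodeUnit;

/// Substring test on raw UTF-8 code units: no decoding takes place, so
/// ill-formed bytes in `haystack` neither throw nor turn into U+FFFD.
bool containsRaw(string haystack, string needle)
{
    return haystack.byCodeUnit.canFind(needle.byCodeUnit);
}
```

A search and replace built on the same idea would copy the untouched bytes through verbatim, which avoids the document-corruption scenario discussed in comment #10.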
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #12 from Walter Bright ---
(In reply to bearophile_hugs from comment #8)
> Another solution is to deprecate foreach iteration on strings, and require
> something like "foreach(c; mystring.byCharThrowing)" and similar things.

That's not a solution, as I bet it breaks 50% of the programs out there.
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #11 from Walter Bright ---
https://github.com/D-Programming-Language/druntime/pull/1240
--
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #10 from Vladimir Panteleev ---
OK, I see from your post that you don't see many of the problems with the replacement character. Let me show you some example problematic situations:

1. Bob wants to update his company's documents to use the new name for his product. He writes a program that does a recursive pattern search & replace in a directory. After testing the program on a few sample files, he is satisfied with the results, and runs the program on his company's document store. Six months later, long after the documents went out of backup rotation, Sue finds that some important historical documents have been irreversibly corrupted and are full of Unicode replacement characters encoded as UTF-8. Why? Because these old documents did not use UTF-8, and Bob used D.

2. Bob is writing a secure server-side software package (let's say, a confidential document store). He is using a std.algorithm-based hashing algorithm to store the passwords securely. At some point, Mary signs up and creates a secure password, which contains entirely Cyrillic letters (let's say, "ЭтоМойПароль"). Not long after, Eve successfully logs into Mary's account with the password "". Why? Because the passwords just happened to be sent in some non-UTF-8 encoding, and, since Bob used D, when "normalized" through std.algorithm's replacement character substitution, all Unicode-only passwords of the same length have the same hash.

Automatic use of the replacement character will come as a surprise to many people who come from other languages. For example, in Delphi, strings are also the de facto ubyte[] / void[] type: you can safely read a binary file into a string, perform search and replace, and write it back, knowing that the result will be exactly what you expected.

Furthermore, from your message it appears to me that you've missed the point of my argument:

> What do you do if you read in an XML file and process half of it before you
> hit invalid Unicode?

You abort! This should not happen. Either the XML file is in an incorrect encoding (which calls into question the integrity of all the data parsed so far: what if it was some 8-bit encoding that only LOOKED like valid UTF-8?) or the program should've sanitized the input first if it really didn't care about data correctness. But this is an XML file, meaning it's very likely to be machine-generated; if it contains errors, that might indicate a problem somewhere else in the system, which is why it's all the more important to abort and get the user to figure out the true source of the problem. Ignoring the error here reminds me of how PHP never stops on errors by default, or Basic's "ON ERROR RESUME NEXT".

> So, throwing an Error is forcing everyone to validate the Unicode in their
> strings whether they care or not, and using the replacement character will
> work, whereas the programs that do care about validating their strings should
> be doing the validation up front anyway.

Yes, but then there is no way to make sure you're not accidentally corrupting data! Whereas now we at least have a runtime check against invalid UTF-8, after this change we will have no check at all. With no automatic mechanism to ensure that all text is sanitized before it gets into std.algorithm, it becomes impossible to be sure that you're not accidentally corrupting data along the way.
--
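The password scenario can be reproduced in miniature. The sketch below assumes `std.utf.byDchar` (decoding with U+FFFD substitution) stands in for "decoding through std.algorithm with replacement". The two byte strings are "Это" and "Пар" encoded in Windows-1251. They differ, yet both are ill-formed UTF-8 with the same structure, so they decode to the same run of replacement characters, and any hash computed over the decoded characters would collide. `collideAfterSubstitution` is a hypothetical helper.

```d
import std.algorithm.comparison : equal;
import std.utf : byDchar;

/// True if two different byte strings become indistinguishable once decoded
/// with U+FFFD substitution (hypothetical helper for this sketch).
bool collideAfterSubstitution(string a, string b)
{
    return a != b && a.byDchar.equal(b.byDchar);
}
```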
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519 --- Comment #9 from Jonathan M Davis --- Most string-based functions work perfectly well with invalid Unicode. Does find care? Does startsWith? Does filter? The replacement character simply won't match what you're looking for. The functions themselves don't care. The replacement character is just another character. They need a way to deal with invalid Unicode, and the replacement character deals with that beautifully. The concern is whether program input is valid - whether the user manages to type in invalid Unicode due to bad terminal settings, or whether you get junk off a socket, or whether a file has been corrupted. Anything that cares should be checking that when the data enters the program so that the error can be reported to whoever or wherever the data is coming from. Having it done via exceptions later on disconnects the reporting of the error from the point when it can actually be handled. What do you do if you read in an XML file and process half of it before you hit invalid Unicode? If the whole file was read into memory, then you may not even have any idea where that string came from, and it's likely far too late to report to the user that they're opening a corrupted file. That validation really needs to be done when the string enters the program - not at some arbitrary point later in the program when the invalid portion happens to be decoded. So, if you insist that all strings be validated, then maybe throwing an Error makes sense, but an Exception sure doesn't. And throwing an Error assumes that you always need to validate the Unicode in strings, which definitely isn't the case when the replacement character is used. So, throwing an Error is forcing everyone to validate the Unicode in their strings whether they care or not, and using the replacement character will work, whereas the programs that do care about validating their strings should be doing the validation up front anyway.
So, given that the code that cares about validation needs to be validating up front and therefore doesn't care about the replacement character being used later and that programs that don't care about validating their Unicode input will work just fine with the replacement character, it seems to me that it makes perfect sense to just use the replacement character rather than throwing. --
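The "validate up front, tolerate replacement later" policy described above might look like the following minimal D sketch. It assumes Phobos's std.utf.validate at the input boundary and std.utf.byDchar for code that skipped validation; acceptInput is a hypothetical name used only for illustration.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar, validate, UTFException;

// Hypothetical helper: validate once, where data enters the program, so the
// error can be reported to whoever (or whatever) supplied the data.
bool acceptInput(const(char)[] data, out string error)
{
    try
    {
        validate(data); // throws UTFException on ill-formed UTF-8
        return true;
    }
    catch (UTFException e)
    {
        error = e.msg;
        return false;
    }
}

void main()
{
    immutable ubyte[] raw = [0x61, 0x63, 0xFF]; // "ac" followed by a byte never valid in UTF-8
    auto text = cast(const(char)[]) raw;

    string err;
    if (!acceptInput(text, err))
        writeln("rejected at the boundary: ", err);

    // Code that skipped validation still runs on the lenient view:
    // the U+FFFD standing in for 0xFF simply never matches 'x'.
    assert(text.byDchar.canFind('c'));
    assert(!text.byDchar.canFind('x'));
    assert(text.byDchar.canFind(replacementDchar));
}
```

Note that this sketch does not address the opposing concern in comment #10: nothing forces a program to call a boundary check like acceptInput before decoding.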
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519 bearophile_h...@eml.cc changed: What|Removed |Added CC||bearophile_h...@eml.cc --- Comment #8 from bearophile_h...@eml.cc --- (In reply to Walter Bright from comment #0) > Changing foreach to return replacementDchar on invalid UTF encodings fixes > these problems, and makes it possible to do faster loops. Another solution is to deprecate foreach iteration on strings, and require something like "foreach(c; mystring.byCharThrowing)" and similar things. --
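The byCharThrowing idea amounts to making the failure policy explicit at each call site. byCharThrowing is only a proposed name; the D sketch below instead contrasts two hypothetical helpers built on existing Phobos functions: strictDecode (via std.utf.decode, which throws UTFException) and replacingDecode (via std.utf.byDchar, which substitutes U+FFFD).

```d
import std.array : array;
import std.exception : assertThrown;
import std.utf : byDchar, decode, UTFException;

// Strict policy (hypothetical helper): throws at the first ill-formed sequence.
dchar[] strictDecode(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i); // decode advances i past the sequence it read
    return result;
}

// Lenient policy (hypothetical helper): ill-formed sequences become U+FFFD.
dchar[] replacingDecode(const(char)[] s)
{
    return s.byDchar.array;
}

void main()
{
    immutable ubyte[] raw = [0x6F, 0x6B, 0xFF]; // "ok" followed by an invalid byte
    auto s = cast(const(char)[]) raw;

    assertThrown!UTFException(strictDecode(s));
    assert(replacingDecode(s) == "ok\uFFFD"d);
}
```

With both spellings available, a reader of the loop can see which behavior was chosen instead of relying on whatever plain foreach happens to do.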
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519 --- Comment #7 from Vladimir Panteleev --- (In reply to Jonathan M Davis from comment #6) > Yikes. That is far worse than throwing Exceptions, since it would kill your > program, and it's indicative of a bug in the program rather than bad input. Yes. The bug is that the string should've been sanitized. > But most programs just don't care about how valid the > Unicode is, Maybe most programs YOU write. > and the fact that throwing is how it's handled is incredibly > annoying. I can see how it can be annoying - when you don't care about your data. > It forces validation on all programs whether they need it or not, > and it makes it so that string-based code can pretty much never be nothrow. Throwing errors is allowed in nothrow code. > Using the replacement character in the stead of invalid unicode is exactly > what it was created for in the first place. Yes, in circumstances when you don't care about the "invalid" data, which should always be opt-in. --
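The point that Errors, unlike Exceptions, may escape nothrow code can be checked with a small D sketch of the "throw Errors on invalid UTF-8" counter-proposal. decodeOrDie is a hypothetical helper, not existing Phobos behavior: it wraps the throwing std.utf.decode, converts any Exception into an Error, and still compiles as nothrow.

```d
import std.exception : assertThrown;
import std.utf : decode;

// Hypothetical helper: ill-formed UTF-8 is treated as a bug (unsanitized
// input), not as a recoverable condition, so the function can be nothrow.
dchar[] decodeOrDie(const(char)[] s) nothrow
{
    dchar[] result;
    try
    {
        size_t i = 0;
        while (i < s.length)
            result ~= decode(s, i);
    }
    catch (Exception)
    {
        // nothrow forbids escaping Exceptions, but Errors may still propagate.
        throw new Error("ill-formed UTF-8: input should have been sanitized");
    }
    return result;
}

void main()
{
    assert(decodeOrDie("Это") == "Это"d);

    immutable ubyte[] raw = [0xCC, 0xEE, 0xE9]; // "Мой" in Windows-1251
    assertThrown!Error(decodeOrDie(cast(const(char)[]) raw));
}
```

Under this design, the workaround for callers that expect junk input is to sanitize it before it reaches decodeOrDie.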
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519 --- Comment #6 from Jonathan M Davis --- (In reply to Vladimir Panteleev from comment #4) > Here's a counter-proposal: when encountering invalid UTF-8, instead of > throwing exceptions, throw errors. This will fix the nothrow and performance > problems, and will avoid the risk of data corruption. Yikes. That is far worse than throwing Exceptions, since it would kill your program, and it's indicative of a bug in the program rather than bad input. > The workaround is to > pre-sanitize the input. The impact of breaking existing code is the same as > the original proposal. Pre-sanitizing input is exactly what should be done if you care about Unicode validation. You validate any strings entering the program from a file, a socket, or from user input, and then you know that you're operating on valid Unicode. But most programs just don't care about how valid the Unicode is, and the fact that throwing is how it's handled is incredibly annoying. It forces validation on all programs whether they need it or not, and it makes it so that string-based code can pretty much never be nothrow. Using the replacement character in place of invalid Unicode is exactly what it was created for in the first place. --
[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing
https://issues.dlang.org/show_bug.cgi?id=14519 --- Comment #5 from Sobirari Muhomori --- Or provide a global override similar to assertHandler. --