date:20150429

[Issue 14529] New: Bug in Regex insensitive match

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14529

  Issue ID: 14529
   Summary: Bug in Regex insensitive match
   Product: D
   Version: D2
  Hardware: x86_64
        OS: Linux
    Status: NEW
  Severity: major
  Priority: P1
 Component: Phobos
  Assignee: nob...@puremagic.com
  Reporter: kasamia.o.kasa...@gmail.com

The following code describes the problem:

import std.stdio;
import std.regex;

void main() {
  auto ctr = ctRegex!(r"^[CF]$", "i");
  foreach(line; stdin.byLine) {
    foreach(m; line.matchAll(ctr)) {
      writeln("match: ", m.hit);
    }
  }
}

--

the simple regex should match: C, c, F, f
but only C, c, F will match.

and if you switch the order inside the char class: [FC]
only F, f, C are matched, but not c

It seems like there's something wrong with the last char that should match.
The same problem happens when using regex obj too.
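
A minimal sketch of a regression check for the behaviour described above, using the run-time regex() form of the same pattern. The helper name matchesCF is hypothetical. The assertions state the expected result; on a Phobos affected by this issue, the "f" case would presumably fail.

```d
import std.regex : matchFirst, regex;

// Hypothetical helper: true when `input` is exactly one of C, c, F, f.
bool matchesCF(const(char)[] input)
{
    // Run-time equivalent of the report's ctRegex!(r"^[CF]$", "i").
    auto re = regex(r"^[CF]$", "i");
    return !matchFirst(input, re).empty;
}

unittest
{
    // Expected behaviour: every member of the class matches in both cases.
    foreach (s; ["C", "c", "F", "f"])
        assert(matchesCF(s), s);
    assert(!matchesCF("x"));
}
```

Checking each letter separately makes it visible which member of the character class is affected when the order inside the class changes.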

--

[Issue 14528] New: GIT HEAD: can't pass protected member to template by alias

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14528

  Issue ID: 14528
   Summary: GIT HEAD: can't pass protected member to template by alias
   Product: D
   Version: D2
  Hardware: All
        OS: All
    Status: NEW
  Keywords: rejects-valid
  Severity: regression
  Priority: P1
 Component: DMD
  Assignee: nob...@puremagic.com
  Reporter: thecybersha...@gmail.com

This regression is an exacerbation of issue 13744 for protected members.

// f.d
void tpl(alias a)()
{
    a();
}
// c.d
import f;

class C
{
    protected static void m() {}

    void fun()
    {
        tpl!m();
    }
}


Introduced in https://github.com/D-Programming-Language/dmd/pull/4558

--

[Issue 13433] Request: Clock.currTime option to use CLOCK_REALTIME_COARSE / CLOCK_REALTIME_FAST

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=13433

--- Comment #14 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/8e29e0621b074a8d368b4d7d344281adb7a91e54
Add ClockType enum to core.time for issue# 13433.

This adds an enum for indicating which type of clock to use when it's
appropriate for a time function to have multiple options for the source
clock. In the case of MonoTime, to make that work cleanly, the
implementation of MonoTime has become MonoTimeImpl, templated on
ClockType, and MonoTime has become an alias to
MonoTimeImpl!(ClockType.normal). In the case of SysTime (in a separate
PR), that will be a default template argument to Clock.currTime and SysTime
will be unaffected (because in MonoTime's case, the clock that it came
from is integral to the type, whereas in SysTime's case, it doesn't
matter after the SysTime has been initialized).
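
A small sketch of how the templated type described in this commit message might be used from user code. MonoTimeImpl, ClockType and MonoTime are the names given above; the alias CoarseTime and the helper coarseElapsed are hypothetical names chosen for illustration.

```d
import core.time : ClockType, Duration, MonoTime, MonoTimeImpl;

// The commit message states that MonoTime is the normal-clock instantiation.
static assert(is(MonoTime == MonoTimeImpl!(ClockType.normal)));

// Hypothetical alias for the coarse-clock instantiation.
alias CoarseTime = MonoTimeImpl!(ClockType.coarse);

// Hypothetical helper: time elapsed since `start`, measured on the coarse clock.
Duration coarseElapsed(CoarseTime start)
{
    return CoarseTime.currTime - start;
}

unittest
{
    auto start = CoarseTime.currTime;
    // A monotonic clock should never run backwards.
    assert(coarseElapsed(start) >= Duration.zero);
}
```

Because the clock is part of the type, a CoarseTime and a MonoTime are distinct types and cannot be subtracted from each other, which matches the reasoning given for templating MonoTime but not SysTime.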

https://github.com/D-Programming-Language/druntime/commit/bcfc36b3ca5a229c751c972c607fee57d4febcb2
Merge pull request #990 from jmdavis/13433

Add ClockType enum to core.time for issue# 13433.

--

[Issue 14527] New: [Enh] Instrument calls to operator new with -profilenew compiler switch

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14527

  Issue ID: 14527
   Summary: [Enh] Instrument calls to operator new with -profilenew compiler switch
   Product: D
   Version: D2
  Hardware: All
        OS: All
    Status: NEW
  Severity: enhancement
  Priority: P1
 Component: DMD
  Assignee: nob...@puremagic.com
  Reporter: bugzi...@digitalmars.com

Throwing the -profilenew switch to the compiler will cause file, line, and
function data to be added to the call. druntime's default behavior with this
will be to report every location that allocates memory and how much memory. The
user will be able to provide their own logging capability by overriding the
default functions in druntime.

An initial implementation:

https://github.com/D-Programming-Language/dmd/pull/4621

--

[Issue 13867] Overriding a method from an extern(C++) interface requires extern(C++) on the method definition

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=13867

nick  changed:

   What|Removed |Added

   Severity|enhancement |normal

--

[Issue 12803] __traits(getFunctionAttributes) is not documented

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=12803

github-bugzi...@puremagic.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--

[Issue 12803] __traits(getFunctionAttributes) is not documented

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=12803

--- Comment #2 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/D-Programming-Language/dlang.org

https://github.com/D-Programming-Language/dlang.org/commit/884f46101ea0cdb611cc3c5c43a1961202818980
Fix issue 12803

https://github.com/D-Programming-Language/dlang.org/commit/6be29ea15f5666361c3974bc014d1fad8f19d28a
Merge pull request #984 from nomad-software/issue_12803

Issue 12803 - __traits(getFunctionAttributes) is not documented

--

[Issue 13374] Wrong template overload resolution when passing function to alias/string parameter

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=13374

Vladimir Panteleev  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/show_bug.cgi?id=14520

--

[Issue 14520] [REG2.067.0] string/alias template overload

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14520

Vladimir Panteleev  changed:

   What|Removed |Added

   Keywords||rejects-valid

--

[Issue 14520] [REG2.067.0] string/alias template overload

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14520

Vladimir Panteleev  changed:

   What|Removed |Added

   See Also||https://issues.dlang.org/show_bug.cgi?id=13374

--- Comment #1 from Vladimir Panteleev  ---
This bug is the reincarnation of issue 13374 (the reduced code is different,
but the original code was broken once again).

The full timeline:

- v2.060  : works
- v2.061  : broken (https://github.com/D-Programming-Language/dmd/pull/599)
- v2.066.1: fixed  (https://github.com/D-Programming-Language/dmd/pull/3897)
- v2.067.0: broken (https://github.com/D-Programming-Language/dmd/pull/4375)

--

[Issue 14497] Disassembly view

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14497

--- Comment #2 from Manu  ---
Yeah, requiring that the program link is annoying, and if the program is big
(mine are), then the build times can get long, and iteration is slow.

Short of source, at the very least, there need to be symbol names at the head of
blocks of code. It must be easier to populate the assembly with symbol name
headers than full source?
As long as you can identify the start and end of the function you're interested
in, that will give an 80% solution satisfying the majority of simple cases.

Do the GNU tools make this easier? I imagine there must be tools in the
GCC/Clang (GDC/LDC?) suite that do the full job? It might be easier to start
there? It would also be useful in that you could disassemble non-x86 architectures too.

--

[Issue 14526] New: GetOptException DDOC needs cleanup

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14526

  Issue ID: 14526
   Summary: GetOptException DDOC needs cleanup
   Product: D
   Version: D2
  Hardware: All
       URL: http://dlang.org/phobos/std_getopt.html#.GetOptException
        OS: All
    Status: NEW
  Keywords: ddoc
  Severity: trivial
  Priority: P1
 Component: Phobos
  Assignee: nob...@puremagic.com
  Reporter: briancsch...@gmail.com

http://dlang.org/phobos/std_getopt.html#.GetOptException

It needs $(UL) and $(LI) macros. It also needs to list the other conditions
under which it is thrown.

--

[Issue 14525] New: Cannot access help information from getopt if a required parameter is not given

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14525

  Issue ID: 14525
   Summary: Cannot access help information from getopt if a required parameter is not given
   Product: D
   Version: D2
  Hardware: All
        OS: All
    Status: NEW
  Severity: normal
  Priority: P1
 Component: Phobos
  Assignee: nob...@puremagic.com
  Reporter: briancsch...@gmail.com

http://forum.dlang.org/thread/tjraqgbvwsqgynmzj...@forum.dlang.org

The problem is that getopt needs to a) throw an exception and b) return a valid
GetoptResult so that the program can print help information when the exception
is thrown. Obviously, this isn't possible, so we need to find some other
solution.

Maybe including the options array in the GetOptException?
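
Until getopt itself offers something better, one possible user-side workaround is to look for the help flags before calling getopt at all. The sketch below is only an illustration of that idea: parseArgs and Outcome are hypothetical names, and the pre-scan is deliberately naive (it would also react to "--help" appearing after a "--" separator).

```d
import std.algorithm.searching : canFind;
import std.getopt : config, getopt, GetOptException;

enum Outcome { help, ok, error }

// Hypothetical helper, not part of std.getopt.
Outcome parseArgs(string[] args, out string input)
{
    // Pre-scan for the help flags so that a missing required option
    // cannot prevent the caller from showing help.
    if (args.canFind("--help") || args.canFind("-h"))
        return Outcome.help;

    try
    {
        getopt(args, config.required, "input", "Input file (required)", &input);
        return Outcome.ok;
    }
    catch (GetOptException)
    {
        // Raised, for example, when the required option is missing.
        return Outcome.error;
    }
}
```

The caller can then print its own usage text for Outcome.help. This does not give access to the Option array that defaultGetoptPrinter expects, which is the gap the report describes.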

--

[Issue 14524] New: Right clicking in solution explorer to add folders does not work as expected

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14524

  Issue ID: 14524
   Summary: Right clicking in solution explorer to add folders does not work as expected
   Product: D
   Version: D2
  Hardware: x86_64
        OS: Windows
    Status: NEW
  Severity: normal
  Priority: P1
 Component: VisualD
  Assignee: nob...@puremagic.com
  Reporter: philip.daniels1...@gmail.com

I can right click in the solution explorer to create a folder, but this appears
to just be a "solution folder". No actual folder is created on disk, which
means that the next step - creating a source file in that folder - fails.

The only way that works seems to be to create the folder in Visual D, flip to
Windows explorer and create the backing folder on disk, then flip back to
Visual D to add a new file, being careful to specify the "Create in" correctly
at the bottom of the Add New Item dialog box.

Basically, it should work the way it does in C#:

Right Click -> Add New Folder creates a new folder in solution explorer and on
disk.
Right Click Folder -> Add New Item opens the Add New Item dialog box with the
Location synced to the folder.

--

[Issue 14497] Disassembly view

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14497

Rainer Schuetze  changed:

   What|Removed |Added

 CC||r.sagita...@gmx.de

--- Comment #1 from Rainer Schuetze  ---
Interesting idea. That would really be helpful for immediately seeing what the
optimizer is able/unable to do.

I use the "Compile and Debug" command sometimes with unittests and option
"-main", though that still needs you to set a breakpoint somewhere and wait for
the debugger to start.

Getting the disassembly of an object file with obj2asm/dumpbin is possible, but
syncing with the source needs to read debug information. This can be rather
annoying.

--

[Issue 14523] New: New Windows Application uses incorrect initialization/termination code

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14523

  Issue ID: 14523
   Summary: New Windows Application uses incorrect initialization/termination code
   Product: D
   Version: D2
  Hardware: x86_64
        OS: Windows
    Status: NEW
  Severity: minor
  Priority: P1
 Component: VisualD
  Assignee: nob...@puremagic.com
  Reporter: philip.daniels1...@gmail.com

When you create a new Windows Application it adds two lines to winmain.d

Runtime.initialize(&exceptionHandler);
Runtime.terminate(&exceptionHandler);

These invocations are deprecated. The correct ones seem to be

Runtime.initialize();
Runtime.terminate();


Ref: See http://wiki.dlang.org/D_for_Win32
and
http://forum.dlang.org/thread/mailman.199.1389129967.15871.digitalmar...@puremagic.com

--

[Issue 14215] invalid import in core.sys.linux.stdio

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14215

Joakim  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Joakim  ---
Commits pushed to master at https://github.com/D-Programming-Language/druntime

https://github.com/D-Programming-Language/druntime/commit/68b9e9ce325d58812e73ca8f0cd268c05f651d2e
Remove references to importing core.stdc.stddef for size_t or ptrdiff_t

https://github.com/D-Programming-Language/druntime/commit/18d57ffe3eed8674ca2052656bb3f410084379f6
Merge pull request #1232 from joakim-noah/size_t

Fix 14215 - Unnecessary imports of core.stdc.stddef for size_t and ptrdiff_t

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #22 from Marc Schütz  ---
(In reply to Vladimir Panteleev from comment #20)
> (In reply to Marc Schütz from comment #18)
> > Data with other (or unknown) encodings needs to be stored in `ubyte[]`.
> 
> Have you tried using ubyte[] to process ASCII text? It's horrible, you have
> to cast at every step, and nothing in std.string works even when it should.

For ASCII text, char[] is okay, UTF8 is a superset of ASCII.

But you're right for other encodings. That's why those need to be converted "at
the border": To UTF8 when read from a file or stdin, main() args, env vars, and
from UTF8 to whatever on writing. Internally, they need to be UTFx encoded.
This is the only sane way to handle different text encodings, IMO.

--

[Issue 14277] Compile-time array casting error - ugly error report

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14277

--- Comment #4 from Ketmar Dark  ---
this also ruins things like `typeof(smth).stringof[$-2..$] == "[]"` for
example. so it's unusable.
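
For the specific check quoted above, a type-based test avoids depending on how `.stringof` is formatted. This is only a sketch of that alternative and does not address the issue itself; isDynArray is a hypothetical name.

```d
// Hypothetical trait: true when T is a dynamic array type such as int[].
enum isDynArray(T) = is(T == E[], E);

static assert(isDynArray!(int[]));
static assert(isDynArray!string);      // string is immutable(char)[]
static assert(!isDynArray!int);
static assert(!isDynArray!(int[3]));   // static arrays do not match E[]
```

With a value `smth`, the check from the comment would read `isDynArray!(typeof(smth))`.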

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #21 from Sobirari Muhomori  ---
(In reply to Vladimir Panteleev from comment #16)
> > Global opt-in for foreach is not feasible.
> 
> I agree - some libraries will expect one thing, and others another.

Libraries don't determine which data the program operates on; that depends on
the program and its environment. An encoding mismatch has large-scale
consequences too: the program crashes or corrupts data. Libraries don't decide
how to behave in such cases; it's a property of the program as a whole. Since
they can't decide how to behave in such cases, they shouldn't, and thus can't
have different expectations on this matter: it's a per-program aspect.

--

[Issue 14473] Remove deprecated HTML tags from ddoc output

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14473

--- Comment #1 from Gary Willoughby  ---
Discussion regarding this issue:
http://forum.dlang.org/thread/fmgylnkatvuuoeosc...@forum.dlang.org

--

[Issue 12803] __traits(getFunctionAttributes) is not documented

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=12803

Gary Willoughby  changed:

   What|Removed |Added

   Keywords||pull
 CC||d...@nomad.so

--- Comment #1 from Gary Willoughby  ---
https://github.com/D-Programming-Language/dlang.org/pull/984

--

[Issue 13440] Keyed array literal is not documented

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=13440

Gary Willoughby  changed:

   What|Removed |Added

 CC||d...@nomad.so

--- Comment #1 from Gary Willoughby  ---
The documentation exists here:

http://dlang.org/arrays.html#static-init-static

--

[Issue 14522] Postfix array declaration examples should be removed from arrays.html

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14522

Vladimir Panteleev  changed:

   What|Removed |Added

 CC||thecybersha...@gmail.com

--- Comment #1 from Vladimir Panteleev  ---
That code generates 6 warnings and 2 errors.

Kill it with fire.

--

[Issue 14522] New: Postfix array declaration examples should be removed from arrays.html

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14522

  Issue ID: 14522
   Summary: Postfix array declaration examples should be removed from arrays.html
   Product: D
   Version: D2
  Hardware: All
        OS: All
    Status: NEW
  Severity: enhancement
  Priority: P1
 Component: websites
  Assignee: nob...@puremagic.com
  Reporter: d...@nomad.so

Postfix array declaration examples should be removed from the following page as
these are heavily discouraged.

http://dlang.org/arrays.html

--

[Issue 14328] The terms "lvalue" and "rvalue" should be added to the glossary

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14328

Gary Willoughby  changed:

   What|Removed |Added

   Keywords||pull
 CC||d...@nomad.so

--- Comment #1 from Gary Willoughby  ---
https://github.com/D-Programming-Language/dlang.org/pull/983

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #20 from Vladimir Panteleev  ---
(In reply to Marc Schütz from comment #18)
> Data with other (or unknown) encodings needs to be stored in `ubyte[]`.

Have you tried using ubyte[] to process ASCII text? It's horrible, you have to
cast at every step, and nothing in std.string works even when it should.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #19 from Vladimir Panteleev  ---
(In reply to Vladimir Panteleev from comment #16)
> (In reply to Walter Bright from comment #15)
> > It still allocates memory. But it's worth thinking about. Maybe assert()?
> 
> Sure.

Wait, now I'm not sure. For some reason I was thinking of assert(false) which
will always stop execution. But continuing upon encountering invalid UTF-8 in
release mode might result in bad outcomes as well.

The problem is that it's impossible to achieve 100% coverage and make sure that
all Unicode-handling code in your program also handles invalid UTF-8 in a good
way. Thus, an invalid UTF-8 handling problem might not be caught in testing but
might cause an unpleasant situation in release mode (depending on what happens
next after the assert is NOT thrown).

I don't feel too strongly about this though, I think programs that operate on
important data shouldn't run with -release anyway.

--

[Issue 14521] New: Glossary page needs updating

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14521

  Issue ID: 14521
   Summary: Glossary page needs updating
   Product: D
   Version: D2
  Hardware: All
        OS: All
    Status: NEW
  Severity: enhancement
  Priority: P1
 Component: websites
  Assignee: nob...@puremagic.com
  Reporter: d...@nomad.so

On the glossary page not all of the item headings have anchors.

Also there is an 'input range' entry which lists the wrong interface. This
should be removed and replaced with a general 'range' item. This would explain
ranges in general and not just one type of range.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

Marc Schütz  changed:

   What|Removed |Added

 CC||schue...@gmx.net

--- Comment #18 from Marc Schütz  ---
(In reply to Walter Bright from comment #15)
> If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D
> never are executed. But if A does not throw, then B.C.D guaranteed to be
> getting valid UTF, but they still pay the penalty of the compiler thinking
> they can allocate memory and throw.

When `assert()` is used, whatever cost there is will of course disappear with
`-release`.

And IMO asserting is the right thing to do. Quoting the spec [1]:

"char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 format.
dchar[] strings are in UTF-32 format."

Note how it says "are in UTF-x format", not "should be". Therefore, a `string`
not containing UTF8 is by definition a bug.

Data with other (or unknown) encodings needs to be stored in `ubyte[]`.

[1] http://dlang.org/arrays.html#strings

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #17 from Vladimir Panteleev  ---
Let's see if I understand the situation correctly... let's say we have a chain:

str.a.b.c

So, str is a UTF-8 string, and a, b and c are range algorithms (they use
.front/.popFront and provide .front/.popFront themselves).

If a/b/c don't throw anything themselves, the nothrow attribute will be
inferred from the .front/.popFront of the range in front of them (the range
they consume), right?

That means that if str.front can throw, c can't be nothrow. But if str.front is
nothrow, then c CAN be nothrow.

But what if we do this:

str.forceDecode.a.b.c

forceDecode doesn't use str.front - it reads the str directly, code unit by
code unit, and inserts replacement characters where it sees error. This allows
a, b and c to be nothrow.

Unless I'm wrong, I think this idea could work for opt-in replacement character
substitution. Following the 90/10 law, it should be easy to insert
"forceDecode" in the few relevant places as indicated by a profiler.

Does this proposal make sense?
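
To make the idea concrete, here is a sketch that uses std.utf.byDchar as a stand-in for the proposed forceDecode. That substitution is an assumption: byDchar is an existing Phobos range that replaces invalid sequences with replacementDchar, while "forceDecode" is only the name used in this comment. The helper countReplacements is hypothetical.

```d
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: counts the code points that had to be replaced.
// The function is marked nothrow @nogc to show that nothing in the
// decoding path can throw or allocate.
size_t countReplacements(const(char)[] s) nothrow @nogc
{
    size_t n;
    foreach (dchar c; s.byDchar)
    {
        if (c == replacementDchar)
            ++n;
    }
    return n;
}

unittest
{
    char[] bad = ['a', cast(char) 0xFF, 'b']; // 0xFF is never valid in UTF-8
    assert(countReplacements(bad) == 1);
    assert(countReplacements("abc") == 0);
}
```

Range algorithms chained after such a decoder only ever see dchar values, so their own nothrow inference no longer depends on the throwing str.front.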

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #16 from Vladimir Panteleev  ---
(In reply to Walter Bright from comment #15)
> It still allocates memory. But it's worth thinking about. Maybe assert()?

Sure.

> > I did not mean Unicode normalization - it was a joke (std.algorithm will
> > "normalize" invalid UTF characters to the replacement character). But since
> > .front on strings autodecodes, feeding a string to any generic range
> > function in std.algorithm will cause auto-decoding (and thus, character
> > substitution).
> 
> That can be fixed as I suggested.

Sorry, I'm not following. Which suggestion here will fix what in what way?

> Global opt-in for foreach is not feasible.

I agree - some libraries will expect one thing, and others another.

> However, one can add an algorithm
> "validate" which throws on invalid UTF, and put that at the start of a
> pipeline, as in:
> 
> text.validate.A.B.C.D;

This is part of a solution. There also needs to be a way to ensure that
validate was called, which is the hard part.

> You brought up guessing the encoding of XML text by reading the start of it:
> "what if it was some 8-bit encoding that only LOOKED like valid UTF-8?"

No, that's not what I meant.

UTF-8 and old 8-bit encodings (ISO 8859-*, Windows-125*) both use the high bit
in the byte to indicate Unicode. Consider a program that expects an UTF-8
document, but is actually fed one in an 8-bit encoding: it is possible
(although unlikely) that text that is actually in an 8-bit encoding may be
successfully interpreted as a valid UTF-8 stream. Thus, invalid UTF-8 can
indicate a problem with the entire document, and not just the immediate
sequence of bytes.

> If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D
> never are executed. But if A does not throw, then B.C.D guaranteed to be
> getting valid UTF, but they still pay the penalty of the compiler thinking
> they can allocate memory and throw.

OK, so you're saying that we can somehow automatically remove the cost of
handling invalid UTF-8 if we know that the UTF-8 we're getting is valid? I
don't see how this would work in practice, or how it would provide a noticeable
benefit in practice. Since the cost of removing a code path is negligible, I
assume you're talking about exception frames, but I still don't see how this
applies. Could you elaborate, or is this improvement a theory for now?

Besides, won't A's output be a range of dchar, so B, C and D will not
autodecode with or without this change?

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #15 from Walter Bright  ---
(In reply to Vladimir Panteleev from comment #14)
> If I understand correctly, throwing Error instead of Exception will also
> solve the performance issues

It still allocates memory. But it's worth thinking about. Maybe assert()?


> Ditto, but the @nogc aspect can also be solved with the refcounted
> exceptions spec, which will fix the problem in general.

We'll see. That's still a ways off.


> > 2. Same thing. (Running normalization on passwords? What the hell?)
> 
> I did not mean Unicode normalization - it was a joke (std.algorithm will
> "normalize" invalid UTF characters to the replacement character). But since
> .front on strings autodecodes, feeding a string to any generic range
> function in std.algorithm will cause auto-decoding (and thus, character
> substitution).

That can be fixed as I suggested.


> > The replacement char thing was not invented by me, it is commonplace as
> > users don't like their documents being wholly rejected for one or two bad
> > encodings.
> I know, I agree it's useful, but it needs to be opt-in.

Global opt-in for foreach is not feasible. However, one can add an algorithm
"validate" which throws on invalid UTF, and put that at the start of a
pipeline, as in:

text.validate.A.B.C.D;
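
A rough sketch of the check that would sit at the front of such a pipeline. It uses the existing eager function std.utf.validate as a stand-in, which is an assumption: the "validate" proposed here would be a range algorithm, and the wrapper isValidUtf8 is a hypothetical name.

```d
import std.utf : UTFException, validate;

// Hypothetical wrapper: true when `text` is well-formed UTF-8.
bool isValidUtf8(const(char)[] text)
{
    try
    {
        validate(text); // throws UTFException on the first invalid sequence
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    assert(isValidUtf8("héllo"));
    char[] bad = ['a', cast(char) 0xFF]; // 0xFF is never valid in UTF-8
    assert(!isValidUtf8(bad));
}
```

Once the input has passed this check, later stages can work on code units or use a non-throwing decoder without changing the result.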


> > I know that many programs try to guess the encoding of random text they get.
> > Doing this by only reading a few characters, and assuming the rest, is a
> > strange method if one cares about the integrity of the data.
> 
> I don't see how this is relevant, sorry.

You brought up guessing the encoding of XML text by reading the start of it:
"what if it was some 8-bit encoding that only LOOKED like valid UTF-8?"

> > Having to constantly re-sanitize data, at every step in the pipeline, is
> > going to make D programs uncompetitive speed-wise.
> 
> I don't understand what you mean by this. You could say that any way to
> handle invalid UTF can be seen as a way of sanitizing data: there will
> always be a code path for what to do when invalid UTF is encountered. I
> would interpret "no sanitization" as not handling invalid UTF in any way
> (i.e. treating it in an undefined way).

If you have a pipeline A.B.C.D, then A throws on invalid UTF, and B.C.D never
are executed. But if A does not throw, then B.C.D guaranteed to be getting
valid UTF, but they still pay the penalty of the compiler thinking they can
allocate memory and throw.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #14 from Vladimir Panteleev  ---
(In reply to Walter Bright from comment #13)
> Vladimir, you bring up good points. I'll try to address them. First off, why
> do this?
> 
> 1. much faster

If I understand correctly, throwing Error instead of Exception will also solve
the performance issues

> 2. string processing can be @nogc and nothrow. If you follow external
> discussions on the merits of D, the "D is no good because Phobos requires
> the GC" ALWAYS comes up, and sucks all the energy out of the conversation.

Ditto, but the @nogc aspect can also be solved with the refcounted exceptions
spec, which will fix the problem in general.

> So, on to your points:
> 
> 1. Replacement only happens when doing a UTF decoding. S+R doesn't have to
> do conversion, and that's one of the things I want to fix in std.algorithm.
> The string fixes I've done in std.string avoid decoding as much as possible.

Inevitably it is still very easy to accidentally use something that
auto-decodes. There is no way to statically make sure that you don't (except
for using a non-string type for text, which is impractical), and with this
proposed change, there will be no run-time way to handle this either.

> 2. Same thing. (Running normalization on passwords? What the hell?)

I did not mean Unicode normalization - it was a joke (std.algorithm will
"normalize" invalid UTF characters to the replacement character). But since
.front on strings autodecodes, feeding a string to any generic range function
in std.algorithm will cause auto-decoding (and thus, character substitution).

> The replacement char thing was not invented by me, it is commonplace as
> users don't like their documents being wholly rejected for one or two bad
> encodings.

I know, I agree it's useful, but it needs to be opt-in.

> I know that many programs try to guess the encoding of random text they get.
> Doing this by only reading a few characters, and assuming the rest, is a
> strange method if one cares about the integrity of the data.

I don't see how this is relevant, sorry.

> Having to constantly re-sanitize data, at every step in the pipeline, is
> going to make D programs uncompetitive speed-wise.

I don't understand what you mean by this. You could say that any way to handle
invalid UTF can be seen as a way of sanitizing data: there will always be a
code path for what to do when invalid UTF is encountered. I would interpret "no
sanitization" as not handling invalid UTF in any way (i.e. treating it in an
undefined way).

--

[Issue 14470] Reuse of object memory: new emplace overload

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14470

Walter Bright  changed:

   What|Removed |Added

 CC||bugzi...@digitalmars.com
Version|unspecified |D2

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #13 from Walter Bright  ---
Vladimir, you bring up good points. I'll try to address them. First off, why do
this?

1. much faster

2. string processing can be @nogc and nothrow. If you follow external
discussions on the merits of D, the "D is no good because Phobos requires the
GC" ALWAYS comes up, and sucks all the energy out of the conversation.

So, on to your points:

1. Replacement only happens when doing a UTF decoding. S+R doesn't have to do
conversion, and that's one of the things I want to fix in std.algorithm. The
string fixes I've done in std.string avoid decoding as much as possible.

2. Same thing. (Running normalization on passwords? What the hell?)

The replacement char thing was not invented by me, it is commonplace as users
don't like their documents being wholly rejected for one or two bad encodings.

I know that many programs try to guess the encoding of random text they get.
Doing this by only reading a few characters, and assuming the rest, is a
strange method if one cares about the integrity of the data.

Having to constantly re-sanitize data, at every step in the pipeline, is going
to make D programs uncompetitive speed-wise.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #12 from Walter Bright  ---
(In reply to bearophile_hugs from comment #8)
> Another solution is to deprecate foreach iteration on strings, and require
> something like "foreach(c; mystring.byCharThrowing)" and similar things.

That's not a solution as I bet it breaks 50% of the programs out there.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #11 from Walter Bright  ---
https://github.com/D-Programming-Language/druntime/pull/1240

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #10 from Vladimir Panteleev  ---
OK, I see from your post that you don't see many of the problems with the
replacement character. Let me show you some example problematic situations:

1.

Bob wants to update his company's documents to use the new name for his
product. He writes a program that does a recursive pattern search & replace in
a directory. After testing the program on a few sample files, he is satisfied
with the results, and runs the program on his company's document store.

Six months later, long after the documents went out of backup rotation, Sue
finds that some important historical documents have been irreversibly corrupted
and are now full of Unicode replacement characters encoded as UTF-8. Why? Because these
old documents did not use UTF-8, and Bob used D.

2.

Bob is writing a secure server-side software package (let's say, a confidential
document store). He is using a std.algorithm-based hashing algorithm to store
the passwords securely. At some point, Mary signs up and creates a secure
password, which contains entirely Cyrillic letters (let's say, "ЭтоМойПароль").

Not long after, Eve successfully logs into Mary's account with the password
"". Why? Because the passwords just happened to be sent in some
non-UTF-8 encoding, and, since Bob used D, when "normalized" through
std.algorithm's replacement character substitution, all Unicode-only passwords
of the same length have the same hash.
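
The collision in scenario 2 can be reproduced in miniature. This is only a sketch, assuming a current Phobos in which std.utf.byDchar substitutes U+FFFD for invalid sequences. lossyDecode is a hypothetical helper, and the two one-byte inputs merely stand in for passwords that arrive in a non-UTF-8 encoding:

```d
import std.array : array;
import std.stdio : writeln;
import std.utf : byDchar;

// Hypothetical helper: decode raw bytes as UTF-8, letting byDchar
// substitute U+FFFD for anything that is not valid UTF-8.
dstring lossyDecode(const(ubyte)[] raw)
{
    return (cast(const(char)[]) raw).byDchar.array.idup;
}

void main()
{
    immutable ubyte[] a = [0xFE]; // two different one-byte inputs,
    immutable ubyte[] b = [0xFF]; // neither of which is valid UTF-8

    assert(a != b);                           // distinct before decoding
    assert(lossyDecode(a) == lossyDecode(b)); // identical after decoding

    // Equal decoded strings necessarily hash to the same value.
    writeln(hashOf(lossyDecode(a)) == hashOf(lossyDecode(b)));
}
```

Hashing the raw bytes instead of the decoded text would keep the two inputs distinct.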

Automatic use of the replacement character will come as a surprise to many
people who come from other languages. For example, in Delphi, strings are also
the de-facto ubyte[] / void[] type - you can safely read a binary file into a
string, perform search and replace, and write it back, knowing that the result
will be exactly what you expected.

Furthermore, from your message it appears to me that you've missed the point of
my argument:

> What do you do if you read in an XML file and process half of it before you 
> hit invalid Unicode?

You abort! This should not happen. Either the XML file is in an incorrect
encoding (which calls into question the integrity of all the data parsed so far -
what if it was some 8-bit encoding that only LOOKED like valid UTF-8?) or the
program should've sanitized the input first if it really didn't care about data
correctness. But this is an XML file, meaning it's very likely to be machine
generated - if it contains errors, it might indicate a problem somewhere else
in the system, which is why it's all the more important to abort and get the
user to figure out the true source of the problem. Ignoring the error here
reminds me of how PHP never stops on errors by default, or Basic's "ON ERROR
RESUME NEXT".

> So, throwing an Error is forcing everyone to validate the Unicode in their 
> strings whether they care or not, and using the replacement character will 
> work, whereas the programs that do care about validating their strings should 
> be doing the validation up front anyway.

Yes, but then there is no way to make sure you're not accidentally corrupting
data! Whereas today we at least have a runtime check against invalid UTF-8, with
this change we will have no check at all. With no automatic mechanism to ensure that all text
is sanitized before it gets into std.algorithm, it becomes impossible to be
sure that you're not accidentally corrupting data along the way.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #9 from Jonathan M Davis  ---
Most string-based functions work perfectly well with invalid Unicode. Does find
care? Does startsWith? Does filter? The replacement character simply won't
match what you're looking for. The functions themselves don't care. The
replacement character is just another character. They need a way to deal with
invalid Unicode, but the replacement character deals with that beautifully.
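
A small illustration of that claim, assuming std.utf.byDchar as the opt-in way to get replacement behaviour in current Phobos (the sample bytes and the helper names are made up for the example):

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

// Hypothetical sample: "caf" + 0xE9 + " menu" in Latin-1, which is
// not valid UTF-8 because 0xE9 is not followed by continuation bytes.
string sampleText()
{
    immutable ubyte[] raw =
        [0x63, 0x61, 0x66, 0xE9, 0x20, 0x6D, 0x65, 0x6E, 0x75];
    return cast(string) raw;
}

// Search the decoded text for one character; invalid sequences are
// seen by canFind as U+FFFD and nothing else.
bool containsChar(string text, dchar needle)
{
    return text.byDchar.canFind(needle);
}

void main()
{
    auto text = sampleText();
    assert(containsChar(text, 'm'));               // valid text still matches
    assert(!containsChar(text, '\u00E9'));         // the Latin-1 byte does not
    assert(containsChar(text, replacementDchar));  // it became U+FFFD
    writeln("search finished without throwing");
}
```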

The concern is whether program input is valid - whether the user manages to
type in invalid Unicode due to bad terminal settings, or whether you get junk
off a socket, or whether a file has been corrupted. Anything that cares should
be checking that when the data enters the program so that the error can be
reported to whoever or wherever the data is coming from. Having it done via
exceptions later on disconnects the reporting of the error from the point when
it can actually be handled. What do you do if you read in an XML file and
process half of it before you hit invalid Unicode? If the whole file was read
into memory, then you may not even have any idea where that string came
from, and it's likely far too late to report to the user that they're opening a
corrupted file. That validation really needs to be done when the string enters
the program - not at some arbitrary point later in the program when the invalid
portion happens to be decoded. So, if you insist that all strings be validated,
then maybe throwing an Error makes sense, but an Exception sure doesn't. And
throwing an Error assumes that you always need to validate the Unicode in
strings, which definitely isn't the case when the replacement character is
used. So, throwing an Error is forcing everyone to validate the Unicode in
their strings whether they care or not, and using the replacement character
will work, whereas the programs that do care about validating their strings
should be doing the validation up front anyway.
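
Validation at the point of entry could look roughly like the following sketch. It relies on std.utf.validate, which throws a UTFException on invalid input. isValidUtf8 is a hypothetical wrapper, not an existing Phobos function:

```d
import std.stdio : writeln;
import std.utf : UTFException, validate;

// Hypothetical boundary check, meant to run once when data enters the
// program (file, socket, user input) so the error can be reported to
// whoever supplied the data.
bool isValidUtf8(const(char)[] input)
{
    try
    {
        validate(input);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    immutable ubyte[] bad = [0x61, 0xFF, 0x62]; // 'a', invalid byte, 'b'

    assert(isValidUtf8("plain text"));
    assert(!isValidUtf8(cast(string) bad));
    writeln("invalid input rejected at the boundary");
}
```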

So, given that the code that cares about validation needs to be validating up
front and therefore doesn't care about the replacement character being used
later and that programs that don't care about validating their Unicode input
will work just fine with the replacement character, it seems to me that it
makes perfect sense to just use the replacement character rather than throwing.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

bearophile_h...@eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_h...@eml.cc

--- Comment #8 from bearophile_h...@eml.cc ---
(In reply to Walter Bright from comment #0)

> Changing foreach to return replacementDchar on invalid UTF encodings fixes
> these problems, and makes it possible to do faster loops.

Another solution is to deprecate foreach iteration on strings, and require
something like "foreach(c; mystring.byCharThrowing)" and similar things.
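
For comparison, both behaviours can already be selected explicitly. The sketch below assumes that a decoding foreach still throws on invalid UTF-8 (the behaviour this issue proposes to change) and that std.utf.byDchar substitutes U+FFFD. byCharThrowing itself does not exist, and the two function names here are hypothetical:

```d
import std.stdio : writeln;
import std.utf : byDchar, replacementDchar;

// True if a decoding foreach over `s` throws on invalid UTF-8.
bool decodingForeachThrows(string s)
{
    try
    {
        foreach (dchar c; s) {}
        return false;
    }
    catch (Exception)
    {
        return true;
    }
}

// Counts how many U+FFFD characters byDchar produces for `s`.
size_t countReplaced(string s)
{
    size_t n = 0;
    foreach (dchar c; s.byDchar)
        if (c == replacementDchar)
            ++n;
    return n;
}

void main()
{
    immutable ubyte[] raw = [0xFF]; // a single byte that is not valid UTF-8
    string s = cast(string) raw;

    writeln("foreach throws: ", decodingForeachThrows(s));
    writeln("replaced by byDchar: ", countReplaced(s));
}
```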

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #7 from Vladimir Panteleev  ---
(In reply to Jonathan M Davis from comment #6)
> Yikes. That is far worse than throwing Exceptions, since it would kill your
> program, and it's indicative of a bug in the program rather than bad input.

Yes. The bug is that the string should've been sanitized.

> But most programs just don't care about how valid the
> Unicode is,

Maybe most programs YOU write.

> and the fact that throwing is how it's handled is incredibly
> annoying.

I can see how it can be annoying - when you don't care about your data.

> It forces validation on all programs whether they need it or not,
> and it makes it so that string-based code can pretty much never be nothrow.

Throwing errors is allowed in nothrow code.
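
A minimal sketch of what that means in practice. requireValidUtf8 is a hypothetical helper; the only library call assumed is std.utf.validate:

```d
import std.stdio : writeln;
import std.utf : validate;

// Hypothetical helper. It is nothrow because every Exception is caught;
// the Error it throws instead is not covered by the nothrow guarantee.
void requireValidUtf8(const(char)[] s) nothrow
{
    try
    {
        validate(s);
    }
    catch (Exception)
    {
        throw new Error("invalid UTF-8 reached string-processing code");
    }
}

void main()
{
    requireValidUtf8("valid input"); // returns normally
    writeln("nothrow function accepted valid input");
}
```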

> Using the replacement character in the stead of invalid unicode is exactly
> what it was created for in the first place.

Yes, in circumstances when you don't care about the "invalid" data, which
should always be opt-in.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #6 from Jonathan M Davis  ---
(In reply to Vladimir Panteleev from comment #4)
> Here's a counter-proposal: when encountering invalid UTF-8, instead of
> throwing exceptions, throw errors. This will fix the nothrow and performance
> problems, and will avoid the risk of data corruption.


Yikes. That is far worse than throwing Exceptions, since it would kill your
program, and it's indicative of a bug in the program rather than bad input.

> The workaround is to
> pre-sanitize the input. The impact of breaking existing code is the same as
> the original proposal.

Pre-sanitizing input is exactly what should be done if you care about unicode
validation. You validate any strings entering the program from a file, a
socket, or from user input, and then you know that you're operating on valid
Unicode. But most programs just don't care about how valid the Unicode is, and
the fact that throwing is how it's handled is incredibly annoying. It forces
validation on all programs whether they need it or not, and it makes it so that
string-based code can pretty much never be nothrow. Using the replacement
character in the stead of invalid unicode is exactly what it was created for in
the first place.

--

[Issue 14519] [Enh] foreach on strings should return replacementDchar rather than throwing

2015-04-29 Thread via Digitalmars-d-bugs

https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #5 from Sobirari Muhomori  ---
Or provide a global override similar to assertHandler.

--
