Re: DMD is faster than LDC and GDC
On Saturday, 14 November 2015 at 00:37:51 UTC, Manu wrote: In the meantime, there probably needs to be strong warnings about violating attributes, and if patterns have emerged that rely on violating such attributes, we should publish a recommended alternative. One pattern that comes to mind immediately is lazily initialized members in a const object. Another one that's already officially supported is impure debug statements in pure functions.
Re: DMD is faster than LDC and GDC
On 11/12/2015 11:50 AM, Ali Çehreli wrote: I would love to be convinced. :) Can someone come up with a reduced example please? On 11/12/2015 03:59 AM, Daniel Kozak wrote: > for (i=0; i < 100; ++i) { > fmttable(table); > } I think what we are seeing here is more due to the unused side-effect in the loop, where compiling with -w fails compilation: Warning: calling deneme.fmttable without side effects discards return value of type string, prepend a cast(void) if intentional Ali Can someone please tell me if I am mistaken not. Once again, I don't think this example is fast because the compiler reuses the return value of fmttable() in the loop. Rather, it simply removes the whole expression because its only side-effect is not used in the program. Perhaps that's what everybody else is saying anyway. :) (Why don't I look at the assembly myself? Going to a meeting... :p) Ali
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 21:24:30 UTC, David Nadlinger wrote: On Thursday, 12 November 2015 at 21:16:25 UTC, Walter Bright wrote: [...] Oh, GCC has had similar notions as a non-standard attribute for ages, and LLVM since its inception. At least for LDC, the reason why we do not currently lower many of the qualifiers like pure, nothrow, immutable, etc. is that LLVM will ruthlessly consider your code to exhibit undefined behavior if you try to be clever and violate them, subsequently optimizing based on that. In other words, if you cast away const/immutable and modify a variable, for instance, you might find that the entire function body magically disappears under your feet. Maybe it is time to revisit this, though, but last time I tried it broke druntime/Phobos in a couple of places. That sounds awesome. Maybe only enable it for @safe code?
Re: DMD is faster than LDC and GDC
On Friday, 13 November 2015 at 19:59:51 UTC, Ali Çehreli wrote: On 11/12/2015 11:50 AM, Ali Çehreli wrote: I would love to be convinced. :) Can someone come up with a reduced example please? On 11/12/2015 03:59 AM, Daniel Kozak wrote: > for (i=0; i < 100; ++i) { > fmttable(table); > } I think what we are seeing here is more due to the unused side-effect in the loop, where compiling with -w fails compilation: Warning: calling deneme.fmttable without side effects discards return value of type string, prepend a cast(void) if intentional Ali Can someone please tell me if I am mistaken not. Once again, I don't think this example is fast because the compiler reuses the return value of fmttable() in the loop. Rather, it simply removes the whole expression because its only side-effect is not used in the program. Perhaps that's what everybody else is saying anyway. :) (Why don't I look at the assembly myself? Going to a meeting... :p) Ali I confirm it, here is the loop part: 0x004375c231db xor ebx, ebx ┌─> 0x004375c4ffc3 inc ebx │ 0x004375c681fb40420f00 cmp ebx, 0xf4240 └─< 0x004375cc72f6 jb 0x4375c4 0x004375ce8bfb mov edi, ebx 0x004375d0e8230d call sym._D3std5stdio14__T7writelnTiZ7writelnFNfiZv (DMD 2.069 -O -release -inline -boundscheck=off, code from the first post)
Re: DMD is faster than LDC and GDC
On 13 November 2015 at 08:38, Iain Buclaw via Digitalmars-dwrote: > On 12 Nov 2015 10:25 pm, "David Nadlinger via Digitalmars-d" > wrote: >> >> On Thursday, 12 November 2015 at 21:16:25 UTC, Walter Bright wrote: >>> >>> It's more than that - dmd's optimizer is designed to make use of the >>> guarantees of a pure function. Since C/C++ do not have pure functions, >>> ldc/gdc's optimizer may not have that capability. >> >> >> Oh, GCC has had similar notions as a non-standard attribute for ages, and >> LLVM since its inception. >> >> At least for LDC, the reason why we do not currently lower many of the >> qualifiers like pure, nothrow, immutable, etc. is that LLVM will ruthlessly >> consider your code to exhibit undefined behavior if you try to be clever and >> violate them, subsequently optimizing based on that. In other words, if you >> cast away const/immutable and modify a variable, for instance, you might >> find that the entire function body magically disappears under your feet. >> >> Maybe it is time to revisit this, though, but last time I tried it broke >> druntime/Phobos in a couple of places. >> > > Same here, and for some very surprising reasons from what I recall. These language mechanisms offer D a huge potential advantage, it would be really good to understand why we can't make use of them, and work towards fixing this. I don't think people should be surprised if the optimiser takes advantage of their code attribution. It may break existing code because violating these attributes never caused any problem before, but surely violating those attributes was never actually valid code, and it's reasonable that they expect their code to break in the future as compilers improve their ability to take advantage of these attributes? In the meantime, there probably needs to be strong warnings about violating attributes, and if patterns have emerged that rely on violating such attributes, we should publish a recommended alternative.
Re: DMD is faster than LDC and GDC
I would love to be convinced. :) Can someone come up with a reduced example please? On 11/12/2015 03:59 AM, Daniel Kozak wrote: > for (i=0; i < 100; ++i) { > fmttable(table); > } I think what we are seeing here is more due to the unused side-effect in the loop, where compiling with -w fails compilation: Warning: calling deneme.fmttable without side effects discards return value of type string, prepend a cast(void) if intentional Ali
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 19:11:25 UTC, tired_eyes wrote: On Thursday, 12 November 2015 at 14:44:49 UTC, John Colvin wrote: To test the speed of fmttable itself I split fmttable and main in to different modules, made fmttable extern(C) so I could just prototype it in the main module (no import), then compiled them separately before linking. This should prevent any possible inlining/purity cleverness. ~1s for ldmd2, ~2s for dmd, which is business as normal. dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to. If so, should explicitly marking fmttable as pure close the gap for initial code? Well it won't do any harm, but it really depends on what the compiler chooses to do with the information.
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 21:16:25 UTC, Walter Bright wrote: It's more than that - dmd's optimizer is designed to make use of the guarantees of a pure function. Since C/C++ do not have pure functions, ldc/gdc's optimizer may not have that capability. Oh, GCC has had similar notions as a non-standard attribute for ages, and LLVM since its inception. At least for LDC, the reason why we do not currently lower many of the qualifiers like pure, nothrow, immutable, etc. is that LLVM will ruthlessly consider your code to exhibit undefined behavior if you try to be clever and violate them, subsequently optimizing based on that. In other words, if you cast away const/immutable and modify a variable, for instance, you might find that the entire function body magically disappears under your feet. Maybe it is time to revisit this, though, but last time I tried it broke druntime/Phobos in a couple of places. — David
Re: DMD is faster than LDC and GDC
On 11/12/2015 6:44 AM, John Colvin wrote: dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to. It's more than that - dmd's optimizer is designed to make use of the guarantees of a pure function. Since C/C++ do not have pure functions, ldc/gdc's optimizer may not have that capability.
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 21:16:25 UTC, Walter Bright wrote: On 11/12/2015 6:44 AM, John Colvin wrote: dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to. It's more than that - dmd's optimizer is designed to make use of the guarantees of a pure function. Since C/C++ do not have pure functions, ldc/gdc's optimizer may not have that capability. gcc has had the pure function attribute since version 2.96 in 2000.
Re: DMD is faster than LDC and GDC
On 12 Nov 2015 10:25 pm, "David Nadlinger via Digitalmars-d" < digitalmars-d@puremagic.com> wrote: > > On Thursday, 12 November 2015 at 21:16:25 UTC, Walter Bright wrote: >> >> It's more than that - dmd's optimizer is designed to make use of the guarantees of a pure function. Since C/C++ do not have pure functions, ldc/gdc's optimizer may not have that capability. > > > Oh, GCC has had similar notions as a non-standard attribute for ages, and LLVM since its inception. > > At least for LDC, the reason why we do not currently lower many of the qualifiers like pure, nothrow, immutable, etc. is that LLVM will ruthlessly consider your code to exhibit undefined behavior if you try to be clever and violate them, subsequently optimizing based on that. In other words, if you cast away const/immutable and modify a variable, for instance, you might find that the entire function body magically disappears under your feet. > > Maybe it is time to revisit this, though, but last time I tried it broke druntime/Phobos in a couple of places. > Same here, and for some very surprising reasons from what I recall.
Re: DMD is faster than LDC and GDC
On 12 November 2015 at 12:59, Daniel Kozak via Digitalmars-d < digitalmars-d@puremagic.com> wrote: > code: > > > > GDC (-O3 -finline -frelease -fno-bounds-check): > real0m0.724s > user0m0.720s > sys 0m0.003s > > Not to be pedantic, but -finline does nothing (what you really want is -finline-functions) However -finline-functions is enabled automatically at -O3, so the whole -finline just becomes wasted typing. :-)
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 11:59:50 UTC, Daniel Kozak wrote: code: import std.stdio; auto fmttable(immutable(string[][]) table) { import std.array : appender, uninitializedArray; import std.range : take, repeat; import std.exception : assumeUnique; auto res = appender(uninitializedArray!(char[])(128)); res.clear(); if (table.length == 0) return ""; // column widths auto widths = new int[](table[0].length); foreach (rownum, row; table) { foreach (colnum, cell; row) { if (cell.length > widths[colnum]) widths[colnum] = cast(int)cell.length; } } foreach (row; table) { res ~= "|"; foreach (colnum, cell; row) { int l = widths[colnum] - cast(int)cell.length; res ~= cell; if (l) res ~= ' '.repeat().take(l); res ~= "|"; } res.put("\n"); } return res.data.assumeUnique(); } void main() { immutable table = [ ["row1.1", "row1.2 ", "row1.3"], ["row2.1", "row2.2", "row2.3"], ["row3.1", "row3.2", "row3.3 "], ["row4.1", "row4.2", "row4.3"], ["row5.1", "row5.2", "row5.3"], ]; writeln(fmttable(table)); int i; for (i=0; i < 100; ++i) { fmttable(table); } writeln(i); } timings: DMD (-O -release -inline -boundscheck=off): real0m0.003s user0m0.000s sys 0m0.000s LDMD2-ldc2 (-O -release -inline -boundscheck=off): real0m1.071s user0m1.067s sys 0m0.000s GDC (-O3 -finline -frelease -fno-bounds-check): real0m0.724s user0m0.720s sys 0m0.003s What versions of these compilers? I suspect the majority (maybe 80%-ish) of the time is spent allocating memory, so you might be seeing GC improvements in recent DMD
Re: DMD is faster than LDC and GDC
V Thu, 12 Nov 2015 13:37:28 +0100 Iain Buclaw via Digitalmars-dnapsáno: > On 12 November 2015 at 12:59, Daniel Kozak via Digitalmars-d < > digitalmars-d@puremagic.com> wrote: > > > code: > > > > > > > > > > > GDC (-O3 -finline -frelease -fno-bounds-check): > > real0m0.724s > > user0m0.720s > > sys 0m0.003s > > > > > Not to be pedantic, but -finline does nothing (what you really want is > -finline-functions) > > However -finline-functions is enabled automatically at -O3, so the > whole -finline just becomes wasted typing. :-) Yeah I know, but it is a bad habit from past
DMD is faster than LDC and GDC
code: import std.stdio; auto fmttable(immutable(string[][]) table) { import std.array : appender, uninitializedArray; import std.range : take, repeat; import std.exception : assumeUnique; auto res = appender(uninitializedArray!(char[])(128)); res.clear(); if (table.length == 0) return ""; // column widths auto widths = new int[](table[0].length); foreach (rownum, row; table) { foreach (colnum, cell; row) { if (cell.length > widths[colnum]) widths[colnum] = cast(int)cell.length; } } foreach (row; table) { res ~= "|"; foreach (colnum, cell; row) { int l = widths[colnum] - cast(int)cell.length; res ~= cell; if (l) res ~= ' '.repeat().take(l); res ~= "|"; } res.put("\n"); } return res.data.assumeUnique(); } void main() { immutable table = [ ["row1.1", "row1.2 ", "row1.3"], ["row2.1", "row2.2", "row2.3"], ["row3.1", "row3.2", "row3.3 "], ["row4.1", "row4.2", "row4.3"], ["row5.1", "row5.2", "row5.3"], ]; writeln(fmttable(table)); int i; for (i=0; i < 100; ++i) { fmttable(table); } writeln(i); } timings: DMD (-O -release -inline -boundscheck=off): real0m0.003s user0m0.000s sys 0m0.000s LDMD2-ldc2 (-O -release -inline -boundscheck=off): real0m1.071s user0m1.067s sys 0m0.000s GDC (-O3 -finline -frelease -fno-bounds-check): real0m0.724s user0m0.720s sys 0m0.003s
Re: DMD is faster than LDC and GDC
V Thu, 12 Nov 2015 12:10:30 + John Colvin via Digitalmars-dnapsáno: > On Thursday, 12 November 2015 at 11:59:50 UTC, Daniel Kozak wrote: > > code: > > > > import std.stdio; > > > > auto fmttable(immutable(string[][]) table) { > > > > import std.array : appender, uninitializedArray; > > import std.range : take, repeat; > > import std.exception : assumeUnique; > > > > auto res = appender(uninitializedArray!(char[])(128)); > > res.clear(); > > > > if (table.length == 0) return ""; > > // column widths > > auto widths = new int[](table[0].length); > > > > foreach (rownum, row; table) { > > foreach (colnum, cell; row) { > > if (cell.length > widths[colnum]) > > widths[colnum] = cast(int)cell.length; > > } > > } > > > > foreach (row; table) { > > res ~= "|"; > > foreach (colnum, cell; row) { > > int l = widths[colnum] - cast(int)cell.length; > > res ~= cell; > > if (l) > > res ~= ' '.repeat().take(l); > > res ~= "|"; > > } > > res.put("\n"); > > } > > > > return res.data.assumeUnique(); > > } > > > > void main() { > > > > immutable table = [ > > ["row1.1", "row1.2 ", "row1.3"], > > ["row2.1", "row2.2", "row2.3"], > > ["row3.1", "row3.2", "row3.3 "], > > ["row4.1", "row4.2", "row4.3"], > > ["row5.1", "row5.2", "row5.3"], > > ]; > > > > writeln(fmttable(table)); > > int i; > > for (i=0; i < 100; ++i) { > > fmttable(table); > > } > > writeln(i); > > } > > > > timings: > > > > DMD (-O -release -inline -boundscheck=off): > > real0m0.003s > > user0m0.000s > > sys 0m0.000s > > > > LDMD2-ldc2 (-O -release -inline -boundscheck=off): > > real0m1.071s > > user0m1.067s > > sys 0m0.000s > > > > > > GDC (-O3 -finline -frelease -fno-bounds-check): > > real0m0.724s > > user0m0.720s > > sys 0m0.003s > > What versions of these compilers? I suspect the majority (maybe > 80%-ish) of the time is spent allocating memory, so you might be > seeing GC improvements in recent DMD DMD 2.069 LDC 2.067 GDC 2.065 No it is not cause by memory allocations. It seems DMD can recognize that fmttable has same result every time, so it does compute it only once.
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 12:23:11 UTC, Daniel Kozak wrote: V Thu, 12 Nov 2015 12:10:30 + John Colvin via Digitalmars-dnapsáno: On Thursday, 12 November 2015 at 11:59:50 UTC, Daniel Kozak wrote: > [...] What versions of these compilers? I suspect the majority (maybe 80%-ish) of the time is spent allocating memory, so you might be seeing GC improvements in recent DMD DMD 2.069 LDC 2.067 GDC 2.065 No it is not cause by memory allocations. It seems DMD can recognize that fmttable has same result every time, so it does compute it only once. Ok, then my second hypothesis is that dmd is inferring the pure attribute for fmttable because it returns auto (new in 2.069 IIRC), which enable the above optimisation that you have noted. Gdc and ldc (and dmd) can do similar things in their backend, but perhaps not here. Do you have older dmd versions on hand to test?
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 11:59:50 UTC, Daniel Kozak wrote: code: import std.stdio; auto fmttable(immutable(string[][]) table) { import std.array : appender, uninitializedArray; import std.range : take, repeat; import std.exception : assumeUnique; auto res = appender(uninitializedArray!(char[])(128)); res.clear(); if (table.length == 0) return ""; // column widths auto widths = new int[](table[0].length); foreach (rownum, row; table) { foreach (colnum, cell; row) { if (cell.length > widths[colnum]) widths[colnum] = cast(int)cell.length; } } foreach (row; table) { res ~= "|"; foreach (colnum, cell; row) { int l = widths[colnum] - cast(int)cell.length; res ~= cell; if (l) res ~= ' '.repeat().take(l); res ~= "|"; } res.put("\n"); } return res.data.assumeUnique(); } void main() { immutable table = [ ["row1.1", "row1.2 ", "row1.3"], ["row2.1", "row2.2", "row2.3"], ["row3.1", "row3.2", "row3.3 "], ["row4.1", "row4.2", "row4.3"], ["row5.1", "row5.2", "row5.3"], ]; writeln(fmttable(table)); int i; for (i=0; i < 100; ++i) { fmttable(table); } writeln(i); } timings: DMD (-O -release -inline -boundscheck=off): real0m0.003s user0m0.000s sys 0m0.000s LDMD2-ldc2 (-O -release -inline -boundscheck=off): real0m1.071s user0m1.067s sys 0m0.000s GDC (-O3 -finline -frelease -fno-bounds-check): real0m0.724s user0m0.720s sys 0m0.003s To test the speed of fmttable itself I split fmttable and main in to different modules, made fmttable extern(C) so I could just prototype it in the main module (no import), then compiled them separately before linking. This should prevent any possible inlining/purity cleverness. ~1s for ldmd2, ~2s for dmd, which is business as normal. dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to.
Re: DMD is faster than LDC and GDC
V Thu, 12 Nov 2015 12:38:47 + John Colvin via Digitalmars-dnapsáno: > On Thursday, 12 November 2015 at 12:23:11 UTC, Daniel Kozak wrote: > > V Thu, 12 Nov 2015 12:10:30 + > > John Colvin via Digitalmars-d > > napsáno: > > > >> On Thursday, 12 November 2015 at 11:59:50 UTC, Daniel Kozak > >> wrote: > >> > [...] > >> > >> What versions of these compilers? I suspect the majority > >> (maybe 80%-ish) of the time is spent allocating memory, so you > >> might be seeing GC improvements in recent DMD > > > > DMD 2.069 > > > > LDC 2.067 > > > > GDC 2.065 > > > > No it is not cause by memory allocations. > > > > It seems DMD can recognize that fmttable has same result every > > time, so it does compute it only once. > > Ok, then my second hypothesis is that dmd is inferring the pure > attribute for fmttable because it returns auto (new in 2.069 > IIRC), which enable the above optimisation that you have noted. > Gdc and ldc (and dmd) can do similar things in their backend, but > perhaps not here. > > Do you have older dmd versions on hand to test? Yes (DVM) and it is same for older versions (2.066.1, 2.067.1)
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 14:44:49 UTC, John Colvin wrote: To test the speed of fmttable itself I split fmttable and main in to different modules, made fmttable extern(C) so I could just prototype it in the main module (no import), then compiled them separately before linking. This should prevent any possible inlining/purity cleverness. ~1s for ldmd2, ~2s for dmd, which is business as normal. dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to. If so, should explicitly marking fmttable as pure close the gap for initial code?
Re: DMD is faster than LDC and GDC
On Thursday, 12 November 2015 at 14:44:49 UTC, John Colvin wrote: dmd is being clever and spotting that fmttable is pure, it would be good if ldc/gdc could spot this to. I don't recall seeing anything in the 2.069.0 change log about improved attribute inference for auto functions. If you can find a link pointing to where it was discussed (either change log or forum or bug report), I would appreciate it.
Re: DMD is faster than LDC and GDC
On 11/12/15 13:22, Daniel Kozak via Digitalmars-d wrote: >>> timings: >>> > > >>> > > DMD (-O -release -inline -boundscheck=off): >>> > > real0m0.003s >>> > > user0m0.000s >>> > > sys 0m0.000s >>> > > >>> > > LDMD2-ldc2 (-O -release -inline -boundscheck=off): >>> > > real0m1.071s >>> > > user0m1.067s >>> > > sys 0m0.000s >>> > > >>> > > >>> > > GDC (-O3 -finline -frelease -fno-bounds-check): >>> > > real0m0.724s >>> > > user0m0.720s >>> > > sys 0m0.003s >> > >> > What versions of these compilers? I suspect the majority (maybe >> > 80%-ish) of the time is spent allocating memory, so you might be >> > seeing GC improvements in recent DMD > DMD 2.069 > > LDC 2.067 > > GDC 2.065 > > No it is not cause by memory allocations. > > It seems DMD can recognize that fmttable has same result every time, so > it does compute it only once. Comparisons using different frontend versions are very unfair - - *every D release introduces a new language dialect* (for example: http://dlang.org/changelog/2.068.0.html#attribinference3). Out of curiosity, how does this slightly more sane version perform? (I don't have any dmd or ldc compilers; it takes ~80ms using GDC) import std.stdio; auto fmttable(alias sink=sink)(immutable(string[][]) table) { import std.range : take, repeat; if (table.length == 0) return; // column widths auto widths = new int[](table[0].length); foreach (rownum, row; table) { foreach (colnum, cell; row) { if (cell.length > widths[colnum]) widths[colnum] = cast(int)cell.length; } } foreach (row; table) { sink("|"); foreach (colnum, cell; row) { sink(cell, ' '.repeat().take(widths[colnum]-cast(int)cell.length), "|"); } sink("\n"); } } void sink(S...)(S s) { foreach(I, _; S) write(s[I]); } void sink0(S...)(S s) {} void main() { immutable table = [ ["row1.1", "row1.2 ", "row1.3"], ["row2.1", "row2.2", "row2.3"], ["row3.1", "row3.2", "row3.3 "], ["row4.1", "row4.2", "row4.3"], ["row5.1", "row5.2", "row5.3"], ]; fmttable(table); int i; for (i=0; i < 100; ++i) { fmttable!sink0(table); } sink(i, "\n"); } artur