On 14/04/2012 21:53, q66 wrote:
> On Saturday, 14 April 2012 at 19:05:40 UTC, ReneSac wrote:
>> I have this simple binary arithmetic coder in C++ by Mahoney,
>> translated to D by Maffi. I added "nothrow", "final", "pure" and
>> "GC.disable" where possible, but that didn't make much
>> difference. Adding "const" to Predictor.p() (as in the C++
>> version) gave 3% higher performance. Here are the two versions:
>>
>> http://mattmahoney.net/dc/ <-- original zip
>>
>> http://pastebin.com/55x9dT9C <-- Original C++ version.
>> http://pastebin.com/TYT7XdwX <-- Modified D translation.
>>
>> The problem is that the D version is 50% slower:
>>
>> test.fpaq0 (16562521 bytes) -> test.bmp (33159254 bytes)
>>
>> Lang (Comp) | Binary size | Time (lower is better) | Flags
>> C++  (g++)  |   13kb      | 2.42s (100%)           | -O3 -s
>> D    (DMD)  |  230kb      | 4.46s (184%)           | -O -release -inline
>> D    (GDC)  | 1322kb      | 3.69s (152%)           | -O3 -frelease -s
>>
>> The only difference I could see between the C++ and D versions is that
>> C++ has hints to the compiler about which functions to inline, and I
>> couldn't find anything similar in D. So I manually inlined the encode
>> and decode functions:
>>
>> http://pastebin.com/N4nuyVMh - Manual inline
>>
>> D    (DMD)  |  228kb      | 3.70s (153%)           | -O -release -inline
>> D    (GDC)  | 1318kb      | 3.50s (144%)           | -O3 -frelease -s
>>
>> Still, the D version is slower. What causes this speed difference? Is
>> there any way to side-step it?
>>
>> Note that this simple C++ version can be made more than 2 times faster
>> with algorithmic and I/O optimizations, (ab)using templates, etc. So
>> I'm not asking for generic speed optimizations, but only for things that
>> may make the D code "more equal" to the C++ code.
>
> I wrote a version http://codepad.org/phpLP7cx based on the C++ one.
>
> My commands used to compile:
>
> g++46 -O3 -s fpaq0.cpp -o fpaq0cpp
> dmd -O -release -inline -noboundscheck fpaq0.d
>
> G++ 4.6, dmd 2.059.
>
> I did 5 tests for each:
>
> test.fpaq0 (34603008 bytes) -> test.bmp (34610367 bytes)
>
> The C++ average result was 9.99 seconds (varying from 9.98 to 10.01)
> The D average result was 12.00 seconds (varying from 11.98 to 12.01)
>
> That means there is a 16.8 percent difference in performance that would be
> cleared out by usage of gdc (which I don't have around currently).

The code is nearly identical (there is a slight difference in update(), where he accesses the array once more than you), but the main difference I see is the -noboundscheck compilation option on DMD.
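
To make the bounds-check point concrete, here is a small C++ sketch of what a checked versus an unchecked table update looks like. It is not taken from any of the pastebin or codepad versions. The names (Counts, updateChecked, updateUnchecked, predict) and the table layout are made up for illustration, and std::vector::at() only stands in for the index checks that -noboundscheck turns off in D. The arithmetic is that of a plain order-0 bit counter and is an assumption about the general shape of such a predictor, not a copy of fpaq0.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical counter table: for each context, how many 0 bits and
// 1 bits have been seen so far. Illustrative only.
struct Counts {
    std::vector<int> zeros;
    std::vector<int> ones;
    explicit Counts(std::size_t n) : zeros(n, 0), ones(n, 0) {}
};

// Checked update: at() validates the index on every access and throws
// std::out_of_range on failure, so the hot path carries extra compares.
void updateChecked(Counts& c, std::size_t cxt, int bit) {
    int& n = bit ? c.ones.at(cxt) : c.zeros.at(cxt);
    if (++n > 65534) {           // halve both counts to avoid overflow
        c.zeros.at(cxt) >>= 1;
        c.ones.at(cxt) >>= 1;
    }
}

// Unchecked update: operator[] performs no validation, which is the
// behaviour the original C++ code gets from raw arrays.
void updateUnchecked(Counts& c, std::size_t cxt, int bit) {
    int& n = bit ? c.ones[cxt] : c.zeros[cxt];
    if (++n > 65534) {
        c.zeros[cxt] >>= 1;
        c.ones[cxt] >>= 1;
    }
}

// Probability that the next bit is 1, scaled to the range 0..4095.
int predict(const Counts& c, std::size_t cxt) {
    return 4096 * (c.ones[cxt] + 1) / (c.zeros[cxt] + c.ones[cxt] + 2);
}
```

Both update functions produce the same table for valid indices. The only difference is the per-access check, and that is the kind of cost worth measuring separately before attributing the remaining gap to the compiler back end.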