On Friday, 12 October 2018 at 15:11:17 UTC, welkam wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
What I am requesting here is for a person (or persons) who is an "expert" (very good) in D to create a very fast D version, using whatever tricks the language has to maximize performance.

I would like to include in my paper a good comparison of various implementations in different compiled languages (C/C++, D, Nim, etc) to show how it performs with each.

I looked into your Nim code, and from a programmer's point of view there is nothing interesting going on: simple data structures and simple operations. If you wrote equivalent code in C, C++, D, Nim, Rust, or Zig and compiled it with the same optimizing compiler backend (LLVM or GCC), you should get the same machine code and almost the same performance (less than 1% difference due to the runtime). If you got different machine code for an equivalent implementation, then you should file a bug report.

The only way you will get different performance is by changing implementation details, but then you would be comparing apples to oranges.

Hmm, I don't think what you're saying about similar output and performance across languages is empirically correct, but it's really not the point of the challenge.

The real point of the challenge is to see what idiomatic code, written for performance and using the best resources the language provides, will produce compared to the Nim version. It's not to see what a line-by-line translation from Nim to D would look like. That may be a start to get something working, but it shouldn't be the end goal.

I'm using the Nim version here as the "reference implementation" so it can be used as the standard for comparison (accuracy of results and performance). The goal for users of D (and other languages) is to use whatever resources the language provides to maybe do better.

An example: Nim currently doesn't provide standard bitarrays. Using bitarrays in place of byte arrays should be faster, because with one bit per flag instead of one byte, eight times as many flags fit in the same amount of cache.
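To make the byte array versus bitarray point concrete, here is a minimal sketch in C++. It is not the algorithm from my Nim code, just a plain Sieve of Eratosthenes used as a stand-in, and the names `BitFlags`, `count_primes_bytes`, and `count_primes_bits` are made up for this illustration. The only difference between the two functions is how the flags are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the Nim reference code):
// one flag per bit, packed into 64-bit words.
struct BitFlags {
    std::vector<std::uint64_t> words;
    std::size_t size;

    explicit BitFlags(std::size_t n) : words((n + 63) / 64, 0), size(n) {}

    void set(std::size_t i) {
        assert(i < size);
        words[i >> 6] |= std::uint64_t{1} << (i & 63);
    }

    bool test(std::size_t i) const {
        assert(i < size);
        return ((words[i >> 6] >> (i & 63)) & 1u) != 0;
    }

    // Storage actually used, in bytes.
    std::size_t bytes() const { return words.size() * sizeof(std::uint64_t); }
};

// Count primes <= limit, storing one byte per flag.
std::size_t count_primes_bytes(std::size_t limit) {
    if (limit < 2) return 0;
    std::vector<std::uint8_t> composite(limit + 1, 0);
    std::size_t count = 0;
    for (std::size_t p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        ++count;
        for (std::size_t m = p * p; m <= limit; m += p) composite[m] = 1;
    }
    return count;
}

// Same sieve, storing one bit per flag (about 1/8 of the memory).
std::size_t count_primes_bits(std::size_t limit) {
    if (limit < 2) return 0;
    BitFlags composite(limit + 1);
    std::size_t count = 0;
    for (std::size_t p = 2; p <= limit; ++p) {
        if (composite.test(p)) continue;
        ++count;
        for (std::size_t m = p * p; m <= limit; m += p) composite.set(m);
    }
    return count;
}
```

Both functions return the same counts; the bit version trades a few extra shift and mask operations per access for a smaller working set. Whether that is a net win depends on the array size and the machine, so it needs to be measured rather than assumed.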

Also, to parallelize the algorithm, maybe using OpenMP, CUDA, etc. is the way to do it for D. I don't know what constructs D uses for parallel multiprocessing. And as noted before, this algorithm screams out to be done with GPUs.

But you are correct that the Nim code uses very simple coding operations. That is one of its beauties! :) It is simple to understand and implement mathematically, short and simple to code, and architecturally adaptable to hardware.

So to really do the challenge, the Nim code needs to be compiled and run (per the instructions in the code) to serve as the "reference implementation", to see what correct outputs and their times look like. Other implementations can then be compared to it.

I would hope that, after getting an output-correct implementation done (to show you really know what you're doing), alternative implementations can then be done to wring out better performance.

I think this is a good challenge for anyone wanting to learn D too, because it involves something substantially more than a "toy" algorithm, but short enough to do with minimal time and effort, that involves the need to know (learn about) D in enough detail to determine the "best" (alternative) way to do it.

Finally, a really fast D implementation can be a marketing bonanza to show people in numerical analysis, data and signal processing, and similar fields that D can be used to solve their problems and be more performant than C++, etc.

Again, people should feel free to email me if they want more direct answers to questions, or help.
