On Monday, 10 November 2014 at 19:18:21 UTC, Kirill wrote:
Dear D community (and specifically experts on cache
optimization),
I'm a C++ programmer and have been waiting for a while to do a
project in D.
I'd like to build a cache-optimized decision tree forest
library, and I'm debating between D and C++. I'd like to make
it similar to ATLAS, SPIRAL, or other libraries that partially
use static optimization with recompilation and meta-programming
to cache optimize the code for a specific architecture
(specifically the latest Xeons / Xeon Phi). Given D's compile
speed and meta-programming, it should be a good fit. The
problem that I might encounter is that C++ has a lot more
information on the topic, which might be a significant bottleneck
given I'm just learning cache optimization (from a few papers
and "What Every Programmer Should Know About Memory").
From my understanding, cache optimization mostly involves
breaking data and loops into segments that fit in cache, and
making sure that commonly used variables (for example sum in
sum+=i) stay in cache.
Assuming there isn't more frequently accessed data around, you
would want that to stay in a register, not cache.
Most of this should be solved by statically defining sizes and
paddings of blocks to be used for caching. That is closer to
low-level C, from my understanding. Are there any hidden
pitfalls?
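To make the "statically defined block sizes" idea concrete, here is a
minimal C++ sketch of loop blocking (tiling), applied to a matrix
transpose. The function name transposeBlocked and the default tile
size of 32 are assumptions for illustration only; a real library
would pick the tile size per architecture, for example by generating
and timing several instantiations. The same structure should carry
over to a D template with little change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transpose a row-major rows x cols matrix in
// Block x Block tiles. Block is a compile-time constant so the
// compiler can specialise the inner loops for it. The default of 32
// is a placeholder, not a tuned value.
template <std::size_t Block = 32>
std::vector<double> transposeBlocked(const std::vector<double>& src,
                                     std::size_t rows, std::size_t cols)
{
    assert(src.size() == rows * cols);
    std::vector<double> dst(rows * cols);

    for (std::size_t i0 = 0; i0 < rows; i0 += Block) {
        const std::size_t iEnd = std::min(i0 + Block, rows);
        for (std::size_t j0 = 0; j0 < cols; j0 += Block) {
            const std::size_t jEnd = std::min(j0 + Block, cols);
            // Inside one tile, both the reads from src and the writes
            // to dst stay within a small working set.
            for (std::size_t i = i0; i < iEnd; ++i)
                for (std::size_t j = j0; j < jEnd; ++j)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
    return dst;
}
```

Calling transposeBlocked<16>(m, rows, cols) instead of the default
only changes the template argument, which is what makes it practical
to compile and benchmark several tile sizes per target machine.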
The other question is how mature is the compiler in terms of
optimizing for cache compared to C++? I think GNU C++ does a
few tricks to optimize for cache and there are ways to tweak
cache line alignment.
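On cache line alignment, a minimal C++11 sketch using alignas is
below. The 64-byte line size is an assumption (common on x86, but it
should be checked for the actual target), and PaddedSum and
isCacheLineAligned are hypothetical names. D has an align attribute
that serves a similar purpose.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache-line size in bytes; verify this for the target CPU.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread accumulator. alignas rounds both the
// alignment and the size of the struct up to one full cache line, so
// neighbouring array elements never share a line.
struct alignas(kCacheLine) PaddedSum {
    std::uint64_t value = 0;
};

static_assert(sizeof(PaddedSum) == kCacheLine,
              "PaddedSum should occupy exactly one cache line");
static_assert(alignof(PaddedSum) == kCacheLine,
              "PaddedSum should start on a cache-line boundary");

// Returns true when p starts on a cache-line boundary.
inline bool isCacheLineAligned(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % kCacheLine == 0;
}
```

An array such as PaddedSum sums[4] then places each element on its
own line, at the cost of 56 bytes of padding per element.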
My knowledge of the subject is still limited and not yet concrete, but
I hope this gave an idea of what I'm looking for and you can
recommend me a good direction to take.
Best regards,
--Kirill
D is a good language for this sort of thing. Using various
metaprogramming techniques it might even be fun.
Most advice for C(++) will also apply to D w.r.t. cache.
You will probably have to learn assembly and also make use of
tools such as cachegrind and perf unless you like trying to
optimise blind.
A word of warning: modern CPU caches are complicated and are
sometimes difficult to understand w.r.t. performance in specific
cases.