On Friday, 25 March 2016 at 01:07:16 UTC, maik klein wrote:
Link to the blog post: https://maikklein.github.io/post/soa-d/
Link to the reddit discussion:
https://www.reddit.com/r/programming/comments/4buivf/why_and_when_you_should_use_soa/
I think structs-of-arrays are a lot more situational than you
make them out to be.
You say, at the end of your article, that "SoA scales much better
because you can partially access your data without needlessly
loading irrelevant data into your cache". But most of the time,
programs access struct fields close together in time (i.e.
accessing one field of a struct usually means that you will
access another field shortly). In that case, you've now split
your data across multiple cache lines; not good.
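To make that concrete, here is a hypothetical sketch in C (ENet is a C library, but these are shortened, invented field names, not the real ENetPeer layout):

```c
#include <stddef.h>

/* AoS: one peer's fields are adjacent in memory, so reading two
 * related fields of peer i usually hits a single cache line. */
struct PeerAoS {
    unsigned packetThrottle;
    unsigned packetThrottleLimit;
    /* ...more per-peer fields... */
};

/* SoA: the same two fields live in two separate arrays, so
 * reading both for peer i pulls in two different cache lines. */
struct PeersSoA {
    unsigned *packetThrottle;      /* one array per field */
    unsigned *packetThrottleLimit;
    size_t    count;
};
```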
Your ENetPeer example works against you here; the
packetThrottle* variables would be split up into different
arrays, but they will likely be checked together when throttling
packets. Admittedly, that's easy to fix: put fields likely
to be accessed together in their own struct.
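That fix might look like this in C (again a sketch with invented names, not ENet's actual layout):

```c
#include <stddef.h>

/* Group fields that are read together into one small struct, then
 * make an array of that struct: throttling still streams through
 * contiguous memory, but the related fields stay on the same
 * cache line instead of being scattered across three arrays. */
struct ThrottleState {
    unsigned packetThrottle;
    unsigned packetThrottleInterval;
    unsigned packetThrottleAcceleration;
};

struct PeersSoA {
    struct ThrottleState *throttle;      /* hot together when throttling */
    unsigned             *roundTripTime; /* accessed elsewhere */
    size_t                count;
};
```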
The SoA approach also makes random access less efficient and
makes it harder for objects to have identity. Again, your
ENetPeer example works against you; it's common for servers to
need to send packets to individual clients rather than
broadcasting them. With the SoA approach, you end up accessing a
tiny part of multiple arrays, and load several cache lines
containing data for ENetPeers that you don't care about (i.e.
loading irrelevant data).
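A sketch of that cost (hypothetical C, with invented field names): building a packet header for a single peer reads one element from each of three arrays, which means three separate cache-line loads, each line mostly filled with neighbouring peers' data that goes unused.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SoA peer table; field names are for illustration only. */
struct Peers {
    uint32_t *sessionId;
    uint16_t *mtu;
    uint32_t *outgoingSeq;
    size_t    count;
};

/* Servicing one peer touches three unrelated cache lines.
 * With an AoS layout these three reads would typically land on
 * a single line holding that peer's struct. */
uint64_t header_for_peer(const struct Peers *p, size_t i)
{
    return ((uint64_t)p->sessionId[i] << 32)
         | ((uint64_t)p->mtu[i] << 16)
         | (p->outgoingSeq[i] & 0xFFFF);
}
```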
I think SoA can be faster if you are commonly iterating over a
section of a dataset, but I don't think that's a common
occurrence. I definitely think it's unwarranted to conclude that
SoA "scales much better" without noting when it scales better,
especially without benchmarks.
I will admit, though, that the template for making the
struct-of-arrays is a nice demonstration of D's templates.