On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:
The problem at hand here is ownership of data.

"ownership of data" is one possible solution, but not the problem.

We are facing 2 problems:

1. A performance problem: concurrent writes (multiple writers, one writer, periodic locking during cleanup, etc.).

2. A structural problem: Releasing resources correctly.

I suggest that ownership should focus on the latter, to support solid non-GC implementations, and that we then rely on conventions for multi-threading.
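To make "releasing resources correctly" through ownership concrete, here is a minimal C++ sketch with a single owner and no GC. The names `Buffer`, `live_buffers`, `consume` and `ownership_demo` are made up for illustration; it only shows that moving ownership releases the resource exactly once, at a known point.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical instrumentation: number of Buffer objects currently alive.
int live_buffers = 0;

// Hypothetical resource type that records its own construction/destruction.
struct Buffer {
    std::vector<int> data;
    explicit Buffer(std::size_t n) : data(n) { ++live_buffers; }
    ~Buffer() { --live_buffers; }
};

// Takes ownership: the buffer is destroyed when the parameter goes out of scope.
std::size_t consume(std::unique_ptr<Buffer> buf) {
    return buf->data.size();
}

// Returns the number of live buffers after ownership has been handed over.
int ownership_demo() {
    auto buf = std::make_unique<Buffer>(1024);  // exactly one owner
    consume(std::move(buf));                    // ownership moves into consume()
    return live_buffers;                        // released once, deterministically
}
```

Nothing here is thread-aware; that part is left to convention, as suggested above.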

- Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows implementing almost any wanted scheme, but comes at great cost for the developer.

All performant solutions are going to be "unsafe" in the sense that you need to select a duplication/locking level that is optimal for the characteristics of the actual application. Copying data when you have no writers is too inefficient in real applications.

Hardware support for transactional memory is going to be the easy approach for speeding up locking.


 - Annotations. This is the Rust road. It also come a great

I think Rust's approach would favour an STM-like approach where you create thread-local copies for processing and then merge the result back into the "shared" memory.
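A rough sketch of the "work on a thread-local copy, then merge back" pattern, written in C++ rather than Rust. This is not real STM (there is no conflict detection or retry), and `parallel_sum` is a hypothetical name; it only illustrates that the shared state is touched once per thread, at merge time.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sums `input` using `workers` threads. Each thread accumulates into a
// private variable and touches the shared total only once, when merging.
long parallel_sum(const std::vector<int>& input, unsigned workers) {
    if (workers == 0) workers = 1;   // avoid division by zero below
    long shared_total = 0;           // the "shared" memory
    std::mutex merge_lock;           // only taken during the merge step
    std::vector<std::thread> pool;
    const std::size_t chunk = (input.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(input.size(), w * chunk);
            const std::size_t end = std::min(input.size(), begin + chunk);
            long local = 0;                                   // thread-local working copy
            for (std::size_t i = begin; i < end; ++i) local += input[i];
            std::lock_guard<std::mutex> guard(merge_lock);
            shared_total += local;                            // merge the result back
        });
    }
    for (auto& t : pool) t.join();
    return shared_total;
}
```

The lock is held for a single addition per thread, so contention stays low regardless of input size.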


Immutability+GC allows having safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies, in the sense that it is easy and safe to share data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created.

How does GC fit more data in the cache? A GC usually has overhead and would typically generate more cache-misses due to unreachable in-cache ("hot") memory not being available for reallocation.


Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interfaces simple. The obvious drawback is that it is time consuming and error prone. It also makes a lot of things unclear, and devs choose the better-safe-than-sorry road. That means excessive copying to make sure one owns the data, which is wasteful (in terms of work for the copy itself, garbage generation and cache pressure). If this must be an option locally for system code, it doesn't seem like this is the right option at program scale, and we do it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

The core problem is that if you are unhappy with single-threaded applications then you are looking for high throughput using multi-threading. And in that case sacrificing performance by not using the optimal strategy becomes problematic.

The optimal strategy is entirely dependent on the application and the dataset.

Therefore you need to support multiple approaches:

- per data structure GC
- thread local GC
- lock annotations of types or variables
- speculative lock optimisations (transactional memory)

And in the future you will also need to support the integration of GPUs/co-processors into mainstream CPUs. Metal and OpenCL are only the beginning…


Ideally, we want to map to what the hardware does. So what does the hardware do?

That changes over time. The current focus in upcoming hardware is on:

1. Heterogeneous architecture with high performance co-processors

2. Hardware support for transactional memory

Intel CPUs might have buffered transactional memory within 5 years.


from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order).

This will vary a lot. On x86 you can write to a whole cache line (buffered) without reading it first and it uses a convenient cache coherency protocol (so that reads/write ops are in order). This is not true for all CPUs.
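The cost of shared writable cache lines can be sidestepped by giving each writer its own line. Below is a minimal C++17 sketch; the 64-byte line size is an assumption (typical for x86, not universal), and `PaddedCounter` and `count_in_parallel` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is common on x86 but varies by CPU.
constexpr std::size_t kCacheLine = 64;

// One counter per cache line, so writers on different cores do not
// keep invalidating each other's lines.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each worker increments only its own counter; totals are combined at the end.
long count_in_parallel(unsigned workers, long increments_per_worker) {
    std::vector<PaddedCounter> counters(workers);  // C++17 honours the alignment
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&counters, w, increments_per_worker] {
            for (long i = 0; i < increments_per_worker; ++i)
                counters[w].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : pool) t.join();

    long total = 0;
    for (const auto& c : counters) total += c.value.load();
    return total;
}
```

Whether the padding pays off depends on the CPU and its coherency protocol, which is the point made above: this is tuned per target, by convention.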

I agree with others that say that a heterogeneous approach, like C++, is the better alternative. If parity with C++ is important then D needs to look closer at OpenMP, but that probably goes beyond what D can achieve in terms of implementation.


Some observations:

1. If you are not to rely on conventions for sync'ing threads then you need a pretty extensive framework if you want good performance.

2. Safety will harm performance.

3. Safety with high performance levels requires a very complicated static analysis that will probably not work very well for larger programs.

4. For most applications performance will come through co-processors (GPGPU etc).

5. If hardware progresses faster than compiler development, then you will never reach the performance frontier…


I think D needs to cut down on implementation complexity and ensure that the implementation time can catch up with hardware developments. The way to do it is:

1. Accept that generally performant multi-threaded code is unsafe and application/hardware optimized.

2. Focus on making @nogc single-threaded code robust and fast. And I agree that ownership is key.

3. Use semantic analysis to automatically generate a tailored runtime with application-optimized allocators.
