Re: On heap segregation, GC optimization and @nogc relaxing

via Digitalmars-d Wed, 12 Nov 2014 00:41:02 -0800

On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:

The problem at hand here is ownership of data.


"ownership of data" is one possible solution, but not the problem.

We are facing 2 problems:

1. A performance problem: Concurrency in writes (multiple writers, one writer, periodic locking during cleanup, etc.).


2. A structural problem: Releasing resources correctly.

I suggest that the ownership focus is on the latter, to support solid non-GC implementations. Then rely on conventions for multi-threading.
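To make the "releasing resources correctly" point concrete, here is a minimal single-threaded sketch of ownership-based release without a GC. It is written in C++ rather than D, and `Buffer`, `consume` and `live_after_handoff` are hypothetical names made up for illustration; the live counter exists only so that the release can be observed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource. The external counter is only there to make
// construction and destruction observable.
struct Buffer {
    explicit Buffer(int* live) : live_(live) { ++*live_; }
    ~Buffer() { --*live_; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
private:
    int* live_;
};

// Takes ownership: the buffer is destroyed when `owned` goes out of scope.
inline void consume(std::unique_ptr<Buffer> owned) {
    (void)owned;
}

// Returns the number of live buffers after ownership has been handed off.
inline int live_after_handoff() {
    int live = 0;
    auto buf = std::make_unique<Buffer>(&live);  // one live buffer
    consume(std::move(buf));                     // released inside consume
    return live;
}
```

The release point follows from who owns the value, so no collector and no convention is needed to decide when the destructor runs.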

- Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows implementing almost any wanted scheme, but comes at great cost for the developer.

All performant solutions are going to be "unsafe" in the sense that you need to select a duplication/locking level that is optimal for the characteristics of the actual application. Copying data when you have no writers is too inefficient in real applications.

Hardware support for transactional memory is going to be the easy approach for speeding up locking.

 - Annotations. This is the Rust road. It also come a great

I think Rust's approach would favour a STM approach where you create thread local copies for processing then merge the result back into the "shared" memory.
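A rough sketch of the "process thread-local copies, then merge" pattern, in C++. This is not real STM (there is no conflict detection or retry); it only shows the shape of working on private state and touching the shared state once per thread. `sum_with_local_copies` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sums `data` with `workers` threads. Each thread accumulates into a private
// local variable and touches the shared total exactly once, under a lock.
inline long long sum_with_local_copies(const std::vector<int>& data,
                                       std::size_t workers) {
    if (workers == 0) workers = 1;
    long long shared_total = 0;
    std::mutex merge_lock;
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            long long local = 0;  // thread-local working state
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            std::lock_guard<std::mutex> guard(merge_lock);
            shared_total += local;  // single merge step per thread
        });
    }
    for (auto& t : pool) t.join();
    return shared_total;
}
```

The lock is held once per thread rather than once per element, which is where the throughput would come from.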

Immutability+GC allows safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies, in the sense that it is easy and safe to share data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created.

How does GC fit more data in the cache? A GC usually has overhead and would typically generate more cache-misses due to unreachable in-cache ("hot") memory not being available for reallocation.

Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping the interface simple. The obvious drawback is that it is time consuming and error prone. It also makes a lot of things unclear, so developers choose the better-safe-than-sorry road. That means excessive copying to make sure one owns the data, which is wasteful (in terms of work for the copy itself, garbage generation and cache pressure). While this must remain an option locally for system code, it doesn't seem like the right option at program scale, and we do it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

The core problem is that if you are unhappy with single-threaded applications then you are looking for high throughput using multi-threading. And in that case sacrificing performance by not using the optimal strategy becomes problematic.

The optimal strategy is entirely dependent on the application and the dataset.


Therefore you need to support multiple approaches:

- per data structure GC
- thread local GC
- lock annotations of types or variables
- speculative lock optimisations (transactional memory)

And in the future you will also need to support the integration of GPUs/co-processors into mainstream CPUs. Metal and OpenCL are only a beginning…

Ideally, we want to map with what the hardware does. So what does the hardware do?

That changes over time. The current focus in upcoming hardware is on:


1. Heterogeneous architecture with high performance co-processors

2. Hardware support for transactional memory

Intel CPUs might have buffered transactional memory within 5 years.

from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order).

This will vary a lot. On x86 you can write to a whole cache line (buffered) without reading it first and it uses a convenient cache coherency protocol (so that read/write ops are in order). This is not true for all CPUs.
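To illustrate the cache-line bouncing mentioned in the quote, here is a hedged C++ sketch of per-thread counters padded so that no two of them share a line. The 64-byte line size is an assumption (common on x86, not universal), and `PaddedCounter`, `kThreads` and `count_in_parallel` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache-line size; typical for x86 but not guaranteed elsewhere.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kThreads = 4;

// alignas pads each counter to a full line, so a write by one thread does
// not invalidate the line that holds another thread's counter.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Each thread increments only its own counter; totals are read after join.
inline long count_in_parallel(long increments) {
    PaddedCounter counters[kThreads];
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < kThreads; ++t) {
        pool.emplace_back([&counters, t, increments] {
            for (long i = 0; i < increments; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    long total = 0;
    for (auto& c : counters) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

Without the `alignas`, the four counters would likely sit in one line and every increment would force that line to move between cores. How much that costs depends on the CPU, which is the point about hardware variation.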

I agree with others that say that a heterogeneous approach, like C++, is the better alternative. If parity with C++ is important then D needs to look closer at OpenMP, but that probably goes beyond what D can achieve in terms of implementation.



Some observations:

1. If you are not to rely on conventions for sync'ing threads then you need a pretty extensive framework if you want good performance.


2. Safety will harm performance.

3. Safety with high performance levels requires a very complicated static analysis that will probably not work very well for larger programs.

4. For most applications performance will come through co-processors (GPGPU etc).

5. If hardware progresses faster than compiler development, then you will never reach the performance frontier…

I think D needs to cut down on implementation complexity and ensure that the implementation time can catch up with hardware developments. The way to do it is:

1. Accept that generally performant multi-threaded code is unsafe and application/hardware optimized.

2. Focus on making @nogc single-threaded code robust and fast. And I agree that ownership is key.

3. Use semantic analysis to automatically generate a tailored runtime with application-optimized allocators.
