I've been lurking a little on the recent discussions about thread-local garbage collection. My general opinion is that an implicitly thread-local GC makes it too easy to shoot yourself in the foot when using non-SafeD constructs such as casting to shared/immutable, std.parallelism, or core.thread, and that SafeD concurrency is so limited that it's not reasonable to expect people to use only that.
However, what if we kept the shared GC as the default for everything **but made thread-local GCs an allocator in the allocator interface that's being worked on**? This probably wouldn't be hard to implement. All you'd need is multiple instances of the GCX struct, one for each thread, plus a way to register the thread-local GCs with the shared GC so that pointers from the thread-local heaps into shared memory get scanned properly. You'd also strip a little of the locking/world stopping out of the thread-local instances. Using the thread-local GC allocator would be an explicit assertion that you will **not** cast the memory to shared/immutable and share it, pass it around via core.thread/std.parallelism, etc. This would also be in keeping with D's design philosophy that the simple, safe way should be the default but the more complicated, less safe way should be available for performance-critical code. Thoughts?
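To make the shape of the idea a bit more concrete, here is a rough sketch in D of the bookkeeping only. Everything named in it (`ToyGcx`, `tlAllocate`, `registered`, `registryLock`, `blocksVisibleToSharedGC`) is made up for illustration. It is not druntime's GC code and not the allocator interface under discussion, it does no collection at all, and it simply borrows memory from the normal GC. It only shows one instance per thread, registration with a shared registry, and a lock-free allocation fast path.

```d
import core.sync.mutex : Mutex;

// Hypothetical stand-in for one per-thread GCX instance.
struct ToyGcx
{
    void[][] blocks; // blocks handed out by this instance

    void[] allocate(size_t n)
    {
        void[] b = new ubyte[](n); // toy: borrow memory from the real GC
        blocks ~= b;
        return b;
    }
}

// Shared-GC side: the list of thread-local instances it must scan
// for pointers into shared memory. Guarded by registryLock.
__gshared ToyGcx*[] registered;
__gshared Mutex registryLock;

shared static this()
{
    registryLock = new Mutex;
}

// Module-level variables are thread-local by default in D,
// so each thread gets its own instance and its own flag.
ToyGcx localGcx;
bool localRegistered;

// The explicit, opt-in allocation call (assumed name).
void[] tlAllocate(size_t n)
{
    if (!localRegistered)
    {
        // Registration is the only step that takes the shared lock.
        registryLock.lock();
        scope (exit) registryLock.unlock();
        registered ~= &localGcx;
        localRegistered = true;
    }
    return localGcx.allocate(n); // fast path: no lock, no world stopping
}

// What a shared collection would walk to find roots held by
// thread-local heaps. Here it only counts the blocks.
size_t blocksVisibleToSharedGC()
{
    registryLock.lock();
    scope (exit) registryLock.unlock();
    size_t total;
    foreach (g; registered)
        total += g.blocks.length;
    return total;
}

void main()
{
    auto mem = tlAllocate(16);
    assert(mem.length == 16);
    assert(blocksVisibleToSharedGC() == 1);
}
```

Unregistering an instance when its thread exits is left out to keep this short. A real version would need it, along with the actual marking and sweeping.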