>> 3. ... the capability to author software which can reach the full
>> *[performance]* potential of the hardware, without "dropping into another
>> language". This includes no achilles heels, such as loss of
>> memory-layout-control, introduction of stop-the-world-pauses, or other
>> impediments to controlling the final behavior and expression of the program.
>
> The only language I know of that ultimately meets that goal is assembly
> language, and we *really* don't want to relegate ourselves to that. I
> would say rather that we want the need to drop to assembly language to be
> drastically minimized.
Agreed. I missed a qualification here. In #3, my intent was to talk about reaching the full "*performance* potential" of the hardware. IMO, doing this requires direct/inline access to tools like SIMD, hardware-assisted atomics/synchronization, shared-memory threading, and pinned/fixed memory-layout-control, plus freedom from any issues which impede meeting performance and latency targets. It doesn't generally require end-programmer assembly language. AFAIK, the CLR meets all of these requirements except the costs of its GC-world-stop and GC-tracing (see below). Many of them it meets only because of its excellent value-type system, including structs, struct-arrays, and parametric instantiation. The JVM, V8, the Smalltalk VM, and others do not meet many of these requirements.

>> Azul C4 may solve (d) but does not solve (3).
>
> In what way does the Azul C4 collector fail to solve (3)?

The Azul C4 collector, like any tracing collector, can only meet a CPU-overhead and memory-overhead target for a specific set of programs. Using a rough goal of <25% CPU and memory overhead, there are quite trivial and common components which will not meet that target. Any non-CPU-bound LRU cache is the simplest one (see the P.S. for a concrete sketch). IMO this is not some special case, but includes nearly any program with fractional working-set turnover that relies on indexing data structures such as the btree, skiplist, kd-tree, r-tree, octree, bounding-volume-hierarchy, trie, bsp-tree, and the list goes on. I also don't believe regions help (see below **).

I don't view C4's failure to meet #3 in all cases as a big problem. However, it means that, to meet my systems-programming definition, a fully-managed solution needs more than just C4-GC. For example, it could provide C4-level no-stop-GC, ARC, and a satisfactory means of connecting them together which does not subject the whole program to GC tracing.

I *believe* CLR+unsafe with Azul C4 can meet all my requirements, since we can always fall back to unsafe for problematic subsystems (unsafe code can use ARC, or even manual memory management). If managed-only is a goal, as you say it is for BitC, I *believe* a managed-only CLR which can choose between Azul C4 *and* threadsafe ARC per type-instantiation would meet all my requirements and be completely safe.

I *wonder* if supporting GC+ARC simultaneously in a single runtime is worth it, or if we are better off admitting subsections of unsafe code for now (like CLR unsafe). I also *wonder* if we can build reasonable facsimiles of ARC by using value-type data and integer handles within the existing CLR; the sketch after the regions aside below shows the shape of the trick. I'm experimenting with this, but so far it doesn't seem like it will perform.

** As for regions: as far as I can see, they do not help with the problematic memory-management pattern above, since the problem as described degenerates to per-region GC. There may be some interesting possibility to reduce GC pressure through immutable regions; in that case they feel like linear types which could admit cyclic structures. However, very large immutable structures are less useful than very large mutable ones, and functional/STM styles of mutating data by returning new immutable structures can have dramatic performance consequences. For all those reasons -- if I had to choose -- rather than regions I would prefer to see the CLR add NoEscape/borrowing/lifetime mechanisms to increase the efficiency of using value types, via value-type ref return and stack-scoped iterator blocks.
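To make that handle idea concrete, here is the shape of the trick, sketched in Java for brevity (parallel primitive arrays standing in for CLR struct-arrays). All names below are illustrative; this is a sketch of the idea, not the code I'm actually measuring:

    // An ARC facsimile inside a managed heap: node payloads live in parallel
    // primitive arrays ("value-type data"), nodes are named by integer
    // handles rather than object references, and a refcount array drives
    // reclamation onto an intrusive free list. The tracing collector sees a
    // handful of large arrays, so the structure costs almost nothing to trace.
    final class RefCountedPool {
        private final long[] key;   // payload field, one slot per handle
        private final int[] left;   // child handle or -1 (doubles as free link)
        private final int[] right;  // child handle or -1
        private final int[] refs;   // reference count per live handle
        private int freeHead = -1;  // head of intrusive free list
        private int highWater = 0;  // first never-used slot

        RefCountedPool(int capacity) {
            key = new long[capacity];
            left = new int[capacity];
            right = new int[capacity];
            refs = new int[capacity];
        }

        int alloc(long k) {         // returns a handle with refcount 1
            int h;
            if (freeHead != -1) { h = freeHead; freeHead = left[h]; }
            else h = highWater++;   // throws past capacity; a real pool grows
            key[h] = k; left[h] = -1; right[h] = -1; refs[h] = 1;
            return h;
        }

        void retain(int h) { refs[h]++; }

        void release(int h) {       // a fuller version would release children
            if (--refs[h] == 0) { left[h] = freeHead; freeHead = h; }
        }

        long keyOf(int h) { return key[h]; }
    }

The costs are equally clear: handles are unchecked (a dangling handle silently aliases a recycled slot), and every access pays an extra indirection plus bounds checks, which may be part of why it doesn't seem to perform so far.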
I believe these features (ref returns and stack-scoped iterator blocks), plus a C4-level no-stop collector, plus sparing use of unsafe, would be an incredible systems programming environment for all but the most constrained systems. For example, I believe it would be sufficient for all smartphone/mobile system software except the kernel.

In summary, I think C4-ish designs may be a fantastic tool to help us push managed environments closer to C-parity for systems programming. However, I don't think C4-GC alone solves the problem, for the reasons I explain above.

>> I think C4 could be done on Android, which is basically built on a Linux
>> kernel. It could be done on Windows if MSFT was sufficiently motivated. I
>> agree that JVM is unsuited for systems programming, but I'm not sure C4 is
>> unsuited for that.

There was a recent MSFT/Azul press release which implied (but did not state) that the Azul C4 JVM may be coming to Windows Server. I agree C4 is likely viable on Android, as ARM MMU capabilities seem close to x86 MMU parity (though I'm not familiar with the details).
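P.S. To make the LRU-cache claim concrete, here is a minimal sketch in Java (names illustrative; the idiom is the standard one, nothing clever). The point is the allocation pattern, not the implementation: at capacity, every miss allocates a fresh entry and evicts the coldest, so the cache continuously turns over a medium-lifetime population that a tracing collector must keep re-tracing (and a moving collector must keep copying), no matter how little CPU the cache itself burns.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU cache: each miss inserts a new entry and, once at
    // capacity, evicts the eldest -- the fractional working-set turnover
    // described above, produced by a completely ordinary component.
    final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true => iterate in access order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // drop the coldest entry when full
        }
    }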
