>> 3. ... the capability to author software which can reach the full
>> *[performance]* potential of the hardware, without "dropping into another
>> language". This includes no Achilles' heels, such as loss of
>> memory-layout-control, introduction of stop-the-world pauses, or other
>> impediments to controlling the final behavior and expression of the program.
>>
>
> The only language I know of that ultimately meets that goal is assembly
> language, and we *really* don't want to relegate ourselves to that. I
> would say rather that we want the need to drop to assembly language to be
> drastically minimized.
>

Agreed. I missed a qualification here. In #3, my intent was to talk about
reaching the full "*performance* potential" of the hardware. IMO, doing
this requires direct/inline access to tools like SIMD, hardware-assisted
atomics/synchronization, shared-memory threading, pinned/fixed
memory-layout-control, and freedom from any issues which impede meeting
performance and latency targets. It doesn't generally require
end-programmer assembly language.

AFAIK, the CLR meets all of these requirements except the costs of its
GC-world-stop and GC-tracing (see below). Many of them it meets only
because of its excellent value-type system, including structs,
struct-arrays, and parametric instantiation. The JVM, V8, Smalltalk VMs,
and others do not meet many of these requirements.


>> Azul C4 may solve (d) but does not solve (3).
>
> In what way does the Azul C4 collector fail to solve (3)?
>

The Azul C4 collector, like any tracing collector, can only meet a
CPU-overhead and memory-overhead target for a specific set of programs.

Using the rough goal of <25% CPU and memory overhead... there are quite
trivial and common components which will not meet this performance
overhead, any non-CPU-bound LRU cache being the simplest one. IMO this is
not some special case, but includes nearly any program with fractional
working-set turnover that relies on indexing data-structures such as the
btree, skiplist, kd-tree, r-tree, octree, bounding-volume-hierarchy, trie,
bsp-tree, and the list goes on. I also don't believe regions help (see
below **).
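
To make the LRU point concrete, here is a minimal sketch (C#, all names
mine) of the access pattern that hurts a tracing collector: on every miss
the oldest entry dies and a new one is allocated, so entries live just long
enough to be tenured before becoming garbage, and the collector must keep
re-tracing old-generation data to reclaim them.

    using System.Collections.Generic;

    // Minimal LRU cache sketch. Entries survive long enough to be
    // promoted out of the nursery, then die in the old generation, so
    // a tracing collector must repeatedly scan tenured data to keep up
    // with the turnover.
    class LruCache<K, V>
    {
        private readonly int capacity;
        private readonly Dictionary<K, LinkedListNode<KeyValuePair<K, V>>> map;
        private readonly LinkedList<KeyValuePair<K, V>> order; // MRU at head

        public LruCache(int capacity)
        {
            this.capacity = capacity;
            map = new Dictionary<K, LinkedListNode<KeyValuePair<K, V>>>(capacity);
            order = new LinkedList<KeyValuePair<K, V>>();
        }

        public void Put(K key, V value)
        {
            LinkedListNode<KeyValuePair<K, V>> node;
            if (map.TryGetValue(key, out node))
            {
                order.Remove(node);          // refresh recency
            }
            else if (map.Count >= capacity)
            {
                var oldest = order.Last;     // evict: freshly-tenured garbage
                map.Remove(oldest.Value.Key);
                order.RemoveLast();
            }
            map[key] = order.AddFirst(new KeyValuePair<K, V>(key, value));
        }
    }

Note the cache itself is not CPU-bound at all; the overhead is pure
collector work chasing the steady stream of middle-aged garbage.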

I don't view C4's failure to meet #3 in all cases as a big problem.
However, it means that to meet my systems-programming definition, a
fully-managed solution needs more than just C4-GC. For example, it could
provide C4-level no-stop-GC, ARC, and a satisfactory means of connecting
them together which does not subject the whole program to GC tracing.

I *believe* CLR+unsafe with Azul C4 can meet all my requirements, since we
can always fall back to unsafe for problematic subsystems (unsafe code can
use ARC, or even manual memory management).
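
By manual memory management in unsafe code, I mean something along these
lines (a sketch only; the struct and pool are made up for illustration):
the data lives in native memory via Marshal, entirely outside the GC heap,
so the tracer never sees it.

    using System;
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Sequential)]
    struct Node
    {
        public long Key;
        public long Value;
        public int NextIndex;   // link to another node, by index
    }

    static class NativeNodes
    {
        // Compile with /unsafe. The node block is allocated from the
        // native heap, so the GC neither moves nor traces it; freeing
        // it is entirely the program's responsibility.
        public static unsafe Node* Alloc(int count)
        {
            return (Node*)Marshal.AllocHGlobal(count * sizeof(Node));
        }

        public static unsafe void Free(Node* nodes)
        {
            Marshal.FreeHGlobal((IntPtr)nodes);
        }
    }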

If managed-only is a goal, as you say it is for bitc, I *believe* if we
have a managed-only-CLR which can choose between Azul C4 *and* threadsafe
ARC per-type-instantiation, it would meet all my requirements and be
completely safe.

I *wonder* if supporting GC+ARC simultaneously in a single runtime is worth
it, or if we are better off admitting subsections of unsafe code for now.
(like CLR unsafe)

I also *wonder* if we can build reasonable facsimiles of ARC by using
value-type-data and integer-handles within the existing CLR. I'm
experimenting with this, but so far it doesn't seem like it will perform.
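
For reference, the experiment looks roughly like this (a sketch; all names
are mine): "references" become integer indices into a flat struct array,
with a parallel refcount array playing the role of ARC. The GC sees only
two arrays and has nothing to trace per-node, but every access pays an
extra indirection, which is where the performance seems to go.

    // ARC facsimile inside the safe CLR: value-type slots addressed by
    // integer handles, with explicit refcounts and a free list. This
    // sketch is single-threaded and omits bounds/growth handling;
    // threadsafe ARC would need Interlocked refcounts.
    struct Slot
    {
        public long Payload;
        public int Next;    // "pointer" to another slot, as an index
    }

    class HandlePool
    {
        private readonly Slot[] slots;
        private readonly int[] refCounts;
        private int freeHead = -1;  // head of free list, -1 if empty
        private int top = 0;        // next never-used slot

        public HandlePool(int capacity)
        {
            slots = new Slot[capacity];
            refCounts = new int[capacity];
        }

        public int Allocate()
        {
            int handle;
            if (freeHead >= 0)
            {
                handle = freeHead;
                freeHead = slots[handle].Next;  // pop free list
            }
            else
            {
                handle = top++;
            }
            refCounts[handle] = 1;
            return handle;
        }

        public int Retain(int handle)
        {
            refCounts[handle]++;
            return handle;
        }

        public void Release(int handle)
        {
            if (--refCounts[handle] == 0)
            {
                slots[handle].Next = freeHead;  // push onto free list
                freeHead = handle;
            }
        }
    }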

** As for regions, as far as I can see, they do not help with the
memory-management problem above, since the problem as described degenerates
to per-region GC. It seems like there may be some interesting possibility
to reduce GC pressure through immutable regions. In this case they feel
like linear-types which could admit cyclic structures. However, the utility
of very large immutable structures is more limited than that of very large
mutable structures, and functional/STM styles of mutating data by returning
new immutable structures can have dramatic performance consequences.

For all those reasons -- if I had to choose -- rather than regions I would
prefer to see the CLR add NoEscape/borrowing/lifetime mechanisms to
increase the efficiency of using value-types, through value-type ref return
and stack-scoped iterator blocks. I believe these features, plus a C4-level
no-stop collector, plus sparing use of unsafe, would make an incredible
systems programming environment for all but the most constrained systems.
For example, I believe it would be sufficient for all smartphone/mobile
system software except the kernel.
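
Concretely, by value-type ref return I mean something like this (a sketch;
whether the verifier can prove the reference does not escape is exactly the
NoEscape/lifetime question): a container hands out a stack-scoped reference
into its internal struct array, so elements are mutated in place with no
copying and no boxing.

    struct Particle
    {
        public float X, Y, Z;
    }

    class ParticleBuffer
    {
        private Particle[] data = new Particle[1024];

        // Value-type ref return: the caller receives a direct reference
        // into the array, valid only for its stack scope, so a field
        // write updates the element in place instead of copying the
        // whole struct out and back.
        public ref Particle At(int i)
        {
            return ref data[i];
        }
    }

    class Demo
    {
        static void Main()
        {
            var buffer = new ParticleBuffer();
            ref Particle p = ref buffer.At(42);
            p.X += 1.0f;   // in-place update, no copy, no escape
        }
    }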

In summary, I think C4-ish designs may be a fantastic tool to help us push
managed environments closer to C-parity for systems programming. However, I
don't think C4-GC alone solves the problem for the reasons I explain above.

>> I think C4 could be done on Android, which is basically built on a Linux
>> kernel. It could be done on Windows if MSFT was sufficiently motivated. I
>> agree that JVM is unsuited for systems programming, but I'm not sure C4 is
>> unsuited for that.
>>
>
There was a recent MSFT/Azul press release which implied (but did not
state) that the Azul C4 JVM may be coming to Windows Server. I agree C4 is
likely viable on Android, as ARM MMU capabilities seem close to x86 MMU
parity (though I'm not familiar with the details).