On Mon, Jul 29, 2013 at 1:31 PM, David Jeske <[email protected]> wrote:

>
>
>>  3. ... the capability to author software which can reach the full *
>>> [performance]* potential of the hardware, without "dropping into
>>> another language". This includes no achilles heels, such as loss of
>>> memory-layout-control, introduction of stop-the-world-pauses, or other
>>> impediments to controlling the final behavior and expression of the program.
>>>
>>
>> The only language I know of that ultimately meets that goal is assembly
>> language, and we *really* don't want to relegate ourselves to that. I
>> would say rather that we want the need to drop to assembly language to be
>> drastically minimized.
>>
>
> Agreed. I missed a qualification here. In #3, my intent was to talk about
> reaching the full "*performance* potential" of the hardware. IMO, doing
> this requires direct/inline access to tools like SIMD, hardware-assisted
> atomics/synchronization, shared-memory threading, pinned/fixed
> memory-layout-control, and lack of any issues which impede meeting
> performance and latency targets. It doesn't generally require
> end-programmer assembly language.
>
> AFAIK, the CLR meets all of these requirements except the costs of its
> GC-world-stop and GC-tracing (see below). Many of them it only meets
> because of its excellent value-type system, including structs,
> struct-arrays, and parametric instantiation. JVM, V8, the Smalltalk VM, and
> others do not meet many of these requirements.
>

I find this very interesting. The objective you are setting makes sense,
but some of the particulars strike me as challenging, notably the vector
computation goals. I used to track the work on vectorizing optimization; I
have no idea where the state of the art stands today. The issue I would
anticipate is that different hardware architectures have wildly different
models for SIMD and vector fixed-point. It's sometimes hard to know what
type and algorithm support, if any, should exist in a high-level language
that seeks to target this type of computation. The debate about how to wrap
a hardware-neutral interface around GPU computation provides both an
illustration and a cautionary tale about the difficulties.

The alternative, of course, is to provide suitable intrinsics and not *try* to
write portable code. I see no problem with that, but dealing with GPUs and
vectorizing computations isn't my highest priority at the moment.
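
For concreteness, here is a minimal sketch of the non-portable-intrinsics
route. It is in Rust purely because that keeps the example short; the
function name and slice-based signature are illustrative, not anything the
CLR or BitC defines. The wrapper is callable from ordinary code, but its
body is unapologetically tied to one ISA:

    // Sketch of the "expose the hardware operation, don't pretend to be
    // portable" approach, using the x86-64 SSE intrinsics.
    #[cfg(target_arch = "x86_64")]
    fn add_slices_sse(a: &[f32], b: &[f32], out: &mut [f32]) {
        use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
        assert!(a.len() == b.len() && a.len() == out.len());

        let chunks = a.len() / 4;
        unsafe {
            for i in 0..chunks {
                // Load four packed f32s from each input, add them with one
                // SIMD instruction, and store the four results.
                let va = _mm_loadu_ps(a.as_ptr().add(i * 4));
                let vb = _mm_loadu_ps(b.as_ptr().add(i * 4));
                _mm_storeu_ps(out.as_mut_ptr().add(i * 4), _mm_add_ps(va, vb));
            }
        }
        // Scalar tail for lengths that aren't a multiple of the vector width.
        for i in chunks * 4..a.len() {
            out[i] = a[i] + b[i];
        }
    }

    fn main() {
        let a = [1.0_f32; 10];
        let b = [2.0_f32; 10];
        let mut out = [0.0_f32; 10];
        #[cfg(target_arch = "x86_64")]
        add_slices_sse(&a, &b, &mut out);
    }

A different ISA gets a different body behind the same signature, which is
exactly the non-portability being accepted.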


>
>
>>  Azul C4 may solve (d) but does not solve (3).
>>>
>>
>> In what way does the Azul C4 collector fail to solve (3)?
>>
>
> The Azul C4 collector, like any tracing collector, can only meet a
> CPU-overhead and memory-overhead target for a specific set of programs.
>

> Using the rough goal of <25% CPU and memory overhead... There are quite
> trivial and common components which will not stay within that overhead budget.
>

My understanding is that this is incorrect. The Azul collector, if it is
allowed to use less than 25% of a CPU, is able to keep up with the mutation
rate of the mutator.

But I think this is pushing too far. Unfortunately, languages *do* have to
make choices about memory models, and no choice of memory models is going
to be perfect for all programs. I'm not really interested in whether the
Azul collector keeps up with a particularly challenging LRU implementation.
I'm concerned with whether that makes a real difference in a real program.


> If managed-only is a goal, as you say it is for bitc, I *believe* that a
> managed-only CLR which can choose between Azul C4 *and* threadsafe ARC
> per type-instantiation would meet all my requirements and be completely
> safe.
>

My problem with ARC is that I don't know how to type acyclic data
structures, and ARC isn't resource-safe (because it can exhaust memory)
unless we can type that.
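
The exhaustion failure is easy to see with a two-node cycle. Here is a
minimal sketch using Rust's Rc as a stand-in for a threadsafe ARC (the Node
type is purely illustrative): once the nodes point at each other, dropping
the external handles never brings either count to zero, so the memory is
unreachable yet never reclaimed. A tracing collector would recover it;
reference counting cannot, unless the type system can rule the cycle out up
front.

    use std::cell::RefCell;
    use std::rc::Rc;

    // Illustrative node type: each node may hold a counted reference to
    // another node.
    struct Node {
        next: RefCell<Option<Rc<Node>>>,
    }

    fn main() {
        let a = Rc::new(Node { next: RefCell::new(None) });
        let b = Rc::new(Node { next: RefCell::new(None) });

        // Build the cycle: a -> b and b -> a.
        *a.next.borrow_mut() = Some(Rc::clone(&b));
        *b.next.borrow_mut() = Some(Rc::clone(&a));

        // Both external handles are dropped here, but each node still
        // holds a reference to the other, so neither count reaches zero
        // and neither node is ever freed.
    }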


> I *wonder* if supporting GC+ARC simultaneously in a single runtime is
> worth it, or if we are better off admitting subsections of unsafe code for
> now (like CLR unsafe).
>

Those are orthogonal questions. For systems programs, we need to admit
unsafe code in any case. But I really haven't heard a case articulated
(yet) that first-class regions can't handle.


> ** As for regions, as far as I can see, they do not help with the
> problematic memory-management case above, since the problem as described
> degenerates to per-region GC.
>

I've been going back and forth on whether I think this is true. There are
certainly use-cases of regions for which I agree, but I'm wondering if we
can't distinguish those from the non-liveness-decreasing cases with a
suitable type system. Still pondering that.


> It seems like there may be an interesting possibility of reducing GC
> pressure through immutable regions.
>

Yes. Actually, they don't even need to be fully immutable. They only need
to be immutable at traced references.
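
As a rough sketch of what "mutable data, immutable at traced references"
could look like, here in Rust with made-up names: the scalar field lives in
a Cell and can be updated freely, while the pointer field a collector would
trace is a plain shared reference fixed at construction, so the reference
graph of the region never changes after it is built.

    use std::cell::Cell;

    // Nodes allocated into some region 'r. The data field can be mutated
    // in place, but the only traced edge (`next`) is an ordinary shared
    // reference that is fixed when the node is built, so a collector
    // scanning this region never sees its reference graph change.
    struct Node<'r> {
        hits: Cell<u64>,
        next: Option<&'r Node<'r>>,
    }

    fn touch(n: &Node<'_>) {
        // Mutating non-reference data is fine...
        n.hits.set(n.hits.get() + 1);
        // ...but there is no way here to re-point `next` at another node.
    }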


> For all those reasons -- if I had to choose -- rather than regions I would
> prefer to see CLR add NoEscape/borrowing/lifetime mechanisms to increase
> the efficiency of using value-types through value-type ref return and
> stack-scoped iterator blocks. I believe these features plus a C4-level
> no-stop collector, plus sparing use of unsafe, would be an incredible
> systems programming environment for all but the most constrained systems.
> For example, I believe it would be sufficient for all smartphone/mobile
> system software except the kernel.
>

It would also be sufficient for the kernel, because a properly crafted
kernel for such platforms shouldn't be allocating at all after startup
initialization.
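
To make the borrowing/ref-return point above concrete, here is a small
sketch, in Rust rather than a hypothetical extended CLR, with illustrative
names: a reference into an element of a flat array of value types is handed
out and used in place, the compiler guarantees it cannot outlive or escape
the buffer, and no heap allocation or GC is involved anywhere.

    // A flat array of value types: one contiguous allocation, fixed layout.
    #[derive(Clone, Copy, Default)]
    struct Sample {
        timestamp: u64,
        value: f64,
    }

    // "Ref return": hand back a mutable reference *into* the array. The
    // lifetime ties the reference to the buffer, so the compiler rejects
    // any attempt to let it escape or outlive the storage; this is the kind
    // of NoEscape/borrow discipline described above, checked statically,
    // with no GC in the picture.
    fn slot(buf: &mut [Sample; 64], i: usize) -> &mut Sample {
        &mut buf[i]
    }

    fn main() {
        let mut buf = [Sample::default(); 64];
        let s = slot(&mut buf, 3);
        s.timestamp = 1234;
        s.value = 0.5;
    }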


> In summary, I think C4-ish designs may be a fantastic tool to help us push
> managed environments closer to C-parity for systems programming. However, I
> don't think C4-GC alone solves the problem for the reasons I explain above.
>

I think you've articulated an interesting case, and you may possibly be
right. That said, there is a limit to how many problems I want to tackle at
once, and ARC definitely isn't on the list right now.


> I agree C4 is likely viable on Android, as ARM MMU capabilities seem close
> to x86 MMU parity (though I'm not familiar with the details).
>

If it *doesn't* work, the issue is going to be that the ARM memory model is
weakly consistent. I haven't put enough time into thinking about it to feel
like I have a handle on the issues.


shap