On Mon, Jul 29, 2013 at 1:31 PM, David Jeske <[email protected]> wrote:
>>> 3. ... the capability to author software which can reach the full
>>> *[performance]* potential of the hardware, without "dropping into
>>> another language". This includes no Achilles' heels, such as loss of
>>> memory-layout control, introduction of stop-the-world pauses, or other
>>> impediments to controlling the final behavior and expression of the
>>> program.
>>
>> The only language I know of that ultimately meets that goal is assembly
>> language, and we *really* don't want to relegate ourselves to that. I
>> would say rather that we want the need to drop to assembly language to
>> be drastically minimized.
>
> Agreed. I missed a qualification here. In #3, my intent was to talk
> about reaching the full "*performance* potential" of the hardware. IMO,
> doing this requires direct/inline access to tools like SIMD,
> hardware-assisted atomics/synchronization, shared-memory threading,
> pinned/fixed memory-layout control, and freedom from any issues which
> impede meeting performance and latency targets. It doesn't generally
> require end-programmer assembly language.
>
> AFAIK, the CLR meets all of these requirements except the costs of its
> GC world-stop and GC tracing (see below). Many of them it meets only
> because of its excellent value-type system, including structs, struct
> arrays, and parametric instantiation. The JVM, V8, the Smalltalk VM, and
> others do not meet many of these requirements.

I find this very interesting. The objective you are setting makes sense,
but some of the particulars strike me as challenging -- notably the vector
computation goals.

I used to track the work on vectorizing optimization; I have no idea where
the state of the art stands today. The issue I would anticipate is that
different hardware architectures have wildly different models for SIMD and
vector fixed-point. It's sometimes hard to know what type and algorithm
support, if any, should exist in a high-level language that seeks to
target this type of computation. The debate about how to wrap a
hardware-neutral interface around GPU computation provides both an
illustration and a cautionary tale about the difficulties.

The alternative, of course, is to provide suitable intrinsics and not
*try* to write portable code. I see no problem with that, but dealing with
GPUs and vectorizing computations isn't my highest priority at the moment.

>>> Azul C4 may solve (d) but does not solve (3).
>>
>> In what way does the Azul C4 collector fail to solve (3)?
>
> The Azul C4 collector, like any tracing collector, can only meet a
> CPU-overhead and memory-overhead target for a specific set of programs.
>
> Using the rough goal of <25% CPU and memory overhead... There are quite
> trivial and common components which will not meet this overhead target.

My understanding is that this is incorrect: the Azul collector, if it is
able to use <25% of a CPU, is able to keep up with the mutation rate of
the mutator.

But I think this is pushing too far. Unfortunately, languages *do* have to
make choices about memory models, and no choice of memory model is going
to be perfect for all programs. I'm not really interested in whether the
Azul collector keeps up with a particularly challenging LRU
implementation. I'm concerned with whether that makes a real difference in
a real program.
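For concreteness, the layout contrast behind the LRU example is, I take
it, roughly the one sketched below. This is my own C++ illustration, not
David's benchmark; the names and the circular-list representation are
assumptions. The point is simply that when the nodes are flat value types
linked by integer indices into one contiguous allocation, there are no
per-node heap references left for a tracing collector to chase (the
managed-runtime analogue is an array of structs), whereas a node-per-object
design hands the collector a pointer graph proportional to the cache's
size.

    // Sketch only: an index-linked LRU over a flat array of value types.
    // Names, sizes, and the circular-list choice are assumptions of mine.
    #include <cstdint>
    #include <vector>

    struct Node {
        uint64_t key;
        uint64_t value;
        uint32_t prev;   // index into `nodes`, not a traced reference
        uint32_t next;
    };

    struct FlatLru {
        // Nodes form a circular doubly linked recency list inside one
        // contiguous allocation; `head` is the most recently used entry
        // and nodes[head].prev is the eviction (LRU) end.
        std::vector<Node> nodes;
        uint32_t head = 0;

        // Move node i to the front of the recency list.
        void touch(uint32_t i) {
            if (i == head) return;
            Node& n = nodes[i];
            nodes[n.prev].next = n.next;   // unlink i
            nodes[n.next].prev = n.prev;
            Node& h = nodes[head];
            n.prev = h.prev;               // splice i in ahead of head
            n.next = head;
            nodes[h.prev].next = i;
            h.prev = i;
            head = i;
        }
    };

Whether that difference actually shows up in a real program is, as I said,
the question I care about.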
> If managed-only is a goal, as you say it is for bitc, I *believe* that
> if we have a managed-only CLR which can choose between Azul C4 *and*
> threadsafe ARC per-type instantiation, it would meet all my requirements
> and be completely safe.

My problem with ARC is that I don't know how to type acyclic data
structures, and ARC isn't resource-safe (because it can exhaust memory)
unless we can type that.

> I *wonder* if supporting GC+ARC simultaneously in a single runtime is
> worth it, or if we are better off admitting subsections of unsafe code
> for now (like CLR unsafe).

Those are orthogonal questions. For systems programs, we need to admit
unsafe code in any case. But I really haven't heard a case articulated
(yet) that first-class regions can't handle.

> As for regions, as far as I can see, they do not help with the
> problematic memory-management case above, since the problem as described
> degenerates to per-region GC.

I've been going back and forth on whether I think this is true. There are
certainly use cases of regions for which I agree, but I'm wondering if we
can't distinguish those from the non-liveness-decreasing cases with a
suitable type system. Still pondering that.

> It seems like there may be some interesting possibility to reduce GC
> pressure through immutable regions.

Yes. Actually, they don't even need to be immutable. They only need to be
immutable at traced references.

> For all those reasons -- if I had to choose -- rather than regions I
> would prefer to see the CLR add NoEscape/borrowing/lifetime mechanisms
> to increase the efficiency of using value types, through value-type ref
> return and stack-scoped iterator blocks. I believe these features, plus
> a C4-level no-stop collector, plus sparing use of unsafe, would be an
> incredible systems programming environment for all but the most
> constrained systems. For example, I believe it would be sufficient for
> all smartphone/mobile system software except the kernel.

It would also be sufficient for the kernel, because a properly crafted
kernel for such platforms shouldn't be allocating at all after startup
initialization.

> In summary, I think C4-ish designs may be a fantastic tool to help us
> push managed environments closer to C parity for systems programming.
> However, I don't think C4 GC alone solves the problem, for the reasons I
> explain above.

I think you've articulated an interesting case, and you may possibly be
right. That said, there is a limit to how many problems I want to tackle
at once, and ARC definitely isn't on the list right now.

> I agree C4 is likely viable on Android, as ARM MMU capabilities seem
> close to x86 MMU parity (though I'm not familiar with the details).

If it *doesn't* work, the issue is going to be that the ARM memory model
is weakly consistent. I haven't put enough time into thinking about it to
feel like I have a handle on the issues.

shap
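P.S. To make the weak-consistency concern concrete: the code below is a
minimal C++11 sketch of release/acquire message passing, my own
illustration rather than anything taken from the C4 implementation. With
the annotations shown it is correct everywhere; demote them to relaxed and
a weakly ordered ARM core may observe the flag before the payload, while
x86's stronger model happens to hide the reordering. If C4's barriers bake
in any x86-only ordering assumptions, that is where I would expect an ARM
port to surface problems.

    // Sketch only: release/acquire message passing between two threads.
    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                    // plain, non-atomic data
    std::atomic<bool> ready{false};     // publication flag

    void producer() {
        payload = 42;                                   // write the data
        ready.store(true, std::memory_order_release);   // then publish it
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer's store becomes visible
        }
        // The acquire load synchronizes with the release store, so the
        // payload write is guaranteed to be visible here. With relaxed
        // ordering instead, ARM's weaker model allows ready == true to be
        // observed while payload still reads 0.
        assert(payload == 42);
    }

    int main() {
        std::thread t1(producer);
        std::thread t2(consumer);
        t1.join();
        t2.join();
        return 0;
    }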
