On Mon, Jul 29, 2013 at 3:22 PM, Jonathan S. Shapiro <[email protected]> wrote:

> I find this very interesting. The [performance] objective you are setting
> makes sense, but some of the particulars strike me as challenging - notably
> the vector computation goals.
>

I think you're taking my assertion just a bit too far. I simply mean that
approaching the envelope of hardware performance should not require
wholesale jettisoning of the systems runtime because of an Achilles' heel.
I don't mean that the language or runtime needs to directly support every
hardware capability itself.

stop-the-world GC is an Achilles' heel
V8's lack of support for concurrent shared data is an Achilles' heel
Rust's lack of concurrent data (I think because of the mechanics of owned
pointers) is an Achilles' heel (see the sketch below for the kind of shared
mutation I mean)

If you bang your head against these problems, there is no "solution" you
can manage within the design of the runtime. Your only choices are to fix
the runtime or to abandon it, and fixing the runtime may be a larger/harder
task than authoring the application in C/C++. That makes it a poor systems
runtime in my view.

Requirement #3 says I should be able to work within the runtime, compiler,
and existing distributed system libraries, while being able to
incrementally add my own improved components (using unsafe, JIT, inlining,
etc) to approach raw hardware performance.

To look at some counterexamples:

AFAIK, the C stack that the stackless/first-class-continuations folks throw
stones at doesn't feel like an Achilles' heel. It hasn't been shown that
abandoning it is necessary to approach hardware performance potential.
(It's more of a code-size and clarity issue.)

The low performance of today's .NET exception system is an issue, but it's
hard to call it an Achilles' heel. Libraries which overuse exceptions could
just be considered "slow" libraries, and we can author faster new ones
without throwing out the runtime and the other libraries.
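
(Translated into Java terms purely for illustration, the escape hatch looks
something like the hypothetical sketch below: a non-throwing parse routine
for the hot path, living alongside the throwing one, with no runtime changes
required.)

    import java.util.OptionalInt;

    final class FastParse {
        // A non-throwing parse for hot paths: malformed input yields an
        // empty OptionalInt instead of a NumberFormatException.
        static OptionalInt tryParseUnsignedInt(String s) {
            if (s == null || s.isEmpty() || s.length() > 9) {
                return OptionalInt.empty();  // length cap keeps us in int range
            }
            int value = 0;
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (c < '0' || c > '9') return OptionalInt.empty();
                value = value * 10 + (c - '0');
            }
            return OptionalInt.of(value);
        }
    }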

> My understanding is that this is incorrect. The Azul collector, if it is
> able to use x<25% of a CPU, is able to keep up with the mutation rate of
> the mutator.


In fact, the C4 technique is insensitive to mutation rate regardless of CPU
consumption, because the loaded-value barrier traps and heals references,
ensuring the collector does not need to revisit them. But what does keeping
up with the mutator have to do with memory overhead?

I can still cause 50% of memory to be reclaimable with a single pointer
write, much faster than the GC can find it. I can still allocate memory
and fill it with pointers at a rate proportional to the memory bandwidth
I'm consuming, not proportional to the live heap.

Assume the mutator runs full tilt at 75% of CPU/memory bandwidth, allocating
and linking blocks of pointers together, occasionally freeing large chunks.
I fail to see how the remaining 25% of CPU/memory bandwidth could possibly
scan the entirety of memory before the program exceeded 25% unreclaimed
memory overhead.
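
To make that workload concrete, here is roughly the microbenchmark I have in
mind, sketched in Java (class names, block sizes, and thresholds are all
mine, purely illustrative):

    // Allocate and link blocks of pointers as fast as we can, occasionally
    // dropping a large chunk of the live graph with a single write.
    class ChurnSketch {
        static final class Block {
            final Object[] refs = new Object[16];  // a block full of pointers
            Block next;
        }

        public static void main(String[] args) {
            Block head = new Block();
            Block cur = head;
            long allocated = 0;

            while (allocated < 4_000_000) {      // bounded so the sketch terminates
                Block b = new Block();
                cur.next = b;                    // link it into the live graph
                cur = b;
                allocated++;
                if (allocated % 500_000 == 0) {  // "occasionally freeing large chunks":
                    head.next = null;            // one write makes ~500k blocks garbage
                    cur = head;
                }
            }
            System.out.println("blocks allocated: " + allocated);
        }
    }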

> I'm not really interested in whether the Azul collector keeps up with a
> particularly challenging LRU implementation. I'm concerned with whether
> that makes a real difference in a real program.
>

LRU caches are real programs! However, I accept your implication that
specific one-off algorithms can be hand-coded in unsafe or manual RC if we
must. My systems rule #3 says that as long as we have *some* way of solving
the problem without tossing the rest of the runtime, we're golden, so this
is completely acceptable. We just have to avoid sliding down the slippery
slope of everything falling into unsafe!
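
For reference, even the textbook Java version (a hedged sketch, parameters
arbitrary) shows why this is exactly the access pattern that hurts
generational collectors: entries survive long enough to be tenured, then
become garbage in the old generation when they're evicted.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A classic LRU cache: entries live long enough to be promoted out of
    // the nursery, then die in the old generation when evicted.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true);  // true = access-order iteration
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;  // eviction => steady tenured garbage
        }
    }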

Thinking about this briefly, I can't think of an easy way to benchmark
tenured churn for a large C/C++ proof-case. We will probably not see much
useful data about this until the C4 collector is more widespread and the
new programs it enables bump against this barrier (or not).


> My problem with ARC is that I don't know how to type acyclic data
> structures, and ARC isn't resource-safe (because it can exhaust memory)
> unless we can type that.
>

Programs which allocate are not resource safe, because they can exhaust
memory.

In my view, the responsibility of a systems runtime is to make sure
(nearly) anything is *possible* (rule #3) through a concrete and
easy-to-understand set of mechanisms. It's not to make sure everything is
*perfect*.

That said, I have not thought enough about how a hybrid GC + ARC solution
actually solves real problems.


>> But I really haven't heard a case articulated (yet) that first-class
>> regions can't handle.
>>
>
You mean besides an LRU cache? :)

Anything long-running with tenured churn, especially with dependency graphs,
is not solved by regions. Here are a few examples: dynamic 3D tessellation
sculpting on large meshes, octree-based 3D virtual worlds, pub/sub routers
and messaging systems, the HTML DOM+JS environment, business analytics
computation engines, 2D/3D virtual world servers, and databases.

Many of these may be solvable with C4. My point is that I don't see how
regions help any of them, and they are the hard GC problems. What hard
problem do first-class regions fix? I don't see it.