On Sat, 01 Feb 2014 17:58:01 -0800, Manu <turkey...@gmail.com> wrote:

On 1 February 2014 22:04, JR <zor...@gmail.com> wrote:

On Saturday, 1 February 2014 at 05:36:44 UTC, Manu wrote:

I write realtime and memory-constrained software (console games), and for
me, I think the biggest issue that can never be solved is the
non-deterministic nature of the collect cycles, and the unknowable memory footprint of the application. You can't make any guarantees or predictions about the GC, which is fundamentally incompatible with realtime software.

(tried to manually fix ugly linebreaks here, so apologies if it turns out
even worse.)

(Maybe this would be better posted in D.learn; if so I'll crosspost.)

In your opinion, of how much value would deadlining be? As in, "okay
handyman, you may sweep the floor now BUT ONLY FOR 6 MILLISECONDS;
whatever's left after that you'll have to take care of next time, your
pride as a professional Sweeper be damned"?


This has been my only suggestion for years. Although 6ms is way too much
(almost half a frame), 500us is more realistic (a little over 3% of a
frame, still quite a lot of time).

It obviously doesn't address memory footprint, but you would get the
illusion of determinism in cases similar to where race-to-idle approaches
work. Inarguably, this wouldn't apply if the goal is to render as many
frames per second as possible, such as for non-console shooters where
tearing is not a concern but latency is very much so.


If it were running a small amount every frame, maybe the memory footprint
would stay within a predictable level... don't know. I'd be interested in
experimenting with this, but as far as I know, nobody knows how to do it.

I'm very much a layman in this field, but I'm trying to soak up as much
knowledge as possible, and most of it from threads like these. To my
uneducated eyes, an ARC collector does seem like the near-ideal solution --
assuming, as always, the code is written with the GC in mind. But am I
right in gathering that it solves two-thirds of the problem? You don't need
to scan the managed heap, but when memory is actually freed is still
non-deterministic and may incur pauses, though not necessarily a
program-wide stop. Aye?


I'm not sure what you mean. If you mean pauses because it's simply doing
work to free things, then that's deterministically issued workload just
like anything else and can (should) be budgeted. It doesn't disappear in a
GC context, it just happens in bursts at unpredictable times.
Like you say, it doesn't stop the world, no need for that. Additionally, it
leaves room for case-by-case flexibility in destruction approach. If
something takes some time to destroy, and you're okay to have it hang
around for a while, you can easily plonk it in a dead list waiting for some idle time to work through it; thus truly allows the opportunity to use idle time to clean stuff up since you don't need to scan the heap every time, a
list of things waiting to be destroyed is already known.
You could also add dead objects to a list which can be processed by another
thread (without stopping all threads).
There is far more opportunity to implement the destruction pattern that
suits your application, and for those not interested in implementing their
own destruction patterns, it would be easy to offer a library function
setDestructionPattern(patternType); which would configure a particular
policy application-wide, and they'd never have to think about it again.

At the same time, Lucarella's dconf slides were very, very attractive. I
gather that allocations themselves may become slower with a concurrent
collector, but collection times in essence become non-issues. Technically parallelism doesn't equate to free CPU time; but that it more or less *is*
assuming there is a cores/thread to spare. Wrong?


IIRC, it assumes a particular operating system, and required significant
resource overhead. I recall being impressed, but concluding that it wasn't
a solution that could be merged into D across the board.
I'd need to go back and revisit the slides...

Lastly, am I right in understanding precise collectors as identical to the
stop-the-world collector we currently have, but with a smarter allocation
scheme resulting in a smaller managed heap to scan? With the additional
bonus of less false pointers. If so, this would seem like a good
improvement to the current implementation, with the next increment in the
same direction being a generational gc.


I think the reduction in heap to scan is due to only following what are
known to be pointers, but I don't think that reduces the scan volume, the
memory footprint of your app is the same (more because there's new
metadata), but it wouldn't chase false pointers anymore.
I recall most agreed that it was a good improvement (although it
benchmarked something like 5% slower), I would appreciate it simply for the
improved precision... but it doesn't solve the problem in any way. The
practical application for me is that it wouldn't leak (much) memory.

I would *dearly* love to have concurrency in whatever we end up with,
though. For a multi-core personal computer threads are free lunches, or
close enough so. Concurrentgate and all that jazz.


The kicker for me with the whole GC thing, is that as long as I've been
talking about it around here (among a whole bunch of experts), I've been
saying that an acceptable GC would require to 1. not stop the world, 2.
support incremental collection/time-slicing, so I can budget it a maximum
amount of time per frame.
I'm yet to hear anyone suggest how they can even IMAGINE writing a
collector like that. As far as I can tell, the consensus is, that it's
impossible.
Certainly, nobody has ever given any suggestions how it could possibly be
done in the future, or steps in that direction, which says a lot to me.
I've abandoned the idea as unrealistic fantasy.

It is impossible with a conservative GC such as the current D implementation. Boehm tried with his C++ collector but could never get beyond the basic Mark-LazySweep because C++ enforces a conservative mindset.

When you have precision a number of options open up. Firstly, you can now parallelize all phases because pointers are precisely known. And second, you can start to do things like moving (a prereq for generational collection) because you don't have to pin things that you conservatively think are pointers. Now that you have a generational collector, it's trivial to do an incremental collection on one or more generations.

So therefore precision is the first step, and more to the point it's already been done in D, VisualD uses a precise version of the stock D GC and special build of the compiler with precise information emitted at compile time. That was what Rainer's talk was about at DConf last year. Getting this merged into the mainline will mean that there is finally enough information for us to go nuts building an actually useful GC in D, but until then you really can't progress beyond what we have right now.

It's quite possible, it's just a matter of getting the worked vetted at merged.

--
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator

Reply via email to