On Wed, Jan 14, 2015 at 08:07:37PM +0000, deadalnix via Digitalmars-d wrote:
> On Wednesday, 14 January 2015 at 18:01:22 UTC, H. S. Teoh via
> Digitalmars-d wrote:
> > Recently in one of my projects I found that I can gain a huge
> > performance improvement just by calling GC.disable() at the beginning
> > of the program and never calling GC.enable() again, but instead
> > manually calling GC.collect() at strategic points in the code.
> > Obviously, YMMV, but I managed to get a 40% performance improvement,
> > which is pretty big for such a relatively simple change.
>
> Interesting that you need to disable to get the effect. That means our
> heuristic for the GC collection to kick in sucks quite badly.
Well, I'm not sure what the real cause is, but here's what happened: I was working on optimizing performance, and gprof indicated that a lot of time was being spent in the GC collection cycle. That led me to a lot of needless GC allocations which, once eliminated, netted me a huge performance boost. However, a lot of time was still being spent in the GC collection cycle -- less than before, but still a big chunk of my running times. So as an experiment I turned off the GC completely to see what would happen, and found that running times improved by 40-50%, which is pretty huge!

Of course, that also meant I was leaking memory and the program was soaking up too much RAM, so my second thought was to still run GC collection cycles, but at a much reduced frequency.

This is specific to my program's memory usage patterns (an ever-increasing number of allocations that remain live until the end of the program, plus a comparatively much smaller number of temporary allocations that need to be cleaned up every now and then to keep total memory use under control); I'm not sure how generally applicable it is. In my particular case, one of the major factors in poor GC performance was the growing bulk of allocations that are known to remain live until the end of the program, which the GC must nevertheless scan on every collection cycle because it doesn't know they are going to stay live for a long time. Consequently, collection cycles become slower and slower as the program progresses, with most of the work being unnecessary, since that growing bulk of allocations isn't going away anytime soon.

This problem would be instantly solved by a generational GC: after a few cycles, most of the long-lived allocations would get promoted to the oldest generations, and the young-generation collection cycles would no longer be bogged down scanning them unnecessarily. I'm not holding my breath for D to get a generational GC, though.
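For concreteness, the pattern I'm describing looks roughly like this -- only GC.disable() / GC.collect() are the actual mechanism; the loop, the work it does, and the collection interval are invented for illustration and would need tuning to your own allocation patterns:

```d
// Sketch: disable automatic collections up front, then trigger
// collections manually at a much reduced, program-chosen frequency.
import core.memory : GC;

void main()
{
    GC.disable();   // no automatic collection cycles from here on

    foreach (i; 0 .. 1_000_000)
    {
        // ... allocate and process as usual; most allocations here
        // stay live until the end of the program ...

        // Collect only occasionally, at a point you choose, to keep
        // the smaller pile of temporary allocations under control.
        if (i > 0 && i % 100_000 == 0)
            GC.collect();
    }
}
```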
:-P

Alternatively, since I already know exactly which allocations are going to persist until the end, I could just use malloc instead. However, this is a bit annoying to implement, since these allocations come from a (very large) AA that I keep adding stuff to (nothing is ever removed). But since I'm already working on replacing this AA with something else with better cache-friendliness (and also disk-cacheability, to transcend current memory limitations), there's no point in trying to improve AA performance at this time.


T

-- 
Truth, Sir, is a cow which will give [skeptics] no more milk, and so they
are gone to milk the bull. -- Sam. Johnson