This sounds like something tcmalloc might help with. I believe you can load it at run time, without relinking, by setting LD_PRELOAD on the command line. It's part of gperftools: http://code.google.com/p/gperftools/
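For anyone who wants to try it, here is a rough sketch of preloading tcmalloc for a single run. The helper names (find_tcmalloc, run_with_tcmalloc) and the default library directories are my own assumptions, not anything shipped with blender or gperftools, so adjust the paths for your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the first libtcmalloc shared object found in
# the given directories. The default directories are guesses.
find_tcmalloc() {
    [ "$#" -gt 0 ] || set -- /usr/lib64 /usr/lib /usr/local/lib
    for dir in "$@"; do
        for lib in "$dir"/libtcmalloc.so* "$dir"/libtcmalloc_minimal.so*; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -f "$lib" ]; then
                printf '%s\n' "$lib"
                return 0
            fi
        done
    done
    return 1
}

# Hypothetical wrapper: run a command with tcmalloc preloaded, or fall back
# to the default allocator if no library was found.
run_with_tcmalloc() {
    if lib=$(find_tcmalloc); then
        LD_PRELOAD="$lib" "$@"
    else
        echo "tcmalloc not found, using the default allocator" >&2
        "$@"
    fi
}
```

Usage would be along the lines of `run_with_tcmalloc blender`. Because the preload only affects that one invocation, the allocator swap stays optional.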
Oh and a request to blender devs: if it works and it's something you want, please make it optional because it doesn't like to release memory back to the OS.

On Thu, Nov 22, 2012 at 4:46 AM, Jonas Wielicki <[email protected]> wrote:
> Hi all,
>
> First off, I did run long tests (i.e. baking) with blender 2.59 when I
> first experienced this issue, and I did a short check to verify it's
> still present in blender 2.64a. For full system specs see [3].
> [sidenote: I'm still using blender 2.59 because that's what my linux
> distribution (fedora 16) is shipping]
> I've been pointed to this mailing list after jumping into the devel irc
> to find out where to discuss this problem.
>
> Description of symptoms
> -----------------------
>
> I've been baking a simulation on a rather decent PC for a few hours
> now. As long as I keep memory use in the mid-range of the available
> physical memory, everything is fine. However, things start to go wrong
> when I go to the upper range, like blender using 80% or more of the
> physical memory (which should be fine, as the remainder isn't used
> much). In that case, other (memory using) applications often stall
> without any swapping involved.
>
> I've observed blender with htop (it's like top, just more awesome) and
> did some research on the involved kernel thread, khugepaged. When the
> stalls happen, blender, the other stalling application and khugepaged
> are using most of the CPU (with blender and the stalling application
> using 90% to 99% of each core and khugepaged totalling 8% or
> something). Now, using CPU isn't unusual for blender, but it's spending
> the time in kernelspace instead of userspace (100% of it), which is
> obviously not desired.
>
> khugepaged is related to a linux kernel feature called Transparent
> Hugepage Support, about which more information is available here [1].
> It seems to boil down to trying to keep the memory of applications that
> use lots of it as contiguous as possible.
>
> Apparently, this involves some memory compaction and moving around of
> pages, which I am able to observe using
>
>     watch "grep compact_ /proc/vmstat"
>
> Especially compact_fail and compact_pages_moved are increasing heavily
> (compared to their absolute value) (the values are explained in [1]).
>
> Suggested diagnosis
> -------------------
>
> In theory, compaction should be fine and after a few minutes everything
> should even out: the application doing heavy calculations involving
> lots of memory gets its contiguous pages and can crunch the numbers
> happily.
>
> However, things start to go wrong if the application alternately
> releases and allocates large blocks of memory (possibly only on a
> desktop system that is moderately used in the meantime; the first
> question is whether that's actually of interest for the blender
> project), especially if the time between the allocation and
> deallocation is a lot smaller than the time needed for the compaction
> to converge (which may be the case with a complex smoke simulation in
> blender 2.5). See the message [2] for some reference that this might be
> relevant.
>
> Indicators that the diagnosis may be correct
> --------------------------------------------
>
> Now, blender does exactly that. For each frame of the 256-division smoke
> sim with 2 subdivisions of high-resolution noise (and some 32k emitter
> particles involved), blender (de-)allocates the whole memory at the
> beginning/end of each frame. With hugepaging, this makes blender stall
> for some time during the allocation. Other applications trying to
> allocate larger blocks of memory (firefox, pdf viewers) are also pulled
> into the vortex and get stalled for some time, though often for a
> shorter time than blender.
>
> Observing /proc/$pid/stack of the blender threads points to the
> compaction routines in the kernel too (try_to_compact_pages is actually
> in the callstack).
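
Inline note: the vmstat observation quoted above could be turned into per-interval growth figures, which makes it easier to compare a run with and without hugepaging. This is only a sketch. The function names (compact_counters, compact_delta) are made up for illustration, and the exact set of compact_* counters varies between kernel versions.

```shell
#!/bin/sh
# Print the compact_* counters from a vmstat-style file
# (defaults to /proc/vmstat).
compact_counters() {
    grep '^compact_' "${1:-/proc/vmstat}"
}

# Print how much each compact_* counter grew over an interval
# (defaults to 5 seconds).
compact_delta() {
    interval="${1:-5}"
    file="${2:-/proc/vmstat}"
    before=$(compact_counters "$file")
    sleep "$interval"
    compact_counters "$file" | while read -r name after; do
        # Look up the earlier value of the same counter.
        prev=$(printf '%s\n' "$before" | awk -v n="$name" '$1 == n { print $2 }')
        echo "$name +$((after - ${prev:-0}))"
    done
}
```

Running `compact_delta 10` while a bake is in progress should show whether the failure and migration counters are still climbing.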
>
> The specific behaviour of stalling at the start of a frame is _not_
> observed when turning off transparent hugepage support (by running
>
>     echo never > /sys/kernel/mm/transparent_hugepage/enabled
>
> before starting blender), _but_ the system starts swapping, possibly
> because no contiguous memory is available for blender.
>
>
> Because this is, as far as I can tell, expected behaviour in the linux
> kernel (inferred from the discussion of the patch at [2]; the patch
> itself is afaik not related to the problem, but the discussion is
> enlightening about the purpose and the effects of hugepaging), I
> decided to go ahead and report this to blender, as it seems this could
> be fixed by changing the memory use behaviour of blender.
>
> I'm not sure what further information I can share with you. If you need
> any additional information snippets, please just ask back. I tried to
> limit myself to the description of the symptoms and a diagnosis
> inferred from what I've learnt about hugepaging in the last few days.
>
> best regards & looking forward to your replies,
> Jonas
>
> [1]: http://www.mjmwired.net/kernel/Documentation/vm/transhuge.txt
> [2]: http://article.gmane.org/gmane.linux.kernel.mm/70032
> [3]: System specification (possibly relevant parts):
>     blender: 2.59, 2.63a, 2.64a
>     linux: 3.6.6-1.fc16.x86_64
>     graphics (hopefully not relevant): nvidia proprietary 304.60
>     memory (for reference): 8 GB
> _______________________________________________
> Bf-committers mailing list
> [email protected]
> http://lists.blender.org/mailman/listinfo/bf-committers
