Hi, thanks for taking a look at this. If I understand the problem correctly, the slowdown is mostly due to simple nodes not caching at all: for a fanout > 1 on a socket, the node has to be calculated at least that many times. For simple nodes that is usually not a problem, but once you string many of them together with multiple fanouts in between, it very quickly adds up (see the sketch below). I would say breaking these execution groups into pieces is likely to be most successful at points where nodes have a large fanout; that should reduce the number of calculations significantly. The longer I think about it, the more I feel there is an interesting graph theory problem/algorithm hiding here that might help...
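To make the blow-up concrete, here is a minimal standalone sketch (illustrative only, not Blender code or its API) that counts how often the head of a chain of uncached simple nodes gets re-evaluated; with a fanout of f at each of n links, the count is f^n:

/* Illustrative sketch, not Blender code: without caching, every
 * consumer re-evaluates its input, so the head of a chain of `depth`
 * simple nodes with `fanout` consumers per link is computed
 * fanout^depth times. */
#include <cstdio>

static long evaluations(int depth, int fanout)
{
  long total = 1;
  for (int i = 0; i < depth; i++) {
    total *= fanout; /* each link multiplies the duplicated work */
  }
  return total;
}

int main()
{
  /* e.g. 10 chained simple nodes, each output read by 2 consumers */
  printf("%ld evaluations of the head node\n", evaluations(10, 2)); /* 1024 */
  return 0;
}

Inserting a buffered (complex) node halfway cuts the chain in two, so each half only multiplies its own fanouts; that is essentially why the blur-0 workaround mentioned below helps.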
(Maybe this is all very obvious to you already, in which case disregard
my input and do what you think is best ;)

Till then,
David

On Jan 29, 2013, at 12:08 PM, Jeroen Bakker wrote:

> This is a proposal to solve speed issues of the compositor. It should
> not be considered the final solution, but it should help with the most
> common issues.
>
> Problem statement:
> The compositor works best with a good mixture of simple and complex
> nodes. If you have a lot of simple nodes, the system is not able to
> find a good balance when converting the tree to execution groups (a
> subprogram that will be scheduled to a core of the CPU). The result is
> a few execution groups with many simple operations and a small number
> of buffers to store intermediate results. This slows down the system a
> lot
> [http://projects.blender.org/tracker/?func=detail&aid=33785&group_id=9&atid=498].
>
> A workaround for this slowdown was to add a complex node that doesn't
> do anything (like a blur of 0) to the setup.
>
> First tests show that a good result depends on the node tree setup and
> the available memory of the system. We propose to split execution
> groups into smaller ones if they get too big. The split will depend on
> two variables:
> 1. the amount of memory in the system (not free memory)
> 2. the number of operations in an execution group
>
> As this mechanism involves a lot of guesswork, the user should be able
> to control the number of cuts manually.
>
> During tests we saw the following results.
> Used file: the file attached to issue #33785
> Used system: Intel(R) Core(TM) i5 CPU M 580 @ 2.67GHz, 8GB of memory,
> Ubuntu 12.04 64-bit:
> - Baseline (no changes to code): 861MB, 47.49 seconds
> - Limit execution group size to 10: 3424MB, 7.267 seconds
> - Limit execution group size to 15: 3289MB, 7.607 seconds
> - Limit execution group size to 20: 2884MB, 9.393 seconds
> - Limit execution group size to 25: 2884MB, 11.987 seconds
>
> Best regards,
> Jeroen & Monique
> - At Mind -
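PS: for what it's worth, a rough sketch of how the proposed splitting rule could look. All names are hypothetical (this is not the actual patch): the cap on operations per group is derived from installed memory, clamped to the 10-25 range from the benchmark above, and a user override gives the manual control the proposal asks for.

/* Hypothetical sketch of the proposed split heuristic, not the
 * actual patch. */
#include <algorithm>
#include <cstddef>
#include <vector>

/* Derive the per-group operation cap from installed (not free)
 * memory, unless the user set a manual limit. The scale factor is a
 * pure guess; the benchmark suggests caps of 10-15 work best on an
 * 8GB machine. */
static std::size_t max_group_size(std::size_t system_memory_gb,
                                  std::size_t user_limit)
{
  if (user_limit > 0) {
    return user_limit; /* manual control, as the proposal requests */
  }
  return std::clamp<std::size_t>(system_memory_gb * 2, 10, 25);
}

/* Chop an oversized execution group (here just a list of operation
 * ids) into consecutive chunks of at most `limit` operations. */
static std::vector<std::vector<int>> split_group(const std::vector<int> &ops,
                                                 std::size_t limit)
{
  std::vector<std::vector<int>> groups;
  for (std::size_t i = 0; i < ops.size(); i += limit) {
    std::size_t end = std::min(i + limit, ops.size());
    groups.emplace_back(ops.begin() + i, ops.begin() + end);
  }
  return groups;
}

Chopping into consecutive chunks is of course the crudest possible strategy; picking the cut points at nodes with a large fanout, as suggested above, would be the smarter graph-based variant.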
