Re: [Bf-committers] Proposal: Blender OpenCL compositor
I guess he is talking about algorithms where the entire buffer needs to be evaluated before processing the pixels. Like knowing what the max luma value in your buffer is going to be, to divide all your pixels by that value (a very simple example, but just for the sake of argument), or something like the Retinex algorithm, which re-contrasts locally based on values from the entire buffer. If you only have tiles and part of the image at process time, that would be an issue.

But even with that I don't understand why it would be a problem. As I understand it, only the node itself works on tiles, but the I/O of each node is full buffers, right? I don't know how this works exactly, so I can understand his fear about it, but again, I'm pretty sure we are not the first compositor doing tile-based processing, right :)?

2011/1/22 Jeroen Bakker j.bak...@atmind.nl:
> On 01/21/2011 04:14 PM, Aurel W. wrote:
>> You are talking about things such as convolution with a defined
>> kernel size. There are other operations, and a compositor truly
>> transforms an image to another image, not pixels to pixels. If it's
>> implemented in such a naive way, the compositor will be very limited.
>> I got a very bad feeling about this. Ok, let's normalize an image
>> with a tile based approach,... uh damn it
>
> Aurel, don't worry about that. Tile-based means that the output is
> part of a tile, but the input data can be the whole image or a part of
> it. On the technical side there will be some issues to overcome
> (mostly device memory related). Btw, there are possibilities, when you
> need every image pixel as input, to use an intermediate to reduce the
> memory need. I did this already in the defocus node.
>
> Please help me to determine the cases where a whole output image is
> needed. IMO input is read-only and output is write-only. I don't see
> the need atm to support whole output images in a 'per output pixel'
> approach. And every 'per input pixel' approach can be rewritten as a
> 'per output pixel' approach. In the current nodes the two approaches
> are mixed.
> Jeroen

___
Bf-committers mailing list
Bf-committers@blender.org
http://lists.blender.org/mailman/listinfo/bf-committers

--
François Tarlier
www.francois-tarlier.com
www.linkedin.com/in/francoistarlier
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Jeroen,

> Please help me to determine the cases where a whole output image is
> needed. IMO input is read-only and output is write-only. I don't see
> the need atm to support whole output images in a 'per output pixel'
> approach. And every 'per input pixel' approach can be rewritten as a
> 'per output pixel' approach. In the current nodes the two approaches
> are mixed.

The problem with the concept of pixel-to-pixel operations is also that it tends to be implemented with a lot of overhead. Like having 3 frames on the call stack for adding two pixels, and this for every pixel in the buffer; it is really nasty. This is why even adding buffers together is rather inefficient at the moment. Another example would be the filter node, with these pixel_processors for convolution. If you really think about low-level efficiency, down to the level of single instructions, a lot could be done better at the moment.

I also realize that "it would work with the current compositor" is a strong argument. But I have some problems with that. First of all, I think a compositor should in principle be able to support all image processing operations. I think it's a rather bad idea to be stuck with a very limited architecture, which already requires a bunch of hacks to implement the functionality of current nodes such as those doing convolution.

Another problem I see with tiling is that you are doing spatial partitioning and are therefore stuck in the spatial domain. But there are a lot of possibilities in working in the gradient and frequency domains, including speedups. You won't be able to convert a tile to the gradient domain, because you can't determine the correct gradient on the borders. When you want to work in the frequency domain, you also run into issues with tiling, because of your spatial partitioning.

But back to the simple issue of operations which need full buffer access.
I agree that this could still be done with tiling, because you can simply compute all input tiles and just access those when computing one single output tile. So is this roughly how it should work? At least the diagram in your document looks like this.

Any other workarounds, like using overlapping tiles for the very special case of a 3x3 kernel convolution, are just hacks, and will prevent the implementation of future nodes which have other non pixel-to-pixel operations. One such future node could be tone mapping. This is a standard feature in Lux, for example, so I guess it's not that absurd to include such features in Blender's compositor. And some tone-mapping algorithms need to operate on the entire image.

In terms of memory usage, caching, etc., if we assume that only reasonably sized buffers are used, let's say up to 64MB, I also don't see strong benefits in using tiles rather than buffers which hold the entire image. But maybe you have to be more specific about the caching scheme you want to use here.

aurel
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On Sat, Jan 22, 2011 at 8:11 PM, Aurel W. aure...@gmail.com wrote:
> I also realize that "it would work with the current compositor" is a
> strong argument. But I have some problems with that. First of all, I
> think a compositor should in principle be able to support all image
> processing operations.

On the contrary, I think the compositor should be designed and optimised for its purpose: compositing CGI/vfx imagery. It doesn't need to be a completely generalised image processing system, it just needs to do what it's intended for, well. So far I've seen mostly theoretical objections here, but I think it's important to keep focused on enabling people to produce shots.

> But back to the simple issue of operations which need full buffer
> access. I agree that this could still be done with tiling, because you
> can simply compute all input tiles and just access those when
> computing one single output tile.

Or rather, the tiles that are necessary at any given time. In the case of the Normalize node for example (which is mostly useless for animated sequences, as are any tone mapping operators that work in a similar way), it would be possible to retrieve each tile one by one in a pre-process, read and store the statistical information, and then apply that per tile or even per pixel.

> In terms of memory usage, caching, etc., if we assume that only
> reasonably sized buffers are used, let's say up to 64MB, I also don't
> see strong benefits in using tiles rather than buffers which hold the
> entire image.

The benefits are lower memory usage, and better/easier parallelisation.
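The two-pass Normalize idea Matt describes can be sketched as follows. This is not Blender code; the tile iteration and image layout (a list of rows of floats) are hypothetical, chosen only to make the pre-process / apply split concrete.

```python
def iter_tiles(image, tile_size):
    """Yield tiles (lists of row slices) over a 2D list-of-lists image."""
    h, w = len(image), len(image[0])
    for y0 in range(0, h, tile_size):
        for x0 in range(0, w, tile_size):
            yield [row[x0:x0 + tile_size] for row in image[y0:y0 + tile_size]]

def normalize(image, tile_size=2):
    # Pass 1: pre-process every tile, keeping only a scalar statistic
    # (the peak value), so no full buffer has to stay resident.
    peak = max(max(max(row) for row in tile)
               for tile in iter_tiles(image, tile_size))
    if peak == 0.0:
        return image
    # Pass 2: the per-pixel work; a real implementation would write
    # output tiles, here it is applied to whole rows for brevity.
    return [[v / peak for v in row] for row in image]
```

Only the scalar statistic survives pass 1, which is what makes the approach compatible with loading tiles one by one.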
Re: [Bf-committers] Proposal: Blender OpenCL compositor
> On the contrary, I think the compositor should be designed and
> optimised for its purpose: compositing CGI/vfx imagery. It doesn't
> need to be a completely generalised image processing system, it just
> needs to do what it's intended for, well. So far I've seen mostly
> theoretical objections here, but I think it's important to keep
> focused on enabling people to produce shots.

Even if we just consider the existing nodes, there are lots of issues with tiles. To be future-proof, other nodes and operations have to be considered too. I mentioned some; they might not be the best examples, but they demonstrate issues with the design. Strong limits in this tile-based design are a con. Sorry if these seem to be just theoretical objections, but merely comparing a design against the existing nodes won't do.

> Or rather, the tiles that are necessary at any given time. In the case
> of the Normalize node for example (which is mostly useless for
> animated sequences, as are any tone mapping operators that work in a
> similar way), it would be possible to retrieve each tile one by one in
> a pre-process, read and store the statistical information, and then
> apply that per tile or even per pixel.

Again, these are just examples of operations on images,... and I guess tone mapping isn't such a bad one, especially if you consider compositing of single images, not animations.

> The benefits are lower memory usage, and better/easier
> parallelisation.

In practice, if you assume that your memory can hold multiple buffers anyway, I can't see significant improvements in memory usage. We also have to distinguish between two use cases here: one where the compositing graph is just executed once, and one where a user interactively adjusts settings and wants to keep intermediate results in memory. Again, there is no proposal for a caching scheme for the tile-based solution in the interactive case yet, and I can't think of anything that would have large benefits compared to working on full buffers and also caching those.
I also highly doubt that this will lead to better/easier parallelisation. I still think that more fine-grained parallelisation within each individual node, operating on the entire buffer, would turn out better in practice. At least I want to have a discussion on this. Assuming prematurely that tiles will give better performance is not a good idea.

aurel
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Jeroen:
> Like what are tiles in the perspective of a user/artist and what are
> tiles in the perspective of parallelization. Both definitions are
> right, but developers and users mix the definition and the meaning.
> Sorry for that.

That's what I thought :)

Aurel:
> Again, these are just examples of operations on images,... and I guess
> tone mapping isn't such a bad one, especially if you consider
> compositing of single images, not animations.

The goal here is to build a compositor and, as Matt says, it should do what it's supposed to. I know that until now the compositor has been designed to enhance renders, mostly stills (and IMO that's the reason it has some limitations today too), just as XSI had its FX Tree, or whatever. Thinking about using it for stills is like telling me you would use After Effects to do Photoshop work. It is possible, and yes, they could make one tool out of both, but there is a good reason they are two different pieces of software, even if in the end they do really similar things. Just to say, I believe the design should concentrate above all on large images in memory (4k and higher is coming in the future for sure) and on animation.

Some gradient-based algorithms and very fast blurs need the full buffer for sure, but I don't understand why some nodes cannot say "I need the full buffer, so I'll wait for all my parents to compute and take a full buffer as input", while other nodes (by default) are tile-based. Only a few would be slower than the others, but everybody would be happy, no? Actually I wonder if Nuke is not doing some kind of similar thing. Matt? The reason it makes me think of that is that in some Nuke scripts I have seen some nodes all updating together, and then one of them updating kind of separately, as if it was waiting for something. But again, maybe I don't really understand the issue here, my apologies :(

F

2011/1/22 Aurel W. aure...@gmail.com:
> On the contrary, I think the compositor should be designed and
> optimised for its purpose, compositing CGI/vfx imagery.
> [... the rest of Aurel's and Matt's exchange, quoted in full above ...]

--
François Tarlier
www.francois-tarlier.com
www.linkedin.com/in/francoistarlier
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi,

To be honest, long-winded discussions on ways to implement stuff should not take away the freedom for a developer to find out the optimal cases him/herself. I'm confident that Jeroen is aware of the boundary cases here, and he will try to find a good balance for practical usage. As long as we agree on the existing and future demands on compositing in Blender, we should give him our blessings :)

Relevant specs are for example:
- desired input methods, like storage, types, UI workflow, colorspaces, alpha, (plugins?)
- desired output specifications, like memory/cpu/gpu performance and visual feedback

-Ton-

Ton Roosendaal Blender Foundation t...@blender.org www.blender.org
Blender Institute Entrepotdok 57A 1018AD Amsterdam The Netherlands

On 22 Jan, 2011, at 13:28, Aurel W. wrote:
> Some gradient-based algorithms and very fast blurs need the full
> buffer for sure, but I don't understand why some nodes cannot say "I
> need the full buffer, so I'll wait for all my parents to compute and
> take a full buffer as input", while other nodes (by default) are
> tile-based. Only a few would be slower than the others, but everybody
> would be happy, no?

Yes, something like that would be necessary. I guess in practice it will be very hard to determine the required tiles, so maybe there will be only two cases: one where only one tile is needed, and one where simply all tiles are needed.

I am also worried about the memory layout of this. Single tiles would be computed into separate data structures, maybe just a single array each, and all tiles of an entire image are computed like this. The next node, which needs to operate on the entire image, now has to access individual pixels across all tiles. So you have two options: introduce some sort of abstraction to access these pixels, or copy all tiles into a single buffer which then gets processed. The first one adds a lot of overhead and cache unfriendliness. The second one also adds overhead and memory usage through copying.
Of course this would need testing and better analysis, but it could slow things down tremendously.

> The goal here is to build a compositor and, as Matt says, it should do
> what it's supposed to.

Well, of course,...

aurel
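The two options Aurel contrasts can be sketched like this. The class and method names are hypothetical, not from any proposal; option (a) is the per-read indirection, option (b) the up-front copy.

```python
class TiledBuffer:
    """An image stored as one flat array per tile, as if each tile had
    been computed separately."""

    def __init__(self, width, height, tile_size):
        self.width, self.height, self.tile_size = width, height, tile_size
        self.tiles_x = (width + tile_size - 1) // tile_size
        tiles_y = (height + tile_size - 1) // tile_size
        self.tiles = [[0.0] * (tile_size * tile_size)
                      for _ in range(self.tiles_x * tiles_y)]

    def get_pixel(self, x, y):
        # Option (a): resolve the owning tile on every read -- the
        # indirection overhead and cache unfriendliness Aurel means.
        ts = self.tile_size
        tile = self.tiles[(y // ts) * self.tiles_x + (x // ts)]
        return tile[(y % ts) * ts + (x % ts)]

    def to_flat(self):
        # Option (b): one copy up front into a single linear buffer,
        # trading extra memory and a copy pass for fast linear reads.
        return [self.get_pixel(x, y)
                for y in range(self.height) for x in range(self.width)]
```

Neither option is free, which is exactly the trade-off being debated.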
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi,

j.bak...@atmind.nl (2011-01-22 at 09:52.44 +0100):
> image. The highest/lowest value is calculated once (not parallelized);
> the pixel processor is parallelized.

Not a very good example ;] as this searching problem is nearly as parallelizable as the pixel processor would be. Split the work into N workers; each one gets total_pixels/N (or tiles, or whatever), looking for the local max and min. Then scan the N maxes to get the final max, and the same with the N mins. Even if you have a system with 1024 workers, that is only an extra non-parallel pass of 1024 checks (assuming you do not parallelize it again, having, for example, four workers doing 256 each and finally comparing four results).

So the question is whether you want to process in pixel stacks (where the final result for pixel X,Y is known before X+1,Y) or in buffers (work on one set of tiles and never look at them again unless something down the node tree changes). If you want the final full image, you will do the full work in both cases anyway. Exceptions aside, you probably want the buffer approach (with a tiled internal organization, that is fine), because that way the code cache gets lots and lots of hits, and the data cache probably too. The other way you are thrashing all the caches.

GSR
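GSR's split-then-reduce scheme in code, as a sketch only: split the pixels across N workers, each finds a local (min, max), then a short non-parallel pass reduces the N partial results. The worker mechanism here is Python threads, but any work-splitting would do.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_min_max(pixels, n_workers=4):
    """Find (min, max) of a flat pixel list with n_workers partial scans."""
    chunk = (len(pixels) + n_workers - 1) // n_workers
    parts = [pixels[i:i + chunk] for i in range(0, len(pixels), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda p: (min(p), max(p)), parts))
    # The non-parallel pass: one comparison per worker, not per pixel.
    return (min(lo for lo, _ in partial), max(hi for _, hi in partial))
```

The final reduction touches only N values, which is GSR's point about the search being nearly as parallelizable as the per-pixel work.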
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Vilem,

On 01/21/2011 01:12 AM, Vilem Novak wrote:
> I'd like to ask 2 more questions:
> Where did the idea of integrating GEGL as the library driving
> compositor processing go (originally one of the Durian targets)?

I don't know; perhaps one of the Durianers can elaborate on this. I myself see pros and cons in using this library inside Blender. It has already solved issues we are trying to tackle now, but looking at the requirements that our users have, I am not sure that the library will be that suitable (granularity of the nodes/operators).

> Will it be harder to develop nodes for the tile-based system than now?
> Will it still be possible to write non-tile-based nodes, or non-OpenCL
> nodes?

No, implementing a (tile-based) node will be different, but easier. The hard part will not be visible to the node developer. The developer is not aware that OpenCL exists or that it is tile-based. There will be a difference in that everything has to be written as pixel processors. So far I haven't seen a requirement for non-tile-based nodes. I have seen the requirement for non-OpenCL nodes (Py expressions, image loading and saving, displaying, etc.), so that is in scope.

Jeroen
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Another question I am concerned with: what do you mean by tiles in the context of the compositor? That a node just processes patches/tiles of an image, so the basic entity which is processed becomes a tile, or even a single pixel?

I hope it's commonly realized that a compositing node always has to process an entire image globally and output an entire image. The processing of each pixel depends on every other pixel in the entire image, not just on a tile or on the very same input pixel. It's really that simple: a node can be expressed by a function f(image) -> image, and not f(tile) -> tile or f(pixel) -> pixel. Please remember this when doing any design of the new system, otherwise things will be heavily screwed up.

aurel

On 21 January 2011 08:37, Jeroen Bakker j.bak...@atmind.nl wrote:
> On 01/21/2011 12:10 AM, Matt Ebb wrote:
>> I say this not to be negative, but because there is a lot of room for
>> functional improvement in blender's compositor, and if it is to be
>> re-coded, it should be done with an eye to workflow and future
>> abilities, not just from a purely techno-centric perspective.
>
> I don't see it as negative. I also don't think that I am (cap)able to
> implement all these functional wishes/changes on my own. They need to
> be thought through by users and developers together. I also don't
> think it is good to do all the work we discussed in a single project.
> There will be a separation of technology and functionality. First we
> should concentrate on implementing a kernel (stable ground) that is
> capable of supporting our 'future' wishes/changes/capabilities, and
> secondly we need to implement the more functional/workflow part. The
> first part needs to know the direction of the second part (vision).
> This vision should be clear upfront, but not in detail.
>
> Jeroen
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On 21 January 2011 15:34, Martin Poirier the...@yahoo.com wrote:
> Not all effects need access to all of the buffer. A lot of them only
> need access to a neighbourhood around each pixel, for which a system
> of slightly overlapping tiles fits the problem.

You are talking about things such as convolution with a defined kernel size. There are other operations, and a compositor truly transforms an image to another image, not pixels to pixels. If it's implemented in such a naive way, the compositor will be very limited. I got a very bad feeling about this. Ok, let's normalize an image with a tile-based approach,... uh damn it.

aurel
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Jeroen,

I'll comment on the tiling / OpenCL proposal itself in another mail later. I agree with Matt that it would be good to address a number of design issues first. Perhaps these could be implemented before work on tiling or OpenCL begins.

* Automatic data type conversion between nodes.
* Storing channels non-interleaved.
* Premul vs. key alpha. We should have a convention here and stick to it.
* Color management. I also think we should decide on a convention here.
* Store transformations along with buffers.
* Change all nodes to use a get_pixel function.

Options to shuffle channels or change color spaces can all be done outside of nodes, as part of the automatic data conversion already proposed. A get_pixel function would handle procedurals/transformations automatically. These things don't seem particularly hard to implement, but would be quite a bit of refactoring work. Further, most of the things in the VFX proposal seem like they would not have much effect on the internal workings; they are more about UI and different ways to get data in/out of the compositor.

Brecht.
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Jeroen,

Node compiler: I really dislike code going through a node compiler. OpenCL is basically C with some restrictions and a few extra qualifiers; let's take advantage of that, and for the CPU just compile as part of the regular compilation process. It's not clear to me why this node compiler is necessary. What I would propose is to just #include the kernel into the CMP_* files and call it directly from there. There are a few things to do to make that work, but it still seems simpler than having a makenodes step in between.

Automatic data conversion between nodes: what I'm not sure about is the different color data types (RGBA, HSVA, YUVA, ..). This would not be exposed to the user; all they would know is that it's a Color, right? Is it really necessary to have these as core data types; can't the nodes do such conversions themselves if they want to?

Kernel types: to me it seems perhaps better not to classify kernel types this way, but to classify buffer inputs as either random access or not. I'm not sure how you planned to do kernel grouping, but thinking of it this way also makes it possible to group two blurs that are then mixed together, for example.

Memory buffer states: it's not clear to me why these states are stored as part of the buffer itself; it seems to me some of them are more related to the node execution, not the memory manager.

Consistency between GPU and CPU: new CUDA GPUs can actually do identical floating-point ops, if you're careful. If you use optimizations like fast math, SSE, or fused multiply-add, this becomes harder. My guess is that the differences will nearly always be too small to be visible anyway, since colors don't need that many bits of precision. Another problem may be that some types of optimizations run well on the CPU but are harder on the GPU. Would it still be possible to have such CPU-optimized implementations, or would everything have to be done in kernels?

Brecht.
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On Fri, Jan 21, 2011 at 8:29 AM, Brecht Van Lommel brechtvanlom...@pandora.be wrote:
> Automatic data conversion between nodes: what I'm not sure about is
> the different color data types (RGBA, HSVA, YUVA, ..). This would not
> be exposed to the user; all they would know is that it's a Color,
> right? Is it really necessary to have these as core data types; can't
> the nodes do such conversions themselves if they want to?

Sadly, RGB to YCbCr or vice versa _would_ need to be exposed, as a result of the different color matrices: you would need to specify Rec. 601 versus Rec. 709. Xat can speak to this further. I'm certain there are cases that aren't immediately obvious that would require exposing the color model.
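To make the Rec. 601 vs. Rec. 709 point concrete: the RGB-to-luma weights differ between the two standards, so "it's a Color" alone does not determine the conversion. A minimal sketch (luma only; the full YCbCr matrices differ in the same way):

```python
# Standard luma coefficients from the two ITU-R recommendations.
REC601_LUMA = (0.299, 0.587, 0.114)
REC709_LUMA = (0.2126, 0.7152, 0.0722)

def luma(rgb, weights):
    """Weighted sum of R, G, B channels."""
    return sum(c * w for c, w in zip(rgb, weights))
```

The same pure-green pixel yields luma 0.587 under Rec. 601 but 0.7152 under Rec. 709, which is why the color model leaks through the API.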
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Brecht,

Great remarks, and I can assist on several of these too. The get_pixel one will be the toughest though... I know several nodes have been heavily optimized to use rows.

I still need to study Jeroen's proposal in detail, so I can't say much... but based on all the feedback he received, it's definitely good to try to limit the scope of the work as much as possible, or to define a step-by-step migration path.

Replies to some concerns here:

- The compositor should always run well on the CPU (multi-core) too. I'm convinced it would already benefit a lot from OpenCL's thread balancing. A bit of performance loss compared to a fully native pthread implementation (like 10-20%?) is acceptable, provided the GPU gains are very evident.

- My impression is that non-OpenCL usage will be mostly on render farms, and nearly every average-to-decent 3D workstation will have excellent GPU performance. Artist time is still far more valuable than computer time :)

- A tile-based subdivision schedule is there for two reasons: efficient memory use (valid for CPU and GPU alike) and a potentially efficient threading setup. The latter has to be carefully designed, to prevent bottlenecks on individual nodes that need full buffers (like DOF, Vector Blur).

-Ton-

On 21 Jan, 2011, at 16:55, Brecht Van Lommel wrote:
> [... Brecht's list of design issues, quoted in full above ...]
Re: [Bf-committers] Proposal: Blender OpenCL compositor
--- On Fri, 1/21/11, Ton Roosendaal t...@blender.org wrote:
> - A tile-based subdivision schedule is there for two reasons:
> efficient memory use (valid for CPU and GPU alike) and a potentially
> efficient threading setup. The latter has to be carefully designed, to
> prevent bottlenecks on individual nodes that need full buffers (like
> DOF, Vector Blur).

DOF and Blur you can take care of with overlapping source tiles, as long as you know the maximum fetch distance (the blur radius, basically). It takes a bit more memory, but it means you can parallelize them pretty much how you want (with diminishing returns, because the overlap zone size is constant).

Martin
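Martin's overlap rule in code: given an output tile and the maximum fetch distance (the blur radius), the source region is the tile grown by that distance on each side, clamped to the image bounds. A sketch with hypothetical names:

```python
def source_region(tile_x0, tile_y0, tile_w, tile_h, radius, img_w, img_h):
    """Input rectangle (x0, y0, x1, y1) needed to compute one output tile
    of a filter whose maximum fetch distance is `radius`."""
    x0 = max(0, tile_x0 - radius)
    y0 = max(0, tile_y0 - radius)
    x1 = min(img_w, tile_x0 + tile_w + radius)
    y1 = min(img_h, tile_y0 + tile_h + radius)
    return x0, y0, x1, y1
```

The overlap zone stays `radius` pixels wide whatever the tile size, which is where the "diminishing returns" for very small tiles come from: the apron becomes large relative to the tile itself.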
Re: [Bf-committers] Proposal: Blender OpenCL compositor
> DOF and Blur you can take care of with overlapping source tiles, as
> long as you know the maximum fetch distance (the blur radius,
> basically). It takes a bit more memory, but it means you can
> parallelize them pretty much how you want (with diminishing returns,
> because the overlap zone size is constant).

Hi, there are many nodes where this won't be easy, and which really need full buffer access. Even computing overlapping patches for the simple convolution case gets far too complicated and is really not flexible at all. Assume you have a filter node with a lot of iterations, so several convolutions taking place. The patch-based approach fails here, since you would also need to access the updated regions in the neighbouring tiles, which were computed by another patch. The only solution is to grow the overlapping areas depending on the number of iterations.

Now assume we have a convolution node like DOF, which does several iterations and has a long graph as its input. Essentially the patch size has to be changed each time you adjust a setting in the node, and therefore the entire sub-graph has to be evaluated again. Changing patch sizes? That doesn't make sense to me and really gets overcomplicated. Full buffer access is needed in this case, as I pointed out previously. There are also other operations which need access to the entire buffer to determine a single pixel. Again, I have a very bad feeling about this patch-based approach.

aurel
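Aurel's objection can be quantified: for k iterations of a filter with fetch radius r, a tile computed in isolation needs its source region grown by r per iteration, so the required apron is k * r pixels. This is my own arithmetic, not from the thread, but it follows directly from the argument:

```python
def overlap_needed(radius, iterations):
    """Apron width a tile needs to be computed in isolation when a
    radius-`radius` filter is applied `iterations` times."""
    return radius * iterations

# e.g. 50 iterations of a 3x3 filter (radius 1) need a 50-pixel apron,
# so a 64x64 tile must read a (64 + 2*50) = 164-pixel-wide source region.
```

For many iterations the source region of a "tile" approaches the whole image, which is why fixed overlaps stop being a workaround here.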
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi,

On Fri, Jan 21, 2011 at 10:31 PM, Aurel W. aure...@gmail.com wrote:
> Hi, there are many nodes where this won't be easy, and which really
> need full buffer access. Even computing overlapping patches for the
> simple convolution case gets far too complicated and is really not
> flexible at all. Assume you have a filter node with a lot of
> iterations, so several convolutions taking place. The patch-based
> approach fails here, since you would also need to access the updated
> regions in the neighbouring tiles, which were computed by another
> patch. The only solution is to grow the overlapping areas depending on
> the number of iterations.

Another solution is to execute kernels multiple times, and load/unload tiles each time. For each iteration you then only need the same region, not a larger one.

> Now assume we have a convolution node like DOF, which does several
> iterations and has a long graph as its input. Essentially the patch
> size has to be changed each time you adjust a setting in the node, and
> therefore the entire sub-graph has to be evaluated again. Changing
> patch sizes? That doesn't make sense to me and really gets
> overcomplicated. Full buffer access is needed in this case, as I
> pointed out previously. There are also other operations which need
> access to the entire buffer to determine a single pixel. Again, I have
> a very bad feeling about this patch-based approach.

I can't think of current nodes that would not work with such a tile-based approach, with some implementation tweaks. But I'm not sure what your point is, though: do you think there is a different, better way to handle buffers larger than memory, or do you think it's impossible?

Brecht.
___ Bf-committers mailing list Bf-committers@blender.org http://lists.blender.org/mailman/listinfo/bf-committers
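Brecht's overlapping-tile idea can be made concrete with a 1-D toy version (plain Python for illustration, not Blender code): each tile reads a halo of `r` extra input pixels on each side, where `r` is the maximum fetch distance, and the tiled result then matches a whole-image pass exactly. The function names are invented here.

```python
def box_filter(src, r):
    """Naive 1-D box blur with clamped borders; r is the maximum fetch distance."""
    n = len(src)
    return [sum(src[max(0, i - r):min(n, i + r + 1)])
            / (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

def box_filter_tiled(src, r, tile):
    """Same result, but each tile only sees its own pixels plus an r-pixel halo."""
    n, out = len(src), []
    for lo in range(0, n, tile):
        hi = min(n, lo + tile)
        pad_lo, pad_hi = max(0, lo - r), min(n, hi + r)  # halo = max fetch distance
        part = box_filter(src[pad_lo:pad_hi], r)
        out.extend(part[lo - pad_lo:hi - pad_lo])        # keep only this tile's outputs
    return out
```

The halo cost per tile is constant, which is the "diminishing return" Brecht mentions: smaller tiles mean proportionally more redundant halo work.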
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Another solution is to execute kernels multiple times, and load/unload tiles each time. For each iteration you only need the same region, not a larger region. Things start to get a little confusing now. I thought that the entire graph, or one output node, can be evaluated for a single tile. At least this is how I understand the proposed tile-based system should work. Am I wrong in this case? So what you are trying to say is that for one filter node, all tiles of the image have to be computed for each single iteration, where tiles have an overlapping area of the filter size. In the next iteration, the tiles are newly loaded, containing also the results from neighboring tiles from the last iteration? And this has to be implemented somehow in the filter node then? So in the end, the image is convolved iteration by iteration, and the next iteration can't start before all tiles have finished? I sort of meant this by "the full buffer is needed". I can't think of current nodes that would not work with such a tile-based approach, with some implementation tweaks. But I'm not sure what your point is though, do you think there is a different, better way to handle buffers larger than memory, or do you think it's impossible? I have no doubt that it would work; the question is how efficient it is, whether there are better solutions, and whether this is really necessary. So if I get this right, the only reason for tiling is to handle large buffers? Large as in larger than main memory, or video RAM in the case of OpenCL? I am not sure if this is really necessary, and to handle such large images other things would have to be adapted to support this as well, like the image viewer, EXR loading, the render buffer,... So are there really plans to support rendering/viewing/compositing of, say, 32k images in Blender from now on? I agree that tiling would be the only way to support processing of images larger than main memory. 
But I don't think that it will give better performance and I also think that it introduces a lot of unnecessary complexity. aurel
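The scheme Brecht describes and Aurel paraphrases above - run the kernel once per iteration with a fixed halo, reloading tiles between iterations - can be sketched in the same 1-D toy form (plain Python, not Blender code; the names are invented). The halo stays at the filter radius `r`, but after every iteration all tiles are written back to a shared buffer, so the next iteration sees the neighbours' updated pixels. This matches Aurel's reading: the next iteration can't start before all tiles of the previous one have finished.

```python
def box_filter(src, r):
    """Naive 1-D box blur with clamped borders."""
    n = len(src)
    return [sum(src[max(0, i - r):min(n, i + r + 1)])
            / (min(n, i + r + 1) - max(0, i - r)) for i in range(n)]

def iterate_whole(src, r, k):
    """k filter iterations over the whole buffer (the full-buffer reference)."""
    for _ in range(k):
        src = box_filter(src, r)
    return src

def iterate_tiled_synced(src, r, k, tile):
    """k iterations with a constant r-pixel halo and one sync point per iteration."""
    n = len(src)
    for _ in range(k):
        out = [0.0] * n
        for lo in range(0, n, tile):
            hi = min(n, lo + tile)
            pad_lo, pad_hi = max(0, lo - r), min(n, hi + r)
            part = box_filter(src[pad_lo:pad_hi], r)
            out[lo:hi] = part[lo - pad_lo:hi - pad_lo]
        src = out  # barrier: all tiles finish before the next iteration starts
    return src
```

The alternative - no per-iteration sync - would need the halo to grow to `k * r`, which is the "grow the overlapping areas depending on the number of iterations" option from the thread.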
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On 01/21/2011 04:14 PM, Aurel W. wrote: You are talking about things such as convolution with a defined kernel size. There are other operations, and a compositor truly transforms an image to another image, not pixels to pixels etc. If it's implemented in such a naive way, the compositor will be very limited. I've got a very bad feeling about this. Ok, let's normalize an image with a tile-based approach,... uh damn it. Aurel, don't worry about that. Tile-based means that the output is part of a tile. But the input data can be the whole image or a part of it. On the technical side there will be some issues to overcome (mostly device-memory related). Btw, there are possibilities, when you need every image pixel as input, to use an intermediate to reduce memory needs. I did this already in the defocus node. Please help me to determine the cases where a whole output image is needed. IMO input is read-only and output is write-only. I don't see the need atm to support whole output images in a 'per output pixel' approach. And every 'per input pixel' approach can be rewritten as a 'per output pixel' approach. In the current nodes the two approaches are mixed. Jeroen
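Jeroen's "intermediate" trick can be illustrated with Aurel's normalization example: a first per-tile pass reduces each tile to its maximum, the maxima are combined into a single small intermediate value, and a second per-output-pixel pass divides by it. A minimal 1-D Python sketch, illustrative only and not the proposed API:

```python
def normalize_tiled(src, tile):
    """Normalize to [0, 1] without any kernel ever needing the whole image."""
    n = len(src)
    tiles = [(lo, min(n, lo + tile)) for lo in range(0, n, tile)]
    # pass 1: per-tile reduction; combining the tile maxima is the intermediate
    peak = max(max(src[lo:hi]) for lo, hi in tiles)
    # pass 2: pure 'per output pixel' work, one tile at a time
    out = []
    for lo, hi in tiles:
        out.extend(v / peak for v in src[lo:hi])
    return out
```

The intermediate here is one float per tile, so the full-resolution buffer never has to be resident at once.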
Re: [Bf-committers] Proposal: Blender OpenCL compositor
There are a couple of things I'd like to note, especially those not directly related to OpenCL vs. CPU code (most arguments have been voiced already): * On the question whether horizontal layout (Blender nodes, Softimage), vertical layout (Houdini, Aviary) or completely customized layout (Nuke) is preferable: I'd like to point out that it would probably be difficult to use socket names and default input values for sockets with anything other than horizontal nodes. Most software packages that use a different layout approach seem to have just one single type of socket data, depending on the type of tree. For compositing systems this is simply the image buffer you want to manipulate; for more complex systems (such as Houdini) a socket connection can mean a parent-child object relation, or vertex or particle data, etc., depending on the type of tree. * While the restriction to one single data type in a tree allows very clean layout and easily understandable data flow in trees, it also means that there needs to be a different way of controlling node parameters, which usually means scripted expressions. Currently many nodes in Blender have sockets that simply allow you to use variable parameters, calculated from input data with math nodes or other nodes' results. Afaik the equivalent to expressions in Blender would be the driver system, but making this into a feature that is generic enough to replace node-based inputs is probably a lot more work than only a compositor recode (correct me if I'm wrong). * Having a general system for referencing scene data could be extremely useful, especially for the types of trees in the domain I am working in: particle sims (and mesh modifiers lately). In compositor nodes the only real data that must occasionally be referenced is the camera (maybe later on curves can be useful for masking? just a rough idea). For simulation nodes, having access to objects, textures, lamps, etc. is even more crucial. 
We discussed already that such references/pointers would have to be constants, which means that their concrete value is already defined during tree construction and not only when executing. This makes it possible to read the data at the beginning of execution and convert it to an OpenCL-readable format. It will also allow keeping track of data dependencies (not much of an issue in the compositor, but again very important for simulations). Note that there are already some places where data is linked in a tree (e.g. material and texture nodes), but these are not implemented as sockets and so don't allow efficient reuse of their input values by linking. * I would love to see the memory manager you are planning for tiled compositing be abstracted just a little more, so that it can be used for data other than image buffers too. In simulations of millions of particles the buffers could easily reach sizes comparable to those in compositing, so it would be a good idea to split them into parts and process these individually where possible. In images the pixels all have fixed locations and you can easily define neighboring tiles to do convolutions. This kind of calculation is usually not present in arbitrary or unconnected data, such as particles or mesh vertices, so an element/tile/part will either depend on just one of the input parts or on all of them. But still, having a generic manager for loading parts into memory could avoid some double work. Cheers, Lukas
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi Lukas, Spaghetti vs Expressions :) : I agree with your conclusion. I really see this as there being too few control parameters on a node, and limited node implementations. Also, currently we have different granularities of nodes. We have very functional nodes, very mathematical nodes, and data (combine, split) nodes. I would make these mathematical nodes part of the functional nodes. The data nodes are perhaps not needed anymore when you have a single datatype and color modes in the node itself. Currently the defocus node is 2D, but it is only useful in 3D. Therefore compositors will create complex systems with z-clips and render layers to first split the image into layers, defocus every layer on its own and combine these layers. The same effect you see with vector blur and 2 objects moving in opposite directions. As the depth is not used during the calculation, you need to split, calculate and combine. Some generic way to reference scene data: Yes, I will redo that part of the proposal. But at the moment I don't have the solution for every case. In the compositor, the need for the data should be part of the kernel that will use the data. But as the compositor only has limited scene references, I don't know the ideal solution for this. Currently it will support camera data, render data (current frame) and compositor settings (default color mode?). More abstract memory manager: I agree! I wouldn't implement this fixed to the compositor situation. I was thinking about something like: - alloc(deviceId, len(Struct), width, height) for 2D images - alloc(deviceId, len(Struct), size) for 1D arrays The compositor also uses the array allocation for un/n-ary based kernel groups. Jeroen. On 01/20/2011 09:48 AM, Lukas Tönne wrote: There are a couple of things i'd like to note [...]
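Jeroen's two alloc() call shapes could look something like the sketch below. Everything beyond the two signatures he names - the class, the handle scheme, the bytearray standing in for a device buffer - is invented here purely for illustration.

```python
class TileMemoryManager:
    """Toy allocator mirroring alloc(deviceId, len(Struct), width, height)
    and alloc(deviceId, len(Struct), size) from the message above."""

    def __init__(self):
        self._buffers = {}      # handle -> (device_id, backing storage)
        self._next_handle = 0

    def alloc_2d(self, device_id, elem_size, width, height):
        """2-D image allocation: elem_size bytes per pixel."""
        return self._alloc(device_id, elem_size * width * height)

    def alloc_1d(self, device_id, elem_size, size):
        """1-D array allocation, e.g. for un/n-ary kernel groups."""
        return self._alloc(device_id, elem_size * size)

    def free(self, handle):
        del self._buffers[handle]

    def _alloc(self, device_id, nbytes):
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = (device_id, bytearray(nbytes))  # stand-in for a device buffer
        return handle
```

For example, a 64x32 RGBA-float image on device 0 would be `mm.alloc_2d(0, 16, 64, 32)` - 16 bytes per pixel for four floats.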
Re: [Bf-committers] Proposal: Blender OpenCL compositor
btw, you confirm that Expression PY and Python Pixelizer are two different things, right? By Expression PY, I meant being able to type some Python for each parameter. (I don't have any UI design for that, but it could be something like: right-click on a param, click on Set Expression, a textfield popup to type the expression, click OK, and the param turns red to show it is controlled by an expression), while a Python pixelizer, as I understand it, is more like a pixel-processing node (like the Expression node in Nuke) which could be coded via Py (which can call OCL, GLSL, C, ... functions)? And I understand this is not a high-performance processing approach, but on the production artist side... not everyone has the time or capability to create a new node/filter in C and recompile all of Blender, while sometimes a simple equation could just be typed to get the result. I think that for a list of missing nodes, or nodes to get rid of, Sebastian and Pablo should join the talk :) Thx François, 2011/1/19 Jeroen Bakker j.bak...@atmind.nl Hi Francois! well... my answer is still in a very early draft :). Today I took the time to dive into your posting in detail. I missed some parts when I read it first. Expression PY and a Python pixelizer are doable. I just need to include this in the proposal. I really see the value of having this. In my perception this belongs on all settings of a node (in Nuke I thought it was in a different node; perhaps we can borrow this idea from them). Also a Python-based pixelizer can be done, but it can have some limitations in performance. But that is for the artist to decide. On OpenFX I still haven't looked into the details. Currently I think that the bottleneck of implementing this is more on the Blender UI side. As OpenFX has plugin capabilities, the UI should be capable of handling flexible node settings etc. based on your installed plugins. And also the reading and writing to a blend file is not that flexible (yet). 
I will spend some time next month on this subject. Another issue is that it is Windows-only; they state that a port to Linux is planned. Also, what I personally miss is to really be able to tweak the internals of a node. In October I, with the knowledge of Ton, reverse-engineered the defocus node and came to the conclusion that it was implemented differently than we initially thought. Also, reading how many 'feature requests' on this node are placed in the bug-tracker, and given the complexity of the node, there should be more options on the node to fine-tune its usage. This way nodes can become more generally usable. The main settings could be altered on the node, but the detailed settings can be altered in a panel (the N-key in the compositor). There is more room there, and it will be more dynamic as it is Python-based. A different thing I want to change in the proposal is the current connector types. Currently they are buffers of 1, 2, 3 or 4 float values, representing Value, Vector and Color. The node system is flexible, but simple tasks can become a spaghetti of lines. I think we should put all data of a single pixel in a single type. Connecting a rendered layer to the vector-blur node, for example, will then be one line, containing all needed data. Tweaking of the vectors can be done by settings, or by an expression node. This way we can reduce the number of links and make the node system easier to use. Perhaps we will introduce some limitations, but they can be tweaked. The node system will be cleaner (in functionality and usage). The color model should also be included in this data type. My question back is: in this kind of situation, what nodes do we expect, and what nodes shall not be used anymore? I really like this discussion. It will take the proposal to the next level! Jeroen. On 01/19/2011 12:30 AM, François T. 
wrote: thx for answering to my blog post via your proposal, to answer some of your questions there [...]
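Jeroen's single-datatype idea - one link carrying colour, depth and motion vectors together - could be modelled as below. The class and its field names are hypothetical, invented here only to illustrate how one connection could replace today's separate Color/Vector/Value links.

```python
from dataclasses import dataclass

@dataclass
class PixelPacket:
    """One per-pixel value flowing over a single node link (hypothetical type)."""
    rgba: tuple = (0.0, 0.0, 0.0, 1.0)      # colour channels
    z: float = 1e10                         # depth, e.g. for defocus / z-combine
    speed: tuple = (0.0, 0.0, 0.0, 0.0)     # motion vectors, e.g. for vector blur

def scale_speed(p, factor):
    """A node tweaking only the vectors, leaving the rest of the packet intact."""
    return PixelPacket(rgba=p.rgba, z=p.z,
                       speed=tuple(s * factor for s in p.speed))
```

A render-layer output would then feed vector blur over one wire, and an expression node could rewrite just one field of the packet.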
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On Fri, Jan 21, 2011 at 4:49 AM, Jeroen Bakker j.bak...@atmind.nl wrote: Hi Lukas, Spaghetti vs Expressions :) : I agree with your conclusion. I really see this as that there are too few control parameters on a node and limited node implementations It doesn't have to be a matter of spaghetti vs expressions. While Houdini uses a lot of expressions to manage multiple types of data flowing through one wire, it's not the only way. Many modern node-based compositors handle several image planes per wire - in Fusion there is a generic 'channel booleans' node to switch them around, in Nuke there are re-ordering nodes, and even Shake had a simple text field where you could specify what RGBA channels and ordering would be processed and output from the input (not an expression). It would be very easy, with the right internal design, to have consistent options for each node to choose what input channels it will work on and what it will output. I mention this in some of my replies to François's blog. It seems to me that this discussion is veering towards issues that impact workflow design, not just speed optimisation. I personally think this is a good thing - there are several things that can and should be modernised inside the compositor that would require re-coding, and the question should always be 'how can this enable users to produce work faster', rather than 'how do we integrate library/technology X'. This also comes with a different set of requirements though: if you're talking about changes to workflow, it requires more research into this aspect of user interaction, e.g. understanding how other similar applications work and what can be learned from them, looking at how professional compositors do things on a daily basis, etc., not just coming up with ideas in isolation. 
I say this not to be negative, but because there is a lot of room for functional improvement in blender's compositor, and if it is to be re-coded, it should be done with an eye to workflow and future abilities, not just from a purely techno-centric perspective. cheers Matt
Re: [Bf-committers] Proposal: Blender OpenCL compositor
I'd like to ask 2 more questions: Where did the idea of integrating GEGL as the library driving compositor processing go (originally one of the Durian targets)? Btw, GEGL just released a new version, 0.1.4. Will it be harder to develop nodes for the tile-based system than now? Will it still be possible to write non-tile-based nodes, or non-OpenCL nodes? Thanks Vilem
Re: [Bf-committers] Proposal: Blender OpenCL compositor
thx for answering to my blog post via your proposal, to answer some of your questions there: *expression py* - only because it is User/Artist oriented. While Python is great for doing this kind of stuff and pretty popular with most people, I'm not so sure about the OpenCL language. By the way, this is not a way to make new nodes; it is just a node which can control some parameters or data in your comp. Look at what is done with Expressions in AE: http://www.videocopilot.net/basic/tutorials/09.Expressions/ I don't think it needs OCL power to do this kind of thing. Probably more for the Nuke kind of Expression node, because it can do some pixel processing, but then it is just a wrapper? Maybe from a programmer's standpoint it needs to be OpenCL or whatever... maybe not for the front-end user. IMO this needs to be consistent with the rest of the scripting language in Blender. Again, production tool :) *custom passes* are not masks, they are just render passes (normal, P pass, vector pass...), but more on the 3D render side rather than the compositor. *masks* if you refer to the addon RotoBezier, then yes, it is still to be done IMO. This should be a native tool with all the features that come with it. Probably a new node anyway. As I said, RotoBezier is a great workaround in the meantime, but not a production tool at all. *openFX* please pretty please :D F 2011/1/16 Erwin Coumans erwin.coum...@gmail.com Bullet uses its own MiniCL fallback [...] -- François Tarlier www.francois-tarlier.com www.linkedin.com/in/francoistarlier
Re: [Bf-committers] Proposal: Blender OpenCL compositor
I would just like to chime in on this proposal with my personal experience developing in OpenCL for use in the Blender Game Engine. As has been pointed out, not everything can be sped up with OpenCL, and because it supports multiple device architectures, code optimized for the GPU won't run fast on the CPU. Then there is the question of users having the hardware to even run it, necessitating a CPU-only fallback. With all these factors one might ask: is it worth it? I personally think it is very well worth it, especially if it is viewed as an optional accelerator rather than wholesale algorithm replacement. The speed benefits for the highly parallelizable problems already mentioned, such as compositing/filters as well as physics such as particle systems (plug: http://enja.org/2010/12/16/particles-in-bge-fluids-in-real-time-with-opencl/ ), are very convincing. There is a lot of research going into GPU computing for CG applications, and NVIDIA is pushing CUDA hard. While Blender won't adopt a proprietary solution such as CUDA, many of the algorithms and techniques developed for it can be translated to OpenCL. I'm excited about this proposal not because I want faster compositing, but because it sets up a framework for dealing with OpenCL in a sane way inside Blender. I'm currently developing my library standalone and linking it to Blender, using my own OpenCL wrappers around the Khronos ones. As I learn more about the Blender codebase, as well as look to Bullet, I am dismayed by my own code's fragility. Sure, it runs fast on the machines I've tested, but I do not trust it to be in a consumer-facing application for a while. As a student and a researcher I'm compelled to spend most of my time developing the algorithm, and as much as I'd like to integrate my code cleanly, it will be a while before that can happen. 
This proposal would give me as a developer a better platform for contributing directly to Blender, as well as a central location for me to put any effort into standardizing an OpenCL interface based on my experience with it. Furthermore, as other developers start to accelerate their code, we will need a solid way of managing device resources and avoiding redundant or competing memory transfers. With the new architectures coming out, the prevalence of capable GPUs and the increasingly sophisticated algorithms available, I think OpenCL is going to be essential. I'd like to throw what little weight I have behind this proposal, along with my 2 cents :) Ian Hi all, The last few months I have worked hard on the proposal for the OpenCL-based compositor. The proposal is now at the point where it is clear how the solution should work and what the impact is. As the proposal is on the technical level, the end user won't feel a difference, except for a fast tile-based compositor system. In functionality it should be the same. There are 2 aspects that will be solved: * Tile-based compositing * OpenCL compositing To implement these I will introduce additional components: * Tile-based memory manager * Node (pre-)compiler * Configurable automatic data conversion for compositor node systems * OpenCL driver manager * OpenCL configuration screen * Some debug information: * OpenCL program, performance etc. * Execution tree (including data types, resolution and kernel grouping) * Visualizing tiles needed for calculation of an area. And introduce several new data types: * Kernels and KernelGroup * Camera data type * Various color data types I have put all the documents on a project website for review, as the proposal is quite long and complex (all decisions are connected with each other). Please use bf-committers or #blendercoders to discuss the proposal, also if something is not clear. 
http://ocl.atmind.nl/doku.php?id=design:proposal:compositor-redesign Cheers, Jeroen Bakker -- Ian Johnson http://enja.org
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On 01/15/2011 03:55 PM, (Ry)akiotakis (An)tonis wrote: On 15 January 2011 09:19, Matt Ebb m...@mke3.net wrote: While I can believe that there will be dedicated online farms set up for this sort of thing, I was more referring to farms in animation studios, most of which are not designed around GPU power - now, and probably not for a while in the future. Even imagining that in the future Blender uses OpenCL heavily, if a studio has not designed a farm specifically for Blender (which is quite rare), CPU performance will continue to be very important. I'm curious how OpenCL translates to CPU multiprocessing performance, especially in comparison with using something like Blender's existing pthread wrapper. cheers, Matt I have to disagree on that. Almost every 'serious' user today has an OpenCL-capable GPU, and they can benefit from an OpenCL implementation. Besides, OpenCL allows for utilization of both CPU and GPU at the same time. It's not as if it sets a restriction on CPUs. In my understanding the issue is that internal renderfarms have no 'OpenCL'-capable GPUs (yet). It is not an issue on the user side. Like during Durian: we had workstations with medium GPUs and a CPU-only renderfarm. The question is how would a CPU-based renderfarm benefit from OpenCL? Users on the other hand have different issues. Our user population also has non-OpenCL-capable hardware/OSes; therefore we still need a full CPU-based fallback, or the Bullet solution of implementing our own OpenCL driver. The Bullet solution is complicated in our situation as it needs a lot of external references (compilers, linkers, loaders etc.) Jeroen
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Bullet uses its own MiniCL fallback, it requires no external references, the main issue is that it is not a full OpenCL implementation (no barriers yet etc). We developed MiniCL primarily for debugging and secondary to run the Bullet OpenCL kernels on platforms that lack an OpenCL implementation. The Intel and AMD OpenCL drivers for CPU perform similar to regular multi threaded code (pthreads, openpm etc) but it is more suitable for data parallel problems and not for complex code with many branches. So while you can port a compositor or cloth simulation to OpenCL, most general purpose code requires large refactoring and simplification causing reduced quality, so don't expect miracles. Still, it will be fun to see compositing, physics simulation etc in Blender being accelerated through OpenCL, optionally. Thanks, Erwin On Jan 16, 2011, at 5:34 AM, Jeroen Bakker j.bak...@atmind.nl wrote: On 01/15/2011 03:55 PM, (Ry)akiotakis (An)tonis wrote: On 15 January 2011 09:19, Matt Ebbm...@mke3.net wrote: While I can believe that there will be dedicated online farms set up for this sort of thing I was more referring to farms in animation studios, most of which are not designed around GPU power - now, and nor probably for a while in the future. Even imagining if in the future blender uses openCL heavily, if a studio has not designed a farm specifically for blender (which is quite rare), CPU performance will continue to be very important. I'm curious how openCL translates to CPU multiprocessing performance, especially in comparison with using something like blender's existing pthread wrapper. cheers, Matt ___ Bf-committers mailing list Bf-committers@blender.org http://lists.blender.org/mailman/listinfo/bf-committers I have to disagree on that. Almost every 'serious' user today has an OpenCL capable GPU and they can benefit from an OpenCL implementation. Besides OpenCL allows for utilization of both CPU and GPU at the same time. It's not as if it sets a restriction on CPUs. 
In my understanding the issue is that internal render farms have no OpenCL-capable GPUs (yet). It is not an issue on the user side. During Durian, for example, we had workstations with medium GPUs and only a CPU-based render farm. The question is: how would a CPU-based render farm benefit from OpenCL? Users, on the other hand, have different issues. Our user population also includes non-OpenCL-capable hardware and operating systems, so we still need a full CPU-based fallback, or the Bullet solution of implementing our own OpenCL driver. The Bullet solution is complicated in our situation, as it needs a lot of external references (compilers, linkers, loaders, etc.).

Jeroen
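Erwin's distinction between data-parallel problems and branchy general-purpose code is the crux of how well OpenCL-on-CPU can perform. A toy illustration in plain Python (not actual OpenCL; the function names are made up): a per-work-item kernel is a pure function of its own input pixel, so a driver can split the map across CPU cores or GPU compute units without any synchronization.

```python
# Illustrative sketch only (plain Python, not real OpenCL).
# A data-parallel "kernel": each work item reads one input pixel and
# writes one output pixel, with no dependency on other work items.

def brightness_kernel(pixel, gain):
    # One work item: a pure function of a single (r, g, b) pixel.
    r, g, b = pixel
    return (min(r * gain, 1.0), min(g * gain, 1.0), min(b * gain, 1.0))

def run_kernel(image, gain):
    # The "global work size" is simply the pixel count; an OpenCL
    # driver would distribute this map over cores or SIMD lanes.
    return [brightness_kernel(p, gain) for p in image]
```

Branchy code with inter-iteration dependencies has no such decomposition, which is why Erwin warns that general-purpose code needs heavy refactoring before it benefits.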
Re: [Bf-committers] Proposal: Blender OpenCL compositor
On 01/15/2011 08:19 AM, Matt Ebb wrote: Thanks, Jeroen. [snip] I'm curious how OpenCL translates to CPU multiprocessing performance, especially in comparison with using something like Blender's existing pthread wrapper.

Thanks for your insight. If you only have OpenCL for CPU (AMD, Intel), it is hard to predict the results:
1. The OpenCL code is compiled to native code and executed as a shared library. The code itself should run without speed loss.
2. You have the overhead of the task scheduler in the OpenCL driver: speed decreases.
3. You can utilize your hardware better: speed increases.

If you compare non-OpenCL CPU with OpenCL CPU, you really can't say which one is faster, because you're comparing two different styles and implementations; it depends on the actual implementation that you test. Currently AMD has better support for OpenCL, but in the near future Intel will be up to speed. Also, the Bullet physics engine 3.x will use OpenCL, and most studios have Bullet somewhere... Still, I have not yet tested it.

Jeroen
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Maybe an interesting comparison could be SmallLuxGPU, since it has both CPU-only and CPU-OpenCL modes, so you can compare performance on the CPU both ways while getting similar results, although in ray tracing.

Original message
From: Matt Ebb m...@mke3.net
Subject: Re: [Bf-committers] Proposal: Blender OpenCL compositor
Date: 15.1.2011 08:19:25

Thanks, Jeroen. [snip] I'm curious how OpenCL translates to CPU multiprocessing performance, especially in comparison with using something like Blender's existing pthread wrapper. cheers, Matt
Re: [Bf-committers] Proposal: Blender OpenCL compositor
While some of the GPU-based stuff nowadays looks very spectacular, I personally still feel hesitant. I don't think CPUs (and especially multiprocessing) should be left by the wayside, not only due to the increasing prevalence of multicore systems nowadays, but also because of render farms, which are very largely CPU-based. cheers, Matt

Yes, but for how long will that remain true? http://www.tomshardware.com/news/nvda-china-super-computer-gpu,11545.html

Douglas E Knapp
Creative Commons Film Group, helping people make open source movies with open source software! http://douglas.bespin.org/CommonsFilmGroup/phpBB3/index.php
Massage in Gelsenkirchen-Buer: http://douglas.bespin.org/tcm/ztab1.htm
Please link to me and trade links with me!
Open Source Sci-Fi MMORPG game project: http://sf-journey-creations.wikispot.org/Front_Page http://code.google.com/p/perspectiveproject/
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Besides, OpenCL does not necessarily mean GPU. OpenCL can be executed by the CPU and can even be accelerated across multiple CPUs/cores.

2011/1/14 Knapp magick.c...@gmail.com: While some of the GPU-based stuff nowadays looks very spectacular, I personally still feel hesitant. [snip]
Re: [Bf-committers] Proposal: Blender OpenCL compositor
And next-gen CPUs are incorporating that architecture, from what I read.

Sent from my iPhone

On Jan 14, 2011, at 8:17 AM, Xavier Thomas xavier.thomas.1...@gmail.com wrote: Besides, OpenCL does not necessarily mean GPU. OpenCL can be executed by the CPU and can even be accelerated across multiple CPUs/cores. [snip]
Re: [Bf-committers] Proposal: Blender OpenCL compositor
From a user's perspective, it seems a few nodes (blur, defocus, vector blur) are responsible for over 90% of the node-compositing time in a real production; accelerating these will probably have a far larger impact/effort ratio than overhauling the entire framework.

Also, sorry about the LinkedIn spam earlier today.

-mike pan

On Fri, Jan 14, 2011 at 11:27 AM, Roger Wickes rogerwic...@yahoo.com wrote: And next-gen CPUs are incorporating that architecture, from what I read. [snip]
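Mike's observation, that a handful of nodes dominate compositing time, comes down to asymptotics: a blur reads an entire input window per output pixel, so its cost grows with the square of the radius, while simple color nodes read one pixel per output pixel. A minimal, hypothetical sketch in plain Python (not Blender code; names are illustrative) of such a per-output-pixel kernel:

```python
# Hypothetical sketch: a per-output-pixel box blur, the kind of kernel
# that dominates compositing time. Cost is O(width * height * radius^2),
# which is why blur-like nodes dwarf per-pixel nodes like color correction.

def box_blur(image, width, height, radius):
    """image: flat row-major list of grayscale floats."""
    out = [0.0] * (width * height)
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            # Each output pixel reads a (2*radius+1)^2 input window,
            # clamped at the image borders.
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sx, sy = x + dx, y + dy
                    if 0 <= sx < width and 0 <= sy < height:
                        total += image[sy * width + sx]
                        count += 1
            out[y * width + x] = total / count
    return out
```

Written this way the node fits the "input read-only, output write-only, per output pixel" model discussed in this thread: each output tile only needs the input pixels its windows touch, not the whole frame.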
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Thanks, Jeroen.

On Sat, Jan 15, 2011 at 6:04 PM, Jeroen Bakker j.bak...@atmind.nl wrote: Farms are already being migrated to OpenCL farms, as they are cheaper in hardware costs. BTW, renderfarm.fi should be capable of running OpenCL once this proposal is implemented!

While I can believe that there will be dedicated online farms set up for this sort of thing, I was more referring to farms in animation studios, most of which are not designed around GPU power, not now, and probably not for a while in the future. Even if in the future Blender uses OpenCL heavily, if a studio has not designed a farm specifically for Blender (which is quite rare), CPU performance will continue to be very important. I'm curious how OpenCL translates to CPU multiprocessing performance, especially in comparison with using something like Blender's existing pthread wrapper.

cheers, Matt
Re: [Bf-committers] Proposal: Blender OpenCL compositor
Hi,

I do think having OpenCL support is very cool for some nodes, and I also believe it should be built on a strong foundation. As we discussed at several points, Blender's compositor structure is getting a bit old (Matt mentioned some of the low-level issues here: http://www.francois-tarlier.com/blog/blender-vfx-wish-list-features/). All I'm saying is: should so much effort be put into it right now, while some lower-level issues are still there to fix? Or won't that change anything? Yes, OpenCL will accelerate the compositor as it is, but isn't it a false solution to a bigger problem? OpenCL will always make it faster no matter what; I'm just afraid it will be taken as the fix for the slow side of the compositor (which IMO it is not). As a user I would prefer better performance rather than just faster performance (which doesn't have to be the same thing). Anyhow, since I don't know much about it, this is just a reflection.

thx, F

2011/1/12 Sean Olson seanol...@gmail.com: You should probably put who you are and what your qualifications are on the donation site as well. [snip]

On Wed, Jan 12, 2011 at 10:07 AM, Jeroen Bakker j.bak...@atmind.nl wrote: Hi all, The last few months I have worked hard on the proposal for the OpenCL-based compositor. [snip]
[Bf-committers] Proposal: Blender OpenCL compositor
Hi all,

The last few months I have worked hard on the proposal for the OpenCL-based compositor. The proposal is now complete enough that it is clear how the solution should work and what its impact is. As the proposal is on the technical level, the end user won't feel a difference, except for a fast tile-based compositor system. In functionality it should be the same.

Two aspects will be addressed:
* Tile-based compositing
* OpenCL compositing

To implement these I will introduce additional components:
* Tile-based memory manager
* Node (pre-)compiler
* Configurable, automatic data conversion for compositor node systems
* OpenCL driver manager
* OpenCL configuration screen
* Some debug information:
  * OpenCL program, performance, etc.
  * Execution tree (including data types, resolution and kernel grouping)
  * Visualization of the tiles needed to calculate an area

And I will introduce several new data types:
* Kernels and KernelGroup
* Camera data type
* Various color data types

I have put all the documents on a project website for review, as the proposal is quite long and complex (all decisions are connected with each other). Please use bf-committers or #blendercoders to discuss the proposal, also if something is not clear.

http://ocl.atmind.nl/doku.php?id=design:proposal:compositor-redesign

Cheers, Jeroen Bakker
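A recurring question in this thread is how whole-image operations fit a tile-based, per-output-pixel model. The intermediate-reduction trick Jeroen mentions for the defocus node can be sketched as follows; this is a hypothetical illustration in plain Python, not the proposal's actual design: an operation that needs a global value (here, normalizing by the image maximum) first reduces the input to a tiny intermediate, after which every output tile depends only on its own input pixels plus that intermediate.

```python
# Hypothetical sketch: a whole-image operation (normalize by global max)
# expressed tile-by-tile via a small precomputed intermediate.
# Tile layout and function names are illustrative only.

def tiles(width, height, tile_size):
    # Yield (x, y, w, h) for each tile covering the image.
    for ty in range(0, height, tile_size):
        for tx in range(0, width, tile_size):
            yield tx, ty, min(tile_size, width - tx), min(tile_size, height - ty)

def normalize_tiled(image, width, height, tile_size=2):
    """image: flat row-major list of non-negative floats."""
    # Pass 1: scan tile-by-tile, reducing the whole image to one float.
    global_max = 0.0
    for tx, ty, tw, th in tiles(width, height, tile_size):
        for y in range(ty, ty + th):
            for x in range(tx, tx + tw):
                global_max = max(global_max, image[y * width + x])
    # Pass 2: per output pixel; each tile needs only its own input
    # pixels plus the tiny intermediate, never the full frame at once.
    out = [0.0] * (width * height)
    if global_max > 0.0:
        for tx, ty, tw, th in tiles(width, height, tile_size):
            for y in range(ty, ty + th):
                for x in range(tx, tx + tw):
                    out[y * width + x] = image[y * width + x] / global_max
    return out
```

This keeps input read-only and output write-only, matching the per-output-pixel approach advocated in the thread, while still supporting operations that conceptually depend on the entire buffer.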
Re: [Bf-committers] Proposal: Blender OpenCL compositor
You should probably put who you are and what your qualifications are on the donation site as well. I stumbled on the site through a Twitter link initially and had no idea who was doing the project.

-Sean

On Wed, Jan 12, 2011 at 10:07 AM, Jeroen Bakker j.bak...@atmind.nl wrote: Hi all, The last few months I have worked hard on the proposal for the OpenCL-based compositor. [snip] http://ocl.atmind.nl/doku.php?id=design:proposal:compositor-redesign Cheers, Jeroen Bakker

--
||-- Instant Messengers --
|| ICQ at 11133295
|| AIM at shatterstar98
|| MSN Messenger at shatte...@hotmail.com
|| Yahoo Y! at the_7th_samuri