Re: [Mesa3d-dev] Rationale of gallium-resources changes?
On Sat, Mar 27, 2010 at 5:34 PM, Luca Barbieri wrote:
> Having drivers capable of doing "send-to-hardware-and-forget-about-it"
> on arbitrary state setting could be a nice thing instead, but
> unfortunately a lot of hardware fundamentally can't do this, since for
> instance:
> 1. Shaders all need to be seen together to be linked, possibly
>    modifying the shaders themselves (nv30)
> 2. Constants need to be written directly into the fragment program
>    (nv30-nv40)
> 3. Fragment programs depend on the viewport to implement
>    fragment.position (r300)
> 4. Fragment programs depend on bound textures to specify normalization
>    type and emulate NPOT (r300, r600?, nv30)
> and so on...
> 5. Sometimes sampler state and textures must be seen together since
>    the hardware mixes them

To be fair, this is all "old hardware sucks at new APIs." We're
stretching a bit with r300 and nv30, hardware that was never really
meant for this kind of generalized pluggable pipeline setup. r500 and
nv40 are better, but it's not until r600 and nv50 that we are completely
unburdened from all of this old suckage. That's life, unfortunately.

Also, I'm sure there will always be hardware with quirks, regardless of
the set of functionality we expose. We're just going to have to aim for
the biggest common subsets plus the least painful way of adding the full
pipeline features. Gallium thankfully no longer resembles its original
target very much; it has become a reasonable abstraction.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes
Corbin Simpson

------------------------------------------------------------------------------
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
> To me this speaks to another aspect of the gallium interface which is
> a bit odd -- in particular the way several of our interfaces basically
> copy their inputs into a structure and pass that back to the state
> tracker. Why are we doing that? The state tracker already knows what
> it asked us to do, and there is no reason to assume that it needs us
> to re-present that information back to it.

Yes: only the CSOs avoid this form of copying; all other structures
include the input parameters. As a random example, pipe_sampler_view
has lots of parameters that a driver will have converted into the
hardware format and that are thus redundant and unlikely to be read by
the state tracker. Textures and buffers also have many visible data
members that the state tracker may or may not read. In particular, the
Mesa state tracker already keeps everything in the Mesa internal
structures, and so benefits little from such data.

We may want to consider going toward making _all_ Gallium structures
opaque (and, by the way, using declared-only structs instead of void*
like we do for CSOs, so that they can be checked by the compiler).

Another serious data-duplication issue is drivers that just copy the
input state into internal structures and return, to then process
everything at draw time. This usually results in state being duplicated
(and copied) three times: in the Mesa internal structures, in the state
tracker structures and then in the driver. The draw module may keep a
fourth copy of the state. Note that when reference counting is
involved, copies are even more expensive, since they then need atomic
operations.

Usually drivers do this because:
1. They need to pass data to the draw module in case of fallbacks, and
   thus cannot send it to hardware and forget about it
2. They need to recreate the whole hardware context state in some cases
3. They multiplex multiple pipe_contexts on a single screen
4. They need a global view of state, rather than a single state change
   at a time, to decide what to do

A possible solution is to remove all set_* and bind_* calls and replace
them with data members of pipe_context that the state tracker would use
instead of its own internal structures. In addition, a new "what's new"
bitmask would be added, and the driver would check it on draw calls.

Performance-wise, this replaces num_state_changes dynamic function
calls into the driver with (log2(total_states) + num_state_changes)
branches to check the "what's new" bitmask. Furthermore:
1. State is never copied, since the state tracker constructs it in place
2. There is no longer any need for the "state save helpers" in the
   blitter module and similar
3. The draw module can potentially read state directly from
   pipe_context instead of duplicating it yet again
4. Drivers no longer need all the functions that store the parameters,
   set a dirty flag and return

Note that the Direct3D DDI does not do this, but they have to keep
binary compatibility, which is easier with Set* calls than with this
scheme. softpipe, nvfx, nv50, r300 and probably others already do this
internally, and having the state tracker itself construct the data
would remove a lot of redundant copying code and increase performance.

Having drivers capable of doing "send-to-hardware-and-forget-about-it"
on arbitrary state setting could be a nice thing instead, but
unfortunately a lot of hardware fundamentally can't do this, since for
instance:
1. Shaders all need to be seen together to be linked, possibly
   modifying the shaders themselves (nv30)
2. Constants need to be written directly into the fragment program
   (nv30-nv40)
3. Fragment programs depend on the viewport to implement
   fragment.position (r300)
4. Fragment programs depend on bound textures to specify normalization
   type and emulate NPOT (r300, r600?, nv30)
and so on...
5.
Sometimes sampler state and textures must be seen together since the
hardware mixes them

> The only really new information provided by the driver to the state
> tracker by transfer_create + transfer_map is:
> - the pointer to the data
> - stride
> - slice stride

There is also the 3D box, unless transfers start covering the whole
resource, which seems really suboptimal for stuff like glTexSubImage.
This needs to be provided to the driver unless a buffer-specialized
interface is made (in which case a 1D box is enough).

> Thanks for the summary. I'd add that there is also some information
> available publicly about the D3D10 DDI, which follows a slightly
> different interface to the API. In that world, there is a single
> create resource function:

It is indeed extremely interesting, and it looks like it should be the
first place to look for inspiration for the Gallium interface. I added
a comparison of the D3D11 DDI and Gallium to src/gallium/docs.

> There is however clearly concern about the possible need for
> specialized transfer mechanisms for particular buffer types. It seems
> like they've taken an approach that leaves the choice to the driver
> whether to s
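To make the data-members-plus-bitmask scheme proposed earlier in this
message concrete, here is a rough C sketch. All names are hypothetical
(not the actual Gallium interfaces): the state tracker writes state in
place and sets a "what's new" bit, and the driver walks the set bits at
draw time instead of receiving one function call per state change.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per piece of pipeline state (hypothetical subset). */
enum {
   NEW_BLEND    = 1u << 0,
   NEW_VIEWPORT = 1u << 1,
   NEW_FS       = 1u << 2,
};

struct hypo_context {
   uint32_t dirty;        /* the "what's new" bitmask */
   float viewport[4];     /* state constructed in place, never copied */
   unsigned blend_enable;
};

/* State tracker side: construct the state in place, flag it. */
static void set_viewport(struct hypo_context *ctx, const float vp[4])
{
   for (int i = 0; i < 4; i++)
      ctx->viewport[i] = vp[i];
   ctx->dirty |= NEW_VIEWPORT;
}

/* Driver side: one branch per set bit instead of one dynamic call per
 * state change; redundant changes before a draw collapse into one. */
static unsigned draw(struct hypo_context *ctx)
{
   unsigned validated = 0;
   while (ctx->dirty) {
      ctx->dirty &= ctx->dirty - 1;  /* clear lowest set bit */
      validated++;                   /* a real driver would emit hardware state here */
   }
   return validated;
}
```

Note how two consecutive viewport changes cost one validation at draw
time, which is the source of the (log2(total_states) +
num_state_changes) figure above.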
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
On Wed, Mar 24, 2010 at 9:20 PM, Luca Barbieri wrote:
> Thanks for providing a long insightful reply.
>
>>> Transfers can then be split into "texture transfers" and "buffer
>>> transfers". Note that they are often inherently different, since one
>>> often uses memcpy-like GPU functionality, and the other often uses
>>> 2D blitter or 3D engine functionality (and needs to worry about
>>> swizzling or tiling). Thus, they are probably better split and not
>>> unified.
>>
>> My experience is that there is more in common than different about the
>> paths. There are the same set of constraints about not wanting to
>> stall the GPU by mapping the underlying storage directly if it is
>> still in flight, and allocating a dma buffer for the upload if it is.
>> There will always be some differences, but probably no more than the
>> differences between uploading to eg a constant buffer and a vertex
>> buffer, or uploading to a swizzled and linear texture.
>
> The considerations you mentioned are indeed common between buffers and
> textures, but the actual mechanisms for performing the copy are often
> significantly different.
>
> For instance, r300g ends up calling the 3D engine via
> surface_copy->util_blitter for texture transfers, which I suppose it
> wouldn't do for buffer transfers.
>
> nv30/nv40 don't have a single way to deal with swizzled textures, and
> the driver must choose between many paths depending on whether the
> source/destination is swizzled or not, a 3D texture or not, and even
> its alignment or pitch (the current driver doesn't fully do that, and
> is partially broken for this reason).
> Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.
>
> nv50 does indeed have a common copy functionality that can handle all
> buffers and textures in a unified way (implemented as a revamped
> MEMORY_TO_MEMORY_FORMAT).
> However, an additional buffer-only path would surely be faster than
> going through the common texture path.
> In particular, for buffers tile_flags are always 0 and height is
> always 1, allowing a significantly simplified buffer-only version of
> nv50_transfer_rect_m2mf to be written with no branches and no
> multiplications at all.
>
> In other words, I think most drivers would be better off implementing
> unified transfers with an "if" switching between a buffer and a
> texture path, so it may be worth using two interfaces.
>
> Also note that a buffer-only interface is significantly simplified
> since you don't need to specify:
> - face
> - level
> - zslice
> - y
> - height
> - z
> - depth
> - stride
> - slice stride
>
> While this may seem a micro-optimization, note that 3D applications
> often spend all their time running the OpenGL driver, and Mesa/Gallium
> functions are already too heavy in profiles, so I think it's important
> to always keep CPU performance in mind.
>
> The code is also streamlined and easier to follow if it does not have
> to default-initialize a lot of stuff.
>
> A utility function calling the right interface can be created for
> state trackers that really need it (maybe Direct3D10, if the driver
> interface follows the user API).

I take your point, though I should point out you've double-counted z
and zslice, and face+level are one dword.

To me this speaks to another aspect of the gallium interface which is a
bit odd -- in particular the way several of our interfaces basically
copy their inputs into a structure and pass that back to the state
tracker. Why are we doing that? The state tracker already knows what it
asked us to do, and there is no reason to assume that it needs us to
re-present that information back to it.

The only really new information provided by the driver to the state
tracker by transfer_create + transfer_map is:
- the pointer to the data
- stride
- slice stride

If the transfer functions ended up just passing this data back, it
would reduce the overhead across the board. Your point is still valid
that the last two will be zero for buffer transfers, though.

>> In DX they have
>> different nomenclature for this - the graphics API level entities are
>> resources and the underlying VMM buffers are labelled as allocations.
>> In gallium, we're exposing the resource concept, but allocations are
>> driver-internal entities, usually called winsys_buffers, or some
>> similar name.
>
> D3D10 uses buffers, sampler views and render target views as entities
> bindable to the pipeline, and the latter are constructed over either
> textures or buffers.
> Note, however, that the "description structure" is actually different
> in the buffer and texture cases.
>
> For render target views, they are respectively D3D10_BUFFER_RTV and
> D3D10_TEX2D_RTV (and others for other texture types).
> The first specifies an offset and stride, while the second specifies a
> mipmap level.
> Other views have similar behavior.
> Buffers are directly used in the interfaces that allow binding
> vertex/index/constant buffers.
>
> Both buffers and textures are subclasses of ID3D10Resource
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
Thanks for providing a long insightful reply.

>> Transfers can then be split into "texture transfers" and "buffer
>> transfers". Note that they are often inherently different, since one
>> often uses memcpy-like GPU functionality, and the other often uses 2D
>> blitter or 3D engine functionality (and needs to worry about
>> swizzling or tiling). Thus, they are probably better split and not
>> unified.
>
> My experience is that there is more in common than different about the
> paths. There are the same set of constraints about not wanting to
> stall the GPU by mapping the underlying storage directly if it is
> still in flight, and allocating a dma buffer for the upload if it is.
> There will always be some differences, but probably no more than the
> differences between uploading to eg a constant buffer and a vertex
> buffer, or uploading to a swizzled and linear texture.

The considerations you mentioned are indeed common between buffers and
textures, but the actual mechanisms for performing the copy are often
significantly different.

For instance, r300g ends up calling the 3D engine via
surface_copy->util_blitter for texture transfers, which I suppose it
wouldn't do for buffer transfers.

nv30/nv40 don't have a single way to deal with swizzled textures, and
the driver must choose between many paths depending on whether the
source/destination is swizzled or not, a 3D texture or not, and even
its alignment or pitch (the current driver doesn't fully do that, and
is partially broken for this reason). Buffers can instead be copied
very simply with MEMORY_TO_MEMORY_FORMAT.

nv50 does indeed have a common copy functionality that can handle all
buffers and textures in a unified way (implemented as a revamped
MEMORY_TO_MEMORY_FORMAT). However, an additional buffer-only path would
surely be faster than going through the common texture path. In
particular, for buffers tile_flags are always 0 and height is always 1,
allowing a significantly simplified buffer-only version of
nv50_transfer_rect_m2mf to be written with no branches and no
multiplications at all.

In other words, I think most drivers would be better off implementing
unified transfers with an "if" switching between a buffer and a texture
path, so it may be worth using two interfaces.

Also note that a buffer-only interface is significantly simplified
since you don't need to specify:
- face
- level
- zslice
- y
- height
- z
- depth
- stride
- slice stride

While this may seem a micro-optimization, note that 3D applications
often spend all their time running the OpenGL driver, and Mesa/Gallium
functions are already too heavy in profiles, so I think it's important
to always keep CPU performance in mind.

The code is also streamlined and easier to follow if it does not have
to default-initialize a lot of stuff.

A utility function calling the right interface can be created for state
trackers that really need it (maybe Direct3D10, if the driver interface
follows the user API).

> In DX they have different nomenclature for this - the graphics API
> level entities are resources and the underlying VMM buffers are
> labelled as allocations. In gallium, we're exposing the resource
> concept, but allocations are driver-internal entities, usually called
> winsys_buffers, or some similar name.

D3D10 uses buffers, sampler views and render target views as entities
bindable to the pipeline, and the latter are constructed over either
textures or buffers. Note, however, that the "description structure" is
actually different in the buffer and texture cases.

For render target views, they are respectively D3D10_BUFFER_RTV and
D3D10_TEX2D_RTV (and others for other texture types). The first
specifies an offset and stride, while the second specifies a mipmap
level. Other views have similar behavior. Buffers are directly used in
the interfaces that allow binding vertex/index/constant buffers.

Both buffers and textures are subclasses of ID3D10Resource, which is
used by CopyResource, CopySubresourceRegion and UpdateSubresource;
these provide a subset of the Gallium transfer functionality in
gallium-resources. Note however that the two resources specified to
CopyResource and CopySubresourceRegion must be of the same type.

So in summary, D3D10 does indeed go some way toward buffer/texture
unification, but with some important differences:
1. Buffers and textures still exist as separate types. Note that there
   is no "texture" type, but rather a separate interface for each
   texture type, which directly inherits from ID3D10Resource
2. Textures are never used directly by the pipeline, but rather through
   "views", which have texture-type-specific creation methods and
   separate interfaces
3. Buffers are directly used by the pipeline for vertex, index and
   constant buffers
4. Resources are used in copying and transfer functionality
5. D3D10 has a more memory-centric view of resources, providing for
   instance a D3D10_USAGE_STAGING flag, for "A resource that supports
   data transfer (copy) from the GPU to the CPU."
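The per-type "description structure" pattern described above is
essentially a tagged union. A portable C sketch of the idea, using
hypothetical names that only echo the D3D10 ones (these are not the
real API declarations):

```c
#include <assert.h>

/* Hypothetical stand-ins for D3D10_BUFFER_RTV / D3D10_TEX2D_RTV: a
 * buffer view needs an offset and stride, a texture view a mip level. */
struct hypo_buffer_rtv { unsigned offset, stride; };
struct hypo_tex2d_rtv  { unsigned mip_level; };

enum hypo_view_dimension { HYPO_RTV_BUFFER, HYPO_RTV_TEXTURE2D };

/* The view description: one tag, per-type payloads in a union. */
struct hypo_rtv_desc {
   enum hypo_view_dimension dimension;
   union {
      struct hypo_buffer_rtv buffer;
      struct hypo_tex2d_rtv  tex2d;
   } u;
};

/* A driver consuming the description dispatches on the tag. */
static unsigned hypo_rtv_base_offset(const struct hypo_rtv_desc *desc)
{
   switch (desc->dimension) {
   case HYPO_RTV_BUFFER:
      return desc->u.buffer.offset;
   case HYPO_RTV_TEXTURE2D:
      return 0;  /* texture views address by mip level, not byte offset */
   }
   return 0;
}
```

The design point is that one bind interface can accept both kinds of
view while each kind still carries only the parameters that make sense
for it.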
Re: [Mesa3d-dev] Rationale of gallium-resources changes?
On Tue, Mar 23, 2010 at 7:26 PM, Luca Barbieri wrote:
> What is the rationale behind the gallium-resources changes?

Luca,

Thanks for the feedback. I posted something describing this a little
while ago:

http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg11375.html

There are a bunch of things pushing us in this direction, but at its
most basic it is a recognition that we have two gpu-side objects with
very similar operations on them, exposed by the interface in quite
different ways. Look for instance at the SVGA driver, which has
implemented two separate (and fairly complex) non-blocking upload paths
for each of these entities.

And crucially, we also have APIs starting to blur the line between
textures and buffers. In DX10 and 11 in particular, you can perform
operations on buffers (like binding one as a render target) which are
easier to cope with if we have a unified abstraction.

In the past we had some confusion about what a pipe_buffer really is --
is it:
a) a GPU-side entity which can be bound to the pipeline?
b) a mechanism for CPU/GPU communication - effectively a dma buffer?
c) a way of talking about the underlying storage for GPU resources,
   effectively a linear allocation of VRAM?

What we're doing in gallium-resources is a unification of textures and
view (a) of buffers as abstract GPU entities. That implies that roles
(b) and (c) are covered by other entities -- in particular, transfers
become the generic CPU/GPU communication path, and the underlying
concept of winsys buffers (not strictly part of gallium) provides (c).

Basically the change unifies all the GPU-side entities under a single
parent (resource). The driver is free to implement textures and buffers
as one code path or two. For expediency, I've tried to avoid changing
the drivers significantly at this point, which has meant keeping alive
the separate texture and buffer implementations and selecting between
them with a vtbl. That isn't a strict requirement of the design, just
something I've done to avoid rewriting all of the drivers at once on my
own...

> I couldn't find any and I see several points that could be improved:
> 1. How does one do texture_blanket with the gallium-resources API?
>    That is, how does one bind a buffer as a texture with a specific
>    format?
> 2. Why is there a transfer_inline_write and not a transfer_inline_read?
> 3. How about adding a transfer_update_region that would copy data from
>    the resource to the transfer, just like transfer_inline_write copies
>    from the transfer to the resource?
> 4. How about making transfers be always mapped when alive and removing
>    transfer_map and transfer_unmap?

I think you brought some of these points up in the followup to my
earlier post, eg:

http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg11537.html

I think your suggestions are good - I had an 'inline_read' initially,
and although I took it out, I've been convinced by others that there
are actually users for such an interface - so no objections to it
coming back in. Similarly, I agree there isn't much value in
transfer_create/destroy being separate from map and unmap. I see these
as additional enhancements beyond getting the basic resource concept up
and running, which can be done in follow-on work.

In reference to texture_blanket - this was actually removed from
gallium a little while ago - replaced by the texture_from_handle() and
texture_get_handle() interfaces. In this case the 'handle' is a pointer
to an operating-system-specific entity -- presumably describing the
underlying storage.

> In addition to these micro-level issues, is the bigger picture
> unification of buffers and textures as resources a good idea?

I think so, not least because other APIs are moving in this direction
and using them somewhat interchangeably.

> It will burden all buffer operations with redundant notions of 3D
> boxes, strides, formats and texture targets.

I'm not sure where you see this, but if there are specific cases where
there is a lot of new overhead, we can work to reduce that.

> How about instead layering textures over buffers, and exposing the
> underlying buffer of a texture, maybe also allowing to dynamically
> change it?

I think this makes sense for the view of buffers as memory-manager
allocations. That works for certain cases, eg native rendering on local
machines, but not all uses of gallium can be described that way. We're
really positioning the gallium api at a slightly higher abstraction
level, to cover both the cases where that could work and the ones which
don't fit that mold.

> Then you could create a texture, asking the driver to create a buffer
> too, for the normal texture creation case.
> You could create a texture with a specified format and layout over an
> existing buffer to implement buffer-as-texture, or reinterpret the
> underlying buffer of an existing texture as another data format.
> You could also create a texture without an underlying buffer, to find
> out how large of a buffer you would need for that texture layout.
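Concretely, the layering proposal quoted above might be sketched as
follows. All names here are hypothetical: a texture is a layout
description that may or may not be backed by a buffer, and the backing
buffer is exposed and swappable.

```c
#include <assert.h>
#include <stddef.h>

struct hypo_buffer { size_t size; };

struct hypo_texture {
   unsigned width, height;
   size_t   stride;
   struct hypo_buffer *backing;  /* NULL for a buffer-less "proxy" texture */
};

/* A texture without an underlying buffer still answers how large the
 * storage would need to be (the OpenGL proxy-texture case). */
static size_t hypo_texture_required_size(const struct hypo_texture *tex)
{
   return tex->stride * tex->height;
}

/* Bind an existing buffer under a texture layout: the buffer-as-texture
 * / format-reinterpretation case. */
static int hypo_texture_set_buffer(struct hypo_texture *tex,
                                   struct hypo_buffer *buf)
{
   if (buf->size < hypo_texture_required_size(tex))
      return -1;  /* buffer too small for this layout */
   tex->backing = buf;
   return 0;
}
```

Shared textures would fall out naturally: buffer_from_handle followed
by hypo_texture_set_buffer with the desired format/layout.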
[Mesa3d-dev] Rationale of gallium-resources changes?
What is the rationale behind the gallium-resources changes?

I couldn't find any, and I see several points that could be improved:
1. How does one do texture_blanket with the gallium-resources API?
   That is, how does one bind a buffer as a texture with a specific
   format?
2. Why is there a transfer_inline_write and not a transfer_inline_read?
3. How about adding a transfer_update_region that would copy data from
   the resource to the transfer, just like transfer_inline_write copies
   from the transfer to the resource?
4. How about making transfers always mapped while alive and removing
   transfer_map and transfer_unmap?

In addition to these micro-level issues, is the bigger-picture
unification of buffers and textures as resources a good idea? It will
burden all buffer operations with redundant notions of 3D boxes,
strides, formats and texture targets.

How about instead layering textures over buffers, and exposing the
underlying buffer of a texture, maybe also allowing it to be changed
dynamically?

Then you could create a texture, asking the driver to create a buffer
too, for the normal texture creation case. You could create a texture
with a specified format and layout over an existing buffer to implement
buffer-as-texture, or reinterpret the underlying buffer of an existing
texture as another data format. You could also create a texture without
an underlying buffer, to find out how large of a buffer you would need
for that texture layout (and whether it is supported). This could be
useful for OpenGL texture proxies.

For shared textures, you would call buffer_from_handle and then create
a texture over it with the desired format/layout.

Transfers can then be split into "texture transfers" and "buffer
transfers". Note that they are often inherently different, since one
often uses memcpy-like GPU functionality, and the other often uses 2D
blitter or 3D engine functionality (and needs to worry about swizzling
or tiling). Thus, they are probably better split and not unified.
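The split being argued for here boils down to the parameter sets: a
texture transfer carries a full 3D box plus layout information, while a
buffer transfer collapses to a 1D range. A hypothetical sketch (these
struct names are illustrative, not the Gallium ones):

```c
#include <assert.h>
#include <stddef.h>

/* A texture transfer needs the full 3D box plus layout information. */
struct hypo_texture_transfer {
   unsigned face, level, zslice;   /* which image in the resource */
   unsigned x, y, z;               /* origin of the 3D box */
   unsigned width, height, depth;  /* extent of the 3D box */
   unsigned stride, slice_stride;  /* layout of the mapped data */
};

/* A buffer transfer is just a 1D box: offset and size. */
struct hypo_buffer_transfer {
   size_t offset;
   size_t size;
};

/* Total bytes described by each kind of transfer. */
static size_t texture_transfer_bytes(const struct hypo_texture_transfer *t)
{
   return (size_t)t->slice_stride * t->depth;
}

static size_t buffer_transfer_bytes(const struct hypo_buffer_transfer *t)
{
   return t->size;
}
```

The buffer case needs neither default-initialization of unused fields
nor any of the 2D/3D arithmetic, which is the CPU-overhead argument
made earlier in the thread.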
Furthermore, in the gallium-resources branch both the r300g and nouveau
drivers have different internal implementations for buffer and texture
transfers (they actually look fundamentally different, not just
duplicated code): why not just expose them directly as two separate,
more efficient interfaces, instead of going through a single fat
interface and then a further indirect branch in the driver?

In addition, transfers could be handled by an auxiliary module that
would ask the driver to directly map the texture, and would otherwise
create a temporary itself and use a driver-provided buffer_copy or
surface_copy manually. Note that many drivers implement transfers this
way, so this would avoid duplicated code in drivers.

transfer_inline_write can also be done by copies from user buffers, or
textures layered over user buffers.
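The auxiliary-module idea sketched above, map directly when the driver
allows it and otherwise stage through a temporary and copy back at
unmap time, could look roughly like this. All names are hypothetical,
and a real implementation would use the driver's buffer_copy or
surface_copy rather than memcpy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hypo_resource {
   int direct_mappable;    /* can the storage be mapped in place? */
   unsigned char *storage;
   size_t size;
};

struct hypo_transfer {
   struct hypo_resource *res;
   unsigned char *ptr;     /* what the state tracker writes through */
   int staged;             /* 1 if ptr is a temporary copy */
};

static unsigned char *hypo_transfer_map(struct hypo_resource *res,
                                        struct hypo_transfer *xfer)
{
   xfer->res = res;
   if (res->direct_mappable) {
      xfer->staged = 0;
      xfer->ptr = res->storage;          /* fast path: map in place */
   } else {
      xfer->staged = 1;
      xfer->ptr = malloc(res->size);     /* temporary staging copy */
      memcpy(xfer->ptr, res->storage, res->size);
   }
   return xfer->ptr;
}

static void hypo_transfer_unmap(struct hypo_transfer *xfer)
{
   if (xfer->staged) {
      /* a real driver would use buffer_copy/surface_copy here */
      memcpy(xfer->res->storage, xfer->ptr, xfer->res->size);
      free(xfer->ptr);
   }
   xfer->ptr = NULL;
}
```

Keeping this logic in one shared module is exactly the duplicated code
the paragraph above suggests removing from individual drivers.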