Thanks for providing a long, insightful reply.

>> Transfers can then be split in "texture transfers" and "buffer transfers".
>> Note that they are often inherently different, since one often uses
>> memcpy-like GPU functionality, and the other often uses 2D blitter or
>> 3D engine functionality (and needs to worry about swizzling or tiling)
>> Thus, they are probably better split and not unified.
>
> My experience is that there is more in common than different about the
> paths. There are the same set of constraints about not wanting to
> stall the GPU by mapping the underlying storage directly if it is
> still in flight, and allocating a dma buffer for the upload if it is.
> There will always be some differences, but probably no more than the
> differences between uploading to eg a constant buffer and a vertex
> buffer, or uploading to a swizzled and linear texture.
The considerations you mentioned are indeed common between buffers and textures, but the actual mechanisms for performing the copy are often significantly different.

For instance, r300g ends up calling the 3D engine via surface_copy->util_blitter for texture transfers, which I suppose it wouldn't do for buffer transfers. nv30/nv40 don't have a single way to deal with swizzled textures, and the driver must choose between several paths depending on whether the source/destination is swizzled, whether it is a 3D texture, and even on its alignment or pitch (the current driver doesn't fully do that, and is partially broken for this reason). Buffers, on the other hand, can be copied very simply with MEMORY_TO_MEMORY_FORMAT.

nv50 does indeed have a common copy mechanism that can handle all buffers and textures in a unified way (implemented as a revamped MEMORY_TO_MEMORY_FORMAT). However, an additional buffer-only path would surely be faster than going through the common texture path. In particular, for buffers tile_flags are always 0 and height is always 1, allowing one to write a significantly simplified buffer-only version of nv50_transfer_rect_m2mf with no branches and no multiplications at all.

In other words, I think most drivers would be better off implementing unified transfers with an "if" switching between a buffer path and a texture path, so it may be worth using two interfaces. Also note that a buffer-only interface is significantly simpler, since you don't need to specify:
- face
- level
- zslice
- y
- height
- z
- depth
- stride
- slice stride

While this may seem a micro-optimization, note that 3D applications often spend most of their time in the OpenGL driver, and Mesa/Gallium functions are already too heavy in profiles, so I think it's important to always keep CPU performance in mind. The code is also streamlined and easier to follow if it does not have to default-initialize a lot of fields.
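To make the branch/multiplication point concrete, here is a hedged sketch (hypothetical helper names, not the actual nv50 code) of the address arithmetic a unified texture transfer path has to perform per copy, versus what a buffer-only path degenerates to:

```c
#include <stddef.h>

/* Hypothetical sketch: the byte-offset computation a unified texture
 * transfer path performs for a (x, y, z) origin in a linear layout;
 * swizzled/tiled layouts add branches on top of this. */
static size_t
texture_byte_offset(size_t slice_stride, size_t stride, size_t blocksize,
                    unsigned x, unsigned y, unsigned z)
{
    /* three multiplications per transfer setup */
    return (size_t)z * slice_stride
         + (size_t)y * stride
         + (size_t)x * blocksize;
}

/* For a buffer, tile_flags are always 0 and height is always 1, so the
 * whole computation collapses to a plain byte offset: no branches, no
 * multiplications. */
static size_t
buffer_byte_offset(unsigned offset)
{
    return offset;
}
```

This is only meant to illustrate why a dedicated buffer path can be made much cheaper on the CPU than routing buffers through the texture machinery.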
A utility function calling the right interface can be created for state trackers that really need it (maybe Direct3D 10, if the driver interface follows the user API).

> In DX they have different nomenclature for this - the graphics API
> level entities are resources and the underlying VMM buffers are
> labelled as allocations. In gallium, we're exposing the resource
> concept, but allocations are driver-internal entities, usually called
> winsys_buffers, or some similar name.

D3D10 uses buffers, sampler views and render target views as the entities bindable to the pipeline, and the views are constructed over either textures or buffers. Note, however, that the "description structure" actually differs between the buffer and texture cases. For render target views, they are respectively D3D10_BUFFER_RTV and D3D10_TEX2D_RTV (plus others for the other texture types). The first specifies an offset and stride, while the second specifies a mipmap level. The other view types behave similarly.

Buffers are used directly in the interfaces that bind vertex/index/constant buffers. Both buffers and textures are subclasses of ID3D10Resource, which is used by CopyResource, CopySubresourceRegion and UpdateSubresource; these provide a subset of the Gallium transfer functionality in gallium-resources. Note, however, that the two resources passed to CopyResource and CopySubresourceRegion must be of the same type.

So, in summary, D3D10 does indeed in some sense go toward buffer/texture unification, but with some important differences:
1. Buffers and textures still exist as separate types. Note that there is no generic "texture" type, but rather a separate interface for each texture type, each directly inheriting from ID3D10Resource
2. Textures are never used directly by the pipeline, but rather through "views", which have texture-type-specific creation methods and separate interfaces
3. Buffers are used directly by the pipeline for vertex, index and constant buffers
4.
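For reference, the two RTV description structures look like the following; they are mirrored here as plain C for illustration only (the real definitions live in d3d10.h, where the fields are also aliased as FirstElement/NumElements):

```c
typedef unsigned int UINT;

/* Mirror of d3d10.h's D3D10_BUFFER_RTV, for illustration: a render
 * target view over a buffer is described by an element range. */
typedef struct D3D10_BUFFER_RTV {
    UINT ElementOffset;   /* first element to access */
    UINT ElementWidth;    /* number of elements in the view */
} D3D10_BUFFER_RTV;

/* Mirror of d3d10.h's D3D10_TEX2D_RTV: a view over a 2D texture is
 * instead described by the mipmap level to render to. */
typedef struct D3D10_TEX2D_RTV {
    UINT MipSlice;        /* index of the mipmap level to use */
} D3D10_TEX2D_RTV;
```

The point is that even where D3D10 unifies resources behind ID3D10Resource, the per-type description structures stay separate.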
Resources are used in the copying and transfer functionality
5. D3D10 has a more memory-centric view of resources, providing for instance a D3D10_USAGE_STAGING flag, for "a resource that supports data transfer (copy) from the GPU to the CPU"

D3D11 seems to be unchanged in this respect.

So, if we want to go down a DX10-like route, how about:
1. Keeping pipe_buffer and pipe_texture, perhaps with a "pipe_resource base;" field
2. Considering splitting pipe_texture into pipe_texture_2d, pipe_texture_3d, pipe_texture_2d_array, etc.
3. Adding render target views and depth/stencil views, and making those constructible over buffers
4. Having equivalent transfer mechanisms for buffers and textures, but not necessarily unified in a single function
5. Eliminating the concept of pipe_surface, in favor of render target views and explicit subresources in the transfer functionality

D3D10/11 do not provide a transfer concept, but only inline_write/copy mechanisms. They also provide D3D10_USAGE_STAGING resources, which can be used like transfers with explicit copy operations. Resource copying/updating functionality is indeed unified between buffers and textures (using a "box" structure, as gallium-resources does).

As for transfer unification, it seems to me transfers are better kept split, following OpenGL, though it may indeed not be clear without more driver experience. A possible middle ground, given the current status of gallium-resources, would be to keep buffer-specific and texture-specific utility functions for state trackers, calling a common interface, and to use them where possible. If it turns out that we very often have a buffer/texture-specific state tracker interface talking to buffer/texture-specific driver code (via the vtbl utilities) through an inefficient common interface, it is then easy to bridge them directly later by splitting the Gallium interface.
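Point 1 of the proposal could look something like the following sketch (hypothetical field and enum names, not the actual gallium-resources code):

```c
#include <stddef.h>

/* Hypothetical sketch of a common base embedded as the first member,
 * so a pipe_buffer or pipe_texture_2d can be passed wherever a
 * pipe_resource is expected, D3D10-style. */
enum pipe_resource_target {
    PIPE_RESOURCE_BUFFER,
    PIPE_RESOURCE_TEXTURE_2D,
    PIPE_RESOURCE_TEXTURE_3D
};

struct pipe_resource {
    enum pipe_resource_target target;
    unsigned usage;            /* could carry e.g. a STAGING flag */
};

struct pipe_buffer {
    struct pipe_resource base; /* must be first, so the cast is valid */
    unsigned size;             /* in bytes; no width/height/stride */
};

struct pipe_texture_2d {
    struct pipe_resource base;
    unsigned width, height;
    unsigned last_level;       /* number of mipmap levels minus one */
};
```

With the base as the first member, common code (copying, transfers) can take a struct pipe_resource *, while buffer-only and texture-only paths keep their slimmer, type-specific parameters.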
Also, once we have drivers that actually support efficient memory management (as opposed to the current situation, where the Radeon and GeForce drivers use kernel buffer objects directly, with terrible performance, often without paying attention to uncached memory issues, especially for buffers), it may also become clearer whether transfers are a good interface, or whether they should be replaced with user/"staging" buffers and user/"staging" textures plus copies (like D3D10 does with D3D10_USAGE_STAGING).