Thanks for the long and insightful reply.

>> Transfers can then be split in "texture transfers" and "buffer transfers".
>> Note that they are often inherently different, since one often uses
>> memcpy-like GPU functionality, and the other often uses 2D blitter or
>> 3D engine functionality (and needs to worry about swizzling or tiling)
>> Thus, they are probably better split and not unified.
>
> My experience is that there is more in common than different about the
> paths.  There are the same set of constraints about not wanting to
> stall the GPU by mapping the underlying storage directly if it is
> still in flight, and allocating a dma buffer for the upload if it is.
> There will always be some differences, but probably no more than the
> differences between uploading to eg a constant buffer and a vertex
> buffer, or uploading to a swizzled and linear texture.

The considerations you mentioned are indeed common between buffers and
textures, but the actual mechanisms for performing the copy are often
significantly different.

For instance, r300g ends up calling the 3D engine via
surface_copy->util_blitter for texture transfers, which I suppose it
wouldn't do for buffer transfers.

nv30/nv40 don't have a single way to deal with swizzled textures, and
the driver must choose between several paths depending on whether the
source/destination is swizzled or not, whether it is a 3D texture or
not, and even on its alignment and pitch (the current driver doesn't
fully do that, and is partially broken for this reason).
Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.

nv50 does indeed have a common copy functionality that can handle all
buffers and textures in a unified way (implemented as a revamped
MEMORY_TO_MEMORY_FORMAT).
However, an additional buffer-only path would surely be faster than
going through the common texture path.
In particular, for buffers tile_flags are always 0 and height is
always 1, which allows writing a significantly simplified buffer-only
version of nv50_transfer_rect_m2mf with no branches and no
multiplications at all.

In other words, I think most drivers would be better off implementing
unified transfers with an "if" switching between a buffer and a
texture path, so it may be worth using two interfaces.
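As a rough sketch of what I mean (all names here are hypothetical, not actual Gallium entry points), the driver-internal dispatch would reduce to a single branch:

```c
/* Hypothetical sketch: a unified transfer entry point implemented
 * with one "if" dispatching between a specialized buffer path and a
 * texture path.  None of these names are real Gallium functions. */

struct fake_resource {
   int is_buffer;            /* buffers: height == 1, no tiling */
};

enum { PATH_NONE, PATH_BUFFER, PATH_TEXTURE };
static int last_path = PATH_NONE;

/* Linear copy: no swizzling, no pitch multiplications. */
static void copy_buffer_path(struct fake_resource *r)
{
   (void)r;
   last_path = PATH_BUFFER;
}

/* Per-level, possibly tiled/swizzled copy via the blitter or 3D engine. */
static void copy_texture_path(struct fake_resource *r)
{
   (void)r;
   last_path = PATH_TEXTURE;
}

void unified_transfer(struct fake_resource *r)
{
   if (r->is_buffer)
      copy_buffer_path(r);
   else
      copy_texture_path(r);
}
```

With two interfaces, each state tracker call would instead land directly on the right path, without the per-call branch.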

Also note that a buffer-only interface is significantly simpler,
since you don't need to specify:
- face
- level
- zslice
- y
- height
- z
- depth
- stride
- slice stride
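Concretely, the difference in interface surface might look something like this (both argument structures are hypothetical, just to illustrate the point):

```c
/* Hypothetical sketch: arguments needed by each kind of transfer. */

/* Texture transfers need full addressing information: */
struct texture_transfer_args {
   unsigned face, level, zslice;
   unsigned x, y, z;
   unsigned width, height, depth;
   unsigned stride, slice_stride;
};

/* A buffer-only transfer collapses to a 1D range: */
struct buffer_transfer_args {
   unsigned offset;   /* plays the role of x */
   unsigned size;     /* plays the role of width */
};
```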

While this may seem like a micro-optimization, note that 3D
applications often spend most of their time inside the OpenGL driver,
and Mesa/Gallium functions are already too heavy in profiles, so I
think it's important to always keep CPU performance in mind.

The code is also streamlined and easier to follow if it does not have
to default-initialize a lot of stuff.

A utility function calling the right interface can be created for
state trackers that really need it (maybe Direct3D10, if the driver
interface follows the user API).

> In DX they have
> different nomenclature for this - the graphics API level entities are
> resources and the underlying VMM buffers are labelled as allocations.
> In gallium, we're exposing the resource concept, but allocations are
> driver-internal entities, usually called winsys_buffers, or some
> similar name.

D3D10 uses buffers, sampler views and render target views as the
entities bindable to the pipeline; the views are constructed over
either textures or buffers.
Note, however, that the "description structure" is different in the
buffer and texture cases.

For render target views, the description structures are respectively
D3D10_BUFFER_RTV and D3D10_TEX2D_RTV (with others for the other
texture types).
The first specifies an offset and stride, while the second specifies a
mipmap level.
Other views have similar behavior.
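For reference, a simplified sketch of the two description structures (the real d3d10.h declares the buffer fields inside anonymous unions with FirstElement/NumElements aliases, omitted here):

```c
/* Simplified from the D3D10 headers; field widths and union
 * aliases are elided for clarity. */
typedef struct D3D10_BUFFER_RTV {
   unsigned ElementOffset;   /* offset of the first element */
   unsigned ElementWidth;    /* the "stride" mentioned above */
} D3D10_BUFFER_RTV;

typedef struct D3D10_TEX2D_RTV {
   unsigned MipSlice;        /* mipmap level to render to */
} D3D10_TEX2D_RTV;
```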

Buffers are directly used in the interfaces that allow binding
vertex/index/constant buffers.

Both buffers and textures are subclasses of ID3D10Resource, which is
what CopyResource, CopySubresourceRegion and UpdateSubresource operate
on; these provide a subset of the transfer functionality in
gallium-resources.

Note however that the two resources specified to CopyResource and
CopySubresourceRegion must be of the same type.

So, in summary, D3D10 does indeed go some way toward buffer/texture
unification, but with some important differences:
1. Buffers and textures still exist as separate types. Note that
there is no single "texture" type, but rather a separate interface for
each texture type, each directly inheriting from ID3D10Resource
2. Textures are never used directly by the pipeline, but rather
through "views" which have texture-type-specific creation methods and
have separate interfaces
3. Buffers are directly used by the pipeline for vertex, index and
constant buffers
4. Resources are used in copying and transfer functionality
5. D3D10 has a more memory-centric view of resources, providing for
instance the D3D10_USAGE_STAGING usage, for "A resource that supports
data transfer (copy) from the GPU to the CPU."

D3D11 seems to be unchanged in this respect.

So, if we want to go on a DX10-like route, how about:
1. Keeping pipe_buffer and pipe_texture, perhaps with a "pipe_resource
base;" field
2. Considering splitting pipe_texture into pipe_texture_2d,
pipe_texture_3d, pipe_texture_2d_array, etc.
3. Adding render target views and depth/stencil views, and making
those constructible over buffers
4. Having equivalent transfer mechanisms for buffers and textures, but
not necessarily unified in a single function
5. Eliminating the concept of pipe_surface, in favor of render target
views and explicit subresources in transfer functionality
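Point 1 could be sketched along these lines (hypothetical struct layouts, not the actual gallium-resources definitions):

```c
#include <stddef.h>

/* Hypothetical sketch of point 1: pipe_buffer and pipe_texture kept
 * as distinct types sharing a common pipe_resource base. */
struct pipe_resource {
   unsigned refcount;
   unsigned bind_flags;      /* how the resource may be bound */
};

struct pipe_buffer {
   struct pipe_resource base;
   unsigned size;
};

struct pipe_texture_2d {
   struct pipe_resource base;
   unsigned width, height;
   unsigned last_level;      /* number of mipmap levels - 1 */
};
```

Code that only cares about the common state can then take a pipe_resource pointer, while buffer-specific and texture-specific interfaces keep their own types.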

D3D10/11 do not provide a transfer concept, but rather only
inline_write/copy mechanisms.
They also provide D3D10_USAGE_STAGING resources, which can be used as
transfers with explicit copy operations.
Resource copying/updating functionality is indeed unified between
buffers and textures (using a "box" structure like gallium-resources
does).
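The box in question is just a 3D region descriptor, in the spirit of pipe_box in gallium-resources (sketched here from memory); a buffer copy simply sets y = z = 0 and height = depth = 1:

```c
/* A 3D region descriptor in the spirit of gallium's pipe_box;
 * for buffers, only x and width are meaningful. */
struct pipe_box {
   int x, y, z;
   int width, height, depth;
};
```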

As for transfer unification, it seems to me that the two are better
kept split, following OpenGL, but this may not become clear without
more driver experience.

A possible middle ground, given the current status of
gallium-resources, would be to keep buffer-specific and
texture-specific utility functions for state trackers, have them call
a common interface, and use them wherever possible.

If it turns out that we are very often going from
buffer/texture-specific state tracker code to buffer/texture-specific
driver code (via the vtbl utilities) through an inefficient common
interface, it will then be easy to bridge them directly by splitting
the Gallium interface.

Also, once we have drivers actually supporting efficient memory
management (as opposed to the current situation where Radeon and
GeForce drivers directly use kernel buffer objects, with terrible
performance, and often not paying attention to uncached memory issues,
especially for buffers), it may also become clearer whether transfers
are a good interface, or whether they should be replaced with
user/"staging" buffers and user/"staging" textures plus explicit
copies (as D3D10 does with D3D10_USAGE_STAGING).

_______________________________________________
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev
