Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-27 Thread Corbin Simpson
On Sat, Mar 27, 2010 at 5:34 PM, Luca Barbieri wrote:
> Having drivers capable of doing "send-to-hardware-and-forget-about-it"
> on arbitrary state setting could be a nice thing instead, but
> unfortunately a lot of hardware fundamentally can't do this, since for
> instance:
> 1. All shaders need to be seen together to be linked, possibly
> modifying the shaders themselves (nv30)
> 2. Constants need to be written directly into the fragment program (nv30-nv40)
> 3. Fragment programs depend on the viewport to implement
> fragment.position (r300)
> 4. Fragment programs depend on bound textures to specify normalization
> type and emulate NPOT (r300, r600?, nv30)
> and so on...
> 5. Sometimes sampler state and textures must be seen together since
> the hardware mixes them

To be fair, this is all "old hardware sucks at new APIs." We're
stretching a bit with r300 and nv30, hardware never really meant for
this kind of generalized pluggable pipeline setup. r500 and nv40 are
better, but it's not until r600 and nv50 that we really are completely
unburdened from all of this old suckage. That's life, unfortunately.

Also, I'm sure there's always going to be hardware that has quirks,
regardless of the set of functionality we expose. We're just going to
have to aim for the biggest common subsets plus the least painful way
of adding the full pipeline features. Gallium thankfully no longer
resembles its original target too much, but it's become a reasonable
abstraction.

-- 
When the facts change, I change my mind. What do you do, sir? ~ Keynes

Corbin Simpson




Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-27 Thread Luca Barbieri
> To me this speaks to another aspect of the gallium interface which is
> a bit odd -- in particular the way several of our interfaces basically
> copy their inputs into a structure and pass that back to the state
> tracker.  Why are we doing that?  The state tracker already knows what
> it asked us to do, and there is no reason to assume that it needs us
> to re-present that information back to it.

Yes, only the CSOs avoid this form of copying: all the other
structures include the input parameters.

As a random example, pipe_sampler_view has lots of parameters that a
driver will have converted into the hardware format and that are thus
redundant, and unlikely to be read by the state tracker.

Textures and buffers also have many visible data members that the
state tracker may or may not read.
In particular, the Mesa state tracker already keeps everything in the
Mesa internal structures, and so benefits little from such data.

We may want to consider moving toward making _all_ Gallium structures
opaque (and, by the way, using declared-only structs instead of the
void* we currently use for CSOs, which the compiler cannot type-check).
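
A minimal illustration (the void* signature matches the current
interface; the declared-only variant is hypothetical):

    /* Today: CSOs are untyped, so the compiler cannot catch misuse. */
    void (*bind_blend_state)(struct pipe_context *ctx, void *blend);

    /* With a declared-only struct, the handle stays opaque to the
     * state tracker but every call site is type-checked: */
    struct cso_blend_state;  /* declared, never defined publicly */
    void (*bind_blend_state)(struct pipe_context *ctx,
                             struct cso_blend_state *blend);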


Another serious data-duplication issue is drivers that just copy the
input state into internal structures and return, then process
everything at draw time.

This usually results in state being duplicated (and copied) 3 times:
in Mesa internal structures, in the state tracker structures and then
in the driver.
The draw module may also keep a 4th copy of the state.
Note that when reference counting is involved, copies are even more
expensive since they now need atomic operations.

Usually drivers do this because:
1. They need to pass data to the draw module in case of fallbacks, and
thus cannot send it to hardware and forget about it
2. They need to recreate the whole hardware context state in some cases
3. They multiplex multiple pipe_contexts on a single screen
4. They need a global view of state, rather than a single state change
at a time, to decide what to do

A possible solution is to remove all set_* and bind_* calls and
replace them with data members of pipe_context that the state tracker
would use instead of its own internal structures.

In addition, a new "what's new" bitmask would be added, and the
driver would check it on draw calls.

Performance-wise, this replaces num_state_changes dynamic function
calls into the driver with (log2(total_states) + num_state_changes)
branches to check the "what's new" bitmask.
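
As a rough sketch of the proposal (all names hypothetical):

    /* Hypothetical: state lives directly in pipe_context. */
    #define PIPE_NEW_VIEWPORT (1 << 0)
    #define PIPE_NEW_BLEND    (1 << 1)
    /* ... one bit per state group ... */

    struct pipe_context {
       /* ... */
       struct pipe_viewport_state viewport;  /* written in place */
       unsigned dirty;                       /* "what's new" bitmask */
    };

    /* State tracker side: no function call, just a store and a flag. */
    ctx->viewport = vp;
    ctx->dirty |= PIPE_NEW_VIEWPORT;

    /* Driver side, on draw: */
    if (ctx->dirty & PIPE_NEW_VIEWPORT)
       emit_viewport(ctx);
    ctx->dirty = 0;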

Furthermore:
1. State is never copied, since the state tracker constructs it in place
2. There is no longer any need for the "state save" helpers in the
blitter module and similar
3. The draw module can potentially read state directly from
pipe_context instead of duplicating it yet again
4. Drivers no longer need to have all the functions that store the
parameters, set a dirty flag and return

Note that the Direct3D DDI does not do this, but it has to keep
binary compatibility, which is easier with Set* calls than with this
scheme.

softpipe, nvfx, nv50, r300 and probably others already do this
internally, and having the state tracker itself construct the data
would remove a lot of redundant copying code and increase performance.

Having drivers capable of doing "send-to-hardware-and-forget-about-it"
on arbitrary state setting could be a nice thing instead, but
unfortunately a lot of hardware fundamentally can't do this, since for
instance:
1. All shaders need to be seen together to be linked, possibly
modifying the shaders themselves (nv30)
2. Constants need to be written directly into the fragment program (nv30-nv40)
3. Fragment programs depend on the viewport to implement
fragment.position (r300)
4. Fragment programs depend on bound textures to specify normalization
type and emulate NPOT (r300, r600?, nv30)
and so on...
5. Sometimes sampler state and textures must be seen together since
the hardware mixes them


> The only really new information provided by the driver to the state
> tracker by transfer_create + transfer_map is:
> - the pointer to the data
> - stride
> - slice stride

There is also the 3D box, unless transfers start covering the whole
resource, which seems really suboptimal for stuff like glTexSubImage.

This needs to be provided to the driver unless a buffer-specialized
interface is made (then a 1D box is enough).
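
For reference, the box in the gallium-resources branch looks roughly
like this (reproduced from memory, so take the exact field types with
a grain of salt):

    struct pipe_box {
       unsigned x, y, z;
       unsigned width, height, depth;
    };

    /* A buffer-specialized interface would only need the degenerate
     * 1D case: x = byte offset, width = size in bytes. */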


> Thanks for the summary.  I'd add that there is also some information
> available publicly about the D3D10 DDI, which follows a slightly
> different interface to the API.  In that world, there is a single
> create resource function:

It is indeed extremely interesting, and it looks like it should be the
first place to look for inspiration for the Gallium interface.

I added a comparison of the D3D11 DDI and Gallium to src/gallium/docs.

> There is however clearly concern about the possible need for
> specialized transfer mechanisms for particular buffer types.  It seems
> like they've taken an approach that leaves the choice to the driver
> whether to s

Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-27 Thread Keith Whitwell
On Wed, Mar 24, 2010 at 9:20 PM, Luca Barbieri wrote:
> Thanks for providing a long insightful reply.
>
>>> Transfers can then be split into "texture transfers" and "buffer transfers".
>>> Note that they are often inherently different, since one often uses
>>> memcpy-like GPU functionality, and the other often uses 2D blitter or
>>> 3D engine functionality (and needs to worry about swizzling or tiling).
>>> Thus, they are probably better split and not unified.
>>
>> My experience is that there is more in common than different about the
>> paths.  There is the same set of constraints about not wanting to
>> stall the GPU by mapping the underlying storage directly if it is
>> still in flight, and allocating a dma buffer for the upload if it is.
>> There will always be some differences, but probably no more than the
>> differences between uploading to eg a constant buffer and a vertex
>> buffer, or uploading to a swizzled and linear texture.
>
> The considerations you mentioned are indeed common between buffers and
> textures, but the actual mechanisms for performing the copy are often
> significantly different.
>
> For instance, r300g ends up calling the 3D engine via
> surface_copy->util_blitter for texture transfers, which I suppose it
> wouldn't do for buffer transfers.
>
> nv30/nv40 don't have a single way to deal with swizzled textures, and
> the driver must choose between many paths depending on whether the
> source/destination is swizzled or not, a 3D texture or not, and even
> its alignment or pitch (the current driver doesn't fully do that, and
> is partially broken for this reason).
> Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.
>
> nv50 does indeed have a common copy functionality that can handle all
> buffers and textures in a unified way (implemented as a revamped
> MEMORY_TO_MEMORY_FORMAT).
> However, an additional buffer-only path would surely be faster than
> going through the common texture path.
> In particular, for buffers tile_flags are always 0 and height is
> always 1, allowing a significantly simplified buffer-only version of
> nv50_transfer_rect_m2mf to be written with no branches and no
> multiplications at all.
>
> In other words, I think most drivers would be better off implementing
> unified transfers with an "if" switching between a buffer and a
> texture path, so it may be worth using two interfaces.
>
> Also note that a buffer-only interface is significantly simplified
> since you don't need to specify:
> - face
> - level
> - zslice
> - y
> - height
> - z
> - depth
> - stride
> - slice stride
>
> While this may seem a micro-optimization, note that 3D applications
> often spend most of their time in the OpenGL driver, and Mesa/Gallium
> functions already show up too heavily in profiles, so I think it's
> important to always keep CPU performance in mind.
>
> The code is also streamlined and easier to follow if it does not have
> to default-initialize a lot of stuff.
>
> A utility function calling the right interface can be created for
> state trackers that really need it (maybe Direct3D10, if the driver
> interface follows the user API).

I take your point, though I should point out you've double-counted z
and zslice, and face+level are one dword.

To me this speaks to another aspect of the gallium interface which is
a bit odd -- in particular the way several of our interfaces basically
copy their inputs into a structure and pass that back to the state
tracker.  Why are we doing that?  The state tracker already knows what
it asked us to do, and there is no reason to assume that it needs us
to re-present that information back to it.

The only really new information provided by the driver to the state
tracker by transfer_create + transfer_map is:
- the pointer to the data
- stride
- slice stride

If the transfer functions ended up just passing this data back, it
would reduce the overhead across the board.

Your point is still valid that the last two will be zero for buffer
transfers, though.
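
Something like this, in other words (a hypothetical signature):

    /* Hypothetical: return only the genuinely new information,
     * without materializing a pipe_transfer object at all. */
    void *(*transfer_map)(struct pipe_context *ctx,
                          struct pipe_resource *resource,
                          struct pipe_subresource sr,
                          unsigned usage,
                          const struct pipe_box *box,
                          unsigned *stride,         /* out */
                          unsigned *slice_stride);  /* out */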

>
>> In DX they have
>> different nomenclature for this - the graphics API level entities are
>> resources and the underlying VMM buffers are labelled as allocations.
>> In gallium, we're exposing the resource concept, but allocations are
>> driver-internal entities, usually called winsys_buffers, or some
>> similar name.
>
> D3D10 uses buffers, sampler views and render target views as entities
> bindable to the pipeline, and the latter are constructed over either
> textures or buffers.
> Note, however, that the "description structure" is actually different
> in the buffer and texture cases.
>
> For render target views, they are respectively D3D10_BUFFER_RTV and
> D3D10_TEX2D_RTV (and others for other texture types).
> The first specifies an offset and stride, while the second specifies a
> mipmap level.
> Other views have similar behavior.

> Buffers are directly used in the interfaces that allow binding
> vertex/index/constant buffers.
>
> Both buffers and textures are subclasses of ID3D10Resource, which is
> used by CopyResource, CopySubresourceRegion and UpdateSubresource,
> which provide a subset of the Gallium transfer functionality in
> gallium-resources.

Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-24 Thread Luca Barbieri
Thanks for providing a long insightful reply.

>> Transfers can then be split into "texture transfers" and "buffer transfers".
>> Note that they are often inherently different, since one often uses
>> memcpy-like GPU functionality, and the other often uses 2D blitter or
>> 3D engine functionality (and needs to worry about swizzling or tiling).
>> Thus, they are probably better split and not unified.
>
> My experience is that there is more in common than different about the
> paths.  There is the same set of constraints about not wanting to
> stall the GPU by mapping the underlying storage directly if it is
> still in flight, and allocating a dma buffer for the upload if it is.
> There will always be some differences, but probably no more than the
> differences between uploading to eg a constant buffer and a vertex
> buffer, or uploading to a swizzled and linear texture.

The considerations you mentioned are indeed common between buffers and
textures, but the actual mechanisms for performing the copy are often
significantly different.

For instance, r300g ends up calling the 3D engine via
surface_copy->util_blitter for texture transfers, which I suppose it
wouldn't do for buffer transfers.

nv30/nv40 don't have a single way to deal with swizzled textures, and
the driver must choose between many paths depending on whether the
source/destination is swizzled or not, a 3D texture or not, and even
its alignment or pitch (the current driver doesn't fully do that, and
is partially broken for this reason).
Buffers can instead be copied very simply with MEMORY_TO_MEMORY_FORMAT.

nv50 does indeed have a common copy functionality that can handle all
buffers and textures in a unified way (implemented as a revamped
MEMORY_TO_MEMORY_FORMAT).
However, an additional buffer-only path would surely be faster than
going through the common texture path.
In particular, for buffers tile_flags are always 0 and height is
always 1, allowing a significantly simplified buffer-only version of
nv50_transfer_rect_m2mf to be written with no branches and no
multiplications at all.

In other words, I think most drivers would be better off implementing
unified transfers with an "if" switching between a buffer and a
texture path, so it may be worth using two interfaces.

Also note that a buffer-only interface is significantly simplified,
since you don't need to specify (see the sketch after this list):
- face
- level
- zslice
- y
- height
- z
- depth
- stride
- slice stride
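
For illustration, the split could look like this (hypothetical
signatures):

    /* Texture transfers need the full addressing information. */
    void *(*texture_map)(struct pipe_context *ctx,
                         struct pipe_resource *texture,
                         unsigned face, unsigned level, unsigned zslice,
                         unsigned usage, const struct pipe_box *box,
                         unsigned *stride, unsigned *slice_stride);

    /* Buffer transfers: an offset, a size and usage flags are enough. */
    void *(*buffer_map)(struct pipe_context *ctx,
                        struct pipe_resource *buffer,
                        unsigned offset, unsigned size, unsigned usage);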

While this may seem a micro-optimization, note that 3D applications
often spend most of their time in the OpenGL driver, and Mesa/Gallium
functions already show up too heavily in profiles, so I think it's
important to always keep CPU performance in mind.

The code is also streamlined and easier to follow if it does not have
to default-initialize a lot of stuff.

A utility function calling the right interface can be created for
state trackers that really need it (maybe Direct3D10, if the driver
interface follows the user API).

> In DX they have
> different nomenclature for this - the graphics API level entities are
> resources and the underlying VMM buffers are labelled as allocations.
> In gallium, we're exposing the resource concept, but allocations are
> driver-internal entities, usually called winsys_buffers, or some
> similar name.

D3D10 uses buffers, sampler views and render target views as entities
bindable to the pipeline, and the latter are constructed over either
textures or buffers.
Note, however, that the "description structure" is actually different
in the buffer and texture cases.

For render target views, they are respectively D3D10_BUFFER_RTV and
D3D10_TEX2D_RTV (and others for other texture types).
The first specifies an offset and stride, while the second specifies a
mipmap level.
Other views have similar behavior.
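
The corresponding description structs look roughly like this
(paraphrased from the D3D10 headers, slightly simplified):

    typedef struct D3D10_BUFFER_RTV {
        UINT ElementOffset;  /* offset of the first element in the view */
        UINT ElementWidth;   /* width of each element */
    } D3D10_BUFFER_RTV;

    typedef struct D3D10_TEX2D_RTV {
        UINT MipSlice;       /* index of the mipmap level to use */
    } D3D10_TEX2D_RTV;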

Buffers are directly used in the interfaces that allow binding
vertex/index/constant buffers.

Both buffers and textures are subclasses of ID3D10Resource, which is
used by CopyResource, CopySubresourceRegion and UpdateSubresource,
which provide a subset of the Gallium transfer functionality in
gallium-resources.

Note however that the two resources specified to CopyResource and
CopySubresourceRegion must be of the same type.

So in summary, D3D10 does indeed move some way toward buffer/texture
unification, but with some important differences:
1. Buffers and textures still exist as separate types. Note that
there is no "texture" type, but rather a separate interface for each
texture type, which directly inherits from ID3D10Resource
2. Textures are never used directly by the pipeline, but rather
through "views" which have texture-type-specific creation methods and
have separate interfaces
3. Buffers are directly used by the pipeline for vertex, index and
constant buffers
4. Resources are used in copying and transfer functionality
5. D3D10 has a more memory-centric view of resources, providing for
instance a D3D10_USAGE_STAGING flag, for "A resource that supports
data transfer (copy) from the GPU to the CPU."

Re: [Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-24 Thread Keith Whitwell
On Tue, Mar 23, 2010 at 7:26 PM, Luca Barbieri wrote:
> What is the rationale behind the gallium-resources changes?

Luca,

Thanks for the feedback.  I posted something describing this a little while ago:

http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg11375.html

There are a bunch of things pushing us in this direction, but at its
most basic it is a recognition that we have two gpu-side objects with
very similar operations on them but exposed by the interface in quite
different ways.  Look, for instance, at the SVGA driver, which has
implemented two separate (and fairly complex) non-blocking upload
paths for each of these entities.

And crucially, we also have APIs starting to blur the line between
textures and buffers.  In DX10 and 11 in particular, you can perform
operations on buffers (like binding as a render-target) which are
easier to cope with if we have a unified abstraction.


In the past we had some confusion about what a pipe_buffer really is -- is it:
a) a GPU-side entity which can be bound to the pipeline?
b) a mechanism for CPU/GPU communication - effectively a dma buffer?
c) a way of talking about the underlying storage for GPU resources,
effectively a linear allocation of VRAM memory?

What we're doing in gallium-resources is a unification of textures and
the view (a) of buffers as abstract GPU entities.

That implies that the roles (b) and (c) are covered by other entities --
in particular, transfers become the generic CPU/GPU communication path,
and the underlying concept of winsys buffers (not strictly part of
gallium) provides (c).

Basically the change unifies all the GPU-side entities under a single
parent (resource).  The driver is free to implement textures and
buffers as one code path or two.  For expediency, I've tried to avoid
changing the drivers significantly at this point, which has meant
keeping alive the separate texture and buffer implementations and
selecting between them with a vtbl.  That isn't a strict requirement
of the design, just something I've done to avoid rewriting all of the
drivers at once on my own...
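
Concretely, that looks something like this (a sketch; names
approximate what's in the branch):

    /* Per-resource vtbl selecting the buffer or texture code path. */
    struct u_resource_vtbl {
       void *(*transfer_map)(struct pipe_context *,
                             struct pipe_transfer *);
       void (*transfer_unmap)(struct pipe_context *,
                              struct pipe_transfer *);
       void (*resource_destroy)(struct pipe_screen *,
                                struct pipe_resource *);
       /* ... */
    };

    struct u_resource {
       struct pipe_resource b;
       const struct u_resource_vtbl *vtbl;  /* buffer or texture impl */
    };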


> I couldn't find any and I see several points that could be improved:
> 1. How does one do texture_blanket with the gallium-resources API?
> That is, how does one bind a buffer as a texture with a specific
> format?
> 2. Why is there a transfer_inline_write and not a transfer_inline_read?
> 3. How about adding a transfer_update_region that would copy data from
> the resource to the transfer, just like transfer_inline_write copies
> from the transfer to the resource?
> 4. How about making transfers be always mapped when alive and removing
> transfer_map and transfer_unmap?

I think you brought some of these points up in the followup to my
earlier post, eg:

http://www.mail-archive.com/mesa3d-dev@lists.sourceforge.net/msg11537.html

I think your suggestions are good - I had an 'inline_read' initially,
and although I took it out, I've been convinced by others that there
are actually users for such an interface - so no objections to it
coming back in.  Similarly I agree there isn't much value in
transfer_create/destroy being separate from map and unmap.

I see these as additional enhancements beyond getting the basic
resource concept up and running which can be done in follow-on work.

In reference to texture_blanket - this was actually removed from
gallium a little while ago - replaced by the texture_from_handle() and
texture_get_handle() interfaces.  In this case the 'handle' is a
pointer to an operating-system specific entity -- presumably
describing the underlying storage.

> In addition to these micro-level issues, is the bigger picture
> unification of buffers and textures as resources a good idea?

I think so, not least because other APIs are moving in this direction
and using them somewhat interchangeably.

> It will burden all buffer operations with redundant notions of 3D
> boxes, strides, formats and texture targets.

I'm not sure where you see this, but if there are specific cases where
there is a lot of new overhead, we can work to reduce that.

> How about instead layering textures over buffers, and exposing the
> underlying buffer of a texture, maybe also allowing to dynamically
> change it?

I think this makes sense for the view of buffers as memory-manager
allocations.  That works for certain cases, eg native rendering on
local machines, but not all uses of gallium can be described that way.
We're really positioning the gallium api at a slightly higher
abstraction level, to cover both the cases where that could work and
the ones which don't fit that mold.

> Then you could create a texture, asking the driver to create a buffer
> too, for the normal texture creation case.
> You could create a texture with a specified format and layout over an
> existing buffer to implement buffer-as-texture, or reinterpret the
> underlying buffer of an existing texture as another data format.
> You could also create a texture without an underlying buffer, to find
> out how large of a buffer you would need for that texture layout (and
> whether it is supported).

[Mesa3d-dev] Rationale of gallium-resources changes?

2010-03-23 Thread Luca Barbieri
What is the rationale behind the gallium-resources changes?

I couldn't find any and I see several points that could be improved:
1. How does one do texture_blanket with the gallium-resources API?
That is, how does one bind a buffer as a texture with a specific
format?
2. Why is there a transfer_inline_write and not a transfer_inline_read? (see the sketch after this list)
3. How about adding a transfer_update_region that would copy data from
the resource to the transfer, just like transfer_inline_write copies
from the transfer to the resource?
4. How about making transfers be always mapped when alive and removing
transfer_map and transfer_unmap?
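
For point 2, the mirrored entry point could simply be (hypothetical,
modelled from memory on the existing transfer_inline_write):

    /* Existing: push a box worth of data into a resource. */
    void (*transfer_inline_write)(struct pipe_context *ctx,
                                  struct pipe_resource *resource,
                                  struct pipe_subresource sr,
                                  unsigned usage,
                                  const struct pipe_box *box,
                                  const void *data,
                                  unsigned stride,
                                  unsigned slice_stride);

    /* Hypothetical mirror: pull a box worth of data out of a resource. */
    void (*transfer_inline_read)(struct pipe_context *ctx,
                                 struct pipe_resource *resource,
                                 struct pipe_subresource sr,
                                 unsigned usage,
                                 const struct pipe_box *box,
                                 void *data,
                                 unsigned stride,
                                 unsigned slice_stride);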

In addition to these micro-level issues, is the bigger picture
unification of buffers and textures as resources a good idea?
It will burden all buffer operations with redundant notions of 3D
boxes, strides, formats and texture targets.

How about instead layering textures over buffers, and exposing the
underlying buffer of a texture, maybe also allowing to dynamically
change it?

Then you could create a texture, asking the driver to create a buffer
too, for the normal texture creation case.
You could create a texture with a specified format and layout over an
existing buffer to implement buffer-as-texture, or reinterpret the
underlying buffer of an existing texture as another data format.
You could also create a texture without an underlying buffer, to find
out how large of a buffer you would need for that texture layout (and
whether it is supported). This could be useful for OpenGL texture
proxies.
For shared textures, you would call buffer_from_handle and then create
a texture over it with the desired format/layout.
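
A hypothetical shape for such an interface (all names made up):

    /* Create a texture over an existing buffer; buf == NULL asks the
     * driver to allocate the storage itself (the normal case). */
    struct pipe_texture *(*texture_create)(struct pipe_screen *screen,
                                           const struct pipe_texture *templ,
                                           struct pipe_buffer *buf);

    /* Expose (and possibly swap) the underlying storage. */
    struct pipe_buffer *(*texture_get_buffer)(struct pipe_texture *tex);

    /* Layout query only, no allocation: useful for GL proxy textures. */
    unsigned (*texture_layout_size)(struct pipe_screen *screen,
                                    const struct pipe_texture *templ);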

Transfers can then be split into "texture transfers" and "buffer transfers".
Note that they are often inherently different, since one often uses
memcpy-like GPU functionality, and the other often uses 2D blitter or
3D engine functionality (and needs to worry about swizzling or tiling).
Thus, they are probably better split and not unified.
Furthermore, in the gallium-resources branch both the r300g and nouveau
drivers have different internal implementations for buffer and texture
transfers (they actually look fundamentally different, not just
duplicated code): why not just expose them directly as two separate,
more efficient interfaces, instead of going through a single fat
interface and then a further indirect branch in the driver?

In addition, transfers could be handled by an auxiliary module that
would ask the driver to directly map the texture, and would otherwise
create a temporary itself and use a driver-provided buffer_copy or
surface_copy manually.
Note that many drivers implement transfers this way and this would
avoid duplicate code in drivers.
transfer_inline_write can also be done by copies from user buffers, or
textures layered over user buffers.
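
The helper's core logic would be roughly (a sketch, every name
hypothetical):

    void *util_transfer_map(struct pipe_context *ctx,
                            struct pipe_resource *res,
                            const struct pipe_box *box, unsigned usage)
    {
       /* Fast path: the driver can map the storage directly. */
       void *ptr = driver_try_direct_map(ctx, res, box, usage);
       if (ptr)
          return ptr;

       /* Fallback: stage through a temporary the driver can blit. */
       struct pipe_resource *tmp = util_create_staging(ctx, res, box);
       if (usage & PIPE_TRANSFER_READ)
          driver_copy_region(ctx, tmp, res, box); /* resource -> staging */
       return driver_map_staging(ctx, tmp);
    }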
