Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-07 Thread José Fonseca
On Mon, 2010-09-06 at 16:31 -0700, Marek Olšák wrote:
 On Mon, Sep 6, 2010 at 9:57 PM, José Fonseca jfons...@vmware.com wrote:
 
  On Mon, 2010-09-06 at 10:22 -0700, Marek Olšák wrote:
   On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:
    I'd like to know if there's any objection to change the
    resource_copy_region semantics to allow copies between different yet
    compatible formats, where the definition of compatible formats is:
    
      formats for which copying the bytes from the source resource
    unmodified to the destination resource will achieve the same effect of a
    textured quad blitter
    
    There is an helper function util_is_format_compatible() to help making
    this decision, and these are the non-trivial conversions that this
    function currently recognizes, (which was produced by
    u_format_compatible_test.c):
    
      b8g8r8a8_unorm -> b8g8r8x8_unorm
   
   This specific case (and others) might not work, because there are no
   0/1 swizzles when blending pixels with the framebuffer, e.g. see this
   sequence of operations:
   - Blit from b8g8r8a8 to b8g8r8x8.
   - x8 now contains a8.
   - Bind b8g8r8x8 as a colorbuffer.
   - Use blending with the destination alpha channel.
   - The original a8 is read instead of 1 (x8) because of lack of swizzles.
  
  This is not correct. Or at least not my interpretation.
  
  The x in b8g8r8x8 means padding (potentially with uninitialized
  data). There is no implicit guarantee that it will contain 0xff or
  anything.
  
  When blending to b8g8r8x8, destination alpha is by definition 1.0. It is
  an implicit swizzle (see e.g., u_format.csv).
  
  If the hardware's fixed function blending doesn't understand bgrx
  formats natively, then the pipe driver should internally replace the
  destination alpha factor with one. It's really simple. See for
 
 The dst blending parameter is just a factor the real dst value is
 multiplied by (except for min/max). There is no way to multiply an
 arbitrary value by a constant and get 1.0. But you can force 0, of
 course. I don't think there is hardware which supports such flexible
 swizzling in the blender.

Let's assume your hardware doesn't understand bgrx render target formats
natively, and you program it with the bgra format instead.

If so, then you must do these replacements in rgb_src_factor and
rgb_dst_factor:

PIPE_BLENDFACTOR_DST_ALPHA -> PIPE_BLENDFACTOR_ONE;
PIPE_BLENDFACTOR_INV_DST_ALPHA -> PIPE_BLENDFACTOR_ZERO;
PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE -> PIPE_BLENDFACTOR_ZERO;

This will ensure that what's written in the red, green, and blue components
is consistent with a bgrx format (that is, destination alpha is always
one -- incoming values are discarded).

In this scenario, how you program alpha_src_factor/alpha_dst_factor is
irrelevant, because they will only affect what's written in the padding
bits, which is just padding -- it can and should be treated as
gibberish. 
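
For illustration, here is a minimal sketch of the replacements listed above. The enum is a local stand-in for the PIPE_BLENDFACTOR_* values named in this mail, and the helper name is hypothetical (it is not an existing driver or util function). It only covers the RGB factors, since the alpha factors only affect the padding bits.

```c
#include <assert.h>

/* Local stand-ins for the gallium PIPE_BLENDFACTOR_* values mentioned in
 * this mail. The real enum has more entries; this is an assumption made
 * to keep the sketch self-contained. */
enum blendfactor {
   BLENDFACTOR_ONE,
   BLENDFACTOR_ZERO,
   BLENDFACTOR_SRC_ALPHA,
   BLENDFACTOR_DST_ALPHA,
   BLENDFACTOR_INV_DST_ALPHA,
   BLENDFACTOR_SRC_ALPHA_SATURATE
};

/* Hypothetical helper: rewrite an RGB blend factor so that the blender
 * never reads destination alpha, given that destination alpha is defined
 * to be 1.0 for a bgrx render target. */
static enum blendfactor
fixup_rgb_factor_for_bgrx(enum blendfactor factor)
{
   switch (factor) {
   case BLENDFACTOR_DST_ALPHA:
      return BLENDFACTOR_ONE;    /* Ad = 1 */
   case BLENDFACTOR_INV_DST_ALPHA:
      return BLENDFACTOR_ZERO;   /* 1 - Ad = 0 */
   case BLENDFACTOR_SRC_ALPHA_SATURATE:
      return BLENDFACTOR_ZERO;   /* min(As, 1 - Ad) = 0 */
   default:
      return factor;             /* does not depend on destination alpha */
   }
}
```

A driver would apply this to both rgb_src_factor and rgb_dst_factor whenever a bgrx surface is bound but the hardware is programmed with bgra.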

 If x8 is just padding as you say, the value of it should be undefined
 and every operation using the padding bits should be undefined too
 except for texture sampling. It's not like I have any other choice.

IMO, there is no such thing as an operation using the padding bits. It
is more like the contents of padding is undefined after/before any
operation. And no operation should rely on it to have any particular
value, by definition.

Alpha blending with a bgrx format should not (and need not)
incorporate the padding bits in any computation. It may, however, write
anything it feels like to the padding bits as a side effect.

Now we could certainly impose the restriction in gallium that dst alpha
blendfactors will produce undefined results for bgrx (and perhaps this
is what you're arguing for). Then the burden of doing the replacements
above shifts to the statetracker. I think Keith favors that stance.


At any rate, going back to the original topic, I see no reason not to
allow bgra -> bgrx resource_copy_region copies.


Also, for the record, the moment arbitrary swizzles were added to the
texture sampler, bgrx formats became almost redundant. I say almost because
knowing that there is no alpha in the color buffer allows for certain
optimizations (e.g., llvmpipe's swizzled layout separates the red,
green, blue, and alpha channels into different 128bit words, and will
not 

Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Roland Scheidegger
On 06.09.2010 15:57, José Fonseca wrote:
 I'd like to know if there's any objection to change the
 resource_copy_region semantics to allow copies between different yet
 compatible formats, where the definition of compatible formats is:
 
   formats for which copying the bytes from the source resource
 unmodified to the destination resource will achieve the same effect of a
 textured quad blitter
 
 There is an helper function util_is_format_compatible() to help making
 this decision, and these are the non-trivial conversions that this
 function currently recognizes, (which was produced by
 u_format_compatible_test.c):
 
   b8g8r8a8_unorm -> b8g8r8x8_unorm
   a8r8g8b8_unorm -> x8r8g8b8_unorm
   b5g5r5a1_unorm -> b5g5r5x1_unorm
   b4g4r4a4_unorm -> b4g4r4x4_unorm
   l8_unorm -> r8_unorm
   i8_unorm -> l8_unorm
   i8_unorm -> a8_unorm
   i8_unorm -> r8_unorm
   l16_unorm -> r16_unorm
   z24_unorm_s8_uscaled -> z24x8_unorm
   s8_uscaled_z24_unorm -> x8z24_unorm
   r8g8b8a8_unorm -> r8g8b8x8_unorm
   a8b8g8r8_srgb -> x8b8g8r8_srgb
   b8g8r8a8_srgb -> b8g8r8x8_srgb
   a8r8g8b8_srgb -> x8r8g8b8_srgb
   a8b8g8r8_unorm -> x8b8g8r8_unorm
   r10g10b10a2_uscaled -> r10g10b10x2_uscaled
   r10sg10sb10sa2u_norm -> r10g10b10x2_snorm
 
 Note that format compatibility is not commutative.
 
 For software drivers this means that memcpy/util_copy_rect() will
 achieve the correct result.
 
 For hardware drivers this means that a VRAM->VRAM 2D blit engine will
 also achieve the correct result.
 
 So I'd expect no implementation change of resource_copy_region() for any
 driver AFAICT. But I'd like to be sure.
 
 Jose

José,

this looks good to me. Note that the analogous function in d3d10,
ResourceCopyRegion, only requires formats to be in the same typeless
group (hence same number of bits for all components), which is certainly
a broader set of compatible formats than what util_is_format_compatible()
outputs. As far as I can tell, no conversion is happening at all
in d3d10, this is just like memcpy. I think we might want to support
that in the future as well, but for now extending this to the formats
you listed certainly sounds ok.
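
To illustrate the quoted definition, and why compatibility is not commutative, here is a small sketch for the bgra8/bgrx8 pair. The fetch helpers and function names are hypothetical; they only model "what a sampler would read" for a single pixel, assuming the fourth byte of bgrx8 always reads as alpha 1.0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct rgba8 { uint8_t r, g, b, a; };

/* Read one b8g8r8a8_unorm pixel (bytes in memory: B, G, R, A). */
static struct rgba8 fetch_bgra8(const uint8_t *p)
{
   struct rgba8 c = { p[2], p[1], p[0], p[3] };
   return c;
}

/* Read one b8g8r8x8_unorm pixel: the fourth byte is padding and alpha
 * reads as 1.0 (0xff) regardless of what is stored there. */
static struct rgba8 fetch_bgrx8(const uint8_t *p)
{
   struct rgba8 c = { p[2], p[1], p[0], 0xff };
   return c;
}

/* bgra8 -> bgrx8: returns 1 if a plain byte copy reads back the same as a
 * converting blit would (r, g, b preserved; alpha reads as 1.0). */
static int bytecopy_bgra_to_bgrx_matches_blit(const uint8_t src[4])
{
   uint8_t dst[4];
   memcpy(dst, src, 4);
   struct rgba8 expected = fetch_bgra8(src);
   struct rgba8 copied = fetch_bgrx8(dst);
   return copied.r == expected.r && copied.g == expected.g &&
          copied.b == expected.b && copied.a == 0xff;
}

/* bgrx8 -> bgra8: a converting blit would store alpha 1.0, but a byte copy
 * stores whatever was in the padding byte, so this can return 0. */
static int bytecopy_bgrx_to_bgra_matches_blit(const uint8_t src[4])
{
   uint8_t dst[4];
   memcpy(dst, src, 4);
   struct rgba8 expected = fetch_bgrx8(src);
   struct rgba8 copied = fetch_bgra8(dst);
   return copied.r == expected.r && copied.g == expected.g &&
          copied.b == expected.b && copied.a == expected.a;
}
```

The first check holds for any pixel value; the second only holds when the padding byte happens to contain 0xff.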

Roland

--
This SF.net Dev2Dev email is sponsored by:

Show off your parallel programming skills.
Enter the Intel(R) Threading Challenge 2010.
http://p.sf.net/sfu/intel-thread-sfd
___
Mesa3d-dev mailing list
Mesa3d-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mesa3d-dev


Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:
 I'd like to know if there's any objection to change the
 resource_copy_region semantics to allow copies between different yet
 compatible formats, where the definition of compatible formats is:

I was about to propose something like this.

How about a much more powerful change though, that would make any pair
of non-blocked formats of the same bit depth compatible?
This way you could copy z24s8 to r8g8b8a8, for instance.

In addition to this, how about explicitly allowing sampler views to
use a compatible format, and adding the ability for surfaces to use a
compatible format too? (with a new parameter to get_tex_surface)

This would allow, for instance, implementing glBlitFramebuffer on
stencil buffers by reinterpreting the buffer as r8g8b8a8, and would allow
the blitter module to copy depth/stencil buffers by simply treating
them as color buffers.

The only issue is that some drivers might hold depth/stencil surfaces
in compressed formats that cannot be interpreted as a color format,
and not have any mechanism for keeping temporaries or doing
conversions internally.

DirectX seems to have something like this with the _TYPELESS formats.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread José Fonseca
On Mon, 2010-09-06 at 08:11 -0700, Roland Scheidegger wrote:
 On 06.09.2010 15:57, José Fonseca wrote:
  I'd like to know if there's any objection to change the
  resource_copy_region semantics to allow copies between different yet
  compatible formats, where the definition of compatible formats is:
  
    formats for which copying the bytes from the source resource
  unmodified to the destination resource will achieve the same effect of a
  textured quad blitter
  
  There is an helper function util_is_format_compatible() to help making
  this decision, and these are the non-trivial conversions that this
  function currently recognizes, (which was produced by
  u_format_compatible_test.c):
  
    b8g8r8a8_unorm -> b8g8r8x8_unorm
    a8r8g8b8_unorm -> x8r8g8b8_unorm
    b5g5r5a1_unorm -> b5g5r5x1_unorm
    b4g4r4a4_unorm -> b4g4r4x4_unorm
    l8_unorm -> r8_unorm
    i8_unorm -> l8_unorm
    i8_unorm -> a8_unorm
    i8_unorm -> r8_unorm
    l16_unorm -> r16_unorm
    z24_unorm_s8_uscaled -> z24x8_unorm
    s8_uscaled_z24_unorm -> x8z24_unorm
    r8g8b8a8_unorm -> r8g8b8x8_unorm
    a8b8g8r8_srgb -> x8b8g8r8_srgb
    b8g8r8a8_srgb -> b8g8r8x8_srgb
    a8r8g8b8_srgb -> x8r8g8b8_srgb
    a8b8g8r8_unorm -> x8b8g8r8_unorm
    r10g10b10a2_uscaled -> r10g10b10x2_uscaled
    r10sg10sb10sa2u_norm -> r10g10b10x2_snorm
  
  Note that format compatibility is not commutative.
  
  For software drivers this means that memcpy/util_copy_rect() will
  achieve the correct result.
  
  For hardware drivers this means that a VRAM->VRAM 2D blit engine will
  also achieve the correct result.
  
  So I'd expect no implementation change of resource_copy_region() for any
  driver AFAICT. But I'd like to be sure.
  
  Jose
 
 José,
 
 this looks good to me. Note that the analogous function in d3d10,
 ResourceCopyRegion, only requires formats to be in the same typeless
 group (hence same number of bits for all components), which is certainly
 a broader set of compatible formats to what util_is_format_compatible()
   is outputting. As far as I can tell, no conversion is happening at all
 in d3d10, this is just like memcpy. I think we might want to support
 that in the future as well, but for now extending this to the formats
 you listed certainly sounds ok.

Yes, that makes sense. Thanks for the feedback, Roland.

Jose






Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Marek Olšák
On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:

 I'd like to know if there's any objection to change the
 resource_copy_region semantics to allow copies between different yet
 compatible formats, where the definition of compatible formats is:

  formats for which copying the bytes from the source resource
 unmodified to the destination resource will achieve the same effect of a
 textured quad blitter

 There is an helper function util_is_format_compatible() to help making
 this decision, and these are the non-trivial conversions that this
 function currently recognizes, (which was produced by
 u_format_compatible_test.c):

  b8g8r8a8_unorm -> b8g8r8x8_unorm


This specific case (and others) might not work, because there are no 0/1
swizzles when blending pixels with the framebuffer, e.g. see this sequence
of operations:
- Blit from b8g8r8a8 to b8g8r8x8.
- x8 now contains a8.
- Bind b8g8r8x8 as a colorbuffer.
- Use blending with the destination alpha channel.
- The original a8 is read instead of 1 (x8) because of lack of swizzles.

The blitter and other util functions just need to be extended to explicitly
write 1 instead of copying the alpha channel. Something like this is
already done in st/mesa, see the function compatible_src_dst_formats.

Marek

  a8r8g8b8_unorm -> x8r8g8b8_unorm
  b5g5r5a1_unorm -> b5g5r5x1_unorm
  b4g4r4a4_unorm -> b4g4r4x4_unorm
  l8_unorm -> r8_unorm
  i8_unorm -> l8_unorm
  i8_unorm -> a8_unorm
  i8_unorm -> r8_unorm
  l16_unorm -> r16_unorm
  z24_unorm_s8_uscaled -> z24x8_unorm
  s8_uscaled_z24_unorm -> x8z24_unorm
  r8g8b8a8_unorm -> r8g8b8x8_unorm
  a8b8g8r8_srgb -> x8b8g8r8_srgb
  b8g8r8a8_srgb -> b8g8r8x8_srgb
  a8r8g8b8_srgb -> x8r8g8b8_srgb
  a8b8g8r8_unorm -> x8b8g8r8_unorm
  r10g10b10a2_uscaled -> r10g10b10x2_uscaled
  r10sg10sb10sa2u_norm -> r10g10b10x2_snorm

 Note that format compatibility is not commutative.

 For software drivers this means that memcpy/util_copy_rect() will
 achieve the correct result.

 For hardware drivers this means that a VRAM->VRAM 2D blit engine will
 also achieve the correct result.

 So I'd expect no implementation change of resource_copy_region() for any
 driver AFAICT. But I'd like to be sure.

 Jose





Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
How about dropping the idea that resource_copy_region must be just a
memcpy and have the driver instruct the hardware 2D blitter to write
1s in the alpha channel if supported by hw or have u_blitter do this
in the shader?

nv30/nv40 and apparently nv50 can do this in the 2D blitter, and all
Radeons seem to use the 3D engine, which obviously can do it in the
shader.

We may also want to allow actual conversion between arbitrary formats,
since again u_blitter can do it trivially, and so can most/all
hardware 2D engines.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Roland Scheidegger
On 06.09.2010 17:16, Luca Barbieri wrote:
 On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com wrote:
 I'd like to know if there's any objection to change the
 resource_copy_region semantics to allow copies between different yet
 compatible formats, where the definition of compatible formats is:
 
 I was about to propose something like this.
 
 How about a much more powerful change though, that would make any pair
 of non-blocked format of the same bit depth compatible?
 This way you could copy z24s8 to r8g8b8a8, for instance.
I am not sure this makes a lot of sense. There's no guarantee the bit
layout of these is even remotely similar (and it likely won't be on any
decent hardware). I think the dx10 restriction makes sense here.

 
 In addition to this, how about explicitly allowing sampler views to
 use a compatible format, and add the ability for surfaces to use a
 compatible format too? (with a new parameter to get_tex_surface)
Note that get_tex_surface is dead (in gallium-array-textures - not
merged yet but it will happen eventually). Its replacement (for render
targets or depth stencil), create_surface(), can already be supplied with
a format parameter. Compatible formats, though, should ultimately end up
as something similar to dx10.

 
 This would allow for instance to implement glBlitFramebuffer on
 stencil buffers by reinterpreting the buffer as r8g8b8a8, and allow
 the blitter module to copy depth/stencil buffers by simply treating
 them as color buffers.
 
 The only issue is that some drivers might hold depth/stencil surfaces
 in compressed formats that cannot be interpreted as a color format,
 and not have any mechanism for keeping temporaries or doing
 conversions internally.
I think that's a pretty big if. I could be wrong but I think operations
like blitting stencil buffers are pretty rare anyway (afaik other apis
don't allow things like that).

 
 DirectX seems to have something like this with the _TYPELESS formats.
Yes, and it precisely won't allow you to interpret s24_z8 as r8g8b8a8 or
other wonky stuff. Only if all components have the same number of bits.
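
To make the difference between the two rules concrete, here is a rough sketch. The struct is a hypothetical, simplified format description (not the real util_format_description layout), and the second predicate is only an approximation of the dx10 typeless-group restriction as described in this mail, not the actual D3D10 rule set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified format description for this sketch only. */
struct fmt_layout {
   unsigned block_width;      /* 1 for non-blocked formats */
   unsigned block_height;     /* 1 for non-blocked formats */
   unsigned block_bits;       /* total bits per pixel (or per block) */
   unsigned nr_channels;      /* number of used entries in channel_bits */
   unsigned channel_bits[4];  /* bits per component, in memory order */
};

static bool is_blocked(const struct fmt_layout *f)
{
   return f->block_width != 1 || f->block_height != 1;
}

/* Luca's proposal: any two non-blocked formats of the same bit depth. */
static bool
compatible_same_bit_depth(const struct fmt_layout *a,
                          const struct fmt_layout *b)
{
   return !is_blocked(a) && !is_blocked(b) && a->block_bits == b->block_bits;
}

/* Stricter rule: additionally require the same number of bits for every
 * component, which rules out z24s8 <-> r8g8b8a8. */
static bool
compatible_same_channel_bits(const struct fmt_layout *a,
                             const struct fmt_layout *b)
{
   if (!compatible_same_bit_depth(a, b) || a->nr_channels != b->nr_channels)
      return false;

   assert(a->nr_channels <= 4);
   for (unsigned i = 0; i < a->nr_channels; i++) {
      if (a->channel_bits[i] != b->channel_bits[i])
         return false;
   }
   return true;
}
```

With z24s8 described as two channels of 24 and 8 bits and r8g8b8a8 as four channels of 8 bits, the first predicate accepts the pair and the second rejects it.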

Roland




Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread José Fonseca
On Mon, 2010-09-06 at 10:22 -0700, Marek Olšák wrote:
 On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com
 wrote:
 I'd like to know if there's any objection to change the
 resource_copy_region semantics to allow copies between
 different yet
 compatible formats, where the definition of compatible formats
 is:
 
  formats for which copying the bytes from the source resource
 unmodified to the destination resource will achieve the same
 effect of a
 textured quad blitter
 
 There is an helper function util_is_format_compatible() to
 help making
 this decision, and these are the non-trivial conversions that
 this
 function currently recognizes, (which was produced by
 u_format_compatible_test.c):
 
  b8g8r8a8_unorm -> b8g8r8x8_unorm
 
 This specific case (and others) might not work, because there are no
 0/1 swizzles when blending pixels with the framebuffer, e.g. see this
 sequence of operations:
 - Blit from b8g8r8a8 to b8g8r8x8.
 - x8 now contains a8.
 - Bind b8g8r8x8 as a colorbuffer.
 - Use blending with the destination alpha channel.
 - The original a8 is read instead of 1 (x8) because of lack of
 swizzles.

This is not correct. Or at least not my interpretation.

The x in b8g8r8x8 means padding (potentially with uninitialized
data). There is no implicit guarantee that it will contain 0xff or
anything.

When blending to b8g8r8x8, destination alpha is by definition 1.0. It is
an implicit swizzle (see e.g., u_format.csv).

If the hardware's fixed function blending doesn't understand bgrx
formats natively, then the pipe driver should internally replace the
destination alpha factor with one. It's really simple. See for
example llvmpipe (which needs to do that because the swizzled tile
format is always bgra, so it needs to ignore destination alpha when a bgrx
surface is bound).

I'm not sure what OpenGL defines, but DirectX/DCT definitely
prescribes/enforces this behavior.

 The blitter and other util functions just need to be extended to
 explicitly write 1 instead of copying the alpha channel. Something
 likes this is already done in st/mesa, see the function
 compatible_src_dst_formats.

There is no alpha channel in b8g8r8x8 for anybody to write. The problem
here is not what's written in the padding bits -- it is instead in
making sure the padding bits are not interpreted as alpha.

If the hardware *really* works better with 0xff in the padding bits,
then that needs to be enforced not only in surface copy, but in
transfers (i.e., when the transfer is unmapped, the pipe driver would
need to fill padding bits with 0xff for every pixel).
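
As a rough sketch of what that per-pixel fill would look like on unmap: the helper name is hypothetical, and it assumes tightly packed 4-byte b8g8r8x8 pixels with the padding byte last in memory (no row stride handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: force the padding byte of every b8g8r8x8 pixel to
 * 0xff. Assumes tightly packed pixels with bytes in memory order B, G, R, X. */
static void
fill_bgrx8_padding(uint8_t *pixels, size_t pixel_count)
{
   assert(pixels != NULL || pixel_count == 0);

   for (size_t i = 0; i < pixel_count; i++)
      pixels[i * 4 + 3] = 0xff;   /* overwrite X, leave B, G, R untouched */
}
```

This touches every pixel on every unmap, which is the cost being pointed out above.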

Jose





Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
 This way you could copy z24s8 to r8g8b8a8, for instance.

 I am not sure this makes a lot of sense. There's no guarantee the bit
 layout of these is even remotely similar (and it likely won't be on any
 decent hardware). I think the dx10 restriction makes sense here.

Yes, it depends on the flexibility of the hardware and the driver.
Due to depth textures, I think it is actually likely that you can
easily treat depth as color.

The worst issue right now is that stencil cannot be accessed in a
sensible way at all, which makes implementing glBlitFramebuffer of
STENCIL_BIT with NEAREST and different rect sizes impossible.
Some cards (r600+ at least) can write stencil in shaders, but on some
you must reinterpret the surface.
And resource_copy_region does not support stretching, so it can't be used.

Since not all cards can write stencil in shaders, one either needs to
be able to bind depth/stencil as a color buffer, or extend
resource_copy_region to support stretching with nearest filtering, or
both (possibly in addition to having the option of using stencil
export in shaders).
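
For reference, this is roughly what "stretching with nearest filtering" means for stencil values, written as a CPU sketch. The function name and the tightly packed 8-bit layout are assumptions for illustration; real hardware paths would look nothing like this. Stencil values are picked, never interpolated, which is why only NEAREST makes sense here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU reference: stretch an 8-bit stencil image from
 * src_w x src_h to dst_w x dst_h by sampling the nearest source texel at
 * each destination pixel center. Both images are tightly packed. */
static void
stretch_stencil_nearest(const uint8_t *src, unsigned src_w, unsigned src_h,
                        uint8_t *dst, unsigned dst_w, unsigned dst_h)
{
   assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);

   for (unsigned y = 0; y < dst_h; y++) {
      /* floor((y + 0.5) * src_h / dst_h), in integer arithmetic */
      unsigned sy = ((2 * y + 1) * src_h) / (2 * dst_h);

      for (unsigned x = 0; x < dst_w; x++) {
         unsigned sx = ((2 * x + 1) * src_w) / (2 * dst_w);
         dst[y * dst_w + x] = src[sy * src_w + sx];
      }
   }
}
```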

Other things would likely benefit, such as GL_NV_copy_depth_to_color.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread José Fonseca
On Mon, 2010-09-06 at 10:41 -0700, Luca Barbieri wrote:
 How about dropping the idea that resource_copy_region must be just a
 memcpy and have the driver instruct the hardware 2D blitter to write
 1s in the alpha channel if supported by hw or have u_blitter do this
 in the shader?

It's really different functionality. You're asking for a cast, as in

   b = (type)a;

as in (int)1.0f = 1.

Another thing is

  b = *(type *)&a;

as in *(int *)&f = 0x3f800000 for f = 1.0f. This is my understanding of
resource_copy_region (previously known as surface_copy). And Roland
provided a compelling argument for that.

Both of these functionalities are exposed by APIs, and neither is a
superset of the other.
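
A small sketch of the two operations, assuming a 32-bit IEEE 754 float. The function names are made up for this example, and memcpy is used for the reinterpretation instead of a pointer cast to stay clear of aliasing issues.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion, b = (type)a: the number is preserved, the bits change. */
static int32_t
convert_float_to_int(float a)
{
   return (int32_t)a;
}

/* Reinterpretation, b = *(type *)&a: the bits are preserved, the number
 * changes. Assumes float is 32 bits wide. */
static uint32_t
reinterpret_float_bits(float a)
{
   uint32_t bits;

   assert(sizeof(float) == sizeof(uint32_t));
   memcpy(&bits, &a, sizeof bits);
   return bits;
}
```

For 1.0f the first returns 1 and the second returns 0x3f800000, which is the distinction between a converting blit and a byte copy.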

Jose




Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
Yes, if x8 is interpreted to mean that writes can store arbitrary data
but reads must return 1 (as you said), then this is not necessary in
resource_copy_region even if A8 -> X8 becomes supported.

You are right that format conversions would probably be better added
as a separate function (if at all), in addition to the
reinterpret_cast mechanism you proposed to add.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Roland Scheidegger
On 06.09.2010 22:03, Luca Barbieri wrote:
 This way you could copy z24s8 to r8g8b8a8, for instance.
 
 I am not sure this makes a lot of sense. There's no guarantee the bit
 layout of these is even remotely similar (and it likely won't be on any
 decent hardware). I think the dx10 restriction makes sense here.
 
 Yes, it depends on the flexibility of the hardware and the driver.
 Due to depth textures, I think it is actually likely that you can
 easily treat depth as color.
 
 The worst issue right now is that stencil cannot be accessed in a
 sensible way at all, which makes implementing glBlitFramebuffer of
 STENCIL_BIT with NEAREST and different rect sizes impossible.
 Some cards (r600+ at least) can write stencil in shaders, but on some
 you must reinterpret the surface.
 And resource_copy_region does not support stretching, so it can't be used.
 
 Since not all cards can write stencil in shaders, one either needs to
 be able to bind depth/stencil as a color buffer, or extend
 resource_copy_region to support stretching with nearest filtering, or
 both (possibly in addition to having the option of using stencil
 export in shaders).
Yes, accessing stencil is a problem - other apis just disallow that...
There are other problems with accessing stencil, like for instance
WritePixels with multisampled depth/stencil buffer (which you can't
really map, hence cpu fallbacks don't even work). Plus you really don't
want any cpu fallbacks anyway.
Using stencil export (ARB_shader_stencil_export) seems like a clean
solution, but as you said not all cards support it.
Plus you can't actually get the stencil values with texture sampling
either, so this doesn't help that much (well you can't get them with GL
though hardware may support it I guess).
When I said it won't work with decent hardware, I really meant it won't
work due to compression. Now, it's quite possible this can be disabled
on any chip, but you don't know that beforehand, hence you need to jump
through hoops to get an uncompressed version of your compressed buffer
later.
Do applications actually really ever use blitframebuffer with stencil
bit (with different sizes, otherwise resource_copy_region could be
used)? It just seems to me that casts to completely different formats
(well still with same total bitwidth, but still) are very unclean, but I
don't have any good solution of how to solve this - if no one ever uses
this in practice cpu fallback is just fine, but as said won't work for
multisampled buffers for instance either.

 
 Other things would likely benefit, such as GL_NV_copy_depth_to_color.

Roland



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
 When I said it won't work with decent hardware, I really meant it won't
 work due to compression. Now, it's quite possible this can be disabled
 on any chip, but you don't know that before hence you need to jump
 through hoops to get an uncompressed version of your compressed buffer
 later.

Well, you can render to a compressed depth buffer and then bind it as
a depth texture (routinely done for shadows), so there needs to be a
way to get compressed data to the sampler either directly or via the
driver automagically converting it with a blit beforehand.

Of course, this may not actually work for stencil too, or might not
let you interpret depth as 8-bit color components, or perhaps
not allow direct use as a render target, but it seems possible, especially
on modern flexible hardware and on older dumber hardware that
lacks/doesn't force compression.

I haven't checked any hardware docs though, beyond the fact that nvfx
currently doesn't support any compression and thus can just do it.



Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Marek Olšák
On Mon, Sep 6, 2010 at 9:57 PM, José Fonseca jfons...@vmware.com wrote:

 On Mon, 2010-09-06 at 10:22 -0700, Marek Olšák wrote:
  On Mon, Sep 6, 2010 at 3:57 PM, José Fonseca jfons...@vmware.com
  wrote:
  I'd like to know if there's any objection to change the
  resource_copy_region semantics to allow copies between
  different yet
  compatible formats, where the definition of compatible formats
  is:
 
   formats for which copying the bytes from the source resource
  unmodified to the destination resource will achieve the same
  effect of a
  textured quad blitter
 
  There is an helper function util_is_format_compatible() to
  help making
  this decision, and these are the non-trivial conversions that
  this
  function currently recognizes, (which was produced by
  u_format_compatible_test.c):
 
   b8g8r8a8_unorm -> b8g8r8x8_unorm
 
  This specific case (and others) might not work, because there are no
  0/1 swizzles when blending pixels with the framebuffer, e.g. see this
  sequence of operations:
  - Blit from b8g8r8a8 to b8g8r8x8.
  - x8 now contains a8.
  - Bind b8g8r8x8 as a colorbuffer.
  - Use blending with the destination alpha channel.
  - The original a8 is read instead of 1 (x8) because of lack of
  swizzles.

 This is not correct. Or at least not my interpretation.

 The x in b8g8r8x8 means padding (potentially with uninitialized
 data). There is no implicit guarantee that it will contain 0xff or
 anything.

 When blending to b8g8r8x8, destination alpha is by definition 1.0. It is
 an implicit swizzle (see e.g., u_format.csv).

 If the hardware's fixed function blending doesn't understand bgrx
 formats natively, then the pipe driver should internally replace the
 destination alpha factor with one. It's really simple. See for


The dst blending parameter is just a factor the real dst value is multiplied
by (except for min/max). There is no way to multiply an arbitrary value by a
constant and get 1.0. But you can force 0, of course. I don't think there is
hardware which supports such flexible swizzling in the blender. If x8 is
just padding as you say, the value of it should be undefined and every
operation using the padding bits should be undefined too except for texture
sampling. It's not like I have any other choice.

Marek


Re: [Mesa3d-dev] RFC: allow resource_copy_region between different (yet compatabile) formats

2010-09-06 Thread Luca Barbieri
 The dst blending parameter is just a factor the real dst value is multiplied
 by (except for min/max). There is no way to multiply an arbitrary value by a
 constant and get 1.0. But you can force 0, of course. I don't think there is
 hardware which supports such flexible swizzling in the blender. If x8 is
 just padding as you say, the value of it should be undefined and every
 operation using the padding bits should be undefined too except for texture
 sampling. It's not like I have any other choice.

As far as I can tell, the only problem you have when blending with an
X8 channel that holds random garbage but must read as 1 is if any of the
blending factors is DST_ALPHA or INV_DST_ALPHA (or COLOR as an alpha
factor), in which case you can solve the issue by replacing the
offending factor with ONE or ZERO, as long as you have support for
RGB/A separate blend functions (which Gallium currently assumes afaik).

You can also disable the alpha channel in the writemask to avoid
unnecessary work.

On nv30/nv40, there is an actual render target format that instructs
the card to read dst alpha as 1 (you can also choose whether to write
0 or 1).

Of course, one could argue that mesa/st should do the transformation
instead of Gallium drivers where hardware lacks such support.

I suppose just not advertising X8 formats as render target formats
could also work.
