On Wed, Apr 20, 2016 at 11:28 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> On 20.04.2016 03:13, Oded Gabbay wrote:
>> On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <mar...@gmail.com> wrote:
>>> On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <oded.gab...@gmail.com> wrote:
>>>> On Mon, Apr 18, 2016 at 6:03 PM,
>>>> Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gab...@gmail.com> 
>>>>> wrote:
>>>>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imir...@alum.mit.edu> 
>>>>>> wrote:
>>>>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gab...@gmail.com> 
>>>>>>> wrote:
>>>>>>>>> Wouldn't it make more sense to handle such issues in transfer_map?
>>>>>>>>> (i.e. create a staging memory area, and decode into it)? This assumes
>>>>>>>>> that the transfer_map() call has enough information to "do the right
>>>>>>>>> thing". I don't think it does today, but perhaps it could be taught?
>>>>>>>> It doesn't have all the info today, that's for sure. I imagine though
>>>>>>>> we can add parameters to it.
>>>>>>>>
>>>>>>>>> That way everything that's in a pipe_resource is in some
>>>>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters
>>>>>>>>> when converting between CPU-read/written and GPU-read/written data. I
>>>>>>>>> believe this is a better match for what's really happening, too. What
>>>>>>>>> do you think?
>>>>>>>>>
>>>>>>>>>   -ilia
>>>>>>>>
>>>>>>>> Unless I'm missing something, I think, at the end of the day, it will
>>>>>>>> be the same issues as in my solution - per code path per format is a
>>>>>>>> different case. That's because you will still need to "teach"
>>>>>>>> transfer_map, per each transfer per format what to do. So one will
>>>>>>>> need to go and debug every single code path there is in mesa for
>>>>>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5
>>>>>>>> months. It's a great learning experience but it won't give anything
>>>>>>>> generic.
>>>>>>>>
>>>>>>>> Again, for example, in st_ReadPixels, I imagine you will need to give
>>>>>>>> "different orders" to transfer_map for the two different scenarios -
>>>>>>>> H/W blit and fallback. So what's the gain here ?
>>>>>>>>
>>>>>>>> If I'm missing something, please tell me.
>>>>>>>
>>>>>>> One of us is... let's figure out which one :)
>>>>>>>
>>>>>>> Here's my proposal:
>>>>>>>
>>>>>>> All data stored inside of resources is stored in a driver-happy
>>>>>>> format. The driver ensures that it's stored in proper endianness, etc.
>>>>>>> (Much like it does today wrt proper stride.)
>>>>>>>
>>>>>>> Blitting(/copying) between resources doesn't require any additional
>>>>>>> information, since you have the format(s) of the respective resources,
>>>>>>> and it's all inside the driver, so the driver does whatever it needs
>>>>>>> to do to make it all "work".
>>>>>>>
>>>>>>> *Accessing and modifying* resources (directly) from the CPU is what
>>>>>>> becomes tricky. The state tracker may have incorrect expectations of
>>>>>>> the actual backing data. There are a few different ways to resolve
>>>>>>> this. The one I'm proposing is that you only ever return a pointer to
>>>>>>> the directly underlying data if it matches the CPU's expectations
>>>>>>> (which will only be the case for byte-oriented array formats like
>>>>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g.
>>>>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through
>>>>>>> a bounce buffer.
>>>>>>>
>>>>>>> At transfer map time, you convert the data from GPU-style to
>>>>>>> CPU-style, and copy back the relevant bits at unmap/flush time.
>>>>>>>
>>>>>>> This presents a nice clean boundary for this stuff. Instead of the
>>>>>>> state tracker trying to guess what the driver will do and feeding it
>>>>>>> endiannesses that it can't possibly guess properly, the tracking logic
>>>>>>> is relegated to the driver, and we extend the interfaces to allow the
>>>>>>> state tracker to access the data in a proper way.
>>>>>>>
>>>>>>> I believe the advantage of this scheme is that beyond adding format
>>>>>>> parameters to pipe_transfer_map() calls, there will not need to be any
>>>>>>> adjustments to the state trackers.
>>>>>>>
>>>>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it
>>>>>>> maps a buffer, it's formatless (at map time), and yet the GPU will be
>>>>>>> required to interpret it correctly. We could decree that PIPE_BUFFER
>>>>>>> is just *always* an array of R8_UNORM and thus never needs any type of
>>>>>>> swapping. The driver needs to adjust accordingly to deal with accesses
>>>>>>> that don't fit that pattern (and where parameters can't be fed to the
>>>>>>> GPU to interpret it properly).
>>>>>>>
>>>>>>> I think something like the above will work. And I think it presents a
>>>>>>> cleaner barrier than your proposal, because none of the "this GPU can
>>>>>>> kinda-sorta understand BE, but not everywhere" details are ever
>>>>>>> exposed to the state tracker.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>>   -ilia
>>>>>>
>>>>>> Ilia,
>>>>>>
>>>>>> To make the GPU do a conversion during blitting, I need to configure
>>>>>> registers. This is done in a couple of functions in the r600g driver
>>>>>> (r600_translate_texformat, r600_colorformat_endian_swap,
>>>>>> r600_translate_colorformat and r600_translate_colorswap).
>>>>>>
>>>>>> The problem is that transfer_map/unmap don't call directly to those
>>>>>> functions. They call other functions which eventually call those 4
>>>>>> functions. Among those "other" functions, there are several function
>>>>>> calls which are *not* in the r600g driver. i.e. we go back to generic
>>>>>> util functions. For example:
>>>>>>
>>>>>> #0  r600_translate_colorformat
>>>>>> #1  evergreen_init_color_surface
>>>>>> #2  evergreen_set_framebuffer_state
>>>>>> #3  util_blitter_custom_depth_stencil
>>>>>> #4  r600_blit_decompress_depth
>>>>>> #5  r600_texture_transfer_map
>>>>>>
>>>>>> Am I allowed to now pass information from transfer_map/unmap all the
>>>>>> way down to the 4 functions I mentioned through all these layers as
>>>>>> additional parameters ? I preferred to put it in pipe_resource as that
>>>>>> information goes all the way down to those functions, but if I can't
>>>>>> use that, then what's an acceptable alternative ?
>>>>>>
>>>>>> This time, I would like to get an agreement *before* I implement it.
>>>>>
>>>>> Probably a good idea. And as issues are investigated, people's
>>>>> opinions on the "correct" way might shift. Let's think about this...
>>>>>
>>>>> So clearly *a* correct way to handle this would be to stop all the
>>>>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g.
>>>>> R5G6B5 but makes no mention of the byte layout in memory for those 16
>>>>> bits. Really what we have right now is a format and an *implicit*
>>>>> endian ordering, which is the CPU's. But what happens when the CPU and
>>>>> GPU don't agree?
>>>>>
>>>>> There's a path we could take which would be to add an endianness
>>>>> alongside each format (be it by doubling formats, or an explicit
>>>>> second field). This would be a very far-reaching change though, and I
>>>>> doubt you'll want to do it. What we're left with is having a format
>>>>> and an *implicit* endianness. Which means that the consumers of the
>>>>> format need to be able to work out the implicit endianness involved.
>>>>> And the endianness will be GPU endian for regular resources, and CPU
>>>>> endian for "staging" resources. So it's definitely tempting to stick
>>>>> the endian thing into a private field of the resource, like Rob is
>>>>> suggesting - when creating a staging texture in
>>>>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise
>>>>> set it to gpu endian. And I think this is somewhat similar to your
>>>>> former approach.
>>>>>
>>>>> What do you think?
>>>>>
>>>>>   -ilia
>>>>
>>>> I don't think I have any other choice but to stick it as a private
>>>> field, because the endian parameter simple can't go through all the
>>>> function calls as an additional parameter. The reason is that
>>>> set_framebuffer_state() function types are called from
>>>> st_invalidate_state, where I don't have any idea about the "correct"
>>>> endianess, so I can't add the endian parameter to that function type.
>>>>
>>>> The only thing that is propagated through all layers is r600_texture.
>>>> I'll try to use that.
>>>>
>>>> Marek, Michel,
>>>> Do you think it is OK to add the endian mark to that private structure ?
>>>
>>> Yes, but doesn't util_format_description::is_array provide that info 
>>> already?
>>
>> If you mean to say that they are interchangeable, than no.
>> It's true that for array formats we *never* need to do swaps, but for
>> non-array formats there are cases where we need to do it and cases
>> where we don't need to do it.
>>
>> For example, when writing to depth buffer through st_DrawPixels
>> (glDrawPixels) with GL_FLOAT, than the format used is
>> PIPE_FORMAT_Z32_FLOAT.
>> In the make_texture() part, we need to configure endian swap for the
>> staging buffer, but after you return from it and go into
>> st_create_texture_sampler_view(), you want the destination texture to
>> be configured without endian swapping.
>
> So the CB/DB block can't swap bytes when accessing Z32?
>
Not sure what you mean by this sentence. Could you please further
explain your question ?

>
>> Both those buffers are configured with PIPE_FORMAT_Z32_FLOAT, so
>> that's why is_array can't be used instead of a dedicated flag.
>
> Can't the driver figure out on its own whether it needs to swap or not
> based on other available information, e.g. considering the resource
> usage/bind flags and/or whether the resource is sampled from or rendered to?
>
Short answer, no.

Per the suggestions I received, I know try to use PIPE_TRANSFER_WRITE
to know that in r600_texture_transfer_map. But I need to propagate
that knowledge to the actual functions that do the endian/format
configuration (the 4 functions I mentioned in earlier emails). The
best way I found how to do that is to add a flag to the private
r600_texture structure. If you have a better suggestion, I'm open to
hearing it.

In addition, sometimes we don't pass through pipe_transfer_map/unmap.
e.g. st_create_texture_sampler_view(), which eventually gets to
r600_translate_texformat() and r600_colorformat_endian_swap(). How
those two functions will know what to do, without me giving them some
hint ? how do they know if they arrived from
st_create_texture_sampler_view() or from util_blitter_blit_generic() ?
And for many cases, we need this distinction, otherwise I wouldn't
need to fix anything.

Oded

>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to