On Wed, Apr 20, 2016 at 11:28 AM, Michel Dänzer <mic...@daenzer.net> wrote: > On 20.04.2016 03:13, Oded Gabbay wrote: >> On Tue, Apr 19, 2016 at 5:59 PM, Marek Olšák <mar...@gmail.com> wrote: >>> On Tue, Apr 19, 2016 at 3:11 PM, Oded Gabbay <oded.gab...@gmail.com> wrote: >>>> On Mon, Apr 18, 2016 at 6:03 PM, >>>> Ilia Mirkin <imir...@alum.mit.edu> wrote: >>>>> On Mon, Apr 18, 2016 at 10:47 AM, Oded Gabbay <oded.gab...@gmail.com> >>>>> wrote: >>>>>> On Thu, Apr 14, 2016 at 6:44 PM, Ilia Mirkin <imir...@alum.mit.edu> >>>>>> wrote: >>>>>>> On Thu, Apr 14, 2016 at 11:08 AM, Oded Gabbay <oded.gab...@gmail.com> >>>>>>> wrote: >>>>>>>>> Wouldn't it make more sense to handle such issues in transfer_map? >>>>>>>>> (i.e. create a staging memory area, and decode into it)? This assumes >>>>>>>>> that the transfer_map() call has enough information to "do the right >>>>>>>>> thing". I don't think it does today, but perhaps it could be taught? >>>>>>>> It doesn't have all the info today, that's for sure. I imagine though >>>>>>>> we can add parameters to it. >>>>>>>> >>>>>>>>> That way everything that's in a pipe_resource is in some >>>>>>>>> tightly-controlled format, and we specify the LE <-> BE parameters >>>>>>>>> when converting between CPU-read/written and GPU-read/written data. I >>>>>>>>> believe this is a better match for what's really happening, too. What >>>>>>>>> do you think? >>>>>>>>> >>>>>>>>> -ilia >>>>>>>> >>>>>>>> Unless I'm missing something, I think, at the end of the day, it will >>>>>>>> be the same issues as in my solution - per code path per format is a >>>>>>>> different case. That's because you will still need to "teach" >>>>>>>> transfer_map, per each transfer per format what to do. So one will >>>>>>>> need to go and debug every single code path there is in mesa for >>>>>>>> drawing/copying/reading/textures/etc., like what I did in the last 1.5 >>>>>>>> months. It's a great learning experience but it won't give anything >>>>>>>> generic. >>>>>>>> >>>>>>>> Again, for example, in st_ReadPixels, I imagine you will need to give >>>>>>>> "different orders" to transfer_map for the two different scenarios - >>>>>>>> H/W blit and fallback. So what's the gain here ? >>>>>>>> >>>>>>>> If I'm missing something, please tell me. >>>>>>> >>>>>>> One of us is... let's figure out which one :) >>>>>>> >>>>>>> Here's my proposal: >>>>>>> >>>>>>> All data stored inside of resources is stored in a driver-happy >>>>>>> format. The driver ensures that it's stored in proper endianness, etc. >>>>>>> (Much like it does today wrt proper stride.) >>>>>>> >>>>>>> Blitting(/copying) between resources doesn't require any additional >>>>>>> information, since you have the format(s) of the respective resources, >>>>>>> and it's all inside the driver, so the driver does whatever it needs >>>>>>> to do to make it all "work". >>>>>>> >>>>>>> *Accessing and modifying* resources (directly) from the CPU is what >>>>>>> becomes tricky. The state tracker may have incorrect expectations of >>>>>>> the actual backing data. There are a few different ways to resolve >>>>>>> this. The one I'm proposing is that you only ever return a pointer to >>>>>>> the directly underlying data if it matches the CPU's expectations >>>>>>> (which will only be the case for byte-oriented array formats like >>>>>>> PIPE_FORMAT_R8G8B8A8_* & co). Everything else, like e.g. >>>>>>> PIPE_FORMAT_R5G6B5_UNORM and countless others, will have to go through >>>>>>> a bounce buffer. >>>>>>> >>>>>>> At transfer map time, you convert the data from GPU-style to >>>>>>> CPU-style, and copy back the relevant bits at unmap/flush time. >>>>>>> >>>>>>> This presents a nice clean boundary for this stuff. Instead of the >>>>>>> state tracker trying to guess what the driver will do and feeding it >>>>>>> endiannesses that it can't possibly guess properly, the tracking logic >>>>>>> is relegated to the driver, and we extend the interfaces to allow the >>>>>>> state tracker to access the data in a proper way. >>>>>>> >>>>>>> I believe the advantage of this scheme is that beyond adding format >>>>>>> parameters to pipe_transfer_map() calls, there will not need to be any >>>>>>> adjustments to the state trackers. >>>>>>> >>>>>>> One yet-to-be-resolved issue is what to do about glMapBuffer* - it >>>>>>> maps a buffer, it's formatless (at map time), and yet the GPU will be >>>>>>> required to interpret it correctly. We could decree that PIPE_BUFFER >>>>>>> is just *always* an array of R8_UNORM and thus never needs any type of >>>>>>> swapping. The driver needs to adjust accordingly to deal with accesses >>>>>>> that don't fit that pattern (and where parameters can't be fed to the >>>>>>> GPU to interpret it properly). >>>>>>> >>>>>>> I think something like the above will work. And I think it presents a >>>>>>> cleaner barrier than your proposal, because none of the "this GPU can >>>>>>> kinda-sorta understand BE, but not everywhere" details are ever >>>>>>> exposed to the state tracker. >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>>> -ilia >>>>>> >>>>>> Ilia, >>>>>> >>>>>> To make the GPU do a conversion during blitting, I need to configure >>>>>> registers. This is done in a couple of functions in the r600g driver >>>>>> (r600_translate_texformat, r600_colorformat_endian_swap, >>>>>> r600_translate_colorformat and r600_translate_colorswap). >>>>>> >>>>>> The problem is that transfer_map/unmap don't call directly to those >>>>>> functions. They call other functions which eventually call those 4 >>>>>> functions. Among those "other" functions, there are several function >>>>>> calls which are *not* in the r600g driver. i.e. we go back to generic >>>>>> util functions. For example: >>>>>> >>>>>> #0 r600_translate_colorformat >>>>>> #1 evergreen_init_color_surface >>>>>> #2 evergreen_set_framebuffer_state >>>>>> #3 util_blitter_custom_depth_stencil >>>>>> #4 r600_blit_decompress_depth >>>>>> #5 r600_texture_transfer_map >>>>>> >>>>>> Am I allowed to now pass information from transfer_map/unmap all the >>>>>> way down to the 4 functions I mentioned through all these layers as >>>>>> additional parameters ? I preferred to put it in pipe_resource as that >>>>>> information goes all the way down to those functions, but if I can't >>>>>> use that, then what's an acceptable alternative ? >>>>>> >>>>>> This time, I would like to get an agreement *before* I implement it. >>>>> >>>>> Probably a good idea. And as issues are investigated, people's >>>>> opinions on the "correct" way might shift. Let's think about this... >>>>> >>>>> So clearly *a* correct way to handle this would be to stop all the >>>>> lying. What's the lie? The lie is the PIPE_FORMAT. It talks about e.g. >>>>> R5G6B5 but makes no mention of the byte layout in memory for those 16 >>>>> bits. Really what we have right now is a format and an *implicit* >>>>> endian ordering, which is the CPU's. But what happens when the CPU and >>>>> GPU don't agree? >>>>> >>>>> There's a path we could take which would be to add an endianness >>>>> alongside each format (be it by doubling formats, or an explicit >>>>> second field). This would be a very far-reaching change though, and I >>>>> doubt you'll want to do it. What we're left with is having a format >>>>> and an *implicit* endianness. Which means that the consumers of the >>>>> format need to be able to work out the implicit endianness involved. >>>>> And the endianness will be GPU endian for regular resources, and CPU >>>>> endian for "staging" resources. So it's definitely tempting to stick >>>>> the endian thing into a private field of the resource, like Rob is >>>>> suggesting - when creating a staging texture in >>>>> transfer_map/unmap/flush, set the endianness the cpu endian. Otherwise >>>>> set it to gpu endian. And I think this is somewhat similar to your >>>>> former approach. >>>>> >>>>> What do you think? >>>>> >>>>> -ilia >>>> >>>> I don't think I have any other choice but to stick it as a private >>>> field, because the endian parameter simple can't go through all the >>>> function calls as an additional parameter. The reason is that >>>> set_framebuffer_state() function types are called from >>>> st_invalidate_state, where I don't have any idea about the "correct" >>>> endianess, so I can't add the endian parameter to that function type. >>>> >>>> The only thing that is propagated through all layers is r600_texture. >>>> I'll try to use that. >>>> >>>> Marek, Michel, >>>> Do you think it is OK to add the endian mark to that private structure ? >>> >>> Yes, but doesn't util_format_description::is_array provide that info >>> already? >> >> If you mean to say that they are interchangeable, than no. >> It's true that for array formats we *never* need to do swaps, but for >> non-array formats there are cases where we need to do it and cases >> where we don't need to do it. >> >> For example, when writing to depth buffer through st_DrawPixels >> (glDrawPixels) with GL_FLOAT, than the format used is >> PIPE_FORMAT_Z32_FLOAT. >> In the make_texture() part, we need to configure endian swap for the >> staging buffer, but after you return from it and go into >> st_create_texture_sampler_view(), you want the destination texture to >> be configured without endian swapping. > > So the CB/DB block can't swap bytes when accessing Z32? > Not sure what you mean by this sentence. Could you please further explain your question ?
> >> Both those buffers are configured with PIPE_FORMAT_Z32_FLOAT, so >> that's why is_array can't be used instead of a dedicated flag. > > Can't the driver figure out on its own whether it needs to swap or not > based on other available information, e.g. considering the resource > usage/bind flags and/or whether the resource is sampled from or rendered to? > Short answer, no. Per the suggestions I received, I know try to use PIPE_TRANSFER_WRITE to know that in r600_texture_transfer_map. But I need to propagate that knowledge to the actual functions that do the endian/format configuration (the 4 functions I mentioned in earlier emails). The best way I found how to do that is to add a flag to the private r600_texture structure. If you have a better suggestion, I'm open to hearing it. In addition, sometimes we don't pass through pipe_transfer_map/unmap. e.g. st_create_texture_sampler_view(), which eventually gets to r600_translate_texformat() and r600_colorformat_endian_swap(). How those two functions will know what to do, without me giving them some hint ? how do they know if they arrived from st_create_texture_sampler_view() or from util_blitter_blit_generic() ? And for many cases, we need this distinction, otherwise I wouldn't need to fix anything. Oded > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev