On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede <hdego...@redhat.com> wrote:
> Hi,
>
> On 22-02-16 17:13, Ilia Mirkin wrote:
>>
>> On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>>
>>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdego...@redhat.com> wrote:
>>>>>
>>>>> But assuming I'm right, what I'm proposing is that instead of passing
>>>>> the input in as a global buffer, we pass it in as a const
>>>>> buffer. As such, instead of sticking it into ->set_global_binding,
>>>>> you'd stick it into ->set_constant_buffer, and then you'll be able to
>>>>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>>>>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
>>>>> it's 0.) You don't even have to load these, you can use them as args
>>>>> directly anywhere you like (except as indirect addresses).
>>>>>
>>>>> The old code would actually take the supplied inputs, stick them into
>>>>> a constbuf, and then lower RINPUT accesses to load from that constbuf.
>>>>> I'm suggesting we cut out the middleman.
>>>>>
>>>>> By the way, another term for "constant buffer" is "uniform buffer", on
>>>>> the off chance it helps. Basically it's super-cached by the shader for
>>>>> values that never change across shader invocations. [And there's
>>>>> special stuff in the hw to allow running multiple sets of shader
>>>>> invocations with different "constant" values... or so we think.]
>>>>
>>>> I'm fine with using constant buffers for the input. It is not the
>>>> mechanism I'm worried about, it is the tgsi syntax to express things.
>>>> I think it would be beneficial for the tgsi syntax to be abstract, and
>>>> not worry about the underlying mechanism. This will e.g. allow us
>>>> to use shared memory for input on tesla and const bufs on later
>>>> generations without the part generating the tgsi code needing to
>>>> worry about this.
>>>
>>> Yeah, I think you're right.
>>> I didn't realize that tesla had a special
>>> form of input for user params, I assumed it was just the usual thing.
>>> So forget about constbufs, go with the INPUT thing. Which is great,
>>> since we had one value left over in that (future) 2-bit field :)
>>>
>>>>
>>>> ###
>>>>
>>>> Somewhat unrelated to the input problem, I'm also somewhat worried
>>>> about the addressing method for MEMORY type registers.
>>>>
>>>> Looking at the old RES stuff, the "index" passed into say a LOAD
>>>> was not so much an index as simply a 32 bit GPU virtual memory
>>>> address, which fits well with the OpenCL way of doing things (the
>>>> register number, as in the 55 in RES[55], was more or less ignored).
>>>>
>>>> Whereas with the new BUFFER style "registers" the index really
>>>> is an index, e.g. doing:
>>>> LOAD TEMP[0].x, BUFFER[0], IMM[0]
>>>> resp.
>>>> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>>>>
>>>> will read from a different memory address, correct?
>>>
>>> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
>>> BUFFER[1] refers to the buffer at binding point 1. They might, in
>>> fact, overlap, or even be the same buffer. But the code doesn't know
>>> about that.
>
> Ack.
>
>>>> So how will this work for MEMORY type registers? For OpenCL, having the
>>>> 1-dimensional behavior of RES really is quite useful, and having the
>>>> address be composed of a hidden base address which gets determined under
>>>> the hood from the register number, and then adding an index on top of
>>>> it, does not fit so well.
>>>
>>> Not sure what the question is... you have code like
>>>
>>> int *foo = [pointer value from user input];
>>> *foo = *(foo + 5);
>>>
>>> right?
>>>
>>> So that'd just become
>>>
>>> MOV TEMP[0].x, <val from user input, wherever it is>
>>> ADD TEMP[0].y, TEMP[0].x, 5 * 4
>>> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
>>> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
>>>
>>> or perhaps I'm misunderstanding something?
>>>
>>> MEMORY, GLOBAL == the global virtual memory address space, not some
>>> specific buffer. Trying to load address 0 from it will likely lead to
>>> sadness, unless you happen to have something mapped there. BUFFER has
>>> an implied base address, based on the binding point, but MEMORY has no
>>> such thing.
>
> OK, that answers my questions / worries. I was worried that MEMORY
> too would have an implied base address, which would more or less only
> get in the way with opencl, but if the memory register file takes
> a virtual memory address as second operand to LOAD then I'm happy.
>
> So I guess that if we mix in say TGSI-shared / OpenCL-local memory
> then I would do:
>
> DCL MEMORY[0], GLOBAL
> DCL MEMORY[1], SHARED
>
> And then to load something from global mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[0], TEMP[0].yyyy
>
> And to load something from the shared mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[1], TEMP[0].yyyy
>
> Correct? And the shared mem too will take shared virtual memory
> addresses, just like global takes global virtual memory
> addresses?

That's how I see it. You may have to add LOAD64/STORE64 for 64-bit
addresses though. Or we could decree that all addressing on global
memory shall be 64-bit (and thus read the .xy components of the
address source).

>
>> Another way of looking at it is that instead of having the hacky
>> RES[12345] being hardcoded to mean something special, you now have a
>> dedicated file called 'MEMORY', which has identical semantics.
>
> I'm all for getting rid of the RES[12345] hack :)
>
> I guess where you write "you now have a dedicated file called 'MEMORY'"
> you mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
> and LOCAL at least, and probably as discussed one for INPUT?
>
> This all sounds good to me. As said, my worry was that MEMORY would have
> an implied base address like BUFFER has; now that you've
> made clear that MEMORY does not have this I'm happy :)

There's a bit of a wrinkle here, and it's questionable whether we want
to allow for this somehow, but... Tesla actually has no way to address
global memory directly. It's always done with a base offset (which can
be set to 0). The trick is that it can only address 32 bits at a time,
there's no 64-bit addressing. But it has *16* such "global" memory
spaces, each of which is base + up to 32-bit offset [and ultimately
only 40 bits of addressability]. I don't know if OpenCL provides
something good for that; if it does, we can use semantic indices on
the GLOBAL to make it like

DCL MEMORY[0], GLOBAL[0]
DCL MEMORY[1], GLOBAL[1]

etc. But again, this is pretty optional.

  -ilia
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
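
For reference, the addressing schemes discussed in this thread can be
modelled roughly as below. This is only a sketch in C, not Mesa or
nouveau code: every function name, the array sizes, and the choice of
.x as the low word of a 64-bit address are assumptions made for
illustration. It contrasts BUFFER (implied base from the binding
point), MEMORY, GLOBAL (the operand is the address itself, either
32-bit or 64-bit read from .xy), and the Tesla-style GLOBAL[n] spaces
(per-space base plus a 32-bit offset, limited to 40 bits).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the addressing rules discussed in the thread. */

#define NUM_BINDINGS        16 /* illustrative size, not a real limit */
#define TESLA_GLOBAL_SPACES 16 /* "it has *16* such global memory spaces" */
#define TESLA_ADDR_BITS     40 /* "only 40 bits of addressability" */

/* BUFFER[n]: the binding point supplies an implied base address. */
static uint64_t buffer_address(const uint64_t bases[NUM_BINDINGS],
                               unsigned binding, uint32_t offset)
{
    return bases[binding] + offset;
}

/* MEMORY, GLOBAL with 32-bit addressing: the operand is the address. */
static uint64_t memory_address32(uint32_t x)
{
    return x;
}

/* MEMORY, GLOBAL with 64-bit addressing read from the .xy components.
 * Assumption: .x holds the low word and .y the high word. */
static uint64_t memory_address64(uint32_t x, uint32_t y)
{
    return ((uint64_t)y << 32) | x;
}

/* Tesla-style GLOBAL[n]: per-space base plus a 32-bit offset.
 * Returns false if the space index is out of range or the resulting
 * address does not fit in 40 bits. */
static bool tesla_global_address(const uint64_t bases[TESLA_GLOBAL_SPACES],
                                 unsigned space, uint32_t offset,
                                 uint64_t *out)
{
    if (space >= TESLA_GLOBAL_SPACES)
        return false;

    uint64_t addr = bases[space] + offset;
    if (addr >> TESLA_ADDR_BITS)
        return false;

    *out = addr;
    return true;
}
```

With a base of 0 in a Tesla space, tesla_global_address() degenerates
to the plain 32-bit MEMORY case, which matches the "base offset (which
can be set to 0)" remark above.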