Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

Hans de Goede Tue, 23 Feb 2016 02:45:13 -0800

Hi,

On 22-02-16 17:59, Ilia Mirkin wrote:

On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede <hdego...@redhat.com> wrote:

Hi,



On 22-02-16 17:13, Ilia Mirkin wrote:


On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote:


On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede <hdego...@redhat.com> wrote:


But assuming I'm right, what I'm proposing is that instead of passing
the input in as a global buffer, to instead pass it in as a const
buffer. As such instead of sticking it into ->set_global_binding,
you'd stick it into ->set_constant_buffer, and then you'll be able to
refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
it's 0.) You don't even have to load these, you can use them as args
directly anywhere you like (except as indirect addresses).

The old code would actually take the supplied inputs, stick them into
a constbuf, and then lower RINPUT accesses to load from that constbuf.
I'm suggesting we cut out the middleman.

By the way, another term for "constant buffer" is "uniform buffer", on
the off chance it helps. Basically it's super-cached by the shader for
values that never change across shader invocations. [And there's
special stuff in the hw to allow running multiple sets of shader
invocations with different "constant" values... or so we think.]
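As a rough illustration of the constbuf approach: TGSI CONST registers are vec4s of 32-bit components, so if the kernel arguments were copied tightly, 4 bytes at a time, into constant buffer 0 (an assumed layout, not something settled in this thread), an argument at byte offset `off` could be read directly as `CONST[0][off / 16]`, component `(off % 16) / 4`. The struct and helper names below are hypothetical, and this is a standalone sketch rather than driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference to a TGSI constant register component:
 * CONST[0][index].xyzw, where each CONST slot is a vec4 of 32-bit values. */
struct const_ref {
    unsigned index;     /* i in CONST[0][i]         */
    unsigned component; /* 0..3, i.e. .x .y .z .w   */
};

/* Map a kernel-argument byte offset to the constant slot/component it
 * lands in, assuming arguments are packed tightly into 16-byte slots. */
static struct const_ref arg_offset_to_const(unsigned byte_offset)
{
    /* This sketch only handles 4-byte-aligned 32-bit arguments. */
    assert(byte_offset % 4 == 0);
    struct const_ref r = { byte_offset / 16, (byte_offset % 16) / 4 };
    return r;
}

/* Copy raw argument bytes into a constbuf image of vec4 slots, zeroing
 * the unused tail (assumed layout, not Mesa's actual code). */
static void pack_args(uint32_t constbuf[][4], unsigned num_slots,
                      const void *args, size_t size)
{
    assert(size <= (size_t)num_slots * 16);
    memset(constbuf, 0, (size_t)num_slots * 16);
    memcpy(constbuf, args, size);
}

/* Swizzle letter for a component index, for printing CONST[0][i].c */
static char swizzle_name(unsigned component)
{
    assert(component < 4);
    return "xyzw"[component];
}
```

Under this layout the fifth 32-bit argument (byte offset 16) is `CONST[0][1].x`, and the shader can use it as a source operand without any explicit load.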




I'm fine with using constant buffers for the input. It is not the
mechanism I'm worried about, it is the tgsi syntax to express things.
I think it would be beneficial for the tgsi syntax to be abstract and
not worry about the underlying mechanism. This will e.g. allow us
to use shared memory for input on tesla and const bufs on later
generations, without the part generating the tgsi code needing to
worry about this.



Yeah, I think you're right. I didn't realize that tesla had a special
form of input for user params, I assumed it was just the usual thing.
So forget about constbufs, go with the INPUT thing. Which is great,
since we had one value left over in that (future) 2-bit field :)


###

Somewhat unrelated to the input problem, I'm also somewhat worried
about the addressing method for MEMORY type registers.

Looking at the old RES stuff, the "index" passed into, say, a LOAD
was not so much an index as simply a 32 bit GPU virtual memory
address, which fits well with the OpenCL way of doing things (the
register number, as in the 55 in RES[55], was more or less ignored).

Whereas with the new BUFFER style "registers" the index really
is an index, e.g. doing:
LOAD TEMP[0].x, BUFFER[0], IMM[0]
or
LOAD TEMP[0].x, BUFFER[1], IMM[0]

will read from a different memory address, correct?



Correct -- BUFFER[0] refers to the buffer at binding point 0, and
BUFFER[1] refers to the buffer at binding point 1. They might, in
fact, overlap, or even be the same buffer. But the code doesn't know
about that.
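To make the BUFFER semantics concrete, here is a small C model in which BUFFER[i] means "whatever was bound at binding point i" and the shader only supplies an offset. The names (`bindings`, `buffer_load32`) are hypothetical and not Gallium's API; the point is that rebinding slot 1 to the same storage as slot 0 makes the two alias, and the shader code cannot tell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BUFFERS 4

/* Hypothetical binding table: the base is implied by the slot number,
 * never written in the shader itself. */
struct buffer_binding {
    uint8_t *base;
    size_t size;
};

static struct buffer_binding bindings[MAX_BUFFERS];

/* Models: LOAD TEMP[n].x, BUFFER[slot], offset  (32-bit load) */
static uint32_t buffer_load32(unsigned slot, uint32_t offset)
{
    assert(slot < MAX_BUFFERS && bindings[slot].base != NULL);
    assert((size_t)offset + 4 <= bindings[slot].size);
    uint32_t v;
    memcpy(&v, bindings[slot].base + offset, sizeof(v));
    return v;
}
```

The same offset (IMM[0] = 0 here) yields different data for BUFFER[0] and BUFFER[1] unless both slots happen to point at the same buffer.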



Ack.

So how will this work for MEMORY type registers? For OpenCL the
1-dimensional behavior of RES really is quite useful; having the
address be composed of a hidden base address, determined under
the hood from the register number, with an index added on top of
it does not fit so well.



Not sure what the question is... you have code like

int *foo = [pointer value from user input];
*foo = *(foo + 5);

right?

So that'd just become

MOV TEMP[0].x, <val from user input, wherever it is>
ADD TEMP[0].y, TEMP[0].x, 5 * 4
LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
STORE MEMORY[0], TEMP[0].x, TEMP[1].x

or perhaps I'm misunderstanding something?

MEMORY, GLOBAL == the global virtual memory address space, not some
specific buffer. Trying to load address 0 from it will likely lead to
sadness, unless you happen to have something mapped there. BUFFER has
an implied base address, based on the binding point, but MEMORY has no
such thing.
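The four-instruction sequence above can be emulated in C against a toy flat address space, which shows that with MEMORY/GLOBAL the LOAD/STORE address operand is the virtual address itself, with no implied base. The mapped window, its base `0x10000` and its size are made up for illustration, and `int` is assumed to be 4 bytes (hence the `5 * 4`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy "global virtual address space": a single mapped window at
 * GLOBAL_BASE; everything else, including address 0, is unmapped.
 * Base and size are invented for this sketch. */
#define GLOBAL_BASE 0x10000u
#define GLOBAL_SIZE 256u
static uint8_t global_window[GLOBAL_SIZE];

static int mapped(uint32_t addr)
{
    return addr >= GLOBAL_BASE &&
           (uint64_t)addr + 4 <= (uint64_t)GLOBAL_BASE + GLOBAL_SIZE;
}

/* LOAD dst, MEMORY[0] (GLOBAL), addr: the operand IS the address. */
static uint32_t global_load32(uint32_t addr)
{
    assert(mapped(addr)); /* loading address 0 "will likely lead to sadness" */
    uint32_t v;
    memcpy(&v, global_window + (addr - GLOBAL_BASE), sizeof(v));
    return v;
}

/* STORE MEMORY[0] (GLOBAL), addr, src */
static void global_store32(uint32_t addr, uint32_t v)
{
    assert(mapped(addr));
    memcpy(global_window + (addr - GLOBAL_BASE), &v, sizeof(v));
}

/* *foo = *(foo + 5);  spelled out the same way as the TGSI above */
static void foo_example(uint32_t foo)
{
    uint32_t t0x = foo;                /* MOV   TEMP[0].x, <user input>           */
    uint32_t t0y = t0x + 5 * 4;        /* ADD   TEMP[0].y, TEMP[0].x, 5 * 4       */
    uint32_t t1x = global_load32(t0y); /* LOAD  TEMP[1].x, MEMORY[0], TEMP[0].y   */
    global_store32(t0x, t1x);          /* STORE MEMORY[0], TEMP[0].x, TEMP[1].x   */
}
```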



OK, that answers my questions / worries, I was worried that MEMORY
too would have an implied base address, which would more or less only
get in the way with opencl, but if the memory register file takes
a virtual memory address as second operand to LOAD then I'm happy.

So I guess that if we mix in, say, TGSI-shared / OpenCL-local memory,
then I would do:

DCL MEMORY[0], GLOBAL
DCL MEMORY[1], SHARED

And then to load something from global mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[0], TEMP[0].yyyy

And to load something from the shared mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[1], TEMP[0].yyyy

Correct? And will the shared mem too take shared virtual memory
addresses, just like global takes global virtual memory
addresses?


That's how I see it.
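A minimal C sketch of that model, assuming each `DCL MEMORY[n], <space>` simply selects an address space and the LOAD/STORE operand is an address within that space. The backing arrays are toy storage indexed from 0 (real global addresses would be full virtual addresses), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of
 *   DCL MEMORY[0], GLOBAL
 *   DCL MEMORY[1], SHARED
 * The declaration picks the address space; there is no implied base. */
enum mem_space { SPACE_GLOBAL, SPACE_SHARED };

#define SPACE_BYTES 64u
static uint8_t space_storage[2][SPACE_BYTES]; /* toy backing store per space */
static const enum mem_space memory_decl[2] = { SPACE_GLOBAL, SPACE_SHARED };

/* LOAD TEMP[0].x, MEMORY[memory_index], addr */
static uint32_t memory_load32(unsigned memory_index, uint32_t addr)
{
    assert(memory_index < 2);
    assert((uint64_t)addr + 4 <= SPACE_BYTES);
    uint32_t v;
    memcpy(&v, space_storage[memory_decl[memory_index]] + addr, sizeof(v));
    return v;
}

/* STORE MEMORY[memory_index], addr, v */
static void memory_store32(unsigned memory_index, uint32_t addr, uint32_t v)
{
    assert(memory_index < 2);
    assert((uint64_t)addr + 4 <= SPACE_BYTES);
    memcpy(space_storage[memory_decl[memory_index]] + addr, &v, sizeof(v));
}
```

The same address value in TEMP[0].y reaches different memory depending only on which MEMORY register the instruction names.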


Good.

You may have to add LOAD64/STORE64 for 64-bit
addresses though. Or we could decree that all addressing on global
memory shall be 64-bit (and thus read the .xy components of the
address source).
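For the 64-bit option, a sketch of what "read the .xy components of the address source" would mean in practice: one pointer occupies two 32-bit components. Putting the low half in .x and the high half in .y is an assumption for this example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit register components holding one 64-bit address
 * (assumed order: .x = low 32 bits, .y = high 32 bits). */
struct vec2u { uint32_t x, y; };

static struct vec2u split_addr64(uint64_t addr)
{
    struct vec2u v = { (uint32_t)addr, (uint32_t)(addr >> 32) };
    return v;
}

static uint64_t join_addr64(struct vec2u v)
{
    return ((uint64_t)v.y << 32) | v.x;
}

/* A plain 32-bit LOAD/STORE address can only reach the low 4 GiB. */
static int fits_32bit_address(uint64_t addr)
{
    return addr <= UINT32_MAX;
}
```

The doubled register cost per pointer is the trade-off weighed in the reply that follows.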


I would prefer to keep the LOAD / STORE semantics the same as for
other LOAD / STORE operations to / from 1d buffers.

I think that in the end the tgsi backend for llvm will get both
a 32 bit and a 64 bit mode, like the nvptx backend already has.

And then the 64 bit backend will use a new LOAD64 / STORE64.
Also do not forget that keeping 64 bit pointers takes twice as
many registers, so 32 bit will likely be optimal in a lot of
cases. Since OpenCL does not give the user a way to select
which mode to use, I guess we will end up with some sort
of heuristic based on the amount of memory on the card or
some such.

After all using 64 bit pointers does not make a lot of sense
on a card with only 1 GB of RAM (yes I know we're talking virtual
address space here).

Anyway, it really is too soon to tell. Maybe the performance
impact of using 64 bit pointers is negligible. But I think it would
be good (and consistent) to keep LOAD / STORE taking 32 bit addresses
even for MEMORY, and to add a LOAD64 / STORE64 when I get around to
implementing a 64 bit mode for the llvm tgsi backend (or when others
need them).

Another way of looking at it is that instead of having the hacky
RES[12345] being hardcoded to mean something special, you now have a
dedicated file called 'MEMORY', which has identical semantics.



I'm all for getting rid of the RES[12345] hack :)

I guess that where you write "you now have a dedicated file called 'MEMORY'"
you mean up to X dedicated MEMORY[#] files: one each for GLOBAL, SHARED
and LOCAL at least, and probably, as discussed, one for INPUT?

This all sounds good to me. As said, my worry was that MEMORY would have
an implied base address like BUFFER has; now that you've
made clear that MEMORY does not have this, I'm happy :)


There's a bit of a wrinkle here, and it's questionable whether we want
to allow for this somehow, but... Tesla actually has no way to address
global memory directly. It's always done with a base offset (which can be set
to 0). The trick is that it can only address 32 bits at a time,
there's no 64-bit addressing. But it has *16* such "global" memory
spaces, i.e. which are each base + up to 32-bit offset [and ultimately
only 40 bits of addressability]. I don't know if OpenCL provides
something good for that, if it does we can use semantic indices on the
GLOBAL to make it like

DCL MEMORY[0], GLOBAL[0]
DCL MEMORY[1], GLOBAL[1]

etc. But again, this is pretty optional.
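A rough C model of the Tesla scheme described above: 16 "global" spaces, each a base address plus a 32-bit offset, with the result limited to 40 bits. Here the semantic index i in `GLOBAL[i]` is assumed to pick base i; the names are hypothetical, and out-of-range results are simply rejected rather than modeling whatever the hardware does.

```c
#include <assert.h>
#include <stdint.h>

#define TESLA_NUM_GLOBAL_SPACES 16
#define TESLA_ADDR_MAX ((UINT64_C(1) << 40) - 1) /* 40 bits addressable */

/* Hypothetical per-space base addresses; DCL MEMORY[n], GLOBAL[i]
 * would select tesla_global_base[i] in this sketch. */
static uint64_t tesla_global_base[TESLA_NUM_GLOBAL_SPACES];

static uint64_t tesla_global_address(unsigned space, uint32_t offset)
{
    assert(space < TESLA_NUM_GLOBAL_SPACES);
    uint64_t addr = tesla_global_base[space] + offset; /* base + 32-bit offset */
    assert(addr <= TESLA_ADDR_MAX);
    return addr;
}
```

With every base set to 0 this degenerates to plain 32-bit addressing, which matches a tgsi32-only target.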


I think that for Tesla we can just support only the tgsi32 target
and not the tgsi64 target. At least that is how I envision things
today; who knows what tomorrow will bring :)

Regards,

Hans
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
