Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-23 Thread Ilia Mirkin
On Tue, Feb 23, 2016 at 7:12 AM, Pierre Moreau  wrote:
> On 11:43 AM - Feb 23 2016, Hans de Goede wrote:
>> Hi,
>>
>
> [snip]
>
>>
>> >You may have to add LOAD64/STORE64 for 64-bit
>> >addresses though. Or we could decree that all addressing on global
>> >memory shall be 64-bit (and thus read the .xy components of the
>> >address source).
>>
>> I would prefer to keep LOAD / STORE semantics the same as with
>> other LOAD / STORE -s to / from 1d buffers.
>>
>> I think that in the end the tgsi backend for llvm will get both
>> a 32 bit and a 64 bit mode, like the nvptx backend already has.
>>
>> And then the 64 bit backend will use a new LOAD64 / STORE64
>> also do not forget that keeping 64 bit pointers takes twice as
>> much registers, so 32 bit will likely be optimal in a lot of
>> cases. I guess since OpenCL does not give the user a way
>> to select which mode to use we will end up with some sort
>> of heuristic based on the amount of memory on the card or
>> some such.
>>
>> After all using 64 bit pointers does not make a lot of sense
>> on a card with only 1 GB of RAM (yes I know we're talking virtual
>> address space here).
>>
>> Anyways this all really is too soon to tell. Maybe the performance
>> impact of using 64 bit pointers is negligible. But I think it would
>> be good (and consistent) to keep LOAD / STORE taking 32 bit addresses
>> even for MEMORY and add a LOAD64 / STORE64 when I get around to
>> implementing a 64 bit mode for the llvm tgsi backend (or when others
>> need them).
>>
>> >>>Another way of looking at it is that instead of having the hacky
>> >>>RES[12345] being hardcoded to mean something special, you now have a
>> >>>dedicated file called 'MEMORY', which has identical semantics.
>> >>
>> >>
>> >>I'm all for getting rid of the RES[12345] hack :)
>> >>
>> >>I guess where you write "you now have a dedicated file called 'MEMORY'"
>> >>You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
>> >>and LOCAL at least, and probably as discussed one for INPUT ?
>> >>
>> >>This all sounds good to me, as said my worry was that MEMORY would have
>> >>an implied base address like BUFFER has, now that you've
>> >>made clear that MEMORY does not have this I'm happy :)
>> >
>> >There's a bit of a wrinkle here, and it's questionable whether we want
>> >to allow for this somehow, but... Tesla actually has no way to address
>> >global memory. It's always done with a base offset (which can be set
>> >to 0). The trick is that it can only address 32 bits at a time,
>> >there's no 64-bit addressing. But it has *16* such "global" memory
>> >spaces, i.e. which are each base + up to 32-bit offset [and ultimately
>> >only 40 bits of addressability]. I don't know if OpenCL provides
>> >something good for that, if it does we can use semantic indices on the
>> >GLOBAL to make it like
>> >
>> >DCL MEMORY[0], GLOBAL[0]
>> >DCL MEMORY[1], GLOBAL[1]
>> >
>> >etc. But again, this is pretty optional.
>>
>> I think that for Tesla we can just only support the tgsi32 target
>> and not the tgsi64 target, at least that is how I envision things
>> today, who knows what tomorrow will bring :)
>
> Fermi is the first family supporting 64 bit addresses, on top of 32 bit
> addresses, while Kepler can only do 64 bit IIRC (or is it Maxwell which 
> dropped
> the 32 bit support?).

I've only ever seen the blob do 64-bit addressing in OpenGL shaders on
Fermi+. However Hans assures me that 32-bit addressing actually works
fine, presumably the upper 8 bits are set to 0 or some constant
supplied via some register somewhere. All of the SM20+ ISA's have a
separate "wide address" bit in their gmem-using instructions. (aka "E"
in nvdisasm)

Anyways, the only reason to add LOAD64 is not to muddy LOAD semantics.
LOAD should remain 32-bit (single-component), while LOAD64 would be
defined only for 1d entities and combine the first 2 components of the
address source.

I think Hans and I are in agreement on most points now... the only
issue is that we don't really control where in VA space buffers are
allocated, so you even if you have 1MB of vram, it might end up being
somewhere in far-off 40-bit land. Ben has promised to provide such
control in the future, but who knows when that work will land.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-23 Thread Pierre Moreau
On 11:43 AM - Feb 23 2016, Hans de Goede wrote:
> Hi,
> 

[snip]

> 
> >You may have to add LOAD64/STORE64 for 64-bit
> >addresses though. Or we could decree that all addressing on global
> >memory shall be 64-bit (and thus read the .xy components of the
> >address source).
> 
> I would prefer to keep LOAD / STORE semantics the same as with
> other LOAD / STORE -s to / from 1d buffers.
> 
> I think that in the end the tgsi backend for llvm will get both
> a 32 bit and a 64 bit mode, like the nvptx backend already has.
> 
> And then the 64 bit backend will use a new LOAD64 / STORE64
> also do not forget that keeping 64 bit pointers takes twice as
> much registers, so 32 bit will likely be optimal in a lot of
> cases. I guess since OpenCL does not give the user a way
> to select which mode to use we will end up with some sort
> of heuristic based on the amount of memory on the card or
> some such.
> 
> After all using 64 bit pointers does not make a lot of sense
> on a card with only 1 GB of RAM (yes I know we're talking virtual
> address space here).
> 
> Anyways this all really is too soon to tell. Maybe the performance
> impact of using 64 bit pointers is negligible. But I think it would
> be good (and consistent) to keep LOAD / STORE taking 32 bit addresses
> even for MEMORY and add a LOAD64 / STORE64 when I get around to
> implementing a 64 bit mode for the llvm tgsi backend (or when others
> need them).
> 
> >>>Another way of looking at it is that instead of having the hacky
> >>>RES[12345] being hardcoded to mean something special, you now have a
> >>>dedicated file called 'MEMORY', which has identical semantics.
> >>
> >>
> >>I'm all for getting rid of the RES[12345] hack :)
> >>
> >>I guess where you write "you now have a dedicated file called 'MEMORY'"
> >>You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
> >>and LOCAL at least, and probably as discussed one for INPUT ?
> >>
> >>This all sounds good to me, as said my worry was that MEMORY would have
> >>an implied base address like BUFFER has, now that you've
> >>made clear that MEMORY does not have this I'm happy :)
> >
> >There's a bit of a wrinkle here, and it's questionable whether we want
> >to allow for this somehow, but... Tesla actually has no way to address
> >global memory. It's always done with a base offset (which can be set
> >to 0). The trick is that it can only address 32 bits at a time,
> >there's no 64-bit addressing. But it has *16* such "global" memory
> >spaces, i.e. which are each base + up to 32-bit offset [and ultimately
> >only 40 bits of addressability]. I don't know if OpenCL provides
> >something good for that, if it does we can use semantic indices on the
> >GLOBAL to make it like
> >
> >DCL MEMORY[0], GLOBAL[0]
> >DCL MEMORY[1], GLOBAL[1]
> >
> >etc. But again, this is pretty optional.
> 
> I think that for Tesla we can just only support the tgsi32 target
> and not the tgsi64 target, at least that is how I envision things
> today, who knows what tomorrow will bring :)

Fermi is the first family supporting 64 bit addresses, on top of 32 bit
addresses, while Kepler can only do 64 bit IIRC (or is it Maxwell which dropped
the 32 bit support?).

Regards,
Pierre

> 
> Regards,
> 
> Hans
> ___
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau


signature.asc
Description: PGP signature
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 11:50 AM, Hans de Goede  wrote:
> Hi,
>
>
> On 22-02-16 17:13, Ilia Mirkin wrote:
>>
>> On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin 
>> wrote:
>>>
>>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede 
>>> wrote:
>
> But assuming I'm right, what I'm proposing is that instead of passing
> the input in as a global buffer, to instead pass it in as a const
> buffer. As such instead of sticking it into ->set_global_binding,
> you'd stick it into ->set_constant_buffer, and then you'll be able to
> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
> it's 0.) You don't even have to load these, you can use them as args
> directly anywhere you like (except as indirect addresses).
>
> The old code would actually take the supplied inputs, stick them into
> a constbuf, and then lower RINPUT accesses to load from that constbuf.
> I'm suggesting we cut out the middleman.
>
> By the way, another term for "constant buffer" is "uniform buffer", on
> the off chance it helps. Basically it's super-cached by the shader for
> values that never change across shader invocations. [And there's
> special stuff in the hw to allow running multiple sets of shader
> invocations with different "constant" values... or so we think.]



 I'm fine with using constant buffers for the input, it is not the
 mechanism I'm worried about it is the tgsi syntax to express things,
 I think it would be beneficial for the tgsi syntax to be abstract, and
 not worry about the underlying mechanism, this will i.e. allow us
 to use shared memory for input on tesla and const bufs on later
 generations
 without the part generating the tgsi code needing to worry about this.
>>>
>>>
>>> Yeah, I think you're right. I didn't realize that tesla had a special
>>> form of input for user params, I assumed it was just the usual thing.
>>> So forget about constbufs, go with the INPUT thing. Which is great,
>>> since we had one value left over in that (future) 2-bit field :)
>>>

 ###

 Somewhat unrelated to the input problem, I'm also somewhat worried
 about the addressing method for MEMORY type registers.

 Looking at the old RES stuff then the "index" passed into say a LOAD
 was not as much an index as it was simply a 32 bit GPU virtual memory
 address, which fits well with the OpenCL ways of doing things (the
 register number as in the 55 in RES[55] was more or less ignored).

 Where as, e.g. the new BUFFER style "registers" the index really
 is an index, e.g. doing:
 LOAD TEMP[0].x, BUFFER[0], IMM[0]
 resp.
 LOAD TEMP[0].x, BUFFER[1], IMM[0]

 Will read from a different memory address, correct ?
>>>
>>>
>>> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
>>> BUFFER[1] refers to the buffer at binding point 1. They might, in
>>> fact, overlap, or even be the same buffer. But the code doesn't know
>>> about that.
>
>
> Ack.
>
 So how will this work for MEMORY type registers ? For OpenCL having the
 1-dimensional behavior of RES really is quite useful, and having the
 address be composed of a hidden base address which gets determined under
 the hood from the register number, and then adding an index on top of
 it does not fit so well.
>>>
>>>
>>> Not sure what the question is... you have code like
>>>
>>> int *foo = [pointer value from user input];
>>> *foo = *(foo + 5);
>>>
>>> right?
>>>
>>> So that'd just become
>>>
>>> MOV TEMP[0].x, 
>>> ADD TEMP[0].y, TEMP[0].x, 5 * 4
>>> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
>>> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
>>>
>>> or perhaps I'm misunderstanding something?
>>>
>>> MEMORY, GLOBAL == the global virtual memory address space, not some
>>> specific buffer. Trying to load address 0 from it will likely lead to
>>> sadness, unless you happen to have something mapped there. BUFFER has
>>> an implied base address, based on the binding point, but MEMORY has no
>>> such thing.
>
>
> OK, that answers my questions / worries, I was worried that MEMORY
> too would have an implied base address, which would more or less only
> get in the way with opencl, but if the memory register file takes
> a virtual memory address as second operand to LOAD then I'm happy.
>
> So I guess that if we mix in say TGSI-shared / OpenCL-local memory
> them I would do:
>
> DCL MEMORY[0], GLOBAL
> DCL MEMORY[1], SHARED
>
> And then to load something from global mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[0], TEMP[0].
>
> And to load something from the shared mem at offset TEMP[0].y:
>
> LOAD TEMP[0].x, MEMORY[1], TEMP[0].
>
> Correct ?  And the shared mem to will take shared virtual memory
> addresses, just like global takes global 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 17:13, Ilia Mirkin wrote:

On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin  wrote:

On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede  wrote:

But assuming I'm right, what I'm proposing is that instead of passing
the input in as a global buffer, to instead pass it in as a const
buffer. As such instead of sticking it into ->set_global_binding,
you'd stick it into ->set_constant_buffer, and then you'll be able to
refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
it's 0.) You don't even have to load these, you can use them as args
directly anywhere you like (except as indirect addresses).

The old code would actually take the supplied inputs, stick them into
a constbuf, and then lower RINPUT accesses to load from that constbuf.
I'm suggesting we cut out the middleman.

By the way, another term for "constant buffer" is "uniform buffer", on
the off chance it helps. Basically it's super-cached by the shader for
values that never change across shader invocations. [And there's
special stuff in the hw to allow running multiple sets of shader
invocations with different "constant" values... or so we think.]



I'm fine with using constant buffers for the input, it is not the
mechanism I'm worried about it is the tgsi syntax to express things,
I think it would be beneficial for the tgsi syntax to be abstract, and
not worry about the underlying mechanism, this will i.e. allow us
to use shared memory for input on tesla and const bufs on later generations
without the part generating the tgsi code needing to worry about this.


Yeah, I think you're right. I didn't realize that tesla had a special
form of input for user params, I assumed it was just the usual thing.
So forget about constbufs, go with the INPUT thing. Which is great,
since we had one value left over in that (future) 2-bit field :)



###

Somewhat unrelated to the input problem, I'm also somewhat worried
about the addressing method for MEMORY type registers.

Looking at the old RES stuff then the "index" passed into say a LOAD
was not as much an index as it was simply a 32 bit GPU virtual memory
address, which fits well with the OpenCL ways of doing things (the
register number as in the 55 in RES[55] was more or less ignored).

Where as, e.g. the new BUFFER style "registers" the index really
is an index, e.g. doing:
LOAD TEMP[0].x, BUFFER[0], IMM[0]
resp.
LOAD TEMP[0].x, BUFFER[1], IMM[0]

Will read from a different memory address, correct ?


Correct -- BUFFER[0] refers to the buffer at binding point 0, and
BUFFER[1] refers to the buffer at binding point 1. They might, in
fact, overlap, or even be the same buffer. But the code doesn't know
about that.


Ack.


So how will this work for MEMORY type registers ? For OpenCL having the
1-dimensional behavior of RES really is quite useful, and having the
address be composed of a hidden base address which gets determined under
the hood from the register number, and then adding an index on top of
it does not fit so well.


Not sure what the question is... you have code like

int *foo = [pointer value from user input];
*foo = *(foo + 5);

right?

So that'd just become

MOV TEMP[0].x, 
ADD TEMP[0].y, TEMP[0].x, 5 * 4
LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
STORE MEMORY[0], TEMP[0].x, TEMP[1].x

or perhaps I'm misunderstanding something?

MEMORY, GLOBAL == the global virtual memory address space, not some
specific buffer. Trying to load address 0 from it will likely lead to
sadness, unless you happen to have something mapped there. BUFFER has
an implied base address, based on the binding point, but MEMORY has no
such thing.


OK, that answers my questions / worries, I was worried that MEMORY
too would have an implied base address, which would more or less only
get in the way with opencl, but if the memory register file takes
a virtual memory address as second operand to LOAD then I'm happy.

So I guess that if we mix in say TGSI-shared / OpenCL-local memory
them I would do:

DCL MEMORY[0], GLOBAL
DCL MEMORY[1], SHARED

And then to load something from global mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[0], TEMP[0].

And to load something from the shared mem at offset TEMP[0].y:

LOAD TEMP[0].x, MEMORY[1], TEMP[0].

Correct ?  And the shared mem to will take shared virtual memory
addresses, just like global takes global virtual memory
addresses ?


Another way of looking at it is that instead of having the hacky
RES[12345] being hardcoded to mean something special, you now have a
dedicated file called 'MEMORY', which has identical semantics.


I'm all for getting rid of the RES[12345] hack :)

I guess where you write "you now have a dedicated file called 'MEMORY'"
You mean up to X dedicated MEMORY[#] files, one for each of GLOBAL, SHARED
and LOCAL at least, and probably as discussed one for INPUT ?

This all sounds good to me, as said my worry was that 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 11:07 AM, Pierre Moreau  wrote:
>> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede 
>> wrote:
>> >> But assuming I'm right, what I'm proposing is that instead of
>> >> passing
>> >> the input in as a global buffer, to instead pass it in as a const
>> >> buffer. As such instead of sticking it into ->set_global_binding,
>> >> you'd stick it into ->set_constant_buffer, and then you'll be able
>> >> to
>> >> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>> >> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim
>> >> when
>> >> it's 0.) You don't even have to load these, you can use them as
>> >> args
>> >> directly anywhere you like (except as indirect addresses).
>> >>
>> >> The old code would actually take the supplied inputs, stick them
>> >> into
>> >> a constbuf, and then lower RINPUT accesses to load from that
>> >> constbuf.
>> >> I'm suggesting we cut out the middleman.
>> >>
>> >> By the way, another term for "constant buffer" is "uniform
>> >> buffer", on
>> >> the off chance it helps. Basically it's super-cached by the shader
>> >> for
>> >> values that never change across shader invocations. [And there's
>> >> special stuff in the hw to allow running multiple sets of shader
>> >> invocations with different "constant" values... or so we think.]
>> >
>> >
>> > I'm fine with using constant buffers for the input, it is not the
>> > mechanism I'm worried about it is the tgsi syntax to express
>> > things,
>> > I think it would be beneficial for the tgsi syntax to be abstract,
>> > and
>> > not worry about the underlying mechanism, this will i.e. allow us
>> > to use shared memory for input on tesla and const bufs on later
>> > generations
>> > without the part generating the tgsi code needing to worry about
>> > this.
>>
>> Yeah, I think you're right. I didn't realize that tesla had a special
>> form of input for user params, I assumed it was just the usual thing.
>> So forget about constbufs, go with the INPUT thing. Which is great,
>> since we had one value left over in that (future) 2-bit field :)
>
> I can have a try at using constbufs for user inputs on Tesla. It's just
> that the blob was using shared for them, so we kept shared.

I suspect there are hw advantages to doing it the way it was being
done. We should keep doing that.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 11:00 AM, Ilia Mirkin  wrote:
> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede  wrote:
>>> But assuming I'm right, what I'm proposing is that instead of passing
>>> the input in as a global buffer, to instead pass it in as a const
>>> buffer. As such instead of sticking it into ->set_global_binding,
>>> you'd stick it into ->set_constant_buffer, and then you'll be able to
>>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
>>> it's 0.) You don't even have to load these, you can use them as args
>>> directly anywhere you like (except as indirect addresses).
>>>
>>> The old code would actually take the supplied inputs, stick them into
>>> a constbuf, and then lower RINPUT accesses to load from that constbuf.
>>> I'm suggesting we cut out the middleman.
>>>
>>> By the way, another term for "constant buffer" is "uniform buffer", on
>>> the off chance it helps. Basically it's super-cached by the shader for
>>> values that never change across shader invocations. [And there's
>>> special stuff in the hw to allow running multiple sets of shader
>>> invocations with different "constant" values... or so we think.]
>>
>>
>> I'm fine with using constant buffers for the input, it is not the
>> mechanism I'm worried about it is the tgsi syntax to express things,
>> I think it would be beneficial for the tgsi syntax to be abstract, and
>> not worry about the underlying mechanism, this will i.e. allow us
>> to use shared memory for input on tesla and const bufs on later generations
>> without the part generating the tgsi code needing to worry about this.
>
> Yeah, I think you're right. I didn't realize that tesla had a special
> form of input for user params, I assumed it was just the usual thing.
> So forget about constbufs, go with the INPUT thing. Which is great,
> since we had one value left over in that (future) 2-bit field :)
>
>>
>> ###
>>
>> Somewhat unrelated to the input problem, I'm also somewhat worried
>> about the addressing method for MEMORY type registers.
>>
>> Looking at the old RES stuff then the "index" passed into say a LOAD
>> was not as much an index as it was simply a 32 bit GPU virtual memory
>> address, which fits well with the OpenCL ways of doing things (the
>> register number as in the 55 in RES[55] was more or less ignored).
>>
>> Where as, e.g. the new BUFFER style "registers" the index really
>> is an index, e.g. doing:
>> LOAD TEMP[0].x, BUFFER[0], IMM[0]
>> resp.
>> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>>
>> Will read from a different memory address, correct ?
>
> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
> BUFFER[1] refers to the buffer at binding point 1. They might, in
> fact, overlap, or even be the same buffer. But the code doesn't know
> about that.
>
>>
>> So how will this work for MEMORY type registers ? For OpenCL having the
>> 1-dimensional behavior of RES really is quite useful, and having the
>> address be composed of a hidden base address which gets determined under
>> the hood from the register number, and then adding an index on top of
>> it does not fit so well.
>
> Not sure what the question is... you have code like
>
> int *foo = [pointer value from user input];
> *foo = *(foo + 5);
>
> right?
>
> So that'd just become
>
> MOV TEMP[0].x, 
> ADD TEMP[0].y, TEMP[0].x, 5 * 4
> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
>
> or perhaps I'm misunderstanding something?
>
> MEMORY, GLOBAL == the global virtual memory address space, not some
> specific buffer. Trying to load address 0 from it will likely lead to
> sadness, unless you happen to have something mapped there. BUFFER has
> an implied base address, based on the binding point, but MEMORY has no
> such thing.

Another way of looking at it is that instead of having the hacky
RES[12345] being hardcoded to mean something special, you now have a
dedicated file called 'MEMORY', which has identical semantics.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Pierre Moreau


- Mail original -
> On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede 
> wrote:
> >> But assuming I'm right, what I'm proposing is that instead of
> >> passing
> >> the input in as a global buffer, to instead pass it in as a const
> >> buffer. As such instead of sticking it into ->set_global_binding,
> >> you'd stick it into ->set_constant_buffer, and then you'll be able
> >> to
> >> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
> >> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim
> >> when
> >> it's 0.) You don't even have to load these, you can use them as
> >> args
> >> directly anywhere you like (except as indirect addresses).
> >>
> >> The old code would actually take the supplied inputs, stick them
> >> into
> >> a constbuf, and then lower RINPUT accesses to load from that
> >> constbuf.
> >> I'm suggesting we cut out the middleman.
> >>
> >> By the way, another term for "constant buffer" is "uniform
> >> buffer", on
> >> the off chance it helps. Basically it's super-cached by the shader
> >> for
> >> values that never change across shader invocations. [And there's
> >> special stuff in the hw to allow running multiple sets of shader
> >> invocations with different "constant" values... or so we think.]
> >
> >
> > I'm fine with using constant buffers for the input, it is not the
> > mechanism I'm worried about it is the tgsi syntax to express
> > things,
> > I think it would be beneficial for the tgsi syntax to be abstract,
> > and
> > not worry about the underlying mechanism, this will i.e. allow us
> > to use shared memory for input on tesla and const bufs on later
> > generations
> > without the part generating the tgsi code needing to worry about
> > this.
> 
> Yeah, I think you're right. I didn't realize that tesla had a special
> form of input for user params, I assumed it was just the usual thing.
> So forget about constbufs, go with the INPUT thing. Which is great,
> since we had one value left over in that (future) 2-bit field :)

I can have a try at using constbufs for user inputs on Tesla. It's just
that the blob was using shared for them, so we kept shared.

Pierre

> 
> >
> > ###
> >
> > Somewhat unrelated to the input problem, I'm also somewhat worried
> > about the addressing method for MEMORY type registers.
> >
> > Looking at the old RES stuff then the "index" passed into say a
> > LOAD
> > was not as much an index as it was simply a 32 bit GPU virtual
> > memory
> > address, which fits well with the OpenCL ways of doing things (the
> > register number as in the 55 in RES[55] was more or less ignored).
> >
> > Where as, e.g. the new BUFFER style "registers" the index really
> > is an index, e.g. doing:
> > LOAD TEMP[0].x, BUFFER[0], IMM[0]
> > resp.
> > LOAD TEMP[0].x, BUFFER[1], IMM[0]
> >
> > Will read from a different memory address, correct ?
> 
> Correct -- BUFFER[0] refers to the buffer at binding point 0, and
> BUFFER[1] refers to the buffer at binding point 1. They might, in
> fact, overlap, or even be the same buffer. But the code doesn't know
> about that.
> 
> >
> > So how will this work for MEMORY type registers ? For OpenCL having
> > the
> > 1-dimensional behavior of RES really is quite useful, and having
> > the
> > address be composed of a hidden base address which gets determined
> > under
> > the hood from the register number, and then adding an index on top
> > of
> > it does not fit so well.
> 
> Not sure what the question is... you have code like
> 
> int *foo = [pointer value from user input];
> *foo = *(foo + 5);
> 
> right?
> 
> So that'd just become
> 
> MOV TEMP[0].x, 
> ADD TEMP[0].y, TEMP[0].x, 5 * 4
> LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
> STORE MEMORY[0], TEMP[0].x, TEMP[1].x
> 
> or perhaps I'm misunderstanding something?
> 
> MEMORY, GLOBAL == the global virtual memory address space, not some
> specific buffer. Trying to load address 0 from it will likely lead to
> sadness, unless you happen to have something mapped there. BUFFER has
> an implied base address, based on the binding point, but MEMORY has
> no
> such thing.
> 
>   -ilia
> ___
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
> 
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 10:50 AM, Hans de Goede  wrote:
>> But assuming I'm right, what I'm proposing is that instead of passing
>> the input in as a global buffer, to instead pass it in as a const
>> buffer. As such instead of sticking it into ->set_global_binding,
>> you'd stick it into ->set_constant_buffer, and then you'll be able to
>> refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
>> CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
>> it's 0.) You don't even have to load these, you can use them as args
>> directly anywhere you like (except as indirect addresses).
>>
>> The old code would actually take the supplied inputs, stick them into
>> a constbuf, and then lower RINPUT accesses to load from that constbuf.
>> I'm suggesting we cut out the middleman.
>>
>> By the way, another term for "constant buffer" is "uniform buffer", on
>> the off chance it helps. Basically it's super-cached by the shader for
>> values that never change across shader invocations. [And there's
>> special stuff in the hw to allow running multiple sets of shader
>> invocations with different "constant" values... or so we think.]
>
>
> I'm fine with using constant buffers for the input, it is not the
> mechanism I'm worried about it is the tgsi syntax to express things,
> I think it would be beneficial for the tgsi syntax to be abstract, and
> not worry about the underlying mechanism, this will i.e. allow us
> to use shared memory for input on tesla and const bufs on later generations
> without the part generating the tgsi code needing to worry about this.

Yeah, I think you're right. I didn't realize that tesla had a special
form of input for user params, I assumed it was just the usual thing.
So forget about constbufs, go with the INPUT thing. Which is great,
since we had one value left over in that (future) 2-bit field :)

>
> ###
>
> Somewhat unrelated to the input problem, I'm also somewhat worried
> about the addressing method for MEMORY type registers.
>
> Looking at the old RES stuff then the "index" passed into say a LOAD
> was not as much an index as it was simply a 32 bit GPU virtual memory
> address, which fits well with the OpenCL ways of doing things (the
> register number as in the 55 in RES[55] was more or less ignored).
>
> Where as, e.g. the new BUFFER style "registers" the index really
> is an index, e.g. doing:
> LOAD TEMP[0].x, BUFFER[0], IMM[0]
> resp.
> LOAD TEMP[0].x, BUFFER[1], IMM[0]
>
> Will read from a different memory address, correct ?

Correct -- BUFFER[0] refers to the buffer at binding point 0, and
BUFFER[1] refers to the buffer at binding point 1. They might, in
fact, overlap, or even be the same buffer. But the code doesn't know
about that.

>
> So how will this work for MEMORY type registers ? For OpenCL having the
> 1-dimensional behavior of RES really is quite useful, and having the
> address be composed of a hidden base address which gets determined under
> the hood from the register number, and then adding an index on top of
> it does not fit so well.

Not sure what the question is... you have code like

int *foo = [pointer value from user input];
*foo = *(foo + 5);

right?

So that'd just become

MOV TEMP[0].x, 
ADD TEMP[0].y, TEMP[0].x, 5 * 4
LOAD TEMP[1].x, MEMORY[0] (which is global), TEMP[0].y
STORE MEMORY[0], TEMP[0].x, TEMP[1].x

or perhaps I'm misunderstanding something?

MEMORY, GLOBAL == the global virtual memory address space, not some
specific buffer. Trying to load address 0 from it will likely lead to
sadness, unless you happen to have something mapped there. BUFFER has
an implied base address, based on the binding point, but MEMORY has no
such thing.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 16:24, Ilia Mirkin wrote:

On Mon, Feb 22, 2016 at 9:50 AM, Hans de Goede  wrote:

Hi,


On 22-02-16 15:22, Ilia Mirkin wrote:


On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede 
wrote:


Hi,

On 22-02-16 14:47, Ilia Mirkin wrote:



On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin 
wrote:



INPUT is for shader inputs which come from fixed function loaders.
This is not what you want. You want CONST. Stick the input params into
a constbuf, and you're done.




Oh, and in case it's not clear, I think this should be done by the st,
not by the driver. Not a big fan of the current interface where the
driver is responsible for uploading the kernel input parameters.




Moving this to the state-tracker will likely break clover for amd
cards, also what is the right place to stick the input params my



Not really. Could just have a PIPE_CAP. Could make it part of the
whole TGSI logic in the first place. This functionality isn't used for
the OpenGL compute shader, and it'd be nice to bring OpenCL in line
with the OpenGL stuff.


differ from one gpu to the other, also the opencl -> tgsi compiler
will need to know how to "address" input params without it needing to
know too much details of the targetted gpu. So of INPUT is not suitable,
then I think we are going to need MEMORY[#], INPUT for this, which
nouveau can then just treat as CONST.



That doesn't make sense... MEMORY is for... memory. Perhaps there's
something I don't understand about kernel inputs that would make my
suggestion unworkable? The way I see it is that you stick them into a
user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to
your thing})), and then launch the grid. Your TGSI would have been
composed such that it would expect the user params to show up in the
first constbuf.



Keep in mind the flat address space issue opencl has, there is no
such thing as the first constbuf, there is a const address space,
which is shared by all the const buffers, and the kernel gets
passed in offsets in the single address space to find different
const buffers. I'm not saying that this cannot work with your suggestion,
but it is something to keep in mind.

I've not yet looked closely at const bufs, I've only been working with
global buffers so far, here is how I understand those to work:

-clover calls pipe->set_global_binding() before calling launch_grid()
-clover has set up the uint32_t **handles pointer array to point to
  one uint32_t in the buffer which it is going to pass as "input" to
  launch_grid for each global buffer
-In the tgsi code for the kernel I can get to the global buffer pointed
  to by the first uint32_t in the input data by doing:
  LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0


But inputs are somehow special in OpenCL, right? In other words, you
know when you want to load an input vs when you want to load
not-an-input, right?


Correct, the best way to view them is as them being in their own
address-space, which is why I suggested using MEMORY[#], INPUT.


And those inputs aren't in the flat, global
memory space, they're just a list of 32-bit values (or at least
expressible as such). If not, then ignore everything I've said and
I'll come up with another alternative.


More or less, the address-space is typically addressed with byte
addresses, so the second 32 bit value is addresses with address 4,
etc.


But assuming I'm right, what I'm proposing is that instead of passing
the input in as a global buffer, to instead pass it in as a const
buffer. As such instead of sticking it into ->set_global_binding,
you'd stick it into ->set_constant_buffer, and then you'll be able to
refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
it's 0.) You don't even have to load these, you can use them as args
directly anywhere you like (except as indirect addresses).

The old code would actually take the supplied inputs, stick them into
a constbuf, and then lower RINPUT accesses to load from that constbuf.
I'm suggesting we cut out the middleman.

By the way, another term for "constant buffer" is "uniform buffer", on
the off chance it helps. Basically it's super-cached by the shader for
values that never change across shader invocations. [And there's
special stuff in the hw to allow running multiple sets of shader
invocations with different "constant" values... or so we think.]


I'm fine with using constant buffers for the input, it is not the
mechanism I'm worried about it is the tgsi syntax to express things,
I think it would be beneficial for the tgsi syntax to be abstract, and
not worry about the underlying mechanism, this will i.e. allow us
to use shared memory for input on tesla and const bufs on later generations
without the part generating the tgsi code needing to worry about this.

###

Somewhat unrelated to the input problem, I'm also somewhat worried
about the addressing 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 10:15 AM, Pierre Moreau  wrote:
> Apart from Tesla which uses shared memory for passing the OpenCL user params, 
> Fermi+ use const memory for that, and this is what Samuel's patches are doing 
> (at least from what I remember).

Hmmm that puts a bit of a hole in my theory of "let's just use
constbufs". That said, constbufs *will* work on Tesla as well.

Well, in that case, perhaps a separate input memory *is* the right
thing to do? Ugh :( This definitely needs to be discussed with the
wider gallium/tgsi community.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 9:50 AM, Hans de Goede  wrote:
> Hi,
>
>
> On 22-02-16 15:22, Ilia Mirkin wrote:
>>
>> On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede 
>> wrote:
>>>
>>> Hi,
>>>
>>> On 22-02-16 14:47, Ilia Mirkin wrote:


 On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin 
 wrote:
>
>
> INPUT is for shader inputs which come from fixed function loaders.
> This is not what you want. You want CONST. Stick the input params into
> a constbuf, and you're done.



 Oh, and in case it's not clear, I think this should be done by the st,
 not by the driver. Not a big fan of the current interface where the
 driver is responsible for uploading the kernel input parameters.
>>>
>>>
>>>
>>> Moving this to the state-tracker will likely break clover for amd
>>> cards, also what is the right place to stick the input params my
>>
>>
>> Not really. Could just have a PIPE_CAP. Could make it part of the
>> whole TGSI logic in the first place. This functionality isn't used for
>> the OpenGL compute shader, and it'd be nice to bring OpenCL in line
>> with the OpenGL stuff.
>>
>>> differ from one gpu to the other, also the opencl -> tgsi compiler
>>> will need to know how to "address" input params without it needing to
>>> know too much details of the targetted gpu. So of INPUT is not suitable,
>>> then I think we are going to need MEMORY[#], INPUT for this, which
>>> nouveau can then just treat as CONST.
>>
>>
>> That doesn't make sense... MEMORY is for... memory. Perhaps there's
>> something I don't understand about kernel inputs that would make my
>> suggestion unworkable? The way I see it is that you stick them into a
>> user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to
>> your thing})), and then launch the grid. Your TGSI would have been
>> composed such that it would expect the user params to show up in the
>> first constbuf.
>
>
> Keep in mind the flat address space issue opencl has, there is no
> such thing as the first constbuf, there is a const address space,
> which is shared by all the const buffers, and the kernel gets
> passed in offsets in the single address space to find different
> const buffers. I'm not saying that this cannot work with your suggestion,
> but it is something to keep in mind.
>
> I've not yet looked closely at const bufs, I've only been working with
> global buffers so far, here is how I understand those to work:
>
> -clover calls pipe->set_global_binding() before calling launch_grid()
> -clover has set up the uint32_t **handles pointer array to point to
>  one uint32_t in the buffer which it is going to pass as "input" to
>  launch_grid for each global buffer
> -In the tgsi code for the kernel I can get to the global buffer pointed
>  to by the first uint32_t in the input data by doing:
>  LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0

But inputs are somehow special in OpenCL, right? In other words, you
know when you want to load an input vs when you want to load
not-an-input, right? And those inputs aren't in the flat, global
memory space, they're just a list of 32-bit values (or at least
expressible as such). If not, then ignore everything I've said and
I'll come up with another alternative.

But assuming I'm right, what I'm proposing is that instead of passing
the input in as a global buffer, to instead pass it in as a const
buffer. As such instead of sticking it into ->set_global_binding,
you'd stick it into ->set_constant_buffer, and then you'll be able to
refer to it as CONST[0], CONST[1], etc. (Which are, implicitly,
CONST[0][0], CONST[0][1], etc -- it doesn't print the second dim when
it's 0.) You don't even have to load these, you can use them as args
directly anywhere you like (except as indirect addresses).

The old code would actually take the supplied inputs, stick them into
a constbuf, and then lower RINPUT accesses to load from that constbuf.
I'm suggesting we cut out the middleman.

By the way, another term for "constant buffer" is "uniform buffer", on
the off chance it helps. Basically it's super-cached by the shader for
values that never change across shader invocations. [And there's
special stuff in the hw to allow running multiple sets of shader
invocations with different "constant" values... or so we think.]

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Pierre Moreau
Hello,

> On 22 Feb 2016, at 15:22, Ilia Mirkin  wrote:
> 
>> On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede  wrote:
>> Hi,
>> 
>>> On 22-02-16 14:47, Ilia Mirkin wrote:
>>> 
 On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin  wrote:
 
 INPUT is for shader inputs which come from fixed function loaders.
 This is not what you want. You want CONST. Stick the input params into
 a constbuf, and you're done.
>>> 
>>> 
>>> Oh, and in case it's not clear, I think this should be done by the st,
>>> not by the driver. Not a big fan of the current interface where the
>>> driver is responsible for uploading the kernel input parameters.
>> 
>> 
>> Moving this to the state-tracker will likely break clover for amd
>> cards, also what is the right place to stick the input params my
> 
> Not really. Could just have a PIPE_CAP. Could make it part of the
> whole TGSI logic in the first place. This functionality isn't used for
> the OpenGL compute shader, and it'd be nice to bring OpenCL in line
> with the OpenGL stuff.
> 
>> differ from one gpu to the other, also the opencl -> tgsi compiler
>> will need to know how to "address" input params without it needing to
>> know too much details of the targetted gpu. So of INPUT is not suitable,
>> then I think we are going to need MEMORY[#], INPUT for this, which
>> nouveau can then just treat as CONST.
> 
> That doesn't make sense... MEMORY is for... memory. Perhaps there's
> something I don't understand about kernel inputs that would make my
> suggestion unworkable? The way I see it is that you stick them into a
> user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to
> your thing})), and then launch the grid. Your TGSI would have been
> composed such that it would expect the user params to show up in the
> first constbuf.

Apart from Tesla which uses shared memory for passing the OpenCL user params, 
Fermi+ use const memory for that, and this is what Samuel's patches are doing 
(at least from what I remember). 

Regards,
Pierre

> 
> You *really* want to be using constbufs here -- they're designed for this.
> 
>  -ilia
> ___
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 15:42, Samuel Pitoiset wrote:

Well the pipe_loader stuff is buggy in compute.c, I can't even
create a screen object... That's sad. It fails in pipe_loader_probe() & co.


It works for me on a kepler1 (GT740), at least before the dropping of
the RES support some of the tests from compute.c worked.

Regards,

Hans
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 15:22, Ilia Mirkin wrote:

On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede  wrote:

Hi,

On 22-02-16 14:47, Ilia Mirkin wrote:


On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin  wrote:


INPUT is for shader inputs which come from fixed function loaders.
This is not what you want. You want CONST. Stick the input params into
a constbuf, and you're done.



Oh, and in case it's not clear, I think this should be done by the st,
not by the driver. Not a big fan of the current interface where the
driver is responsible for uploading the kernel input parameters.



Moving this to the state-tracker will likely break clover for amd
cards, also what is the right place to stick the input params my


Not really. Could just have a PIPE_CAP. Could make it part of the
whole TGSI logic in the first place. This functionality isn't used for
the OpenGL compute shader, and it'd be nice to bring OpenCL in line
with the OpenGL stuff.


differ from one gpu to the other, also the opencl -> tgsi compiler
will need to know how to "address" input params without it needing to
know too much details of the targetted gpu. So of INPUT is not suitable,
then I think we are going to need MEMORY[#], INPUT for this, which
nouveau can then just treat as CONST.


That doesn't make sense... MEMORY is for... memory. Perhaps there's
something I don't understand about kernel inputs that would make my
suggestion unworkable? The way I see it is that you stick them into a
user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to
your thing})), and then launch the grid. Your TGSI would have been
composed such that it would expect the user params to show up in the
first constbuf.


Keep in mind the flat address space issue opencl has, there is no
such thing as the first constbuf, there is a const address space,
which is shared by all the const buffers, and the kernel gets
passed in offsets in the single address space to find different
const buffers. I'm not saying that this cannot work with your suggestion,
but it is something to keep in mind.

I've not yet looked closely at const bufs, I've only been working with
global buffers so far, here is how I understand those to work:

-clover calls pipe->set_global_binding() before calling launch_grid()
-clover has set up the uint32_t **handles pointer array to point to
 one uint32_t in the buffer which it is going to pass as "input" to
 launch_grid for each global buffer
-In the tgsi code for the kernel I can get to the global buffer pointed
 to by the first uint32_t in the input data by doing:
 LOAD TEMP[#].x, RES[#], IMM[0] # IMM[0] == 0

And then using TEMP[#].x as offset into RES[#]

If I understand your proposal correctly then you're suggestion that
getting the offset would become:

 LOAD TEMP[#].x, CONST[0], IMM[0] # IMM[0] == 0

That would work fine for me, the question then becomes what do we do
with const input parameters ? use CONST[1] for those ?

It is also not entirely clear to me yet how things are going to
work with regards to the handles returned by pipe->set_global_binding()
with RES these are offsets which can be used directly to address
the "RES" address-space. I assume that the intent with
the MEMORY register file is that each global buffer gets its own
MEMORY[#] and that to address say the first byte of the second buffer
allocated one would use:

LOAD TEMP[#].x, MEMORY[1], IMM[0] # IMM[0] == 0

But I cannot do that in OpenCL generated TGSI, as at the
tgsi llvm backend level I've no idea about which global buffer it is,
I only get a base-offset for the global buffer and all global buffers
are assumed to be in one memory space, e.g. I will always be using
MEMORY[0] for all the global buffers.

Does this mean that the state-tracker now needs to figure out the size
of all global buffers combined, and then request a single global memory
pool and set the offsets which it passes as input to the kernel itself,
dividing that memory pool into multiple buffers itself ?

Or will there still be a way to pass "flat" addresses in tgsi ?

Regards,

Hans

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset

Well the pipe_loader stuff is buggy in compute.c, I can't even
create a screen object... That's sad. It fails in pipe_loader_probe() & co.

On 02/22/2016 02:08 PM, Hans de Goede wrote:

Hi,

On 22-02-16 14:04, Samuel Pitoiset wrote:


On 02/22/2016 01:46 PM, Hans de Goede wrote:

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:





So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, RGLOBAL, TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE RGLOBAL.x, TEMP[1]., TEMP[1]
RET
ENDSUB


Where by RINPUT and RGLOBAL get replaces by processing the
code with cpp and the following defines:

#define RGLOBALRES[32767]
#define RLOCAL RES[32766]
#define RPRIVATE   RES[32765]
#define RINPUT RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1]., TEMP[1]
RET
ENDSUB


Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences to RES because it's not longer supported. In my
opinion, using BUFFER[0] in a first time should work. Currently, only
SHARED with MEMORY is supported.


Right, as I say below "This only solves the accessing of the global
memory, it does not solve
getting to the kernel input kernel parameters"


This assumes, that as discussed declaring memory without a , SHARED or
other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as
intended
to you?

2) This only solves the accessing of the global memory, it does not
solve
getting to the kernel input kernel parameters, how would I deal with
those ?


The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the
pipe_grid_info::input array with your parameters and they will be
uploaded by nvXX_compute_upload_input().


Right, the uploading side I understand, the question is how to get to
them from
the compute kernel's tgsi code ?


Right, I wonder if there is already a DECL INPUT or something like
that for input parameters of shaders. Oh yeah, there is
TGSI_FILE_INPUT, maybe this is what you want?


Yes that sounds right, so now "all" we need to do is make
nvXX_compute_upload_input() and TGSI_FILE_INPUT work together.

Regards,

Hans


--
-Samuel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 9:17 AM, Hans de Goede  wrote:
> Hi,
>
> On 22-02-16 14:47, Ilia Mirkin wrote:
>>
>> On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin  wrote:
>>>
>>> INPUT is for shader inputs which come from fixed function loaders.
>>> This is not what you want. You want CONST. Stick the input params into
>>> a constbuf, and you're done.
>>
>>
>> Oh, and in case it's not clear, I think this should be done by the st,
>> not by the driver. Not a big fan of the current interface where the
>> driver is responsible for uploading the kernel input parameters.
>
>
> Moving this to the state-tracker will likely break clover for amd
> cards, also what is the right place to stick the input params my

Not really. Could just have a PIPE_CAP. Could make it part of the
whole TGSI logic in the first place. This functionality isn't used for
the OpenGL compute shader, and it'd be nice to bring OpenCL in line
with the OpenGL stuff.

> differ from one gpu to the other, also the opencl -> tgsi compiler
> will need to know how to "address" input params without it needing to
> know too much details of the targetted gpu. So of INPUT is not suitable,
> then I think we are going to need MEMORY[#], INPUT for this, which
> nouveau can then just treat as CONST.

That doesn't make sense... MEMORY is for... memory. Perhaps there's
something I don't understand about kernel inputs that would make my
suggestion unworkable? The way I see it is that you stick them into a
user constbuf (i.e. pipe->set_constant_buffer({cb.user = pointer to
your thing})), and then launch the grid. Your TGSI would have been
composed such that it would expect the user params to show up in the
first constbuf.

You *really* want to be using constbufs here -- they're designed for this.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 14:47, Ilia Mirkin wrote:

On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin  wrote:

INPUT is for shader inputs which come from fixed function loaders.
This is not what you want. You want CONST. Stick the input params into
a constbuf, and you're done.


Oh, and in case it's not clear, I think this should be done by the st,
not by the driver. Not a big fan of the current interface where the
driver is responsible for uploading the kernel input parameters.


Moving this to the state-tracker will likely break clover for amd
cards, also what is the right place to stick the input params my
differ from one gpu to the other, also the opencl -> tgsi compiler
will need to know how to "address" input params without it needing to
know too much details of the targetted gpu. So of INPUT is not suitable,
then I think we are going to need MEMORY[#], INPUT for this, which
nouveau can then just treat as CONST.

Regards,

Hans
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Ilia Mirkin
On Mon, Feb 22, 2016 at 8:45 AM, Ilia Mirkin  wrote:
> INPUT is for shader inputs which come from fixed function loaders.
> This is not what you want. You want CONST. Stick the input params into
> a constbuf, and you're done.

Oh, and in case it's not clear, I think this should be done by the st,
not by the driver. Not a big fan of the current interface where the
driver is responsible for uploading the kernel input parameters.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 22-02-16 14:04, Samuel Pitoiset wrote:


On 02/22/2016 01:46 PM, Hans de Goede wrote:

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:





So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, RGLOBAL, TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE RGLOBAL.x, TEMP[1]., TEMP[1]
RET
ENDSUB


Where by RINPUT and RGLOBAL get replaces by processing the
code with cpp and the following defines:

#define RGLOBALRES[32767]
#define RLOCAL RES[32766]
#define RPRIVATE   RES[32765]
#define RINPUT RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1]., TEMP[1]
RET
ENDSUB


Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences to RES because it's not longer supported. In my
opinion, using BUFFER[0] in a first time should work. Currently, only
SHARED with MEMORY is supported.


Right, as I say below "This only solves the accessing of the global
memory, it does not solve
getting to the kernel input kernel parameters"


This assumes, that as discussed declaring memory without a , SHARED or
other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as
intended
to you?

2) This only solves the accessing of the global memory, it does not
solve
getting to the kernel input kernel parameters, how would I deal with
those ?


The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the
pipe_grid_info::input array with your parameters and they will be
uploaded by nvXX_compute_upload_input().


Right, the uploading side I understand, the question is how to get to
them from
the compute kernel's tgsi code ?


Right, I wonder if there is already a DECL INPUT or something like that for 
input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, maybe this is 
what you want?


Yes that sounds right, so now "all" we need to do is make
nvXX_compute_upload_input() and TGSI_FILE_INPUT work together.

Regards,

Hans
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset



On 02/22/2016 01:46 PM, Hans de Goede wrote:

Hi,

On 22-02-16 13:41, Samuel Pitoiset wrote:

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:





So back to the problem of getting OpenCL(ish) code to work again with
the recent mesa changes. For starters I would like to get:

src/gallium/tests/trivial/compute.c and then the test with mask 8,
test_input_global() to work again, when that is working I should be
able to adjust my llvm work (and if necessary clover) to start to
work again.

Currently the test_input_global() test uses the following bit of
TGSI code:

COMP
DCL SV[0], THREAD_ID[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, RGLOBAL, TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE RGLOBAL.x, TEMP[1]., TEMP[1]
RET
ENDSUB


Where by RINPUT and RGLOBAL get replaces by processing the
code with cpp and the following defines:

#define RGLOBALRES[32767]
#define RLOCAL RES[32766]
#define RPRIVATE   RES[32765]
#define RINPUT RES[32764]

If I understand how memory is supposed to work, then I would need to
change the TGSI as follows:

COMP
DCL SV[0], THREAD_ID[0]
DCL MEMORY[0]
DCL TEMP[0], LOCAL
DCL TEMP[1], LOCAL
IMM UINT32 { 8, 0, 0, 0 }

BGNSUB\n"
UMUL TEMP[0], SV[0], IMM[0]
LOAD TEMP[1].xy, RINPUT, TEMP[0]
LOAD TEMP[0].x, MEMORY[0], TEMP[1].
UADD TEMP[1].x, TEMP[0], -TEMP[1]
STORE MEMORY[0].x, TEMP[1]., TEMP[1]
RET
ENDSUB


Nope, this won't work because RINPUT is RES[32764]. And you have to
remove all occurrences to RES because it's not longer supported. In my
opinion, using BUFFER[0] in a first time should work. Currently, only
SHARED with MEMORY is supported.


Right, as I say below "This only solves the accessing of the global
memory, it does not solve
getting to the kernel input kernel parameters"


This assumes, that as discussed declaring memory without a , SHARED or
other
flag means the memory is global.

So 2 questions:

1) Do the above changes for using the new MEMORY keyword look as
intended
to you?

2) This only solves the accessing of the global memory, it does not
solve
getting to the kernel input kernel parameters, how would I deal with
those ?


The input kernel parameters are directly passed through a call to
pipe_context::launch_grid. You just have to fill the
pipe_grid_info::input array with your parameters and they will be
uploaded by nvXX_compute_upload_input().


Right, the uploading side I understand, the question is how to get to
them from
the compute kernel's tgsi code ?


Right, I wonder if there is already a DECL INPUT or something like that 
for input parameters of shaders. Oh yeah, there is TGSI_FILE_INPUT, 
maybe this is what you want?




If I understand you correctly you are suggesting to use BUFFER[0] for this,
that is fine from a nouveau point-of-view, but might be a bit nouveau
centric way of looking at things, I think a better approach would be
a separate input register-file for this, as that will be more flexible
when people try to do opencl via clang->llvm->tgsi on other GPUs.


I will have a look at the test_input_global().


Thanks!

Regards,

Hans


--
-Samuel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Samuel Pitoiset

Hi there,

On 02/22/2016 12:26 PM, Hans de Goede wrote:

Hi,

On 19-02-16 20:43, Ilia Mirkin wrote:

On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede 
wrote:

Hi,

On 18-02-16 17:39, Ilia Mirkin wrote:


On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede 
wrote:


But this does not seem to be hooked up yet for nouveau.



Samuel has patches. See
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3



Cool, I will take a look at those.


So some questions:
-The commit by Samual says:
   This introduces TGSI_FILE_MEMORY for shared, global and local
memory.
Only
shared memory is currently supported.

   The commit introduces MEMORY[x] and MEMORY[x],SHARED so in
reality it
also
introduces
   a second option next to shared, so what are we going to use plain
MEMORY[x]
for?
   I suggest using it for global memory but we need to be in
agreement on
this.



That sounds fine to me. However what I had in mind was switching the
SHARED field into a 2-bit field and making it

1 = SHARED
2 = GLOBAL
3 = LOCAL

(since for OpenCL you also need to be able to address local or private
memory). I sorta wanted Samuel to do it, but since I had no idea where
you were at, or if you were even still working on this, I figured it
should be fixed up by the first person who needed it.



Sounds good, only the naming is somewhat unfortunate since opencl uses
different
naming. I.e. it has no "shared"


Sad. Well, "shared" is what OpenGL compute shaders use, which is why I
proposed it.



OpenCL has:
-global:  accessible by all worker-groups as well as by the CPU
-const:   read-only global
-local:   shared by worker-items in the same worker-group, not shared
between worker-groups
-private: accessible only to a single worker-item

So how do these map to the TGSI:


1 = SHARED
2 = GLOBAL
3 = LOCAL


OpenCL global = TGSI global
OpenCL const = TGSI global
OpenCL local = TGSI shared
OpenCL private = TGSI local

Not sure what the distinction is between OpenCL const and global is.
If the const stuff is actually just user-supplied uniforms (and
doesn't need to be in a particular place in memory), then those should
go into TGSI CONST somehow.


AFAIK OpenCL const is really read-only global, so the data is filled
in by the CPU, then passed to the opencl-kernel running on the GPU
where all worker-items have access to it. I think that TGSI CONST might
indeed be usable for this, but it is probably easiest to treat
it as GLOBAL for now.


-What about kernel input parameters, so far these have been using
RES[32764]
   I must admit that I do not understand where the file_index of 32764
comes
   from (or where any of the file indexes come from for
src/gallium/tests/trivial/compute.c ?
   I have the feeling that these are not used at all, and everything
simply
goes
   to a flat (virtual) memory space, with the params at address 0,
correct
?



It was never particularly well-specified, which was one of the reasons
it went away. It also didn't map nicely onto the OpenGL model. There
is a remaining question of how to do addressing in memory... there's
40 bits of address space. Should these implicitly be U64
(dual-component in TGSI) addresses that are passed around? Not sure
what the OpenCL position on all this is.



So far I've been using U32 for addresses as that is what Francisco's
original
code was using. And this also is what things like the tgsi LOAD
instruction
take. If you're doing a LOAD on a 1d buffer then you will use
TEMP[#].x to
specify the index, and the way how this currently works with OpenCL
is that
clCreateBuffer() will return a cl_mem type which then gets passed into
the kernel as input parameter and gets treated as a pointer by the
compiler,
so e.g. global mem gets treated as a single address space even if there
are multiple global buffers and TEMP[#].x contains the value passed in
via cl_mem as start offset for the buffer + the index into the buffer.

So this means that currently we are limited to U32 since TEMP[#].x is
only 32 bits wide. Internally 40 bits addresses can and should probably
be used so that at least the different memory spaces each have the full
32 bits available.

Note that we could fix this by adding some sort of LOAD64 opcode, which
uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
how this would work for 3d buffers though. I foresee the llvm backend
eventually getting a 64 bit mode where it will use 64 bits for all
pointers and use something like a LOAD64 opcode to indicate that
the indexes (which it effectively uses as addresses / pointers)
are 64 bit wide.


Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
src/dest, so you don't need to worry about 3d or anything like that. I
believe this is a good solution to the problem.


Right for MEMORY this will work fine. I've no clue yet how images will
work with OpenCL though, hopefully we can avoid the one flat address
space thing there, but I simply don't know yet.


Also it would 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-22 Thread Hans de Goede

Hi,

On 19-02-16 20:43, Ilia Mirkin wrote:

On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede  wrote:

Hi,

On 18-02-16 17:39, Ilia Mirkin wrote:


On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede 
wrote:


But this does not seem to be hooked up yet for nouveau.



Samuel has patches. See
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3



Cool, I will take a look at those.


So some questions:
-The commit by Samual says:
   This introduces TGSI_FILE_MEMORY for shared, global and local memory.
Only
shared memory is currently supported.

   The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it
also
introduces
   a second option next to shared, so what are we going to use plain
MEMORY[x]
for?
   I suggest using it for global memory but we need to be in agreement on
this.



That sounds fine to me. However what I had in mind was switching the
SHARED field into a 2-bit field and making it

1 = SHARED
2 = GLOBAL
3 = LOCAL

(since for OpenCL you also need to be able to address local or private
memory). I sorta wanted Samuel to do it, but since I had no idea where
you were at, or if you were even still working on this, I figured it
should be fixed up by the first person who needed it.



Sounds good, only the naming is somewhat unfortunate since opencl uses
different
naming. I.e. it has no "shared"


Sad. Well, "shared" is what OpenGL compute shaders use, which is why I
proposed it.



OpenCL has:
-global:  accessible by all worker-groups as well as by the CPU
-const:   read-only global
-local:   shared by worker-items in the same worker-group, not shared
between worker-groups
-private: accessible only to a single worker-item

So how do these map to the TGSI:


1 = SHARED
2 = GLOBAL
3 = LOCAL


OpenCL global = TGSI global
OpenCL const = TGSI global
OpenCL local = TGSI shared
OpenCL private = TGSI local

Not sure what the distinction is between OpenCL const and global is.
If the const stuff is actually just user-supplied uniforms (and
doesn't need to be in a particular place in memory), then those should
go into TGSI CONST somehow.


AFAIK OpenCL const is really read-only global, so the data is filled
in by the CPU, then passed to the opencl-kernel running on the GPU
where all worker-items have access to it. I think that TGSI CONST might
indeed be usable for this, but it is probably easiest to treat
it as GLOBAL for now.


-What about kernel input parameters, so far these have been using
RES[32764]
   I must admit that I do not understand where the file_index of 32764
comes
   from (or where any of the file indexes come from for
src/gallium/tests/trivial/compute.c ?
   I have the feeling that these are not used at all, and everything
simply
goes
   to a flat (virtual) memory space, with the params at address 0, correct
?



It was never particularly well-specified, which was one of the reasons
it went away. It also didn't map nicely onto the OpenGL model. There
is a remaining question of how to do addressing in memory... there's
40 bits of address space. Should these implicitly be U64
(dual-component in TGSI) addresses that are passed around? Not sure
what the OpenCL position on all this is.



So far I've been using U32 for addresses as that is what Francisco's
original
code was using. And this also is what things like the tgsi LOAD instruction
take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to
specify the index, and the way how this currently works with OpenCL is that
clCreateBuffer() will return a cl_mem type which then gets passed into
the kernel as input parameter and gets treated as a pointer by the compiler,
so e.g. global mem gets treated as a single address space even if there
are multiple global buffers and TEMP[#].x contains the value passed in
via cl_mem as start offset for the buffer + the index into the buffer.

So this means that currently we are limited to U32 since TEMP[#].x is
only 32 bits wide. Internally 40 bits addresses can and should probably
be used so that at least the different memory spaces each have the full
32 bits available.

Note that we could fix this by adding some sort of LOAD64 opcode, which
uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
how this would work for 3d buffers though. I foresee the llvm backend
eventually getting a 64 bit mode where it will use 64 bits for all
pointers and use something like a LOAD64 opcode to indicate that
the indexes (which it effectively uses as addresses / pointers)
are 64 bit wide.


Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
src/dest, so you don't need to worry about 3d or anything like that. I
believe this is a good solution to the problem.


Right for MEMORY this will work fine. I've no clue yet how images will
work with OpenCL though, hopefully we can avoid the one flat address
space thing there, but I simply don't know yet.


Also it would be highly preferable to avoid using GLOBAL at all in the

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-19 Thread Ilia Mirkin
On Fri, Feb 19, 2016 at 5:36 AM, Hans de Goede  wrote:
> Hi,
>
> On 18-02-16 17:39, Ilia Mirkin wrote:
>>
>> On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede 
>> wrote:
>>>
>>> But this does not seem to be hooked up yet for nouveau.
>>
>>
>> Samuel has patches. See
>> https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3
>
>
> Cool, I will take a look at those.
>
>>> So some questions:
>>> -The commit by Samual says:
>>>   This introduces TGSI_FILE_MEMORY for shared, global and local memory.
>>> Only
>>> shared memory is currently supported.
>>>
>>>   The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it
>>> also
>>> introduces
>>>   a second option next to shared, so what are we going to use plain
>>> MEMORY[x]
>>> for?
>>>   I suggest using it for global memory but we need to be in agreement on
>>> this.
>>
>>
>> That sounds fine to me. However what I had in mind was switching the
>> SHARED field into a 2-bit field and making it
>>
>> 1 = SHARED
>> 2 = GLOBAL
>> 3 = LOCAL
>>
>> (since for OpenCL you also need to be able to address local or private
>> memory). I sorta wanted Samuel to do it, but since I had no idea where
>> you were at, or if you were even still working on this, I figured it
>> should be fixed up by the first person who needed it.
>
>
> Sounds good, only the naming is somewhat unfortunate since opencl uses
> different
> naming. I.e. it has no "shared"

Sad. Well, "shared" is what OpenGL compute shaders use, which is why I
proposed it.

>
> OpenCL has:
> -global:  accessible by all worker-groups as well as by the CPU
> -const:   read-only global
> -local:   shared by worker-items in the same worker-group, not shared
> between worker-groups
> -private: accessible only to a single worker-item
>
> So how do these map to the TGSI:
>
>> 1 = SHARED
>> 2 = GLOBAL
>> 3 = LOCAL

OpenCL global = TGSI global
OpenCL const = TGSI global
OpenCL local = TGSI shared
OpenCL private = TGSI local

Not sure what the distinction is between OpenCL const and global is.
If the const stuff is actually just user-supplied uniforms (and
doesn't need to be in a particular place in memory), then those should
go into TGSI CONST somehow.

>>
>>>
>>> -What about kernel input parameters, so far these have been using
>>> RES[32764]
>>>   I must admit that I do not understand where the file_index of 32764
>>> comes
>>>   from (or where any of the file indexes come from for
>>> src/gallium/tests/trivial/compute.c ?
>>>   I have the feeling that these are not used at all, and everything
>>> simply
>>> goes
>>>   to a flat (virtual) memory space, with the params at address 0, correct
>>> ?
>>
>>
>> It was never particularly well-specified, which was one of the reasons
>> it went away. It also didn't map nicely onto the OpenGL model. There
>> is a remaining question of how to do addressing in memory... there's
>> 40 bits of address space. Should these implicitly be U64
>> (dual-component in TGSI) addresses that are passed around? Not sure
>> what the OpenCL position on all this is.
>
>
> So far I've been using U32 for addresses as that is what Francisco's
> original
> code was using. And this also is what things like the tgsi LOAD instruction
> take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to
> specify the index, and the way how this currently works with OpenCL is that
> clCreateBuffer() will return a cl_mem type which then gets passed into
> the kernel as input parameter and gets treated as a pointer by the compiler,
> so e.g. global mem gets treated as a single address space even if there
> are multiple global buffers and TEMP[#].x contains the value passed in
> via cl_mem as start offset for the buffer + the index into the buffer.
>
> So this means that currently we are limited to U32 since TEMP[#].x is
> only 32 bits wide. Internally 40 bits addresses can and should probably
> be used so that at least the different memory spaces each have the full
> 32 bits available.
>
> Note that we could fix this by adding some sort of LOAD64 opcode, which
> uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
> how this would work for 3d buffers though. I foresee the llvm backend
> eventually getting a 64 bit mode where it will use 64 bits for all
> pointers and use something like a LOAD64 opcode to indicate that
> the indexes (which it effectively uses as addresses / pointers)
> are 64 bit wide.

Well, this LOAD64/STORE64 would just only be defined for MEMORY[]
src/dest, so you don't need to worry about 3d or anything like that. I
believe this is a good solution to the problem.

>
>> Also it would be highly preferable to avoid using GLOBAL at all in the
>> first place, and just use BUFFER[] or IMAGE[] or whatever. However
>> after having a look at how OpenCL works, I don't think that's
>> possible. But perhaps I missed something?
>
>
> OpenCL is very C-like and as such uses a flat address space per memory
> space, getting 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-19 Thread Hans de Goede

Hi,

On 18-02-16 17:39, Ilia Mirkin wrote:

On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede  wrote:

But this does not seem to be hooked up yet for nouveau.


Samuel has patches. See
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3


Cool, I will take a look at those.


So some questions:
-The commit by Samual says:
  This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only
shared memory is currently supported.

  The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also
introduces
  a second option next to shared, so what are we going to use plain MEMORY[x]
for?
  I suggest using it for global memory but we need to be in agreement on
this.


That sounds fine to me. However what I had in mind was switching the
SHARED field into a 2-bit field and making it

1 = SHARED
2 = GLOBAL
3 = LOCAL

(since for OpenCL you also need to be able to address local or private
memory). I sorta wanted Samuel to do it, but since I had no idea where
you were at, or if you were even still working on this, I figured it
should be fixed up by the first person who needed it.


Sounds good, only the naming is somewhat unfortunate since opencl uses different
naming. I.e. it has no "shared"

OpenCL has:
-global:  accessible by all worker-groups as well as by the CPU
-const:   read-only global
-local:   shared by worker-items in the same worker-group, not shared between 
worker-groups
-private: accessible only to a single worker-item

So how do these map to the TGSI:

> 1 = SHARED
> 2 = GLOBAL
> 3 = LOCAL

?






-What about kernel input parameters, so far these have been using RES[32764]
  I must admit that I do not understand where the file_index of 32764 comes
  from (or where any of the file indexes come from for
src/gallium/tests/trivial/compute.c ?
  I have the feeling that these are not used at all, and everything simply
goes
  to a flat (virtual) memory space, with the params at address 0, correct ?


It was never particularly well-specified, which was one of the reasons
it went away. It also didn't map nicely onto the OpenGL model. There
is a remaining question of how to do addressing in memory... there's
40 bits of address space. Should these implicitly be U64
(dual-component in TGSI) addresses that are passed around? Not sure
what the OpenCL position on all this is.


So far I've been using U32 for addresses as that is what Francisco's original
code was using. And this also is what things like the tgsi LOAD instruction
take. If you're doing a LOAD on a 1d buffer then you will use TEMP[#].x to
specify the index, and the way how this currently works with OpenCL is that
clCreateBuffer() will return a cl_mem type which then gets passed into
the kernel as input parameter and gets treated as a pointer by the compiler,
so e.g. global mem gets treated as a single address space even if there
are multiple global buffers and TEMP[#].x contains the value passed in
via cl_mem as start offset for the buffer + the index into the buffer.

So this means that currently we are limited to U32 since TEMP[#].x is
only 32 bits wide. Internally 40 bits addresses can and should probably
be used so that at least the different memory spaces each have the full
32 bits available.

Note that we could fix this by adding some sort of LOAD64 opcode, which
uses TEMP[#].x and TEMP[#].y as address for 1d buffers, I'm not sure
how this would work for 3d buffers though. I foresee the llvm backend
eventually getting a 64 bit mode where it will use 64 bits for all
pointers and use something like a LOAD64 opcode to indicate that
the indexes (which it effectively uses as addresses / pointers)
are 64 bit wide.


Also it would be highly preferable to avoid using GLOBAL at all in the
first place, and just use BUFFER[] or IMAGE[] or whatever. However
after having a look at how OpenCL works, I don't think that's
possible. But perhaps I missed something?


OpenCL is very C-like and as such uses a flat address space per memory
space, getting around that is going to be very tricky, if possible at
all.


  This seems like a nvidia implementation detail though, and we should IMHO
still
  implement a specific TGSI register file for accessing kernel input params,
  agreed ?


I believe that kernel input params should be passed in via user
uniforms. i.e. CONST. It doesn't have to be a separate UBO, it can use
the user buffer thing, which is basically what it is now.



If we can agree on the TGSI syntax for all this, then I'm pretty sure I can
hack
something together to get things to work with the latest master again, and
then we
can use this as a starting point for properly implementing this.

Regards,

Hans


p.s.

I must say I was somewhat unpleasantly surprised by the RES breakage, yes I
know
some changes were coming but still. In the future when my llvm->tgsi work is
more
stable and I'll be pushing it to llvm upstream we really cannot make
breaking tgsi
changes like this.


I thought I had 

Re: [Nouveau] Dealing with opencl kernel parameters in nouveau now that RES support is gone

2016-02-18 Thread Ilia Mirkin
On Thu, Feb 18, 2016 at 9:45 AM, Hans de Goede  wrote:
> But this does not seem to be hooked up yet for nouveau.

Samuel has patches. See
https://cgit.freedesktop.org/~hakzsam/mesa/log/?h=arb_compute_shader_v3

>
> So some questions:
> -The commit by Samual says:
>  This introduces TGSI_FILE_MEMORY for shared, global and local memory. Only
> shared memory is currently supported.
>
>  The commit introduces MEMORY[x] and MEMORY[x],SHARED so in reality it also
> introduces
>  a second option next to shared, so what are we going to use plain MEMORY[x]
> for?
>  I suggest using it for global memory but we need to be in agreement on
> this.

That sounds fine to me. However what I had in mind was switching the
SHARED field into a 2-bit field and making it

1 = SHARED
2 = GLOBAL
3 = LOCAL

(since for OpenCL you also need to be able to address local or private
memory). I sorta wanted Samuel to do it, but since I had no idea where
you were at, or if you were even still working on this, I figured it
should be fixed up by the first person who needed it.

>
> -What about kernel input parameters, so far these have been using RES[32764]
>  I must admit that I do not understand where the file_index of 32764 comes
>  from (or where any of the file indexes come from for
> src/gallium/tests/trivial/compute.c ?
>  I have the feeling that these are not used at all, and everything simply
> goes
>  to a flat (virtual) memory space, with the params at address 0, correct ?

It was never particularly well-specified, which was one of the reasons
it went away. It also didn't map nicely onto the OpenGL model. There
is a remaining question of how to do addressing in memory... there's
40 bits of address space. Should these implicitly be U64
(dual-component in TGSI) addresses that are passed around? Not sure
what the OpenCL position on all this is.

Also it would be highly preferable to avoid using GLOBAL at all in the
first place, and just use BUFFER[] or IMAGE[] or whatever. However
after having a look at how OpenCL works, I don't think that's
possible. But perhaps I missed something?

>
>  This seems like a nvidia implementation detail though, and we should IMHO
> still
>  implement a specific TGSI register file for accessing kernel input params,
>  agreed ?

I believe that kernel input params should be passed in via user
uniforms. i.e. CONST. It doesn't have to be a separate UBO, it can use
the user buffer thing, which is basically what it is now.

>
> If we can agree on the TGSI syntax for all this, then I'm pretty sure I can
> hack
> something together to get things to work with the latest master again, and
> then we
> can use this as a starting point for properly implementing this.
>
> Regards,
>
> Hans
>
>
> p.s.
>
> I must say I was somewhat unpleasantly surprised by the RES breakage, yes I
> know
> some changes were coming but still. In the future when my llvm->tgsi work is
> more
> stable and I'll be pushing it to llvm upstream we really cannot make
> breaking tgsi
> changes like this.

I thought I had sufficiently explained to you that the resource thing
was going away -- it was never spec'd, and there were no users.
Perhaps I didn't make it clear enough though -- sorry. The reality is
that TGSI is a living thing, and things change every so often. I don't
think people would be comfortable with locking down TGSI in such a
way.

  -ilia
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau