Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-07-17 Thread Matt Arsenault


> On Jun 28, 2019, at 19:33, Jan Vesely  wrote:
> 
> On Fri, Jun 21, 2019 at 4:12 PM Matt Arsenault  wrote:
>> 
>> 
>> 
>> On Jun 20, 2019, at 2:38 PM, Jan Vesely  wrote:
>> 
>> sorry, I'm running against deadlines and traveling this week.
>> Does the timeout patch work as expected in failure path?
>> 
>> 
>> It seems to not work. I’m able to manually interrupt it still, but the 
>> timeout never triggers
> 
> Can you check running:
> python3 ./piglit run tests/cl.py -t clobbers results/foo
> ?
> 
> Other than that, LGTM.
> 
> Jan

It is actually working for me

-Matt
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-06-21 Thread Matt Arsenault


> On Jun 20, 2019, at 2:38 PM, Jan Vesely  wrote:
> 
> sorry, I'm running against deadlines and traveling this week.
> Does the timeout patch work as expected in failure path?

It seems to not work. I’m able to manually interrupt it still, but the timeout 
never triggers___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-06-18 Thread Matt Arsenault


> On Jun 5, 2019, at 10:05 PM, Jan Vesely  wrote:
> 
> On Wed, 2019-06-05 at 17:48 -0400, Matt Arsenault wrote:
>>> On Jun 3, 2019, at 12:46 PM, Jan Vesely  wrote:
>>> 
>>> Does rocm do anything special other than using compute rings?
>>> What was the HW you tested?
>> I don’t think so. This is on gfx900
>> 
>> 
>>> I checked that raven can reboot after gpu hangs/crashes (not suspend,
>>> but that's probably one of many raven problems). I'd like to check
>>> carrizo/iceland too, as that's the machine that get accessed remotely.
>>> 
>>> Other than that I think it's OK to just put default 30s timeout on all
>>> CL tests, even debug build of LLVM shouldn't need more than that.
>> 
>> Do you know where this goes?
> 
> The test base class has a timeout attribute, and the invocation uses
> self.timoute. I think just adding self.timeout = 30 to PiglitCLTest
> constructor should suffice, adding a timeout attribute to the
> PiglitCLTest class might work as well.
> 
> Dylan, is there a preferred way to do this? will the commandline --
> timeout take precedence?
> 
> Jan
> 
> -- 
> Jan Vesely 

ping
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-06-05 Thread Matt Arsenault


> On Jun 3, 2019, at 12:46 PM, Jan Vesely  wrote:
> 
> Does rocm do anything special other than using compute rings?
> What was the HW you tested?
I don’t think so. This is on gfx900


> I checked that raven can reboot after gpu hangs/crashes (not suspend,
> but that's probably one of many raven problems). I'd like to check
> carrizo/iceland too, as that's the machine that get accessed remotely.
> 
> Other than that I think it's OK to just put default 30s timeout on all
> CL tests, even debug build of LLVM shouldn't need more than that.


Do you know where this goes?___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-06-03 Thread Matt Arsenault


> On Jun 2, 2019, at 2:36 PM, Jan Vesely  wrote:
> 
> he other problem is that even with killed process hung GPU usually
> makes the machine unable to suspend or reboot on its own, which kills
> remote testing.
> I'd need to recheck if that's still the case with linux-5.1.

I was able to interrupt the process normally and everything worked OK with rocm 
(which due to the device name regex, its the only platform this runs) without 
my fix applied

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-06-02 Thread Matt Arsenault


> On Jun 1, 2019, at 10:57 PM, Jan Vesely  wrote:
> 
> On Thu, 2019-05-30 at 08:40 -0400, Matt Arsenault wrote:
>> Ping
>> 
>>> On May 23, 2019, at 7:59 PM, arse...@gmail.com wrote:
>>> 
>>> From: Matt Arsenault 
>>> 
>>> ---
>>> .../program/execute/call-clobbers-amdgcn.cl   | 102 ++
>>> 1 file changed, 102 insertions(+)
>>> 
>>> diff --git a/tests/cl/program/execute/call-clobbers-amdgcn.cl 
>>> b/tests/cl/program/execute/call-clobbers-amdgcn.cl
>>> index 18e657ce3..b0a1f8c70 100644
>>> --- a/tests/cl/program/execute/call-clobbers-amdgcn.cl
>>> +++ b/tests/cl/program/execute/call-clobbers-amdgcn.cl
>>> @@ -19,6 +19,49 @@ dimensions: 1
>>> global_size: 1 0 0
>>> arg_out: 0 buffer int[1] 0xabcd1234
>>> 
>>> +[test]
>>> +name: Conditional call
>>> +kernel_name: conditional_call
>>> +dimensions: 1
>>> +local_size: 64 0 0
>>> +global_size: 64 0 0
>>> +arg_out: 0 buffer int[64] \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234
>>> +
>>> +[test]
>>> +name: Conditional call partial dispatch
>>> +kernel_name: conditional_call
>>> +dimensions: 1
>>> +local_size: 16 0 0
>>> +global_size: 16 0 0
>>> +arg_out: 0 buffer int[16] \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234 \
>>> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
>>> 0xabcd1234 0xabcd1234
>>> +
>>> +
>>> +[test]
>>> +name: Skip call no lanes
>>> +kernel_name: skip_call_no_lanes
>>> +dimensions: 1
>>> +local_size: 64 0 0
>>> +global_size: 64 0 0
>>> +arg_out: 0 buffer int[64] \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123 \
>>> +  123 123 123 123 123 123 123 123
>>> +
>>> !*/
>>> 
>>> #ifndef __AMDGCN__
>>> @@ -65,3 +108,62 @@ kernel void call_clobber_v40(__global int* ret)
>>>  : "v40");
>>>*ret = tmp;
>>> }
>>> +
>>> +__attribute__((noinline))
>>> +void spill_sgpr_to_csr_vgpr()
>>> +{
>>> +__asm volatile(
>>> +"s_nop 1" :::
>>> +"v0","v1","v2","v3","v4","v5","v6","v7",
>>> +"v8","v9","v10","v11","v12","v13","v14","v15",
>>> +"v16","v17","v18","v19","v20","v21","v22","v23",
>>> +"v24","v25","v26","v27","v28","v29","v30","v31",
>>> +
>>> +"s0","s1","s2","s3","s4","s5","s6","s7",
>>> +"s8","s9","s10","s11","s12","s13","s14","s15",
>>> +"s16","s17","s18","s19","s20","s21","s22","s23",
>>> +"s24","s25","s26","s27","s28","s29","s30","s31",
>>> +"s32", "s33", "s34", "s35", "s36", "s37", "s38");
>>> +}
>>> +
>>> +// A CSR VGPR needs to be spilled/restored in the prolog/epilog, but
>>> +// all lanes need to be made active to avoid clobbering lanes that did
>>> +// not enter the call.
>>> +kernel void conditional_call(global int* ret)
>>> +{
>>> +__asm volatile("v_mov_b32 v32, 0xabcd1234" : : : "v32");
>>> +
>>> +int id = get_local_id(0);
>>> +if (id == 0)
>>> +{
>>> +spill_sgpr_to_csr_vgpr();
>>> +}
>>> +
>>> +int tmp;
>>> +__asm volatile("v_mov_b32 %0, v32"
>>> +   : "=v"(tmp)
>>> +   :
>>> +   : "v32");
>>> +ret[id] = tmp;
>>> +}
>>> +
>>> +__attribute__((noinline))
>>> +void hang_if_all_inactive()
>>> +{
>>> +__builtin_amdgcn_s_sendmsghalt(0, 0);
>>> +}
>>> +
>>> +// If all lanes could be dynamically false, the call must not be taken
>>> +// in case a side effecting scalar op is called inside.
>>> +kernel void skip_call_no_lanes(global int* ret)
>>> +{
>>> +int divergent_false;
>>> +__asm volatile("v_mov_b32 %0, 0" : "=v"(divergent_false));
>>> +
>>> +if (divergent_false)
>>> +{
>>> +hang_if_all_inactive();
> 
> this looks like it will hang the GPU on test failure, which is a no-
> go.
> 
> Jan


Is there a way to specify a timeout? The alternatives require more ABI support___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

Re: [Piglit] [PATCH] cl: Add tests for some cases that were broken with function calls

2019-05-30 Thread Matt Arsenault
Ping

> On May 23, 2019, at 7:59 PM, arse...@gmail.com wrote:
> 
> From: Matt Arsenault 
> 
> ---
> .../program/execute/call-clobbers-amdgcn.cl   | 102 ++
> 1 file changed, 102 insertions(+)
> 
> diff --git a/tests/cl/program/execute/call-clobbers-amdgcn.cl 
> b/tests/cl/program/execute/call-clobbers-amdgcn.cl
> index 18e657ce3..b0a1f8c70 100644
> --- a/tests/cl/program/execute/call-clobbers-amdgcn.cl
> +++ b/tests/cl/program/execute/call-clobbers-amdgcn.cl
> @@ -19,6 +19,49 @@ dimensions: 1
> global_size: 1 0 0
> arg_out: 0 buffer int[1] 0xabcd1234
> 
> +[test]
> +name: Conditional call
> +kernel_name: conditional_call
> +dimensions: 1
> +local_size: 64 0 0
> +global_size: 64 0 0
> +arg_out: 0 buffer int[64] \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234
> +
> +[test]
> +name: Conditional call partial dispatch
> +kernel_name: conditional_call
> +dimensions: 1
> +local_size: 16 0 0
> +global_size: 16 0 0
> +arg_out: 0 buffer int[16] \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234 \
> +  0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 0xabcd1234 
> 0xabcd1234 0xabcd1234
> +
> +
> +[test]
> +name: Skip call no lanes
> +kernel_name: skip_call_no_lanes
> +dimensions: 1
> +local_size: 64 0 0
> +global_size: 64 0 0
> +arg_out: 0 buffer int[64] \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123 \
> +  123 123 123 123 123 123 123 123
> +
> !*/
> 
> #ifndef __AMDGCN__
> @@ -65,3 +108,62 @@ kernel void call_clobber_v40(__global int* ret)
>   : "v40");
> *ret = tmp;
> }
> +
> +__attribute__((noinline))
> +void spill_sgpr_to_csr_vgpr()
> +{
> +__asm volatile(
> +"s_nop 1" :::
> +"v0","v1","v2","v3","v4","v5","v6","v7",
> +"v8","v9","v10","v11","v12","v13","v14","v15",
> +"v16","v17","v18","v19","v20","v21","v22","v23",
> +"v24","v25","v26","v27","v28","v29","v30","v31",
> +
> +"s0","s1","s2","s3","s4","s5","s6","s7",
> +"s8","s9","s10","s11","s12","s13","s14","s15",
> +"s16","s17","s18","s19","s20","s21","s22","s23",
> +"s24","s25","s26","s27","s28","s29","s30","s31",
> +"s32", "s33", "s34", "s35", "s36", "s37", "s38");
> +}
> +
> +// A CSR VGPR needs to be spilled/restored in the prolog/epilog, but
> +// all lanes need to be made active to avoid clobbering lanes that did
> +// not enter the call.
> +kernel void conditional_call(global int* ret)
> +{
> +__asm volatile("v_mov_b32 v32, 0xabcd1234" : : : "v32");
> +
> +int id = get_local_id(0);
> +if (id == 0)
> +{
> +spill_sgpr_to_csr_vgpr();
> +}
> +
> +int tmp;
> +__asm volatile("v_mov_b32 %0, v32"
> +   : "=v"(tmp)
> +   :
> +   : "v32");
> +ret[id] = tmp;
> +}
> +
> +__attribute__((noinline))
> +void hang_if_all_inactive()
> +{
> +__builtin_amdgcn_s_sendmsghalt(0, 0);
> +}
> +
> +// If all lanes could be dynamically false, the call must not be taken
> +// in case a side effecting scalar op is called inside.
> +kernel void skip_call_no_lanes(global int* ret)
> +{
> +int divergent_false;
> +__asm volatile("v_mov_b32 %0, 0" : "=v"(divergent_false));
> +
> +if (divergent_false)
> +{
> +hang_if_all_inactive();
> +}
> +
> +ret[get_global_id(0)] = 123;
> +}
> -- 
> 2.17.1
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit

[Piglit] [PATCH] cl: Add test for call stack realignment

2018-09-10 Thread Matt Arsenault
v2: Use uintptr_t

v3: Formatting

v4: More uintptr_t
---
 tests/cl/program/execute/realign-stack.cl | 93 +++
 1 file changed, 93 insertions(+)
 create mode 100644 tests/cl/program/execute/realign-stack.cl

diff --git a/tests/cl/program/execute/realign-stack.cl 
b/tests/cl/program/execute/realign-stack.cl
new file mode 100644
index 0..eb1a23f20
--- /dev/null
+++ b/tests/cl/program/execute/realign-stack.cl
@@ -0,0 +1,93 @@
+/*!
+
+[config]
+name: call with stack realignment
+
+[test]
+name: call stack realignment 16
+kernel_name: kernel_call_stack_realign16_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+
+[test]
+name: call stack realignment 32
+kernel_name: kernel_call_stack_realign32_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+[test]
+name: call stack realignment 64
+kernel_name: kernel_call_stack_realign64_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+[test]
+name: call stack realignment 128
+kernel_name: kernel_call_stack_realign128_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+
+!*/
+
+// Make sure the absolute private address of stack objects in callee
+// functions is properly aligned.
+
+#define NOINLINE __attribute__((noinline))
+
+NOINLINE
+int test_stack_object_alignment16() {
+volatile int4 requires_align16 = 0;
+volatile uintptr_t addr = (uintptr_t)&requires_align16;
+return (addr & 15) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment32() {
+volatile int8 requires_align32 = 0;
+volatile uintptr_t addr = (uintptr_t)&requires_align32;
+return (addr & 31) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment64() {
+volatile int16 requires_align64 = 0;
+volatile uintptr_t addr = (uintptr_t)&requires_align64;
+return (addr & 63) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment128() {
+volatile long16 requires_align128 = 0;
+volatile uintptr_t addr = (uintptr_t)&requires_align128;
+return (addr & 127) == 0;
+}
+
+kernel void kernel_call_stack_realign16_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment16();
+}
+
+kernel void kernel_call_stack_realign32_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment32();
+}
+
+kernel void kernel_call_stack_realign64_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment64();
+}
+
+kernel void kernel_call_stack_realign128_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment128();
+}
-- 
2.17.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 2/2] cl: Add tests for calls with special inputs

2018-09-05 Thread Matt Arsenault
ping

> On Aug 22, 2018, at 15:41, Matt Arsenault  wrote:
> 
> Also fixes apparently missing coverage for special
> input arguments not passed in registers.
> ---
> tests/cl/program/execute/calls-workitem-id.cl | 136 ++
> 1 file changed, 136 insertions(+)
> 
> diff --git a/tests/cl/program/execute/calls-workitem-id.cl 
> b/tests/cl/program/execute/calls-workitem-id.cl
> index 7edfad7e9..b42c85959 100644
> --- a/tests/cl/program/execute/calls-workitem-id.cl
> +++ b/tests/cl/program/execute/calls-workitem-id.cl
> @@ -38,6 +38,56 @@ arg_out: 2 buffer uint[64] \
>   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 \
>   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
> 
> +[test]
> +name: Callee function stack passed get_local_id
> +kernel_name: kernel_call_too_many_argument_regs_get_local_id_012
> +dimensions: 3
> +global_size: 8 4 2
> +local_size: 8 4 2
> +
> +arg_out: 0 buffer uint[64] \
> +  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
> +  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
> +  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
> +  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
> +
> +arg_out: 1 buffer uint[64] \
> +  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1 \
> +  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3 \
> +  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1 \
> +  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3
> +
> +arg_out: 2 buffer uint[64] \
> +  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 \
> +  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 \
> +  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 \
> +  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
> +
> +[test]
> +name: Callee function stack passed get_local_id with byval
> +kernel_name: kernel_call_too_many_argument_regs_byval_get_local_id_012
> +dimensions: 3
> +global_size: 8 4 2
> +local_size: 8 4 2
> +
> +arg_out: 0 buffer uint[64] \
> +  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
> +  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
> +  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
> +  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52
> +
> +arg_out: 1 buffer uint[64] \
> +  47  47  47  47  47  47  47  47  48  48  48  48  48  48  48  48 \
> +  49  49  49  49  49  49  49  49  50  50  50  50  50  50  50  50 \
> +  47  47  47  47  47  47  47  47  48  48  48  48  48  48  48  48 \
> +  49  49  49  49  49  49  49  49  50  50  50  50  50  50  50  50
> +
> +arg_out: 2 buffer uint[64] \
> +  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50 \
> +  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50 \
> +  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51 \
> +  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51
> +
> !*/
> 
> #define NOINLINE __attribute__((noinline))
> @@ -75,3 +125,89 @@ kernel void kernel_call_pass_get_global_id_012(global 
> uint *out0,
> {
> func_get_global_id_012(out0, out1, out2);
> }
> +
> +// On amdgcn, this will require the workitem IDs be passed as values
> +// on the stack after the arguments.
> +NOINLINE
> +uint3 too_many_argument_regs_get_local_id_012(
> + int arg0, int arg1, int arg2, int arg3,
> + int arg4, int arg5, int arg6, int arg7,
> + int arg8, int arg9, int arg10, int arg11,
> + int arg12, int arg13, int arg14, int arg15,
> + int arg16, int arg17, int arg18, int arg19,
> + int arg20, int arg21, int arg22, int arg23,
> + int arg24, int arg25, int arg26, int arg27,
> + int arg28, int arg29, int arg30, int arg31)
> +{
> + uint3 id;
> + id.x = get_local_id(0);
> + id.y = get_local_id(1);
> + id.z = get_local_id(2);
> + return id;
> +}
> +
> +kernel void kernel_call_too_many_argument_regs_get_local_id_012(global uint* 
> out0, global uint* out1, global uint* out2)
> +{
> + uint id0 = get_global_id(0);
> + uint id1 = get_global_id(1);
> + uint id2 = get_global_id(2);
> + uint flat_id = (id2 * get_global_size(1) + id1) * get_global_size(0) + 
> id0;
> +
> + uint3 result = too_many_argument_regs_get_local_id_012(
> + 1234, 999, 42, , , 9009, 777, 4242,
> + 202020, 6359, 8344, 1443, 552323, 33424, 666, 98765,
> + , 232556, 5, 934121, 94991, 1337, 0xdead, 0xbeef,
> + 0x, 0x, 0x666, 0x4141, 0x1234, 0x, 0x, 0x);
> +
> + out0[flat_id] = result.x;
> + out1[flat_id] = result.y;
> + out2[flat_id] = result.z;
> +}
> +
> +
> +typedef struct ByValStruct {
> + long array[9];
> +} ByValStruct;
> +
> +// Same as previous, with an addi

Re: [Piglit] [PATCH 1/2] cl: Add test for respecting byval alignment in call setup

2018-09-05 Thread Matt Arsenault
ping

> On Aug 22, 2018, at 15:41, Matt Arsenault  wrote:
> 
> ---
> .../cl/program/execute/calls-large-struct.cl  | 36 +++
> 1 file changed, 36 insertions(+)
> 
> diff --git a/tests/cl/program/execute/calls-large-struct.cl 
> b/tests/cl/program/execute/calls-large-struct.cl
> index c10458f37..0eac4d470 100644
> --- a/tests/cl/program/execute/calls-large-struct.cl
> +++ b/tests/cl/program/execute/calls-large-struct.cl
> @@ -37,6 +37,15 @@ arg_out: 0 buffer int[16]\
> arg_in: 1 buffer int[16] \
>  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 
> +
> +[test]
> +name: byval struct align 8
> +kernel_name: kernel_call_byval_struct_align8
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1]  1
> +
> !*/
> 
> #define NOINLINE __attribute__((noinline))
> @@ -154,3 +163,30 @@ kernel void call_sret_Char_IntArray_func(global int* 
> output, global int* input)
> 
> output[id] = sum;
> }
> +
> +typedef struct ByVal_Struct_Align8 {
> +long xs[9];
> +} ByVal_Struct_Align8;
> +
> +__attribute__((noinline))
> +int func(ByVal_Struct_Align8 val)
> +{
> +for (int i = 0; i < 9; ++i)
> +{
> +long ld = val.xs[i];
> +if (ld != i)
> +return 0;
> +}
> +return 1;
> +}
> +
> +__kernel void kernel_call_byval_struct_align8(__global uint* result)
> +{
> +struct ByVal_Struct_Align8 val = { { 0x1337 } };
> +for (int i = 0; i < 9; ++i)
> +{
> +val.xs[i] = i;
> +}
> +
> +*result = func(val);
> +}
> -- 
> 2.17.1
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for call stack realignment

2018-09-05 Thread Matt Arsenault


> On Aug 22, 2018, at 10:57, Jan Vesely  wrote:
> 
> On Tue, 2018-08-21 at 21:00 +0300, Matt Arsenault wrote:
>> ping
> 
> sorry. I won't have access to my machines until next week (possibly
> September)
ping
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH 2/2] cl: Add tests for calls with special inputs

2018-08-22 Thread Matt Arsenault
Also fixes apparently missing coverage for special
input arguments not passed in registers.
---
 tests/cl/program/execute/calls-workitem-id.cl | 136 ++
 1 file changed, 136 insertions(+)

diff --git a/tests/cl/program/execute/calls-workitem-id.cl 
b/tests/cl/program/execute/calls-workitem-id.cl
index 7edfad7e9..b42c85959 100644
--- a/tests/cl/program/execute/calls-workitem-id.cl
+++ b/tests/cl/program/execute/calls-workitem-id.cl
@@ -38,6 +38,56 @@ arg_out: 2 buffer uint[64] \
   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 \
   1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
 
+[test]
+name: Callee function stack passed get_local_id
+kernel_name: kernel_call_too_many_argument_regs_get_local_id_012
+dimensions: 3
+global_size: 8 4 2
+local_size: 8 4 2
+
+arg_out: 0 buffer uint[64] \
+  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
+  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
+  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7 \
+  0  1  2  3  4  5  6  7  0  1  2  3  4  5  6  7
+
+arg_out: 1 buffer uint[64] \
+  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1 \
+  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3 \
+  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1 \
+  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3
+
+arg_out: 2 buffer uint[64] \
+  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 \
+  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 \
+  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 \
+  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
+
+[test]
+name: Callee function stack passed get_local_id with byval
+kernel_name: kernel_call_too_many_argument_regs_byval_get_local_id_012
+dimensions: 3
+global_size: 8 4 2
+local_size: 8 4 2
+
+arg_out: 0 buffer uint[64] \
+  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
+  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
+  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52  \
+  45  46  47  48  49  50  51  52  45  46  47  48  49  50  51  52
+
+arg_out: 1 buffer uint[64] \
+  47  47  47  47  47  47  47  47  48  48  48  48  48  48  48  48 \
+  49  49  49  49  49  49  49  49  50  50  50  50  50  50  50  50 \
+  47  47  47  47  47  47  47  47  48  48  48  48  48  48  48  48 \
+  49  49  49  49  49  49  49  49  50  50  50  50  50  50  50  50
+
+arg_out: 2 buffer uint[64] \
+  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50 \
+  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50  50 \
+  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51 \
+  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51  51
+
 !*/
 
 #define NOINLINE __attribute__((noinline))
@@ -75,3 +125,89 @@ kernel void kernel_call_pass_get_global_id_012(global uint 
*out0,
 {
 func_get_global_id_012(out0, out1, out2);
 }
+
+// On amdgcn, this will require the workitem IDs be passed as values
+// on the stack after the arguments.
+NOINLINE
+uint3 too_many_argument_regs_get_local_id_012(
+   int arg0, int arg1, int arg2, int arg3,
+   int arg4, int arg5, int arg6, int arg7,
+   int arg8, int arg9, int arg10, int arg11,
+   int arg12, int arg13, int arg14, int arg15,
+   int arg16, int arg17, int arg18, int arg19,
+   int arg20, int arg21, int arg22, int arg23,
+   int arg24, int arg25, int arg26, int arg27,
+   int arg28, int arg29, int arg30, int arg31)
+{
+   uint3 id;
+   id.x = get_local_id(0);
+   id.y = get_local_id(1);
+   id.z = get_local_id(2);
+   return id;
+}
+
+kernel void kernel_call_too_many_argument_regs_get_local_id_012(global uint* 
out0, global uint* out1, global uint* out2)
+{
+   uint id0 = get_global_id(0);
+   uint id1 = get_global_id(1);
+   uint id2 = get_global_id(2);
+   uint flat_id = (id2 * get_global_size(1) + id1) * get_global_size(0) + 
id0;
+
+   uint3 result = too_many_argument_regs_get_local_id_012(
+   1234, 999, 42, , , 9009, 777, 4242,
+   202020, 6359, 8344, 1443, 552323, 33424, 666, 98765,
+   , 232556, 5, 934121, 94991, 1337, 0xdead, 0xbeef,
+   0x, 0x, 0x666, 0x4141, 0x1234, 0x, 0x, 0x);
+
+   out0[flat_id] = result.x;
+   out1[flat_id] = result.y;
+   out2[flat_id] = result.z;
+}
+
+
+typedef struct ByValStruct {
+   long array[9];
+} ByValStruct;
+
+// Same as previous, with an additional byval passed argument.
+NOINLINE
+uint3 too_many_argument_regs_byval_get_local_id_012(
+   ByValStruct byval_arg,
+   int arg0, int arg1, int arg2, int arg3,
+   int arg4, int arg5, int arg6, int arg7,
+   int arg8, int arg9, int arg10, int arg11,
+   int arg12, int arg13, int arg14, int arg15,
+   int arg16, int arg17, int arg18, int arg19,
+   int arg20, int arg21, int arg22, int arg23,
+   int arg24, int arg25, int arg26, int arg27,
+   int arg28, int arg29, int arg30, int arg31)
+{
+   uint3 id;
+   id.x = get_local_id(0) + byval_arg.array[3]; /

[Piglit] [PATCH 1/2] cl: Add test for respecting byval alignment in call setup

2018-08-22 Thread Matt Arsenault
---
 .../cl/program/execute/calls-large-struct.cl  | 36 +++
 1 file changed, 36 insertions(+)

diff --git a/tests/cl/program/execute/calls-large-struct.cl 
b/tests/cl/program/execute/calls-large-struct.cl
index c10458f37..0eac4d470 100644
--- a/tests/cl/program/execute/calls-large-struct.cl
+++ b/tests/cl/program/execute/calls-large-struct.cl
@@ -37,6 +37,15 @@ arg_out: 0 buffer int[16]\
 arg_in: 1 buffer int[16] \
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
 
+
+[test]
+name: byval struct align 8
+kernel_name: kernel_call_byval_struct_align8
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1]  1
+
 !*/
 
 #define NOINLINE __attribute__((noinline))
@@ -154,3 +163,30 @@ kernel void call_sret_Char_IntArray_func(global int* 
output, global int* input)
 
 output[id] = sum;
 }
+
+typedef struct ByVal_Struct_Align8 {
+long xs[9];
+} ByVal_Struct_Align8;
+
+__attribute__((noinline))
+int func(ByVal_Struct_Align8 val)
+{
+for (int i = 0; i < 9; ++i)
+{
+long ld = val.xs[i];
+if (ld != i)
+return 0;
+}
+return 1;
+}
+
+__kernel void kernel_call_byval_struct_align8(__global uint* result)
+{
+struct ByVal_Struct_Align8 val = { { 0x1337 } };
+for (int i = 0; i < 9; ++i)
+{
+val.xs[i] = i;
+}
+
+*result = func(val);
+}
-- 
2.17.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for call stack realignment

2018-08-21 Thread Matt Arsenault
ping

> On Aug 13, 2018, at 23:33, Matt Arsenault  wrote:
> 
> v2: Use uintptr_t
> 
> v3: Formatting
> ---
> tests/cl/program/execute/realign-stack.cl | 93 +++
> 1 file changed, 93 insertions(+)
> create mode 100644 tests/cl/program/execute/realign-stack.cl
> 
> diff --git a/tests/cl/program/execute/realign-stack.cl 
> b/tests/cl/program/execute/realign-stack.cl
> new file mode 100644
> index 0..ca83284fe
> --- /dev/null
> +++ b/tests/cl/program/execute/realign-stack.cl
> @@ -0,0 +1,93 @@
> +/*!
> +
> +[config]
> +name: call with stack realignment
> +
> +[test]
> +name: call stack realignment 16
> +kernel_name: kernel_call_stack_realign16_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] 1
> +
> +
> +[test]
> +name: call stack realignment 32
> +kernel_name: kernel_call_stack_realign32_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] 1
> +
> +[test]
> +name: call stack realignment 64
> +kernel_name: kernel_call_stack_realign64_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] 1
> +
> +[test]
> +name: call stack realignment 128
> +kernel_name: kernel_call_stack_realign128_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] 1
> +
> +
> +!*/
> +
> +// Make sure the absolute private address of stack objects in callee
> +// functions is properly aligned.
> +
> +#define NOINLINE __attribute__((noinline))
> +
> +NOINLINE
> +int test_stack_object_alignment16() {
> +volatile int4 requires_align16 = 0;
> +volatile uintptr_t addr = (uint)&requires_align16;
> +return (addr & 15) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment32() {
> +volatile int8 requires_align32 = 0;
> +volatile uintptr_t addr = (uint)&requires_align32;
> +return (addr & 31) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment64() {
> +volatile int16 requires_align64 = 0;
> +volatile uintptr_t addr = (uint)&requires_align64;
> +return (addr & 63) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment128() {
> +volatile long16 requires_align128 = 0;
> +volatile uintptr_t addr = (uint)&requires_align128;
> +return (addr & 127) == 0;
> +}
> +
> +kernel void kernel_call_stack_realign16_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment16();
> +}
> +
> +kernel void kernel_call_stack_realign32_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment32();
> +}
> +
> +kernel void kernel_call_stack_realign64_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment64();
> +}
> +
> +kernel void kernel_call_stack_realign128_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment128();
> +}
> -- 
> 2.17.1
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Fix types to be unsigned

2018-08-13 Thread Matt Arsenault
ping

> On Oct 27, 2017, at 13:03, Matt Arsenault  wrote:
> 
> Doesn't really matter.
> ---
> tests/cl/program/execute/store-hi16.cl | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/cl/program/execute/store-hi16.cl 
> b/tests/cl/program/execute/store-hi16.cl
> index b734b3766..4273d3369 100644
> --- a/tests/cl/program/execute/store-hi16.cl
> +++ b/tests/cl/program/execute/store-hi16.cl
> @@ -92,7 +92,7 @@ kernel void store_hi16_global(volatile global ushort* out, 
> volatile global uint*
> 
> kernel void store_hi16_local(volatile global ushort* out, volatile global 
> uint* in)
> {
> -volatile local short lds[64];
> +volatile local ushort lds[64];
> int lid = get_local_id(0);
> int gid = get_global_id(0);
> 
> @@ -104,7 +104,7 @@ kernel void store_hi16_local(volatile global ushort* out, 
> volatile global uint*
> kernel void store_hi16_private(volatile global ushort* out, volatile global 
> uint* in)
> {
> int gid = get_global_id(0);
> -volatile private short stack = in[gid] >> 16;
> +volatile private ushort stack = in[gid] >> 16;
> out[gid] = stack;
> }
> 
> @@ -117,7 +117,7 @@ kernel void truncstorei8_hi16_global(volatile global 
> uchar* out, volatile global
> 
> kernel void truncstorei8_hi16_local(volatile global uchar* out, volatile 
> global uint* in)
> {
> -volatile local short lds[64];
> +volatile local ushort lds[64];
> int lid = get_local_id(0);
> int gid = get_global_id(0);
> 
> @@ -129,6 +129,6 @@ kernel void truncstorei8_hi16_local(volatile global 
> uchar* out, volatile global
> kernel void truncstorei8_hi16_private(volatile global uchar* out, volatile 
> global uint* in)
> {
> int gid = get_global_id(0);
> -volatile private short stack = in[gid] >> 16;
> +volatile private ushort stack = in[gid] >> 16;
> out[gid] = (uchar)stack;
> }
> -- 
> 2.11.0
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add bigger versions of calls with struct tests

2018-08-13 Thread Matt Arsenault
These are just bigger versions of the existing struct
calls tests so that they stress using byval/sret. The
existing call with struct tests are now passed directly
in registers.

v2: Rename struct member
---
 .../cl/program/execute/calls-large-struct.cl  | 156 ++
 tests/cl/program/execute/calls-struct.cl  |  96 +--
 2 files changed, 204 insertions(+), 48 deletions(-)
 create mode 100644 tests/cl/program/execute/calls-large-struct.cl

diff --git a/tests/cl/program/execute/calls-large-struct.cl 
b/tests/cl/program/execute/calls-large-struct.cl
new file mode 100644
index 0..c10458f37
--- /dev/null
+++ b/tests/cl/program/execute/calls-large-struct.cl
@@ -0,0 +1,156 @@
+/*!
+
+[config]
+name: calls with large structs
+clc_version_min: 10
+
+[test]
+name: byval struct
+kernel_name: call_i32_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 1021 1022 1023 1024 1025 1026 1027 1028 \
+ 1029 1030 1031 1032 1033 1034 1035 1036
+
+arg_out: 1 buffer int[16] \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: sret struct
+kernel_name: call_sret_Char_IntArray_func
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 921 922 923 924 925 926 927 928 \
+ 929 930 931 932 933 934 935 936
+
+arg_in: 1 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+!*/
+
+#define NOINLINE __attribute__((noinline))
+
+typedef struct ByVal_Char_IntArray {
+char c;
+int i32_arr[32];
+} ByVal_Char_IntArray;
+
+NOINLINE
+int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
+{
+st.i32_arr[0] += 100;
+
+int sum = 0;
+for (int i = 0; i < 32; ++i)
+{
+sum += st.i32_arr[i];
+}
+
+sum += st.c;
+return sum;
+}
+
+kernel void call_i32_func_byval_Char_IntArray(global int* out0,
+  global int* out1,
+  global int* input)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int id = get_global_id(0);
+
+int val = input[id];
+
+
+st.i32_arr[0] = 14;
+st.i32_arr[1] = -8;
+st.i32_arr[2] = val;
+st.i32_arr[3] = 900;
+
+for (int i = 4; i < 32; ++i)
+{
+st.i32_arr[i] = 0;
+}
+
+volatile int stack_object[16];
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x07080900 | i;
+stack_object[i] = test_val;
+}
+
+int result = i32_func_byval_Char_IntArray(st);
+
+// Check for stack corruption
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x07080900 | i;
+if (stack_object[i] != test_val)
+result = -1;
+}
+
+out0[id] = result;
+out1[id] = st.i32_arr[0];
+}
+
+NOINLINE
+ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int val = input[id];
+st.i32_arr[0] = 14;
+st.i32_arr[1] = -8;
+st.i32_arr[2] = val;
+st.i32_arr[3] = 900;
+
+for (int i = 4; i < 32; ++i)
+{
+st.i32_arr[i] = 0;
+}
+
+return st;
+}
+
+kernel void call_sret_Char_IntArray_func(global int* output, global int* input)
+{
+volatile int stack_object[16];
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x04030200 | i;
+stack_object[i] = test_val;
+}
+
+int id = get_global_id(0);
+ByVal_Char_IntArray st = sret_Char_IntArray_func(input, id);
+
+int sum = 0;
+for (int i = 0; i < 32; ++i)
+{
+sum += st.i32_arr[i];
+}
+
+sum += st.c;
+
+// Check for stack corruption
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x04030200 | i;
+if (stack_object[i] != test_val)
+sum = -1;
+}
+
+output[id] = sum;
+}
diff --git a/tests/cl/program/execute/calls-struct.cl 
b/tests/cl/program/execute/calls-struct.cl
index 04f769dac..3e1fa6a85 100644
--- a/tests/cl/program/execute/calls-struct.cl
+++ b/tests/cl/program/execute/calls-struct.cl
@@ -1,12 +1,12 @@
 /*!
 
 [config]
-name: calls with structs
+name: calls with structs passed in registers on amdgcn
 clc_version_min: 10
 
 [test]
-name: byval struct
-kernel_name: call_i32_func_byval_Char_IntArray
+name: regs struct
+kernel_name: call_i32_func_small_struct_regs_Char_IntArray
 dimensions: 1
 global_size: 16 0 0
 
@@ -25,8 +25,8 @@ arg_in: 2 buffer int[16] \
 
 
 [test]
-name: sret struct
-kernel_name: call_sret_Char_IntArray_func
+name: struct_smallregs struct
+kernel_name: call_struct_smallregs_Char_IntArray_func
 dimensions: 1
 global_size: 16 0 0
 
@@ -39,8 +39,8 @@ arg_in: 1 buffer int[16] \
 
 
 [test]
-name: byval struct and sret struct
-kernel_name: call_sret_Char_IntArray_func_byval_Char_IntArray
+name: small struct in regs
+kernel_name: 
call_struct_smallregs_Char_IntArray_func_small_struct_regs_Char_IntArray
 dime

Re: [Piglit] [PATCH] cl: Add bigger versions of calls with struct tests

2018-08-13 Thread Matt Arsenault


> On Mar 23, 2018, at 23:07, Jan Vesely  wrote:
> 
> On Thu, 2018-03-15 at 11:41 -0400, Matt Arsenault wrote:
>> ping
>> 
>>> On Oct 12, 2017, at 16:19, Matt Arsenault  wrote:
>>> 
>>> These are just bigger versions of the existing struct
>>> calls tests so that they stress using byval/sret. The
>>> existing call with struct tests are now passed directly
>>> in registers.
>>> ---
>>> tests/cl/program/execute/calls-large-struct.cl | 156 
>>> +
>>> tests/cl/program/execute/calls-struct.cl   |  50 
>>> 2 files changed, 181 insertions(+), 25 deletions(-)
>>> create mode 100644 tests/cl/program/execute/calls-large-struct.cl
>>> 
>>> diff --git a/tests/cl/program/execute/calls-large-struct.cl 
>>> b/tests/cl/program/execute/calls-large-struct.cl
>>> new file mode 100644
>>> index 0..46d84760d
>>> --- /dev/null
>>> +++ b/tests/cl/program/execute/calls-large-struct.cl
>>> @@ -0,0 +1,156 @@
>>> +/*!
>>> +
>>> +[config]
>>> +name: calls with large structs
>>> +clc_version_min: 10
>>> +
>>> +[test]
>>> +name: byval struct
>>> +kernel_name: call_i32_func_byval_Char_IntArray
>>> +dimensions: 1
>>> +global_size: 16 0 0
>>> +
>>> +arg_out: 0 buffer int[16]\
>>> + 1021 1022 1023 1024 1025 1026 1027 1028 \
>>> + 1029 1030 1031 1032 1033 1034 1035 1036
>>> +
>>> +arg_out: 1 buffer int[16] \
>>> +  14   14   14   14 \
>>> +  14   14   14   14 \
>>> +  14   14   14   14 \
>>> +  14   14   14   14 \
>>> +
>>> +arg_in: 2 buffer int[16] \
>>> + 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> +
>>> +
>>> +[test]
>>> +name: sret struct
>>> +kernel_name: call_sret_Char_IntArray_func
>>> +dimensions: 1
>>> +global_size: 16 0 0
>>> +
>>> +arg_out: 0 buffer int[16]\
>>> + 921 922 923 924 925 926 927 928 \
>>> + 929 930 931 932 933 934 935 936
>>> +
>>> +arg_in: 1 buffer int[16] \
>>> + 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> +
>>> +!*/
>>> +
>>> +#define NOINLINE __attribute__((noinline))
>>> +
>>> +typedef struct ByVal_Char_IntArray {
>>> +char c;
>>> +int i[32];
>>> +} ByVal_Char_IntArray;
>>> +
>>> +NOINLINE
>>> +int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
>>> +{
>>> +st.i[0] += 100;
>>> +
>>> +int sum = 0;
>>> +for (int i = 0; i < 32; ++i)
>>> +{
>>> +sum += st.i[i];
>>> +}
>>> +
>>> +sum += st.c;
>>> +return sum;
>>> +}
>>> +
>>> +kernel void call_i32_func_byval_Char_IntArray(global int* out0,
>>> +  global int* out1,
>>> +  global int* input)
>>> +{
>>> +ByVal_Char_IntArray st;
>>> +st.c = 15;
>>> +
>>> +int id = get_global_id(0);
>>> +
>>> +int val = input[id];
>>> +
>>> +
>>> +st.i[0] = 14;
>>> +st.i[1] = -8;
>>> +st.i[2] = val;
>>> +st.i[3] = 900;
> 
> are these just some arbitrary numbers or do they have a specific
> meaning?

They’re arbitrary


> 
>>> +
>>> +for (int i = 4; i < 32; ++i)
>>> +{
>>> +st.i[i] = 0;
>>> +}
>>> +
>>> +volatile int stack_object[16];
>>> +for (int i = 0; i < 16; ++i)
>>> +{
>>> +const int test_val = 0x07080900 | i;
> same here

Just arbitrary values to test against



>>> +stack_object[i] = test_val;
>>> +}
>>> +
>>> +int result = i32_func_byval_Char_IntArray(st);
>>> +
>>> +// Check for stack corruption
>>> +for (int i = 0; i < 16; ++i)
>>> +{
>>> +const int test_val = 0x07080900 | i;
>>> +if (stack_object[i] != test_val)
>>> +result = -1;
>>> +}
>>> +
>>> +out0[id] = result;
>>> +out1[id] = st.i[0];
>>> +}
>>> +
>>> +NOINLINE
>>> +ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
> 
> why is it called sret? is it "stack return"? why not s

[Piglit] [PATCH] cl: Add test for call stack realignment

2018-08-13 Thread Matt Arsenault
v2: Use uintptr_t

v3: Formatting
---
 tests/cl/program/execute/realign-stack.cl | 93 +++
 1 file changed, 93 insertions(+)
 create mode 100644 tests/cl/program/execute/realign-stack.cl

diff --git a/tests/cl/program/execute/realign-stack.cl 
b/tests/cl/program/execute/realign-stack.cl
new file mode 100644
index 0..ca83284fe
--- /dev/null
+++ b/tests/cl/program/execute/realign-stack.cl
@@ -0,0 +1,93 @@
+/*!
+
+[config]
+name: call with stack realignment
+
+[test]
+name: call stack realignment 16
+kernel_name: kernel_call_stack_realign16_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+
+[test]
+name: call stack realignment 32
+kernel_name: kernel_call_stack_realign32_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+[test]
+name: call stack realignment 64
+kernel_name: kernel_call_stack_realign64_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+[test]
+name: call stack realignment 128
+kernel_name: kernel_call_stack_realign128_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] 1
+
+
+!*/
+
+// Make sure the absolute private address of stack objects in callee
+// functions is properly aligned.
+
+#define NOINLINE __attribute__((noinline))
+
+NOINLINE
+int test_stack_object_alignment16() {
+volatile int4 requires_align16 = 0;
+volatile uintptr_t addr = (uint)&requires_align16;
+return (addr & 15) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment32() {
+volatile int8 requires_align32 = 0;
+volatile uintptr_t addr = (uint)&requires_align32;
+return (addr & 31) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment64() {
+volatile int16 requires_align64 = 0;
+volatile uintptr_t addr = (uint)&requires_align64;
+return (addr & 63) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment128() {
+volatile long16 requires_align128 = 0;
+volatile uintptr_t addr = (uint)&requires_align128;
+return (addr & 127) == 0;
+}
+
+kernel void kernel_call_stack_realign16_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment16();
+}
+
+kernel void kernel_call_stack_realign32_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment32();
+}
+
+kernel void kernel_call_stack_realign64_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment64();
+}
+
+kernel void kernel_call_stack_realign128_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment128();
+}
-- 
2.17.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for CSR VGPRs caused by SGPR spilling

2018-08-13 Thread Matt Arsenault


> On May 8, 2018, at 18:45, Jan Vesely  wrote:
> 
> On Tue, 2018-05-08 at 13:28 +0300, Matt Arsenault wrote:
>>> On Apr 8, 2018, at 19:56, Jan Vesely  wrote:
>>> 
>>> On Fri, 2018-04-06 at 00:49 -0400, Matt Arsenault wrote:
>>>> ping
>>> 
>>> I'll need to setup the rocm stack to test this. It will take some time.
>>> It should work with clover as well (modulo bugs; asm parser, function
>>> calls, ...), right?
>>> 
>>> Jan
>> 
>> I thought calls were broken in general with clover because of the
>> missing link step? Besides that it should work
> 
> Linking works (both linking with libclc and cl-1.2 clLinkProgram) it
> just happens at IR level (if you consider that linking) so all function
> calls can be inlined.
> 
> The problem is that llvm backend generates relocation for function
> calls. This relocation is not handled by clover (you could call this
> 'calls are broken in general').
> 
> I see two ways to fix this;
> a) fix llvm to use fixup instead of relocation for internal function
> calls.
> b) fix clover to handle the function call relocation.
> 
> I tried a) but a simple
> "|| (GV->getLinkage() == GlobalValue::InternalLinkage)"
> in shouldEmitFixup() is not enough (the fixup value looks wrong)
> 
> I still think that a) is preferable, but now that 6.0 is out with the
> breakage we'll need to implement b) anyway.
> 
> I'll try to find some time to dig a bit more into this, but it's tricky
> since wrong jump leaves the GPU in unrecoverable state that needs
> manual power cycling on reboot.
> 
> Jan
>> 
>> -Matt


ping. Should this just skip the clover platform for now?
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for CSR VGPRs caused by SGPR spilling

2018-05-08 Thread Matt Arsenault


> On Apr 8, 2018, at 19:56, Jan Vesely  wrote:
> 
> On Fri, 2018-04-06 at 00:49 -0400, Matt Arsenault wrote:
>> ping
> 
> I'll need to setup the rocm stack to test this. It will take some time.
> It should work with clover as well (modulo bugs; asm parser, function
> calls, ...), right?
> 
> Jan

I thought calls were broken in general with clover because of the missing link 
step? Besides that it should work

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH 1/2] cl: Add test for call stack realignment

2018-04-05 Thread Matt Arsenault
v2: Use uintptr_t
---
 tests/cl/program/execute/realign-stack.cl | 96 +++
 1 file changed, 96 insertions(+)
 create mode 100644 tests/cl/program/execute/realign-stack.cl

diff --git a/tests/cl/program/execute/realign-stack.cl 
b/tests/cl/program/execute/realign-stack.cl
new file mode 100644
index 0..e415cd7f8
--- /dev/null
+++ b/tests/cl/program/execute/realign-stack.cl
@@ -0,0 +1,96 @@
+/*!
+
+[config]
+name: call with stack realignment
+
+[test]
+name: call stack realignment 16
+kernel_name: kernel_call_stack_realign16_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+
+[test]
+name: call stack realignment 32
+kernel_name: kernel_call_stack_realign32_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+[test]
+name: call stack realignment 64
+kernel_name: kernel_call_stack_realign64_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+[test]
+name: call stack realignment 128
+kernel_name: kernel_call_stack_realign128_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+!*/
+
+// Make sure the absolute private address of stack objects in callee
+// functions is properly aligned.
+
+#define NOINLINE __attribute__((noinline))
+
+NOINLINE
+int test_stack_object_alignment16() {
+volatile int4 requires_align16 = 0;
+volatile uintptr_t addr = (uint)&requires_align16;
+return (addr & 15) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment32() {
+volatile int8 requires_align32 = 0;
+volatile uintptr_t addr = (uint)&requires_align32;
+return (addr & 31) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment64() {
+volatile int16 requires_align64 = 0;
+volatile uintptr_t addr = (uint)&requires_align64;
+return (addr & 63) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment128() {
+volatile long16 requires_align128 = 0;
+volatile uintptr_t addr = (uint)&requires_align128;
+return (addr & 127) == 0;
+}
+
+kernel void kernel_call_stack_realign16_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment16();
+}
+
+kernel void kernel_call_stack_realign32_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment32();
+}
+
+kernel void kernel_call_stack_realign64_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment64();
+}
+
+kernel void kernel_call_stack_realign128_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment128();
+}
-- 
2.14.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for CSR VGPRs caused by SGPR spilling

2018-04-05 Thread Matt Arsenault
ping

> On Mar 29, 2018, at 11:29, Matt Arsenault  wrote:
> 
> Make sure if a CSR VGPR is needed for SGPR spilling, it is
> properly saved and restored.
> ---
> .../execute/amdgcn-callee-saved-registers.cl   | 52 ++
> 1 file changed, 52 insertions(+)
> create mode 100644 tests/cl/program/execute/amdgcn-callee-saved-registers.cl
> 
> diff --git a/tests/cl/program/execute/amdgcn-callee-saved-registers.cl 
> b/tests/cl/program/execute/amdgcn-callee-saved-registers.cl
> new file mode 100644
> index 0..8b8db2783
> --- /dev/null
> +++ b/tests/cl/program/execute/amdgcn-callee-saved-registers.cl
> @@ -0,0 +1,52 @@
> +/*!
> +
> +[config]
> +name: amdgcn call clobbers
> +clc_version_min: 10
> +device_regex: gfx[\d]*
> +
> +[test]
> +name: CSR VGPR for SGPR spilling
> +kernel_name: kernel_call_need_spill_vgpr_for_csr_sgpr_spills_no_calls
> +dimensions: 1
> +global_size: 1 0 0
> +arg_out: 0 buffer int[2] \
> +  0x1337  0xabcd1234
> +
> +!*/
> +
> +#ifndef __AMDGCN__
> +#error This test is only for amdgcn
> +#endif
> +
> +__attribute__((noinline))
> +int need_spill_vgpr_for_csr_sgpr_spills_no_calls()
> +{
> +int sgpr_val;
> +__asm volatile("s_mov_b32 %0, 0x1337" : "=s"(sgpr_val));
> +
> +__asm volatile(
> +"s_nop 1" :::
> +"v0","v1","v2","v3","v4","v5","v6","v7",
> +"v8","v9","v10","v11","v12","v13","v14","v15",
> +"v16","v17","v18","v19","v20","v21","v22","v23",
> +"v24","v25","v26","v27","v28","v29","v30","v31",
> +
> +"s0","s1","s2","s3","s4","s5","s6","s7",
> +"s8","s9","s10","s11","s12","s13","s14","s15",
> +"s16","s17","s18","s19","s20","s21","s22","s23",
> +"s24","s25","s26","s27","s28","s29","s30","s31",
> + "s32", "s33", "s34", "s35", "s36", "s37", "s38");
> +
> +return sgpr_val;
> +}
> +
> +
> +kernel void kernel_call_need_spill_vgpr_for_csr_sgpr_spills_no_calls(global 
> int* ret)
> +{
> +int v32;
> +__asm volatile("v_mov_b32 %0, 0xabcd1234" : "={v32}"(v32));
> +ret[0] = need_spill_vgpr_for_csr_sgpr_spills_no_calls();
> +__asm volatile ("s_nop 0" :: "{v32}"(v32));
> +ret[1] = v32;
> +}
> -- 
> 2.14.1
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for call stack realignment

2018-04-05 Thread Matt Arsenault


> On Apr 4, 2018, at 15:52, Jan Vesely  wrote:
> 
> redundant newline

Not sure what you mean by this. Do you mean the newline to put the single array 
element on its own line? I was trying to be consistent with buffer formatting 
as most tests do___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for call stack realignment

2018-04-05 Thread Matt Arsenault


> On Apr 4, 2018, at 15:52, Jan Vesely  wrote:
> 
> On Tue, 2018-04-03 at 18:03 -0400, Matt Arsenault wrote:
>> ping
>> 
>>> On Mar 29, 2018, at 10:35, Matt Arsenault  wrote:
>>> 
>>> ---
>>> tests/cl/program/execute/realign-stack.cl | 96 
>>> +++
>>> 1 file changed, 96 insertions(+)
>>> create mode 100644 tests/cl/program/execute/realign-stack.cl
>>> 
>>> diff --git a/tests/cl/program/execute/realign-stack.cl 
>>> b/tests/cl/program/execute/realign-stack.cl
>>> new file mode 100644
>>> index 0..ed62ea211
>>> --- /dev/null
>>> +++ b/tests/cl/program/execute/realign-stack.cl
>>> @@ -0,0 +1,96 @@
>>> +/*!
>>> +
>>> +[config]
>>> +name: call with stack realignment
> 
> why does this care about call? 
> CLC requires types to be aligned to next power of 2 of their size
> irrespective of the location. HOw is this different from any other
> __private variable declaration?
> 

This is testing that requirement when the object resides in a frame that isn’t 
the entry point / kernel. The problem was this requirement wasn’t being 
respected because the frame itself wasn’t aligned, so the absolute address of 
the object wasn’t properly aligned.







>>> +
>>> +[test]
>>> +name: call stack realignment 16
>>> +kernel_name: kernel_call_stack_realign16_func
>>> +dimensions: 1
>>> +global_size: 1 0 0
>>> +
>>> +arg_out: 0 buffer int[1] \
>>> +  1
> 
> redundant newline
> 
>>> +
>>> +
>>> +[test]
>>> +name: call stack realignment 32
>>> +kernel_name: kernel_call_stack_realign32_func
>>> +dimensions: 1
>>> +global_size: 1 0 0
>>> +
>>> +arg_out: 0 buffer int[1] \
>>> +  1
> 
> same here
> 
>>> +
>>> +[test]
>>> +name: call stack realignment 64
>>> +kernel_name: kernel_call_stack_realign64_func
>>> +dimensions: 1
>>> +global_size: 1 0 0
>>> +
>>> +arg_out: 0 buffer int[1] \
>>> +  1
> 
> same here
> 
>>> +
>>> +[test]
>>> +name: call stack realignment 128
>>> +kernel_name: kernel_call_stack_realign128_func
>>> +dimensions: 1
>>> +global_size: 1 0 0
>>> +
>>> +arg_out: 0 buffer int[1] \
>>> +  1
> 
> and here
> 
>>> +
>>> +!*/
>>> +
>>> +// Make sure the absolute private address of stack objects in callee
>>> +// functions is properly aligned.
>>> +
>>> +#define NOINLINE __attribute__((noinline))
>>> +
>>> +NOINLINE
>>> +int test_stack_object_alignment16() {
>>> +volatile int4 requires_align16 = 0;
>>> +volatile uint addr = (uint)&requires_align16;
> 
> this should use uintptr_t. why is the addr variable volatile?
> same in the below tests.

Optimizations can use the alignment information to conclude this is always 
true, so the volatile load from memory prevents this


___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for call stack realignment

2018-04-03 Thread Matt Arsenault
ping

> On Mar 29, 2018, at 10:35, Matt Arsenault  wrote:
> 
> ---
> tests/cl/program/execute/realign-stack.cl | 96 +++
> 1 file changed, 96 insertions(+)
> create mode 100644 tests/cl/program/execute/realign-stack.cl
> 
> diff --git a/tests/cl/program/execute/realign-stack.cl 
> b/tests/cl/program/execute/realign-stack.cl
> new file mode 100644
> index 0..ed62ea211
> --- /dev/null
> +++ b/tests/cl/program/execute/realign-stack.cl
> @@ -0,0 +1,96 @@
> +/*!
> +
> +[config]
> +name: call with stack realignment
> +
> +[test]
> +name: call stack realignment 16
> +kernel_name: kernel_call_stack_realign16_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] \
> +  1
> +
> +
> +[test]
> +name: call stack realignment 32
> +kernel_name: kernel_call_stack_realign32_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] \
> +  1
> +
> +[test]
> +name: call stack realignment 64
> +kernel_name: kernel_call_stack_realign64_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] \
> +  1
> +
> +[test]
> +name: call stack realignment 128
> +kernel_name: kernel_call_stack_realign128_func
> +dimensions: 1
> +global_size: 1 0 0
> +
> +arg_out: 0 buffer int[1] \
> +  1
> +
> +!*/
> +
> +// Make sure the absolute private address of stack objects in callee
> +// functions is properly aligned.
> +
> +#define NOINLINE __attribute__((noinline))
> +
> +NOINLINE
> +int test_stack_object_alignment16() {
> +volatile int4 requires_align16 = 0;
> +volatile uint addr = (uint)&requires_align16;
> +return (addr & 15) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment32() {
> +volatile int8 requires_align32 = 0;
> +volatile uint addr = (uint)&requires_align32;
> +return (addr & 31) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment64() {
> +volatile int16 requires_align64 = 0;
> +volatile uint addr = (uint)&requires_align64;
> +return (addr & 63) == 0;
> +}
> +
> +NOINLINE
> +int test_stack_object_alignment128() {
> +volatile long16 requires_align128 = 0;
> +volatile uint addr = (uint)&requires_align128;
> +return (addr & 127) == 0;
> +}
> +
> +kernel void kernel_call_stack_realign16_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment16();
> +}
> +
> +kernel void kernel_call_stack_realign32_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment32();
> +}
> +
> +kernel void kernel_call_stack_realign64_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment64();
> +}
> +
> +kernel void kernel_call_stack_realign128_func(global int* out) {
> +volatile int misalign_stack = 0;
> +*out = test_stack_object_alignment128();
> +}
> -- 
> 2.14.1
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add test for CSR VGPRs caused by SGPR spilling

2018-03-29 Thread Matt Arsenault
Make sure if a CSR VGPR is needed for SGPR spilling, it is
properly saved and restored.
---
 .../execute/amdgcn-callee-saved-registers.cl   | 52 ++
 1 file changed, 52 insertions(+)
 create mode 100644 tests/cl/program/execute/amdgcn-callee-saved-registers.cl

diff --git a/tests/cl/program/execute/amdgcn-callee-saved-registers.cl 
b/tests/cl/program/execute/amdgcn-callee-saved-registers.cl
new file mode 100644
index 0..8b8db2783
--- /dev/null
+++ b/tests/cl/program/execute/amdgcn-callee-saved-registers.cl
@@ -0,0 +1,52 @@
+/*!
+
+[config]
+name: amdgcn call clobbers
+clc_version_min: 10
+device_regex: gfx[\d]*
+
+[test]
+name: CSR VGPR for SGPR spilling
+kernel_name: kernel_call_need_spill_vgpr_for_csr_sgpr_spills_no_calls
+dimensions: 1
+global_size: 1 0 0
+arg_out: 0 buffer int[2] \
+  0x1337  0xabcd1234
+
+!*/
+
+#ifndef __AMDGCN__
+#error This test is only for amdgcn
+#endif
+
+__attribute__((noinline))
+int need_spill_vgpr_for_csr_sgpr_spills_no_calls()
+{
+int sgpr_val;
+__asm volatile("s_mov_b32 %0, 0x1337" : "=s"(sgpr_val));
+
+__asm volatile(
+"s_nop 1" :::
+"v0","v1","v2","v3","v4","v5","v6","v7",
+"v8","v9","v10","v11","v12","v13","v14","v15",
+"v16","v17","v18","v19","v20","v21","v22","v23",
+"v24","v25","v26","v27","v28","v29","v30","v31",
+
+"s0","s1","s2","s3","s4","s5","s6","s7",
+"s8","s9","s10","s11","s12","s13","s14","s15",
+"s16","s17","s18","s19","s20","s21","s22","s23",
+"s24","s25","s26","s27","s28","s29","s30","s31",
+   "s32", "s33", "s34", "s35", "s36", "s37", "s38");
+
+return sgpr_val;
+}
+
+
+kernel void kernel_call_need_spill_vgpr_for_csr_sgpr_spills_no_calls(global 
int* ret)
+{
+int v32;
+__asm volatile("v_mov_b32 %0, 0xabcd1234" : "={v32}"(v32));
+ret[0] = need_spill_vgpr_for_csr_sgpr_spills_no_calls();
+__asm volatile ("s_nop 0" :: "{v32}"(v32));
+ret[1] = v32;
+}
-- 
2.14.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add test for call stack realignment

2018-03-29 Thread Matt Arsenault
---
 tests/cl/program/execute/realign-stack.cl | 96 +++
 1 file changed, 96 insertions(+)
 create mode 100644 tests/cl/program/execute/realign-stack.cl

diff --git a/tests/cl/program/execute/realign-stack.cl 
b/tests/cl/program/execute/realign-stack.cl
new file mode 100644
index 0..ed62ea211
--- /dev/null
+++ b/tests/cl/program/execute/realign-stack.cl
@@ -0,0 +1,96 @@
+/*!
+
+[config]
+name: call with stack realignment
+
+[test]
+name: call stack realignment 16
+kernel_name: kernel_call_stack_realign16_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+
+[test]
+name: call stack realignment 32
+kernel_name: kernel_call_stack_realign32_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+[test]
+name: call stack realignment 64
+kernel_name: kernel_call_stack_realign64_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+[test]
+name: call stack realignment 128
+kernel_name: kernel_call_stack_realign128_func
+dimensions: 1
+global_size: 1 0 0
+
+arg_out: 0 buffer int[1] \
+  1
+
+!*/
+
+// Make sure the absolute private address of stack objects in callee
+// functions is properly aligned.
+
+#define NOINLINE __attribute__((noinline))
+
+NOINLINE
+int test_stack_object_alignment16() {
+volatile int4 requires_align16 = 0;
+volatile uint addr = (uint)&requires_align16;
+return (addr & 15) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment32() {
+volatile int8 requires_align32 = 0;
+volatile uint addr = (uint)&requires_align32;
+return (addr & 31) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment64() {
+volatile int16 requires_align64 = 0;
+volatile uint addr = (uint)&requires_align64;
+return (addr & 63) == 0;
+}
+
+NOINLINE
+int test_stack_object_alignment128() {
+volatile long16 requires_align128 = 0;
+volatile uint addr = (uint)&requires_align128;
+return (addr & 127) == 0;
+}
+
+kernel void kernel_call_stack_realign16_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment16();
+}
+
+kernel void kernel_call_stack_realign32_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment32();
+}
+
+kernel void kernel_call_stack_realign64_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment64();
+}
+
+kernel void kernel_call_stack_realign128_func(global int* out) {
+volatile int misalign_stack = 0;
+*out = test_stack_object_alignment128();
+}
-- 
2.14.1

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add bigger versions of calls with struct tests

2018-03-15 Thread Matt Arsenault
ping

> On Oct 12, 2017, at 16:19, Matt Arsenault  wrote:
> 
> These are just bigger versions of the existing struct
> calls tests so that they stress using byval/sret. The
> existing call with struct tests are now passed directly
> in registers.
> ---
> tests/cl/program/execute/calls-large-struct.cl | 156 +
> tests/cl/program/execute/calls-struct.cl   |  50 
> 2 files changed, 181 insertions(+), 25 deletions(-)
> create mode 100644 tests/cl/program/execute/calls-large-struct.cl
> 
> diff --git a/tests/cl/program/execute/calls-large-struct.cl 
> b/tests/cl/program/execute/calls-large-struct.cl
> new file mode 100644
> index 0..46d84760d
> --- /dev/null
> +++ b/tests/cl/program/execute/calls-large-struct.cl
> @@ -0,0 +1,156 @@
> +/*!
> +
> +[config]
> +name: calls with large structs
> +clc_version_min: 10
> +
> +[test]
> +name: byval struct
> +kernel_name: call_i32_func_byval_Char_IntArray
> +dimensions: 1
> +global_size: 16 0 0
> +
> +arg_out: 0 buffer int[16]\
> + 1021 1022 1023 1024 1025 1026 1027 1028 \
> + 1029 1030 1031 1032 1033 1034 1035 1036
> +
> +arg_out: 1 buffer int[16] \
> +  14   14   14   14 \
> +  14   14   14   14 \
> +  14   14   14   14 \
> +  14   14   14   14 \
> +
> +arg_in: 2 buffer int[16] \
> + 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> +
> +
> +[test]
> +name: sret struct
> +kernel_name: call_sret_Char_IntArray_func
> +dimensions: 1
> +global_size: 16 0 0
> +
> +arg_out: 0 buffer int[16]\
> + 921 922 923 924 925 926 927 928 \
> + 929 930 931 932 933 934 935 936
> +
> +arg_in: 1 buffer int[16] \
> + 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> +
> +!*/
> +
> +#define NOINLINE __attribute__((noinline))
> +
> +typedef struct ByVal_Char_IntArray {
> +char c;
> +int i[32];
> +} ByVal_Char_IntArray;
> +
> +NOINLINE
> +int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
> +{
> +st.i[0] += 100;
> +
> +int sum = 0;
> +for (int i = 0; i < 32; ++i)
> +{
> +sum += st.i[i];
> +}
> +
> +sum += st.c;
> +return sum;
> +}
> +
> +kernel void call_i32_func_byval_Char_IntArray(global int* out0,
> +  global int* out1,
> +  global int* input)
> +{
> +ByVal_Char_IntArray st;
> +st.c = 15;
> +
> +int id = get_global_id(0);
> +
> +int val = input[id];
> +
> +
> +st.i[0] = 14;
> +st.i[1] = -8;
> +st.i[2] = val;
> +st.i[3] = 900;
> +
> +for (int i = 4; i < 32; ++i)
> +{
> +st.i[i] = 0;
> +}
> +
> +volatile int stack_object[16];
> +for (int i = 0; i < 16; ++i)
> +{
> +const int test_val = 0x07080900 | i;
> +stack_object[i] = test_val;
> +}
> +
> +int result = i32_func_byval_Char_IntArray(st);
> +
> +// Check for stack corruption
> +for (int i = 0; i < 16; ++i)
> +{
> +const int test_val = 0x07080900 | i;
> +if (stack_object[i] != test_val)
> +result = -1;
> +}
> +
> +out0[id] = result;
> +out1[id] = st.i[0];
> +}
> +
> +NOINLINE
> +ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
> +{
> +ByVal_Char_IntArray st;
> +st.c = 15;
> +
> +int val = input[id];
> +st.i[0] = 14;
> +st.i[1] = -8;
> +st.i[2] = val;
> +st.i[3] = 900;
> +
> +for (int i = 4; i < 32; ++i)
> +{
> +st.i[i] = 0;
> +}
> +
> +return st;
> +}
> +
> +kernel void call_sret_Char_IntArray_func(global int* output, global int* 
> input)
> +{
> +volatile int stack_object[16];
> +for (int i = 0; i < 16; ++i)
> +{
> +const int test_val = 0x04030200 | i;
> +stack_object[i] = test_val;
> +}
> +
> +int id = get_global_id(0);
> +ByVal_Char_IntArray st = sret_Char_IntArray_func(input, id);
> +
> +int sum = 0;
> +for (int i = 0; i < 32; ++i)
> +{
> +sum += st.i[i];
> +}
> +
> +sum += st.c;
> +
> +// Check for stack corruption
> +for (int i = 0; i < 16; ++i)
> +{
> +const int test_val = 0x04030200 | i;
> +if (stack_object[i] != test_val)
> +sum = -1;
> +}
> +
> +output[id] = sum;
> +}
> diff --git a/tests/cl/program/execute/calls-struct.cl 
> b/tests/cl/program/execute/calls-struct.cl
> index 04f769dac..5d52e5587 100644
> --- a/tests/cl/program/execute/calls-struct.cl
> +

Re: [Piglit] [PATCH] cl: Add test for MUBUF access with a negative vaddr

2018-01-19 Thread Matt Arsenault


> On Jan 18, 2018, at 15:02, Jan Vesely  wrote:
> 
> Why is this necessary? can't you just pass the offset argument as a
> kernel input?
> 
> Jan

It needs to specifically be in a VGPR___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for MUBUF access with a negative vaddr

2018-01-18 Thread Matt Arsenault
ping

> On Dec 15, 2017, at 14:01, Matt Arsenault  wrote:
> 
> ping
> 
>> On Nov 28, 2017, at 17:20, Matt Arsenault  wrote:
>> 
>> Explanation in test comment.
>> ---
>> .../program/execute/amdgcn-mubuf-negative-vaddr.cl | 62 
>> ++
>> 1 file changed, 62 insertions(+)
>> create mode 100644 tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
>> 
>> diff --git a/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl 
>> b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
>> new file mode 100644
>> index 0..21f11bf66
>> --- /dev/null
>> +++ b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
>> @@ -0,0 +1,62 @@
>> +>/*!
>> +
>> +[config]
>> +name: MUBUF stack addressing behavior
>> +clc_version_min: 10
>> +
>> +[test]
>> +name: MUBUF negative buffer offsets
>> +kernel_name: negative_mubuf_vaddr
>> +dimensions: 1
>> +global_size: 16 0 0
>> +
>> +arg_out: 0 buffer int[16]\
>> +  5 5 5 5 \
>> +  5 5 5 5 \
>> +  5 5 5 5 \
>> +  5 5 5 5
>> +
>> +!*/
>> +
>> +// Prior to gfx9, MUBUF instructions with the vaddr offset enabled
>> +// would always perform a range check. If a negative vaddr base index
>> +// was used, this would fail the range check. The overall address
>> +// computation would compute a valid address, but this doesn't happen
>> +// due to the range check. For out-of-bounds MUBUF loads, a 0 is
>> +// returned.
>> +//
>> +// Therefore it should be safe to fold any VGPR offset on gfx9 into
>> +// the MUBUF vaddr, but not on older subtargets which can only do this
>> +// if the sign bit is known 0.
>> +kernel void negative_mubuf_vaddr(global int* out0)
>> +{
>> +volatile int array[16];
>> +
>> +int id = get_global_id(0);
>> +for (int i = 0; i < 16; ++i)
>> +{
>> +array[i] = i + 1;
>> +}
>> +
>> +// Directly addressing the same buffer address works without using 
>> vaddr:
>> +//
>> +// buffer_load_dword v2, off, s[0:3], s11 offset:20
>> +// out0[id] = array[4];
>> +
>> +
>> +// But having a negative computed base index would fail:
>> +// v_mov_b32_e32 v0, -8
>> +// v_lshlrev_b32_e32 v0, 2, v0
>> +// v_add_i32_e32 v0, vcc, 4, v0
>> +// buffer_load_dword v2, v0, s[0:3], s11 offen offset:48
>> +
>> +#ifdef __AMDGCN__
>> +// Obscure the value so it can't be folded with other constant or
>> +// make known bits assumptions.
>> +int offset;
>> +__asm volatile("v_mov_b32 %0, -8" : "=v"(offset));
>> +#else
>> +int offset = -8;
>> +#endif
>> +out0[id] = array[offset + 12];
>> +}
>> -- 
>> 2.11.0
>> 
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add test for MUBUF access with a negative vaddr

2017-12-15 Thread Matt Arsenault
ping

> On Nov 28, 2017, at 17:20, Matt Arsenault  wrote:
> 
> Explanation in test comment.
> ---
> .../program/execute/amdgcn-mubuf-negative-vaddr.cl | 62 ++
> 1 file changed, 62 insertions(+)
> create mode 100644 tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
> 
> diff --git a/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl 
> b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
> new file mode 100644
> index 0..21f11bf66
> --- /dev/null
> +++ b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
> @@ -0,0 +1,62 @@
> +>/*!
> +
> +[config]
> +name: MUBUF stack addressing behavior
> +clc_version_min: 10
> +
> +[test]
> +name: MUBUF negative buffer offsets
> +kernel_name: negative_mubuf_vaddr
> +dimensions: 1
> +global_size: 16 0 0
> +
> +arg_out: 0 buffer int[16]\
> +  5 5 5 5 \
> +  5 5 5 5 \
> +  5 5 5 5 \
> +  5 5 5 5
> +
> +!*/
> +
> +// Prior to gfx9, MUBUF instructions with the vaddr offset enabled
> +// would always perform a range check. If a negative vaddr base index
> +// was used, this would fail the range check. The overall address
> +// computation would compute a valid address, but this doesn't happen
> +// due to the range check. For out-of-bounds MUBUF loads, a 0 is
> +// returned.
> +//
> +// Therefore it should be safe to fold any VGPR offset on gfx9 into
> +// the MUBUF vaddr, but not on older subtargets which can only do this
> +// if the sign bit is known 0.
> +kernel void negative_mubuf_vaddr(global int* out0)
> +{
> +volatile int array[16];
> +
> +int id = get_global_id(0);
> +for (int i = 0; i < 16; ++i)
> +{
> +array[i] = i + 1;
> +}
> +
> +// Directly addressing the same buffer address works without using vaddr:
> +//
> +// buffer_load_dword v2, off, s[0:3], s11 offset:20
> +// out0[id] = array[4];
> +
> +
> +// But having a negative computed base index would fail:
> +// v_mov_b32_e32 v0, -8
> +// v_lshlrev_b32_e32 v0, 2, v0
> +// v_add_i32_e32 v0, vcc, 4, v0
> +// buffer_load_dword v2, v0, s[0:3], s11 offen offset:48
> +
> +#ifdef __AMDGCN__
> +// Obscure the value so it can't be folded with other constant or
> +// make known bits assumptions.
> +int offset;
> +__asm volatile("v_mov_b32 %0, -8" : "=v"(offset));
> +#else
> +int offset = -8;
> +#endif
> +out0[id] = array[offset + 12];
> +}
> -- 
> 2.11.0
> 

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add test for MUBUF access with a negative vaddr

2017-11-28 Thread Matt Arsenault
Explanation in test comment.
---
 .../program/execute/amdgcn-mubuf-negative-vaddr.cl | 62 ++
 1 file changed, 62 insertions(+)
 create mode 100644 tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl

diff --git a/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl 
b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
new file mode 100644
index 0..21f11bf66
--- /dev/null
+++ b/tests/cl/program/execute/amdgcn-mubuf-negative-vaddr.cl
@@ -0,0 +1,62 @@
+>/*!
+
+[config]
+name: MUBUF stack addressing behavior
+clc_version_min: 10
+
+[test]
+name: MUBUF negative buffer offsets
+kernel_name: negative_mubuf_vaddr
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+  5 5 5 5 \
+  5 5 5 5 \
+  5 5 5 5 \
+  5 5 5 5
+
+!*/
+
+// Prior to gfx9, MUBUF instructions with the vaddr offset enabled
+// would always perform a range check. If a negative vaddr base index
+// was used, this would fail the range check. The overall address
+// computation would compute a valid address, but this doesn't happen
+// due to the range check. For out-of-bounds MUBUF loads, a 0 is
+// returned.
+//
+// Therefore it should be safe to fold any VGPR offset on gfx9 into
+// the MUBUF vaddr, but not on older subtargets which can only do this
+// if the sign bit is known 0.
+kernel void negative_mubuf_vaddr(global int* out0)
+{
+volatile int array[16];
+
+int id = get_global_id(0);
+for (int i = 0; i < 16; ++i)
+{
+array[i] = i + 1;
+}
+
+// Directly addressing the same buffer address works without using vaddr:
+//
+// buffer_load_dword v2, off, s[0:3], s11 offset:20
+// out0[id] = array[4];
+
+
+// But having a negative computed base index would fail:
+// v_mov_b32_e32 v0, -8
+// v_lshlrev_b32_e32 v0, 2, v0
+// v_add_i32_e32 v0, vcc, 4, v0
+// buffer_load_dword v2, v0, s[0:3], s11 offen offset:48
+
+#ifdef __AMDGCN__
+// Obscure the value so it can't be folded with other constant or
+// make known bits assumptions.
+int offset;
+__asm volatile("v_mov_b32 %0, -8" : "=v"(offset));
+#else
+int offset = -8;
+#endif
+out0[id] = array[offset + 12];
+}
-- 
2.11.0

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add tests for load lo16 instructions

2017-11-14 Thread Matt Arsenault
ping

> On Oct 27, 2017, at 03:02, Matt Arsenault  wrote:
> 
> ---
> tests/cl/program/execute/load-lo16-generic.cl |  90 +
> tests/cl/program/execute/load-lo16.cl | 275 ++
> 2 files changed, 365 insertions(+)
> create mode 100644 tests/cl/program/execute/load-lo16-generic.cl
> create mode 100644 tests/cl/program/execute/load-lo16.cl
> 
> diff --git a/tests/cl/program/execute/load-lo16-generic.cl 
> b/tests/cl/program/execute/load-lo16-generic.cl
> new file mode 100644
> index 0..62660c629
> --- /dev/null
> +++ b/tests/cl/program/execute/load-lo16-generic.cl
> @@ -0,0 +1,90 @@
> +/*!
> +
> +[config]
> +name: load into low 16-bits of 32-bit register with generic addressing
> +clc_version_min: 20
> +dimensions: 1
> +
> +[test]
> +  name: load lo16 generic
> +  kernel_name: load_lo16_generic
> +  global_size: 4 0 0
> +  local_size: 4 0 0
> +
> +  arg_out: 0 buffer uint[4] \
> +  0xabcd  0x1234  0x 0xdeadbeef
> +
> +  arg_in: 1 buffer uint[4] \
> +  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
> +
> +  arg_in: 2 buffer ushort[4] \
> +  0x   0x   0x  0xbeef
> +
> +[test]
> +  name: zextloadi8 lo16 generic
> +  kernel_name: zextloadi8_lo16_generic
> +  global_size: 4 0 0
> +  local_size: 4 0 0
> +
> +arg_out: 0 buffer uint[4] \
> +  0x00ab0099  0x00120033  0x00110044 0x00de00be
> +
> +arg_in: 1 buffer uint[4] \
> +  0x00abf00f   0x0012f00f   0x0011f00f  0x00def00f
> +
> +arg_in: 2 buffer uchar[4] \
> +  0x99   0x33   0x44  0xbe
> +
> +
> +[test]
> +  name: sextloadi8 lo16 generic
> +  kernel_name: sextloadi8_lo16_generic
> +  global_size: 4 0 0
> +  local_size: 4 0 0
> +
> +arg_out: 0 buffer uint[4] \
> +  0x0099ffab  0x00330012  0x00440011 0x00beffde
> +
> +arg_in: 1 buffer uint[4] \
> +  0x0099f00f   0x0033f00f   0x0044f00f  0x00bef00f
> +
> +arg_in: 2 buffer char[4] \
> +  0xab   0x12   0x11  0xde
> +
> +!*/
> +
> +kernel void load_lo16_generic(volatile global uint* out,
> +  volatile global uint* in0,
> +  volatile global ushort* in1)
> +{
> +volatile generic uint* generic_in0 = (volatile generic uint*)in0;
> +volatile generic ushort* generic_in1 = (volatile generic ushort*)in1;
> +int gid = get_global_id(0);
> +ushort2 val = as_ushort2(in0[gid]);
> +val.lo = generic_in1[gid];
> +out[gid] = as_uint(val);
> +}
> +
> +kernel void zextloadi8_lo16_generic(volatile global uint* out,
> +volatile global uint* in0,
> +volatile global uchar* in1)
> +{
> +volatile generic uint* generic_in0 = (volatile generic uint*)in0;
> +volatile generic uchar* generic_in1 = (volatile generic uchar*)in1;
> +int gid = get_global_id(0);
> +ushort2 val = as_ushort2(in0[gid]);
> +val.lo = (ushort)generic_in1[gid];
> +out[gid] = as_uint(val);
> +}
> +
> +kernel void sextloadi8_lo16_generic(volatile global uint* out,
> +volatile global uint* in0,
> +volatile global char* in1)
> +{
> +volatile generic uint* generic_in0 = (volatile generic uint*)in0;
> +volatile generic char* generic_in1 = (volatile generic char*)in1;
> +int gid = get_global_id(0);
> +short2 val = as_short2(in0[gid]);
> +val.lo = (short)generic_in1[gid];
> +out[gid] = as_uint(val);
> +}
> diff --git a/tests/cl/program/execute/load-lo16.cl 
> b/tests/cl/program/execute/load-lo16.cl
> new file mode 100644
> index 0..f8bf2c2f6
> --- /dev/null
> +++ b/tests/cl/program/execute/load-lo16.cl
> @@ -0,0 +1,275 @@
> +/*!
> +
> +[config]
> +  name: load into low 16-bits of 32-bit register
> +  clc_version_min: 10
> +  dimensions: 1
> +
> +[test]
> +  name: load lo16 global
> +  kernel_name: load_lo16_global
> +  global_size: 4 0 0
> +  local_size: 4 0 0
> +
> +  arg_out: 0 buffer uint[4] \
> +  0xabcd  0x1234  0x 0xdeadbeef
> +
> +  arg_in: 1 buffer uint[4] \
> +  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
> +
> +  arg_in: 2 buffer ushort[4] \
> +  0x   0x   0x  0xbeef
> +
> +
> +[test]
> +  name: load lo16 local
> +  kernel_name: load_lo16_local
> +  global_size: 4 0 0
> +  local_size: 4 0 0
> +
> +  arg_out: 0 buffer uint[4] \
> +  0xabcd  0x1234  0x 0xdeadbeef
> +
> +  arg_in: 1 buffer uint[4] \
> +  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
> +
> +  arg_in: 2 buffer ushort[4] \
> +  0x   0x333

[Piglit] [PATCH] cl: Fix types to be unsigned

2017-10-27 Thread Matt Arsenault
Doesn't really matter.
---
 tests/cl/program/execute/store-hi16.cl | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/cl/program/execute/store-hi16.cl 
b/tests/cl/program/execute/store-hi16.cl
index b734b3766..4273d3369 100644
--- a/tests/cl/program/execute/store-hi16.cl
+++ b/tests/cl/program/execute/store-hi16.cl
@@ -92,7 +92,7 @@ kernel void store_hi16_global(volatile global ushort* out, 
volatile global uint*
 
 kernel void store_hi16_local(volatile global ushort* out, volatile global 
uint* in)
 {
-volatile local short lds[64];
+volatile local ushort lds[64];
 int lid = get_local_id(0);
 int gid = get_global_id(0);
 
@@ -104,7 +104,7 @@ kernel void store_hi16_local(volatile global ushort* out, 
volatile global uint*
 kernel void store_hi16_private(volatile global ushort* out, volatile global 
uint* in)
 {
 int gid = get_global_id(0);
-volatile private short stack = in[gid] >> 16;
+volatile private ushort stack = in[gid] >> 16;
 out[gid] = stack;
 }
 
@@ -117,7 +117,7 @@ kernel void truncstorei8_hi16_global(volatile global uchar* 
out, volatile global
 
 kernel void truncstorei8_hi16_local(volatile global uchar* out, volatile 
global uint* in)
 {
-volatile local short lds[64];
+volatile local ushort lds[64];
 int lid = get_local_id(0);
 int gid = get_global_id(0);
 
@@ -129,6 +129,6 @@ kernel void truncstorei8_hi16_local(volatile global uchar* 
out, volatile global
 kernel void truncstorei8_hi16_private(volatile global uchar* out, volatile 
global uint* in)
 {
 int gid = get_global_id(0);
-volatile private short stack = in[gid] >> 16;
+volatile private ushort stack = in[gid] >> 16;
 out[gid] = (uchar)stack;
 }
-- 
2.11.0

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add tests for load lo16 instructions

2017-10-27 Thread Matt Arsenault
---
 tests/cl/program/execute/load-lo16-generic.cl |  90 +
 tests/cl/program/execute/load-lo16.cl | 275 ++
 2 files changed, 365 insertions(+)
 create mode 100644 tests/cl/program/execute/load-lo16-generic.cl
 create mode 100644 tests/cl/program/execute/load-lo16.cl

diff --git a/tests/cl/program/execute/load-lo16-generic.cl 
b/tests/cl/program/execute/load-lo16-generic.cl
new file mode 100644
index 0..62660c629
--- /dev/null
+++ b/tests/cl/program/execute/load-lo16-generic.cl
@@ -0,0 +1,90 @@
+/*!
+
+[config]
+name: load into low 16-bits of 32-bit register with generic addressing
+clc_version_min: 20
+dimensions: 1
+
+[test]
+  name: load lo16 generic
+  kernel_name: load_lo16_generic
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer uint[4] \
+  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
+
+  arg_in: 2 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+[test]
+  name: zextloadi8 lo16 generic
+  kernel_name: zextloadi8_lo16_generic
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+arg_in: 1 buffer uint[4] \
+  0x00abf00f   0x0012f00f   0x0011f00f  0x00def00f
+
+arg_in: 2 buffer uchar[4] \
+  0x99   0x33   0x44  0xbe
+
+
+[test]
+  name: sextloadi8 lo16 generic
+  kernel_name: sextloadi8_lo16_generic
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0x0099ffab  0x00330012  0x00440011 0x00beffde
+
+arg_in: 1 buffer uint[4] \
+  0x0099f00f   0x0033f00f   0x0044f00f  0x00bef00f
+
+arg_in: 2 buffer char[4] \
+  0xab   0x12   0x11  0xde
+
+!*/
+
+kernel void load_lo16_generic(volatile global uint* out,
+  volatile global uint* in0,
+  volatile global ushort* in1)
+{
+volatile generic uint* generic_in0 = (volatile generic uint*)in0;
+volatile generic ushort* generic_in1 = (volatile generic ushort*)in1;
+int gid = get_global_id(0);
+ushort2 val = as_ushort2(in0[gid]);
+val.lo = generic_in1[gid];
+out[gid] = as_uint(val);
+}
+
+kernel void zextloadi8_lo16_generic(volatile global uint* out,
+volatile global uint* in0,
+volatile global uchar* in1)
+{
+volatile generic uint* generic_in0 = (volatile generic uint*)in0;
+volatile generic uchar* generic_in1 = (volatile generic uchar*)in1;
+int gid = get_global_id(0);
+ushort2 val = as_ushort2(in0[gid]);
+val.lo = (ushort)generic_in1[gid];
+out[gid] = as_uint(val);
+}
+
+kernel void sextloadi8_lo16_generic(volatile global uint* out,
+volatile global uint* in0,
+volatile global char* in1)
+{
+volatile generic uint* generic_in0 = (volatile generic uint*)in0;
+volatile generic char* generic_in1 = (volatile generic char*)in1;
+int gid = get_global_id(0);
+short2 val = as_short2(in0[gid]);
+val.lo = (short)generic_in1[gid];
+out[gid] = as_uint(val);
+}
diff --git a/tests/cl/program/execute/load-lo16.cl 
b/tests/cl/program/execute/load-lo16.cl
new file mode 100644
index 0..f8bf2c2f6
--- /dev/null
+++ b/tests/cl/program/execute/load-lo16.cl
@@ -0,0 +1,275 @@
+/*!
+
+[config]
+  name: load into low 16-bits of 32-bit register
+  clc_version_min: 10
+  dimensions: 1
+
+[test]
+  name: load lo16 global
+  kernel_name: load_lo16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer uint[4] \
+  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
+
+  arg_in: 2 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+
+[test]
+  name: load lo16 local
+  kernel_name: load_lo16_local
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer uint[4] \
+  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
+
+  arg_in: 2 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+[test]
+  name: load lo16 private
+  kernel_name: load_lo16_private
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer uint[4] \
+  0xabcdf00f   0x1234f00f   0xf00f  0xdeadf00f
+
+  arg_in: 2 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+
+[test]
+  name: zextloadi8 lo16 global
+  kernel_name: zextloadi8_lo16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+arg_in: 1 buffer uint[4] \
+  0x00abf00f   0x0012f00f   0x0011f00f  0x00def00f
+
+arg_in: 2 buffer uchar[4] \
+  0x99   0x33   0x44  0xbe
+
+
+[test]
+  name: sextloadi8 lo16 global
+  kernel_name: sextloadi8_lo16_globa

[Piglit] [PATCH] cl: Add bigger versions of calls with struct tests

2017-10-12 Thread Matt Arsenault
These are just bigger versions of the existing struct
calls tests so that they stress using byval/sret. The
existing call with struct tests are now passed directly
in registers.
---
 tests/cl/program/execute/calls-large-struct.cl | 156 +
 tests/cl/program/execute/calls-struct.cl   |  50 
 2 files changed, 181 insertions(+), 25 deletions(-)
 create mode 100644 tests/cl/program/execute/calls-large-struct.cl

diff --git a/tests/cl/program/execute/calls-large-struct.cl 
b/tests/cl/program/execute/calls-large-struct.cl
new file mode 100644
index 0..46d84760d
--- /dev/null
+++ b/tests/cl/program/execute/calls-large-struct.cl
@@ -0,0 +1,156 @@
+/*!
+
+[config]
+name: calls with large structs
+clc_version_min: 10
+
+[test]
+name: byval struct
+kernel_name: call_i32_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 1021 1022 1023 1024 1025 1026 1027 1028 \
+ 1029 1030 1031 1032 1033 1034 1035 1036
+
+arg_out: 1 buffer int[16] \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: sret struct
+kernel_name: call_sret_Char_IntArray_func
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 921 922 923 924 925 926 927 928 \
+ 929 930 931 932 933 934 935 936
+
+arg_in: 1 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+!*/
+
+#define NOINLINE __attribute__((noinline))
+
+typedef struct ByVal_Char_IntArray {
+char c;
+int i[32];
+} ByVal_Char_IntArray;
+
+NOINLINE
+int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
+{
+st.i[0] += 100;
+
+int sum = 0;
+for (int i = 0; i < 32; ++i)
+{
+sum += st.i[i];
+}
+
+sum += st.c;
+return sum;
+}
+
+kernel void call_i32_func_byval_Char_IntArray(global int* out0,
+  global int* out1,
+  global int* input)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int id = get_global_id(0);
+
+int val = input[id];
+
+
+st.i[0] = 14;
+st.i[1] = -8;
+st.i[2] = val;
+st.i[3] = 900;
+
+for (int i = 4; i < 32; ++i)
+{
+st.i[i] = 0;
+}
+
+volatile int stack_object[16];
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x07080900 | i;
+stack_object[i] = test_val;
+}
+
+int result = i32_func_byval_Char_IntArray(st);
+
+// Check for stack corruption
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x07080900 | i;
+if (stack_object[i] != test_val)
+result = -1;
+}
+
+out0[id] = result;
+out1[id] = st.i[0];
+}
+
+NOINLINE
+ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int val = input[id];
+st.i[0] = 14;
+st.i[1] = -8;
+st.i[2] = val;
+st.i[3] = 900;
+
+for (int i = 4; i < 32; ++i)
+{
+st.i[i] = 0;
+}
+
+return st;
+}
+
+kernel void call_sret_Char_IntArray_func(global int* output, global int* input)
+{
+volatile int stack_object[16];
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x04030200 | i;
+stack_object[i] = test_val;
+}
+
+int id = get_global_id(0);
+ByVal_Char_IntArray st = sret_Char_IntArray_func(input, id);
+
+int sum = 0;
+for (int i = 0; i < 32; ++i)
+{
+sum += st.i[i];
+}
+
+sum += st.c;
+
+// Check for stack corruption
+for (int i = 0; i < 16; ++i)
+{
+const int test_val = 0x04030200 | i;
+if (stack_object[i] != test_val)
+sum = -1;
+}
+
+output[id] = sum;
+}
diff --git a/tests/cl/program/execute/calls-struct.cl 
b/tests/cl/program/execute/calls-struct.cl
index 04f769dac..5d52e5587 100644
--- a/tests/cl/program/execute/calls-struct.cl
+++ b/tests/cl/program/execute/calls-struct.cl
@@ -1,12 +1,12 @@
 /*!
 
 [config]
-name: calls with structs
+name: calls with structs passed in registers
 clc_version_min: 10
 
 [test]
-name: byval struct
-kernel_name: call_i32_func_byval_Char_IntArray
+name: regs struct
+kernel_name: call_i32_func_small_struct_regs_Char_IntArray
 dimensions: 1
 global_size: 16 0 0
 
@@ -25,8 +25,8 @@ arg_in: 2 buffer int[16] \
 
 
 [test]
-name: sret struct
-kernel_name: call_sret_Char_IntArray_func
+name: struct_smallregs struct
+kernel_name: call_struct_smallregs_Char_IntArray_func
 dimensions: 1
 global_size: 16 0 0
 
@@ -39,8 +39,8 @@ arg_in: 1 buffer int[16] \
 
 
 [test]
-name: byval struct and sret struct
-kernel_name: call_sret_Char_IntArray_func_byval_Char_IntArray
+name: small struct in regs
+kernel_name: 
call_struct_smallregs_Char_IntArray_func_small_struct_regs_Char_IntArray
 dimensions: 1
 global_size: 16 0 0
 
@@ -63,13 +63,13 @@ arg_in: 2 buffer int[16] \
 
 #define NOINLINE __attribute__((noinl

Re: [Piglit] [PATCH] cl: Add tests for load hi16 instructions

2017-10-12 Thread Matt Arsenault

> On Oct 6, 2017, at 16:22, Jan Vesely  wrote:
> 
> On Tue, 2017-09-19 at 22:02 -0700, Matt Arsenault wrote:
>> v2: Fix some formatting
> 
> Reviewed-by: Jan Vesely 
> 
> sorry for the delay, feel free to cc me on cl piglit patches. I don't
> think anyone else is interested in piglit cl.
> Are these targeting specific gcn instructions as well?
> 
> Jan

Yes, can you push this for me?
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add tests for mad mix

2017-10-06 Thread Matt Arsenault
gfx9 added v_mad_mix_f32, v_mad_mixlo_f16,
and v_mad_mixhi_f16 instructions. Make sure the
conversion to/from f16 is folded into this
instruction and it works.

These aren't great since they need more test
values, and generating half results is kind
of a pain from any other tool. The perfect
values used don't really stress the conversions,
but this as at least enough to make sure the
encoding is correct.

v2: Fix backslash alignments
---
 tests/cl/program/execute/mad-mix.cl | 283 
 1 file changed, 283 insertions(+)
 create mode 100644 tests/cl/program/execute/mad-mix.cl

diff --git a/tests/cl/program/execute/mad-mix.cl 
b/tests/cl/program/execute/mad-mix.cl
new file mode 100644
index 0..a5955361d
--- /dev/null
+++ b/tests/cl/program/execute/mad-mix.cl
@@ -0,0 +1,283 @@
+/*!
+
+[config]
+name: f32 mad with conversion from f16
+clc_version_min: 10
+build_options: -cl-denorms-are-zero
+require_device_extensions: cl_khr_fp16
+
+dimensions: 1
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0   1.0   1.0  -1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f32_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0  -1.0   1.0   1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16hi
+kernel_name: mad_mix_f32_f16lo_f16lo_f16hi
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0  \
+  1.0  \
+  1.0  \
+ -1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   \
+  1.0   \
+  0.0   \
+ -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0\
+  1.0\
+  1.0\
+  1.0
+
+arg_in: 3 buffer half2[4] \
+  1000.0 0.0  \
+  1000.0 0.0  \
+  1000.0 1.0  \
+  1000.0 0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo neg(f16hi)
+kernel_name: mad_mix_f32_f16lo_f16lo_negf16hi
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0  \
+  1.0  \
+ -1.0  \
+ -1.0  \
+  0.0
+
+arg_in: 1 buffer half[5] \
+  0.0\
+  1.0\
+  0.0\
+ -1.0\
+  2.0
+
+arg_in: 2 buffer half[5] \
+  0.0\
+  1.0\
+  1.0\
+  1.0\
+  2.0
+
+arg_in: 3 buffer half2[5] \
+  1000.0 0.0  \
+  1000.0 0.0  \
+  1000.0 1.0  \
+  1000.0 0.0  \
+  1000.0 4.0
+
+
+[test]
+name: mad mix f16lo fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16lo_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half[4] \
+  0.0  -1.0   1.0   1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f16hi fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16hi_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half2[4] \
+  2.00.0   \
+  2.0   -1.0   \
+  2.01.0   \
+  2.01.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0   1.0   0.0   0.75   \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16lo f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16lo_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half[5] \
+  0.0   1.0   0.0   0.75  \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16hi f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16hi_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half2[5] \
+  2.0  0.0 \
+  2.0  1.0 \
+  2.0  0.0 \
+  2.0  0.75\
+  2.0  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer 

Re: [Piglit] [PATCH] cl: Add tests for mad mix

2017-10-05 Thread Matt Arsenault

> On Oct 5, 2017, at 12:33, Jan Vesely  wrote:
> 
> 
> ah, that makes more sense. Do you mind if I add it to the commit
> message? (I'll also fix the formatting nits) with that
> 
> Reviewed-by: Jan Vesely 
> 
> out of curiosity what's the use of having these in piglit? supposedly
> the instruction selection and encoding part is tested in llvm lit. Is
> this testing whether the instruction works correctly? shouldn't the hw
> design team have tests for that?
> 
> Jan

We can test an encoding in the lit tests, but we can’t actually check that it 
works. I don’t really trust the encoding tests until there’s something 
executing it. A lot of times in the past we’ve gotten the encodings wrong and 
the instruction doesn’t work, or the manual has had an off by one error in some 
of the encodings.

-Matt
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] cl: Add tests for mad mix

2017-10-05 Thread Matt Arsenault
These aren't great since they need more test
values, and generating half results is kind
of a pain from any other tool. The perfect
values used don't really stress the conversions,
but this as at least enough to make sure the
encoding is correct.

v2: Fix backslash alignments
---
 tests/cl/program/execute/mad-mix.cl | 283 
 1 file changed, 283 insertions(+)
 create mode 100644 tests/cl/program/execute/mad-mix.cl

diff --git a/tests/cl/program/execute/mad-mix.cl 
b/tests/cl/program/execute/mad-mix.cl
new file mode 100644
index 0..a5955361d
--- /dev/null
+++ b/tests/cl/program/execute/mad-mix.cl
@@ -0,0 +1,283 @@
+/*!
+
+[config]
+name: f32 mad with conversion from f16
+clc_version_min: 10
+build_options: -cl-denorms-are-zero
+require_device_extensions: cl_khr_fp16
+
+dimensions: 1
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0   1.0   1.0  -1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f32_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0  -1.0   1.0   1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16hi
+kernel_name: mad_mix_f32_f16lo_f16lo_f16hi
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0  \
+  1.0  \
+  1.0  \
+ -1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   \
+  1.0   \
+  0.0   \
+ -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0\
+  1.0\
+  1.0\
+  1.0
+
+arg_in: 3 buffer half2[4] \
+  1000.0 0.0  \
+  1000.0 0.0  \
+  1000.0 1.0  \
+  1000.0 0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo neg(f16hi)
+kernel_name: mad_mix_f32_f16lo_f16lo_negf16hi
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0  \
+  1.0  \
+ -1.0  \
+ -1.0  \
+  0.0
+
+arg_in: 1 buffer half[5] \
+  0.0\
+  1.0\
+  0.0\
+ -1.0\
+  2.0
+
+arg_in: 2 buffer half[5] \
+  0.0\
+  1.0\
+  1.0\
+  1.0\
+  2.0
+
+arg_in: 3 buffer half2[5] \
+  1000.0 0.0  \
+  1000.0 0.0  \
+  1000.0 1.0  \
+  1000.0 0.0  \
+  1000.0 4.0
+
+
+[test]
+name: mad mix f16lo fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16lo_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half[4] \
+  0.0  -1.0   1.0   1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f16hi fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16hi_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half2[4] \
+  2.00.0   \
+  2.0   -1.0   \
+  2.01.0   \
+  2.01.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0   1.0   0.0   0.75   \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16lo f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16lo_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half[5] \
+  0.0   1.0   0.0   0.75  \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16hi f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16hi_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half2[5] \
+  2.0  0.0 \
+  2.0  1.0 \
+  2.0  0.0 \
+  2.0  0.75\
+  2.0  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+!*/
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+
+kernel void mad_mix_f32_f16lo_f16lo_f16lo(global float*

Re: [Piglit] [PATCH] cl: Add tests for mad mix

2017-10-05 Thread Matt Arsenault

> On Oct 5, 2017, at 08:09, Jan Vesely  wrote:
> 
> On Mon, 2017-10-02 at 10:32 -0700, Matt Arsenault wrote:
>> ping
>> 
>>> On Sep 19, 2017, at 19:25, Matt Arsenault  wrote:
>>> 
>>> These aren't great since they need more test
>>> values, and generating half results is kind
>>> of a pain from any other tool. The perfect
>>> values used don't really stress the conversions,
>>> but this as at least enough to make sure the
>>> encoding is correct.
> 
> what is this test supposed to test? generic fp16 encoding
> /functionality? if so, why is it better than vstore_half/vload_half
> tests? 
> does it test fp16 version of clamp? why is it better than generated
> clamp tests?
> 
> Jan
> 

It’s specifically testing v_mad_mix_f32, v_mad_mixlo_f16, v_mad_mixhi_f16 
instructions added in gfx9

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add tests for store hi16 instructions

2017-10-02 Thread Matt Arsenault
ping

> On Sep 19, 2017, at 19:51, Matt Arsenault  wrote:
> 
> ---
> tests/cl/program/execute/store-hi16-generic.cl |  51 ++
> tests/cl/program/execute/store-hi16.cl | 134 +
> 2 files changed, 185 insertions(+)
> create mode 100644 tests/cl/program/execute/store-hi16-generic.cl
> create mode 100644 tests/cl/program/execute/store-hi16.cl
> 
> diff --git a/tests/cl/program/execute/store-hi16-generic.cl 
> b/tests/cl/program/execute/store-hi16-generic.cl
> new file mode 100644
> index 0..807818971
> --- /dev/null
> +++ b/tests/cl/program/execute/store-hi16-generic.cl
> @@ -0,0 +1,51 @@
> +/*!
> +
> +[config]
> +name: store high 16-bits of 32-bit value with generic addressing
> +clc_version_min: 20
> +dimensions: 1
> +
> +[test]
> +name: store hi16 generic
> +kernel_name: store_hi16_generic
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer ushort[4] \
> +0xabcd0x   0x 0x
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x0x0x
> +
> +
> +[test]
> +name: store hi16 generic trunc i8
> +kernel_name: truncstorei8_hi16_generic
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer uchar[4] \
> +  0xcd 0x22 0xad 0x80
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x112233440xdeadbeef  0x70809024
> +
> +!*/
> +
> +kernel void store_hi16_generic(volatile global ushort* out, volatile global 
> uint* in)
> +{
> +int gid = get_global_id(0);
> +uint value = in[gid];
> +
> +volatile generic ushort* generic_out = (volatile generic ushort*)out;
> +generic_out[gid] = value >> 16;
> +}
> +
> +kernel void truncstorei8_hi16_generic(volatile global uchar* out, volatile 
> global uint* in)
> +{
> +int gid = get_global_id(0);
> +uint value = in[gid];
> +
> +volatile generic uchar* generic_out = (volatile generic ushort*)out;
> +generic_out[gid] = (uchar)(value >> 16);
> +}
> diff --git a/tests/cl/program/execute/store-hi16.cl 
> b/tests/cl/program/execute/store-hi16.cl
> new file mode 100644
> index 0..b734b3766
> --- /dev/null
> +++ b/tests/cl/program/execute/store-hi16.cl
> @@ -0,0 +1,134 @@
> +/*!
> +
> +[config]
> +name: store high 16-bits of 32-bit value
> +clc_version_min: 10
> +
> +dimensions: 1
> +
> +[test]
> +name: store hi16 global
> +kernel_name: store_hi16_global
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer ushort[4] \
> +0xabcd0x   0x 0x
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x0x0x
> +
> +[test]
> +name: store hi16 local
> +kernel_name: store_hi16_local
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer ushort[4] \
> +  0xabcd0x   0x 0x
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x0x0x
> +
> +[test]
> +name: store hi16 private
> +kernel_name: store_hi16_private
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer ushort[4] \
> +  0xabcd0x   0x 0x
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x0x0x
> +
> +
> +[test]
> +name: store hi16 global trunc i8
> +kernel_name: truncstorei8_hi16_global
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer uchar[4] \
> +  0xcd 0x22 0xad 0x80
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x112233440xdeadbeef  0x70809024
> +
> +
> +[test]
> +name: store hi16 local trunc i8
> +kernel_name: truncstorei8_hi16_local
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer uchar[4] \
> +  0xcd 0x22 0xad 0x80
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x112233440xdeadbeef  0x70809024
> +
> +
> +[test]
> +name: store hi16 private trunc i8
> +kernel_name: truncstorei8_hi16_private
> +global_size: 4 0 0
> +local_size: 4 0 0
> +
> +arg_out: 0 buffer uchar[4] \
> +  0xcd 0x22 0xad 0x80
> +
> +arg_in: 1 buffer uint[4] \
> +   0xabcd12340x112233440xdeadbeef  0x70809024
> +
> +!*/
> +
> +kernel void store_hi16_global(volatile global ushort* out, volatile global 
> uint* in)
> +{
> +int gid = get_global_id(0);
> +uint value = in[gid];
> +out[gid] = value >> 16;
> +}
> +
> +kernel void store_hi16_local(volatile global ushort* out, volatile global 
> uint* in)
> +{
> +volatile local short lds[64];
> +int lid = get_local_id(0);
> +int gid = get_global_id(0);
> +

Re: [Piglit] [PATCH] cl: Add tests for mad mix

2017-10-02 Thread Matt Arsenault
ping

> On Sep 19, 2017, at 19:25, Matt Arsenault  wrote:
> 
> These aren't great since they need more test
> values, and generating half results is kind
> of a pain from any other tool. The perfect
> values used don't really stress the conversions,
> but this as at least enough to make sure the
> encoding is correct.
> ---
> tests/cl/program/execute/mad-mix.cl | 283 
> 1 file changed, 283 insertions(+)
> create mode 100644 tests/cl/program/execute/mad-mix.cl
> 
> diff --git a/tests/cl/program/execute/mad-mix.cl 
> b/tests/cl/program/execute/mad-mix.cl
> new file mode 100644
> index 0..dd7a5a516
> --- /dev/null
> +++ b/tests/cl/program/execute/mad-mix.cl
> @@ -0,0 +1,283 @@
> +/*!
> +
> +[config]
> +name: f32 mad with conversion from f16
> +clc_version_min: 10
> +build_options: -cl-denorms-are-zero
> +require_device_extensions: cl_khr_fp16
> +
> +dimensions: 1
> +
> +[test]
> +name: mad mix f32 f16lo f16lo f16lo
> +kernel_name: mad_mix_f32_f16lo_f16lo_f16lo
> +global_size: 4 0 0
> +
> +arg_out: 0 buffer float[4] \
> +  0.0   1.0   1.0   -1.0   \
> +
> +arg_in: 1 buffer half[4] \
> +  0.0   1.0   0.0  -1.0
> +
> +arg_in: 2 buffer half[4] \
> +  0.0   1.0   1.0   1.0\
> +
> +arg_in: 3 buffer half[4] \
> +  0.0   0.0   1.0   0.0
> +
> +
> +[test]
> +name: mad mix f32 fneg(f16lo) f16lo f16lo
> +kernel_name: mad_mix_f32_negf16lo_f16lo_f16lo
> +global_size: 4 0 0
> +
> +arg_out: 0 buffer float[4] \
> +  0.0   -1.0   1.0   1.0   \
> +
> +arg_in: 1 buffer half[4] \
> +  0.0   1.0   0.0  -1.0
> +
> +arg_in: 2 buffer half[4] \
> +  0.0   1.0   1.0   1.0\
> +
> +arg_in: 3 buffer half[4] \
> +  0.0   0.0   1.0   0.0
> +
> +
> +[test]
> +name: mad mix f32 f16lo f16lo f16hi
> +kernel_name: mad_mix_f32_f16lo_f16lo_f16hi
> +global_size: 4 0 0
> +
> +arg_out: 0 buffer float[4] \
> +  0.0   \
> +  1.0   \
> +  1.0   \
> + -1.0
> +
> +arg_in: 1 buffer half[4] \
> +  0.0   \
> +  1.0   \
> +  0.0   \
> + -1.0
> +
> +arg_in: 2 buffer half[4] \
> +  0.0   \
> +  1.0   \
> +  1.0   \
> +  1.0
> +
> +arg_in: 3 buffer half2[4] \
> +  1000.0 0.0   \
> +  1000.0 0.0   \
> +  1000.0 1.0   \
> +  1000.0 0.0
> +
> +
> +[test]
> +name: mad mix f32 f16lo f16lo neg(f16hi)
> +kernel_name: mad_mix_f32_f16lo_f16lo_negf16hi
> +global_size: 5 0 0
> +
> +arg_out: 0 buffer float[5] \
> +  0.0   \
> +  1.0   \
> + -1.0   \
> + -1.0   \
> +  0.0
> +
> +arg_in: 1 buffer half[5] \
> +  0.0   \
> +  1.0   \
> +  0.0   \
> + -1.0   \
> +  2.0
> +
> +arg_in: 2 buffer half[5] \
> +  0.0   \
> +  1.0   \
> +  1.0   \
> +  1.0   \
> +  2.0
> +
> +arg_in: 3 buffer half2[5] \
> +  1000.0 0.0   \
> +  1000.0 0.0   \
> +  1000.0 1.0   \
> +  1000.0 0.0   \
> +  1000.0 4.0
> +
> +
> +[test]
> +name: mad mix f16lo fneg(f16lo) f16lo f16lo
> +kernel_name: mad_mix_f16lo_negf16lo_f16lo_f16lo
> +global_size: 4 0 0
> +
> +arg_out: 0 buffer half[4] \
> +  0.0   -1.0   1.0   1.0  \
> +
> +arg_in: 1 buffer half[4] \
> +  0.0   1.0   0.0  -1.0
> +
> +arg_in: 2 buffer half[4] \
> +  0.0   1.0   1.0   1.0\
> +
> +arg_in: 3 buffer half[4] \
> +  0.0   0.0   1.0   0.0
> +
> +
> +[test]
> +name: mad mix f16hi fneg(f16lo) f16lo f16lo
> +kernel_name: mad_mix_f16hi_negf16lo_f16lo_f16lo
> +global_size: 4 0 0
> +
> +arg_out: 0 buffer half2[4] \
> +  2.00.0   \
> +  2.0   -1.0   \
> +  2.01.0   \
> +  2.01.0
> +
> +arg_in: 1 buffer half[4] \
> +  0.0   1.0   0.0  -1.0
> +
> +arg_in: 2 buffer half[4] \
> +  0.0   1.0   1.0   1.0\
> +
> +arg_in: 3 buffer half[4] \
> +  0.0   0.0   1.0   0.0
> +
> +
> +
> +[test]
> +name: mad mix f32 f16lo f16lo f16lo with clamp
> +kernel_name: mad_mix_f32_f16lo_f16lo_f16lo_clamp
> +global_size: 5 0 0
> +
> +arg_out: 0 buffer float[5] \
> +  0.0   1.0   0.0   0.75   \
> +  1.0
> +
> +arg_in: 1 buffer half[5] \
> +  0.0   2.0  -2.0   0.5  \
> +  0.5
> +
> +arg_in: 2 buffer half[5] \
> +  0.0   1.0   1.0   0.5  \
> +  1.0
> +
> +arg_in: 3 buffer half[5] \
> +  0.0   1.0   1.0   0.5  \
> +  0.5
> +
> +
> +[test]
> +name: mad mix f16lo f16lo f16lo f16lo with clamp
> +kernel_name: mad_mix_f16lo_f16lo_f16lo_f16lo_clamp
> +global_size: 5 0 0
> +
> +arg_out: 0 buffer half[5] \
> +  0.0   1.0   0.0   0.75   \
> +  1.0
> +
> +arg_in: 1 buffer half[5] \
> +  0.0   2.0  -2.0   0.5  \
> +  0.5
> +
> +arg_in: 2 buffer half[5] \
> + 

[Piglit] [PATCH] cl: Add tests for load hi16 instructions

2017-09-19 Thread Matt Arsenault
v2: Fix some formatting
---
 tests/cl/program/execute/load-hi16-generic.cl |  96 +
 tests/cl/program/execute/load-hi16.cl | 280 ++
 2 files changed, 376 insertions(+)
 create mode 100644 tests/cl/program/execute/load-hi16-generic.cl
 create mode 100644 tests/cl/program/execute/load-hi16.cl

diff --git a/tests/cl/program/execute/load-hi16-generic.cl 
b/tests/cl/program/execute/load-hi16-generic.cl
new file mode 100644
index 0..ac1cb7a4d
--- /dev/null
+++ b/tests/cl/program/execute/load-hi16-generic.cl
@@ -0,0 +1,96 @@
+/*!
+
+[config]
+name: load into high 16-bits of 32-bit register with generic addressing
+clc_version_min: 20
+dimensions: 1
+
+[test]
+name: load hi16 generic
+kernel_name: load_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+arg_in: 1 buffer ushort[4] \
+   0x   0x   0x  0xbeef
+
+arg_in: 2 buffer ushort[4] \
+   0xabcd   0x1234   0x  0xdead
+
+
+[test]
+name: zextloadi8 hi16 generic
+kernel_name: zextloadi8_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+arg_in: 1 buffer uchar[4] \
+   0x99   0x33   0x44  0xbe
+
+arg_in: 2 buffer uchar[4] \
+   0xab   0x12   0x11  0xde
+
+
+[test]
+name: sextloadi8 hi16 generic
+kernel_name: sextloadi8_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0xffabff99  0x00120033  0x00110044 0xffdeffbe
+
+arg_in: 1 buffer char[4] \
+   0x99   0x33   0x44  0xbe
+
+arg_in: 2 buffer char[4] \
+   0xab   0x12   0x11  0xde
+
+!*/
+
+kernel void load_hi16_generic(volatile global uint* out,
+  volatile global ushort* in0,
+  volatile global ushort* in1)
+{
+volatile generic ushort* generic_in0 = (volatile generic ushort*)in0;
+volatile generic ushort* generic_in1 = (volatile generic ushort*)in1;
+int gid = get_global_id(0);
+ushort lo = generic_in0[gid];
+ushort hi = generic_in1[gid];
+ushort2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
+
+kernel void zextloadi8_hi16_generic(volatile global uint* out,
+volatile global uchar* in0,
+volatile global uchar* in1)
+{
+volatile generic uchar* generic_in0 = (volatile generic uchar*)in0;
+volatile generic uchar* generic_in1 = (volatile generic uchar*)in1;
+
+int gid = get_global_id(0);
+ushort lo = (ushort)generic_in0[gid];
+ushort hi = (ushort)generic_in1[gid];
+ushort2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
+
+kernel void sextloadi8_hi16_generic(volatile global uint* out,
+volatile global char* in0,
+volatile global char* in1)
+{
+volatile generic char* generic_in0 = (volatile generic char*)in0;
+volatile generic char* generic_in1 = (volatile generic char*)in1;
+
+int gid = get_global_id(0);
+short lo = (short)generic_in0[gid];
+short hi = (short)generic_in1[gid];
+short2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
diff --git a/tests/cl/program/execute/load-hi16.cl 
b/tests/cl/program/execute/load-hi16.cl
new file mode 100644
index 0..f57e0e886
--- /dev/null
+++ b/tests/cl/program/execute/load-hi16.cl
@@ -0,0 +1,280 @@
+/*!
+
+[config]
+  name: load into high 16-bits of 32-bit register
+  clc_version_min: 10
+  dimensions: 1
+
+[test]
+  name: load hi16 global
+  kernel_name: load_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+[test]
+  name: load hi16 local
+  kernel_name: load_hi16_local
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+[test]
+  name: load hi16 private
+  kernel_name: load_hi16_private
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+[test]
+  name: zextloadi8 hi16 global
+  kernel_name: zextloadi8_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+  arg_in: 1 buffer uchar[4] \
+  0x99   0x33   0x44  0xbe
+
+  arg_in: 2 buffer uchar[4] \
+  0xab   0x12   0x11  0xde
+
+
+[test]
+  name: sextloadi8 hi16 global
+  kernel_name: sextloadi8_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 

[Piglit] [PATCH] cl: Add tests for load hi16 instructions

2017-09-19 Thread Matt Arsenault
---
 tests/cl/program/execute/load-hi16-generic.cl |  96 +
 tests/cl/program/execute/load-hi16.cl | 283 ++
 2 files changed, 379 insertions(+)
 create mode 100644 tests/cl/program/execute/load-hi16-generic.cl
 create mode 100644 tests/cl/program/execute/load-hi16.cl

diff --git a/tests/cl/program/execute/load-hi16-generic.cl 
b/tests/cl/program/execute/load-hi16-generic.cl
new file mode 100644
index 0..1fa094aa8
--- /dev/null
+++ b/tests/cl/program/execute/load-hi16-generic.cl
@@ -0,0 +1,96 @@
+/*!
+
+[config]
+name: load into high 16-bits of 32-bit register with generic addressing
+clc_version_min: 20
+dimensions: 1
+
+[test]
+name: load hi16 generic
+kernel_name: load_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+arg_in: 1 buffer ushort[4] \
+   0x   0x   0x  0xbeef
+
+arg_in: 2 buffer ushort[4] \
+   0xabcd   0x1234   0x  0xdead
+
+
+   [test]
+name: zextloadi8 hi16 generic
+kernel_name: zextloadi8_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+arg_in: 1 buffer uchar[4] \
+   0x99   0x33   0x44  0xbe
+
+arg_in: 2 buffer uchar[4] \
+   0xab   0x12   0x11  0xde
+
+
+[test]
+name: sextloadi8 hi16 generic
+kernel_name: sextloadi8_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uint[4] \
+  0xffabff99  0x00120033  0x00110044 0xffdeffbe
+
+arg_in: 1 buffer char[4] \
+   0x99   0x33   0x44  0xbe
+
+arg_in: 2 buffer char[4] \
+   0xab   0x12   0x11  0xde
+
+!*/
+
+kernel void load_hi16_generic(volatile global uint* out,
+  volatile global ushort* in0,
+  volatile global ushort* in1)
+{
+volatile generic ushort* generic_in0 = (volatile generic ushort*)in0;
+volatile generic ushort* generic_in1 = (volatile generic ushort*)in1;
+int gid = get_global_id(0);
+ushort lo = generic_in0[gid];
+ushort hi = generic_in1[gid];
+ushort2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
+
+kernel void zextloadi8_hi16_generic(volatile global uint* out,
+volatile global uchar* in0,
+volatile global uchar* in1)
+{
+volatile generic uchar* generic_in0 = (volatile generic uchar*)in0;
+volatile generic uchar* generic_in1 = (volatile generic uchar*)in1;
+
+int gid = get_global_id(0);
+ushort lo = (ushort)generic_in0[gid];
+ushort hi = (ushort)generic_in1[gid];
+ushort2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
+
+kernel void sextloadi8_hi16_generic(volatile global uint* out,
+volatile global char* in0,
+volatile global char* in1)
+{
+volatile generic char* generic_in0 = (volatile generic char*)in0;
+volatile generic char* generic_in1 = (volatile generic char*)in1;
+
+int gid = get_global_id(0);
+short lo = (short)generic_in0[gid];
+short hi = (short)generic_in1[gid];
+short2 vec = { lo, hi };
+out[gid] = as_uint(vec);
+}
diff --git a/tests/cl/program/execute/load-hi16.cl 
b/tests/cl/program/execute/load-hi16.cl
new file mode 100644
index 0..aaab65f63
--- /dev/null
+++ b/tests/cl/program/execute/load-hi16.cl
@@ -0,0 +1,283 @@
+/*!
+
+  [config]
+  name: load into high 16-bits of 32-bit register
+  clc_version_min: 10
+
+  dimensions: 1
+
+  [test]
+  name: load hi16 global
+  kernel_name: load_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+  [test]
+  name: load hi16 local
+  kernel_name: load_hi16_local
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+  [test]
+  name: load hi16 private
+  kernel_name: load_hi16_private
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0xabcd  0x1234  0x 0xdeadbeef
+
+  arg_in: 1 buffer ushort[4] \
+  0x   0x   0x  0xbeef
+
+  arg_in: 2 buffer ushort[4] \
+  0xabcd   0x1234   0x  0xdead
+
+
+  [test]
+  name: zextloadi8 hi16 global
+  kernel_name: zextloadi8_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  arg_out: 0 buffer uint[4] \
+  0x00ab0099  0x00120033  0x00110044 0x00de00be
+
+  arg_in: 1 buffer uchar[4] \
+  0x99   0x33   0x44  0xbe
+
+  arg_in: 2 buffer uchar[4] \
+  0xab   0x12   0x11  0xde
+
+
+  [test]
+  name: sextloadi8 hi16 global
+  kernel_name: sextloadi8_hi16_global
+  global_size: 4 0 0
+  local_size: 4 0 0
+
+  

[Piglit] [PATCH] cl: Add tests for store hi16 instructions

2017-09-19 Thread Matt Arsenault
---
 tests/cl/program/execute/store-hi16-generic.cl |  51 ++
 tests/cl/program/execute/store-hi16.cl | 134 +
 2 files changed, 185 insertions(+)
 create mode 100644 tests/cl/program/execute/store-hi16-generic.cl
 create mode 100644 tests/cl/program/execute/store-hi16.cl

diff --git a/tests/cl/program/execute/store-hi16-generic.cl 
b/tests/cl/program/execute/store-hi16-generic.cl
new file mode 100644
index 0..807818971
--- /dev/null
+++ b/tests/cl/program/execute/store-hi16-generic.cl
@@ -0,0 +1,51 @@
+/*!
+
+[config]
+name: store high 16-bits of 32-bit value with generic addressing
+clc_version_min: 20
+dimensions: 1
+
+[test]
+name: store hi16 generic
+kernel_name: store_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer ushort[4] \
+0xabcd0x   0x 0x
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x0x0x
+
+
+[test]
+name: store hi16 generic trunc i8
+kernel_name: truncstorei8_hi16_generic
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uchar[4] \
+  0xcd 0x22 0xad 0x80
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x112233440xdeadbeef  0x70809024
+
+!*/
+
+kernel void store_hi16_generic(volatile global ushort* out, volatile global 
uint* in)
+{
+int gid = get_global_id(0);
+uint value = in[gid];
+
+volatile generic ushort* generic_out = (volatile generic ushort*)out;
+generic_out[gid] = value >> 16;
+}
+
+kernel void truncstorei8_hi16_generic(volatile global uchar* out, volatile 
global uint* in)
+{
+int gid = get_global_id(0);
+uint value = in[gid];
+
+volatile generic uchar* generic_out = (volatile generic ushort*)out;
+generic_out[gid] = (uchar)(value >> 16);
+}
diff --git a/tests/cl/program/execute/store-hi16.cl 
b/tests/cl/program/execute/store-hi16.cl
new file mode 100644
index 0..b734b3766
--- /dev/null
+++ b/tests/cl/program/execute/store-hi16.cl
@@ -0,0 +1,134 @@
+/*!
+
+[config]
+name: store high 16-bits of 32-bit value
+clc_version_min: 10
+
+dimensions: 1
+
+[test]
+name: store hi16 global
+kernel_name: store_hi16_global
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer ushort[4] \
+0xabcd0x   0x 0x
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x0x0x
+
+[test]
+name: store hi16 local
+kernel_name: store_hi16_local
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer ushort[4] \
+  0xabcd0x   0x 0x
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x0x0x
+
+[test]
+name: store hi16 private
+kernel_name: store_hi16_private
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer ushort[4] \
+  0xabcd0x   0x 0x
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x0x0x
+
+
+[test]
+name: store hi16 global trunc i8
+kernel_name: truncstorei8_hi16_global
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uchar[4] \
+  0xcd 0x22 0xad 0x80
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x112233440xdeadbeef  0x70809024
+
+
+[test]
+name: store hi16 local trunc i8
+kernel_name: truncstorei8_hi16_local
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uchar[4] \
+  0xcd 0x22 0xad 0x80
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x112233440xdeadbeef  0x70809024
+
+
+[test]
+name: store hi16 private trunc i8
+kernel_name: truncstorei8_hi16_private
+global_size: 4 0 0
+local_size: 4 0 0
+
+arg_out: 0 buffer uchar[4] \
+  0xcd 0x22 0xad 0x80
+
+arg_in: 1 buffer uint[4] \
+   0xabcd12340x112233440xdeadbeef  0x70809024
+
+!*/
+
+kernel void store_hi16_global(volatile global ushort* out, volatile global 
uint* in)
+{
+int gid = get_global_id(0);
+uint value = in[gid];
+out[gid] = value >> 16;
+}
+
+kernel void store_hi16_local(volatile global ushort* out, volatile global 
uint* in)
+{
+volatile local short lds[64];
+int lid = get_local_id(0);
+int gid = get_global_id(0);
+
+uint value = in[gid];
+lds[lid] = value >> 16;
+out[gid] = lds[lid];
+}
+
+kernel void store_hi16_private(volatile global ushort* out, volatile global 
uint* in)
+{
+int gid = get_global_id(0);
+volatile private short stack = in[gid] >> 16;
+out[gid] = stack;
+}
+
+kernel void truncstorei8_hi16_global(volatile global uchar* out, volatile 
global uint* in)
+{
+int gid = get_global_id(0);
+uint value = in[gid];
+out[gid] = (uchar)(value >> 16);
+}
+
+kernel void truncstorei8_hi16_local(volatile global uchar* out, volatile 
global uint* in)
+{
+volatile local short lds[64];
+int lid = get_local_id(0);
+int gid = get_global_id(0);
+
+uint value = in[gid];
+lds[lid] = value >> 16;
+out[gid] = (uchar)lds[lid];
+}
+
+kernel void truncstorei8_hi16_private(volatile global uchar* out, volatile 
global uint* in)
+{
+int gid = get_global_id(0);
+volatile private short stack = in[gid]

[Piglit] [PATCH] cl: Add tests for mad mix

2017-09-19 Thread Matt Arsenault
These aren't great since they need more test
values, and generating half results is kind
of a pain from any other tool. The perfect
values used don't really stress the conversions,
but this as at least enough to make sure the
encoding is correct.
---
 tests/cl/program/execute/mad-mix.cl | 283 
 1 file changed, 283 insertions(+)
 create mode 100644 tests/cl/program/execute/mad-mix.cl

diff --git a/tests/cl/program/execute/mad-mix.cl 
b/tests/cl/program/execute/mad-mix.cl
new file mode 100644
index 0..dd7a5a516
--- /dev/null
+++ b/tests/cl/program/execute/mad-mix.cl
@@ -0,0 +1,283 @@
+/*!
+
+[config]
+name: f32 mad with conversion from f16
+clc_version_min: 10
+build_options: -cl-denorms-are-zero
+require_device_extensions: cl_khr_fp16
+
+dimensions: 1
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0   1.0   1.0   -1.0   \
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0\
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f32_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0   -1.0   1.0   1.0   \
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0\
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16hi
+kernel_name: mad_mix_f32_f16lo_f16lo_f16hi
+global_size: 4 0 0
+
+arg_out: 0 buffer float[4] \
+  0.0   \
+  1.0   \
+  1.0   \
+ -1.0
+
+arg_in: 1 buffer half[4] \
+  0.0   \
+  1.0   \
+  0.0   \
+ -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   \
+  1.0   \
+  1.0   \
+  1.0
+
+arg_in: 3 buffer half2[4] \
+  1000.0 0.0   \
+  1000.0 0.0   \
+  1000.0 1.0   \
+  1000.0 0.0
+
+
+[test]
+name: mad mix f32 f16lo f16lo neg(f16hi)
+kernel_name: mad_mix_f32_f16lo_f16lo_negf16hi
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0   \
+  1.0   \
+ -1.0   \
+ -1.0   \
+  0.0
+
+arg_in: 1 buffer half[5] \
+  0.0   \
+  1.0   \
+  0.0   \
+ -1.0   \
+  2.0
+
+arg_in: 2 buffer half[5] \
+  0.0   \
+  1.0   \
+  1.0   \
+  1.0   \
+  2.0
+
+arg_in: 3 buffer half2[5] \
+  1000.0 0.0   \
+  1000.0 0.0   \
+  1000.0 1.0   \
+  1000.0 0.0   \
+  1000.0 4.0
+
+
+[test]
+name: mad mix f16lo fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16lo_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half[4] \
+  0.0   -1.0   1.0   1.0  \
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0\
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+[test]
+name: mad mix f16hi fneg(f16lo) f16lo f16lo
+kernel_name: mad_mix_f16hi_negf16lo_f16lo_f16lo
+global_size: 4 0 0
+
+arg_out: 0 buffer half2[4] \
+  2.00.0   \
+  2.0   -1.0   \
+  2.01.0   \
+  2.01.0
+
+arg_in: 1 buffer half[4] \
+  0.0   1.0   0.0  -1.0
+
+arg_in: 2 buffer half[4] \
+  0.0   1.0   1.0   1.0\
+
+arg_in: 3 buffer half[4] \
+  0.0   0.0   1.0   0.0
+
+
+
+[test]
+name: mad mix f32 f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f32_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer float[5] \
+  0.0   1.0   0.0   0.75   \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16lo f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16lo_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half[5] \
+  0.0   1.0   0.0   0.75   \
+  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+[test]
+name: mad mix f16hi f16lo f16lo f16lo with clamp
+kernel_name: mad_mix_f16hi_f16lo_f16lo_f16lo_clamp
+global_size: 5 0 0
+
+arg_out: 0 buffer half2[5] \
+  2.0  0.0   \
+  2.0  1.0   \
+  2.0  0.0   \
+  2.0  0.75  \
+  2.0  1.0
+
+arg_in: 1 buffer half[5] \
+  0.0   2.0  -2.0   0.5  \
+  0.5
+
+arg_in: 2 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  1.0
+
+arg_in: 3 buffer half[5] \
+  0.0   1.0   1.0   0.5  \
+  0.5
+
+
+!*/
+
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+
+kernel void mad_mix_f32_f16lo_f16lo_f16lo(global float* out, global half* in0, 
global half* in1, global half* in2)
+{
+int id = get_global_id(0);
+out[id] = (float)in0[id] * (float)in1[id] + (float)in2[id];
+}
+
+kernel void mad_mix_f32_negf16lo_f16lo_f16lo(global float* out, global half* 
in0, global half* in1, global half* in2)
+{
+int id = get_global_id(0);
+out[id] = (float)-in0[id] * (float)in1[id] + (float)in2[id];
+}
+
+kernel void mad_mix_f32_f16lo_f16lo_f16hi(global float* out, gl

[Piglit] [PATCH] cl: Add tests for function calls

2017-09-18 Thread Matt Arsenault
Passes on ROCm, I haven't tried clover recently. Last
time I did it errored because the AsmParser wasn't properly
initialized.

v2: Fix non-unique test names, Wrap noinline in unguarded macro,
use prettier test names, use device_regex (effectively restricting to ROCm)
---
 tests/cl/program/execute/call-clobbers-amdgcn.cl |  68 +++
 tests/cl/program/execute/calls-struct.cl | 179 +++
 tests/cl/program/execute/calls-workitem-id.cl|  77 +++
 tests/cl/program/execute/calls.cl| 607 +++
 tests/cl/program/execute/tail-calls.cl   | 305 
 5 files changed, 1236 insertions(+)
 create mode 100644 tests/cl/program/execute/call-clobbers-amdgcn.cl
 create mode 100644 tests/cl/program/execute/calls-struct.cl
 create mode 100644 tests/cl/program/execute/calls-workitem-id.cl
 create mode 100644 tests/cl/program/execute/calls.cl
 create mode 100644 tests/cl/program/execute/tail-calls.cl

diff --git a/tests/cl/program/execute/call-clobbers-amdgcn.cl 
b/tests/cl/program/execute/call-clobbers-amdgcn.cl
new file mode 100644
index 0..400771795
--- /dev/null
+++ b/tests/cl/program/execute/call-clobbers-amdgcn.cl
@@ -0,0 +1,68 @@
+/*!
+
+[config]
+name: amdgcn call clobbers
+clc_version_min: 10
+device_regex: gfx[\d]*
+
+[test]
+name: callee saved sgpr
+kernel_name: call_clobber_s40
+dimensions: 1
+global_size: 1 0 0
+arg_out: 0 buffer int[1] 0xabcd1234
+
+[test]
+name: callee saved vgpr
+kernel_name: call_clobber_v40
+dimensions: 1
+global_size: 1 0 0
+arg_out: 0 buffer int[1] 0xabcd1234
+
+!*/
+
+#ifndef __AMDGCN__
+#error This test is only for amdgcn
+#endif
+
+__attribute__((noinline))
+void clobber_s40()
+{
+__asm volatile("s_mov_b32 s40, 0xdead" : : : "s40");
+}
+
+kernel void call_clobber_s40(__global int* ret)
+{
+__asm volatile("s_mov_b32 s40, 0xabcd1234" : : : "s40");
+
+clobber_s40();
+
+int tmp;
+
+__asm volatile("v_mov_b32 %0, s40"
+  : "=v"(tmp)
+  :
+  : "s40");
+*ret = tmp;
+}
+
+__attribute__((noinline))
+void clobber_v40()
+{
+__asm volatile("v_mov_b32 v40, 0xdead" : : : "v40");
+}
+
+kernel void call_clobber_v40(__global int* ret)
+{
+__asm volatile("v_mov_b32 v40, 0xabcd1234" : : : "v40");
+
+clobber_v40();
+
+int tmp;
+__asm volatile("v_mov_b32 %0, v40"
+  : "=v"(tmp)
+  :
+  : "v40");
+*ret = tmp;
+}
+
diff --git a/tests/cl/program/execute/calls-struct.cl 
b/tests/cl/program/execute/calls-struct.cl
new file mode 100644
index 0..04f769dac
--- /dev/null
+++ b/tests/cl/program/execute/calls-struct.cl
@@ -0,0 +1,179 @@
+/*!
+
+[config]
+name: calls with structs
+clc_version_min: 10
+
+[test]
+name: byval struct
+kernel_name: call_i32_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 1021 1022 1023 1024 1025 1026 1027 1028 \
+ 1029 1030 1031 1032 1033 1034 1035 1036
+
+arg_out: 1 buffer int[16] \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: sret struct
+kernel_name: call_sret_Char_IntArray_func
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 921 922 923 924 925 926 927 928 \
+ 929 930 931 932 933 934 935 936
+
+arg_in: 1 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: byval struct and sret struct
+kernel_name: call_sret_Char_IntArray_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+  86 87 88 89   \
+  90 91 92 93   \
+  94 95 96 97   \
+  98 99 100 101
+
+arg_out: 1 buffer int[16]\
+  134  135  136  137  \
+  138  139  140  141  \
+  142  143  144  145  \
+  146  147  148  149
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+!*/
+
+#define NOINLINE __attribute__((noinline))
+
+typedef struct ByVal_Char_IntArray {
+char c;
+int i[4];
+} ByVal_Char_IntArray;
+
+NOINLINE
+int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
+{
+st.i[0] += 100;
+
+int sum = 0;
+for (int i = 0; i < 4; ++i)
+{
+sum += st.i[i];
+}
+
+sum += st.c;
+return sum;
+}
+
+kernel void call_i32_func_byval_Char_IntArray(global int* out0,
+  global int* out1,
+  global int* input)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int id = get_global_id(0);
+
+int val = input[id];
+st.i[0] = 14;
+st.i[1] = -8;
+st.i[2] = val;
+st.i[3] = 900;
+
+int result = i32_func_byval_Char_IntArray(st);
+out0[id] = result;
+out1[id] = st.i[0];
+}
+
+NOINLINE
+ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int val = input[id];
+st.i[0] = 14;
+st.i[1] = -8;
+

[Piglit] [PATCH] cl: Fix device_regex feature

2017-09-18 Thread Matt Arsenault
---
 tests/cl/program/program-tester.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/cl/program/program-tester.c 
b/tests/cl/program/program-tester.c
index 1a876101d..a4abed9ee 100644
--- a/tests/cl/program/program-tester.c
+++ b/tests/cl/program/program-tester.c
@@ -1703,7 +1703,7 @@ parse_config(const char* config_str,
} else if(regex_match(key, "^platform_regex$")) 
{
config->platform_regex = 
add_dynamic_str_copy(value);
} else if(regex_match(key, "^device_regex$")) {
-   config->platform_regex = 
add_dynamic_str_copy(value);
+   config->device_regex = 
add_dynamic_str_copy(value);
} else if(regex_match(key, 
"^require_platform_extensions$")) {
config->require_platform_extensions =
add_dynamic_str_copy(value);
-- 
2.11.0

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] Add tests for function calls

2017-09-18 Thread Matt Arsenault

> On Sep 16, 2017, at 20:15, Jan Vesely  wrote:
>> 
>> +#ifndef __AMDGCN__
>> +#error This test is only for amdgcn
>> +#endif
> 
> This needs "device_regexp" in config section to skip instead of fail on
> other platforms/devices.
> 

I’ve tried doing this, but there isn’t really a satisfactory way to test for 
this. While ROCm uses the consistent gfxNNN device names, clover and the older 
OpenCL platforms use the various device names which ends up just being an 
exhaustive list. The names are also slightly different for the same devices 
between those.

Additionally, this feature seems to not work and no other test is using it. If 
I use just device_regex, I get errors about the platform:
# Skipping platform AMD Accelerated Parallel Processing because it does not 
match platform_regex.

If I add the exact platform name, it works. If I additionally add a device_name 
regex, I get the same error about the platform name. Quickly looking at the 
code I don’t see any reason why these would be linked.


>> +// The inline asm is necessary to defeat interprocedural sparse
>> +// conditional constant propagation eliminating some of the trivial
>> +// calls.
>> +#ifdef __AMDGCN__
>> +#define USE_ASM 1
>> +#endif
> 
> I think it'd be better to use build options to disable the opt pass
> instead (or all optimizations, like optimization-options-cl1X.cl
> tests).
> 
> Jan
> 

-O0 is far heavier than this test should get. We really want testing with full 
optimizations, the code is pretty radically different at -O0 in all programs. 
Ideally we would have every test running at -O0 and other opt levels for best 
coverage. 

We could defeat this optimization in particular by keeping the function 
externally visible, but right now we are forced to internalize every program 
before codegen so this is always a problem for us. I don’t see any specific 
switch to disable this particular pass. Other methods might be using a volatile 
variable for the constant (which I don’t want because that introduces stack 
usage, which these tests specifically do not want). I could try to find other 
convoluted methods of defeating the optimization, but those will also end up 
contrary to the goal of having easily readable isa for these.

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] Add tests for function calls

2017-09-18 Thread Matt Arsenault

> On Sep 17, 2017, at 19:22, Jan Vesely  wrote:
> 
> On Sun, 2017-09-17 at 17:02 -0700, Matt Arsenault wrote:
>>> On Sep 16, 2017, at 20:15, Jan Vesely  wrote:
>>> 
>>> afaik, noinline is not defined in CLC, so it should be ifdefed on
>>> __clang__
>>> 
>> 
>> It’s not, but my reading of the standard is that it’s required to
>> parse any unknown attributes and ignore them. ifdef clang would be
>> too restrictive (e.g. it wouldn’t work with the old EDG frontend)
> 
> OK, if it helps. ignoring/warning is sensible compiler behaviour.
> 
> I haven't found anything about unknown attributes in the specs (CLC or
> GCC), so technically it should be UB.
> 
> Jan
> 
> -- 
> Jan Vesely 

6.11.5 in the 1.2 spec says:
Attributes are intended as useful hints to the compiler. It is our intention 
that a particular implementation of OpenCL be free to ignore all attributes and 
the resulting executable binary will produce the same result. This does not 
preclude an implementation from making use of the additional information 
provided by attributes and performing optimizations or other transformations as 
it sees fit. In this case it is the programmer’s responsibility to guarantee 
that the information provided is in some sense correct. 


My interpretation is it should be OK to ignore unknown attributes___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] Add tests for function calls

2017-09-17 Thread Matt Arsenault

> On Sep 16, 2017, at 20:15, Jan Vesely  wrote:
> 
> afaik, noinline is not defined in CLC, so it should be ifdefed on
> __clang__
> 

It’s not, but my reading of the standard is that it’s required to parse any 
unknown attributes and ignore them. ifdef clang would be too restrictive (e.g. 
it wouldn’t work with the old EDG frontend)

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


[Piglit] [PATCH] Add tests for function calls

2017-09-13 Thread Matt Arsenault
---
 tests/cl/program/execute/call-clobbers-amdgcn.cl |  68 +++
 tests/cl/program/execute/calls-struct.cl | 177 +++
 tests/cl/program/execute/calls-workitem-id.cl|  75 +++
 tests/cl/program/execute/calls.cl| 605 +++
 tests/cl/program/execute/tail-calls.cl   | 305 
 5 files changed, 1230 insertions(+)
 create mode 100644 tests/cl/program/execute/call-clobbers-amdgcn.cl
 create mode 100644 tests/cl/program/execute/calls-struct.cl
 create mode 100644 tests/cl/program/execute/calls-workitem-id.cl
 create mode 100644 tests/cl/program/execute/calls.cl
 create mode 100644 tests/cl/program/execute/tail-calls.cl

diff --git a/tests/cl/program/execute/call-clobbers-amdgcn.cl 
b/tests/cl/program/execute/call-clobbers-amdgcn.cl
new file mode 100644
index 0..66243ddbe
--- /dev/null
+++ b/tests/cl/program/execute/call-clobbers-amdgcn.cl
@@ -0,0 +1,68 @@
+/*!
+
+[config]
+name: calls
+clc_version_min: 10
+
+
+[test]
+name: callee saved sgpr
+kernel_name: call_clobber_s40
+dimensions: 1
+global_size: 1 0 0
+arg_out: 0 buffer int[1] 0xabcd1234
+
+[test]
+name: callee saved vgpr
+kernel_name: call_clobber_v40
+dimensions: 1
+global_size: 1 0 0
+arg_out: 0 buffer int[1] 0xabcd1234
+
+!*/
+
+#ifndef __AMDGCN__
+#error This test is only for amdgcn
+#endif
+
+__attribute__((noinline))
+void clobber_s40()
+{
+__asm volatile("s_mov_b32 s40, 0xdead" : : : "s40");
+}
+
+kernel void call_clobber_s40(__global int* ret)
+{
+__asm volatile("s_mov_b32 s40, 0xabcd1234" : : : "s40");
+
+clobber_s40();
+
+int tmp;
+
+__asm volatile("v_mov_b32 %0, s40"
+  : "=v"(tmp)
+  :
+  : "s40");
+*ret = tmp;
+}
+
+__attribute__((noinline))
+void clobber_v40()
+{
+__asm volatile("v_mov_b32 v40, 0xdead" : : : "v40");
+}
+
+kernel void call_clobber_v40(__global int* ret)
+{
+__asm volatile("v_mov_b32 v40, 0xabcd1234" : : : "v40");
+
+clobber_v40();
+
+int tmp;
+__asm volatile("v_mov_b32 %0, v40"
+  : "=v"(tmp)
+  :
+  : "v40");
+*ret = tmp;
+}
+
diff --git a/tests/cl/program/execute/calls-struct.cl 
b/tests/cl/program/execute/calls-struct.cl
new file mode 100644
index 0..2e8176c8e
--- /dev/null
+++ b/tests/cl/program/execute/calls-struct.cl
@@ -0,0 +1,177 @@
+/*!
+
+[config]
+name: calls
+clc_version_min: 10
+
+[test]
+name: byval struct
+kernel_name: call_i32_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 1021 1022 1023 1024 1025 1026 1027 1028 \
+ 1029 1030 1031 1032 1033 1034 1035 1036
+
+arg_out: 1 buffer int[16] \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+  14   14   14   14 \
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: sret struct
+kernel_name: call_sret_Char_IntArray_func
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+ 921 922 923 924 925 926 927 928 \
+ 929 930 931 932 933 934 935 936
+
+arg_in: 1 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+
+[test]
+name: byval struct and sret struct
+kernel_name: call_sret_Char_IntArray_func_byval_Char_IntArray
+dimensions: 1
+global_size: 16 0 0
+
+arg_out: 0 buffer int[16]\
+  86 87 88 89   \
+  90 91 92 93   \
+  94 95 96 97   \
+  98 99 100 101
+
+arg_out: 1 buffer int[16]\
+  134  135  136  137  \
+  138  139  140  141  \
+  142  143  144  145  \
+  146  147  148  149
+
+arg_in: 2 buffer int[16] \
+ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+
+!*/
+
+typedef struct ByVal_Char_IntArray {
+char c;
+int i[4];
+} ByVal_Char_IntArray;
+
+__attribute__((noinline))
+int i32_func_byval_Char_IntArray(ByVal_Char_IntArray st)
+{
+st.i[0] += 100;
+
+int sum = 0;
+for (int i = 0; i < 4; ++i)
+{
+sum += st.i[i];
+}
+
+sum += st.c;
+return sum;
+}
+
+kernel void call_i32_func_byval_Char_IntArray(global int* out0,
+  global int* out1,
+  global int* input)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int id = get_global_id(0);
+
+int val = input[id];
+st.i[0] = 14;
+st.i[1] = -8;
+st.i[2] = val;
+st.i[3] = 900;
+
+int result = i32_func_byval_Char_IntArray(st);
+out0[id] = result;
+out1[id] = st.i[0];
+}
+
+__attribute__((noinline))
+ByVal_Char_IntArray sret_Char_IntArray_func(global int* input, int id)
+{
+ByVal_Char_IntArray st;
+st.c = 15;
+
+int val = input[id];
+st.i[0] = 14;
+st.i[1] = -8;
+st.i[2] = val;
+st.i[3] = 900;
+
+return st;
+}
+
+kernel void call_sret_Char_IntArray_func(global int* output, global int* input)
+{
+int id = get_global_id(0);
+ByVal_Char_IntArray st = sret_Char_IntArray_func(input, id);
+
+int sum = 0;
+for (int i = 0; i < 4; ++i)
+{
+sum += st.i[i];
+}

Re: [Piglit] [PATCH] cl: Add sign_extend_inreg test

2017-02-08 Thread Matt Arsenault

> On Feb 2, 2017, at 01:24, arse...@gmail.com wrote:
> 
> From: Matt Arsenault 
> 
> v2: Rename test file
> ---
> .../cl/program/execute/amdgcn.sign_extend_inreg.cl | 387 +
> 1 file changed, 387 insertions(+)
> create mode 100644 tests/cl/program/execute/amdgcn.sign_extend_inreg.cl
> 
ping

___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add tests for stored fneg

2017-01-26 Thread Matt Arsenault

> On Jan 20, 2017, at 11:56, Jan Vesely  wrote:
> nly one config section per test file.
> 
>> +
>> +[test]
>> +name: fneg
>> +kernel_name: fneg_f32
>> +global_size: 15 0 0
>> +
>> +arg_out: 0 buffer float[15]   \
>> +  -0.0  0.0  -0.5  0.5  \
>> +  -1.0  1.0  -2.0  2.0  \
>> +  -4.0  4.0 -10.0  10.0 \
>> +  -inf  inf  nan
>> +
>> +arg_in: 1 buffer float[15]   \
>> +  0.0  -0.0   0.5  -0.5   \
>> +  1.0  -1.0   2.0  -2.0   \
>> +  4.0  -4.0  10.0  -10.0  \
>> +  inf  -inf  nan
> 
> can you split the values between -pos and -neg tests?
> sorry, I should have been more explicit the first time.

I don’t see the point of doing this and it will increase the execution cost of 
the test, plus I find more tests of the same kernel more annoying to update


> 
>> +
>> !*/
>> 
>> kernel void add(global float* out, float a, float b) {
>> @@ -341,3 +365,9 @@ kernel void plus(global float* out, float in) {
>> kernel void minus(global float* out, float in) {
>>  out[0] = -in;
>> }
>> +
>> +kernel void fneg_f32(global float* out, global float* in)
>> +{
>> +int id = get_global_id(0);
>> +out[id] = -in[id];
>> +}
> 
> this should replace the "minus" kernel.
> 

I was debating this, although the scalar input argument is a difference. The 
new version removes this

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 2/2] cl: Add tests for fdiv with neg/abs inputs

2017-01-25 Thread Matt Arsenault

> On Jan 20, 2017, at 12:34, Jan Vesely  wrote:
> 
> On Mon, 2017-01-16 at 11:02 -0800, arse...@gmail.com wrote:
>> From: Matt Arsenault 
> 
> Reviewed-by: Jan Vesely 
> 
> Jan


I need someone to push this for me
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 07/10] cl: Add test for negative index + small offset for private

2016-12-09 Thread Matt Arsenault

> On Dec 8, 2016, at 11:52, Jan Vesely  wrote:
>> 
> 
> can I drop the part?
> 
> Jan

Yes
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 07/10] cl: Add test for negative index + small offset for private

2016-12-08 Thread Matt Arsenault

> On Dec 7, 2016, at 11:03, Jan Vesely  wrote:
> 
> On Tue, 2016-12-06 at 11:05 -0800, Matt Arsenault wrote:
>>> On Dec 6, 2016, at 11:04, Jan Vesely  wrote:
>>> 
>>> On Tue, 2016-12-06 at 10:52 -0800, Matt Arsenault wrote:
>>>>> On Dec 5, 2016, at 12:42, Jan Vesely  wrote:
>>>>> 
>>>>> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
>>>>> <mailto:arse...@gmail.com> wrote:
>>>>>> From: Matt Arsenault 
>>>>>> 
>>>>>> ---
>>>>>> .../execute/negative-private-base-pointer.cl   | 120 
>>>>>> +
>>>>>> 1 file changed, 120 insertions(+)
>>>>>> create mode 100644 
>>>>>> tests/cl/program/execute/negative-private-base-pointer.cl
>>>>>> 
>>>>>> diff --git a/tests/cl/program/execute/negative-private-base-pointer.cl 
>>>>>> b/tests/cl/program/execute/negative-private-base-pointer.cl
>>>>>> new file mode 100644
>>>>>> index 000..7ee528b
>>>>>> --- /dev/null
>>>>>> +++ b/tests/cl/program/execute/negative-private-base-pointer.cl
>>>>>> @@ -0,0 +1,120 @@
>>>>>> +/*!
>>>>>> +[config]
>>>>>> +name: negative private buffer base index
>>>>>> +clc_version_min: 10
>>>>>> +dimensions: 1
>>>>>> +
>>>>>> +[test]
>>>>>> +kernel_name: read_write_private_base_plus_offset
>>>>>> +name: negative base private index
>>>>>> +global_size: 1 0 0
>>>>>> +
>>>>>> +arg_out: 0 buffer int[16]  \
>>>>>> +  0xab   \
>>>>>> +  0xbc   \
>>>>>> +  0xabcd \
>>>>>> +  0xdead \
>>>>>> + \
>>>>>> +  0xcafe \
>>>>>> +  0xf00d \
>>>>>> +  0xababfeed \
>>>>>> +  0xca00fe   \
>>>>>> + \
>>>>>> +  0xb00feed  \
>>>>>> +  0xca00fe   \
>>>>>> +  0xfeedbeef \
>>>>>> +  0xfe   \
>>>>>> + \
>>>>>> +  0xbe00fe   \
>>>>>> +  0xabcdef   \
>>>>>> +  0xbeef \
>>>>>> +  0xde
>>>>>> +
>>>>>> +
>>>>>> +arg_in: 1 buffer int[16] \
>>>>>> +-1 \
>>>>>> +-1 \
>>>>>> +-4 \
>>>>>> +-4 \
>>>>>> +   \
>>>>>> +-3 \
>>>>>> +-4 \
>>>>>> +-2 \
>>>>>> +  -115 \
>>>>>> +   \
>>>>>> +  -109 \
>>>>>> + -1015 \
>>>>>> + -1011 \
>>>>>> + -1020 \
>>>>>> +   \
>>>>>> + -1014 \
>>>>>> +  -137 \
>>>>>> +  -151 \
>>>>>> +   -40
>>>>>> +
>>>>>> +!*/
>>>>>> +
>>>>>> +#if 0
>>>>>> +  0xab   \
>>>>>> +  0xbc   \
>>>>>> +  0xf00d \
>>>>>> +  0xdead \
>>>>>> +  0xcafe \
>>>>>> +  0xabcd \
>>>>>> +  0xababfeed \
>>>>>> +  0xca00fe   \
>>>>>> +  0xb00feed  \
>>>>>> +  0xca00fe   \
>>>>>> +  0xfeedbeef \
>>>>>> +  0xfe   \
>>>>>> +  0xbe00fe   \
>>>>>> +  0xabcdef   \
>>>>>> +  0xbeef \
>>>>>> +  0xde
>>>>>> +#endif
>>>>>> +
>>>>>> +kernel void read_write_private_base_plus_offset(global int* out, global 
>>>>>> int* in)
>>>>>> +{
>>>>>> +volatile int alloca[16];
>>>>> 
>>>>> does this need to be volatile?
>>>>> 
>>>>> other than that:
>>>>> Reviewed-by: Jan Vesely >>>> <mailto:jan.ves...@rutgers.edu>>
>>>>> 
>>>>> Jan
>>>> 
>>>> Yes, otherwise the private memory access will be trivially optimized
>>>> out defeating the point of the test
>>> 
>>> I don't get the trivial part. what would that be optimized to? the
>>> indices are using values from input buffer (therefore unknown), so it
>>> cannot directly match the constants to corresponding position in out
>>> buffer.
>>> 
>>> Jan
>> 
>> This could be replaced with a series of selects or hit the move to LDS 
>> optimization 
> 
> right, thanks. I didn't consider move to LDS.
> 
> last question. what's the purpose of that #if 0 block?
> 
> Jan


I think it was just other values I was going to test but then never finished 
them___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] cl: Add f32 immediate tests

2016-12-08 Thread Matt Arsenault

> On Dec 6, 2016, at 15:30, Jan Vesely  wrote:
> 
> git complains about whitespace error here.
> Reviewed-by: Jan Vesely  >
> I fixed all 3 issues locally, I can push it if you're OK with those
> changes. It passes on r600 and intel CPU.

That’s fine, go ahead___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 03/10] cl: Add tests for different versions of fmin / fmax.

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 13:35, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
> <mailto:arse...@gmail.com> wrote:
>> From: Matt Arsenault 
>> 
>> These do not use the normal simple format because the number
>> of combinations that need to be tested is simply too large,
>> especially when tests for min3/max3 are added.
>> 
>> The unordered compare tests could be improved. Currently they truly
>> test the unordered compare because of LLVM bug 21610, but
>> ideally that would be fixed.
>> ---
>> tests/cl/program/CMakeLists.cl.txt |   1 +
>> .../cl/program/execute/scalar-comparison-float.cl  | 105 +
>> tests/cl/program/float-min-max-kernels.cl  | 492 
>> +
>> tests/cl/program/float-min-max.cpp | 475 
>> 4 files changed, 1073 insertions(+)
>> create mode 100644 tests/cl/program/float-min-max-kernels.cl
>> create mode 100644 tests/cl/program/float-min-max.cpp
>> 
>> diff --git a/tests/cl/program/CMakeLists.cl.txt 
>> b/tests/cl/program/CMakeLists.cl.txt
>> index c8d7307..5ef0f6b 100644
>> --- a/tests/cl/program/CMakeLists.cl.txt
>> +++ b/tests/cl/program/CMakeLists.cl.txt
>> @@ -2,3 +2,4 @@ piglit_cl_add_program_test (tester program-tester.c)
>> piglit_cl_add_program_test (max-work-item-sizes max-work-item-sizes.c)
>> piglit_cl_add_program_test (bitcoin-phatk bitcoin-phatk.c)
>> piglit_cl_add_program_test (predefined-macros predefined-macros.c)
>> +piglit_cl_add_program_test (float-min-max float-min-max.cpp)
>> diff --git a/tests/cl/program/execute/scalar-comparison-float.cl 
>> b/tests/cl/program/execute/scalar-comparison-float.cl
>> index 4891fc5..598fae0 100644
>> --- a/tests/cl/program/execute/scalar-comparison-float.cl
>> +++ b/tests/cl/program/execute/scalar-comparison-float.cl
>> @@ -148,6 +148,71 @@ arg_in:  1 float -3.5
>> arg_in:  2 float 4.5
>> arg_out: 0 buffer int[1] 1
>> 
>> +
>> +[test]
>> +name: select_max_gt
>> +kernel_name: select_max_gt
>> +global_size: 24 0 0
>> +
>> +arg_out: 0 buffer float[24]\
>> +  0.0  1.0  2.0  2.0  0.0  0.0 \
>> +  NAN  NAN  1.0  NAN -1.0  NAN \
>> +  0.0  0.0 97.0  INF  INF  INF \
>> +  NAN  NAN  INF  NAN -INF  INF
>> +
>> +arg_in: 1 buffer float[24] \
>> +  0.0  1.0  1.0  2.0  0.0 -1.0 \
>> +  NAN  1.0  NAN -1.0  NAN  0.0 \
>> +  0.0 -0.0 37.0  INF  INF -INF \
>> + -INF  INF  NAN -INF  NAN  0.0
>> +
>> +arg_in: 2 buffer float[24] \
>> +  0.0  1.0  2.0  1.0 -1.0  0.0 \
>> +  NAN  NAN  1.0  NAN -1.0  NAN \
>> + -0.0  0.0 97.0  INF -INF  INF \
>> + -INF  NAN  INF  NAN -INF  INF
>> +
>> +[test]
>> +name: select_max_gte
>> +kernel_name: select_max_gte
>> +global_size: 15 0 0
>> +
>> +arg_out: 0 buffer float[15]\
>> +  0.0  1.0  2.0  2.0  0.0  0.0 \
>> +  NAN  NAN  1.0  NAN -1.0  NAN \
>> +  0.0  0.0 97.0
>> +
>> +arg_in: 1 buffer float[15] \
>> +  0.0  1.0  1.0  2.0  0.0 -1.0 \
>> +  NAN  1.0  NAN -1.0  NAN  0.0 \
>> +  0.0 -0.0 37.0
>> +
>> +arg_in: 2 buffer float[15] \
>> +  0.0  1.0  2.0  1.0 -1.0  0.0 \
>> +  NAN  NAN  1.0  NAN -1.0  NAN \
>> + -0.0  0.0 97.0
>> +
>> +[test]
>> +name: select_min_gt
>> +kernel_name: select_min_gt
>> +global_size: 15 0 0
>> +
>> +arg_out: 0 buffer float[15]\
>> +  0.0  1.0  1.0  1.0 -1.0 -1.0 \
>> +  NAN  NAN  NAN  NAN NAN  NAN  \
>> +  0.0  0.0 37.0
>> +
>> +arg_in: 1 buffer float[15] \
>> +  0.0  1.0  1.0  2.0  0.0 -1.0 \
>> +  NAN  1.0  NAN -1.0  NAN  0.0 \
>> +  0.0 -0.0  37.0
>> +
>> +arg_in: 2 buffer float[15] \
>> +  0.0  1.0  2.0  1.0 -1.0  0.0 \
>> +  NAN  NAN  1.0  NAN -1.0  NAN \
>> + -0.0  0.0 97.0
>> +
>> +
>> !*/
>> 
>> kernel void eq(global int* out, float a, float b) {
>> @@ -173,3 +238,43 @@ kernel void lt(global int* out, float a, float b) {
>> kernel void lte(global int* out, float a, float b) {
>>  out[0] = a <= b;
>> }
>> +
>> +kernel void select_max_gt(global float* restrict out, global float* 
>> restrict a, global float* restrict b) {
>> +int id = get_global_id(0);
>> +out[id] = (a[id] > b[id]) ? a[id] : b[id];
>> +}
>> +
>> +kernel void select_max_gte(global float* restrict out, global float* 
>> restrict a, global float* restrict b) {
>> +int id = get_global_id(0);
>> +out[id] = (a[id] >= b[id]) ? a[id] : 

Re: [Piglit] [PATCH 02/10] cl: Add sign_extend_inreg test

2016-12-06 Thread Matt Arsenault

> On Dec 6, 2016, at 12:18, Jan Vesely  wrote:
> 
> On Tue, 2016-12-06 at 10:55 -0800, Matt Arsenault wrote:
>>> On Dec 5, 2016, at 13:20, Jan Vesely >> <mailto:jan.ves...@rutgers.edu>> wrote:
>>> 
>>> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
>>> <mailto:arse...@gmail.com> <mailto:arse...@gmail.com 
>>> <mailto:arse...@gmail.com>> wrote:
>>>> From: Matt Arsenault mailto:arse...@gmail.com> 
>>>> <mailto:arse...@gmail.com <mailto:arse...@gmail.com>>>
>>>> 
>>>> ---
>>>> tests/cl/program/execute/sign_extend_inreg.cl | 387 
>>>> ++
>>>> 1 file changed, 387 insertions(+)
>>>> create mode 100644 tests/cl/program/execute/sign_extend_inreg.cl
>>> 
>>> this looks very GCN specific, the name should IMO indicate it.
>> 
>> It’s completely a completely generic test, just the testcases are
>> intended to stress the important cases for GCN
> 
> there is no sign extent CL operation, nor sign extend inreg. CL
> implementations are not required to have SGPR registers. Almost all of
> the tests in this series are GCN specific especially with the names
> like v_* and s_*.
> 
> I'm not against GCN specific test cases, I'm just saying that it should
> be marked as such. Just like "r600 create release buffer bug", the test
> is generic (runs and passes on other platforms), but tests r600
> specific bug/behaviour. It would also make it easier to use regexp to
> select/skip only these specific tests.
> 
> Jan

It’s testing the underlying operation in the compiler exposed by using the 
basic bit operations, the fact that there isn’t an explicit CL operation called 
sign extend is irrelevant. Skipping these on other platforms would be a 
mistake. The s_/v_ distinction is to emphasize that it is stressing the scalar 
operators which the standard conformance tests will not do.There are more tests 
because of the emphasize on stressing the GCN compiler parts, but nothing about 
it is specific.

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 07/10] cl: Add test for negative index + small offset for private

2016-12-06 Thread Matt Arsenault

> On Dec 6, 2016, at 11:04, Jan Vesely  wrote:
> 
> On Tue, 2016-12-06 at 10:52 -0800, Matt Arsenault wrote:
>>> On Dec 5, 2016, at 12:42, Jan Vesely  wrote:
>>> 
>>> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
>>> <mailto:arse...@gmail.com> wrote:
>>>> From: Matt Arsenault 
>>>> 
>>>> ---
>>>> .../execute/negative-private-base-pointer.cl   | 120 
>>>> +
>>>> 1 file changed, 120 insertions(+)
>>>> create mode 100644 
>>>> tests/cl/program/execute/negative-private-base-pointer.cl
>>>> 
>>>> diff --git a/tests/cl/program/execute/negative-private-base-pointer.cl 
>>>> b/tests/cl/program/execute/negative-private-base-pointer.cl
>>>> new file mode 100644
>>>> index 000..7ee528b
>>>> --- /dev/null
>>>> +++ b/tests/cl/program/execute/negative-private-base-pointer.cl
>>>> @@ -0,0 +1,120 @@
>>>> +/*!
>>>> +[config]
>>>> +name: negative private buffer base index
>>>> +clc_version_min: 10
>>>> +dimensions: 1
>>>> +
>>>> +[test]
>>>> +kernel_name: read_write_private_base_plus_offset
>>>> +name: negative base private index
>>>> +global_size: 1 0 0
>>>> +
>>>> +arg_out: 0 buffer int[16]  \
>>>> +  0xab   \
>>>> +  0xbc   \
>>>> +  0xabcd \
>>>> +  0xdead \
>>>> + \
>>>> +  0xcafe \
>>>> +  0xf00d \
>>>> +  0xababfeed \
>>>> +  0xca00fe   \
>>>> + \
>>>> +  0xb00feed  \
>>>> +  0xca00fe   \
>>>> +  0xfeedbeef \
>>>> +  0xfe   \
>>>> + \
>>>> +  0xbe00fe   \
>>>> +  0xabcdef   \
>>>> +  0xbeef \
>>>> +  0xde
>>>> +
>>>> +
>>>> +arg_in: 1 buffer int[16] \
>>>> +-1 \
>>>> +-1 \
>>>> +-4 \
>>>> +-4 \
>>>> +   \
>>>> +-3 \
>>>> +-4 \
>>>> +-2 \
>>>> +  -115 \
>>>> +   \
>>>> +  -109 \
>>>> + -1015 \
>>>> + -1011 \
>>>> + -1020 \
>>>> +   \
>>>> + -1014 \
>>>> +  -137 \
>>>> +  -151 \
>>>> +   -40
>>>> +
>>>> +!*/
>>>> +
>>>> +#if 0
>>>> +  0xab   \
>>>> +  0xbc   \
>>>> +  0xf00d \
>>>> +  0xdead \
>>>> +  0xcafe \
>>>> +  0xabcd \
>>>> +  0xababfeed \
>>>> +  0xca00fe   \
>>>> +  0xb00feed  \
>>>> +  0xca00fe   \
>>>> +  0xfeedbeef \
>>>> +  0xfe   \
>>>> +  0xbe00fe   \
>>>> +  0xabcdef   \
>>>> +  0xbeef \
>>>> +  0xde
>>>> +#endif
>>>> +
>>>> +kernel void read_write_private_base_plus_offset(global int* out, global 
>>>> int* in)
>>>> +{
>>>> +volatile int alloca[16];
>>> 
>>> does this need to be volatile?
>>> 
>>> other than that:
>>> Reviewed-by: Jan Vesely >> <mailto:jan.ves...@rutgers.edu>>
>>> 
>>> Jan
>> 
>> Yes, otherwise the private memory access will be trivially optimized
>> out defeating the point of the test
> 
> I don't get the trivial part. what would that be optimized to? the
> indices are using values from input buffer (therefore unknown), so it
> cannot directly match the constants to corresponding position in out
> buffer.
> 
> Jan

This could be replaced with a series of selects or hit the move to LDS 
optimization 

-Matt
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 09/10] Require python 2.7

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 12:30, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com wrote:
>> From: Matt Arsenault 
>> 
>> Things seem to break with python3
> 
> can you be more specific? things have run OK for me with python3.5 for
> some time.
> 
> Jan

I”ve had this patch sitting in my tree for at least a year, so it’s possible 
it’s not necessary anymore

-Matt
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 01/10] cl: Add tests for 24-bit div / rem optimization

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 13:24, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com wrote:
>> From: Matt Arsenault 
>> 
>> ---
>> tests/cl/program/execute/scalar-arithmetic-int.cl | 1657 
>> +
>> 1 file changed, 1657 insertions(+)
> 
> I have used an extended version of this patch (same values for uint
> test) for some time:
> Reviewed-by: Jan Vesely 

I need someone to commit these for me

-Matt
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 06/10] cl: Add test for clz

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 12:14, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
> <mailto:arse...@gmail.com> wrote:
>> From: Matt Arsenault 
>> 
>> ---
>> tests/cl/program/execute/clz.cl | 389 
>> 
>> 1 file changed, 389 insertions(+)
>> create mode 100644 tests/cl/program/execute/clz.cl
>> 
>> diff --git a/tests/cl/program/execute/clz.cl 
>> b/tests/cl/program/execute/clz.cl
>> new file mode 100644
>> index 000..06ba0e3
>> --- /dev/null
>> +++ b/tests/cl/program/execute/clz.cl
>> @@ -0,0 +1,389 @@
>> +/*!
>> +
>> +[config]
>> +name: clz
>> +clc_version_min: 10
> 
> there already is a generated clz test, what's wrong with adding
> testcases there?
> 
> Jan


I looked at this briefly and it looks like it only is testing the builtin 
function over various ranges. This is more targeted for optimizations involving 
clz rather than just the raw function itself, so maybe it should be renamed. It 
test combines like folding compare + select with 0 input argument into the 
behavior of v_ffbh_u32. I’m not sure this would fit in easily with the simple 
function tests

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 02/10] cl: Add sign_extend_inreg test

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 13:20, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
> <mailto:arse...@gmail.com> wrote:
>> From: Matt Arsenault mailto:arse...@gmail.com>>
>> 
>> ---
>> tests/cl/program/execute/sign_extend_inreg.cl | 387 
>> ++
>> 1 file changed, 387 insertions(+)
>> create mode 100644 tests/cl/program/execute/sign_extend_inreg.cl
> 
> this looks very GCN specific, the name should IMO indicate it.

It’s completely a completely generic test, just the testcases are intended to 
stress the important cases for GCN

> 
> 
> why 14 different arguments? does scalarization not work transitively on
> global pointers?


This is the most reliable way to get an SGPR value of the correct type. I could 
change it to a uniformly indexed constant pointer (but even that may someday 
only be an optimization that may not always happen)

> 
>> +int shift0)
>> +{
>> +long args[] =
>> +{
>> +a0, a1, a2, a3,
>> +a4, a5, a6, a7,
>> +a8, a9, a10, a11,
>> +a12, a13
>> +};
> 
> I assume the private copy is just to have an array and use for loop?
> 

Yes


___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH 07/10] cl: Add test for negative index + small offset for private

2016-12-06 Thread Matt Arsenault

> On Dec 5, 2016, at 12:42, Jan Vesely  wrote:
> 
> On Mon, 2016-12-05 at 09:48 -0800, arse...@gmail.com 
> <mailto:arse...@gmail.com> wrote:
>> From: Matt Arsenault 
>> 
>> ---
>> .../execute/negative-private-base-pointer.cl   | 120 
>> +
>> 1 file changed, 120 insertions(+)
>> create mode 100644 tests/cl/program/execute/negative-private-base-pointer.cl
>> 
>> diff --git a/tests/cl/program/execute/negative-private-base-pointer.cl 
>> b/tests/cl/program/execute/negative-private-base-pointer.cl
>> new file mode 100644
>> index 000..7ee528b
>> --- /dev/null
>> +++ b/tests/cl/program/execute/negative-private-base-pointer.cl
>> @@ -0,0 +1,120 @@
>> +/*!
>> +[config]
>> +name: negative private buffer base index
>> +clc_version_min: 10
>> +dimensions: 1
>> +
>> +[test]
>> +kernel_name: read_write_private_base_plus_offset
>> +name: negative base private index
>> +global_size: 1 0 0
>> +
>> +arg_out: 0 buffer int[16]  \
>> +  0xab   \
>> +  0xbc   \
>> +  0xabcd \
>> +  0xdead \
>> + \
>> +  0xcafe \
>> +  0xf00d \
>> +  0xababfeed \
>> +  0xca00fe   \
>> + \
>> +  0xb00feed  \
>> +  0xca00fe   \
>> +  0xfeedbeef \
>> +  0xfe   \
>> + \
>> +  0xbe00fe   \
>> +  0xabcdef   \
>> +  0xbeef \
>> +  0xde
>> +
>> +
>> +arg_in: 1 buffer int[16] \
>> +-1 \
>> +-1 \
>> +-4 \
>> +-4 \
>> +   \
>> +-3 \
>> +-4 \
>> +-2 \
>> +  -115 \
>> +   \
>> +  -109 \
>> + -1015 \
>> + -1011 \
>> + -1020 \
>> +   \
>> + -1014 \
>> +  -137 \
>> +  -151 \
>> +   -40
>> +
>> +!*/
>> +
>> +#if 0
>> +  0xab   \
>> +  0xbc   \
>> +  0xf00d \
>> +  0xdead \
>> +  0xcafe \
>> +  0xabcd \
>> +  0xababfeed \
>> +  0xca00fe   \
>> +  0xb00feed  \
>> +  0xca00fe   \
>> +  0xfeedbeef \
>> +  0xfe   \
>> +  0xbe00fe   \
>> +  0xabcdef   \
>> +  0xbeef \
>> +  0xde
>> +#endif
>> +
>> +kernel void read_write_private_base_plus_offset(global int* out, global 
>> int* in)
>> +{
>> +volatile int alloca[16];
> 
> does this need to be volatile?
> 
> other than that:
> Reviewed-by: Jan Vesely  <mailto:jan.ves...@rutgers.edu>>
> 
> Jan

Yes, otherwise the private memory access will be trivially optimized out 
defeating the point of the test

-Matt___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit


Re: [Piglit] [PATCH] Use right comparison function for double

2016-08-24 Thread Matt Arsenault

> On Aug 24, 2016, at 08:11, Jan Vesely  wrote:
> 
> On Tue, 2016-08-23 at 21:33 -0700, arse...@gmail.com wrote:
>> From: Matt Arsenault 
> 
> I guess more cases can be consolidated this way...
> 
> Reviewed-by: Jan Vesely 
> 
> Jan
> 

I need someone to commit this for me
___
Piglit mailing list
Piglit@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/piglit