https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69568
Bug ID: 69568 Summary: Invalid HSAIL opcode when using builtin vector Product: gcc Version: hsa Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgomp Assignee: unassigned at gcc dot gnu.org Reporter: christophe_choquet at hotmail dot com CC: jakub at gcc dot gnu.org Target Milestone: --- The test case is the following: typedef float float2 __attribute__ ((vector_size (8))); float2 *output; .... #pragma omp target #pragma omp teams thread_limit(256) // thread_limit is optional #pragma omp distribute parallel for firstprivate(n) private(i) for (i=0; i < n; i++) { float2 a; a[0] = i; a[1] = 1+i; output[i] = a; } At execution on Kaveri APU, HSA runtime says: Error in hsa_code section, at offset 336: Instruction has invalid type (f32x2), expected one of: b128, f16, f32, f64, roimg, rwimg, s16, s32, s64, s8, samp, sig32, sig64, u16, u32, u64, u8, woimg libgomp: HSA fatal error: Could not add a module to the HSA program In fact, the HSAIL dump gives: 14: ld_private_align(8)_f32x2 $d1, [%a] /* BRIG offset: 380, op0: 240, op1: 248 */ 15: st_align(8)_f32x2 $d1, [$d0] /* BRIG offset: 400, op0: 240, op1: 268 */ In fact cloc.sh for a similar code sequence gives: st_v2_global_align(8)_f32 ($s0, $s1), [$d1]; I understand the design was not to use vector instructions, but this gives poor performance when writing complex vectors for example (bad memory pattern). Since the current HSAIL generation strategy is to pack F32, I tried this fix: hsa-gen.c: if (TREE_CODE (type) == VECTOR_TYPE) { HOST_WIDE_INT tsize = tree_to_uhwi (TYPE_SIZE (type)); ..... case 64: // Not working! res |= BRIG_TYPE_PACK_64; res = BRIG_TYPE_B64; break; With this change, HSAIL code generated is: 14: ld_private_align(8)_u64 $d1, [%a] /* BRIG offset: 380, op0: 240, op1: 248 */ 15: st_align(8)_u64 $d1, [$d0] /* BRIG offset: 400, op0: 240, op1: 268 */ which is correct. The program loads and gives the correct result.