https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69568

            Bug ID: 69568
           Summary: Invalid HSAIL opcode when using builtin vector
           Product: gcc
           Version: hsa
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophe_choquet at hotmail dot com
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

The test case is the following:

typedef float float2 __attribute__ ((vector_size (8)));
float2 *output;
....
#pragma omp target
#pragma omp teams thread_limit(256) // thread_limit is optional
#pragma omp distribute parallel for firstprivate(n) private(i)
for (i=0; i < n; i++) {
  float2 a;
  a[0] = i;
  a[1] = 1+i;
  output[i] = a;
}

At execution on Kaveri APU, HSA runtime says:
Error in hsa_code section, at offset 336:
Instruction has invalid type (f32x2), expected one of: b128, f16, f32, f64,
roimg, rwimg, s16, s32, s64, s8, samp, sig32, sig64, u16, u32, u64, u8, woimg

libgomp: HSA fatal error: Could not add a module to the HSA program

In fact, the HSAIL dump gives:
   14:   ld_private_align(8)_f32x2 $d1, [%a]             /* BRIG offset: 380,
op0: 240, op1: 248 */
   15:   st_align(8)_f32x2 $d1, [$d0]             /* BRIG offset: 400, op0:
240, op1: 268 */

In fact cloc.sh for a similar code sequence gives:
        st_v2_global_align(8)_f32       ($s0, $s1), [$d1];


I understand the design was not to use vector instructions, but this gives poor
performance when writing complex vectors for example (bad memory pattern).

Since the current HSAIL generation strategy is to pack F32, I tried this fix:
hsa-gen.c:
  if (TREE_CODE (type) == VECTOR_TYPE)
    {
      HOST_WIDE_INT tsize = tree_to_uhwi (TYPE_SIZE (type));
.....
        case 64:
          // Not working! res |= BRIG_TYPE_PACK_64;
          res = BRIG_TYPE_B64;
          break;

With this change, HSAIL code generated is:
   14:   ld_private_align(8)_u64 $d1, [%a]             /* BRIG offset: 380,
op0: 240, op1: 248 */
   15:   st_align(8)_u64 $d1, [$d0]             /* BRIG offset: 400, op0: 240,
op1: 268 */

which is correct.  The program loads and gives the correct result.

Reply via email to