On June 1, 2018 5:15:58 PM GMT+02:00, Bill Schmidt <wschm...@linux.ibm.com> wrote:
>On Jun 1, 2018, at 10:11 AM, Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>
>> On Fri, 2018-06-01 at 08:53 +0200, Richard Biener wrote:
>>> On Thu, May 31, 2018 at 9:59 PM Will Schmidt <will_schm...@vnet.ibm.com> wrote:
>>>>
>>>> Hi,
>>>>   Add support for gimple folding for unaligned vector loads and stores.
>>>> testcases posted separately in this thread.
>>>>
>>>> Regtest completed across variety of systems, P6,P7,P8,P9.
>>>>
>>>> OK for trunk?
>>>> Thanks,
>>>> -Will
>>>>
>>>> [gcc]
>>>>
>>>> 2018-05-31 Will Schmidt <will_schm...@vnet.ibm.com>
>>>>
>>>>         * config/rs6000/rs6000.c: (rs6000_builtin_valid_without_lhs) Add vec_xst
>>>>         variants to the list. (rs6000_gimple_fold_builtin) Add support for
>>>>         folding unaligned vector loads and stores.
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>> index d62abdf..54b7de2 100644
>>>> --- a/gcc/config/rs6000/rs6000.c
>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>> @@ -15360,10 +15360,16 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
>>>>      case ALTIVEC_BUILTIN_STVX_V8HI:
>>>>      case ALTIVEC_BUILTIN_STVX_V4SI:
>>>>      case ALTIVEC_BUILTIN_STVX_V4SF:
>>>>      case ALTIVEC_BUILTIN_STVX_V2DI:
>>>>      case ALTIVEC_BUILTIN_STVX_V2DF:
>>>> +    case VSX_BUILTIN_STXVW4X_V16QI:
>>>> +    case VSX_BUILTIN_STXVW4X_V8HI:
>>>> +    case VSX_BUILTIN_STXVW4X_V4SF:
>>>> +    case VSX_BUILTIN_STXVW4X_V4SI:
>>>> +    case VSX_BUILTIN_STXVD2X_V2DF:
>>>> +    case VSX_BUILTIN_STXVD2X_V2DI:
>>>>        return true;
>>>>      default:
>>>>        return false;
>>>>      }
>>>>  }
>>>> @@ -15869,10 +15875,77 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>>>         gimple_set_location (g, loc);
>>>>         gsi_replace (gsi, g, true);
>>>>         return true;
>>>>        }
>>>>
>>>> +    /* unaligned Vector loads. */
>>>> +    case VSX_BUILTIN_LXVW4X_V16QI:
>>>> +    case VSX_BUILTIN_LXVW4X_V8HI:
>>>> +    case VSX_BUILTIN_LXVW4X_V4SF:
>>>> +    case VSX_BUILTIN_LXVW4X_V4SI:
>>>> +    case VSX_BUILTIN_LXVD2X_V2DF:
>>>> +    case VSX_BUILTIN_LXVD2X_V2DI:
>>>> +      {
>>>> +       arg0 = gimple_call_arg (stmt, 0); // offset
>>>> +       arg1 = gimple_call_arg (stmt, 1); // address
>>>> +       lhs = gimple_call_lhs (stmt);
>>>> +       location_t loc = gimple_location (stmt);
>>>> +       /* Since arg1 may be cast to a different type, just use ptr_type_node
>>>> +          here instead of trying to enforce TBAA on pointer types. */
>>>> +       tree arg1_type = ptr_type_node;
>>>> +       tree lhs_type = TREE_TYPE (lhs);
>>>> +       /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
>>>> +          the tree using the value from arg0. The resulting type will match
>>>> +          the type of arg1. */
>>>> +       gimple_seq stmts = NULL;
>>>> +       tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>>>> +       tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>> +                                      arg1_type, arg1, temp_offset);
>>>> +       gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>> +       /* Use the build2 helper to set up the mem_ref. The MEM_REF could also
>>>> +          take an offset, but since we've already incorporated the offset
>>>> +          above, here we just pass in a zero. */
>>>> +       gimple *g;
>>>> +       g = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, temp_addr,
>>>> +                                             build_int_cst (arg1_type, 0)));
>>>
>>> So in GIMPLE the type of the MEM_REF specifies the alignment so my question
>>> is what type does the lhs usually have here? I'd simply guess V4SF, etc.? In
>>
>> yes. (double-checking). my reference for the intrinsic signatures
>> shows the lhs is a vector of type. The rhs can be either *type or
>> *vector of type.
>>
>> vector double vec_vsx_ld (int, const vector double *);
>> vector double vec_vsx_ld (int, const double *);
>> With similar/same for the assorted other types.
>>
>> These are also on my list as 'unaligned' vector loads. I'm not certain
>> if that adds a twist to how I should answer the below..
>>
>> Bill?
>
>'unaligned' means not necessarily aligned on a vector boundary.
>They are guaranteed to be aligned on an element boundary.
>>
>>> this case you are missing a
>>>   tree ltype = build_aligned_type (lhs_type, desired-alignment);
>>>
>>> and use that ltype for building the MEM_REF. I suppose in this case the known
>>> alignment is either BITS_PER_UNIT or element alignment (thus
>>> TYPE_ALIGN (TREE_TYPE (lhs_type)))?
>>
>> I'd think element alignment. but no longer certain. :-)
>
>Yep, element alignment.

Note the x86 unaligned intrinsics support arbitrary unaligned loads. So
that's not available for power? Does the HW implementation require
element alignment?

Richard.

>Thanks,
>Bill
>>
>>> Or is the type of the load the element types?
>>
>>
>> So, In any case.. I'll build up / modify some tests to look at data
>> being loaded, and see if I can see alignment issues here.
>>
>> Thanks,
>> -Will
>>
>>
>>
>>> Richard.
>>>
>>>> +       gimple_set_location (g, loc);
>>>> +       gsi_replace (gsi, g, true);
>>>> +       return true;
>>>> +      }
>>>> +
>>>> +    /* unaligned Vector stores. */
>>>> +    case VSX_BUILTIN_STXVW4X_V16QI:
>>>> +    case VSX_BUILTIN_STXVW4X_V8HI:
>>>> +    case VSX_BUILTIN_STXVW4X_V4SF:
>>>> +    case VSX_BUILTIN_STXVW4X_V4SI:
>>>> +    case VSX_BUILTIN_STXVD2X_V2DF:
>>>> +    case VSX_BUILTIN_STXVD2X_V2DI:
>>>> +      {
>>>> +       arg0 = gimple_call_arg (stmt, 0); /* Value to be stored. */
>>>> +       arg1 = gimple_call_arg (stmt, 1); /* Offset. */
>>>> +       tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address. */
>>>> +       location_t loc = gimple_location (stmt);
>>>> +       tree arg0_type = TREE_TYPE (arg0);
>>>> +       /* Use ptr_type_node (no TBAA) for the arg2_type. */
>>>> +       tree arg2_type = ptr_type_node;
>>>> +       /* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. Create
>>>> +          the tree using the value from arg0. The resulting type will match
>>>> +          the type of arg2. */
>>>> +       gimple_seq stmts = NULL;
>>>> +       tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>>>> +       tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>>>> +                                      arg2_type, arg2, temp_offset);
>>>> +       /* Mask off any lower bits from the address. */
>>>> +       gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>>>> +       gimple *g;
>>>> +       g = gimple_build_assign (build2 (MEM_REF, arg0_type, temp_addr,
>>>> +                                        build_int_cst (arg2_type, 0)), arg0);
>>>> +       gimple_set_location (g, loc);
>>>> +       gsi_replace (gsi, g, true);
>>>> +       return true;
>>>> +      }
>>>> +
>>>>      /* Vector Fused multiply-add (fma). */
>>>>      case ALTIVEC_BUILTIN_VMADDFP:
>>>>      case VSX_BUILTIN_XVMADDDP:
>>>>      case ALTIVEC_BUILTIN_VMLADDUHM:
>>>>        {
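
For reference, the alignment point discussed above (the access type of the
MEM_REF carries the alignment, and here only element alignment is
guaranteed) has a rough user-level analogue in GNU C. The following is only
a minimal sketch, assuming the GCC/Clang vector extensions; it is not the
patch's code and not the rs6000 implementation. The names v4sf, v4sf_elt,
load_elt_aligned and store_elt_aligned are hypothetical and chosen for
illustration. The under-aligned typedef plays the role that a type from
build_aligned_type would play for the MEM_REF, and may_alias loosely
mirrors the patch's use of ptr_type_node to stay out of TBAA.

```c
#include <assert.h>
#include <stddef.h>

/* Naturally aligned vector of four floats (GNU C vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical name: the same vector type, but with its alignment lowered
   to that of a single element.  In a typedef the aligned attribute may
   reduce alignment; may_alias lets accesses through this type alias any
   other type.  */
typedef float v4sf_elt
  __attribute__ ((vector_size (16), may_alias, aligned (__alignof__ (float))));

/* Load four floats starting OFFSET bytes past BASE.  Only element
   alignment is assumed, so the access goes through v4sf_elt.  */
static v4sf
load_elt_aligned (size_t offset, const float *base)
{
  assert (offset % sizeof (float) == 0);	/* element boundary */
  const v4sf_elt *p = (const v4sf_elt *) ((const char *) base + offset);
  return *p;
}

/* Store VALUE starting OFFSET bytes past BASE, again assuming only
   element alignment.  */
static void
store_elt_aligned (v4sf value, size_t offset, float *base)
{
  assert (offset % sizeof (float) == 0);	/* element boundary */
  v4sf_elt *p = (v4sf_elt *) ((char *) base + offset);
  *p = value;
}
```

The argument order (offset first, then address) follows the vec_vsx_ld
signatures quoted earlier. Accessing through plain v4sf instead would tell
the compiler the address is 16-byte aligned, which is the assumption the
review comment is warning about.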