> On May 16, 2014, at 4:13 AM, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, May 16, 2014 at 1:03 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, May 16, 2014 at 12:56 PM,  <pins...@gmail.com> wrote:
>>>
>>>> On May 16, 2014, at 3:48 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>
>>>> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme
>>>> <thomas.preudho...@arm.com> wrote:
>>>>> Ping?
>>>>
>>>> Sorry ...
>>>>
>>>>> Best regards,
>>>>>
>>>>> Thomas Preud'homme
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>>>>>> Sent: Friday, May 09, 2014 6:26 PM
>>>>>> To: GCC Patches
>>>>>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store
>>>>>>
>>>>>> Sorry, this took longer than expected as I got distracted by another
>>>>>> patch.  I merged the whole patchset into a single patch, as I was told
>>>>>> the previous setup was actually more difficult to read.
>>>>>>
>>>>>> Here are the updated ChangeLogs:
>>>>>>
>>>>>> *** gcc/ChangeLog ***
>>>>>>
>>>>>> 2014-05-09  Thomas Preud'homme  <thomas.preudho...@arm.com>
>>>>>>
>>>>>>     PR tree-optimization/54733
>>>>>>     * expr.c (get_inner_reference): Add a parameter to control
>>>>>>     whether a MEM_REF should be split into base + offset.
>>>>>>     * tree.h (get_inner_reference): Default new parameter to false.
>>>>>>     * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure.
>>>>>>     (CMPNOP): Define.
>>>>>>     (find_bswap_or_nop_load): New.
>>>>>>     (find_bswap_1): Renamed to ...
>>>>>>     (find_bswap_or_nop_1): This.  Also add support for memory source.
>>>>>>     (find_bswap): Renamed to ...
>>>>>>     (find_bswap_or_nop): This.  Also add support for memory source
>>>>>>     and detection of bitwise operations equivalent to a load in host
>>>>>>     endianness.
>>>>>>     (execute_optimize_bswap): Likewise.  Also move its leading
>>>>>>     comment back in place and split statement transformation into ...
>>>>>>     (bswap_replace): This.  Add assert when updating bswap_stats.
>>>>>>
>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>
>>>>>> 2014-05-09  Thomas Preud'homme  <thomas.preudho...@arm.com>
>>>>>>
>>>>>>     PR tree-optimization/54733
>>>>>>     * gcc.dg/optimize-bswapdi-3.c: New test to check the extension of
>>>>>>     the bswap optimization to support memory sources and bitwise
>>>>>>     operations equivalent to a load in host endianness.
>>>>>>     * gcc.dg/optimize-bswaphi-1.c: Likewise.
>>>>>>     * gcc.dg/optimize-bswapsi-2.c: Likewise.
>>>>>>     * gcc.c-torture/execute/bswap-2.c: Likewise.
>>>>>>
>>>>>> Ok for trunk?
>>>>
>>>> Ok, I have now decided otherwise: I dislike the new parameter to
>>>> get_inner_reference.  Can you please revert that part and just
>>>> deal with a MEM_REF result in your only caller?
>>>>
>>>> And (of course) I found another possible issue.  The way you
>>>> compute load_type and use it here:
>>>>
>>>> +  /* Perform the load.  */
>>>> +  load_offset_ptr = build_int_cst (n->alias_set, 0);
>>>> +  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
>>>> +                          load_offset_ptr);
>>>>
>>>> makes the load always appear aligned according to the mode of
>>>> load_type.  On strict-alignment targets this may cause faults.
>>>>
>>>> So what you have to do is either (simpler)
>>>>
>>>>   unsigned int align = get_pointer_alignment (addr_tmp);
>>>>   tree al_load_type = load_type;
>>>>   if (align < TYPE_ALIGN (load_type))
>>>>     al_load_type = build_aligned_type (load_type, align);
>>>>   ...
>>>>   val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
>>>>                           load_offset_ptr);
>>>>
>>>> or keep track of the "first" actual load and use
>>>>
>>>>   unsigned int align = get_object_alignment (that_first_load);
>>>>
>>>> "first" is the one that corresponds to addr_tmp.  From that there
>>>> is a much better chance to derive good alignment values.
>>>>
>>>> Of course on STRICT_ALIGNMENT targets an unaligned load will be
>>>> decomposed again, so eventually doing the transformation may no
>>>> longer be profitable(?).
>>>
>>> Not always decomposed.  On MIPS, it should use the load/store
>>> left/right instructions for unaligned loads/stores, which is normally
>>> better than decomposed loads/stores.  So having a cost model would
>>> be nice.
>>
>> Agreed, but I am happy with doing that as a followup.  Btw, a very
>> simple one would be to reject the unaligned case when
>> SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align) is true.
>> [Of course that may be true on MIPS even for the cases where a
>> "reasonably" fast unaligned variant exists - nearly no target
>> defines that macro in a too fancy way.]
>
> Oh, and what happens for
>
> unsigned foo (unsigned char *x)
> {
>   return x[0] << 24 | x[2] << 8 | x[3];
> }
>
> ?  We could do an unsigned int load from x and zero byte 3 with an
> AND.  Enhancement for a followup, similar to also considering vector
> types for the load (also I'm not sure that uint64_type_node always
> has non-BLKmode for all targets).

No, we cannot do the wider load unconditionally: if x[4] is on a
different page which is not mapped in, we get a fault.  Not something
we want.

Thanks,
Andrew
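Putting the two suggestions above together (annotating the MEM_REF with
the alignment actually known for the pointer, and bailing out when the
target says the unaligned access is slow), the fix could look roughly as
follows.  This is a sketch only, not the committed patch; it assumes the
bswap_replace context quoted above (addr_tmp, load_type, n->alias_set):

  /* Sketch: compute the alignment known for the address and, if it is
     below the natural alignment of load_type, either give up on targets
     where the unaligned access would be slow or build a less aligned
     variant of the type so the expander sees the real alignment.  */
  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
  if (align < TYPE_ALIGN (load_type))
    {
      /* Simple follow-up cost model: reject the transformation when
         the target reports unaligned accesses in this mode as slow.  */
      if (SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align))
        return false;
      al_load_type = build_aligned_type (load_type, align);
    }

  /* Perform the load with the correctly aligned type so that
     strict-alignment targets do not see a mis-annotated MEM_REF.  */
  load_offset_ptr = build_int_cst (n->alias_set, 0);
  val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
                          load_offset_ptr);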
>
> Richard.
>
>> Richard.
>>
>>> Thanks,
>>> Andrew
>>>
>>>>
>>>> Thanks and sorry again for the delay.
>>>>
>>>> Otherwise the patch looks good to me.
>>>>
>>>> Richard.
>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>>>>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>>>>>>> Sent: Monday, May 05, 2014 7:30 PM
>>>>>>> To: GCC Patches
>>>>>>> Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent
>>>>>>> load/store
>>>>>>>
>>>>>>> I found a way to improve the functions find_bswap/find_bswap_or_nop
>>>>>>> and reduce their size.  Please hold off on the review; I will post
>>>>>>> an updated version as soon as I finish testing.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Thomas Preud'homme
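For reference, the hazard Andrew points out is the usual reason a
compiler may not widen a load past the bytes the source actually reads.
A minimal standalone illustration (hypothetical, not from the thread):
the function below touches only three bytes, which may be the last
mapped bytes of a page, so widening them into one 32-bit load is unsafe.

  /* Hypothetical example: the source reads only x[0], x[1] and x[2].
     If x + 3 is the first byte of an unmapped page, replacing the three
     byte loads with a single 4-byte load would also read x[3] and
     fault, even though the program itself is perfectly valid.  */
  unsigned int
  read3 (const unsigned char *x)
  {
    return x[0] << 16 | x[1] << 8 | x[2];
  }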