On Fri, May 16, 2014 at 1:03 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, May 16, 2014 at 12:56 PM, <pins...@gmail.com> wrote: >> >> >>> On May 16, 2014, at 3:48 AM, Richard Biener <richard.guent...@gmail.com> >>> wrote: >>> >>> On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme >>> <thomas.preudho...@arm.com> wrote: >>>> Ping? >>> >>> Sorry ... >>> >>>> Best regards, >>>> >>>> Thomas Preud'homme >>>> >>>>> -----Original Message----- >>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>>> Sent: Friday, May 09, 2014 6:26 PM >>>>> To: GCC Patches >>>>> Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store >>>>> >>>>> Sorry, took longer than expected as I got distracted by some other patch. >>>>> I merged the whole patchset in a single patch as I was told the current >>>>> setup >>>>> is actually more difficult to read. >>>>> >>>>> Here are the updated ChangeLogs: >>>>> >>>>> *** gcc/ChangeLog *** >>>>> >>>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>>> >>>>> PR tree-optimization/54733 >>>>> * expr.c (get_inner_reference): Add a parameter to control whether >>>>> a >>>>> MEM_REF should be split into base + offset. >>>>> * tree.h (get_inner_reference): Default new parameter to false. >>>>> * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure. >>>>> (CMPNOP): Define. >>>>> (find_bswap_or_nop_load): New. >>>>> (find_bswap_1): Renamed to ... >>>>> (find_bswap_or_nop_1): This. Also add support for memory source. >>>>> (find_bswap): Renamed to ... >>>>> (find_bswap_or_nop): This. Also add support for memory source and >>>>> detection of bitwise operations equivalent to load in host >>>>> endianness. >>>>> (execute_optimize_bswap): Likewise. Also move its leading >>>>> comment back >>>>> in place and split statement transformation into ... >>>>> (bswap_replace): This. Add assert when updating bswap_stats. >>>>> >>>>> *** gcc/testsuite/ChangeLog *** >>>>> >>>>> 2014-05-09 Thomas Preud'homme <thomas.preudho...@arm.com> >>>>> >>>>> PR tree-optimization/54733 >>>>> * gcc.dg/optimize-bswapdi-3.c: New test to check extension of >>>>> bswap >>>>> optimization to support memory sources and bitwise operations >>>>> equivalent to load in host endianness. >>>>> * gcc.dg/optimize-bswaphi-1.c: Likewise. >>>>> * gcc.dg/optimize-bswapsi-2.c: Likewise. >>>>> * gcc.c-torture/execute/bswap-2.c: Likewise. >>>>> >>>>> Ok for trunk? >>> >>> Ok, I now decided otherwise and dislike the new parameter to >>> get_inner_reference. Can you please revert that part and just >>> deal with a MEM_REF result in your only caller? >>> >>> And (of course) I found another possible issue. The way you >>> compute load_type and use it here: >>> >>> + /* Perform the load. */ >>> + load_offset_ptr = build_int_cst (n->alias_set, 0); >>> + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, >>> + load_offset_ptr); >>> >>> makes the load always appear aligned according to the mode of >>> load_type. On strict-alignment targets this may cause faults. >>> >>> So what you have to do is either (simpler) >>> >>> unsigned int align = get_pointer_alignment (addr_tmp); >>> tree al_load_type = load_type; >>> if (align < TYPE_ALIGN (load_type)) >>> al_load_type = build_aligned_type (load_type, align); >>> ... >>> val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, >>> load_offset_ptr); >>> >>> or keep track of the "first" actual load and use >>> >>> unsigned int align = get_object_alignment (that_first_load); >>> >>> "first" in the one that corresponds to addr_tmp. From that there >>> is a much better chance to derive good alignment values. >>> >>> Of course on STRICT_ALIGNMENT targets a not aligned load >>> will be decomposed again, so eventually doing the transformation >>> may no longer be profitable(?). >> >> Not always decomposed. On MIPS, it should using the load/store left/right >> instructions for unaligned load/stores which is normally better than >> decomposed load/stores. So having a cost model would be nice. > > Agreed, but I am happy with doing that as a followup. Btw, > a very simple one would be to reject unaligned > SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). > [of course that may be true on MIPS even for the cases where > a "reasonable" fast unalgined variant exists - nearly no target > defines that macro in a too fancy way]
Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). Richard. > Richard. > >> Thanks, >> Andrew >> >>> >>> Thanks and sorry again for the delay. >>> >>> Otherwise the patch looks good to me. >>> >>> Richard. >>> >>>>> Best regards, >>>>> >>>>> Thomas >>>>> >>>>>> -----Original Message----- >>>>>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- >>>>>> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme >>>>>> Sent: Monday, May 05, 2014 7:30 PM >>>>>> To: GCC Patches >>>>>> Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent >>>>>> load/store >>>>>> >>>>>> I found a way to improve the function find_bswap/find_bswap_or_nop >>>>>> and reduce its size. Please hold for the review, I will post an updated >>>>>> version as soon as I finish testing. >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Thomas Preud'homme >>>> >>>>