[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #9 from Andrew Pinski ---
Since this is based on an out-of-tree patch set, and the patch set has not been updated since GCC 5, I am going to close this as invalid. It is not obvious whether this was a scheduler issue or a bug in the GUPC code.
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #8 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-20 20:54:25 UTC ---
Author: olegendo
Date: Mon Aug 20 20:54:20 2012
New Revision: 190545

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190545
Log:
PR target/50489
* config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits.
(ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
* config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
* config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it.

PR target/50489
* gcc.target/sh/pr54089-1.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #7 from Gary Funck <gary at intrepid dot com> 2011-10-17 03:04:08 UTC ---
Do you have any suggestions for additional tests or debug steps that we can perform to narrow down the factors that lead to instructions being mis-scheduled?
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-09-25 12:13:44 UTC ---
  D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);

looks like bogus IL to me. You view D.3058, a struct of size 16, as a pointer (of size 8). I suppose you want to load D.3058.vaddr here?

  D.3060_12 = (shared [8] struct foo *) D.3059_11;
  D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;

looks bogus IL to me. It views the pointer(!?) D.3060_12 as being a struct upc_shared_ptr_t and extracts a value that is not within that pointer. But maybe I'm missing something because I don't recognize that 'shared [8]' qualification. Do you want to dereference D.3060_12 (D.3058.vaddr) here?

That said, I wonder why you don't trip over tree-cfg.c verification of VIEW_CONVERT_EXPR as TYPE_SIZE (TREE_TYPE (D.3060_12)) != TYPE_SIZE (struct upc_shared_ptr_t).

Please try to avoid using VIEW_CONVERT_EXPRs completely unless you know exactly what you are doing.
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #6 from Gary Funck <gary at intrepid dot com> 2011-09-25 19:58:58 UTC ---
(In reply to comment #5)
>   D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);
>
> looks like bogus IL to me. You view D.3058, a struct of size 16, as a pointer (of size 8). I suppose you want to load D.3058.vaddr here?
>
>   D.3060_12 = (shared [8] struct foo *) D.3059_11;
>   D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;
>
> looks bogus IL to me. It views the pointer(!?) D.3060_12 as being a struct upc_shared_ptr_t and extracts a value that is not within that pointer. But maybe I'm missing something because I don't recognize that 'shared [8]' qualification. [...]

The syntax (shared [8] struct foo *) above is unique to UPC. This is a pointer to a 'shared' qualified object with a blocking factor (layout qualifier) of 8. This type of pointer is called a pointer-to-shared (PTS) in the UPC language definition; it is a pointer that can span nodes. On a 64-bit machine, using the struct PTS (as opposed to packed PTS) representation, it is a 16-byte quantity. Thus the casts back and forth between (shared *) and struct upc_shared_ptr_t do not violate the size assumptions of VIEW_CONVERT_EXPR().

The blocking factor (the [8] in shared [8] * above) is unique to UPC. In UPC, arrays are block distributed. This means that block 0 is on thread 0, block 1 is on thread 1, and so on. Thus, for a UPC program that is run with 2 threads, foo[0], foo[1] ... foo[7] are allocated on (have affinity to) thread 0 and foo[8], foo[9] ... foo[13] are allocated on thread 1. This blocking factor makes it possible to cast a pointer to a block of shared storage into a regular C pointer (a local pointer) as long as the thread performing the cast has affinity to the block.

What is potentially troublesome for the middle end tree optimizations and back end RTL optimizations is that these pointers-to-shared (PTS's) are fat pointers.
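As an illustration of the block distribution described above, here is a minimal C sketch. It is not taken from libupc: the struct elem_home type and the elem_home_of() helper are hypothetical names, and the sketch assumes the usual round-robin dealing of blocks to threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of libupc): describes where element i
   of a `shared [block_size]` array lives when run with `threads` threads. */
struct elem_home {
    size_t thread; /* thread that has affinity to the element */
    size_t phase;  /* position of the element inside its block */
    size_t row;    /* number of earlier blocks held by that same thread */
};

/* Assumption: blocks are dealt to threads round-robin, so block b lands
   on thread b % threads. */
struct elem_home elem_home_of(size_t i, size_t block_size, size_t threads)
{
    struct elem_home h;
    size_t block;

    assert(block_size > 0 && threads > 0);
    block = i / block_size;       /* global block number of element i */
    h.thread = block % threads;   /* round-robin block placement */
    h.phase = i % block_size;     /* offset within the block */
    h.row = block / threads;      /* which of the thread's local blocks */
    return h;
}
```

With a blocking factor of 8 and 2 threads, this places foo[0] through foo[7] on thread 0 and foo[8] through foo[13] on thread 1, matching the example in the text.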
Note that after the lowering pass (performed in upc/upc-genericize.c) there will be no *indirections* through a PTS. Instead, indirections of a PTS in a value context will be converted into get calls, which are implemented by the UPC runtime (libupc/smp). Indirections that are the targets of assignments are translated into put calls, implemented by the UPC runtime.

The lowering pass also translates UPC pointer-to-shared arithmetic operations into their equivalent operations which do not involve PTS's, but rather cast the PTS's to their representation type (struct upc_shared_ptr_t) and then operate on the component parts of the PTS. As you can see from the description of blocking factors above, the mapping of foo[i] to its (global) address requires a fairly complex arrangement of division and modulo operations.

The libupc runtime is unique in that parts of it may be inlined. Inlining of the runtime is enabled at optimization levels greater than 0, or it can be explicitly inlined/not-inlined via the -fupc-inline-lib switch. The inlining is accomplished via a pre-include of a runtime header file, implemented by the upc driver. Inlining is enabled in the test case documented in this bug report. Thus, a simple assignment statement involving array indexing of a UPC shared blocked array expands into a rather complex assortment of tree code and generated RTL. (This complexity makes it difficult to create an equivalent C test case.)

After lowering, any references to shared * (pointers-to-shared) should only occur in casts to/from the representation type and in moves/copies of the PTS container. We have run into a few places where the middle end makes some assumptions about regular pointers and tries to apply those assumptions to a UPC pointer-to-shared; we have been able to exclude PTS's by adding additional checks for them -- there are not many places that we have had to do this. Perhaps that sort of pointer-specific logic is kicking in here.
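To give a rough idea of the division and modulo work mentioned above, the following C sketch advances a fat pointer by operating on its component parts. It is only an illustration under stated assumptions: struct pts_rep is a stand-in whose field order and widths are guessed (only the vaddr and phase field names and the 16-byte size come from this report), pts_add() is a hypothetical helper rather than the libupc routine, and only non-negative increments are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for the struct PTS representation. The real
   struct upc_shared_ptr_t layout may differ; the `thread` field and the
   field widths are assumptions. */
struct pts_rep {
    uint64_t vaddr;  /* address within the owning thread's shared space */
    uint32_t thread; /* thread with affinity to the referenced element */
    uint32_t phase;  /* position of the element inside its block */
};

/* 8 + 4 + 4 bytes: a 16-byte fat pointer on a 64-bit target. */
_Static_assert(sizeof(struct pts_rep) == 16, "expected a 16-byte fat pointer");

/* Hypothetical helper: advance p by n elements (n >= 0) of elem_size bytes,
   for a blocking factor of block_size and a run with `threads` threads. */
struct pts_rep pts_add(struct pts_rep p, uint64_t n, uint32_t block_size,
                       uint32_t threads, uint64_t elem_size)
{
    struct pts_rep r;
    uint64_t ph_total, nblocks, th_total, wraps;
    int64_t delta;

    assert(block_size > 0 && threads > 0);
    ph_total = (uint64_t)p.phase + n;
    nblocks = ph_total / block_size;          /* block boundaries crossed */
    th_total = (uint64_t)p.thread + nblocks;
    wraps = th_total / threads;               /* times we wrapped past the last thread */

    r.phase = (uint32_t)(ph_total % block_size);
    r.thread = (uint32_t)(th_total % threads);

    /* Local movement in elements: one full block per wrap, plus the
       change in phase (which may be negative). */
    delta = (int64_t)(wraps * block_size) + (int64_t)r.phase - (int64_t)p.phase;
    r.vaddr = p.vaddr + (uint64_t)(delta * (int64_t)elem_size);
    return r;
}
```

For example, starting from the first element with a blocking factor of 8, 2 threads, and 16-byte elements, adding 13 yields thread 1, phase 5, and a local offset of 80 bytes.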
Arguably, the UPC lowering pass should fully lower PTS typed expressions, so that they don't end up in the tree. Potentially, a PTS hanging around in the tree doesn't meet the strict (or even not-so-strict) definition of GENERIC. Fully lowering those expressions is on our to-do list. When we do that, rather than using casts, we will likely rewrite the PTS type references into references to the PTS representation type. We have shied away from this because it makes the resulting tree code even more difficult to follow: it loses logical correspondence to the original C source statements.

That said, this technique of casting a PTS to its representation type and then extracting its sub-parts has been working for quite a while on several different target architectures. However, maybe this recast of a pointer-to-shared is confusing the post-reload instruction scheduler and/or the logic that creates the MEM_REF. We would like to see if we can find a way to make the
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> 2011-09-23 09:30:01 UTC ---
Does the problem vanish if you add -fno-strict-aliasing?

One more thing, you mention -O2 in the flags, but then refer to selective scheduler, which is only enabled at -O3. Perhaps you meant Haifa scheduler.
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #4 from Gary Funck <gary at intrepid dot com> 2011-09-23 17:38:18 UTC ---
(In reply to comment #3)
> Does the problem vanish if you add -fno-strict-aliasing?
>
> One more thing, you mention -O2 in the flags, but then refer to selective scheduler, which is only enabled at -O3. Perhaps you meant Haifa scheduler.

The tests still fail with -O3 -fno-strict-aliasing. They pass with -O3 -fno-schedule-insns2.

We mentioned -O2 in the bug report because it helped rule out other optimizations that -O3 might imply. We then selectively added -ftree-vectorize and -fschedule-insns2 to show that the combination of those two additional optimizations triggers the mis-scheduling.

If there are additional tests that you suggest we run to help narrow this down, let us know, and we'll try to provide that additional information. Also, we can provide a script to run gdb on cc1upc, if that helps.

Thanks.
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #1 from Gary Funck <gary at intrepid dot com> 2011-09-22 19:21:54 UTC ---
Created attachment 25343
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25343
UPC test case that demonstrates instruction mis-schedule
[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #2 from Gary Funck <gary at intrepid dot com> 2011-09-22 19:31:04 UTC ---
Created attachment 25344
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25344
zipped tar file with build script, readme, test case and test artifacts