[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2024-03-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #9 from Andrew Pinski  ---
Since this is based on an out-of-tree patch set, and the patch set has not been
updated since GCC 5, I am going to close this as invalid. It is not obvious
whether this was a scheduler issue or a bug in the GUPC code.

[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2012-08-20 Thread olegendo at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #8 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-20 20:54:25 UTC ---
Author: olegendo
Date: Mon Aug 20 20:54:20 2012
New Revision: 190545

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=190545
Log:
PR target/50489
* config/sh/sh.md (rotcr, *rotcr, shar, shlr): New insns and splits.
(ashrdi3_k, lshrdi3_k): Rewrite as insn_and_split.
* config/sh/sh.c (sh_lshrsi_clobbers_t_reg_p): New function.
* config/sh/sh-protos.h (sh_lshrsi_clobbers_t_reg_p): Declare it.

PR target/50489
* gcc.target/sh/pr54089-1.c: New.


Added:
trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/sh/sh-protos.h
trunk/gcc/config/sh/sh.c
trunk/gcc/config/sh/sh.md
trunk/gcc/testsuite/ChangeLog


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-10-16 Thread gary at intrepid dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #7 from Gary Funck <gary at intrepid dot com> 2011-10-17 03:04:08 UTC ---
Do you have any suggestions for additional tests or debug steps that we can
perform to narrow down the factors that lead to instructions being
mis-scheduled?


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-25 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-09-25 12:13:44 UTC ---
  D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);

looks like bogus IL to me.  You view D.3058, a struct of size 16, as
a pointer (of size 8).  I suppose you want to load D.3058.vaddr here?

  D.3060_12 = (shared [8] struct foo *) D.3059_11;
  D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;

looks like bogus IL to me.  It views the pointer(!?) D.3060_12 as being a
struct upc_shared_ptr_t and extracts a value that is not within that
pointer.

But maybe I'm missing something because I don't recognize that 'shared [8]'
qualification.

Do you want to dereference D.3060_12 (D.3058.vaddr) here?

That said, I wonder why you don't trip over tree-cfg.c verification
of VIEW_CONVERT_EXPR as TYPE_SIZE (TREE_TYPE (D.3060_12)) != TYPE_SIZE (struct
upc_shared_ptr_t).

Please try to avoid using VIEW_CONVERT_EXPRs completely unless you know
exactly what you are doing.


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-25 Thread gary at intrepid dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #6 from Gary Funck <gary at intrepid dot com> 2011-09-25 19:58:58 UTC ---
(In reply to comment #5)
>   D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);
> 
> looks like bogus IL to me.  You view D.3058, a struct of size 16, as
> a pointer (of size 8).  I suppose you want to load D.3058.vaddr here?
> 
>   D.3060_12 = (shared [8] struct foo *) D.3059_11;
>   D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;
> 
> looks bogus IL to me.  It views the pointer(!?) D.3060_12 as being a
> struct upc_shared_ptr_t and extracts a value that is not within that
> pointer.
> 
> But maybe I'm missing something because I don't recognize that 'shared [8]'
> qualification.  [...]

The syntax (shared [8] struct foo *) above is unique to UPC.  This is a pointer
to a 'shared' qualified object with a blocking factor (layout qualifier) of
8.  This type of pointer is called a pointer-to-shared (PTS) in the UPC
language definition; it is a pointer that can span nodes.  On a 64-bit machine,
using the struct PTS (as opposed to packed PTS) representation, it is a 16-byte
quantity.  Thus the casts back and forth between (shared *) and struct
upc_shared_ptr_t do not violate the size assumptions of VIEW_CONVERT_EXPR().
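
To make the size argument concrete, here is a minimal sketch in plain C of a
16-byte fat pointer being reinterpreted as its representation type.  The field
names vaddr and phase come from the IL quoted above; the thread field, the
field widths, and the names pts_rep, opaque_pts and pts_phase are assumptions
made for illustration and are not taken from the GUPC sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the "struct PTS" representation.  Only the
   names vaddr and phase appear in the bug report; the thread field and
   all field widths are assumptions.  */
struct pts_rep
{
  uint64_t vaddr;   /* address within the owning thread's shared space */
  uint32_t thread;  /* thread that has affinity to the object */
  uint32_t phase;   /* position within the current block */
};

/* Stand-in for an opaque 16-byte pointer-to-shared value.  */
typedef struct { unsigned char bytes[16]; } opaque_pts;

/* A view conversion is only meaningful when both types have the same
   size, so check that at compile time.  */
static_assert (sizeof (struct pts_rep) == sizeof (opaque_pts),
               "representation type must match the PTS size");

/* Reinterpret the opaque value as its representation type and extract
   one component, roughly what the lowered IL does with .phase.  */
static uint32_t
pts_phase (opaque_pts p)
{
  struct pts_rep rep;
  memcpy (&rep, &p, sizeof rep);
  return rep.phase;
}
```

If the source operand were an ordinary 8-byte pointer instead, the static
assertion would fail, which is the size mismatch that comment #5 is asking
about.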

The blocking factor (the [8] in shared [8] * above) is unique to UPC.  In
UPC, arrays are block distributed.  This means that block 0 is on thread 0,
block 1 is on thread 1 and so on.  Thus, for a UPC program that is run with 2
threads, foo[0], foo[1] ... foo[7] are allocated on (have affinity to) thread
0 and foo[8], foo[9] ... foo[13] are allocated on thread 1.  This blocking
factor provides for the ability to cast a pointer to a block of shared storage
into a regular C pointer (a local pointer) as long as the thread performing
the cast has affinity to the block.
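
As a small illustration of the block distribution just described, the owning
thread of element i can be computed as follows.  This is a sketch in plain C;
the function name block_owner is hypothetical and is not part of UPC or
libupc.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that has affinity to element i of a shared array declared
   with blocking factor block_size, when blocks are dealt out to
   nthreads threads in round-robin order.  */
static size_t
block_owner (size_t i, size_t block_size, size_t nthreads)
{
  assert (block_size > 0 && nthreads > 0);
  return (i / block_size) % nthreads;
}
```

With a blocking factor of 8 and 2 threads, block_owner returns 0 for indices
0 through 7 and 1 for indices 8 through 13, matching the example above.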

What is potentially troublesome for the middle end tree optimizations and
back end RTL optimizations is that these pointers-to-shared (PTS's) are fat
pointers.  Note that after the lowering pass (performed in
upc/upc-genericize.c) there will be no *indirections* through a PTS. 
Instead, indirections of a PTS in a value context will be converted into get
calls, which are implemented by the UPC runtime (libupc/smp).  Indirections
that are the targets of assignments are translated into put calls,
implemented by the UPC runtime. 
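
The shape of that lowering can be sketched with a toy model in plain C.
Everything here (toy_pts, toy_get_int, toy_put_int, the shared_seg array) is a
hypothetical stand-in and does not reflect the actual libupc entry points or
their signatures.

```c
#include <assert.h>
#include <stddef.h>

enum { N_THREADS = 2, LOCAL_ELEMS = 16 };

/* Toy stand-in for each thread's portion of shared memory.  */
static int shared_seg[N_THREADS][LOCAL_ELEMS];

/* Toy fat pointer: which thread, and which slot in its segment.  */
struct toy_pts
{
  size_t offset;
  unsigned thread;
  unsigned phase;
};

/* Hypothetical "get" call: replaces a read through a PTS.  */
static int
toy_get_int (struct toy_pts p)
{
  assert (p.thread < N_THREADS && p.offset < LOCAL_ELEMS);
  return shared_seg[p.thread][p.offset];
}

/* Hypothetical "put" call: replaces a write through a PTS.  */
static void
toy_put_int (struct toy_pts p, int value)
{
  assert (p.thread < N_THREADS && p.offset < LOCAL_ELEMS);
  shared_seg[p.thread][p.offset] = value;
}

/* A source statement such as "*dst = *src + 1;", where both operands
   are pointers-to-shared, contains no indirection after lowering.  */
static void
lowered_increment (struct toy_pts dst, struct toy_pts src)
{
  toy_put_int (dst, toy_get_int (src) + 1);
}
```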

The lowering pass also translates UPC pointer-to-shared arithmetic operations
into their equivalent operations which do not involve PTS's, but rather cast
the PTS's to their representation type (struct upc_shared_ptr_t) and then
operate on the component parts of the PTS.  As you can see from the description
of blocking factors above, the mapping of foo[i] to its (global) address
requires a fairly complex arrangement of division and modulo operations.
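
For illustration, the division and modulo arithmetic involved in locating
foo[i] looks roughly like this in plain C.  The struct elem_location and the
function locate_element are hypothetical names; the sketch assumes a plain
round-robin block distribution and ignores element size and base address.

```c
#include <assert.h>
#include <stddef.h>

/* Where element i of a blocked shared array lives.  */
struct elem_location
{
  size_t thread;       /* thread with affinity to the element */
  size_t phase;        /* position of the element within its block */
  size_t local_index;  /* element index within that thread's storage */
};

static struct elem_location
locate_element (size_t i, size_t block_size, size_t nthreads)
{
  assert (block_size > 0 && nthreads > 0);

  size_t block = i / block_size;  /* global block number */
  struct elem_location loc;

  loc.phase = i % block_size;
  loc.thread = block % nthreads;
  /* Blocks owned by one thread are stored back to back locally.  */
  loc.local_index = (block / nthreads) * block_size + loc.phase;
  return loc;
}
```

With a blocking factor of 8 and 2 threads, index 13 maps to thread 1, phase 5,
local index 5, and index 16 maps to thread 0, phase 0, local index 8.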

The libupc runtime is unique in that parts of it may be inlined.  Inlining of
the runtime is enabled at optimization levels greater than 0, or it can be
explicitly inlined/not-inlined via the -fupc-inline-lib switch.  The inlining
is accomplished via a pre-include of a runtime header file, implemented by the
upc driver.  Inlining is enabled in the test case documented in this bug
report.  Thus, a simple assignment statement involving array indexing of a UPC
shared blocked array expands into a rather complex assortment of tree code,
and generated RTL.  (This complexity makes it difficult to create an equivalent
C test case.)

After lowering, any references to shared * (pointers-to-shared) should only
occur in casts to/from the representation type and in moves/copies of the PTS
container.  We have run into a few places where the middle end makes some
assumptions about regular pointers and tries to apply those assumptions to a
UPC pointer-to-shared; we have been able to exclude PTS's by adding additional
checks for them -- there are not many places that we have had to do this. 
Perhaps that sort of pointer-specific logic is kicking in here.

Arguably, the UPC lowering pass should fully lower PTS typed expressions, so
that they don't end up in the tree.  Potentially, a PTS hanging around in the
tree doesn't meet the strict (or even not-so-strict) definition of GENERIC. 
Fully lowering those expressions is on our to do list.  When we do that,
rather than using casts, we will likely rewrite the PTS type references into
references to the PTS representation type.  We have shied away from this
because it makes the resulting tree code even more difficult to follow, because
it loses logical correspondence to the original C source statements.

That said, this technique of casting a PTS to its representation type and then
extracting its sub-parts has been working for quite a while on several
different target architectures.  However, maybe this recast of a
pointer-to-shared is confusing the post-reload instruction scheduler and/or the
logic that creates the MEM_REF.

We would like to see if we can find a way to make the 

[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-23 Thread amonakov at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov <amonakov at gcc dot gnu.org> 2011-09-23 09:30:01 UTC ---
Does the problem vanish if you add -fno-strict-aliasing?

One more thing, you mention -O2 in the flags, but then refer to selective
scheduler, which is only enabled at -O3.  Perhaps you meant Haifa scheduler.


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-23 Thread gary at intrepid dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #4 from Gary Funck <gary at intrepid dot com> 2011-09-23 17:38:18 UTC ---
(In reply to comment #3)
> Does the problem vanish if you add -fno-strict-aliasing?
> 
> One more thing, you mention -O2 in the flags, but then refer to selective
> scheduler, which is only enabled at -O3.  Perhaps you meant Haifa scheduler.

The tests still fail with -O3 -fno-strict-aliasing.  They pass with -O3
-fno-schedule-insns2.  We mentioned -O2 in the bug report because it helped
rule out other optimizations that -O3 might imply.  Then we selectively added
-ftree-vectorize and -fschedule-insns2 to show that the combination
of those two optimizations triggers the mis-scheduling.

If there are additional tests that you suggest that we can run to help narrow
this down, let us know, and we'll try to provide that additional information. 
Also, we can provide a script to run gdb on cc1upc, if that helps.  Thanks.


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-22 Thread gary at intrepid dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #1 from Gary Funck <gary at intrepid dot com> 2011-09-22 19:21:54 UTC ---
Created attachment 25343
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25343
UPC test case that demonstrates instruction mis-schedule


[Bug rtl-optimization/50489] [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2

2011-09-22 Thread gary at intrepid dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

--- Comment #2 from Gary Funck <gary at intrepid dot com> 2011-09-22 19:31:04 UTC ---
Created attachment 25344
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25344
zipped tar file with build script, readme, test case and test artifacts