Hi!  I'd like to ping this series.  This is a big change, so I'd like to get it committed fairly early in stage 1.  I know you have a lot stacked up, though.

Thanks!
Bill

On 4/27/21 10:32 AM, Bill Schmidt wrote:
The design of the target-specific built-in function support in the
Power back end has not stood the test of time.  The machinery is
grossly inefficient, confusing, and arcane; and adding new built-in
functions is inefficient and error-prone.  This patch set introduces a
replacement.

Because of the scope of the changes, it's important to be able to
verify that the new system makes only intended changes to the
functions that are supported.  Therefore this patch set adds a new
mechanism, and (in the final patch) enables it instead of the existing
support, but does not yet remove the old support.  That will happen in
a follow-up patch once we're comfortable with the new system.

Most of the patches in this set are specific to the rs6000 back end.
However, the first two patches make changes in common code and require
review from the appropriate maintainers.  Jakub and Jeff, I would
appreciate it if you could look at these two small patches.

After these changes are upstream, adding new built-in functions will
usually be as simple as adding two lines to a file,
rs6000-builtin-new.def, that give the prototype of the function and a
little additional information.  Adding new overloaded functions will
require adding a new section to another file, rs6000-overload.def,
with one line describing the overload information, and two lines for
each function to be dispatched to from the overloaded function.

The patches are divided into the following sections.

Patches 0001-0002: Common code patches

   Patch 0001 adds a mechanism to the Makefile to allow specifying
   additional dependencies for "out_object_file", which is rs6000.o for
   the rs6000 back end.  I found this necessary to be able to have
   rs6000.o depend on a header file generated during the build.

   Patch 0002 expands the gengtype machinery to scan header files
   created during the build for GC roots.

Patches 0003, 0005-0023: Generator program

   A new program, rs6000-gen-builtins, is created and executed during
   the build.  It reads rs6000-builtin-new.def and rs6000-overload.def
   and produces three output files:  rs6000-builtins.h,
   rs6000-builtins.c, and rs6000-vecdefines.h.  rs6000-builtins.h
   defines the data structures representing the built-in functions,
   overloaded functions, overload instantiations, and function type
   specifiers.  rs6000-builtins.c contains static initializers for the
   data structures, as well as the function rs6000_autoinit_builtins
   that performs additional run-time initialization.
   rs6000-vecdefines.h contains a set of #defines that map external
   identifiers such as vec_add to their internal builtin names, such as
   __builtin_vec_add.  This replaces most of the similar #defines
   previously contained in altivec.h, which now #includes the new file
   instead.

   This set of patches adds the source for the generator program.

Patches 0024-0025: Target build machinery

   These patches make changes to config.gcc and t-rs6000 to build and
   run the new generator program, and to ensure that the garbage
   collection roots in rs6000-builtins.h are scanned by gengtype.

Patches 0004, 0026-0031, 0033-0037: Input files

   These patches build up the input files to the generator program,
   listing all of the built-in functions and overloads to be
   processed.

Patch 0032: Add pointer types

   This patch creates and caches a bunch of pointer type nodes.  The
   existing built-in machinery, for some reason, only created base
   types up front and created the pointer types on demand (over and
   over and over again).  The new mechanism needs all the type nodes
   available, so we add them here.

Patch 0038: Call rs6000_autoinit_builtins

Patch 0039: A little special handling for Darwin

Patches 0040-0041: Miscellaneous support patches

Patch 0042: Rewrite the overload processing

   Most of this code remains largely the same as before, with the same
   special handling for a few interesting built-in functions.  But the
   general handling of overloaded functions is now much more efficient
   since the new data structures are designed for quick lookup, whereas
   the old machinery does a brutal linear search.

Patch 0043: Rewrite gimple folding

   The "rewrite" here consists entirely of changing the names of the
   builtins to be processed, since we need a separate enumeration of
   builtins for the new machinery.

Patch 0044: Vectorization support

   Small updates to the functions used for mapping built-ins to their
   vectorized counterparts.

Patches 0045-0050: Rewrite built-in function expansion

   This is where most of the meat comes in.  Lookup of built-ins at
   expand time is again much more efficient, replacing the old
   mechanism of multiple linear searches over the whole built-in
   table.  Another major change is that all built-in functions are
   always defined, but a test at expand time is used to determine
   whether they are enabled.  This allows proper handling of
   built-ins in the presence of "#pragma target" directives.  Also,
   handling of special cases is made more uniform using an attribute
   system, which I hope makes this much easier to maintain.

Patches 0051-0052: Miscellaneous changes

Patch 0053: Debug support

   Small changes here to allow gathering of a little more data from
   -mdebug=builtin.  I used this to look for differences between
   functions defined by the old and new built-in support.

Patch 0054: Change altivec.h to use rs6000-vecdefines.h

Patch 0055: Test case adjustments

   Most of these changes are due to automating checks for literal
   arguments that must be within a certain range.  This gives us more
   regular error messages, which don't always match the previous error
   messages.  There are also some adjustments because altivec.h now
   includes rs6000-vecdefines.h.

Patch 0056: Flip the switch to enable the new support

   Victory is ours...

Patch 0057: Fix one last late-breaking change

   Keeping the code up-to-date with upstream has been fun.  When I
   rebased to create the patch set, I found one new issue where a
   small change had been made to the overload handling for the
   vec_insert builtins.  This patch reflects that change into the
   new handling.  My version of git is having trouble with
   interactive rebasing, so it was easier to just add the extra patch.

Now, with all that done, there are a few things that are not yet
done:

(1) A future patch will remove the old code.

(2) There are times where we ought to dispatch an overload to one
function if VSX is available, and to another function if it is not.
We need a general mechanism for allowing conditional dispatch.  I've
outlined a method for this in rs6000-overload.def that I want to
implement down the road.

(3) I want to investigate why vec_mul requires special handling in
rs6000-c.c; it doesn't seem like it should.

(4) Similarly, can we remove some of the special handling for
vec_adde, vec_addec, vec_sube, and vec_subec?

(5) The parser in the generator program doesn't yet handle
"escape-newline" sequences for breaking long lines.  I should add that
capability.

(6) Longer term, can we use a similar mechanism for built-in functions
used for all targets in common code?


A word about compatibility:

I deliberately implemented all the old built-ins exactly as previously
defined, wherever possible, despite an overwhelming desire to pitch
out a bunch of them that have already been considered deprecated for
ages.  I found that it was too difficult to both implement a new
system and remove deprecated things at the same time, and in general
it seems like a dangerous thing to do.  Better to do this in stages if
we're going to do it at all.  Unfortunately a lot of deprecated things
still appear all over our own test suite, and I'm afraid we can assume
they appear in user code all over the place as well.

What I've done instead is to make very clear which interfaces are
considered deprecated in the input files themselves.  Over time,
perhaps we can start to remove some of these, but in reality I think
we're going to just continue to be stuck with them.

Here is a complete list of known incompatibilities with the old
mechanism:

(1) __builtin_vec_vpopcntu[bdhw] were all registered as overloads but
didn't have any instantiations.  Therefore they could not have been
used anywhere, and I haven't implemented them.

(2) I added ten new built-ins named __builtin_vsx_xxpermx_<type> to be
used for the overloaded vec_xxpermx function, instead of the bloody
hack that was used before.  The functionality of vec_xxpermx is
unchanged.

(3) A number of built-ins used "long" for DImode types, which would
break these for 32-bit.  I changed those arguments and return values
to "long long" to avoid such problems, when those built-ins were not
restricted to 64-bit mode already.  There aren't many such cases.

(4) A small handful of builtins didn't have the correct return type to
match the mode of the pattern, so I fixed those.  They are all new in
GCC 11 and can't have worked properly.

(5) I handled the MMA internal functions slightly differently, so that
all the ones with extra vector_quad arguments are listed as such,
rather than having that hacked on during expand time.

(6) __builtin_vsx_xl_len_r took only a char * rather than a void *;
fixing this was backward compatible.

(7) __builtin_vsx_splat_2d[fi] were incompletely defined and couldn't
have ever worked; fixed.

(8) A small handful of builtins weren't marked as "const," but are
obviously const, so I fixed those.

I've kept a complete list of discrepancies for my records, in case any
issues arise from my misunderstanding something.

I do want to thank all the people who have contributed to the built-in
design over the years.  For all my griping, there are some marvelous
bits in there that I hope I have kept intact.  My hope is to make the
whole system much easier to use and maintain going forward.  Time will
tell.

The patches have been bootstrapped and tested on a Power10
little-endian system, and on a Power8 big-endian system with both 32-
and 64-bit enabled, with no regressions.  I'm not crazy enough to
believe I don't have any errors in here, but I have endeavoured to
test and minimize them to the best of my ability.

Is this series okay for trunk, in GCC 12 stage 1?

Thanks!
Bill

Bill Schmidt (57):
   Allow targets to specify build dependencies for out_object_file
   Support scanning of build-time GC roots in gengtype
   rs6000: Initial create of rs6000-gen-builtins.c
   rs6000: Add initial input files
   rs6000: Add file support and functions for diagnostic support
   rs6000: Add helper functions for parsing
   rs6000: Add functions for matching types, part 1 of 3
   rs6000: Add functions for matching types, part 2 of 3
   rs6000: Add functions for matching types, part 3 of 3
   rs6000: Red-black tree implementation for balanced tree search
   rs6000: Main function with stubs for parsing and output
   rs6000: Parsing built-in input file, part 1 of 3
   rs6000: Parsing built-in input file, part 2 of 3
   rs6000: Parsing built-in input file, part 3 of 3
   rs6000: Parsing of overload input file
   rs6000: Build and store function type identifiers
   rs6000: Write output to the builtin definition include file
   rs6000: Write output to the builtins header file
   rs6000: Write output to the builtins init file, part 1 of 3
   rs6000: Write output to the builtins init file, part 2 of 3
   rs6000: Write output to the builtins init file, part 3 of 3
   rs6000: Write static initializations for built-in table
   rs6000: Write static initializations for overload tables
   rs6000: Incorporate new builtins code into the build machinery
   rs6000: Add gengtype handling to the build machinery
   rs6000: Add the rest of the [altivec] stanza to the builtins file
   rs6000: Add VSX builtins
   rs6000: Add available-everywhere and ancient builtins
   rs6000: Add power7 and power7-64 builtins
   rs6000: Add power8-vector builtins
   rs6000: Add Power9 builtins
   rs6000: Add more type nodes to support builtin processing
   rs6000: Add Power10 builtins
   rs6000: Add MMA builtins
   rs6000: Add miscellaneous builtins
   rs6000: Add Cell builtins
   rs6000: Add remaining overloads
   rs6000: Execute the automatic built-in initialization code
   rs6000: Darwin builtin support
   rs6000: Add sanity to V2DI_type_node definitions
   rs6000: Always initialize vector_pair and vector_quad nodes
   rs6000: Handle overloads during program parsing
   rs6000: Handle gimple folding of target built-ins
   rs6000: Support for vectorizing built-in functions
   rs6000: Builtin expansion, part 1
   rs6000: Builtin expansion, part 2
   rs6000: Builtin expansion, part 3
   rs6000: Builtin expansion, part 4
   rs6000: Builtin expansion, part 5
   rs6000: Builtin expansion, part 6
   rs6000: Update rs6000_builtin_decl
   rs6000: Miscellaneous uses of rs6000_builtin_decls_x
   rs6000: Debug support
   rs6000: Update altivec.h for automated interfaces
   rs6000: Test case adjustments
   rs6000: Enable the new builtin support
   rs6000: Adjust to late-breaking change

  gcc/Makefile.in                               |    8 +-
  gcc/config.gcc                                |    2 +
  gcc/config/rs6000/altivec.h                   |  516 +-
  gcc/config/rs6000/darwin.h                    |    8 +-
  gcc/config/rs6000/rbtree.c                    |  233 +
  gcc/config/rs6000/rbtree.h                    |   51 +
  gcc/config/rs6000/rs6000-builtin-new.def      | 3875 +++++++++++
  gcc/config/rs6000/rs6000-c.c                  | 1083 +++
  gcc/config/rs6000/rs6000-call.c               | 3371 ++++++++-
  gcc/config/rs6000/rs6000-gen-builtins.c       | 2997 ++++++++
  gcc/config/rs6000/rs6000-overload.def         | 6076 +++++++++++++++++
  gcc/config/rs6000/rs6000.c                    |  219 +-
  gcc/config/rs6000/rs6000.h                    |   82 +
  gcc/config/rs6000/t-rs6000                    |   44 +-
  gcc/gengtype-state.c                          |   29 +-
  gcc/gengtype.c                                |   19 +-
  gcc/gengtype.h                                |    5 +
  .../powerpc/bfp/scalar-extract-exp-2.c        |    2 +-
  .../powerpc/bfp/scalar-extract-sig-2.c        |    2 +-
  .../powerpc/bfp/scalar-insert-exp-2.c         |    2 +-
  .../powerpc/bfp/scalar-insert-exp-5.c         |    2 +-
  .../powerpc/bfp/scalar-insert-exp-8.c         |    2 +-
  .../powerpc/bfp/scalar-test-neg-2.c           |    2 +-
  .../powerpc/bfp/scalar-test-neg-3.c           |    2 +-
  .../powerpc/bfp/scalar-test-neg-5.c           |    2 +-
  .../gcc.target/powerpc/byte-in-set-2.c        |    2 +-
  gcc/testsuite/gcc.target/powerpc/cmpb-2.c     |    2 +-
  gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |    2 +-
  .../gcc.target/powerpc/crypto-builtin-2.c     |   14 +-
  .../powerpc/fold-vec-splat-floatdouble.c      |    4 +-
  .../powerpc/fold-vec-splat-longlong.c         |   10 +-
  .../powerpc/fold-vec-splat-misc-invalid.c     |    8 +-
  .../gcc.target/powerpc/p8vector-builtin-8.c   |    1 +
  gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |    2 +-
  gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |    2 +-
  gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |    2 +-
  gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |    2 +-
  gcc/testsuite/gcc.target/powerpc/pr88100.c    |   12 +-
  .../gcc.target/powerpc/pragma_misc9.c         |    2 +-
  .../gcc.target/powerpc/pragma_power8.c        |    2 +
  .../gcc.target/powerpc/pragma_power9.c        |    3 +
  .../powerpc/test_fpscr_drn_builtin_error.c    |    4 +-
  .../powerpc/test_fpscr_rn_builtin_error.c     |   12 +-
  gcc/testsuite/gcc.target/powerpc/test_mffsl.c |    3 +-
  gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |    2 +-
  .../gcc.target/powerpc/vsu/vec-all-nez-7.c    |    2 +-
  .../gcc.target/powerpc/vsu/vec-any-eqz-7.c    |    2 +-
  .../gcc.target/powerpc/vsu/vec-cmpnez-7.c     |    2 +-
  .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |    2 +-
  .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |    2 +-
  .../gcc.target/powerpc/vsu/vec-xl-len-13.c    |    2 +-
  .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |    2 +-
  52 files changed, 17915 insertions(+), 824 deletions(-)
  create mode 100644 gcc/config/rs6000/rbtree.c
  create mode 100644 gcc/config/rs6000/rbtree.h
  create mode 100644 gcc/config/rs6000/rs6000-builtin-new.def
  create mode 100644 gcc/config/rs6000/rs6000-gen-builtins.c
  create mode 100644 gcc/config/rs6000/rs6000-overload.def

Reply via email to