This is a patch kit that adds the nvptx port to gcc. It contains
preliminary patches to add needed functionality, the target files, and
one somewhat optional patch with additional target tools. Two more patch
series will follow: one for the testsuite, and one to make the offload
functionality work with this port. Also required are the previous four
rtl patches, two of which haven't been entirely approved yet.

For the moment, I've stripped out all the address space support, which got
bogged down in review because of brokenness in our representation of address
spaces. The ptx address spaces are of course still defined and used
inside the backend.

Ptx really isn't a usual target - it is a virtual target which is then
translated by another compiler (ptxas) to the final code that runs on
the GPU. There are many restrictions, some imposed by the GPU hardware,
and some by the fact that not everything you'd want can be represented
in ptx. Here are some of the highlights:

* Everything is typed - variables, functions, registers. This can
  cause problems with K&R style C or anything else that doesn't
  have a proper type internally.
* Declarations are needed, even for undefined variables.
* Can't emit initializers referring to their variable's address since
  you can't write forward declarations for variables.
* Variables can be declared only as scalars or arrays, not
  structures. Initializers must be in the variable's declared type,
  which requires some code in the backend, and it means that packed
  pointer values are not representable.
* Since it's a virtual target, we skip register allocation - no good
  can probably come from doing that twice. This means asm statements
  aren't fixed up and will fail if they use matching constraints.
* No support for indirect jumps, label values, nonlocal gotos.
* No alloca - ptx defines it, but it's not implemented.
* No trampolines.
* No debugging (at all, for now - we may add line number directives).
* Limited C library support - I have a hacked up copy of newlib
  that provides a reasonable subset.
* malloc and free are defined by ptx (these appear to be
  undocumented), but there isn't a realloc. I have one patch for
  Fortran to use a malloc/memcpy helper function in cases where we
  know the old size.
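
To illustrate the malloc/memcpy idea from the last item, a helper along the
following lines would be enough when the caller knows the old size. This is
only a sketch: the name realloc_known_size and its signature are made up for
this example and are not taken from the Fortran patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions, not from the
   patch): emulate realloc using only malloc, memcpy and free, for callers
   that know the size of the old block.  */
void *
realloc_known_size (void *old, size_t old_size, size_t new_size)
{
  void *new_block;
  size_t n;

  /* Treat a zero new size as a plain free.  */
  if (new_size == 0)
    {
      free (old);
      return NULL;
    }

  new_block = malloc (new_size);
  if (new_block == NULL)
    return NULL;		/* The old block is left untouched.  */

  if (old != NULL)
    {
      /* Copy only as many bytes as fit in both blocks.  */
      n = old_size < new_size ? old_size : new_size;
      memcpy (new_block, old, n);
      free (old);
    }
  return new_block;
}
```

The old block is freed only after the copy has succeeded, so a failed
allocation leaves the caller's data intact, much like realloc itself.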

All in all, this is not intended to be used as a C (or any other source
language) compiler. I've gone through a lot of effort to make it work
reasonably well, but only in order to get sufficient test coverage from
the testsuites. The intended use for this is only to build it as an
offload compiler, and use it through OpenACC by way of lto1. That leaves
the question of how we should document it - does it need the usual
constraint and option documentation, given that users aren't expected
to use any of it?

A slightly earlier version of the entire patch kit was bootstrapped and
tested on x86_64-linux. Ok for trunk?

Bernd