Some time ago, we submitted an RFC for the introduction of UPC support into GCC. During the intervening time period, we have continued to keep the 'gupc' (GNU UPC) branch in sync with the GCC trunk and have incorporated feedback and contributions from various GCC developers (Joseph Myers, Tom Tromey, Jakub Jelinek, Richard Henderson, Meador Inge, and others). We have also implemented various bug fixes and improvements.
At this time, we would like to re-submit the UPC patches for comment, with the goal of introducing these changes into GCC 6.0. This email provides an overview of UPC and summarizes the impact of the UPC changes on the GCC front-end. Subsequent emails will include various patch sets, grouped by the area of GCC that they impact (front-end, generic, documentation, build, test, target-specific, and so on), so that they can receive a more focused review by their respective maintainers.

The main review-related changes are:

* GUPC is no longer implemented as a separate language compiler (as Objective-C and C++ are). Rather, a new -fupc switch has been added, which enables UPC support in the C compiler.

* The UPC blocking factor now only uses two of the tree's "spare" bits. If the UPC blocking factor is not the default value of 1 or the "indefinite" value of 0, then it is recorded in a separate hash table, indexed by the tree node.

* UPC-specific tree support has been integrated into gcc/c-family/c-common.def and gcc/c-family/c-common.h.

* The number of UPC-specific configuration options has been reduced.

* The UPC pointer-to-shared format per-target configuration has been simplified. Before, both a "packed" and a "struct" pointer-to-shared representation were supported. Now, only the "struct" format is supported, and various configuration options for tweaking field sizes and such have been removed.

* In keeping with current GCC development guidelines, target macros are no longer used. Rather, where needed, target hooks are defined and used.

* FIXMEs and TODOs were either fixed or cleaned up.

* The copyright and license notices were updated.

* The code was reviewed for conformance to coding standards and updated.

* Diagnostics now use appropriate format strings rather than building up the strings with sprintf().

* Files in c-family/ no longer include c-tree.h, to conform with modularization improvements.

* Most of the #ifdef conditionals have been removed. Some target hooks have been defined and documented in tm.texi.

* The code was reviewed to verify that it conforms with current GCC coding practices and that it incorporates cleanups done in the past several years.

* Comments were added to most new functions, and typos and spelling errors in comments were fixed.

* Changes that appeared in the diffs that were unrelated to UPC were removed or incorporated into the trunk.

* The linkage to the libgupc library was changed to use the newly defined method (used in libgomp/libgo, for example) of including library 'spec' files. This led to a simplification where we no longer needed to add UPC-specific spec files in various target-specific config directories.

Introduction: UPC-related Changes
---------------------------------

Below, various UPC-related changes are summarized. This introduction is provided as background for review of the UPC changes implemented in the GUPC branch. Each individual change will be discussed in more detail in the patch sets found in the following emails.

The current GUPC branch is based upon a recent version of the GCC trunk and has been bootstrapped on x86_64/i686 Linux, x86_64 Darwin, IA64/Altix Linux, PowerPC Power7 (big endian), and Power8 (little endian). Some testing has also been done on various flavors of BSD and Solaris; in the past, MIPS was also tested and supported.

All languages (c, c++, fortran, go, lto, objc, obj-c++) have been bootstrapped; no test suite regressions were introduced, relative to the GCC trunk.

The GUPC branch is described here:

    http://gcc.gnu.org/projects/gupc.html

The UPC-related source code differences are summarized here:

    http://gccupc.org/gupc-changes

In the discussion below, some changes are excerpted in order to highlight important aspects of the changes.

UPC's Shared Qualifier and Layout Qualifier
-------------------------------------------

The UPC language specification describes the language syntax and semantics:

    http://upc.lbl.gov/publications/upc-spec-1.3.pdf

UPC introduces a new qualifier, "shared", that indicates that the qualified object is located in a global shared address space that is accessible by all UPC threads. Additional qualifiers ("strict" and "relaxed") further specify the semantics of accesses to UPC shared objects.

In UPC, a shared qualified array can optionally specify a "layout qualifier" that indicates how the shared data is blocked and distributed across UPC threads.

Two identifiers are predefined by the language: THREADS, the number of threads that will be created when the program starts, and MYTHREAD, the current (zero-based) thread number. Typically, a UPC thread is implemented as an operating system process, though threads may be mapped to pthreads when the program is compiled with the -fupc-pthreads-model-tls switch.

Access to UPC shared memory may be implemented locally via OS-provided facilities (for example, mmap), or across nodes via a high-speed network interconnect (for example, Infiniband).

GUPC provides a runtime (libgupc) that targets an SMP-based system and uses mmap() to implement global shared memory.

Optionally, GUPC can use the more general and more capable Berkeley UPCR runtime:

    http://upc.lbl.gov/download/source.shtml#runtime

The UPCR runtime supports a number of network topologies, and has been ported to most of the current High Performance Computing (HPC) systems.

The following example illustrates the use of the UPC "shared" qualifier combined with a layout qualifier.

    #define BLKSIZE 5
    #define N_PER_THREAD (4 * BLKSIZE)
    shared [BLKSIZE] double A[N_PER_THREAD*THREADS];

Above, the "[BLKSIZE]" construct is the UPC layout qualifier; it specifies that the elements of the shared array, A, are distributed across the threads in blocks of 5 elements.
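As a rough illustration of the block-cyclic distribution that a layout qualifier implies, the plain C sketch below computes which thread an array element has affinity to, and where the element sits within its block. This is not code from the GUPC patches: the function names elem_thread and elem_block_offset are hypothetical, and the sketch assumes a fixed, non-zero block size with blocks dealt out to threads in round-robin order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the thread that element I of an array declared as
   `shared [block_size] T A[...]' has affinity to, assuming blocks are
   assigned to threads in round-robin order.  */
size_t
elem_thread (size_t i, size_t block_size, size_t threads)
{
  assert (block_size > 0);   /* the "indefinite" (0) case is not modeled */
  assert (threads > 0);
  return (i / block_size) % threads;
}

/* Hypothetical helper: the offset of element I within its block.  */
size_t
elem_block_offset (size_t i, size_t block_size)
{
  assert (block_size > 0);
  return i % block_size;
}
```

For instance, with a block size of 5 and two threads, elem_thread (7, 5, 2) evaluates to 1 and elem_block_offset (7, 5) evaluates to 2: element 7 is the third element of the second block, and the second block belongs to thread 1.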

If the program is run with two threads, then A is distributed as shown below:

    Thread 0     Thread 1
    ---------    ---------
    A[ 0.. 4]    A[ 5.. 9]
    A[10..14]    A[15..19]
    A[20..24]    A[25..29]
    A[30..34]    A[35..39]

The elements shown for thread 0 are defined as having "affinity" to thread 0. Similarly, those elements shown for thread 1 have affinity to thread 1. In UPC, a pointer to a shared object can be cast to a thread-local pointer (a "C" pointer) when the designated shared object has affinity to the referencing thread.

A UPC "pointer-to-shared" (PTS) is a pointer that references a UPC shared object. A UPC pointer-to-shared is a "fat" pointer with the following logical fields:

    (virt_addr, thread, phase)

The virtual address (virt_addr) field is combined with the thread number (thread) to derive the location of the referenced object within the UPC shared address space. The phase field is used to keep track of the current block offset for PTSs that have a blocking factor that is greater than one.

GUPC implements pointer-to-shared objects using a "struct" representation. Until recently, GUPC also supported a "packed" representation, which is more space efficient, but limits the range of various fields in the UPC pointer-to-shared representation. We have decided to support only the "struct" representation so that the compiler uses a single ABI that supports the full range of addresses, threads, and blocking factors.

GCC's internal tree representation is extended to record the UPC "shared", "strict", and "relaxed" qualifiers, and the layout qualifier.

--- gcc/tree-core.h (.../trunk) (revision 228959)
+++ gcc/tree-core.h (.../branches/gupc) (revision 229159)
@@ -470,7 +470,11 @@ enum cv_qualifier {
   TYPE_QUAL_CONST = 0x1,
   TYPE_QUAL_VOLATILE = 0x2,
   TYPE_QUAL_RESTRICT = 0x4,
-  TYPE_QUAL_ATOMIC = 0x8
+  TYPE_QUAL_ATOMIC = 0x8,
+  /* UPC qualifiers */
+  TYPE_QUAL_SHARED = 0x10,
+  TYPE_QUAL_RELAXED = 0x20,
+  TYPE_QUAL_STRICT = 0x40
 };
[...]
@@ -857,9 +875,14 @@ struct GTY(()) tree_base {
       unsigned user_align : 1;
       unsigned nameless_flag : 1;
       unsigned atomic_flag : 1;
-      unsigned spare0 : 3;
-
-      unsigned spare1 : 8;
+      unsigned shared_flag : 1;
+      unsigned strict_flag : 1;
+      unsigned relaxed_flag : 1;
+
+      unsigned threads_factor_flag : 1;
+      unsigned block_factor_0 : 1;
+      unsigned block_factor_x : 1;
+      unsigned spare1 : 5;

UPC defines a few additional tree node types:

--- gcc/c-family/c-common.def (.../trunk) (revision 228959)
+++ gcc/c-family/c-common.def (.../branches/gupc) (revision 229159)
@@ -62,6 +62,24 @@ DEFTREECODE (SIZEOF_EXPR, "sizeof_expr",
    Operand 3 is the stride. */
 DEFTREECODE (ARRAY_NOTATION_REF, "array_notation_ref", tcc_reference, 4)
+/* Used to represent a `upc_forall' statement. The operands are
+   UPC_FORALL_INIT_STMT, UPC_FORALL_COND, UPC_FORALL_EXPR,
+   UPC_FORALL_BODY, and UPC_FORALL_AFFINITY respectively. */
+
+DEFTREECODE (UPC_FORALL_STMT, "upc_forall_stmt", tcc_statement, 5)
+
+/* Used to represent a UPC synchronization statement. The first
+   operand is the synchronization operation, UPC_SYNC_OP:
+   UPC_SYNC_NOTIFY_OP   1  Notify operation
+   UPC_SYNC_WAIT_OP     2  Wait operation
+   UPC_SYNC_BARRIER_OP  3  Barrier operation
+
+   The second operand, UPC_SYNC_ID is the (optional) expression
+   whose value specifies the barrier identifier which is checked
+   by the various synchronization operations. */
+
+DEFTREECODE (UPC_SYNC_STMT, "upc_sync_stmt", tcc_statement, 2)
+

The "C" parser is extended to recognize UPC's syntactic extensions.

--- gcc/c-family/c-common.c (.../trunk) (revision 228959)
+++ gcc/c-family/c-common.c (.../branches/gupc) (revision 229159)
@@ -412,8 +426,9 @@ static int resort_field_decl_cmp (const
    C --std=c89: D_C99 | D_CXXONLY | D_OBJC | D_CXX_OBJC
    C --std=c99: D_CXXONLY | D_OBJC
    ObjC is like C except that D_OBJC and D_CXX_OBJC are not set
-   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC
-   C++ --std=c0x: D_CONLY | D_OBJC
+   UPC is like C except that D_UPC is not set
+   C++ --std=c98: D_CONLY | D_CXXOX | D_OBJC | D_UPC
+   C++ --std=c0x: D_CONLY | D_OBJC | D_UPC
    ObjC++ is like C++ except that D_OBJC is not set
[...]
@@ -629,6 +644,19 @@ const struct c_common_resword c_common_r
   { "inout", RID_INOUT, D_OBJC },
   { "oneway", RID_ONEWAY, D_OBJC },
   { "out", RID_OUT, D_OBJC },
+
+  /* UPC keywords */
+  { "shared", RID_SHARED, D_UPC },
+  { "relaxed", RID_RELAXED, D_UPC },
+  { "strict", RID_STRICT, D_UPC },
+  { "upc_barrier", RID_UPC_BARRIER, D_UPC },
+  { "upc_blocksizeof", RID_UPC_BLOCKSIZEOF, D_UPC },
+  { "upc_elemsizeof", RID_UPC_ELEMSIZEOF, D_UPC },
+  { "upc_forall", RID_UPC_FORALL, D_UPC },
+  { "upc_localsizeof", RID_UPC_LOCALSIZEOF, D_UPC },
+  { "upc_notify", RID_UPC_NOTIFY, D_UPC },
+  { "upc_wait", RID_UPC_WAIT, D_UPC },
+

--- gcc/c/c-parser.c (.../trunk) (revision 228959)
+++ gcc/c/c-parser.c (.../branches/gupc) (revision 229159)
[...]
+/* These UPC parser functions are only ever called when
+   compiling UPC. */
+static void c_parser_upc_forall_statement (c_parser *);
+static void c_parser_upc_sync_statement (c_parser *, int);
+static void c_parser_upc_shared_qual (source_location,
+                                      c_parser *,
+                                      struct c_declspecs *);
+
[...]
+        /* UPC qualifiers */
+        case RID_SHARED:
+          attrs_ok = true;
+          c_parser_upc_shared_qual (loc, parser, specs);
+          break;
+        case RID_STRICT:
+        case RID_RELAXED:
+          attrs_ok = true;
+          declspecs_add_qual (loc, specs, c_parser_peek_token (parser)->value);
+          c_parser_consume_token (parser);
+          break;
[...]
+  /* Process all #pragma's just after the opening brace. This
+     handles #pragma upc, which can only appear just after
+     the opening brace, when it appears within a function body. */
+  push_upc_consistency_mode ();
+  permit_pragma_upc ();
+  while (c_parser_next_token_is (parser, CPP_PRAGMA))
+    {
+      location_t loc ATTRIBUTE_UNUSED = c_parser_peek_token (parser)->location;
+      if (c_parser_pragma (parser, pragma_compound))
+        last_label = false, last_stmt = true;
+      parser->error = false;
+    }
+  deny_pragma_upc ();
[...]
+        case RID_UPC_FORALL:
+          gcc_assert (flag_upc);
+          c_parser_upc_forall_statement (parser);
+          break;
+        case RID_UPC_NOTIFY:
+          gcc_assert (flag_upc);
+          c_parser_upc_sync_statement (parser, UPC_SYNC_NOTIFY_OP);
+          goto expect_semicolon;
+        case RID_UPC_WAIT:
+          gcc_assert (flag_upc);
+          c_parser_upc_sync_statement (parser, UPC_SYNC_WAIT_OP);
+          goto expect_semicolon;
+        case RID_UPC_BARRIER:
+          gcc_assert (flag_upc);
+          c_parser_upc_sync_statement (parser, UPC_SYNC_BARRIER_OP);
+          goto expect_semicolon;
[...]
         case RID_SIZEOF:
           return c_parser_sizeof_expression (parser);
+        case RID_UPC_BLOCKSIZEOF:
+        case RID_UPC_ELEMSIZEOF:
+        case RID_UPC_LOCALSIZEOF:
+          gcc_assert (flag_upc);
+          return c_parser_sizeof_expression (parser);
[...]

--- gcc/c-family/c-pragma.c (.../trunk) (revision 228959)
+++ gcc/c-family/c-pragma.c (.../branches/gupc) (revision 229159)
[...]
+/*
+ *  #pragma upc strict
+ *  #pragma upc relaxed
+ *  #pragma upc upc_code
+ *  #pragma upc c_code
+ */
+static void
+handle_pragma_upc (cpp_reader * ARG_UNUSED (dummy))
+{
[...]

c-decl.c handles the additional UPC qualifiers and declspecs. The layout qualifier is handled here:

--- gcc/c/c-decl.c (.../trunk) (revision 228959)
+++ gcc/c/c-decl.c (.../branches/gupc) (revision 229159)
[...]
+  /* A UPC layout qualifier is encoded as an ARRAY_REF,
+     further, it implies the presence of the 'shared' keyword. */
+  if (TREE_CODE (qual) == ARRAY_REF)
+    {
+      if (specs->upc_layout_qualifier)
+        {
+          error ("two or more layout qualifiers specified");
+          return specs;
+        }
+      else
+        {
+          specs->upc_layout_qualifier = qual;
+          qual = ridpointers[RID_SHARED];
+        }
+    }

In UPC, a qualifier includes both the traditional "C" qualifier flags and the UPC "layout qualifier". Thus, the pointer_quals field of a declarator node is defined as a struct including both the qualifier flags and the UPC layout qualifier, as shown below.

           /* Process type qualifiers (such as const or volatile)
              that were given inside the `*'. */
-          type_quals = declarator->u.pointer_quals;
+          type_quals = declarator->u.pointer.quals;
+          upc_layout_qualifier = declarator->u.pointer.upc_layout_qual;
+          sharedp = ((type_quals & TYPE_QUAL_SHARED) != 0);

UPC shared variables are allocated at runtime in the global memory that is allocated and managed by the UPC runtime. A separate link section is used as a method of assigning virtual addresses to UPC shared variables. The UPC shared variable section is designated as a "no load" section on systems that support that facility; in that case, the linkage section begins at virtual address zero.

The logic below assigns UPC shared variables to their own linkage section.

+  /* Shared variables are given their own link section on
+     most target platforms, and if compiling in pthreads mode
+     regular local file scope variables are made thread local. */
+  if ((TREE_CODE(decl) == VAR_DECL)
+      && !threadp && (TREE_SHARED (decl) || flag_upc_pthreads))
+    upc_set_decl_section (decl);
+

Patches
-------

The patches are organized into the following categories and will be sent out as separate email messages.

[UPC 01/22] front-end changes
[UPC 02/22] tree-related changes
[UPC 03/22] options processing, driver
[UPC 04/22] Make, Config changes
[UPC 05/22] language hooks changes
[UPC 06/22] target hooks
[UPC 07/22] lowering, pointer-to-shared ops
[UPC 08/22] target - Darwin
[UPC 09/22] target - x86
[UPC 10/22] target - rs6000
[UPC 11/22] documentation
[UPC 12/22] DWARF support
[UPC 13/22] C++ changes
[UPC 14/22] constant folding changes
[UPC 15/22] RTL changes
[UPC 16/22] gimple/gimplify changes
[UPC 17/22] misc/common changes
[UPC 18/22] libatomic changes
[UPC 19/22] libgupc - Make, Configure
[UPC 20/22] libgupc runtime library
[UPC 21/22] gcc.dg test suite
[UPC 22/22] libgupc test suite

thanks,
- Gary