> On Tue, 26 May 2015, Jan Hubicka wrote:
>
> > > > Now the change does not really translate to great increase of
> > > > disambiguations for Firefox (it seems more in noise).  The reason
> > > > is the pointer_type globbing in alias.c.
> > >
> > > Yeah, we only get the improvement because of some "hack" in the tree
> > > alias oracle which also uses the base object for TBAA.
> >
> > Why that is hack?  Dereferencing a pointer makes it clear the type of
> > memory location pointed to is known, we should use that info.
> > >
> > > Yeah, we should fix that.  And in fact, for cross-language LTO I don't
> > > see why
> > >
> > >   union { int a; char c; };
> > >
> > > and
> > >
> > >   union { int a; short s; };
> > >
> > > should not be compatible - they have a common member after all.  So
> > > I'd like to glob all unions that have the same size (and as improvement
> >
> > Well, none of language standards I saw so far expect this to happen.
> > Going to extremes, you can always put variable sized char array to union
> > and by transitivity glob everything with everything.
>
> I'm speaking of cross-language LTO - that leaves the language standards
> territory and requires us to apply common sense.

Well, language standards do help here to some degree.  C++, Fortran, Ada,
Java (probably Go too, I did not read it yet) do cover the interface to
other languages (generally C, which also allows interoperability between
non-C languages).  In some cases the interoperable types need to be marked
(in Fortran, to some degree in C++ by language linkage, and in Ada), which
allows us to punt to alias set 0 in very esoteric cases.  Older Fortran
standards seem to be such a case to me.  Fortran 2008 has some interesting
bits, too.

In this way C is the most evil language, because every type in a C program
can potentially be part of a cross-language interface, and its cross-TU
type compatibility rules are an afterthought and not a direct application
of the intra-TU rules.

I am trying to understand those rules, to be sure that what we do is
compatible with these standards.  We can make our own extensions, but it is
slippery ground - we need to think of the consequences and how many extra
compatibility rules we want to commit ourselves to support.  In those cases
I would go from practical examples to refine our understanding of the
problem.

> > Applying this rule you have
> >
> > union { char a[n]; } compatible with every union and thus also
> > union {int a;}
> > struct { int a;}
> > int a;
> >
> > Which would disable TBAA completely.
>
> See ;)  At least we have the int a; vs. struct { int a; } issue
> with Fortran vs. C compatibility (there is even a PR about this).

Yep :)  Fortran is a bit esoteric, though not as esoteric anymore when you
do not consider old language standards.  See
http://www.j3-fortran.org/doc/year/10/10-007.pdf, pages 443-455.
Here are a few notes I made about it.

The types that bind to C are declared with BIND(C).  BIND(C) does not seem
to appear in the Fortran 77 standard, so legacy codebases won't do that.
We can handle that though - either detect units containing Fortran 77 plus
a different language and turn on -fno-strict-aliasing (or more generous
globbing), and/or we can add a warning, like -Wodr, that will tell you when
a Fortran program is interfacing with C without the BIND(C) attribute or
breaks Fortran's interoperability rules, and suggest that users use
-fno-strict-aliasing in that case or update the codebase.

Fortran has a few interesting ideas.
 - It has type C_SIGNED_CHAR that is defined to be compatible with both
   unsigned char and signed char (but not char).
 - It has C_PTR that is the "universal pointer" but only for non-functions.
 - It has C_FUNPTR that is the "universal pointer" but only for functions.
We are safe to map both pointer types to ptr_type_node.

The rest of the types more or less uniquely match.  We can do some globbing
for character types or we can just make C_SIGNED_CHAR have alias set 0.

Page 449 deals with interoperability of derived types.

  A Fortran derived type is interoperable with a C struct type if and only
  if the Fortran type has the BIND attribute (4.5.2), the Fortran derived
  type and the C struct type have the same number of components, and the
  components of the Fortran derived type would interoperate with
  corresponding components of the C struct type as described in 15.3.5 and
  15.3.6 if the components were variables.  A component of a Fortran
  derived type and a component of a C struct type correspond if they are
  declared in the same relative position in their respective type
  definitions.

So structures compare by fields, ignoring names.

  There is no Fortran type that is interoperable with a C struct type that
  contains a bitfield or that contains a flexible array member.  There is
  no Fortran type that is interoperable with a C union type.

No unions :)

  A Fortran variable that is a named array is interoperable if and only if
  its type and type parameters are interoperable, it is not a coarray, it
  is of explicit shape or assumed size, and if it is of type character its
  length is not assumed or declared by an expression that is not a
  constant expression.

The array rules seem to rule out things with no direct C equivalent.  The
text also continues by saying when the bounds are considered compatible.

There is also no legal way to call a varargs function from Fortran.
There is also some detail about which variables can be interoperated (no
COMMON) and how to bind the C symbols.

>
> But sizeof (enum Foo) depends on the enum, so no, it won't solve it.
> There are no incomplete integer types but incomplete enum types.

By incomplete integer types I mean int and char, for example.

> (ok, this is probably a GNU extension issue that we have enums larger
> than unsigned int).

Yep, we also have -fshort-enums; both are GNU extensions and the C standard
expects all enums to be ints.  It does not promise that a long enum is
compatible with long long, because it is not part of the language.
I would say that for the purposes of canonical type computation we are thus
safe to glob all enums to "int" or "unsigned int".

For the pointer code, we can just consider enum a *ptr; for incomplete
types to be the universal pointer void *ptr.

>
> First of all retain TYPE_CANONICAL if we have a single source language
> (we still have to merge them I guess, and hopefully tree merging will
> do the correct thing here).

I don't believe it will.  Consider

  struct a {struct b *ptr;}

wrt

  struct b {int a;};
  struct a {struct b *ptr;};

Their canonical types should be merged.  We need to merge TYPE_CANONICAL by
at least all the type compatibility rules the source language demands, even
for single-language units.

I however do like the idea of streaming TYPE_CANONICAL.  First of all it
makes variably modified types easier.
We can arrange for an array

  int a[p];

to have the canonical type

  int a[*];

The canonical type does not refer to local declarations and can go into the
global stream and be handled the usual way.  When streaming the function
body, we will only get a new variant of the type and we can just retain the
current logic of type canonical.

We can also use the TYPE_CANONICALs as they are, as long as they are local
to the unit (anonymous namespaces for C++).

>
> Then avoid the transitivity constraint by computing TYPE_CANONICAL
> "globally" (thus not requiring incremental compute to work).  That's

The transitivity is not about incrementality - yes, it would be nice to
avoid the incrementality, as that would let us be significantly stronger in
special cases.  Thus the idea of placing canonical types of variably
modified types into the global stream.

The transitivity is needed for alias.c to work.  It starts with a DAG and
builds its transitive closure.  The edges in the transitive closure
correspond to possible aliases.  This is where the transitivity comes from.
I did not put that much thought into removing that constraint, but it would
be interesting to get that defined and right.

> going to work only if we record all canonical types and its uses
> (&TYPE_CANONICAL) or the main variants that don't have a canonical
> type yet (even for cross-language LTO the original type-canonicals
> denote minimal coalescing we have to preserve).
>
> That said, eventually we'd want to stream TYPE_CANONICAL for
> correctness (at least for verification that in the end for
> two types where the original TYPE_CANONICAL was the same the
> LTO idea of TYPE_CANONICAL is also the same - possibly that's
> ensured by your type verifier checking the LTO compute computes
> the same outcome).

Yes, that was one of the main things I wanted to check with the verifier.

Honza
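
PS: to make the transitivity point above concrete, here is a minimal
sketch in C of the "DAG plus transitive closure" idea.  It is only an
illustration under stated assumptions and not the code in alias.c: the
alias_graph structure, the function names and the MAX_SETS limit are all
hypothetical, and alias sets are modelled as small integers.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_SETS 8   /* Hypothetical limit, chosen only for this sketch.  */

/* reach[i][j] is true when alias set J is (transitively) a component of
   alias set I.  Every set trivially contains itself.  */
struct alias_graph
{
  int n;
  bool reach[MAX_SETS][MAX_SETS];
};

/* Start with N alias sets and no component edges.  */
static void
alias_graph_init (struct alias_graph *g, int n)
{
  memset (g, 0, sizeof *g);
  g->n = n;
  for (int i = 0; i < n; i++)
    g->reach[i][i] = true;
}

/* Record a direct DAG edge: SUB is a component of SUPER.  */
static void
alias_graph_add_subset (struct alias_graph *g, int super, int sub)
{
  g->reach[super][sub] = true;
}

/* Build the transitive closure of the DAG (Warshall's algorithm).  */
static void
alias_graph_close (struct alias_graph *g)
{
  for (int k = 0; k < g->n; k++)
    for (int i = 0; i < g->n; i++)
      for (int j = 0; j < g->n; j++)
        if (g->reach[i][k] && g->reach[k][j])
          g->reach[i][j] = true;
}

/* Two sets may alias when one is reachable from the other.  */
static bool
alias_sets_may_conflict (const struct alias_graph *g, int a, int b)
{
  return g->reach[a][b] || g->reach[b][a];
}
```

With sets 0 (an outer struct), 1 (an inner struct), 2 (int) and 3
(float), adding the edges 0 -> 1 and 1 -> 2 and then closing the graph
makes 0 and 2 conflict, while 3 conflicts with none of the others.  In
this model, globbing two types means giving them the same set number, so
anything that conflicts with one then conflicts with the other.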