Re: Teach gimple_canonical_types_compatible_p about incomplete types

Jan Hubicka Wed, 27 May 2015 14:30:45 -0700

> On Tue, 26 May 2015, Jan Hubicka wrote:
> 
> > > > Now the change does not really translate to great increase of
> > > > disambiguations for Firefox (it seems more in noise). The reason is the
> > > > pointer_type globbing in alias.c.
> > > 
> > > Yeah, we only get the improvement because of some "hack" in the tree
> > > alias oracle which also uses the base object for TBAA.
> > 
> > Why that is hack? Dereferencing a pointer makes it clear the type of memory
> > location pointed to is known, we should use that info.
> > > 
> > > Yeah, we should fix that.  And in fact, for cross-language LTO I don't
> > > see why
> > > 
> > >   union { int a; char c; };
> > > 
> > > and
> > > 
> > >   union { int a; short s; };
> > > 
> > > should not be compatible - they have a common member after all.  So
> > > I'd like to glob all unions that have the same size (and as improvement
> > 
> > Well, none of language standards I saw so far expect this to happen.
> > Going to extremes, you can always put variable sized char array to union
> > and by transitivity glob everything with everything.
> 
> I'm speaking of cross-language LTO - that leaves the language standards
> territory and requires us to apply common sense.


Well, language standards do help here to some degree.  C++, Fortran, Ada and
Java (probably Go too, I have not read it yet) do cover the interface to other
languages (generally C, which also allows interoperability between non-C
languages).  In some cases the interoperable types need to be marked (in
Fortran, to some degree in C++ by language linkage, and in Ada), which allows
us to punt to alias set 0 in very esoteric cases.  Older Fortran standards seem
to be such a case to me.  Fortran 2008 has some interesting bits, too.

In this way C is the most evil language, because every type in a C program can
potentially be part of a cross-language interface, and its cross-TU type
compatibility rules are an afterthought and not a direct application of the
intra-TU rules.

I am trying to understand those rules to be sure that what we do is compatible
with these standards.  We can make our own extensions, but that is slippery
ground - we need to think of the consequences and how many extra compatibility
rules we want to commit ourselves to supporting.

In those cases I would go from practical examples to refine our understanding
of the problem.

> > Applying this rule you have
> > 
> > union { char a[n]; } compatible with every union and thus also
> > union {int a;}
> > struct { int a;}
> > int a;
> > 
> > Which would disable TBAA completely.
> 
> See ;)  At least we have the int a; vs. struct { int a; } issue
> with Fortran vs. C compatibility (there is even a PR about this).

Yep :) Fortran is a bit esoteric, though not so esoteric anymore when you do
not consider old language standards.
http://www.j3-fortran.org/doc/year/10/10-007.pdf
Look at pages 443-455.  Here are a few notes I made about it.

The types that bind to C are declared with BIND(C).  BIND(C) does not seem to
appear in the Fortran 77 standard, so legacy codebases won't use it.  We can
handle that though - either detect units containing Fortran 77 + a different
language and turn on -fno-strict-aliasing (or more generous globbing) and/or
we can add a warning, like -Wodr, that will tell you when a Fortran program is
interfacing with C without the BIND(C) attribute or breaks Fortran's
interoperability rules, and suggest that users use -fno-strict-aliasing in that
case or update the codebase.

Fortran has a few interesting ideas.
 - It has type C_SIGNED_CHAR that is defined to be compatible with both unsigned
   char and signed char (but not char)
 - It has C_PTR that is the "universal pointer" but only for non-functions
 - It has C_FUNPTR that is the "universal pointer" but only for functions.  We
   are safe to map both to ptr_type_node.

The rest of the types more or less uniquely match.  We can do some globbing for
character types or we can just make C_SIGNED_CHAR have alias set 0.

Page 449 deals with interoperability of derived types.

  A Fortran derived type is interoperable with a C struct type if and only if
  the Fortran type has the BIND attribute (4.5.2), the Fortran derived type and
  the C struct type have the same number of components, and the components of
  the Fortran derived type would interoperate with corresponding components of
  the C struct type as described in 15.3.5 and 15.3.6 if the components were
  variables.  A component of a Fortran derived type and a component of a C
  struct type correspond if they are declared in the same relative position in
  their respective type definitions.

So structures compare by fields, ignoring names.
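
As a rough illustration of "compare by fields, ignoring names", here is a
minimal sketch in C.  The toy_* types and the function name are hypothetical
and have nothing to do with GCC's tree representation; field types are reduced
to a small enum to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a record type, not GCC's trees.  */
enum toy_kind { TOY_INT, TOY_FLOAT, TOY_DOUBLE, TOY_PTR };

struct toy_field {
  const char *name;    /* kept only to show that it is never inspected */
  enum toy_kind kind;
};

struct toy_record {
  size_t nfields;
  const struct toy_field *fields;
};

/* Two records match when they have the same number of fields and the
   fields in the same relative position have the same kind.  */
bool
toy_records_interoperable_p (const struct toy_record *a,
                             const struct toy_record *b)
{
  assert (a != NULL && b != NULL);
  if (a->nfields != b->nfields)
    return false;
  for (size_t i = 0; i < a->nfields; i++)
    if (a->fields[i].kind != b->fields[i].kind)
      return false;
  return true;
}
```

With this rule a record with fields (int x, double y) matches one with fields
(int i, double r), but not one where the two fields are swapped.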

  There is no Fortran type that is interoperable with a C struct type that
  contains a bitfield or that contains a flexible array member.  There is no
  Fortran type that is interoperable with a C union type.

No unions :)

   A Fortran variable that is a named array is interoperable if and only if its
   type and type parameters are interoperable, it is not a coarray, it is of
   explicit shape or assumed size, and if it is of type character its length is
   not assumed or declared by an expression that is not a constant expression.

Arrays seem to rule out things with no direct C equivalent.  The text also
continues by saying when the bounds are considered compatible.

There is also no legal way to call a varargs function from Fortran.
There is also some detail about which variables can be interoperated (no
COMMON) and how to bind the C symbols.
> 
> But sizeof (enum Foo) depends on the enum, so no, it won't solve it.
> There are no incomplete integer types but incomplete enum types.

By incomplete integer types I mean int and char, for example.

> (ok, this is probably a GNU extension issue that we have enums larger
> than unsigned int).

Yep, we also have -fshort-enums; both are GNU extensions, and the C standard
expects all enums to be ints.  It does not promise that a long enum is
compatible with long long, because it is not part of the language.

I would say that for purposes of canonical type computation we are thus safe
to glob all enums to "int" or "unsigned int".  For the pointer code, we can
just consider
enum a *ptr;
for incomplete types to be the universal pointer void *ptr.
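
A minimal sketch of that globbing in C, under stated assumptions: the ct_*
types and names are hypothetical stand-ins for a type representation, and
choosing between "int" and "unsigned int" by a signedness flag is my reading
of the rule, not something spelled out above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy type representation, not GCC's trees.  */
enum ct_code { CT_VOID, CT_INT, CT_UINT, CT_ENUM, CT_POINTER };

struct ct_type {
  enum ct_code code;
  bool complete;                 /* false for a forward-declared enum */
  bool is_unsigned;              /* consulted for CT_ENUM only */
  const struct ct_type *target;  /* pointee, for CT_POINTER only */
};

const struct ct_type ct_void = { CT_VOID, true, false, NULL };
const struct ct_type ct_int = { CT_INT, true, false, NULL };
const struct ct_type ct_uint = { CT_UINT, true, true, NULL };
const struct ct_type ct_void_ptr = { CT_POINTER, true, false, &ct_void };

/* Return the type that T is globbed to for canonical type purposes.  */
const struct ct_type *
ct_canonical (const struct ct_type *t)
{
  assert (t != NULL);
  switch (t->code)
    {
    case CT_ENUM:
      /* Assumption: every enum globs to int or unsigned int.  */
      return t->is_unsigned ? &ct_uint : &ct_int;
    case CT_POINTER:
      /* A pointer to an incomplete enum becomes the universal pointer.  */
      if (t->target != NULL
          && t->target->code == CT_ENUM
          && !t->target->complete)
        return &ct_void_ptr;
      return t;
    default:
      return t;
    }
}
```

Pointers to complete enums are left alone here; how those should be treated is
a separate question that this sketch does not try to answer.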
> 
> First of all retain TYPE_CANONICAL if we have a single source language
> (we still have to merge them I guess, and hopefully tree merging will
> do the correct thing here).

I don't believe it will.  Consider

struct a {struct b *ptr;};

wrt

struct b {int a;};
struct a {struct b *ptr;};

their canonicals should be merged.  We need to merge TYPE_CANONICAL by at least
all type compatibility rules the source language demands, even for single
language units.
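
To make the example concrete, here is a minimal sketch in C of a comparison
that tolerates incompleteness.  The inc_* names are hypothetical, this is not
the code of gimple_canonical_types_compatible_p, and the sketch makes no
attempt to handle recursive types (it would loop on struct list { struct list
*next; }).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type representation, not GCC's trees.  */
enum inc_code { INC_INT, INC_POINTER, INC_RECORD };

struct inc_type {
  enum inc_code code;
  const char *tag;                       /* INC_RECORD: struct tag */
  bool complete;                         /* INC_RECORD: are fields known? */
  const struct inc_type *target;         /* INC_POINTER: pointee */
  size_t nfields;                        /* INC_RECORD: number of fields */
  const struct inc_type *const *fields;  /* INC_RECORD: field types */
};

/* Structural comparison where an incomplete record matches any record
   with the same tag.  No cycle detection: toy input must be acyclic.  */
bool
inc_compatible_p (const struct inc_type *a, const struct inc_type *b)
{
  assert (a != NULL && b != NULL);
  if (a->code != b->code)
    return false;
  switch (a->code)
    {
    case INC_INT:
      return true;
    case INC_POINTER:
      return inc_compatible_p (a->target, b->target);
    case INC_RECORD:
      if (strcmp (a->tag, b->tag) != 0)
        return false;
      /* If either side is incomplete, the tag is all we can compare.  */
      if (!a->complete || !b->complete)
        return true;
      if (a->nfields != b->nfields)
        return false;
      for (size_t i = 0; i < a->nfields; i++)
        if (!inc_compatible_p (a->fields[i], b->fields[i]))
          return false;
      return true;
    }
  return false;
}
```

With this rule the two versions of struct a above compare equal even though
one unit only saw a forward declaration of struct b, while a purely structural
merge would keep them apart.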

I however do like the idea of streaming TYPE_CANONICAL.  First of all it makes
variably modified types easier.  We can arrange for array:

int a[p];

to have canonical type

int a[*];

The canonical type does not refer to local declarations and can go into the
global stream and be handled the usual way.  When streaming the function body,
we will only get a new variant of the type and we can just retain the current
logic of type canonical.
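
The [*] spelling also exists at the source level: C99 allows it in function
prototype scope for an array bound that is left unspecified.  The sketch below
only illustrates that notation with a hypothetical vla_trace function; it
assumes a compiler with VLA support and says nothing about how the streamer
would represent such a type.

```c
#include <assert.h>

/* Prototype: the bounds are left unspecified with [*], so the type
   does not refer to any local declaration.  */
int vla_trace (int n, int m[*][*]);

/* Definition: the same bounds are spelled with the parameter n.  */
int
vla_trace (int n, int m[n][n])
{
  assert (n >= 0);
  int total = 0;
  for (int i = 0; i < n; i++)
    total += m[i][i];   /* sum of the diagonal */
  return total;
}
```

The prototype and the definition declare compatible types even though only the
definition names the bound.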

We can also use TYPE_CANONICAL as they are as long as the TYPE_CANONICALs
are local to the unit (anonymous namespaces for C++).
> 
> Then avoid the transitivity constraint by computing TYPE_CANONICAL
> "globally" (thus not requiring incremental compute to work).  That's

The transitivity is not about incrementality - yes, it would be nice to avoid
the incrementality as it would let us be significantly stronger in special
cases.  Thus the idea of placing canonical types of variably modified types
into the global stream.

The transitivity is needed for alias.c to work.  It starts with a DAG and
builds its transitive closure.  The edges in the transitive closure correspond
to possible aliases.  This is where the transitivity comes from.
I have not put that much thought into removing that constraint, but it
would be interesting to get that defined and right.
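
A minimal sketch of the closure idea in C, under stated assumptions: alias sets
are small integers, an edge means "is a subset of", and the as_* names and the
boolean matrix are hypothetical.  The real data structures in alias.c are
different; this only shows why the query ends up depending on transitivity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define AS_MAX 8

/* reach[i][j] means alias set j is a (possibly indirect) subset of i.  */
struct as_graph {
  int n;
  bool reach[AS_MAX][AS_MAX];
};

void
as_init (struct as_graph *g, int n)
{
  assert (n >= 0 && n <= AS_MAX);
  g->n = n;
  memset (g->reach, 0, sizeof g->reach);
}

/* Record a direct DAG edge: SUB is a subset of SUPER.  */
void
as_add_subset (struct as_graph *g, int super, int sub)
{
  assert (super >= 0 && super < g->n && sub >= 0 && sub < g->n);
  g->reach[super][sub] = true;
}

/* Warshall's algorithm: add every edge implied by transitivity.  */
void
as_close (struct as_graph *g)
{
  for (int k = 0; k < g->n; k++)
    for (int i = 0; i < g->n; i++)
      for (int j = 0; j < g->n; j++)
        if (g->reach[i][k] && g->reach[k][j])
          g->reach[i][j] = true;
}

/* Two sets may alias if they are equal or one contains the other.  */
bool
as_conflict_p (const struct as_graph *g, int a, int b)
{
  return a == b || g->reach[a][b] || g->reach[b][a];
}
```

For example, with set 0 for int, set 1 for a struct containing an int, set 2
for a struct containing that struct, and set 3 for float, sets 2 and 0 conflict
only after as_close has run, and sets 0 and 3 never do.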

> going to work only if we record all canonical types and its uses
> (&TYPE_CANONICAL) or the main variants that don't have a canonical
> type yet (even for cross-language LTO the original type-canonicals
> denote minimal coalescing we have to preserve).
> 
> That said, eventually we'd want to stream TYPE_CANONICAL for
> correctness (at least for verification that in the end for
> two types where the original TYPE_CANONICAL was the same the
> LTO idea of TYPE_CANONICAL is also the same - possibly that's
> ensured by your type verifier checking the LTO compute computes
> the same outcome).

Yes, that was one of the main things I wanted to check with the verifier.

Honza
