Re: r359960 - Reduce amount of work ODR hashing does.

David Blaikie via cfe-commits Tue, 14 May 2019 11:59:39 -0700

On Tue, May 7, 2019 at 7:40 PM Richard Trieu <rtr...@google.com> wrote:
>
>
> From: David Blaikie <dblai...@gmail.com>
> Date: Mon, May 6, 2019 at 4:39 PM
> To: Richard Trieu
> Cc: cfe-commits
>
>> On Mon, May 6, 2019 at 4:24 PM Richard Trieu <rtr...@google.com> wrote:
>> >
>> > There was no cycle for this crash.
>>
>> Oh, yeah, didn't mean to imply there were - but that a system designed
>> to prevent cycles might also be used/help prevent redundant work like
>> this.
>>
>> > What happened is that an exponential runtime is reduced to a linear 
>> > runtime.  Without this revision, ODR hashing would have worked if the 
>> > machine had enough memory and the user waited long enough.
>> >
>> > void foo(int a, int b) {}
>> > When computing the ODR hash for function foo, it will visit the type int 
>> > twice, once per parameter.  In general, re-visiting types shouldn't be a 
>> > problem, and in most cases, should be pretty fast.
>>
>> It does mean some potentially problematic worst-case situations where
>> non-trivial types are mentioned more than once (eg: if, instead of
>> int, it was a complex struct type - it wouldn't cycle, but it would do
>> all that work twice (or many more times if it appears in more places
>> in the entity being hashed)
>
>
> See below in the answer to DWARF.  ODRHash did have a system, it worked for a 
> while until it didn't, and was since removed.
>>
>>
>> > class S {
>> >   void bar(S* s);
>> > };
>> > There's actually two ways to visit the Decl behind S, 
>> > ODR::AddCXXRecordDecl and ODR::AddDecl.  When computing the ODR hash of S, 
>> > ODR::AddCXXRecordDecl is used for a deep dive into the AST of S.  When 
>> > reaching S another way, (via FunctionDecl bar, parameter s, PointerType 
>> > S*, RecordType S), then the CXXRecordDecl gets processed through 
>> > ODR::AddDecl, which only processes enough information to identify S, but 
>> > not any of its deeper details.  This allows self-reference without 
>> > introducing cycles.
>>
>> Ah, OK - specifically to break the cycle.
>>
>> So the ODR hash of the function "void f(S*)" doesn't hash the
>> implementation of S, (it uses AddDecl, not AddCXXRecordDecl)? But if
>> it were "void f(S)" it would hash S? What about a member function that
>> takes a parameter by value? ("struct S { void bar(S); }")
>
>
> The three functions AddCXXRecordDecl, AddFunctionDecl, and AddEnumDecl are 
> the entry points from outside to use the ODRHash and nothing inside ODRHash 
> will call these functions.  That means hashing "class S {};"  
> AddCXXRecordDecl is called with S.  Every other example, "void f(S)", "void 
> bar(S);", etc will be called into AddDecl.  The next question is probably, 
> how do you know if two functions "void f(S)" in two files refer to same class 
> S?  The answer is, ODRHash doesn't know and doesn't care.  But when Clang 
> imports both "void f(S)" functions, it will also import both S classes.  
> Since Clang checks, ODRHash doesn't need to.


Clang checks this by entering ODRHash separately (via AddCXXRecordDecl).

So it's sort of an inductive proof based on the assumptions of the ODR
- >"void f(S)" is ODR compliant if S is ODR compliant<, etc.

That makes sense, though still does mean potentially a lot of
redundant work in cases like this bug, but never circular work -
because anonymous types can't form cycles (because they have no name
to do that).

I think it still /might/ be worth using a type stack/hash so there's
never redundant work, even if it's non-circular work.

>>
>>
>> > I think it would be possible to add some checks in debug mode to catch 
>> > cycles.  I'm not sure it can detect redundant work as the function foo 
>> > example above shows that visiting the same types over multiple times is 
>> > expected.
>>
>> Both for efficiency and to avoid these cycles, it might be worthwhile
>> to consider a different way to resolve this issue
>>
>> The reason these ideas come to my mind is that DWARF has a type hash
>> that works in a different way to avoid cycles and redundant work.
>>
>> http://dwarfstd.org/doc/DWARF5.pdf - 7.32, Type Signature Computation.
>> It works by assigning every type a number when it's first encountered
>> (even before its contents are hashed), and if it's ever encountered
>> again, hash the number again rather than going back into hashing the
>> implementation.
>>
> Originally, ODR hashing did have a system similar to what DWARF had.  
> Relevant portions of 7.32 are 1, 4.a, and 4.b.  Basically, maintain a list of 
> Type's, when first visiting a Type, add it to the list and process it, and if 
> the Type is ever seen again, use the index number instead reprocessing.  
> Worked well, and then the AST had a small change in it where now we needed 
> two different Type's to hash to the same thing.  
> https://reviews.llvm.org/rL335853 ripped this out.  It's possible to replace 
> it, but it needs to be better than what we currently have.

Ah - I think I understand/see. The uniqueness of Type*s is based on
the profile hashing and it varies in interesting ways? That seems
vaguely disconcerting - Hmm, in r335853, could you walk me through how
the Type* DenseMap broke the ODR type hashing?

- Dave

>
>>
>> This way no type is hashed more than once, avoiding cycles and redundant 
>> work.
>>
>> >
>> >
>> >
>> > From: David Blaikie <dblai...@gmail.com>
>> > Date: Sat, May 4, 2019 at 9:06 AM
>> > To: Richard Trieu
>> > Cc: cfe-commits
>> >
>> >> Does the ODR hashing have some sort of cycle breaking infrastructure -
>> >> so that if the same type is seen more than once (eg: classes have
>> >> members that have pointers back to the outer class type, etc) they
>> >> don't cause indefinite cycles? Should that infrastructure have caught
>> >> these cases & avoided the redundant work?
>> >>
>> >> I'm curious to understand better how these things work/overlap/or don't.
>> >>
>> >> On Fri, May 3, 2019 at 9:20 PM Richard Trieu via cfe-commits
>> >> <cfe-commits@lists.llvm.org> wrote:
>> >> >
>> >> > Author: rtrieu
>> >> > Date: Fri May  3 21:22:33 2019
>> >> > New Revision: 359960
>> >> >
>> >> > URL: http://llvm.org/viewvc/llvm-project?rev=359960&view=rev
>> >> > Log:
>> >> > Reduce amount of work ODR hashing does.
>> >> >
>> >> > When a FunctionProtoType is in the original type in a DecayedType, the 
>> >> > decayed
>> >> > type is a PointerType which points back the original FunctionProtoType. 
>> >> >  The
>> >> > visitor for ODRHashing will attempt to process both Type's, doing 
>> >> > double work.
>> >> > By chaining together multiple DecayedType's and FunctionProtoType's, 
>> >> > this would
>> >> > result in 2^N Type's visited only N DecayedType's and N 
>> >> > FunctionProtoType's
>> >> > exsit.  Another bug where VisitDecayedType and VisitAdjustedType did
>> >> > redundant work doubled the work at each level, giving 4^N Type's 
>> >> > visited.  This
>> >> > patch removed the double work and detects when a FunctionProtoType 
>> >> > decays to
>> >> > itself to only check the Type once.  This lowers the exponential 
>> >> > runtime to
>> >> > linear runtime.  Fixes https://bugs.llvm.org/show_bug.cgi?id=41625
>> >> >
>> >> > Modified:
>> >> >     cfe/trunk/lib/AST/ODRHash.cpp
>> >> >     cfe/trunk/test/Modules/odr_hash.cpp
>> >> >
>> >> > Modified: cfe/trunk/lib/AST/ODRHash.cpp
>> >> > URL: 
>> >> > http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/AST/ODRHash.cpp?rev=359960&r1=359959&r2=359960&view=diff
>> >> > ==============================================================================
>> >> > --- cfe/trunk/lib/AST/ODRHash.cpp (original)
>> >> > +++ cfe/trunk/lib/AST/ODRHash.cpp Fri May  3 21:22:33 2019
>> >> > @@ -703,14 +703,36 @@ public:
>> >> >    void VisitType(const Type *T) {}
>> >> >
>> >> >    void VisitAdjustedType(const AdjustedType *T) {
>> >> > -    AddQualType(T->getOriginalType());
>> >> > -    AddQualType(T->getAdjustedType());
>> >> > +    QualType Original = T->getOriginalType();
>> >> > +    QualType Adjusted = T->getAdjustedType();
>> >> > +
>> >> > +    // The original type and pointee type can be the same, as in the 
>> >> > case of
>> >> > +    // function pointers decaying to themselves.  Set a bool and only 
>> >> > process
>> >> > +    // the type once, to prevent doubling the work.
>> >> > +    SplitQualType split = Adjusted.split();
>> >> > +    if (auto Pointer = dyn_cast<PointerType>(split.Ty)) {
>> >> > +      if (Pointer->getPointeeType() == Original) {
>> >> > +        Hash.AddBoolean(true);
>> >> > +        ID.AddInteger(split.Quals.getAsOpaqueValue());
>> >> > +        AddQualType(Original);
>> >> > +        VisitType(T);
>> >> > +        return;
>> >> > +      }
>> >> > +    }
>> >> > +
>> >> > +    // The original type and pointee type are different, such as in 
>> >> > the case
>> >> > +    // of a array decaying to an element pointer.  Set a bool to false 
>> >> > and
>> >> > +    // process both types.
>> >> > +    Hash.AddBoolean(false);
>> >> > +    AddQualType(Original);
>> >> > +    AddQualType(Adjusted);
>> >> > +
>> >> >      VisitType(T);
>> >> >    }
>> >> >
>> >> >    void VisitDecayedType(const DecayedType *T) {
>> >> > -    AddQualType(T->getDecayedType());
>> >> > -    AddQualType(T->getPointeeType());
>> >> > +    // getDecayedType and getPointeeType are derived from 
>> >> > getAdjustedType
>> >> > +    // and don't need to be separately processed.
>> >> >      VisitAdjustedType(T);
>> >> >    }
>> >> >
>> >> >
>> >> > Modified: cfe/trunk/test/Modules/odr_hash.cpp
>> >> > URL: 
>> >> > http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Modules/odr_hash.cpp?rev=359960&r1=359959&r2=359960&view=diff
>> >> > ==============================================================================
>> >> > --- cfe/trunk/test/Modules/odr_hash.cpp (original)
>> >> > +++ cfe/trunk/test/Modules/odr_hash.cpp Fri May  3 21:22:33 2019
>> >> > @@ -4587,6 +4587,43 @@ int num = bar();
>> >> >  #endif
>> >> >  }
>> >> >
>> >> > +namespace FunctionProtoTypeDecay {
>> >> > +#if defined(FIRST)
>> >> > +struct S1 {
>> >> > +  struct X {};
>> >> > +  using Y = X(X());
>> >> > +};
>> >> > +#elif defined(SECOND)
>> >> > +struct S1 {
>> >> > +  struct X {};
>> >> > +  using Y = X(X(X()));
>> >> > +};
>> >> > +#else
>> >> > +S1 s1;
>> >> > +// expected-error@first.h:* {{'FunctionProtoTypeDecay::S1::Y' from 
>> >> > module 'FirstModule' is not present in definition of 
>> >> > 'FunctionProtoTypeDecay::S1' in module 'SecondModule'}}
>> >> > +// expected-note@second.h:* {{declaration of 'Y' does not match}}
>> >> > +#endif
>> >> > +
>> >> > +#if defined(FIRST)
>> >> > +struct S2 {
>> >> > +  struct X {};
>> >> > +  using Y =
>> >> > +      X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(
>> >> > +      X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(
>> >> > +      X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(
>> >> > +      X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(X(
>> >> > +      ))))))))))))))))
>> >> > +      ))))))))))))))))
>> >> > +      ))))))))))))))))
>> >> > +      ))))))))))))))));
>> >> > +};
>> >> > +#elif defined(SECOND)
>> >> > +#else
>> >> > +S2 s2;
>> >> > +#endif
>> >> > +
>> >> > +}
>> >> > +
>> >> >  // Keep macros contained to one file.
>> >> >  #ifdef FIRST
>> >> >  #undef FIRST
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > cfe-commits mailing list
>> >> > cfe-commits@lists.llvm.org
>> >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Re: r359960 - Reduce amount of work ODR hashing does.

Reply via email to