yaxunl marked 4 inline comments as done.
yaxunl added inline comments.

================
Comment at: clang/lib/Sema/Sema.cpp:1514
+  void visitUsedDecl(SourceLocation Loc, Decl *D) {
+    if (auto *TD = dyn_cast<TranslationUnitDecl>(D)) {
+      for (auto *DD : TD->decls()) {
----------------
rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > bader wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > yaxunl wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > erichkeane wrote:
> > > > > > > > > rjmccall wrote:
> > > > > > > > > > erichkeane wrote:
> > > > > > > > > > > Note that when recommitting this (if you choose to), this 
> > > > > > > > > > > needs to also handle NamespaceDecl.  We're a downstream 
> > > > > > > > > > > and discovered that this doesn't properly handle 
> > > > > > > > > > > functions or records handled in a namespace.
> > > > > > > > > > > 
> > > > > > > > > > > It can be implemented identically to TranslationUnitDecl.
> > > > > > > > > > Wait, what?  We shouldn't be doing this for 
> > > > > > > > > > TranslationUnitDecl either.   I don't even know how we're 
> > > > > > > > > > "using" a TranslationUnitDecl, but neither this case not 
> > > > > > > > > > the case for `NamespaceDecl` should be recursively using 
> > > > > > > > > > every declaration declared inside it.  If there's a 
> > > > > > > > > > declaration in a namespace that's being used, it should be 
> > > > > > > > > > getting visited as part of the actual use of it.
> > > > > > > > > > 
> > > > > > > > > > The logic for `RecordDecl` has the same problem.  
> > > > > > > > > Despite the name, this seems to be more of a home-written ast 
> > > > > > > > > walking class.  The entry point is the 'translation unit' 
> > > > > > > > > which seems to walk through everything in an attempt to find 
> > > > > > > > > all the functions (including those that are 'marked' as used 
> > > > > > > > > by an attribute).
> > > > > > > > > 
> > > > > > > > > You'll see the FunctionDecl section makes this assumption as 
> > > > > > > > > well (not necessarily that we got to a function via a call). 
> > > > > > > > > IMO, this approach is strange, and we should register entry 
> > > > > > > > > points in some manner (functions marked as emitted to the 
> > > > > > > > > device in some fashion), then just follow its call-graph (via 
> > > > > > > > > the clang::CallGraph?) to emit all of these functions.
> > > > > > > > > 
> > > > > > > > > It seemed really odd to see this approach here, but it seemed 
> > > > > > > > > well reviewed by the time I noticed it (via a downstream bug) 
> > > > > > > > > so I figured I'd lost my chance to disagree with the approach.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Sure, but `visitUsedDecl` isn't the right place to be entering 
> > > > > > > > the walk.  `visitUsedDecl` is supposed to be the *callback* 
> > > > > > > > from the walk.  If they need to walk all the global 
> > > > > > > > declarations to find kernels instead of tracking the kernels as 
> > > > > > > > they're encountered (which would be a *much* better approach), 
> > > > > > > > it should be done as a separate function.
> > > > > > > > 
> > > > > > > > I just missed this in the review.
> > > > > > > The deferred diagnostics could be initiated by non-kernel 
> > > > > > > functions or even host functions.
> > > > > > > 
> > > > > > > Let's consider a device code library where no kernels are 
> > > > > > > defined. A device function is emitted, which calls a host device 
> > > > > > > function which has a deferred diagnostic. All device functions 
> > > > > > > that are emitted need to be checked.
> > > > > > > 
> > > > > > > Same with host functions that are emitted, which may call a host 
> > > > > > > device function which has deferred diagnostic.
> > > > > > > 
> > > > > > > Also not just function calls need to be checked. A function 
> > > > > > > address may be taken then called through function pointer. 
> > > > > > > Therefore any reference to a function needs to be followed.
> > > > > > > 
> > > > > > > In the case of OpenMP, the initialization of a global function 
> > > > > > > pointer which refers a function may trigger a deferred 
> > > > > > > diangostic. There are tests for that.
> > > > > > Right, I get that emitting deferred diagnostics for a declaration D 
> > > > > > needs to trigger any deferred diagnostics in declarations used by 
> > > > > > D, recursively.  You essentially have a graph of lazily-emitted 
> > > > > > declarations (which may or may not have deferred diagnostics) and a 
> > > > > > number of eagerly-emitted "root" declarations with use-edges 
> > > > > > leading into that graph.  Any declaration that's reachable from a 
> > > > > > root will need to be emitted and so needs to have any deferred 
> > > > > > diagnostics emitted as well.  My question is why you're finding 
> > > > > > these roots with a retroactive walk of the entire translation unit 
> > > > > > instead of either building a list of roots as you go or (better 
> > > > > > yet) building a list of lazily-emitted declarations that are used 
> > > > > > by those roots.  You can unambiguously identify at the point of 
> > > > > > declaration whether an entity will be eagerly or lazily emitted, 
> > > > > > right?  If you just store those initial edges into the 
> > > > > > lazily-emitted declarations graph and then initiate the recursive 
> > > > > > walk from them at the end of the translation unit, you'll only end 
> > > > > > up walking declarations that are actually relevant to your 
> > > > > > compilation, so you'll have much better locality and (if this 
> > > > > > matters to you) you'll naturally work a lot better with PCH and 
> > > > > > modules.
> > > > > I will try the approach you suggested. Basically I will record the 
> > > > > emitted functions and variables during parsing and use them as 
> > > > > starting point for the final traversal.
> > > > > 
> > > > > This should work for CUDA/HIP. However it may be tricky for OpenMP 
> > > > > since the emission of some entities depending on pragmas. Still it 
> > > > > may be doable. If I encounter difficulty I will come back for 
> > > > > discussion.
> > > > > 
> > > > > I will post the change for review.
> > > > > 
> > > > > Thanks.
> > > > FYI: SYCL is also using deferred diagnostics engine to emit device side 
> > > > diagnostics, although this part hasn't been up-streamed yet, but we are 
> > > > tracking changes in this area.
> > > > SYCL support implementation should be quite similar to CUDA/HIP.
> > > Okay, thank you.  Do you still need all the cases in here for records, 
> > > templates, and so on?  It looks to me like you should always end up here 
> > > with exactly the variables and functions that are being used, and you 
> > > should never need to make special efforts to e.g. visit all the 
> > > specializations of a template or visit all the methods of a class.
> > I can remove handling of templates and records. However I have to keep the 
> > handling of CapturedDecl. It is generated from code like
> > 
> > ```
> > void t1(int r) {}
> > 
> > int main() {
> > #pragma omp target
> >   {
> >     t1(0);
> >   }
> >   return 0;
> > }
> > 
> > ```
> > And it is like a function decl embeded in function main, e.g.
> > 
> > 
> > ```
> > -FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()'
> >   `-CompoundStmt 0x873c3f8 <col:12, line:15:1>
> >     |-OMPTargetDirective 0x873c3a0 <line:9:1, col:19>
> >     | `-CapturedStmt 0x873c378 <line:10:3, line:13:3>
> >     |   `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow
> >     |     |-CapturedStmt 0x873c350 <line:10:3, line:13:3>
> >     |     | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
> >     |     |   |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
> >     |     |   | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
> >     |     |   |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' 
> > <FunctionToPointerDecay>
> >     |     |   |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 
> > 0x86f7b18 't1' 'void (int)'
> >     |     |   |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
> >     |     |   `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit 
> > __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const 
> > restrict'
> >     |     |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit 
> > __forceinline
> >     |     |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid. 
> > 'const int'
> >     |     |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. 
> > 'const int *const restrict'
> >     |     |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. 
> > 'void *const restrict'
> >     |     |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. 
> > 'void (*const restrict)(void *const restrict, ...)'
> >     |     |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 
> > 'void *const'
> >     |     |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context 
> > 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
> >     |     |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition
> >     |     | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit
> >     |     `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
> >     |       |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
> >     |       | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
> >     |       |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' 
> > <FunctionToPointerDecay>
> >     |       |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 
> > 0x86f7b18 't1' 'void (int)'
> >     |       |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
> >     |       `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit 
> > __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const 
> > restrict'
> >     `-ReturnStmt 0x873c3e8 <line:14:3, col:10>
> >       `-IntegerLiteral 0x873c3c8 <col:10> 'int' 0
> > 
> > ```
> > If I do not handle it, I will not be able to reach the call of t1().
> Sure, although I wonder if it might be more reasonable to just make 
> UsedDeclVisitor walk into `CapturedDecl`s (and `BlockDecl`s) when it sees the 
> corresponding statements/expressions.  Unlike other declaration references, 
> those are never "cross-references"; they're just local code tied to a 
> declaration for representational reasons.
done


================
Comment at: clang/lib/Sema/Sema.cpp:1540
+    } else if (auto *VD = dyn_cast<VarDecl>(D)) {
+      if (auto *Init = VD->getInit()) {
+        auto DevTy = OMPDeclareTargetDeclAttr::getDeviceType(VD);
----------------
rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > Can there also be deferred diagnostics associated with this initializer?
> > Yes. A global variable may be marked by omp declare target directive to be 
> > emitted on device. If the global var is initialized with the address of a 
> > function, the function will be emitted on device. If the device function 
> > calls a host device function which contains a deferred diag, that diag will 
> > be emitted. This can only be known after everything is parsed.
> I meant directly with the initializer.  Is there a way today to defer a 
> diagnostic that you would emit while processing an initializer expression?  
> If so, this needs to trigger that.
I don't think the initializer itself (without a target declare directive) will 
cause a deferred diagnostic since it does not cause change of emission states 
of functions. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to