rjmccall added inline comments.
================ Comment at: clang/lib/Sema/Sema.cpp:1514 + void visitUsedDecl(SourceLocation Loc, Decl *D) { + if (auto *TD = dyn_cast<TranslationUnitDecl>(D)) { + for (auto *DD : TD->decls()) { ---------------- yaxunl wrote: > rjmccall wrote: > > bader wrote: > > > yaxunl wrote: > > > > rjmccall wrote: > > > > > yaxunl wrote: > > > > > > rjmccall wrote: > > > > > > > erichkeane wrote: > > > > > > > > rjmccall wrote: > > > > > > > > > erichkeane wrote: > > > > > > > > > > Note that when recommitting this (if you choose to), this > > > > > > > > > > needs to also handle NamespaceDecl. We're a downstream and > > > > > > > > > > discovered that this doesn't properly handle functions or > > > > > > > > > > records handled in a namespace. > > > > > > > > > > > > > > > > > > > > It can be implemented identically to TranslationUnitDecl. > > > > > > > > > Wait, what? We shouldn't be doing this for > > > > > > > > > TranslationUnitDecl either. I don't even know how we're > > > > > > > > > "using" a TranslationUnitDecl, but neither this case not the > > > > > > > > > case for `NamespaceDecl` should be recursively using every > > > > > > > > > declaration declared inside it. If there's a declaration in > > > > > > > > > a namespace that's being used, it should be getting visited > > > > > > > > > as part of the actual use of it. > > > > > > > > > > > > > > > > > > The logic for `RecordDecl` has the same problem. > > > > > > > > Despite the name, this seems to be more of a home-written ast > > > > > > > > walking class. The entry point is the 'translation unit' which > > > > > > > > seems to walk through everything in an attempt to find all the > > > > > > > > functions (including those that are 'marked' as used by an > > > > > > > > attribute). > > > > > > > > > > > > > > > > You'll see the FunctionDecl section makes this assumption as > > > > > > > > well (not necessarily that we got to a function via a call). > > > > > > > > IMO, this approach is strange, and we should register entry > > > > > > > > points in some manner (functions marked as emitted to the > > > > > > > > device in some fashion), then just follow its call-graph (via > > > > > > > > the clang::CallGraph?) to emit all of these functions. > > > > > > > > > > > > > > > > It seemed really odd to see this approach here, but it seemed > > > > > > > > well reviewed by the time I noticed it (via a downstream bug) > > > > > > > > so I figured I'd lost my chance to disagree with the approach. > > > > > > > > > > > > > > > > > > > > > > > Sure, but `visitUsedDecl` isn't the right place to be entering > > > > > > > the walk. `visitUsedDecl` is supposed to be the *callback* from > > > > > > > the walk. If they need to walk all the global declarations to > > > > > > > find kernels instead of tracking the kernels as they're > > > > > > > encountered (which would be a *much* better approach), it should > > > > > > > be done as a separate function. > > > > > > > > > > > > > > I just missed this in the review. > > > > > > The deferred diagnostics could be initiated by non-kernel functions > > > > > > or even host functions. > > > > > > > > > > > > Let's consider a device code library where no kernels are defined. > > > > > > A device function is emitted, which calls a host device function > > > > > > which has a deferred diagnostic. All device functions that are > > > > > > emitted need to be checked. > > > > > > > > > > > > Same with host functions that are emitted, which may call a host > > > > > > device function which has deferred diagnostic. > > > > > > > > > > > > Also not just function calls need to be checked. A function address > > > > > > may be taken then called through function pointer. Therefore any > > > > > > reference to a function needs to be followed. > > > > > > > > > > > > In the case of OpenMP, the initialization of a global function > > > > > > pointer which refers a function may trigger a deferred diangostic. > > > > > > There are tests for that. > > > > > Right, I get that emitting deferred diagnostics for a declaration D > > > > > needs to trigger any deferred diagnostics in declarations used by D, > > > > > recursively. You essentially have a graph of lazily-emitted > > > > > declarations (which may or may not have deferred diagnostics) and a > > > > > number of eagerly-emitted "root" declarations with use-edges leading > > > > > into that graph. Any declaration that's reachable from a root will > > > > > need to be emitted and so needs to have any deferred diagnostics > > > > > emitted as well. My question is why you're finding these roots with > > > > > a retroactive walk of the entire translation unit instead of either > > > > > building a list of roots as you go or (better yet) building a list of > > > > > lazily-emitted declarations that are used by those roots. You can > > > > > unambiguously identify at the point of declaration whether an entity > > > > > will be eagerly or lazily emitted, right? If you just store those > > > > > initial edges into the lazily-emitted declarations graph and then > > > > > initiate the recursive walk from them at the end of the translation > > > > > unit, you'll only end up walking declarations that are actually > > > > > relevant to your compilation, so you'll have much better locality and > > > > > (if this matters to you) you'll naturally work a lot better with PCH > > > > > and modules. > > > > I will try the approach you suggested. Basically I will record the > > > > emitted functions and variables during parsing and use them as starting > > > > point for the final traversal. > > > > > > > > This should work for CUDA/HIP. However it may be tricky for OpenMP > > > > since the emission of some entities depending on pragmas. Still it may > > > > be doable. If I encounter difficulty I will come back for discussion. > > > > > > > > I will post the change for review. > > > > > > > > Thanks. > > > FYI: SYCL is also using deferred diagnostics engine to emit device side > > > diagnostics, although this part hasn't been up-streamed yet, but we are > > > tracking changes in this area. > > > SYCL support implementation should be quite similar to CUDA/HIP. > > Okay, thank you. Do you still need all the cases in here for records, > > templates, and so on? It looks to me like you should always end up here > > with exactly the variables and functions that are being used, and you > > should never need to make special efforts to e.g. visit all the > > specializations of a template or visit all the methods of a class. > I can remove handling of templates and records. However I have to keep the > handling of CapturedDecl. It is generated from code like > > ``` > void t1(int r) {} > > int main() { > #pragma omp target > { > t1(0); > } > return 0; > } > > ``` > And it is like a function decl embeded in function main, e.g. > > > ``` > -FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()' > `-CompoundStmt 0x873c3f8 <col:12, line:15:1> > |-OMPTargetDirective 0x873c3a0 <line:9:1, col:19> > | `-CapturedStmt 0x873c378 <line:10:3, line:13:3> > | `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow > | |-CapturedStmt 0x873c350 <line:10:3, line:13:3> > | | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow > | | |-CompoundStmt 0x873c338 <line:10:3, line:13:3> > | | | `-CallExpr 0x873c310 <line:12:5, col:9> 'void' > | | | |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' > <FunctionToPointerDecay> > | | | | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function > 0x86f7b18 't1' 'void (int)' > | | | `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0 > | | `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit > __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const > restrict' > | |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit __forceinline > | |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid. > 'const int' > | |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. > 'const int *const restrict' > | |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. > 'void *const restrict' > | |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. > 'void (*const restrict)(void *const restrict, ...)' > | |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 'void > *const' > | |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context > 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict' > | |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition > | | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit > | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow > | |-CompoundStmt 0x873c338 <line:10:3, line:13:3> > | | `-CallExpr 0x873c310 <line:12:5, col:9> 'void' > | | |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' > <FunctionToPointerDecay> > | | | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function > 0x86f7b18 't1' 'void (int)' > | | `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0 > | `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context > 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict' > `-ReturnStmt 0x873c3e8 <line:14:3, col:10> > `-IntegerLiteral 0x873c3c8 <col:10> 'int' 0 > > ``` > If I do not handle it, I will not be able to reach the call of t1(). Sure, although I wonder if it might be more reasonable to just make UsedDeclVisitor walk into `CapturedDecl`s (and `BlockDecl`s) when it sees the corresponding statements/expressions. Unlike other declaration references, those are never "cross-references"; they're just local code tied to a declaration for representational reasons. ================ Comment at: clang/lib/Sema/Sema.cpp:1540 + } else if (auto *VD = dyn_cast<VarDecl>(D)) { + if (auto *Init = VD->getInit()) { + auto DevTy = OMPDeclareTargetDeclAttr::getDeviceType(VD); ---------------- yaxunl wrote: > rjmccall wrote: > > Can there also be deferred diagnostics associated with this initializer? > Yes. A global variable may be marked by omp declare target directive to be > emitted on device. If the global var is initialized with the address of a > function, the function will be emitted on device. If the device function > calls a host device function which contains a deferred diag, that diag will > be emitted. This can only be known after everything is parsed. I meant directly with the initializer. Is there a way today to defer a diagnostic that you would emit while processing an initializer expression? If so, this needs to trigger that. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D70172/new/ https://reviews.llvm.org/D70172 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits