keryell added inline comments.
================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:5542 + if (!getLangOpts().HIPStdPar) + ErrorUnsupported(E, "builtin function"); ---------------- AlexVlx wrote: > efriedma wrote: > > AlexVlx wrote: > > > efriedma wrote: > > > > This doesn't make sense; we can't just ignore bits of the source code. > > > > I guess this is related to "the decision on their validity is > > > > deferred", but I don't see how you expect this to work. > > > This is one of the weirder parts, so let's consider the following example: > > > > > > ```cpp > > > void foo() { __builtin_ia32_pause(); } > > > void bar() { __builtin_trap(); } > > > > > > void baz(const vector<int>& v) { > > > return for_each(par_unseq, cbegin(v), cend(v), [](auto&& x) { if (x > > > == 42) bar(); }); > > > } > > > ``` > > > > > > In the case above, what we'd offload to the accelerator, and ask the > > > target BE to lower, is the implementation of `for_each`, and `bar`, > > > because it is reachable from the latter. `foo` is not reachable by any > > > execution path on the accelerator side, however it includes a builtin > > > that is unsupported by the accelerator (unless said accelerator is x86, > > > which is not impossible, but not something we're dealing with at the > > > moment). If we were to actually error out early, in the FE, in these > > > cases, there's almost no appeal to what is being proposed, because > > > standard headers, as well as other libraries, are littered with various > > > target specific builtins that are not going to be supported. This all > > > builds on the core invariant of this model / extension / thingamabob, > > > which is that the algorithms, and only the algorithms, are targets for > > > offload. It thus follows that as long as code that is reachable from an > > > algorithm's implementation is safe, all is fine, but we cannot know this > > > in the FE / on an AST level, because we need the actual CFG. This part is > > > handled in LLVM in the `SelectAcceleratorCodePass` that's in the last > > > patch in this series. > > > > > > Now, you will quite correctly observe that there's nothing preventing an > > > user from calling `foo` in the callable they pass to an algorithm; they > > > might read the docs / appreciate that this won't work, but even there > > > they are not safe, because there via some opaque call chain they might > > > end up touching some unsupported builtin. My intuition here, which is > > > reflected above in letting builtins just flow through, is that such cases > > > are better served with a compile time error, which is what will obtain > > > once the target BE chokes trying to lower an unsupported builtin. It's > > > not going to be a beautiful error, and we could probably prettify it > > > somewhat if we were to check after we've done the accelerator code > > > selection pass, but it will happen at compile time. Another solution > > > would be to emit these as traps (poison + trap for value returning ones), > > > but I am concerned that it would lead to really fascinating debug > > > journeys. > > > > > > Having said this, if there's a better way to deal with these scenarios, > > > it would be rather nice. Similarly, if the above doesn't make sense, > > > please let me know. > > > > > Oh, I see; you "optimistically" compile everything assuming it might run on > > the accelerator, then run LLVM IR optimizations, then determine late which > > bits of code will actually run on the accelerator, which then prunes the > > code which shouldn't run. > > > > I'm not sure I really like this... would it be possible to infer which > > functions need to be run on the accelerator based on the AST? I mean, if > > your API takes a lambda expression that runs on the accelerator, you can > > mark the lambda's body as "must be emitted for GPU", then recursively mark > > all the functions referred to by the lambda. > > > > Emiting errors lazily from the backend means you get different diagnostics > > depending on the optimization level. > > > > If you do go with this codegen-based approach, it's not clear to me how you > > detect that a forbidden builtin was called; if you skip the error handling, > > you just get a literal "undef". > `I'm not sure I really like this...` - actually, I am not a big fan either, > however I think it's about the best one can do, given the constraints > (consume standard C++, no annotations on the user side etc.). Having tried a > few times in the past (and at least once in a different compiler), I don't > quite think this can be done on an AST level. It would add some fairly > awkward checking during template instantiation (no way to know earlier that a > `CallableFoo` was passed to an offloadable algorithm), and it's a bit > unwieldy to basically compute the CFG on the AST and mark reachable Callees > at that point. Ignoring those, the main reason for which we cannot do this is > that the interface is not constrained to only take lambdas, but callables in > general, and that includes pointers to function as well. We don't deal with > those today, but plan to, and there's a natural solution when operating on > IR, assuming closed / internalised Modules (which is the case at least for > AMDGPU at the moment). The final challenge pertains to the AST being per TU, > with no cross-TU visibility, whereas with IR you can either pre-link the BC > (implicitly or LTO) and then operate on the entire compilation. This is a > problem with cases where `foo` defined in TU0 is reachable from > `algorithm_bar_offloaded_impl` in TU1. So TL;DR, I think it would be more > complex to do this on the AST and would end up more brittle / less future > proof. > > In what regards how to do deferred diagnostics, it think it can be done like > this (I crossed streams in my prior reply when discussing this part, so it's > actually nonsense): instead of emitting undef here, we can emit a builtin > with the same signature, but with the name suffixed with e.g. > (`__stdpar_unsupported`) or something similar. Then, when doing the > reachability computation later, if we stumble upon a node in the CFG that > contains a builtin suffixed with `__stdpar_unsupported` we error out, and can > provide nice diagnostics since we'd have the call-chain handy. Thoughts? There is a lot of interesting design information in this discussion thread which will be lost forever after this is merged. Is there a way to keep a summary as a comment somewhere to help the future readers/maintainers/historians? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155850/new/ https://reviews.llvm.org/D155850 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits