ABataev added a comment. In D71241#1782157 <https://reviews.llvm.org/D71241#1782157>, @jdoerfert wrote:
> In D71241#1781955 <https://reviews.llvm.org/D71241#1781955>, @ABataev wrote: > > > In D71241#1780715 <https://reviews.llvm.org/D71241#1780715>, @jdoerfert > > wrote: > > > > > In D71241#1779097 <https://reviews.llvm.org/D71241#1779097>, @ABataev > > > wrote: > > > > > > > In D71241#1778963 <https://reviews.llvm.org/D71241#1778963>, @jdoerfert > > > > wrote: > > > > > > > > > In D71241#1778736 <https://reviews.llvm.org/D71241#1778736>, @ABataev > > > > > wrote: > > > > > > > > > > > In D71241#1778717 <https://reviews.llvm.org/D71241#1778717>, > > > > > > @jdoerfert wrote: > > > > > > > > > > > > > >> There is no evidence that this is more complicated. By all > > > > > > > >> measures, this is less complicated (see also below). It is > > > > > > > >> also actually doing the right thing when it comes to code > > > > > > > >> emission. Take https://godbolt.org/z/sJiP3B for example. The > > > > > > > >> calls are wrong and the definition of base is missing. > > > > > > > > > > > > > > > > How did you measure it? I have a completely different opinion. > > > > > > > > Also, tried to reproduce the problem locally, could not > > > > > > > > reproduce. It seems to me, the output of the test misses > > > > > > > > several important things. You can check it yourself. > > > > > > > > `tgt_target_teams()` call uses `@.offload_maptypes` global var > > > > > > > > but it is not defined. > > > > > > > > > > > > > > Here is the link with the globals not hidden: > > > > > > > https://godbolt.org/z/5etB5S > > > > > > > The base function is called both times but should not be called > > > > > > > at all. What is your local output and why does it differ? > > > > > > > > > > > > > > > > > > On the host `base` is an alias for the `hst` function. On the > > > > > > device `base` has the body of `dev` function because NVPTX does nit > > > > > > support function aliases (10+ suppprts it, but LLVM does not > > > > > > support it yet). Try to change the bodies of dev and hst and you > > > > > > will see. > > > > > > > > > > > > I tried to keep original function names to improve debugging and > > > > > > make users less wonder why instead of `base` something else is > > > > > > called. > > > > > > > > > > > > > > > How does that work with linking? Another translation unit can call > > > > > both the base and hst/dev function, right? I mean they both need to > > > > > be present. > > > > > > > > > > > > If hst or dev are needed, they are emitted too, independently. On the > > > > host, the variamt function is emitted and base function is set to be an > > > > alias of the variant function. On the device we just inherit the body > > > > of the variant function, but this variant function also can be emitted, > > > > if used independently. > > > > > > > > > This is confusing: > > > > > > 1. Especially for debugging we should do the same on host and device. > > > > > > We do the same on the host and on the device. The user will see that he is > > the context of originally called function `base`. It is the same behavior > > just like the behavior of the GNU gcc `alias` attribute. > > > No we do not. You said it yourself just in the last comment: one time you > emit an alias (host), one time you "emit code into the base function" > (device). > That is not the same thing, especially if you look at debug information, > stack frames, ... Yes, formally it is not the same. But only because NVPTX does not support function aliasing. > > >>> 2. The device function now exists twice, that is bad. >> >> It is not so bad, actually. In your solution, you have almost the same >> situation. Plus, it is not a big problem taking into account that most of >> the function calls will be inlined and function will be eliminated. As soon >> as we have support for global aliases in LLVM/NVPTX, we can improve it. > > It is bad to duplicate code for no reason. The SemaOverload solution has not > "almost the same situation", as it does not replace the base function body > with the variant function body. Again, just a workaround for NVPTX problem with function aliases. > > >>> 3. How is this supposed to work with type corrections? I mean the variant >>> needs to be compatible but not necessarily of the same type, right? >>> https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it >>> crashes for me afterwards). >> >> Actually, these functions are not compatible. We're missing some extra >> checks in Sema, will add them later. > > That is not necessarily clear to me. The standard does not say the types need > to be equal but compatible. Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function `base` as `int base(float)` and then try to redeclare it as `int base(int)`. You will get the error because the types are not compatible. For example, `int ()` and `int(int)` are compatible though are not equal. > > >>> 4. On the host the expression `&base == &hst` will evaluate to true with >>> the alias. >> >> And what is the problem with this? I believe, with your solution the result >> will be the same because you will just replace `base` with the `hst` upon >> overloading resolution. > > But I can do it for calls only. Aliases are not as fine granular. That is > also why the current code cannot handle the `construct` trait but an overload > solution can. I see no problem here. > > >>> In D71241#1779779 <https://reviews.llvm.org/D71241#1779779>, @ABataev wrote: >>> >>>> Here is the example that does not work with the proposed solution but >>>> works with the existing one: >>>> >>>> static void cpu() { asm("nop"); } >>>> >>>> #pragma omp declare variant(cpu) match(device = {kind(cpu)}) >>>> static __attribute__((used)) void wrong_asm() { >>>> asm ("xxx"); >>>> } >>>> >>>> >>>> The existing solution has some problems with the delayed error messages >>>> too, but they are very easy to fix. >>> >>> >>> I don't understand this example. What is the expected outcome here (I get >>> the error below from ToT clang). Why is that not achievable by a >>> SemaOverload solution? >>> `asm.c:4:35: error: function with '#pragma omp declare variant' must have >>> a prototype` >> >> Try to compile as C++: >> >> clang++ -с -fopenmp repro.cpp > > You did not answer any of the questions here. If you ignore my comments this > is not working. > What is the expected outcome here? Why is that not achievable by a > SemaOverload solution? Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71241/new/ https://reviews.llvm.org/D71241 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits