[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

Artem Belevich via Phabricator via cfe-commits Tue, 22 May 2018 16:09:48 -0700

tra added a comment.

In https://reviews.llvm.org/D47070#1106018, @echristo wrote:


> > As a short-term fix we can disable feature-to-function attribute 
> > propagation for NVPTX until we fix it.
> > 
> > @echristo -- any other suggestions?
>
> This is some of what I was talking about when I was mentioning how function 
> attributes and the targets work. Ideally you'll have a compatible set of 
> features and it won't really cause an issue. The idea is that if you're 
> compiling for a minimum ptx feature of X, then any "compatible" set of ptx 
> should be able to inline into your code. I think you do want the features to 
> propagate in general, just specific use cases may not care one way or another 
> - that said, for those use cases you're probably just compiling everything 
> with the same feature anyhow.


The thing is that with NVPTX you cannot have incompatible functions in the 
PTX, period. PTXAS will just throw syntax errors at you. In that regard PTX is 
very different from Intel, where the same binary can contain functions with 
code for different x86 variants. For PTX, sm_50 and sm_60 mean entirely 
different GPUs with entirely different instruction sets/encoding. The PTX 
version is closer to a language dialect: you cannot use anything from PTX 4.0 
if your file says it's PTX 3.0. It's sort of like how you can't use C++17 
features when you're compiling in C++98 mode. Bottom line is that features and 
target-cpu do not make much sense for NVPTX. Everything we generate in a TU 
must satisfy the minimum PTX version and minimum GPU variant, and it all will 
be compiled for and run on only one specific GPU. There's no mixing and 
matching.
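
To make the "one PTX version and one GPU variant per TU" constraint concrete, 
here is a minimal sketch. It is only a toy model: the `Requirement` class, the 
integer encoding (60 for PTX 6.0 or sm_60) and both helper functions are 
hypothetical names made up for illustration, not anything from clang or LLVM. 
It also assumes a function is acceptable whenever it needs nothing newer than 
what the module declares, which simplifies the real rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical model: minimum PTX version and GPU variant, as integers."""
    ptx: int  # e.g. 60 stands for PTX 6.0
    sm: int   # e.g. 60 stands for sm_60


def fits_module(module: Requirement, fn: Requirement) -> bool:
    # Simplifying assumption: a function is acceptable only if it needs
    # nothing newer than what the module as a whole declares.
    return fn.ptx <= module.ptx and fn.sm <= module.sm


def module_requirement(fns):
    # The whole TU has to declare the maximum of what its functions need;
    # there is no per-function override in this model.
    return Requirement(ptx=max(f.ptx for f in fns), sm=max(f.sm for f in fns))


module = Requirement(ptx=30, sm=50)
print(fits_module(module, Requirement(ptx=30, sm=50)))  # same requirements
print(fits_module(module, Requirement(ptx=40, sm=50)))  # needs a newer PTX
```

In this model a single function that needs a newer PTX version forces the 
requirement of the entire module upward, which is the "no mixing and matching" 
point above.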

The question is -- what's the best way to make things work as they were before 
I broke them?
@Hahnfeld's idea of ignoring features and target-cpu would get us there, but 
that may be a never-ending source of surprises if/when something else decides 
to pay attention to those attributes.
I think the best way to tackle that would be to:
a) figure out how to make builtins available (or not) on the clang side, and
b) make target-cpu and target-features attributes explicitly unsupported on 
NVPTX, as we cannot provide the functionality those attributes imply.

> I guess, ultimately, I'm not seeing what the concern here is for how features 
> are working or not working for the target so it's harder to help. What is the 
> problem you're running into, or can you try a different way of explaining it 
> to me? :)

Here's my understanding of what happens: 
We've started adding target-features and target-cpu to everything clang 
generates. 
We also need to link with libdevice (or IR generated by clang) which has 
functions w/o those attributes. Or we need to link with IR produced by clang 
which used a different CUDA SDK and thus a different PTX version in 
target-features.
Due to the attribute mismatch we are failing to inline some of the functions, 
and that hurts performance.
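
A minimal sketch of that mismatch, assuming the inliner's compatibility check 
requires the caller's and callee's "target-cpu" and "target-features" strings 
to match exactly. The `inline_compatible` helper and the attribute values 
below are illustrative stand-ins, not LLVM code or real compiler output.

```python
def inline_compatible(caller: dict, callee: dict) -> bool:
    # Assumed rule: both attributes must be identical on caller and callee.
    # A missing attribute is modeled as None, so "absent" != "present".
    keys = ("target-cpu", "target-features")
    return all(caller.get(k) == callee.get(k) for k in keys)


# Illustrative attribute sets (values are assumptions for the example).
kernel = {"target-cpu": "sm_60", "target-features": "+ptx60"}
libdevice_fn = {}  # function with no target attributes at all
old_sdk_fn = {"target-cpu": "sm_60", "target-features": "+ptx50"}

for name, callee in [("libdevice", libdevice_fn), ("older SDK", old_sdk_fn)]:
    print(name, "inlinable into kernel:", inline_compatible(kernel, callee))
```

Under this assumption both callees are rejected: the first because it carries 
no attributes, the second because its PTX version feature differs, even though 
neither difference matters once everything is compiled for one GPU.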


Repository:
  rC Clang

https://reviews.llvm.org/D47070



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
