tra added a comment.

In D110089#3021652 <https://reviews.llvm.org/D110089#3021652>, @jlebar wrote:

> Okay, I give up on the phab interface.  It's unreadable with all the existing
> comments and lint errors.

Yeah. Phabricator experience is not great.

> +// Put all functions into anonymous namespace so they have internal linkage.
>
> Say a little more?  Specifically, you want anon ns because this is device code
> and it has to work even without being linked.
>
> (Also, are you sure that plain `inline` doesn't do the right thing?  Like, we
> have lots of CUDA headers that are `inline`'ed without all being in an anon
> ns.)

We do want inlining, but the main purpose here is to avoid potential ODR 
violations with -fgpu-rdc, where multiple TUs may be compiled with different 
versions of this header. Because the hash may change between versions, we could 
end up with the same Tag types (and fetch functions) with the same names meaning 
different things in different TUs.
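
As a rough illustration of the concern (a sketch with hypothetical names, not 
the actual header contents):

  // Hypothetical sketch of the linkage issue; not the real header.
  // With -fgpu-rdc, TU1 and TU2 may be built against different versions of
  // this header. If __tex_fetch had external linkage, both TUs would emit the
  // same symbol with potentially different bodies, and the device linker
  // would silently keep just one of them. The anonymous namespace gives the
  // function internal linkage, so each TU keeps its own, matching copy.
  #include <cuda_runtime.h>

  namespace {
  __device__ float __tex_fetch(cudaTextureObject_t __obj, float __x) {
    return __x; // stand-in for the real, version-specific fetch code
  }
  } // anonymous namespace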

> +// NOTE: the perfect hashing scheme comes with inherent self-test. If the 
> hash
> +// function has a collision for any of the texture operations, the 
> compilation
> +// will fail due to an attempt to redefine a tag with the same value. If the
> +// header compiles, then the hash function is good enough for the job.
>
> I guess if it has a self-test then that's fine.  Though is this really better
> than a series of `if` statements with strcmp?

Yes, IMO the somewhat obfuscated metaprogramming wins on points here.

- It's fairly well-structured, even if the macros make it a bit of a pain to dig 
through.
- Assumptions about the perfect hash are minimal -- it's just a 1:1 
string->integer map. If that assumption is violated, we're guaranteed to get a 
compilation error when we instantiate the templates that map to the same value 
(roughly the pattern sketched after this list). I did verify that by changing 
the hash function.
- strcmp() would result in 100+ comparisons. That alone would be a pain to write 
manually. In my experience, having more than a handful of nearly-identical, but 
critically different, chunks of code makes the whole thing very error-prone. 
I tried that approach early on, before I figured out how to parameterize 
templates by a string.
- We'll also need to use it in a function template, so the code would get 
replicated across all the instances of the signatures we'll need to implement. 
While it's probably not a showstopper, it's still additional IR we'd have to 
deal with. Adding an incremental burden on all CUDA users is worse than an 
additional mental burden on whoever may need to read this code (most likely me).
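
For reference, the general shape of the scheme is roughly this (a sketch with 
made-up names and a made-up hash; the real header differs in the details):

  // Sketch of the self-test property only; names and hash are hypothetical.
  // __op_hash maps an operation name to an integer at compile time, and each
  // operation defines a specialization of __op_tag for its hash value. If the
  // hash ever maps two different names to the same value, the two
  // specializations collide and compilation fails -- i.e. a successful build
  // proves the hash is collision-free for this set of names.
  constexpr unsigned __op_hash(const char *__s) {
    unsigned __h = 0;
    while (*__s)
      __h = __h * 131u + (unsigned)*__s++; // hypothetical hash function
    return __h;
  }

  template <unsigned __H> struct __op_tag; // one specialization per operation

  #define __DEFINE_OP(__name) \
    template <> struct __op_tag<__op_hash(__name)> { /* op-specific code */ };

  __DEFINE_OP("tex1D")  // each use instantiates a distinct specialization;
  __DEFINE_OP("tex2D")  // a collision here would be a redefinition error.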

> I guess I am scared of this kind of thing because I did it once in ccache.  I 
> thought I was very clever and got
> a good speedup.  1 year later I found out I'd broken handling of __DATE__ and 
> __TIME__.  o.O

Being wary of easy-to-miss errors here, I literally did exhaustive testing of 
all variants (2972 of them) of the high-level API calls provided by the NVIDIA 
headers and verified that we end up generating identical instructions and 
parameters.
I will add that test to the test-suite, as it needs the actual CUDA headers.

In D110089#3021659 <https://reviews.llvm.org/D110089#3021659>, @jlebar wrote:

> Presumably as a separate commit we should add tests to the test_suite 
> repository to ensure that this at least still compiles with different 
> versions of CUDA?

That's the plan.  I've tested the patch manually down to CUDA-9. It will not 
work with CUDA-8 or older, as they have a completely different under-the-hood 
implementation in the CUDA headers. I'll add an include guard for the old CUDA 
versions.
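
Roughly along these lines (a sketch; the exact condition may end up different):

  // Sketch of the planned guard. CUDA_VERSION comes from cuda.h (e.g. 9000
  // for CUDA-9.0), so the new implementation would simply be skipped when the
  // installed headers are too old to support it.
  #if defined(__CUDA__) && defined(CUDA_VERSION) && CUDA_VERSION >= 9000
  // ... new texture intrinsics implementation ...
  #endif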


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110089/new/

https://reviews.llvm.org/D110089
