[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-11 Thread David Majnemer via Phabricator via cfe-commits
majnemer added a comment.

In D77777#1975618, @hliao wrote:

> In D77777#1975406, @tra wrote:
>
> > In D77777#1975178, @hliao wrote:
> >
> > > The 1st argument in `llvm.nvvm.texsurf.handle.internal` or the 2nd one in
> > > `llvm.nvvm.texsurf.handle` must be kept as an immediate or constant
> > > value, i.e. that global variable. However, optimizations will find common
> > > code in the following
> > >
> > >   if (cond) {
> > >     %hnd = texsurf.handle.internal(@tex1);
> > >   } else {
> > >     %hnd = texsurf.handle.internal(@tex2);
> > >   }
> > >   = use(%hnd)
> > >
> > >
> > > and hoist or sink it into
> > >
> > >   if (cond) {
> > >     %ptr = @tex1;
> > >   } else {
> > >     %ptr = @tex2;
> > >   }
> > >   %hnd = texsurf.handle.internal(%ptr);
> > >   = use(%hnd)
> > >
> > >
> > > The backend cannot handle a non-immediate operand in `texsurf.handle`. A
> > > similar thing happens to `read.register`, as it also assumes its
> > > argument is always an immediate value.
> >
> >
> > I wonder if we can use `token` types to represent the handle? 
> > https://reviews.llvm.org/D11861
> >  @majnemer -- would this use case be suitable for the `token` type?
>
>
> If we still could make PHI over token, it cannot serve this purpose. Check
> `llvm::canReplaceOperandWithVariable` for details.


It is not possible to PHI a token value. Token values disable the call to 
canReplaceOperandWithVariable.
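
To make the rule concrete, here is a minimal standalone sketch. It is a toy model and not LLVM's code: `TypeKind`, `canAppearInPhi` and `canMergeCalls` are hypothetical names used only for illustration. The assumption it encodes is that merging two calls which differ in one operand requires a PHI over that operand, and LLVM IR does not allow PHI nodes over token or metadata values.

```cpp
#include <vector>

// Toy stand-in for llvm::Type; not the real class.
enum class TypeKind { Integer, Pointer, Token, Metadata };

// LLVM IR forbids PHI nodes of token type, and metadata values cannot be
// PHI operands either.
bool canAppearInPhi(TypeKind K) {
  return K != TypeKind::Token && K != TypeKind::Metadata;
}

// Hypothetical helper: two calls that differ in operand `Idx` can only be
// merged into a single call if that operand can be replaced by a PHI.
bool canMergeCalls(const std::vector<TypeKind> &OperandTypes, unsigned Idx) {
  return Idx < OperandTypes.size() && canAppearInPhi(OperandTypes[Idx]);
}
```

In this model a call whose differing operand is a plain pointer can be merged, while one whose differing operand is a token or metadata value cannot.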


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77777/new/

https://reviews.llvm.org/D77777



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-11 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D77777#1975628, @hliao wrote:

> In D77777#1975440, @tra wrote:
>
> > Also, if I read PTX docs correctly, it should be OK to pass texture handle 
> > address via an intermediate variable:
> >  
> > https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types
> >
> > > Creating pointers to opaque variables using mov, e.g., mov.u64 reg, 
> > > opaque_var;. The resulting pointer may be stored to and loaded from 
> > > memory, passed as a parameter to functions, and de-referenced by texture 
> > > and surface load, store, and query instructions
> >
> > We may not need the tokens and should be able to use regular pointer.
>
>
> That handle is the output of the `texsurf.handle` intrinsic instead of its
> input. Internally within the NVPTX backend, it needs to keep track of which
> global variable needs to be a `texref` or `surfref` and requires that the
> operand of `texsurf.handle` be a global variable. Check
> `NVPTXReplaceImageHandles.cpp` line 167 - 175.


That's the point I'm trying to make -- existing code may not be the best way to
implement this and should be improved. If we are serious about supporting
textures & surfaces, then it may be worth making it work properly, as opposed
to adding more hacks to get the old and until-now largely unused bits of LLVM
to do what we want.

We could do something like this:

- class instances with the texref attribute are lowered in a way that produces 
a global handle. It could be a handle-only, or the handle may be produced in 
addition to the object itself.
- intrinsics accept texref pointers and follow standard LLVM rules/assumptions.
- => code is subject to regular LLVM optimizations
- => no need to have special passes to tweak IR just so. At worst, we may keep
something similar to NVPTXReplaceImageHandles which would replace object
references with handle references. We may not even need to do that. As far as
LLVM is concerned, a texture handle is just a pointer, and the objects with the
texref attribute are lowered as a .texref global in PTX. It's the user's
responsibility to pass the right pointer to the intrinsic.

On a side note, the lowering of texture/surface instructions and intrinsics 
could use a major overhaul, too. It's currently excessively redundant and could 
be reduced to a much more concise tablegen-driven implementation.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao added a comment.

In D77777#1975440, @tra wrote:

> Also, if I read PTX docs correctly, it should be OK to pass texture handle 
> address via an intermediate variable:
>  
> https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types
>
> > Creating pointers to opaque variables using mov, e.g., mov.u64 reg, 
> > opaque_var;. The resulting pointer may be stored to and loaded from memory, 
> > passed as a parameter to functions, and de-referenced by texture and 
> > surface load, store, and query instructions
>
> We may not need the tokens and should be able to use regular pointer.


That handle is the output of the `texsurf.handle` intrinsic instead of its
input. Internally within the NVPTX backend, it needs to keep track of which
global variable needs to be a `texref` or `surfref` and requires that the
operand of `texsurf.handle` be a global variable. Check
`NVPTXReplaceImageHandles.cpp` line 167 - 175.
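
A rough standalone sketch of that bookkeeping, under stated assumptions: this is a toy model and not the code in `NVPTXReplaceImageHandles.cpp`. `HandleOperand` and `collectImageGlobals` are hypothetical names, and a non-global operand is modeled as an empty `std::optional`.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy operand of a handle call: holds a name only when the operand is a
// global variable; empty models any other kind of value (e.g. a PHI).
using HandleOperand = std::optional<std::string>;

// Collects the globals referenced by handle calls, i.e. the ones that would
// have to be emitted as texref/surfref. Returns std::nullopt as soon as one
// operand is not a global, mirroring the requirement described above.
std::optional<std::set<std::string>>
collectImageGlobals(const std::vector<HandleOperand> &Calls) {
  std::set<std::string> Globals;
  for (const HandleOperand &Op : Calls) {
    if (!Op)
      return std::nullopt;
    Globals.insert(*Op);
  }
  return Globals;
}
```

The point of the sketch is only that the tracking step has nothing to record once the operand stops being a named global.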




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao added a comment.

In D77777#1975406, @tra wrote:

> In D77777#1975178, @hliao wrote:
>
> > The 1st argument in `llvm.nvvm.texsurf.handle.internal` or the 2nd one in
> > `llvm.nvvm.texsurf.handle` must be kept as an immediate or constant value,
> > i.e. that global variable. However, optimizations will find common code in
> > the following
> >
> >   if (cond) {
> >     %hnd = texsurf.handle.internal(@tex1);
> >   } else {
> >     %hnd = texsurf.handle.internal(@tex2);
> >   }
> >   = use(%hnd)
> >
> >
> > and hoist or sink it into
> >
> >   if (cond) {
> >     %ptr = @tex1;
> >   } else {
> >     %ptr = @tex2;
> >   }
> >   %hnd = texsurf.handle.internal(%ptr);
> >   = use(%hnd)
> >
> >
> > The backend cannot handle a non-immediate operand in `texsurf.handle`. A
> > similar thing happens to `read.register`, as it also assumes its
> > argument is always an immediate value.
>
>
> I wonder if we can use `token` types to represent the handle? 
> https://reviews.llvm.org/D11861
>  @majnemer -- would this use case be suitable for the `token` type?


If we still could make PHI over token, it cannot serve this purpose. Check
`llvm::canReplaceOperandWithVariable` for details.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

Also, if I read PTX docs correctly, it should be OK to pass texture handle 
address via an intermediate variable:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-sampler-and-surface-types

> Creating pointers to opaque variables using mov, e.g., mov.u64 reg, 
> opaque_var;. The resulting pointer may be stored to and loaded from memory, 
> passed as a parameter to functions, and de-referenced by texture and surface 
> load, store, and query instructions

We may not need the tokens and should be able to use regular pointer.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: majnemer.
tra added a comment.

In D77777#1975178, @hliao wrote:

> The 1st argument in `llvm.nvvm.texsurf.handle.internal` or the 2nd one in
> `llvm.nvvm.texsurf.handle` must be kept as an immediate or constant value,
> i.e. that global variable. However, optimizations will find common code in
> the following
>
>   if (cond) {
>     %hnd = texsurf.handle.internal(@tex1);
>   } else {
>     %hnd = texsurf.handle.internal(@tex2);
>   }
>   = use(%hnd)
>
>
> and hoist or sink it into
>
>   if (cond) {
>     %ptr = @tex1;
>   } else {
>     %ptr = @tex2;
>   }
>   %hnd = texsurf.handle.internal(%ptr);
>   = use(%hnd)
>
>
> The backend cannot handle a non-immediate operand in `texsurf.handle`. A
> similar thing happens to `read.register`, as it also assumes its
> argument is always an immediate value.


I wonder if we can use `token` types to represent the handle? 
https://reviews.llvm.org/D11861
@majnemer -- would this use case be suitable for the `token` type?




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao added a comment.

In D77777#1974988, @tra wrote:

> In D77777#1974849, @hliao wrote:
>
> >> NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does 
> >> not impose direct constraints on LLVM's design choices.
> > 
> > It would be an advantage and, sometimes, desirable to generate IR 
> > compatible to NVVM IR spec.
>
> I'm not against it, but I think it's OK to make different choices if we have 
> good reasons for that. NVIDIA didn't update LLVM since they've contributed 
> the original implementation, so by now we're both far behind the current 
> state of NVVM and quite a bit sideways due to the things LLVM has added to 
> NVPTX backend.
>
> >> This sounds like it may have been done that way in an attempt to work 
> >> around a problem with intrinsics' constraints. We may want to check if 
> >> there's a better way to do it now.
> >>  Right now both intrinsics are marked with `[IntrNoMem]` which may be the 
> >> reason for compiler feeling free to move it around. We may need to give 
> >> compiler correct information and then we may not need this just-in-time 
> >> intrinsic replacement hack. I think it should be at least `IntrArgMemOnly` 
> >> or, maybe  `IntrInaccessibleMemOrArgMemOnly`.
> > 
> > That may not exactly model the behavior as, for binding texture/surface
> > support, in fact, it's true that there's no memory operation at all. Even
> > with IntrArgMemOnly or similar attributes, it still won't be preventable
> > for optimizations to sink common code. Such a trick is played in lots of
> > intrinsics, such as `read.register` etc.
>
> Can you give me an example where/how optimizer would break things? Is that
> because we're using metadata as an argument?
>
> I've re-read NVVM docs and I can't say that I understand how it's supposed to 
> work.
>  `metadata holding the texture or surface variable` alone is a rather odd 
> notion and I'm not surprised that it's not handled well. In the end we do end 
> up with a 'handle' which is an in-memory object. Perhaps it should be 
> represented as a real variable with a metadata attribute. Then we can lower 
> it as a handle,  can enforce that only texture/surface instructions are 
> allowed to use it and will have a way to tell LLVM what it's allowed to do.
>
> I don't have a good picture of how it all will fit together in the end (or 
> whether what I suggest makes sense), but the current implementation appears 
> to be in need of rethinking.


The 1st argument in `llvm.nvvm.texsurf.handle.internal` or the 2nd one in
`llvm.nvvm.texsurf.handle` must be kept as an immediate or constant value, i.e.
that global variable. However, optimizations will find common code in the
following

  if (cond) {
    %hnd = texsurf.handle.internal(@tex1);
  } else {
    %hnd = texsurf.handle.internal(@tex2);
  }
  = use(%hnd)

and hoist or sink it into

  if (cond) {
    %ptr = @tex1;
  } else {
    %ptr = @tex2;
  }
  %hnd = texsurf.handle.internal(%ptr);
  = use(%hnd)

The backend cannot handle a non-immediate operand in `texsurf.handle`. A
similar thing happens to `read.register`, as it also assumes its argument is
always an immediate value.
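
The same effect can be reproduced in a small standalone sketch. This is a toy model of the transformation and of the backend constraint, not LLVM's sinking code; `Operand`, `HandleCall`, `sinkIntoJoinBlock` and `backendCanLower` are hypothetical names chosen for illustration.

```cpp
#include <string>

// Toy operand: either a named global (a constant) or a merged value.
struct Operand {
  std::string Name;
  bool IsGlobal;
};

// Toy call to the handle intrinsic.
struct HandleCall {
  Operand Arg;
};

// Models what sinking does: two calls in sibling branches become one call
// in the join block whose operand merges the two originals.
HandleCall sinkIntoJoinBlock(const HandleCall &Then, const HandleCall &Else) {
  if (Then.Arg.Name == Else.Arg.Name)
    return Then; // identical operands, nothing to merge
  Operand Merged{"phi(" + Then.Arg.Name + ", " + Else.Arg.Name + ")", false};
  return HandleCall{Merged};
}

// Models the backend constraint: the operand must still be a global.
bool backendCanLower(const HandleCall &C) { return C.Arg.IsGlobal; }
```

With `@tex1` and `@tex2` as inputs, each original call satisfies `backendCanLower`, but the merged call does not, which is the failure described above.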




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D77777#1974849, @hliao wrote:

>> NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does 
>> not impose direct constraints on LLVM's design choices.
> 
> It would be an advantage and, sometimes, desirable to generate IR compatible 
> to NVVM IR spec.

I'm not against it, but I think it's OK to make different choices if we have 
good reasons for that. NVIDIA didn't update LLVM since they've contributed the 
original implementation, so by now we're both far behind the current state of 
NVVM and quite a bit sideways due to the things LLVM has added to NVPTX backend.

>> This sounds like it may have been done that way in an attempt to work around 
>> a problem with intrinsics' constraints. We may want to check if there's a 
>> better way to do it now.
>>  Right now both intrinsics are marked with `[IntrNoMem]` which may be the 
>> reason for compiler feeling free to move it around. We may need to give 
>> compiler correct information and then we may not need this just-in-time 
>> intrinsic replacement hack. I think it should be at least `IntrArgMemOnly` 
>> or, maybe  `IntrInaccessibleMemOrArgMemOnly`.
> 
> That may not exactly model the behavior as, for binding texture/surface
> support, in fact, it's true that there's no memory operation at all. Even
> with IntrArgMemOnly or similar attributes, it still won't be preventable for
> optimizations to sink common code. Such a trick is played in lots of
> intrinsics, such as `read.register` etc.

Can you give me an example where/how optimizer would break things? Is that
because we're using metadata as an argument?

I've re-read NVVM docs and I can't say that I understand how it's supposed to 
work.
`metadata holding the texture or surface variable` alone is a rather odd notion 
and I'm not surprised that it's not handled well. In the end we do end up with 
a 'handle' which is an in-memory object. Perhaps it should be represented as a 
real variable with a metadata attribute. Then we can lower it as a handle,  can 
enforce that only texture/surface instructions are allowed to use it and will 
have a way to tell LLVM what it's allowed to do.

I don't have a good picture of how it all will fit together in the end (or 
whether what I suggest makes sense), but the current implementation appears to 
be in need of rethinking.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao added a comment.

In D77777#1974672, @tra wrote:

> In D77777#1972720, @hliao wrote:
>
> > In D77777#1972349, @tra wrote:
> >
> > > The patch could use a more detailed description. Specifically, it does 
> > > not describe the purpose of these changes.
> > >
> > > > Replace them with the internal version, i.e. 
> > > > nvvm.texsurf.handle.internal just before the instruction selector.
> > >
> > > It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
> > >  If so, do we need 'internal' any more? Can we just rename internal and 
> > > be done with it? Adding an extra pass just to replace one intrinsic with 
> > > another seems to be unnecessary.
> > >
> > > I may be missing something here. Why do we have internal and non-internal 
> > > intrinsics at all? Do we need both?
> >
> >
> > besides required by NVVM IR spec,
>
>
> NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does 
> not impose direct constraints on LLVM's design choices.


It would be an advantage and, sometimes, desirable to generate IR compatible to 
NVVM IR spec.

> 
> 
>> the metadata in that intrinsic is a trick to prevent it from being sunk into 
>> common code during optimization in LLVM IR.
> 
> This sounds like it may have been done that way in an attempt to work around 
> a problem with intrinsics' constraints. We may want to check if there's a 
> better way to do it now.
>  Right now both intrinsics are marked with `[IntrNoMem]` which may be the 
> reason for compiler feeling free to move it around. We may need to give 
> compiler correct information and then we may not need this just-in-time 
> intrinsic replacement hack. I think it should be at least `IntrArgMemOnly` 
> or, maybe  `IntrInaccessibleMemOrArgMemOnly`.

That may not exactly model the behavior because, for binding texture/surface
support, there is in fact no memory operation at all. Even with
`IntrArgMemOnly` or similar attributes, optimizations would still not be
prevented from sinking common code. The same trick is played in lots of
intrinsics, such as `read.register`.

> 
> 
>> NVPTX backend only handles the `internal` version.
> 
> This is obviously fixable.

SDAG so far cannot handle metadata, and GISel doesn't support it either.
Getting that supported in TICG is hard to justify for target-specific
intrinsics, as metadata should not be used directly in code generation.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D77777#1972720, @hliao wrote:

> In D77777#1972349, @tra wrote:
>
> > The patch could use a more detailed description. Specifically, it does not 
> > describe the purpose of these changes.
> >
> > > Replace them with the internal version, i.e. nvvm.texsurf.handle.internal 
> > > just before the instruction selector.
> >
> > It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
> >  If so, do we need 'internal' any more? Can we just rename internal and be 
> > done with it? Adding an extra pass just to replace one intrinsic with 
> > another seems to be unnecessary.
> >
> > I may be missing something here. Why do we have internal and non-internal 
> > intrinsics at all? Do we need both?
>
>
> besides required by NVVM IR spec,


NVVM IR spec is for nvidia's own compiler. It's based on LLVM, but it does not 
impose direct constraints on LLVM's design choices.

> the metadata in that intrinsic is a trick to prevent it from being sunk into 
> common code during optimization in LLVM IR.

This sounds like it may have been done that way in an attempt to work around a 
problem with intrinsics' constraints. We may want to check if there's a better 
way to do it now.
Right now both intrinsics are marked with `[IntrNoMem]` which may be the reason 
for compiler feeling free to move it around. We may need to give compiler 
correct information and then we may not need this just-in-time intrinsic 
replacement hack. I think it should be at least `IntrArgMemOnly` or, maybe  
`IntrInaccessibleMemOrArgMemOnly`.

> NVPTX backend only handles the `internal` version.

This is obviously fixable.




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao updated this revision to Diff 256518.
hliao added a comment.

Fix a clang-tidy warning.



Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDA/surface.cu
  clang/test/CodeGenCUDA/texture.cu
  llvm/lib/Target/NVPTX/CMakeLists.txt
  llvm/lib/Target/NVPTX/NVPTX.h
  llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
  llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
  llvm/test/CodeGen/NVPTX/tex-read-cuda.ll

Index: llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
===
--- llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
+++ llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
@@ -6,6 +6,7 @@
 
 declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)
 declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)
+declare i64 @llvm.nvvm.texsurf.handle.p1i64(metadata, i64 addrspace(1)*)
 
 ; SM20-LABEL: .entry foo
 ; SM30-LABEL: .entry foo
@@ -28,7 +29,7 @@
 ; SM20-LABEL: .entry bar
 ; SM30-LABEL: .entry bar
 define void @bar(float* %red, i32 %idx) {
-; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0 
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
   %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)
 ; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
 ; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
@@ -40,7 +41,24 @@
   ret void
 }
 
-!nvvm.annotations = !{!1, !2, !3}
+; SM20-LABEL: .entry bax
+; SM30-LABEL: .entry bax
+define void @bax(float* %red, i32 %idx) {
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
+  %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)
+; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
+; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
+  %val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
+  %ret = extractvalue { float, float, float, float } %val, 0
+; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+  store float %ret, float* %red
+  ret void
+}
+
+!nvvm.annotations = !{!1, !2, !3, !4}
 !1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}
 !2 = !{void (float*, i32)* @bar, !"kernel", i32 1}
-!3 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!3 = !{void (float*, i32)* @bax, !"kernel", i32 1}
+!4 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!5 = !{i64 addrspace(1)* @tex0}
Index: llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
===
--- /dev/null
+++ llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
@@ -0,0 +1,91 @@
+//===- NVPTXLowerAggrCopies.cpp - --*- C++ -*--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+//
+// According to [NVVM IR Spec][1], `nvvm.texsurf.handle` should be used to
+// access texture/surface memory. The first argument to that intrinsic is a
+// metadata holding the texture or surface variable. The second argument to
+// that intrinsic is the texture or surface variable itself. However, the first
+// metadata argument cannot be handled directly by the NVPTX backend, which
+// only handle its internal version, i.e., `nvvm.texsurf.handle.internal`. This
+// pass, arranged just before the code selection, replaces
+// `nvvm.texsurf.handle` intrinsics with their internal version, i.e.,
+// `nvvm.texsurf.handle.internal`.
+// ---
+// [1]: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html
+//
+//===--===//
+
+#include "NVPTX.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsNVPTX.h"
+#include "llvm/Pass.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "nvptx-texsurf-handle-internalizer"
+
+namespace llvm {
+void initializeTexSurfHandleInternalizerPass(PassRegistry &);
+} // namespace llvm
+
+namespace {
+
+class TexSurfHandleInternalizer : public FunctionPass {
+public:
+  static char ID;
+
+  TexSurfHandleInternalizer() : FunctionPass(ID) {
+initializeTexSurfHandleInternalizerPass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {

[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-10 Thread Michael Liao via Phabricator via cfe-commits
hliao updated this revision to Diff 256511.
hliao added a comment.

Add more comments to explain what that pass does.



Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDA/surface.cu
  clang/test/CodeGenCUDA/texture.cu
  llvm/lib/Target/NVPTX/CMakeLists.txt
  llvm/lib/Target/NVPTX/NVPTX.h
  llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
  llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
  llvm/test/CodeGen/NVPTX/tex-read-cuda.ll

Index: llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
===
--- llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
+++ llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
@@ -6,6 +6,7 @@
 
 declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)
 declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)
+declare i64 @llvm.nvvm.texsurf.handle.p1i64(metadata, i64 addrspace(1)*)
 
 ; SM20-LABEL: .entry foo
 ; SM30-LABEL: .entry foo
@@ -28,7 +29,7 @@
 ; SM20-LABEL: .entry bar
 ; SM30-LABEL: .entry bar
 define void @bar(float* %red, i32 %idx) {
-; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0 
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
   %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)
 ; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
 ; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
@@ -40,7 +41,24 @@
   ret void
 }
 
-!nvvm.annotations = !{!1, !2, !3}
+; SM20-LABEL: .entry bax
+; SM30-LABEL: .entry bax
+define void @bax(float* %red, i32 %idx) {
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
+  %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)
+; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
+; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
+  %val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
+  %ret = extractvalue { float, float, float, float } %val, 0
+; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+  store float %ret, float* %red
+  ret void
+}
+
+!nvvm.annotations = !{!1, !2, !3, !4}
 !1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}
 !2 = !{void (float*, i32)* @bar, !"kernel", i32 1}
-!3 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!3 = !{void (float*, i32)* @bax, !"kernel", i32 1}
+!4 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!5 = !{i64 addrspace(1)* @tex0}
Index: llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
===
--- /dev/null
+++ llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
@@ -0,0 +1,91 @@
+//===- NVPTXLowerAggrCopies.cpp - --*- C++ -*--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+//
+// According to [NVVM IR Spec][1], `nvvm.texsurf.handle` should be used to
+// access texture/surface memory. The first argument to that intrinsic is a
+// metadata holding the texture or surface variable. The second argument to
+// that intrinsic is the texture or surface variable itself. However, the first
+// metadata argument cannot be handled directly by the NVPTX backend, which
+// only handle its internal version, i.e., `nvvm.texsurf.handle.internal`. This
+// pass, arranged just before the code selection, replaces
+// `nvvm.texsurf.handle` intrinsics with their internal version, i.e.,
+// `nvvm.texsurf.handle.internal`.
+// ---
+// [1]: https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html
+//
+//===--===//
+
+#include "NVPTX.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsNVPTX.h"
+#include "llvm/Pass.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "nvptx-texsurf-handle-internalizer"
+
+namespace llvm {
+void initializeTexSurfHandleInternalizerPass(PassRegistry &);
+}
+
+namespace {
+
+class TexSurfHandleInternalizer : public FunctionPass {
+public:
+  static char ID;
+
+  TexSurfHandleInternalizer() : FunctionPass(ID) {
+initializeTexSurfHandleInternalizerPass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const 

[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-09 Thread Michael Liao via Phabricator via cfe-commits
hliao added a comment.

In D77777#1972349, @tra wrote:

> The patch could use a more detailed description. Specifically, it does not 
> describe the purpose of these changes.
>
> > Replace them with the internal version, i.e. nvvm.texsurf.handle.internal 
> > just before the instruction selector.
>
> It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
>  If so, do we need 'internal' any more? Can we just rename internal and be 
> done with it? Adding an extra pass just to replace one intrinsic with another 
> seems to be unnecessary.
>
> I may be missing something here. Why do we have internal and non-internal 
> intrinsics at all? Do we need both?


Besides being required by the NVVM IR spec, the metadata in that intrinsic is a
trick to prevent it from being sunk into common code during optimization in
LLVM IR. The NVPTX backend only handles the `internal` version, so we need to
internalize them for codegen. I will put a brief explanation in that pass.
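
For readers who want the gist without the patch, here is a minimal standalone sketch of the rewrite. It is a toy model over a hypothetical `Call` struct, not the pass from this revision, and it ignores the overload suffix (such as `.p1i64`) that the real intrinsic names carry.

```cpp
#include <string>
#include <vector>

// Toy call instruction: callee name plus textual arguments.
struct Call {
  std::string Callee;
  std::vector<std::string> Args;
};

// Rewrites every `llvm.nvvm.texsurf.handle(metadata, var)` call into
// `llvm.nvvm.texsurf.handle.internal(var)` by dropping the leading metadata
// argument. Returns true if anything changed, as an LLVM pass would.
bool internalizeHandles(std::vector<Call> &Calls) {
  bool Changed = false;
  for (Call &C : Calls) {
    if (C.Callee != "llvm.nvvm.texsurf.handle" || C.Args.size() != 2)
      continue;
    C.Callee = "llvm.nvvm.texsurf.handle.internal";
    C.Args.erase(C.Args.begin()); // drop the metadata operand
    Changed = true;
  }
  return Changed;
}
```

Running it twice over the same list changes something only the first time, since no non-internal calls remain afterwards.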




[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

The patch could use a more detailed description. Specifically, it does not 
describe the purpose of these changes.

> Replace them with the internal version, i.e. nvvm.texsurf.handle.internal 
> just before the instruction selector.

It's not clear what is 'them'. 'nvvm.texsurf.handle' ?
If so, do we need 'internal' any more? Can we just rename internal and be done 
with it? Adding an extra pass just to replace one intrinsic with another seems 
to be unnecessary.

I may be missing something here. Why do we have internal and non-internal 
intrinsics at all? Do we need both?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77777/new/

https://reviews.llvm.org/D77777





[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-09 Thread Michael Liao via Phabricator via cfe-commits
hliao updated this revision to Diff 256321.
hliao added a comment.

Rebase to trunk.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77777/new/

https://reviews.llvm.org/D77777

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDA/surface.cu
  clang/test/CodeGenCUDA/texture.cu
  llvm/lib/Target/NVPTX/CMakeLists.txt
  llvm/lib/Target/NVPTX/NVPTX.h
  llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
  llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
  llvm/test/CodeGen/NVPTX/tex-read-cuda.ll

Index: llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
===
--- llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
+++ llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
@@ -6,6 +6,7 @@
 
 declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)
 declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)
+declare i64 @llvm.nvvm.texsurf.handle.p1i64(metadata, i64 addrspace(1)*)
 
 ; SM20-LABEL: .entry foo
 ; SM30-LABEL: .entry foo
@@ -28,7 +29,7 @@
 ; SM20-LABEL: .entry bar
 ; SM30-LABEL: .entry bar
 define void @bar(float* %red, i32 %idx) {
-; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0 
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
   %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)
 ; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
 ; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
@@ -40,7 +41,24 @@
   ret void
 }
 
-!nvvm.annotations = !{!1, !2, !3}
+; SM20-LABEL: .entry bax
+; SM30-LABEL: .entry bax
+define void @bax(float* %red, i32 %idx) {
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
+  %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)
+; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
+; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
+  %val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
+  %ret = extractvalue { float, float, float, float } %val, 0
+; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+  store float %ret, float* %red
+  ret void
+}
+
+!nvvm.annotations = !{!1, !2, !3, !4}
 !1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}
 !2 = !{void (float*, i32)* @bar, !"kernel", i32 1}
-!3 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!3 = !{void (float*, i32)* @bax, !"kernel", i32 1}
+!4 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!5 = !{i64 addrspace(1)* @tex0}
Index: llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
===
--- /dev/null
+++ llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
@@ -0,0 +1,81 @@
+//===- NVPTXLowerAggrCopies.cpp - --*- C++ -*--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// Replace `nvvm.texsurf.handle` intrinsics with their internal version, i.e.
+// `nvvm.texsurf.handle.internal`.
+//
+//===--===//
+
+#include "NVPTX.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsNVPTX.h"
+#include "llvm/Pass.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "nvptx-texsurf-handle-internalizer"
+
+namespace llvm {
+void initializeTexSurfHandleInternalizerPass(PassRegistry &);
+}
+
+namespace {
+
+class TexSurfHandleInternalizer : public FunctionPass {
+public:
+  static char ID;
+
+  TexSurfHandleInternalizer() : FunctionPass(ID) {
+    initializeTexSurfHandleInternalizerPass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Internalize `nvvm.texsurf.handle` intrinsics";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+  }
+
+  bool runOnFunction(Function &F) override {
+    bool Changed = false;
+    for (auto &BB : F)
+      for (auto BI = BB.begin(), BE = BB.end(); BI != BE; /*EMPTY*/) {
+        IntrinsicInst *II = dyn_cast<IntrinsicInst>(&*BI++);
+        if (!II || II->getIntrinsicID() != Intrinsic::nvvm_texsurf_handle)
+          continue;
+assert(II->getArgOperand(1) ==
+   cast(
+   cast(II->getArgOperand(0))->getMetadata())
+   

[PATCH] D77777: [nvptx] Add `nvvm.texsurf.handle` internalizer.

2020-04-09 Thread Michael Liao via Phabricator via cfe-commits
hliao created this revision.
hliao added a reviewer: tra.
Herald added subscribers: cfe-commits, hiraditya, mgorny, jholewinski.
Herald added a project: clang.

- Replace them with the internal version, i.e. `nvvm.texsurf.handle.internal` 
just before the instruction selector.
- Teach clang codegen to generate `nvvm.texsurf.handle` instead of 
`nvvm.texsurf.handle.internal`.
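
As a rough illustration of the first bullet, the rewrite can be modeled on 
plain strings. This is a toy sketch, not the pass from the patch: `ToyCall` and 
`internalizeHandles` are hypothetical, and the assumption that the metadata 
operand comes first and is simply dropped is read off the two intrinsic 
declarations in the test file.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a call instruction: callee name plus printed operands.
struct ToyCall {
  std::string callee;
  std::vector<std::string> args;
};

// Rewrites every `nvvm.texsurf.handle(metadata, global)` call into
// `nvvm.texsurf.handle.internal(global)`. Returns true if anything changed,
// following the usual convention for a function pass.
bool internalizeHandles(std::vector<ToyCall> &body) {
  bool changed = false;
  for (ToyCall &call : body) {
    if (call.callee != "nvvm.texsurf.handle")
      continue;
    // Assumed operand layout: args[0] is the metadata, args[1] the global.
    assert(call.args.size() == 2 && "unexpected operand count");
    call.callee = "nvvm.texsurf.handle.internal";
    call.args.erase(call.args.begin()); // drop the metadata operand
    changed = true;
  }
  return changed;
}
```

Running it a second time on the same body reports no change, since only the 
internal form remains.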


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77777

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGenCUDA/surface.cu
  clang/test/CodeGenCUDA/texture.cu
  llvm/lib/Target/NVPTX/CMakeLists.txt
  llvm/lib/Target/NVPTX/NVPTX.h
  llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
  llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
  llvm/test/CodeGen/NVPTX/tex-read-cuda.ll

Index: llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
===
--- llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
+++ llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
@@ -6,6 +6,7 @@
 
 declare { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64, i32)
 declare i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)*)
+declare i64 @llvm.nvvm.texsurf.handle.p1i64(metadata, i64 addrspace(1)*)
 
 ; SM20-LABEL: .entry foo
 ; SM30-LABEL: .entry foo
@@ -28,7 +29,7 @@
 ; SM20-LABEL: .entry bar
 ; SM30-LABEL: .entry bar
 define void @bar(float* %red, i32 %idx) {
-; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0 
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
   %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.internal.p1i64(i64 addrspace(1)* @tex0)
 ; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
 ; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
@@ -40,7 +41,24 @@
   ret void
 }
 
-!nvvm.annotations = !{!1, !2, !3}
+; SM20-LABEL: .entry bax
+; SM30-LABEL: .entry bax
+define void @bax(float* %red, i32 %idx) {
+; SM30: mov.u64 %rd[[TEXHANDLE:[0-9]+]], tex0
+  %texHandle = tail call i64 @llvm.nvvm.texsurf.handle.p1i64(metadata !5, i64 addrspace(1)* @tex0)
+; SM20: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [tex0, {%r{{[0-9]+}}}]
+; SM30: tex.1d.v4.f32.s32 {%f[[RED:[0-9]+]], %f[[GREEN:[0-9]+]], %f[[BLUE:[0-9]+]], %f[[ALPHA:[0-9]+]]}, [%rd[[TEXHANDLE]], {%r{{[0-9]+}}}]
+  %val = tail call { float, float, float, float } @llvm.nvvm.tex.unified.1d.v4f32.s32(i64 %texHandle, i32 %idx)
+  %ret = extractvalue { float, float, float, float } %val, 0
+; SM20: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+; SM30: st.global.f32 [%r{{[0-9]+}}], %f[[RED]]
+  store float %ret, float* %red
+  ret void
+}
+
+!nvvm.annotations = !{!1, !2, !3, !4}
 !1 = !{void (i64, float*, i32)* @foo, !"kernel", i32 1}
 !2 = !{void (float*, i32)* @bar, !"kernel", i32 1}
-!3 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!3 = !{void (float*, i32)* @bax, !"kernel", i32 1}
+!4 = !{i64 addrspace(1)* @tex0, !"texture", i32 1}
+!5 = !{i64 addrspace(1)* @tex0}
Index: llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
===
--- /dev/null
+++ llvm/lib/Target/NVPTX/NVPTXTexSurfHandleInternalizer.cpp
@@ -0,0 +1,81 @@
+//===- NVPTXLowerAggrCopies.cpp - --*- C++ -*--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// \file
+// Replace `nvvm.texsurf.handle` intrinsics with their internal version, i.e.
+// `nvvm.texsurf.handle.internal`.
+//
+//===--===//
+
+#include "NVPTX.h"
+#include "llvm/IR/BasicBlock.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsNVPTX.h"
+#include "llvm/Pass.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "nvptx-texsurf-handle-internalizer"
+
+namespace llvm {
+void initializeTexSurfHandleInternalizerPass(PassRegistry &);
+}
+
+namespace {
+
+class TexSurfHandleInternalizer : public FunctionPass {
+public:
+  static char ID;
+
+  TexSurfHandleInternalizer() : FunctionPass(ID) {
+    initializeTexSurfHandleInternalizerPass(*PassRegistry::getPassRegistry());
+  }
+
+  StringRef getPassName() const override {
+    return "Internalize `nvvm.texsurf.handle` intrinsics";
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.setPreservesCFG();
+  }
+
+  bool runOnFunction(Function &F) override {
+    bool Changed = false;
+    for (auto &BB : F)
+      for (auto BI = BB.begin(), BE = BB.end(); BI != BE; /*EMPTY*/) {
+        IntrinsicInst *II = dyn_cast<IntrinsicInst>(&*BI++);
+if (!II ||