[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

Joseph Huber via Phabricator via cfe-commits Tue, 24 May 2022 15:08:30 -0700

jhuber6 added a comment.

A




================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6836
+
+  // If the CUID is not specified we try to generate a unique postfix.
+  if (getLangOpts().CUID.empty()) {
----------------
jhuber6 wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > > However, [CUID] is not always availible. 
> > > > 
> > > > The question is -- when and why is it not available? I'm getting the 
> > > > feeling that we're fixing the consequence here, not the root cause.
> > > > 
> > > > Is there a reason we can't make sure that the driver always generates a 
> > > > cuid for offload subcompilations and error out if it's needed but is 
> > > > not provided?
> > > > That would make this fallback unnecessary and would be a more robust 
> > > > approach in general.
> > > > 
> > > So, I'm more in favor of this approach because it doesn't require extra 
> > > intervention from the compiler driver, this makes it less convoluted to 
> > > do split compilation since we don't have an extra arguments. The way I 
> > > would prefer it, is that we do this implicitly by default without 
> > > requiring extra thought from the driver, but if it's not good enough we 
> > > can support the manual `CUID` approach to let the user override it. I 
> > > think this is a cleaner implementation, and is mostly coming from my 
> > > support for CUDA in the new driver which currently doesn't implement the 
> > > CUID as we do with the old driver. Generally I'd prefer things to behave 
> > > independent of the driver, so we can consider host and device compilation 
> > > more separately.
> > > So, I'm more in favor of this approach because it doesn't require extra 
> > > intervention from the compiler driver
> > 
> > We need the driver intervention for any cc1 compilations anyways, so this 
> > does not buy us anything.  While you can run a sub-compilation manually 
> > with handcrafted cc1 flags, that's not a practical use case. The driver is 
> > the ultimate source of cc1 flags.
> > 
> > > this makes it less convoluted to do split compilation since we don't have 
> > > an extra arguments.
> > 
> > For CUDA/HIP sub-compilation should be done with clang 
> > --cuda-host-only/--cuda-device-only.  Whether the driver supplies yet 
> > another cc1 option, --cuid=... makes no difference to the user launching 
> > such sub-compilation. 
> > 
> > > The way I would prefer it, is that we do this implicitly by default 
> > > without requiring extra thought from the driver, but if it's not good 
> > > enough we can support the manual CUID approach to let the user override 
> > > it.
> > 
> > I agree that we can come up with something that will almost always work. 
> > Possibly even good enough for all practical purposes. However, if a better 
> > solution would take comparable effort, it would make sense to do things 
> > right and avoid adding technical debt. 
> > 
> > On the other hand, requiring the driver to supply identical cuid to all 
> > sub-compilations appears to be a better approach to me:
> > * Driver is the best place to do it, functionally. Driver has access to all 
> > user-provided inputs and is in position to guarantee that all 
> > subcompilations get the same cuid.
> > * Calculating CUID in the driver keeps relevant logic in one place. Doing 
> > it in the driver *and* in the codegen 
> > * Figuring out what inputs are relevant for calculation of CUID in cc1 
> > invocation is error prone. E.g. we have to guess which cc1 options are 
> > relevant or not and is the driver would pass a macro to one subcompilation 
> > but not to another, we would end up generating mismatching CUID and would 
> > not have any way to notice that. Even when that's not the case, we would 
> > need to guess which flags, supplied by the driver, are relevant. At CC1 
> > level that may be somewhat complicated as top-level options may expand to 
> > quite a few more cc1 options. E.g. we'll need to take into account 
> > `-std=...`, `--cuda-path=`, `-include ...`, `-I` (and other include 
> > paths)... All of that does not belong to the codegen.
> > 
> > The driver is already doing CUID computation, so I do not see any downsides 
> > to just letting it do its job, and I do believe it will be a better, and 
> > likely less complicated, solution.
> > 
> > > ... mostly coming from my support for CUDA in the new driver which 
> > > currently doesn't implement the CUID as we do with the old driver
> > 
> > Right. That appears to be the key missing piece.
> > 
> > What are the obstacles for having CUID calculation done in the new driver. 
> > It should have all the info it needs. What am I missing?
> > 
> > For CUDA/HIP sub-compilation should be done with clang 
> > --cuda-host-only/--cuda-device-only. Whether the driver supplies yet 
> > another cc1 option, --cuid=... makes no difference to the user launching 
> > such sub-compilation.
> The problem I have with this is that we use the command line to generate the 
> value, so they aren't going to be the same without the user manually 
> specifying it. I guess we could filter out only "relevant" command line 
> flags, maybe that's an option. I just think it's not intuitive for a name 
> mangling scheme to depend on something external, but there's definitely 
> advantages to doing it that way.
> 
> I can see your point for the Driver handling this stuff. Now that I'm 
> thinking about it I don't think looking at the macros or the other arguments 
> is a sound solution in the first place. Even without that it would work for 
> almost all the same cases just using the file's unique ID. Without that, this 
> solution is guaranteed not to conflict with any other file on the same file 
> system at the time of compilation. This, as we discussed, potentially fails 
> for non-static source trees and compiling the same file twice and linking it. 
> The current CUID implementation fails on the former, this method fails on 
> both.
> 
> If the CUID didn't exist, the way I would have implemented it would simply be 
> with the File-ID, and have the CUID be a simple marshalling option that lets 
> the user override it to something unique if needed. I personally think that's 
> simpler for 99.99% of cases and has an easy-out in the last 0.01%. Given that 
> it already exists there's some desire to keep it since the work has already 
> been done I understand.
> 
> > What are the obstacles for having CUID calculation done in the new driver. 
> > It should have all the info it needs. What am I missing?
> It's less of a difficulty in implementing and more hoping we could make the 
> name mangling more simple and work by default without the driver. 
> Also, we may need this support for a single case in OpenMP, and I'd prefer 
> not need to generate the CUID for OpenMP offloading when it's unused the vast 
> majority of the time. Generally I'd prefer if compiling for the host / device 
> was conceptually the same to the user without requiring external values. If 
> we're sold on the CUID method I can go forward with that, but from my 
> perspective what it's buying us is the ability to compile the following
> ```
> static __device__ int a;
> 
> #ifdef MACRO
>   do_something_with(a);
> #else
>   do_something_else();
> #endif
> ```
> ```
> clang foo.cu -DMACRO -c -o 1.o
> clang foo.cu 1.o
> ```
> 
> This is just a tough problem overall, I don't think there's a single perfect 
> solution. Whatever we choose we'll be trading reproducibility for correctness 
> or whatever. You have more seniority in this space so it's your call what you 
> think I should go forward with. 
Also, it's incredibly convoluted, but I can think of a way to break even the 
current CUID for this.
```
static __device__ int a;

__device__ int __attribute__((weak)) *a_ref = &a;
```
```
$ clang a.cu -c -fgpu-rdc
$ mv a.o b.o
$ clang a.cu -c -fgpu-rdc
$ nvlink a.o b.o -arch=sm_35 -o out.cubin
nvlink error   : Multiple definition of '_ZL1a__static__d041026c8e4167e6' in 
'1.o', first defined in 'a.o'
nvlink fatal   : merge_elf failed
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125904/new/

https://reviews.llvm.org/D125904

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

Reply via email to