[PATCH] D101630: [HIP] Fix device-only compilation

Yaxun Liu via Phabricator via cfe-commits Fri, 28 May 2021 14:16:30 -0700

yaxunl added a comment.

In D101630#2777702 <https://reviews.llvm.org/D101630#2777702>, @tra wrote:


> In D101630#2777346 <https://reviews.llvm.org/D101630#2777346>, @yaxunl wrote:
>
>> In D101630#2748513 <https://reviews.llvm.org/D101630#2748513>, @tra wrote:
>>
>>> How about this:
>>> If the user explicitly specified `--cuda-host-only` or 
>>> `--cuda-device-only`, then by default only allow producing the natural 
>>> output format, unless a bundled output is requested by an option. This 
>>> should keep existing users working.
>>> If the compilation is done without explicitly requested sub-compilation(s), 
>>> then bundle the output by default. This should keep the GPU-unaware tools 
>>> like ccache happy as they would always get the single output they expect.
>>>
>>> WDYT?
>>
>> `--cuda-host-only` always has one output, so there is no point in 
>> bundling its output. We only need to decide the proper behavior of 
>> `--cuda-device-only`.
>
> It still fits my proposal of requiring a single sub-compilation and not 
> bundling the output.
> The point was that such behavior is consistent regardless of whether we're 
> compiling CUDA or HIP for the host or for device.
>
>> How about keeping the original default behavior of not bundling if users do 
>> not specify an output file, 
>> and bundling the output if users do specify an output file?
>
> I think it will make things worse. Compiler output should not change 
> depending on whether `-o` is used.
>
>> Specifying an output file indicates that the user is requesting a single output. 
>> `-f[no-]hip-bundle-device-output` would override the default behavior.
>
> I disagree. When the user specifies the output, the intent is to specify the 
> **location** of the output, not its contents or format.
>
> Telling the compiler to produce a different output format should not depend on 
> specifying (or not) the output location.
>
> I think our options are:
>
> - Always bundle `--cuda-device-only` outputs by default. This is consistent for 
> HIP compilation, but deviates from CUDA, which can't do bundling. Also, 
> single-target subcompilation deviates from both CUDA and regular C++ 
> compilation, which is what most users would be familiar with and which would 
> probably be the most sensible default for a single sub-compilation. It can be 
> overridden with an option, but it goes against the principle that it's the 
> specialized use case that should need extra options. The most common use case 
> should not need them.
>
> - Only bundle multiple sub-compilations' output by default. This would 
> preserve the sensible single sub-compilation behavior. The downside is that 
> it makes the output format depend on whether the compiler ends up doing one or 
> many sub-compilations. E.g. `--offload-arch=A -S` would produce ASM and 
> `--offload-arch=A --offload-arch=B -S` would produce a bundle. If the user 
> can't control some of the compiler options, such an approach would make the output 
> format unpredictable. E.g. passing `--offload-arch=foo` to the compiler on 
> godbolt would all of a sudden produce bundled output instead of assembly text 
> or a sensible error message that you're trying to produce multiple outputs.
>
> - Keep the current behavior (insist on single sub-compilation) as the 
> default, allow overriding it for HIP with the flag. IMO that's the most 
> consistent option and I still think it's the one most suitable to keep as the 
> default.
>
> I can see the benefit of always bundling for HIP, but I also believe that 
> keeping things simple, consistent and predictable is important. Considering 
> that we're tinkering in a relatively obscure niche of the compiler, it 
> probably does not matter all that much, but it should not stop us from trying 
> to figure out the best approach in a principled way.
>
> I think we could benefit from a second opinion on which approach would make 
> more sense for clang. 
> Summoning @jdoerfert and @echristo.
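
For readers comparing the three options quoted above, here is a minimal sketch of how each default would map a `--cuda-device-only` compilation to an output kind. It is not clang's driver code: `Policy`, `Output`, and `deviceOnlyOutput` are hypothetical names, and the handling of an explicit no-bundle request with several archs (treated as an error here) is an assumption, not something settled in this review.

```cpp
#include <optional>

// Hypothetical model of the three defaults discussed in this thread.
// None of these names exist in clang.
enum class Policy {
  AlwaysBundle,          // option 1: always bundle device-only output
  BundleIfMultiple,      // option 2: bundle only when there are several archs
  SingleUnlessRequested  // option 3: insist on one arch unless a flag asks to bundle
};

enum class Output { Natural, Bundled, Error };

// numArchs:   number of --offload-arch values in the device-only compile.
// bundleFlag: explicit user choice (modeled on the proposed
//             -f[no-]hip-bundle-device-output); empty when not given.
constexpr Output deviceOnlyOutput(Policy policy, unsigned numArchs,
                                  std::optional<bool> bundleFlag) {
  if (bundleFlag.has_value()) {
    if (*bundleFlag)
      return Output::Bundled;
    // Assumption: an unbundled result must be a single output.
    return numArchs > 1 ? Output::Error : Output::Natural;
  }
  switch (policy) {
  case Policy::AlwaysBundle:
    return Output::Bundled;
  case Policy::BundleIfMultiple:
    return numArchs > 1 ? Output::Bundled : Output::Natural;
  case Policy::SingleUnlessRequested:
    return numArchs > 1 ? Output::Error : Output::Natural;
  }
  return Output::Error; // unreachable for valid Policy values
}
```

Under this model, only `BundleIfMultiple` changes the output format when a second `--offload-arch` is added without any bundling flag, which is the unpredictability raised in the second option.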

How does `nvcc --genco` behave when there are multiple GPU archs? Does it output 
a fat binary containing multiple ISAs? Also, does it support device-only 
compilation for intermediate outputs?



https://reviews.llvm.org/D101630

