zhouronghua wrote:

> Yes, and my question was if we could manage this automatically somehow via 
> `-fdepfile-entry`. I don't know the subtleties here, but from what I can see 
> this PR duplicates the depfile and tracks them separately, which I don't 
> think is what we want. Ideally we have the single one for the host that also 
> knows the device dependencies in addition to the host dependencies.

Sorry, I misunderstood the `-fdepfile-entry` option.
When we compile with clang like this:
clang++-20 -c example.cu   --cuda-gpu-arch=sm_75   -I/usr/local/cuda/include   
-D__CUDA_NO_TEXTURE_INTRINSICS__   -MD -MT example.o -MF example.d   
-save-temps   -o example.cu.o -v
The dependency file for the sm_75 kernel is generated first. So do you mean 
that we first generate an example.d like this:
"/usr/lib/llvm-20/bin/clang" -cc1 -triple nvptx64-nvidia-cuda -aux-triple 
x86_64-pc-linux-gnu -dependency-file example.d  ... ...
and then, when compiling the host side, we add the option 
"-fdepfile-entry example.d" like this (the "-dependency-file example.d" option 
is already present in the original command line):
"/usr/lib/llvm-20/bin/clang" -cc1 -triple x86_64-pc-linux-gnu 
-target-sdk-version=12.6 -fcuda-allow-variadic-functions -aux-triple 
nvptx64-nvidia-cuda  -fdepfile-entry example.d  -dependency-file example.d ... 
...
and clang will then generate a new example.d that depends on the old kernel 
example.d?

What if we compile kernels for multiple architectures, for example for both 
sm_75 and sm_80?
Would it become like this:
"/usr/lib/llvm-20/bin/clang" -cc1 -triple nvptx64-nvidia-cuda -aux-triple 
x86_64-pc-linux-gnu -target-cpu sm_75 -dependency-file example.d  ... ...

"/usr/lib/llvm-20/bin/clang" -cc1 -triple nvptx64-nvidia-cuda -aux-triple 
x86_64-pc-linux-gnu -target-cpu sm_80 -fdepfile-entry example.d 
-dependency-file example.d ... ...

"/usr/lib/llvm-20/bin/clang" -cc1 -triple x86_64-pc-linux-gnu 
-target-sdk-version=12.6 -fcuda-allow-variadic-functions -aux-triple 
nvptx64-nvidia-cuda  -fdepfile-entry example.d  -dependency-file example.d ... 
...

and the final example.d will contain all dependencies from sm_75, sm_80, and 
the host?

I tried it, and currently passing the same filename to both 
-fdepfile-entry example.d and -dependency-file example.d does not seem to 
work. Does this mean we need to generate a separately named dependency file 
for each kernel compilation, with the source file name as the prefix and the 
architecture as the suffix, like .sm_75.d or .sm_80.d? Then, during the host 
preprocessing stage, we would pass all kernel dependency files via 
-fdepfile-entry to trigger automatic merging in the Clang frontend.
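
To make the intended end result concrete, here is a minimal Python sketch of 
what "merging" the per-architecture dependency files into the host one would 
produce. It is only an illustration of the desired output, not a description 
of what Clang does internally. The function names and the header names in the 
demo are made up, and the parser assumes one Make-style rule per file with no 
escaped spaces or colons in paths.

```python
def parse_depfile(text):
    """Parse a single Make-style rule ("target: dep dep ...") into (target, deps).

    Assumption: one rule per file, no escaped spaces or colons in paths.
    """
    # Join backslash-continued lines into one logical line.
    joined = text.replace("\\\n", " ")
    target, _, rest = joined.partition(":")
    return target.strip(), rest.split()


def merge_depfiles(target, texts):
    """Union the prerequisites of several depfiles under a single target,
    keeping the order in which each prerequisite is first seen."""
    merged = []
    seen = set()
    for text in texts:
        _, deps = parse_depfile(text)
        for dep in deps:
            if dep not in seen:
                seen.add(dep)
                merged.append(dep)
    return target + ": " + " \\\n  ".join(merged) + "\n"


if __name__ == "__main__":
    # Hypothetical contents of example.sm_75.d, example.sm_80.d and the host example.d.
    sm_75 = "example.o: example.cu \\\n  common.h \\\n  device_a.h\n"
    sm_80 = "example.o: example.cu \\\n  common.h \\\n  device_b.h\n"
    host = "example.o: example.cu \\\n  common.h \\\n  host_only.h\n"
    print(merge_depfiles("example.o", [sm_75, sm_80, host]), end="")
```

The merged rule lists each prerequisite once, so a change to a header used by 
only one device pass would still trigger a rebuild of example.o.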

If this works, there's only one issue left: do we need to delete the .d files 
generated for kernel compilation?

If we delete them, it's actually not much different from the current 
implementation.

If we don't delete them, it shouldn't be a problem for Makefiles, since the 
compilation rules are handwritten anyway. 
But automated build systems like Bazel and CMake, which automatically 
collect the .d file for each source file to generate dependency rules, would 
have an extra task: for multi-architecture compilation, they would either 
need to collect multiple .d files, or many redundant .d files would be left 
behind.
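
As a rough illustration of that extra task, the sketch below picks out every 
dependency file that belongs to one source file from a directory listing. The 
`<stem>.sm_*.d` naming is only the scheme floated above, not something Clang 
emits today, and `select_depfiles` is a hypothetical helper, not part of any 
build system.

```python
from fnmatch import fnmatchcase


def select_depfiles(names, stem):
    """From a list of file names, return the host depfile (<stem>.d) followed by
    any per-architecture depfiles (<stem>.sm_*.d), sorted by name.

    Assumption: the stem contains no glob metacharacters.
    """
    host = f"{stem}.d"
    selected = [name for name in names if name == host]
    selected += sorted(
        name for name in names if fnmatchcase(name, f"{stem}.sm_*.d")
    )
    return selected


if __name__ == "__main__":
    listing = ["other.d", "example.sm_80.d", "example.d", "example.o", "example.sm_75.d"]
    print(select_depfiles(listing, "example"))
```

A build system that only looks for `<stem>.d` would miss the per-architecture 
files entirely, which is the redundancy concern raised above.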




https://github.com/llvm/llvm-project/pull/176072
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
