Issue 55455
Summary clang: -ftime-trace output does not include device compilation
Labels new issue
Assignees
Reporter Maetveis
As the title says, it would be nice if, when using offloading targets (like CUDA / HIP / OpenMP), the final time trace included the traces from the device-side compilations.

These files are already being generated (because the spawned processes inherit the `-ftime-trace` flag), but they are not merged into the final trace. For CUDA they can be found under `/tmp/<file>-<hash>/<file>-<arch>.json`; for HIP they seem to be overwritten, as they use the same filename as the host compilation.
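Collecting the CUDA device traces by hand means picking them out of that temporary directory. A minimal host-only sketch (plain C++17, so also valid CUDA) that filters a directory listing for names of the form `<file>-<arch>.json` is shown here. It assumes that naming is stable and that the caller supplies the listing, since the `<hash>` part of the directory name is not predictable; `isDeviceTrace` and `deviceTraces` are illustrative names, not part of any existing tool.

```cuda
// Host-only sketch (plain C++17, also valid CUDA). Assumes the
// "<file>-<arch>.json" naming observed for CUDA device traces; the helper
// names are illustrative only.
#include <cassert>
#include <string>
#include <vector>

// True if `name` has the form "<stem>-<arch>.json" with a non-empty <arch>,
// e.g. "main-sm_80.json" for stem "main".
bool isDeviceTrace(const std::string& name, const std::string& stem) {
    const std::string prefix = stem + "-";
    const std::string suffix = ".json";
    if (name.size() <= prefix.size() + suffix.size()) return false;
    return name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Keeps only the entries of a directory listing that look like device traces.
std::vector<std::string> deviceTraces(const std::vector<std::string>& listing,
                                      const std::string& stem) {
    std::vector<std::string> out;
    for (const std::string& name : listing)
        if (isDeviceTrace(name, stem)) out.push_back(name);
    return out;
}
```

In practice the listing could come from `std::filesystem::directory_iterator` over the `/tmp/<file>-<hash>/` directory.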

Having clang merge these files would improve usability for tools like [ninjatracing](https://github.com/nico/ninjatracing) (with its `--embed-time-trace` option), which are not aware of the multiple compilation passes involved in CUDA/HIP builds.
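Merging is more than concatenating the `traceEvents` arrays, because each process timestamps its events relative to its own start. The sketch below assumes each file records that start as an absolute `beginningOfTime` value in microseconds (recent clang versions appear to emit this field) and that event `ts` values are relative to it; it rebases every event onto the earliest start so host and device passes share one timeline. JSON parsing is omitted, only complete (`"ph":"X"`) events are modeled, and `TraceEvent`, `TraceFile`, `mergeTraces`, `escape` and `toJson` are hypothetical names.

```cuda
// Host-only sketch (plain C++17, also valid CUDA). Assumes each -ftime-trace
// file carries an absolute "beginningOfTime" (microseconds) and that event
// "ts" values are relative to it. All type and function names are hypothetical.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct TraceEvent {      // one complete ("ph":"X") event
    std::string name;
    std::int64_t ts;     // start, microseconds since the file's beginningOfTime
    std::int64_t dur;    // duration, microseconds
    std::int64_t pid;
    std::int64_t tid;
};

struct TraceFile {       // already-parsed contents of one trace file
    std::int64_t beginningOfTime;
    std::vector<TraceEvent> events;
};

// Rebases every file onto the earliest beginningOfTime so that events from
// separate host and device processes share one timeline.
TraceFile mergeTraces(const std::vector<TraceFile>& files) {
    TraceFile merged{0, {}};
    if (files.empty()) return merged;
    merged.beginningOfTime =
        std::min_element(files.begin(), files.end(),
                         [](const TraceFile& a, const TraceFile& b) {
                             return a.beginningOfTime < b.beginningOfTime;
                         })->beginningOfTime;
    for (const TraceFile& file : files) {
        const std::int64_t shift = file.beginningOfTime - merged.beginningOfTime;
        for (TraceEvent event : file.events) {  // copy, then shift
            event.ts += shift;
            merged.events.push_back(event);
        }
    }
    return merged;
}

// Escapes only '"' and '\'; control characters are assumed absent.
std::string escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Writes the merged trace in Chrome Trace Event JSON form.
std::string toJson(const TraceFile& t) {
    std::ostringstream os;
    os << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < t.events.size(); ++i) {
        const TraceEvent& e = t.events[i];
        if (i != 0) os << ',';
        os << "{\"ph\":\"X\",\"name\":\"" << escape(e.name) << "\",\"ts\":" << e.ts
           << ",\"dur\":" << e.dur << ",\"pid\":" << e.pid << ",\"tid\":" << e.tid << '}';
    }
    os << "],\"beginningOfTime\":" << t.beginningOfTime << '}';
    return os.str();
}
```

Using the host file's `beginningOfTime` as the origin would work equally well; taking the minimum just guarantees that no shifted event gets a negative timestamp. Events keep their original `pid`/`tid`, so they stay on separate tracks in a viewer only if the processes had distinct ids.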

To reproduce, compile the following file with `clang main.cu --offload-arch=sm_35 --offload-arch=sm_80 -ftime-trace`:

```cuda
#ifdef __CUDA_ARCH__
template <unsigned int N>
struct SlowToCompile {
    constexpr static unsigned int value = N + SlowToCompile<N - 1>::value;
};

template <>
struct SlowToCompile<0> {
    constexpr static unsigned int value = 0;
};

#endif

void __global__ kernel(unsigned int* dst) {
#ifdef __CUDA_ARCH__
    *dst = SlowToCompile<1024>::value;
#endif
}

int main(int argc, char** argv) {
    if(argc < 0) {
        kernel<<< dim3(1), dim3(1) >>>(nullptr);
    }
}

```

I would like to help fix this and would like to hear opinions on my proposed approach:
Add a `-cc1` option that merges the time trace output into an already existing file. The offloading drivers could then pass this option to the device compilation jobs when time tracing is enabled.
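The driver half of the proposal could look roughly like this. The option spelling `-ftime-trace-merge-into=` is a made-up placeholder, not a real clang option, and `Job` / `addTraceMerge` are a simplified stand-in for the driver's actual job construction; the sketch only shows the rule "if a device job has `-ftime-trace`, also tell it which existing file to merge into".

```cuda
// Host-only sketch (plain C++17, also valid CUDA) of the driver-side rule.
// "-ftime-trace-merge-into=" is a made-up placeholder, not a real clang option,
// and Job is a simplified stand-in for the driver's job objects.
#include <cassert>
#include <string>
#include <vector>

struct Job {
    bool isDevice;                  // true for a per-architecture device compilation
    std::vector<std::string> args;  // arguments passed to `clang -cc1`
};

// Placeholder spelling of the proposed -cc1 option.
constexpr const char* kMergeFlag = "-ftime-trace-merge-into=";

// Only the exact "-ftime-trace" spelling is checked in this sketch.
bool hasTimeTrace(const std::vector<std::string>& args) {
    for (const std::string& a : args)
        if (a == "-ftime-trace") return true;
    return false;
}

// When time tracing is enabled, tell each device job to merge its events
// into `targetTrace` instead of writing a separate (or clobbered) file.
void addTraceMerge(std::vector<Job>& jobs, const std::string& targetTrace) {
    for (Job& job : jobs)
        if (job.isDevice && hasTimeTrace(job.args))
            job.args.push_back(kMergeFlag + targetTrace);
}
```

If device jobs can run concurrently, writes into the shared file would need to be serialized (for example with a lock file), which a real implementation of the `-cc1` option would have to account for.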
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
