tra added a comment. Make sure it works with -save-temps and -fintegrated-as/-fno-integrated-as. They tend to throw wrenches into pipeline construction.
================ Comment at: lib/Driver/Driver.cpp:1380 @@ +1379,3 @@ + C.MakeAction<LinkJobAction>(DeviceActions, types::TY_CUDA_FATBIN), + /* GpuArchName = */ nullptr, + /* AtTopLevel = */ false); ---------------- So, you're treating GpuArchName==nullptr as a special case of DeviceAction for fatbin? Perhaps it warrants its own CudaDeviceLinkAction? ================ Comment at: lib/Driver/Driver.cpp:1620-1625 @@ -1602,3 +1619,8 @@ case phases::Assemble: - return C.MakeAction<AssembleJobAction>(Input, types::TY_Object); + // Assembling generates an object file, except when we're assembling CUDA, + // in which case we get a cubin file. + return C.MakeAction<AssembleJobAction>( + std::move(Input), TC.getTriple().getOS() == llvm::Triple::CUDA + ? types::TY_CUDA_CUBIN + : types::TY_Object); } ---------------- cubin *is* an ELF object file. Do we really need a new type here or can we get by with TY_Object? ================ Comment at: lib/Driver/ToolChains.cpp:4193 @@ +4192,3 @@ + : Linux(D, Triple, Args) { + if (CudaInstallation.isValid()) { + getProgramPaths().push_back(CudaInstallation.getBinPath()); ---------------- unneeded {} here and in few more places throughout the patch. ================ Comment at: lib/Driver/ToolChains.cpp:4230 @@ -4224,3 +4229,3 @@ // Skip this argument unless the architecture matches BoundArch - if (A->getValue(0) != StringRef(BoundArch)) + if (!BoundArch || A->getValue(0) != StringRef(BoundArch)) continue; ---------------- You may as well move it out of the loop and return early if BoundArch is nullptr. ================ Comment at: lib/Driver/Tools.cpp:10621-10625 @@ +10620,7 @@ + ArgStringList CmdArgs; + if (TC.getTriple().isArch64Bit()) { + CmdArgs.push_back("-m64"); + } else { + CmdArgs.push_back("-m32"); + } + ---------------- CmdArgs.push_back(TC.getTriple().isArch64Bit() ? "-m64" : "-m32"); or, even ArgStringList CmdArgs = {TC.getTriple().isArch64Bit() ? "-m64" : "-m32"}; Same in Linker::ConstructJob below. ================ Comment at: lib/Driver/Tools.cpp:10677-10678 @@ +10676,4 @@ + std::string Arch = A->getGpuArchName(); + // We need to name pass an Arch of the form "sm_XX" for cubin files and + // "compute_XX" for ptx. CudaDeviceAction's getGpuArchName() is guaranteed + // to match "sm_XX". ---------------- First line does not parse. In general compute_XX does not necessarily match sm_XX. [[ http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options | ptxas options ]] says that > Allowed values for this option: compute_20, compute_30, compute_35, > compute_50, compute_52; and sm_20, sm_21, sm_30, sm_32, sm_35, sm_50 and sm_52 Note that there's no compute_21, compute_32. You'll need sm_XX -> compute_YY map. ================ Comment at: lib/Driver/Tools.h:923 @@ +922,3 @@ + +// Runs fatbinary, which is sort of a linker for NVPTX. +class LLVM_LIBRARY_VISIBILITY Linker : public Tool { ---------------- Please add more details about what fatbin does. ".. which combines GPU object files and, optionally, PTX assembly into a single output file." http://reviews.llvm.org/D16082 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits