jhuber6 added a comment. Where is the code to change the target registration? We need a new initialization runtime call to use and change the old one to reallocate the `__tgt_device_image` array. Did you work around that some other way?
================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:613-614 CmdArgs.push_back("-shared"); + std::string ArchArg = std::string("-plugin-opt=mcpu=").append(Arch.str()); + CmdArgs.push_back(ArchArg); CmdArgs.push_back("-o"); ---------------- Does this just pass the architecture to the AMD link? Is this related? If not move it to a separate patch and I'll review it. ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1069 bool WholeProgram = false; + std::string TheArch = File.Arch; ---------------- Is this necessary? ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1153 SmallVector<ArrayRef<char>, 4> ImagesToWrap; + SmallVector<ArrayRef<char>, 4> OffloadArchs; + std::string Arch; ---------------- This should be a StringRef, the data is stable. When you create the struct just use this and it will add the null terminator for you. ``` Constant *ConstantStr = ConstantDataArray::getString(M.getContext(), Str); ``` ================ Comment at: clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp:1165 + + if (Arch.empty()) { + Arch = std::string(File.Arch); ---------------- An empty `StringRef` should already create a null string when you create a ConstantDataArray, should be fine to just push it back. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:62-63 +// int32_t version; +// int32_t image_number; +// int32_t number_images; +// char* offload_arch; ---------------- The struct should be contained within the image itself so we can query it during registration, why do we need to know the number? ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:64-65 +// int32_t number_images; +// char* offload_arch; +// char* target_compile_opts; +// }; ---------------- ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:191-194 + auto *NullPtr = llvm::ConstantPointerNull::get(Type::getInt8PtrTy(C)); + unsigned int ImgCount = 0; + std::string OffloadArchBase = "__offload_arch"; + std::string OffloadImageBase = "offload_image_info"; ---------------- You should be able to pass the name to the constructor, if it's internal LLVM will give you a new name. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:214 - ImagesInits.push_back(ConstantStruct::get(getDeviceImageTy(M), ImageB, - ImageE, EntriesB, EntriesE)); + auto OArch = OffloadArchs[ImgCount]; + Constant *OArchV = ConstantDataArray::get(C, OArch); ---------------- I would just assert that the metadata vector's size matches the number of images, we're always going to generate these now so it makes sense to generate them in pairs. Then just use `llvm::zip` or just an iterator to go through both of these at the same time. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:215-227 + Constant *OArchV = ConstantDataArray::get(C, OArch); + std::string OffloadArchGV(OffloadArchBase), + OffloadImageGV(OffloadImageBase); + if (ImgCount) { + auto Suffix = std::to_string(ImgCount); + OffloadArchGV.append(".").append(Suffix); + OffloadImageGV.append(".").append(Suffix); ---------------- Why do we need all these strings? Should just be generating a constant initializer. ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:229-232 + // store value of these variables (i.e. offload archs) into a custom + // section which will be used by "offload-arch -f". It won't be + // removed during binary stripping. + GV->setSection(".offload_arch_list"); ---------------- Why does this need a custom section? We should just use it like this, not sure why we need these to be anything but some internal struct. ``` for (auto *Image : BinDesc->Images) { if (Image->Info) // Use Info } ``` ================ Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:240-241 + getImageInfoTy(M), ConstantInt::get(Type::getInt32Ty(C), 1), + ConstantInt::get(Type::getInt32Ty(C), ImgCount++), + ConstantInt::get(Type::getInt32Ty(C), (uint32_t)OffloadArchs.size()), + RequirementVPtr, ---------------- Why do we need this when we should already know which image we're looking at? ================ Comment at: openmp/libomptarget/include/omptarget.h:122 +/// __tgt_image_info: +/// ---------------- This comment is extremely long. It should be sufficient to just say what it does, 3 lines max. Specify that it may be null in the device image struct. ================ Comment at: openmp/libomptarget/include/omptarget.h:145-146 + int32_t version; // The version of this struct + int32_t image_number; // Image number in image library starting from 0 + int32_t number_images; // Number of images, used for initial allocation + char *offload_arch; // e.g. sm_30, sm_70, gfx906, includes features ---------------- I still feel like we don't need these, correct me if I'm wrong. ================ Comment at: openmp/libomptarget/include/omptarget.h:157 __tgt_offload_entry *EntriesEnd; // End of table (non inclusive) + __tgt_image_info *ImageInfo; // Metadata about the image }; ---------------- ================ Comment at: openmp/libomptarget/include/omptarget.h:169-176 +/// __tgt_active_offload_env +/// +/// This structure is created by __tgt_get_active_offload_env and is used +/// to determine compatibility of the images with the current environment +/// that is "in play". +struct __tgt_active_offload_env { +char *capabilities; // string returned by offload-arch -c ---------------- Shouldn't the plugin handle this? We should be able to get the architecture in the plugin and select the info struct that matches it, if it exists. Then if the plugin finds one that matches we indicate that it's compatible like we do now. Correct me if I'm wrong. ================ Comment at: openmp/libomptarget/src/rtl.cpp:444 + + RTLInfoTy *FoundRTL = NULL; PM->RTLsMtx.lock(); ---------------- Shouldn't we be able to have the plugin handle its own architecture detection, then return from `R.is_valid_binary`? That makes more sense to me than introducing a new tool to libomptarget, since we can already query the architecture from the CUDA runtime for example. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124525/new/ https://reviews.llvm.org/D124525 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits