awarzynski accepted this revision. awarzynski added a comment. This revision is now accepted and ready to land.
LGTM, thanks for working on this! Would be great for somebody with a bit more experience with offloading to OK this as well :) @tschuett or perhaps @jhuber6 ? ================ Comment at: clang/lib/Driver/ToolChains/Flang.cpp:128 + if (IsHostOffloadingAction) { + for (size_t i = 1; i < Inputs.size(); ++i) { + if (Inputs[i].getType() != types::TY_Nothing) ---------------- jhuber6 wrote: > agozillon wrote: > > jhuber6 wrote: > > > agozillon wrote: > > > > awarzynski wrote: > > > > > agozillon wrote: > > > > > > awarzynski wrote: > > > > > > > What's the magic "1"? And given that the input count matters here > > > > > > > - is there a test with multiple inputs? > > > > > > It aims to mimic the behavior of Clang: > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L4561 > > > > > > where the main input is skipped (the input currently being > > > > > > compiled or embedded into etc.), when adding to > > > > > > //-fembed-offload-object//. > > > > > > > > > > > > It does look different to Clang's as Clang has more cases and the > > > > > > logic is spread across the constructJob invocation, but the first > > > > > > if case is what the if statement inside of the loop and setting the > > > > > > loop index variable to 1 do. The HostOffloadingInputs array is what > > > > > > is being generated here, except we're skipping and directly > > > > > > applying it as arguments. > > > > > > > > > > > > I tried to condense it a little in this case! Perhaps it loses > > > > > > readability though, I had hoped the comment might have kept it clear > > > > > Thanks for the link - that code in Clang doesn't really clarify what > > > > > makes `Inputs[0]` special 🤔 . > > > > > > > > > > Let me rephrase my question - what's so special about the first > > > > > input? (referred to in Clang as "main input") Is that something > > > > > specific to OpenMP? For example, in this case: > > > > > ``` > > > > > flang-new -fopenmp file.f90 > > > > > ``` > > > > > I assume that `inputs[0]` is "file.f90", so nothing will happen? > > > > > > > > > > > I tried to condense it a little in this case! Perhaps it loses > > > > > > readability though, I had hoped the comment might have kept it clear > > > > > > > > > > Nah, I think that your implementation is fine. It's my ignorance with > > > > > respect to OpenMP that's the problem here ;-) > > > > It's not specific to OpenMP I believe, as far as I am aware Clang's > > > > supported offload models (SYCL and CUDA as well as OpenMP) use it! In > > > > Flang's case we only really care about OpenMP as I believe it's the > > > > only offload programming model supported. > > > > > > > > In the case of the command: > > > > > > > > ``` > > > > flang-new -fopenmp file.f90 > > > > ``` > > > > The code should never be executed as no part of the command will enable > > > > an offloading action (for device or host)! But yes inputs[0] would > > > > indeed refer to file.f90. > > > > > > > > However, this code becomes relevant when you utilise an option that > > > > enables the clangDriver to perform some form of offloading action. For > > > > example a command like: > > > > > > > > ``` > > > > flang-new -fopenmp --offload-arch=gfx90a file.f90 > > > > ``` > > > > Will trigger two phase compilation, one for the host device (your > > > > resident CPU in this command) and one for the device (gfx90a in this > > > > command), the regular host pass will invoke like your provided command > > > > and the device pass will then invoke with -fopenmp-is-device in > > > > addition alongside the device triple. This generates two bitcode files > > > > from the one file, one containing the host code from the file, the > > > > other the device code (extracted from OpenMP target regions or declare > > > > target etc.). > > > > > > > > However, now we have two files, with both parts of our program, we need > > > > to conjoin them together, the clangDriver generates an embeddable > > > > fat-binary/binary using the clang-offload-packager and then invokes > > > > flang-new again, and this is where the above code becomes relevant, as > > > > this binary (or multiple binaries, if we target multiple devices in the > > > > same program) is what is passed to -fembed-offload-object! And > > > > inputs[0] in this case would refer to the output from the original host > > > > pass, which is what we want to embed the device binary into, so we wish > > > > to skip this original host output and only pass the subsequent inputs > > > > (which should be device binaries when the clangDriver initiates a host > > > > offloading action) we want to embed as -fembed-offload-object > > > > arguments. > > > > > > > > The offloading driver is quite complex and my knowledge of it is not > > > > perfect as I am not our resident expert on it unfortunately (so if > > > > anyone sees anything incorrect, please do chime in and correct me)! > > > > > > > > But hopefully this answers your question and gives you an idea of when > > > > and how this -fembed-offload-object comes into play, essentially a way > > > > to get the device binaries we wish to insert into the host binary, so > > > > it can load the binaries at runtime and execute them. Currently > > > > upstream Flang doesn't utilise this option of course, but we intend to > > > > use this as part of our OpenMP offloading efforts for AMD devices > > > > (whilst leaving the door open for other vendors devices as well). We > > > > are trying to re-use/mimic as much of the existing machinery that the > > > > clangDriver implements as we can! > > > > > > > The compiler requires at least one input file to run, otherwise it exits > > > early. Therefore we're guaranteed to have at least one input file in the > > > list. Some functions need an input file, usually to write some temp name > > > to, and `Inputs[0]` is the easiest way to get an input file. > > Thank you very much @jhuber6! I should have added you as a > > subscriber/reviewer as well in hindsight, sorry about that. > Sorry, this is two separate things. I was thinking about the driver's input > list which behaves like I mentioned above. > > Here is the input list to an actual compilation job. In this case we expect > to only get one. E.g. `clang foo.c bar.c` gets turned into `clang -cc1 foo.c` > and `clang -cc1 bar.c`. The extra files added here are intended to be used as > additional input handled separately. So, for example consider the following > OpenMP offloading compilation. > > ``` > > clang input.c -fopenmp --offload-arch=gfx1030 -ccc-print-bindings > # "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: > "/tmp/input-44c8b5.bc" > # "amdgcn-amd-amdhsa" - "clang", inputs: ["input.c", "/tmp/input-44c8b5.bc"], > output: "/tmp/input-gfx1030-4471d9.bc" > # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: > ["/tmp/input-gfx1030-4471d9.bc"], output: "/tmp/input-83327d.out" > # "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-44c8b5.bc", > "/tmp/input-83327d.out"], output: "/tmp/input-cdd693.o" > # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: > ["/tmp/input-cdd693.o"], output: "a.out" > ``` > The first input file is the actual file meant to be compiled. The other files > are handled separately. For the `amdgcn-amd-amdhsa` triple the extra input is > the host bitcode we use to match symbols between the host and the device. For > the `x64` triple the extra input is a binary blob to be embedded into the > host. Thanks for this thorough explanation! > We are trying to re-use/mimic as much of the existing machinery that the > clangDriver implements as we can! That makes a ton of sense 👍🏻 ! It would be good to add a note in the summary (perhaps with a link to https://clang.llvm.org/docs/OffloadingDesign.html). So basically `inputs[0]` is the host bitcode file, and this method is for device files, right? Perhaps you could replace "primary input" with something a bit more descriptive? Btw, would you be willing to update https://github.com/llvm/llvm-project/blob/main/flang/docs/FlangDriver.md with some notes on offloading? Clang's documentation is very C-centric. Definitely not in this patch ;-) Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D145815/new/ https://reviews.llvm.org/D145815 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits