On Tue, 11 Oct 2022, Jakub Jelinek wrote:

> So, does this mean one has to have gcc configured --with-arch=sm_70
> or later to make reverse offloading work (and then on the other
> side no support for older PTX arches at all)?
> If yes, I was kind of hoping we could arrange for it to be more
> user-friendly, build libgomp.a normally (sm_35 or what is the default),
> build the single TU in libgomp that needs the sm_70 stuff with -march=sm_70
> and arrange for mkoffload to link in the sm_70 stuff only if the user
> wants reverse offload (or has requires reverse_offload?). In that case
> ignore sm_60 and older devices, if reverse offload isn't wanted, don't link
> in the part that needs sm_70 and make stuff working on sm_35 and later.
> Or perhaps have 2 versions of target.o, one sm_35 and one sm_70 and let
> mkoffload choose among them.
My understanding is that such trickery should not be necessary with the
barrier-based approach, i.e. the sequence of PTX instructions

  st            // plain store
  membar.sys
  st.volatile

should be enough to guarantee that the former store is visible on the host
before the latter, and should work all the way back to sm_20.

Alexander
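For readers less familiar with the PTX model, here is a rough host-side analogy of the same message-passing pattern, sketched with C11 atomics and two threads rather than a GPU and a host. This is an illustration under stated assumptions, not libgomp code: the plain store to the hypothetical `payload` plays the role of the plain `st`, the release fence stands in for `membar.sys` (which is actually a full system-scope fence, so the C11 release fence is a weaker stand-in), and the relaxed store to the hypothetical flag `ready` plays the role of `st.volatile`. The reader pairs it with an acquire fence. The names `payload`, `ready`, `writer`, `reader` and `run_message_passing` are all invented for this sketch.

```c
/* Hypothetical C11 analogue of the PTX sequence
     st; membar.sys; st.volatile
   using two host threads instead of a GPU and a host.  Not libgomp code.  */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* written with a plain store (like "st") */
static atomic_int ready;   /* flag polled by the reader (like the st.volatile target) */

static void *writer(void *arg)
{
  (void) arg;
  payload = 42;                                /* plain store */
  atomic_thread_fence(memory_order_release);   /* stand-in for membar.sys */
  atomic_store_explicit(&ready, 1, memory_order_relaxed); /* flag store */
  return NULL;
}

static void *reader(void *arg)
{
  int *out = arg;
  /* Spin until the flag store becomes visible.  */
  while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
    ;
  /* Pairs with the writer's release fence, so the earlier plain store
     to payload is guaranteed to be visible here.  */
  atomic_thread_fence(memory_order_acquire);
  *out = payload;
  return NULL;
}

/* Runs one writer/reader hand-off and returns the value the reader saw,
   or -1 if a thread could not be created.  */
int run_message_passing(void)
{
  pthread_t w, r;
  int seen = 0;

  payload = 0;
  atomic_store(&ready, 0);

  /* Start the writer first so a failed reader creation cannot leave
     a reader spinning forever.  */
  if (pthread_create(&w, NULL, writer, NULL) != 0)
    return -1;
  if (pthread_create(&r, NULL, reader, &seen) != 0)
    {
      pthread_join(w, NULL);
      return -1;
    }
  pthread_join(w, NULL);
  pthread_join(r, NULL);
  return seen;
}
```

Calling `run_message_passing()` should always yield 42. Dropping both fences would turn the read of `payload` into a data race under C11, which is the rough analogue of the ordering problem the `membar.sys` between the two PTX stores is there to prevent.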