On Tue, 11 Oct 2022, Jakub Jelinek wrote:

> So, does this mean one has to have gcc configured --with-arch=sm_70
> or later to make reverse offloading work (and then on the other
> side no support for older PTX arches at all)?
> If yes, I was kind of hoping we could arrange for it to be more
> user-friendly, build libgomp.a normally (sm_35 or what is the default),
> build the single TU in libgomp that needs the sm_70 stuff with -march=sm_70
> and arrange for mkoffload to link in the sm_70 stuff only if the user
> wants reverse offload (or has requires reverse_offload?).  In that case
> ignore sm_60 and older devices, if reverse offload isn't wanted, don't link
> in the part that needs sm_70 and make stuff working on sm_35 and later.
> Or perhaps have 2 versions of target.o, one sm_35 and one sm_70 and let
> mkoffload choose among them.

My understanding is that such trickery should not be necessary with
the barrier-based approach, i.e. the sequence of PTX instructions

  st             // plain store
  membar.sys
  st.volatile

should be enough to guarantee that the former store is visible on the host
before the latter, and it should work all the way back to sm_20.
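
To make the ordering concrete, here is a rough C sketch of the same
store / barrier / volatile-store pattern. The mailbox struct and both
function names are made up for illustration and are not taken from libgomp.
I am also assuming that __sync_synchronize () lowers to membar.sys on nvptx;
that would need to be checked against the generated PTX.

```c
#include <stdint.h>

/* Hypothetical mailbox shared between device and host; the layout and
   the names are illustrative only.  */
struct mailbox
{
  uint64_t payload;          /* written with a plain store */
  volatile uint64_t ready;   /* written last, with a volatile store */
};

/* Producer side: plain store, full barrier, volatile store.  */
static void
mailbox_publish (struct mailbox *mb, uint64_t value)
{
  mb->payload = value;       /* st */
  __sync_synchronize ();     /* full barrier; assumed membar.sys on nvptx */
  mb->ready = 1;             /* st.volatile */
}

/* Consumer side: returns 1 and fills *out once the flag is set,
   otherwise returns 0 and leaves *out untouched.  */
static int
mailbox_try_consume (struct mailbox *mb, uint64_t *out)
{
  if (mb->ready == 0)
    return 0;
  __sync_synchronize ();     /* keep the flag load ahead of the payload load */
  *out = mb->payload;
  return 1;
}
```

The host would poll mailbox_try_consume () until it returns 1; because of
the barrier in mailbox_publish (), seeing the flag implies the payload store
is visible as well.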

Alexander
