>> Permission to ask potentially dumb questions? > > Not dumb at all.
Thanks. I was scared that Barry would tell me I should quit my job (again) if I asked a dumb question :-) And the high-level goal that brought me here was to understand PETSc's use of MPI-3 features, including RMA. If there's more of that on a branch, please point me to it. Bitbucket code search either doesn't exist or I cannot find it. >> 1) Implementing atomic read-modify-write using >> Lock->Get->Accumulate->Unlock is W.R.O.N.G. Get and Accumulate are >> NOT guaranteed to be ordered within any epoch. Ever. You appear to >> be leveraged the ordered channel semantics of MPICH's Ch3. You need >> to either use MPI-3 RMA's MPI_Get_accumulate (or MPI_Fetch_and_op) or >> use a mutex ala MPICH's ROMIO and ARMCI-MPI's MPI-2 port. See >> http://dx.doi.org/10.1109/CCGRID.2005.1558687 for algorithm details. > > So the secret here is that the Windown implementation was written before > MPI-3 was available and mostly exists as the proof of concept that > demonstrated that the API was useful. We pretty quickly made it > non-default and told people to never use it because Open MPI datatypes > were so broken and because we've never seen it be faster. Of course > it's dangerous to leave this incorrect code (relying on a side-effect of > the MPICH implementation) lying around, but my feeling is that this > isn't even a good (for performance) one-sided implementation, so it > should get a more major overhaul if we have reason to believe that > one-sided is a desirable interface (versus two-sided or neighborhood > collectives). What do you think? It's not just MPICH-specific. It will break on BG[PQ] since they use ADI not Ch3. If I were you, I'd ifdef it to MPI-3 only and use MPI_Get_accumulate, which will fix all your problems at once, particularly if this code is not used by default. >> 2) 'sf->ranks[i]' is used in MPI_Win_lock, whereas the rest of the RMA >> calls use ranks[i]. I assume this is innocuous, but unless you have a >> mutex on 'sf', it is theoretically possible that sf->ranks[i] could be >> changed by another thread in such a way as to lead to an undefined or >> incorrect program. If you prohibit this behavior as part of the >> internal design contract, then it should be noted. > > sf->ranks is not mutated after setup. Except due to DRAM SDC, rowhammer exploits, gamma ray muons... :-D In any case, it will probably save you 0-10 cycles to use the automatic variable rather than to dereference the struct pointer again :-) Best, Jeff -- Jeff Hammond jeff.scie...@gmail.com http://jeffhammond.github.io/