Hi Sylvain,

> > Also, we modified tuned COLL to implement
> > interconnect-and-topology-specific
> > bcast/allgather/alltoall/allreduce algorithm. These algorithm
> > implementations also bypass PML/BML/BTL to eliminate protocol and
> > software overhead.
>
> This seems perfectly valid to me. The current coll components use normal
> MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I always saw the
> coll framework as a way to be able to integrate smoothly "custom"
> collective components for a specific interconnect. I think that Mellanox
> also did a specific collective component using directly their ConnectX HCA
> capabilities.
>
> However, modifying the "tuned" component may not be the better way to
> integrate your collective work. You may consider creating a "tofu" coll
> component which would only provide the collectives you optimized (and the
> coll framework will fallback on tuned for the ones you didn't optimize).

Yes, I agree. Unfortunately, my colleague implemented it badly.

We created another COLL component that uses the interconnect barrier,
like Mellanox FCA.

> > To achieve above, we created 'tofu COMMON', like sm
> > (ompi/mca/common/sm/).
> >
> > Is there interesting one?
>
> It may be interesting, yes. I don't know the tofu model, but if it is not
> secret, contributing it is usually a good thing.
>
> Your communication model may be similar to others and portions of code may
> be shared with other technologies (I'm thinking of IB, MX, PSM,...).
> People writing new code would also consider your model and let you take
> advantage of it. Knowing how tofu is integrated into Open MPI may also
> impact major decisions the open-source community is taking.

The Tofu communication model is similar to that of IB RDMA.
In fact, we use the source code of the openib BTL as a reference.
We'll consider contributing some code and joining the discussion.

Regards,

Takahiro Kawashima,
MPI development team,
Fujitsu