Hello,

As I understand it, when MPI_Iprobe is called, the code that runs is the function pointed to by the member

    mca_pml_base_module_iprobe_fn_t pml_iprobe;

in ompi/mca/pml/pml.h.

In ompi/mca/crcp/bkmrk/crcp_bkmrk_pml.c (Open MPI 1.4.3), ompi_crcp_bkmrk_pml_iprobe calls drain_message_find_any. In drain_message_find_any (same file), there is a loop over all MPI ranks regardless of the peer parameter. For instance, with 256 ranks, probing for peer 255 requires 256 iterations while probing for peer 0 requires only 1. As I understand it, the linked list ompi_crcp_bkmrk_pml_peer_refs is populated with nprocs entries, where nprocs is presumably the number of MPI ranks in MPI_COMM_WORLD.

If my understanding is right, here are two suggestions:

1. ompi_crcp_bkmrk_pml_peer_refs should be an array, so that when peer is not MPI_ANY_SOURCE, MPI_Iprobe can return in constant time.

2. There should be some sort of round-robin mechanism for the case where peer is MPI_ANY_SOURCE; otherwise lower ranks will be probed more often and higher ranks will suffer from starvation. This could be done by keeping a current position in the peer list (or array, see point 1). Instead of always starting the loop at the first entry, the loop would start at the current position, and at most nprocs iterations would take place.

A rough sketch of both ideas follows after my signature.

A code review is on my blog:
http://dskernel.blogspot.com/2011/09/code-review-what-happens-in-open-mpis.html

Sébastien
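
P.S. Below is a minimal, self-contained sketch of what I have in mind; it is not Open MPI code. The type peer_ref_t, the array peer_refs, the field has_drained_message, and the function find_peer_ref are hypothetical stand-ins for ompi_crcp_bkmrk_pml_peer_refs and the lookup done by drain_message_find_any, and ANY_SOURCE stands in for MPI_ANY_SOURCE.

    #include <stddef.h>

    #define ANY_SOURCE (-1)  /* stand-in for MPI_ANY_SOURCE */

    /* Hypothetical stand-in for one entry of ompi_crcp_bkmrk_pml_peer_refs. */
    typedef struct {
        int rank;
        int has_drained_message; /* nonzero if a drained message is pending */
    } peer_ref_t;

    static peer_ref_t *peer_refs;  /* array of nprocs entries (suggestion 1) */
    static int nprocs;             /* number of ranks in MPI_COMM_WORLD */
    static int rr_position = 0;    /* round-robin cursor (suggestion 2) */

    /*
     * Find the peer reference to probe.
     * - Specific peer: direct indexing into the array, O(1) instead of
     *   walking a linked list.
     * - ANY_SOURCE: scan starting at rr_position and wrap around, so no
     *   rank is systematically favored; at most nprocs iterations.
     * Returns a peer_ref_t* or NULL if nothing matches.
     */
    static peer_ref_t *find_peer_ref(int peer)
    {
        if (peer != ANY_SOURCE) {
            if (peer < 0 || peer >= nprocs) {
                return NULL;
            }
            return &peer_refs[peer];      /* constant time */
        }

        for (int i = 0; i < nprocs; ++i) {
            int idx = (rr_position + i) % nprocs;
            if (peer_refs[idx].has_drained_message) {
                /* Resume after this peer next time to avoid starvation. */
                rr_position = (idx + 1) % nprocs;
                return &peer_refs[idx];
            }
        }
        return NULL;                      /* no match after a full sweep */
    }

The only state this adds is the rr_position cursor; advancing it past the last matched peer is what keeps repeated MPI_ANY_SOURCE probes from always satisfying the lowest ranks first.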