Hi George,

Thanks for the feedback, it is appreciated. A few questions/comments:

> Regarding the tag with your proposal the OFI MTL will support a wider range 
> of tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
> correctly expose your tag limit via the MPI_TAG_UB.

I will take a look at MPI_TAG_UB.
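
For reference, a minimal sketch of how the tag limit could be derived from the number of tag bits. It assumes only that MPI tags are C ints, so a 32-bit tag field still has to be reported as at most INT_MAX. The helper name mtl_ofi_tag_ub is hypothetical and is not part of the MTL.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not in the MTL): largest tag value that fits in
 * tag_bits bits, capped at INT_MAX because MPI tags are C ints. */
static int mtl_ofi_tag_ub(unsigned tag_bits)
{
    assert(tag_bits >= 1 && tag_bits <= 64);
    uint64_t max = (tag_bits >= 64) ? UINT64_MAX
                                    : ((UINT64_C(1) << tag_bits) - 1);
    return (max > (uint64_t)INT_MAX) ? INT_MAX : (int)max;
}
```

An application would read the resulting value with MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &value_ptr, &flag).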

> I personally would prefer a solution where we can alter the distribution of 
> bits between bits in the cid and tag at compile time.

Sure, I can do this. What would you suggest for plan B? Fewer tag bits and more 
cid ones? Numbers?
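
One way the compile-time split could look, as a sketch only: the bit counts become build-time knobs and the shifts and masks are derived from them, so a different cid/tag balance needs no hand-edited masks. All CFG_* names are hypothetical; the defaults mirror the fallback layout proposed in this thread (10 cid, 18 source, 4 protocol, 32 tag bits).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Hypothetical build-time knobs, e.g. -DCFG_CID_BITS=12 -DCFG_TAG_BITS=30. */
#ifndef CFG_CID_BITS
#define CFG_CID_BITS    10
#endif
#ifndef CFG_SOURCE_BITS
#define CFG_SOURCE_BITS 18
#endif
#ifndef CFG_PROTO_BITS
#define CFG_PROTO_BITS  4
#endif
#ifndef CFG_TAG_BITS
#define CFG_TAG_BITS    32
#endif

static_assert(CFG_CID_BITS > 0 && CFG_SOURCE_BITS > 0 &&
              CFG_PROTO_BITS > 0 && CFG_TAG_BITS > 0,
              "every field needs at least one bit");
static_assert(CFG_CID_BITS + CFG_SOURCE_BITS +
              CFG_PROTO_BITS + CFG_TAG_BITS == 64,
              "the four fields must fill the 64-bit OFI tag exactly");

/* Field offsets, counted from the least significant bit. */
#define CFG_PROTO_SHIFT  (CFG_TAG_BITS)
#define CFG_SOURCE_SHIFT (CFG_PROTO_SHIFT + CFG_PROTO_BITS)
#define CFG_CID_SHIFT    (CFG_SOURCE_SHIFT + CFG_SOURCE_BITS)

/* Mask of `bits` ones starting at bit `shift` (bits is below 64 here). */
#define CFG_FIELD_MASK(bits, shift) \
    (((UINT64_C(1) << (bits)) - 1) << (shift))

#define CFG_TAG_MASK    CFG_FIELD_MASK(CFG_TAG_BITS, 0)
#define CFG_PROTO_MASK  CFG_FIELD_MASK(CFG_PROTO_BITS, CFG_PROTO_SHIFT)
#define CFG_SOURCE_MASK CFG_FIELD_MASK(CFG_SOURCE_BITS, CFG_SOURCE_SHIFT)
#define CFG_CID_MASK    CFG_FIELD_MASK(CFG_CID_BITS, CFG_CID_SHIFT)
```

The static assertions make an inconsistent split fail at build time instead of silently overlapping fields.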

> We can also envision this selection to be driven by an MCA parameter, but
> this might be too costly.

I did think about it. However, as you say, I’m not yet convinced it is worth it:

a) I will soon be reviewing the synchronous send protocol. I have not gone
through it thoroughly yet, but I am fairly sure I can reduce it to 2 bits
(maybe just 1), freeing 2 (or 3) more bits for cids or ranks.

b) Most of the providers TODAY effectively support FI_REMOTE_CQ_DATA and
FI_DIRECTED_RECV (psm2, gni, verbs;ofi_rxm, sockets). The tag layout discussed
here is only a fallback for potential new providers that lack them.
FI_DIRECTED_RECV is necessary to discriminate the source at RX time when the
source is not in the tag.

c) I will include build_time_plan_B you just suggested ;)

Thanks, again.

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of George 
Bosilca
Sent: Saturday, March 03, 2018 6:29 AM
To: Open MPI Developers <devel@lists.open-mpi.org>
Subject: Re: [OMPI devel] Default tag for OFI MTL

Hi Matias,

Relaxing the restriction on the number of ranks is definitely a good thing.
The cost will be reflected in the number of communicators and tags, and we must
be careful how we balance this.

Assuming context_id is the communicator cid, with 10 bits you can only support
1024 communicators. That is a little low, even lower than MVAPICH. The way we
allocate cids is very sparse, and with a limited number of possible cids we
might run into trouble very quickly for the few applications that use a large
number of communicators, and for the resilience support. Yet another reason to
revisit the cid allocation in the short term.

Regarding the tag with your proposal the OFI MTL will support a wider range of 
tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
correctly expose your tag limit via the MPI_TAG_UB.

I personally would prefer a solution where we can alter the distribution of 
bits between bits in the cid and tag at compile time. We can also envision this 
selection to be driven by an MCA parameter, but this might be too costly.
  George.




On Sat, Mar 3, 2018 at 2:56 AM, Cabral, Matias A
<matias.a.cab...@intel.com> wrote:
Hi all,

I’m working on extending the OFI MTL to support FI_REMOTE_CQ_DATA (1) in order
to raise the number of ranks the MTL supports, which is currently limited by
the 16 source-rank bits included in the OFI tag (2). Once the feature is
implemented there will be no such limitation for providers that support
FI_REMOTE_CQ_DATA and FI_DIRECTED_RECV (3). However, there will be a fallback
mode for providers that do not support these features, and I would like to get
consensus on the default tag distribution. This is my proposal:

* Default: No FI_REMOTE_CQ_DATA
* 01234567 01| 234567 01234567 0123| 4567 |01234567 01234567 01234567 01234567
* context_id |    source rank      |proto |            message tag

#define MTL_OFI_CONTEXT_MASK            (0xFFC0000000000000ULL)
#define MTL_OFI_SOURCE_MASK             (0x003FFFF000000000ULL)
#define MTL_OFI_SOURCE_BITS_COUNT       (18) /* 262,143 ranks */
#define MTL_OFI_CONTEXT_BITS_COUNT      (10) /* 1,023 communicators */
#define MTL_OFI_TAG_BITS_COUNT          (32) /* no restrictions */
#define MTL_OFI_PROTO_BITS_COUNT        (4)
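
To make the layout concrete, here is a small sketch of packing and unpacking the 64-bit match bits with the bit counts above (10 context, 18 source, 4 protocol, 32 tag). The ex_* helpers and EX_* constants are hypothetical names used for illustration only; they are not the MTL's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit offsets implied by the layout above, counted from the least
 * significant bit: tag 0-31, proto 32-35, source 36-53, context 54-63. */
enum { EX_PROTO_SHIFT = 32, EX_SOURCE_SHIFT = 36, EX_CONTEXT_SHIFT = 54 };

/* Hypothetical helper: build the 64-bit match bits for one message. */
static uint64_t ex_pack(uint32_t context_id, uint32_t source,
                        uint32_t proto, uint32_t tag)
{
    assert(context_id < (1u << 10));  /* 10 context bits */
    assert(source     < (1u << 18));  /* 18 source-rank bits */
    assert(proto      < (1u << 4));   /*  4 protocol bits */
    return ((uint64_t)context_id << EX_CONTEXT_SHIFT) |
           ((uint64_t)source     << EX_SOURCE_SHIFT)  |
           ((uint64_t)proto      << EX_PROTO_SHIFT)   |
           (uint64_t)tag;
}

/* Hypothetical accessors: recover each field from the match bits. */
static uint32_t ex_context(uint64_t bits)
{
    return (uint32_t)(bits >> EX_CONTEXT_SHIFT);
}

static uint32_t ex_source(uint64_t bits)
{
    return (uint32_t)((bits >> EX_SOURCE_SHIFT) & 0x3FFFFu);
}

static uint32_t ex_proto(uint64_t bits)
{
    return (uint32_t)((bits >> EX_PROTO_SHIFT) & 0xFu);
}

static uint32_t ex_tag(uint64_t bits)
{
    return (uint32_t)(bits & 0xFFFFFFFFu);
}
```

Packing the maximum value of every field sets all 64 bits, which is a quick check that the four fields neither overlap nor leave gaps.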

Notes:

- More ranks and fewer context ids than the current implementation.

- Moved the protocol bits away from the most significant bits because some
providers may reserve bits starting from there (see mem_tag_format (4)), in
which case sync send would not work.

Thoughts?

Today we had a call with Howard (LANL), John and Hamuri (HPE) and briefly 
talked about this, and also thought about sending this email as a query to find 
other developers keeping an eye on OFI support in OMPI.

Thanks,
_MAC



(1) https://ofiwg.github.io/libfabric/master/man/fi_cq.3.html

(2) https://github.com/open-mpi/ompi/blob/master/ompi/mca/mtl/ofi/mtl_ofi_types.h#L70

(3) https://ofiwg.github.io/libfabric/master/man/fi_getinfo.3.html

(4) https://ofiwg.github.io/libfabric/master/man/fi_endpoint.3.html







_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
