Hey Sasha,

This is a conceptually simple option I've developed for updn routing.
Currently in updn routing, nodes/guids are routed on switches in a seemingly-random order, which I believe is due to internal data structure organization (i.e. cl_qmap_apply_func is called on port_guid_tbl) as well as how the fabric is scanned (the scan order is logical from a port perspective, but not necessarily from a node perspective).

I had a hypothesis that this was leading to increased contention in the network for MPI. For example, suppose we have 12 uplinks from a leaf switch to a spine switch. If we want to send data from this leaf switch to node[13-24], the uplinks we send on are pretty random. That's because:

A) node[13-24] are individually routed at seemingly-random points, based on when they are visited by cl_qmap_apply_func().
B) the port chosen for each route is the least-used port.
C) which port is least used depends on whatever was routed earlier on.

So I developed this patch series, which supports an option called "guid_routing_order_file". It allows the user to input a file with a list of port_guids, which indicates the order in which guids are routed instead (naturally, those guids not listed are routed last).

I list the port guids of the nodes of the cluster from node0 to nodeN, one per line in the file. By listing the nodes in this order, I believe we could get less contention in the network. In the example above, sending to node[13-24] should use all 12 of the uplinks, b/c the ports will be equally used after node[1-12] were routed beforehand in order.

The results from some tests are pretty impressive when I do this. With LMC=0, average bandwidth in mpiGraph goes from 391.374 MB/s to 573.678 MB/s when I use guid_routing_order_file. I also saw performance gains with other tests, other MPIs, and other LMCs, if anyone is interested.

BTW, I developed this patch series before your preserve-base-lid patch series. It will 100% conflict with the preserve-base-lid patch series.
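
To make the reasoning in points A) to C) concrete, here is a minimal Python sketch of a toy model. It is not the OpenSM code. It assumes a single leaf switch with 12 uplinks, a "least-used port, lowest index wins ties" rule, a fixed shuffle standing in for the cl_qmap_apply_func() traversal order, and hex GUIDs one per line in the order file. The GUID values and all function names are made up for illustration.

```python
import random

NUM_UPLINKS = 12  # uplinks from the leaf switch to the spine, as in the example


def parse_guid_order(text):
    """Parse one port GUID per line (hex is an assumed format); skip blank lines."""
    return [int(line.strip(), 16) for line in text.splitlines() if line.strip()]


def routing_order(table_order, listed):
    """Listed GUIDs come first, in file order; unlisted GUIDs are routed last."""
    known = set(table_order)
    listed_set = set(listed)
    first = [g for g in dict.fromkeys(listed) if g in known]  # drop duplicates
    rest = [g for g in table_order if g not in listed_set]
    return first + rest


def assign_uplinks(order, num_uplinks=NUM_UPLINKS):
    """Route each GUID on the currently least-used uplink (lowest index wins ties)."""
    usage = [0] * num_uplinks
    chosen = {}
    for guid in order:
        port = usage.index(min(usage))
        usage[port] += 1
        chosen[guid] = port
    return chosen


def distinct_uplinks(chosen, guids):
    """Count how many different uplinks carry traffic toward the given GUIDs."""
    return len({chosen[g] for g in guids})


# Hypothetical GUIDs for node1..node24; real port GUIDs come from the fabric.
guids = [0x0002C90000000000 + n for n in range(1, 25)]
targets = guids[12:24]  # node[13-24]

# Stand-in for the seemingly-random table traversal order.
table_order = guids[:]
random.Random(0).shuffle(table_order)
default_routes = assign_uplinks(table_order)

# Order file listing node1..node24, one GUID per line.
order_file = "\n".join(f"0x{g:016x}" for g in guids)
ordered = routing_order(table_order, parse_guid_order(order_file))
ordered_routes = assign_uplinks(ordered)

print("default order:  ", distinct_uplinks(default_routes, targets), "uplinks")
print("guid order file:", distinct_uplinks(ordered_routes, targets), "uplinks")
```

In this model the ordered run spreads node[13-24] over all 12 uplinks, because node[1-12] fill the ports evenly first. The shuffled run will generally land several of those nodes on the same uplink.
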
I will fix this patch series once the preserve-base-lids patch series is committed to git. I'm just looking for comments right now.

Al

-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
