Hello,

In this email, I answer your question and suggest 3 things you may want to try:

1) Manually select the way messages are sent and received
2) Turn on data collection
3) Testing other communication models

By the way, the git version of Ray now works fine on guillimin, no more crashes due to communication!

> ________________________________________
> From: Adrian Platts [[email protected]]
> Sent: July 30, 2012 07:36
> To: Sébastien Boisvert
> Subject: Question about MPI
>
> Hi Sebastien
>
> I have a very simple question about message routing on Ray. I apologize if
> you have covered this before in more detailed previous discussion.
>
> The context here is that we run Ray on a single motherboard system. It has
> 80 cores and 256GB RAM. When we run Ray there are usually no other
> user-level processes running.
>

That is a nice system.

> I decided to have a look at the networkTest files from our most recent runs
> to see if things looked different with few or many threads, and was surprised

Ray does not use any threads; it only uses one process per MPI rank. So I guess you mean processes and not threads.

> to find 2 orders of magnitude difference in the latency, but that this wasn't
> directly related to the number of threads being used.
>

That is huge. 4133 microseconds for a job that runs entirely inside one node is simply unacceptable. This is 4.1 milliseconds, which is ridiculous. Maybe Open-MPI is using TCP instead of shared memory.

So what is it related to, then? Sometimes, the distribution of latency is bimodal.

> So the question arose - does Ray use the network stack for all message
> routing even when all the routing is going between processes in the same
> memory space?

I would call that message communication, not message routing. Message routing refers to changing the path that a message takes when going from MPI rank A to MPI rank B.

Simply put, Ray uses MPI for any message that is sent or received, regardless of the source or the destination. Hence, Ray is not aware of how the messages are sent or received.
The MPI library will deal with this stuff. For example, in Open-MPI, there are the byte transfer layers (BTL) if you use the point-to-point messaging layer (PML) called Obi-Wan Kenobi (ob1).

If a message is sent from rank A to rank A, then the BTL self is utilized. The BTL self only does a simple memory copy. When a message is sent from rank A to rank B and A and B are on the same node, then the BTL sm is used. sm stands for shared memory. Finally, if ranks A and B are on different nodes, the selected BTL will depend on your system. On colosse, it will use Infiniband with the BTL openib.

On guillimin, however, the PML Obi-Wan Kenobi is not utilized. Instead, Open-MPI is using the PML Connor MacLeod (cm), which can only have exactly one MTL (matching transport layer). The MTL used on guillimin is called psm, for performance-scaled messaging.

Obi-Wan Kenobi (ob1) is the default on most systems.

See this email on the Open-MPI list:
http://www.open-mpi.org/community/lists/users/2012/06/19720.php

> If so should we look to optimize something to decrease the variance on the
> loopback latency below?
>

Yes, there are things you can test, of course. On your system, in theory the MPI library is supposed to use shared memory. But then, I guess you don't have a true symmetric multiprocessing (SMP) system, but more of a NUMA (non-uniform memory access) system. On a NUMA system, a given core has a different access speed depending on the memory address it is accessing. True SMP systems are expensive because the hardware must maintain cache coherency.

> Finally I was unsure if Ray might use a non-uniform model of communication
> where if processes resided in the same memory space they might use
> shared memory to pass messages with network messages only being used for
> processes on different systems?
>

The processes may be on the same node, but they don't share their virtual address space.
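
As a rough illustration of the ob1 rule described earlier in this reply (self for a rank talking to itself, sm within a node, a system-dependent BTL across nodes), here is a toy Python model. It is only a sketch of the rule as stated in this email, not Open-MPI's actual selection code. The function name pick_btl, the node_of mapping and the three-rank layout are made up for the example.

```python
def pick_btl(src, dst, node_of, inter_node_btl="openib"):
    """Toy model: name of the BTL that would carry a message from src to dst.

    node_of maps an MPI rank to the (hypothetical) name of the node it runs on.
    inter_node_btl is whatever the system provides between nodes; "openib"
    is the colosse example from this email.
    """
    if src == dst:
        return "self"        # same rank: a simple memory copy
    if node_of[src] == node_of[dst]:
        return "sm"          # same node: shared memory
    return inter_node_btl    # different nodes: depends on the system


# Hypothetical layout: ranks 0 and 1 on node "a", rank 2 on node "b".
node_of = {0: "a", 1: "a", 2: "b"}

for src, dst in [(0, 0), (0, 1), (0, 2)]:
    print(src, "->", dst, ":", pick_btl(src, dst, node_of))
```

On a single-node machine, every pair of distinct ranks falls in the second case, which is why shared memory is the expected transport there.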
What you describe is the default behavior in both MPICH2 and Open-MPI. See below for 3 courses of action.

> Thanks as always
>
> Adrian
> Adrian Platts
> McGill
>
> Output of network test (SNIP is not the real server name), this is for Ray
> 1.7 no routing was specified and the command line was usually something
> simple like

Just out of curiosity, what features in 1.7 make you not migrate to 2.0.0?

> mpirun -np XY ./Ray -k 41 -p ... -o foo
>

Are you running Ray inside a virtual machine?
What are the 80 cores? (Xeon, Opteron, something else)
What is your kernel version? Linux distribution?
Is it an SGI system, like everyone else's?

Try these things:

1) Manually select the way messages are sent and received

If you are using Open-MPI, try this:

    mpiexec -n 70 \
     --mca btl self,sm \
     ./Ray -o foo -test-network-only

This will force Open-MPI to use only shared memory.

If you want to test with TCP/IP only (the TCP/IP stack of Linux):

    mpiexec -n 70 \
     --mca btl self,tcp \
     ./Ray -o foo -test-network-only

Usually, Open-MPI should select the best thing to do. But for a large-memory system, I don't know what it does.

2) Turn on data collection

You can ask Ray to write the raw data collected during the network tests so you can check that.

    -write-network-test-raw-data
           Writes one additional file per rank detailing the network test.

3) Testing other communication models

You may be interested in testing the three communication models available in Ray 2.0.0. Recently, we ported Ray to the Cray XK6(TM) and added new communication models. You have to manually edit RayPlatform/communication/MessagesHandler.cpp to select one of these:

    CONFIG_COMM_PERSISTENT
    CONFIG_COMM_IPROBE_ROUND_ROBIN
    CONFIG_COMM_IPROBE_ANY_SOURCE

The default is the second. On the Cray XK6(TM), they use the third.

Sébastien Boisvert
sent from Utah, U.S.A.
at the ICiS 2012 workshop Next Generation Sequence Analysis

> find -name NetworkTest.txt -exec cat {} \;
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 1368
> 1 SNIP.mcgill.ca 1325
> 2 SNIP.mcgill.ca 1421
> 3 SNIP.mcgill.ca 1420
> 4 SNIP.mcgill.ca 1414
> 5 SNIP.mcgill.ca 1347
> 6 SNIP.mcgill.ca 1364
> 7 SNIP.mcgill.ca 1416
> 8 SNIP.mcgill.ca 1438
> 9 SNIP.mcgill.ca 1421
> 10 SNIP.mcgill.ca 1336
> 11 SNIP.mcgill.ca 1420
> 12 SNIP.mcgill.ca 1417
> 13 SNIP.mcgill.ca 1419
> 14 SNIP.mcgill.ca 1422
> 15 SNIP.mcgill.ca 1343
> 16 SNIP.mcgill.ca 1419
> 17 SNIP.mcgill.ca 1422
> 18 SNIP.mcgill.ca 1420
> 19 SNIP.mcgill.ca 1393
> 20 SNIP.mcgill.ca 1435
> 21 SNIP.mcgill.ca 1420
> 22 SNIP.mcgill.ca 1419
> 23 SNIP.mcgill.ca 1420
> 24 SNIP.mcgill.ca 1380
> 25 SNIP.mcgill.ca 1419
> 26 SNIP.mcgill.ca 1419
> 27 SNIP.mcgill.ca 1377
> 28 SNIP.mcgill.ca 1421
> 29 SNIP.mcgill.ca 1410
> 30 SNIP.mcgill.ca 1403
> 31 SNIP.mcgill.ca 1420
> 32 SNIP.mcgill.ca 1422
> 33 SNIP.mcgill.ca 1341
> 34 SNIP.mcgill.ca 1421
> 35 SNIP.mcgill.ca 1393
> 36 SNIP.mcgill.ca 1407
> 37 SNIP.mcgill.ca 1420
> 38 SNIP.mcgill.ca 1421
> 39 SNIP.mcgill.ca 1316
> 40 SNIP.mcgill.ca 1421
> 41 SNIP.mcgill.ca 1422
> 42 SNIP.mcgill.ca 1419
> 43 SNIP.mcgill.ca 1340
> 44 SNIP.mcgill.ca 1421
> 45 SNIP.mcgill.ca 1405
> 46 SNIP.mcgill.ca 1411
> 47 SNIP.mcgill.ca 1361
> 48 SNIP.mcgill.ca 1419
> 49 SNIP.mcgill.ca 1341
> 50 SNIP.mcgill.ca 1419
> 51 SNIP.mcgill.ca 1420
> 52 SNIP.mcgill.ca 1419
> 53 SNIP.mcgill.ca 1421
> 54 SNIP.mcgill.ca 1420
> 55 SNIP.mcgill.ca 1373
> 56 SNIP.mcgill.ca 1383
> 57 SNIP.mcgill.ca 1348
> 58 SNIP.mcgill.ca 1420
> 59 SNIP.mcgill.ca 1418
> 60 SNIP.mcgill.ca 1373
> 61 SNIP.mcgill.ca 1420
> 62 SNIP.mcgill.ca 1419
> 63 SNIP.mcgill.ca 1419
> 64 SNIP.mcgill.ca 1346
> 65 SNIP.mcgill.ca 1370
> 66 SNIP.mcgill.ca 1404
> 67 SNIP.mcgill.ca 1419
> 68 SNIP.mcgill.ca 1419
> 69 SNIP.mcgill.ca 1420
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 20
> 1 SNIP.mcgill.ca 20
> 2 SNIP.mcgill.ca 20
> 3 SNIP.mcgill.ca 20
> 4 SNIP.mcgill.ca 20
> 5 SNIP.mcgill.ca 21
> 6 SNIP.mcgill.ca 21
> 7 SNIP.mcgill.ca 26
> 8 SNIP.mcgill.ca 20
> 9 SNIP.mcgill.ca 21
> 10 SNIP.mcgill.ca 20
> 11 SNIP.mcgill.ca 20
> 12 SNIP.mcgill.ca 21
> 13 SNIP.mcgill.ca 20
> 14 SNIP.mcgill.ca 20
> 15 SNIP.mcgill.ca 20
> 16 SNIP.mcgill.ca 20
> 17 SNIP.mcgill.ca 20
> 18 SNIP.mcgill.ca 27
> 19 SNIP.mcgill.ca 20
> 20 SNIP.mcgill.ca 20
> 21 SNIP.mcgill.ca 20
> 22 SNIP.mcgill.ca 20
> 23 SNIP.mcgill.ca 20
> 24 SNIP.mcgill.ca 20
> 25 SNIP.mcgill.ca 20
> 26 SNIP.mcgill.ca 20
> 27 SNIP.mcgill.ca 20
> 28 SNIP.mcgill.ca 20
> 29 SNIP.mcgill.ca 21
> 30 SNIP.mcgill.ca 20
> 31 SNIP.mcgill.ca 20
> 32 SNIP.mcgill.ca 20
> 33 SNIP.mcgill.ca 20
> 34 SNIP.mcgill.ca 20
> 35 SNIP.mcgill.ca 20
> 36 SNIP.mcgill.ca 20
> 37 SNIP.mcgill.ca 20
> 38 SNIP.mcgill.ca 20
> 39 SNIP.mcgill.ca 20
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 2530
> 1 SNIP.mcgill.ca 2458
> 2 SNIP.mcgill.ca 2638
> 3 SNIP.mcgill.ca 2445
> 4 SNIP.mcgill.ca 2485
> 5 SNIP.mcgill.ca 2570
> 6 SNIP.mcgill.ca 2448
> 7 SNIP.mcgill.ca 2523
> 8 SNIP.mcgill.ca 2450
> 9 SNIP.mcgill.ca 2610
> 10 SNIP.mcgill.ca 2448
> 11 SNIP.mcgill.ca 2493
> 12 SNIP.mcgill.ca 2473
> 13 SNIP.mcgill.ca 2528
> 14 SNIP.mcgill.ca 4128
> 15 SNIP.mcgill.ca 2558
> 16 SNIP.mcgill.ca 2525
> 17 SNIP.mcgill.ca 2545
> 18 SNIP.mcgill.ca 2548
> 19 SNIP.mcgill.ca 2475
> 20 SNIP.mcgill.ca 2545
> 21 SNIP.mcgill.ca 2475
> 22 SNIP.mcgill.ca 2445
> 23 SNIP.mcgill.ca 2458
> 24 SNIP.mcgill.ca 2503
> 25 SNIP.mcgill.ca 4133
> 26 SNIP.mcgill.ca 2515
> 27 SNIP.mcgill.ca 2450
> 28 SNIP.mcgill.ca 2518
> 29 SNIP.mcgill.ca 2580
> 30 SNIP.mcgill.ca 2503
> 31 SNIP.mcgill.ca 2478
> 32 SNIP.mcgill.ca 2575
> 33 SNIP.mcgill.ca 2513
> 34 SNIP.mcgill.ca 2435
> 35 SNIP.mcgill.ca 2483
> 36 SNIP.mcgill.ca 2380
> 37 SNIP.mcgill.ca 2413
> 38 SNIP.mcgill.ca 2405
> 39 SNIP.mcgill.ca 2545
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 16
> 1 SNIP.mcgill.ca 15
> 2 SNIP.mcgill.ca 16
> 3 SNIP.mcgill.ca 16
> 4 SNIP.mcgill.ca 16
> 5 SNIP.mcgill.ca 16
> 6 SNIP.mcgill.ca 16
> 7 SNIP.mcgill.ca 16
> 8 SNIP.mcgill.ca 15
> 9 SNIP.mcgill.ca 15
> 10 SNIP.mcgill.ca 16
> 11 SNIP.mcgill.ca 16
> 12 SNIP.mcgill.ca 16
> 13 SNIP.mcgill.ca 16
> 14 SNIP.mcgill.ca 16
> 15 SNIP.mcgill.ca 15
> 16 SNIP.mcgill.ca 16
> 17 SNIP.mcgill.ca 16
> 18 SNIP.mcgill.ca 15
> 19 SNIP.mcgill.ca 15
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 544
> 1 SNIP.mcgill.ca 493
> 2 SNIP.mcgill.ca 542
> 3 SNIP.mcgill.ca 545
> 4 SNIP.mcgill.ca 432
> 5 SNIP.mcgill.ca 412
> 6 SNIP.mcgill.ca 545
> 7 SNIP.mcgill.ca 532
> 8 SNIP.mcgill.ca 425
> 9 SNIP.mcgill.ca 538
> 10 SNIP.mcgill.ca 541
> 11 SNIP.mcgill.ca 542
> 12 SNIP.mcgill.ca 541
> 13 SNIP.mcgill.ca 541
> 14 SNIP.mcgill.ca 545
> 15 SNIP.mcgill.ca 545
> 16 SNIP.mcgill.ca 540
> 17 SNIP.mcgill.ca 543
> 18 SNIP.mcgill.ca 545
> 19 SNIP.mcgill.ca 545
> 20 SNIP.mcgill.ca 544
> 21 SNIP.mcgill.ca 535
> 22 SNIP.mcgill.ca 542
> 23 SNIP.mcgill.ca 544
> 24 SNIP.mcgill.ca 512
> 25 SNIP.mcgill.ca 432
> 26 SNIP.mcgill.ca 544
> 27 SNIP.mcgill.ca 539
> 28 SNIP.mcgill.ca 480
> 29 SNIP.mcgill.ca 542
> 30 SNIP.mcgill.ca 544
> 31 SNIP.mcgill.ca 536
> 32 SNIP.mcgill.ca 541
> 33 SNIP.mcgill.ca 542
> 34 SNIP.mcgill.ca 541
> 35 SNIP.mcgill.ca 545
> 36 SNIP.mcgill.ca 535
> 37 SNIP.mcgill.ca 541
> 38 SNIP.mcgill.ca 536
> 39 SNIP.mcgill.ca 541
> 40 SNIP.mcgill.ca 544
> 41 SNIP.mcgill.ca 544
> 42 SNIP.mcgill.ca 542
> 43 SNIP.mcgill.ca 542
> 44 SNIP.mcgill.ca 545
> 45 SNIP.mcgill.ca 544
> 46 SNIP.mcgill.ca 543
> 47 SNIP.mcgill.ca 543
> 48 SNIP.mcgill.ca 546
> 49 SNIP.mcgill.ca 509
> 50 SNIP.mcgill.ca 544
> 51 SNIP.mcgill.ca 516
> 52 SNIP.mcgill.ca 545
> 53 SNIP.mcgill.ca 542
> 54 SNIP.mcgill.ca 544
> 55 SNIP.mcgill.ca 545
> 56 SNIP.mcgill.ca 545
> 57 SNIP.mcgill.ca 541
> 58 SNIP.mcgill.ca 542
> 59 SNIP.mcgill.ca 544
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 947
> 1 SNIP.mcgill.ca 947
> 2 SNIP.mcgill.ca 948
> 3 SNIP.mcgill.ca 947
> 4 SNIP.mcgill.ca 948
> 5 SNIP.mcgill.ca 950
> 6 SNIP.mcgill.ca 950
> 7 SNIP.mcgill.ca 951
> 8 SNIP.mcgill.ca 950
> 9 SNIP.mcgill.ca 950
> 10 SNIP.mcgill.ca 947
> 11 SNIP.mcgill.ca 948
> 12 SNIP.mcgill.ca 948
> 13 SNIP.mcgill.ca 951
> 14 SNIP.mcgill.ca 944
> 15 SNIP.mcgill.ca 947
> 16 SNIP.mcgill.ca 945
> 17 SNIP.mcgill.ca 944
> 18 SNIP.mcgill.ca 967
> 19 SNIP.mcgill.ca 946
> 20 SNIP.mcgill.ca 946
> 21 SNIP.mcgill.ca 950
> 22 SNIP.mcgill.ca 947
> 23 SNIP.mcgill.ca 948
> 24 SNIP.mcgill.ca 947
> 25 SNIP.mcgill.ca 947
> 26 SNIP.mcgill.ca 947
> 27 SNIP.mcgill.ca 954
> 28 SNIP.mcgill.ca 951
> 29 SNIP.mcgill.ca 948
> 30 SNIP.mcgill.ca 953
> 31 SNIP.mcgill.ca 949
> 32 SNIP.mcgill.ca 950
> 33 SNIP.mcgill.ca 951
> 34 SNIP.mcgill.ca 946
> 35 SNIP.mcgill.ca 948
> 36 SNIP.mcgill.ca 945
> 37 SNIP.mcgill.ca 950
> 38 SNIP.mcgill.ca 948
> 39 SNIP.mcgill.ca 954
> 40 SNIP.mcgill.ca 951
> 41 SNIP.mcgill.ca 948
> 42 SNIP.mcgill.ca 947
> 43 SNIP.mcgill.ca 948
> 44 SNIP.mcgill.ca 951
> 45 SNIP.mcgill.ca 950
> 46 SNIP.mcgill.ca 934
> 47 SNIP.mcgill.ca 950
> 48 SNIP.mcgill.ca 948
> 49 SNIP.mcgill.ca 948
> 50 SNIP.mcgill.ca 948
> 51 SNIP.mcgill.ca 948
> 52 SNIP.mcgill.ca 948
> 53 SNIP.mcgill.ca 951
> 54 SNIP.mcgill.ca 948
> 55 SNIP.mcgill.ca 993
> 56 SNIP.mcgill.ca 950
> 57 SNIP.mcgill.ca 945
> 58 SNIP.mcgill.ca 948
> 59 SNIP.mcgill.ca 950
>
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 42
> 1 SNIP.mcgill.ca 30
> 2 SNIP.mcgill.ca 31
> 3 SNIP.mcgill.ca 30
> 4 SNIP.mcgill.ca 41
> 5 SNIP.mcgill.ca 37
> 6 SNIP.mcgill.ca 41
> 7 SNIP.mcgill.ca 38
> 8 SNIP.mcgill.ca 39
> 9 SNIP.mcgill.ca 38
> 10 SNIP.mcgill.ca 37
> 11 SNIP.mcgill.ca 39
> 12 SNIP.mcgill.ca 39
> 13 SNIP.mcgill.ca 38
> 14 SNIP.mcgill.ca 40
> 15 SNIP.mcgill.ca 39
> 16 SNIP.mcgill.ca 39
> 17 SNIP.mcgill.ca 38
> 18 SNIP.mcgill.ca 37
> 19 SNIP.mcgill.ca 30
> 20 SNIP.mcgill.ca 31
> 21 SNIP.mcgill.ca 40
> 22 SNIP.mcgill.ca 31
> 23 SNIP.mcgill.ca 31
> 24 SNIP.mcgill.ca 40
> 25 SNIP.mcgill.ca 39
> 26 SNIP.mcgill.ca 30
> 27 SNIP.mcgill.ca 39
> 28 SNIP.mcgill.ca 30
> 29 SNIP.mcgill.ca 39
> 30 SNIP.mcgill.ca 38
> 31 SNIP.mcgill.ca 41
> 32 SNIP.mcgill.ca 40
> 33 SNIP.mcgill.ca 38
> 34 SNIP.mcgill.ca 30
> 35 SNIP.mcgill.ca 30
> 36 SNIP.mcgill.ca 31
> 37 SNIP.mcgill.ca 39
> 38 SNIP.mcgill.ca 38
> 39 SNIP.mcgill.ca 38
> 40 SNIP.mcgill.ca 40
> 41 SNIP.mcgill.ca 38
> 42 SNIP.mcgill.ca 40
> 43 SNIP.mcgill.ca 31
> 44 SNIP.mcgill.ca 38
> 45 SNIP.mcgill.ca 31
> 46 SNIP.mcgill.ca 40
> 47 SNIP.mcgill.ca 39
> 48 SNIP.mcgill.ca 30
> 49 SNIP.mcgill.ca 37
> 50 SNIP.mcgill.ca 38
> 51 SNIP.mcgill.ca 38
> 52 SNIP.mcgill.ca 40
> 53 SNIP.mcgill.ca 38
> 54 SNIP.mcgill.ca 38
> 55 SNIP.mcgill.ca 38
> 56 SNIP.mcgill.ca 38
> 57 SNIP.mcgill.ca 38
> 58 SNIP.mcgill.ca 39
> 59 SNIP.mcgill.ca 38
>
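
To compare runs like the ones quoted in this thread without reading every row, a small script can summarize each NetworkTest.txt block. The following is a minimal Python sketch, assuming the three-column layout shown in the quoted output (rank, name, latency in microseconds) and assuming that each concatenated file starts with the "# average latency" comment. The function name summarize_runs is made up for the example, and the sample text reuses the first three rows of two of the quoted runs.

```python
import statistics


def summarize_runs(text):
    """Split concatenated NetworkTest.txt output into runs and summarize each."""
    runs, current = [], None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()   # tolerate email quote markers
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: this comment marks the start of a new file (run).
            if line.startswith("# average latency"):
                current = []
                runs.append(current)
            continue
        fields = line.split()
        # Keep only "rank name latency" rows; skips the wrapped header text.
        if (current is None or len(fields) != 3
                or not fields[0].isdigit() or not fields[2].isdigit()):
            continue
        current.append(int(fields[2]))
    return [
        {"ranks": len(r), "min": min(r),
         "median": statistics.median(r), "max": max(r)}
        for r in runs if r
    ]


sample = """\
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 16
> 1 SNIP.mcgill.ca 15
> 2 SNIP.mcgill.ca 16
> # average latency in microseconds (10^-6 seconds) when requesting a reply for
> a message of 4000 bytes
> # Message passing interface rank Name Latency in microseconds
> 0 SNIP.mcgill.ca 2530
> 1 SNIP.mcgill.ca 2458
> 2 SNIP.mcgill.ca 2638
"""

for summary in summarize_runs(sample):
    print(summary)
```

A large gap between the median and the maximum within one run would point at a few slow ranks, while a high median across all ranks would rather suggest that the whole job used a slow transport.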
