Re: [OMPI devel] TIPC BTL code ready for review
Sorry it took me so long to reply; I was out of the office for the past few days.

In the paper I sent, the author ran the test at 100% CPU, but that was not the case in my runs. Since I have not looked into the TIPC code, I am afraid I cannot explain the difference. I can only show you my results and hope you get the same or even better results in your environment :)

I ran the NetPIPE test over TCP. I have not found any good tools for TIPC; maybe I will write one myself later.

/Xin

Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)
Now starting the main loop
  0:     1 bytes  1960 times -->    0.14 Mbps in   53.00 usec
  1:     2 bytes  1886 times -->    0.29 Mbps in   53.28 usec
  2:     3 bytes  1876 times -->    0.43 Mbps in   53.23 usec
  3:     4 bytes  1252 times -->    0.57 Mbps in   53.28 usec
  4:     6 bytes  1407 times -->    0.87 Mbps in   52.74 usec
  5:     8 bytes   948 times -->    1.14 Mbps in   53.35 usec
  6:    12 bytes  1171 times -->    1.71 Mbps in   53.40 usec
  7:    13 bytes   780 times -->    1.86 Mbps in   53.32 usec
  8:    16 bytes   865 times -->    2.28 Mbps in   53.43 usec
  9:    19 bytes  1052 times -->    2.71 Mbps in   53.57 usec
 10:    21 bytes  1178 times -->    2.99 Mbps in   53.57 usec
 11:    24 bytes  1244 times -->    3.41 Mbps in   53.62 usec
 12:    27 bytes  1320 times -->    3.84 Mbps in   53.60 usec
 13:    29 bytes   829 times -->    4.14 Mbps in   53.47 usec
 14:    32 bytes   902 times -->    4.57 Mbps in   53.48 usec
 15:    35 bytes   993 times -->    4.98 Mbps in   53.62 usec
 16:    45 bytes  1065 times -->    6.40 Mbps in   53.63 usec
 17:    48 bytes  1242 times -->    6.82 Mbps in   53.66 usec
 18:    51 bytes  1281 times -->    7.25 Mbps in   53.65 usec
 19:    61 bytes   730 times -->    8.68 Mbps in   53.65 usec
 20:    64 bytes   916 times -->    9.10 Mbps in   53.65 usec
 21:    67 bytes   961 times -->    9.52 Mbps in   53.69 usec
 22:    93 bytes  1000 times -->   13.17 Mbps in   53.89 usec
 23:    96 bytes  1237 times -->   13.58 Mbps in   53.94 usec
 24:    99 bytes  1255 times -->   13.99 Mbps in   54.00 usec
 25:   125 bytes   673 times -->   17.64 Mbps in   54.06 usec
 26:   128 bytes   917 times -->   18.06 Mbps in   54.08 usec
 27:   131 bytes   938 times -->   18.46 Mbps in   54.14 usec
 28:   189 bytes   958 times -->   26.47 Mbps in   54.47 usec
 29:   192 bytes  1223 times -->   26.82 Mbps in   54.62 usec
 30:   195 bytes  1230 times -->   27.22 Mbps in   54.66 usec
 31:   253 bytes   638 times -->   35.26 Mbps in   54.75 usec
 32:   256 bytes   909 times -->   35.43 Mbps in   55.12 usec
 33:   259 bytes   914 times -->   35.84 Mbps in   55.14 usec
 34:   381 bytes   924 times -->   51.58 Mbps in   56.36 usec
 35:   384 bytes  1182 times -->   51.87 Mbps in   56.49 usec
 36:   387 bytes  1184 times -->   52.21 Mbps in   56.55 usec
 37:   509 bytes   603 times -->   63.56 Mbps in   61.10 usec
 38:   512 bytes   816 times -->   62.56 Mbps in   62.44 usec
 39:   515 bytes   803 times -->   62.56 Mbps in   62.81 usec
 40:   765 bytes   803 times -->   82.00 Mbps in   71.18 usec
 41:   768 bytes   936 times -->   82.32 Mbps in   71.18 usec
 42:   771 bytes   938 times -->   82.59 Mbps in   71.22 usec
 43:  1021 bytes   473 times -->  102.64 Mbps in   75.89 usec
 44:  1024 bytes   658 times -->  102.92 Mbps in   75.91 usec
 45:  1027 bytes   659 times -->  103.06 Mbps in   76.03 usec
 46:  1533 bytes   660 times -->  134.60 Mbps in   86.89 usec
 47:  1536 bytes   767 times -->  134.90 Mbps in   86.87 usec
 48:  1539 bytes   768 times -->  135.18 Mbps in   86.86 usec
 49:  2045 bytes   385 times -->  173.37 Mbps in   89.99 usec
 50:  2048 bytes   555 times -->  173.63 Mbps in   89.99 usec
 51:  2051 bytes   556 times -->  173.22 Mbps in   90.34 usec
 52:  3069 bytes   554 times -->  225.61 Mbps in  103.78 usec
 53:  3072 bytes   642 times -->  225.97 Mbps in  103.72 usec
 54:  3075 bytes   643 times -->  226.14 Mbps in  103.74 usec
 55:  4093 bytes   322 times -->  277.99 Mbps in  112.33 usec
 56:  4096 bytes   445 times -->  277.98 Mbps in  112.42 usec
 57:  4099 bytes   444 times -->  277.72 Mbps in  112.60 usec
 58:  6141 bytes   444 times -->  400.61 Mbps in  116.95 usec
 59:  6144 bytes   570 times -->  402.32 Mbps in  116.51 usec
 60:  6147 bytes   572 times -->  400.78 Mbps in  117.02 usec
 61:  8189 bytes   285 times -->  458.70 Mbps in  136.20 usec
 62:  8192 bytes   367 times -->  460.25 Mbps in  135.80 usec
 63:8
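
For anyone cross-checking the table, the Mbps column appears to follow from the size and time columns. The sketch below assumes that throughput is the message size in bits divided by the reported time, and that 1 Mbps is counted as 2^20 bits per second. Both assumptions are inferred from the numbers in the listing, not taken from the NetPIPE source, and the function name `throughput_mbps` is made up for this example.

```c
#include <assert.h>

/* Throughput as the listing above appears to report it (inferred, not
 * taken from NetPIPE itself): bits transferred divided by elapsed time,
 * with 1 Mbps assumed to mean 2^20 bits per second. */
double throughput_mbps(double bytes, double usec)
{
    assert(usec > 0.0);                       /* avoid dividing by zero */
    double bits_per_sec = bytes * 8.0 / (usec * 1e-6);
    return bits_per_sec / (1024.0 * 1024.0);
}
```

With these assumptions, `throughput_mbps(1024, 75.91)` comes out at roughly 102.92, which agrees with row 44. The flat ~53 usec time for the smallest messages is then the latency floor, and throughput only grows once the per-message time starts to scale with size.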
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
Don't forget that this RFC has a timeout of today. I didn't think it would be controversial, which is why it had a short timeout.

Josh brought up a good point on the teleconf today that he'd like to be able to have hwloc without the additional libxml dependency (i.e., the way it is on the trunk today).

Remember that making hwloc a 1st class citizen is the first step of a multi-step plan (i.e., part of revamping paffinity in general). As part of the larger plan, we had planned to -- at least for a short while -- enable XML support in hwloc. Ralph and I will discuss this; I *think* we should be able to bring in the overall hwloc support without XML.

For the future, hwloc is exploring either supporting some other text format that won't have an additional dependency (e.g., JSON), or re-writing its XML support to drop the libxml dependency.

On Aug 31, 2011, at 3:05 PM, Jeff Squyres wrote:

> WHAT: Move hwloc up to be a first-class citizen in OPAL (while still making
> it possible to compile it out for platforms that don't need it)
>
> WHY: I previously sent a similar RFC to this one, but it got shot down in
> favor of hiding hwloc's functionality under abstraction. After playing with
> this for some time, we're now firmly of the belief that the additional
> abstraction doesn't buy OMPI anything.
>
> WHERE: A new compile-time-one-of-many framework like libevent: opal/mca/hwloc.
>
> WHEN: as part of the paffinity changes being worked on by Jeff, Josh, Terry,
> and Ralph.
>
> TIMEOUT: Teleconf, Tuesday, Sep 6.
>
> --> Short timeout because I *think* the only person that objected to the
> prior RFC (Ralph) has now been converted. Hence, I think this will be
> non-controversial. See below.
>
> --
>
> MORE DETAIL:
>
> There are many people who want to use hwloc within the OMPI code base for
> many different reasons. We've struggled with how to do so for two reasons:
>
> 1. avoid a complete dependence on hwloc
> 2. be able to compile it out for platforms that don't want/need it (e.g.,
>    Cray)
>
> The initial objection to my long-ago RFC was that you could hide hwloc under
> some abstraction and therefore easily be able to handle compiling hwloc out,
> supporting platforms that hwloc doesn't support, and potentially be able to
> replace hwloc with something else someday, if desired.
>
> After wrestling with this for a good long while, none of those goals seem
> workable via a thin layer of abstraction.
>
> Instead, let's just call a spade a spade: we'll be dependent upon hwloc.
> We'll provide a mechanism to compile it out for Cray and other embedded
> platforms.
>
> Here's the plan:
>
> 1. Make a new framework opal/mca/hwloc. We'll initially have 3 components:
>    - hwloc121: hwloc distribution v1.2.1
>    - system: the system-installed hwloc
>    - none: for platforms that don't want hwloc support
>
> Just like the libevent framework, we can introduce new versions of hwloc
> (e.g., 1.3) as new components. Old versions/components can be deleted as new
> versions/components are stabilized.
>
> 2. The hwloc framework will be like the libevent framework; only one of these
> components will be compiled. The component's hwloc API will be directly
> available (via name-shifting) to the rest of OPAL/ORTE/OMPI. No need for the
> usual structs of function pointers and whatnot.
>
> 3. The rest of the OPAL / ORTE / OMPI code base can use the hwloc API in the
> following way:
>
> 3a. opal_init() will initialize hwloc and load a central copy of the local
> machine's topology in a global variable. Anyone in the code base can use this
> global variable; its use does not need to be protected by #if _whatever_.
> However, its value may be NULL for platforms that hwloc doesn't support or
> installations that used the "none" hwloc component.
>
> 3b. opal_config.h will contain the macro OPAL_HAVE_HWLOC, which will be
> either 0 or 1. Any code that uses the hwloc API must protect itself with #if
> OPAL_HAVE_HWLOC, because installations that use the "none" hwloc component
> won't be able to resolve any of the hwloc symbols at link time.
>
> Meaning that you could do something like:
>
> if (NULL != opal_hwloc_topology) {
> #if OPAL_HAVE_HWLOC
>     // ...use hwloc API, etc.
> #endif
> }
>
> 4. After steps 1-3 are all done, the paffinity and maffinity frameworks can
> be deleted and replaced with the corresponding hwloc calls.
>
> Meaning: if we've got hwloc, the paffinity and maffinity frameworks now
> become redundant. So let's whack them. This can happen after 1-3 are done
> and stable in the trunk, however.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
Jeff Squ
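
To make points 3a and 3b of the quoted RFC concrete, here is a slightly fuller sketch of the guard pattern. It is only an illustration under stated assumptions: the fallback `#define OPAL_HAVE_HWLOC 0` stands in for what opal_config.h would provide, the local definitions of `opal_hwloc_topology` (including its type in each branch) stand in for the global that opal_init() would set, and `example_count_cores` is a hypothetical helper. Only `hwloc_get_nbobjs_by_type()` and `HWLOC_OBJ_CORE` are real hwloc API.

```c
#include <stddef.h>

/* Stand-in for opal_config.h (assumption): default to the "none" case so
 * this sketch builds on a machine without hwloc installed. */
#ifndef OPAL_HAVE_HWLOC
#define OPAL_HAVE_HWLOC 0
#endif

#if OPAL_HAVE_HWLOC
#include <hwloc.h>
/* Assumed type; in the RFC this global would be filled in by opal_init(). */
hwloc_topology_t opal_hwloc_topology = NULL;
#else
/* Assumed placeholder type; with the "none" component it stays NULL. */
void *opal_hwloc_topology = NULL;
#endif

/* Hypothetical helper: number of cores, or -1 if no topology is available. */
int example_count_cores(void)
{
    int ncores = -1;

    /* 3a: the NULL check itself needs no preprocessor guard. */
    if (NULL != opal_hwloc_topology) {
#if OPAL_HAVE_HWLOC
        /* 3b: any actual hwloc call must be compiled out when hwloc is absent. */
        ncores = hwloc_get_nbobjs_by_type(opal_hwloc_topology, HWLOC_OBJ_CORE);
#endif
    }
    return ncores;
}
```

Built as-is (OPAL_HAVE_HWLOC defaulting to 0), the helper returns -1 and no hwloc symbol is referenced, which is the link-time property that 3b is after. Callers then only have to handle the "no topology" result in one place.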
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
I guess that as long as there is an option to have any need for XML support compiled out, there is no reason to complain.

  george.

On Sep 6, 2011, at 17:36 , Jeff Squyres wrote:

> Don't forget that this RFC has a timeout of today. I didn't think it would
> be controversial, which is why it had a short timeout.
>
> Josh brought up a good point on the teleconf today that he'd like to be able
> to have hwloc without the additional libxml dependency (i.e., the way it
> is on the trunk today).
>
> Remember that making hwloc a 1st class citizen is the first step of a
> multi-step plan (i.e., part of revamping paffinity in general). As part of
> the larger plan, we had planned to -- at least for a short while -- enable
> XML support in hwloc. Ralph and I will discuss this; I *think* we should be
> able to bring in the overall hwloc support without XML.
>
> For the future, hwloc is exploring either supporting some other text format
> that won't have an additional dependency (e.g., JSON), or re-writing its XML
> support to drop the libxml dependency.
[OMPI devel] OMPI v1.4.4rc3 is now up
Chock full of fixes:

    http://www.open-mpi.org/software/ompi/v1.4/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] Regarding Connection establishment in OpenMPI (Jeff Squyres)
Hi Jeff,

As per our last discussion, MPI_INIT(..) uses a TCP socket to exchange its service-id/lid with other MPI processes. I assume this applies irrespective of the underlying library used to establish the connection, i.e., libibcm or librdmacm. Please correct me if I am wrong.

Message: 1
> Date: Wed, 24 Aug 2011 12:06:30 -0400
> From: Jeff Squyres
> Subject: Re: [OMPI devel] Regarding Connection establishment in
>     OpenMPI (Jeff Squyres)
> To: Open MPI Developers
>
> At the moment, our only "OOB" (out of band) module uses TCP sockets. This
> can use traditional ethernet or an emulated IP layer, such as IPoIB.
>
> On Aug 24, 2011, at 11:58 AM, Bhargava Ramu Kavati wrote:
>
> > Hi Jeff,
> > Thank you for your prompt response. I have a query related to MPI_INIT
> > here. What underlying transport mechanism does OpenMPI use to exchange
> > the service-id/lid via MPI_INIT? Is it a TCP/IP socket?
> >
> > Thanks & Regards,
> > Ramu
> >
> > Message: 2
> > Date: Mon, 22 Aug 2011 17:33:19 -0400
> > From: Jeff Squyres
> > Subject: Re: [OMPI devel] Regarding Connection establishment in
> >     OpenMPI
> > To: Open MPI Developers
> > Message-ID: <2399c470-7f91-49d4-a463-a8994691a...@cisco.com>
> >
> > On Aug 22, 2011, at 9:35 AM, Bhargava Ramu Kavati wrote:
> >
> > > I am trying to explore the details of connection establishment in
> > > OpenMPI using libibcm/librdmacm.
> >
> > Note that the IB community has given up on ibcm. Our support of it is
> > incomplete; I wouldn't look at it as an example.
> >
> > > In the code, I could not find how OpenMPI app is getting service-id/lid
> > > of remote node to which it wants to connect.
> >
> > In the normal case, we pass that information during MPI_INIT. It's a
> > global gather / broadcast operation that we refer to as the "modex" (module
> > exchange). I.e., each openib BTL module instance publishes its address
> > information in the modex and sends it. Near the end of MPI_INIT, each MPI
> > process receives the modex broadcast and caches it.
> >
> > During connection establishment, an MPI process will look in its modex
> > cache to find the connection information for the peer process that it wants
> > to connect to.
> >
> > > Also, I did not see any query in the code related to service_record_get
> > > from SA. Can you please describe what is happening OR Am I missing
> > > something here ?
> >
> > IIRC, we don't currently use the SA because of its serialization and
> > other resource bottlenecks (this is a hand-waving answer; I don't remember
> > the exact reasons for not using the SA, but there were many discussions
> > between the MPI and OpenFabrics communities a long time ago. The SA issues
> > were not resolved to the MPI community's liking, IIRC, but this was a long
> > time ago, and I don't even work for an IB vendor any more, so I might not be
> > remembering this correctly...).
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
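
The modex flow described in the quoted reply (publish during MPI_INIT, cache the broadcast, look up the peer at connection time) can be pictured with a toy model. This is a sketch only and not Open MPI's actual modex code or API: `peer_addr_t`, `modex_store`, `modex_lookup`, the field names, and the fixed `MAX_PROCS` table are all made up for illustration, and the gather/broadcast over the TCP OOB channel is reduced to a plain function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64   /* hypothetical job size limit for this toy model */

/* Hypothetical address record published by each process. */
typedef struct {
    int      valid;        /* nonzero once this rank's entry was received */
    uint16_t lid;          /* InfiniBand local identifier */
    uint64_t service_id;   /* connection-manager service id */
} peer_addr_t;

/* Hypothetical per-process cache, filled near the end of MPI_INIT. */
static peer_addr_t modex_cache[MAX_PROCS];

/* Stand-in for receiving one rank's entry from the modex broadcast. */
void modex_store(int rank, uint16_t lid, uint64_t service_id)
{
    assert(rank >= 0 && rank < MAX_PROCS);
    modex_cache[rank].valid = 1;
    modex_cache[rank].lid = lid;
    modex_cache[rank].service_id = service_id;
}

/* Connection setup: a purely local lookup, with no query to the SA.
 * Returns NULL if nothing was published for that rank. */
const peer_addr_t *modex_lookup(int rank)
{
    if (rank < 0 || rank >= MAX_PROCS || !modex_cache[rank].valid) {
        return NULL;
    }
    return &modex_cache[rank];
}
```

The point of the model is that the only network traffic happens once, up front; when a process later wants to connect to rank 3 it calls something like `modex_lookup(3)` and reads the lid and service id from memory, which is why no service_record_get call shows up in the connection path.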
Re: [OMPI devel] RFC: make hwloc be a 1st-class citizen
The ultimate goal is to not add an additional dependency for serialization of the hwloc topology information. One way or another, we'll get there.

On Sep 6, 2011, at 11:46 AM, George Bosilca wrote:

> I guess that as long as there is an option to have any need for XML support
> compiled out, there is no reason to complain.
>
> george.
Re: [OMPI devel] Regarding Connection establishment in OpenMPI (Jeff Squyres)
On Sep 6, 2011, at 1:24 PM, Bhargava Ramu Kavati wrote:

> As per our last discussion, MPI_INIT(..) uses a TCP socket to exchange its
> service-id/lid with other MPI processes. I assume this applies irrespective
> of the underlying library used to establish the connection, i.e., libibcm or
> librdmacm. Please correct me if I am wrong.

It can use librdmacm, but it doesn't have to (and doesn't by default). librdmacm uses its own internal communications -- I'm not sure what transport layer it uses.

By default, however, OMPI uses a TCP-based connection mechanism.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/