Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
I've confirmed. Thanks. Takahiro Kawashima, MPI development team, Fujitsu > Done -- thank you! > > On Jan 11, 2013, at 3:52 AM, "Kawashima, Takahiro" > wrote: > > > Hi Open MPI core members and Rayson, > > > > I've confirmed with the authors and created the bibtex reference. > > Could you add an entry to the "Open MPI Publications" page that > > links to Fujitsu's PDF file? The attached file contains the > > title, authors, abstract, link URL, and bibtex reference. > > > > Best regards, > > Takahiro Kawashima, > > MPI development team, > > Fujitsu > > > >> Sorry for not replying sooner. > >> I'm talking with the authors (they are not on this list) and > >> will request linking the PDF soon if they allow. > >> > >> Takahiro Kawashima, > >> MPI development team, > >> Fujitsu > >> > >>> Our policy so far was that adding a paper to the list of publications on > >>> the Open MPI website was a discretionary action at the authors' request. > >>> I don't see any compelling reason to change. Moreover, Fujitsu being a > >>> contributor to the Open MPI community, there is no obstacle to adding a > >>> link to their paper -- at their request. > >>> > >>> George. > >>> > >>> On Jan 10, 2013, at 00:15 , Rayson Ho wrote: > >>> > Hi Ralph, > > Since the whole journal is available online, and is reachable by > Google, I don't believe we can get into copyright issues by providing > a link to it (but then, I also know that there are countries that have > crazier web page linking rules!). > > http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html > > Rayson > > == > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > > Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster > http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html > > > On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > > I'm unaware of any formal criteria.
The papers currently located there > > are those written by members of the OMPI community, but we can > > certainly link to something written by someone else, so long as we > > don't get into copyright issues. > > > > On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > > > >> I found this paper recently, "MPI Library and Low-Level Communication > >> on the K computer", available at: > >> > >> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf > >> > >> What are the criteria for adding papers to the "Open MPI Publications" > >> page? > >> > >> Rayson
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Done -- thank you! On Jan 11, 2013, at 3:52 AM, "Kawashima, Takahiro" wrote: > Hi Open MPI core members and Rayson, > > I've confirmed to the authors and created the bibtex reference. > Could you make a page in the "Open MPI Publications" page that > links to Fujitsu's PDF file? The attached file contains information > of title, authors, abstract, link URL, and bibtex reference. > > Best regards, > Takahiro Kawashima, > MPI development team, > Fujitsu > >> Sorry for not replying sooner. >> I'm taliking with the authors (they are not in this list) and >> will request linking the PDF soon if they allowed. >> >> Takahiro Kawashima, >> MPI development team, >> Fujitsu >> >>> Our policy so far was that adding a paper to the list of publication on the >>> Open MPI website was a discretionary action at the authors' request. I >>> don't see any compelling reason to change. Moreover, Fujitsu being a >>> contributor of the Open MPI community, there is no obstacle of adding a >>> link to their paper -- at their request. >>> >>> George. >>> >>> On Jan 10, 2013, at 00:15 , Rayson Ho wrote: >>> Hi Ralph, Since the whole journal is available online, and is reachable by Google, I don't believe we can get into copyright issues by providing a link to it (but then, I also know that there are countries that have more crazy web page linking rules!). http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > I'm unaware of any formal criteria. The papers currently located there > are those written by members of the OMPI community, but we can certainly > link to something written by someone else, so long as we don't get into > copyright issues. 
> > On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > >> I found this paper recently, "MPI Library and Low-Level Communication >> on the K computer", available at: >> >> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf >> >> What are the criteria for adding papers to the "Open MPI Publications" >> page? >> >> Rayson > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Open MPI core members and Rayson, I've confirmed with the authors and created the bibtex reference. Could you add an entry to the "Open MPI Publications" page that links to Fujitsu's PDF file? The attached file contains the title, authors, abstract, link URL, and bibtex reference. Best regards, Takahiro Kawashima, MPI development team, Fujitsu > Sorry for not replying sooner. > I'm talking with the authors (they are not on this list) and > will request linking the PDF soon if they allow. > > Takahiro Kawashima, > MPI development team, > Fujitsu > > > Our policy so far was that adding a paper to the list of publications on the > > Open MPI website was a discretionary action at the authors' request. I > > don't see any compelling reason to change. Moreover, Fujitsu being a > > contributor to the Open MPI community, there is no obstacle to adding a > > link to their paper -- at their request. > > > > George. > > > > On Jan 10, 2013, at 00:15 , Rayson Ho wrote: > > > > > Hi Ralph, > > > > > > Since the whole journal is available online, and is reachable by > > > Google, I don't believe we can get into copyright issues by providing > > > a link to it (but then, I also know that there are countries that have > > > crazier web page linking rules!). > > > > > > http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html > > > > > > Rayson > > > > > > == > > > Open Grid Scheduler - The Official Open Source Grid Engine > > > http://gridscheduler.sourceforge.net/ > > > > > > Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster > > > http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html > > > > > > > > > On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > > >> I'm unaware of any formal criteria.
The papers currently located there > > >> are those written by members of the OMPI community, but we can certainly > > >> link to something written by someone else, so long as we don't get into > > >> copyright issues. > > >> > > >> On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > > >> > > >>> I found this paper recently, "MPI Library and Low-Level Communication > > >>> on the K computer", available at: > > >>> > > >>> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf > > >>> > > >>> What are the criteria for adding papers to the "Open MPI Publications" > > >>> page? > > >>> > > >>> Rayson

Title: MPI Library and Low-Level Communication on the K computer
Author(s): Naoyuki Shida, Shinji Sumimoto, Atsuya Uno
Abstract: The key to raising application performance in a massively parallel system like the K computer is to increase the speed of communication between compute nodes. In the K computer, this inter-node communication is governed by the Message Passing Interface (MPI) communication library and low-level communication. This paper describes the implementation and performance of the MPI communication library, which exploits the new Tofu-interconnect architecture introduced in the K computer to enhance the performance of petascale applications, and the low-level communication mechanism, which performs fine-grained control of the Tofu interconnect.
Paper: paper11.pdf (PDF)
Presented: FUJITSU Scientific & Technical Journal 2012-7 (Vol.48, No.3)
Bibtex reference:
@Article{shida2012:mpi_kcomputer,
  author  = {Naoyuki Shida and Shinji Sumimoto and Atsuya Uno},
  title   = {{MPI} Library and Low-Level Communication on the {K computer}},
  journal = {FUJITSU Scientific \& Technical Journal},
  month   = {July},
  year    = {2012},
  volume  = {48},
  number  = {3},
  pages   = {324--330}
}
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi, Sorry for not replying sooner. I'm talking with the authors (they are not on this list) and will request linking the PDF soon if they allow. Takahiro Kawashima, MPI development team, Fujitsu > Our policy so far was that adding a paper to the list of publications on the > Open MPI website was a discretionary action at the authors' request. I don't > see any compelling reason to change. Moreover, Fujitsu being a contributor to > the Open MPI community, there is no obstacle to adding a link to their paper > -- at their request. > > George. > > On Jan 10, 2013, at 00:15 , Rayson Ho wrote: > > > Hi Ralph, > > > > Since the whole journal is available online, and is reachable by > > Google, I don't believe we can get into copyright issues by providing > > a link to it (but then, I also know that there are countries that have > > crazier web page linking rules!). > > > > http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html > > > > Rayson > > > > == > > Open Grid Scheduler - The Official Open Source Grid Engine > > http://gridscheduler.sourceforge.net/ > > > > Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster > > http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html > > > > > > On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > >> I'm unaware of any formal criteria. The papers currently located there are > >> those written by members of the OMPI community, but we can certainly link > >> to something written by someone else, so long as we don't get into > >> copyright issues. > >> > >> On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > >> > >>> I found this paper recently, "MPI Library and Low-Level Communication > >>> on the K computer", available at: > >>> > >>> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf > >>> > >>> What are the criteria for adding papers to the "Open MPI Publications" > >>> page?
> >>> > >>> Rayson > >>> > >>> == > >>> Open Grid Scheduler - The Official Open Source Grid Engine > >>> http://gridscheduler.sourceforge.net/ > >>> > >>> > >>> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca > >>> wrote: > Dear Yuki and Takahiro, > > Thanks for the bug report and for the patch. I pushed a [nearly > identical] patch in the trunk in > https://svn.open-mpi.org/trac/ompi/changeset/25488. A special version > for the 1.4 has been prepared and has been attached to the ticket #2916 > (https://svn.open-mpi.org/trac/ompi/ticket/2916). > > Thanks, > george. > > > On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: > > > Dear Open MPI community, > > > > I'm a member of MPI library development team in Fujitsu, > > Takahiro Kawashima, who sent mail before, is my colleague. > > We start to feed back. > > > > First, we fixed about MPI_LB/MPI_UB and data packing problem. > > > > Program crashes when it meets all of the following conditions: > > a: The type of sending data is contiguous and derived type. > > b: Either or both of MPI_LB and MPI_UB is used in the data type. > > c: The size of sending data is smaller than extent(Data type has gap). > > d: Send-count is bigger than 1. > > e: Total size of data is bigger than "eager limit" > > > > This problem occurs in attachment C program. > > > > An incorrect-address accessing occurs > > because an unintended value of "done" inputs and > > the value of "max_allowd" becomes minus > > in the following place in "ompi/datatype/datatype_pack.c(in version > > 1.4.3)". 
> > > > > > (ompi/datatype/datatype_pack.c) > > 188 packed_buffer = (unsigned char *) > > iov[iov_count].iov_base; > > 189 done = pConv->bConverted - i * pData->size; /* partial > > data from last pack */ > > 190 if( done != 0 ) { /* still some data to copy from the > > last time */ > > 191 done = pData->size - done; > > 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, > > pConv->pBaseBuf, pData, pConv->count ); > > 193 MEMCPY_CSUM( packed_buffer, user_memory, done, > > pConv ); > > 194 packed_buffer += done; > > 195 max_allowed -= done; > > 196 total_bytes_converted += done; > > 197 user_memory += (extent - pData->size + done); > > 198 } > > > > This program assumes "done" as the size of partial data from last pack. > > However, when the program crashes, "done" equals the sum of all > > transmitted data size. > > It makes "max_allowed" to be a negative value. > > > > We modified the code as following and it passed our test suite. > > But we are not
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Our policy so far was that adding a paper to the list of publications on the Open MPI website was a discretionary action at the authors' request. I don't see any compelling reason to change. Moreover, Fujitsu being a contributor to the Open MPI community, there is no obstacle to adding a link to their paper -- at their request. George. On Jan 10, 2013, at 00:15 , Rayson Ho wrote: > Hi Ralph, > > Since the whole journal is available online, and is reachable by > Google, I don't believe we can get into copyright issues by providing > a link to it (but then, I also know that there are countries that have > crazier web page linking rules!). > > http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html > > Rayson > > == > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > > Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster > http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html > > > On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: >> I'm unaware of any formal criteria. The papers currently located there are >> those written by members of the OMPI community, but we can certainly link to >> something written by someone else, so long as we don't get into copyright >> issues. >> >> On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: >> >>> I found this paper recently, "MPI Library and Low-Level Communication >>> on the K computer", available at: >>> >>> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf >>> >>> What are the criteria for adding papers to the "Open MPI Publications" page? >>> >>> Rayson >>> >>> == >>> Open Grid Scheduler - The Official Open Source Grid Engine >>> http://gridscheduler.sourceforge.net/ >>> >>> >>> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca >>> wrote: Dear Yuki and Takahiro, Thanks for the bug report and for the patch. I pushed a [nearly identical] patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488.
A special version for the 1.4 has been prepared and has been attached to the ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). Thanks, george. On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: > Dear Open MPI community, > > I'm a member of MPI library development team in Fujitsu, > Takahiro Kawashima, who sent mail before, is my colleague. > We start to feed back. > > First, we fixed about MPI_LB/MPI_UB and data packing problem. > > Program crashes when it meets all of the following conditions: > a: The type of sending data is contiguous and derived type. > b: Either or both of MPI_LB and MPI_UB is used in the data type. > c: The size of sending data is smaller than extent(Data type has gap). > d: Send-count is bigger than 1. > e: Total size of data is bigger than "eager limit" > > This problem occurs in attachment C program. > > An incorrect-address accessing occurs > because an unintended value of "done" inputs and > the value of "max_allowd" becomes minus > in the following place in "ompi/datatype/datatype_pack.c(in version > 1.4.3)". > > > (ompi/datatype/datatype_pack.c) > 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; > 189 done = pConv->bConverted - i * pData->size; /* partial > data from last pack */ > 190 if( done != 0 ) { /* still some data to copy from the > last time */ > 191 done = pData->size - done; > 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, > pConv->pBaseBuf, pData, pConv->count ); > 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv > ); > 194 packed_buffer += done; > 195 max_allowed -= done; > 196 total_bytes_converted += done; > 197 user_memory += (extent - pData->size + done); > 198 } > > This program assumes "done" as the size of partial data from last pack. > However, when the program crashes, "done" equals the sum of all > transmitted data size. > It makes "max_allowed" to be a negative value. > > We modified the code as following and it passed our test suite. > But we are not sure this fix is correct. 
Can anyone review this fix? > Patch (against Open MPI 1.4 branch) is attached to this mail. > > -if( done != 0 ) { /* still some data to copy from the last > time */ > +if( (done + max_allowed) >= pData->size ) { /* still some > data to copy from the last time */ > > Best regards, > > Yuki MATSUMOTO > MPI development team, > Fujitsu > >
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Ralph, Since the whole journal is available online, and is reachable by Google, I don't believe we can get into copyright issues by providing a link to it (but then, I also know that there are countries that have more crazy web page linking rules!). http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > I'm unaware of any formal criteria. The papers currently located there are > those written by members of the OMPI community, but we can certainly link to > something written by someone else, so long as we don't get into copyright > issues. > > On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > >> I found this paper recently, "MPI Library and Low-Level Communication >> on the K computer", available at: >> >> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf >> >> What are the criteria for adding papers to the "Open MPI Publications" page? >> >> Rayson >> >> == >> Open Grid Scheduler - The Official Open Source Grid Engine >> http://gridscheduler.sourceforge.net/ >> >> >> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca wrote: >>> Dear Yuki and Takahiro, >>> >>> Thanks for the bug report and for the patch. I pushed a [nearly identical] >>> patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A >>> special version for the 1.4 has been prepared and has been attached to the >>> ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). >>> >>> Thanks, >>> george. >>> >>> >>> On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: >>> Dear Open MPI community, I'm a member of MPI library development team in Fujitsu, Takahiro Kawashima, who sent mail before, is my colleague. We start to feed back. 
First, we fixed about MPI_LB/MPI_UB and data packing problem. Program crashes when it meets all of the following conditions: a: The type of sending data is contiguous and derived type. b: Either or both of MPI_LB and MPI_UB is used in the data type. c: The size of sending data is smaller than extent(Data type has gap). d: Send-count is bigger than 1. e: Total size of data is bigger than "eager limit" This problem occurs in attachment C program. An incorrect-address accessing occurs because an unintended value of "done" inputs and the value of "max_allowd" becomes minus in the following place in "ompi/datatype/datatype_pack.c(in version 1.4.3)". (ompi/datatype/datatype_pack.c) 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; 189 done = pConv->bConverted - i * pData->size; /* partial data from last pack */ 190 if( done != 0 ) { /* still some data to copy from the last time */ 191 done = pData->size - done; 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count ); 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv ); 194 packed_buffer += done; 195 max_allowed -= done; 196 total_bytes_converted += done; 197 user_memory += (extent - pData->size + done); 198 } This program assumes "done" as the size of partial data from last pack. However, when the program crashes, "done" equals the sum of all transmitted data size. It makes "max_allowed" to be a negative value. We modified the code as following and it passed our test suite. But we are not sure this fix is correct. Can anyone review this fix? Patch (against Open MPI 1.4 branch) is attached to this mail. -if( done != 0 ) { /* still some data to copy from the last time */ +if( (done + max_allowed) >= pData->size ) { /* still some data to copy from the last time */ Best regards, Yuki MATSUMOTO MPI development team, Fujitsu (2011/06/28 10:58), Takahiro Kawashima wrote: > Dear Open MPI community, > > I'm a member of MPI library development team in Fujitsu. 
Shinji > Sumimoto, whose name appears in Jeff's blog, is one of our bosses. > > As Rayson and Jeff noted, K computer, world's most powerful HPC system > developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI > library. We, Fujitsu, are pleased to announce that, and also have special > thanks to Open MPI community. > We are sorry to be late announce! > > Our MPI l
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
I'm unaware of any formal criteria. The papers currently located there are those written by members of the OMPI community, but we can certainly link to something written by someone else, so long as we don't get into copyright issues. On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > I found this paper recently, "MPI Library and Low-Level Communication > on the K computer", available at: > > http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf > > What are the criteria for adding papers to the "Open MPI Publications" page? > > Rayson > > == > Open Grid Scheduler - The Official Open Source Grid Engine > http://gridscheduler.sourceforge.net/ > > > On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca wrote: >> Dear Yuki and Takahiro, >> >> Thanks for the bug report and for the patch. I pushed a [nearly identical] >> patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A >> special version for the 1.4 has been prepared and has been attached to the >> ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). >> >> Thanks, >> george. >> >> >> On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: >> >>> Dear Open MPI community, >>> >>> I'm a member of MPI library development team in Fujitsu, >>> Takahiro Kawashima, who sent mail before, is my colleague. >>> We start to feed back. >>> >>> First, we fixed about MPI_LB/MPI_UB and data packing problem. >>> >>> Program crashes when it meets all of the following conditions: >>> a: The type of sending data is contiguous and derived type. >>> b: Either or both of MPI_LB and MPI_UB is used in the data type. >>> c: The size of sending data is smaller than extent(Data type has gap). >>> d: Send-count is bigger than 1. >>> e: Total size of data is bigger than "eager limit" >>> >>> This problem occurs in attachment C program. 
>>> >>> An incorrect-address accessing occurs >>> because an unintended value of "done" inputs and >>> the value of "max_allowd" becomes minus >>> in the following place in "ompi/datatype/datatype_pack.c(in version 1.4.3)". >>> >>> >>> (ompi/datatype/datatype_pack.c) >>> 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; >>> 189 done = pConv->bConverted - i * pData->size; /* partial >>> data from last pack */ >>> 190 if( done != 0 ) { /* still some data to copy from the last >>> time */ >>> 191 done = pData->size - done; >>> 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, >>> pConv->pBaseBuf, pData, pConv->count ); >>> 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv ); >>> 194 packed_buffer += done; >>> 195 max_allowed -= done; >>> 196 total_bytes_converted += done; >>> 197 user_memory += (extent - pData->size + done); >>> 198 } >>> >>> This program assumes "done" as the size of partial data from last pack. >>> However, when the program crashes, "done" equals the sum of all transmitted >>> data size. >>> It makes "max_allowed" to be a negative value. >>> >>> We modified the code as following and it passed our test suite. >>> But we are not sure this fix is correct. Can anyone review this fix? >>> Patch (against Open MPI 1.4 branch) is attached to this mail. >>> >>> -if( done != 0 ) { /* still some data to copy from the last >>> time */ >>> +if( (done + max_allowed) >= pData->size ) { /* still some >>> data to copy from the last time */ >>> >>> Best regards, >>> >>> Yuki MATSUMOTO >>> MPI development team, >>> Fujitsu >>> >>> (2011/06/28 10:58), Takahiro Kawashima wrote: Dear Open MPI community, I'm a member of MPI library development team in Fujitsu. Shinji Sumimoto, whose name appears in Jeff's blog, is one of our bosses. As Rayson and Jeff noted, K computer, world's most powerful HPC system developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI library. 
We, Fujitsu, are pleased to announce that, and also have special thanks to Open MPI community. We are sorry to be late announce! Our MPI library is based on Open MPI 1.4 series, and has a new point- to-point component (BTL) and new topology-aware collective communication algorithms (COLL). Also, it is adapted to our runtime environment (ESS, PLM, GRPCOMM etc). K computer connects 68,544 nodes by our custom interconnect. Its runtime environment is our proprietary one. So we don't use orted. We cannot tell start-up time yet because of disclosure restriction, sorry. We are surprised by the extensibility of Open MPI, and have proved that Open MPI is scalable to 68,000 processes level! We feel pleasure to utilize such a great open-source software. We cannot tell detail of our technology yet because of our contract with RIKEN AICS, however, we will
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
I found this paper recently, "MPI Library and Low-Level Communication on the K computer", available at: http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf What are the criteria for adding papers to the "Open MPI Publications" page? Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca wrote: > Dear Yuki and Takahiro, > > Thanks for the bug report and for the patch. I pushed a [nearly identical] > patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A > special version for the 1.4 has been prepared and has been attached to the > ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). > > Thanks, > george. > > > On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: > >> Dear Open MPI community, >> >> I'm a member of MPI library development team in Fujitsu, >> Takahiro Kawashima, who sent mail before, is my colleague. >> We start to feed back. >> >> First, we fixed about MPI_LB/MPI_UB and data packing problem. >> >> Program crashes when it meets all of the following conditions: >> a: The type of sending data is contiguous and derived type. >> b: Either or both of MPI_LB and MPI_UB is used in the data type. >> c: The size of sending data is smaller than extent(Data type has gap). >> d: Send-count is bigger than 1. >> e: Total size of data is bigger than "eager limit" >> >> This problem occurs in attachment C program. >> >> An incorrect-address accessing occurs >> because an unintended value of "done" inputs and >> the value of "max_allowd" becomes minus >> in the following place in "ompi/datatype/datatype_pack.c(in version 1.4.3)". 
>> >> >> (ompi/datatype/datatype_pack.c) >> 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; >> 189 done = pConv->bConverted - i * pData->size; /* partial data >> from last pack */ >> 190 if( done != 0 ) { /* still some data to copy from the last >> time */ >> 191 done = pData->size - done; >> 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, >> pConv->pBaseBuf, pData, pConv->count ); >> 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv ); >> 194 packed_buffer += done; >> 195 max_allowed -= done; >> 196 total_bytes_converted += done; >> 197 user_memory += (extent - pData->size + done); >> 198 } >> >> This program assumes "done" as the size of partial data from last pack. >> However, when the program crashes, "done" equals the sum of all transmitted >> data size. >> It makes "max_allowed" to be a negative value. >> >> We modified the code as following and it passed our test suite. >> But we are not sure this fix is correct. Can anyone review this fix? >> Patch (against Open MPI 1.4 branch) is attached to this mail. >> >> -if( done != 0 ) { /* still some data to copy from the last >> time */ >> +if( (done + max_allowed) >= pData->size ) { /* still some data >> to copy from the last time */ >> >> Best regards, >> >> Yuki MATSUMOTO >> MPI development team, >> Fujitsu >> >> (2011/06/28 10:58), Takahiro Kawashima wrote: >>> Dear Open MPI community, >>> >>> I'm a member of MPI library development team in Fujitsu. Shinji >>> Sumimoto, whose name appears in Jeff's blog, is one of our bosses. >>> >>> As Rayson and Jeff noted, K computer, world's most powerful HPC system >>> developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI >>> library. We, Fujitsu, are pleased to announce that, and also have special >>> thanks to Open MPI community. >>> We are sorry to be late announce! 
>>> >>> Our MPI library is based on Open MPI 1.4 series, and has a new point- >>> to-point component (BTL) and new topology-aware collective communication >>> algorithms (COLL). Also, it is adapted to our runtime environment (ESS, >>> PLM, GRPCOMM etc). >>> >>> K computer connects 68,544 nodes by our custom interconnect. >>> Its runtime environment is our proprietary one. So we don't use orted. >>> We cannot tell start-up time yet because of disclosure restriction, sorry. >>> >>> We are surprised by the extensibility of Open MPI, and have proved that >>> Open MPI is scalable to 68,000 processes level! We feel pleasure to >>> utilize such a great open-source software. >>> >>> We cannot tell detail of our technology yet because of our contract >>> with RIKEN AICS, however, we will plan to feedback of our improvements >>> and bug fixes. We can contribute some bug fixes soon, however, for >>> contribution of our improvements will be next year with Open MPI >>> agreement. >>> >>> Best regards, >>> >>> MPI development team, >>> Fujitsu >>> >>> I got more information: http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ Short version: yes, Open MPI is used on K and was used to
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Dear Yuki and Takahiro, Thanks for the bug report and for the patch. I pushed a [nearly identical] patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A special version for the 1.4 branch has been prepared and attached to ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). Thanks, george. On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: > Dear Open MPI community, > > I'm a member of the MPI library development team at Fujitsu; > Takahiro Kawashima, who sent mail before, is my colleague. > We are starting to feed back our fixes. > > First, we fixed an MPI_LB/MPI_UB data-packing problem. > > A program crashes when all of the following conditions are met: > a: The type of the sent data is a contiguous, derived type. > b: Either or both of MPI_LB and MPI_UB is used in the datatype. > c: The size of the sent data is smaller than its extent (the datatype has a gap). > d: The send count is bigger than 1. > e: The total size of the data is bigger than the "eager limit". > > This problem occurs in the attached C program. > > An incorrect address is accessed because "done" takes an unintended value and > "max_allowed" becomes negative > at the following place in ompi/datatype/datatype_pack.c (in version 1.4.3): > > > (ompi/datatype/datatype_pack.c) > 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; > 189 done = pConv->bConverted - i * pData->size; /* partial data > from last pack */ > 190 if( done != 0 ) { /* still some data to copy from the last > time */ > 191 done = pData->size - done; > 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, > pConv->pBaseBuf, pData, pConv->count ); > 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv ); > 194 packed_buffer += done; > 195 max_allowed -= done; > 196 total_bytes_converted += done; > 197 user_memory += (extent - pData->size + done); > 198 } > > This code assumes "done" is the size of the partial data from the last pack.
> However, when the program crashes, "done" equals the sum of all transmitted > data sizes, > which makes "max_allowed" a negative value. > > We modified the code as follows and it passed our test suite, > but we are not sure this fix is correct. Can anyone review it? > A patch (against the Open MPI 1.4 branch) is attached to this mail. > > -if( done != 0 ) { /* still some data to copy from the last time > */ > +if( (done + max_allowed) >= pData->size ) { /* still some data > to copy from the last time */ > > Best regards, > > Yuki MATSUMOTO > MPI development team, > Fujitsu > > (2011/06/28 10:58), Takahiro Kawashima wrote: >> Dear Open MPI community, >> >> I'm a member of the MPI library development team at Fujitsu. Shinji >> Sumimoto, whose name appears in Jeff's blog, is one of our bosses. >> >> As Rayson and Jeff noted, K computer, the world's most powerful HPC system >> developed by RIKEN and Fujitsu, utilizes Open MPI as the base of its MPI >> library. We, Fujitsu, are pleased to announce that, and also owe special >> thanks to the Open MPI community. >> We are sorry for the late announcement! >> >> Our MPI library is based on the Open MPI 1.4 series, and has a new >> point-to-point component (BTL) and new topology-aware collective communication >> algorithms (COLL). It is also adapted to our runtime environment (ESS, >> PLM, GRPCOMM, etc.). >> >> K computer connects 68,544 nodes with our custom interconnect. >> Its runtime environment is our own proprietary one, so we don't use orted. >> We cannot disclose start-up times yet because of disclosure restrictions, sorry. >> >> We are impressed by the extensibility of Open MPI, and have proved that >> Open MPI scales to the 68,000-process level! We are pleased to >> utilize such great open-source software. >> >> We cannot disclose details of our technology yet because of our contract >> with RIKEN AICS; however, we plan to feed back our improvements >> and bug fixes.
We can contribute some bug fixes soon, but the >> contribution of our improvements will come next year, with the Open MPI >> agreement. >> >> Best regards, >> >> MPI development team, >> Fujitsu >> >> >>> I got more information: >>> >>> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ >>> >>> Short version: yes, Open MPI is used on K and was used to power the 8PF >>> runs. >>> >>> w00t! >>> >>> >>> >>> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: >>> w00t! OMPI powers 8 petaflops! (at least I'm guessing that -- does anyone know if that's true?) On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: > Interesting... page 11: > > http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf > > Open MPI based: > > * Open Standard, Open Source, Multi-Platform including PC Cluster. > * Adding extension to Open MPI for "Tofu" interconnect
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On 14/11/11 21:27, Y.MATSUMOTO wrote: > I'm a member of the MPI library development team at Fujitsu; > Takahiro Kawashima, who sent mail before, is my colleague. > We are starting to feed back our changes. First of all I'd like to say congratulations on breaking 10PF, and also a big thanks for working on contributing changes back to Open MPI! Whilst I can't comment on the fix, I can confirm that I also see segfaults with Open MPI 1.4.2 and 1.4.4 with your example program. Intel compilers 11.1: - -- [bruce002:03973] *** Process received signal *** [bruce002:03973] Signal: Segmentation fault (11) [bruce002:03973] Signal code: Address not mapped (1) [bruce002:03973] Failing at address: 0x10009 [bruce002:03973] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10] [bruce002:03973] [ 1] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2ab5d79d] [bruce002:03973] [ 2] /usr/local/openmpi/1.4.4-intel/lib/libopen-pal.so.0(opal_progress+0x87) [0x2b1fdc27] [bruce002:03973] [ 3] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0 [0x2abce252] [bruce002:03973] [ 4] /usr/local/openmpi/1.4.4-intel/lib/libmpi.so.0(PMPI_Recv+0x213) [0x2ab1e0f3] [bruce002:03973] [ 5] ./tp_lb_ub_ng(main+0x29b) [0x4021ab] [bruce002:03973] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994] [bruce002:03973] [ 7] ./tp_lb_ub_ng [0x401e59] [bruce002:03973] *** End of error message *** - -- mpiexec noticed that process rank 1 with PID 3973 on node bruce002 exited on signal 11 (Segmentation fault).
- -- [bruce002:03972] *** Process received signal *** [bruce002:03972] Signal: Segmentation fault (11) [bruce002:03972] Signal code: Address not mapped (1) [bruce002:03972] Failing at address: 0xff84bad0 [bruce002:03972] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10] [bruce002:03972] [ 1] ./tp_lb_ub_ng(__intel_new_memcpy+0x2c) [0x403c9c] [bruce002:03972] *** End of error message *** GCC 4.4.4: - -- [bruce002:04049] *** Process received signal *** [bruce002:04049] Signal: Segmentation fault (11) [bruce002:04049] Signal code: Address not mapped (1) [bruce002:04049] Failing at address: 0x10009 [bruce002:04049] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10] [bruce002:04049] [ 1] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2ab51f27] [bruce002:04049] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b14bb3a] [bruce002:04049] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2abb9985] [bruce002:04049] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(PMPI_Recv+0x12f) [0x2ab1913f] [bruce002:04049] [ 5] ./tp_lb_ub_ng(main+0x21c) [0x400dd0] [bruce002:04049] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994] [bruce002:04049] [ 7] ./tp_lb_ub_ng [0x400af9] [bruce002:04049] *** End of error message *** - -- mpiexec noticed that process rank 1 with PID 4049 on node bruce002 exited on signal 11 (Segmentation fault). 
- -- [bruce002:04048] *** Process received signal *** [bruce002:04048] Signal: Segmentation fault (11) [bruce002:04048] Signal code: Address not mapped (1) [bruce002:04048] Failing at address: 0x2aaab0833000 [bruce002:04048] [ 0] /lib64/libpthread.so.0 [0x3e1320eb10] [bruce002:04048] [ 1] /lib64/libc.so.6(memcpy+0x3ff) [0x3e12a7c63f] [bruce002:04048] [ 2] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2aafef7b] [bruce002:04048] [ 3] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2ab4fcdd] [bruce002:04048] [ 4] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2abc1563] [bruce002:04048] [ 5] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2abbce78] [bruce002:04048] [ 6] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2ab52036] [bruce002:04048] [ 7] /usr/local/openmpi/1.4.4-gcc/lib/libopen-pal.so.0(opal_progress+0x5a) [0x2b14bb3a] [bruce002:04048] [ 8] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0 [0x2abba5f5] [bruce002:04048] [ 9] /usr/local/openmpi/1.4.4-gcc/lib/libmpi.so.0(MPI_Send+0x177) [0x2ab1b1d7] [bruce002:04048] [10] ./tp_lb_ub_ng(main+0x1e4) [0x400d98] [bruce002:04048] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e12a1d994] [bruce002:04048] [12] ./tp_lb_ub_ng [0x400af9] [bruce002:04048] *** End of error message *** - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Dear Open MPI community, I'm a member of the MPI library development team at Fujitsu; Takahiro Kawashima, who sent mail before, is my colleague. We are starting to feed back our changes. First, we fixed an MPI_LB/MPI_UB and data-packing problem. The program crashes when all of the following conditions are met: a: The type of the sent data is contiguous and derived. b: MPI_LB and/or MPI_UB is used in the data type. c: The size of the sent data is smaller than its extent (the data type has a gap). d: The send count is bigger than 1. e: The total size of the data is bigger than the "eager limit". This problem occurs with the attached C program. An invalid memory access occurs because "done" receives an unintended value and "max_allowed" becomes negative in the following place in "ompi/datatype/datatype_pack.c" (in version 1.4.3):

(ompi/datatype/datatype_pack.c)
188 packed_buffer = (unsigned char *) iov[iov_count].iov_base;
189 done = pConv->bConverted - i * pData->size; /* partial data from last pack */
190 if( done != 0 ) { /* still some data to copy from the last time */
191 done = pData->size - done;
192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count );
193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
194 packed_buffer += done;
195 max_allowed -= done;
196 total_bytes_converted += done;
197 user_memory += (extent - pData->size + done);
198 }

This code assumes "done" is the size of the partial data from the last pack.
-if( done != 0 ) { /* still some data to copy from the last time */
+if( (done + max_allowed) >= pData->size ) { /* still some data to copy from the last time */

However, when the program crashes, "done" equals the sum of all transmitted data sizes, which makes "max_allowed" a negative value. We modified the code as above and it passed our test suite, but we are not sure this fix is correct. Can anyone review it? A patch (against the Open MPI 1.4 branch) is attached to this mail. Best regards, Yuki MATSUMOTO MPI development team, Fujitsu (2011/06/28 10:58), Takahiro Kawashima wrote: Dear Open MPI community, I'm a member of the MPI library development team at Fujitsu. Shinji Sumimoto, whose name appears in Jeff's blog, is one of our bosses. As Rayson and Jeff noted, K computer, the world's most powerful HPC system developed by RIKEN and Fujitsu, utilizes Open MPI as the base of its MPI library. We, Fujitsu, are pleased to announce that, and also owe special thanks to the Open MPI community. We are sorry for the late announcement! Our MPI library is based on the Open MPI 1.4 series, and has a new point-to-point component (BTL) and new topology-aware collective communication algorithms (COLL). It is also adapted to our runtime environment (ESS, PLM, GRPCOMM, etc.). K computer connects 68,544 nodes with our custom interconnect. Its runtime environment is our own proprietary one, so we don't use orted. We cannot disclose start-up times yet because of disclosure restrictions, sorry. We are impressed by the extensibility of Open MPI, and have proved that Open MPI scales to the 68,000-process level! We are pleased to utilize such great open-source software. We cannot disclose details of our technology yet because of our contract with RIKEN AICS; however, we plan to feed back our improvements and bug fixes. We can contribute some bug fixes soon, but the contribution of our improvements will come next year, with the Open MPI agreement. Best regards, MPI development team, Fujitsu I got more information: http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ Short version: yes, Open MPI is used on K and was used to power the 8PF runs. w00t! On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: w00t! OMPI powers 8 petaflops! (at least I'm guessing that -- does anyone know if that's true?)
On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: Interesting... page 11: http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf Open MPI based: * Open Standard, Open Source, Multi-Platform including PC Cluster. * Adding extension to Open MPI for "Tofu" interconnect Rayson ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel

Index: ompi/datatype/datatype_pack.c
===================================================================
--- ompi/datatype/datatype_pack.c (revision 25474)
+++ ompi/datatype/datatype_pack.c (working copy)
@@ -187,7 +187,7 @@
     packed_buffer = (unsigned char *) iov[iov_count].iov_base;
     done = pConv->bConverted - i * pData->size; /* partial data from last pack */
-    if( done != 0 ) { /* still some data to copy from the last time */
+    if( (done + max_allowed) >= pData->size ) { /* still some data to copy from the last time */
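[Editorial note] The arithmetic behind the reported failure and the proposed guard can be checked in isolation. The sketch below is not the OMPI code path, just a standalone model of the two guards, where "size" stands for pData->size, "done" for pConv->bConverted - i * pData->size, and "max_allowed" for the room left in the current iovec entry; the numeric values in the usage note are illustrative, not taken from an actual crash.

```c
#include <assert.h>

/* Model of the original guard in ompi/datatype/datatype_pack.c: it fires
 * whenever done != 0, and can drive max_allowed negative when "done" is
 * not really the partial data left over from the last pack.
 * Returns 1 if max_allowed would underflow. */
static int old_guard_underflows(long size, long done, long max_allowed)
{
    if (done != 0) {                  /* original test */
        long to_copy = size - done;   /* bytes the guard tries to copy */
        return to_copy > max_allowed; /* max_allowed -= to_copy goes negative */
    }
    return 0;
}

/* Model of the proposed guard: it only fires when done + max_allowed still
 * covers a whole element, so to_copy <= max_allowed always holds and
 * max_allowed can never go negative. */
static int new_guard_underflows(long size, long done, long max_allowed)
{
    if (done + max_allowed >= size) { /* proposed test */
        long to_copy = size - done;
        return to_copy > max_allowed;
    }
    return 0;                         /* guard skipped: nothing copied */
}
```

With size 100, done 30 and max_allowed 50, for example, the old guard would copy 70 bytes into 50 bytes of room (max_allowed ends at -20), while the new guard skips the copy entirely because done + max_allowed < size.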
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On Jul 3, 2011, at 8:40 PM, Kawashima wrote: >> Does your llp send path honor MPI matching ordering? E.g., if some prior isend >> is already queued, could the llp send overtake it? > > Yes, an LLP send may overtake a queued isend. > But we use the correct PML send_sequence, so the LLP message is queued as > an unexpected message on the receiver side, and I think it's not a problem. Good! I just wanted to ask because I couldn't quite tell from your prior description. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Jeff, > Does your llp send path honor MPI matching ordering? E.g., if some prior isend > is already queued, could the llp send overtake it? Yes, an LLP send may overtake a queued isend. But we use the correct PML send_sequence, so the LLP message is queued as an unexpected message on the receiver side, and I think it's not a problem. > >rc = MCA_LLP_CALL(send(buf, size, OMPI_PML_OB1_MATCH_HDR_LEN, > > (bool)OMPI_ENABLE_OB1_PAD_MATCH_HDR, > > ompi_comm_peer_lookup(comm, dst), > > MCA_PML_OB1_HDR_TYPE_MATCH)); > > > >if (rc == OMPI_SUCCESS) { > >/* NOTE this is not thread safe */ > >OPAL_THREAD_ADD32(&proc->send_sequence, 1); > >} Takahiro Kawashima, MPI development team, Fujitsu > Does your llp send path honor MPI matching ordering? E.g., if some prior isend > is already queued, could the llp send overtake it? > > Sent from my phone. No type good. > > On Jun 29, 2011, at 8:27 AM, "Kawashima" wrote: > > Hi Jeff, > > > >>> First, we created a new BTL component, 'tofu BTL'. It's not a special > >>> one, just dedicated to our Tofu interconnect. But its latency was not > >>> sufficient for us. > >>> > >>> So we created a new framework, 'LLP', and its component, 'tofu LLP'. > >>> It bypasses request-object creation in the PML and BML/BTL, and sends > >>> a message immediately if possible. > >> > >> Gotcha. Was the sendi pml call not sufficient? (sendi = "send > >> immediate") This call was designed to be part of a latency-reduction > >> mechanism. I forget offhand what we don't do before calling sendi, but > >> the rationale was that if the message was small enough, we could skip some > >> steps in the sending process and "just send it." > > > > I know sendi, but its latency was not sufficient for us.
> > To come at sendi call, we must do: > > - allocate send request (MCA_PML_OB1_SEND_REQUEST_ALLOC) > > - initialize send request (MCA_PML_OB1_SEND_REQUEST_INIT) > > - select BTL module (mca_pml_ob1_send_request_start) > > - select protocol (mca_pml_ob1_send_request_start_btl) > > We want to eliminate these overheads. We want to send more immediately. > > > > Here is a code snippet: > > > > > > > > #if OMPI_ENABLE_LLP > > static inline int mca_pml_ob1_call_llp_send(void *buf, > >size_t size, > >int dst, > >int tag, > >ompi_communicator_t *comm) > > { > >int rc; > >mca_pml_ob1_comm_proc_t *proc = &comm->c_pml_comm->procs[dst]; > >mca_pml_ob1_match_hdr_t *match = mca_pml_ob1.llp_send_buf; > > > >match->hdr_common.hdr_type = MCA_PML_OB1_HDR_TYPE_MATCH; > >match->hdr_common.hdr_flags = 0; > >match->hdr_ctx = comm->c_contextid; > >match->hdr_src = comm->c_my_rank; > >match->hdr_tag = tag; > >match->hdr_seq = proc->send_sequence + 1; > > > >rc = MCA_LLP_CALL(send(buf, size, OMPI_PML_OB1_MATCH_HDR_LEN, > > (bool)OMPI_ENABLE_OB1_PAD_MATCH_HDR, > > ompi_comm_peer_lookup(comm, dst), > > MCA_PML_OB1_HDR_TYPE_MATCH)); > > > >if (rc == OMPI_SUCCESS) { > >/* NOTE this is not thread safe */ > >OPAL_THREAD_ADD32(&proc->send_sequence, 1); > >} > > > >return rc; > > } > > #endif > > > > int mca_pml_ob1_send(void *buf, > > size_t count, > > ompi_datatype_t * datatype, > > int dst, > > int tag, > > mca_pml_base_send_mode_t sendmode, > > ompi_communicator_t * comm) > > { > >int rc; > >mca_pml_ob1_send_request_t *sendreq; > > > > #if OMPI_ENABLE_LLP > >/* try to send message via LLP if > > * - one of LLP modules is available, and > > * - datatype is basic, and > > * - data is small, and > > * - communication mode is standard, buffered, or ready, and > > * - destination is not myself > > */ > >if (((datatype->flags & DT_FLAG_BASIC) == DT_FLAG_BASIC) && > >(datatype->size * count < mca_pml_ob1.llp_max_payload_size) && > >(sendmode == MCA_PML_BASE_SEND_STANDARD || > > sendmode == 
MCA_PML_BASE_SEND_BUFFERED || > > sendmode == MCA_PML_BASE_SEND_READY) && > >(dst != comm->c_my_rank)) { > >rc = mca_pml_ob1_call_llp_send(buf, datatype->size * count, dst, > > tag, comm); > >if (rc != OMPI_ERR_NOT_AVAILABLE) { > >/* successfully sent out via LLP or unrecoverable error occurred > > */ > >return rc; > >} > >} > > #endif > > > >MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq, rc); > >if (rc != OMPI_SUCCESS) > >return rc; > > > >MCA_PML_OB1_SEND_REQUEST_INIT(sendreq, > >
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Does your llp send path honor MPI matching ordering? E.g., if some prior isend is already queued, could the llp send overtake it? Sent from my phone. No type good. On Jun 29, 2011, at 8:27 AM, "Kawashima" wrote: > Hi Jeff, > >>> First, we created a new BTL component, 'tofu BTL'. It's not a special >>> one, just dedicated to our Tofu interconnect. But its latency was not >>> sufficient for us. >>> >>> So we created a new framework, 'LLP', and its component, 'tofu LLP'. >>> It bypasses request-object creation in the PML and BML/BTL, and sends >>> a message immediately if possible. >> >> Gotcha. Was the sendi pml call not sufficient? (sendi = "send immediate") >> This call was designed to be part of a latency-reduction mechanism. I >> forget offhand what we don't do before calling sendi, but the rationale was >> that if the message was small enough, we could skip some steps in the >> sending process and "just send it." > > I know sendi, but its latency was not sufficient for us. > To reach the sendi call, we must: > - allocate a send request (MCA_PML_OB1_SEND_REQUEST_ALLOC) > - initialize the send request (MCA_PML_OB1_SEND_REQUEST_INIT) > - select a BTL module (mca_pml_ob1_send_request_start) > - select a protocol (mca_pml_ob1_send_request_start_btl) > We want to eliminate these overheads. We want to send more immediately.
> > Here is a code snippet: > > > > #if OMPI_ENABLE_LLP > static inline int mca_pml_ob1_call_llp_send(void *buf, >size_t size, >int dst, >int tag, >ompi_communicator_t *comm) > { >int rc; >mca_pml_ob1_comm_proc_t *proc = &comm->c_pml_comm->procs[dst]; >mca_pml_ob1_match_hdr_t *match = mca_pml_ob1.llp_send_buf; > >match->hdr_common.hdr_type = MCA_PML_OB1_HDR_TYPE_MATCH; >match->hdr_common.hdr_flags = 0; >match->hdr_ctx = comm->c_contextid; >match->hdr_src = comm->c_my_rank; >match->hdr_tag = tag; >match->hdr_seq = proc->send_sequence + 1; > >rc = MCA_LLP_CALL(send(buf, size, OMPI_PML_OB1_MATCH_HDR_LEN, > (bool)OMPI_ENABLE_OB1_PAD_MATCH_HDR, > ompi_comm_peer_lookup(comm, dst), > MCA_PML_OB1_HDR_TYPE_MATCH)); > >if (rc == OMPI_SUCCESS) { >/* NOTE this is not thread safe */ >OPAL_THREAD_ADD32(&proc->send_sequence, 1); >} > >return rc; > } > #endif > > int mca_pml_ob1_send(void *buf, > size_t count, > ompi_datatype_t * datatype, > int dst, > int tag, > mca_pml_base_send_mode_t sendmode, > ompi_communicator_t * comm) > { >int rc; >mca_pml_ob1_send_request_t *sendreq; > > #if OMPI_ENABLE_LLP >/* try to send message via LLP if > * - one of LLP modules is available, and > * - datatype is basic, and > * - data is small, and > * - communication mode is standard, buffered, or ready, and > * - destination is not myself > */ >if (((datatype->flags & DT_FLAG_BASIC) == DT_FLAG_BASIC) && >(datatype->size * count < mca_pml_ob1.llp_max_payload_size) && >(sendmode == MCA_PML_BASE_SEND_STANDARD || > sendmode == MCA_PML_BASE_SEND_BUFFERED || > sendmode == MCA_PML_BASE_SEND_READY) && >(dst != comm->c_my_rank)) { >rc = mca_pml_ob1_call_llp_send(buf, datatype->size * count, dst, tag, > comm); >if (rc != OMPI_ERR_NOT_AVAILABLE) { >/* successfully sent out via LLP or unrecoverable error occurred */ >return rc; >} >} > #endif > >MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq, rc); >if (rc != OMPI_SUCCESS) >return rc; > >MCA_PML_OB1_SEND_REQUEST_INIT(sendreq, > buf, > count, > datatype, > 
dst, tag, > comm, sendmode, false); > >PERUSE_TRACE_COMM_EVENT (PERUSE_COMM_REQ_ACTIVATE, > &(sendreq)->req_send.req_base, > PERUSE_SEND); > >MCA_PML_OB1_SEND_REQUEST_START(sendreq, rc); >if (rc != OMPI_SUCCESS) { >MCA_PML_OB1_SEND_REQUEST_RETURN( sendreq ); >return rc; >} > >ompi_request_wait_completion(&sendreq->req_send.req_base.req_ompi); > >rc = sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR; >ompi_request_free( (ompi_request_t**)&sendreq ); >return rc; > } > > > > mca_pml_ob1_send is body of MPI_Send in Open MPI. Region of > OMPI_ENABLE_LLP is added by us. > > We don't have to use a send request if we could "send immediately". > So we try to se
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Jeff, > > First, we created a new BTL component, 'tofu BTL'. It's not a special > > one, just dedicated to our Tofu interconnect. But its latency was not > > sufficient for us. > > > > So we created a new framework, 'LLP', and its component, 'tofu LLP'. > > It bypasses request-object creation in the PML and BML/BTL, and sends > > a message immediately if possible. > > Gotcha. Was the sendi pml call not sufficient? (sendi = "send immediate") > This call was designed to be part of a latency-reduction mechanism. I forget > offhand what we don't do before calling sendi, but the rationale was that if > the message was small enough, we could skip some steps in the sending process > and "just send it." I know sendi, but its latency was not sufficient for us. To reach the sendi call, we must: - allocate a send request (MCA_PML_OB1_SEND_REQUEST_ALLOC) - initialize the send request (MCA_PML_OB1_SEND_REQUEST_INIT) - select a BTL module (mca_pml_ob1_send_request_start) - select a protocol (mca_pml_ob1_send_request_start_btl) We want to eliminate these overheads. We want to send more immediately.
Here is a code snippet:

#if OMPI_ENABLE_LLP
static inline int mca_pml_ob1_call_llp_send(void *buf,
                                            size_t size,
                                            int dst,
                                            int tag,
                                            ompi_communicator_t *comm)
{
    int rc;
    mca_pml_ob1_comm_proc_t *proc = &comm->c_pml_comm->procs[dst];
    mca_pml_ob1_match_hdr_t *match = mca_pml_ob1.llp_send_buf;

    match->hdr_common.hdr_type = MCA_PML_OB1_HDR_TYPE_MATCH;
    match->hdr_common.hdr_flags = 0;
    match->hdr_ctx = comm->c_contextid;
    match->hdr_src = comm->c_my_rank;
    match->hdr_tag = tag;
    match->hdr_seq = proc->send_sequence + 1;

    rc = MCA_LLP_CALL(send(buf, size, OMPI_PML_OB1_MATCH_HDR_LEN,
                           (bool)OMPI_ENABLE_OB1_PAD_MATCH_HDR,
                           ompi_comm_peer_lookup(comm, dst),
                           MCA_PML_OB1_HDR_TYPE_MATCH));

    if (rc == OMPI_SUCCESS) {
        /* NOTE this is not thread safe */
        OPAL_THREAD_ADD32(&proc->send_sequence, 1);
    }

    return rc;
}
#endif

int mca_pml_ob1_send(void *buf,
                     size_t count,
                     ompi_datatype_t * datatype,
                     int dst,
                     int tag,
                     mca_pml_base_send_mode_t sendmode,
                     ompi_communicator_t * comm)
{
    int rc;
    mca_pml_ob1_send_request_t *sendreq;

#if OMPI_ENABLE_LLP
    /* try to send message via LLP if
     * - one of LLP modules is available, and
     * - datatype is basic, and
     * - data is small, and
     * - communication mode is standard, buffered, or ready, and
     * - destination is not myself
     */
    if (((datatype->flags & DT_FLAG_BASIC) == DT_FLAG_BASIC) &&
        (datatype->size * count < mca_pml_ob1.llp_max_payload_size) &&
        (sendmode == MCA_PML_BASE_SEND_STANDARD ||
         sendmode == MCA_PML_BASE_SEND_BUFFERED ||
         sendmode == MCA_PML_BASE_SEND_READY) &&
        (dst != comm->c_my_rank)) {
        rc = mca_pml_ob1_call_llp_send(buf, datatype->size * count, dst, tag, comm);
        if (rc != OMPI_ERR_NOT_AVAILABLE) {
            /* successfully sent out via LLP or unrecoverable error occurred */
            return rc;
        }
    }
#endif

    MCA_PML_OB1_SEND_REQUEST_ALLOC(comm, dst, sendreq, rc);
    if (rc != OMPI_SUCCESS)
        return rc;

    MCA_PML_OB1_SEND_REQUEST_INIT(sendreq,
                                  buf,
                                  count,
                                  datatype,
                                  dst, tag,
                                  comm, sendmode, false);

    PERUSE_TRACE_COMM_EVENT (PERUSE_COMM_REQ_ACTIVATE,
                             &(sendreq)->req_send.req_base,
                             PERUSE_SEND);

    MCA_PML_OB1_SEND_REQUEST_START(sendreq, rc);
    if (rc != OMPI_SUCCESS) {
        MCA_PML_OB1_SEND_REQUEST_RETURN( sendreq );
        return rc;
    }

    ompi_request_wait_completion(&sendreq->req_send.req_base.req_ompi);

    rc = sendreq->req_send.req_base.req_ompi.req_status.MPI_ERROR;
    ompi_request_free( (ompi_request_t**)&sendreq );
    return rc;
}

mca_pml_ob1_send is the body of MPI_Send in Open MPI. The region guarded by OMPI_ENABLE_LLP was added by us. We don't need a send request if we can "send immediately", so we try to send via LLP first. If LLP cannot send immediately because the interconnect is busy or similar, it returns OMPI_ERR_NOT_AVAILABLE, and we continue with the normal PML/BML/BTL send(i). Since we want to use a simple memcpy instead of the complex convertor, we restrict the datatypes that can go through LLP. Of course, we cannot use LLP for MPI_Isend. > Note, too, that the coll modules can be laid overtop of each other
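[Editorial note] The control flow described above -- try the lightweight path first and fall back to the request-based path when it reports "not available" -- can be distilled into a small standalone sketch. Everything here (try_fast_send, slow_send, ERR_NOT_AVAILABLE, the "wire" buffer) is illustrative, not the OMPI or LLP API:

```c
#include <string.h>

enum { SUCCESS = 0, ERR_NOT_AVAILABLE = -1, FAST_MAX = 64 };

/* Illustrative stand-in for the LLP fast path: succeeds only for small
 * payloads when the (simulated) interconnect is not busy. */
static int try_fast_send(const void *buf, size_t len, int busy, char *wire)
{
    if (busy || len >= FAST_MAX)
        return ERR_NOT_AVAILABLE; /* caller falls back to the slow path */
    memcpy(wire, buf, len);       /* plain memcpy: no convertor, no request */
    return SUCCESS;
}

/* Illustrative stand-in for the request-based PML path:
 * allocate request, init request, select BTL, select protocol... */
static int slow_send(const void *buf, size_t len, char *wire)
{
    memcpy(wire, buf, len);
    return SUCCESS;
}

/* The combined send: fast path first, fall back only when the fast
 * path explicitly reports it could not take the message. */
static int my_send(const void *buf, size_t len, int busy,
                   char *wire, int *used_fast)
{
    int rc = try_fast_send(buf, len, busy, wire);
    if (rc != ERR_NOT_AVAILABLE) { /* sent, or unrecoverable error */
        *used_fast = 1;
        return rc;
    }
    *used_fast = 0;
    return slow_send(buf, len, wire);
}
```

The key design point mirrored here is that the fast path's failure code is distinguishable from a real error, so the caller only falls back when fallback is actually useful.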
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On Jun 29, 2011, at 3:57 AM, Kawashima wrote: > First, we created a new BTL component, 'tofu BTL'. It's not a special > one, just dedicated to our Tofu interconnect. But its latency was not > sufficient for us. > > So we created a new framework, 'LLP', and its component, 'tofu LLP'. > It bypasses request-object creation in the PML and BML/BTL, and sends > a message immediately if possible. Gotcha. Was the sendi pml call not sufficient? (sendi = "send immediate") This call was designed to be part of a latency-reduction mechanism. I forget offhand what we don't do before calling sendi, but the rationale was that if the message was small enough, we could skip some steps in the sending process and "just send it." Note, too, that the coll modules can be laid overtop of each other -- e.g., if you only implement barrier (and some others) in a tofu coll, then you can supply NULL for the other function pointers and the coll base will resolve those functions to other coll modules automatically. > Also, we modified tuned COLL to implement interconnect-and-topology- > specific bcast/allgather/alltoall/allreduce algorithms. These algorithm > implementations also bypass PML/BML/BTL to eliminate protocol and software > overhead. Good. As Sylvain mentioned, that was the intent of the coll framework -- it certainly isn't *necessary* for colls to always implement their underlying sends/receives with the BTL. The sm coll does this, for example -- it uses its own shared memory block for talking to the sm colls in other processes on the same node, but it doesn't go through the sm BTL. > To achieve the above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/). > > Is any of this interesting? > > Though our BTL and COLL are quite interconnect-specific, LLP may be > contributed in the future. Yes, it may be interesting to see what you did there. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
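[Editorial note] The layering behavior Jeff describes -- a module supplies NULL for collectives it doesn't implement and the coll base resolves those slots from lower-priority modules -- can be sketched schematically. The struct and function names below are illustrative only; they are not the real mca_coll_base_module_t API:

```c
#include <stddef.h>

/* Illustrative module table -- NOT the real Open MPI coll module struct. */
typedef int (*coll_fn_t)(void);

struct coll_module {
    coll_fn_t barrier;
    coll_fn_t bcast;
    coll_fn_t allreduce;
};

/* A hypothetical interconnect-specific module implements only barrier... */
static int tofu_barrier(void)    { return 1; }
/* ...while a generic module implements everything. */
static int tuned_barrier(void)   { return 2; }
static int tuned_bcast(void)     { return 3; }
static int tuned_allreduce(void) { return 4; }

/* Mimic the coll base's resolution step: take the higher-priority
 * module's function where it provides one, otherwise fall back to the
 * lower-priority module's function. */
static struct coll_module resolve(struct coll_module high, struct coll_module low)
{
    struct coll_module out;
    out.barrier   = high.barrier   ? high.barrier   : low.barrier;
    out.bcast     = high.bcast     ? high.bcast     : low.bcast;
    out.allreduce = high.allreduce ? high.allreduce : low.allreduce;
    return out;
}
```

The net effect is the one Jeff names: a specialized module only has to implement the collectives it can accelerate, and every other slot transparently falls through to the next module.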
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Sylvain, > > Also, we modified tuned COLL to implement interconnect-and-topology- > > specific bcast/allgather/alltoall/allreduce algorithms. These algorithm > > implementations also bypass PML/BML/BTL to eliminate protocol and > software > > overhead. > This seems perfectly valid to me. The current coll components use normal > MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I always saw the > coll framework as a way to smoothly integrate "custom" > collective components for a specific interconnect. I think that Mellanox > also did a specific collective component using directly their ConnectX HCA > capabilities. > > However, modifying the "tuned" component may not be the best way to > integrate your collective work. You may consider creating a "tofu" coll > component which would only provide the collectives you optimized (and the > coll framework will fall back on tuned for the ones you didn't optimize). Yes, I agree. But sadly, my colleague implemented it badly. We created another COLL component that uses the interconnect barrier, like Mellanox FCA. > > To achieve the above, we created 'tofu COMMON', like sm > (ompi/mca/common/sm/). > > > > Is any of this interesting? > It may be interesting, yes. I don't know the tofu model, but if it is not > secret, contributing it is usually a good thing. > > Your communication model may be similar to others, and portions of code may > be shared with other technologies (I'm thinking of IB, MX, PSM,...). > People writing new code would also consider your model and let you take > advantage of it. Knowing how tofu is integrated into Open MPI may also > impact major decisions the open-source community takes. The Tofu communication model is similar to that of IB RDMA. Actually, we used the source code of the openib BTL as a reference. We'll consider contributing some code, and join the discussion. Regards, Takahiro Kawashima, MPI development team, Fujitsu
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Kawashima-san, Congratulations on your machine, this is a stunning achievement! > Kawashima wrote : > Also, we modified tuned COLL to implement interconnect-and-topology- > specific bcast/allgather/alltoall/allreduce algorithms. These algorithm > implementations also bypass PML/BML/BTL to eliminate protocol and software > overhead. This seems perfectly valid to me. The current coll components use normal MPI_Send/Recv semantics, hence the PML/BML/BTL chain, but I always saw the coll framework as a way to smoothly integrate "custom" collective components for a specific interconnect. I think that Mellanox also did a specific collective component using directly their ConnectX HCA capabilities. However, modifying the "tuned" component may not be the best way to integrate your collective work. You may consider creating a "tofu" coll component which would only provide the collectives you optimized (and the coll framework will fall back on tuned for the ones you didn't optimize). > To achieve the above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/). > > Is any of this interesting? It may be interesting, yes. I don't know the tofu model, but if it is not secret, contributing it is usually a good thing. Your communication model may be similar to others, and portions of code may be shared with other technologies (I'm thinking of IB, MX, PSM,...). People writing new code would also consider your model and let you take advantage of it. Knowing how tofu is integrated into Open MPI may also impact major decisions the open-source community takes. Sylvain
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Jeff, Ralph, and all, Thank you for your reply. RIKEN and Fujitsu will continue working toward 10 Pflops with Open MPI. Here we can explain some parts of our MPI: As page 13 of Koh Hotta's presentation shows, we extended the OMPI communication layers. > http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf # Sorry, this figure is somewhat broken. Arrows point to incorrect layers. First, we created a new BTL component, 'tofu BTL'. It's not a special one, just dedicated to our Tofu interconnect. But its latency was not sufficient for us. So we created a new framework, 'LLP', and its component, 'tofu LLP'. It bypasses request-object creation in the PML and BML/BTL, and sends a message immediately if possible. Also, we modified tuned COLL to implement interconnect-and-topology-specific bcast/allgather/alltoall/allreduce algorithms. These algorithm implementations also bypass PML/BML/BTL to eliminate protocol and software overhead. To achieve the above, we created 'tofu COMMON', like sm (ompi/mca/common/sm/). Is any of this interesting? Though our BTL and COLL are quite interconnect-specific, LLP may be contributed in the future. Regards, Takahiro Kawashima, MPI development team, Fujitsu > I echo what Ralph said -- congratulations! > > Let us know when you'll be ready to contribute back what you can. > > Thanks! > > > On Jun 27, 2011, at 9:58 PM, Takahiro Kawashima wrote: > > > Dear Open MPI community, > > > > I'm a member of the MPI library development team at Fujitsu. Shinji > > Sumimoto, whose name appears in Jeff's blog, is one of our bosses. > > > > As Rayson and Jeff noted, K computer, the world's most powerful HPC system > > developed by RIKEN and Fujitsu, utilizes Open MPI as the base of its MPI > > library. We, Fujitsu, are pleased to announce that, and also owe special > > thanks to the Open MPI community. > > We are sorry for the late announcement!
> > > > Our MPI library is based on Open MPI 1.4 series, and has a new point- > > to-point component (BTL) and new topology-aware collective communication > > algorithms (COLL). Also, it is adapted to our runtime environment (ESS, > > PLM, GRPCOMM etc). > > > > K computer connects 68,544 nodes by our custom interconnect. > > Its runtime environment is our proprietary one. So we don't use orted. > > We cannot tell start-up time yet because of disclosure restriction, sorry. > > > > We are surprised by the extensibility of Open MPI, and have proved that > > Open MPI is scalable to 68,000 processes level! We feel pleasure to > > utilize such a great open-source software. > > > > We cannot tell detail of our technology yet because of our contract > > with RIKEN AICS, however, we will plan to feedback of our improvements > > and bug fixes. We can contribute some bug fixes soon, however, for > > contribution of our improvements will be next year with Open MPI > > agreement. > > > > Best regards, > > > > MPI development team, > > Fujitsu > > > > > >> I got more information: > >> > >> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ > >> > >> Short version: yes, Open MPI is used on K and was used to power the 8PF > >> runs. > >> > >> w00t! > >> > >> > >> > >> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: > >> > >>> w00t! > >>> > >>> OMPI powers 8 petaflops! > >>> (at least I'm guessing that -- does anyone know if that's true?) > >>> > >>> > >>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: > >>> > Interesting... page 11: > > http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf > > Open MPI based: > > * Open Standard, Open Source, Multi-Platform including PC Cluster. 
> * Adding extension to Open MPI for "Tofu" interconnect > > Rayson > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/
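[Editor's note: the LLP idea Kawashima describes in the message above (skip request-object creation and put a message on the wire immediately when possible, otherwise fall through to the ordinary path) is a classic eager fast-path pattern. The sketch below is a hypothetical, library-agnostic illustration in Python; the real 'tofu LLP' is proprietary C code inside Open MPI 1.4, its details are not public, and every name here, including the eager limit, is invented.]

```python
# Hypothetical illustration of a low-latency send fast path.
# Request, full_path_send, and EAGER_LIMIT are all invented for this sketch.

EAGER_LIMIT = 64  # bytes; small messages skip the request machinery entirely

class Request:
    """Stand-in for the heavyweight request object the ordinary path
    allocates and tracks to completion."""
    def __init__(self, dest, payload):
        self.dest = dest
        self.payload = payload
        self.complete = False

def full_path_send(wire, dest, payload):
    """Ordinary path: allocate a request, hand the message to the
    transport, and mark the request complete when it is delivered."""
    req = Request(dest, payload)
    wire.append((dest, payload))
    req.complete = True
    return req

def llp_send(wire, dest, payload):
    """Fast path: if the message is small enough, put it on the wire
    immediately with no request object; otherwise fall back to the
    full path with its request bookkeeping."""
    if len(payload) <= EAGER_LIMIT:
        wire.append((dest, payload))
        return None                  # nothing to allocate, nothing to track
    return full_path_send(wire, dest, payload)

wire = []
r1 = llp_send(wire, 1, b"ping")      # small: fast path, no request
r2 = llp_send(wire, 2, b"x" * 1024)  # large: ordinary path, request returned
```

The latency win comes from what the fast path does not do: no allocation, no queueing, no completion tracking, which mirrors the stated goal of bypassing PML/BML/BTL overhead for the common small-message case.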
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
I echo what Ralph said -- congratulations! Let us know when you'll be ready to contribute back what you can. Thanks! On Jun 27, 2011, at 9:58 PM, Takahiro Kawashima wrote: > Dear Open MPI community, > > I'm a member of MPI library development team in Fujitsu. Shinji > Sumimoto, whose name appears in Jeff's blog, is one of our bosses. > > As Rayson and Jeff noted, K computer, world's most powerful HPC system > developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI > library. We, Fujitsu, are pleased to announce that, and also have special > thanks to Open MPI community. > We are sorry to be late announce! > > Our MPI library is based on Open MPI 1.4 series, and has a new point- > to-point component (BTL) and new topology-aware collective communication > algorithms (COLL). Also, it is adapted to our runtime environment (ESS, > PLM, GRPCOMM etc). > > K computer connects 68,544 nodes by our custom interconnect. > Its runtime environment is our proprietary one. So we don't use orted. > We cannot tell start-up time yet because of disclosure restriction, sorry. > > We are surprised by the extensibility of Open MPI, and have proved that > Open MPI is scalable to 68,000 processes level! We feel pleasure to > utilize such a great open-source software. > > We cannot tell detail of our technology yet because of our contract > with RIKEN AICS, however, we will plan to feedback of our improvements > and bug fixes. We can contribute some bug fixes soon, however, for > contribution of our improvements will be next year with Open MPI > agreement. > > Best regards, > > MPI development team, > Fujitsu > > >> I got more information: >> >> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ >> >> Short version: yes, Open MPI is used on K and was used to power the 8PF runs. >> >> w00t! >> >> >> >> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: >> >>> w00t! >>> >>> OMPI powers 8 petaflops! >>> (at least I'm guessing that -- does anyone know if that's true?) 
>>> >>> >>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >>> Interesting... page 11: http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf Open MPI based: * Open Standard, Open Source, Multi-Platform including PC Cluster. * Adding extension to Open MPI for "Tofu" interconnect Rayson > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Thank you for the info! Congratulations on a tremendous achievement. I look forward to hearing more about the system and its performance as disclosure permits. Anything you can share is most welcome, as we always appreciate the opportunity to learn how to improve OMPI. Meantime, good wishes on your continued efforts toward 10 Pflops!

Best regards,
Ralph

On Jun 27, 2011, at 7:58 PM, Takahiro Kawashima wrote:

> Dear Open MPI community,
>
> I'm a member of MPI library development team in Fujitsu. Shinji
> Sumimoto, whose name appears in Jeff's blog, is one of our bosses.
>
> As Rayson and Jeff noted, K computer, world's most powerful HPC system
> developed by RIKEN and Fujitsu, utilizes Open MPI as a base of its MPI
> library. We, Fujitsu, are pleased to announce that, and also have special
> thanks to Open MPI community.
> We are sorry to be late announce!
>
> Our MPI library is based on Open MPI 1.4 series, and has a new point-
> to-point component (BTL) and new topology-aware collective communication
> algorithms (COLL). Also, it is adapted to our runtime environment (ESS,
> PLM, GRPCOMM etc).
>
> K computer connects 68,544 nodes by our custom interconnect.
> Its runtime environment is our proprietary one. So we don't use orted.
> We cannot tell start-up time yet because of disclosure restriction, sorry.
>
> We are surprised by the extensibility of Open MPI, and have proved that
> Open MPI is scalable to 68,000 processes level! We feel pleasure to
> utilize such a great open-source software.
>
> We cannot tell detail of our technology yet because of our contract
> with RIKEN AICS, however, we will plan to feedback of our improvements
> and bug fixes. We can contribute some bug fixes soon, however, for
> contribution of our improvements will be next year with Open MPI
> agreement.
> > Best regards, > > MPI development team, > Fujitsu > > >> I got more information: >> >> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ >> >> Short version: yes, Open MPI is used on K and was used to power the 8PF runs. >> >> w00t! >> >> >> >> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: >> >>> w00t! >>> >>> OMPI powers 8 petaflops! >>> (at least I'm guessing that -- does anyone know if that's true?) >>> >>> >>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >>> Interesting... page 11: http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf Open MPI based: * Open Standard, Open Source, Multi-Platform including PC Cluster. * Adding extension to Open MPI for "Tofu" interconnect Rayson > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Dear Open MPI community,

I'm a member of the MPI library development team at Fujitsu. Shinji Sumimoto, whose name appears in Jeff's blog, is one of our bosses.

As Rayson and Jeff noted, K computer, the world's most powerful HPC system, developed by RIKEN and Fujitsu, utilizes Open MPI as the base of its MPI library. We, Fujitsu, are pleased to announce this, and also extend special thanks to the Open MPI community. We are sorry for the late announcement!

Our MPI library is based on the Open MPI 1.4 series, and has a new point-to-point component (BTL) and new topology-aware collective communication algorithms (COLL). It is also adapted to our runtime environment (ESS, PLM, GRPCOMM, etc.).

K computer connects 68,544 nodes by our custom interconnect. Its runtime environment is our proprietary one, so we don't use orted. We cannot tell the start-up time yet because of disclosure restrictions, sorry.

We are surprised by the extensibility of Open MPI, and have proved that Open MPI is scalable to the 68,000-process level! It is a pleasure to utilize such great open-source software.

We cannot tell details of our technology yet because of our contract with RIKEN AICS; however, we plan to feed back our improvements and bug fixes. We can contribute some bug fixes soon, but contribution of our improvements will come next year, with Open MPI agreement.

Best regards,

MPI development team,
Fujitsu

> I got more information:
>
> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/
>
> Short version: yes, Open MPI is used on K and was used to power the 8PF runs.
>
> w00t!
>
> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:
>
> > w00t!
> >
> > OMPI powers 8 petaflops!
> > (at least I'm guessing that -- does anyone know if that's true?)
> >
> > On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:
> >
> >> Interesting...
page 11: > >> > >> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf > >> > >> Open MPI based: > >> > >> * Open Standard, Open Source, Multi-Platform including PC Cluster. > >> * Adding extension to Open MPI for "Tofu" interconnect > >> > >> Rayson
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On Sat, Jun 25, 2011 at 9:23 PM, Jeff Squyres wrote: > I got more information: > > http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ That's really awesome!! SC08: "Open MPI: 10^15 Flops Can't Be Wrong" 2011: "Open MPI: 8 * 10^15 Flops Can't Be Wrong" And equally awesome is that Fujitsu is going to contribute its changes back to Open MPI!! Can't wait to see presentations like: "Open MPI: 10^17 Flops Can't Be Wrong", or even "Open MPI: 10^18 Flops Can't Be Wrong" :-) Rayson > > Short version: yes, Open MPI is used on K and was used to power the 8PF runs. > > w00t! > > > > On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: > >> w00t! >> >> OMPI powers 8 petaflops! >> (at least I'm guessing that -- does anyone know if that's true?) >> >> >> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >> >>> Interesting... page 11: >>> >>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf >>> >>> Open MPI based: >>> >>> * Open Standard, Open Source, Multi-Platform including PC Cluster. >>> * Adding extension to Open MPI for "Tofu" interconnect >>> >>> Rayson >>> >>> == >>> Grid Engine / Open Grid Scheduler >>> http://gridscheduler.sourceforge.net/ >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Any info available on the launch environment used, and how long it took to start the 8Pf job? On Jun 25, 2011, at 7:23 PM, Jeff Squyres wrote: > I got more information: > > http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ > > Short version: yes, Open MPI is used on K and was used to power the 8PF runs. > > w00t! > > > > On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: > >> w00t! >> >> OMPI powers 8 petaflops! >> (at least I'm guessing that -- does anyone know if that's true?) >> >> >> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >> >>> Interesting... page 11: >>> >>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf >>> >>> Open MPI based: >>> >>> * Open Standard, Open Source, Multi-Platform including PC Cluster. >>> * Adding extension to Open MPI for "Tofu" interconnect >>> >>> Rayson >>> >>> == >>> Grid Engine / Open Grid Scheduler >>> http://gridscheduler.sourceforge.net/ >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
I got more information: http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ Short version: yes, Open MPI is used on K and was used to power the 8PF runs. w00t! On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: > w00t! > > OMPI powers 8 petaflops! > (at least I'm guessing that -- does anyone know if that's true?) > > > On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: > >> Interesting... page 11: >> >> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf >> >> Open MPI based: >> >> * Open Standard, Open Source, Multi-Platform including PC Cluster. >> * Adding extension to Open MPI for "Tofu" interconnect >> >> Rayson >> >> == >> Grid Engine / Open Grid Scheduler >> http://gridscheduler.sourceforge.net/ >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
w00t! OMPI powers 8 petaflops! (at least I'm guessing that -- does anyone know if that's true?) On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: > Interesting... page 11: > > http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf > > Open MPI based: > > * Open Standard, Open Source, Multi-Platform including PC Cluster. > * Adding extension to Open MPI for "Tofu" interconnect > > Rayson > > == > Grid Engine / Open Grid Scheduler > http://gridscheduler.sourceforge.net/ > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/