Dear Yuki and Takahiro,

Thanks for the bug report and for the patch. I pushed a [nearly identical] patch to the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A special version for the 1.4 branch has been prepared and attached to ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916).

Thanks,
george.

On Nov 14, 2011, at 02:27, Y.MATSUMOTO wrote:

> Dear Open MPI community,
>
> I'm a member of the MPI library development team at Fujitsu.
> Takahiro Kawashima, who sent mail before, is my colleague.
> We are starting to feed back our fixes.
>
> First, we fixed a problem with MPI_LB/MPI_UB and data packing.
>
> A program crashes when it meets all of the following conditions:
> a: The type of the data being sent is a contiguous, derived type.
> b: Either or both of MPI_LB and MPI_UB are used in the datatype.
> c: The size of the data being sent is smaller than the extent (the datatype has a gap).
> d: The send count is greater than 1.
> e: The total size of the data is larger than the "eager limit".
>
> The problem occurs with the attached C program.
>
> An invalid address is accessed because "done" receives an unintended
> value and "max_allowed" becomes negative at the following place in
> "ompi/datatype/datatype_pack.c" (in version 1.4.3).
>
> (ompi/datatype/datatype_pack.c, lines 188-198)
>     packed_buffer = (unsigned char *) iov[iov_count].iov_base;
>     done = pConv->bConverted - i * pData->size;  /* partial data from last pack */
>     if( done != 0 ) {  /* still some data to copy from the last time */
>         done = pData->size - done;
>         OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf, pData, pConv->count );
>         MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
>         packed_buffer += done;
>         max_allowed -= done;
>         total_bytes_converted += done;
>         user_memory += (extent - pData->size + done);
>     }
>
> This code assumes that "done" is the size of the partial data left from the last pack.
> However, when the program crashes, "done" equals the total size of all
> data transmitted so far, which makes "max_allowed" negative.
>
> We modified the code as follows, and it passed our test suite.
> But we are not sure this fix is correct. Can anyone review it?
> A patch (against the Open MPI 1.4 branch) is attached to this mail.
>
> -    if( done != 0 ) {  /* still some data to copy from the last time */
> +    if( (done + max_allowed) >= pData->size ) {  /* still some data to copy from the last time */
>
> Best regards,
>
> Yuki MATSUMOTO
> MPI development team,
> Fujitsu
>
> (2011/06/28 10:58), Takahiro Kawashima wrote:
>> Dear Open MPI community,
>>
>> I'm a member of the MPI library development team at Fujitsu. Shinji
>> Sumimoto, whose name appears in Jeff's blog, is one of our bosses.
>>
>> As Rayson and Jeff noted, the K computer, the world's most powerful HPC
>> system, developed by RIKEN and Fujitsu, uses Open MPI as the base of its
>> MPI library. We at Fujitsu are pleased to announce that, and we also give
>> special thanks to the Open MPI community.
>> We are sorry for the late announcement!
>>
>> Our MPI library is based on the Open MPI 1.4 series, and has a new
>> point-to-point component (BTL) and new topology-aware collective
>> communication algorithms (COLL). Also, it is adapted to our runtime
>> environment (ESS, PLM, GRPCOMM, etc.).
>>
>> The K computer connects 68,544 nodes with our custom interconnect.
>> Its runtime environment is our proprietary one, so we don't use orted.
>> We cannot give the start-up time yet because of disclosure restrictions, sorry.
>>
>> We are surprised by the extensibility of Open MPI, and have proved that
>> Open MPI is scalable to the 68,000-process level! We are delighted to
>> use such great open-source software.
>>
>> We cannot give details of our technology yet because of our contract
>> with RIKEN AICS; however, we plan to feed back our improvements
>> and bug fixes. We can contribute some bug fixes soon, but the
>> contribution of our improvements will come next year, with the Open MPI
>> agreement.
>>
>> Best regards,
>>
>> MPI development team,
>> Fujitsu
>>
>>
>>> I got more information:
>>>
>>> http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/
>>>
>>> Short version: yes, Open MPI is used on K and was used to power the 8PF
>>> runs.
>>>
>>> w00t!
>>>
>>> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:
>>>
>>>> w00t!
>>>>
>>>> OMPI powers 8 petaflops!
>>>> (at least I'm guessing that -- does anyone know if that's true?)
>>>>
>>>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:
>>>>
>>>>> Interesting... page 11:
>>>>>
>>>>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf
>>>>>
>>>>> Open MPI based:
>>>>>
>>>>> * Open Standard, Open Source, Multi-Platform including PC Cluster.
>>>>> * Adding extension to Open MPI for "Tofu" interconnect
>>>>>
>>>>> Rayson
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> <ub_lb.patch><tp_lb_ub_ng.c>
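
A note on the packing logic discussed in the report: the quoted snippet relies on "done" being the number of bytes of the current element that an earlier pack call already copied, so it must always be smaller than the datatype size. The following is a minimal sketch in plain C of a resumable pack for a datatype whose size is smaller than its extent. It is not the Open MPI code and not the fix from changeset 25488; the names pack_state and pack_some are hypothetical, and the lower bound is assumed to be 0 with the data at the start of each extent. Here the partial amount is derived with a modulo, so it cannot grow to the total number of bytes transmitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration only, not Open MPI's convertor.
 * Each element occupies `extent` bytes in user memory; only the first
 * `size` bytes are data, the rest is a gap (lower bound assumed 0). */
typedef struct {
    const unsigned char *base; /* start of the user buffer */
    size_t size;               /* bytes of real data per element */
    size_t extent;             /* stride between consecutive elements */
    size_t count;              /* number of elements to send */
    size_t converted;          /* total bytes packed so far, across calls */
} pack_state;

/* Pack at most max_allowed bytes into out and return the bytes written.
 * Can be called repeatedly; it resumes in the middle of an element. */
static size_t pack_some(pack_state *st, unsigned char *out, size_t max_allowed)
{
    assert(st->size > 0 && st->size <= st->extent);

    size_t written = 0;
    const size_t total = st->size * st->count;

    while (written < max_allowed && st->converted < total) {
        size_t elem  = st->converted / st->size; /* element being packed */
        size_t done  = st->converted % st->size; /* already packed, always < size */
        size_t chunk = st->size - done;          /* remainder of this element */

        if (chunk > max_allowed - written)       /* never exceed the budget */
            chunk = max_allowed - written;

        memcpy(out + written, st->base + elem * st->extent + done, chunk);
        written       += chunk;
        st->converted += chunk;
    }
    return written;
}
```

With size 2, extent 4 and count 3 over the buffer "ab..cd..ef..", two calls with a budget of 3 bytes each produce "abc" and then "def"; the second call starts in the middle of the second element, which is the partial-data case the quoted comment describes.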