Hi Ralph,

Since the whole journal is available online and is reachable via
Google, I don't believe we can get into copyright trouble by linking
to it (but then, I also know that some countries have crazier
web-linking rules!).

http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster
http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html


On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
> I'm unaware of any formal criteria. The papers currently located there are 
> those written by members of the OMPI community, but we can certainly link to 
> something written by someone else, so long as we don't get into copyright 
> issues.
>
> On Sep 19, 2012, at 11:57 PM, Rayson Ho <raysonlo...@gmail.com> wrote:
>
>> I found this paper recently, "MPI Library and Low-Level Communication
>> on the K computer", available at:
>>
>> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf
>>
>> What are the criteria for adding papers to the "Open MPI Publications" page?
>>
>> Rayson
>>
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>>
>> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca <bosi...@eecs.utk.edu> wrote:
>>> Dear Yuki and Takahiro,
>>>
>>> Thanks for the bug report and for the patch. I pushed a [nearly
>>> identical] patch to the trunk in
>>> https://svn.open-mpi.org/trac/ompi/changeset/25488. A special version
>>> for the 1.4 branch has been prepared and attached to ticket #2916
>>> (https://svn.open-mpi.org/trac/ompi/ticket/2916).
>>>
>>>  Thanks,
>>>  george.
>>>
>>>
>>> On Nov 14, 2011, at 02:27, Y.MATSUMOTO wrote:
>>>
>>>> Dear Open MPI community,
>>>>
>>>> I'm a member of the MPI library development team at Fujitsu;
>>>> Takahiro Kawashima, who mailed the list earlier, is my colleague.
>>>> We are starting to feed back our fixes.
>>>>
>>>> First, we fixed a problem with MPI_LB/MPI_UB and data packing.
>>>>
>>>> The program crashes when all of the following conditions are met:
>>>> a: The datatype of the data being sent is a contiguous derived type.
>>>> b: MPI_LB, MPI_UB, or both are used in the datatype.
>>>> c: The size of the data being sent is smaller than its extent (the
>>>>    datatype has a gap).
>>>> d: The send count is greater than 1.
>>>> e: The total data size is larger than the "eager limit".
>>>>
>>>> The attached C program reproduces this problem.
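>>>>
>>>> (The attachment itself is not reproduced here. Below is a minimal
>>>> reproducer sketch meeting conditions a-e; the sizes and names are
>>>> illustrative assumptions, not the original attachment, and it
>>>> assumes the default eager limit is below the 128 KiB sent.)
>>>>
>>>> #include <mpi.h>
>>>> #include <stdlib.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     int rank, nprocs;
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>>     if (nprocs < 2) MPI_Abort(MPI_COMM_WORLD, 1);
>>>>
>>>>     /* Struct type {MPI_LB at 0, 1024 x MPI_INT at 0, MPI_UB at 8192}:
>>>>      * size = 4096 bytes < extent = 8192 bytes (conditions b and c). */
>>>>     int blens[3] = { 1, 1024, 1 };
>>>>     MPI_Aint disps[3] = { 0, 0, 8192 };
>>>>     MPI_Datatype types[3] = { MPI_LB, MPI_INT, MPI_UB };
>>>>     MPI_Datatype gap_type, ctg_type;
>>>>     MPI_Type_struct(3, blens, disps, types, &gap_type);
>>>>
>>>>     /* Contiguous derived type over the gapped type (condition a). */
>>>>     MPI_Type_contiguous(16, gap_type, &ctg_type);
>>>>     MPI_Type_commit(&ctg_type);
>>>>
>>>>     /* Send count 2 > 1 (condition d); 2 * 16 * 4096 = 128 KiB of
>>>>      * payload, above the usual eager limit (condition e). */
>>>>     char *buf = malloc(2 * 16 * 8192);
>>>>     if (rank == 0)
>>>>         MPI_Send(buf, 2, ctg_type, 1, 0, MPI_COMM_WORLD);
>>>>     else if (rank == 1)
>>>>         MPI_Recv(buf, 2, ctg_type, 0, 0, MPI_COMM_WORLD,
>>>>                  MPI_STATUS_IGNORE);
>>>>
>>>>     free(buf);
>>>>     MPI_Type_free(&ctg_type);
>>>>     MPI_Type_free(&gap_type);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }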
>>>>
>>>> An invalid memory access occurs because "done" receives an
>>>> unintended value and "max_allowed" becomes negative at the
>>>> following place in ompi/datatype/datatype_pack.c (in version
>>>> 1.4.3):
>>>>
>>>>
>>>> (ompi/datatype/datatype_pack.c)
>>>> packed_buffer = (unsigned char *) iov[iov_count].iov_base;
>>>> done = pConv->bConverted - i * pData->size;  /* partial data from last pack */
>>>> if( done != 0 ) {  /* still some data to copy from the last time */
>>>>     done = pData->size - done;
>>>>     OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, pConv->pBaseBuf,
>>>>                                 pData, pConv->count );
>>>>     MEMCPY_CSUM( packed_buffer, user_memory, done, pConv );
>>>>     packed_buffer += done;
>>>>     max_allowed -= done;
>>>>     total_bytes_converted += done;
>>>>     user_memory += (extent - pData->size + done);
>>>> }
>>>>
>>>> This code assumes "done" is the size of the partial data left over
>>>> from the last pack. However, when the program crashes, "done" equals
>>>> the total size of all data transmitted so far, which makes
>>>> "max_allowed" negative.
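>>>>
>>>> (A walk-through with assumed numbers, for illustration only; the
>>>> concrete values and the signedness of "done" below are assumptions,
>>>> not taken from the report:)
>>>>
>>>> /* Assume:
>>>>  *   pData->size       = 4096   bytes per element (gap excluded)
>>>>  *   pConv->bConverted = 12288  total bytes packed so far
>>>>  *   i                 = 0      (miscomputed element count)
>>>>  *
>>>>  * done = pConv->bConverted - i * pData->size;  => 12288, the whole
>>>>  *     converted total rather than a partial remainder (< 4096);
>>>>  * done = pData->size - done;                   => 4096 - 12288,
>>>>  *     which wraps to a huge positive value if "done" is unsigned;
>>>>  * max_allowed -= done;                         => max_allowed goes
>>>>  *     negative, and MEMCPY_CSUM copies past the user buffer.
>>>>  */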
>>>>
>>>> We modified the code as follows, and it passes our test suite.
>>>> However, we are not sure this fix is correct. Can anyone review it?
>>>> A patch (against the Open MPI 1.4 branch) is attached to this mail.
>>>>
>>>> -            if( done != 0 ) {  /* still some data to copy from the last time */
>>>> +            if( (done + max_allowed) >= pData->size ) {  /* still some data to copy from the last time */
>>>>
>>>> Best regards,
>>>>
>>>> Yuki MATSUMOTO
>>>> MPI development team,
>>>> Fujitsu
>>>>
>>>> (2011/06/28 10:58), Takahiro Kawashima wrote:
>>>>> Dear Open MPI community,
>>>>>
>>>>> I'm a member of the MPI library development team at Fujitsu. Shinji
>>>>> Sumimoto, whose name appears in Jeff's blog post, is one of our
>>>>> bosses.
>>>>>
>>>>> As Rayson and Jeff noted, the K computer, the world's most powerful
>>>>> HPC system, developed by RIKEN and Fujitsu, uses Open MPI as the
>>>>> base of its MPI library. We at Fujitsu are pleased to announce this,
>>>>> and we offer our special thanks to the Open MPI community.
>>>>> We are sorry for the late announcement!
>>>>>
>>>>> Our MPI library is based on the Open MPI 1.4 series, and adds a new
>>>>> point-to-point component (BTL) and new topology-aware collective
>>>>> communication algorithms (COLL). It is also adapted to our runtime
>>>>> environment (ESS, PLM, GRPCOMM, etc.).
>>>>>
>>>>> The K computer connects 68,544 nodes with our custom interconnect.
>>>>> Its runtime environment is proprietary, so we don't use orted.
>>>>> We cannot disclose start-up times yet because of disclosure
>>>>> restrictions, sorry.
>>>>>
>>>>> We are impressed by the extensibility of Open MPI, and we have
>>>>> proved that Open MPI is scalable to the 68,000-process level! It is
>>>>> a pleasure to use such great open-source software.
>>>>>
>>>>> We cannot disclose the details of our technology yet because of our
>>>>> contract with RIKEN AICS; however, we plan to feed back our
>>>>> improvements and bug fixes. We can contribute some bug fixes soon,
>>>>> but contributing our improvements will have to wait until next
>>>>> year, with the Open MPI community's agreement.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> MPI development team,
>>>>> Fujitsu
>>>>>
>>>>>
>>>>>> I got more information:
>>>>>>
>>>>>>   http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/
>>>>>>
>>>>>> Short version: yes, Open MPI is used on K and was used to power the 8PF 
>>>>>> runs.
>>>>>>
>>>>>> w00t!
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote:
>>>>>>
>>>>>>> w00t!
>>>>>>>
>>>>>>> OMPI powers 8 petaflops!
>>>>>>> (at least I'm guessing that -- does anyone know if that's true?)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote:
>>>>>>>
>>>>>>>> Interesting... page 11:
>>>>>>>>
>>>>>>>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf
>>>>>>>>
>>>>>>>> Open MPI based:
>>>>>>>> * Open Standard, Open Source, Multi-Platform including PC Cluster.
>>>>>>>> * Adding extension to Open MPI for "Tofu" interconnect
>>>>>>>>
>>>>>>>> Rayson
>>>>>>>> http://blogs.scalablelogic.com/
>>
