[OMPI devel] "Open MPI"-based MPI library used by K computer
Interesting... page 11: http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf Open MPI based: * Open Standard, Open Source, Multi-Platform including PC Cluster. * Adding extension to Open MPI for "Tofu" interconnect Rayson == Grid Engine / Open Grid Scheduler http://gridscheduler.sourceforge.net/
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
On Sat, Jun 25, 2011 at 9:23 PM, Jeff Squyres wrote: > I got more information: > > http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ That's really awesome!! SC08: "Open MPI: 10^15 Flops Can't Be Wrong" 2011: "Open MPI: 8 * 10^15 Flops Can't Be Wrong" And equally awesome is that Fujitsu is going to contribute its changes back to Open MPI!! Can't wait to see presentations like: "Open MPI: 10^17 Flops Can't Be Wrong", or even "Open MPI: 10^18 Flops Can't Be Wrong" :-) Rayson > > Short version: yes, Open MPI is used on K and was used to power the 8PF runs. > > w00t! > > > > On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: > >> w00t! >> >> OMPI powers 8 petaflops! >> (at least I'm guessing that -- does anyone know if that's true?) >> >> >> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >> >>> Interesting... page 11: >>> >>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf >>> >>> Open MPI based: >>> >>> * Open Standard, Open Source, Multi-Platform including PC Cluster. >>> * Adding extension to Open MPI for "Tofu" interconnect >>> >>> Rayson >>> >>> == >>> Grid Engine / Open Grid Scheduler >>> http://gridscheduler.sourceforge.net/ >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
[OMPI devel] IBM to acquire Platform Computing!
I guess Platform MPI (which is a "merge" of Scali MPI & HP-MPI) will get technologies from IBM MPI as well... or IBM's MPI will be "merged" into Platform MPI ("merge" is in quotes because it is, in general, hard to merge technologies - as I told a co-worker 10 years ago, one can't just merge SGE with PBS and get a better batch system, and one can't just merge Alpha with PA-RISC and get a super-fast microprocessor). So I guess there will be some engineering to do at IBM/Platform. It is nevertheless exciting news!! http://www.platform.com/press-releases/2011/IBMtoAcquireSystemSoftwareCompanyPlatformComputingtoExtendReachofTechnicalComputing Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
Re: [OMPI devel] openmpi-1.5.5rc1: 2nd gmake dependence (mostly VT)
On Tue, Dec 20, 2011 at 8:28 PM, Larry Baker wrote: >> I am pretty sure a literal "rm -rf" should be fine. > > Not necessarily. I'm not at work. But I think either -f or -r might not be > legal on all Unix's (Tru64 Unix? AIX?). I used to code on AIX daily, and I am pretty sure that "rm -rf" works on AIX: http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.cmds%2Fdoc%2Faixcmds4%2Frm.htm I have not worked on Tru64 for many years, but according to the manpage, -r & -f are supported: http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51_HTML/MAN/MAN1/0320.HTM Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ > > Larry Baker > US Geological Survey > 650-329-5608 > ba...@usgs.gov > > > > > On Dec 20, 2011, at 5:19 PM, Paul H. Hargrove wrote: > >> For the first time I tried "make clean" on FreeBSD and found /another/ >> GNU-vs-Berkeley Make problem. >> >> The problem is use of $(RM) in several Makefile.am's (see below for list). >> The onlt non-VT instance (ompi_info/Makefile.am) occurs in openmpi-1.4.5rc1 >> as well. >> >> $(RM) is a predefined variable in GNU Make, not provided by Berkeley Make >> (or by Automake for that matter). >> I am pretty sure a literal "rm -rf" should be fine. >> >> -Paul >> >>> $ find openmpi-1.5.5rc1 -name Makefile.am | xargs grep -w RM >>> openmpi-1.5.5rc1/ompi/tools/ompi_info/Makefile.am: test -z >>> "$(OMPI_CXX_TEMPLATE_REPOSITORY)" || $(RM) -rf >>> $(OMPI_CXX_TEMPLATE_REPOSITORY) >>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/hello/Makefile.am: >>> $(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z *.marker.z >>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/generic_streams-mpi/Makefile.am: >>> $(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z *.marker.z >>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/generic_streams/Makefile.am: >>> $(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z >>> *.marker.z >>> openmpi-1.5.5rc1/ompi/contrib/vt/vt/extlib/otf/tests/progress/Makefile.am: >>> $(RM) *.otf *.def *.events *.marker *.otf.z *.def.z *.events.z >>> *.marker.z >> >> -- >> Paul H. Hargrove phhargr...@lbl.gov >> Future Technologies Group >> HPC Research Department Tel: +1-510-495-2352 >> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
Re: [OMPI devel] RFC: Java MPI bindings
Currently, Hadoop tasks (in a job) are independent of each other. If Hadoop is going to use MPI for inter-task communication, then make sure they understand that the MPI standard currently does not address fault tolerance. Note that it is not uncommon to run MapReduce jobs on Amazon EC2's spot instances, which can be taken back by Amazon at any time if the spot price rises above the bid price of the user. If Hadoop is going to use MPI without a fault-tolerant MPI implementation, then the whole job needs to be rerun whenever an instance is reclaimed. http://www.youtube.com/watch?v=66rfnFA0jpM Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ On Wed, Feb 1, 2012 at 3:20 PM, Ralph Castain wrote: > FROM: LANL, HLRS, Cisco, Oracle, and IBM > > WHAT: Adds Java bindings > > WHY: The Hadoop community would like to use MPI in their efforts, and most of > their code is in Java > > WHERE: ompi/mpi/java plus one new config file in ompi/config > > TIMEOUT: Feb 10, 2012 > > > Hadoop is a Java-based environment for processing extremely large data sets. > Modeled on the Google enterprise system, it has evolved into its own > open-source community. Currently, they use their own IPC for messaging, but > acknowledge that it is nowhere near as efficient or well-developed as found > in MPI. > > While 3rd party Java bindings are available, the Hadoop business world is > leery of depending on something that "bolts on" - they would be more willing > to adopt the technology if it were included in a "standard" distribution. > Hence, they have requested that Open MPI provide that capability, and in > exchange will help champion broader adoption of Java support within the MPI > community. > > We have based the OMPI bindings on the mpiJava code originally developed at > IU, and currently maintained by HLRS. Adding the bindings to OMPI is > completely transparent to all other OMPI users and has zero performance > impact on the rest of the code/bindings. We have setup the configure so that > the Java bindings will build if/when they can or are explicitly requested, > just as with other language support. > > As the Hadoop community represents a rapidly-growing new set of customers and > needs, we feel that adding these bindings is appropriate. The bindings will > be maintained by those organizations that have an interest in this use-case. > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
Re: [OMPI devel] RFC: Java MPI bindings
Ralph, I am not totally against the idea. As long as Hadoop is not taking away the current task communication mechanism until MPI finally (there are just too many papers on FT MPI, I remember reading checkpointing MPI jobs more than 10 years ago!) has a standard way to handle node failure, then I am not concerned at all! Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ On Tue, Feb 7, 2012 at 3:14 PM, Ralph Castain wrote: > The community is aware of the issue. However, the corporations > interested/involved in this area are not running on EC2 nor concerned about > having allocations taken away. The question of failed nodes is something we > plan to address over time, but is not considered an immediate show-stopper. > > On Feb 7, 2012, at 1:05 PM, Rayson Ho wrote: > >> Currently, Hadoop tasks (in a job) are independent of each. If Hadoop >> is going to use MPI for inter-task communication, then make sure they >> understand that the MPI standard currently does not address fault >> folerant. >> >> Note that it is not uncommon to run map reduce jobs on Amazon EC2's >> spot instances, which can be taken back by Amazon at any time if the >> spot price rises above the bid price of the user. If Hadoop is going >> to use MPI, and without a fault folerant MPI implementation, then the >> whole job needs to be rerun. >> >> http://www.youtube.com/watch?v=66rfnFA0jpM >> >> Rayson >> >> = >> Open Grid Scheduler / Grid Engine >> http://gridscheduler.sourceforge.net/ >> >> Scalable Grid Engine Support Program >> http://www.scalablelogic.com/ >> >> >> On Wed, Feb 1, 2012 at 3:20 PM, Ralph Castain wrote: >>> FROM: LANL, HLRS, Cisco, Oracle, and IBM >>> >>> WHAT: Adds Java bindings >>> >>> WHY: The Hadoop community would like to use MPI in their efforts, and most >>> of their code is in Java >>> >>> WHERE: ompi/mpi/java plus one new config file in ompi/config >>> >>> TIMEOUT: Feb 10, 2012 >>> >>> >>> Hadoop is a Java-based environment for processing extremely large data >>> sets. Modeled on the Google enterprise system, it has evolved into its own >>> open-source community. Currently, they use their own IPC for messaging, but >>> acknowledge that it is nowhere near as efficient or well-developed as found >>> in MPI. >>> >>> While 3rd party Java bindings are available, the Hadoop business world is >>> leery of depending on something that "bolts on" - they would be more >>> willing to adopt the technology if it were included in a "standard" >>> distribution. Hence, they have requested that Open MPI provide that >>> capability, and in exchange will help champion broader adoption of Java >>> support within the MPI community. >>> >>> We have based the OMPI bindings on the mpiJava code originally developed at >>> IU, and currently maintained by HLRS. Adding the bindings to OMPI is >>> completely transparent to all other OMPI users and has zero performance >>> impact on the rest of the code/bindings. We have setup the configure so >>> that the Java bindings will build if/when they can or are explicitly >>> requested, just as with other language support. >>> >>> As the Hadoop community represents a rapidly-growing new set of customers >>> and needs, we feel that adding these bindings is appropriate. The bindings >>> will be maintained by those organizations that have an interest in this >>> use-case. 
>>> >>> >>> ___ >>> devel mailing list >>> de...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> >> >> >> -- >> Rayson >> >> == >> Open Grid Scheduler - The Official Open Source Grid Engine >> http://gridscheduler.sourceforge.net/ >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] MVAPICH2 vs Open-MPI
See pp. 38-40: MVAPICH2 outperforms Open MPI in every test. Is it something that they are doing to optimize for CUDA & GPUs that is not in OMPI, or did they specifically tune MVAPICH2 to make it shine?? http://hpcadvisorycouncil.com/events/2012/Israel-Workshop/Presentations/7_OSU.pdf The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/ Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/
Re: [OMPI devel] algorithm selection in open mpi
Performance depends on the network topology & node hardware, and the benchmark - so we don't have enough information to determine the root of the issue... However, you can do some debugging on your end (once you master the techniques, you will be able to debug all sorts of performance problems - not just those in Open MPI): Compile coll_tuned_decision_fixed.c with debugging enabled (-g), remotely log onto a node where one of the tasks runs, and attach a debugger to see if the execution path changes. If it does, then your next step will be to determine whether the bottleneck of the benchmark really is affected by the decisions made in coll_tuned_decision_fixed.c. Note that to attach a debugger (just gdb will do for this case), you will need to put a sleep at the start of main (ideally before MPI_Init()) so that you can attach the debugger. You can even use all sorts of hacks, like shell script wrappers that echo the PID of the task and then sleep before starting the real MPI task, so that you don't need to recompile the benchmark... Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ On Mon, Apr 2, 2012 at 11:10 PM, roswan ismail wrote: > Hi all.. > > I am Roswan Ismail from Malaysia. I am focusing on MPI communication > performance on quad-core cluster at my university. I used Open MPI-1.4.3 and > measurements were done using scampi benchmark. > > As I know, open MPI used multiple algorithms to broadcast data (MPI_BCAST) > such as binomial, pipeline, binary tree, basic linear and split binary tree. > All these algorithms will be used based on message size and communicator > size. For example, binomial is used when message size to be broadcasted is > small while pipeline used for broadcasting a large message. > > What I want to do now is, to use fixed algorithm i.e binomial for all > message size. I want to see and compare the results with the default > results. So, I was modified coll_tuned_decision_fixed.c which is located in > open mpi-1.4.3/ompi/mca/coll/tuned by returning binomial algorithm for all > condition. Then I recompile the files but the problem is, the results > obtained is same as default. It seems I do not do any changes to the codes. > > So could you guys tell me the right way to do that. > > Many thanks > > Roswan Binti Ismail, > FTMK, > Univ. Pend. Sultan Idris, > Tg Malim, Perak. > Pej: 05-4505173 > H/P: 0123588047 > iewa...@gmail.com > ros...@ftmk.upsi.edu.my > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
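A minimal sketch of the sleep-before-MPI_Init() trick described above; the DEBUG_WAIT variable name and the 60-second window are illustrative assumptions, not part of Open MPI or the benchmark:

    /* Pause before MPI_Init() so that gdb can be attached to a running rank.
       DEBUG_WAIT is a made-up switch; any name or mechanism will do. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        if (getenv("DEBUG_WAIT") != NULL) {
            printf("PID %d waiting for debugger attach\n", (int)getpid());
            fflush(stdout);
            sleep(60);                /* run "gdb -p <pid>" within this window */
        }
        MPI_Init(&argc, &argv);
        /* ... rest of the benchmark ... */
        MPI_Finalize();
        return 0;
    }

Once attached, setting a breakpoint in coll_tuned_decision_fixed.c (built with -g) shows which broadcast algorithm is actually being selected.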
Re: [OMPI devel] RFC: opal_cache_line_size
On Mon, Apr 23, 2012 at 4:21 PM, Jeffrey Squyres wrote: > No one replied to this RFC. Does anyone have an opinion about it? > > I have attached a patch (including some debugging output) showing my initial > implementation. If no one objects by the end of this week, I'll commit to > the trunk. I have a naive question - why do we need to find the cacheline size of the L1? If it is to avoid cacheline ping-pong, shouldn't we set "opal_cache_line_size" to at least the line size of the L2? I am not a cache coherency expert (so correct me if I am wrong) - I think most modern processors keep track of memory ownership (in the MOSI or MOESI protocols) by the L2 line size. So if the L1 line size is smaller than the L2 line size, we will still get the cache-line ping-pong effect on those processors. I quickly googled and found that in modern AMD & Intel processors, the L1 line size is the same as the L2 line size, and the same is true for K computer's SPARC64 VIIIfx. However, Itanium has L1 line size = 32 bytes, L2 line size = 64 bytes. And it's the L2 that interfaces with the bus logic: http://www.owchallie.com/systems/cache-itanium.php So if we dirty an L1 cache line, the cache coherency logic would mark the whole 64-byte L2 line as dirty (Modified). Thus if another thread/processor owns a separate L1 line that is adjacent to the first line and thus shares the L2 line, we would still get false sharing... Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ > > Terry: please add this to the agenda tomorrow. > > > On Mar 30, 2012, at 1:09 PM, Jeffrey Squyres wrote: > >> I was just recently reminded of a comment that is near the top of >> opal_init_util(): >> >> /* JMS See note in runtime/opal.h -- this is temporary; to be >> replaced with real hwloc information soon (in trunk/v1.5 and >> beyond, only). This *used* to be a #define, so it's important >> to define it very early. */ >> opal_cache_line_size = 128; >> >> A few points: >> >> 1. On my platforms, hwloc tells me that my cache line size is 64, not 128. >> Probably not a tragedy, but... >> >> 2. I see opal_cache_line_size being used in a lot of BTL and PML >> initialization locations. I see it being used in opal/class/free_list.*, >> too. >> >> 3. I poked around with this yesterday to see if we could have hwloc >> initialize the opal_cache_line_size value. Points to remember: >> >> - we initialize the opal hwloc framework in opal_init(), but we do not load >> the local machine's architecture then (because it can be expensive, >> particularly if lots of MPI processes are all doing it simultaneously) >> - instead, the local machine topology is discovered once by each orted >> (using hwloc) and then RML sent to each local MPI process, where it is >> locally loaded into each MPI proc's hwloc tree >> - this happens during the orte_init() in ompi_mpi_init() >> >> Meaning: we can initialize the opal_cache_line_size in MPI processes during >> orte_init(). >> >> Is this acceptable to everyone? >> >> If so, I can go ahead and code this up. I would probably leave the initial >> value hard-coded to 128 (just in case something uses it before orte_init()), >> and then later during orte_init(), reset it to the smallest L1 cache size >> that hwloc finds on the machine. 
>> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
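A small illustration of the false-sharing concern discussed in the message above; the 64-byte line size and the struct are assumptions made for the example, not OMPI code (OMPI keeps the real value in opal_cache_line_size):

    /* Pad per-thread data to a full cache line so that two cores never
       write the same line.  64 bytes is assumed here; padding only to the
       L1 line size on a CPU whose L2 line is larger would defeat the point. */
    #define LINE_SIZE 64

    struct padded_counter {
        volatile long value;
        char pad[LINE_SIZE - sizeof(long)];  /* fill out the rest of the line */
    };

    struct padded_counter counters[16];      /* one slot per thread/core */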
Re: [OMPI devel] RFC: opal_cache_line_size
On Mon, Apr 23, 2012 at 5:56 PM, Jeffrey Squyres wrote: > On Apr 23, 2012, at 5:53 PM, George Bosilca wrote: > >> However, I did a quick grep and most of our headers are larger than a single >> line of cache (even Itanium L2) so I suppose that making >> opal_cache_line_size equal to the L2 cache line size will not be a too big >> waste of memory overall. > > Easy to update the patch; done. Thanks Jeff! Rayson = Open Grid Scheduler / Grid Engine http://gridscheduler.sourceforge.net/ Scalable Grid Engine Support Program http://www.scalablelogic.com/ > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/
Re: [OMPI devel] OpenMPI and SGE integration made more stable
On Fri, Jul 27, 2012 at 8:53 AM, Daniel Gruber wrote: > A while after u5 the open source repository was closed and most of the > German engineers from Sun/Oracle moved to Univa, working on Univa > Grid Engine. Currently you have the choice between Univa Grid Engine, > Son of Grid Engine (free academic project), and OGS. Oracle Grid Engine is still alive, and in fact updates are still released by Oracle from time to time. (But of course it is not free, and since most people are looking for a free download, it is usually not mentioned in the mailing list discussions...) Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ > Daniel > > >> >> +-+--+ >> | Prof. Christoph van Wüllen | Tele-Phone (+49) (0)631 205 2749 | >> | TU Kaiserslautern, FB Chemie| Tele-Fax (+49) (0)631 205 2750 | >> | Erwin-Schrödinger-Str. | | >> | D-67663 Kaiserslautern, Germany | vanwul...@chemie.uni-kl.de | >> || >> | HomePage: http://www.chemie.uni-kl.de/vanwullen | >> +-+--+ >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
ised by the extensibility of Open MPI, and have proved that >>> Open MPI is scalable to 68,000 processes level! We feel pleasure to >>> utilize such a great open-source software. >>> >>> We cannot tell detail of our technology yet because of our contract >>> with RIKEN AICS, however, we will plan to feedback of our improvements >>> and bug fixes. We can contribute some bug fixes soon, however, for >>> contribution of our improvements will be next year with Open MPI >>> agreement. >>> >>> Best regards, >>> >>> MPI development team, >>> Fujitsu >>> >>> >>>> I got more information: >>>> >>>>http://blogs.cisco.com/performance/open-mpi-powers-8-petaflops/ >>>> >>>> Short version: yes, Open MPI is used on K and was used to power the 8PF >>>> runs. >>>> >>>> w00t! >>>> >>>> >>>> >>>> On Jun 24, 2011, at 7:16 PM, Jeff Squyres wrote: >>>> >>>>> w00t! >>>>> >>>>> OMPI powers 8 petaflops! >>>>> (at least I'm guessing that -- does anyone know if that's true?) >>>>> >>>>> On Jun 24, 2011, at 7:03 PM, Rayson Ho wrote: >>>>> >>>>>> Interesting... page 11: >>>>>> >>>>>> http://www.fujitsu.com/downloads/TC/sc10/programming-on-k-computer.pdf >>>>>> >>>>>> Open MPI based: >>>>>> >>>>>> * Open Standard, Open Source, Multi-Platform including PC Cluster. >>>>>> * Adding extension to Open MPI for "Tofu" interconnect >>>>>> >>>>>> Rayson >>>>>> http://blogs.scalablelogic.com/
[OMPI devel] ROMIO code in OMPI
How is the ROMIO code in Open MPI developed & maintained? Do Open MPI releases take snapshots of the ROMIO code from the ROMIO project from time to time, or was the ROMIO code forked a while ago and maintained separately in Open MPI?? I would like to fix the 2GB limit in the ROMIO code... and that's why I am asking! :-D Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ On Thu, Nov 1, 2012 at 6:21 PM, Richard Shaw wrote: > Hi Rayson, > > Just seen this. > > In the end we've worked around it, by creating successive views of the file > that are all less than 2GB and then offsetting them to eventually read in > everything. It's a bit of a pain to keep track of, but it works at the > moment. > > I was intending on following your hints and trying to fix the bug myself, > but I've been short on time so haven't gotten around to it yet. > > Richard > > On Saturday, 20 October, 2012 at 10:12 AM, Rayson Ho wrote: > > Hi Eric, > > Sounds like it's also related to this problem reported by Scinet back in > July: > > http://www.open-mpi.org/community/lists/users/2012/07/19762.php > > And I think I found the issue, but I still have not followed up with > the ROMIO guys yet. And I was not sure if Scinet was waiting for the > fix or not - next time I visit U of Toronto, I will see if I can visit > the Scinet office and meet with the Scinet guys! > > > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users
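A rough sketch of the workaround Richard describes (successive file views, each read staying below 2 GB); the 1 GB chunk size and the helper routine are illustrative assumptions, not the actual Open MPI or ROMIO fix:

    #include <mpi.h>

    /* Read `total` bytes from `path` into `buf` in chunks below the 32-bit
       limit by moving the file-view displacement on each pass. */
    static void read_big_file(const char *path, char *buf, MPI_Offset total)
    {
        MPI_File fh;
        const MPI_Offset chunk = 1024 * 1024 * 1024;   /* 1 GB per pass */
        MPI_Offset done = 0;

        MPI_File_open(MPI_COMM_WORLD, (char *)path, MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        while (done < total) {
            int count = (int)(total - done < chunk ? total - done : chunk);
            /* shift the view so each read starts at offset 0 of the view */
            MPI_File_set_view(fh, done, MPI_BYTE, MPI_BYTE, "native",
                              MPI_INFO_NULL);
            MPI_File_read_all(fh, buf + done, count, MPI_BYTE,
                              MPI_STATUS_IGNORE);
            done += count;
        }
        MPI_File_close(&fh);
    }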
Re: [OMPI devel] ROMIO code in OMPI
Vishwanath, Can you point me to the two_phase module code? (I just wanted to make sure that we are looking at the same problem.) Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ On Thu, Nov 8, 2012 at 1:58 PM, Vishwanath Venkatesan wrote: > I just checked the code for testing 2GB limitation in OMPIO. The code works > with OMPIO's " fcoll dynamic" module. Although it does have the same 2GB > limitation with the two_phase module which is based on ROMIO's > implementation and the static module. I have a fix for both these modules I > will commit them to trunk shortly. > > > Thanks > Vish > > Vishwanath Venkatesan > Graduate Research Assistant > Parallel Software Technologies Lab > Department of Computer Science > University of Houston > TX, USA > www.cs.uh.edu/~venkates > > > On Nov 7, 2012, at 3:47 AM, Ralph Castain wrote: > > Hi Rayson > > We take snapshots from time to time. We debated whether or not to update > again for the 1.7 release, but ultimately decided not to do so - IIRC, none > of our developers had the time. > > If you are interested and willing to do the update, and perhaps look at > removing the limit, that is fine with me! You might check to see if the > latest ROMIO can go past 2GB - could be that an update is all that is > required. > > Alternatively, you might check with Edgar Gabriel about the ompio component > and see if it either supports > 2GB sizes or can also be extended to do so. > Might be that a simple change to select that module instead of ROMIO would > meet the need. > > Appreciate your interest in contributing! > Ralph > > > On Tue, Nov 6, 2012 at 11:55 AM, Rayson Ho wrote: >> >> How is the ROMIO code in Open MPI developed & maintained? Do Open MPI >> releases take snapshots of the ROMIO code from time to time from the >> ROMIO project, or was the ROMIO code forked a while ago and maintained >> separately in Open MPI?? >> >> I would like to fix the 2GB limit in the ROMIO code... and that's why >> I am asking! :-D >> >> Rayson >> >> == >> Open Grid Scheduler - The Official Open Source Grid Engine >> http://gridscheduler.sourceforge.net/ >> >> >> On Thu, Nov 1, 2012 at 6:21 PM, Richard Shaw >> wrote: >> > Hi Rayson, >> > >> > Just seen this. >> > >> > In the end we've worked around it, by creating successive views of the >> > file >> > that are all else than 2GB and then offsetting them to eventually read >> > in >> > everything. It's a bit of a pain to keep track of, but it works at the >> > moment. >> > >> > I was intending on following your hints and trying to fix the bug >> > myself, >> > but I've been short on time so haven't gotten around to it yet. >> > >> > Richard >> > >> > On Saturday, 20 October, 2012 at 10:12 AM, Rayson Ho wrote: >> > >> > Hi Eric, >> > >> > Sounds like it's also related to this problem reported by Scinet back in >> > July: >> > >> > http://www.open-mpi.org/community/lists/users/2012/07/19762.php >> > >> > And I think I found the issue, but I still have not followed up with >> > the ROMIO guys yet. And I was not sure if Scinet was waiting for the >> > fix or not - next time I visit U of Toronto, I will see if I can visit >> > the Scinet office and meet with the Scinet guys! 
>> > >> > >> > >> > >> > ___ >> > users mailing list >> > us...@open-mpi.org >> > http://www.open-mpi.org/mailman/listinfo.cgi/users >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] "Open MPI"-based MPI library used by K computer
Hi Ralph, Since the whole journal is available online, and is reachable by Google, I don't believe we can get into copyright issues by providing a link to it (but then, I also know that there are countries that have more crazy web page linking rules!). http://www.fujitsu.com/global/news/publications/periodicals/fstj/archives/vol48-3.html Rayson == Open Grid Scheduler - The Official Open Source Grid Engine http://gridscheduler.sourceforge.net/ Scalable Cloud HPC: 10,000-node OGS/GE Amazon EC2 cluster http://blogs.scalablelogic.com/2012/11/running-1-node-grid-engine-cluster.html On Thu, Sep 20, 2012 at 6:46 AM, Ralph Castain wrote: > I'm unaware of any formal criteria. The papers currently located there are > those written by members of the OMPI community, but we can certainly link to > something written by someone else, so long as we don't get into copyright > issues. > > On Sep 19, 2012, at 11:57 PM, Rayson Ho wrote: > >> I found this paper recently, "MPI Library and Low-Level Communication >> on the K computer", available at: >> >> http://www.fujitsu.com/downloads/MAG/vol48-3/paper11.pdf >> >> What are the criteria for adding papers to the "Open MPI Publications" page? >> >> Rayson >> >> == >> Open Grid Scheduler - The Official Open Source Grid Engine >> http://gridscheduler.sourceforge.net/ >> >> >> On Fri, Nov 18, 2011 at 5:32 AM, George Bosilca wrote: >>> Dear Yuki and Takahiro, >>> >>> Thanks for the bug report and for the patch. I pushed a [nearly identical] >>> patch in the trunk in https://svn.open-mpi.org/trac/ompi/changeset/25488. A >>> special version for the 1.4 has been prepared and has been attached to the >>> ticket #2916 (https://svn.open-mpi.org/trac/ompi/ticket/2916). >>> >>> Thanks, >>> george. >>> >>> >>> On Nov 14, 2011, at 02:27 , Y.MATSUMOTO wrote: >>> >>>> Dear Open MPI community, >>>> >>>> I'm a member of MPI library development team in Fujitsu, >>>> Takahiro Kawashima, who sent mail before, is my colleague. >>>> We start to feed back. >>>> >>>> First, we fixed about MPI_LB/MPI_UB and data packing problem. >>>> >>>> Program crashes when it meets all of the following conditions: >>>> a: The type of sending data is contiguous and derived type. >>>> b: Either or both of MPI_LB and MPI_UB is used in the data type. >>>> c: The size of sending data is smaller than extent(Data type has gap). >>>> d: Send-count is bigger than 1. >>>> e: Total size of data is bigger than "eager limit" >>>> >>>> This problem occurs in attachment C program. >>>> >>>> An incorrect-address accessing occurs >>>> because an unintended value of "done" inputs and >>>> the value of "max_allowd" becomes minus >>>> in the following place in "ompi/datatype/datatype_pack.c(in version >>>> 1.4.3)". >>>> >>>> >>>> (ompi/datatype/datatype_pack.c) >>>> 188 packed_buffer = (unsigned char *) iov[iov_count].iov_base; >>>> 189 done = pConv->bConverted - i * pData->size; /* partial >>>> data from last pack */ >>>> 190 if( done != 0 ) { /* still some data to copy from the >>>> last time */ >>>> 191 done = pData->size - done; >>>> 192 OMPI_DDT_SAFEGUARD_POINTER( user_memory, done, >>>> pConv->pBaseBuf, pData, pConv->count ); >>>> 193 MEMCPY_CSUM( packed_buffer, user_memory, done, pConv ); >>>> 194 packed_buffer += done; >>>> 195 max_allowed -= done; >>>> 196 total_bytes_converted += done; >>>> 197 user_memory += (extent - pData->size + done); >>>> 198 } >>>> >>>> This program assumes "done" as the size of partial data from last pack. 
>>>> However, when the program crashes, "done" equals the sum of all >>>> transmitted data size. >>>> It makes "max_allowed" to be a negative value. >>>> >>>> We modified the code as following and it passed our test suite. >>>> But we are not sure this fix is correct. Can anyone review this fix? >>>> Patch
[OMPI devel] MIPS/Linux port?
I googled but found no precise answer... Is it true that some features are disabled if the code is not supported for the target platform?? http://www.open-mpi.org/community/lists/devel/2007/07/1896.php http://www.open-mpi.org/community/lists/devel/2007/07/1886.php I got everything compiled on MIPS/Linux, but I would like to make sure that there are no features disabled. Thanks, Rayson
Re: [OMPI devel] MIPS/Linux port?
What would the MIPS/Linux port miss if ompi_info prints: Thread support: posix (mpi: yes, progress: yes) TIA, Rayson On Tue, Apr 14, 2009 at 10:00 AM, Rayson Ho wrote: > I googled but found no precise answer... > > Is it true that some features are disabled if the code is not > supported for the target platform?? > > http://www.open-mpi.org/community/lists/devel/2007/07/1896.php > http://www.open-mpi.org/community/lists/devel/2007/07/1886.php > > I got everything compiled on MIPS/Linux, but I would like to make sure > that there are no features disabled. > > Thanks, > Rayson >
Re: [OMPI devel] OpenMPI without RSH
On Wed, Apr 29, 2009 at 12:38 PM, Jerry Ye wrote: > I’m currently working in an environment where I cannot use SSH to launch > child processes. Instead, the process with rank 0 skips the ssh_child > function in plm_rsh_module.c and the child processes are all started at the > same time on different machines. Coordination is done with static jobids > and ports. I have successfully modified the code to get the hello_c example > working. However, I’m having problems with inter-process communication when > using MPI_Bcast. Is there something else that I’m obviously missing? Does your remote invocation method set up environment variables for the slave tasks correctly?? I remember MPICH relies on env variables to pass rank and other information from the rank 0 process to processes with non-zero ranks. (I have not looked at how things are handled in Open MPI in detail...) If you loop through all the environment variables using a " while (*environ != NULL) printf("%s\n", *environ++); " loop, and compare an MPI job started using your remote invocation method vs. the standard one, then you can find out the answer easily. And if you are using Grid Engine or Torque, then the integration with Open MPI is already implemented. Maybe you are using Hadoop+something else?? :D Rayson > > Thanks. > > - jerry > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
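For completeness, the environment-dump idea above as a standalone program (launch it with both methods and diff the output); this is just a sketch, not something shipped with Open MPI:

    #include <stdio.h>

    extern char **environ;   /* provided by the C runtime on POSIX systems */

    int main(void)
    {
        char **env = environ;
        while (*env != NULL)
            printf("%s\n", *env++);   /* print every variable, one per line */
        return 0;
    }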
Re: [OMPI devel] processor affinity -- OpenMPI / batch system integration
The code for the Job to Core Binding (aka. thread binding, or CPU binding) feature was checked into the Grid Engine project cvs. It uses OpenMPI's Portable Linux Processor Affinity (PLPA) library, and is topology and NUMA aware. The presentation from HPC Software Workshop '09: http://wikis.sun.com/download/attachments/170755116/job2core.pdf The design doc: http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=213897 Initial support is planned for 6.2 update 5 (current release is update 4, so update 5 is likely to be released in the next 2 or 3 months). Rayson On Tue, Sep 30, 2008 at 2:23 PM, Ralph Castain wrote: > Note that we would also have to modify OMPI to: > > 1. recognize these environmental variables, and > > 2. use them to actually set the binding, instead of using OMPI-internal > directives > > Not a big deal to do, but not something currently in the system. Since we > launch through our own daemons (something that isn't likely to change in > your time frame), these changes would be required. > > Otherwise, we could come up with some method by which you could provide > mapper information we use. While I agree with Jeff that having you tell us > which cores to use for each rank would generally be better, it does raise > issues when users want specific mapping algorithms that you might not > support. For example, we are working on mappers that will take input from > the user regarding comm topology plus system info on network wiring topology > and generate a near-optimal mapping of ranks. As part of that, users may > request some number of cores be reserved for that rank for threading or > other purposes. > > So perhaps both options would be best - give us the list of cores available > to us so we can map and do affinity, and pass in your own mapping. Maybe > with some logic so we can decide which to use based on whether OMPI or GE > did the mapping?? > > Not sure here - just thinking out loud. > Ralph > > On Sep 30, 2008, at 12:58 PM, Jeff Squyres wrote: > >> On Sep 30, 2008, at 2:51 PM, Rayson Ho wrote: >> >>> Restarting this discussion. A new update version of Grid Engine 6.2 >>> will come out early next year [1], and I really hope that we can get >>> at least the interface defined. >> >> Great! >> >>> At the minimum, is it enough for the batch system to tell OpenMPI via >>> an env variable which core (or virtual core, in the SMT case) to start >>> binding the first MPI task?? I guess an added bonus would be >>> information about the number of processors to skip (the stride) >>> between the sibling tasks?? Stride of one is usually the case, but >>> something larger than one would allow the batch system to control the >>> level of cache and memory bandwidth sharing between the MPI tasks... >> >> Wouldn't it be better to give us a specific list of cores to bind to? As >> core counts go up in servers, I think we may see a re-emergence of having >> multiple MPI jobs on a single server. And as core counts go even *higher*, >> then fragmentation of available cores over time is possible/likely. >> >> Would you be giving us a list of *relative* cores to bind to (i.e., "bind >> to the Nth online core on the machine" -- which may be different than the >> OS's ID for that processor) or will you be giving us the actual OS virtual >> processor ID(s) to bind to? 
>> >> -- >> Jeff Squyres >> Cisco Systems >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
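To make option #1 above concrete, here is a hypothetical sketch of what the MPI side could do if the resource manager exported a per-host core list; the ALLOCATED_CORES variable name is an assumption (no such interface exists yet), and plain Linux sched_setaffinity() is used instead of PLPA:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdlib.h>
    #include <string.h>

    /* Bind the calling process to the local_rank-th core listed in the
       (hypothetical) ALLOCATED_CORES variable, e.g. "2,3,6,7". */
    static int bind_to_allocated_core(int local_rank)
    {
        const char *list = getenv("ALLOCATED_CORES");
        if (list == NULL)
            return -1;                       /* RM gave us nothing; fall back */

        char *copy = strdup(list);
        char *tok = strtok(copy, ",");
        for (int i = 0; tok != NULL && i < local_rank; i++)
            tok = strtok(NULL, ",");

        int rc = -1;
        if (tok != NULL) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(atoi(tok), &set);        /* OS core ID taken from the list */
            rc = sched_setaffinity(0, sizeof(set), &set);
        }
        free(copy);
        return rc;
    }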
Re: [OMPI devel] New OMPI MPI extension
Hi Jeff, There's a typo in trunk/README: -> 1175 ...unrelated to wach other I guess you mean "unrelated to each other". Rayson On Wed, Apr 21, 2010 at 12:35 PM, Jeff Squyres wrote: > Per the telecon Tuesday, I committed a new OMPI MPI extension to the trunk: > > https://svn.open-mpi.org/trac/ompi/changeset/23018 > > Please read the commit message and let me know what you think. Suggestions > are welcome. > > If everyone is ok with it, I'd like to see this functionality hit the 1.5 > series at some point. > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] New OMPI MPI extension
Jeff, Seems like OMPI_Affinity_str() 's finest granularity is at the core level. However, in SGE (Sun Grid Engine) we also offer thread level (SMT) binding: http://wikis.sun.com/display/gridengine62u5/Using+Job+to+Core+Binding Will OpenMPI support thread level binding in the future?? BTW, another 2 typos in README: 1193subdirectory off <- directory "of" 1199 thse extensions <- "these" extensions Rayson On Thu, Apr 22, 2010 at 10:35 AM, Jeff Squyres wrote: > Fixed -- thanks! > > On Apr 22, 2010, at 12:35 AM, Rayson Ho wrote: > >> Hi Jeff, >> >> There's a typo in trunk/README: >> >> -> 1175 ...unrelated to wach other >> >> I guess you mean "unrelated to each other". >> >> Rayson >> >> >> >> On Wed, Apr 21, 2010 at 12:35 PM, Jeff Squyres wrote: >> > Per the telecon Tuesday, I committed a new OMPI MPI extension to the trunk: >> > >> > https://svn.open-mpi.org/trac/ompi/changeset/23018 >> > >> > Please read the commit message and let me know what you think. >> > Suggestions are welcome. >> > >> > If everyone is ok with it, I'd like to see this functionality hit the 1.5 >> > series at some point. >> > >> > -- >> > Jeff Squyres >> > jsquy...@cisco.com >> > For corporate legal information go to: >> > http://www.cisco.com/web/about/doing_business/legal/cri/ >> > >> > >> > ___ >> > devel mailing list >> > de...@open-mpi.org >> > http://www.open-mpi.org/mailman/listinfo.cgi/devel >> > >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel >> > > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
[OMPI devel] MPI_Pack & MPI_Unpack Performance
In the MPITypes paper (Processing MPI Datatypes Outside MPI), page 7:

Test: Vector
Element type: float
  MPICH2:      1788.85 MB/sec
  OpenMPI:     1088.01 MB/sec  <- *
  MPITypes:    1789.37 MB/sec
  Manual Copy: 1791.59 MB/sec

Test: YZ Face
Element type: float
  MPICH2:      145.32 MB/sec
  OpenMPI:     93.08 MB/sec  <- *
  MPITypes:    145.32 MB/sec
  Manual Copy: 143.68 MB/sec
  Size: 0.25 MB
  Extent: 63.99 MB

The paper can be downloaded at: http://press.mcs.anl.gov/mpitypes/ Is anyone working on this performance issue, or has it been fixed already?? If not, I will check with the authors and try to get the source code of the benchmark... Rayson
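For anyone who wants to reproduce the comparison locally before the authors' benchmark source is available, a rough stand-in along the lines of the "Vector" case (the element count and stride here are made up, not the paper's parameters):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        const int count = 1 << 20;                /* 1M floats, stride 2 */
        float *src = calloc(2 * count, sizeof(float));

        MPI_Datatype vec;
        MPI_Type_vector(count, 1, 2, MPI_FLOAT, &vec);
        MPI_Type_commit(&vec);

        int pack_size, pos = 0;
        MPI_Pack_size(1, vec, MPI_COMM_WORLD, &pack_size);
        void *dst = malloc(pack_size);

        double t0 = MPI_Wtime();
        MPI_Pack(src, 1, vec, dst, pack_size, &pos, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        printf("packed %d bytes in %g s (%.1f MB/sec)\n",
               pos, t1 - t0, pos / (t1 - t0) / 1.0e6);

        MPI_Type_free(&vec);
        free(src); free(dst);
        MPI_Finalize();
        return 0;
    }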
[OMPI devel] processor affinity -- OpenMPI/batch system integration
Hello, I'm from the Sun Grid Engine (SGE) project ( http://gridengine.sunsource.net ). I am working on processor affinity support for SGE. In 2005, we had some discussions on the SGE mailing list with Jeff on this topic. Now that quad-core processors are available from AMD and Intel, and higher core counts per socket are coming soon, I would like to see what we can do to come up with a simple interface for the SGE 6.2 release, which will be available in Q2 this year (or at least in an "update" release of SGE 6.2 if we can't get the changes in on time). The discussions we had before: http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7081 http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=4803 I looked at the SGE code; the simplest thing we can do is to set an environment variable to tell the task group the processor mask of the node before we start each task group. Is it good enough for OpenMPI?? After reading the OpenMPI code, I believe what we need to do is add an else case in ompi/runtime/ompi_mpi_init.c: if (ompi_mpi_paffinity_alone) { ... } else { // get processor affinity information from the batch system via the env var ... } Thanks, Rayson
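A hedged sketch of what that else branch might look like if SGE exported a processor mask; the SGE_PROCESSOR_MASK name and the hex-mask format are assumptions for discussion, not an existing SGE or Open MPI interface:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdlib.h>

    /* Turn a hex mask exported by the batch system (e.g. "0x0f" for the
       first four cores) into a cpu_set_t the affinity code can apply. */
    static int mask_from_batch_system(cpu_set_t *set)
    {
        const char *s = getenv("SGE_PROCESSOR_MASK");
        if (s == NULL)
            return -1;                        /* not started under the RM */

        unsigned long mask = strtoul(s, NULL, 0);
        CPU_ZERO(set);
        for (int cpu = 0; mask != 0; cpu++, mask >>= 1)
            if (mask & 1UL)
                CPU_SET(cpu, set);
        return 0;
    }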
[OMPI devel] On Host Topology Description (carto)
The OnHostTopologyDescription is actually quite similar to the Locality Group API and MPO on Solaris: http://www.opensolaris.org/os/community/performance/mpo_overview.pdf http://www.opensolaris.org/os/community/performance/numa/mpo_update.pdf And we have some discussions on the OpenSolaris forum, "NUMA and interconnect transfers": http://opensolaris.org/jive/thread.jspa?messageID=185268 Rayson On Jan 11, 2008 6:22 AM, Pak Lui wrote: > https://svn.open-mpi.org/trac/ompi/wiki/OnHostTopologyDescription > > > Rayson Ho wrote: > > Hello, > > > > I'm from the Sun Grid Engine (SGE) project ( > > http://gridengine.sunsource.net ). I am working on processor affinity > > support for SGE. > > > > In 2005, we had some discussions on the SGE mailing list with Jeff on > > this topic. As quad-core processors are available from AMD and Intel, > > and higher core count per socket is coming soon, I would like to see > > what we can do to come up with a simple interface for the SGE 6.2 > > release, which will be available in Q2 this year (or at least into an > > "update" release of SGE6.2 if we couldn't get the changes in on time). > > > > The discussions we had before: > > http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7081 > > http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=4803 > > > > I looked at the SGE code, the simplest we can do is to set an > > environment variable to tell the task group the processor mask of the > > node before we start each task group. Is it good enough for OpenMPI?? > > > > After reading the OpenMPI code, I believe what we need to do is that > > in ompi/runtime/ompi_mpi_init.c , we need to add an else case: > > > > if (ompi_mpi_paffinity_alone) { > >... > > } > > else > > { > >// get processor affinity information from batch system via the env var > >... > > } > >
Re: [OMPI devel] Logo as a vector graphic
What is the license of the logo?? If it is under a free license, then may be I can upload it to wikipedia and update the page: http://en.wikipedia.org/wiki/Open_MPI Rayson On 3/13/08, Jeff Squyres wrote: > On Mar 13, 2008, at 8:35 AM, Adrian Knoth wrote: > > >> We usually snip off the words at the bottom. > > > > I also did so. How do you crop the image? I used pdfcrop which is part > > of the tetex distribution, but I guess there are better PS editors out > > for Linux/Unix. I didn't find one, pdfcrop was fine, but JFTR... > > > Heh. I usually use the png or jpg version and just crop there. :-) > > -- > Jeff Squyres > Cisco Systems > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel >
Re: [OMPI devel] processor affinity -- OpenMPI/batch system integration
Restarting this discussion. A new update version of Grid Engine 6.2 will come out early next year [1], and I really hope that we can get at least the interface defined. At the minimum, is it enough for the batch system to tell OpenMPI via an env variable which core (or virtual core, in the SMT case) to start binding the first MPI task?? I guess an added bonus would be information about the number of processors to skip (the stride) between the sibling tasks?? Stride of one is usually the case, but something larger than one would allow the batch system to control the level of cache and memory bandwidth sharing between the MPI tasks... Rayson [1]: http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=26002 On 1/11/08, Jeff Squyres wrote: > carto is more intended to be a discovery and provider of topology > information. How various parts of the OMPI code base use that > information is a different issue. > > With regards to processor affinity, there are two general ways of > doing it: > > 1. The resource manager tells us what processors have been allocated > to us. E.g., provide us some environment variables saying what > processors/cores/whatever have been allocated to us on a per-host > basis (e.g., in the environment of the launched applications, and > therefore may be different on every host). Then Open MPI decides how > to split up the allocated host processors amongst all the Open MPI > processes on that host. > > It would be great if SGE could provide some environment variables to us. > > 2. The resource manager does all the processor affinity itself. > SLURM, for example, has a nice command line syntax for all kinds of > processor affinity stuff in their "srun" command. A traditional > roadblock to this has been that OMPI currently uses the resource > manager to launch a single "orted" process on each node, and then that > orted, in turn, launches all the MPI processes locally. However, > there is work progressing to remove this roadblock. If I try to > describe it, I'm sure I'll get it wrong :-) -- Ralph / IU? > > - > > Open MPI will need to be able to tell the difference between #1 and > #2. So it might be good if the RM always provides the environment > variables, but in those env variables, tell us whether the RM did the > affinity pinning or not. I.e., in #1, you'll get information about > all the processors that are available -- all the processes on a single > host will get the same information. In #2, each process will get > individualized information about where it has been pinned. > > Make sense? > > > > On Jan 11, 2008, at 6:22 AM, Pak Lui wrote: > > > Hi Rayson, > > > > I guess this is an issue only for SGE. I believe there is something > > called 'carto' framework is being developed to represent the node- > > socket > > relationship in order to address the multicore issue. I think there > > are > > other folks in the team who are actively working on it so they > > probably > > can address it better than I can. Here some descriptions on the wiki > > for it: > > > > https://svn.open-mpi.org/trac/ompi/wiki/OnHostTopologyDescription > > > > Rayson Ho wrote: > >> Hello, > >> > >> I'm from the Sun Grid Engine (SGE) project ( > >> http://gridengine.sunsource.net ). I am working on processor affinity > >> support for SGE. > >> > >> In 2005, we had some discussions on the SGE mailing list with Jeff on > >> this topic. 
As quad-core processors are available from AMD and Intel, > >> and higher core count per socket is coming soon, I would like to see > >> what we can do to come up with a simple interface for the SGE 6.2 > >> release, which will be available in Q2 this year (or at least into an > >> "update" release of SGE6.2 if we couldn't get the changes in on > >> time). > >> > >> The discussions we had before: > >> http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7081 > >> http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=4803 > >> > >> I looked at the SGE code, the simplest we can do is to set an > >> environment variable to tell the task group the processor mask of the > >> node before we start each task group. Is it good enough for OpenMPI?? > >> > >> After reading the OpenMPI code, I believe what we need to do is that > >> in ompi/runtime/ompi_mpi_init.c , we need to add an else case: > >> > >> if (ompi_mpi_paffinity_alone) { > >> ... > >> } &g
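To illustrate the start-core-plus-stride idea in the message above, a tiny sketch; the BIND_START and BIND_STRIDE names are invented for the example and are not an agreed-upon interface:

    #include <stdlib.h>

    /* Core for local rank r when the batch system exports a starting core
       and a stride: core = start + r * stride.  A stride larger than one
       spaces the tasks out, reducing cache and memory-bandwidth sharing. */
    static int core_for_local_rank(int local_rank)
    {
        const char *s = getenv("BIND_START");
        const char *t = getenv("BIND_STRIDE");
        int start  = (s != NULL) ? atoi(s) : 0;
        int stride = (t != NULL) ? atoi(t) : 1;
        return start + local_rank * stride;
    }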