Re: [OMPI devel] ROMIO code in OMPI
Hi Rayson

We take snapshots from time to time. We debated whether or not to update again for the 1.7 release, but ultimately decided not to do so - IIRC, none of our developers had the time.

If you are interested and willing to do the update, and perhaps look at removing the limit, that is fine with me! You might check to see if the latest ROMIO can go past 2GB - could be that an update is all that is required.

Alternatively, you might check with Edgar Gabriel about the ompio component and see if it either supports > 2GB sizes or can also be extended to do so. Might be that a simple change to select that module instead of ROMIO would meet the need.

Appreciate your interest in contributing!
Ralph

On Tue, Nov 6, 2012 at 11:55 AM, Rayson Ho wrote:
> How is the ROMIO code in Open MPI developed & maintained? Do Open MPI
> releases take snapshots of the ROMIO code from time to time from the
> ROMIO project, or was the ROMIO code forked a while ago and maintained
> separately in Open MPI??
>
> I would like to fix the 2GB limit in the ROMIO code... and that's why
> I am asking! :-D
>
> Rayson
>
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
> On Thu, Nov 1, 2012 at 6:21 PM, Richard Shaw wrote:
> > Hi Rayson,
> >
> > Just seen this.
> >
> > In the end we've worked around it, by creating successive views of the
> > file that are all less than 2GB and then offsetting them to eventually
> > read in everything. It's a bit of a pain to keep track of, but it works
> > at the moment.
> >
> > I was intending on following your hints and trying to fix the bug myself,
> > but I've been short on time so haven't gotten around to it yet.
> >
> > Richard
> >
> > On Saturday, 20 October, 2012 at 10:12 AM, Rayson Ho wrote:
> >
> > Hi Eric,
> >
> > Sounds like it's also related to this problem reported by Scinet back in
> > July:
> >
> > http://www.open-mpi.org/community/lists/users/2012/07/19762.php
> >
> > And I think I found the issue, but I still have not followed up with
> > the ROMIO guys yet. And I was not sure if Scinet was waiting for the
> > fix or not - next time I visit U of Toronto, I will see if I can visit
> > the Scinet office and meet with the Scinet guys!
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file
Hmm, not sure why I didn't see an error when I tested the change. It looks like in this case yyterminate should have been defined as orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like the default action for yyterminate is to call the *lex_destroy function so we don't need to define it anywhere. Let me see if deleting yyterminate introduces any leaks.

-Nathan

On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org wrote:
> Author: rhc (Ralph Castain)
> Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
> New Revision: 27574
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
>
> Log:
> A prior commit apparently broke the trunk when something was inadvertently
> left behind - so remove a reference to a no-longer-existing function
>
> Text files modified:
>    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l | 3 ---
>    1 files changed, 0 insertions(+), 3 deletions(-)
>
> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
> ==
> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov 6 16:25:19 2012 (r27573)
> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012) (r27574)
> @@ -36,9 +36,6 @@
>
>  END_C_DECLS
>
> -#define yyterminate() \
> -  return orte_rmaps_rank_file_yylex_destroy()
> -
>  /*
>   * global variables
>   */
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn
Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file
Ok, looks like the default yyterminate does not clean up the lex state. The definition in rmaps/rank_file/rmaps_rank_file_lex.l should be

    #define yyterminate() return orte_rmaps_rank_file_lex_destroy()

I can fix it if you want.

-Nathan

On Wed, Nov 07, 2012 at 08:34:59AM -0700, Nathan Hjelm wrote:
> Hmm, not sure why I didn't see an error when I tested the change. It looks
> like in this case yyterminate should have been defined as
> orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like
> the default action for yyterminate is to call the *lex_destroy function so we
> don't need to define it anywhere. Let me see if deleting yyterminate
> introduces any leaks.
>
> -Nathan
>
> On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org wrote:
> > Author: rhc (Ralph Castain)
> > Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
> > New Revision: 27574
> > URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
> >
> > Log:
> > A prior commit apparently broke the trunk when something was inadvertently
> > left behind - so remove a reference to a no-longer-existing function
> >
> > Text files modified:
> >    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l | 3 ---
> >    1 files changed, 0 insertions(+), 3 deletions(-)
> >
> > Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
> > ==
> > --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov 6 16:25:19 2012 (r27573)
> > +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012) (r27574)
> > @@ -36,9 +36,6 @@
> >
> >  END_C_DECLS
> >
> > -#define yyterminate() \
> > -  return orte_rmaps_rank_file_yylex_destroy()
> > -
> >  /*
> >   * global variables
> >   */
> > ___
> > svn mailing list
> > s...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/svn
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file
Problem is that the orte function no longer seems to exist, so build fails.

Sent from my iPhone

On Nov 7, 2012, at 7:55 AM, Nathan Hjelm wrote:
> Ok, looks like the default yyterminate does not clean up the lex state. The
> definition in rmaps/rank_file/rmaps_rank_file_lex.l should be
>
> #define yyterminate() return orte_rmaps_rank_file_lex_destroy()
>
> I can fix it if you want.
>
> -Nathan
>
> On Wed, Nov 07, 2012 at 08:34:59AM -0700, Nathan Hjelm wrote:
>> Hmm, not sure why I didn't see an error when I tested the change. It looks
>> like in this case yyterminate should have been defined as
>> orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like
>> the default action for yyterminate is to call the *lex_destroy function so
>> we don't need to define it anywhere. Let me see if deleting yyterminate
>> introduces any leaks.
>>
>> -Nathan
>>
>> On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org wrote:
>>> Author: rhc (Ralph Castain)
>>> Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
>>> New Revision: 27574
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
>>>
>>> Log:
>>> A prior commit apparently broke the trunk when something was inadvertently
>>> left behind - so remove a reference to a no-longer-existing function
>>>
>>> Text files modified:
>>>    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l | 3 ---
>>>    1 files changed, 0 insertions(+), 3 deletions(-)
>>>
>>> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
>>> ==
>>> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov 6 16:25:19 2012 (r27573)
>>> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012) (r27574)
>>> @@ -36,9 +36,6 @@
>>>
>>>  END_C_DECLS
>>>
>>> -#define yyterminate() \
>>> -  return orte_rmaps_rank_file_yylex_destroy()
>>> -
>>>  /*
>>>   * global variables
>>>   */
>>> ___
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27573 - trunk/ompi/mca/pml/v
Nathan,

Although this typo fix is correct, AFAIK, it is unnecessary to check for NULL before calling free().

-- Tim "the perpetual OMPI lurker" Mattox

P.S. - I have a coding problem... I'm still watching commits 3+ years after giving up gatekeeper duties!

On Tue, Nov 6, 2012 at 4:25 PM, wrote:
> Author: hjelmn (Nathan Hjelm)
> Date: 2012-11-06 16:25:19 EST (Tue, 06 Nov 2012)
> New Revision: 27573
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27573
>
> Log:
> fix typo
>
> Text files modified:
>    trunk/ompi/mca/pml/v/pml_v_component.c | 2 +-
>    1 files changed, 1 insertions(+), 1 deletions(-)
>
> Modified: trunk/ompi/mca/pml/v/pml_v_component.c
> ==
> --- trunk/ompi/mca/pml/v/pml_v_component.c  Tue Nov 6 15:06:54 2012 (r27572)
> +++ trunk/ompi/mca/pml/v/pml_v_component.c  2012-11-06 16:25:19 EST (Tue, 06 Nov 2012) (r27573)
> @@ -86,7 +86,7 @@
>      V_OUTPUT_VERBOSE(500, "loaded");
>
>      rc = mca_vprotocol_base_open(vprotocol_include_list);
> -    if (NULL == vprotocol_include_list) {
> +    if (NULL != vprotocol_include_list) {
>          free (vprotocol_include_list);
>      }
>
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full

--
Tim Mattox, Ph.D. - I'm a bright... http://www.the-brights.net/
timat...@open-mpi.org || tmat...@gmail.com
[OMPI devel] -npersocket in 1.6
There appears to have been a change in the behaviour of -npersocket from 1.4.3 to 1.6.x (tested with 1.6.2). Below is what I see on a pair of dual-socket, quad-core Nehalem nodes running under PBS. Is this expected?

Thanks
David

[dbs900@v482 ~/MPI]$ mpirun -V
mpirun (Open MPI) 1.4.3
...
[dbs900@v482 ~/MPI]$ mpirun --report-bindings -npersocket 3 -np 12 ./numa143
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],0] to socket 0 cpus 0001
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],1] to socket 0 cpus 0002
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],2] to socket 0 cpus 0004
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],3] to socket 1 cpus 0010
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],4] to socket 1 cpus 0020
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],5] to socket 1 cpus 0040
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],6] to socket 0 cpus 0001
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],7] to socket 0 cpus 0002
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],8] to socket 0 cpus 0004
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],9] to socket 1 cpus 0010
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],10] to socket 1 cpus 0020
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],11] to socket 1 cpus 0040
...
[dbs900@v482 ~/MPI]$ mpirun -V
mpirun (Open MPI) 1.6.2
...
[dbs900@v482 ~/MPI]$ mpirun --report-bindings -npersocket 3 -np 12 ./numa162
--
Your job has requested a conflicting number of processes for the
application:

App: ./numa162
number of procs: 12

This is more processes than we can launch under the following
additional directives and conditions:

number of sockets: 0
npersocket: 3

Please revise the conflict and try again.
--
--
A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--