Re: [OMPI devel] ROMIO code in OMPI

2012-11-07 Thread Ralph Castain
Hi Rayson

We take snapshots from time to time. We debated whether or not to update
again for the 1.7 release, but ultimately decided not to do so - IIRC, none
of our developers had the time.

If you are interested and willing to do the update, and perhaps look at
removing the limit, that is fine with me! You might check to see if the
latest ROMIO can go past 2GB - could be that an update is all that is
required.

Alternatively, you might check with Edgar Gabriel about the ompio component
and see if it either supports > 2GB sizes or can also be extended to do so.
Might be that a simple change to select that module instead of ROMIO would
meet the need.

Appreciate your interest in contributing!
Ralph


On Tue, Nov 6, 2012 at 11:55 AM, Rayson Ho  wrote:

> How is the ROMIO code in Open MPI developed & maintained? Do Open MPI
> releases take snapshots of the ROMIO code from time to time from the
> ROMIO project, or was the ROMIO code forked a while ago and maintained
> separately in Open MPI??
>
> I would like to fix the 2GB limit in the ROMIO code... and that's why
> I am asking! :-D
>
> Rayson
>
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
> On Thu, Nov 1, 2012 at 6:21 PM, Richard Shaw wrote:
> > Hi Rayson,
> >
> > Just seen this.
> >
> > In the end we've worked around it by creating successive views of the file
> > that are each less than 2GB and then offsetting them to eventually read in
> > everything. It's a bit of a pain to keep track of, but it works at the
> > moment.
> >
> > I was intending to follow your hints and try to fix the bug myself,
> > but I've been short on time, so I haven't gotten around to it yet.
> >
> > Richard
> >
> > On Saturday, 20 October, 2012 at 10:12 AM, Rayson Ho wrote:
> >
> > Hi Eric,
> >
> > Sounds like it's also related to this problem reported by Scinet back in
> > July:
> >
> > http://www.open-mpi.org/community/lists/users/2012/07/19762.php
> >
> > And I think I found the issue, but I still have not followed up with
> > the ROMIO guys yet. And I was not sure if Scinet was waiting for the
> > fix or not - next time I visit U of Toronto, I will see if I can visit
> > the Scinet office and meet with the Scinet guys!
> >
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>


Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file

2012-11-07 Thread Nathan Hjelm
Hmm, not sure why I didn't see an error when I tested the change. It looks like 
in this case yyterminate should have been defined as 
orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like 
the default action for yyterminate is to call the *lex_destroy function so we 
don't need to define it anywhere. Let me see if deleting yyterminate introduces 
any leaks.

-Nathan

On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org wrote:
> Author: rhc (Ralph Castain)
> Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
> New Revision: 27574
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
> 
> Log:
> A prior commit apparently broke the trunk when something was inadvertently 
> left behind - so remove a reference to a no-longer-existing function
> 
> Text files modified: 
>    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l |     3 ---
>    1 files changed, 0 insertions(+), 3 deletions(-)
> 
> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
> ==
> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov  6 16:25:19 2012  (r27573)
> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)  (r27574)
> @@ -36,9 +36,6 @@
>  
>  END_C_DECLS
>  
> -#define yyterminate() \
> -  return orte_rmaps_rank_file_yylex_destroy()
> -
>  /*
>   * global variables
>   */
> ___
> svn mailing list
> s...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn


Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file

2012-11-07 Thread Nathan Hjelm
Ok, looks like the default yyterminate does not clean up the lex state. The 
definition in rmaps/rank_file/rmaps_rank_file_lex.l should be

#define yyterminate() return orte_rmaps_rank_file_lex_destroy()

I can fix it if you want.

-Nathan

On Wed, Nov 07, 2012 at 08:34:59AM -0700, Nathan Hjelm wrote:
> Hmm, not sure why I didn't see an error when I tested the change. It looks 
> like in this case yyterminate should have been defined as 
> orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like 
> the default action for yyterminate is to call the *lex_destroy function so we 
> don't need to define it anywhere. Let me see if deleting yyterminate 
> introduces any leaks.
> 
> -Nathan
> 
> On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org 
> wrote:
> > Author: rhc (Ralph Castain)
> > Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
> > New Revision: 27574
> > URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
> > 
> > Log:
> > A prior commit apparently broke the trunk when something was inadvertently 
> > left behind - so remove a reference to a no-longer-existing function
> > 
> > Text files modified: 
> >    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l |     3 ---
> >    1 files changed, 0 insertions(+), 3 deletions(-)
> > 
> > Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
> > ==
> > --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov  6 16:25:19 2012  (r27573)
> > +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)  (r27574)
> > @@ -36,9 +36,6 @@
> >  
> >  END_C_DECLS
> >  
> > -#define yyterminate() \
> > -  return orte_rmaps_rank_file_yylex_destroy()
> > -
> >  /*
> >   * global variables
> >   */


Re: [OMPI devel] [OMPI svn] svn:open-mpi r27574 - trunk/orte/mca/rmaps/rank_file

2012-11-07 Thread Ralph Castain
The problem is that the orte function no longer seems to exist, so the build fails.

Sent from my iPhone

On Nov 7, 2012, at 7:55 AM, Nathan Hjelm  wrote:

> Ok, looks like the default yyterminate does not clean up the lex state. The 
> definition in rmaps/rank_file/rmaps_rank_file_lex.l should be
> 
> #define yyterminate() return orte_rmaps_rank_file_lex_destroy()
> 
> I can fix it if you want.
> 
> -Nathan
> 
> On Wed, Nov 07, 2012 at 08:34:59AM -0700, Nathan Hjelm wrote:
>> Hmm, not sure why I didn't see an error when I tested the change. It looks 
>> like in this case yyterminate should have been defined as 
>> orte_rmaps_rank_file_lex_destroy(). Looked a little deeper and it looks like 
>> the default action for yyterminate is to call the *lex_destroy function so 
>> we don't need to define it anywhere. Let me see if deleting yyterminate 
>> introduces any leaks.
>> 
>> -Nathan
>> 
>> On Wed, Nov 07, 2012 at 06:11:06AM -0500, svn-commit-mai...@open-mpi.org 
>> wrote:
>>> Author: rhc (Ralph Castain)
>>> Date: 2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)
>>> New Revision: 27574
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27574
>>> 
>>> Log:
>>> A prior commit apparently broke the trunk when something was inadvertently 
>>> left behind - so remove a reference to a no-longer-existing function
>>> 
>>> Text files modified: 
>>>    trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l |     3 ---
>>>    1 files changed, 0 insertions(+), 3 deletions(-)
>>> 
>>> Modified: trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l
>>> ==
>>> --- trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  Tue Nov  6 16:25:19 2012  (r27573)
>>> +++ trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_lex.l  2012-11-07 06:11:05 EST (Wed, 07 Nov 2012)  (r27574)
>>> @@ -36,9 +36,6 @@
>>> 
>>> END_C_DECLS
>>> 
>>> -#define yyterminate() \
>>> -  return orte_rmaps_rank_file_yylex_destroy()
>>> -
>>> /*
>>>  * global variables
>>>  */



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r27573 - trunk/ompi/mca/pml/v

2012-11-07 Thread Tim Mattox
Nathan,
Although this typo fix is correct, AFAIK, it is unnecessary to check
for NULL before calling free().
-- Tim "the perpetual OMPI lurker" Mattox
P.S. - I have a coding problem... I'm still watching commits 3+ years
after giving up gatekeeper duties!

On Tue, Nov 6, 2012 at 4:25 PM,   wrote:
> Author: hjelmn (Nathan Hjelm)
> Date: 2012-11-06 16:25:19 EST (Tue, 06 Nov 2012)
> New Revision: 27573
> URL: https://svn.open-mpi.org/trac/ompi/changeset/27573
>
> Log:
> fix typo
>
> Text files modified:
>    trunk/ompi/mca/pml/v/pml_v_component.c |     2 +-
>    1 files changed, 1 insertions(+), 1 deletions(-)
>
> Modified: trunk/ompi/mca/pml/v/pml_v_component.c
> ==
> --- trunk/ompi/mca/pml/v/pml_v_component.c  Tue Nov  6 15:06:54 2012  (r27572)
> +++ trunk/ompi/mca/pml/v/pml_v_component.c  2012-11-06 16:25:19 EST (Tue, 06 Nov 2012)  (r27573)
> @@ -86,7 +86,7 @@
>  V_OUTPUT_VERBOSE(500, "loaded");
>
>  rc = mca_vprotocol_base_open(vprotocol_include_list);
> -if (NULL == vprotocol_include_list) {
> +if (NULL != vprotocol_include_list) {
>  free (vprotocol_include_list);
>  }
>
> ___
> svn-full mailing list
> svn-f...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full



-- 
Tim Mattox, Ph.D. - I'm a bright... http://www.the-brights.net/
 timat...@open-mpi.org || tmat...@gmail.com


[OMPI devel] -npersocket in 1.6

2012-11-07 Thread David Singleton


There appears to have been a change in the behaviour of -npersocket from
1.4.3 to 1.6.x (tested with 1.6.2). Below is what I see on a pair of
dual-socket, quad-core Nehalem nodes running under PBS. Is this expected?

Thanks
David


[dbs900@v482 ~/MPI]$ mpirun -V
mpirun (Open MPI) 1.4.3
...
[dbs900@v482 ~/MPI]$ mpirun --report-bindings -npersocket 3 -np 12 ./numa143
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],0] to socket 0 cpus 0001
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],1] to socket 0 cpus 0002
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],2] to socket 0 cpus 0004
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],3] to socket 1 cpus 0010
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],4] to socket 1 cpus 0020
[v482:03367] [[64945,0],0] odls:default:fork binding child [[64945,1],5] to socket 1 cpus 0040
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],6] to socket 0 cpus 0001
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],7] to socket 0 cpus 0002
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],8] to socket 0 cpus 0004
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],9] to socket 1 cpus 0010
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],10] to socket 1 cpus 0020
[v483:31768] [[64945,0],1] odls:default:fork binding child [[64945,1],11] to socket 1 cpus 0040
...

[dbs900@v482 ~/MPI]$ mpirun -V
mpirun (Open MPI) 1.6.2
...
[dbs900@v482 ~/MPI]$ mpirun --report-bindings -npersocket 3 -np 12 ./numa162
--
Your job has requested a conflicting number of processes for the
application:

App: ./numa162
number of procs:  12

This is more processes than we can launch under the following
additional directives and conditions:

number of sockets:   0
npersocket:   3

Please revise the conflict and try again.
--
--
A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--