[OMPI devel] r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread TERRY DONTJE
r26255 forces the use of __malloc_hook, which is implemented in
opal/mca/memory/linux.  That component is not compiled into the library
when building on Solaris, so a "referenced symbol not found" error occurs
when libmpi tries to load the openib btl.


I am looking at how to fix this now, but if someone has a good idea for
detecting whether __malloc_hook is available, I'd be interested in hearing it.


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI devel] r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread TERRY DONTJE
I think MEMORY_LINUX_PTMALLOC2 is probably the right define to key off
of, but ifdef'ing out the lines that access the Linux memory module is
going to look gross.  Another idea is to create a dummy __malloc_hook in
the Solaris memory module, but might other OSes run into the same
problem?  And what happens if PTMALLOC2 is not used (does that happen)?


--td

On 4/13/2012 10:45 AM, TERRY DONTJE wrote:
r26255 is forcing the use of __malloc_hook which is implemented in 
opal/mca/memory/linux however that is not compiled in the library when 
built on Solaris thus causing a referenced symbol not found when 
libmpi tries to load the openib btl.


I am looking how to fix this now but if someone has a good idea how to 
detect when __malloc_hook is used (or not) I'd be interested in 
hearing it.









[OMPI devel] RTE node allocation component

2012-04-13 Thread Alex Margolin

Hi,

The next component I'm writing is a component for allocating nodes to
run the processes of an MPI job.
Suppose I have a "getbestnode" executable which not only tells me the
best location for spawning a new process,
but it also reserves the space (for some time), so that every time I run
it I get different results (as the best cores are already reserved).

I thought I should write a component under orte/mca/ras, similar to
loadleveler, but the problem is that I can't determine inside the module
the number of slots I need to allocate. It gets a list to fill in as a
parameter, and I guess it assumes I somehow know how many processes are
run because the allocation was done externally and now I'm just asking
the allocator for the list.

A related location, the rmaps, has this information (and much more), but
it doesn't look like a good location for such a module since it maps
already allocated resources, and has a lot of irrelevant code in this case.

Maybe the answer is to change the base module a bit, to contain this
information? It could be used as a decent sanity check for other modules
- making sure the external allocation fits the number of processes we
intend to run. Maybe orte_ras_base_allocate(orte_job_t *jdata) in
ras_base_allocate.c can store the relevant information from jdata in
orte_ras_base? In the long run it can become a parameter passed to the
ras components, but for backwards compatibility the global will do for now.

Thanks,
Alex

P.S. An RDS component is described at length in ras.h, yet it is no
longer available, right?


Re: [OMPI devel] [EXTERNAL] Re: r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread Barrett, Brian W
r26255 is awful as a patch.  It doesn't work on any non-Linux platform,
which is unpleasant.  But worse, what does it possibly accomplish?  In
codes other than benchmarks, there's no advantage to aligning the pointer
to 32 or 64 byte boundaries, as the malloced buffer very rarely is exactly
what is sent.  So you've done a whole lot of work, screwed with the memory
allocator (which always bites OMPI in the butt), and accomplished nothing
useful.  Mellanox should fix the hardware, not make everyone's life
miserable with crappy workarounds.

MEMORY_LINUX_PTMALLOC2 is the wrong define for what they want.  They
should check for __malloc_hook and only use that code if __malloc_hook is
found.
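
A minimal sketch of the kind of guard suggested above, assuming a
configure-time check defines a macro only when the __malloc_hook symbol
is found. The macro name HAVE___MALLOC_HOOK and both function names are
hypothetical and are not taken from the Open MPI tree:

```c
#include <stddef.h>

/* HAVE___MALLOC_HOOK is a hypothetical macro: a configure-time check
 * would define it only on platforms where __malloc_hook exists. */
#ifdef HAVE___MALLOC_HOOK
#include <malloc.h>
#endif

/* Returns 1 if this build may reference __malloc_hook, 0 otherwise. */
int malloc_hook_available(void)
{
#ifdef HAVE___MALLOC_HOOK
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical entry point for the alignment workaround.
 * Returns 0 if the hook path was taken, -1 if it is unsupported. */
int install_alignment_hook(void)
{
#ifdef HAVE___MALLOC_HOOK
    /* Real code would save the old hook and install a replacement here.
     * Only this branch is allowed to name __malloc_hook. */
    return 0;
#else
    /* No reference to __malloc_hook is emitted, so a platform without
     * the symbol (Solaris, for example) has nothing unresolved at load. */
    return -1;
#endif
}
```

With the macro undefined, the object file contains no reference to
__malloc_hook at all, which is the property the Solaris build needs.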

Brian

On 4/13/12 9:32 AM, "TERRY DONTJE"  wrote:

>
>  
>  
>I am thinking MEMORY_LINUX_PTMALLOC2 is probably the right define to
>key off of but this is really going to look gross ifdef'ing out the
>lines that are accessing the Linux memory module.  One other idea I
>have is to create a dummy __malloc_hook in the Solaris memory module
>but might there be other OSes that could run into the same
>problem?   Or what happens if PTMALLOC2 is not used (does that
>happen)?
>
>--td
>
>On 4/13/2012 10:45 AM, TERRY DONTJE wrote:
>
>  
>  r26255 is forcing the use of __malloc_hook which is implemented in
>  opal/mca/memory/linux however that is not compiled in the library
>  when built on Solaris thus causing a referenced symbol not found
>  when libmpi tries to load the openib btl.
>  
>  I am looking how to fix this now but if someone has a good idea
>  how to detect when __malloc_hook is used (or not) I'd be
>  interested in hearing it.
>  


-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories








Re: [OMPI devel] RTE node allocation component

2012-04-13 Thread Ralph Castain
Looks like you are using an old version - the trunk RAS has changed a bit. I'll 
shortly be implementing further changes to support dynamic allocation requests 
that might be relevant here as well.

Adding job data to the RAS base isn't a good idea - remember, multiple jobs can 
be launching at the same time!

On Apr 13, 2012, at 10:07 AM, Alex Margolin wrote:

> Hi,
> 
> The next component I'm writing is a component for allocating nodes to
> run the processes of an MPI job.
> Suppose I have a "getbestnode" executable which not only tells me the
> best location for spawning a new process,
> but it also reserves the space (for some time), so that every time I run
> it I get different results (as the best cores are already reserved).
> 
> I thought I should write a component under orte/mca/ras, similar to
> loadleveler, but the problem is that I can't determine inside the module
> the number of slots I need to allocate. It gets a list to fill in as a
> parameter, and I guess it assumes I somehow know how many processes are
> run because the allocation was done externally and now I'm just asking
> the allocator for the list.
> 
> A related location, the rmaps, has this information (and much more), but
> it doesn't look like a good location for such a module since it maps
> already allocated resources, and has a lot of irrelevant code in this case.
> 
> Maybe the answer is to change the base module a bit, to contain this
> information? It could be used as a decent sanity check for other modules
> - making sure the external allocation fits the number of processes we
> intend to run. Maybe orte_ras_base_allocate(orte_job_t *jdata) in
> ras_base_allocate.c can store the relevant information from jdata in
> orte_ras_base? In the long run it can become a parameter passed to the
> ras components, but for backwards compatibility the global will do for now.
> 
> Thanks,
> Alex
> 
> P.S. An RDS component is elaborately mentioned in ras.h, yet it is no
> longer available, right?




Re: [OMPI devel] [EXTERNAL] Re: r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread TERRY DONTJE



On 4/13/2012 12:06 PM, Barrett, Brian W wrote:

r26255 is awful as a patch.  It doesn't work on any non-Linux platform,
which is unpleasant.  But worse, what does it possibly accomplish?  In
codes other than benchmarks, there's no advantage to aligning the pointer
to 32 or 64 byte boundaries, as the malloced buffer very rarely is exactly
what is sent.  So you've done a whole lot of work, screwed with the memory
allocator (which always bites OMPI in the butt), and accomplished nothing
useful.  Mellanox should fix the hardware, not make everyone's life
miserable with crappy workarounds.

MEMORY_LINUX_PTMALLOC2 is the wrong define for what they want.  They
should check for __malloc_hook and only use that code if __malloc_hook is
found.
I actually think the usage of __malloc_hook is a gross hack.  Maybe
there should be some sort of memory interface to allow one to register
a malloc_hook.  Anyways, per my comment to 3071 I am going to back out
r26255.

--td

Brian

On 4/13/12 9:32 AM, "TERRY DONTJE"  wrote:




I am thinking MEMORY_LINUX_PTMALLOC2 is probably the right define to
key off of but this is really going to look gross ifdef'ing out the
lines that are accessing the Linux memory module.  One other idea I
have is to create a dummy __malloc_hook in the Solaris memory module
but might there be other OSes that could run into the same
problem?   Or what happens if PTMALLOC2 is not used (does that
happen)?

--td

On 4/13/2012 10:45 AM, TERRY DONTJE wrote:


  r26255 is forcing the use of __malloc_hook which is implemented in
  opal/mca/memory/linux however that is not compiled in the library
  when built on Solaris thus causing a referenced symbol not found
  when libmpi tries to load the openib btl.

  I am looking how to fix this now but if someone has a good idea
  how to detect when __malloc_hook is used (or not) I'd be
  interested in hearing it.




















Re: [OMPI devel] [EXTERNAL] Re: r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread Mike Dubman
Too much drama - we will fix it to detect hook availability at the
configure stage, which will bring your life back to normal.

The problem is not Mellanox hardware but Intel's PCI bus implementation,
which charges extra latency if buffers are not aligned.
The patch is a workaround for this problem and helps non-benchmark code
as well.
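
For readers following the alignment argument, here is a small sketch
that uses the standard posix_memalign call rather than a malloc hook.
The helper names alloc_aligned and is_aligned are made up for this
example, and this is not what r26255 does. It shows that an aligned
base pointer says nothing about an address at an offset inside the
buffer, which is the case raised earlier in the thread:

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes whose start address is a multiple of align.
 * align must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_aligned(size_t align, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}

/* Returns 1 if p is a multiple of align, 0 otherwise. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t) p % align) == 0;
}
```

A buffer from alloc_aligned(64, 100) starts on a 64-byte boundary, but
the address 8 bytes into it does not, so whether the workaround helps
depends on which address is actually handed to the network layer.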



On Fri, Apr 13, 2012 at 7:06 PM, Barrett, Brian W wrote:

> r26255 is awful as a patch.  It doesn't work on any non-Linux platform,
> which is unpleasant.  But worse, what does it possibly accomplish?  In
> codes other than benchmarks, there's no advantage to aligning the pointer
> to 32 or 64 byte boundaries, as the malloced buffer very rarely is exactly
> what is sent.  So you've done a whole lot of work, screwed with the memory
> allocator (which always bites OMPI in the butt), and accomplished nothing
> useful.  Mellanox should fix the hardware, not make everyone's life
> miserable with crappy workarounds.
>
> MEMORY_LINUX_PTMALLOC2 is the wrong define for what they want.  They
> should check for __malloc_hook and only use that code if __malloc_hook is
> found.
>
> Brian
>
> On 4/13/12 9:32 AM, "TERRY DONTJE"  wrote:
>
> >
> >
> >
> >I am thinking MEMORY_LINUX_PTMALLOC2 is probably the right define to
> >key off of but this is really going to look gross ifdef'ing out the
> >lines that are accessing the Linux memory module.  One other idea I
> >have is to create a dummy __malloc_hook in the Solaris memory module
> >but might there be other OSes that could run into the same
> >problem?   Or what happens if PTMALLOC2 is not used (does that
> >happen)?
> >
> >--td
> >
> >On 4/13/2012 10:45 AM, TERRY DONTJE wrote:
> >
> >
> >  r26255 is forcing the use of __malloc_hook which is implemented in
> >  opal/mca/memory/linux however that is not compiled in the library
> >  when built on Solaris thus causing a referenced symbol not found
> >  when libmpi tries to load the openib btl.
> >
> >  I am looking how to fix this now but if someone has a good idea
> >  how to detect when __malloc_hook is used (or not) I'd be
> >  interested in hearing it.
> >


Re: [OMPI devel] [EXTERNAL] Re: r26255 has made openib unusable on Solaris platforms

2012-04-13 Thread Ralph Castain
I don't know about "drama", but people did clearly explain to you why this 
approach was unacceptable. You simply cannot cross-link at the component level. 
If you need something from the opal/mca/memory framework, you have to get it 
from the framework level.

Doesn't seem that hard a concept to grasp and follow - failing to do so breaks 
things for a bunch of people, which is why we don't allow it. So I hope your 
"configure" approach also takes this into account, or we'll have to revert it 
again :-(
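
To illustrate the framework-level rule, here is a rough sketch of a
registration interface that a caller such as a btl could use without
linking against any particular memory component. Every name below is
hypothetical; it is not the opal/mca/memory API:

```c
#include <stddef.h>

/* Hypothetical hook type: called with the requested allocation size. */
typedef void *(*alloc_hook_fn)(size_t size);

/* Framework-level state.  The selected memory component, if it can
 * support hooks at all, announces that through the setter below. */
static int hooks_supported = 0;
static alloc_hook_fn registered_hook = NULL;

/* Called by whichever memory component was selected at startup. */
void memory_framework_set_hook_support(int supported)
{
    hooks_supported = supported;
}

/* Called by users of the framework.  Returns 0 on success, or -1 if
 * the selected component cannot honor hooks, in which case the caller
 * is expected to skip its workaround instead of failing to load. */
int memory_framework_register_alloc_hook(alloc_hook_fn fn)
{
    if (!hooks_supported) {
        return -1;
    }
    registered_hook = fn;
    return 0;
}
```

The caller only ever names framework symbols, so a platform whose
memory component lacks hook support gets a clean -1 at run time rather
than an unresolved symbol.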


On Apr 13, 2012, at 11:13 AM, Mike Dubman wrote:

> Too many drama - we will fix it to detect hooks availability at configure 
> stage, this will make your life back to normal.
>  
> The problem is not a Mellanox hw, but Intel PCI bus implementation, which 
> charge extra latency if buffers are not aligned.
> The patch is a workaround for this problem and help to non-benchmark code as 
> well.
>  
> 
>  
> On Fri, Apr 13, 2012 at 7:06 PM, Barrett, Brian W  wrote:
> r26255 is awful as a patch.  It doesn't work on any non-Linux platform,
> which is unpleasant.  But worse, what does it possibly accomplish?  In
> codes other than benchmarks, there's no advantage to aligning the pointer
> to 32 or 64 byte boundaries, as the malloced buffer very rarely is exactly
> what is sent.  So you've done a whole lot of work, screwed with the memory
> allocator (which always bites OMPI in the butt), and accomplished nothing
> useful.  Mellanox should fix the hardware, not make everyone's life
> miserable with crappy workarounds.
> 
> MEMORY_LINUX_PTMALLOC2 is the wrong define for what they want.  They
> should check for __malloc_hook and only use that code if __malloc_hook is
> found.
> 
> Brian
> 
> On 4/13/12 9:32 AM, "TERRY DONTJE"  wrote:
> 
> >
> >
> >
> >I am thinking MEMORY_LINUX_PTMALLOC2 is probably the right define to
> >key off of but this is really going to look gross ifdef'ing out the
> >lines that are accessing the Linux memory module.  One other idea I
> >have is to create a dummy __malloc_hook in the Solaris memory module
> >but might there be other OSes that could run into the same
> >problem?   Or what happens if PTMALLOC2 is not used (does that
> >happen)?
> >
> >--td
> >
> >On 4/13/2012 10:45 AM, TERRY DONTJE wrote:
> >
> >
> >  r26255 is forcing the use of __malloc_hook which is implemented in
> >  opal/mca/memory/linux however that is not compiled in the library
> >  when built on Solaris thus causing a referenced symbol not found
> >  when libmpi tries to load the openib btl.
> >
> >  I am looking how to fix this now but if someone has a good idea
> >  how to detect when __malloc_hook is used (or not) I'd be
> >  interested in hearing it.
> >



[OMPI devel] Non-zero exit status

2012-04-13 Thread Ralph Castain
This has come up again because some of the MTT tests depend on a specific 
behavior when a process exits with a non-zero status - in this case, they 
expect ORTE to abort the job. At some point, the default had been switched to 
NOT abort the job if a process exited with a non-zero status.

So I'll throw this out to the community: if any process exits with a non-zero 
status, should ORTE abort the job?

I don't personally care, but we ought to decide on something. In the meantime, 
I will set the default so we DO abort, thus allowing the MTT runs to complete 
correctly.

FWIW: the MCA param orte_abort_non_zero_exit can always be set to control this 
behavior.

Ralph




Re: [OMPI devel] Non-zero exit status

2012-04-13 Thread TERRY DONTJE
I could see that if fewer than N processes exit with a non-zero exit
code, ORTE may choose not to abort the job.  However, if all N processes
have exited or aborted, I expect everything to clean up and mpirun to
exit.  It does not do that at the moment, which I think is what is
causing most of the hangs in the MTT trunk runs that did not occur
prior to this week.
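
The two expectations in this thread can be written down as one small
decision function. This is only a sketch of the policy being
discussed, not ORTE's errmgr code; the function name, the enum, and
the parameters are all hypothetical, with abort_on_non_zero standing
in for the value of the orte_abort_non_zero_exit MCA param:

```c
typedef enum {
    JOB_CONTINUE,  /* keep running the remaining processes */
    JOB_ABORT,     /* terminate the remaining processes */
    JOB_CLEANUP    /* every process is done: clean up, mpirun exits */
} job_action_t;

/* Decide what to do when one process has just exited.
 * num_exited counts the process that just exited as well. */
job_action_t on_proc_exit(int abort_on_non_zero, int exit_status,
                          int num_procs, int num_exited)
{
    if (exit_status != 0 && abort_on_non_zero) {
        return JOB_ABORT;
    }
    if (num_exited >= num_procs) {
        /* Nothing is left to wait for, whatever the exit codes were. */
        return JOB_CLEANUP;
    }
    return JOB_CONTINUE;
}
```

The second branch is the case described above: even with aborting
disabled, once all N processes are gone the job should clean up
rather than hang.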


--td

On 4/13/2012 5:18 PM, Ralph Castain wrote:

This has come up again because some of the MTT tests depend on a specific 
behavior when a process exits with a non-zero status - in this case, they 
expect ORTE to abort the job. At some point, the default had been switched to 
NOT abort the job if a process exited with a non-zero status.

So I'll throw this out to the community: if any process exits with a non-zero 
status, should ORTE abort the job?

I don't personally care, but we ought to decide on something. In the meantime, 
I will set the default so we DO abort, thus allowing the MTT runs to complete 
correctly.

FWIW: the MCA param orte_abort_non_zero_exit can always be set to control this 
behavior.

Ralph







Re: [OMPI devel] Non-zero exit status

2012-04-13 Thread Ralph Castain
Did you have the param set? I found some missing code in the orted errmgr that 
contributed to it, but unless you had set the param in your test, there is no 
way it would abort no matter how many procs exit with non-zero status.

I'm guessing you have that param set in your test due to our earlier defining 
the default to "no abort". I'm content to leave it there, but wanted to ensure 
your tests ran clean.

On Apr 13, 2012, at 4:32 PM, TERRY DONTJE wrote:

> I could see if less then N processes exit with non-zero exit code that the 
> ORTE may choose not to abort the job.  However, if all N processes have 
> exited or aborted I expect everything to clean up and mpirun to exit.  It 
> does not do that at the moment which I think is what is causing most of the 
> hangs in the MTT trunk runs which did not occur prior to this week.
> 
> --td
> 
> On 4/13/2012 5:18 PM, Ralph Castain wrote:
>> 
>> This has come up again because some of the MTT tests depend on a specific 
>> behavior when a process exits with a non-zero status - in this case, they 
>> expect ORTE to abort the job. At some point, the default had been switched 
>> to NOT abort the job if a process exited with a non-zero status.
>> 
>> So I'll throw this out to the community: if any process exits with a 
>> non-zero status, should ORTE abort the job?
>> 
>> I don't personally care, but we ought to decide on something. In the 
>> meantime, I will set the default so we DO abort, thus allowing the MTT runs 
>> to complete correctly.
>> 
>> FWIW: the MCA param orte_abort_non_zero_exit can always be set to control 
>> this behavior.
>> 
>> Ralph
>> 
>> 



[OMPI devel] [PATCH] Open MPI on ARMv5

2012-04-13 Thread Evan Clinton
At present Open MPI only supports ARMv7 processors.  Attached is a
patch against current trunk (r26270) that extends the atomic
operations and memory barriers code to work with ARMv5 and ARMv6 ones,
too.

For v6, the only changes were to use "mcr p15, 0, r0, c7, c10, 5"
instead of the unavailable DMB instruction, and to disable the 64 bit
compare-exchange function (which I understand is not vital for Open
MPI on 32 bit platforms?).  For v5, it was a bit trickier; the
processor lacks nice memory barrier instructions or proper atomic
operations.  Fortunately, the Linux kernel offers several helper
functions on ARM, and I've used those here.
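
For readers who have not seen the kernel helpers, here is a rough
sketch of how they can back a compare-and-set and a memory barrier.
It is not the code from the attached patch. The fixed addresses and
signatures are the ones given in kernel_user_helpers.txt; the
sketch_ function names are made up, and the non-ARM branch uses GCC's
__sync builtins only so that the example builds elsewhere:

```c
#include <stdint.h>

#if defined(__arm__) && defined(__linux__)
/* Helpers the kernel maps at fixed addresses in every process. */
typedef int (kuser_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
typedef void (kuser_dmb_t)(void);
#define kuser_cmpxchg (*(kuser_cmpxchg_t *) 0xffff0fc0)
#define kuser_dmb     (*(kuser_dmb_t *) 0xffff0fa0)
#endif

/* Full memory barrier. */
void sketch_mb(void)
{
#if defined(__arm__) && defined(__linux__)
    kuser_dmb();
#else
    __sync_synchronize();
#endif
}

/* Atomically store newval into *addr if *addr equals oldval.
 * Returns 1 if the store happened, 0 otherwise. */
int sketch_cmpset_32(volatile int32_t *addr, int32_t oldval, int32_t newval)
{
#if defined(__arm__) && defined(__linux__)
    /* The kernel helper returns zero when *ptr was changed. */
    return kuser_cmpxchg(oldval, newval, (volatile int *) addr) == 0;
#else
    return __sync_bool_compare_and_swap(addr, oldval, newval);
#endif
}
```

The ARM branch assumes the running kernel provides these helpers; the
linked document describes a version word that can be checked first.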

The changes build and pass all of the assembly-related tests in the
test folder and the hello world examples run on my "armv5tel" box
running Debian with Linux 2.6.32-5.  It should also run fine on ARMv6
boxes, and presumably v4, but I don't have either to test on.

Documentation for the Linux kernel helper functions:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=Documentation/arm/kernel_user_helpers.txt

I've sent in a contributor agreement so there should be no IP problems.

Hopefully this is useful,
Evan Clinton


ompi_armv5.diff
Description: Binary data