Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17983

2008-03-27 Thread Jeff Squyres

Gotcha.  Should this stuff go in ompi/config/ompi_microsoft.m4?

(I don't really care; I just already see a Microsoft file, so I  
figured I'd ask the question)



On Mar 26, 2008, at 9:54 PM, George Bosilca wrote:

Interix (also known as SUA or SFU) is the POSIX layer integrated with the
latest versions of Windows (such as Vista and Server 2003). It provides fork,
rsh, and basically most of the tools we need.

 george.

Jeff Squyres wrote:

What's Interix?

On Mar 26, 2008, at 7:20 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
New Revision: 17983
URL: https://svn.open-mpi.org/trac/ompi/changeset/17983

Log:
Add support for Interix.

Added:
 trunk/config/ompi_interix.m4   (contents, props changed)
Text files modified:
 trunk/acinclude.m4 | 1 +
 trunk/configure.ac | 3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/acinclude.m4
==============================================================================


--- trunk/acinclude.m4  (original)
+++ trunk/acinclude.m4  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -108,6 +108,7 @@
# Include the macros for Windows checking
#
m4_include(config/ompi_microsoft.m4)
+m4_include(config/ompi_interix.m4)

#
# The config/mca_no_configure_components.m4 file is generated by

Added: trunk/config/ompi_interix.m4
==============================================================================


--- (empty file)
+++ trunk/config/ompi_interix.m4  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -0,0 +1,56 @@
+dnl -*- shell-script -*-
+dnl
+dnl Copyright (c) 2008 The University of Tennessee and The University
+dnl of Tennessee Research Foundation.  All rights
+dnl reserved.
+dnl $COPYRIGHT$
+dnl
+dnl Additional copyrights may follow
+dnl
+dnl $HEADER$
+dnl
+
+######################################################################
+#
+# OMPI_INTERIX
+#
+# Detect if the environment is SUA/SFU (i.e. Interix) and modify
+# the compiling environment accordingly.
+#
+# USAGE:
+#   OMPI_INTERIX()
+#
+######################################################################
+AC_DEFUN([OMPI_INTERIX],[
+
+AC_MSG_CHECKING(for Interix environment)
+AC_TRY_COMPILE([],
+   [#if !defined(__INTERIX)
+#error Normal Unix environment
+#endif],
+   is_interix=yes,
+   is_interix=no)
+AC_MSG_RESULT([$is_interix])
+if test "$is_interix" = "yes"; then
+
+ompi_show_subtitle "Interix detection"
+
+if ! test -d /usr/include/port; then
+AC_MSG_WARN([Compiling Open MPI under Interix require an up-to-date])
+AC_MSG_WARN([version of libport. Please ask your system administrator])
+AC_MSG_WARN([to install it (pkg_update -L libport).])
+AC_MSG_ERROR([*** Cannot continue])
+fi
+#
+# These are the minimum requirements for Interix ...
+#
+AC_MSG_WARN([-lport was added to the linking flags])
+LDFLAGS="-lport $LDFLAGS"
+AC_MSG_WARN([-D_ALL_SOURCE -D_USE_LIBPORT was added to the compilation flags])
+CFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CFLAGS"
+CPPFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CPPFLAGS"
+CXXFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CXXFLAGS"
+
+fi
+
+])

Modified: trunk/configure.ac
==============================================================================


--- trunk/configure.ac  (original)
+++ trunk/configure.ac  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -192,6 +192,9 @@
AM_CONDITIONAL(OMPI_NEED_WINDOWS_REPLACEMENTS,
  test "$ompi_cv_c_compiler_vendor" = "microsoft" )

+# Do all Interix detections if necessary
+OMPI_INTERIX
+
# Does the compiler support "ident"-like constructs?

OMPI_CHECK_IDENT([CC], [CFLAGS], [c], [C])
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full







___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-27 Thread Jeff Squyres

A few more comments on top of what Tim / Ralph said:

- opal_paffinity MCA params should be defined and registered in the  
opal paffinity base (in the base open function so that ompi_info can  
still see them), not opal/runtime/opal_params.c.


- I don't have a problem with setting the paffinity slot list from  
ompi_mpi_init, but we should probably make the corresponding MCA  
parameter be an "mpi_*" name; because this is functionality that is  
being exported through the MPI layer.  Additionally, the name
"mpi_*" will make more sense to users; they don't know
anything about opal/orte -- "mpi_*" resonates with running
their MPI job.


- I don't think we can delete the MCA param ompi_paffinity_alone; it  
exists in the v1.2 series and has historical precedent.


- Note that symbols that are static don't have to abide by the prefix  
rule.  I'm not saying you need to change anything -- you don't -- I  
just notice that you made some symbols both static and use the prefix  
rule.  That's fine, but if you want to use shorter symbol names for  
static symbols, that's fine too.
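
To illustrate the point about static symbols, here is a minimal C sketch. It assumes a hypothetical source file in the paffinity base; the names count_slots and opal_paffinity_base_slot_count are invented for this example and are not taken from the patch, and the comma-separated slot list format is likewise an assumption. Only the symbol with external linkage needs the prefix, because a static helper cannot collide with symbols defined in other files.

```c
#include <stddef.h>

/* File-local helper (hypothetical): static linkage keeps it out of the
 * global namespace, so a short, unprefixed name is acceptable. */
static int count_slots(const char *slot_list)
{
    int n;

    if (NULL == slot_list || '\0' == *slot_list) {
        return 0;
    }
    /* A comma-separated list has one more entry than it has commas. */
    for (n = 1; '\0' != *slot_list; ++slot_list) {
        if (',' == *slot_list) {
            ++n;
        }
    }
    return n;
}

/* Exported symbol (hypothetical): visible to the linker, so it carries
 * the opal_paffinity_base_ prefix required by the prefix rule. */
int opal_paffinity_base_slot_count(const char *slot_list)
{
    return count_slots(slot_list);
}
```

Either naming choice for the static helper compiles to the same thing; the prefix only matters for symbols that other object files can see.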




On Mar 26, 2008, at 6:01 AM, Lenny Verkhovsky wrote:


Hi, all
Attached is a patch for the modified Rank_File RMAPS component.

1. Introduced new general-purpose debug flags:
   mpi_debug
   opal_debug
2. Introduced a new MCA parameter, opal_paffinity_slot_list.
3. Cleaned the opal paffinity functions out of ompi_mpi_init.
4. Moved the opal paffinity functions to a new file,
   opal/mca/paffinity/base/paffinity_base_service.c.
5. Renamed the rank_file component files according to the prefix policy.
6. Renamed the global variables as well.
7. A few bug fixes that were brought up during previous discussions.
8. If the user defines opal_paffinity_alone together with
   rmaps_rank_file_path or opal_paffinity_slot_list, a warning is
   issued that only opal_paffinity_alone will be used.

Best Regards,
Lenny.





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-27 Thread Lenny Verkhovsky
Hi,
Thanks for the comments. I will definitely implement all of them and commit
the code as soon as I have finished.

Also, I am experiencing a few problems using opal_output_verbose; either there
is a bug or I am doing something wrong.


/home/USERS/lenny/OMPI_ORTE_DEBUG/bin/mpirun -mca mca_verbose 0 -mca
paffinity_base_verbose 1 --byslot -np 2 -hostfile hostfile -mca
btl_openib_max_lmc 1  -mca opal_paffinity_alone 1 -mca btl_openib_verbose 1
/home/USERS/lenny/TESTS/ORTE/mpi_p01_debug -t lt


/home/USERS/lenny/TESTS/ORTE/mpi_p01_debug: symbol lookup error:
/home/USERS/lenny/OMPI_ORTE_DEBUG//lib/openmpi/mca_btl_openib.so: undefined
symbol: mca_btl_base_out
/home/USERS/lenny/TESTS/ORTE/mpi_p01_debug: symbol lookup error:
/home/USERS/lenny/OMPI_ORTE_DEBUG//lib/openmpi/mca_btl_openib.so: undefined
symbol: mca_btl_base_out
--
mpirun has exited due to process rank 1 with PID 5896 on
node witch17 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).


On Wed, Mar 26, 2008 at 2:50 PM, Ralph H Castain  wrote:

> I would tend to echo Tim's suggestions. I note that you do lookup that opal
> mca param in orte as well. I know you sent me a note about that off-list - I
> apologize for not getting to it yet, but was swamped yesterday.
>
> I think the solution suggested in #1 below is the right approach. Looking up
> opal params in orte or ompi is probably not a good idea. We have had
> problems in the past where params were looked up in multiple places as
> people -do- sometimes change the names (ahem...).
>
> Also, I would suggest using the macro version of verbose, OPAL_OUTPUT_VERBOSE,
> so that it compiles out for non-debug builds - up to you. Many of us use it
> as we don't need the output from optimized builds.
>
> Other than that, I think this looks fine. I do truly appreciate the cleanup
> of ompi_mpi_init.
>
> Ralph
>
>
>
> On 3/26/08 6:09 AM, "Tim Prins"  wrote:
>
> > Hi Lenny,
> >
> > This looks good. But I have a couple of suggestions (which others may
> > disagree with):
> >
> > 1. You register an opal mca parameter, but look it up in ompi, then call
> > an opal function with the result. What if you had a function
> > opal_paffinity_base_set_slots(long rank) (or some other name, I don't
> > care) which looked up the mca parameter and then set up the slots as you
> > are doing if it is found? This would make things a bit cleaner IMHO.
> >
> > 2. The functions in the paffinity base should be prefixed with
> > 'opal_paffinity_base_'.
> >
> > 3. Why was the ompi_debug_flag added? It is not used anywhere.
> >
> > 4. You probably do not need to add the opal debug flag. There is already
> > a 'paffinity_base_verbose' flag which should suit your purposes fine. So
> > you should just be able to replace all of the conditional output
> > statements in paffinity with something like
> > opal_output_verbose(10, opal_paffinity_base_output, ...),
> > where 10 is the verbosity level number.
> >
> > Tim


Re: [OMPI devel] FreeBSD timer_base_open error?

2008-03-27 Thread Jeff Squyres

Added as https://svn.open-mpi.org/trac/ompi/ticket/1261.

On Mar 26, 2008, at 11:07 AM, Brian W. Barrett wrote:

George -

Good catch -- that's going to cause a problem :).  But I think we should
add yet another check to also make sure that we're on Linux.  So the three
tests would be:

  1) Am I on a platform that we have timer assembly support for?
 (That's the long list of architectures that we recently,
 and incorrectly, added).
  2) Am I on Linux (since we really only know how to parse
 /proc/cpuinfo on Linux)
  3) Is /proc/cpuinfo readable (Because we have a couple architectures
 that are reported by config.guess as Linux, but don't have
 /proc/cpuinfo).
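
The three tests above can be sketched in C as follows. This is only an illustration of the decision logic, not the configure.m4 code itself: the function names are hypothetical, the host patterns are copied from the case statement quoted later in this thread, and detecting Linux by looking for the substring "linux" in the host triplet is an assumption made for the sketch.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Test 1: is the host triplet on the list of architectures with timer
 * assembly support?  Patterns taken from the quoted case statement. */
static bool host_has_timer_asm(const char *host)
{
    static const char *patterns[] = {
        "i?86-*", "x86_64*", "ia64-*",
        "powerpc-*", "powerpc64-*", "sparc*-*"
    };
    size_t i;

    for (i = 0; i < sizeof(patterns) / sizeof(patterns[0]); ++i) {
        if (0 == fnmatch(patterns[i], host, 0)) {
            return true;
        }
    }
    return false;
}

/* Test 2: does the triplet name Linux?  (Substring match is an
 * assumption of this sketch.) */
static bool host_is_linux(const char *host)
{
    return NULL != strstr(host, "linux");
}

/* Test 3: is the cpuinfo file readable?  The path is a parameter so
 * the check can be exercised against any file. */
static bool cpuinfo_readable(const char *path)
{
    return 0 == access(path, R_OK);
}

/* The linux timer component is usable only if all three tests pass. */
bool timer_linux_usable(const char *host, const char *cpuinfo_path)
{
    return host_has_timer_asm(host)
        && host_is_linux(host)
        && cpuinfo_readable(cpuinfo_path);
}
```

With these checks, the FreeBSD host i386-unknown-freebsd6.2 passes the architecture test but fails the Linux test, so the component would not be selected there regardless of whether /proc/cpuinfo exists.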

Make sense?

Brian

On Wed, 26 Mar 2008, George Bosilca wrote:

I was working off-list with Brad on this. Brian is right, the logic in
configure.m4 is wrong. It overwrites timer_linux_happy to yes if the host
matches "i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*". On FreeBSD
the host is i386-unknown-freebsd6.2.

Here is a quick and dirty patch. I just moved the selection logic around a
little bit, without any major modifications.

george.

Index: configure.m4
===================================================================
--- configure.m4    (revision 17970)
+++ configure.m4    (working copy)
@@ -40,14 +40,12 @@
   [timer_linux_happy="yes"],
   [timer_linux_happy="no"])])

-AS_IF([test "$timer_linux_happy" = "yes"],
-  [AS_IF([test -r "/proc/cpuinfo"],
- [timer_linux_happy="yes"],
- [timer_linux_happy="no"])])
-
 case "${host}" in
 i?86-*|x86_64*|ia64-*|powerpc-*|powerpc64-*|sparc*-*)
-timer_linux_happy="yes"
+AS_IF([test "$timer_linux_happy" = "yes"],
+  [AS_IF([test -r "/proc/cpuinfo"],
+ [timer_linux_happy="yes"],
+ [timer_linux_happy="no"])])
  ;;
 *)
  timer_linux_happy="no"



On Mar 25, 2008, at 10:31 PM, Brian Barrett wrote:

On Mar 25, 2008, at 6:16 PM, Jeff Squyres wrote:

"linux" is the name of the component.  It looks like
opal/mca/timer/linux/timer_linux_component.c is doing some checks during
component open() and returning an error if it can't be used (e.g., if it's
not on linux).

The timer components are a little different than normal MCA
frameworks; they *must* be compiled in libopen-pal statically, and
there will only be one of them built.

In this case, I'm guessing that linux was built simply because nothing
else was selected to be built, but then its component_open() function
failed because it didn't find /proc/cpuinfo.



This is actually incorrect.  The linux component looks for /proc/cpuinfo
and builds if it finds that file.  There's a base component that's built
if nothing else is found.  The configure logic for the linux component is
probably not the right thing to do -- it should probably be modified to
check both that that file is readable (there are systems that call
themselves "linux" but don't have a /proc/cpuinfo) and that we're actually
on Linux.

Brian

--
Brian Barrett

There is an art . . . to flying. The knack lies in learning how to
throw yourself at the ground and miss.
Douglas Adams, 'The Hitchhikers Guide to the Galaxy'






--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-27 Thread Lenny Verkhovsky


> -Original Message-
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent: Thursday, March 27, 2008 1:38 PM
> To: Lenny Verkhovsky
> Cc: Ralph H Castain; Sharon Melamed; Open MPI Developers
> Subject: Re: RMAPS rank_file component patch and modifications for review
> 
> A few more comments on top of what Tim / Ralph said:
> 
> - opal_paffinity MCA params should be defined and registered in the
> opal paffinity base (in the base open function so that ompi_info can
> still see them), not opal/runtime/opal_params.c.
OK.

> 
> - I don't have a problem with setting the paffinity slot list from
> ompi_mpi_init, but we should probably make the corresponding MCA
> parameter be an "mpi_*" name; because this is functionality that is
> being exported through the MPI layer.  Additionally, the name
> "mpi_*" will make more sense to users; they don't know
> anything about opal/orte -- "mpi_*" resonates with running
> their MPI job.
I think it makes more sense in opal_paffinity_base, and ompi_mpi_init
will look cleaner.

> 
> - I don't think we can delete the MCA param ompi_paffinity_alone; it
> exists in the v1.2 series and has historical precedent.
It will not be deleted.
It will just use the same infrastructure (the slot_list parameter and
opal_base functions). It will be transparent for the user.

The user has three ways to set it up:
1.  -mca opal_paffinity_alone 1
    This sets paffinity as it did before.
2.  -mca opal_paffinity_slot_list "slot_list"
    Used to define the slots that will be used for all ranks on all
    nodes.
3.  -mca rmaps_rank_file_path rankfile
    Assigns ranks to CPUs according to the file.

rmaps_rank_file_path can be used together with opal_paffinity_slot_list.
In this case all ranks not defined in the rankfile will be assigned by the
opal_paffinity_slot_list mca parameter.
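
The precedence described above, combined with item 8 of the patch summary (opal_paffinity_alone wins when it is set together with the other parameters), can be sketched as a small C function. This is only a reading of the rules stated in this thread; the enum, the function name choose_bind_source, and its arguments are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Where a rank's processor binding comes from (hypothetical names). */
typedef enum {
    BIND_NONE,            /* no affinity requested */
    BIND_PAFFINITY_ALONE, /* legacy opal_paffinity_alone behaviour */
    BIND_RANKFILE,        /* this rank has an entry in the rankfile */
    BIND_SLOT_LIST        /* fall back to opal_paffinity_slot_list */
} bind_source_t;

bind_source_t choose_bind_source(bool paffinity_alone,
                                 bool rank_in_rankfile,
                                 const char *slot_list)
{
    /* Item 8: if opal_paffinity_alone is set, only it is used. */
    if (paffinity_alone) {
        return BIND_PAFFINITY_ALONE;
    }
    /* Ranks listed in the rankfile are placed according to the file. */
    if (rank_in_rankfile) {
        return BIND_RANKFILE;
    }
    /* Ranks the rankfile leaves undefined use the slot list, if any. */
    if (NULL != slot_list && '\0' != slot_list[0]) {
        return BIND_SLOT_LIST;
    }
    return BIND_NONE;
}
```

For example, a rank that is missing from the rankfile while a slot list is given would resolve to BIND_SLOT_LIST under this reading.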






Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-27 Thread Jeff Squyres

Are you using BTL_OUTPUT or something else from btl_base_error.h?


On Mar 27, 2008, at 7:49 AM, Lenny Verkhovsky wrote:

Hi,
Thanks for the comments. I will definitely implement all of them and
commit the code as soon as I have finished.

Also, I am experiencing a few problems using opal_output_verbose;
either there is a bug or I am doing something wrong.



/home/USERS/lenny/OMPI_ORTE_DEBUG/bin/mpirun -mca mca_verbose 0 -mca  
paffinity_base_verbose 1 --byslot -np 2 -hostfile hostfile -mca  
btl_openib_max_lmc 1  -mca opal_paffinity_alone 1 -mca  
btl_openib_verbose 1  /home/USERS/lenny/TESTS/ORTE/mpi_p01_debug -t lt



/home/USERS/lenny/TESTS/ORTE/mpi_p01_debug: symbol lookup error:
/home/USERS/lenny/OMPI_ORTE_DEBUG//lib/openmpi/mca_btl_openib.so:
undefined symbol: mca_btl_base_out
/home/USERS/lenny/TESTS/ORTE/mpi_p01_debug: symbol lookup error:
/home/USERS/lenny/OMPI_ORTE_DEBUG//lib/openmpi/mca_btl_openib.so:
undefined symbol: mca_btl_base_out

--
mpirun has exited due to process rank 1 with PID 5896 on
node witch17 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).





--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] trunk segfault

2008-03-27 Thread Jeff Squyres

Lenny --

Did this get fixed?  We were mucking with some mca param stuff on the  
trunk yesterday; not sure if it was related to this failure or not.



On Mar 26, 2008, at 10:34 AM, Lenny Verkhovsky wrote:

Hi, all

I compiled and built the source from trunk,
and it causes a segfault:

/home/USERS/lenny/OMPI_ORTE_NEW/bin/mpirun -np 1 -H witch17 \
    /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW -t lt


--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
  mca_mpi_register_params() failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
[witch17:01220] *** Process received signal ***
[witch17:01220] Signal: Segmentation fault (11)
[witch17:01220] Signal code:  (128)
[witch17:01220] Failing at address: (nil)
[witch17:01220] [ 0] /lib64/libpthread.so.0 [0x2aadf7072c10]
[witch17:01220] [ 1] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libopen-pal.so.0(free+0x56) [0x2aadf6acb6d6]
[witch17:01220] [ 2] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libopen-pal.so.0(opal_argv_free+0x25) [0x2aadf6ab9635]
[witch17:01220] [ 3] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libmpi.so.0 [0x2aadf67f4206]
[witch17:01220] [ 4] /home/USERS/lenny/OMPI_ORTE_NEW/lib/libmpi.so.0(MPI_Init+0xf0) [0x2aadf68117c0]
[witch17:01220] [ 5] /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW(main+0xef) [0x40109f]
[witch17:01220] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2aadf7199154]
[witch17:01220] [ 7] /home/USERS/lenny/TESTS/ORTE/mpi_p01_NEW [0x400ee9]

[witch17:01220] *** End of error message ***
--
mpirun noticed that process rank 0 with PID 1220 on node witch17  
exited on signal 11 (Segmentation fault).




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] RMAPS rank_file component patch and modifications for review

2008-03-27 Thread Lenny Verkhovsky
No, I just tried to see some printouts during the run.
I use the following in the code:

opal_output_verbose(0, 0, "LNY100 opal_paffinity_base_slot_list_set ver=%d ", 0);
opal_output_verbose(1, 0, "LNY101 opal_paffinity_base_slot_list_set ver=%d ", 1);
OPAL_OUTPUT_VERBOSE((1, 0, "VERBOSE LNY102 opal_paffinity_base_slot_list_set ver=%d ", 1));

But all I see is the first line (since I put level 0).
I suppose that to see the second line I must configure with --enable-debug,
but this is not working for me either.
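
One possible explanation can be sketched with a toy model in C. It assumes that opal_output_verbose prints a message only when the verbosity of the target stream is at least the message level, and that stream 0 has verbosity 0 unless something raises it; both are assumptions about the library's behaviour, and every toy_ name below is invented for this example rather than taken from Open MPI. Under those assumptions, a level-1 message sent to stream 0 is dropped no matter how paffinity_base_verbose is set, because that parameter would affect a different stream (the opal_paffinity_base_output stream Tim suggested).

```c
#include <stdbool.h>

#define TOY_MAX_STREAMS 4

/* Per-stream verbosity thresholds; every stream starts at 0. */
static int toy_verbose_level[TOY_MAX_STREAMS];

/* Raise or lower the threshold of one stream. */
void toy_set_verbosity(int stream, int level)
{
    if (stream >= 0 && stream < TOY_MAX_STREAMS) {
        toy_verbose_level[stream] = level;
    }
}

/* True when a message at `level` would be printed on `stream`. */
bool toy_would_print(int stream, int level)
{
    return stream >= 0 && stream < TOY_MAX_STREAMS
        && toy_verbose_level[stream] >= level;
}

/* Macro form: like the macro Ralph mentions, it compiles out entirely
 * unless the (hypothetical) TOY_ENABLE_DEBUG symbol is defined. */
#ifdef TOY_ENABLE_DEBUG
#define TOY_WOULD_PRINT_DEBUG(stream, level) toy_would_print((stream), (level))
#else
#define TOY_WOULD_PRINT_DEBUG(stream, level) false
#endif
```

In this model, the level-0 call on stream 0 prints, the level-1 call on stream 0 does not until the stream's verbosity is raised to 1, and the macro form stays silent in a non-debug build.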



On Thu, Mar 27, 2008 at 2:02 PM, Jeff Squyres  wrote:

> Are you using BTL_OUTPUT or something else from btl_base_error.h?

Re: [OMPI devel] trunk segfault

2008-03-27 Thread Lenny Verkhovsky
yes, thanks.



On Thu, Mar 27, 2008 at 2:07 PM, Jeff Squyres  wrote:

> Lenny --
>
> Did this get fixed?  We were mucking with some mca param stuff on the
> trunk yesterday; not sure if it was related to this failure or not.


Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Tim Prins

This commit breaks things for me. Running on 3 nodes of odin:

mpirun -mca btl tcp,sm,self  examples/ring_c

causes a hang. All of the processes are stuck in 
orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang, 
and the ring program does not hang all the time, but fairly often.


Tim

r...@osl.iu.edu wrote:

Author: rhc
Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
New Revision: 17941
URL: https://svn.open-mpi.org/trac/ompi/changeset/17941

Log:
Fix the allgather and allgather_list functions to avoid deadlocks at large 
node/proc counts. Violated the RML rules here - we received the allgather 
buffer and then did an xcast, which causes a send to go out, and is then 
subsequently received by the sender. This fix breaks that pattern by forcing 
the recv to complete outside of the function itself - thus, the allgather and 
allgather_list always complete their recvs before returning or sending.

Reorganize the grpcomm code a little to provide support for soon-to-come new 
grpcomm components. The revised organization puts what will be common code 
elements in the base to avoid duplication, while allowing components that don't 
need those functions to ignore them.

Added:
   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
Text files modified: 
   trunk/orte/mca/grpcomm/base/Makefile.am                |     5
   trunk/orte/mca/grpcomm/base/base.h                     |    23 +
   trunk/orte/mca/grpcomm/base/grpcomm_base_close.c       |     4
   trunk/orte/mca/grpcomm/base/grpcomm_base_open.c        |     1
   trunk/orte/mca/grpcomm/base/grpcomm_base_select.c      |   121 ++---
   trunk/orte/mca/grpcomm/basic/grpcomm_basic.h           |    16
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |    30 -
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c    |   845 ++-
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h             |     8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   |     8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c      |    21
   trunk/orte/mca/grpcomm/grpcomm.h                       |    45 +
   trunk/orte/mca/rml/rml_types.h                         |    31
   trunk/orte/orted/orted_comm.c                          |    27 +
   14 files changed, 226 insertions(+), 959 deletions(-)



Diff not shown due to size (92619 bytes).
To see the diff, run the following command:

svn diff -r 17940:17941 --no-diff-deleted

___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn




Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Ralph H Castain
Thanks Tim - I found the problem and will commit a fix shortly.

Appreciate your testing and reporting!


On 3/27/08 8:24 AM, "Tim Prins"  wrote:

> This commit breaks things for me. Running on 3 nodes of odin:
> 
> mpirun -mca btl tcp,sm,self  examples/ring_c
> 
> causes a hang. All of the processes are stuck in
> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
> and the ring program does not hang all the time, but fairly often.
> 
> Tim
> 
> r...@osl.iu.edu wrote:
>> Author: rhc
>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>> New Revision: 17941
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>> 
>> Log:
>> Fix the allgather and allgather_list functions to avoid deadlocks at large
>> node/proc counts. Violated the RML rules here - we received the allgather
>> buffer and then did an xcast, which causes a send to go out, and is then
>> subsequently received by the sender. This fix breaks that pattern by forcing
>> the recv to complete outside of the function itself - thus, the allgather and
>> allgather_list always complete their recvs before returning or sending.
>> 
>> Reorganize the grpcomm code a little to provide support for soon-to-come new
>> grpcomm components. The revised organization puts what will be common code
>> elements in the base to avoid duplication, while allowing components that
>> don't need those functions to ignore them.
>> 
>> Added:
>>trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
>>trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
>> Text files modified:
>>trunk/orte/mca/grpcomm/base/Makefile.am| 5
>>trunk/orte/mca/grpcomm/base/base.h |23 +
>>trunk/orte/mca/grpcomm/base/grpcomm_base_close.c   | 4
>>trunk/orte/mca/grpcomm/base/grpcomm_base_open.c| 1
>>trunk/orte/mca/grpcomm/base/grpcomm_base_select.c  |   121 ++---
>>trunk/orte/mca/grpcomm/basic/grpcomm_basic.h   |16
>>trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |30 -
>>trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c|   845 ++-
>>trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
>>trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   | 8
>>trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c  |21
>>trunk/orte/mca/grpcomm/grpcomm.h   |45 +
>>trunk/orte/mca/rml/rml_types.h |31
>>trunk/orte/orted/orted_comm.c  |27 +
>>14 files changed, 226 insertions(+), 959 deletions(-)
>> 
>> 
>> Diff not shown due to size (92619 bytes).
>> To see the diff, run the following command:
>> 
>> svn diff -r 17940:17941 --no-diff-deleted
>> 
>> ___
>> svn mailing list
>> s...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17983

2008-03-27 Thread George Bosilca
Well, technically speaking, Interix is not ... 100% Microsoft, even  
if it is now somewhat integrated into Windows. It supports neither the  
standard Windows environment (such as windows.h) nor the Windows compilers. It  
comes with gcc (3.3) and most of the Unix tools.


  george.

On Mar 27, 2008, at 6:13 AM, Jeff Squyres wrote:

Gotcha.  Should this stuff go in ompi/config/ompi_microsoft.m4?

(I don't really care; I just already see a Microsoft file, so I
figured I'd ask the question)


On Mar 26, 2008, at 9:54 PM, George Bosilca wrote:

Interix (also known as SUA or SFU) is the POSIX layer integrated with the latest
versions of Windows (such as Vista and Server 2003). It provides fork,
rsh, and basically most of the tools we need.

george.

Jeff Squyres wrote:

What's Interix?

On Mar 26, 2008, at 7:20 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
New Revision: 17983
URL: https://svn.open-mpi.org/trac/ompi/changeset/17983

Log:
Add support for Interix.

Added:
trunk/config/ompi_interix.m4   (contents, props changed)
Text files modified:
trunk/acinclude.m4 | 1 +
trunk/configure.ac | 3 +++
2 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/acinclude.m4
===================================================================

--- trunk/acinclude.m4  (original)
+++ trunk/acinclude.m4  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -108,6 +108,7 @@
# Include the macros for Windows checking
#
m4_include(config/ompi_microsoft.m4)
+m4_include(config/ompi_interix.m4)

#
# The config/mca_no_configure_components.m4 file is generated by

Added: trunk/config/ompi_interix.m4
===================================================================

--- (empty file)
+++ trunk/config/ompi_interix.m4  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -0,0 +1,56 @@
+dnl -*- shell-script -*-
+dnl
+dnl Copyright (c)  2008 The University of Tennessee and The University
+dnl of Tennessee Research Foundation.  All rights
+dnl reserved.
+dnl $COPYRIGHT$
+dnl
+dnl Additional copyrights may follow
+dnl
+dnl $HEADER$
+dnl
+
+##
+#
+# OMPI_INTERIX
+#
+# Detect if the environment is SUA/SFU (i.e. Interix) and modify
+# the compiling environment accordingly.
+#
+# USAGE:
+#   OMPI_INTERIX()
+#
+##
+AC_DEFUN([OMPI_INTERIX],[
+
+    AC_MSG_CHECKING(for Interix environment)
+    AC_TRY_COMPILE([],
+                   [#if !defined(__INTERIX)
+#error Normal Unix environment
+#endif],
+                   is_interix=yes,
+                   is_interix=no)
+    AC_MSG_RESULT([$is_interix])
+    if test "$is_interix" = "yes"; then
+
+        ompi_show_subtitle "Interix detection"
+
+        if ! test -d /usr/include/port; then
+            AC_MSG_WARN([Compiling Open MPI under Interix require an up-to-date])
+            AC_MSG_WARN([version of libport. Please ask your system administrator])
+            AC_MSG_WARN([to install it (pkg_update -L libport).])
+            AC_MSG_ERROR([*** Cannot continue])
+        fi
+        #
+        # These are the minimum requirements for Interix ...
+        #
+        AC_MSG_WARN([-lport was added to the linking flags])
+        LDFLAGS="-lport $LDFLAGS"
+        AC_MSG_WARN([-D_ALL_SOURCE -D_USE_LIBPORT was added to the compilation flags])
+        CFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CFLAGS"
+        CPPFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CPPFLAGS"
+        CXXFLAGS="-D_ALL_SOURCE -D_USE_LIBPORT -I/usr/include/port $CXXFLAGS"
+
+    fi
+
+])

Modified: trunk/configure.ac
===================================================================

--- trunk/configure.ac  (original)
+++ trunk/configure.ac  2008-03-26 19:20:33 EDT (Wed, 26 Mar 2008)
@@ -192,6 +192,9 @@
AM_CONDITIONAL(OMPI_NEED_WINDOWS_REPLACEMENTS,
 test "$ompi_cv_c_compiler_vendor" = "microsoft" )

+# Do all Interix detections if necessary
+OMPI_INTERIX
+
# Does the compiler support "ident"-like constructs?

OMPI_CHECK_IDENT([CC], [CFLAGS], [c], [C])
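
As a rough shell-only sketch of what the OMPI_INTERIX macro in the diff checks (not the macro itself): look for __INTERIX among the compiler's predefined macros, make sure the libport headers are present, and build the extra flags. The function names `is_interix` and `interix_cppflags` are invented for this illustration; the flag values and the /usr/include/port default are the ones from the diff.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of Open MPI.

# Reads a compiler macro dump ("#define NAME VALUE" lines) on stdin and
# prints "yes" if __INTERIX is defined, "no" otherwise.
is_interix() {
    if grep -Eq '^#define[[:space:]]+__INTERIX([[:space:]]|$)'; then
        echo yes
    else
        echo no
    fi
}

# Prints the extra preprocessor flags used on Interix.
# $1 is the libport include directory (default: /usr/include/port).
# Returns non-zero if that directory does not exist.
interix_cppflags() {
    portdir=${1:-/usr/include/port}
    if [ ! -d "$portdir" ]; then
        echo "libport headers not found in $portdir" >&2
        return 1
    fi
    echo "-D_ALL_SOURCE -D_USE_LIBPORT -I$portdir"
}
```

With a gcc-compatible driver that understands `-dM -E`, something like `${CC:-cc} -dM -E - </dev/null | is_interix` would feed the predefined macros to the first function.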
___
svn-full mailing list
svn-f...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn-full









Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r17983

2008-03-27 Thread Jeff Squyres

Gotcha; thanks for the explanation.

On Mar 27, 2008, at 10:58 AM, George Bosilca wrote:
Well, technically speaking, Interix is not ... 100% Microsoft, even  
if it is now somewhat integrated into Windows. It supports neither the  
standard Windows environment (such as windows.h) nor the Windows compilers.  
It comes with gcc (3.3) and most of the Unix tools.


 george.



Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Tim Prins
Unfortunately, with r17988 I now cannot run any MPI programs; they seem 
to hang in the modex.


Tim

Ralph H Castain wrote:

Thanks Tim - I found the problem and will commit a fix shortly.

Appreciate your testing and reporting!


On 3/27/08 8:24 AM, "Tim Prins"  wrote:


This commit breaks things for me. Running on 3 nodes of odin:

mpirun -mca btl tcp,sm,self  examples/ring_c

causes a hang. All of the processes are stuck in
orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
and the ring program does not hang all the time, but fairly often.

Tim

r...@osl.iu.edu wrote:

Author: rhc
Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
New Revision: 17941
URL: https://svn.open-mpi.org/trac/ompi/changeset/17941

Log:
Fix the allgather and allgather_list functions to avoid deadlocks at large
node/proc counts. Violated the RML rules here - we received the allgather
buffer and then did an xcast, which causes a send to go out, and is then
subsequently received by the sender. This fix breaks that pattern by forcing
the recv to complete outside of the function itself - thus, the allgather and
allgather_list always complete their recvs before returning or sending.

Reorganize the grpcomm code a little to provide support for soon-to-come new
grpcomm components. The revised organization puts what will be common code
elements in the base to avoid duplication, while allowing components that
don't need those functions to ignore them.

Added:
   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
   trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
Text files modified:
   trunk/orte/mca/grpcomm/base/Makefile.am| 5
   trunk/orte/mca/grpcomm/base/base.h |23 +
   trunk/orte/mca/grpcomm/base/grpcomm_base_close.c   | 4
   trunk/orte/mca/grpcomm/base/grpcomm_base_open.c| 1
   trunk/orte/mca/grpcomm/base/grpcomm_base_select.c  |   121 ++---
   trunk/orte/mca/grpcomm/basic/grpcomm_basic.h   |16
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |30 -
   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c|   845 ++-
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   | 8
   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c  |21
   trunk/orte/mca/grpcomm/grpcomm.h   |45 +
   trunk/orte/mca/rml/rml_types.h |31
   trunk/orte/orted/orted_comm.c  |27 +
   14 files changed, 226 insertions(+), 959 deletions(-)


Diff not shown due to size (92619 bytes).
To see the diff, run the following command:

svn diff -r 17940:17941 --no-diff-deleted

___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn





Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Ralph H Castain
Hmmm...puzzling. It is working fine for me on TM machines and on my Mac.
However, Galen reports it borked on alps as well.

I'll have to dig a little to check this out and see if there is something
missing on those PLMs. Will get back shortly.

Sorry for the problem.


On 3/27/08 10:28 AM, "Tim Prins"  wrote:

> Unfortunately now with r17988 I cannot run any mpi programs, they seem
> to hang in the modex.
> 
> Tim
> 
> Ralph H Castain wrote:
>> Thanks Tim - I found the problem and will commit a fix shortly.
>> 
>> Appreciate your testing and reporting!
>> 
>> 
>> On 3/27/08 8:24 AM, "Tim Prins"  wrote:
>> 
>>> This commit breaks things for me. Running on 3 nodes of odin:
>>> 
>>> mpirun -mca btl tcp,sm,self  examples/ring_c
>>> 
>>> causes a hang. All of the processes are stuck in
>>> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
>>> and the ring program does not hang all the time, but fairly often.
>>> 
>>> Tim
>>> 
>>> r...@osl.iu.edu wrote:
>>>> Author: rhc
>>>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>>>> New Revision: 17941
>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>>>>
>>>> Log:
>>>> Fix the allgather and allgather_list functions to avoid deadlocks at large
>>>> node/proc counts. Violated the RML rules here - we received the allgather
>>>> buffer and then did an xcast, which causes a send to go out, and is then
>>>> subsequently received by the sender. This fix breaks that pattern by
>>>> forcing
>>>> the recv to complete outside of the function itself - thus, the allgather
>>>> and
>>>> allgather_list always complete their recvs before returning or sending.
>>>>
>>>> Reorganize the grpcomm code a little to provide support for soon-to-come
>>>> new
>>>> grpcomm components. The revised organization puts what will be common code
>>>> elements in the base to avoid duplication, while allowing components that
>>>> don't need those functions to ignore them.
>>>>
>>>> Added:
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
>>>> Text files modified:
>>>>    trunk/orte/mca/grpcomm/base/Makefile.am| 5
>>>>    trunk/orte/mca/grpcomm/base/base.h |23 +
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_close.c   | 4
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_open.c| 1
>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_select.c  |   121 ++---
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic.h   |16
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |30 -
>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c|   845
>>>> ++-
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   | 8
>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c  |21
>>>>    trunk/orte/mca/grpcomm/grpcomm.h   |45 +
>>>>    trunk/orte/mca/rml/rml_types.h |31
>>>>    trunk/orte/orted/orted_comm.c  |27 +
>>>>    14 files changed, 226 insertions(+), 959 deletions(-)
>>>>
>>>>
>>>> Diff not shown due to size (92619 bytes).
>>>> To see the diff, run the following command:
>>>>
>>>> svn diff -r 17940:17941 --no-diff-deleted
>>>>
>>>> ___
>>>> svn mailing list
>>>> s...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Ralph H Castain
Found the problem - should have a fix committed soon. Issue is with
differences in the number of daemons launched by the various plms (whether
or not procs are launched local to mpirun).



On 3/27/08 10:39 AM, "Ralph H Castain"  wrote:

> Hmmm...puzzling. It is working fine for me on TM machines and on my Mac.
> However, Galen reports it borked on alps as well.
> 
> I'll have to dig a little to check this out and see if there is something
> missing on those PLMs. Will get back shortly.
> 
> Sorry for problem
> 
> 
> On 3/27/08 10:28 AM, "Tim Prins"  wrote:
> 
>> Unfortunately now with r17988 I cannot run any mpi programs, they seem
>> to hang in the modex.
>> 
>> Tim
>> 
>> Ralph H Castain wrote:
>>> Thanks Tim - I found the problem and will commit a fix shortly.
>>> 
>>> Appreciate your testing and reporting!
>>> 
>>> 
>>> On 3/27/08 8:24 AM, "Tim Prins"  wrote:
>>> 
>>>> This commit breaks things for me. Running on 3 nodes of odin:
>>>>
>>>> mpirun -mca btl tcp,sm,self  examples/ring_c
>>>>
>>>> causes a hang. All of the processes are stuck in
>>>> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
>>>> and the ring program does not hang all the time, but fairly often.
>>>>
>>>> Tim
>>>>
>>>> r...@osl.iu.edu wrote:
>>>>> Author: rhc
>>>>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>>>>> New Revision: 17941
>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>>>>>
>>>>> Log:
>>>>> Fix the allgather and allgather_list functions to avoid deadlocks at large
>>>>> node/proc counts. Violated the RML rules here - we received the allgather
>>>>> buffer and then did an xcast, which causes a send to go out, and is then
>>>>> subsequently received by the sender. This fix breaks that pattern by
>>>>> forcing
>>>>> the recv to complete outside of the function itself - thus, the allgather
>>>>> and
>>>>> allgather_list always complete their recvs before returning or sending.
>>>>>
>>>>> Reorganize the grpcomm code a little to provide support for soon-to-come
>>>>> new
>>>>> grpcomm components. The revised organization puts what will be common code
>>>>> elements in the base to avoid duplication, while allowing components that
>>>>> don't need those functions to ignore them.
>>>>>
>>>>> Added:
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
>>>>> Text files modified:
>>>>>    trunk/orte/mca/grpcomm/base/Makefile.am| 5
>>>>>    trunk/orte/mca/grpcomm/base/base.h |23 +
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_close.c   | 4
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_open.c| 1
>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_select.c  |   121 ++---
>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic.h   |16
>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |30 -
>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c|   845
>>>>> ++-
>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   | 8
>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c  |21
>>>>>    trunk/orte/mca/grpcomm/grpcomm.h   |45 +
>>>>>    trunk/orte/mca/rml/rml_types.h |31
>>>>>    trunk/orte/orted/orted_comm.c  |27 +
>>>>>    14 files changed, 226 insertions(+), 959 deletions(-)
>>>>>
>>>>>
>>>>> Diff not shown due to size (92619 bytes).
>>>>> To see the diff, run the following command:
>>>>>
>>>>> svn diff -r 17940:17941 --no-diff-deleted
>>>>>
>>>>> ___
>>>>> svn mailing list
>>>>> s...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [OMPI svn] svn:open-mpi r17941

2008-03-27 Thread Ralph H Castain
Appears fixed with r17992 - at least, it works on TM, slurm (odin), and Mac.


On 3/27/08 11:06 AM, "Ralph H Castain"  wrote:

> Found the problem - should have a fix committed soon. Issue is with
> differences in the number of daemons launched by the various plms (whether
> or not procs are launched local to mpirun).
> 
> 
> 
> On 3/27/08 10:39 AM, "Ralph H Castain"  wrote:
> 
>> Hmmm...puzzling. It is working fine for me on TM machines and on my Mac.
>> However, Galen reports it borked on alps as well.
>> 
>> I'll have to dig a little to check this out and see if there is something
>> missing on those PLMs. Will get back shortly.
>> 
>> Sorry for problem
>> 
>> 
>> On 3/27/08 10:28 AM, "Tim Prins"  wrote:
>> 
>>> Unfortunately now with r17988 I cannot run any mpi programs, they seem
>>> to hang in the modex.
>>> 
>>> Tim
>>> 
>>> Ralph H Castain wrote:
>>>> Thanks Tim - I found the problem and will commit a fix shortly.
>>>>
>>>> Appreciate your testing and reporting!
>>>>
>>>>
>>>> On 3/27/08 8:24 AM, "Tim Prins"  wrote:
>>>>
>>>>> This commit breaks things for me. Running on 3 nodes of odin:
>>>>>
>>>>> mpirun -mca btl tcp,sm,self  examples/ring_c
>>>>>
>>>>> causes a hang. All of the processes are stuck in
>>>>> orte_grpcomm_base_barrier during MPI_Finalize. Not all programs hang,
>>>>> and the ring program does not hang all the time, but fairly often.
>>>>>
>>>>> Tim
>>>>>
>>>>> r...@osl.iu.edu wrote:
>>>>>> Author: rhc
>>>>>> Date: 2008-03-24 16:50:31 EDT (Mon, 24 Mar 2008)
>>>>>> New Revision: 17941
>>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17941
>>>>>>
>>>>>> Log:
>>>>>> Fix the allgather and allgather_list functions to avoid deadlocks at
>>>>>> large
>>>>>> node/proc counts. Violated the RML rules here - we received the allgather
>>>>>> buffer and then did an xcast, which causes a send to go out, and is then
>>>>>> subsequently received by the sender. This fix breaks that pattern by
>>>>>> forcing
>>>>>> the recv to complete outside of the function itself - thus, the allgather
>>>>>> and
>>>>>> allgather_list always complete their recvs before returning or sending.
>>>>>>
>>>>>> Reorganize the grpcomm code a little to provide support for soon-to-come
>>>>>> new
>>>>>> grpcomm components. The revised organization puts what will be common
>>>>>> code
>>>>>> elements in the base to avoid duplication, while allowing components that
>>>>>> don't need those functions to ignore them.
>>>>>>
>>>>>> Added:
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_modex.c
>>>>>> Text files modified:
>>>>>>    trunk/orte/mca/grpcomm/base/Makefile.am| 5
>>>>>>    trunk/orte/mca/grpcomm/base/base.h |23 +
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_close.c   | 4
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_open.c| 1
>>>>>>    trunk/orte/mca/grpcomm/base/grpcomm_base_select.c  |   121 ++---
>>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic.h   |16
>>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c |30 -
>>>>>>    trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c|   845
>>>>>> ++-
>>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos.h | 8
>>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_component.c   | 8
>>>>>>    trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c  |21
>>>>>>    trunk/orte/mca/grpcomm/grpcomm.h   |45 +
>>>>>>    trunk/orte/mca/rml/rml_types.h |31
>>>>>>    trunk/orte/orted/orted_comm.c  |27 +
>>>>>>    14 files changed, 226 insertions(+), 959 deletions(-)
>>>>>>
>>>>>>
>>>>>> Diff not shown due to size (92619 bytes).
>>>>>> To see the diff, run the following command:
>>>>>>
>>>>>> svn diff -r 17940:17941 --no-diff-deleted
>>>>>>
>>>>>> ___
>>>>>> svn mailing list
>>>>>> s...@open-mpi.org
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>>>>> ___
>>>>> devel mailing list
>>>>> de...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] Switching away from SVN?

2008-03-27 Thread Tony Breeds
On Mon, Mar 24, 2008 at 04:00:18PM -0400, George Bosilca wrote:
> After playing with hg and git for a few days, I tend to agree with the  
> emacs guys. It looks to me like either of them will do the job (as did  
> svn). I don't really care which one is selected by the community  
> as long as we:
> 1. Don't spend months deciding which one to choose.
> 2. Don't lose the nice integration of svn with our Trac. Regardless  
> of how good/fast the dVCS is, the way svn integrates with Trac is a  
> real time saver. Tracking bugs, linking to revisions and to the wiki  
> are really important features to me, and I think that whatever our  
> decision is, we should not lose this.

For what it's worth, I noticed this on one of the planets I read this
morning.  It looks like someone has already done the work to use git as
a backend for Trac:
http://www.terdmonk.com/using+git+as+a+trac+versioning+system+backend

Yours Tony

  linux.conf.au    http://www.marchsouth.org/
  Jan 19 - 24 2009 The Australian Linux Technical Conference!