[OMPI devel] Trunk borked

2008-01-28 Thread Ralph H Castain
We seem to have a problem on the trunk this morning. I am building on a
platform with the following configuration:

with_threads=no
enable_dlopen=no
enable_pty_support=no
with_tm=/opt/PBS
LDFLAGS=-L/opt/PBS/lib64
with_openib=/opt/ofed
with_memory_manager=no
enable_mem_debug=yes
enable_mem_profile=no
enable_debug_symbols=yes
enable_binaries=yes
with_devel_headers=yes
enable_heterogeneous=no
enable_picky=yes

The compile errors out in the OpenIB BTL with the following error:

btl_openib_proc.c: In function `mca_btl_openib_proc_create':
btl_openib_proc.c:159: error: `i' undeclared (first use in this function)
btl_openib_proc.c:159: error: (Each undeclared identifier is reported only once
btl_openib_proc.c:159: error: for each function it appears in.)
make[2]: *** [btl_openib_proc.lo] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

When I look at the code, the problem is the following #if:

#if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
size_t i;
#endif

Yet the code will ALWAYS use that variable to unpack all the ports. I
removed the #if to clear the problem, but before committing the change, I
wanted to ask why someone thought this test needed to be in the code.

Should the entire loop unpacking all the ports be similarly protected, or
was the protection around the variable declaration simply an error?

Thanks
Ralph
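
The failure described above can be reproduced in miniature. The sketch below is an illustration only, not the actual btl_openib_proc.c code: the function `unpack_ports` and its arguments are hypothetical, and only the two macro names are taken from the message. It shows the declaration made unconditional, with a comment marking where the guard used to sit.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: a build configured with
 * enable_heterogeneous=no, so the macro evaluates to 0. */
#ifndef OMPI_ENABLE_HETEROGENEOUS_SUPPORT
#define OMPI_ENABLE_HETEROGENEOUS_SUPPORT 0
#endif

/* Hypothetical stand-in for the loop that unpacks all the ports. */
size_t unpack_ports(const int *packed, int *ports, size_t count)
{
    /* Broken form: wrapping this declaration in
     *   #if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
     * removes it whenever the condition is false, while the loop below
     * still uses `i`, which yields "`i' undeclared".
     * Fixed form: declare it unconditionally, because the loop always runs. */
    size_t i;

    assert(packed != NULL && ports != NULL);

    for (i = 0; i < count; ++i) {
        ports[i] = packed[i];
#if !defined(WORDS_BIGENDIAN) && OMPI_ENABLE_HETEROGENEOUS_SUPPORT
        /* Only work that is specific to heterogeneous builds (for example
         * a byte-order conversion) would belong inside the guard. */
#endif
    }
    return i;
}
```

Calling `unpack_ports(packed, ports, 3)` copies three entries and returns 3. A guard is only safe around a declaration when every use of the variable sits under the same condition.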




Re: [OMPI devel] Trunk borked

2008-01-28 Thread Adrian Knoth
On Mon, Jan 28, 2008 at 07:26:56AM -0700, Ralph H Castain wrote:

> We seem to have a problem on the trunk this morning. I am building on a

There are more errors:

/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:850: error: request for member `__pos' in something not a structure or union
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c: In function `fsetpos64':
/tmp/ompi/src/ompi/contrib/vt/vt/vtlib/vt_iowrap.c:876: error: request for member `__pos' in something not a structure or union
gmake[5]: *** [vt_iowrap.o] Error 1
gmake[5]: Leaving directory `/tmp/ompi/build/SunOS-i86pc/ompi/ompi/contrib/vt/vt/vtlib'


Just my $0.02

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI devel] Trunk borked

2008-01-28 Thread Jeff Squyres

Doh -- sorry about that.  r17282 removes the erroneous #if.



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Trunk borked

2008-01-28 Thread Jeff Squyres
Doh - this is Solaris on x86?  I think Terry said Solaris/sparc was  
tested...


VT guys -- can you check out what's going on?





--
Jeff Squyres
Cisco Systems




[OMPI devel] vt Makefile.in's

2008-01-28 Thread Jeff Squyres

I see the following in SVN:

  ompi/contrib/vt/Makefile.in
  ompi/contrib/vt/wrappers/Makefile.in

I don't think that these should be in SVN because there are
corresponding Makefile.am's in those dirs.

I'll remove them and update svn:ignore in those dirs tonight (because
removing them will cause everyone to re-autogen).


--
Jeff Squyres
Cisco Systems



[OMPI devel] VT in trunk + how to disable

2008-01-28 Thread Andreas Knüpfer
Hi everybody,

the VampirTrace integration arrived on the trunk today. There seems to be
one issue already, but we'll fix it ASAP.

As a general hint, this is how to completely disable anything we integrated:

configure --enable-contrib-no-build=vt ...

Then again, we'd like to see all the issues you may encounter and fix them.

Best regards, Andreas

-- 
Dipl. Math. Andreas Knuepfer, 
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A114, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-38323, fax +49-351-463-37773




Re: [OMPI devel] Trunk borked

2008-01-28 Thread Matthias Jurenz
Hello,

this problem should be fixed now...
It seems that the symbol '__pos' is not available on every platform.
This isn't a problem, because it's only used for a debug control message.

Regards,
Matthias
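
For background on why '__pos' is not portable: the C standard leaves the contents of fpos_t unspecified, so a member such as `__pos` is an implementation detail of one C library, not something every platform provides. The sketch below is not the VampirTrace fix. It is a hypothetical helper (`restore_and_report` is an invented name) that restores a saved position and obtains a printable offset for a debug message through ftell() only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: restore a previously saved position and return
 * the resulting offset for use in a debug message, without looking
 * inside fpos_t. Returns -1L if the position cannot be restored. */
long restore_and_report(FILE *stream, const fpos_t *pos)
{
    assert(stream != NULL && pos != NULL);

    if (fsetpos(stream, pos) != 0) {
        return -1L;
    }
    /* ftell() is standard C; reading pos->__pos would only compile where
     * fpos_t happens to be a struct with that member. */
    return ftell(stream);
}
```

A caller would save the position with fgetpos(), do its work, and later pass the saved value to `restore_and_report` to log where the stream ended up.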


On Mo, 2008-01-28 at 09:41 -0500, Jeff Squyres wrote:

> Doh - this is Solaris on x86?  I think Terry said Solaris/sparc was  
> tested...
> 
> VT guys -- can you check out what's going on?

--
Matthias Jurenz,
Center for Information Services and 
High Performance Computing (ZIH), TU Dresden, 
Willersbau A106, Zellescher Weg 12, 01062 Dresden
phone +49-351-463-31945, fax +49-351-463-37773




Re: [OMPI devel] dropping a pls module into an Open MPI build

2008-01-28 Thread Jeff Squyres
If you suspect compiler alignment issues, one thing you might check is
the output of "ompi_info --all" to see what Apple used to configure/build
OMPI.  We save the CFLAGS and whatnot; they may be helpful to you...?


I see on my MBP/Leopard 10.5.1, for example:

     C compiler absolute: /usr/bin/gcc
...
            Build CFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions -fno-strict-aliasing
          Build CXXFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions
            Build FFLAGS:
           Build FCFLAGS:
           Build LDFLAGS: -export-dynamic   -Wl,-u,_munmap -Wl,-multiply_defined,suppress
              Build LIBS: -lutil
    Wrapper extra CFLAGS:
  Wrapper extra CXXFLAGS:
    Wrapper extra FFLAGS:
   Wrapper extra FCFLAGS:
   Wrapper extra LDFLAGS:   -Wl,-u,_munmap -Wl,-multiply_defined,suppress
      Wrapper extra LIBS:  -lutil

I'll *guess* that the -Wl options came from OMPI's normal configure  
script.  But the -arch and -f might have come from Apple...?


That being said, I'm *not* sure how this information relates to the  
universal binaries...  It *may* be that you'll see the different  
options for the different architectures depending on which machine you  
run "ompi_info" on...?  I don't know enough about how universal  
binaries are built or run to know.




On Jan 24, 2008, at 1:12 PM, Ralph H Castain wrote:


Appreciate the clarification. I am unaware of anyone attempting that
procedure in the past, but I'm not terribly surprised to hear it would
encounter problems and/or fail. Given the myriad of configuration
options in the code base, it would seem almost miraculous that you could
either (a) hit the same config options used by Apple (whatever they
were), or (b) manage to find a combination that matched enough to let
you do this without problem.

Frankly, I'm surprised even this small a fix would let you work around
the problems... ;-)

Unless you have some overriding reason to use the shipped binaries for
everything other than this special component, you're probably going to
have a lot more success just rebuilding from source.

But that's just an opinion - either way, good luck with your efforts!
Ralph


On 1/24/08 10:54 AM, "Dean Dauger, Ph. D." wrote:

> I'm sorry, but now I am totally confused. Are you saying that you are
> having problems with the default rsh component in the distributed
> 1.2.3 code??

Yes ...

> Or are you having a problem with your customized version?

and yes.  Each exhibited the same problem, a bus error.

> What compiler are you using? If it's your customized version, did you
> make sure to change the names of the data structures and modules as I
> pointed out?

gcc 4.0.1, the default of Leopard.  Yes, in the customized version, I
did change the names of the data structures, subroutines, support
file names, and where it says "rsh" just like you said.

> We regularly work on Macs, both PPC and Intel based (I develop and
> test on both every day), and I have -never- seen this problem in our
> code base. Hence my confusion.


I'm sorry to confuse.  I'm starting with the shipping Mac OS X 10.5.1
"Leopard", which contains its own build of Open MPI (v1.2.3 according
to "orterun -version").  So I assumed that the v1.2.3 branch from
svn.open-mpi.org was the same code Apple used to build the Open MPI
that ships in Leopard.

My motivation was to build a new pls module based on pls_rsh module's
source code, substituting the rsh with my own name like you said, but
I encountered a bus error.  So to be sure I didn't screw up somewhere
in my custom module I rebuilt the unmodified pls_rsh module and
discovered the same problem.

Then, after downloading the Open MPI from opensource.apple.com
(suspecting it was different), I tried recompiling the pls_rsh module
from that source code, dropped in just the resulting mca_pls_rsh.la
and mca_pls_rsh.so into the existing /usr/lib/openmpi of Leopard,
overwriting Leopard's versions, and the bus error happened the same
as before.

That's where I was with my first post to this list.

My last post regards the discovery that rearranging the elements of
orte_pls_rsh_component_t, without changing anything else about the
pls_rsh code, affects the bus error outcome.  Then I padded out
orte_pls_rsh_component_t and my "orte_pls_dean_component_t" by hand
so that it would be "data alignment agnostic", if you will.
Consequently the bus error no longer occurs and both pls modules now
run as they should.

My hypothesis: Apple's procedure to build Open MPI into Leopard had a
side effect requiring shared object code structures to follow a data
alignment different than if I simply recompile Open MPI straight from
its source.
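
To make the alignment hypothesis concrete, here is a minimal sketch of how member order changes a struct's layout. The structs are hypothetical and are not the real orte_pls_rsh_component_t. The point is only that two builds which disagree on layout (through member order, compiler, target, or flags) will read the same bytes at different offsets, which can end in a bus error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: the same three members in a different order. */
struct layout_a {
    char   flag;     /* the compiler may insert padding after this member */
    double timeout;
    char   mode;
};

struct layout_b {
    double timeout;  /* widest member first: no padding before it */
    char   flag;
    char   mode;
};

/* Offsets of `timeout` as seen by this particular build. */
size_t timeout_offset_a(void) { return offsetof(struct layout_a, timeout); }
size_t timeout_offset_b(void) { return offsetof(struct layout_b, timeout); }

/* A component that must interoperate with prebuilt objects could check
 * its layout assumptions once at start-up, as sketched here. */
void check_layout(void)
{
    assert(timeout_offset_b() == 0);
    assert(sizeof(struct layout_a) >= sizeof(struct layout_b));
}
```

Printing `timeout_offset_a()` and `sizeof(struct layout_a)` from code built with each set of flags is one way to compare two builds; the exact numbers depend on the compiler and target, so none are quoted here.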

I'm not saying anyone is to blame, but I'm recognizing that those
builds have different timelines.  I predict that if I overwrite all
of Leopard's Open MPI object code, then it would all run too.

For my needs, I have a sufficient workaround: realign my data
structures to be "agnostic".  I'm sharing this littl

[OMPI devel] Configure error/warning in nightly tarball

2008-01-28 Thread Josh Hursey
I noticed that when running configure on the nightly snapshot tarball  
the following errors (warnings really, since it didn't stop  
configure) were produced. These seem to be remnants from the  
autogen.sh script pointing to files that do not (and should not)  
exist in the distribution.


-
shell$ ./configure --prefix=/foo/bar/
...
grep: ./orte/mca/gpr/proxy/configure.params: No such file or directory
grep: ./orte/mca/gpr/replica/configure.params: No such file or directory
grep: ./orte/mca/gpr/null/configure.params: No such file or directory
-

Any thoughts on how to fix this? I was using the r17175 nightly tarball.

Cheers,
Josh