[OMPI devel] 3rd party code contributions

2008-02-08 Thread Ralph Castain
I thought maybe we should move this to another thread as it really isn't
about Torsten's specific RFC.

I just took a quick gander at the code base to see how extensive this
problem might really be per Terry's concern. What I found was that we have
added 3rd party code in several places. How we want to define them in terms
of this issue is probably something for discussion.

Packages I could readily identify include:

1. event library
2. ROMIO
3. VT
4. backtrace
5. PLPA - this one is a little less obvious, but still being released as a
separate package
6. libNBC

There may well be others - these are only the ones I know about. By 3rd
party package, I mean these are blocks of code obtained as a complete,
distinct version and "dropped in" to the OMPI code repository, and then to
some degree tied into our build system. They are not code specifically
developed for OMPI by OMPI developers.

We have already discussed the issues with this approach. I am particularly
concerned with the maintenance and release cycle issues right now.

If these packages could be linked to our code instead of embedded within it,
then it seems to me that updating them could become much easier. For
example, we could download and install the latest ROMIO + Panasas patch,
compile it, and simply link it into libompi - without occupying someone with
constantly fixing the build system issues, etc.

Obviously, I don't claim to know enough about what was done to integrate
ROMIO to know if this would easily work. I only use it to illustrate the
point - the same could be said about the event library, for example.

Given our maintenance support problems, it would seem to me that changing
the way we do 3rd party packaging may be worth consideration and some
effort. I can't prioritize that relative to 1.3, though I do note that, from
LANL's perspective, the ROMIO issue is a definite blocker for 1.3 release.

Ralph


> Subject: Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to
> trunk
> From: Terry Dontje (Terry.Dontje_at_[hidden])
> Date: 2008-02-07 13:18:36
> 
> Jeff, the below sounds good if we really believe there is going to be a
> whole bunch of addons. I am not sure NBC really counts as an addon rather
> than as research work that might become an official API. So I look at
> the NBC stuff more like a BTL or PM that is in the process of being
> developed/refined for prime time. So would a new PM or BTL be added via
> ompi_contrib? I wouldn't think they would.
> 
> The ompi_contrib sounds like a nice utility, but I have a feeling there are
> bigger fish to fry unless we really believe there will be a lot of
> addons that we will need to support.
> 
> --td
> 
> Jeff Squyres wrote:
>> All these comments are good. I confess that although I should have, I
>> really did not previously consider the complexity of adding in N
>> contrib packages to OMPI.
>> 
>> The goal of the contrib packages is to easily allow additional
>> functionality that is nicely integrated with Open MPI. An obvious way
>> to do this is to include the code in the Open MPI tarball, but that
>> leads to the logistics and other issues that have been identified.
>> 
>> Ralph proposes a good way around this. But what about going farther
>> than that: what if we offer a standardized set of hooks for
>> including contrib functionality *after* core OMPI has been installed?
>> Yes, it's one more step after OMPI has been installed -- but if we can
>> keep it as *one* step, perhaps the user onus is not that bad. Let me
>> explain.
>> 
>> Consider a new standalone executable: ompi_contrib. You would run
>> ompi_contrib to install and uninstall contrib functionality into your
>> existing OMPI:
>> 
>> ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
>> or ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>> 
>> This will download NBC (if http), build it, and install it into the
>> current OMPI. It is likely that the nbc-ompi-contrib.tar.gz file will
>> contain the real NBC tarball (or maybe just a reference to it?) plus a
>> small number of hook/glue scripts for OMPI integration (perhaps quite
>> similar to what is in the contrib/ tree [on the branch] today for
>> NBC?). Likewise, after NBC is installed into the local OMPI
>> installation, ompi_info should be able to show "nbc" as installed
>> contrib functionality. It then follows that we might be able to do:
>> 
>> ompi_contrib --uninstall nbc
>> 
>> to uninstall contrib NBC from the local OMPI installation.
>> 
>> This kind of approach would seem to have several benefits:
>> 
>> - Keep a clear[er] distinction between core OMPI and contributed
>> packages.
>> 
>> - Allow simple integration of MPI libraries, tools, and even
>> applications (!) (think: numerical libraries, boost C++ libraries,
>> etc. -- how many of your users install additional tools on top of MPI
>> incorrectly?). Anything
>> 
>> - Allow 3rd parties to have "contrib" code to Open MPI without needing
>> to get into o

Re: [OMPI devel] ROMIO

2008-02-08 Thread Jeff Squyres
I know that Argonne was engaged at some level to help with the OMPI  
ROMIO integration -- was it on a formal or informal basis?



On Feb 7, 2008, at 12:02 PM, Ralph H Castain wrote:


> I just -know- this is everyone's favorite subject, but...
>
> Brian used to take care of the ROMIO code in Open MPI, but he has now moved
> on to greener, happier pastures. As he left, he did raise the question of
> who was going to maintain ROMIO, which we all happily dodged.
>
> I raise this question again because I have been informed that a new ROMIO
> patch may have come out that is required for Panasas support. I don't know
> enough myself to verify this situation, but it did raise the flag about who
> is going to track and support this area of our code, especially since we now
> may have to do something in that area for 1.3.
>
> Any great thoughts?
> Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 3rd party code contributions

2008-02-08 Thread Jeff Squyres

On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote:

> I thought maybe we should move this to another thread as it really isn't
> about Torsten's specific RFC.
>
> I just took a quick gander at the code base to see how extensive this
> problem might really be per Terry's concern. What I found was that we have
> added 3rd party code in several places. How we want to define them in terms
> of this issue is probably something for discussion.
>
> Packages I could readily identify include:
>
> 1. event library
> 4. backtrace
> 5. PLPA - this one is a little less obvious, but still being released as a
> separate package


FWIW, these packages are part of "core" OMPI and are not especially  
problematic.  We upgrade them when we have a need or desire to (which  
has been low frequency); we don't try to stay in sync with their  
release schedules at all.



> 2. ROMIO


ROMIO has traditionally been a problem (keeping up with its releases  
and patches).  We have long-since agreed that we definitely want to  
include ROMIO in our tarball, even though that presents challenges.   
One thing that makes it *slightly* easier is that Brian added the  
mechanics for OMPI to use a ROMIO that is outside of Open MPI rather  
than the one that is bundled with it.  It's not a perfect solution,  
but it does help some.



> 3. VT
> 6. libNBC


These two are definitely in the "contrib" category.

> There may well be others - these are only the ones I know about. By 3rd
> party package, I mean these are blocks of code obtained as a complete,
> distinct version and "dropped in" to the OMPI code repository, and then to
> some degree tied into our build system. They are not code specifically
> developed for OMPI by OMPI developers.


Those are all that I'm aware of.

> We have already discussed the issues with this approach. I am particularly
> concerned with the maintenance and release cycle issues right now.
>
> If these packages could be linked to our code instead of embedded within it,
> then it seems to me that updating them could become much easier. For
> example, we could download and install the latest ROMIO + Panasas patch,
> compile it, and simply link it into libompi - without occupying someone with
> constantly fixing the build system issues, etc.


FWIW:

- event,backtrace,PLPA,ROMIO are included in OMPI because we wanted to  
certify them as part of "core" OMPI.  That is, we wanted to certify  
the whole system (vs. relying on [untested] combinations of versions  
that already exist on users' systems).


- ROMIO is likely the only one of that group that presents ongoing  
logistics problems.  The mechanism Brian added was seen as a  
workaround.  Argonne will definitely need to be involved at some level  
to improve the ROMIO integration.  Some talks started between Brian,  
me, and Rob(ANL) about a) making our integration better/easier, and b)  
having access to the ROMIO SVN to be able to suck down releases when  
we want to, but they kinda tapered off (Brian left and I got other  
priorities).  There was also talk of LANL maintaining its own ROMIO  
tree and pushing it into OMPI, but I don't know what happened there.   
I can help with part of the ROMIO make-the-integration-easier (not in  
the immediate future, though -- probably not for a few weeks), but I  
do not think that I can do it on an ongoing basis.  Note, too, that  
ROMIO is no longer distributed as a separate package -- it's only  
included in MPICH2.  So it's a little harder to just link against a  
ROMIO that is already installed on a system -- there won't be one that  
isn't already bundled with an MPI.


- vt and libnbc are a different category; they are add-on  
functionality, not "core" OMPI.


> Obviously, I don't claim to know enough about what was done to integrate
> ROMIO to know if this would easily work. I only use it to illustrate the
> point - the same could be said about the event library, for example.
>
> Given our maintenance support problems, it would seem to me that changing
> the way we do 3rd party packaging may be worth consideration and some
> effort. I can't prioritize that relative to 1.3, though I do note that, from
> LANL's perspective, the ROMIO issue is a definite blocker for 1.3 release.


Hmm.  This is odd because of the prior statements about ROMIO from  
LANL (that LANL was going to maintain ROMIO and push it into OMPI).   
I'm assuming that's changed?


If ROMIO is a v1.3 blocker for LANL, can LANL commit resources to  
fixing the problem?


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] 3rd party code contributions

2008-02-08 Thread Brian W. Barrett

On Fri, 8 Feb 2008, Ralph Castain wrote:


> 1. event library
> 2. ROMIO
> 3. VT
> 4. backtrace
> 5. PLPA - this one is a little less obvious, but still being released as a
> separate package
> 6. libNBC


Sorry to Ralph, but I clipped everything from his e-mail, then am going to 
make references to it.  oh well :).


One minor correction -- the entire backtrace framework is not a third 
party deal.  The *DARWIN/Mac OS X* component relies heavily on third party 
code, but the others (Linux and Solaris) are just wrappers around code in 
their respective C libraries.


I believe I was responsible for the event library, ROMIO, and backtrace 
before leaving LANL.  I'll go through the motivations and issues with all 
three in terms of integration.


Event Library: The event library is the core "rendezvous" point for all of 
Open MPI, so any issues with it cause lots of issues with Open MPI in 
general.  We've also hacked it considerably since taking the original 
libevent source -- we've renamed all the functions, we've made it thread 
safe in a way the author was unwilling to do, we've fixed some performance 
issues unique to our usage model.  In short, this is no longer really the 
same libevent that might already be installed on the system.  Using such
an unmodified libevent would be disastrous.


ROMIO is actually one that there was significant discussion about prior to
my leaving Los Alamos.  There are a number of problems / issues with
ROMIO.  First and foremost, without ROMIO, we are not a fully compliant
MPI implementation.  So we have to ship ROMIO -- it's the only way to have
that important check mark.  But its current integration has some issues --
it's hard to test patches independently.  There is actually a mode in the
current Open MPI tree where the MPI interface to MPI-I/O is not provided
by Open MPI and no io components are built.  This is to allow users to
build ROMIO independently of Open MPI, for testing updates or whatever.
There are some disadvantages to this.  First, the independent ROMIO will
use generalized requests instead of being hooked into our progress engine,
so there may be some progress issues (I never verified either way).
Second, it does mean dealing with another package to build on the user's
site.  Jeff is correct -- there was discussion about how to make the
integration "better" -- many of the changes were on our side, and we were
going to have to ask for a couple of changes from Argonne.  If someone is
going to put in the considerable amount of time to make this happen, I'm
happy to write up whatever notes I can remember / find on the issue.


The Darwin backtrace component is mostly maintenance free.  It doesn't
support 64-bit Intel chips, but that's fine.  Once every 18 months or so,
I need to get a new copy for the latest operating system, although the
truth is I don't think anything bad happens if we just stop doing the
updates at OS release (by the way, I did the one for Leopard, so we're 
probably all going to be sick of MPI and on to other things before the 
next time it has to be done).  While it's useful, if the community is 
really worried, it could probably be deleted.  But having a stack trace 
when you segfault sure is nice :).


Brian




[OMPI devel] Datasize confusion in MPI_Write can lead to data los!

2008-02-08 Thread Christoph Niethammer
Hello!

I tested Open MPI at HLRS for some time without detecting new problems in the
implementation, but now I have noticed some awful ones with MPI_Write which can
lead to data loss:

When creating a struct for a mixed datatype like

struct {
  short a;
  int b;
}

the C compiler introduces a gap of 2 bytes in the data representation for this
type, due to the 4-byte alignment of the int on 32-bit systems.

If I now try to use MPI_File_write to write this data to a file and use
MPI_SHORT_INT as the MPI datatype, this leads to data loss.

I traced the problem to the combined use of "write" and MPI_Type_size in
MPI_File_write.
MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct occupies 8 bytes
in memory because of the 2-byte gap. The write function in ad_write.c then
loses data because the gaps are not included in the calculation
of the total data size to be written to the file.

This problem also occurs in the other I/O functions.
As far as I could find out, the problem does not seem to be present with derived
data types.

The question is now how to "fix" it:
i) Either the MPI standard is not clear on this point and the data types
MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
structs of these types,
ii) or the implementation of the MPI_Type_size function has to be modified to
return the value of e.g. true_ub, which contains the correct value,
iii) or the MPI_File_write function must not use the write function in
the "contiguous" way on the data and should take care of the gaps.

Regards 

Christoph Niethammer




Re: [OMPI devel] [RFC] Non-blocking collectives (LibNBC) merge to trunk

2008-02-08 Thread Jeff Squyres
Terry -- I reluctantly agree.  :-)  What I envision is not difficult  
(a first cut/feature-lean version is probably only several hundred  
lines of perl?), but I don't have the cycles (at present) to implement  
it -- my priorities are elsewhere at the moment.


If anyone is interested in this, I would gladly talk them through what  
[I think] needs to be done.


That being said, for NBC, per Terry's points:

- if it's not compiled/installed by default
- if we can make a big enough red flag for users that it's an R&D  
effort that is subject to change (perhaps 3'x5'?)


Then I think it would not be a bad thing to include NBC.  But then I  
think we need to disallow any other contrib/ projects until someone  
can find the cycles to implement a better solution (such as an  
ompi_contrib executable/system).




On Feb 7, 2008, at 1:18 PM, Terry Dontje wrote:

> Jeff, the below sounds good if we really believe there is going to be a
> whole bunch of addons. I am not sure NBC really counts as an addon rather
> than as research work that might become an official API. So I look at
> the NBC stuff more like a BTL or PM that is in the process of being
> developed/refined for prime time. So would a new PM or BTL be added via
> ompi_contrib? I wouldn't think they would.
>
> The ompi_contrib sounds like a nice utility, but I have a feeling there are
> bigger fish to fry unless we really believe there will be a lot of
> addons that we will need to support.
>
> --td
>
> Jeff Squyres wrote:
>> All these comments are good.  I confess that although I should have, I
>> really did not previously consider the complexity of adding in N
>> contrib packages to OMPI.
>>
>> The goal of the contrib packages is to easily allow additional
>> functionality that is nicely integrated with Open MPI.  An obvious way
>> to do this is to include the code in the Open MPI tarball, but that
>> leads to the logistics and other issues that have been identified.
>>
>> Ralph proposes a good way around this.  But what about going farther
>> than that: what if we offer a standardized set of hooks for
>> including contrib functionality *after* core OMPI has been installed?
>> Yes, it's one more step after OMPI has been installed -- but if we can
>> keep it as *one* step, perhaps the user onus is not that bad.  Let me
>> explain.
>>
>> Consider a new standalone executable: ompi_contrib.  You would run
>> ompi_contrib to install and uninstall contrib functionality into your
>> existing OMPI:
>>
>> ompi_contrib --install http://www.example.com/nbc/nbc-ompi-contrib.tar.gz
>> or  ompi_contrib --install file:///home/htor/nbc-ompi-contrib.tar.gz
>>
>> This will download NBC (if http), build it, and install it into the
>> current OMPI.  It is likely that the nbc-ompi-contrib.tar.gz file will
>> contain the real NBC tarball (or maybe just a reference to it?) plus a
>> small number of hook/glue scripts for OMPI integration (perhaps quite
>> similar to what is in the contrib/ tree [on the branch] today for
>> NBC?).  Likewise, after NBC is installed into the local OMPI
>> installation, ompi_info should be able to show "nbc" as installed
>> contrib functionality.  It then follows that we might be able to do:
>>
>> ompi_contrib --uninstall nbc
>>
>> to uninstall contrib NBC from the local OMPI installation.
>>
>> This kind of approach would seem to have several benefits:
>>
>> - Keep a clear[er] distinction between core OMPI and contributed
>> packages.
>>
>> - Allow simple integration of MPI libraries, tools, and even
>> applications (!) (think: numerical libraries, boost C++ libraries,
>> etc. -- how many of your users install additional tools on top of MPI
>> incorrectly?).  Anything
>>
>> - Allow 3rd parties to have "contrib" code to Open MPI without needing
>> to get into our code tree (and sign the 3rd party agreements, etc.),
>> keeping our distribution size down, avoiding release schedule
>> logistical issues, keeping our "core" build time down, etc.
>>
>> - Allow integration of contrib functionality at both a per-user and
>> system-wide basis.
>>
>> What I'm really proposing here is that OMPI becomes a system that can
>> have additional functionality installed / uninstalled.  Based on the
>> infrastructure that we already have, this is not as much of a stretch
>> as one would think.
>>
>> Comments?
>>
>> ("who's going to write this" is a question that will also have to be
>> answered, but perhaps we can discuss the code concept/idea first...)
>>
>> On Feb 7, 2008, at 10:11 AM, Ralph H Castain wrote:
>>
>>> I believe Brian and Terry raise good points. May I offer a possible
>>> alternative? What if we only include in Open MPI an include file that
>>> contains the "hooks" to libNBC, and have the build system only "see" those
>>> if someone specifies --with-NBC (or whatever option name you like). If you
>>> like, you can make the inclusion automatic if libNBC is detected on the
>>> system. It would make sense to also add -libNBC to the mpicc et al wrappers
>>> as well when the build system includes the function definitions.
>>>
>>> This would allow those users that want (or can) to use that library link
>>> against it, without adding a bunch of so

Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data los!

2008-02-08 Thread George Bosilca
MPI_Type_size is supposed to return only the size of useful data,
which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
happens is that the MPI_SHORT_INT type is a predefined one, but it's a
really strange predefined type. It's one of the few that are not
contiguous. The problem seems to come from the fact that
MPI_File_write does a contiguous write for the predefined data types,
making the assumption that they are all contiguous.

I tracked the problem down in the romio/adio/common/iscontig.c file.
For Open MPI the last #else branch is used. The first case in the
switch checks for MPI_COMBINER_NAMED (which is what an MPI is
supposed to return for predefined data types) and sets the flag to 1
(which means contiguous). This is obviously wrong for MPI_SHORT_INT.
It really looks like a ROMIO problem, so I guess this email should be
redirected to their mailing list.


  Thanks,
george.

On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:


> Hello!
>
> I tested Open MPI at HLRS for some time without detecting new problems in the
> implementation, but now I have noticed some awful ones with MPI_Write which can
> lead to data loss:
>
> When creating a struct for a mixed datatype like
>
> struct {
>  short a;
>  int b;
> }
>
> the C compiler introduces a gap of 2 bytes in the data representation for this
> type, due to the 4-byte alignment of the int on 32-bit systems.
>
> If I now try to use MPI_File_write to write this data to a file and use
> MPI_SHORT_INT as the MPI datatype, this leads to data loss.
>
> I traced the problem to the combined use of "write" and MPI_Type_size in
> MPI_File_write.
> MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct occupies 8 bytes
> in memory because of the 2-byte gap. The write function in ad_write.c then
> loses data because the gaps are not included in the calculation
> of the total data size to be written to the file.
>
> This problem also occurs in the other I/O functions.
> As far as I could find out, the problem does not seem to be present with derived
> data types.
>
> The question is now how to "fix" it:
> i) Either the MPI standard is not clear on this point and the data types
> MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
> structs of these types,
> ii) or the implementation of the MPI_Type_size function has to be modified to
> return the value of e.g. true_ub, which contains the correct value,
> iii) or the MPI_File_write function must not use the write function in
> the "contiguous" way on the data and should take care of the gaps.
>
> Regards
>
> Christoph Niethammer






Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data los!

2008-02-08 Thread Rainer Keller
Hi George,
Good that you come to the same conclusion with regard to ROMIO using
MPI_Type_size internally...


So, taking this comment from iscontig.c ,-]
/* This function needs more work. It should check for contiguity
   in other cases as well.*/
we could either mail the ROMIO list or have a specialized version of
ADIOI_Datatype_iscontig for ompi ,-]

Either way, the mpi_test_suite in that regard is sane.


Thanks,
Rainer


On Friday 08 February 2008 18:22, George Bosilca wrote:
> MPI_Type_size is supposed to return only the size of useful data,
> which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
> happens is that the MPI_SHORT_INT type is a predefined one, but it's a
> really strange predefined type. It's one of the few that are not
> contiguous. The problem seems to come from the fact that
> MPI_File_write does a contiguous write for the predefined data types,
> making the assumption that they are all contiguous.
>
> I tracked the problem down in the romio/adio/common/iscontig.c file.
> For Open MPI the last #else branch is used. The first case in the
> switch checks for MPI_COMBINER_NAMED (which is what an MPI is
> supposed to return for predefined data types) and sets the flag to 1
> (which means contiguous). This is obviously wrong for MPI_SHORT_INT.
> It really looks like a ROMIO problem, so I guess this email should be
> redirected to their mailing list.
>
>Thanks,
>  george.
>
> On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:
> > Hello!
> >
> > I tested Open MPI at HLRS for some time without detecting new problems in the
> > implementation, but now I have noticed some awful ones with MPI_Write which can
> > lead to data loss:
> >
> > When creating a struct for a mixed datatype like
> >
> > struct {
> >  short a;
> >  int b;
> > }
> >
> > the C compiler introduces a gap of 2 bytes in the data representation for this
> > type, due to the 4-byte alignment of the int on 32-bit systems.
> >
> > If I now try to use MPI_File_write to write this data to a file and use
> > MPI_SHORT_INT as the MPI datatype, this leads to data loss.
> >
> > I traced the problem to the combined use of "write" and MPI_Type_size in
> > MPI_File_write.
> > MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct occupies 8 bytes
> > in memory because of the 2-byte gap. The write function in ad_write.c then
> > loses data because the gaps are not included in the calculation
> > of the total data size to be written to the file.
> >
> > This problem also occurs in the other I/O functions.
> > As far as I could find out, the problem does not seem to be present with derived
> > data types.
> >
> > The question is now how to "fix" it:
> > i) Either the MPI standard is not clear on this point and the data types
> > MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use with
> > structs of these types,
> > ii) or the implementation of the MPI_Type_size function has to be modified to
> > return the value of e.g. true_ub, which contains the correct value,
> > iii) or the MPI_File_write function must not use the write function in
> > the "contiguous" way on the data and should take care of the gaps.
> >
> > Regards
> >
> > Christoph Niethammer

-- 

Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
 HLRS                      Tel: ++49 (0)711-685 6 5858
 Nobelstrasse 19           Fax: ++49 (0)711-685 6 5832
 70550 Stuttgart           email: kel...@hlrs.de
 Germany                   AIM/Skype: rusraink


Re: [OMPI devel] 3rd party code contributions

2008-02-08 Thread Ralph H Castain
I'm going to "re-integrate" Jeff and Brian's comments into one response.

I have no problem with either of their observations. I only included the
event library, backtrace, and PLPA in my list for completeness. I expected
we would continue to treat those as we are, recognizing that this means
-someone- is going to have to step up to support those when we need to
update them. In the event library case, I know people have talked about a
major change coming soon - a release that has significant improvements we may
care about. Not sure when that might happen, or who is going to do that
integration.

As to ROMIO: as with many of the community's "planned" contributions, they
have tended to fade with time and personnel turnover. At this time, there is
no way LANL could support a ROMIO integration without a significant delay to
the proposed 1.3 release schedule. Not that such a delay particularly
bothers me - I don't see a pressing need to just throw something out there,
and I have been beaten severely around the neck-and-shoulders the last two
days about how out of date our ROMIO version is, and that it lacks a
critical Panasas patch that is severely impacting performance.

I'll continue to talk to people here about possibly getting help with ROMIO.
I don't know the prospects, but it will take some time for someone to become
familiar enough with our code base/build system to make a real contribution.
Alternatively, -I- may have to take this on, which will definitely delay the
1.3 RTE work, effectively just transferring the "blocker" from one part of
the code to another. ;-)

But we can deal with that on a separate thread. For now, I think Jeff's last
response to the other thread is where we are converging: delay work on a 3rd
party contribution system until we have more cycles, but don't bring more
3rd party code (post-libNBC) in until we have a better mechanism.

Ralph


On 2/8/08 9:06 AM, "Jeff Squyres"  wrote:

> On Feb 8, 2008, at 10:38 AM, Ralph Castain wrote:
> 
>> I thought maybe we should move this to another thread as it really
>> isn't
>> about Torsten's specific RFC.
>> 
>> I just took a quick gander at the code base to see how extensive this
>> problem might really be per Terry's concern. What I found was that
>> we have
>> added 3rd party code in several places. How we want to define them
>> in terms
>> of this issue is probably something for discussion.
>> 
>> Packages I could readily identify include:
>> 
>> 1. event library
>> 4. backtrace
>> 5. PLPA - this one is a little less obvious, but still being
>> released as a
>> separate package
> 
> FWIW, these packages are part of "core" OMPI and are not especially
> problematic.  We upgrade them when we have a need or desire to (which
> has been low frequency); we don't try to stay in sync with their
> release schedules at all.
> 
>> 2. ROMIO
> 
> ROMIO has traditionally been a problem (keeping up with its releases
> and patches).  We have long-since agreed that we definitely want to
> include ROMIO in our tarball, even though that presents challenges.
> One thing that makes it *slightly* easier is that Brian added the
> mechanics for OMPI to use a ROMIO that is outside of Open MPI rather
> than the one that is bundled with it.  It's not a perfect solution,
> but it does help some.
> 
>> 3. VT
>> 6. libNBC
> 
> These two are definitely in the "contrib" category.
> 
>> There may well be others - these are only the ones I know about. By
>> 3rd
>> party package, I mean these are blocks of code obtained as a complete,
>> distinct version and "dropped in" to the OMPI code repository, and
>> then to
>> some degree tied into our build system. They are not code specifically
>> developed for OMPI by OMPI developers.
> 
> Those are all that I'm aware of.
> 
>> We have already discussed the issues with this approach. I am
>> particularly
>> concerned with the maintenance and release cycle issues right now.
>> 
>> If these packages could be linked to our code instead of embedded
>> within it,
>> then it seems to me that updating them could become much easier. For
>> example, we could download and install the latest ROMIO + Panasas
>> patch,
>> compile it, and simply link it into libompi - without occupying
>> someone with
>> constantly fixing the build system issues, etc.
> 
> FWIW:
> 
> - event, backtrace, PLPA, and ROMIO are included in OMPI because we
> wanted to certify them as part of "core" OMPI.  That is, we wanted to
> certify the whole system (vs. relying on [untested] combinations of
> versions that already exist on users' systems).
> 
> - ROMIO is likely the only one of that group that presents ongoing
> logistics problems.  The mechanism Brian added was seen as a
> workaround.  Argonne will definitely need to be involved at some level
> to improve the ROMIO integration.  Some talks started between Brian,
> me, and Rob(ANL) about a) making our integration better/easier, and b)
> having access to the ROMIO SVN to be able to suck down releases when
> we want to, 

[OMPI devel] PML V will be enabled again

2008-02-08 Thread Aurélien Bouteiller

Hi everyone,

All the problems detected the last time PML V was enabled in trunk
have been fixed.  We invite you to give it a try (add a .ompi_unignore
in ompi/mca/pml/v) with your favorite platform and compilation options
and report any issues you may encounter. If none are detected, we plan
to remove the ignore tag on Wed. Feb. 6.


Thanks,
Aurelien


--
Dr. Aurélien Bouteiller
Sr. Research Associate - Innovative Computing Laboratory
Suite 350, 1122 Volunteer Boulevard
Knoxville, TN 37996
865 974 6321







Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data los!

2008-02-08 Thread George Bosilca
Here is a sketch of a ROMIO patch for Open MPI. I just wrote it and
did not have time to test it. If you can test it, please let me know
whether it solves the problem.


  Thanks,
george.

Index: iscontig.c
===
--- iscontig.c  (revision 17399)
+++ iscontig.c  (working copy)
@@ -58,6 +58,20 @@
 *flag = MPI_SGI_type_is_contig(datatype) && (displacement == 0);
 }

+#elif defined(OMPI_MPI_H)
+
+#include "ompi/datatype/datatype.h"
+
+void ADIOI_Datatype_iscontig(MPI_Datatype datatype, int *flag)
+{
+/*
+ * The Open MPI contiguity check returns true for datatypes with
+ * gaps at the beginning and at the end. We have to provide
+ * a count of 2 in order to get these gaps taken into account.
+ */
+*flag = ompi_ddt_is_contiguous_memory_layout( datatype, 2);
+}
+
 #else


On Feb 8, 2008, at 12:26 PM, Rainer Keller wrote:


Hi George,
Good, if you come to the same conclusion regarding ROMIO's internal
use of MPI_Type_size...


So taking iscontig.c ,-]
   /* This function needs more work. It should check for contiguity
  in other cases as well.*/
and mail to the romio list or have a specialized version of
ADIOI_Datatype_iscontig for ompi ,-]

Either way, the mpi_test_suite in that regard is sane.


Thanks,
Rainer


On Friday 08 February 2008 18:22, George Bosilca wrote:

MPI_Type_size is supposed to return only the size of useful data,
which apparently it does (MPI_SHORT_INT is 6 bytes). What I think
happens is that the MPI_SHORT_INT type is a predefined one, but it's a
really strange predefined type. It's one of the few that are not
contiguous. The problem seems to come from the fact that
MPI_File_write does a contiguous write for the predefined data types,
making the assumption that they are all contiguous.

I tracked the problem down to the romio/adio/common/iscontig.c file.
For Open MPI the last #else branch is used. The first case in the
switch checks for MPI_COMBINER_NAMED (which is what an MPI is
supposed to return for predefined data types) and sets the flag to 1
(which means contiguous). This is obviously wrong for MPI_SHORT_INT.
It really looks like a ROMIO problem, so I guess this email should be
redirected to their mailing list.

  Thanks,
george.

On Feb 8, 2008, at 12:50 PM, Christoph Niethammer wrote:

Hello!

I have tested Open MPI at HLRS for some time without detecting new
problems in the implementation, but I have now found a serious problem
with MPI_File_write which can lead to data loss:

When creating a struct for a mixed datatype like

struct {
    short a;
    int b;
}

the C compiler introduces a gap of 2 bytes in the data representation
for this type due to the 4-byte alignment of the integer on 32-bit
systems.

If I now try to use MPI_File_write to write this data to a file with
MPI_SHORT_INT as the MPI datatype, data is lost.

I traced the problem to the combined use of "write" and MPI_Type_size
in MPI_File_write.
MPI_Type_size(MPI_SHORT_INT) returns 6 bytes, whereas the struct uses
8 bytes in memory because of the 2-byte gap. The write function in
ad_write.c then loses data because the gaps are not included in the
calculation of the total data size to be written to the file.
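
As a minimal sketch of the size mismatch described above (plain C, no
MPI calls; the helper names payload_bytes, memory_bytes and
padding_bytes are made up for illustration), the following compares the
bytes of actual data in the struct with the bytes it occupies in
memory. The exact numbers depend on the compiler and ABI; with a 2-byte
short and a 4-byte aligned int they would be 6 and 8.

```c
#include <assert.h>
#include <stddef.h>

/* Same member layout as the struct in the report. */
struct short_int {
    short a;
    int b;
};

/* Bytes of useful data, i.e. what the thread says MPI_Type_size reports. */
static size_t payload_bytes(void)
{
    return sizeof(short) + sizeof(int);
}

/* Bytes one element occupies in memory, compiler padding included. */
static size_t memory_bytes(void)
{
    return sizeof(struct short_int);
}

/* Padding the compiler inserted; a write sized from the payload misses it. */
static size_t padding_bytes(void)
{
    assert(memory_bytes() >= payload_bytes());
    return memory_bytes() - payload_bytes();
}
```

Printing these three values (plus offsetof(struct short_int, b)) from a
small main() shows where the gap sits on a given platform. A buffer of
n such structs spans n * memory_bytes() bytes, so a contiguous write of
n * payload_bytes() bytes stops short of the end of the buffer.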

This problem also occurs in the other I/O functions.
As far as I could find out, the problem does not seem to be present
with derived data types.

The question now is how to fix it:
i) Either the MPI standard is not clear on this point, and the data
types MPI_SHORT_INT, MPI_DOUBLE_INT, ... should be forbidden for use
with structs of these types,
ii) or the implementation of the MPI_Type_size function has to be
modified to return the value of e.g. true_ub, which contains the
correct value,
iii) or the MPI_File_write function must not use the write function in
the "contiguous" way on the data and should take care of the gaps.

Regards

Christoph Niethammer
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--

Dipl.-Inf. Rainer Keller   http://www.hlrs.de/people/keller
HLRS                       Tel: ++49 (0)711-685 6 5858
Nobelstrasse 19            Fax: ++49 (0)711-685 6 5832
70550 Stuttgart            email: kel...@hlrs.de
Germany                    AIM/Skype: rusraink






Re: [OMPI devel] Datasize confusion in MPI_Write can lead to data los!

2008-02-08 Thread George Bosilca
The patch I sent a few minutes ago only removes the problem for Open
MPI. However, ROMIO's generic test for contiguous data types is still
broken. Only checking for COMBINER_NAMED is clearly not enough. A
second test checking that the size and the extent of the data type
are equal would make the check a lot more accurate.
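
A rough sketch of that two-part check, written without MPI so it can be
compiled anywhere. The struct type_info and both function names are
hypothetical stand-ins, not ROMIO or Open MPI code: in a real patch the
fields would presumably be filled from MPI_Type_get_envelope (combiner),
MPI_Type_size and MPI_Type_get_extent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical description of a datatype as an MPI library would report it:
 * is_named - nonzero if the combiner is MPI_COMBINER_NAMED (predefined type)
 * size     - bytes of useful data (as from MPI_Type_size)
 * extent   - span in memory including gaps (as from MPI_Type_get_extent)
 */
struct type_info {
    int is_named;
    size_t size;
    size_t extent;
};

/* The check described in the thread: every named type counts as contiguous. */
static int contig_named_only(const struct type_info *t)
{
    assert(t != NULL);
    return t->is_named != 0;
}

/* Stricter variant: a named type is contiguous only if size equals extent. */
static int contig_named_and_dense(const struct type_info *t)
{
    assert(t != NULL);
    return t->is_named != 0 && t->size == t->extent;
}
```

With the numbers from this thread, MPI_SHORT_INT would be described as
{1, 6, 8}: the first check accepts it as contiguous, the second rejects
it, while a type such as {1, 4, 4} passes both.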


  Thanks,
george.



[OMPI devel] request help debugging openib btl problem

2008-02-08 Thread Ralph Campbell
I'm using openmpi 1.2.5 with a QLogic HCA and using the
openib btl (not PSM).  osu_latency and osu_bw work OK but
when I run osu_bibw with a message size of 2MB (1<<21),
it hangs in btl_openib_component_progress() waiting for something.

I tried adding printfs at each point where ibv_post_send(),
ibv_post_recv(), and ibv_poll_cq() are called and then ran
a Python script which verified that all sends and recvs got a
good completion notice in the posted order
(mca_btl_openib_component.use_srq is zero for this test).
Note that only RC SEND (12252 byte) messages are being sent
at this point.

I can send the trace of ibv_* calls if it will help.

Any suggestions on what to look for are welcome.