Re: [OMPI users] Regression: Fortran derived types with newer MPI module

2014-01-07 Thread Jeff Squyres (jsquyres)
Yes, I can explain what's going on here.  The short version is that a change 
was made with the intent to provide maximum Fortran code safety, but with a 
possible backwards compatibility issue.  If this change is causing real 
problems, we can probably change this, but I'd like a little feedback from the 
Fortran MPI dev community first.

It's a complex issue, and requires a little background and discussion, sorry...

a) Back in the 1.6.x series, we allowed users to build multiple variants of the 
"use mpi" Fortran interface:

- tiny: only the MPI_SIZEOF subroutine
- small: tiny + all MPI subroutines that do not take choice buffers, and the 
MPI functions (WTICK, WTIME)
- medium: small + all MPI subroutines that take 1 choice buffer (e.g., MPI_SEND)
- large: all MPI subroutines (even those that take 2 choice buffers, such as 
collectives)
  --> Note: the "large" size never really worked for uninteresting reasons.  It 
won't be fixed.

See the OMPI 1.6.x README for more details.

The default is "small" in the 1.6.x series.  This means that when you call 
MPI_SEND (and any other function that takes a choice buffer), you are not 
getting an MPI-implementation-provided prototype for that function -- it's 
essentially the same as how everyone has implemented mpif.h (i.e., no 
prototypes).

This is why you are able to compile your code in OMPI 1.6.x with "use mpi" -- 
because there is no prototype for MPI_SEND in the mpi module.  Heck, you could 
even:

  ! Don't pass enough params to MPI_SEND
  call MPI_Send(bogus)

and it would compile and link.  It will likely segv at run time, but that's a 
different issue.

b) I *believe* that MPICH does the equivalent of "tiny", but I'm not going to 
swear to that (meaning: you're not getting any prototypes for any MPI 
subroutines other than MPI_SIZEOF).

This is why you are able to compile your code with MPICH and "use mpi" -- same 
disclaimers as a) (i.e., you get no compile-time protection for when you don't 
call MPI subroutines properly).

c) The design of the MPI-2 "mpi" module has multiple flaws that are identified 
in the MPI-3 text (but were not recognized back in MPI-2.x days).  Here's one: 
until F2008+addendums, there was no Fortran equivalent of "void *".  Hence, the 
mpi module has to overload MPI_Send() and have a prototype *for every possible 
type and dimension*.

The OMPI "medium" implementation actually provides overloaded prototypes for 
all pre-defined Fortran datatypes (INTEGER, REAL, ...etc.), for scalars and, by 
default, array ranks up to 4.  Fortran standards before F2008 allow arrays of 
up to rank 7 (F2008 raised the limit to 15), but providing an overloaded 
interface for each scalar type and all array dimensions for each type explodes 
the number of overloaded prototypes in the mpi module; most compilers that we 
tested several years ago would segv with this many interfaces in a single 
module.

It gets worse with the MPI subroutines that take multiple choice buffers: you 
get an exponential explosion of interfaces.  IIRC, a fully-populated mpi 
"large" module would contain over 5 million interfaces.
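
To make the growth concrete, here is a back-of-the-envelope sketch.  The
counts in it (15 type/kind combinations, 8 ranks) are assumptions picked for
illustration, not figures taken from the OMPI source, and it does not try to
reproduce the 5 million estimate:

```fortran
program interface_count
  implicit none
  ! Assumed values for illustration only; not taken from the Open MPI source.
  integer, parameter :: n_type_kinds = 15  ! hypothetical number of type/kind pairs
  integer, parameter :: n_ranks = 8        ! scalar plus array ranks 1 through 7
  integer :: per_buffer

  ! Every choice buffer needs one variant per (type/kind, rank) combination.
  per_buffer = n_type_kinds * n_ranks

  print '(a,i0)', 'variants per choice buffer:           ', per_buffer
  print '(a,i0)', 'interfaces for a 1-buffer subroutine: ', per_buffer
  ! Two independent choice buffers multiply the variants together.
  print '(a,i0)', 'interfaces for a 2-buffer subroutine: ', per_buffer**2
end program interface_count
```

With these assumed inputs, a subroutine with one choice buffer needs 120
specific interfaces and a subroutine with two needs 14400, and that is per
subroutine.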

Craig Rasmussen and I wrote a paper about this for Euro PVM/MPI 2005 
(http://www.open-mpi.org/papers/euro-pvmmpi-2005-fortran/).  It was one of the 
issues that eventually led to the creation of the MPI-3 mpi_f08 module.

Here's another fatal flaw: it's not possible for an MPI implementation to 
provide MPI_Send() prototypes for user-defined Fortran datatypes.  Hence, the 
example you cite is a pipe dream for the "mpi" module because there's no way to 
specify a (void*)-like argument for the choice buffer.  

Meaning: Fortran MPI apps can either have compile-time safety or user-defined 
datatypes as choice buffers.  Pick one.
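
A minimal sketch of the limitation, using a hypothetical demo_send generic in
place of the real mpi module (the module, subroutine, and type names here are
made up for illustration and are not part of MPI or OMPI):

```fortran
! Hypothetical stand-in for an overloaded "mpi" module: demo_send is NOT an
! MPI routine; it only mimics how a generic interface enumerates buffer types.
module demo_overloads
  implicit none
  private
  public :: demo_send

  interface demo_send
    module procedure send_int_scalar
    module procedure send_real_rank1
  end interface demo_send

contains

  subroutine send_int_scalar(buf, count)
    integer, intent(in) :: buf
    integer, intent(in) :: count
    print '(a,i0)', 'integer scalar buffer, count = ', count
  end subroutine send_int_scalar

  subroutine send_real_rank1(buf, count)
    real, intent(in) :: buf(:)
    integer, intent(in) :: count
    print '(a,i0)', 'real rank-1 buffer, count = ', count
  end subroutine send_real_rank1

end module demo_overloads

program overload_demo
  use demo_overloads
  implicit none

  ! A user-defined type that the module author could not have known about.
  type :: particle
    real :: x, y, z
  end type particle

  integer :: i
  real :: r(3)
  type(particle) :: p

  i = 1
  r = 0.0
  p = particle(1.0, 2.0, 3.0)

  call demo_send(i, 1)   ! resolves to send_int_scalar
  call demo_send(r, 3)   ! resolves to send_real_rank1

  ! No specific procedure accepts type(particle), so uncommenting the next
  ! line makes the compiler reject the program:
  ! call demo_send(p, 1)
end program overload_demo
```

The module can only enumerate types that exist when it is compiled, so a
derived type defined later in user code never matches any specific procedure.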

d) A solution to the problems listed in c) is to use non-standard, 
compiler-specific "ignore TKR" functionality in the mpi module implementation, 
which effectively provides (void*) functionality.  Hence, an implementation can 
have a *single* MPI_SEND subroutine interface, and use a pragma to ignore the 
type, kind, and rank of the choice buffer parameter.
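
As a rough sketch of the idea (not OMPI's actual module source): the
hypothetical demo_send below uses gfortran's NO_ARG_CHECK directive, which
needs gfortran 4.9 or later.  Other compilers spell their ignore-TKR
directives differently, and a compiler that does not recognize the directive
treats it as a comment and may reject the scalar calls.  The subroutine
deliberately never touches buf:

```fortran
! Hypothetical example; demo_send is not an MPI routine.
module demo_ignore_tkr
  implicit none
contains

  subroutine demo_send(buf, count)
    ! gfortran-specific directive: skip type, kind, and rank checking for buf.
    !GCC$ ATTRIBUTES NO_ARG_CHECK :: buf
    type(*), dimension(*), intent(in) :: buf
    integer, intent(in) :: count
    ! A real implementation would hand buf to C code; here we only report.
    print '(a,i0)', 'demo_send called, count = ', count
  end subroutine demo_send

end module demo_ignore_tkr

program ignore_tkr_demo
  use demo_ignore_tkr
  implicit none

  type :: particle
    real :: x, y, z
  end type particle

  integer :: i
  real :: a(2, 2)
  type(particle) :: p

  i = 1
  a = 0.0
  p = particle(1.0, 2.0, 3.0)

  ! One interface accepts all three buffers.
  call demo_send(i, 1)   ! integer scalar
  call demo_send(a, 4)   ! real rank-2 array
  call demo_send(p, 1)   ! user-defined derived type
end program ignore_tkr_demo
```

The trade-off is that the compiler still checks the number of arguments and
the types of the non-buffer arguments, but no longer checks the buffer at all.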

OMPI 1.7 and beyond actually has 2 different implementations of the mpi module:

- the old tiny/small/medium-based interface for compilers that do not support 
"ignore TKR" pragmas (i.e., gfortran)
- a new ignore-TKR-based module that prototypes all MPI subroutines and 
functions

Meaning: OMPI 1.7 with non-gfortran works great (i.e., your sample code 
compiles).  OMPI 1.7 with gfortran is *mostly* the same as it was in 1.6, 
except that we changed the default from "small" to "medium".

*** This is what is causing your problem.  In OMPI 1.6, we didn't provide an 
interface for MPI_SEND by default.  In OMPI 1.7, we do.

Craig Rasmussen and I debated long and hard about whether to change the default 
from "small" to "medium" or not.  We finally ended up doing it with the 
following rationale:

- very few codes use the "mpi" module
- but those who do should have the maximum amount of compile-time protection

2014-01-07 Thread Jeff Squyres (jsquyres)
On Jan 7, 2014, at 10:09 PM, Jeff Squyres wrote:

> - the old tiny/small/medium-based interface for compilers that do not support 
> "ignore TKR" pragmas (i.e., gfortran)

And of course, as soon as I say this publicly, I see a post from Tobias 
reminding me that gcc/gfortran 4.9 will include their own ignore TKR 
functionality (and he reminds me to include the patch to support it in OMPI... 
which I'll do for trunk/1.7.5).  :-)

This means that you'll get a good "mpi" module with gfortran 4.9 (i.e., all MPI 
subroutines and functions are prototyped, and (void*)-like functionality is 
supported).

-- 
Jeff Squyres
jsquy...@cisco.com

2014-01-07 Thread Jed Brown
"Jeff Squyres (jsquyres)" writes:

> Yes, I can explain what's going on here.  The short version is that a
> change was made with the intent to provide maximum Fortran code
> safety, but with a possible backwards compatibility issue.  If this
> change is causing real problems, we can probably change this, but I'd
> like a little feedback from the Fortran MPI dev community first.

On page 610 of the MPI-3.0 standard, I see text disallowing the explicit
interfaces in ompi/mpi/fortran/use-mpi-tkr:

  In S2 and S3: Without such extensions, routines with choice buffers should
  be provided with an implicit interface, instead of overloading with a
  different MPI function for each possible buffer type (as mentioned in
  Section 17.1.11 on page 625). Such overloading would also imply restrictions
  for passing Fortran derived types as choice buffer, see also Section 17.1.15
  on page 629.


Why did OMPI decide that this (presumably non-normative) text in the
standard was not worth following?  (Rejecting something in the standard
indicates stronger convictions than merely weighing the benefits of each
approach independently would.)

> c) The design of the MPI-2 "mpi" module has multiple flaws that are
> identified in the MPI-3 text (but were not recognized back in MPI-2.x
> days).  Here's one: until F2008+addendums, there was no Fortran
> equivalent of "void *".  Hence, the mpi module has to overload
> MPI_Send() and have a prototype *for every possible type and
> dimension*.

And this is not possible, thus the text saying not to do it.

I don't call MPI from Fortran, but someone on a Fortran project that I
watch mentioned that the compiler would complain about such and such a
use (actually relating to types for MPI_Status in MPI_Recv rather than
buffer types).  My immediate response was "they can't do that because
without nonstandard or post-F08 extensions (or exposing the user to
c_loc), the type system cannot express those functions and thus you
cannot have explicit interfaces".  But then I looked at latest OMPI and
indeed, it was enumerating types, thus my email.

> Here's another fatal flaw: it's not possible for an MPI implementation
> to provide MPI_Send() prototypes for user-defined Fortran datatypes.
> Hence, the example you cite is a pipe dream for the "mpi" module
> because there's no way to specify a (void*)-like argument for the
> choice buffer.

F2003 has c_loc, which is a sufficient stop-gap until TS 29113 is widely
available.  I have long advocated that the best way to write extensible
libraries for Fortran 2003 callers (even if the library is implemented
entirely in Fortran) involves some use of c_loc (e.g., for context
arguments).
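
A minimal sketch of the context-argument pattern, assuming a hypothetical
library routine demo_run that knows nothing about the caller's types (all
module, subroutine, and type names are made up for illustration):

```fortran
! Hypothetical library: accepts an opaque context pointer and a callback.
module demo_library
  use, intrinsic :: iso_c_binding, only: c_ptr
  implicit none

  abstract interface
    subroutine callback_iface(ctx)
      import :: c_ptr
      type(c_ptr), intent(in) :: ctx
    end subroutine callback_iface
  end interface

contains

  ! Forwards the opaque pointer without ever inspecting what it points to.
  subroutine demo_run(callback, ctx)
    procedure(callback_iface) :: callback
    type(c_ptr), intent(in) :: ctx
    call callback(ctx)
  end subroutine demo_run

end module demo_library

! Hypothetical user code: defines its own type after the library was written.
module user_code
  use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
  implicit none

  type :: particle
    real :: x, y, z
  end type particle

contains

  subroutine show_particle(ctx)
    type(c_ptr), intent(in) :: ctx
    type(particle), pointer :: p
    ! Recover the typed pointer from the opaque address.
    call c_f_pointer(ctx, p)
    print '(a,3f6.1)', 'particle: ', p%x, p%y, p%z
  end subroutine show_particle

end module user_code

program cloc_demo
  use, intrinsic :: iso_c_binding, only: c_loc
  use demo_library
  use user_code
  implicit none

  ! c_loc requires its argument to have the TARGET (or POINTER) attribute.
  type(particle), target :: p

  p = particle(1.0, 2.0, 3.0)
  call demo_run(show_particle, c_loc(p))
end program cloc_demo
```

The cost is that the compiler can no longer check that the callback and the
context agree on the type; that responsibility moves to the caller.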

This annoys the Fortran programmers and they usually write perl scripts
to generate interfaces that enumerate the types they need and give up on
extensibility.  ;-)

It's nice to know that after 60 years (when Fortran 201x is released,
including TS 29113), there will be a Fortran standard with an analogue
of void*.

> Craig Rasmussen and I debated long and hard about whether to change
> the default from "small" to "medium" or not.  We finally ended up
> doing it with the following rationale:
>
> - very few codes use the "mpi" module

FWIW, I've noticed a few projects transition to it in the last few years.

> - but those who do should have the maximum amount of compile-time protection
>
> ...but we always knew that someone may come complaining some day.  And that 
> day has now come.
>
> So my question to you / the Fortran MPI dev community is: what do you want 
> (for gfortran)?  
>
> Do you want us to go back to the "small" size by default, or do you
> want more compile-time protection by default?  (with the obvious
> caveat that you can't use user-defined Fortran datatypes as choice
> buffers; you might be able to use something like c_loc, but I haven't
> thought deeply about this and don't know offhand if that works)

I can't answer this as a Fortran developer, but I know that a lot of
projects want some modicum of portability and in practice, it takes
almost 10 years to flush the old compilers out of production
environments.  Either the upgrade problem will need to be fixed [1] so
that nearly all existing machines have new compilers or Fortran projects
will be wrestling with this for a long time yet.

Most Fortran packages I know use homogeneous arrays, which also means
that they don't call MPI_Type_create_struct or similar functions.  If
those functions are going to be provided by the module, I think users
should be able to call them (e.g., examples in the Standard should work)
and the Standard's advice about implicit interfaces should be followed.



[1] Also, there are still production machines without MPI-2.0 and I get
email if I make a mistake in providing MPI-1 fallback paths.

