Re: [OMPI users] Shared memory

2010-10-06 Thread Andrei Fokau
Currently we run a code on a cluster with distributed memory, and this code
needs a lot of memory. Part of the data stored in memory is the same for
each process, but it is organized as one array - we can split it if
necessary. So far no magic has occurred for us. What do we need to do to make
the magic work?


On Wed, Oct 6, 2010 at 12:43, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:

> Open MPI will use shared memory to communicate between peers on the same
> node - but that's hidden beneath the covers; it's not exposed via the MPI
> API. You just MPI-send and magic occurs and the receiver gets the message.
>
> On Oct 4, 2010, at 11:13 AM, "Andrei Fokau" <andrei.fo...@neutron.kth.se>
> wrote:
>
> Does OMPI have shared memory capabilities (as mentioned in MPI-2)?
> How can I use them?
>
> On Sat, Sep 25, 2010 at 23:19, Andrei Fokau <andrei.fo...@neutron.kth.se>
> wrote:
>
>> Here are some more details about our problem. We use a dozen 4-processor
>> nodes with 8 GB of memory on each node. The code we run needs about 3 GB
>> per processor, so we can load only 2 processors out of 4. The vast
>> majority of those 3 GB is the same for each processor and is accessed
>> continuously during the calculation. In my original question I wasn't
>> very clear when asking about the possibility of using shared memory with
>> Open MPI - in our case we do not need remote access to the data, and it
>> would be sufficient to share memory within each node only.
>>
>> Of course, the possibility to access the data remotely (via mmap) is
>> attractive because it would allow us to store much larger arrays (up to
>> 10 GB) in one remote place, meaning higher accuracy for our calculations.
>> However, I believe that the access time would be too long for data read
>> so frequently, and therefore performance would be lost.
>>
>> I still hope that some of the subscribers to this mailing list have
>> experience using Global Arrays. This library seems to be fine for our
>> case; however, I feel that there should be a simpler solution. Open MPI
>> conforms to the MPI-2 standard, which includes a description of shared
>> memory usage. Do you see any other way for us to use shared memory
>> (within a node) apart from using Global Arrays?
>>
>> On Fri, Sep 24, 2010 at 19:03, Durga Choudhury <dpcho...@gmail.com> wrote:
>>
>>> I think the 'middle ground' approach can be simplified even further if
>>> the data file is on a shared device (e.g. an NFS/Samba mount) that can be
>>> mounted at the same location of the file system tree on all nodes. I
>>> have never tried it, though, and mmap()'ing a non-POSIX-compliant file
>>> system such as Samba might have issues I am unaware of.
>>>
>>> However, I do not see why you should not be able to do this even if
>>> the file is being written to, as long as you call msync() before using
>>> the mapped pages.
>>>
>>> Durga
>>>
>>>
>>> On Fri, Sep 24, 2010 at 12:31 PM, Eugene Loh <eugene@oracle.com> wrote:
>>> > It seems to me there are two extremes.
>>> >
>>> > One is that you replicate the data for each process.  This has the
>>> > disadvantage of consuming lots of memory "unnecessarily."
>>> >
>>> > Another extreme is that shared data is distributed over all processes.
>>> > This has the disadvantage of making at least some of the data less
>>> > accessible, whether in programming complexity and/or run-time performance.
>>> >
>>> > I'm not familiar with Global Arrays.  I was somewhat familiar with HPF.
>>> > I think the natural thing to do with those programming models is to
>>> > distribute data over all processes, which may relieve the excessive
>>> > memory consumption you're trying to address but which may also just put
>>> > you at a different "extreme" of this spectrum.
>>> >
>>> > The middle ground I think might make most sense would be to share data
>>> > only within a node, but to replicate the data for each node.  There are
>>> > probably multiple ways of doing this -- possibly even GA, I don't know.
>>> > One way might be to use one MPI process per node, with OMP multithreading
>>> > within each process|node.  Or (and I thought this was the solution you
>>> > were looking for), have some idea which processes are collocal.  Have one
>>> > process per node create and initialize some shared memory -- mmap,
>>> > perhaps, or SysV shared memory.  Then, have its peers map the same shared
>>> > memory into their address spaces.
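A minimal sketch of that last suggestion - one process per node creates a
POSIX shared-memory segment and its node-local peers attach to it - assuming
the collocal processes are already grouped in a node-local communicator; the
helper name node_shared_alloc(), the segment name, and the error handling are
illustrative only, not taken from Open MPI or the thread:

#include <mpi.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

/* Map the same shared-memory segment into every process on a node.
 * 'node' is a communicator containing only the processes of this node. */
void *node_shared_alloc(MPI_Comm node, size_t nbytes)
{
    const char *name = "/my_app_shared_data";   /* hypothetical segment name */
    int rank, fd = -1;
    void *p;

    MPI_Comm_rank(node, &rank);

    if (rank == 0) {                             /* one creator per node */
        fd = shm_open(name, O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)nbytes) != 0) {
            perror("shm_open/ftruncate");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
    }
    MPI_Barrier(node);                           /* segment exists before peers attach */
    if (rank != 0)
        fd = shm_open(name, O_RDWR, 0600);

    p = mmap(NULL, nbytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fd < 0 || p == MAP_FAILED) {
        perror("shm_open/mmap");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    close(fd);

    /* Rank 0 of the node would now read the input file into p; the other
     * ranks wait on another MPI_Barrier(node) and then use the data
     * read-only.  shm_unlink(name) should be called once at the end. */
    return p;
}

On Linux this would typically need to be linked with -lrt for shm_open();
with such a helper in place, the single malloc() for the replicated array can
simply be replaced by a call to it.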

Re: [OMPI users] Shared memory

2010-10-04 Thread Andrei Fokau
Does OMPI have shared memory capabilities (as mentioned in MPI-2)?
How can I use them?

Andrei


On Sat, Sep 25, 2010 at 23:19, Andrei Fokau <andrei.fo...@neutron.kth.se> wrote:

> Here are some more details about our problem. We use a dozen 4-processor
> nodes with 8 GB of memory on each node. The code we run needs about 3 GB per
> processor, so we can load only 2 processors out of 4. The vast majority of
> those 3 GB is the same for each processor and is accessed continuously
> during the calculation. In my original question I wasn't very clear when
> asking about the possibility of using shared memory with Open MPI - in our
> case we do not need remote access to the data, and it would be sufficient to
> share memory within each node only.
>
> Of course, the possibility to access the data remotely (via mmap) is
> attractive because it would allow us to store much larger arrays (up to 10
> GB) in one remote place, meaning higher accuracy for our calculations.
> However, I believe that the access time would be too long for data read so
> frequently, and therefore performance would be lost.
>
> I still hope that some of the subscribers to this mailing list have
> experience using Global Arrays. This library seems to be fine for our
> case; however, I feel that there should be a simpler solution. Open MPI
> conforms to the MPI-2 standard, which includes a description of shared
> memory usage. Do you see any other way for us to use shared memory
> (within a node) apart from using Global Arrays?
>
> Andrei
>
>
> On Fri, Sep 24, 2010 at 19:03, Durga Choudhury <dpcho...@gmail.com> wrote:
>
>> I think the 'middle ground' approach can be simplified even further if
>> the data file is on a shared device (e.g. an NFS/Samba mount) that can be
>> mounted at the same location of the file system tree on all nodes. I
>> have never tried it, though, and mmap()'ing a non-POSIX-compliant file
>> system such as Samba might have issues I am unaware of.
>>
>> However, I do not see why you should not be able to do this even if
>> the file is being written to, as long as you call msync() before using
>> the mapped pages.
>>
>> Durga
>>
>>
>> On Fri, Sep 24, 2010 at 12:31 PM, Eugene Loh <eugene@oracle.com> wrote:
>> > It seems to me there are two extremes.
>> >
>> > One is that you replicate the data for each process.  This has the
>> > disadvantage of consuming lots of memory "unnecessarily."
>> >
>> > Another extreme is that shared data is distributed over all processes.
>> > This has the disadvantage of making at least some of the data less
>> > accessible, whether in programming complexity and/or run-time performance.
>> >
>> > I'm not familiar with Global Arrays.  I was somewhat familiar with HPF.
>> > I think the natural thing to do with those programming models is to
>> > distribute data over all processes, which may relieve the excessive
>> > memory consumption you're trying to address but which may also just put
>> > you at a different "extreme" of this spectrum.
>> >
>> > The middle ground I think might make most sense would be to share data
>> > only within a node, but to replicate the data for each node.  There are
>> > probably multiple ways of doing this -- possibly even GA, I don't know.
>> > One way might be to use one MPI process per node, with OMP multithreading
>> > within each process|node.  Or (and I thought this was the solution you
>> > were looking for), have some idea which processes are collocal.  Have one
>> > process per node create and initialize some shared memory -- mmap,
>> > perhaps, or SysV shared memory.  Then, have its peers map the same shared
>> > memory into their address spaces.
>> >
>> > You asked what source code changes would be required.  It depends.  If
>> > you're going to mmap shared memory in on each node, you need to know
>> > which processes are collocal.  If you're willing to constrain how
>> > processes are mapped to nodes, this could be easy.  (E.g., "every 4
>> > processes are collocal".)  If you want to discover dynamically at run
>> > time which are collocal, it would be harder.  The mmap stuff could be in
>> > a stand-alone function of about a dozen lines.  If the shared area is
>> > allocated as one piece, substituting the single malloc() call with a call
>> > to your mmap function should be simple.  If you have many malloc()s y
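And a minimal sketch of the run-time discovery of collocal processes
mentioned at the end - assuming an MPI-2 era library (no MPI_Comm_split_type)
and that hostnames uniquely identify nodes; the helper name node_comm() and
the hashing scheme are made up for this sketch:

#include <mpi.h>
#include <unistd.h>

/* Return a communicator containing only the processes on this node, by
 * splitting MPI_COMM_WORLD on a hash of the hostname. */
MPI_Comm node_comm(void)
{
    char host[256];
    int i, color = 0;
    MPI_Comm comm;

    gethostname(host, sizeof(host));
    host[sizeof(host) - 1] = '\0';

    /* Fold the hostname into a non-negative int.  A hash collision would
     * wrongly merge two nodes; a robust version would gather all hostnames
     * and compare them explicitly. */
    for (i = 0; host[i] != '\0'; i++)
        color = (color * 31 + (unsigned char)host[i]) & 0x7fffffff;

    MPI_Comm_split(MPI_COMM_WORLD, color, 0, &comm);
    return comm;    /* ranks in 'comm' share a node (modulo collisions) */
}

The resulting communicator can then be passed to a helper like the
node_shared_alloc() sketch shown earlier.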

[OMPI users] STDIN

2010-10-03 Thread Andrei Fokau
We have a program which, when run without MPI, is normally controlled by Ctrl+C
followed by a number of options. Is it possible to set STDIN for a command
executed via mpirun? The following page describes the parameter -stdin, but it
does not seem to be supported by OMPI's mpirun.
http://w3.ualg.pt/~dubuf/calhau.txt

Is there a solution for our problem?

Regards,
Andrei


Re: [OMPI users] Shared memory

2010-09-24 Thread Andrei Fokau
The data are read from a file and processed before calculations begin, so I
think that mapping will not work in our case.

Global Arrays look promising indeed. As I said, we need to put just a part
of the data in the shared section. John, do you (or maybe other users) have
experience working with GA?

http://www.emsl.pnl.gov/docs/global/um/build.html
*When GA runs with MPI:*

MPI_Init(..)      ! start MPI
GA_Initialize()   ! start global arrays
MA_Init(..)       ! start memory allocator

    do work

GA_Terminate()    ! tidy up global arrays
MPI_Finalize()    ! tidy up MPI
                  ! exit program

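For reference, the same skeleton written out as a compilable C program - a
sketch only: the header names, the MT_DBL constant, and the MA_init()
stack/heap sizes are assumptions that should be checked against the installed
GA version and its documentation.

#include <stdio.h>
#include <mpi.h>
#include "ga.h"          /* Global Arrays C interface */
#include "macdecls.h"    /* MA memory allocator */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                     /* start MPI */
    GA_Initialize();                            /* start Global Arrays */

    if (!MA_init(MT_DBL, 1000000, 1000000)) {   /* stack and heap, in doubles */
        fprintf(stderr, "MA_init failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... create global arrays and do the work here ... */

    GA_Terminate();                             /* tidy up Global Arrays */
    MPI_Finalize();                             /* tidy up MPI */
    return 0;
}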


On Fri, Sep 24, 2010 at 13:44, Reuti <re...@staff.uni-marburg.de> wrote:

> On 24.09.2010 at 13:26, John Hearns wrote:
>
> > On 24 September 2010 08:46, Andrei Fokau <andrei.fo...@neutron.kth.se> wrote:
> >> We use a C program which consumes a lot of memory per process (up to a
> >> few GB), 99% of the data being the same for each process. So for us it
> >> would be quite reasonable to put that part of the data in shared memory.
> >
> > http://www.emsl.pnl.gov/docs/global/
> >
> > Is this any help? Apologies if I'm talking through my hat.
>
> I was also thinking of this when I read "data in a shared memory" (besides
> approaches like http://www.kerrighed.org/wiki/index.php/Main_Page). Wasn't
> this also one idea behind "High Performance Fortran" - running in parallel
> across nodes without even knowing, while programming, that it is across
> nodes at all, and accessing all data as if it were local?
>
> -- Reuti
>
>


[OMPI users] Shared memory

2010-09-24 Thread Andrei Fokau
We use a C program which consumes a lot of memory per process (up to a few
GB), 99% of the data being the same for each process. So for us it would be
quite reasonable to put that part of the data in shared memory.

In the source code, the memory is allocated via the malloc() function. What
would we need to change in the source code to be able to put that repeating
data in shared memory?

The code is normally run on several nodes.

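One alternative raised elsewhere in this thread (Eugene Loh's suggestion of
one MPI process per node with OMP multithreading inside it) keeps the large
array shared within a node without any explicit shared-memory code: each node
allocates the array once and its threads read it directly. A minimal sketch,
assuming OpenMP is available and the calculation can be threaded; the array
size and the reduction loop are placeholders:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);          /* MPI calls stay outside the threads */

    size_t n = 1000000;                        /* placeholder array size   */
    double *data = calloc(n, sizeof *data);    /* one copy per node        */
    /* ... read the input file into 'data' here, once per process ... */

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += data[i];              /* all threads read the same array    */

    printf("partial result: %g\n", sum);
    free(data);
    MPI_Finalize();
    return 0;
}

Launched with one rank per node (and, say, OMP_NUM_THREADS=4), the 3 GB array
then exists once per node instead of once per core.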

Re: [OMPI users] Running on crashing nodes

2010-09-24 Thread Andrei Fokau
Ralph, could you tell us when this functionality will be available in the
stable version? A rough estimate will be fine.


On Fri, Sep 24, 2010 at 01:24, Ralph Castain <r...@open-mpi.org> wrote:

> In a word, no. If a node crashes, OMPI will abort the currently-running job
> if it had processes on that node. There is no current ability to "ride-thru"
> such an event.
>
> That said, there is work being done to support "ride-thru". Most of that is
> in the current developer's code trunk, and more is coming, but I wouldn't
> consider it production-quality just yet.
>
> Specifically, the code that does what you specify below is done and works.
> It is recovery of the MPI job itself (collectives, lost messages, etc.) that
> remains to be completed.
>
>
> On Thu, Sep 23, 2010 at 7:22 AM, Andrei Fokau <andrei.fo...@neutron.kth.se> wrote:
>
>>  Dear users,
>>
>> Our cluster has a number of nodes with a high probability of crashing, so
>> it happens quite often that calculations stop because one node goes down.
>> Maybe you know whether it is possible to exclude the crashed nodes at
>> run-time when running with Open MPI? I am asking about the principal
>> possibility of programming such behavior. Does Open MPI allow such dynamic
>> checking? The scheme I am curious about is the following:
>>
>> 1. A code starts its tasks via mpirun on several nodes
>> 2. At some moment one node goes down
>> 3. The code realizes that the node is down (the results are lost) and
>> excludes it from the list of nodes to run its tasks on
>> 4. At a later moment the user restarts the crashed node
>> 5. The code notices that the node is up again and puts it back in the
>> list of active nodes
>>
>>
>> Regards,
>> Andrei
>>


[OMPI users] Running on crashing nodes

2010-09-23 Thread Andrei Fokau
Dear users,

Our cluster has a number of nodes with a high probability of crashing, so
it happens quite often that calculations stop because one node goes down.
Maybe you know whether it is possible to exclude the crashed nodes at
run-time when running with Open MPI? I am asking about the principal
possibility of programming such behavior. Does Open MPI allow such dynamic
checking? The scheme I am curious about is the following:

1. A code starts its tasks via mpirun on several nodes
2. At some moment one node goes down
3. The code realizes that the node is down (the results are lost) and
excludes it from the list of nodes to run its tasks on
4. At a later moment the user restarts the crashed node
5. The code notices that the node is up again and puts it back in the list
of active nodes


Regards,
Andrei