Since a few days the MTT runs on my ppc64 systems are failing with:
[bimini:11716] *** Process received signal ***
[bimini:11716] Signal: Segmentation fault (11)
[bimini:11716] Signal code: Address not mapped (1)
[bimini:11716] Failing at address: (nil)
[bimini:11716] [ 0] [0x3fffa2bb0448]
[bimini:
Hi PSM folks,
I'm noticing some weirdness on master using the psm mtl.
If I run multi-node, I don't see a problem. If I run using only a
single node, however, and use more than 1 rank, then I get
a timeout in psm_ep_connect.
On ompi-release I also observe this problem, but it seems
to be more sp
Thanks Adrian; I turned this into https://github.com/open-mpi/ompi/issues/874.
> On Sep 8, 2015, at 9:56 AM, Adrian Reber wrote:
>
> Since a few days the MTT runs on my ppc64 systems are failing with:
>
> [bimini:11716] *** Process received signal ***
> [bimini:11716] Signal: Segmentation fault
Hi Howard,
Is this new behavior?
Do you see the error if you set PSM_DEVICES=shm,self ? The PSM MTL should be
setting this on its own, but maybe something changed.
Andrew
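
For anyone wanting to try the suggestion above, a single-node test could look roughly like the sketch below. This is only an illustration, not a command taken from the thread: the binary name ./my_mpi_app and the rank count are hypothetical, and it assumes the psm MTL is selected through the cm PML. The PSM_DEVICES value is the one quoted in the message.

```shell
# Restrict PSM to its shared-memory and self devices (value from the message above).
export PSM_DEVICES=shm,self

# Assumed launch line: 2 ranks on one node, psm MTL selected explicitly.
# -x forwards the already-exported variable to the launched ranks.
# ./my_mpi_app is a hypothetical binary name.
launch_cmd="mpirun -np 2 -x PSM_DEVICES --mca pml cm --mca mtl psm ./my_mpi_app"

# Run only if mpirun and the binary are present; otherwise just print the command.
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_mpi_app ]; then
    $launch_cmd
else
    echo "would run: $launch_cmd"
fi
```

If the psm_ep_connect timeout goes away with the variable set by hand, that would suggest the MTL is no longer setting it on its own.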
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, September 8, 2015 10:06 AM
To
On Sep 7, 2015, at 5:07 PM, Ralph Castain wrote:
>
> * two jobs started by the same mpirun - supported today by ORTE
>
> * two jobs started by different mpiruns - we used to support, but is broken
> in grpcomm/barrier
>
> * two direct-launched jobs - never supported
>
> * one direct-launched
Why would anyone use connect/accept (or join) between processes on the same
job? The only environment where such a functionality makes sense is where
disjoint applications (think computing part and the visualization part) are
able to connect together. There are applications that use such a model, bu
It’s called comm_spawn, which involves the connect/accept code after launch :-)
> On Sep 8, 2015, at 1:59 PM, George Bosilca wrote:
>
> Why would anyone use connect/accept (or join) between processes on the same
> job? The only environment where such a functionality makes sense is where
> dis
On Sep 8, 2015, at 4:59 PM, George Bosilca wrote:
>
> Why would anyone use connect/accept (or join) between processes on the same
> job? The only environment where such a functionality makes sense is where
> disjoint applications (think computing part and the visualization part) are
> able to
Hi folks
I’ve poked around this evening and gotten the Slurm support in master to at
least build, and for mpirun to now work correctly under a Slurm job allocation.
This should all be committed as soon as auto-testing completes:
https://github.com/open-mpi/ompi/pull/877
Howard/Nathan: I believ