[OMPI users] distributed file system

2013-05-16 Thread Reza Bakhshayeshi
Hi

Do we need a distributed file system (such as NFS) when running an MPI program
on multiple machines?

thanks,
Reza
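
(To make the question concrete: without a shared directory, the executable
typically has to be present at the same path on every machine before
launching, something like

scp ./my_mpi_prog node2:/home/user/my_mpi_prog
mpirun -np 4 --hostfile myhosts /home/user/my_mpi_prog

where the hostnames and paths are only placeholders; a shared NFS export
simply removes the copy step.)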


Re: [OMPI users] Segmentation fault with HPCC benchmark

2013-04-10 Thread Reza Bakhshayeshi
Dear Gus Correa,

Thank you for your detailed answer.
I have been busy working through your steps, but unfortunately I still have
the problem.

1) Yes, I have sudo access to the server, and when I run the test only my two
instances are active.

2) There is no problem running the hello program simultaneously on the two
instances, but someone told me that such programs do not exercise all the
relevant factors.

The instances are clean installations of Ubuntu Server 12.04, and I have
disabled "ufw". Two notes here: Open MPI uses ssh, and I can connect with no
password from master to slave. One more odd thing is that the order in the
myhosts file matters, i.e., it is always the second machine that aborts the
process; even when I am on the master and the master is listed second in the
file, it reports that the master aborted.
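
(For reference, a myhosts file of this kind is just one machine per line,
with made-up names here:

master
slave

and the password-less login I mentioned is the quick check

ssh slave hostname

which should print the slave's hostname without prompting.)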

3, 4) I checked that; in fact I redid everything from the first step, just
installing ATLAS and Open MPI from their packages with the 64-bit switch
passed to configure.

5) I used -np 4 with hello; is this sufficient?

6) Yes, I checked the auto-tuning (without an input file) too.

One thing I noticed is that a "vnet" interface is created for each instance on
the main server. I ran these two commands:
mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_include eth0,lo hpcc
mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_exclude vnet0,vnet1 hpcc

In both cases I received nothing at all: no error and nothing in the output
file. I waited for hours but nothing happened. Can these vnets be causing the
problem?
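
(As a sanity check, something like

mpirun -np 2 --hostfile myhosts sh -c 'hostname; ip -o link show'

should show which interfaces (eth0, lo, vnet0, ...) each instance actually
has, since btl_tcp_if_include/exclude take the interface names as they exist
on each node.)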

Thank you very much for your consideration,
Best Regards,
Reza


Re: [OMPI users] Segmentation fault with HPCC benchmark

2013-04-03 Thread Reza Bakhshayeshi
Thanks for your answers.

@Ralph Castain:
Do you mean the error I receive?
This is the output when I run the program:

  *** Process received signal ***
  Signal: Segmentation fault (11)
  Signal code: Address not mapped (1)
  Failing at address: 0x1b7f000
  [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6a84b524a0]
  [ 1] hpcc(HPCC_Power2NodesMPIRandomAccessCheck+0xa04) [0x423834]
  [ 2] hpcc(HPCC_MPIRandomAccess+0x87a) [0x41e43a]
  [ 3] hpcc(main+0xfbf) [0x40a1bf]
  [ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)
[0x7f6a84b3d76d]
  [ 5] hpcc() [0x40aafd]
  *** End of error message ***
[
][[53938,1],0][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--
mpirun noticed that process rank 1 with PID 4164 on node 192.168.100.6
exited on signal 11 (Segmentation fault).
--
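
(The frames above only give raw addresses; assuming hpcc was built with -g,
addr2line can map an address back to a source line, e.g.

addr2line -e hpcc 0x423834

which should fall inside the MPIRandomAccess verification code.)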

@Gus Correa:
I did it both on the server and on the instances, but it didn't solve the
problem.
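
(Since the ranks on the second machine are started over ssh, one way to
confirm the larger limit actually reaches the MPI processes is to run the
check under mpirun itself:

mpirun -np 2 --hostfile myhosts sh -c 'hostname; ulimit -s'

and look at what each rank reports.)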


On 3 April 2013 19:14, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Reza
>
> Check the system stacksize first ('limit stacksize' or 'ulimit -s').
> If it is small, you can try to increase it
> before you run the program.
> Say (tcsh):
>
> limit stacksize unlimited
>
> or (bash):
>
> ulimit -s unlimited
>
> I hope this helps,
> Gus Correa
>
>
> On 04/03/2013 10:29 AM, Ralph Castain wrote:
>
>> Could you perhaps share the stacktrace from the segfault? It's
>> impossible to advise you on the problem without seeing it.
>>
>>
>> On Apr 3, 2013, at 5:28 AM, Reza Bakhshayeshi <reza.b2...@gmail.com> wrote:
>>
>>> Hi
>>> I have installed the HPCC benchmark suite and Open MPI on private cloud
>>> instances.
>>> Unfortunately I mostly get a Segmentation fault error when I run it
>>> simultaneously on two or more instances with:
>>> mpirun -np 2 --hostfile ./myhosts hpcc
>>>
>>> Everything is on Ubuntu server 12.04 (updated)

[OMPI users] Segmentation fault with HPCC benchmark

2013-04-03 Thread Reza Bakhshayeshi
Hi
I have installed the HPCC benchmark suite and Open MPI on private cloud
instances.
Unfortunately I mostly get a Segmentation fault error when I run it
simultaneously on two or more instances with:
mpirun -np 2 --hostfile ./myhosts hpcc

Everything is on Ubuntu server 12.04 (updated)
and this is my make.intel64 file:

# - shell --
# --
#
SHELL= /bin/sh
#
CD   = cd
CP   = cp
LN_S = ln -s
MKDIR= mkdir
RM   = /bin/rm -f
TOUCH= touch
#
# --
# - Platform identifier 
# --
#
ARCH = intel64
#
# --
# - HPL Directory Structure / HPL library --
# --
#
TOPdir   = ../../..
INCdir   = $(TOPdir)/include
BINdir   = $(TOPdir)/bin/$(ARCH)
LIBdir   = $(TOPdir)/lib/$(ARCH)
#
HPLlib   = $(LIBdir)/libhpl.a
#
# --
# - Message Passing library (MPI) --
# --
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir= /usr/lib/openmpi
MPinc= -I$(MPdir)/include
MPlib= $(MPdir)/lib/libmpi.so
#
# --
# - Linear Algebra library (BLAS or VSIPL) -
# --
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir= /usr/local/ATLAS/obj64
LAinc= -I$(LAdir)/include
LAlib= $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
#
# --
# - F77 / C interface --
# --
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_  : all lower case and a suffixed underscore  (Suns,
#   Intel, ...),   [default]
# -DNoChange  : all lower case (IBM RS6000),
# -DUpCase: all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle: The string address is passed at the string loca-
#   tion on the stack, and the string length is then
#   passed as  an  F77_INTEGER  after  all  explicit
#   stack arguments,   [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#   Fortran 77  string,  and the structure is of the
#   form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#   77 string,  and  the  structure is  of the form:
#   struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#   Cray  fcd  (fortran  character  descriptor)  for
#   interoperation.
#
F2CDEFS  =
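# (As far as I can tell, leaving F2CDEFS empty selects the bracketed [default]
# in each of the three categories above, i.e. -DAdd_, -DF77_INTEGER=int and
# -DStringSunStyle.)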
#
# --
# - HPL includes / libraries / specifics ---
# --
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -lm
#
# - Compile time options ---
#
# -DHPL_COPY_L   force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS   call the cblas interface;
#