[OMPI users] distributed file system
Hi,

Do we need a distributed file system (like NFS) when running an MPI program on multiple machines?

Thanks,
Reza
Re: [OMPI users] Segmentation fault with HPCC benchmark
Dear Gus Correa,

Thank you for your detailed answer. I have been busy checking your steps, but unfortunately I still have the problem.

1) Yes, I have sudo access to the server, and when I run the test only my two instances are active.

2) There is no problem running the hello program simultaneously on two instances, but someone told me that such programs cannot check some factors. The instances are pure installations of Ubuntu Server 12.04; by the way, I disabled "ufw". Two notes here: Open MPI uses ssh, and I can connect from the master to the slave without a password. One more odd thing is that the order in the myhosts file matters: it is always the second machine that aborts the process. Even when I am on the master and the master is second in the file, it reports that the master aborted.

3, 4) I checked them; in fact, I redid everything from the first step, just installing ATLAS and Open MPI from packages with the 64 switch passed to configure.

5) I used -np 4 with hello; is this sufficient?

6) Yes, I checked auto-tuning (without an input file) too.

One thing I noticed is that a "vnet" interface is created for each instance on the main server. I ran these two commands:

mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_include eth0,lo hpcc
mpirun -np 2 --hostfile myhosts --mca btl_tcp_if_exclude vnet0,vnet1 hpcc

In this case I didn't receive anything, i.e., no error and nothing in the output file. I waited for hours, but nothing happened. Can these vnets cause the problem?

Thank you very much for your consideration.

Best regards,
Reza
Re: [OMPI users] Segmentation fault with HPCC benchmark
Thanks for your answers.

@Ralph Castain: Do you mean the error I receive? This is the output when I run the program:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x1b7f000
[ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f6a84b524a0]
[ 1] hpcc(HPCC_Power2NodesMPIRandomAccessCheck+0xa04) [0x423834]
[ 2] hpcc(HPCC_MPIRandomAccess+0x87a) [0x41e43a]
[ 3] hpcc(main+0xfbf) [0x40a1bf]
[ 4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f6a84b3d76d]
[ 5] hpcc() [0x40aafd]
*** End of error message ***
[ ][[53938,1],0][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 4164 on node 192.168.100.6 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

@Gus Correa: I did it both on the server and on the instances, but it didn't solve the problem.

On 3 April 2013 19:14, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Reza
>
> Check the system stacksize first ('limit stacksize' or 'ulimit -s').
> If it is small, you can try to increase it
> before you run the program.
> Say (tcsh):
>
> limit stacksize unlimited
>
> or (bash):
>
> ulimit -s unlimited
>
> I hope this helps,
> Gus Correa
>
>
> On 04/03/2013 10:29 AM, Ralph Castain wrote:
>
>> Could you perhaps share the stacktrace from the segfault? It's
>> impossible to advise you on the problem without seeing it.
>>
>>
>> On Apr 3, 2013, at 5:28 AM, Reza Bakhshayeshi <reza.b2...@gmail.com> wrote:
>>
>>> Hi
>>> I have installed HPCC benchmark suite and openmpi on a private cloud
>>> instances.
[OMPI users] Segmentation fault with HPCC benchmark
Hi,

I have installed the HPCC benchmark suite and Open MPI on private cloud instances. Unfortunately, I get a segmentation fault, mostly when I try to run it simultaneously on two or more instances with:

mpirun -np 2 --hostfile ./myhosts hpcc

Everything is on Ubuntu Server 12.04 (updated), and this is my make.intel64 file:

# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = intel64
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = ../../..
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /usr/lib/openmpi
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi.so
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /usr/local/ATLAS/obj64
LAinc        = -I$(LAdir)/include
LAlib        = $(LAdir)/lib/libcblas.a $(LAdir)/lib/libatlas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore (Suns,
#                       Intel, ...), [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string
#                       location on the stack, and the string length is
#                       then passed as an F77_INTEGER after all explicit
#                       stack arguments, [default]
# -DStringStructPtr   : The address of a structure is passed by a
#                       Fortran 77 string, and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each Fortran
#                       77 string, and the structure is of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for Cray machines, which uses
#                       Cray fcd (fortran character descriptor) for
#                       interoperation.
#
F2CDEFS      =
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib) -lm
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
#