On Sep 9, 2010, at 2:51 PM, Benjamin Sanderse wrote:
> I have installed PETSc on a remote 64-bit Linux machine (Fedora). Might there
> be an issue with the 64-bit build?
Shouldn't matter.
>
> My own computer is a MacBook Pro, but I am not running PETSc on it, because
> in the past I have had severe problems getting mex-files to work under
> 64-bit Matlab.
Yes, that is painful; Matlab never respected the Mac.
> I could try to install it, though.
> Another option is that I try it on another remote Linux machine, also
> 64-bit, running Red Hat.
You could try this to see if the same problem exists.
>
> What do you suggest?
Are you sure it is hanging on the sockets, and not on running MPI jobs one
after another?
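For instance, you could loop a PETSc example that does no socket I/O at all (a
quick sketch; ./ex1 here stands in for any such example):

   for i in 1 2 3 4 5; do petscmpiexec -n 2 ./ex1; done

If that loop ever hangs, the problem is launching MPI jobs back to back, not
the sockets.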
Barry
>
>
> On 9 Sep 2010, at 13:10, Barry Smith wrote:
>
>>
>> What OS are you using?
>>
>> On my Apple Mac I made a shell script loop calling petsc_poisson_par_barry2
>> multiple times and a similar loop in Matlab, and started them both off (with
>> parallel PETSc runs). It runs flawlessly, opening the socket, sending, and
>> receiving, dozens of times in a row with several processes. Are you perhaps
>> using Linux?
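>>
>> A sketch of those loops (not my exact scripts; the loop count, port, and
>> Matlab read/write calls are illustrative): on the shell side
>>
>> for i in 1 2 3 4 5 6 7 8 9 10; do
>>   petscmpiexec -n 2 ./petsc_poisson_par_barry2 -viewer_socket_port 5006
>> done
>>
>> and on the Matlab side
>>
>> for i=1:10
>>   PS = PetscOpenSocket(5006);  % must match -viewer_socket_port
>>   PetscBinaryWrite(PS,b);      % send the vector (VecLoad on the PETSc side)
>>   x = PetscBinaryRead(PS);     % get it back (VecView on the PETSc side)
>>   close(PS);                   % close before the next iteration
>> end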
>>
>>
>> Barry
>>
>>
>> On Sep 8, 2010, at 2:32 PM, Benjamin Sanderse wrote:
>>
>>> That's also what I thought. I checked once again, and I found out that when
>>> I use
>>>
>>> petscmpiexec -n 1
>>>
>>> the program works, but if I increase the number of processors to 2 it only
>>> works once in a while.
>>>
>>> I attached my test code. It is extremely simple: it does nothing more than
>>> pass a vector to PETSc and return it to Matlab.
>>>
>>> I run it as follows:
>>>
>>> shell #1
>>> -bash-4.0$ make petsc_poisson_par_barry2
>>>
>>> shell #2
>>> -bash-4.0$ matlab -nojvm -nodisplay
>>> >> test_petsc_par_barry;
>>>
>>> shell #1
>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -viewer_socket_port
>>> 5006 -info
>>>
>>> On lucky days this works; on unlucky days, PETSc stops here:
>>>
>>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784
>>> max tags = 2147483647
>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784
>>> max tags = 2147483647
>>> [1] PetscCommDuplicate(): returning tag 2147483647
>>> [0] PetscCommDuplicate(): returning tag 2147483647
>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port
>>> 5006 machine borr.mas.cwi.nl
>>> [0] PetscCommDuplicate(): returning tag 2147483646
>>> [1] PetscCommDuplicate(): returning tag 2147483646
>>> [1] PetscCommDuplicate(): returning tag 2147483641
>>> [0] PetscCommDuplicate(): returning tag 2147483641
>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>> ^C
>>>
>>> <makefile>
>>> <test_petsc_par_barry.m>
>>> <petsc_poisson_par_barry2.c>
>>>
>>>
>>>
>>> On 8 Sep 2010, at 12:00, Barry Smith wrote:
>>>
>>>>
>>>> On Sep 8, 2010, at 10:13 AM, Benjamin Sanderse wrote:
>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I am indeed closing the socket in Matlab between the two sets, using
>>>>> close(PS), where PS=PetscOpenSocket.
>>>>> I have tried different port numbers, but without consistent success.
>>>>> Sometimes it works, sometimes it doesn't. Often the first call to
>>>>> PetscOpenSocket(portnumber) works, but even that is not guaranteed. I think
>>>>> there should be another solution.
>>>>> By the way, none of these problems appear when using serial vectors
>>>>> instead of parallel ones.
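>>>>>
>>>>> In sketch form (port and variable names illustrative), each run on the
>>>>> Matlab side looks like:
>>>>>
>>>>> PS = PetscOpenSocket(5005);  % must match -viewer_socket_port
>>>>> x1 = PetscBinaryRead(PS);    % receive the first set
>>>>> close(PS);                   % close between the sets
>>>>> PS = PetscOpenSocket(5005);  % reopen for the second set
>>>>> x2 = PetscBinaryRead(PS);
>>>>> close(PS);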
>>>>
>>>> That is strange. Only the first process ever opens the socket, so in theory
>>>> the fact that the PETSc code is parallel should not matter at all. Please
>>>> send me again the test code that causes trouble and I'll see if I can
>>>> reproduce the problem.
>>>>
>>>> Barry
>>>>
>>>>>
>>>>> Ben
>>>>>
>>>>> On 7 Sep 2010, at 17:27, Barry Smith wrote:
>>>>>
>>>>>>
>>>>>> Are you closing the socket on Matlab between the two sets? Just
>>>>>> checking.
>>>>>>
>>>>>> You can try running with a different port number each time to see if it
>>>>>> is related to trying to reuse the port. Run with PetscOpenSocket(5006)
>>>>>> and the PETSc program with -viewer_socket_port 5006, then run both with
>>>>>> 5007, then with 5008, etc. Does this work smoothly?
>>>>>>
>>>>>> Let me know and that will tell me the next step to try.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Sep 7, 2010, at 10:53 AM, Benjamin Sanderse wrote:
>>>>>>
>>>>>>> Hi Barry,
>>>>>>>
>>>>>>> I am still not too happy with the execution in parallel. I am working
>>>>>>> under Linux (64-bit) and still using your approach with two command
>>>>>>> windows (since it gives the best debugging possibility).
>>>>>>> As I said, sometimes things work, but most of the time they do not. Here
>>>>>>> is the output of two successive runs:
>>>>>>>
>>>>>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483647
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483647
>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483646
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483646
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483641
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483641
>>>>>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>>> [1] PetscFinalize(): PetscFinalize() called
>>>>>>> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
>>>>>>> [1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
>>>>>>> [1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
>>>>>>> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
>>>>>>> [0] PetscFinalize(): PetscFinalize() called
>>>>>>> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm 1140850688
>>>>>>> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
>>>>>>> [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
>>>>>>> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user MPI_Comm -2080374784
>>>>>>>
>>>>>>>
>>>>>>> -bash-4.0$ netstat | grep 5005
>>>>>>>
>>>>>>>
>>>>>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483647
>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483647
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483646
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483646
>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483641
>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483641
>>>>>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>>> ^C
>>>>>>> -bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
>>>>>>> [0]1:Return code = 0, signaled with Interrupt
>>>>>>>
>>>>>>>
>>>>>>> In both cases I first started the Matlab program. I am currently
>>>>>>> starting Matlab without a GUI, but with a GUI I have the same problems.
>>>>>>> As you can see, in the first case everything works fine, and Petsc
>>>>>>> finalizes and closes. Matlab gives me the correct output. The second
>>>>>>> case, run just a couple of seconds later, does not reach PetscFinalize
>>>>>>> and Matlab does not give the correct output. In between the two cases I
>>>>>>> checked if port 5005 was in use, and it was not.
>>>>>>> Do you have any more suggestions on how to get this to work properly?
>>>>>>>
>>>>>>> Benjamin
>>>>>>>
>>>>>>> On 3 Sep 2010, at 21:11, Barry Smith wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Sep 3, 2010, at 4:32 PM, Benjamin Sanderse wrote:
>>>>>>>>
>>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> Thanks for your help! However, there are still some issues left. In
>>>>>>>>> order to test things, I simplified the program even more, and now I am
>>>>>>>>> just sending a vector back and forth: matlab->petsc->matlab:
>>>>>>>>>
>>>>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>>>>>
>>>>>>>>> // load rhs vector
>>>>>>>>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);
>>>>>>>>>
>>>>>>>>> // send to matlab
>>>>>>>>> ierr = VecView(b,fd);CHKERRQ(ierr);
>>>>>>>>> ierr = VecDestroy(b);CHKERRQ(ierr);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Your approach with two windows works *sometimes*. I removed the
>>>>>>>>> 'launch' statement and executed my program 10 times; the first 2
>>>>>>>>> times worked, and in all other cases I got this:
>>>>>>>>>
>>>>>>>>> petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>>>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 max tags = 2147483647
>>>>>>>>> [1] PetscCommDuplicate(): returning tag 2147483647
>>>>>>>>> [0] PetscCommDuplicate(): returning tag 2147483647
>>>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port 5005 machine borr.mas.cwi.nl
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying again
>>>>>>>>> ^C
>>>>>>>>> -bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
>>>>>>>>> [0]1:Return code = 0, signaled with Interrupt
>>>>>>>>>
>>>>>>>>> Every time I start the program I use close(socket) and clear all in
>>>>>>>>> Matlab, so the socket from the previous run should not be present
>>>>>>>>> anymore. It seems that the port gets corrupted after a couple of
>>>>>>>>> times? Matlab does not respond and I have to kill it and restart it
>>>>>>>>> manually.
>>>>>>>>
>>>>>>>> Sometimes when you close a socket connection it doesn't actually close
>>>>>>>> for a long time (the port can linger in TIME_WAIT), so trying to open
>>>>>>>> it again fails. When it appears the socket cannot be used, try netstat |
>>>>>>>> grep 5005 to see if the socket is still active.
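>>>>>>>>
>>>>>>>> (The usual remedy on the listening side is SO_REUSEADDR, which lets a
>>>>>>>> new listener bind while an old connection lingers; a generic C sketch,
>>>>>>>> not necessarily what the PETSc/Matlab sources actually do:
>>>>>>>>
>>>>>>>> #include <sys/socket.h>   /* socket(), setsockopt() */
>>>>>>>>
>>>>>>>> int sock = socket(AF_INET, SOCK_STREAM, 0);
>>>>>>>> int yes = 1;
>>>>>>>> setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
>>>>>>>> /* then bind() and listen() on port 5005 as usual */
>>>>>>>> )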
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - If I include the launch statement, or just type
>>>>>>>>> system('mpiexec -n 2 ./petsc_poisson_par_barry2 &')
>>>>>>>>> the program never works.
>>>>>>>>
>>>>>>>> Are you sure mpiexec is in the path that system() sees, and that it is
>>>>>>>> the right one? The problem is that we are kind of cheating with system()
>>>>>>>> because we start a new job in the background and have no idea what its
>>>>>>>> output is. Are you using Unix and running Matlab on the command line or
>>>>>>>> in a GUI?
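>>>>>>>>
>>>>>>>> A quick check (just a suggestion) is to run, inside Matlab,
>>>>>>>>
>>>>>>>> system('which mpiexec')
>>>>>>>>
>>>>>>>> and compare the result with the mpiexec your PETSc build was configured
>>>>>>>> with.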
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope you can figure out what is going wrong.
>>>>>>>>>
>>>>>>>>> Ben
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 Sep 2010, at 13:25, Barry Smith wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ben
>>>>>>>>>>
>>>>>>>>>> Ok, I figured out the problem. It is not fundamental and mostly
>>>>>>>>>> comes from not having a good way to debug this.
>>>>>>>>>>
>>>>>>>>>> The test vector you create is sequential, but then you try to view it
>>>>>>>>>> back to Matlab with the parallel fd viewer. If you change it to
>>>>>>>>>> ierr =
>>>>>>>>>> VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,1,&test);CHKERRQ(ierr);
>>>>>>>>>> then the code runs.
>>>>>>>>>>
>>>>>>>>>> I've found (just now) that when I use launch, all the output from the
>>>>>>>>>> .c program gets lost, which makes it impossible to figure out what
>>>>>>>>>> has gone wrong. You can debug by running the two parts of the
>>>>>>>>>> computation in two different windows. So comment out the launch from
>>>>>>>>>> the Matlab script, then in Matlab run the script (it will hang
>>>>>>>>>> waiting for the socket to work), and in a separate terminal window
>>>>>>>>>> run the .c program, for example petscmpiexec -n 2 ./ex1 -info. Now
>>>>>>>>>> you can see exactly what is happening in the PETSc program. You can
>>>>>>>>>> even use -start_in_debugger on the PETSc side to run the debugger on
>>>>>>>>>> crashes.
>>>>>>>>>>
>>>>>>>>>> I'll add this to the docs for launch.
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sep 2, 2010, at 3:28 PM, Benjamin Sanderse wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>>
>>>>>>>>>>> I attached my matlab file, c file and makefile. First I generate
>>>>>>>>>>> the executable with 'make petsc_poisson_par_barry' and then I run
>>>>>>>>>>> test_petsc_par_barry.m.
>>>>>>>>>>> If you change MATMPIAIJ to MATAIJ and VECMPI to VECSEQ the code
>>>>>>>>>>> works fine.
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>>
>>>>>>>>>>> Benjamin
>>>>>>>>>>>
>>>>>>>>>>> <makefile><test_petsc_par_barry.m><petsc_poisson_par_barry.c>
>>>>>>>>>>>
>>>>>>>>>>> On 2 Sep 2010, at 13:45, Barry Smith wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Matlab is never aware the vector is parallel. Please send me the
>>>>>>>>>>>> code and I'll figure out what is going on.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 2, 2010, at 2:07 PM, Benjamin Sanderse wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> That sounds great, but there is one issue I am encountering. I
>>>>>>>>>>>>> switched the vector type to VECMPI and the matrix type to
>>>>>>>>>>>>> MATMPIAIJ, but when running Matlab I get the following error:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Found unrecogonized header 0 in file. If your file contains
>>>>>>>>>>>>> complex numbers
>>>>>>>>>>>>> then call PetscBinaryRead() with "complex" as the second argument
>>>>>>>>>>>>> Error in ==> PetscBinaryRead at 27
>>>>>>>>>>>>> if nargin < 2
>>>>>>>>>>>>>
>>>>>>>>>>>>> ??? Output argument "varargout" (and maybe others) not assigned
>>>>>>>>>>>>> during call to
>>>>>>>>>>>>> "/ufs/sanderse/Software/petsc-3.1-p4/bin/matlab/PetscBinaryRead.m>PetscBinaryRead".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Error in ==> test_petsc_par at 57
>>>>>>>>>>>>> x4 = PetscBinaryReady(PS);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could it be that Matlab does not understand the "parallel" vector
>>>>>>>>>>>>> which is returned by PETSc? Currently I do this with VecView as
>>>>>>>>>>>>> follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> KSPSolve(ksp,b,x);
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> VecView(x,fd);  /* Vec first, then the viewer */
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the help!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ben
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2 Sep 2010, at 10:09, Barry Smith wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 2, 2010, at 10:51 AM, Benjamin Sanderse wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I figured out the coupling with Matlab, and I can send matrices
>>>>>>>>>>>>>>> and vectors back and forth between PETSc and Matlab. Actually, I
>>>>>>>>>>>>>>> send a matrix from Matlab to PETSc only once and then repeatedly
>>>>>>>>>>>>>>> send new right hand sides from Matlab->PETSc and the solution
>>>>>>>>>>>>>>> vector from PETSc->Matlab. That works great.
>>>>>>>>>>>>>>> I now want to see if the matrix that is sent from (serial) Matlab
>>>>>>>>>>>>>>> to PETSc can be stored as a parallel matrix in PETSc, so that
>>>>>>>>>>>>>>> subsequent computations with different right hand sides can be
>>>>>>>>>>>>>>> performed in parallel by PETSc. Does this simply work by using
>>>>>>>>>>>>>>> MatLoad and setting the Mat type to MPIAIJ? Or is something more
>>>>>>>>>>>>>>> fancy required?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In theory this can be done with the same code as the sequential
>>>>>>>>>>>>>> case, only with parallel vectors (VECMPI) and matrices (MATMPIAIJ).
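>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In sketch form (petsc-3.1 calling sequence; fd is your socket
>>>>>>>>>>>>>> viewer, and the error checking is as in your code):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mat A;
>>>>>>>>>>>>>> Vec b;
>>>>>>>>>>>>>> ierr = MatLoad(fd,MATMPIAIJ,&A);CHKERRQ(ierr); /* parallel matrix */
>>>>>>>>>>>>>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);    /* parallel rhs */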
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ben
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>