On Sep 9, 2010, at 2:51 PM, Benjamin Sanderse wrote:

> I have installed Petsc on a remote 64-bits Linux (Fedora). Might there be an 
> issue with the 64-bits?

   Shouldn't matter.

> 
> My own computer is a Macbook Pro, but I am not running Petsc on it, because 
> in the past I have had severe problems with getting mex-files to work under 
> 64-bits Matlab.

   Yes, that is painful; Matlab has never supported the Mac well.

> I could try to install it, though.
> Another option is that I try it on another remote Linux machine, also 
> 64-bits, running Red Hat. 

  You could try this to see if the same problem exists.

> 
> What do you suggest?

   Are you sure it is hanging on the sockets and not on running MPI jobs 
one after another?

   Barry

> 
> 
> Op 9 sep 2010, om 13:10 heeft Barry Smith het volgende geschreven:
> 
>> 
>> What OS are you using?
>> 
>>  On my Apple Mac I made a shell script loop calling petsc_poisson_par_barry2 
>> multiple times and a similar loop in Matlab and started them both off (with 
>> parallel PETSc runs). It runs flawlessly, opening the socket, sending, and 
>> receiving, dozens of times in a row with several processes. I think that 
>> maybe you are using Linux?
>> 
>> 
>>  Barry
>> 
>> 
>> On Sep 8, 2010, at 2:32 PM, Benjamin Sanderse wrote:
>> 
>>> That's also what I thought. I checked once again, and I found out that when 
>>> I use 
>>> 
>>> petscmpiexec -n 1
>>> 
>>> the program works, but if I increase the number of processors to 2 it only 
>>> works once in a while.
>>> 
>>> I attached my test code. It is extremely simple and does nothing else than 
>>> passing a vector to petsc and returning it to matlab.
>>> 
>>> I run it as follows:
>>> 
>>> shell #1
>>> -bash-4.0$ make petsc_poisson_par_barry2
>>> 
>>> shell #2
>>> -bash-4.0$ matlab -nojvm -nodisplay
>>>>> test_petsc_par_barry;
>>> 
>>> shell #1
>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -viewer_socket_port 
>>> 5006 -info
>>> 
>>> On lucky days this works; on unlucky days, PETSc will stop here:
>>> 
>>> [1] PetscInitialize(): PETSc successfully started: number of processors = 2
>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>> [0] PetscInitialize(): PETSc successfully started: number of processors = 2
>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 
>>> max tags = 2147483647
>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 -2080374784 
>>> max tags = 2147483647
>>> [1] PetscCommDuplicate():   returning tag 2147483647
>>> [0] PetscCommDuplicate():   returning tag 2147483647
>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on port 
>>> 5006 machine borr.mas.cwi.nl
>>> [0] PetscCommDuplicate():   returning tag 2147483646
>>> [1] PetscCommDuplicate():   returning tag 2147483646
>>> [1] PetscCommDuplicate():   returning tag 2147483641
>>> [0] PetscCommDuplicate():   returning tag 2147483641
>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>> ^C
>>> 
>>> <makefile>
>>> <test_petsc_par_barry.m>
>>> <petsc_poisson_par_barry2.c>
>>> 
>>> 
>>> 
>>> Op 8 sep 2010, om 12:00 heeft Barry Smith het volgende geschreven:
>>> 
>>>> 
>>>> On Sep 8, 2010, at 10:13 AM, Benjamin Sanderse wrote:
>>>> 
>>>>> Hi Barry,
>>>>> 
>>>>> I am indeed closing the socket in Matlab between the two sets, using 
>>>>> close(PS), where PS=PetscOpenSocket.
>>>>> I have tried different port numbers, but without guarantee of success. 
>>>>> Sometimes it works, sometimes it doesn't. Often the first call to 
>>>>> PetscOpenSocket(portnumber) works, but even that is not guaranteed. 
>>>>> I think there should be another solution.
>>>>> By the way, all these problems do not appear when using serial vectors 
>>>>> instead of parallel.
>>>> 
>>>> That is strange. Only the first process ever opens the socket so in theory 
>>>> the fact that the PETSc code is parallel should not matter at all. Please 
>>>> send me your test code that causes trouble again and I'll see if I can 
>>>> reproduce the problem.
>>>> 
>>>> Barry
>>>> 
>>>>> 
>>>>> Ben
>>>>> 
>>>>> Op 7 sep 2010, om 17:27 heeft Barry Smith het volgende geschreven:
>>>>> 
>>>>>> 
>>>>>> Are you closing the socket on Matlab between the two sets? Just 
>>>>>> checking.
>>>>>> 
>>>>>> You can try running with a different port number each time to see if it 
>>>>>> is related to trying to reuse the port. Run with PetscOpenSocket(5006) 
>>>>>> and the PETSc program with -viewer_socket_port 5006, 
>>>>>> then run both with 5007, then with 5008, etc. Does this work smoothly?
>>>>>> 
>>>>>> Let me know, and then I will tell you the next step to try.
>>>>>> 
>>>>>> Barry
>>>>>> 
>>>>>> On Sep 7, 2010, at 10:53 AM, Benjamin Sanderse wrote:
>>>>>> 
>>>>>>> Hi Barry,
>>>>>>> 
>>>>>>> I am still not too happy with the execution in parallel. I am working 
>>>>>>> under Linux (64 bits) and still using your approach with two command 
>>>>>>> windows (since it gives the best debugging possibility). 
>>>>>>> As I said, sometimes things work, but most of the time they do not. Here is 
>>>>>>> the output of two successive runs:
>>>>>>> 
>>>>>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of processors 
>>>>>>> = 2
>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of processors 
>>>>>>> = 2
>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>> -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>> -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483647
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483647
>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on 
>>>>>>> port 5005 machine borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483646
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483646
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483641
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483641
>>>>>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>>> [1] PetscFinalize(): PetscFinalize() called
>>>>>>> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user 
>>>>>>> MPI_Comm 1140850688
>>>>>>> [1] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
>>>>>>> [1] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
>>>>>>> [1] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user 
>>>>>>> MPI_Comm -2080374784
>>>>>>> [0] PetscFinalize(): PetscFinalize() called
>>>>>>> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user 
>>>>>>> MPI_Comm 1140850688
>>>>>>> [0] PetscCommDestroy(): Deleting PETSc MPI_Comm -2080374784
>>>>>>> [0] Petsc_DelCounter(): Deleting counter data in an MPI_Comm -2080374784
>>>>>>> [0] Petsc_DelComm(): Deleting PETSc communicator imbedded in a user 
>>>>>>> MPI_Comm -2080374784
>>>>>>> 
>>>>>>> 
>>>>>>> -bash-4.0$ netstat | grep 5005
>>>>>>> 
>>>>>>> 
>>>>>>> -bash-4.0$ petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of processors 
>>>>>>> = 2
>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of processors 
>>>>>>> = 2
>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>> -2080374784 max tags = 2147483647
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483647
>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on 
>>>>>>> port 5005 machine borr.mas.cwi.nl
>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>> -2080374784 max tags = 2147483647
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483647
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483646
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483646
>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483641
>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483641
>>>>>>> [0] VecAssemblyBegin_MPI(): Stash has 0 entries, uses 0 mallocs.
>>>>>>> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
>>>>>>> ^C
>>>>>>> -bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
>>>>>>> [0]1:Return code = 0, signaled with Interrupt
>>>>>>> 
>>>>>>> 
>>>>>>> In both cases I first started the Matlab program. I am currently 
>>>>>>> starting Matlab without a GUI, but with a GUI I have the same problems.
>>>>>>> As you can see, in the first case everything works fine, and Petsc 
>>>>>>> finalizes and closes. Matlab gives me the correct output. The second 
>>>>>>> case, run just a couple of seconds later, does not reach PetscFinalize 
>>>>>>> and Matlab does not give the correct output. In between the two cases I 
>>>>>>> checked if port 5005 was in use, and it was not. 
>>>>>>> Do you have any more suggestions on how to get this to work properly?
>>>>>>> 
>>>>>>> Benjamin
>>>>>>> 
>>>>>>> Op 3 sep 2010, om 21:11 heeft Barry Smith het volgende geschreven:
>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sep 3, 2010, at 4:32 PM, Benjamin Sanderse wrote:
>>>>>>>> 
>>>>>>>>> Hi Barry,
>>>>>>>>> 
>>>>>>>>> Thanks for your help! However, there are still some issues left. In 
>>>>>>>>> order to test things, I simplified the program even more and now I am 
>>>>>>>>> just sending a vector back and forth: matlab->petsc->matlab:
>>>>>>>>> 
>>>>>>>>> fd   = PETSC_VIEWER_SOCKET_WORLD;
>>>>>>>>> 
>>>>>>>>> // load rhs vector
>>>>>>>>> ierr = VecLoad(fd,VECMPI,&b);CHKERRQ(ierr);
>>>>>>>>> 
>>>>>>>>> // send to matlab
>>>>>>>>> ierr = VecView(b,fd);CHKERRQ(ierr);
>>>>>>>>> ierr = VecDestroy(b);CHKERRQ(ierr);
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - Your approach with two windows works *sometimes*. I removed the 
>>>>>>>>> 'launch' statement and I executed my program 10 times, the first 2 
>>>>>>>>> times worked, and in all other cases I got this:
>>>>>>>>> 
>>>>>>>>> petscmpiexec -n 2 ./petsc_poisson_par_barry2 -info
>>>>>>>>> [1] PetscInitialize(): PETSc successfully started: number of 
>>>>>>>>> processors = 2
>>>>>>>>> [1] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>>>> [0] PetscInitialize(): PETSc successfully started: number of 
>>>>>>>>> processors = 2
>>>>>>>>> [0] PetscInitialize(): Running on machine: borr.mas.cwi.nl
>>>>>>>>> [0] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>>>> -2080374784 max tags = 2147483647
>>>>>>>>> [1] PetscCommDuplicate(): Duplicating a communicator 1140850688 
>>>>>>>>> -2080374784 max tags = 2147483647
>>>>>>>>> [1] PetscCommDuplicate():   returning tag 2147483647
>>>>>>>>> [0] PetscCommDuplicate():   returning tag 2147483647
>>>>>>>>> [0] PetscViewerSocketSetConnection(): Connecting to socket process on 
>>>>>>>>> port 5005 machine borr.mas.cwi.nl
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying 
>>>>>>>>> again[0] PetscOpenSocket(): Connection refused in attaching socket, 
>>>>>>>>> trying again[0] PetscOpenSocket(): Connection refused in attaching 
>>>>>>>>> socket, trying again
>>>>>>>>> [0] PetscOpenSocket(): Connection refused in attaching socket, trying 
>>>>>>>>> again^C
>>>>>>>>> -bash-4.0$ [0]0:Return code = 0, signaled with Interrupt
>>>>>>>>> [0]1:Return code = 0, signaled with Interrupt
>>>>>>>>> 
>>>>>>>>> Every time I start the program I use close(socket) and clear all in 
>>>>>>>>> Matlab, so the socket from the previous run should not be present 
>>>>>>>>> anymore. It seems that the port gets corrupted after a couple of 
>>>>>>>>> times? Matlab does not respond and I have to kill it and restart it 
>>>>>>>>> manually.
>>>>>>>> 
>>>>>>>> Sometimes when you close a socket connection it does not actually close 
>>>>>>>> for a long time, so trying to open it again fails. When it appears the 
>>>>>>>> socket cannot be used, try netstat | grep 5005 to see if the socket is 
>>>>>>>> still active.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> - If I include the launch statement, or just type
>>>>>>>>> system('mpiexec -n 2 ./petsc_poisson_par_barry2 &')
>>>>>>>>> the program never works. 
>>>>>>>> 
>>>>>>>> Are you sure mpiexec is in the path of system and it is the right one? 
>>>>>>>> The problem is that we are kind of cheating with system because we 
>>>>>>>> start a new job in the background and have no idea what the output is. 
>>>>>>>> Are you using unix and running Matlab on the command line or in a GUI?
>>>>>>>> 
>>>>>>>> Barry
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Hope you can figure out what is going wrong.
>>>>>>>>> 
>>>>>>>>> Ben
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Op 3 sep 2010, om 13:25 heeft Barry Smith het volgende geschreven:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Ben
>>>>>>>>>> 
>>>>>>>>>> Ok, I figured out the problem. It is not fundamental and mostly 
>>>>>>>>>> comes from not having a great way to debug this.
>>>>>>>>>> 
>>>>>>>>>> The test vector you create is sequential, but then you try to view it 
>>>>>>>>>> back to Matlab with the parallel fd viewer. If you change this to 
>>>>>>>>>> ierr = 
>>>>>>>>>> VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,1,&test);CHKERRQ(ierr);
>>>>>>>>>> then the code runs.
>>>>>>>>>> 
>>>>>>>>>> I've found (just now) that when I use launch, all the output from the 
>>>>>>>>>> .c program gets lost, which makes it impossible to figure out what 
>>>>>>>>>> has gone wrong. You can debug by running the two parts of the 
>>>>>>>>>> computation in two different windows. So comment out the launch from 
>>>>>>>>>> the matlab script, then in Matlab run the script (it will hang 
>>>>>>>>>> waiting for the socket to work), and in a separate terminal window 
>>>>>>>>>> run the .c program; for example: petscmpiexec -n 2 ./ex1 -info. Now 
>>>>>>>>>> you can see exactly what is happening in the PETSc program. You can 
>>>>>>>>>> even use -start_in_debugger on the PETSc side to run the debugger on 
>>>>>>>>>> crashes.
>>>>>>>>>> 
>>>>>>>>>> I'll add this to the docs for launch
>>>>>>>>>> 
>>>>>>>>>> Barry
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Sep 2, 2010, at 3:28 PM, Benjamin Sanderse wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Barry,
>>>>>>>>>>> 
>>>>>>>>>>> I attached my matlab file, c file and makefile. First I generate 
>>>>>>>>>>> the executable with 'make petsc_poisson_par_barry' and then I run 
>>>>>>>>>>> test_petsc_par_barry.m. 
>>>>>>>>>>> If you change MATMPIAIJ to MATAIJ and VECMPI to VECSEQ the code 
>>>>>>>>>>> works fine.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>> 
>>>>>>>>>>> Benjamin
>>>>>>>>>>> 
>>>>>>>>>>> <makefile><test_petsc_par_barry.m><petsc_poisson_par_barry.c>
>>>>>>>>>>> 
>>>>>>>>>>> Op 2 sep 2010, om 13:45 heeft Barry Smith het volgende geschreven:
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Matlab is never aware the vector is parallel. Please send me the 
>>>>>>>>>>>> code and I'll figure out what is going on.
>>>>>>>>>>>> 
>>>>>>>>>>>> Barry
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sep 2, 2010, at 2:07 PM, Benjamin Sanderse wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> That sounds great, but there is one issue I am encountering. I 
>>>>>>>>>>>>> switched vector types to VECMPI and matrix type to MATMPIAIJ, but 
>>>>>>>>>>>>> when running Matlab I get the following error:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Found unrecogonized header 0 in file. If your file contains 
>>>>>>>>>>>>> complex numbers
>>>>>>>>>>>>> then call PetscBinaryRead() with "complex" as the second argument
>>>>>>>>>>>>> Error in ==> PetscBinaryRead at 27
>>>>>>>>>>>>> if nargin < 2
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ??? Output argument "varargout" (and maybe others) not assigned 
>>>>>>>>>>>>> during call to 
>>>>>>>>>>>>> "/ufs/sanderse/Software/petsc-3.1-p4/bin/matlab/PetscBinaryRead.m>PetscBinaryRead".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Error in ==> test_petsc_par at 57
>>>>>>>>>>>>>   x4 = PetscBinaryReady(PS);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Could it be that Matlab does not understand the "parallel" vector 
>>>>>>>>>>>>> which is returned by Petsc? Currently I have this done with 
>>>>>>>>>>>>> VecView as follows:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> fd = PETSC_VIEWER_SOCKET_WORLD;
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> KSPSolve(ksp,b,x);
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> VecView(x,fd);
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for the help!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Ben
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Op 2 sep 2010, om 10:09 heeft Barry Smith het volgende geschreven:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Sep 2, 2010, at 10:51 AM, Benjamin Sanderse wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I figured out the coupling with Matlab and I can send back and 
>>>>>>>>>>>>>>> forth matrices and vectors between Petsc and Matlab. Actually, 
>>>>>>>>>>>>>>> I send a matrix from Matlab to Petsc only once and then 
>>>>>>>>>>>>>>> repeatedly send new right hand sides from Matlab->Petsc and the 
>>>>>>>>>>>>>>> solution vector from Petsc->Matlab. That works great.
>>>>>>>>>>>>>>> I now want to see if the matrix that is sent from (serial) 
>>>>>>>>>>>>>>> Matlab to Petsc can be stored as a parallel matrix in Petsc so 
>>>>>>>>>>>>>>> that subsequent computations with different right hand sides 
>>>>>>>>>>>>>>> can be performed in parallel by Petsc. Does this simply work by 
>>>>>>>>>>>>>>> using MatLoad and setting the Mat type to MPIAIJ? Or is something 
>>>>>>>>>>>>>>> more fancy required?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In theory this can be done using the same code as the sequential 
>>>>>>>>>>>>>> case, only with parallel vectors (VECMPI) and matrices (MATMPIAIJ).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Ben
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
