Hi,

Yes, that's better. I've been trying to remove the unnecessary code (there are also some copyright issues with the university) and to present only the code that shows the error. But strangely, after restructuring and removing all the redundant parts, the error is no longer there. I'm now adding my code back piece by piece to see if the error reappears. I'll let you know if there's an update.

Thank you.

Yours sincerely,

TAY wee-beng

On 21/4/2014 8:58 AM, Barry Smith wrote:
    Please send the entire code. If we can run it and reproduce the problem we 
can likely track down the issue much faster than through endless rounds of 
email.

    Barry

On Apr 20, 2014, at 7:49 PM, TAY wee-beng<zon...@gmail.com>  wrote:

On 20/4/2014 8:39 AM, TAY wee-beng wrote:
On 20/4/2014 1:02 AM, Matthew Knepley wrote:
On Sat, Apr 19, 2014 at 10:49 AM, TAY wee-beng<zon...@gmail.com>  wrote:
On 19/4/2014 11:39 PM, Matthew Knepley wrote:
On Sat, Apr 19, 2014 at 10:16 AM, TAY wee-beng<zon...@gmail.com>  wrote:
On 19/4/2014 10:55 PM, Matthew Knepley wrote:
On Sat, Apr 19, 2014 at 9:14 AM, TAY wee-beng<zon...@gmail.com>  wrote:
On 19/4/2014 6:48 PM, Matthew Knepley wrote:
On Sat, Apr 19, 2014 at 4:59 AM, TAY wee-beng<zon...@gmail.com>  wrote:
On 19/4/2014 1:17 PM, Barry Smith wrote:
On Apr 19, 2014, at 12:11 AM, TAY wee-beng<zon...@gmail.com>  wrote:

On 19/4/2014 12:10 PM, Barry Smith wrote:
On Apr 18, 2014, at 9:57 PM, TAY wee-beng<zon...@gmail.com>  wrote:

On 19/4/2014 3:53 AM, Barry Smith wrote:
    Hmm,

        Interface DMDAVecGetArrayF90
          Subroutine DMDAVecGetArrayF903(da1, v,d1,ierr)
            USE_DM_HIDE
            DM_HIDE da1
            VEC_HIDE v
            PetscScalar,pointer :: d1(:,:,:)
            PetscErrorCode ierr
          End Subroutine

     So the d1 is a F90 POINTER. But your subroutine seems to be treating it as 
a “plain old Fortran array”?
real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)
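
The difference is plain Fortran behavior and can be seen without PETSc at all. A minimal standalone sketch (purely illustrative, not taken from the code under discussion): a pointer dummy keeps the original array bounds, while an assumed-shape dummy like the declaration above re-bases them to start at 1.

program bounds_demo
  implicit none
  real(8), pointer :: p(:,:,:)
  allocate(p(0:2, 0:3, 0:4))      ! lower bounds 0, like a DMDA corner at 0
  call as_pointer(p)              ! reports lower bounds 0 0 0
  call as_assumed_shape(p)        ! reports lower bounds 1 1 1 (re-based)
  deallocate(p)
contains
  subroutine as_pointer(u)
    real(8), pointer :: u(:,:,:)   ! pointer dummy: bounds are preserved
    print *, 'pointer dummy lbound:', lbound(u)
  end subroutine as_pointer
  subroutine as_assumed_shape(u)
    real(8) :: u(:,:,:)            ! assumed-shape dummy: lower bounds become 1
    print *, 'assumed-shape dummy lbound:', lbound(u)
  end subroutine as_assumed_shape
end program bounds_demo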
Hi,

So d1 is an F90 pointer, and it makes a difference if I declare it as a "plain old Fortran array"? I declare it as a Fortran array, and it works without any problem if I only call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u".

But if I call DMDAVecGetArrayF90 and DMDAVecRestoreArrayF90 with "u", "v" and "w", errors start to happen. I wonder why...

Also, suppose I call:

call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)

u_array ....

v_array .... etc

Now, when restoring the arrays, does the order in which they are restored matter?
     No it should not matter. If it matters that is a sign that memory has been 
written to incorrectly earlier in the code.

Hi,

Hmm, I have been getting different results with different Intel compilers. I'm not sure if MPI played a part, but I'm only using a single processor. In debug mode, things run without a problem. In optimized mode, in some cases, the code aborts even when doing simple initialization:


call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)

u_array = 0.d0
v_array = 0.d0
w_array = 0.d0
p_array = 0.d0

call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

The code aborts at call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr), giving a segmentation error. However, another version of the Intel compiler passes through this part without error. Since the behavior differs between compilers, is this a bug in PETSc or in the Intel compiler? Or in MVAPICH or Open MPI?

We do this in a bunch of examples. Can you reproduce this different behavior in src/dm/examples/tutorials/ex11f90.F?
Hi Matt,

Do you mean putting the above lines into ex11f90.F and testing?

It already has DMDAVecGetArray(). Just run it.
Hi,

It worked. The differences between my code and the example are the way the Fortran modules are defined, and that ex11f90 only uses global vectors. Does it make a difference whether global or local vectors are used? The way it accesses x1 only touches the local region.

No the global/local difference should not matter.
Also, before using DMDAVecGetArrayF90, DMGetGlobalVector must be called first, is that so? I can't find the equivalent for a local vector, though.

DMGetLocalVector()
Oops, I do not have DMGetLocalVector and DMRestoreLocalVector in my code. Does it matter?

If so, when should I call them?

You just need a local vector from somewhere.
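
A minimal sketch of where DMGetLocalVector()/DMRestoreLocalVector() would typically sit, with hypothetical names (da_u is the DMDA, u_global an existing global vector); the usual PETSc finclude headers, as in ex11f90.F, are assumed to be in scope:

subroutine fill_local_u(da_u, u_global)
  DM da_u
  Vec u_global, u_local
  PetscScalar, pointer :: u_array(:,:,:)
  PetscErrorCode ierr

  ! Borrow a local (ghosted) vector from the DM's pool
  call DMGetLocalVector(da_u, u_local, ierr)
  ! Fill it from the global vector, updating ghost points
  call DMGlobalToLocalBegin(da_u, u_global, INSERT_VALUES, u_local, ierr)
  call DMGlobalToLocalEnd(da_u, u_global, INSERT_VALUES, u_local, ierr)

  call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
  ! ... read/write u_array over the ghosted index range ...
  call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)

  ! Return the local vector to the pool
  call DMRestoreLocalVector(da_u, u_local, ierr)
end subroutine fill_local_u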
Hi,

Can anyone help with the questions below? I'm still trying to find out why my code doesn't work.

Thanks.
Hi,

I inserted the part of my code around the error into ex11f90:

call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecGetArrayF90(da_p,p_local,p_array,ierr)

u_array = 0.d0
v_array = 0.d0
w_array = 0.d0
p_array = 0.d0

call DMDAVecRestoreArrayF90(da_p,p_local,p_array,ierr)
call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

It worked without error. I'm going to change the way the modules are defined in my code.

My code contains a main program and a number of module files, with subroutines inside, e.g.:

module solve
                  <- add include file?
subroutine RRK
                  <- add include file?
end subroutine RRK

end module solve

So where should the include files (#include <finclude/petscdmda.h90>) be placed?

After the module or inside the subroutine?
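
For comparison, one possible arrangement is sketched below (an assumption about placement, not a statement of the required placement), with the includes inside the subroutine that uses the F90 interfaces; the exact finclude set depends on the PETSc version, and ex11f90.F shows a working list:

module solve
  implicit none
contains

  subroutine RRK(da_u, u_local)
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscdm.h>
#include <finclude/petscdmda.h>
#include <finclude/petscdmda.h90>
    ! Hypothetical arguments: a DMDA and a vector obtained from it
    DM da_u
    Vec u_local
    PetscScalar, pointer :: u_array(:,:,:)
    PetscErrorCode ierr

    call DMDAVecGetArrayF90(da_u, u_local, u_array, ierr)
    ! ... work on u_array ...
    call DMDAVecRestoreArrayF90(da_u, u_local, u_array, ierr)
  end subroutine RRK

end module solve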

Thanks.
   Matt
Thanks.
    Matt
Thanks.
    Matt
Thanks

Regards.
    Matt
As in w, then v and u?

call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)
call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

thanks
     Note also that the beginning and end indices of u, v, w are different for each process; see, for example, http://www.mcs.anl.gov/petsc/petsc-3.4/src/dm/examples/tutorials/ex11f90.F (and they do not start at 1). That example shows how to get the loop bounds.
Hi,

In my case, I fixed u, v, w such that their indices are the same. I also checked using DMDAGetCorners and DMDAGetGhostCorners. Now the problem lies in my subroutine treating them as a “plain old Fortran array”.

If I declare them as pointers, do their indices follow the C convention of starting at 0?
     Not really. It is that in each process you need to access them using the indices indicated by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors. So really, C or Fortran doesn’t make any difference.


So my problem now is that in my old MPI code, u(i,j,k) follows the Fortran convention of starting at 1. Is there some way to handle this so that I do not have to change my u(i,j,k) to u(i-1,j-1,k-1)?
    If your code wishes to access them with indices offset by one from the values returned by DMDAGetCorners() for global vectors and DMDAGetGhostCorners() for local vectors, then you need to manually subtract off the 1.

    Barry
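
A minimal sketch of that shift, with hypothetical names (it assumes a 3D DMDA with one degree of freedom and a global vector, hence DMDAGetCorners rather than DMDAGetGhostCorners):

      PetscInt xs, ys, zs, xm, ym, zm
      PetscInt i, j, k
      PetscErrorCode ierr

      call DMDAGetCorners(da_u, xs, ys, zs, xm, ym, zm, ierr)

      ! u_array must be addressed with the 0-based global indices
      ! xs..xs+xm-1, ys..ys+ym-1, zs..zs+zm-1; a 1-based loop therefore
      ! subtracts the 1 at the point of access.
      do k = 1, zm
        do j = 1, ym
          do i = 1, xm
            u_array(xs + i - 1, ys + j - 1, zs + k - 1) = 0.d0
          end do
        end do
      end do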

Thanks.
    Barry

On Apr 18, 2014, at 10:58 AM, TAY wee-beng<zon...@gmail.com>  wrote:

Hi,

I tried to pinpoint the problem. I reduced my job size so that I can run on 1 processor. I tried using valgrind, but perhaps because I'm using the optimized version, it didn't catch the error beyond saying "Segmentation fault (core dumped)".

However, by re-writing my code, I found out a few things:

1. if I write my code this way:

call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)

call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)

call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)

u_array = ....

v_array = ....

w_array = ....

call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)

call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)

call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

The code runs fine.

2. if I write my code this way:

call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)

call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)

call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)

call uvw_array_change(u_array,v_array,w_array) -> this subroutine does the same 
modification as the above.

call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)

call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)

call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr) -> error

where the subroutine is:

subroutine uvw_array_change(u,v,w)

real(8), intent(inout) :: u(:,:,:),v(:,:,:),w(:,:,:)

u ...
v ...
w ...

end subroutine uvw_array_change

The above will give an error at:

call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

3. Same as above, except I change the order of the last 3 lines to:

call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)

call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)

call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)

So they are now in reverse order. Now it works.

4. Same as 2 or 3, except the subroutine is changed to:

subroutine uvw_array_change(u,v,w)

real(8), intent(inout) :: u(start_indices(1):end_indices(1), &
    start_indices(2):end_indices(2),start_indices(3):end_indices(3))

real(8), intent(inout) :: v(start_indices(1):end_indices(1), &
    start_indices(2):end_indices(2),start_indices(3):end_indices(3))

real(8), intent(inout) :: w(start_indices(1):end_indices(1), &
    start_indices(2):end_indices(2),start_indices(3):end_indices(3))

u ...
v ...
w ...

end subroutine uvw_array_change

The start_indices and end_indices are simply there to shift the 0-based C-convention indices to the 1-based Fortran-convention indices. This is necessary in my case because most of my code starts array counting at 1, hence the "trick".

However, now no matter which order the DMDAVecRestoreArrayF90 calls are in (as in 2 or 3), an error occurs at "call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)".

So did I violate something and cause memory corruption with the trick above? I can't think of any way other than the "trick" to continue using the 1-based index convention.

Thank you.

Yours sincerely,

TAY wee-beng

On 15/4/2014 8:00 PM, Barry Smith wrote:
    Try running under valgrind: http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind


On Apr 14, 2014, at 9:47 PM, TAY wee-beng<zon...@gmail.com>  wrote:

Hi Barry,

As I mentioned earlier, the code works fine in PETSc debug mode but fails in 
non-debug mode.

I have attached my code.

Thank you

Yours sincerely,

TAY wee-beng

On 15/4/2014 2:26 AM, Barry Smith wrote:
    Please send the code that creates da_w and the declarations of w_array

    Barry

On Apr 14, 2014, at 9:40 AM, TAY wee-beng <zon...@gmail.com> wrote:


Hi Barry,

I'm not too sure how to do it. I'm running with MPI, so I run:

   mpirun -n 4 ./a.out -start_in_debugger

I got the message below. Before the gdb windows appear (through X11), the program aborts.

I also tried running on another cluster, and it worked. I also tried the current cluster in debug mode, and it worked too.

mpirun -n 4 ./a.out -start_in_debugger
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

    Local host:          n12-76 (PID 20235)
    MPI_COMM_WORLD rank: 2

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[2]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20235 on display 
localhost:50.0 on machine n12-76
[0]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20233 on display 
localhost:50.0 on machine n12-76
[1]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20234 on display 
localhost:50.0 on machine n12-76
[3]PETSC ERROR: PETSC: Attaching gdb to ./a.out of pid 20236 on display 
localhost:50.0 on machine n12-76
[n12-76:20232] 3 more processes have sent help message help-mpi-runtime.txt / 
mpi_init:warn-fork
[n12-76:20232] Set MCA parameter "orte_base_help_aggregate" to 0 to see all 
help / error messages

....

   1
[1]PETSC ERROR: 
------------------------------------------------------------------------
[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[1]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[1]PETSC ERROR: to get more information on the crash.
[1]PETSC ERROR: User provided function() line 0 in unknown directory unknown 
file (null)
[3]PETSC ERROR: 
------------------------------------------------------------------------
[3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[3]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[3]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[3]PETSC ERROR: to get more information on the crash.
[3]PETSC ERROR: User provided function() line 0 in unknown directory unknown 
file (null)

...
Thank you.

Yours sincerely,

TAY wee-beng

On 14/4/2014 9:05 PM, Barry Smith wrote:

    Because I/O doesn’t always get flushed immediately, it may not be hanging at this point. It is better to use the option -start_in_debugger, then type cont in each debugger window; when you think it is “hanging”, do a control-C in each debugger window and type where to see where each process is. You can also look around in the debugger at variables to see why it is “hanging” at that point.

     Barry

    These routines don’t have any parallel communication in them, so they are unlikely to hang.

On Apr 14, 2014, at 6:52 AM, TAY wee-beng <zon...@gmail.com> wrote:



Hi,

My code hangs, and I added MPI_Barrier and print statements to catch the bug. I found that it hangs after printing "7". Is it because I'm doing something wrong? I need to access the u, v, w arrays, so I use DMDAVecGetArrayF90. After access, I use DMDAVecRestoreArrayF90.

          call DMDAVecGetArrayF90(da_u,u_local,u_array,ierr)
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"3"
          call DMDAVecGetArrayF90(da_v,v_local,v_array,ierr)
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"4"
          call DMDAVecGetArrayF90(da_w,w_local,w_array,ierr)
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"5"
          call I_IIB_uv_initial_1st_dm(I_cell_no_u1,I_cell_no_v1,I_cell_no_w1,I_cell_u1,I_cell_v1,I_cell_w1,u_array,v_array,w_array)
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"6"
          call DMDAVecRestoreArrayF90(da_w,w_local,w_array,ierr)  ! must be in reverse order
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"7"
          call DMDAVecRestoreArrayF90(da_v,v_local,v_array,ierr)
          call MPI_Barrier(MPI_COMM_WORLD,ierr);  if (myid==0) print *,"8"
          call DMDAVecRestoreArrayF90(da_u,u_local,u_array,ierr)
--
Thank you.

Yours sincerely,

TAY wee-beng



<code.txt>




--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener


