Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-28 Thread Ye Luo
In Fortran, an array of any rank is stored as a contiguous 1-D sequence, so
mp_sum should be fine with a 3-D array.
What looks strange in your code is the copy: with scalar subscripts you are not
copying what you expect.
How about the following?
output(1:3,1:nbnds,k_pool*pool_id+1:k_pool*pool_id+k_pool) = &
    input(1:3,1:nbnds,1:k_pool)
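
The difference between the two assignments can be seen in a small serial
sketch. The sizes and the pool index below are made-up values for illustration
only and are not taken from the actual run. With scalar subscripts 3 and nbnds,
a single (component, band) element is copied for each k-point; with the
sections 1:3 and 1:nbnds, the whole block is copied.

```fortran
program section_copy
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Illustrative sizes only (assumptions, not values from the thread)
  integer, parameter :: nbnds = 4, k_pool = 2, npool = 3
  integer, parameter :: pool_id = 1          ! pretend we are pool 1 (0-based)
  real(dp) :: input(3, nbnds, k_pool)
  real(dp) :: out_scalar(3, nbnds, k_pool*npool)
  real(dp) :: out_section(3, nbnds, k_pool*npool)
  integer :: lo, hi

  input = 1.0_dp
  out_scalar = 0.0_dp
  out_section = 0.0_dp
  lo = k_pool*pool_id + 1
  hi = k_pool*pool_id + k_pool

  ! Scalar subscripts: copies only element (3, nbnds) for each k-point
  out_scalar(3, nbnds, lo:hi) = input(3, nbnds, 1:k_pool)

  ! Array sections: copies all components and all bands for each k-point
  out_section(1:3, 1:nbnds, lo:hi) = input(1:3, 1:nbnds, 1:k_pool)

  print *, 'elements copied with scalar subscripts:', count(out_scalar /= 0.0_dp)
  print *, 'elements copied with array sections:   ', count(out_section /= 0.0_dp)

  ! Self-check: k_pool elements in the first case, the full block in the second
  if (count(out_scalar /= 0.0_dp) /= k_pool) error stop 'unexpected scalar count'
  if (count(out_section /= 0.0_dp) /= 3*nbnds*k_pool) error stop 'unexpected section count'
end program section_copy
```

The first count equals k_pool and the second equals 3*nbnds*k_pool, so a
reduction over the first array would leave most of the output at zero.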

Ye

===
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory

2016-10-28 12:29 GMT-05:00 Vahid Askarpour :

> Dear QE Users,
>
> I am working on some modifications to the QE-6.0 code using symmetry. When
> I try to combine a 3-D array scattered across nodes, I use the following:
>
> output(3,nbnds,(k_pool*pool_id+1:k_pool*pool_id+k_pool))=
> input(3,nbnds,1:k_pool)
>
> Here, nbnds is the number of bands, k_pool is the number of k points/pool,
> and pool_id is the id of the pool. Here I am assuming that the number of k
> points is divisible by the number of pools.
>
> Then I call mp_sum(output,inter_pool_comm) to put all the segments of
> input across the nodes into one output file.
>
> When I run the modified QE code in parallel, the output file is different
> from the serial run.
>
> Does the QE's mp_sum allow the above operation for a three-D array?
>
> Any hints or suggestions would be greatly appreciated.
>
> Vahid
>
> Vahid Askarpour
> Department of Physics and Atmospheric Science
> Dalhousie University,
> Halifax, NS, Canada
> ___
> Pw_forum mailing list
> Pw_forum@pwscf.org
> http://pwscf.org/mailman/listinfo/pw_forum
>

Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-28 Thread Vahid Askarpour
Hi Ye,

Thank you for your suggestion. I tried it, and the code seg-faulted when I ran
it. I put flags in the code to see where the segmentation fault occurs: it
happens when the code calls mp_sum. It seems that mp_sum may not be able to
handle this reduction.

Cheers,
Vahid


Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-30 Thread Ye Luo
Hi Vahid,
A segfault in mp_sum doesn't necessarily mean the problem is there. You
probably wrote to the output array outside its valid bounds before calling
mp_sum. Check your allocation of output and the copy to make sure they are
correct.
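
One way to do that check is to validate the target slice against the actual
extent of the array before the copy. The following is only a minimal sketch:
the function name slice_fits and the sizes are hypothetical and are not part of
QE or EPW.

```fortran
module slice_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns .true. if k-point slots nbase+1 .. nbase+nks fit inside f_out
  ! (hypothetical helper, not a QE routine)
  logical function slice_fits(f_out, nbase, nks)
    real(dp), intent(in) :: f_out(:,:,:)
    integer, intent(in) :: nbase, nks
    slice_fits = (nbase >= 0) .and. (nks >= 0) .and. &
                 (nbase + nks <= size(f_out, 3))
  end function slice_fits
end module slice_check

program test_slice
  use slice_check
  implicit none
  real(dp), allocatable :: f_out(:,:,:)

  allocate(f_out(3, 4, 47))
  if (.not. allocated(f_out)) error stop 'f_out is not allocated'
  f_out = 0.0_dp

  print *, 'slots 43..47 fit:', slice_fits(f_out, 42, 5)   ! inside the array
  print *, 'slots 43..48 fit:', slice_fits(f_out, 42, 6)   ! one slot too many

  ! Self-check of the helper
  if (.not. slice_fits(f_out, 42, 5)) error stop 'valid slice rejected'
  if (slice_fits(f_out, 42, 6)) error stop 'invalid slice accepted'
  deallocate(f_out)
end program test_slice
```

Compiling with run-time bounds checking (for example -fcheck=bounds with
gfortran) is another way to catch an out-of-range write at the line where it
happens rather than later inside the reduction.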

Ye


===
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory


Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-30 Thread Vahid Askarpour
Hi Ye,

The changes I am making are part of the EPW/QE code, so the parallelization
depends on the QE MPI routines. I have no problem using mp_sum for a 2-D array
such as energy(iband,ik): I can use mp_sum to add the contributions from the
various nodes and print the total, in agreement with the output from the serial
run. As far as the allocation goes, I use the same approach for my 3-D array as
for the 2-D one.

The problem arises when I use the 3-D array with mp_sum. I noticed that in the 
Modules/mp.f90 file, there is a subroutine as follows:

  SUBROUTINE mp_sum_rt( msg, gid )
    IMPLICIT NONE
    REAL (DP), INTENT (INOUT) :: msg(:,:,:)
    INTEGER, INTENT(IN) :: gid
#if defined(__MPI)
    INTEGER :: msglen
    msglen = size(msg)
    CALL reduce_base_real( msglen, msg, gid, -1 )
#endif
  END SUBROUTINE mp_sum_rt


Here msg has three dimensions. There are other subroutines for 1, 2, and 4
dimensions. If I can somehow get EPW to use the above subroutine, it might
work. Adding a statement like:

USE mp,   ONLY : mp_sum_rt

at the beginning of the file does not work as the compilation fails because EPW 
does not see the subroutine. Interestingly, it does see mp_sum which is in the 
same mp.f90 file.

So is there a way that I can get EPW to see mp_sum_rt? This might solve my 
problem.

Since I know that mp_sum works with 2-D arrays, one option is to rewrite my 
files in 2-D, one array for each x,y,z direction. I am trying to avoid this.

Thanks,

Vahid



Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-30 Thread Ye Luo
mp_sum_rt is one of the specific routines behind the generic mp_sum interface;
when you call mp_sum on a 3-D array, it is selected automatically.
USE mp,   ONLY : mp_sum_rt is not necessary.
If you want to check that mp_sum_rt is called implicitly via the interface, you
can print something before and after your mp_sum call and also print
something in mp_sum_rt. You should find the expected printing behaviour.

If you compare mp_sum_rt with the 2-D and 4-D implementations, mp_sum_rm
and mp_sum_r4d, you will find that they all flatten the dimensionality and call
the same 1-D routine, reduce_base_real.
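
The mechanism can be illustrated with a toy module. This is a sketch under
stated assumptions, not the QE source: the names toy_mp, toy_sum, toy_sum_rm,
toy_sum_rt and toy_reduce_base are invented for the example, and the
"reduction" simply doubles the data instead of calling MPI.

```fortran
module toy_mp
  implicit none
  private
  public :: toy_sum, dp      ! the specific routines below stay private
  integer, parameter :: dp = kind(1.0d0)

  ! Generic name: the compiler picks the specific routine from the rank
  interface toy_sum
    module procedure toy_sum_rm, toy_sum_rt
  end interface toy_sum
contains
  ! Stand-in for the 1-D reduction routine: sees the data as a flat buffer
  subroutine toy_reduce_base(n, buf)
    integer, intent(in) :: n
    real(dp), intent(inout) :: buf(n)
    buf = 2.0_dp * buf       ! pretend two identical ranks were summed
  end subroutine toy_reduce_base

  subroutine toy_sum_rm(msg)           ! rank-2 version
    real(dp), intent(inout) :: msg(:,:)
    print *, 'rank-2 routine called'
    call toy_reduce_base(size(msg), msg)
  end subroutine toy_sum_rm

  subroutine toy_sum_rt(msg)           ! rank-3 version
    real(dp), intent(inout) :: msg(:,:,:)
    print *, 'rank-3 routine called'
    call toy_reduce_base(size(msg), msg)
  end subroutine toy_sum_rt
end module toy_mp

program test_generic
  use toy_mp, only : toy_sum, dp
  implicit none
  real(dp) :: a2(2, 3), a3(3, 4, 5)

  a2 = 1.0_dp
  a3 = 1.0_dp
  call toy_sum(a2)    ! resolves to toy_sum_rm
  call toy_sum(a3)    ! resolves to toy_sum_rt

  ! Self-check: both arrays went through the same flat routine
  if (any(a2 /= 2.0_dp) .or. any(a3 /= 2.0_dp)) error stop 'unexpected result'
end program test_generic
```

In this toy module the specific routines are private, so a statement such as
use toy_mp, only : toy_sum_rt would fail to compile while the generic toy_sum
remains usable. Whether mp.f90 hides its specific routines in the same way is
an assumption here, but it would be consistent with the compile error reported
earlier in the thread.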

As I asked earlier, how did you allocate your 3-D arrays input and output?
Are you allocating sufficient size for the output? Did you deallocate it by
mistake? How large is the output array? If its size exceeds 2^31-1 elements,
you might hit the 32-bit integer overflow of the variable msglen.
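
A quick way to rule this out is to compute the element count in 64-bit
arithmetic and compare it with the largest default 32-bit integer. The sketch
below uses the 3 x 4 x 47 test case quoted later in the thread plus one
deliberately oversized, purely hypothetical set of dimensions. For an existing
array, size(msg, kind=int64) gives the same count without overflow.

```fortran
program msglen_check
  use iso_fortran_env, only : int32, int64
  implicit none
  integer(int64), parameter :: limit = int(huge(1_int32), int64)
  integer(int64) :: n_small, n_large

  ! Test case from the thread: 3 x 4 x 47
  n_small = 3_int64 * 4_int64 * 47_int64
  ! Hypothetical oversized case, chosen only to exceed the 32-bit limit
  n_large = 3_int64 * 1000_int64 * 1000000_int64

  print *, 'largest 32-bit integer:', limit
  print *, 'small case elements:   ', n_small, ' fits: ', n_small <= limit
  print *, 'large case elements:   ', n_large, ' fits: ', n_large <= limit

  ! Self-check of the two cases
  if (n_small > limit) error stop 'small case should fit'
  if (n_large <= limit) error stop 'large case should not fit'
end program msglen_check
```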

Ye

===
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory


Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-30 Thread Vahid Askarpour
I allocate an array that needs to be calculated among pools using:

ALLOCATE(A(3,nbnd,nkf))

where nkf is the number of k points per pool, and initialize it as:
A(:,:,:)=0

Once array A is calculated, I allocate array A_tot as:

ALLOCATE(A_tot(3,nbnd,nkf_tot))

where nkf_tot is the total number of k points, and initialize it as:
A_tot(:,:,:)=0

Then I call poolgather1 as below:

#if defined(__MPI)
CALL poolgather1(3,nbnd,nkf_tot,nkf,A,A_tot)
#else
A_tot=A
#endif

I have tried both allocating A_tot outside and inside the #if defined section.

Poolgather1 is similar to other EPW poolgather and QE poolcollect routines and 
is given below:

subroutine poolgather1 (nsize1,nsize2,nkstot,nks, f_in, f_out)
  !
  USE kinds,     ONLY : DP
  USE mp_global, ONLY : my_pool_id, inter_pool_comm, npool, kunit
  USE mp,        ONLY : mp_barrier, mp_bcast, mp_sum
  USE mp_world,  ONLY : mpime
  implicit none
  !
  INTEGER, INTENT (in) :: nsize1
  !! first dimension of vectors f_in and f_out
  INTEGER, INTENT (in) :: nsize2
  !! second dimension of vectors f_in and f_out
  INTEGER, INTENT (in) :: nks
  !! number of k-points per pool
  INTEGER, INTENT (in) :: nkstot
  !! total number of k-points
  REAL (KIND=DP), INTENT (in) :: f_in(nsize1,nsize2,nks)
  ! input ( only for k-points of mypool )
  REAL (KIND=DP), INTENT (out)  :: f_out(nsize1,nsize2,nkstot)
  ! output  ( contains values for all k-point )
  !
#if defined(__MPI)
  INTEGER :: rest, nbase
  ! the rest of the integer division nkstot / npool
  ! the position in the original list
  rest = nkstot / kunit - ( nkstot / kunit / npool ) * npool
  !
  nbase = nks * my_pool_id
  !
  IF ( ( my_pool_id + 1 ) > rest ) nbase = nbase + rest * kunit
  !
  f_out = 0.d0
  f_out(1:nsize1,1:nsize2,(nbase+1):(nbase+nks)) = f_in(1:nsize1,1:nsize2,1:nks)
  !
  ! ... reduce across the pools
  !
  CALL mp_sum(f_out,inter_pool_comm)
  !
#else
  f_out(:,:,:) = f_in(:,:,:)
  !
#endif
  !
  end subroutine poolgather1

For my test case, nsize1=3, nsize2=4, nkstot=47.
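
The offset arithmetic in poolgather1 can be exercised without MPI by looping
over the pools in a single process. The sketch below assumes 8 pools, kunit=1,
and that the first "rest" pools hold one extra k-point; these are assumptions
for the example, not values read from the QE environment. The running sum into
f_tot stands in for mp_sum.

```fortran
program poolgather_sim
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsize1 = 3, nsize2 = 4, nkstot = 47
  integer, parameter :: npool = 8, kunit = 1     ! assumed run configuration
  real(dp) :: f_tot(nsize1, nsize2, nkstot)      ! plays the role of the summed f_out
  real(dp) :: f_out(nsize1, nsize2, nkstot)
  integer :: pool, nks, rest, nbase

  f_tot = 0.0_dp
  rest = nkstot / kunit - (nkstot / kunit / npool) * npool

  do pool = 0, npool - 1
    ! Assumed distribution: the first "rest" pools hold one extra k-point
    nks = kunit * (nkstot / kunit / npool)
    if (pool < rest) nks = nks + kunit

    ! Same offset arithmetic as in poolgather1
    nbase = nks * pool
    if (pool + 1 > rest) nbase = nbase + rest * kunit
    print '(a,i2,a,i3,a,i3)', 'pool ', pool, '  nks = ', nks, '  nbase = ', nbase

    if (nbase + nks > nkstot) error stop 'slice exceeds f_out'

    ! Each pool zero-fills f_out and writes 1.0 into its own slots
    f_out = 0.0_dp
    f_out(:, :, nbase+1:nbase+nks) = 1.0_dp
    f_tot = f_tot + f_out                        ! serial stand-in for mp_sum
  end do

  ! Every slot must have been written by exactly one pool
  if (any(f_tot /= 1.0_dp)) error stop 'gaps or overlaps in the gather'
  print *, 'all', nkstot, 'k-point slots filled exactly once'
end program poolgather_sim
```

Under these assumptions the offsets come out as 0, 6, 12, 18, 24, 30, 36 and
42, and the final check confirms that summing the zero-filled copies
reproduces a gather with no gaps or overlaps.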

Thanks,

Vahid





Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-30 Thread Ye Luo
Is kunit=1?
Are your nkf and nkf_tot the same as the nks and nks_tot provided by the QE
environment?
Could you print nbase for each processor and see if the numbers are the
ones you expect?

Ye

===
Ye Luo, Ph.D.
Leadership Computing Facility
Argonne National Laboratory


Re: [Pw_forum] Gathering 3-d arrays across pools using QE's mp_sum

2016-10-31 Thread Vahid Askarpour
Hi Ye,

I reinstalled the QE/EPW package from scratch, and now the segfault does not
occur. Before, I had only compiled EPW by issuing make in the EPW folder. This
time, I issued “make epw” from the QE folder to rebuild both. Odd!

I did print kunit and nbase. Kunit is 1 and nbase is one of the following (not 
in the order below though) when run on 8 cpus:

0, 6, 12, 18, 24, 30, 36, 42, and 47.

Thank you for all the suggestions you made and best wishes,

Vahid

