Glad it worked!

--Junchao Zhang
On Wed, Dec 6, 2023 at 1:20 PM Sreeram R Venkat <srven...@utexas.edu> wrote:

Thank you for your help. It turned out the problem was that I forgot to
assemble the "x" vector before doing the scatter. It seems to be working now.

Thanks,
Sreeram

On Wed, Dec 6, 2023 at 11:12 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:

Hi, Sreeram,
  I made an example with your approach. It worked fine, as you can see from
the output at the end.

#include <petscvec.h>
int main(int argc, char **argv)
{
  PetscInt    i, j, rstart, rend, n, N, *indices;
  PetscMPIInt size, rank;
  IS          ix;
  VecScatter  vscat;
  Vec         x, y;

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
  PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &size));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  /* Source vector x: 36 entries, all owned by the first 4 ranks (9 each) */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetFromOptions(x));
  PetscCall(PetscObjectSetName((PetscObject)x, "Vec X"));
  n = (rank < 4) ? 9 : 0;
  PetscCall(VecSetSizes(x, n, PETSC_DECIDE));

  PetscCall(VecGetOwnershipRange(x, &rstart, &rend));
  for (i = rstart; i < rend; i++) PetscCall(VecSetValue(x, i, (PetscScalar)i, INSERT_VALUES));
  PetscCall(VecAssemblyBegin(x));
  PetscCall(VecAssemblyEnd(x));
  PetscCall(VecGetSize(x, &N));

  /* Target vector y: same global size, default (even) layout over all ranks */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &y));
  PetscCall(VecSetFromOptions(y));
  PetscCall(PetscObjectSetName((PetscObject)y, "Vec Y"));
  PetscCall(VecSetSizes(y, PETSC_DECIDE, N));

  /* For each local entry of y, list the global index of x it should receive */
  PetscCall(VecGetOwnershipRange(y, &rstart, &rend));
  PetscCall(PetscMalloc1(rend - rstart, &indices));
  for (i = rstart, j = 0; i < rend; i++, j++) indices[j] = rank + size * j;

  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, rend - rstart, indices, PETSC_OWN_POINTER, &ix));
  PetscCall(VecScatterCreate(x, ix, y, NULL, &vscat));

  PetscCall(VecScatterBegin(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));

  PetscCall(ISView(ix, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(VecView(x, PETSC_VIEWER_STDERR_WORLD));
  PetscCall(VecView(y, PETSC_VIEWER_STDERR_WORLD));

  PetscCall(VecScatterDestroy(&vscat));
  PetscCall(ISDestroy(&ix));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(PetscFinalize());
  return 0;
}

$ mpirun -n 12 ./ex100
IS Object: 12 MPI processes
  type: general
[0] Number of indices in set 3
[0] 0 0
[0] 1 12
[0] 2 24
[1] Number of indices in set 3
[1] 0 1
[1] 1 13
[1] 2 25
[2] Number of indices in set 3
[2] 0 2
[2] 1 14
[2] 2 26
[3] Number of indices in set 3
[3] 0 3
[3] 1 15
[3] 2 27
[4] Number of indices in set 3
[4] 0 4
[4] 1 16
[4] 2 28
[5] Number of indices in set 3
[5] 0 5
[5] 1 17
[5] 2 29
[6] Number of indices in set 3
[6] 0 6
[6] 1 18
[6] 2 30
[7] Number of indices in set 3
[7] 0 7
[7] 1 19
[7] 2 31
[8] Number of indices in set 3
[8] 0 8
[8] 1 20
[8] 2 32
[9] Number of indices in set 3
[9] 0 9
[9] 1 21
[9] 2 33
[10] Number of indices in set 3
[10] 0 10
[10] 1 22
[10] 2 34
[11] Number of indices in set 3
[11] 0 11
[11] 1 23
[11] 2 35
Vec Object: Vec X 12 MPI processes
  type: mpi
Process [0]
0.
1.
2.
3.
4.
5.
6.
7.
8.
Process [1]
9.
10.
11.
12.
13.
14.
15.
16.
17.
Process [2]
18.
19.
20.
21.
22.
23.
24.
25.
26.
Process [3]
27.
28.
29.
30.
31.
32.
33.
34.
35.
Process [4]
Process [5]
Process [6]
Process [7]
Process [8]
Process [9]
Process [10]
Process [11]
Vec Object: Vec Y 12 MPI processes
  type: mpi
Process [0]
0.
12.
24.
Process [1]
1.
13.
25.
Process [2]
2.
14.
26.
Process [3]
3.
15.
27.
Process [4]
4.
16.
28.
Process [5]
5.
17.
29.
Process [6]
6.
18.
30.
Process [7]
7.
19.
31.
Process [8]
8.
20.
32.
Process [9]
9.
21.
33.
Process [10]
10.
22.
34.
Process [11]
11.
23.
35.

--Junchao Zhang
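A note on reusing the scatter from the example above: the same vscat can also
move data back from y's layout to x's by running it in reverse. A minimal
sketch, keeping the example's variable names; with SCATTER_REVERSE the two
vector arguments are passed in swapped order relative to the forward scatter:

/* Sketch: reuse vscat from the example above to undo the redistribution,
   scattering y's entries back into x. With SCATTER_REVERSE the source and
   destination arguments are swapped relative to the forward scatter. */
PetscCall(VecScatterBegin(vscat, y, x, INSERT_VALUES, SCATTER_REVERSE));
PetscCall(VecScatterEnd(vscat, y, x, INSERT_VALUES, SCATTER_REVERSE));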
On Tue, Dec 5, 2023 at 10:09 PM Sreeram R Venkat <srven...@utexas.edu> wrote:

Yes, I have an example code at github.com/s769/petsc-test. One thing to note:
when I described the example before, I simplified the actual use case in the
code. Here are the extra details relevant to this code:

- We assume a 2D processor grid, given by the command-line args -proc_rows
  and -proc_cols.
- The total length of the vector is n_m*n_t, where n_m and n_t are given by
  the command-line args -nm and -nt. n_m corresponds to a space index and
  n_t to a time index.
- In the "Start" phase, the vector is divided into n_m blocks, each of size
  n_t (indexed space->time). The blocks are partitioned over the first row of
  processors. For example, if -nm = 4 and -proc_cols = 4, each processor in
  the first row gets one block of size n_t. In general, each processor in the
  first row gets n_m_local blocks of size n_t, where the n_m_locals over the
  first row of processors sum to n_m.
- In the "Finish" phase, the vector is divided into n_t blocks, each of size
  n_m (indexed time->space; this is the reason for the permutation of
  indices). The blocks are partitioned over all processors. Each processor
  gets n_t_local blocks of size n_m, where the n_t_locals over all processors
  sum to n_t.

I think the basic idea is similar to the previous example, but these details
make things a bit more complicated. Please let me know if anything is unclear,
and I can try to explain more.

Thanks for your help,
Sreeram
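For the Start/Finish layout described in the message above, one way to build
the index set is to let each process list, for every local entry of the
time-major target vector y, the global position that entry occupies in the
space-major source vector x. A minimal sketch, assuming x and y have already
been created with the Start- and Finish-phase layouts and that n_m and n_t
hold the global space and time sizes; the variable names are illustrative and
not taken from the linked repository:

/* Sketch only: source indices for a space-major x (entry (m,t) at global
   position m*n_t + t) scattered into a time-major y (entry (m,t) at global
   position t*n_m + m). Assumes x, y, n_m, n_t already exist. */
PetscInt   rstart, rend, j, *indices;
IS         ix;
VecScatter vscat;

PetscCall(VecGetOwnershipRange(y, &rstart, &rend));
PetscCall(PetscMalloc1(rend - rstart, &indices));
for (j = 0; j < rend - rstart; j++) {
  PetscInt t = (rstart + j) / n_m; /* time index of this local entry of y  */
  PetscInt m = (rstart + j) % n_m; /* space index of this local entry of y */
  indices[j] = m * n_t + t;        /* where that entry lives in x          */
}
PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, rend - rstart, indices, PETSC_OWN_POINTER, &ix));
PetscCall(VecScatterCreate(x, ix, y, NULL, &vscat));

As in the working example earlier in the thread, x must be assembled before
the scatter is executed.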
On Tue, Dec 5, 2023 at 9:30 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

I think your approach is correct. Do you have an example code?

--Junchao Zhang

On Tue, Dec 5, 2023 at 5:15 PM Sreeram R Venkat <srven...@utexas.edu> wrote:

Hi, I have a follow-up question on this.

Now I'm trying to do a scatter and permutation of the vector. Under the same
setup as the original example, here are the new Start and Finish states I
want to achieve:

Start                  Finish
Proc | Entries         Proc | Entries
 0   | 0,...,8          0   | 0, 12, 24
 1   | 9,...,17         1   | 1, 13, 25
 2   | 18,...,26        2   | 2, 14, 26
 3   | 27,...,35        3   | 3, 15, 27
 4   | None             4   | 4, 16, 28
 5   | None             5   | 5, 17, 29
 6   | None             6   | 6, 18, 30
 7   | None             7   | 7, 19, 31
 8   | None             8   | 8, 20, 32
 9   | None             9   | 9, 21, 33
10   | None             10  | 10, 22, 34
11   | None             11  | 11, 23, 35

So far, I've tried to use ISCreateGeneral(), with each process giving an idx
array corresponding to the indices it wants (i.e., idx on P0 looks like
[0, 12, 24], on P1 like [1, 13, 25], and so on). Then I use this to create
the VecScatter with VecScatterCreate(x, is, y, NULL, &scatter).

However, when I try to do the scatter, I get some illegal memory access
errors.

Is there something wrong with how I define the index sets?

Thanks,
Sreeram

On Thu, Oct 5, 2023 at 12:57 PM Sreeram R Venkat <srven...@utexas.edu> wrote:

Thank you. This works for me.

Sreeram

On Wed, Oct 4, 2023 at 6:41 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

Hi, Sreeram,
  You can try this code. Since x and y are both MPI vectors, we just need to
say we want to scatter x[0:N] to y[0:N]. With your example on 12 processes,
the 12 index sets would be [0..8], [9..17], [18..26], [27..35], [], ..., [].
Actually, you can split them arbitrarily, say, with 12 index sets [0..17],
[18..35], [], ..., []. PETSc will figure out how to do the communication.

PetscInt   rstart, rend, N;
IS         ix;
VecScatter vscat;
Vec        y;
MPI_Comm   comm;
VecType    type;

PetscObjectGetComm((PetscObject)x, &comm);
VecGetType(x, &type);
VecGetSize(x, &N);
VecGetOwnershipRange(x, &rstart, &rend);

/* y gets PETSc's default (even) layout of the same global size */
VecCreate(comm, &y);
VecSetSizes(y, PETSC_DECIDE, N);
VecSetType(y, type);

/* each rank contributes the contiguous range of x it owns */
ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, &ix);
VecScatterCreate(x, ix, y, ix, &vscat);

--Junchao Zhang

On Wed, Oct 4, 2023 at 6:03 PM Sreeram R Venkat <srven...@utexas.edu> wrote:

Suppose I am running on 12 processors, and I have a vector "v" of size 36
partitioned over the first 4. v still uses PETSC_COMM_WORLD, so it has a
layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like to repartition it
over all 12 processors, so that the layout becomes (3, 3, 3, ..., 3). I've
been trying to use VecScatter to do this, but I'm not sure what IndexSets to
use for the sender and receiver.

The result I am trying to achieve is this:

Assume the vector is v = <0, 1, 2, ..., 35>

Start                  Finish
Proc | Entries         Proc | Entries
 0   | 0,...,8          0   | 0, 1, 2
 1   | 9,...,17         1   | 3, 4, 5
 2   | 18,...,26        2   | 6, 7, 8
 3   | 27,...,35        3   | 9, 10, 11
 4   | None             4   | 12, 13, 14
 5   | None             5   | 15, 16, 17
 6   | None             6   | 18, 19, 20
 7   | None             7   | 21, 22, 23
 8   | None             8   | 24, 25, 26
 9   | None             9   | 27, 28, 29
10   | None             10  | 30, 31, 32
11   | None             11  | 33, 34, 35

Appreciate any help you can provide on this.

Thanks,
Sreeram
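For completeness, a minimal sketch of how the stride-based snippet above
would typically be driven once the scatter has been created; the PetscCall
error checking and the destroy calls are additions for illustration and were
not part of the quoted snippet:

/* Sketch: execute the scatter created by the stride-based snippet above and
   release the objects. Assumes x is assembled and y, ix, vscat were created
   as shown; PetscCall error checking added for illustration. */
PetscCall(VecScatterBegin(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));
PetscCall(VecScatterEnd(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));

PetscCall(VecScatterDestroy(&vscat));
PetscCall(ISDestroy(&ix));
PetscCall(VecDestroy(&y));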