Hi Andre,

I spent some time understanding why Intel fails on your example.
Playing around with your code below, adding print statements,
rearranging code, etc., it appears that ifx has a problem with:

>             associate(row => caf(:, team_number(row_team)))
...
>             col_t: change team(column_team)
>                 associate(cell => row(this_image(row_team)))
>                 cell = team_number(row_team)
>                 if (this_image(row_team) /= 1) row(this_image(row_team))[1, team=row_team] = cell
>                 end associate
>             end team col_t

If I understand correctly, there is a very complicated dependency
of lhs and rhs for the assignment

  if (this_image(row_team) /= 1) row(this_image(row_team))[1, team=row_team] = cell

due to the associates.  Trying to rewrite it gives weird runtime errors.
This is beyond my paygrade.
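
For orientation, the values that team_number(row_team) and
this_image(row_team) should take on each of the 9 images follow directly
from the form team line in the test below. A minimal serial sketch (plain
Fortran, no coarrays; the program name and the variables team, idx and
expected are made up here for illustration) tabulates them and rebuilds the
matrix the test expects:

```fortran
program team_layout_sketch
    implicit none
    integer :: me, team, idx
    integer :: expected(3,3)

    expected = 0
    do me = 1, 9
        team = (me - 1) / 3 + 1       ! team number passed to form team
        idx  = mod(me - 1, 3) + 1     ! new_index: image index inside the row team
        expected(idx, team) = team    ! value image `me` should store into its cell
        print '(3(a,i0))', "image ", me, " -> row team ", team, ", index ", idx
    end do

    ! Same reference matrix as in the test's final check.
    if (all(expected == reshape([1,1,1,2,2,2,3,3,3], [3,3]))) then
        print *, "layout matches the expected matrix"
    else
        print *, "layout does NOT match"
    end if
end program team_layout_sketch
```

This only checks the index arithmetic; it says nothing about how a given
compiler resolves the associate names inside the nested change team blocks.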

Regarding the previous version:

> Anyhow to answer your initial question: The line:
> 
> if (this_image() == 1) caf(:, team_number(row_team))[1, team_number = -1] = row

Looking at it with my MPI knowledge, this does feel unusual.  What would be the
communicator here?  Is there an equivalent to MPI communicators with coarrays?
(And does form team create that communicator, does change team remap the
coarray descriptors suitably, etc.?)  Or am I completely off?

Cheers,
Harald


> Sent: Thursday, 17 July 2025 at 15:42
> From: "Andre Vehreschild" <ve...@gmx.de>
> To: "Harald Anlauf" <anl...@gmx.de>
> CC: fortran@gcc.gnu.org
> Subject: Re: Add: [Bug fortran/121043] [16 Regression] Tests of OpenCoarray fail to pass, works on 15.1.1 20250712
>
> Hi Harald,
> 
> I see that ifx gives an "interesting" result. So what is wrong with the example:
> 
> program test_teams_1
>     use, intrinsic :: iso_fortran_env
>     use oc_assertions_interface, only : assert
> 
>     integer :: caf(3,3)[*] != 42
>     type(team_type) :: row_team, column_team
> 
>     caf = reshape((/(-i, i = 1, 9 )/), [3,3])
>     associate(me => this_image(), np => num_images())
>         call assert(np == 9, "I need exactly 9 images.")
> 
>         ! Form a row team
>         form team((me - 1) / 3 + 1, row_team, new_index=mod(me - 1, 3) + 1)
>         row_t: change team(row_team)
>             associate(row => caf(:, team_number(row_team)))
>             ! Form column teams; each team has only one image
>             form team (this_image(), column_team)
>             col_t: change team(column_team)
>                 associate(cell => row(this_image(row_team)))
>                 print *,this_image(row_team), ": cell", cell, ", team_number(row_team)", team_number(row_team)
>                 cell = team_number(row_team)
>                 if (this_image(row_team) /= 1) row(this_image(row_team))[1, team=row_team] = cell
>                 end associate
>             end team col_t
>             sync team(row_team)
>             print *,this_image(), ": row", row
>             if (this_image() == 1) caf(:, team_number(row_team))[1, team_number = -1] = row
>             end associate
>         end team row_t
>         sync all
>         if (me == 1) then
>             if (all(caf == reshape([1,1,1,2,2,2,3,3,3], [3, 3]))) then
>                 print *, "Test passed."
>             else
>                 print *, "Test failed."
>                 print *, "Expected:", reshape([1,1,1,2,2,2,3,3,3], [3, 3])
>                 print *, "Got     :", caf
>             end if
>         end if
>     end associate
> 
> end program test_teams_1
> 
> I have added some prints and am now using associate instead of the change
> team association, but it still gives the wrong result. What I am trying to do
> is define three teams, each of which owns one row of the caf matrix. Each
> row_team then defines three column teams, each of which owns one entry of the
> row. The example insists on exactly 9 images, i.e., every image has its own
> exclusive entry in the matrix, into which it is supposed to put its row_team's
> number. The images with id 1 in each row_team then have to aggregate the data
> into the global matrix on image 1 of the initial team (denoted by team_number
> -1 as per the standard). For row 1 this works. But then everything gets mixed
> up. I had the same symptom while implementing caf_shmem, when I did not take
> the image offset in the shared memory into account when remapping coarrays
> for the coarray pointer in the associate/change team, but Intel can't be
> making the same mistake. So what is wrong with the code? I don't get it.
> 
> Anyhow to answer your initial question: The line:
> 
> if (this_image() == 1) caf(:, team_number(row_team))[1, team_number = -1] = row
> 
> made me rework, from scratch, how the function call `team_number(row_team)` is
> replaced. Row_team is an opaque pointer and the size of the memory it points
> to is not known. When sending the data from `row` to the caf coarray, a
> helper structure is created in the routine in question. For each non-coindex a
> member in the structure is created. My initial thought was to allow certain
> functions to be executed on the remote image. (Just remember, when on
> MPI/OpenCoarrays, a separate thread is executing the modification of the data
> in the coarray for remote access. In this case the loop over the entries of
> `row` is done on the remote image.) Before I started this fix,
> `team_number(row_team)` would be executed on the remote image. This does not
> work, because the pointer to `row_team` is valid only on the source image. I
> then tried to filter better for functions that are safe to execute on the
> remote image. This is what all this mess with elemental and pure is about. I
> now propose to keep it simple: whenever a function call is encountered in a
> coindexed variable access, the evaluation is done on the source image and the
> result is propagated to the remote image. The attached patch has the
> modification for this.
> 
> Does this explain what I am trying to do?
> 
> Regards,
>       Andre
> 
> On Tue, 15 Jul 2025 18:38:43 +0000
> Harald Anlauf <anl...@gmx.de> wrote:
> 
> > Hi Andre,
> > 
> > Jerry kindly sent me the full path:
> > 
> > https://github.com/sourceryinstitute/OpenCoarrays/blob/vehre/issue-779-form-team/src/tests/unit/teams/test_teams_1.f90
> > 
> > Frankly, you need to provide more details for me.
> > Or maybe someone else can jump in and help me out...
> > 
> > (Note: the above testcase fails with ifx, and it is not clear to me whether
> > it is rather an issue with Intel.)
> > 
> > Harald
> > 
> > 
> > > Sent: Tuesday, 15 July 2025 at 07:42
> > > From: "Andre Vehreschild" <ve...@gmx.de>
> > > To: "Harald Anlauf" <anl...@gmx.de>
> > > CC: fortran@gcc.gnu.org
> > > Subject: Re: Add: [Bug fortran/121043] [16 Regression] Tests of OpenCoarray fail to pass, works on 15.1.1 20250712
> > >
> > > Hi Harald,
> > > 
> > > sorry, it's on this branch
> > > 
> > > https://github.com/sourceryinstitute/OpenCoarrays/tree/vehre/issue-779-form-team
> > > 
> > > only. There is a PR for that, but it's not reviewed yet.
> > > 
> > > - Andre
> > > 
> > > On Mon, 14 Jul 2025 20:35:32 +0200
> > > Harald Anlauf <anl...@gmx.de> wrote:
> > >   
> > > > Andre,
> > > > 
> > > > On 14.07.25 at 18:12, Andre Vehreschild wrote:
> > > > > Hi Jerry, hi Harald,
> > > > > 
> > > > > I am sorry for not responding earlier. I got a small but urgent
> > > > > project in and have to do it first. It is only three days (w/
> > > > > continuation possible), but it pays so it has priority.
> > > > > 
> > > > > As for the issues at hand: Jerry, you probably should have the coarray
> > > > > fix patches from here:
> > > > > https://gcc.gnu.org/pipermail/fortran/2025-July/062470.html
> > > > > on your test branch (both on 15 and 16) to pass the OpenCoarrays tests
> > > > > successfully. Because with them: works for me ;-)
> > > > > 
> > > > > Harald, when you look at OpenCoarrays' test_teams_1.f90.006.original
> > > > > tree dump (src/tests/unit/teams/ directory), it may answer your
> > > > > question why I want to move all function evaluation to the calling
> > > > > side and not to the remote accessor. In this testcase w/o my patch the
> > > > > team_number(team) gets moved to the accessor, but `team` cannot be
> > > > > moved there (opaque pointer on the source side w/o known size or a
> > > > > known way to copy it).
> > > > 
> > > > I downloaded OpenCoarrays-2.10.3 but cannot find the file you mention:
> > > > 
> > > > % ls src/tests/unit/teams/
> > > > CMakeLists.txt        teams_coarray_get_by_ref.f90   teams_coarray_sendget.f90
> > > > get-communicator.F90  teams_coarray_get.f90          teams_send.f90
> > > > sync-team.f90         teams_coarray_send_by_ref.f90  teams_subset.f90
> > > > team-number.f90       teams_coarray_send.f90
> > > > 
> > > > The only file where grep finds the string test_teams1 is teams_send.f90,
> > > > but I don't see any relation to what you are talking about.
> > > > 
> > > > Can you provide a full path / URL and more details?
> > > > 
> > > > Harald
> > > >   
> > > > > When you test OpenCoarrays, Jerry, then please make sure to use
> > > > > separate and clean build directories. The OpenCoarrays build system
> > > > > sometimes does not clean up its artifacts correctly, so that .o files
> > > > > stay around that shouldn't.
> > > > > 
> > > > > I hope to get back to gfortran hacking by the middle of the week, and
> > > > > that these few tips help.
> > > > > 
> > > > > Regards,
> > > > >       Andre    
> > > > 
> > > >   
> > > 
> > > 
> > > -- 
> > > Andre Vehreschild * Email: vehre ad gmx dot de 
> > >   
> 
> 
> -- 
> Andre Vehreschild * Email: vehre ad gmx dot de 
> 
