On Wed, 25 Oct 2023 14:59:07 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>> @merykitty do you have examples for both? Maybe stores to fields already 
>> works. Merging loads and stores may be out of scope. That sounds a little 
>> much like SLP. We can still try to do that in a future RFE. We could even 
>> try to use (masked) vector instructions.
>
> @eme64 I have tried your patch, it seems that there are some limitations:
> 
> - The stores are not merged if the order is not right (e.g `a[2] = 2; a[1] = 
> 1;`)
> - The stores are not merged if they are floating point constants.
> - The stores are not merged if they are consecutive fields in an object. E.g:
> 
> 
>     class Point {
>         int x; int y;
>     }
> 
>     p.x = 1;
>     p.y = 2; // Cannot merge into mov [p.x], 0x200000001
> 
> 
> Regarding the final point, fields may be of different types with different 
> sizes and there may be padding between them. This means that for load-store 
> sequence merges, I think SLP cannot handle these cases.
> 
> Thanks.

@merykitty I just looked at this project again today.

About the limitations: Yes, this is deliberately limited for now. We could make 
it much smarter and create a kind of straight-line-code SLP algorithm that 
could even allow for different element sizes and padding in between (using 
masked loads / stores). Maybe that would be worth attempting.

For now, this is just to satisfy the limited requirements of library folks who 
do not want to see everybody using Unsafe to merge stores.
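
For context, here is a rough sketch of what such a manual merge looks like in 
library code. It uses a byte-array view `VarHandle` rather than Unsafe (my 
substitution, to keep the example on supported API), and the class and method 
names are made up for illustration:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical example class, not taken from the JDK or from this PR.
class ManualStoreMerge {
    // View a byte[] as little-endian ints, so one call writes four bytes.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Four separate byte stores: the pattern the optimization is meant to merge.
    static void putIntBytewise(byte[] a, int offset, int v) {
        a[offset]     = (byte) v;
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
    }

    // One 4-byte store, merged by hand.
    static void putIntMerged(byte[] a, int offset, int v) {
        INT_LE.set(a, offset, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        putIntBytewise(x, 2, 0x12345678);
        putIntMerged(y, 2, 0x12345678);
        // Both variants should leave the same bytes in the array.
        System.out.println(Arrays.equals(x, y));
    }
}
```

The goal would be that library code can keep the plain bytewise form and still 
get the single wide store.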

About field stores: I see that stores to different fields are apparently not in 
a chain, but rather independent:

    static void test3(Point p) {
        p.x = 1;
        p.y = 2;
    }

    40  StoreI  === 28 7 39 21  [[ 16 ]]  @Test$Point+12 *, name=x, idx=4;  Memory: @Test$Point+12 *, name=x, idx=4; !jvms: Test::test3 @ bci:2 (line 36)
    44  StoreI  === 28 7 43 41  [[ 16 ]]  @Test$Point+16 *, name=y, idx=5;  Memory: @Test$Point+16 *, name=y, idx=5; !jvms: Test::test3 @ bci:7 (line 37)


I should be able to allow for that quite easily: the stores can either be in a 
chain or have the same memory state as input.
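
To make that condition concrete, here is a toy model in Java. It is not C2 
code: the `Store` record, its fields, and all method names are hypothetical, 
and it assumes a little-endian target and Java 16+ for records. The numbers 
come from the dump for `test3`, reading the second input of each `StoreI` (7) 
as its memory state.

```java
// Toy model only; none of these types or methods exist in HotSpot.
class AdjacentStoreSketch {
    // id: node number, mem: node number of the memory-state input,
    // offset: byte offset in the object, value: the constant being stored.
    record Store(int id, int mem, int offset, int sizeInBytes, int value) {}

    // True if b starts exactly where a ends.
    static boolean adjacent(Store a, Store b) {
        return a.offset() + a.sizeInBytes() == b.offset();
    }

    // Either b consumes the memory state produced by a (a chain),
    // or both stores consume the same memory state (independent stores).
    static boolean chainedOrSameMemory(Store a, Store b) {
        return b.mem() == a.id() || b.mem() == a.mem();
    }

    // This sketch only handles two 4-byte stores.
    static boolean canMerge(Store a, Store b) {
        return a.sizeInBytes() == 4 && b.sizeInBytes() == 4
                && adjacent(a, b) && chainedOrSameMemory(a, b);
    }

    // Little-endian: the store at the lower address supplies the low 32 bits.
    static long mergedConstant(Store lo, Store hi) {
        return (lo.value() & 0xFFFF_FFFFL) | ((long) hi.value() << 32);
    }

    public static void main(String[] args) {
        Store x = new Store(40, 7, 12, 4, 1); // p.x = 1
        Store y = new Store(44, 7, 16, 4, 2); // p.y = 2
        System.out.println(canMerge(x, y));
        System.out.println("0x" + Long.toHexString(mergedConstant(x, y)));
    }
}
```

With these inputs the merged constant is 0x200000001, which matches the 
constant in Quan Anh's example above.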

@merykitty @cl4es @RogerRiggs @vnkozlov I wonder if you think that the approach 
of this PR is good, and if you have any suggestions about it?

- Is a separate phase ok?
- Is this PR in a sweet spot that reaches the goals of the library folks, but 
is not too complex?
- Would you prefer a more general solution, like a straight-line SLP algorithm, 
that can merge (even vectorize) any load / store sequences, even merge accesses 
with different element sizes and with gaps/padding?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893927494
PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893940205
