Hi Richard!

On 2024-02-20T08:44:35+0100, Richard Biener <[email protected]> wrote:
> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> On 2024-02-19T17:31:20+0100, I wrote:
>> > On 2024-02-19T11:52:55+0100, Richard Biener <[email protected]> wrote:
>> >> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> >>> On 2024-02-16T14:53:04+0100, I wrote:
>> >>> > On 2024-02-16T12:41:06+0000, Andrew Stubbs <[email protected]> wrote:
>> >>> >> On 16/02/2024 12:26, Richard Biener wrote:
>> >>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>> >>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
>> >>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> >>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs <[email protected]> wrote:
>> >>> >>>>>>> I've committed this patch
>> >>> >>>>>>
>> >>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later RDNA3/gfx1100
>> >>> >>>>>> support builds on top of, and that's what I'm currently working on
>> >>> >>>>>> getting proper GCC/GCN target (not offloading) results for.
>> >>> >>>>>>
>> >>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably simple,
>> >>> >>>>>> and hopefully representative for other SLP execution test FAILs
>> >>> >>>>>> (regressions compared to my earlier non-gfx1100 testing).
>> >>> >>>>>>
>> >>> >>>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c --sysroot=install/amdgcn-amdhsa -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -fdump-tree-slp-details -fdump-tree-vect-details -isystem build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper setarch,--addr-no-randomize -fdump-tree-all-all -fdump-ipa-all-all -fdump-rtl-all-all -save-temps -march=gfx1100
>> >>> >>>>>>
>> >>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> >>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'), so I
>> >>> >>>>>> suppose will also exhibit the same failure mode, once again?
>> >>> >>>>>>
>> >>> >>>>>> Compared to '-march=gfx90a', the differences begin in
>> >>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>> >>> >>>>>>
>> >>> >>>>>> Changed like:
>> >>> >>>>>>
>> >>> >>>>>>     @@ -38,10 +38,10 @@ int main ()
>> >>> >>>>>>      #pragma GCC novector
>> >>> >>>>>>        for (i = 1; i < N; i++)
>> >>> >>>>>>          if (a[i] != i%4 + 1)
>> >>> >>>>>>     -      abort ();
>> >>> >>>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>> >>> >>>>>>
>> >>> >>>>>>        if (a[0] != 5)
>> >>> >>>>>>     -    abort ();
>> >>> >>>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>> >>> >>>>>>
>> >>> >>>>>> ..., we see:
>> >>> >>>>>>
>> >>> >>>>>> $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>> >>> >>>>>> 40 5 != 1
>> >>> >>>>>> 41 6 != 2
>> >>> >>>>>> 42 7 != 3
>> >>> >>>>>> 43 8 != 4
>> >>> >>>>>> 44 5 != 1
>> >>> >>>>>> 45 6 != 2
>> >>> >>>>>> 46 7 != 3
>> >>> >>>>>> 47 8 != 4
>> >>> >>>>>>
>> >>> >>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> >>> >>>>>> 'a[i * stride + 0..3] != 0'. So, either some earlier iteration has
>> >>> >>>>>> scribbled zero values over these (vector lane masking issue, perhaps?),
>> >>> >>>>>> or some other code generation issue?
>> >>> >
>> >>> >>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>> >>> >>>> passes for me, on gfx1100.
>> >>> >
>> >>> > That's strange. I've looked at your log file (looks good), and used your
>> >>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>> >>> >
>> >>> > $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
>> >>> > GCN Kernel Aborted
>> >>> > Kernel aborted
>> >>> >
>> >>> > Andrew, later on, please try what happens when you put an unconditional
>> >>> > 'abort' call into a test case?
>> >>>
>> >>> Andrew, any luck with that yet?
>> >>>
>> >>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
>> >>> execution test failure mentioned above (manual compilation and
>> >>> 'gcn-run')?
>> >>
>> >> No, when manually compiling/running the testcase it works fine for me.
>> >
>> > I've updated my GCC master branch sources, but it still fails for me:
>> >
>> >     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c --sysroot=install/amdgcn-amdhsa -isystem build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -save-temps
>> > $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>> > GCN Kernel Aborted
>> > Kernel aborted
>> >
>> > Strange.
>> >
>> > In 'bb-slp-cond-1.tar.xz' I'm attaching the files I've built. Could you
>> > please compare those to yours and try 'gcn-run gfx1030/a.out'?
>>
>> Actually: 'gcn-run gfx1030/a.out' a few times -- our dear friend
>> Nondeterminism seems to be at play here... :-|
>
> What's your set of compile options? I don't manage to get close
> to your gfx1030 assembly when using your preprocessed source ...
>
> I've tried -march=gfx1030 -O[23] [-fno-vect-cost-model]

See the 'xgcc' command line just a few lines above?  ;-)

    -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2

That's what I originally found in 'gcc.log'.

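Regarding the nondeterminism mentioned above: to get an idea of how often it fails, something like the following could be used to run the same executable a number of times and count the failures. This is only a minimal sketch: 'count_failures' is a made-up helper name, and it assumes that 'gcn-run' exits with a non-zero status in the "Kernel aborted" case, which I have not verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of GCC or 'gcn-run'): run a command
# repeatedly and report how many of the runs exited with non-zero status.
# Usage: count_failures N command [args...]
count_failures () {
  runs=$1; shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Discard the program's own output; only the exit status is counted.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$runs runs failed"
}

# Example, with the lock file and paths used earlier in this thread
# (assumes 'gcn-run' signals an aborted kernel via its exit status):
#   count_failures 20 flock /tmp/gcn.lock build-gcc/gcc/gcn-run gfx1030/a.out
```

Wrapping the invocation in 'flock', as in the commented example, keeps the runs serialized with any other GPU testing going on in parallel.
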

Grüße
 Thomas

> Looks like you use -fno-omit-frame-pointer but then I still see
> -mine +yours
>
> - v_readlane_b32 s18, v4, 0
> - v_readlane_b32 s19, v5, 0
> - s_add_u32 s18, s18, s26
> - s_addc_u32 s19, s19, s27
> - v_writelane_b32 v4, s18, 0
> - v_writelane_b32 v5, s19, 0
> - s_mov_b32 s18, s14
> - s_mov_b32 s19, s15
> - s_mov_b32 s22, scc
> - s_add_u32 s18, s18, 4096
> - s_addc_u32 s19, s19, 0
> - s_cmpk_lg_u32 s22, 0
> - v_writelane_b32 v6, s18, 0
> - v_writelane_b32 v7, s19, 0
> - flat_store_dwordx2 v[6:7], v[4:5]
> + v_writelane_b32 v6, s26, 0
> + v_writelane_b32 v7, s27, 0
> + v_add_co_u32 v4, vcc, v6, v4
> + v_add_co_ci_u32 v5, vcc, v7, v5, vcc
>
> and more changes.
>
> Richard.
>
>>
>> Grüße
>> Thomas
>>
>>
>> >> Didn't yet get to try the .exp files
>> >>
>> >> Richard.
>> >>
>> >>>
>> >>> Grüße
>> >>> Thomas
>> >>>
>> >>>
>> >>> >>> I didn't try to run it - 'make check-gcc' fails to use
>> >>> >>> gcn-run for test invocation
>> >>> >
>> >>> > Note that for such individual test cases, invoking the compiler and then
>> >>> > 'gcn-run' manually would seem easiest?
>> >>> >
>> >>> >>> what's the trick to make it do that?
>> >>> >
>> >>> > I can tell you've probably not done much "embedded" or simulator testing
>> >>> > of GCC targets? ;-P
>> >>> >
>> >>> >> There's a config file for nvptx here:
>> >>> >> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
>> >>> >
>> >>> > Yes, and I have pending some updates to that one, to be finished once
>> >>> > I've generally got my testing set up again, to a sufficient degree...
>> >>> >
>> >>> >> You can probably make the obvious adjustments. I think Thomas has a GCN
>> >>> >> version with a few more features.
>> >>> >
>> >>> > Right. I'm attaching my current 'amdgcn-amdhsa-run.exp'.
>> >>> >
>> >>> > I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
>> >>> > (as Andrew also noted privately) -- likewise, at least in part, for
>> >>> > GCC/nvptx, which is where I copied all that from. (Will revise later;
>> >>> > not relevant for this discussion, here.)
>> >>> >
>> >>> > Similar to what I've recently added to libgomp, there is 'flock'ing here,
>> >>> > so that you may use 'make -j[...] check' for (partial) parallelism, but
>> >>> > still all execution testing runs serialized. I found this to greatly
>> >>> > help denoise the test results. (Not ideal, of course, but improving that
>> >>> > is for later, too.)
>> >>> >
>> >>> > You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
>> >>> > that doesn't work like that in your case. (I've no idea what
>> >>> > 'amdgpu_gpu_recover' would do if the GPU is also used for display.) But
>> >>> > this, again, greatly helps denoise test results, at least for the one
>> >>> > system I'm currently testing on.
>> >>> >
>> >>> > I intend to publish proper documentation of all this, later on -- happy
>> >>> > to answer any questions in the meantime.
>> >>> >
>> >>> > If you don't already have a common directory for DejaGnu board files, put
>> >>> > 'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
>> >>> > a 'dejagnu.exp' file next to it:
>> >>> >
>> >>> > lappend boards_dir ~/tmp/amdgcn-amdhsa
>> >>> >
>> >>> > Prepare:
>> >>> >
>> >>> > $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
>> >>> > $ export DEJAGNU
>> >>> > $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
>> >>> > $ export AMDGCN_AMDHSA_RUN
>> >>> > $ # If necessary:
>> >>> > $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
>> >>> > $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
>> >>> > $ export LD_LIBRARY_PATH
>> >>> >
>> >>> > ..., and then run:
>> >>> >
>> >>> > $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'
>> >>> >
>> >>> > Oh, and I saw that on <https://gcc.gnu.org/wiki/Offloading>, Tobias has
>> >>> > recently put into a new "Using the GPU as stand-alone system" section
>> >>> > some similar information. (..., but this should, in my opinion, be on a
>> >>> > different page, as it's explicitly *not* about what we understand as
>> >>> > offloading.)
>> >>> >
>> >>> >> I usually use the CodeSourcery magic stack of scripts for testing
>> >>> >> installed toolchains on remote devices, so I'm not too familiar with
>> >>> >> using Dejagnu directly.
>> >>> >
>> >>> > Tsk... ;'-|
>> >>> >
>> >>> >
>> >>> > Grüße
>> >>> > Thomas
>> >>>
>> >>
>> >> --
>> >> Richard Biener <[email protected]>
>> >> SUSE Software Solutions Germany GmbH,
>> >> Frankenstrasse 146, 90461 Nuernberg, Germany;
>> >> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>>