On Thu, Oct 29, 2020 at 2:46 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Hongtao Liu <crazy...@gmail.com> writes:
> > On Tue, Oct 27, 2020 at 7:13 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> > Hi:
> >> >   For inline asm, there could be an operand like (not (mem:)), it's
> >> > not a valid operand for normal memory constraint.
> >> >   Bootstrap is ok, regression test is ok for make check
> >> > RUNTESTFLAGS="--target_board='unix{-m32,}'"
> >> >
> >> > gcc/ChangeLog
> >> >         PR target/97540
> >> >         * ira.c: (ira_setup_alts): Extract memory from operand only
> >> >         for special memory constraint.
> >> >         * recog.c (asm_operand_ok): Ditto.
> >> >         * lra-constraints.c (process_alt_operands): MEM_P is
> >> >         required for normal memory constraint.
> >> >
> >> > gcc/testsuite/ChangeLog
> >> >         * gcc.target/i386/pr97540.c: New test.
> >>
> >> Sorry to stick my oar in, but I think we should reconsider the
> >> bcst_mem_operand approach.  It seems like these patches (and the
> >> previous one) are fighting against the principle that operands
> >> cannot be arbitrary expressions.
> >>
> >> This kind of thing was attempted long ago (even before my time!)
> >> for SIGN_EXTEND on MIPS.  It ended up causing more problems than
> >> it solved and in the end it had to be taken out.  I'm worried that
> >> we might end up going through the same cycle again.
> >>
> >> Also, this LRA code is extremely performance-sensitive in terms
> >> of compile time: it's often at the top or near the top of the profile.
> >> So adding calls to new functions like extract_mem_from_operand for
> >> a fairly niche case probably isn't a good trade-off.
> >>
> >> I think we should instead find a nice(?) syntax for generating separate
> >> patterns for the two bcst_vector_operand alternatives from a single
> >> .md pattern.  That would fit the existing model much more closely.
> >>
> >
> > We have define_subst for RTL template transformations, but it's not
> > suitable for this case(too many define_subst templates need to
> > be added, and it doesn't show any advantage compared to adding
> > separate bcst patterns.). I don't find other workable existing syntax for 
> > it.
>
> Yeah, I think it would need to be new syntax.  I was wondering if it
> would help if we had somethine like (strawman suggestion):
>
>           (one_of 0
>             [(match_operand:VI_AVX2 1 "vector_operand" "...")
>              (vec_duplicate:VI_AVX2
>                (match_operand:<...> 1 "..." "..."))]
>
> where all instances of (one_of N ...) for a given N are required
> to have the same number of alternatives.
>
> This could be handled in a similar way to define_subst, with the
> one_of being expanded before the main generator routines see it.
>
> But maybe it wouldn't help that much.  E.g. for:
>
> (define_insn "*<plusminus_insn><mode>3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "bcst_vector_operand" "<comm>0,v")
>           (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,vmBr")))]
>
> the vec_duplicate version should only really have one alternative.
It would be guaranteed by it's attribute (set_attr "isa" "noavx,avx"),
since bcst_mem_operand implies avx512f, which of course implies avx,
therefore the first alternative would never be enabled under avx.
> I guess we could handle that by using a:
>
>   (one_of 0
>     [(set_attr "enabled" "*,*")
>      (set_attr "enabled" "0,*")])
>
> or some variant of that that uses a derived attribute.  But it feels
> a bit clunky…
>
> Without that, I guess the only pattern that would benefit directly is:
>
> (define_insn "avx512dq_mul<mode>3<mask_name>"
>   [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v")
>         (mult:VI8_AVX512VL
>           (match_operand:VI8_AVX512VL 1 "bcst_vector_operand" "%v")
>           (match_operand:VI8_AVX512VL 2 "bcst_vector_operand" "vmBr")))]
>
> > So suppose I should revert my former 2 patches and add separate bcst 
> > patterns.
>
> Are there going to more patterns that need bcst_vector_operand,

Almost all AVX512 instructions need corresponding bcst patterns except
for those with 8-bit/16-bit data elements.
And the former patch at [1] is the first step to refactor the existing
bcst patterns. If it works without extra issue(unfortunately not),
I'll convert more patterns to bcst_vector_operand.

> or is the current set complete?
>
> I definitely think we should have a better way of handling this in the
> .md files, and I'd be happy to hack something up on the generator side
> (given that I'm being the awkward one here).  But I guess the answer to
> the question above will decide whether it make things better or not.
>
> FWIW, I think having separate patterns (whether they're produced from
> one .md construct or from several) might better optimisation results.

With proper extending for special memory constraint, they should be equivalent.
At least unit tests under i386 backend would generate embedded
broadcast instruction as before.

> But I guess there's a risk of combinatorial explosion, and the port has
> a lot of patterns as it is.
>
> Thanks,
> Richard


[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555948.html
-- 
BR,
Hongtao

Reply via email to