On 2021-07-12 14:20, Richard Biener wrote:
On Fri, 9 Jul 2021, Segher Boessenkool wrote:

On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> I wonder if there's a way to query the target what modes the doloop
> pattern can handle (not being too familiar with the doloop code).

You can look what modes are allowed for operand 0 of doloop_end,
perhaps? Although that is a define_expand, not a define_insn, so it is
hard to introspect.

> Why do you need to do any checks besides the new type being able to
> represent all IV values?  The original doloop IV will never wrap
> (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> zero ... I suppose the doloop might still do the correct thing here
> but it also still will with a IV with larger type).

The issue comes from U*_MAX (original short MAX), as you said: on which
niter + 1 becomes zero.  And because the step for doloop is -1; then, on
larger type 'zero - 1' will be a very large number on larger type
(e.g. 0xff...ff); but on the original short type 'zero - 1' is a small value
(e.g. "0xff").


doloop_valid_p guarantees it is simple and doesn't wrap.

> I'd have expected sth like
>
>    ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
> (ntype));
>
> thus the decision made using a mode - which is also why I wonder
> if there's a way to query the target for this.  As you say,
> it _may_ be fast, so better check (somehow).


I was also thinking of using hooks like type_for_size/type_for_mode.
    /* Use type in word size may fast.  */
    if (TYPE_PRECISION (ntype) < BITS_PER_WORD
&& Wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype))))
      {
        ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
        base = fold_convert (ntype, base);
      }

As you pointed out, this does not query the mode from targets.
As Segher pointed out "doloop_end" checks unsupported mode, while it seems
not easy to use it in tree-ssa-loop-ivopts.c.
For implementations of doloop_end, tartgets like rs6000/aarch64/ia64 requires Pmode/DImode; while there are other targets that work on other 'mode' (e.g. SI).


In doloop_optimize, there is code:

```
    mode = desc->mode;
.....
    doloop_reg = gen_reg_rtx (mode);
rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);

    word_mode_size = GET_MODE_PRECISION (word_mode);
    word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
    if (! doloop_seq
        && mode != word_mode
/* Before trying mode different from the one in that # of iterations is computed, we must be sure that the number of iterations fits into
           the new mode.  */
        && (word_mode_size >= GET_MODE_PRECISION (mode)
            || wi::leu_p (iterations_max, word_mode_max)))
      {
        if (word_mode_size > GET_MODE_PRECISION (mode))
count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
        else
          count = lowpart_subreg (word_mode, count, mode);
        PUT_MODE (doloop_reg, word_mode);
        doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
      }
    if (! doloop_seq)
      {
        if (dump_file)
          fprintf (dump_file,
                   "Doloop: Target unwilling to use doloop pattern!\n");
        return false;
      }
```
The above code first tries the mode of niter_desc by call targetm.gen_doloop_end
to see if the target can generate doloop insns, if fail, then try to use
'word_mode' against gen_doloop_end.



Almost all targets just use Pmode, but there is no such guarantee I
think, and esp. some targets that do not have machine insns for this
(but want to generate different code for this anyway) can do pretty much
anything.

Maybe using just Pmode here is good enough though?

I think Pmode is a particularly bad choice and I'd prefer word_mode
if we go for any hardcoded mode.  s390x for example seems to handle
both SImode and DImode (but names the helper gen_doloop_si64
for SImode?!).  But indeed it looks like somehow querying doloop_end
is going to be difficult since the expander doesn't have any mode,
so we'd have to actually try emit RTL here.

Instead of using hardcode mode, maybe we could add a hook for targets to return
the preferred mode.


Thanks for those valuable comments!

Jiufu Guo




Richard.

Reply via email to