On 2021-07-12 16:57, Richard Biener wrote:
On Mon, 12 Jul 2021, guojiufu wrote:

On 2021-07-12 14:20, Richard Biener wrote:
> On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>
>> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>> > I wonder if there's a way to query the target what modes the doloop
>> > pattern can handle (not being too familiar with the doloop code).
>>
>> You can look what modes are allowed for operand 0 of doloop_end,
>> perhaps?  Although that is a define_expand, not a define_insn, so it is
>> hard to introspect.
>>
>> > Why do you need to do any checks besides the new type being able to
>> > represent all IV values?  The original doloop IV will never wrap
>> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
>> > zero ... I suppose the doloop might still do the correct thing here
>> > but it also still will with a IV with larger type).

The issue comes from U*_MAX (original short MAX), as you said: on which niter + 1 becomes zero. And because the step for doloop is -1; then, on
larger type 'zero - 1' will be a very large number on larger type
(e.g. 0xff...ff); but on the original short type 'zero - 1' is a small value
(e.g. "0xff").

But for the larger type the small type MAX + 1 fits and does not yield
zero so it should still work exactly as before, no?  Of course you
have to compute the + 1 in the larger type.

You are right, if compute the "+ 1" in the larger type it is ok, as below code:
```
   /* Use type in word size may fast.  */
    if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
      {
        ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
        niter = fold_convert (ntype, niter);
      }

    tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
                             build_int_cst (ntype, 1));


add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL, true);
```
The issue of this is, this code generates more stmt for doloop.xxx:
  _12 = (unsigned int) xx(D);
  _10 = _12 + 4294967295;
  _24 = (long unsigned int) _10;
  doloop.6_8 = _24 + 1;

if use previous patch, "+ 1" on original type, then the stmts will looks like:
  _12 = (unsigned int) xx(D);
  doloop.6_8 = (long unsigned int) _12;

This is the reason for checking
   wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))

>>
>> doloop_valid_p guarantees it is simple and doesn't wrap.
>>
>> > I'd have expected sth like
>> >
>> >    ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
>> > (ntype));
>> >
>> > thus the decision made using a mode - which is also why I wonder
>> > if there's a way to query the target for this.  As you say,
>> > it _may_ be fast, so better check (somehow).


I was also thinking of using hooks like type_for_size/type_for_mode.
    /* Use type in word size may fast.  */
    if (TYPE_PRECISION (ntype) < BITS_PER_WORD
        && Wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
(ntype))))
      {
        ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
        base = fold_convert (ntype, base);
      }

As you pointed out, this does not query the mode from targets.
As Segher pointed out "doloop_end" checks unsupported mode, while it seems
not easy to use it in tree-ssa-loop-ivopts.c.
For implementations of doloop_end, tartgets like rs6000/aarch64/ia64 requires Pmode/DImode; while there are other targets that work on other 'mode' (e.g.
SI).


In doloop_optimize, there is code:

```
    mode = desc->mode;
.....
    doloop_reg = gen_reg_rtx (mode);
    rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg,
start_label);

    word_mode_size = GET_MODE_PRECISION (word_mode);
word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
    if (! doloop_seq
        && mode != word_mode
        /* Before trying mode different from the one in that # of
iterations is
computed, we must be sure that the number of iterations fits into
           the new mode.  */
        && (word_mode_size >= GET_MODE_PRECISION (mode)
            || wi::leu_p (iterations_max, word_mode_max)))
      {
        if (word_mode_size > GET_MODE_PRECISION (mode))
count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
        else
          count = lowpart_subreg (word_mode, count, mode);
        PUT_MODE (doloop_reg, word_mode);
        doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
      }
    if (! doloop_seq)
      {
        if (dump_file)
          fprintf (dump_file,
"Doloop: Target unwilling to use doloop pattern!\n");
        return false;
      }
```
The above code first tries the mode of niter_desc by call
targetm.gen_doloop_end
to see if the target can generate doloop insns, if fail, then try to use
'word_mode' against gen_doloop_end.


>>
>> Almost all targets just use Pmode, but there is no such guarantee I
>> think, and esp. some targets that do not have machine insns for this
>> (but want to generate different code for this anyway) can do pretty much
>> anything.
>>
>> Maybe using just Pmode here is good enough though?
>
> I think Pmode is a particularly bad choice and I'd prefer word_mode
> if we go for any hardcoded mode.  s390x for example seems to handle
> both SImode and DImode (but names the helper gen_doloop_si64
> for SImode?!).  But indeed it looks like somehow querying doloop_end
> is going to be difficult since the expander doesn't have any mode,
> so we'd have to actually try emit RTL here.

Instead of using hardcode mode, maybe we could add a hook for targets to
return
the preferred mode.

That's a possiblity of course.  Like the following which just shows the
default implementation then (pass in current mode, return a more preferred
mode or the mode itself)

enum machine_mode
prefered_doloop_mode (enum machine_mode mode)
{
  return mode;
}

Yes, thanks!

Checking current do_loop_end in targets, in general, when do_loop_end requires SI mode, the target is defining Pmode as SImode and word_mode (from BITS_PER_WORD which defaults from UNITS_PER_WORD) is also defined to align with SI mode. When do_loop_end requires DI mode, the target is defining Pmode as DImode
and word_mode/UNITS_PER_WORD is also defined to align with DI mode.

So, if aggressively, then by default we may just return word_mode.

BR,

Jiufu Guo.



Thanks for those valuable comments!

Jiufu Guo



>
> Richard.


Reply via email to