On Thu, 2016-10-06 at 15:30 +0200, Bernd Schmidt wrote:
> On 10/05/2016 06:15 PM, David Malcolm wrote:
> > +;; MEM[(struct isl_obj *)&obj1] = &isl_obj_map_vtable;
> > +(insn 1045 0 1046 2 (set (reg:SI 480)
> > +        (high:SI (symbol_ref:SI ("isl_obj_map_vtable")
> > +                    [flags 0xc0]
> > +                    <var_decl 0x7fa0363ea240
> > isl_obj_map_vtable>)))
> > +     y.c:12702 -1
> > +     (nil))
> > +(insn 1046 1045 1047 2 (set (reg/f:SI 479)
> > +        (lo_sum:SI (reg:SI 480)
> > +            (symbol_ref:SI ("isl_obj_map_vtable")
> > +               [flags 0xc0]
> > +               <var_decl 0x7fa0363ea240 isl_obj_map_vtable>)))
> > +     y.c:12702 -1
> > +     (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
> > +                             [flags 0xc0]
> > +                             <var_decl 0x7fa0363ea240
> > isl_obj_map_vtable>)
> > +        (nil)))
> 
> I hate saying this, since you've obviously spent a lot of effort, but
> I 
> think we're still at a too early stage to construct testcases. One
> issue 
> that came up while discussing previous patch kits is that the format 
> here is still too verbose, and I'd like to settle on a format first. 

We have to construct testcases in order to see what the testcase format
should look like - it's hard to talk about things without examples.

> There's all manner of things that are pointless in a testcase and
> impede 
> readability:
> 
>   * INSN_UIDs and NEXT/PREV fields (could be constructed from
> scratch,
>     except for labels)

A benefit of keeping the INSN_UIDs is that if you've spent half an hour
single-stepping through RTL modification and know that INSN_UID 1045 is
the one that gets corrupted, you can have that insn have UID 1045 in
the testcase.  Otherwise the UIDs get determined by the parser, and can
change if other insns get inserted.

Explicit UIDs let you refer to specific insns when discussing a test
case.

>   * INSN_CODE_NUMBERs in both textual and number form.
>   * Various (nil) fields

FWIW these (nil) fields signify the end of various expr_list chains.
If we emit them, there are places in the format where it would become
ambiguous as to which field is getting an expr_list.

>   * VAR_DECL addresses.

Here's a revised version of that example
(gcc/testsuite/rtl.dg/aarch64/pr71779.rtl), with the things you mention
removed, and with some indentation to aid readability:

;; { dg-do compile { target aarch64-*-* } }
;; { dg-options "-fsingle-pass=rtl-cse1 -fdump-rtl-cse1" }

(function "fragment"
  (insn-chain
    ;; MEM[(struct isl_obj *)&obj1] = &isl_obj_map_vtable;
    (insn 2 (set (reg:SI 480)
                 (high:SI (symbol_ref:SI ("isl_obj_map_vtable")
                              [flags 0xc0] <var_decl isl_obj_map_vtable>)))
      y.c:12702)
    (insn 2 (set (reg/f:SI 479)
                 (lo_sum:SI (reg:SI 480)
                            (symbol_ref:SI ("isl_obj_map_vtable")
                               [flags 0xc0]
                               <var_decl 0x7fa0363ea240 isl_obj_map_vtable>)))
      y.c:12702
      (expr_list:REG_EQUAL (symbol_ref:SI ("isl_obj_map_vtable")
                            [flags 0xc0]
                            <var_decl isl_obj_map_vtable>)))
    (insn 2 (set (reg:DI 481)
                 (subreg:DI (reg/f:SI 479) 0)) y.c:12702)
    (insn 2 (set (zero_extract:DI (reg/v:DI 191 [ obj1D.17368 ])
                                  (const_int 32 [0x20])
                                  (const_int 0 [0]))
                 (reg:DI 481)) y.c:12702)
    ;; Extra insn, to avoid all of the above from being deleted by DCE
    (insn 2 (set (mem:DI (reg:DI 191) [1 i+0 S4 A32])
                         (const_int 1 [0x1])))

  ) ;; insn-chain
) ;; function

This is with the optional cfg and crtl directives omitted.

Does the above look acceptable?

The "2" values after each "insn" are the basic block indices.  Should
these be omitted also?

My hope here is to that the input format is something that existing gcc
developers will be comfortable reading, which is why I opted for
parsing the existing output - and I'm nervous about moving away too
much from the existing format.  In particular, I want the format to be
something that can be automatically printed and roundtripped, and for
it to be useful to look at when in the debugger.

Some other possible changes: locations could do with a wrapper, and/or
to quote the filename, since otherwise we can't cope with spaces in
filenames).  So instead of:

    (insn 2 (set (reg:DI 481)
                 (subreg:DI (reg/f:SI 479)
0)) y.c:12702)

we could have:

    (insn 2 (set (reg:DI 481)
                 (subreg:DI (reg/f:SI 479) 0)) "y.c":12702)

or:

    (insn 2 (set (reg:DI 481)
                 (subreg:DI (reg/f:SI 479)
0)) (srcloc "y.c" 12702))

Any preferences on these?

> What does the RTL reader actually do with MEM_ATTRs?

It parses them, where it can parse the MEM_EXPR, at least (it can only
handle simple decl names for MEM_EXPR).

> Do these survive 
> the dump/read process?

Yes, for the MEM_EXPR cases above.  So it will successfully parse the
"[1 i+0 S4 A32]" above.

> There was also a testcase where virtual regs still occurred with 
> register number, I think it would be best to disallow this and add
> the 
> ability to parse "(reg:DI virtual-outgoing-args)".

When we discussing pseudos you expressed a need to keep printing the
regno for REGs, so that the dump format is usable in the debugger, for
array indices etc.  Hence I'd prefer to keep printing the regno for all
regnos, including virtuals.

> Also I think I'd prefer testcases submitted separately from the RTL 
> frontend. It seems we have two different frontend approaches here,
> one 
> RTL frontend and one modification for the C frontend - which do you
> prefer?

Is it acceptable to do both?  I'd like to have an actual RTL frontend,
though this punts on handling structs etc, whereas Richi seems to
prefer the modification to the C frontend, since it better supports
structs, aliasing, etc.  At this point, whichever gets me past patch
review...

Dave

Reply via email to