On Tue, Nov 20, 2018 at 7:27 PM Andi Kleen <a...@firstfloor.org> wrote:
>
> On Tue, Nov 20, 2018 at 01:04:19PM +0100, Richard Biener wrote:
> > Since your builtin clobbers memory
>
> Hmm, maybe we could get rid of that, but then how to avoid
> the optimizer moving it around over function calls etc.?
> The instrumentation should still be useful when the program
> crashes, so we don't want to delay logging too much.

You can't avoid moving it, yes.  But it can be even moved now, effectively
detaching it from the "interesting" $pc ranges we have debug location lists for?

> > Maybe even instead pass it a number of bytes so it models how atomics work.
>
> How could that reject float?

Why does it need to reject floats?  Note you are already rejecting floats
in the instrumentation pass.

> Mode seems better for now.
>
> Eventually might support float/double through memory, but not in the
> first version.

Why does movq %xmm0, %rax; ptwrite %rax not work?

>
> > >    NEXT_PASS (pass_tsan_O0);
> > >    NEXT_PASS (pass_sanopt);
> > > +  NEXT_PASS (pass_vartrace);
> >
> > I'd move it one lower, after pass_cleanup_eh.  Further enhancement
> > would make it a
> > RTL pass ...
>
> It's after pass_nrv now.
>
> > So in reality the restriction is on the size of the object, correct?
>
> The instruction accepts 32 or 64bit memory or register.
>
> In principle everything could be logged through this, but i was trying
> to limit the cases to integers and pointers for now to simplify
> the problem.
>
> Right now the backend fails up when something else than 4 or 8 bytes
> is passed.

Fair enough, the instrumentation would need to pad out smaller values
and/or split larger values.  I think 1 and 2 byte values would be interesting
so you can support char and shorts.  Eventually 16byte values for __int128
or vectors.

> >
> > > +{
> > > +  if (!supported_type (t))
> >
> > You handle some nested cases below via recursion,
> > like a.b.c (but not a[i][j]).  But then this check will
> > fire.  I think it would be better to restructure the
> > function to look at the outermost level for whether
> > the op is of supported type, thus we can log it
> > at all and then get all the way down to the base via
> > sth like
> >
> >   if (!supported_type (t))
> >     return false;
> >   enum attrstate s = <some default>;
> >   do
> >     {
> >        s = supported_op (t, s);
> >        if (s == force_off)
> >          return false;
> >     }
> >   while (handled_component_p (t) && (t = TREE_OPERAND (t, 0)))
> >
> > Now t is either an SSA_NAME, a DECL (you fail to handle PARM_DECL
>
> Incoming arguments and returns are handled separately.
>
> > and RESULT_DECL below) or a [TARGET_]MEM_REF.  To get rid
> > of non-pointer indirections do then
> >
> >       t = get_base_address (t);
> >       if (DECL_P (t) && is_local (t))
> >   ....
> >
> > because...
> >
> > > +    return false;
> > > +
> > > +  enum attrstate s = supported_op (t, neutral);
> > > +  if (s == force_off)
> > > +    return false;
> > > +  if (s == force_on)
> > > +    force = true;
> > > +
> > > +  switch (TREE_CODE (t))
> > > +    {
> > > +    case VAR_DECL:
> > > +      if (DECL_ARTIFICIAL (t))
> > > +       return false;
> > > +      if (is_local (t))
> > > +       return true;
> > > +      return s == force_on || force;
> > > +
> > > +    case ARRAY_REF:
> > > +      t = TREE_OPERAND (t, 0);
> > > +      s = supported_op (t, s);
> > > +      if (s == force_off)
> > > +       return false;
> > > +      return supported_type (TREE_TYPE (t));
> >
> > Your supported_type is said to take a DECL.  And you
> > already asked for this type - it's the type of the original t
> > (well, the type of this type given TREE_TYPE (t) is an array type).
> > But you'll reject a[i][j] where the type of this type is an array type as 
> > well.
>
> Just to be clear, after your changes above I only need
> to handle VAR_DECL and SSA_NAME here then, correct?

Yes (and PARM_DECL and RESULT_DECL).

> So one of the reasons I handled ARRAY_REF this way is to
> trace the index as a local if needed. If I can assume
> it was always in a MEM with an own ASSIGN earlier if the local
> was a user visible that wouldn't be needed (and also some other similar
> code elsewhere)

If the index lives in memory it has a corresponding load.  For SSA uses
see my comment about instrumenting them at all (together with the
suggestion on how to handle them in an easier way).

>
> But when I look at a simple test case like vartrace-6
>
> void
> f (void)
> {
>   int i;
>   for (i = 0; i < 10; i++)
>    f2 ();
> }
>
> i appears to be a SSA name only that is referenced everywhere without
> MEM. And if the user wants to track the value of i I would need
> to explicitely handle all these cases. Do I miss something here?

You handle those in different places I think.

> I'm starting to think i should perhaps drop locals support to simplify
> everything? But that might limit usability for debugging somewhat.

I think you confuse "locals" a bit.  In GIMPLE an automatic variable
of type 'int' might be assigned a memory location, then reads
(if not cached in a register) happen via separate memory load statements.
If it has not been assigned a memory location then it lives in a (SSA)
register.

>
> > gsi_insert_* does update_stmt already.  Btw, if you allow any
> > SImode or DImode size value you can use a VIEW_CONVERT_EXPR
>
> Just add them unconditionally?

You can use the simplification machinery to elide it automatically
for example.

   tree tem = gimple_build (&seq, VIEW_CONVERT_EXPR,
                                       uint/ulong-type, val);

gives you a register or value of desired type with a conversion
stmt appended to seq if one is required.

> > > +bool
> > > +instrument_args (function *fun)
> > > +{
> > > +  gimple_stmt_iterator gi;
> > > +  bool changed = false;
> > > +
> > > +  /* Local tracing usually takes care of the argument too, when
> > > +     they are read. This avoids redundant trace instructions.  */
> >
> > But only when instrumenting reads?
>
> Yes will add the check.
>
> >
> > Hmm, but then this requires the target instruction to have a memory operand?
>
> Yes that's right for now. Eventually it will be fixed and x86 would
> benefit too.
>
> > That's going to be unlikely for RISCy cases?  On x86 does it work if
> > combine later does not syntesize a ptwrite with memory operand?
> > I also wonder how this survives RTL CSE since you are basically doing
> >
> >   mem = val;  // orig stmt
> >   val' = mem;
> >   ptwrite (val');
> >
> > that probably means when CSE removes the load there ends up a debug-insn
> > reflecting what you want?
>
> I'll check.
>
> > > +  /* Handle operators in case they read locals.  */
> >
> > Does it make sense at all to instrument SSA "reads"?  You know
> > all SSA vars have a single definition and all locals not in SSA form
> > are represented with memory reads/writes, so ...
>
> Ok but how about the locals in SSA form? Like in the example above.
>
> I would love to simplify all this, but I fear that it would make
> the locals tracking unusable.

I'd instrument SSA variables at the point of their definition.

> >
> > > +  if (gimple_num_ops (gas) > 2)
> > > +    {
> >
> > ... this case looks suprious.  And as said you probably do not want to
> > isnstrument SSA "reads"?  Also not above when instrumenting
> > gimple_assign_rhs1 (gas),
> > so better guard that with gimple_assing_load_p (gas)?
> >
> > > +      tvar = instrument_mem (gi, gimple_assign_rhs2 (gas),
> > > +                             (flag_vartrace & VARTRACE_READS) || force,
> > > +                             gas, "assign load2");
> > > +      if (tvar)
> > > +       {
> > > +         gimple_assign_set_rhs2 (gas, tvar);
> >
> > you still have this stmt adjusting here, why?
>
> I tried removing it, but I had problems during testing (IIRC it was
> definitions used after assignment), so I readded it. It might be me
> doing something bogus elsewhere.

Probably...

> > > +/* Instrument return at statement STMT at GI with FORCE. Return true
> > > +   if changed.  */
> > > +
> > > +bool
> > > +instrument_return (gimple_stmt_iterator *gi, greturn *gret, bool force)
> > > +{
> > > +  tree rval = gimple_return_retval (gret);
> > > +
> > > +  if (!rval)
> > > +    return false;
> > > +  if (DECL_P (rval) && DECL_BY_REFERENCE (rval))
> > > +    rval = build_simple_mem_ref (ssa_default_def (cfun, rval));
> >
> > This looks bogus.  If DECL_BY_REFERENCE is true then rval
> > is of pointer type and thus a register, you shouldn't ever see that
> > returned plainly in gimple_return_retval.
>
> Ok so what to do with the DECL_BY_REFERENCE then?
>
> I think i copied this from some other pass.

The above should be unreachable so just remove it ;)

> >
> > I would guess instrumenting asms() has the chance of disturbing
> > reg-alloc quite a bit...
>
> You want to make it optional with a new argument?

Not sure.

> > > +{
> > > +  basic_block bb;
> > > +  gimple_stmt_iterator gi;
> > > +  bool force = 0;
> > > +
> > > +  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE 
> > > (fun->decl)))
> > > +      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))
> >
> > btw, I wonder whether the vartrace attribute should have an argument like
> > vartrace(locals,reads,...)?
>
> I was hoping this could be delayed until actually needed. It'll need
> some changes because I don't want to do the full parsing for every decl
> all the time, so would need to store a bitmap of options somewhere
> in tree.

Hmm, indeed.  I wonder if you need the function attribute at all then?  Maybe
a negative, no-vartrace is enough?  Maybe even that is not needed?
That is, on types and decls you'd interpret it as if locals tracing is on
then locals of type are traced but otherwise locals of type are not traced?
That is, if I just want to trace variable i and do

 int i __attribute__((vartrace));

then what options do I enable to make that work and how can I avoid
tracing anything else?  Similar for

 typedef int traced_int __attribute_((vartrace));

?  I guess we'd need -fvar-trace=locals to get the described effect for
the type attribute and then -fvar-trace=locals,all to have _all_ locals
traced?  Or -fvar-trace=locals,only-marked? ... or forgo with the idea
of marking types?

> > > +
> > > +  FOR_EACH_BB_FN (bb, fun)
> > > +    for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
> > > +      {
> > > +       gimple *stmt = gsi_stmt (gi);
> >
> > Not sure if I suggested it during the first review but there's
> >
> >   walk_stmt_load_store_ops ()
> >
> > which lets you walk (via callbacks) all memory loads and stores in a stmt
> > (also loads of non-SSA registers).  Then there's
> >
> >   FOR_EACH_SSA_DEF_OPERAND ()
> >
> > or alternatively FOR_EACH_SSA_TREE_OPERAND () in case you are
> > also interested in uses.  For SSA uses you are currently missing
> > indexes in ARRAY_REFs and friends.  But as said I think you really
> > want to avoid instrumenting SSA uses.
>
> I would love too, but would the example above still trace "i" ?

Yes.  Or, well...  at -O0 you instrument sth like

f ()
{
  int i;

  <bb 2> :
  i_3 = 0;
  goto <bb 4>; [INV]

  <bb 3> :
  f2 ();
  i_6 = i_1 + 1;

  <bb 4> :
  # i_1 = PHI <i_3(2), i_6(3)>
  if (i_1 <= 9)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 5> :
  return;

if you instrument at SSA definition sites this would become

f ()
{
  int i;

  <bb 2> :
  i_3 = 0;
  trace (0);
  goto <bb 4>; [INV]

  <bb 3> :
  f2 ();
  i_6 = i_1 + 1;
  trace (i_6);

  <bb 4> :
  # i_1 = PHI <i_3(2), i_6(3)>
  trace (i_1);
  if (i_1 <= 9)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 5> :
  return;

of course when optimizing you'll instead see

f ()
{
  unsigned int ivtmp_2;
  unsigned int ivtmp_10;

  <bb 2> [local count: 97603132]:

  <bb 3> [local count: 976138693]:
  # ivtmp_10 = PHI <10(2), ivtmp_2(3)>
  f2 ();
  ivtmp_2 = ivtmp_10 + 4294967295;
  if (ivtmp_2 != 0)
    goto <bb 3>; [90.91%]
  else
    goto <bb 4>; [9.09%]

  <bb 4> [local count: 97603132]:
  return;

so there's no 'i' anymore.  If you enable debug-info you
can get it back via looking at DEBUG_INSNs:

  <bb 2> [local count: 97603132]:
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG i => 0
  # DEBUG i => 0

  <bb 3> [local count: 976138693]:
  # ivtmp_10 = PHI <10(2), ivtmp_2(3)>
  # DEBUG i => (int) (10 - ivtmp_10)
  # DEBUG BEGIN_STMT
  f2 ();
  # DEBUG D#1 => (int) (11 - ivtmp_10)
  # DEBUG i => D#1
  # DEBUG i => D#1
  ivtmp_2 = ivtmp_10 + 4294967295;
  if (ivtmp_2 != 0)
    goto <bb 3>; [90.91%]
  else
    goto <bb 4>; [9.09%]

  <bb 4> [local count: 97603132]:
  return;

but that's going to be a bit more tricky.  That is,
instrumenting memory is easy, instrumenting
register "reads"/"writes" is going to be a bit
tricky, at least when looking at optimized code
and instrumenting late.  (but ptwrite is only
interesting for optimized code, no?)

Richard.

>
> -Andi
>

Reply via email to