Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-26 Thread Richard Henderson
On 09/26/2017 09:31 AM, Lluís Vilanova wrote:
> Lluís Vilanova writes:
> 
>> Richard Henderson writes:
>>> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>>>> Richard Henderson writes:
>>>> 
>>>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>>>> TCG BBLs and instructions have multiple exit points from where to raise
>>>>>> tracing events, but some of the necessary information in the generic
>>>>>> disassembly infrastructure is not available until after generating these
>>>>>> exit points.
>>>>>> 
>>>>>> This patch adds support for "inline points" (where the tracing code will
>>>>>> be placed), and "inline regions" (which identify the TCG code that must
>>>>>> be inlined). The TCG compiler will basically copy each inline region to
>>>>>> any inline points that reference it.
>>>> 
>>>>> I am not keen on this.
>>>> 
>>>>> Is there a reason you can't just emit the tracing code at the appropriate
>>>>> place to begin with?  Perhaps I have to wait to see how this is used...
>>>> 
>>>> As I tried to briefly explain in the next patch, the main problem without
>>>> inlining is that we will see guest_tb_after_trans twice in the trace for
>>>> each TB of a guest conditional instruction, since such TBs have two exit
>>>> points (which we capture when emitting goto_tb in TCG).
> 
>>> Without seeing the code, I suspect this is because you didn't examine the
>>> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been 
>>> emitted
>>> and avoid logging twice.
> 
>> The generated tracing code for 'guest_*_after' must be right before the
>> "goto_tb" opcode at the end of a TB (AFAIU generated by
>> tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a
>> guest conditional jump.
> 
>> If we couple this with the semantics of the trace_*_tcg functions (trace the
>> event at translation time, and generate TCG code to trace the event at
>> execution time), we get the case I described (we don't want to call
>> trace_tb_after_tcg() or trace_insn_after_tcg() twice for the same TB or
>> instruction).
> 
>> That is, unless I've missed something.
> 
> 
>> The only alternative I can think of is changing tracetool to offer an
>> additional API that provides separate functions for translation-time
>> tracing and execution-time generation. So from this:
> 
>>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       trace_event_trans(cpu, ...);
>>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>>           gen_helper_trace_event_exec(env, ...);
>>       }
>>   }
> 
>> We can extend it into this:
> 
>>   static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>>           gen_helper_trace_event_exec(env, ...);
>>       }
>>   }
>>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>>   {
>>       trace_event_trans(cpu, ...);
>>       gen_trace_event_exec(cpu, env, ...);
>>   }
> 
> Richard, do you prefer to keep the "TCG inline" feature or switch the internal
> tracing API to this second approach?

I don't think I fully understand what you're proposing.  The example
transformation above is merely syntactic and has no functional change.

As previously stated, I'm not keen on the "tcg inline" approach.  I would
prefer that you hook into tcg_gen_{exit_tb,goto_tb,goto_ptr} functions within
tcg/tcg-op.c to log transitions between TBs.


r~



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-26 Thread Lluís Vilanova
Lluís Vilanova writes:

> Richard Henderson writes:
>> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>>> Richard Henderson writes:
>>> 
>>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>>> TCG BBLs and instructions have multiple exit points from where to raise
>>>>> tracing events, but some of the necessary information in the generic
>>>>> disassembly infrastructure is not available until after generating these
>>>>> exit points.
>>>>> 
>>>>> This patch adds support for "inline points" (where the tracing code will
>>>>> be placed), and "inline regions" (which identify the TCG code that must
>>>>> be inlined). The TCG compiler will basically copy each inline region to
>>>>> any inline points that reference it.
>>> 
>>>> I am not keen on this.
>>> 
>>>> Is there a reason you can't just emit the tracing code at the appropriate
>>>> place to begin with?  Perhaps I have to wait to see how this is used...
>>> 
>>> As I tried to briefly explain in the next patch, the main problem without
>>> inlining is that we will see guest_tb_after_trans twice in the trace for
>>> each TB of a guest conditional instruction, since such TBs have two exit
>>> points (which we capture when emitting goto_tb in TCG).

>> Without seeing the code, I suspect this is because you didn't examine the
>> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been
>> emitted and avoid logging twice.

> The generated tracing code for 'guest_*_after' must be right before the
> "goto_tb" opcode at the end of a TB (AFAIU generated by
> tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a guest
> conditional jump.

> If we couple this with the semantics of the trace_*_tcg functions (trace the
> event at translation time, and generate TCG code to trace the event at
> execution time), we get the case I described (we don't want to call
> trace_tb_after_tcg() or trace_insn_after_tcg() twice for the same TB or
> instruction).

> That is, unless I've missed something.


> The only alternative I can think of is changing tracetool to offer an
> additional API that provides separate functions for translation-time
> tracing and execution-time generation. So from this:

>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>   {
>       trace_event_trans(cpu, ...);
>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>           gen_helper_trace_event_exec(env, ...);
>       }
>   }

> We can extend it into this:

>   static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
>   {
>       if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
>           gen_helper_trace_event_exec(env, ...);
>       }
>   }
>   static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
>   {
>       trace_event_trans(cpu, ...);
>       gen_trace_event_exec(cpu, env, ...);
>   }

Richard, do you prefer to keep the "TCG inline" feature or switch the internal
tracing API to this second approach?


Thanks,
  Lluis



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-15 Thread Lluís Vilanova
Richard Henderson writes:

> On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
>> Richard Henderson writes:
>> 
>>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>>> TCG BBLs and instructions have multiple exit points from where to raise
>>>> tracing events, but some of the necessary information in the generic
>>>> disassembly infrastructure is not available until after generating these
>>>> exit points.
>>>> 
>>>> This patch adds support for "inline points" (where the tracing code will
>>>> be placed), and "inline regions" (which identify the TCG code that must
>>>> be inlined). The TCG compiler will basically copy each inline region to
>>>> any inline points that reference it.
>> 
>>> I am not keen on this.
>> 
>>> Is there a reason you can't just emit the tracing code at the appropriate
>>> place to begin with?  Perhaps I have to wait to see how this is used...
>> 
>> As I tried to briefly explain in the next patch, the main problem without
>> inlining is that we will see guest_tb_after_trans twice in the trace for
>> each TB of a guest conditional instruction, since such TBs have two exit
>> points (which we capture when emitting goto_tb in TCG).

> Without seeing the code, I suspect this is because you didn't examine the
> argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
> and avoid logging twice.

The generated tracing code for 'guest_*_after' must be right before the
"goto_tb" opcode at the end of a TB (AFAIU generated by
tcg_gen_lookup_and_goto_ptr()), and we have two of those when decoding a guest
conditional jump.

If we couple this with the semantics of the trace_*_tcg functions (trace the
event at translation time, and generate TCG code to trace the event at execution
time), we get the case I described (we don't want to call trace_tb_after_tcg()
or trace_insn_after_tcg() twice for the same TB or instruction).

That is, unless I've missed something.


The only alternative I can think of is changing tracetool to offer an additional
API that provides separate functions for translation-time tracing and
execution-time generation. So from this:

  static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
  {
      trace_event_trans(cpu, ...);
      if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
          gen_helper_trace_event_exec(env, ...);
      }
  }

We can extend it into this:

  static inline void gen_trace_event_exec(CPUState *cpu, TCGv_env env, ...)
  {
      if (trace_event_get_vcpu_state(cpu, EVENT_EXEC)) {
          gen_helper_trace_event_exec(env, ...);
      }
  }
  static inline void trace_event_tcg(CPUState *cpu, TCGv_env env, ...)
  {
      trace_event_trans(cpu, ...);
      gen_trace_event_exec(cpu, env, ...);
  }


Cheers,
  Lluis



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-14 Thread Richard Henderson
On 09/14/2017 08:20 AM, Lluís Vilanova wrote:
> Richard Henderson writes:
> 
>> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>>> TCG BBLs and instructions have multiple exit points from where to raise
>>> tracing events, but some of the necessary information in the generic
>>> disassembly infrastructure is not available until after generating these
>>> exit points.
>>>
>>> This patch adds support for "inline points" (where the tracing code will
>>> be placed), and "inline regions" (which identify the TCG code that must
>>> be inlined). The TCG compiler will basically copy each inline region to
>>> any inline points that reference it.
> 
>> I am not keen on this.
> 
>> Is there a reason you can't just emit the tracing code at the appropriate
>> place to begin with?  Perhaps I have to wait to see how this is used...
> 
> As I tried to briefly explain in the next patch, the main problem without
> inlining is that we will see guest_tb_after_trans twice in the trace for
> each TB of a guest conditional instruction, since such TBs have two exit
> points (which we capture when emitting goto_tb in TCG).

Without seeing the code, I suspect this is because you didn't examine the
argument to tcg_gen_exit_tb.  You can tell when goto_tb must have been emitted
and avoid logging twice.


r~



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-14 Thread Lluís Vilanova
Richard Henderson writes:

> On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
>> TCG BBLs and instructions have multiple exit points from where to raise
>> tracing events, but some of the necessary information in the generic
>> disassembly infrastructure is not available until after generating these
>> exit points.
>> 
>> This patch adds support for "inline points" (where the tracing code will
>> be placed), and "inline regions" (which identify the TCG code that must
>> be inlined). The TCG compiler will basically copy each inline region to
>> any inline points that reference it.

> I am not keen on this.

> Is there a reason you can't just emit the tracing code at the appropriate
> place to begin with?  Perhaps I have to wait to see how this is used...

As I tried to briefly explain in the next patch, the main problem without
inlining is that we will see guest_tb_after_trans twice in the trace for each
TB of a guest conditional instruction, since such TBs have two exit points
(which we capture when emitting goto_tb in TCG).

We cannot instead emit it only once by overloading the brcond opcode in TCG,
since that can be used internally in the guest instruction emulation without
necessarily ending a TB (or we could have more than one brcond for a single
instruction).

I hope it's clearer now.


Thanks,
  Lluis



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
> TCG BBLs and instructions have multiple exit points from where to raise
> tracing events, but some of the necessary information in the generic
> disassembly infrastructure is not available until after generating these
> exit points.
> 
> This patch adds support for "inline points" (where the tracing code will
> be placed), and "inline regions" (which identify the TCG code that must
> be inlined). The TCG compiler will basically copy each inline region to
> any inline points that reference it.

I am not keen on this.

Is there a reason you can't just emit the tracing code at the appropriate place
to begin with?  Perhaps I have to wait to see how this is used...


r~



[Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-10 Thread Lluís Vilanova
TCG BBLs and instructions have multiple exit points from where to raise
tracing events, but some of the necessary information in the generic
disassembly infrastructure is not available until after generating these
exit points.

This patch adds support for "inline points" (where the tracing code will
be placed), and "inline regions" (which identify the TCG code that must
be inlined). The TCG compiler will basically copy each inline region to
any inline points that reference it.

Signed-off-by: Lluís Vilanova 
---
 include/qemu/log.h      |    1 +
 include/qemu/typedefs.h |    1 +
 tcg/tcg-op.h            |   39 +++
 tcg/tcg-opc.h           |    3 +
 tcg/tcg.c               |  166 +++
 tcg/tcg.h               |   18 +
 util/log.c              |    2 +
 7 files changed, 230 insertions(+)

diff --git a/include/qemu/log.h b/include/qemu/log.h
index a50e994c21..23acc63c73 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -43,6 +43,7 @@ static inline bool qemu_log_separate(void)
 #define CPU_LOG_PAGE   (1 << 14)
 #define LOG_TRACE  (1 << 15)
 #define CPU_LOG_TB_OP_IND  (1 << 16)
+#define CPU_LOG_TB_OP_INLINE (1 << 17)
 
 /* Returns true if a bit is set in the current loglevel mask
  */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 39bc8351a3..2fb5670af3 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -96,6 +96,7 @@ typedef struct SerialState SerialState;
 typedef struct SHPCDevice SHPCDevice;
 typedef struct SMBusDevice SMBusDevice;
 typedef struct SSIBus SSIBus;
+typedef struct TCGInlineLabel TCGInlineLabel;
 typedef struct uWireSlave uWireSlave;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct Visitor Visitor;
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5d3278f243..da3784f8f2 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -326,6 +326,45 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg);
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg);
 
+static inline int _get_inline_index(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    return l - s->inline_labels;
+}
+
+static inline void gen_set_inline_point(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    TCGInlinePoint *p = tcg_malloc(sizeof(TCGInlinePoint));
+    p->op_idx = s->gen_next_op_idx;
+    p->next_point = l->first_point;
+    l->first_point = p;
+    tcg_gen_op1i(INDEX_op_set_inline_point,
+                 _get_inline_index(l));
+}
+
+static inline void gen_set_inline_region_begin(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    if (l->begin_op_idx != -1) {
+        tcg_abort();
+    }
+    l->begin_op_idx = s->gen_next_op_idx;
+    tcg_gen_op1i(INDEX_op_set_inline_region_begin,
+                 _get_inline_index(l));
+}
+
+static inline void gen_set_inline_region_end(TCGInlineLabel *l)
+{
+    TCGContext *s = &tcg_ctx;
+    if (l->begin_op_idx == -1) {
+        tcg_abort();
+    }
+    l->end_op_idx = s->gen_next_op_idx;
+    tcg_gen_op1i(INDEX_op_set_inline_region_end,
+                 _get_inline_index(l));
+}
+
 static inline void tcg_gen_discard_i32(TCGv_i32 arg)
 {
     tcg_gen_op1_i32(INDEX_op_discard, arg);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 956fb1e9f3..279ac0dc1f 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -29,6 +29,9 @@
 /* predefined ops */
 DEF(discard, 1, 0, 0, TCG_OPF_NOT_PRESENT)
 DEF(set_label, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_NOT_PRESENT)
+DEF(set_inline_point, 0, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(set_inline_region_begin, 0, 0, 1, TCG_OPF_NOT_PRESENT)
+DEF(set_inline_region_end, 0, 0, 1, TCG_OPF_NOT_PRESENT)
 
 /* variable number of parameters */
 DEF(call, 0, 0, 3, TCG_OPF_CALL_CLOBBER | TCG_OPF_NOT_PRESENT)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fd8a3dfe93..b48196da27 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -251,6 +251,23 @@ TCGLabel *gen_new_label(void)
     return l;
 }
 
+TCGInlineLabel *gen_new_inline_label(void)
+{
+    TCGContext *s = &tcg_ctx;
+    int idx;
+    TCGInlineLabel *l;
+
+    if (s->nb_inline_labels >= TCG_MAX_INLINE_LABELS) {
+        tcg_abort();
+    }
+    idx = s->nb_inline_labels++;
+    l = &s->inline_labels[idx];
+    l->first_point = NULL;
+    l->begin_op_idx = -1;
+    l->end_op_idx = -1;
+    return l;
+}
+
 #include "tcg-target.inc.c"
 
 /* pool based memory allocation */
@@ -462,6 +479,10 @@ void tcg_func_start(TCGContext *s)
     s->nb_labels = 0;
     s->current_frame_offset = s->frame_start;
 
+    s->inline_labels = tcg_malloc(sizeof(TCGInlineLabel) *
+                                  TCG_MAX_INLINE_LABELS);
+    s->nb_inline_labels = 0;
+
 #ifdef CONFIG_DEBUG_TCG
     s->goto_tb_issue_mask = 0;
 #endif
@@ -1423,6 +1444,139 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
     }
 }
 
+static inline int _get_op_next(TCGContext *s, int idx)
+{
+    return s->gen_op_buf[idx].next;
+}
+
+static inline void _set_op_next(TCGContext *s, int