Re: [Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-10-01 Thread Peter Maydell
On 30 September 2015 at 17:50, Aurelien Jarno  wrote:
> On 2015-09-30 15:09, Richard Henderson wrote:
>> We currently pre-compute an worst case code size for any TB, which
>> works out to be 122kB.  Since the average TB size is near 1kB, this
>> wastes quite a lot of storage.
>
> The code generation buffer is currently computed as 1/4 of the guest
> RAM in softmmu mode (so 32MB for the default 128MB of RAM) or 32MB in
> user mode. 122kB is therefore less than 0.4% of waster memory, I am not
> therefore sure we need to add so much code just for that.

I think the best argument for this patch is that it gets rid
of the TCG_MAX_OP_SIZE guesswork about TCG op worst-cases.
That sort of #define is generally arbitrary and liable to not
get updated when new targets or backends invalidate previous
assumptions.

(Can we do anything to get rid of MAX_OP_PER_INSTR too?)

thanks
-- PMM



Re: [Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-10-01 Thread Richard Henderson

On 10/01/2015 02:50 AM, Aurelien Jarno wrote:

On 2015-09-30 15:09, Richard Henderson wrote:

We currently pre-compute an worst case code size for any TB, which
works out to be 122kB.  Since the average TB size is near 1kB, this
wastes quite a lot of storage.


The code generation buffer is currently computed as 1/4 of the guest
RAM in softmmu mode (so 32MB for the default 128MB of RAM) or 32MB in
user mode. 122kB is therefore less than 0.4% of waster memory, I am not
therefore sure we need to add so much code just for that.

BTW, I wonder if it is really a good idea to scale the code generation
buffer with the size of the RAM, as the two do not seem that related. It
happens that at some point we don't really increases performances
anymore, and always defining it as 32MB might actually be a good idea.
Personally I am using a patch that limits it to 128MB.


It depends on the guest and the workload I suppose.

For alpha, which almost never flushes the buffer, allocating 2GB to the guest 
(and thus 512MB to the buffer) seems to reach a nice steady-state during a 
guest gcc bootstrap where no more code generation is required.


If I were to limit the buffer to significantly less, as you're suggesting, I 
believe that the working set would overflow the buffer, requiring more code 
generation.


I guess I could always use -tb-size myself to override...



r~




Re: [Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-10-01 Thread Aurelien Jarno
On 2015-09-30 15:09, Richard Henderson wrote:
> We currently pre-compute an worst case code size for any TB, which
> works out to be 122kB.  Since the average TB size is near 1kB, this
> wastes quite a lot of storage.

The code generation buffer is currently computed as 1/4 of the guest
RAM in softmmu mode (so 32MB for the default 128MB of RAM) or 32MB in
user mode. 122kB is therefore less than 0.4% of waster memory, I am not
therefore sure we need to add so much code just for that.

BTW, I wonder if it is really a good idea to scale the code generation
buffer with the size of the RAM, as the two do not seem that related. It
happens that at some point we don't really increases performances
anymore, and always defining it as 32MB might actually be a good idea.
Personally I am using a patch that limits it to 128MB.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net



[Qemu-devel] [PATCH v4 25/26] tcg: Check for overflow via highwater mark

2015-09-30 Thread Richard Henderson
We currently pre-compute an worst case code size for any TB, which
works out to be 122kB.  Since the average TB size is near 1kB, this
wastes quite a lot of storage.

Instead, check for overflow in between generating code for each opcode.
The overhead of the check isn't measurable and wastage is minimized.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h |  6 --
 tcg/tcg.c   | 14 +++---
 tcg/tcg.h   |  5 +++--
 translate-all.c | 31 ++-
 4 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6871e78..71c9d85 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -62,12 +62,6 @@ typedef struct TranslationBlock TranslationBlock;
 #define OPC_BUF_SIZE 640
 #define OPC_MAX_SIZE (OPC_BUF_SIZE - MAX_OP_PER_INSTR)
 
-/* Maximum size a TCG op can expand to.  This is complicated because a
-   single op may require several host instructions and register reloads.
-   For now take a wild guess at 192 bytes, which should allow at least
-   a couple of fixup instructions per argument.  */
-#define TCG_MAX_OP_SIZE 192
-
 #define OPPARAM_BUF_SIZE (OPC_BUF_SIZE * MAX_OPC_PARAM)
 
 #include "qemu/log.h"
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 5609108..682af8a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -385,9 +385,10 @@ void tcg_prologue_init(TCGContext *s)
 total_size = s->code_gen_buffer_size - prologue_size;
 s->code_gen_buffer_size = total_size;
 
-/* Compute a high-water mark, at which we voluntarily flush the
-   buffer and start over.  */
-s->code_gen_buffer_max_size = total_size - TCG_MAX_OP_SIZE * OPC_BUF_SIZE;
+/* Compute a high-water mark, at which we voluntarily flush the buffer
+   and start over.  The size here is arbitrary, significantly larger
+   than we expect the code generation for any one opcode to require.  */
+s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
 
 tcg_register_jit(s->code_gen_buffer, total_size);
 
@@ -2438,6 +2439,13 @@ int tcg_gen_code(TCGContext *s, tcg_insn_unit 
*gen_code_buf)
 #ifndef NDEBUG
 check_regs(s);
 #endif
+/* Test for (pending) buffer overflow.  The assumption is that any
+   one operation beginning below the high water mark cannot overrun
+   the buffer completely.  Thus we can test for overflow after
+   generating code without having to check during generation.  */
+if (unlikely(s->code_gen_ptr > s->code_gen_highwater)) {
+return -1;
+}
 }
 tcg_debug_assert(num_insns >= 0);
 s->gen_insn_end_off[num_insns] = tcg_current_code_size(s);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 5fbbd15..a696922 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -559,10 +559,11 @@ struct TCGContext {
 void *code_gen_prologue;
 void *code_gen_buffer;
 size_t code_gen_buffer_size;
-/* threshold to flush the translated code buffer */
-size_t code_gen_buffer_max_size;
 void *code_gen_ptr;
 
+/* Threshold to flush the translated code buffer.  */
+void *code_gen_highwater;
+
 TBContext tb_ctx;
 
 /* The TCGBackendData structure is private to tcg-target.c.  */
diff --git a/translate-all.c b/translate-all.c
index b43bd03..333eba4 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -223,6 +223,7 @@ static target_long decode_sleb128(uint8_t **pp)
 
 static int encode_search(TranslationBlock *tb, uint8_t *block)
 {
+uint8_t *highwater = tcg_ctx.code_gen_highwater;
 uint8_t *p = block;
 int i, j, n;
 
@@ -241,6 +242,14 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 }
 prev = (i == 0 ? 0 : tcg_ctx.gen_insn_end_off[i - 1]);
 p = encode_sleb128(p, tcg_ctx.gen_insn_end_off[i] - prev);
+
+/* Test for (pending) buffer overflow.  The assumption is that any
+   one row beginning below the high water mark cannot overrun
+   the buffer completely.  Thus we can test for overflow after
+   encoding a row without having to check during encoding.  */
+if (unlikely(p > highwater)) {
+return -1;
+}
 }
 
 return p - block;
@@ -756,9 +765,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
 TranslationBlock *tb;
 
-if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks ||
-(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) >=
- tcg_ctx.code_gen_buffer_max_size) {
+if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
 return NULL;
 }
 tb = _ctx.tb_ctx.tbs[tcg_ctx.tb_ctx.nb_tbs++];
@@ -1063,12 +1070,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 if (use_icount) {
 cflags |= CF_USE_ICOUNT;
 }
+
 tb = tb_alloc(pc);
-if (!tb) {
+if (unlikely(!tb)) {
+ buffer_overflow:
 /* flush must be done */