Re: [PATCH] Re: Register stacks organization

2004-01-19 Thread Leopold Toetsch
Luke Palmer [EMAIL PROTECTED] wrote:
 Leopold Toetsch writes:

 I'm working on yet another scheme at the moment, with a special op that
 pushes and clears simultaneously, to see if I can avoid copying
 altogether!  It costs another indirection in the core registers,

That would be a major change especially for prederefed and JIT code too.

 But we'll see what the benchmarks say.

Yep.

 Fixed in included patch.

Thanks, applied.

 - the COW flag is only reset for one of the 2 users, provoking even
   more unneeded copies.

 Again, I don't think this is hurting us.  If there's another copy, it's
 in a continuation.

We have that problem on *all* stacks, not only on register frames used in
Continuations.

 Luke

leo


[PATCH] Re: Register stacks organization

2004-01-18 Thread Luke Palmer
Leopold Toetsch writes:
 Luke Palmer clearly showed that optimizations WRT register frame stacks
 are possible.
 The following numbers seem to second that:
 
 FRAMES_PER_CHUNK   time: real   user
                4         1.2s   0.9
                8         1.6
               16         2.3    1.4
 
 These are runs of
  parrot -j -Oc examples/benchmarks/fib.imc
 on an unoptimized parrot build on an Athlon 800.
 
 The main problem seems to be that we are copying around a lot of memory:
 16 chunks * 32 regs * 4/8 bytes

I think it would be a good idea to reduce FRAMES_PER_CHUNK anyway.  The
more continuations are used, the more a high value hurts us.

I'm working on yet another scheme at the moment, with a special op that
pushes and clears simultaneously, to see if I can avoid copying
altogether!  It costs another indirection in the core registers, so I'm
worried that it will slow things down too much in general to be useful.
But we'll see what the benchmarks say.

 More areas for optimization:
 - regstack_copy_chunk always copies a full chunk (regardless of how many frames are used)
 - memory is allocated zeroed

Fixed in included patch.

 - the COW flag is only reset for one of the 2 users, provoking even
   more unneeded copies.

Again, I don't think this is hurting us.  If there's another copy, it's
in a continuation.  And unless continuations destroy themselves when
they're invoked, the stack they hold on to should stay COW.

 Comments welcome,
 leo

Luke


Index: include/parrot/register.h
===
RCS file: /cvs/public/parrot/include/parrot/register.h,v
retrieving revision 1.18
diff -u -r1.18 register.h
--- include/parrot/register.h   12 Jan 2004 09:50:24 -  1.18
+++ include/parrot/register.h   18 Jan 2004 18:28:54 -
@@ -50,7 +50,7 @@
 
 struct RegStack {
 struct RegisterChunkBuf* top;
-size_t chunk_size;
+size_t frame_size;
 };
 
 /* Base class for the RegChunk types */
Index: src/register.c
===
RCS file: /cvs/public/parrot/src/register.c,v
retrieving revision 1.38
diff -u -r1.38 register.c
--- src/register.c  17 Jan 2004 11:40:03 -  1.38
+++ src/register.c  18 Jan 2004 18:28:54 -
@@ -23,22 +23,22 @@
 buf = new_bufferlike_header(interpreter, sizeof(struct RegisterChunkBuf));
 Parrot_allocate_zeroed(interpreter, buf, sizeof(struct IRegChunkBuf));
 interpreter->ctx.int_reg_stack.top = buf;
-interpreter->ctx.int_reg_stack.chunk_size = sizeof(struct IRegChunkBuf);
+interpreter->ctx.int_reg_stack.frame_size = sizeof(struct IRegFrame);
 
 buf = new_bufferlike_header(interpreter, sizeof(struct RegisterChunkBuf));
 Parrot_allocate_zeroed(interpreter, buf, sizeof(struct SRegChunkBuf));
 interpreter->ctx.string_reg_stack.top = buf;
-interpreter->ctx.string_reg_stack.chunk_size = sizeof(struct SRegChunkBuf);
+interpreter->ctx.string_reg_stack.frame_size = sizeof(struct SRegFrame);
 
 buf = new_bufferlike_header(interpreter, sizeof(struct RegisterChunkBuf));
 Parrot_allocate_zeroed(interpreter, buf, sizeof(struct NRegChunkBuf));
 interpreter->ctx.num_reg_stack.top = buf;
-interpreter->ctx.num_reg_stack.chunk_size = sizeof(struct NRegChunkBuf);
+interpreter->ctx.num_reg_stack.frame_size = sizeof(struct NRegFrame);
 
 buf = new_bufferlike_header(interpreter, sizeof(struct RegisterChunkBuf));
 Parrot_allocate_zeroed(interpreter, buf, sizeof(struct PRegChunkBuf));
 interpreter->ctx.pmc_reg_stack.top = buf;
-interpreter->ctx.pmc_reg_stack.chunk_size = sizeof(struct PRegChunkBuf);
+interpreter->ctx.pmc_reg_stack.frame_size = sizeof(struct PRegFrame);
 
 Parrot_unblock_DOD(interpreter);
 }
@@ -112,10 +112,11 @@
 PObj_COW_CLEAR((PObj*) buf);
 
 Parrot_block_DOD(interpreter);
-Parrot_allocate(interpreter, buf, stack->chunk_size);
+Parrot_allocate(interpreter, buf, stack->frame_size * FRAMES_PER_CHUNK);
 Parrot_unblock_DOD(interpreter);
 
-memcpy(buf->data.bufstart, chunk->data.bufstart, stack->chunk_size);
+memcpy(buf->data.bufstart, chunk->data.bufstart, 
+stack->frame_size * FRAMES_PER_CHUNK);
 return buf;
 }
 
@@ -136,7 +137,8 @@
 sizeof(struct RegisterChunkBuf));
 
 Parrot_block_DOD(interpreter);
-Parrot_allocate_zeroed(interpreter, (PObj*)buf, stack->chunk_size);
+Parrot_allocate(interpreter, (PObj*)buf, 
+stack->frame_size * FRAMES_PER_CHUNK);
 Parrot_unblock_DOD(interpreter);
 
 buf-used = 1;


Re: [PATCH] Re: Register stacks organization

2004-01-18 Thread Luke Palmer
Luke Palmer writes:
  
 -memcpy(buf->data.bufstart, chunk->data.bufstart, stack->chunk_size);
 +memcpy(buf->data.bufstart, chunk->data.bufstart, 
 +stack->frame_size * FRAMES_PER_CHUNK);

Silly me -- left over from benchmarks.  Of course I mean:

+   memcpy(buf->data.bufstart, chunk->data.bufstart,
+   stack->frame_size * chunk->used);

Luke