> Some final 5000 life results from my system, and a few improvements
> I believe are still possible:
>
> Before COW: 172 seconds
> After COW: 121 seconds
> A 30% improvement in performance is not too bad, I suppose.
> Well done Mike!

Thanks!

> CVS/COW with stack pointer alignment = four: 93 seconds
> Above plus pre-mask for PMC/Buffer alignment = four: 90 seconds
>
> The first of these improvements is achieved by determining
> the alignment with which pointers are actually placed on the
> stack, versus PARROT_PTR_ALIGNMENT, which is the
> minimum alignment permitted by the underlying system.
> On an Intel x86 platform running linux, I have been unable to
> persuade any pointer to live on the stack other than on a
> four-byte alignment, except by placing it in a struct, and
> telling the compiler to pack structs. A simple C program is
> included below which illustrates this point.

Jason Gloudon has also said that x86 has a four-byte pointer alignment. I
seem to recall a pointer aligned to an odd value that I found in a stack
walk once, but I'm unable to reproduce it in extensive fiddling with your
test program. As such, it's probably worthwhile to implement such a
change, although I'm not quite sure of the best way to do it.

Should this be a configure.pl-determined constant? Should we hardcode it
to sizeof(void*)? Is this behavior guaranteed by the C spec? Can we
assume it across all platforms even if it is not guaranteed?

> > If you don't mind, please feel free to continue your work on parrot-grey.
> The problem arises with trying to do new experimental development,
> while still keeping sufficiently in sync with cvs parrot that I can do a
> 'cvs update' from time to time without getting dozens of conflicts.
> A case in point is the new 'strstart' field - grey doesn't need it, but to
> leave it out would create a large number of differences between the
> two versions, with code having to be changed every time somebody
> writes a new reference to it - therefore if I do continue with grey, I will
> probably just leave strstart in, and ignore the memory overhead.
> The next item on the list for grey was paged memory allocation - this
> may be usable to some extent without the buffer linked lists; so I will
> probably give that a spin anyway.

I think a union in the string header might do quite nicely in your case. I
had the chance to look into your next/prev buffer linking code the other
night. Interesting approach, but I have a few questions. :)

In your collection phase, you give up header pool cache-coherency in favor
of the memory pool. Your headers are organized by bufstart, essentially.
Likewise, your use of the circular linked list of headers to add stuff to
the front and ends of the header list as necessary is also interesting,
and threw me for a loop for a little while. :)

The current CVS approach is mostly cache-coherent. It iterates over ALL
buffers in the header pools (not just the live ones, as yours does). Since
we can assume that most of the data hasn't changed since the last
collection (a harder assumption if we have a generational collector), the
pool locality should follow the header locality, due to the nature of the
copying. I'm not trying to argue which one is better, merely to state the
differences between the implementations to see if I've got it straight.
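
To make the "pool locality follows header locality" point concrete, here is a minimal sketch of a copying pass that walks every header in pool order. The type sketch_header, its fields and the function name are hypothetical simplifications for illustration, not the actual CVS code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified buffer header (an assumption, not Parrot's). */
typedef struct {
    char  *data;   /* current location of the buffer contents */
    size_t len;    /* number of bytes in use */
    int    live;   /* nonzero if the header survived the sweep */
} sketch_header;

/* Walk ALL headers in pool order and copy the live ones into new_mem.
   Because copies are laid down in header order, neighbouring headers end
   up with neighbouring data. Returns the number of bytes used in new_mem. */
size_t sketch_collect_in_header_order(sketch_header *pool, size_t n,
                                      char *new_mem)
{
    size_t used = 0;
    size_t i;

    assert(pool != NULL && new_mem != NULL);
    for (i = 0; i < n; i++) {
        if (!pool[i].live)
            continue;               /* dead header: visited, nothing copied */
        memcpy(new_mem + used, pool[i].data, pool[i].len);
        pool[i].data = new_mem + used;
        used += pool[i].len;
    }
    return used;
}
```

The loop touches dead headers as well as live ones, which is the cost noted above; in exchange, both the header pool and the memory pool are walked sequentially.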

Might I ask what your motivation was for the header linked list? I can see
that it solves the problem of:

    set S0, some_large_data_file_contents
    substr S1, S0, 0, 1   # get first character as COW
    set S0, ""
    sweep
    collect

In current CVS, the large data file is kept around, whereas in your
implementation, it would copy only the single character. However, there is
an easy way to achieve nearly the same behavior in the current CVS. When we
copy a COW string, the copy is initially marked as non-COW. At the
subsequent collection, we then have a really large buffer of which strstart
and bufused cover only a small part. If we copy only the necessary data for
non-COW strings, then that second pass would eliminate the wasted memory.

Not quite as fast at eliminating the memory usage as your solution, but
since collections are guaranteed to happen throughout the lifetime of any
program that does something with strings, I think it's an okay tradeoff.
Were there any other reasons for implementing the linked list technique
that I missed?
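
The "copy only the necessary data for non-COW strings" idea could look roughly like the sketch below. The struct is a hypothetical simplification: the field names bufstart, strstart and bufused come from this thread, but the layout, the buflen and is_cow fields, and the function name are assumptions rather than Parrot's real definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified string header (an assumption, not Parrot's). */
typedef struct {
    char  *bufstart;  /* start of the allocated buffer */
    size_t buflen;    /* bytes allocated at bufstart */
    char  *strstart;  /* first byte of string data inside the buffer */
    size_t bufused;   /* bytes of string data in use, from strstart */
    int    is_cow;    /* nonzero while the buffer is shared */
} sketch_string;

/* Copy one header's data to 'dest' in the new pool and return the number
   of bytes consumed there. Non-COW strings are trimmed to the bytes they
   actually use; COW strings are left untouched here (returning 0), since
   handling shared buffers is outside the scope of this sketch. */
size_t sketch_compact(sketch_string *s, char *dest)
{
    assert(s != NULL && dest != NULL);
    if (s->is_cow)
        return 0;

    memcpy(dest, s->strstart, s->bufused);
    s->bufstart = dest;
    s->strstart = dest;          /* data now begins at the buffer start */
    s->buflen   = s->bufused;    /* the unused remainder is dropped */
    return s->bufused;
}
```

In the S1 example above, the first collection would still carry the whole file along, and the trimmed one-character copy would only appear at the following collection, once the string is no longer COW.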

Thanks,
Mike Lambert


