On 1/9/07, Robin Garner <[EMAIL PROTECTED]> wrote:

> On 09 Jan 2007 16:54:03 +0600, Egor Pasko <[EMAIL PROTECTED]> wrote:
>>
>> On the 0x258 day of Apache Harmony Weldon Washburn wrote:
>> > It looks like multiple interesting design topics.  My comments inlined
>> > below.
>> >
>> [..snip..]
>> >
>> > > Hi, I found that the write barrier in DRLVM can't catch all reference
>> > > field updates, and the problem was traced to new JIT optimizations
>> > > that do not observe the write barrier invariant for field updates.
>> > > For example, the JIT may generate code to copy consecutive fields of
>> > > an object without invoking the write barrier. In order to revive the
>> > > write barrier functionality while not sacrificing the opt performance,
>> >
>> > Yes.  But first, how much performance is being sacrificed?  .2%?  8%?

>>
>> JIT magic arraycopy gives up to 30% boost on a microbenchmark (see
>> HARMONY-2247). I did not measure boosts on specific benchmarks, but I
>> think users would expect System.arraycopy() to be noticeably faster than
>> manual array copying. We should care about performance numbers as
>> such.
>
>
> Let me re-phrase the question.  Arraycopy performance is important and
> deserves the special case treatment it has always gotten.  Setting aside
> arraycopy, how much performance gain can be expected by optimizing
> consecutive writes to fields of an object for the benchmarks we care
> about?
> What about simply marking the consecutive writes regions as
> "Uninterruptible"?  This would eliminate yet another API between the GC
> and JIT.  I think this is basically the same as Robin's suggestion.
>
> Regarding arraycopy, is there a problem with making the entire arraycopy
> loop "Uninterruptible"?  This will impact GC latency but is the impact a
> big deal for workloads we care about?  If it is, why not have the compiler
> unroll the loop a bunch and put WBs every, say, 10th write.  The body of
> 10 writes would be Uninterruptible.
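
The "WB every 10th write" idea can be sketched roughly as follows. This is
only a toy model in plain Java, not DRLVM or JIT code: the class name,
writeBarrier() and the chunk size are hypothetical, and "uninterruptible"
is only marked by a comment, since plain Java cannot express it.

```java
public class ChunkedCopySketch {
    // Counts barrier invocations so the effect of chunking is visible.
    static int barrierCalls = 0;

    // Hypothetical barrier: a real one would record dst (or the slot range)
    // for the GC. Here it only counts calls.
    static void writeBarrier(Object[] dst, int from, int to) {
        barrierCalls++;
    }

    // Copies src into dst in chunks, with one barrier call per chunk.
    static void chunkedCopy(Object[] src, Object[] dst, int chunk) {
        for (int start = 0; start < src.length; start += chunk) {
            int end = Math.min(start + chunk, src.length);
            // Region assumed uninterruptible: no GC starts between these stores
            // and the barrier call that follows them.
            for (int i = start; i < end; i++) {
                dst[i] = src[i];
            }
            writeBarrier(dst, start, end);
            // A GC would be allowed to run here, between chunks.
        }
    }

    public static void main(String[] args) {
        Object[] src = new Object[25];
        for (int i = 0; i < src.length; i++) {
            src[i] = "ref" + i;
        }
        Object[] dst = new Object[src.length];
        chunkedCopy(src, dst, 10);
        System.out.println("barrier calls: " + barrierCalls);
    }
}
```

With a chunk of 10, the GC waits for at most 10 stores plus one barrier
call, at the price of one barrier per chunk instead of one per array.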

With arraycopy, much of the saving is in barrier costs themselves.  Apart
from the overhead on the write, there's a reduction in remset entries, and
the cost of scanning the object at GC time is minimal for a reference
array.
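
To make the remset point concrete, here is a toy comparison of a
slot-remembering barrier against a single object-remembering barrier for an
array copy. It is a sketch under assumptions: the remset is just a list of
strings, and slotBarrier()/objectBarrier() are hypothetical stand-ins, not
the DRLVM interface.

```java
import java.util.ArrayList;
import java.util.List;

public class BarrierCostSketch {
    // Toy remembered set: each entry is a label for what was recorded.
    static final List<String> remset = new ArrayList<>();

    // Hypothetical slot-remembering barrier: one entry per reference store.
    static void slotBarrier(Object[] holder, int index) {
        remset.add("slot:" + System.identityHashCode(holder) + "[" + index + "]");
    }

    // Hypothetical object-remembering barrier: one entry for the whole object.
    static void objectBarrier(Object[] holder) {
        remset.add("object:" + System.identityHashCode(holder));
    }

    // Manual copy with a barrier per store; returns the remset entry count.
    static int copyWithSlotBarriers(Object[] src, Object[] dst) {
        remset.clear();
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
            slotBarrier(dst, i);
        }
        return remset.size();
    }

    // Bulk copy followed by one barrier; returns the remset entry count.
    static int copyWithObjectBarrier(Object[] src, Object[] dst) {
        remset.clear();
        System.arraycopy(src, 0, dst, 0, src.length);
        objectBarrier(dst);
        return remset.size();
    }

    public static void main(String[] args) {
        Object[] src = new Object[1000];
        for (int i = 0; i < src.length; i++) {
            src[i] = Integer.valueOf(i);
        }
        System.out.println("per-slot entries:   "
                + copyWithSlotBarriers(src, new Object[src.length]));
        System.out.println("per-object entries: "
                + copyWithObjectBarrier(src, new Object[src.length]));
    }
}
```

The per-slot variant records one entry per element, the per-object variant
records one entry in total and leaves the GC to scan the array.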

The original motivating benchmark was jess iirc.

Off the top of my head, if the barrier is called before any data is copied
I think the arraycopy code is GC-safe, provided another barrier call is
made after the GC and before the next pointer write.  It's probably better
to make the arraycopy code uninterruptible.
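
The hazard and the two fixes can be modelled in a few lines. This is a
deliberately simplified, stop-the-world toy: gc() just empties the
remembered set, and all names here are hypothetical. The second barrier is
placed after the whole copy rather than right before the next pointer
write, which is equivalent only in this simplified model.

```java
import java.util.HashSet;
import java.util.Set;

public class MidCopyGcSketch {
    // Toy remembered set; arrays use identity equals/hashCode, which is what we want.
    static final Set<Object> remembered = new HashSet<>();

    // Hypothetical object-remembering barrier.
    static void objectBarrier(Object[] obj) {
        remembered.add(obj);
    }

    // Toy collection: processes the remembered set and empties it.
    static void gc() {
        remembered.clear();
    }

    // Copies src to dst with a barrier before the copy and a GC request at index gcAt.
    // Returns true if dst is still remembered once the copy has finished, i.e. the
    // stores made after the GC request will be seen by the next collection.
    static boolean copy(Object[] src, Object[] dst, int gcAt,
                        boolean barrierAfter, boolean uninterruptible) {
        remembered.clear();
        boolean gcPending = false;
        objectBarrier(dst);
        for (int i = 0; i < src.length; i++) {
            if (i == gcAt) {
                if (uninterruptible) {
                    gcPending = true;   // GC must wait for the region to end
                } else {
                    gc();               // GC runs in the middle of the copy
                }
            }
            dst[i] = src[i];
        }
        if (barrierAfter) {
            objectBarrier(dst);
        }
        boolean covered = remembered.contains(dst);
        if (gcPending) {
            gc();                       // deferred GC runs now and sees dst
        }
        return covered;
    }

    public static void main(String[] args) {
        Object[] src = {"a", "b", "c", "d"};
        System.out.println("barrier before only: " + copy(src, new Object[4], 2, false, false));
        System.out.println("barrier before+after: " + copy(src, new Object[4], 2, true, false));
        System.out.println("uninterruptible:      " + copy(src, new Object[4], 2, false, true));
    }
}
```

Only the first variant loses track of the late stores. The uninterruptible
variant needs no second barrier but delays the GC for the length of the copy.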


Yes, I agree it's probably better to make the arraycopy code
uninterruptible.  The only caution is the impact on GC latency.  Um, does
this require an additional API or can we simply use what already exists?

My gut feel is that scalars don't generally have enough pointers to make
the object remembering barrier worthwhile.


That's my hunch also.  However, if someone wants to spend time analyzing
enterprise workloads to discover if there is any cheese down that tunnel, I
won't get in the way.

>> > > I'd suggest introducing an object remembering write barrier which
>> > > will be invoked after the object is copied, so the JIT doesn't need
>> > > to insert a barrier for each field store.
>> >
>> > hmm.... what happens if some other app thread causes a GC to happen in
>> > the middle of writing a bunch of fields of a given object?  If
>> > gc_heap_wrote_object() is called before JITed code scribbles on slots,
>> > then the early slots will be scanned and handled properly by the GC.
>> > But how do we handle the slots written after the GC completes?  One
>> > approach would be for the JIT to mark such regions of emitted code
>> > "Uninterruptible".  Another approach would be to emit a WB both before
>> > and after a region of multiple ref field scribbles.  In any case, it
>> > looks like we need to patch up the holes in the contract between jit
>> > and gc.  However, as I said above, is there anything wrong with a real
>> > simple dumb contract for now?  That is, each ref write has a matching
>> > WB with no intervening instructions.
>> >
>> > > GC has an interface
>> > > gc_heap_wrote_object(p_obj) for this case.  I think it's ok to insert
>> > > only the runtime native call at first. Then later we can consider
>> > > inlining the object remembering barrier as well as the slot
>> > > remembering barrier.
>> > >
>> > > Thanks,
>> > > xiaofeng
>> > >
>> >
>> > --
>> > Weldon Washburn
>> > Intel Enterprise Solutions Software Division
>>
>> --
>> Egor Pasko
>>
>>
>
>
> --
> Weldon Washburn
> Intel Enterprise Solutions Software Division
>





--
Weldon Washburn
Intel Enterprise Solutions Software Division
