Re: [drlvm] dynamic object layout

2006-11-08 Thread Geir Magnusson Jr.



Rana Dasgupta wrote:

> Again, this makes sense. Functional completeness is needed, but over a
> period, based on when we want to release. Identifying a couple of
> milestones before 1.0 for which we choose features to complete, and
> performance objectives, can help. For each, we can add a
> bug-fix/stability period. Branching that happens at the start and end of
> each milestone automatically improves stability.
> In addition, as Etienne mentions, we should not hesitate to use branches
> to partition new platforms, large new development, etc. For example, GCv5
> is effectively in a branch currently. It is not executed without a
> specific command line option and is not picked up by our regular test
> runs.


That's one thing to add to build-test - to do a rebuild with GCv5 and 
see what happens ;)


geir



Re: [drlvm] dynamic object layout

2006-11-07 Thread Rana Dasgupta

Again, this makes sense. Functional completeness is needed, but over a
period, based on when we want to release. Identifying a couple of milestones
before 1.0 for which we choose features to complete, and performance
objectives, can help. For each, we can add a bug-fix/stability period.
Branching that happens at the start and end of each milestone automatically
improves stability.
In addition, as Etienne mentions, we should not hesitate to use branches to
partition new platforms, large new development, etc. For example, GCv5 is
effectively in a branch currently. It is not executed without a specific
command line option and is not picked up by our regular test runs.



On 11/7/06, Tim Ellison <[EMAIL PROTECTED]> wrote:



> Just to add my 2c, in that I concur with this position.  There has to be
> a judgement call on each of the new areas of functional improvement to
> decide whether it will further disrupt improved stability goals.  In
> general it is preferable to be solid but functionally incomplete rather
> than vice versa.
>
> Regards,
> Tim




Re: [drlvm] dynamic object layout

2006-11-07 Thread Tim Ellison
Geir Magnusson Jr. wrote:
> 
> Alexei Fedotov wrote:
>> Weldon,
>>
>> I agree with you that it is nearly impossible to achieve stability for
>> a branch under active development.
>>
>> From the other side, adding new features is fun, and also has a reason
>> behind it. If we strive for a complete implementation of J2SE, we
>> cannot avoid this type of activity.
>>
>> So my suggestion is to create separate branches for new features which
>> could be merged into the main branch when mature enough to achieve an
>> appropriate level of stability. What do you think?
> 
> Well, there are a couple of things here.  Any committer is free to go off
> into a sandbox to do something radical.  However, there are features we
> simply need - class unloading, for example - that aren't new features
> being done just for fun.
> 
> Things are complicated and we've seen how some features from the past,
> say the TM or invocation API, were done off in a corner, that led to two
> problems when brought forward -
> 
> 1) There were lots of others that had useful input who weren't able to
> contribute until the feature was finished and
> 
> 2) The iterations of discussion about the patch while ongoing progress
> was happening in the trunk made the big patches stale, which made it
> hard for people to examine, test and comment on.
> 
> I think that for something like this, we should evaluate the "new ideas"
> on the merit, and decide if it's critical to our goal of a competitive,
> compatible Harmony v1.0 (for example, class unloading) or simply a
> nice-to-have improvement (GCv5, maybe).
> 
> We have a really difficult job to do in the next 7.5 months - to get to
> a compatible 1.0* - so I'd like to encourage people to remain as focused
> as we can to get to that point.  That doesn't mean this isn't fun, but
> the way I see it, we have a few focused months of efforts before we
> begin TCK testing, and we probably need to make some hard choices to
> delay stuff.  We're a mighty community, but a relatively small one, so
> the more of us rowing in the same direction, the better.
> 
> So if JVMTI is slow?  What's the tradeoff?  My personal preference would
> be to take stability for now, and revisit the JVMTI performance later...

Just to add my 2c, in that I concur with this position.  There has to be
a judgement call on each of the new areas of functional improvement to
decide whether it will further disrupt improved stability goals.  In
general it is preferable to be solid but functionally incomplete rather
than vice versa.

Regards,
Tim

> geir
> 
> * Yeah, I dream of Harmony as the first compatible open source
> implementation of the JDK, beating Sun...
> 
> 
>>
>> Alexei
>>
>> On 11/3/06, Weldon Washburn <[EMAIL PROTECTED]> wrote:
>>> Salikh,
>>> I glanced at the patch.  What you propose below looks reasonable.  I really
>>> don't see any other way to do it and still get "usable" performance.
>>>
>>> All,
>>> My only worry is disturbing highly critical code like object layout.
>>> Given that this JIRA has been open a long time, I guess it's OK to apply
>>> the patch.  At some point, we need to stop adding functionality and focus
>>> on stability.
>>>
>>>
>>>
>>> On 11/3/06, Salikh Zakirov <[EMAIL PROTECTED]> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am currently continuing to work on improving JVMTI Heap Iteration
>>> > (HARMONY-1635), particularly on tagging objects.
>>> >
>>> > The use case that I've heard of is tagging *all* objects for the purpose
>>> > of memory profiling. According to what I've heard, it causes a 60x
>>> > slowdown on Sun's VM. However, the initial tags implementation that I've
>>> > uploaded to HARMONY-1635 is far worse, as it uses linear search for
>>> > get/set tag operations.
>>> >
>>> > (* for those who didn't read the JVMTI spec, tags are jlong (8-byte
>>> > integer) values, which can be attached to arbitrary objects in a get/set
>>> > manner *)
>>> >
>>> > The alternative approach I came up with, which uses (mostly) constant-time
>>> > algorithms for get/set operations, is to store a tag pointer in each
>>> > object. Storing the tag itself in the object is not an option, as JVMTI
>>> > requires sending OBJECT_FREE events with tags for each reclaimed object,
>>> > and this information would not be available if the tag were reclaimed
>>> > together with the object.
>>> >
>>> > However, since the general consensus was that increasing the object
>>> > header is highly undesirable, I've tried to implement a _conditional_
>>> > increase of the object header. The additional object header field is
>>> > allocated only if the JVMTI agent has requested the can_tag_objects
>>> > capability.
>>> >
>>> > The modified object layout I used is as follows:
>>> >
>>> >   +--------------------+
>>> >   |   VTable pointer   |
>>> >   +--------------------+
>>> >   |      lockword      |
>>> >   +--------------------+
>>> >   |   [array length]   |
>>> >   +--------------------+
>>> >   |   [tag pointer]    |
>>> >   +--------------------+

[drlvm] Development plan (was: Re: [drlvm] dynamic object layout)

2006-11-04 Thread Robin Garner
> We have a really difficult job to do in the next 7.5 months - to get to
> a compatible 1.0* - so I'd like to encourage people to remain as focused
> as we can to get to that point.  That doesn't mean this isn't fun, but
> the way I see it, we have a few focused months of efforts before we
> begin TCK testing, and we probably need to make some hard choices to
> delay stuff.  We're a mighty community, but a relatively small one, so
> the more of us rowing in the same direction, the better.

Is there a planned set of performance features for the 1.0 release of
Harmony/DRLVM? I've had a quick look and can't seem to find it.  From
where I sit, reading (bits of) the mailing list, performance tweaks seem
to be thrown into the mix as people think of them rather than in a
coordinated way.  I hope I'm wrong, and that somewhere there is a plan :)

It would be good to see a list of the features that people see as being
critical to improving DRLVM's performance, and to be able to contribute to
it as a whole, rather than just on a point by point basis.

One resource I would also like to see developed is an archive of
performance feature tests, so that we can refer to an empirical record of
the cost/benefit of certain changes (for example, the cost of adding an
additional word to the object header).  Making this available, along with
patches that would allow the experiment to be reproduced seems to me like
it would be valuable.

Where this connects specifically with the 'dynamic object layout' thread
is that I was wondering what the cost of DRLVM's object model is, vs one
where the fields of all objects/arrays are at a fixed offset from the
object pointer.  This kind of object model makes adding header fields (at
the start of the object) trivial, and I'm thinking it might help
performance too.
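To make the fixed-offset idea concrete, here is a minimal C sketch, assuming a model in which the object reference points at the first field and every header word lives at a negative offset from it. The helpers alloc_object(), header_word(), put_int_field() and get_int_field() are hypothetical names for illustration, not DRLVM APIs. Because fields are addressed relative to the reference, adding a header word changes only the allocation size, never a field offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model: header words sit at negative offsets from the object
   reference; fields/elements start at offset 0. */
typedef unsigned char *objref;

static objref alloc_object(size_t header_words, size_t payload_bytes) {
    unsigned char *base = calloc(1, header_words * sizeof(uintptr_t) + payload_bytes);
    if (base == NULL)
        return NULL;
    return base + header_words * sizeof(uintptr_t);   /* reference skips the header */
}

static void free_object(objref ref, size_t header_words) {
    free(ref - header_words * sizeof(uintptr_t));
}

/* Header word k (0 = closest to the reference) is at ref - (k + 1) words. */
static uintptr_t *header_word(objref ref, size_t k) {
    return (uintptr_t *)(ref - (k + 1) * sizeof(uintptr_t));
}

/* Field access never consults the header size. */
static void put_int_field(objref ref, size_t offset, int32_t value) {
    memcpy(ref + offset, &value, sizeof value);
}

static int32_t get_int_field(objref ref, size_t offset) {
    int32_t value;
    memcpy(&value, ref + offset, sizeof value);
    return value;
}
```

In this sketch an object with two header words and one with three use the same field offset (say, 4) for the same field, which is the property that would make an optional header slot cheap to add.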

cheers,
Robin



Re: [drlvm] dynamic object layout

2006-11-04 Thread Geir Magnusson Jr.


Alexei Fedotov wrote:

> Weldon,
>
> I agree with you that it is nearly impossible to achieve stability for
> a branch under active development.
>
> From the other side, adding new features is fun, and also has a reason
> behind it. If we strive for a complete implementation of J2SE, we
> cannot avoid this type of activity.
>
> So my suggestion is to create separate branches for new features which
> could be merged into the main branch when mature enough to achieve an
> appropriate level of stability. What do you think?


Well, there are a couple of things here.  Any committer is free to go off
into a sandbox to do something radical.  However, there are features we
simply need - class unloading, for example - that aren't new features
being done just for fun.


Things are complicated and we've seen how some features from the past, 
say the TM or invocation API, were done off in a corner, that led to two 
problems when brought forward -


1) There were lots of others that had useful input who weren't able to
contribute until the feature was finished and


2) The iterations of discussion about the patch while ongoing progress 
was happening in the trunk made the big patches stale, which made it 
hard for people to examine, test and comment on.


I think that for something like this, we should evaluate the "new ideas" 
on the merit, and decide if it's critical to our goal of a competitive, 
compatible Harmony v1.0 (for example, class unloading) or simply a 
nice-to-have improvement (GCv5, maybe).


We have a really difficult job to do in the next 7.5 months - to get to 
a compatible 1.0* - so I'd like to encourage people to remain as focused 
as we can to get to that point.  That doesn't mean this isn't fun, but 
the way I see it, we have a few focused months of efforts before we 
begin TCK testing, and we probably need to make some hard choices to 
delay stuff.  We're a mighty community, but a relatively small one, so 
the more of us rowing in the same direction, the better.


So if JVMTI is slow?  What's the tradeoff?  My personal preference would
be to take stability for now, and revisit the JVMTI performance later...


geir

* Yeah, I dream of Harmony as the first compatible open source 
implementation of the JDK, beating Sun...





Alexei

On 11/3/06, Weldon Washburn <[EMAIL PROTECTED]> wrote:

Salikh,
I glanced at the patch.  What you propose below looks reasonable.  I really
don't see any other way to do it and still get "usable" performance.

All,
My only worry is disturbing highly critical code like object layout.  Given
that this JIRA has been open a long time, I guess it's OK to apply the
patch.  At some point, we need to stop adding functionality and focus on
stability.



On 11/3/06, Salikh Zakirov <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I am currently continuing to work on improving JVMTI Heap Iteration
> (HARMONY-1635), particularly on tagging objects.
>
> The use case that I've heard of is tagging *all* objects for the purpose
> of memory profiling. According to what I've heard, it causes a 60x
> slowdown on Sun's VM. However, the initial tags implementation that I've
> uploaded to HARMONY-1635 is far worse, as it uses linear search for
> get/set tag operations.
>
> (* for those who didn't read the JVMTI spec, tags are jlong (8-byte
> integer) values, which can be attached to arbitrary objects in a get/set
> manner *)
>
> The alternative approach I came up with, which uses (mostly) constant-time
> algorithms for get/set operations, is to store a tag pointer in each
> object. Storing the tag itself in the object is not an option, as JVMTI
> requires sending OBJECT_FREE events with tags for each reclaimed object,
> and this information would not be available if the tag were reclaimed
> together with the object.
>
> However, since the general consensus was that increasing the object
> header is highly undesirable, I've tried to implement a _conditional_
> increase of the object header. The additional object header field is
> allocated only if the JVMTI agent has requested the can_tag_objects
> capability.
>
> The modified object layout I used is as follows:
>
>   +--------------------+
>   |   VTable pointer   |
>   +--------------------+
>   |      lockword      |
>   +--------------------+
>   |   [array length]   |
>   +--------------------+
>   |   [tag pointer]    |
>   +--------------------+
>   |     [padding]      |
>   +--------------------+
>   | fields or elements |
>   |        ...         |
>   +--------------------+
>
> Where [array length] is only present in array objects,
> [tag pointer] is only present when the can_tag_objects capability has
> been enabled at startup, and
> [padding] is only present in arrays of longs and doubles, for natural
> 8-byte alignment.
>
> The VTable pointer is really a uint32 offset on em64t/x86_64 and ipf/ia64.
>
> The only difference from the current object layout is the introduction
> of the tag pointer field.
>
> I've modified gc_cc to take the changed dynamic object layout into
> account, and surprisingly it took only one modification:
>
> * u

Re: [drlvm] dynamic object layout

2006-11-04 Thread Etienne Gagnon
Hi,

In the SableVM project, all new feature development is done in
sandboxes.  This helps maintain a robust trunk (as much as is
possible).  My experience is that this works well.  Maybe you could do
the same in Harmony.  Aren't there already some "sandboxes"?

It might or might not work well here, though.  In SableVM, bugs are not
usually reported against sandboxes.  In contrast, in Harmony, almost
everything is done through JIRA, including patch submission...

Etienne

Weldon Washburn wrote:
> On 11/4/06, Alexei Fedotov <[EMAIL PROTECTED]> wrote:
>> So my suggestion is to create separate branches for new features which
>> could be merged into the main branch when mature enough to achieve an
>> appropriate level of stability. What do you think?
> 
> As much as I hate it, I don't know how to avoid branching.  Also, we
> probably need some sort of JIRA coding to reflect which branch has which
> patches.

-- 
Etienne M. Gagnon, Ph.D.   http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/




Re: [drlvm] dynamic object layout

2006-11-04 Thread Weldon Washburn

On 11/4/06, Alexei Fedotov <[EMAIL PROTECTED]> wrote:


> Weldon,
>
> I agree with you that it is nearly impossible to achieve stability for
> a branch under active development.
>
> From the other side, adding new features is fun, and also has a reason
> behind it. If we strive for a complete implementation of J2SE, we
> cannot avoid this type of activity.
>
> So my suggestion is to create separate branches for new features which
> could be merged into the main branch when mature enough to achieve an
> appropriate level of stability. What do you think?



As much as I hate it, I don't know how to avoid branching.  Also, we
probably need some sort of JIRA coding to reflect which branch has which
patches.


Alexei


On 11/3/06, Weldon Washburn <[EMAIL PROTECTED]> wrote:
> Salikh,
> I glanced at the patch.  What you propose below looks reasonable.  I really
> don't see any other way to do it and still get "usable" performance.
>
> All,
> My only worry is disturbing highly critical code like object layout.
> Given that this JIRA has been open a long time, I guess it's OK to apply
> the patch.  At some point, we need to stop adding functionality and focus
> on stability.
>
>
>
> On 11/3/06, Salikh Zakirov <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > I am currently continuing to work on improving JVMTI Heap Iteration
> > (HARMONY-1635), particularly on tagging objects.
> >
> > The use case that I've heard of is tagging *all* objects for the purpose
> > of memory profiling. According to what I've heard, it causes a 60x
> > slowdown on Sun's VM. However, the initial tags implementation that I've
> > uploaded to HARMONY-1635 is far worse, as it uses linear search for
> > get/set tag operations.
> >
> > (* for those who didn't read the JVMTI spec, tags are jlong (8-byte
> > integer) values, which can be attached to arbitrary objects in a get/set
> > manner *)
> >
> > The alternative approach I came up with, which uses (mostly) constant-time
> > algorithms for get/set operations, is to store a tag pointer in each
> > object. Storing the tag itself in the object is not an option, as JVMTI
> > requires sending OBJECT_FREE events with tags for each reclaimed object,
> > and this information would not be available if the tag were reclaimed
> > together with the object.
> >
> > However, since the general consensus was that increasing the object
> > header is highly undesirable, I've tried to implement a _conditional_
> > increase of the object header. The additional object header field is
> > allocated only if the JVMTI agent has requested the can_tag_objects
> > capability.
> >
> > The modified object layout I used is as follows:
> >
> >   +--------------------+
> >   |   VTable pointer   |
> >   +--------------------+
> >   |      lockword      |
> >   +--------------------+
> >   |   [array length]   |
> >   +--------------------+
> >   |   [tag pointer]    |
> >   +--------------------+
> >   |     [padding]      |
> >   +--------------------+
> >   | fields or elements |
> >   |        ...         |
> >   +--------------------+
> >
> > Where [array length] is only present in array objects,
> > [tag pointer] is only present when the can_tag_objects capability has
> > been enabled at startup, and
> > [padding] is only present in arrays of longs and doubles, for natural
> > 8-byte alignment.
> >
> > The VTable pointer is really a uint32 offset on em64t/x86_64 and ipf/ia64.
> >
> > The only difference from the current object layout is the introduction
> > of the tag pointer field.
> >
> > I've modified gc_cc to take the changed dynamic object layout into
> > account, and surprisingly it took only one modification:
> >
> > * use the VM function vector_first_element_offset_unboxed() instead of
> >   hardcoding the first array element offset. This is done once for each
> >   class at loading time, and gc_cc caches this offset for later use.
> >
> > I've experimented with putting the tag pointer at a fixed location before
> > the array length, but it looks expensive, as it would add one more read
> > to GC array scanning, and we obviously do not want to optimize at the
> > expense of the common case.
> >
> > The latest version of the patch is attached to HARMONY-1635
> > (heap-iteration-optimized.patch). I would appreciate any comments and
> > concerns.
> >
> >
> >
>
>
> --
> Weldon Washburn
> Intel Enterprise Solutions Software Division
>
>


--
Thank you,
Alexei





--
Weldon Washburn
Intel Enterprise Solutions Software Division


Re: [drlvm] dynamic object layout

2006-11-04 Thread Alexei Fedotov

Weldon,

I agree with you that it is nearly impossible to achieve stability for
a branch under active development.

From the other side, adding new features is fun, and also has a reason
behind it. If we strive for a complete implementation of J2SE, we
cannot avoid this type of activity.

So my suggestion is to create separate branches for new features which
could be merged into the main branch when mature enough to achieve an
appropriate level of stability. What do you think?

Alexei

On 11/3/06, Weldon Washburn <[EMAIL PROTECTED]> wrote:

Salikh,
I glanced at the patch.  What you propose below looks reasonable.  I really
don't see any other way to do it and still get "usable" performance.

All,
My only worry is disturbing highly critical code like object layout.  Given
that this JIRA has been open a long time, I guess it's OK to apply the
patch.  At some point, we need to stop adding functionality and focus on
stability.



On 11/3/06, Salikh Zakirov <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I am currently continuing to work on improving JVMTI Heap Iteration
> (HARMONY-1635), particularly on tagging objects.
>
> The use case that I've heard of is tagging *all* objects for the purpose
> of memory profiling. According to what I've heard, it causes a 60x
> slowdown on Sun's VM. However, the initial tags implementation that I've
> uploaded to HARMONY-1635 is far worse, as it uses linear search for
> get/set tag operations.
>
> (* for those who didn't read the JVMTI spec, tags are jlong (8-byte
> integer) values, which can be attached to arbitrary objects in a get/set
> manner *)
>
> The alternative approach I came up with, which uses (mostly) constant-time
> algorithms for get/set operations, is to store a tag pointer in each
> object. Storing the tag itself in the object is not an option, as JVMTI
> requires sending OBJECT_FREE events with tags for each reclaimed object,
> and this information would not be available if the tag were reclaimed
> together with the object.
>
> However, since the general consensus was that increasing the object
> header is highly undesirable, I've tried to implement a _conditional_
> increase of the object header. The additional object header field is
> allocated only if the JVMTI agent has requested the can_tag_objects
> capability.
>
> The modified object layout I used is as follows:
>
>   +--------------------+
>   |   VTable pointer   |
>   +--------------------+
>   |      lockword      |
>   +--------------------+
>   |   [array length]   |
>   +--------------------+
>   |   [tag pointer]    |
>   +--------------------+
>   |     [padding]      |
>   +--------------------+
>   | fields or elements |
>   |        ...         |
>   +--------------------+
>
> Where [array length] is only present in array objects,
> [tag pointer] is only present when the can_tag_objects capability has
> been enabled at startup, and
> [padding] is only present in arrays of longs and doubles, for natural
> 8-byte alignment.
>
> The VTable pointer is really a uint32 offset on em64t/x86_64 and ipf/ia64.
>
> The only difference from the current object layout is the introduction
> of the tag pointer field.
>
> I've modified gc_cc to take the changed dynamic object layout into
> account, and surprisingly it took only one modification:
>
> * use the VM function vector_first_element_offset_unboxed() instead of
>   hardcoding the first array element offset. This is done once for each
>   class at loading time, and gc_cc caches this offset for later use.
>
> I've experimented with putting the tag pointer at a fixed location before
> the array length, but it looks expensive, as it would add one more read
> to GC array scanning, and we obviously do not want to optimize at the
> expense of the common case.
>
> The latest version of the patch is attached to HARMONY-1635
> (heap-iteration-optimized.patch). I would appreciate any comments and
> concerns.
>
>
>


--
Weldon Washburn
Intel Enterprise Solutions Software Division





--
Thank you,
Alexei


Re: [drlvm] dynamic object layout

2006-11-03 Thread Weldon Washburn

Salikh,
I glanced at the patch.  What you propose below looks reasonable.  I really
don't see any other way to do it and still get "usable" performance.

All,
My only worry is disturbing highly critical code like object layout.  Given
that this JIRA has been open a long time, I guess it's OK to apply the
patch.  At some point, we need to stop adding functionality and focus on
stability.



On 11/3/06, Salikh Zakirov <[EMAIL PROTECTED]> wrote:


> Hi,
>
> I am currently continuing to work on improving JVMTI Heap Iteration
> (HARMONY-1635), particularly on tagging objects.
>
> The use case that I've heard of is tagging *all* objects for the purpose
> of memory profiling. According to what I've heard, it causes a 60x
> slowdown on Sun's VM. However, the initial tags implementation that I've
> uploaded to HARMONY-1635 is far worse, as it uses linear search for
> get/set tag operations.
>
> (* for those who didn't read the JVMTI spec, tags are jlong (8-byte
> integer) values, which can be attached to arbitrary objects in a get/set
> manner *)
>
> The alternative approach I came up with, which uses (mostly) constant-time
> algorithms for get/set operations, is to store a tag pointer in each
> object. Storing the tag itself in the object is not an option, as JVMTI
> requires sending OBJECT_FREE events with tags for each reclaimed object,
> and this information would not be available if the tag were reclaimed
> together with the object.
>
> However, since the general consensus was that increasing the object
> header is highly undesirable, I've tried to implement a _conditional_
> increase of the object header. The additional object header field is
> allocated only if the JVMTI agent has requested the can_tag_objects
> capability.
>
> The modified object layout I used is as follows:
>
>   +--------------------+
>   |   VTable pointer   |
>   +--------------------+
>   |      lockword      |
>   +--------------------+
>   |   [array length]   |
>   +--------------------+
>   |   [tag pointer]    |
>   +--------------------+
>   |     [padding]      |
>   +--------------------+
>   | fields or elements |
>   |        ...         |
>   +--------------------+
>
> Where [array length] is only present in array objects,
> [tag pointer] is only present when the can_tag_objects capability has
> been enabled at startup, and
> [padding] is only present in arrays of longs and doubles, for natural
> 8-byte alignment.
>
> The VTable pointer is really a uint32 offset on em64t/x86_64 and ipf/ia64.
>
> The only difference from the current object layout is the introduction
> of the tag pointer field.
>
> I've modified gc_cc to take the changed dynamic object layout into
> account, and surprisingly it took only one modification:
>
> * use the VM function vector_first_element_offset_unboxed() instead of
>   hardcoding the first array element offset. This is done once for each
>   class at loading time, and gc_cc caches this offset for later use.
>
> I've experimented with putting the tag pointer at a fixed location before
> the array length, but it looks expensive, as it would add one more read
> to GC array scanning, and we obviously do not want to optimize at the
> expense of the common case.
>
> The latest version of the patch is attached to HARMONY-1635
> (heap-iteration-optimized.patch). I would appreciate any comments and
> concerns.






--
Weldon Washburn
Intel Enterprise Solutions Software Division


[drlvm] dynamic object layout

2006-11-03 Thread Salikh Zakirov
Hi,

I am currently continuing to work on improving JVMTI Heap Iteration
(HARMONY-1635), particularly on tagging objects.

The use case that I've heard of is tagging *all* objects for the purpose
of memory profiling. According to what I've heard, it causes a 60x
slowdown on Sun's VM. However, the initial tags implementation that I've
uploaded to HARMONY-1635 is far worse, as it uses linear search for
get/set tag operations.

(* for those who didn't read the JVMTI spec, tags are jlong (8-byte
integer) values, which can be attached to arbitrary objects in a get/set
manner *)
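For readers who have not looked at the patch, the linear-search cost mentioned above is what you get from keeping (object, tag) pairs in a flat table. The C sketch below illustrates that shape only; it is not the code in HARMONY-1635. struct tag_table, set_tag() and get_tag() are hypothetical names, jlong_t stands in for JVMTI's jlong, and a result of 0 means "untagged", following the JVMTI GetTag convention. Every lookup scans the table, so each get or set is O(n) in the number of tagged objects, which hurts most exactly when *all* objects are tagged.

```c
#include <stddef.h>
#include <stdint.h>

typedef int64_t jlong_t;          /* stand-in for JVMTI's jlong */
typedef const void *object_ref;   /* hypothetical object handle */

#define MAX_TAGS 1024

/* Hypothetical flat tag table: parallel arrays of objects and tags. */
struct tag_table {
    object_ref objs[MAX_TAGS];
    jlong_t tags[MAX_TAGS];
    size_t count;
};

/* Linear search: O(n) in the number of tagged objects. */
static int find_slot(const struct tag_table *t, object_ref obj) {
    for (size_t i = 0; i < t->count; i++)
        if (t->objs[i] == obj)
            return (int)i;
    return -1;
}

static int set_tag(struct tag_table *t, object_ref obj, jlong_t tag) {
    int i = find_slot(t, obj);
    if (i >= 0) {
        t->tags[i] = tag;             /* already tagged: overwrite */
        return 0;
    }
    if (t->count == MAX_TAGS)
        return -1;                    /* table full */
    t->objs[t->count] = obj;
    t->tags[t->count] = tag;
    t->count++;
    return 0;
}

static jlong_t get_tag(const struct tag_table *t, object_ref obj) {
    int i = find_slot(t, obj);
    return i >= 0 ? t->tags[i] : 0;   /* 0 means "not tagged" */
}
```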

The alternative approach I came up with, which uses (mostly) constant-time
algorithms for get/set operations, is to store a tag pointer in each
object. Storing the tag itself in the object is not an option, as JVMTI
requires sending OBJECT_FREE events with tags for each reclaimed object,
and this information would not be available if the tag were reclaimed
together with the object.
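A rough C sketch of why the tag lives outside the object, assuming a side table of tag cells that carry back-pointers to their owners. After a collection, the VM can walk the side table (not the heap) and find cells whose owner died; their tags are still readable even though the objects are gone. All names here (struct tag_cell, set_tag(), collect_freed_tags()) are hypothetical, and the marked flag merely models the collector's liveness information, which a real VM would have to consult before the old space is released.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t jlong_t;              /* stand-in for JVMTI's jlong */

#define MAX_CELLS 256

struct object;

/* Side storage outside the heap: outlives the object it describes. */
struct tag_cell {
    jlong_t tag;
    struct object *owner;             /* NULL means the cell is free */
};

/* Hypothetical object: only the optional tag pointer slot matters here. */
struct object {
    bool marked;                      /* models the GC's liveness result */
    struct tag_cell *tag_ptr;         /* NULL when the object is untagged */
};

static struct tag_cell cells[MAX_CELLS];

static int set_tag(struct object *obj, jlong_t tag) {
    if (obj->tag_ptr == NULL) {
        /* Simplification: scan for a free cell, only on first tagging. */
        for (size_t i = 0; i < MAX_CELLS && obj->tag_ptr == NULL; i++) {
            if (cells[i].owner == NULL) {
                cells[i].owner = obj;
                obj->tag_ptr = &cells[i];
            }
        }
        if (obj->tag_ptr == NULL)
            return -1;                /* side table full */
    }
    obj->tag_ptr->tag = tag;          /* constant time once a cell exists */
    return 0;
}

static jlong_t get_tag(const struct object *obj) {
    return obj->tag_ptr != NULL ? obj->tag_ptr->tag : 0;   /* one indirection */
}

/* After a GC: copy out tags of dead owners (standing in for OBJECT_FREE
   events) and release their cells. Returns how many tags were written. */
static size_t collect_freed_tags(jlong_t *out, size_t max_out) {
    size_t n = 0;
    for (size_t i = 0; i < MAX_CELLS && n < max_out; i++) {
        if (cells[i].owner != NULL && !cells[i].owner->marked) {
            out[n++] = cells[i].tag;
            cells[i].owner = NULL;
        }
    }
    return n;
}
```

The "mostly" in "mostly constant time" shows up here as the free-cell search on first tagging; get and subsequent sets are a single pointer dereference.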

However, since the general consensus was that increasing the object
header is highly undesirable, I've tried to implement a _conditional_
increase of the object header. The additional object header field is
allocated only if the JVMTI agent has requested the can_tag_objects
capability.

The modified object layout I used is as follows:

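  +--------------------+
  |   VTable pointer   |
  +--------------------+
  |      lockword      |
  +--------------------+
  |   [array length]   |
  +--------------------+
  |   [tag pointer]    |
  +--------------------+
  |     [padding]      |
  +--------------------+
  | fields or elements |
  |        ...         |
  +--------------------+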

Where [array length] is only present in array objects,
[tag pointer] is only present when the can_tag_objects capability has
been enabled at startup, and
[padding] is only present in arrays of longs and doubles, for natural
8-byte alignment.

The VTable pointer is really a uint32 offset on em64t/x86_64 and ipf/ia64.

The only difference from the current object layout is the introduction
of the tag pointer field.
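To see how the optional slots move the start of the payload, here is a small C sketch that computes the first field or element offset for this layout. It assumes 4-byte slots for the vtable offset, lockword, array length and tag pointer, as on a 32-bit build (on em64t or ipf the tag pointer would be 8 bytes, so the numbers would differ). first_payload_offset() is a hypothetical helper, not the signature of DRLVM's vector_first_element_offset_unboxed().

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed slot sizes for a 32-bit build. */
enum {
    VTABLE_SLOT = 4,
    LOCKWORD_SLOT = 4,
    ARRAY_LENGTH_SLOT = 4,
    TAG_POINTER_SLOT = 4
};

/* Offset of the first field (objects) or element (arrays).
   elem_size is ignored for non-array objects. */
static size_t first_payload_offset(bool is_array, size_t elem_size, bool tags_enabled) {
    size_t offset = VTABLE_SLOT + LOCKWORD_SLOT;
    if (is_array)
        offset += ARRAY_LENGTH_SLOT;          /* [array length] */
    if (tags_enabled)
        offset += TAG_POINTER_SLOT;           /* [tag pointer] */
    if (is_array && elem_size == 8)
        offset = (offset + 7) & ~(size_t)7;   /* [padding] for long/double arrays */
    return offset;
}
```

Under these assumptions, with tagging off a long[] needs 4 bytes of padding (12 rounds up to 16); with tagging on, the tag pointer fills that gap and the elements still start at 16, while int[] elements move from 12 to 16.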

I've modified gc_cc to take the changed dynamic object layout into
account, and surprisingly it took only one modification:

* use the VM function vector_first_element_offset_unboxed() instead of
  hardcoding the first array element offset. This is done once for each
  class at loading time, and gc_cc caches this offset for later use.
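The "compute once per class, cache in the GC" pattern described above might look roughly like this in C. vm_first_element_offset() is a hypothetical stand-in for the VM query (not the real vector_first_element_offset_unboxed() signature) and counts its calls so the caching is visible; struct gc_class_info, gc_class_loaded() and element_address() are likewise invented names. The same 4-byte slot sizes as a 32-bit build are assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int vm_offset_queries = 0;      /* how often the VM was asked */

/* Hypothetical VM query for array classes (32-bit slot sizes assumed). */
static size_t vm_first_element_offset(size_t elem_size, bool tags_enabled) {
    vm_offset_queries++;
    size_t offset = 4 + 4 + 4;          /* vtable + lockword + array length */
    if (tags_enabled)
        offset += 4;                    /* tag pointer */
    if (elem_size == 8)
        offset = (offset + 7) & ~(size_t)7;   /* align long/double elements */
    return offset;
}

/* Per-class data the GC keeps; filled once when the class is loaded. */
struct gc_class_info {
    size_t elem_size;
    size_t first_elem_offset;
};

static void gc_class_loaded(struct gc_class_info *ci, size_t elem_size, bool tags_enabled) {
    ci->elem_size = elem_size;
    ci->first_elem_offset = vm_first_element_offset(elem_size, tags_enabled);
}

/* Used by array scanning: no VM call, only the cached per-class offset. */
static uintptr_t element_address(const struct gc_class_info *ci, uintptr_t array, size_t index) {
    return array + ci->first_elem_offset + index * ci->elem_size;
}
```

In this sketch the only runtime cost of the configurable layout is that the scanning loop reads a per-class value instead of using a compile-time constant.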

I've experimented with putting the tag pointer at a fixed location before
the array length, but it looks expensive, as it would add one more read
to GC array scanning, and we obviously do not want to optimize at the
expense of the common case.
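One plausible reading of the "one more read" remark, sketched in C under stated assumptions: with the tag pointer placed after the array length, the length sits at a fixed offset and is read with a single load; if the optional tag pointer came first, the length's offset would depend on a startup setting, so the scanner would first load that offset (here a hypothetical global) and then the length. Offsets assume 4-byte slots, and all names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LENGTH_OFFSET_FIXED 8           /* after vtable + lockword (4 + 4) */

/* Layout from the patch: [tag pointer] follows [array length]. */
static uint32_t array_length_fixed(const unsigned char *array) {
    uint32_t length;
    memcpy(&length, array + LENGTH_OFFSET_FIXED, sizeof length);   /* one load */
    return length;
}

/* Rejected alternative: [tag pointer] precedes [array length], so the
   length moves when tagging is enabled (hypothetical global set at startup). */
static size_t g_length_offset = LENGTH_OFFSET_FIXED;

static void configure_tags(bool tags_enabled) {
    g_length_offset = LENGTH_OFFSET_FIXED + (tags_enabled ? 4 : 0);
}

static uint32_t array_length_dynamic(const unsigned char *array) {
    size_t offset = g_length_offset;    /* the extra read */
    uint32_t length;
    memcpy(&length, array + offset, sizeof length);
    return length;
}
```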

The latest version of the patch is attached to HARMONY-1635
(heap-iteration-optimized.patch). I would appreciate any comments and
concerns.