[Bug 25480] - Experimental performance improvements.

Finn Bock Sat, 10 Jan 2004 06:35:40 -0800

[Glen in bugzilla]

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25480

I'm now looking at the changes to PropertySets.java (already applied) and PropertyList.java (only partly so) and have a few questions:

1.) For the new PropertyList constructor (in the patch), you appear to be duplicating the element ID argument, once as "el", the other time as "elementId"--just to confirm, they are referring to the same thing (and hence one of them can be removed)?

Yes, it is a silly leftover from my own conversion process.

----------------

2.) In PropertySets.java (already applied), method makeSparseIndices, you define indices[0] as:

indices[0] = (short) (set.cardinality() + 1);

Later, in PropertyList, you initialize the values array as follows:

this.values = new Property[indices[0]];

I think we can then just use set.cardinality() in makeSparseIndices(), correct? (i.e., leave out the +1).


No, the + 1 is a deliberate trick to handle unknown properties which
should return a null value during lookup(). The other part of the trick
is the

int j = 1;

in makeSpareIndices() which ensure that all the unknown properties in a
indices array all have a value of '0' and all known properties has a
value > 0. For an element fo:bar which support 2 properties foo=21 and
baz=137, the indices array has the values

   indices[21] = 1
   indices[137] = 2

and all other values are 0. The PropertyList.values array then look
like this:

   values[0] = null // always null.
   values[1] = reference to a 'foo' Property instance
   values[2] = reference to a 'baz' Property instance

In the performance sensitive PropertyList.lookup(), the code doesn't
have to test for properties that are unknown by fo:bar

return values[indices[propertyName]];

because all unknown properties map to the values[0] index which always
have a null value.

------------------

3.) PropertySets.java defines those properties which are valid for each FO--in PropertyList, the proposed implementation then uses that information to limit the properties that can be assigned to an FObj (i.e., only those defined as valid for it.) Am I correct here on this point?


And it includes the properties that are valid for all the children element of
each FO. That is what the large while loop does when it calls mergeContent.
It keeps pulling the childrens properties into the parent and repeats doing
it for all the FOs until no more properties can be pulled (yes, there has
to be a better way of doing that).

... and do we also need to somehow additionally qualify *those* properties as "valid for the FO but not directly relevant for it"?


I don't think so, and the patch doesn't do it. ProperttySets only return
the set of properties that are valid for a FO.

------------------------------------

4.) Finally, I'm too far removed from my C programming days to understand the math here:

In the PropertyList constructor, you code this:

this.specified = new int[(indices[0] >> 5) + 1];

(where indices[0] defines the number of properties valid for the FObj)

Why the bitshifting 5 to the right? What does this accomplish--what is this shorthand for?

The 32 bits that is stored in a int. See below.

also, in putSpecified(int idx, Property value), you code this:

specified[i >> 5] |= 1 << (i & 31);

I'm not clear what this is doing either. What does putSpecified() do, and what's the point of the i & 31 and the Or'ing?


PropertList.specified is just a bitmap of the same size as values.length.
All the shifting and masking is similar to what is done in
java.util.BitSet, inlined in PropertyList for performance.

putSpecified() insert a property in the array and set the specified bit to
true. This is in contrast to the inherit() method which also inserts a
property in the array but doesn't set the specified bit. The other put()
method isn't used anymore.

I also think that inlining the bitset implementation is going to far,
but it does make a performance difference, so I included it in the patch.

[Andreas L. Delmelle]

> IIC the initial strategy WRT inherited properties was to add methods to the
> FObj's to get these from their parent. I think the problem with this
> implementation is that, in the case of very large documents with deeply
> nested elements that inherit a property which is specified at the top-level,
> you would end up with one getter being dispatched to the parent's getter,
> and this in its turn being dispatched to yet another ancestor's getter (or
> Makers in case of Property creation)... In this case, however, I think you
> can't fully 'push' these onto the descendants, as this would lead to absurd
> storage-reqs for quite average-sized documents.
>
> OTOH, the inherited property value (resolved or unresolved) can indeed be
> supposed as available at parse time, because a parent is per se processed
> *before* any of its children.

Absolutely correct, and I'm not at all sure that we should go to an array
based storage of the properties in PropertyList, as I did in the patch.
Using arrays has strengths:

- speed during lookup.
- possible to pushing inherited props into the children.

and weaknesses:

- much, much larger memory consumption.

I have some ideas regarding releasing the array memory earlier during the
endElement event that I will try out, but until we have a solution to
the memory consumption, I think we are better of with the current HashMap
implementation of property storage and then delay the implementation of
pushing inherited props into the children.

> I just wonder if this has something to do with
> Finn's other idea of moving logic to the property side to save on
> unnecessary calls to empty methods ?

No, it is unrelated, but while I made the patch to remove the generated
maker classes

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25873

, I've concluded that it would also be good design decision and I plan
on updating patch 25873 to show that the Property.Maker can become
simpler and easier to understand as well as faster.

regards,
finn

[Bug 25480] - Experimental performance improvements.

Reply via email to