Hi Laurent,

There were one or two very minor fixes I suggested in a couple of files that you indicated you were going to make, but I didn't see a new webrev. Is there one? I think the existing array stuff is probably OK, and I'll look at it again now that I understand the tradeoffs of its current design, but I wanted to know if there is a newer webrev for me to review that includes those other minor fixes.

On 6/12/15 5:45 AM, Laurent Bourgès wrote:
        A general thought - should we have foo_initial arrays if we are
        getting everything from a cache?

    I adopted this approach to maximize performance for the 99% case: I
    sized the initial arrays properly and they take ~512 KB per
    RendererContext; the (weak) ArrayCache is only used for extreme
    cases (a bit slower). In March 2013, I started by using the cache
    for all array accesses and it was 30% slower!

Was the 30% back when we were allocating or clearing all of the arrays? Now that we only clear what we use, the initial sizes of the arrays shouldn't matter so much. It would be worth a little extra size overhead to not have so many special cases and special handling for the "_initial" arrays if the performance is now essentially identical.
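
For anyone skimming this thread, the pattern under discussion looks roughly like the following - a minimal sketch with illustrative names (SketchContext, ensureCapacity), not the actual Marlin fields:

    final class SketchContext {
        static final int INITIAL_EDGES = 1024;

        // Pre-sized for the common case and never handed to the cache.
        private final int[] edgePtrs_initial = new int[INITIAL_EDGES];

        // Working reference; starts out pointing at the initial array.
        private int[] edgePtrs = edgePtrs_initial;

        int[] ensureCapacity(int needed) {
            if (needed > edgePtrs.length) {
                // Rare path: grow the array (via the cache in the real
                // code; a plain copy here), leaving the initial array
                // untouched for the next operation.
                edgePtrs = java.util.Arrays.copyOf(edgePtrs, needed * 2);
            }
            return edgePtrs;
        }

        void dispose() {
            // Reset to the initial array so the next init() is cheap.
            edgePtrs = edgePtrs_initial;
        }
    }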

    I will look again at these cases: I agree "leaking refs" should be
    set to null to be sure. However, Marlin code always executes the
    pattern init() / dispose(), so leaking pointers are not critical,
    but OK, let's set all of them to null.

    For initial arrays, it seems redundant to set them to null in
    dispose() and back to the initial array in init() = useless and
    counter-productive!

It's just that code gets easier to review and maintain if there is a clear reset to some initial conditions. I would like to keep the code as simple and straightforward as possible while we evaluate algorithmic improvements, without introducing potential bugs through a lot of special-case code, and then we can worry about the 1%'s later.

Also, we may find that moving the code to native is more of a performance improvement than fighting with HotSpot's load/store policies, in which case any code that tries to produce "slightly more optimal bytecodes" would end up not being needed and would just add noise to such a native conversion.

But, I do agree that technically you are correct about some of these suggestions leading to some provably redundant code. The issue is more that constantly having to reprove the redundancies gets in the way of some other progress we could be making.

And, on the other hand, right now you are the engineer driving this from the workload point of view (and I apologize if I am inadvertently in a role that seems to be described as "the engineer getting in the way"), so I'm willing to rely on your commitment to be diligent about these optimizations. I'm just stating a case for whether they should be the focus yet.

    Sorry, it is not a flaw but an advantage: the initial arrays that
    can grow in widenXXXArray() have an odd size, as indicated by the
    comment:

        // +1 to avoid recycling in Helpers.widenArray()
        private final int[] edgePtrs_initial = new int[INITIAL_SMALL_ARRAY + 1]; // 4K

I saw that after the fact, but my main point remains that this is a fragile inter-module optimization. I was unaware that it could save us 30% to have special-cased "initial" arrays, and that fact makes this relationship that much more important, but I wonder if it is still true. If we ditch the initial arrays because the new "dirty array" management removes that performance penalty, then this all goes away. And even if we keep the initial-array concept, it would be nice to find a more reliable way to manage it than a side effect of its length.

    I wanted to have a simple but robust API that can be easily used in
    Marlin.

I'd like to think that I'm working toward (and promoting) simple and robust APIs as well.

    I agree the documentation can be improved a lot, and I could
    introduce a new helper method in ArrayCache to create the initial
    arrays in a safer and more explicit way:

        int[] createNonCachedIntArray(int length) {
            // return only odd lengths to detect wrong cache usage
            if ((length & 1) == 0) {
                length += 1;
            }
            return new int[length];
        }

In a future effort, if we find that we must keep some small initial arrays for performance, we could make this a more explicit and tunable parameter on the caches. Let's table that until we consider if they buy us any performance after these dirty array changes.

    Anyway, I agree the cache handling can be improved a lot: try using
    generics for the array types with a lambda factory to allocate
    them.

I just wish that generics could handle primitive arrays, but they can't. I did a similar generification thing for the raster format conversion code in FX, but I had to play games with the generic types being ByteBuffer and IntBuffer and having the variants of those accept the proper type of primitive arrays. It wasn't pretty, but it did work out in the end.
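
For what it's worth, the allocation half of that suggestion does generify, since int[] and friends are reference types; something along these lines compiles (an illustrative toy, not a proposed API):

    import java.util.function.IntFunction;

    final class ArrayFactoryDemo {
        // The array type itself is the type parameter; a lambda or
        // constructor reference supplies the per-type allocation.
        static <A> A allocate(IntFunction<A> factory, int length) {
            return factory.apply(length);
        }

        public static void main(String[] args) {
            int[] ints     = allocate(int[]::new, 64);
            byte[] bytes   = allocate(byte[]::new, 64);
            float[] floats = allocate(float[]::new, 64);
            System.out.println(ints.length + " " + bytes.length + " "
                               + floats.length);
        }
    }

It's everything that touches the elements - clearing, copying, widening - that still needs per-type code, which is exactly where the Buffer games came in.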

        The caching code is an example of "code with no obvious flaws
        because it is hard to figure out if it has flaws".  We should
        strive for "code with obviously no flaws because it is easy to
        verify that it is doing everything correctly".


    I disagree with the term "flaw", but I agree it is hard to figure
    out what is being done.

I took that from a talk on producing more secure code. I think it originally came from an overall system design that could easily be proven secure. The first release of Java required all sensitive Java code to be constantly asking "am I in a context where this operation is allowed?", and the detection was fragile. It was replaced by code where all sensitive operations were rejected unless the surrounding code proved its protected context, which was much more secure. "Disallowed unless requested" was harder to get wrong than "allowed unless it looks questionable".

But that same philosophy serves a lot of code well. If you start having a bunch of "agreements" between modules, they can quickly get out of sync through regular maintenance and sometimes spiral out of control, and you end up spending more time verifying that you're still satisfying all of those conventions rather than letting the code make sure things are kosher because only valid operations are expressible.

    OK, I can work later on a new framework:
    - ArrayCache<K> abstract class, where <K> is byte[], int[], or float[]
    - CleanArrayCache<K> extends ArrayCache<K> = clean arrays
    - DirtyArrayCache<K> extends ArrayCache<K> = dirty arrays
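
A rough sketch of how that split could look - hypothetical names and shapes, not a final design - where the subclasses differ only in whether a recycled array is zero-filled before reuse:

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.function.IntFunction;

    abstract class ArrayCache<K> {
        private final IntFunction<K> factory;
        // A real cache would bucket arrays by length; a single pool
        // keeps the sketch short.
        private final ArrayDeque<K> pool = new ArrayDeque<>();

        ArrayCache(IntFunction<K> factory) { this.factory = factory; }

        final K getArray(int length) {
            final K recycled = pool.poll();
            return (recycled != null) ? prepare(recycled)
                                      : factory.apply(length);
        }

        final void putArray(K array) { pool.offer(array); }

        // The clean/dirty policy lives here.
        abstract K prepare(K recycled);
    }

    // Clean variant: zero a recycled array before handing it out.
    final class CleanIntArrayCache extends ArrayCache<int[]> {
        CleanIntArrayCache() { super(int[]::new); }
        @Override int[] prepare(int[] a) { Arrays.fill(a, 0); return a; }
    }

    // Dirty variant: hand it back as-is; the caller clears only the
    // region it actually used.
    final class DirtyIntArrayCache extends ArrayCache<int[]> {
        DirtyIntArrayCache() { super(int[]::new); }
        @Override int[] prepare(int[] a) { return a; }
    }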

I think I can trust the existing architecture for at least a putback of the last webrev that you sent out while we brainstorm a better caching mechanism - let me make one last pass through it and I'll send an OK later this afternoon.

    What do you mean by "making further real improvements in the
    algorithms"?

Things like investigating DDA in the inner loops, alternate ways to manage crossings that require less sorting, enhancing the stroker to do non-uniform scaling (which would eliminate the double transform in some cases), potentially passing the crossings to the output loop rather than having to convert them into an alpha array, etc. A lot of that is easier to review if there isn't a lot of extra "this line added 1% performance" code adding visual noise.
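
To illustrate the first of those: the DDA idea, in a nutshell, is that the x-crossing of an edge advances by a constant increment per scanline, so the inner loop needs one add instead of re-solving the line equation each time. A toy sketch (not Marlin code, names are made up):

    static void emitCrossings(float x0, float y0, float x1, float y1) {
        final float dxdy = (x1 - x0) / (y1 - y0); // x step per scanline
        final int yStart = (int) Math.ceil(y0);
        final int yEnd   = (int) Math.ceil(y1);
        // x where the edge meets the first scanline it covers
        float x = x0 + (yStart - y0) * dxdy;
        for (int y = yStart; y < yEnd; y++) {
            System.out.println("y=" + y + " crossing at x=" + x);
            x += dxdy; // one add per scanline, no per-line multiply
        }
    }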

    Who could help? I am the only one working concretely on this code -
    help wanted.

I'd love to get more people involved, but history tells us that this is a fairly lonely field. :(

I may have some time, now that FX is in tightened bug-fix mode, to do some tinkering, but mainly I will probably be much quicker on getting reviews done, which may help as much as having another pair of hands in practice...

                        ...jim
