Hi Denis,

On 9/27/2010 7:43 AM, Denis Lila wrote:
Hi Jim.
How much faster?  I'm worried about this, especially given our tiled
approach to requesting the data.  What was the bottleneck before?
(It's been a while since I visited the code - we weren't computing the
crossings for every curve in the path for every tile being generated
were we?)

     Not much faster. I'm working on something better.

Then hopefully we can get to something with better memory and CPU costs.

     I'm not sure about the bottleneck, but what we were doing before is:
1. Flatten (by subdividing) every curve so that we deal only with lines.
2. Add each line to a list sorted by y0. When end_rendering was called,
for each scanline we found the crossings of that scanline with every line
in our list, and used them to compute the alpha for that scanline's
pixel row. All this would be put into RLE-encoded temporary storage, then
read back and converted into tile form by PiscesTileGenerator.
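
Roughly, that per-scanline step looked like the sketch below (the names
here are made up for illustration, not the actual Renderer fields):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ScanlineSketch {
    static final class Seg {
        final float x0, y0, x1, y1;          // endpoints, with y0 <= y1
        Seg(float x0, float y0, float x1, float y1) {
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }
    }

    // x coordinates where the horizontal scanline at 'y' crosses the segments
    static List<Float> crossings(List<Seg> segs, float y) {
        List<Float> xs = new ArrayList<Float>();
        for (Seg s : segs) {
            if (y >= s.y0 && y < s.y1) {      // segment spans this scanline
                float t = (y - s.y0) / (s.y1 - s.y0);
                xs.add(s.x0 + t * (s.x1 - s.x0));
            }
        }
        Collections.sort(xs);                  // sorted crossings drive coverage
        return xs;
    }
}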

     Speaking of which, would it not be better to get rid of PiscesCache and
just keep a buffer with the current tile row in Renderer.java? This would
be possible because the specification for AATileGenerator says the iteration
goes like: for (y...) for (x...);.
Why is PiscesCache there? It isn't being used as a cache at all. Could it be?
Also, why do we output tiles instead of just pixel rows (which I guess would
just be nx1 tiles)? Is it because we would like to use getTypicalAlpha to
eliminate completely transparent or completely opaque regions as soon as
possible (and the longer a tile is, the less of a chance it has of being
either of those two)?
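
For reference, the consuming loop I have in mind from that spec is roughly
the following (a sketch only, not the actual pipeline code; the
AATileGenerator methods are the real ones as I understand them):

// Tiles are visited row-major (all x for a given y, then the next y), and
// getTypicalAlpha() lets the caller special-case all-0 and all-1 tiles.
static void consumeTiles(sun.java2d.pipe.AATileGenerator aatg,
                         int x0, int y0, int x1, int y1) {
    int tw = aatg.getTileWidth();
    int th = aatg.getTileHeight();
    byte[] alpha = new byte[tw * th];
    for (int y = y0; y < y1; y += th) {
        for (int x = x0; x < x1; x += tw) {
            int a = aatg.getTypicalAlpha();
            if (a == 0x00) {
                aatg.nextTile();               // fully transparent: skip
            } else if (a == 0xff) {
                aatg.nextTile();               // fully opaque: solid fill
                // ... fill this tile rectangle with the solid color ...
            } else {
                aatg.getAlpha(alpha, 0, tw);   // mixed coverage: read alphas
                // ... composite this tile using the alpha mask ...
            }
        }
    }
    aatg.dispose();
}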

That was basically "cramming what we had into the interface's box". The cache existed for something that was being done on mobile, but it doesn't have much of a place in our APIs, so it was just reused for tile generation. If we have a much more direct way of doing it, then it would be great to get rid of it.

I think we can support "ALL1s" and "ALL0s" reasonably without the cache.

I can see your points here.  I think there are solutions to avoid much
of the untransforming we can consider, but your solution works well so
let's get it in and then we can look at optimizations if we feel they
are causing a measurable problem later.

     I should say this isn't quite as bad as I might have made it seem. Firstly,
this IO handler class I made eliminates transformations when Dasher
communicates with Stroker. More importantly, no untransforming is done
when the transformation is just a translation, the identity, or singular;
and when STROKE_CONTROL is off, we only transform the output path. That's
because the main reason for handling transforms the way I do now is that
we can't normalize untransformed paths: the coordinate adjustments might
be magnified too much. So, we need to transform paths before normalization.
But we also can't do the stroking and widening before the normalization.
However, if normalization is off we can just pass untransformed paths into
Stroker and transform its output (which is still somewhat more expensive
than only transforming the input path, since Stroker produces 3-7 curves
for each input curve).
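
To make the ordering concrete, here is a sketch with placeholder stage
names (normalize/stroke stand in for the real Pisces stages, and pen-width
handling in device space is elided):

import java.awt.Shape;
import java.awt.geom.AffineTransform;

final class PipelineOrderSketch {
    static Shape strokedShape(Shape src, AffineTransform at, boolean normalizing) {
        if (normalizing) {
            // normalization only makes sense in device space:
            // transform first, then normalize, then widen
            Shape dev = at.createTransformedShape(src);
            return stroke(normalize(dev));
        } else {
            // no normalization: widen in user space and transform only
            // the (larger) widened output
            Shape widened = stroke(src);
            return at.createTransformedShape(widened);
        }
    }

    // placeholders standing in for the real stages
    static Shape normalize(Shape s) { return s; }
    static Shape stroke(Shape s)    { return s; }
}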

Can the untransform be eliminated in the case of scaling? (Whether just for uniform scaling, or maybe even for non-uniform scaling with no rotation or shearing?)

I'm not sure I understand the reasoning of the control point
calculation.  I'll have to look at the code to register an opinion.

     I'm sorry, my explanation wasn't very clear. I attached a picture that
will hopefully clarify things.
But, in a way, the computation I use is forced on us. Suppose we have a
quadratic curve B and we need to compute one of its offsets C. C'(0)
and C'(1) will be parallel to B'(0) and B'(1) so we need to make sure
our computed offset has this property too (or it would look weird around
the endpoints). Now, B'(0) and B'(1) are parallel to p2-p1 and p3-p2
where p1,p2,p3 are the 3 control points that define B, so if the control
points of C are q1, q2, q3 then q2-q1 and q3-q2 must be parallel to p2-p1
and p3-p2 respectively. At this point we need more constraints, since
our system is underdetermined. We use the constraints that q1 = C(0)
and q3 = C(1) (so the endpoints of the computed offset are equal to the
endpoints of the ideal offset). All we have left to compute is q2, but
we know the direction of q2-q1 (parallel to p2-p1) and the direction of
q3-q2 (parallel to p3-p2), so q2 must lie on the line through q1 in the
first direction and on the line through q3 in the second, i.e. q2 must
be the intersection of these two lines.
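
In code, the computation is roughly the following (a sketch with
illustrative names, not the Stroker code itself):

import java.awt.geom.Point2D;

final class QuadOffsetSketch {
    // Offsets the quadratic with control points p1, p2, p3 by distance w:
    // q1 and q3 are the offset endpoints, and q2 is the intersection of
    // the line through q1 parallel to (p2 - p1) with the line through q3
    // parallel to (p3 - p2).
    static Point2D.Float[] offsetQuad(Point2D.Float p1, Point2D.Float p2,
                                      Point2D.Float p3, float w) {
        float dx1 = p2.x - p1.x, dy1 = p2.y - p1.y;
        float dx2 = p3.x - p2.x, dy2 = p3.y - p2.y;
        float len1 = (float) Math.hypot(dx1, dy1);
        float len2 = (float) Math.hypot(dx2, dy2);

        // unit normals of the two control-polygon edges, scaled by w
        float nx1 = -dy1 / len1 * w, ny1 = dx1 / len1 * w;
        float nx2 = -dy2 / len2 * w, ny2 = dx2 / len2 * w;

        Point2D.Float q1 = new Point2D.Float(p1.x + nx1, p1.y + ny1);
        Point2D.Float q3 = new Point2D.Float(p3.x + nx2, p3.y + ny2);

        // intersect q1 + t*(dx1,dy1) with q3 + s*(dx2,dy2) by solving
        // t*dx1 - s*dx2 = q3.x - q1.x and t*dy1 - s*dy2 = q3.y - q1.y
        float det = dx2 * dy1 - dx1 * dy2;
        // if det == 0 the two edges are parallel and a different
        // construction would be needed
        float rx = q3.x - q1.x, ry = q3.y - q1.y;
        float t = (dx2 * ry - rx * dy2) / det;
        Point2D.Float q2 = new Point2D.Float(q1.x + t * dx1, q1.y + t * dy1);

        return new Point2D.Float[] { q1, q2, q3 };
    }
}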

I agree that if you are creating a parallel curve then intersection is the way to go. I guess what I was potentially confused about was whether there are cases where you need to subdivide at all? Regardless of subdivision, when you get down to the final step of creating the parallel curves then I believe offsetting and finding the intersection is correct (though I reserve the possibility that there might still be a simpler way - I haven't done any investigation to know if that is true).

It sounds like you are correct here.  What does the closed source code
draw?

     I thought closed source java simply didn't draw the round joins in
these cases, but I did some more testing and it turns out it does for
some curves and it doesn't for others. I've included the results of a
test I wrote that tries to draw paths like:
p.moveTo(0,0); p.lineTo(cos,sin); p.lineTo(0,0);
where cos and sin are coordinates on a circle (source1.png is the output
of closed java; source2.png is my output). As you can see, my
version draws the round joins in all tested cases, while closed java
is inconsistent.
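
The test is essentially the following (a minimal sketch; the coordinates,
stroke width and image size are illustrative):

import java.awt.*;
import java.awt.geom.*;
import java.awt.image.BufferedImage;

public class RoundJoinTest {
    public static void main(String[] args) throws Exception {
        BufferedImage img = new BufferedImage(200, 200,
                                              BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.setStroke(new BasicStroke(10f, BasicStroke.CAP_BUTT,
                                    BasicStroke.JOIN_ROUND));
        g.translate(100, 100);
        // paths that go out to a point on a circle and retrace back;
        // a 180-degree round join should be drawn at the far endpoint
        for (int i = 0; i < 16; i++) {
            double theta = 2 * Math.PI * i / 16;
            Path2D p = new Path2D.Double();
            p.moveTo(0, 0);
            p.lineTo(80 * Math.cos(theta), 80 * Math.sin(theta));
            p.lineTo(0, 0);
            g.draw(p);
        }
        g.dispose();
        javax.imageio.ImageIO.write(img, "png", new java.io.File("out.png"));
    }
}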

You rock then! A bug should be filed on closed JDK. Can you file it or send me your test case and I'll do it?

     That sounds good. Hopefully by the end of today I'll have a
less memory-hungry AA implementation that is also faster.

Yay!

Thank you,

Ummm... Thank *you*. You're doing all the good work here, I'm just sitting back, throwing out tiny crumbs of past experience and watching the ensuing woodchips fly with awe. I've had on my wish list for some time to be able to eliminate these last few closed source holdouts, but the quality of the Ductus code was so high that I never got motivated to try. Who knows now... ;-)

                        ...jim
