Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Denis, The problem is that pruning complicates the inner loop that advances the scanline and you can't tell without scanning every segment in play whether you need to do it in its own loop. Thus, for one of my test cases it was really expensive without some up front record of whether or not the pruning was needed. Also, I keep a count so I can amortize the "widenArray" call. Before I was calling "widen(..., 1)" inside the loop that drew new edges in from the bucket. I could also have added an additional loop to count the new edges, but that would have been expensive as well... ...jim On 11/8/2010 3:23 PM, Denis Lila wrote: Hi Jim. I like the new Renderer, but I have a question about edgeBucketCount. As far as I can see it is only used so that we can tell whether any given bucket contains only edges that go all the way to the last (or beyond) scanline. Is this really common enough that we gain by treating it as a special case? Would it not be better to assume that every bucket needs pruning and save some memory (and possibly resulting in better cache performance)? Thanks, Denis. - "Jim Graham" wrote: It's still a work in progress, but I've cleaned up a lot of logic and made it faster in a number of ways. Note that I've abstracted out the cache stuff and created an "AlphaConsumer" interface which may evolve over time. In FX we actually consume alphas in larger chunks than the code in JDK which was driven by Ductus's 32x32 mandate, so I would have had to make completely incompatible changes to emitRow - so I moved it behind an interface. For the JDK code, if you want to integrate this version, I would have the cache implement the new interface and move your version of emitRow into the Cache object. I'd send you the new code for my AlphaConsumer, but it is incompatible with what you need to cache so it won't help you. You'll also need a bit of un-translation cleanup as we have mirrors of all of java.awt.geom with special new APIs that FX needs. ...jim On 11/8/2010 6:40 AM, Denis Lila wrote: Hi Jim. Also, I've gotten another 20% improvement out of the design with a few more tweaks. (Though I measured the 20% in the stripped down version that I'm prototyping with FX so I'm not sure how much of that 20% would show up through the layers of the 2D code. Overall, I've about doubled the frame rates of the prototype since your first drop that you checked in to the OpenJDK repository.) Can I see your new version? Attached. How about looking more at the stroking end of the process and I'll dig a little more into optimal rasterization code. I have a lot of experience with optimizing rasterizer code (and JNI if it comes to that), but very little with the curve manipulations involved in stroking (math is so *hard* at my age ;-)... Sounds good. Have you implemented your idea of processing one pixel row at a time, as opposed to processing subpixel rows? If not, I could do that. Not yet. Right now I've gotten a lot of mileage out of a few tweaks of the bookkeeping of the sample-row-at-a-time version. I'm still mulling over exactly how to make that go faster. ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Denis, On 11/8/2010 2:39 PM, Denis Lila wrote: Finally, I discovered (while testing for other problems) that the curves are not truly monotonic after slicing them. I realized this years ago when I was writing my Area code (see sun.awt.geom.Curve) and put in tweaking code to make them monotonic after they were split. They are never off by more than a few bits, but you can't trust the curve splitting math to generate purely monotonic segments based on a t generated by some unrelated math. Sometimes the truly horizontal or vertical t value requires more precision than a float (or even a double) can provide and splitting at the highest representable float less than the t value produces a pair of curves on one side of the hill and splitting at the next float value (which is greater than the true t value) produces curves on the other side of the hill. Also, when the curve has been split a few times already, the t values loose accuracy with each split. This will all be moot if I can eliminate the splitting code from the renderer, but it may also play a factor in the stroke/dash code... Making curves monotonic is only used for optimization purposes, so it can't see how it would affect rendering correctness. All lines generated from a given "allegedly monotonic" curve are recorded with the same "or" (orientation) value. But, if the curves are not truly monotonic then it might be theoretically possible to generate a line that is backwards with respect to the expected orientation. It would then get recorded in the edges array with the wrong orientation and slope and then rasterization might unravel. Fortunately, the non-monotonicity is limited to a few bits of precision so this may never generate an errant edge in practice unless flattening gets really fine-grained... ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. I like the new Renderer, but I have a question about edgeBucketCount. As far as I can see it is only used so that we can tell whether any given bucket contains only edges that go all the way to the last (or beyond) scanline. Is this really common enough that we gain by treating it as a special case? Would it not be better to assume that every bucket needs pruning and save some memory (and possibly resulting in better cache performance)? Thanks, Denis. - "Jim Graham" wrote: > It's still a work in progress, but I've cleaned up a lot of logic and > > made it faster in a number of ways. Note that I've abstracted out the > > cache stuff and created an "AlphaConsumer" interface which may evolve > > over time. > > In FX we actually consume alphas in larger chunks than the code in JDK > > which was driven by Ductus's 32x32 mandate, so I would have had to > make > completely incompatible changes to emitRow - so I moved it behind an > interface. For the JDK code, if you want to integrate this version, I > > would have the cache implement the new interface and move your version > > of emitRow into the Cache object. I'd send you the new code for my > AlphaConsumer, but it is incompatible with what you need to cache so > it > won't help you. > > You'll also need a bit of un-translation cleanup as we have mirrors of > > all of java.awt.geom with special new APIs that FX needs. > > ...jim > > On 11/8/2010 6:40 AM, Denis Lila wrote: > > Hi Jim. > > > >> Also, I've gotten another 20% improvement out of the design with a > few > >> more tweaks. (Though I measured the 20% in the stripped down > version > >> that I'm prototyping with FX so I'm not sure how much of that 20% > >> would show up through the layers of the 2D code. Overall, I've > about > >> doubled the frame rates of the prototype since your first drop that > you > >> checked in to the OpenJDK repository.) > > > > Can I see your new version? > > Attached. > > >> How about looking more at the stroking end of the process and I'll > dig > >> a little more into optimal rasterization code. I have a lot of > >> experience with optimizing rasterizer code (and JNI if it comes to > that), but > >> very little with the curve manipulations involved in stroking (math > is so > >> *hard* at my age ;-)... > > > > Sounds good. Have you implemented your idea of processing one pixel > row at a > > time, as opposed to processing subpixel rows? If not, I could do > that. > > Not yet. Right now I've gotten a lot of mileage out of a few tweaks > of > the bookkeeping of the sample-row-at-a-time version. I'm still > mulling > over exactly how to make that go faster. > > ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. > A couple of questions about the code that I haven't touched... > Is there some reason why the AFD for cubics doesn't have any tests for > dddxy (the "constants" for its equations), but the AFD for quads is > testing the ddxy on every loop? I know that these values do change > when the AFD variables are doubled or halved, but why does the cubic > version get away with only testing out to the n-1th order differences but the > quad version has to test out to the nth order differences? That's because I used to have only one AFD function that worked for both cubics and quadratics. It simply treated quadratics as cubics with the first coefficient 0. I decided to split them up because we might want to do different things for quadratics, but I didn't get around to doing any of them (yet) so that's why it is the way it is. > Also, what is the value of breaking the pieces into monotonic segments > prior to flattening? Is it simply amortizing the cost of determining > if the segment is up or down? I guess this used to be done because we > needed monotonic (in Y) curve segments since we did a top-down > iteration across all segments, but now they aren't in the rasterization loop. > If it is simply a performance issue then I may experiment with > eliminating that stage and seeing if I can make it go faster overall. Well, making them monotonic makes updating edgeM[in|ax][X|Y] more efficient. We only have to do the tests once per subdivided curve instead of once per line. Also it helps with making our lines have y2>y1. Without monotonicity we'd have to test each individual line, instead of just reversing the curve coordinate array. However, for the latter, all we need is monotonicity in Y. We should make some measurements to determine whether the above are enough to compensate for the subdivision costs. > Finally, I discovered (while testing for other problems) that the > curves are not truly monotonic after slicing them. I realized this years ago > when I was writing my Area code (see sun.awt.geom.Curve) and put in > tweaking code to make them monotonic after they were split. They are > never off by more than a few bits, but you can't trust the curve > splitting math to generate purely monotonic segments based on a t > generated by some unrelated math. Sometimes the truly horizontal or > vertical t value requires more precision than a float (or even a > double) can provide and splitting at the highest representable float less than > the t value produces a pair of curves on one side of the hill and > splitting at the next float value (which is greater than the true t > value) produces curves on the other side of the hill. Also, when the > curve has been split a few times already, the t values loose accuracy > with each split. This will all be moot if I can eliminate the > splitting code from the renderer, but it may also play a factor in the > stroke/dash > code... Making curves monotonic is only used for optimization purposes, so it can't see how it would affect rendering correctness. Regards, Denis. - "Jim Graham" wrote: > > ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Also, some things to note in the new version - things I had to "fix" not related to performance. In endRendering the pixel bounds of the edges needed to be computed and passed along to the next stage. Unfortunately the code in endRendering computed the sub-pixel bounds and then simply shifted them to get the pixel addresses of those bounds. For the bottom and right edges, this means that a ceiling operation was performed on the sub-pixel data, but a floor operation was performed on the conversion to pixel addresses. This means that the bottom few sub-pixel rows could have been sliced off in these calculations. I restructured the calculations there to make sure that it produced the pixel bounds of all of the subpixel rows. Also, I use a ceil() operation on the sub-pixel values because we really want to know what is the next sub-pixel sample row/column that we will cross and then get the (pixel) bounds of those sample row/columns. I think there may have been a couple of other places where the "which pixel are we on" code got sloppy along similar lines to the endRendering case. The conditions for sending the alphas for the last row changed a little. I need to have every row visited in my AlphaConsumer because it is cleaning out an alpha tile as it incorporates the new alphas. Thus, I needed the logic to call "getAndClear" even if there were no alphas to deliver. I don't think that change would have affected your version with the PiscesCache since you can just finalize the cache when the alphas are done and clearing the alphaRow[] array is irrelevant at that final stage. The way this works may change as I continue optimizing. ...jim On 11/8/2010 6:40 AM, Denis Lila wrote: Hi Jim. Also, I've gotten another 20% improvement out of the design with a few more tweaks. (Though I measured the 20% in the stripped down version that I'm prototyping with FX so I'm not sure how much of that 20% would show up through the layers of the 2D code. Overall, I've about doubled the frame rates of the prototype since your first drop that you checked in to the OpenJDK repository.) Can I see your new version?
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
A couple of questions about the code that I haven't touched... Is there some reason why the AFD for cubics doesn't have any tests for dddxy (the "constants" for its equations), but the AFD for quads is testing the ddxy on every loop? I know that these values do change when the AFD variables are doubled or halved, but why does the cubic version get away with only testing out to the n-1th order differences but the quad version has to test out to the nth order differences? Also, what is the value of breaking the pieces into monotonic segments prior to flattening? Is it simply amortizing the cost of determining if the segment is up or down? I guess this used to be done because we needed monotonic (in Y) curve segments since we did a top-down iteration across all segments, but now they aren't in the rasterization loop. If it is simply a performance issue then I may experiment with eliminating that stage and seeing if I can make it go faster overall. Finally, I discovered (while testing for other problems) that the curves are not truly monotonic after slicing them. I realized this years ago when I was writing my Area code (see sun.awt.geom.Curve) and put in tweaking code to make them monotonic after they were split. They are never off by more than a few bits, but you can't trust the curve splitting math to generate purely monotonic segments based on a t generated by some unrelated math. Sometimes the truly horizontal or vertical t value requires more precision than a float (or even a double) can provide and splitting at the highest representable float less than the t value produces a pair of curves on one side of the hill and splitting at the next float value (which is greater than the true t value) produces curves on the other side of the hill. Also, when the curve has been split a few times already, the t values loose accuracy with each split. This will all be moot if I can eliminate the splitting code from the renderer, but it may also play a factor in the stroke/dash code... ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
It's still a work in progress, but I've cleaned up a lot of logic and made it faster in a number of ways. Note that I've abstracted out the cache stuff and created an "AlphaConsumer" interface which may evolve over time. In FX we actually consume alphas in larger chunks than the code in JDK which was driven by Ductus's 32x32 mandate, so I would have had to make completely incompatible changes to emitRow - so I moved it behind an interface. For the JDK code, if you want to integrate this version, I would have the cache implement the new interface and move your version of emitRow into the Cache object. I'd send you the new code for my AlphaConsumer, but it is incompatible with what you need to cache so it won't help you. You'll also need a bit of un-translation cleanup as we have mirrors of all of java.awt.geom with special new APIs that FX needs. ...jim On 11/8/2010 6:40 AM, Denis Lila wrote: Hi Jim. Also, I've gotten another 20% improvement out of the design with a few more tweaks. (Though I measured the 20% in the stripped down version that I'm prototyping with FX so I'm not sure how much of that 20% would show up through the layers of the 2D code. Overall, I've about doubled the frame rates of the prototype since your first drop that you checked in to the OpenJDK repository.) Can I see your new version? Attached. How about looking more at the stroking end of the process and I'll dig a little more into optimal rasterization code. I have a lot of experience with optimizing rasterizer code (and JNI if it comes to that), but very little with the curve manipulations involved in stroking (math is so *hard* at my age ;-)... Sounds good. Have you implemented your idea of processing one pixel row at a time, as opposed to processing subpixel rows? If not, I could do that. Not yet. Right now I've gotten a lot of mileage out of a few tweaks of the bookkeeping of the sample-row-at-a-time version. I'm still mulling over exactly how to make that go faster. ...jim package com.sun.openpisces; /** * @author Flar */ public interface AlphaConsumer { public int getOriginX(); public int getOriginY(); public int getWidth(); public int getHeight(); public void setMaxAlpha(int maxalpha); public void setAndClearRelativeAlphas(int alphaDeltas[], int y, int firstdelta, int lastdelta); } package com.sun.openpisces; import com.sun.javafx.geom.PathConsumer; import com.sun.javafx.geom.PathIterator; import com.sun.javafx.geom.Rectangle; import java.util.Iterator; /** * @author Flar */ public final class OpenPiscesRenderer implements PathConsumer { public static void feedConsumer(PathIterator pi, PathConsumer pc) { float[] coords = new float[6]; while (!pi.isDone()) { int type = pi.currentSegment(coords); switch (type) { case PathIterator.SEG_MOVETO: pc.moveTo(coords[0], coords[1]); break; case PathIterator.SEG_LINETO: pc.lineTo(coords[0], coords[1]); break; case PathIterator.SEG_QUADTO: pc.quadTo(coords[0], coords[1], coords[2], coords[3]); break; case PathIterator.SEG_CUBICTO: pc.curveTo(coords[0], coords[1], coords[2], coords[3], coords[4], coords[5]); break; case PathIterator.SEG_CLOSE: pc.closePath(); break; } pi.next(); } pc.pathDone(); } private final class ScanlineIterator { private int[] crossings; private int[] edgePtrs; private int edgeCount; // crossing bounds. The bounds are not necessarily tight (the scan line // at minY, for example, might have no crossings). The x bounds will // be accumulated as crossings are computed. private final int maxY; private int nextY; private static final int INIT_CROSSINGS_SIZE = 10; private ScanlineIterator() { crossings = new int[INIT_CROSSINGS_SIZE]; edgePtrs = new int[INIT_CROSSINGS_SIZE]; // We don't care if we clip some of the line off with ceil, since // no scan line crossings will be eliminated (in fact, the ceil is // the y of the first scan line crossing). nextY = getFirstScanLineCrossing(); maxY = getScanLineCrossingEnd()-1; } private int next() { // TODO: make function that convert from y value to bucket idx? int cury = nextY++; int bucket = cury - boundsMinY; int count = this.edgeCount; int ptrs[] = this.edgePtrs;
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
On 11/8/2010 6:34 AM, Denis Lila wrote: Hi Clemens. I've only followed your discussion with Jim but skipped all the in-depth discussion. From my prior experiences usually JNI is not woth the trouble, if you don't have a serious reason why using native code would have advantages (like the possibility of using SIMD or when value-types would be a huge benefit), it has its own performance pitfalls especially if the workload is small and things like Get*ArrayCritical cause scalability problems because they have to lock the GC. Well, Jim Graham said that a native version of the engine still runs a lot faster than the version with all my changes. That's why I thought Actually, that report is old. I've now got the new Java version turning in double the frame rates of the old native version. it would be a good idea. Also, when not doing antialiasing we usually feed paths to native consumers, so I thought if pisces used JNI, we could reduce the java->C transitions five fold. But then I realized that with antialiasing the opposite would happen, so I'm not sure whether JNI is a good idea. That's a good point that the other rasterizers will end up using this stroking engine and they are native. We can worry about cleaning that up later. JNI might eventually be a good idea, but lets fix the algorithm first and then worry about whether it will help this renderer or if we can make the interface to the other renderers simpler. ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Il giorno lun, 08/11/2010 alle 09.34 -0500, Denis Lila ha scritto: > > Have you had a look at the Netbeans profiler? It supports sampling > > based testing to keep the influence of the profiler at a minimum. > > No, I don't use netbeans - it doesn't render underscores properly (although > I think this is an openjdk bug). I'll try it out. This has been fixed in OpenJDK 7, and should have been back ported to 6, not sure when the fix will be out. > > Thanks for working on this! yup :) Mario -- pgp key: http://subkeys.pgp.net/ PGP Key ID: 80F240CF Fingerprint: BA39 9666 94EC 8B73 27FA FC7C 4086 63E3 80F2 40CF Proud GNU Classpath developer: http://www.classpath.org/ Read About us at: http://planet.classpath.org OpenJDK: http://openjdk.java.net/projects/caciocavallo/ Please, support open standards: http://endsoftpatents.org/
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Clemens. > I've only followed your discussion with Jim but skipped all the > in-depth discussion. > From my prior experiences usually JNI is not woth the trouble, if > you don't have a serious reason why using native code would have > advantages (like the possibility of using SIMD or when value-types > would be a huge benefit), it has its own performance pitfalls > especially if the workload is small and things like Get*ArrayCritical > cause scalability problems because they have to lock the GC. Well, Jim Graham said that a native version of the engine still runs a lot faster than the version with all my changes. That's why I thought it would be a good idea. Also, when not doing antialiasing we usually feed paths to native consumers, so I thought if pisces used JNI, we could reduce the java->C transitions five fold. But then I realized that with antialiasing the opposite would happen, so I'm not sure whether JNI is a good idea. > Have you had a look at the Netbeans profiler? It supports sampling > based testing to keep the influence of the profiler at a minimum. No, I don't use netbeans - it doesn't render underscores properly (although I think this is an openjdk bug). I'll try it out. > Thanks for working on this! It's been fun. Regards, Denis. - "Clemens Eisserer" wrote: > Hi Denis, > > > It's not obvious to me why this happened, so I think now I will put > > this type of optimization aside and convert to JNI, > > > > where profiling > > will be easier (for me - I haven't been able to make OProfile work > > for java yet). > > - Clemens
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. > Also, I've gotten another 20% improvement out of the design with a few > more tweaks. (Though I measured the 20% in the stripped down version > that I'm prototyping with FX so I'm not sure how much of that 20% > would show up through the layers of the 2D code. Overall, I've about > doubled the frame rates of the prototype since your first drop that you > checked in to the OpenJDK repository.) Can I see your new version? > How about looking more at the stroking end of the process and I'll dig > a little more into optimal rasterization code. I have a lot of > experience with optimizing rasterizer code (and JNI if it comes to that), but > very little with the curve manipulations involved in stroking (math is so > *hard* at my age ;-)... Sounds good. Have you implemented your idea of processing one pixel row at a time, as opposed to processing subpixel rows? If not, I could do that. Thanks, Denis. - "Jim Graham" wrote: > Hi Denis, > > ...jim > > On 11/4/2010 9:20 AM, Clemens Eisserer wrote: > > Hi Denis, > > > >> It's not obvious to me why this happened, so I think now I will > put > >> this type of optimization aside and convert to JNI, > > > > I've only followed your discussion with Jim but skipped all the > > in-depth discussion. > >> From my prior experiences usually JNI is not woth the trouble, if > you > > don't have a serious reason why using native code would have > > advantages (like the possibility of using SIMD or when value-types > > would be a huge benefit), it has its own performance pitfalls > > especially if the workload is small and things like > Get*ArrayCritical > > cause scalability problems because they have to lock the GC. > > > >> where profiling > >> will be easier (for me - I haven't been able to make OProfile work > >> for java yet). > > > > Have you had a look at the Netbeans profiler? It supports sampling > > based testing to keep the influence of the profiler at a minimum. > > > > Thanks for working on this! > > > > - Clemens