Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Denis, Just to be certain - you are still planning on putting the existing stuff back and we're talking about future work, right? I'd love to get a stake in the ground here. On 10/25/2010 3:30 PM, Denis Lila wrote: - Create a curve class and store an array of those so you don't have to iterate 3 different arrays of values and use array accesses to grab the data (each array access is checked for index OOB exceptions). I actually implemented something like this in my first draft of the current version of renderer. I didn't stick with it because it was a slower than even what we have in openjdk6. Granted, my implementation was a bit more high level than simply making 3 classes to represent lines quads and cubics, but it seemed pretty hopeless, so I didn't spend any time figuring out exactly what it was that made it slower. Hmmm... Perhaps object allocation overhead was biting us there. In native code you could cobble this up with batch allocation of space and pseudo-object/struct allocation out of the batches. - Or perform AFD on curves as they come into Renderer, but only save line segment edges in the edges array. This would use more storage, but the iterations of the AFD would happen in a tight loop as the data comes in rather than having to store all of their AFD data back in the quads I considered this before abandoning the high level version I mention above. I didn't go with it because, while I am not worried about the higher memory consumption, I was afraid of the impact that having this many edges would have on the qsort call and on lines 99-113 and 140-148 in next(). I can see worrying about qsort, but I think one qsort would be inherently faster than 3 qsorts which you have anyway so the difference would get lost in the noise. Also, I'm not sure how the 99 to 113 and 140 to 148 would be affected. The path will have the same number of crossings per sample row regardless of whether the curves have been flattened or not. You might be adding and deleting edges from the active list more often, though (in other words, 99 would dump more curves and 140 would take in more curves), but the number of edges or curves at any given point would not be affected by flattening. Also, the way you've written the loops at 99, they have to copy every edge/quad/curve that *doesn't* expire so a skipped curve is actually less work for that loop. The loops at 140 would have to occasionally do more processing, but it is made up for in the fact that 99 does less work and the "nextY" processing is simpler. How about this: we change the format of the edges array to be an array of sequences of edges. So, right now the edges array has this format: E* where E represents 6 consecutive floats {ymin,ymax,curx,cury,or,slope}. I am proposing we change it to {n,or}({ymin,ymax,curx,cury,slope})^n. lineTo's would add an edge sequence with n=1 to the edges array. If a call to quadTo or curveTo produced N curves, it would simply put N,or at the end of the edges array, and then append the data for the N produced edges. I think this would give us the best of both worlds, in that we can do all the AFD iterations in a tight loop close to the input methods and it doesn't present any problems with respect to sorting or managing the active list. It can probably be implemented completely transparently with respect to ScanlineIterator. The details of the implementation involve an interesting speed/memory trade off, but we can discuss that later. I think this might be overkill since sorts tend to have logN behavior so doubling the number of edges would not double the time taken in the sort. Also, I would think that the sort would be a small amount of time compared to the rest of the processing, wasn't it? If we are really worried about the y-sort, then how about creating a bunch of buckets and doing a bucket sort of the edges? As they are added to the list of segments, we accumulate their indices in a row list based on their startY so that each step of the "next()" simply moves to the next Y and adds the edges mentioned in the list there. Some work would have to be done on how to manage the storage simply (like a "rownext" field in the edge structure so that they are linked in a linked list), but then there would be no qsort at all for the cost of new int[N_ROWS] and an extra field in every edge. - Convert to native. Note that we use a native version of the pisces that you started with to do some rendering in FX. I tried porting to use your new (Java) renderer in FX and performance went down even though you show it to be faster than what was there before. So, your Java renderer compares favorably to the old Java pisces, but both compare unfavorably to the old native pisces. Maybe we should convert your code to native and see if that gives us a performance boost. It's nice to use pure Java, but there is a lot of "shoehorning" of data going on here that could be done much more ea
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. > - Create a curve class and store an array of those so you don't have > to iterate 3 different arrays of values and use array accesses to grab > the data (each array access is checked for index OOB exceptions). I actually implemented something like this in my first draft of the current version of renderer. I didn't stick with it because it was a slower than even what we have in openjdk6. Granted, my implementation was a bit more high level than simply making 3 classes to represent lines quads and cubics, but it seemed pretty hopeless, so I didn't spend any time figuring out exactly what it was that made it slower. > - Or perform AFD on curves as they come into Renderer, but only save > line segment edges in the edges array. This would use more storage, > but the iterations of the AFD would happen in a tight loop as the data > comes in rather than having to store all of their AFD data back in the quads I considered this before abandoning the high level version I mention above. I didn't go with it because, while I am not worried about the higher memory consumption, I was afraid of the impact that having this many edges would have on the qsort call and on lines 99-113 and 140-148 in next(). How about this: we change the format of the edges array to be an array of sequences of edges. So, right now the edges array has this format: E* where E represents 6 consecutive floats {ymin,ymax,curx,cury,or,slope}. I am proposing we change it to {n,or}({ymin,ymax,curx,cury,slope})^n. lineTo's would add an edge sequence with n=1 to the edges array. If a call to quadTo or curveTo produced N curves, it would simply put N,or at the end of the edges array, and then append the data for the N produced edges. I think this would give us the best of both worlds, in that we can do all the AFD iterations in a tight loop close to the input methods and it doesn't present any problems with respect to sorting or managing the active list. It can probably be implemented completely transparently with respect to ScanlineIterator. The details of the implementation involve an interesting speed/memory trade off, but we can discuss that later. > - Convert to native. Note that we use a native version of the pisces > that you started with to do some rendering in FX. I tried porting to > use your new (Java) renderer in FX and performance went down even > though you show it to be faster than what was there before. So, your Java > renderer compares favorably to the old Java pisces, but both compare > unfavorably to the old native pisces. Maybe we should convert your > code to native and see if that gives us a performance boost. It's nice to > use pure Java, but there is a lot of "shoehorning" of data going on > here that could be done much more easily and naturally in native code. I've been wanting to do this for a very long time. C and C++ are more convenient for this type of work. I didn't because I've never used the JNI before. I guess this is as good a time to learn as any. I'd still like to keep the debugging of native code to a minimum, so we should implement as much as possible in java before starting on this. I still need to think some more about your other suggestion. I'll reply to it tomorrow. > How is that for "food for thought"? Delicious :) Regards, Denis. - "Jim Graham" wrote: > Hi Denis, > > On 10/25/2010 7:34 AM, Denis Lila wrote: > >> (and I have some ideas on further optimizations to consider if you > are still > >> game after this goes in)... > > > > I'd love to hear what they are. > > Here are my thoughts: > > - Currently Renderer has more stages than we probably should have: > for (each full pixel row) { > for (each subpixel row) { > for (each curve on subpixel row) { > convert curve into crossing > crossing is (subpixelx:31 + dir:1) > } > sort crossings for subpixel row > apply wind rule and convert crossings into alpha deltas > } > convert alpha deltas into run lengths > } > for (each tile) { > convert run lengths into alphas > } > I'm thinking we should be able to get rid of a couple of those > stages... > > - One alternate process would be: > (all work is done in the tail end of the cache itself) > for (each full pixel row) { > for (each curve on full pixel row) { > convert curve into N subpixel crossings > (subpixelx:28 + subpixelfracty:3 + dir:1) > } > } > sort all crossings for entire pixel row > maybe collapse raw crossings into modified accumulated crossings > record start of row in index array > for (each tile) { > convert raw or collapsed crossings directly into alphas > } > Note that we could simply leave the raw crossings in the cache and > then > process them in the tile generator, but that would require more > storage > in the cache sin
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Denis, On 10/25/2010 7:34 AM, Denis Lila wrote: (and I have some ideas on further optimizations to consider if you are still game after this goes in)... I'd love to hear what they are. Here are my thoughts: - Currently Renderer has more stages than we probably should have: for (each full pixel row) { for (each subpixel row) { for (each curve on subpixel row) { convert curve into crossing crossing is (subpixelx:31 + dir:1) } sort crossings for subpixel row apply wind rule and convert crossings into alpha deltas } convert alpha deltas into run lengths } for (each tile) { convert run lengths into alphas } I'm thinking we should be able to get rid of a couple of those stages... - One alternate process would be: (all work is done in the tail end of the cache itself) for (each full pixel row) { for (each curve on full pixel row) { convert curve into N subpixel crossings (subpixelx:28 + subpixelfracty:3 + dir:1) } } sort all crossings for entire pixel row maybe collapse raw crossings into modified accumulated crossings record start of row in index array for (each tile) { convert raw or collapsed crossings directly into alphas } Note that we could simply leave the raw crossings in the cache and then process them in the tile generator, but that would require more storage in the cache since a given path would tend to have 8 different entries as it crossed every subpixel row. If we collapse them, then we can sum the entries for a given x coordinate into a single entry and store: (pixelx:25 + coveragedelta:7) where the coverage delta is a signed quantity up to N_SUBPIX^2 so it uses the same storage we needed for the subpixel parts of both X and Y plus the direction bit. It likely needs a paired value in the next x pixel location just like our current "alpha deltas" needs as well. If we wanted to optimize that then we could devote one more bit to "the next pixel will consume all of the remainder of the N^2 coverage", but there would be cases where that would not be true (such as if the pixel row has only partial vertical coverage by the path). It's probably simpler to just have deltas for every pixel. The storage here would likely be similar to the storage used for the current cache since the current RLE cache uses 2 full ints to store a coverage and a count. And in cases where we have one pixel taking up partial coverage and the following pixel taking up the remainder of the full coverage then we have 4 ints, but the "crossing delta" system would only have 2 ints. Other thoughts... - Create a curve class and store an array of those so you don't have to iterate 3 different arrays of values and use array accesses to grab the data (each array access is checked for index OOB exceptions). - Or perform AFD on curves as they come into Renderer, but only save line segment edges in the edges array. This would use more storage, but the iterations of the AFD would happen in a tight loop as the data comes in rather than having to store all of their AFD data back in the quads and curves arrays and then reload the data for every sub-pixel step. Renderer still takes curves, it just breaks them down immediately rather than on the fly. If there are only a small number of edges per curve then the storage might not be that much worse because the quad and curve arrays already store more values than the edge array. - Convert to native. Note that we use a native version of the pisces that you started with to do some rendering in FX. I tried porting to use your new (Java) renderer in FX and performance went down even though you show it to be faster than what was there before. So, your Java renderer compares favorably to the old Java pisces, but both compare unfavorably to the old native pisces. Maybe we should convert your code to native and see if that gives us a performance boost. It's nice to use pure Java, but there is a lot of "shoehorning" of data going on here that could be done much more easily and naturally in native code. How is that for "food for thought"? ...jim
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
> That's great. I will be pushing today. About that: you wrote the TransformingPathConsumer2D file, so how should you be given credit? Should I put your name in "Contributed-by"? Should I put an @author tag in the file? Or does the "reviewed-by" suffice? Regards, Denis. - "Denis Lila" wrote: > Hi Jim. > > > How about this: > > > > (Math.abs(len-leftLen) < err*len) > > > > (noting that err*len can be calculated outside of the loop). > > This is what I use now. > > > Note that a custom shape can send segments in any order that it > wants > > so close,close can happen from a custom shape even if Path2D won't > do > > it. > > Right, of course. For some reason I only considered the test case I > was > using. Sorry for slip up. > > > There is only one question on the board from my perspective - the > > question about dash length errors at the start of this message. > > I'm ok with the accuracy of dash length computations. My main > concerns > right now are dashing performance and AA rendering performance. The > latter has improved, but not by as much as I was expecting. Also, it > was > bothering me that when I removed PiscesCache 1-2 weeks ago > performance > got worse. > It would also be nice to find a method that is guaranteed to find > all offset curve cusps. > > > Depending on how you feel about that I think we're ready to go > > That's great. I will be pushing today. > > > (and I have some ideas on further optimizations to consider if you > are still > > game after this goes in)... > > I'd love to hear what they are. > > Thank you for all your time, > Denis. > > - "Jim Graham" wrote: > > > On 10/22/2010 12:22 PM, Denis Lila wrote: > > > Because the error is meant to be relative. What I use is supposed > to > > be > > > equivalent to difference/AverageOfCorrectValueAndApproximation< > > err, > > > where, in our case, > > AverageOfCorrectValueAndApproximation=(len+leafLen)/2, > > > so that multiplication by 2 should have been a division. > > > Should I use the less fancy differece/CorrectValue< err and skip > > > a division by 2? > > > > If it is relative, then shouldn't it be relative to the desired > > length? > > Why does the computed approximation factor in to the size of the > > error > > you are looking for? If you use the average then the amount of > error > > > > that you will "accept" will be larger if your estimate is wronger. > I > > > > don't think the "wrongness" of your approximation should have any > > effect > > on your error. > > > > >> lines 98-101 - somewhere in the confusion moveToImpl became > > redundant > > >> with moveTo. > > > > > > I thought I'd leave these in because they're interface methods, > > > and it's usually not a great idea to call these from private > > methods. I've > > > removed them anyway, because you said so and because I suppose > > > if ever we need to do something to the user input that shouldn't > be > > done > > > to input coming from private methods in the class (such as the > > scaling > > > of the input coordinates done in Renderer) we can always easily > > > reintroduce them. > > > > I actually thought about the interface concept after I sent that > and > > was > > at peace with them, but I'm also at peace with them being gone. ;-) > > > > > That's right. I don't think it's what should happen, but it's > what > > closed > > > jre does, so I've left it in (with some changes to make it > actually > > > replicate the behaviour of closed java, since it was buggy - the > > moveTo > > > was missing). I don't draw anything on the second close in the > > close,close > > > case, since that would look horrible with round joins and square > > caps. > > > However, the way path2D's are implemented this safeguard will not > > be > > > activated since a closePath() following another closePath() is > > ignored. > > > > > I also now initialize the *dxy *mxy variables in moveTo. This > should > > be an > > > inconsequential change, but I've put it in just so the state of > > > Stroker after a moveTo is well defined. > > > > New code looks good (even if we think it's a silly side effect of > > closed > > JDK to have to implement). > > > > >> line 368 - does this cause a problem if t==1 because lastSegLen > > will > > >> now return the wrong value? Maybe move this into an else clause > on > > the > > >> "t>=1" test? > > > > > > It does cause a problem. I've fixed it by adding a variable that > > keeps track > > > of the length of the last returned segment. The way it was being > > done was > > > a dirty hack because it assumed that if the method didn't return > in > > the loop, > > > done would remain false. > > > > Woohoo! I was actually bothered by the side-effecting in the old > code > > - > > this is a better approach. > > > > > I made one more change in dasher: in somethingTo I removed the > long > > comment > > > near the end, and I handle the case where phase == dash[idx] > > immediately. > > > I do this for con
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. > How about this: > > (Math.abs(len-leftLen) < err*len) > > (noting that err*len can be calculated outside of the loop). This is what I use now. > Note that a custom shape can send segments in any order that it wants > so close,close can happen from a custom shape even if Path2D won't do > it. Right, of course. For some reason I only considered the test case I was using. Sorry for slip up. > There is only one question on the board from my perspective - the > question about dash length errors at the start of this message. I'm ok with the accuracy of dash length computations. My main concerns right now are dashing performance and AA rendering performance. The latter has improved, but not by as much as I was expecting. Also, it was bothering me that when I removed PiscesCache 1-2 weeks ago performance got worse. It would also be nice to find a method that is guaranteed to find all offset curve cusps. > Depending on how you feel about that I think we're ready to go That's great. I will be pushing today. > (and I have some ideas on further optimizations to consider if you are still > game after this goes in)... I'd love to hear what they are. Thank you for all your time, Denis. - "Jim Graham" wrote: > On 10/22/2010 12:22 PM, Denis Lila wrote: > > Because the error is meant to be relative. What I use is supposed to > be > > equivalent to difference/AverageOfCorrectValueAndApproximation< > err, > > where, in our case, > AverageOfCorrectValueAndApproximation=(len+leafLen)/2, > > so that multiplication by 2 should have been a division. > > Should I use the less fancy differece/CorrectValue< err and skip > > a division by 2? > > If it is relative, then shouldn't it be relative to the desired > length? > Why does the computed approximation factor in to the size of the > error > you are looking for? If you use the average then the amount of error > > that you will "accept" will be larger if your estimate is wronger. I > > don't think the "wrongness" of your approximation should have any > effect > on your error. > > >> lines 98-101 - somewhere in the confusion moveToImpl became > redundant > >> with moveTo. > > > > I thought I'd leave these in because they're interface methods, > > and it's usually not a great idea to call these from private > methods. I've > > removed them anyway, because you said so and because I suppose > > if ever we need to do something to the user input that shouldn't be > done > > to input coming from private methods in the class (such as the > scaling > > of the input coordinates done in Renderer) we can always easily > > reintroduce them. > > I actually thought about the interface concept after I sent that and > was > at peace with them, but I'm also at peace with them being gone. ;-) > > > That's right. I don't think it's what should happen, but it's what > closed > > jre does, so I've left it in (with some changes to make it actually > > replicate the behaviour of closed java, since it was buggy - the > moveTo > > was missing). I don't draw anything on the second close in the > close,close > > case, since that would look horrible with round joins and square > caps. > > However, the way path2D's are implemented this safeguard will not > be > > activated since a closePath() following another closePath() is > ignored. > > > I also now initialize the *dxy *mxy variables in moveTo. This should > be an > > inconsequential change, but I've put it in just so the state of > > Stroker after a moveTo is well defined. > > New code looks good (even if we think it's a silly side effect of > closed > JDK to have to implement). > > >> line 368 - does this cause a problem if t==1 because lastSegLen > will > >> now return the wrong value? Maybe move this into an else clause on > the > >> "t>=1" test? > > > > It does cause a problem. I've fixed it by adding a variable that > keeps track > > of the length of the last returned segment. The way it was being > done was > > a dirty hack because it assumed that if the method didn't return in > the loop, > > done would remain false. > > Woohoo! I was actually bothered by the side-effecting in the old code > - > this is a better approach. > > > I made one more change in dasher: in somethingTo I removed the long > comment > > near the end, and I handle the case where phase == dash[idx] > immediately. > > I do this for consistency with lineTo. The only instances where this > makes > > a difference is when we have a path that starts with a dash, ends at > exactly > > the end of a dash, and has a closePath. So something like > move(100,0), > > line(0,0),line(0,50),line(100,50),line(100,0),close() with a dash > pattern {60,60}. > > In closed jdk (and also in openjdk with my patch), no join would be > drawn > > at 100,0 on this path with this dash pattern. > > Whether that's correct behaviour is arguable, imho. On one hand, if > we don't do it > > , but I think it is because if > > it isn't done certain dashed re
Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces
Hi Jim. > It's interesting to note that non-AA dashed ovals took a > much bigger hit than AA dashed ovals so we need to see which code is > fielding those and see what its issue is. I think that's because when AA is used the tests take more time, so a smaller proportion of time is being spent in Dasher so all that is needed for us to see this pattern is for AA to not have gotten too much worse (which it hasn't - it's faster in fact). Regards, Denis. - "Jim Graham" wrote: > Hi Denis, > > Interesting results. Note that sun-jdk7 is already showing some > regressions so the more intesting results are the old-to-new pisces > comparisons. > > I'd stick with the new algorithm and lets look for ways to make dashed > > curves go faster. I know there are better "curve length" algorithms > we > could incorporate, but let's get a stake in the ground before we > consider going back to flattening... > > ...jim > > On 10/22/2010 2:25 PM, Denis Lila wrote: > > Hi Jim. > > > >> I was going to run these today, but fixing the dashing bug above > and > >> rerunning the tests took a while and it's already 8:30 pm here and > I > >> have to go home. I'll run them tomorrow morning. > > > > I ran the benchmarks. I've attached the options file. I ran > benchmarks > > of my icedtea installation, closed source java, openjdk7 without > the > > webrev, and openjdk7 with the webrev. > > > > The results files are at > http://icedtea.classpath.org/~dlila/benchResults/ > > I think the names are pretty self explanatory. > > > > The html comparisons are: > > jdk6 vs latest work: > > > http://icedtea.classpath.org/~dlila/benchResults/JDK6vsLatest_html/Summary_Report.html > > > > closed source vs latest work: > > > http://icedtea.classpath.org/~dlila/benchResults/SUNvsLatest_html/Summary_Report.html > > > > and most importantly: > > previous version of pisces in openjdk7 vs latest work: > > > http://icedtea.classpath.org/~dlila/benchResults/PrevVsLatest_html/Summary_Report.html > > > > The improvements are significant. Running J2DAnalyzer on all the > results files with > > jdk6Bench.res as the basis produces the following summary: > > > > Summary: > >jdk6Bench: > > Number of tests: 104 > > Overall average: 30.7986576862 > > Best spread: 0.0% variance > > Worst spread: 10.96% variance > > (Basis for results comparison) > > > >sunBench: > > Number of tests: 104 > > Overall average: 276654.4443479696 > > Best spread: 0.25% variance > > Worst spread: 19.28% variance > > Comparison to basis: > >Best result: 6488.34% of basis > >Worst result: 43.74% of basis > >Number of wins: 80 > >Number of ties: 2 > >Number of losses: 22 > > > >prevPisces: > > Number of tests: 104 > > Overall average: 221539.3516605794 > > Best spread: 0.08% variance > > Worst spread: 5.54% variance > > Comparison to basis: > >Best result: 350.33% of basis > >Worst result: 55.0% of basis > >Number of wins: 57 > >Number of ties: 10 > >Number of losses: 37 > > > >latestPisces: > > Number of tests: 104 > > Overall average: 226762.64157611743 > > Best spread: 0.0% variance > > Worst spread: 3.03% variance > > Comparison to basis: > >Best result: 501.86% of basis > >Worst result: 26.23% of basis > >Number of wins: 72 > >Number of ties: 4 > >Number of losses: 28 > > > > > > But unfortunately, if you look at the individual test cases in the > html > > reports there are also some stepbacks, most notably in the dashing > of > > anything that isn't a straight line. In fact, the results of > drawOval > > show a 50%-500% improvement in non dashed drawing, and a 50%-25% > deterioration > > in dashed drawing. I was expecting this, after the binary search > algorithm. > > I've been thinking it might be better if we just go with the old > algorithm > > and simply use Dasher.LengthIterator to flatten the curves and feed > the lines > > to lineTo. Or we could just go with what we have and hope we find a > better > > algorithm. > > > > Regards, > > Denis.