Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code
Hi Denis, It looks fine. Hopefully we can eventually figure out why the sorting on the fly didn't pan out. Denis Lila wrote: Hi Jim. Thanks for all your suggestions. I fixed the edge array indexing issue, the moveTo bug (not assigning x0), and the double initialization issue. I also improved the emission of the last row to what you said. The link is the same as the last one I sent. I'm guessing the test for "y == boundsMaxY-1" at line 470 could probably also be deleted now (since it will be handled by the end test when y reaches maxY)? But everything looks in order! ...jim
Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code
Hi Jim. Thanks for all your suggestions. I fixed the edge array indexing issue, the moveTo bug (not assigning x0), and the double initialization issue. I also improved the emission of the last row to what you said. The link is the same as the last one I sent. I considered one more optimization for the scan line iteration: instead of keeping a crossings array and Arrays.sort'ing it for every scanline, utilize the fact that the crossings information is already in the active list, and doesn't need to be copied, so we could just keep the active list sorted by increasing x crossing (this is the way things are done in ShapeSpanIterator, which is where I got the idea from). An added benefit of this (so I thought) would be that the order of the edges sorted by x crossing would not change very often, and not very much, in which case keeping it sorted by insertion sort would have (on average) linear time complexity per scan line, instead of nlogn which is what Arrays.sort has. After implementing this, I tested it, and unfortunately it turns out not to be as good as I expected. It is faster in some tests, but it is a lot slower in some others. It's a shame because it simplified things quite a bit (the file became just 440 lines). With it I was able to remove about 4 or 5 variables from Renderer, flips among them so we no longer needed to keep track of orientations, since we only do that to determine the size of the crossings array. > For now, it might be simpler to ignore this and revisit that later > since it isn't going to be common to have large such jumps in the > middle of most paths. I agree. Not only that, but even if there are gaps I don't think it will be much of a problem since very little work has to be done to traverse scan lines with no crossings. > Filling the alpha row. I had an interesting optimization here that I > > never got around to. Instead of filling the array entries with the > alpha values, fill them with the deltas of the alpha values. The > inner > loop in endRendering then becomes something like: > > alpha[pix_x0 ] += NUM_POS - (x0 & MASK); > alpha[pix_x0+1] += (x0 & MASK); > alpha[pix_x1 ] -= NUM_POS - (x1 & MASK); > alpha[pix_x1+1] -= (x1 & MASK); That's a good idea. I implemented it but I think we should use it whether it offers big savings or not. It's just a nicer algorithm. Although, the way it is now, it is a bit misleading (the name of the array is alpha). I should have commented this a bit more. I don't have time now, I'll do it on Monday. Thanks, Denis. - "Jim Graham" wrote: > Hi Denis, > > More thoughts on Renderer.java. > > -- Skipping gaps (minor optimization) -- > > If there is a gap in the edges in Y, say if a path consists of two > subpaths, one that covers y=[0..10] and another that covers > y=[1000..1010], then I think you will iterate over each y value from > 10 > to 1000, but have no work to do on each scan line. That could > possibly > waste a bit of time. > > On the other hand, fixing that would have to take into account whether > > or not you are done with a given alpha row, so the "NextY" function > can't simply skip - the skipping has to be done at the higher level in > > endRendering - or at least with the cooperation of endRendering. > Since > it asks what the "current Y" is near the top of the loop, it could > detect if the current Y jumped out of the given alpha row, emit it, > and > prepare for a new alpha row starting at the new Y...? > > > -- Done with skipping gaps -- > > -- Alpha accumulation opt -- > > > The [pix+1] lines were the gotcha that always confused me when I tried > > to do this in the past. Basically if you enter an inside region at 1 > > subpixel before the end of a pixel then you need to add one to the > coverage for that pixel. But, starting with the next pixel you want > the > total contribution of the interior region to be NUM_POS per pixel, but > > you've only added 1 so far - so you have to add the additional amount > so > that the total amount added for each "enter" crossing ends up being > NUM_POS. Similarly, when you decrement for the "exit" crossing you > need > to ensure that the total negative delta sums to NUM_POS across the > pixel > where the exit happens and the following pixel. Does that make > sense? > > Note that you could still have the "single pixel" optimization which > would look like: > > alpha[pix_x0 ] += (x1 - x0); > alpha[pix_x0+1] -= (x1 - x0); > > and is equivalent to the above 4 lines. > > then when you do the RLE, you simply have to use an accumulation as > you > scan the alpha row: > > byte nextVal = startVal + alphaRow[i]; > > So, for the cost of an add per pixel as you do the RLE you can avoid > having to do the loop in the "multiple pixel section" when filling the > > alpha array. > > I don't think I've ever quantified this optimization so it might be > better to investigate
Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code
Just to clarify. My second message that I just sent mostly contained some additional optimizations to consider for now or later, but this message below contained at least one (maybe 2) thing(s) that looked like a bug and a few maintenance issues that I think should be done before finalizing this set of changes... ...jim On 7/29/2010 5:27 PM, Jim Graham wrote: Hi Denis, The changes look fine and I'm moving on to Renderer.java... I'd make as many methods private as you can. It looks like all your new methods are private, but some old methods are still public even though I don't think they are accessed elsewhere (like addEdge()?). I think "private" is enough for Hotspot to inline so that should help performance a bit. I usually use "final", but I think "private" does the same thing. You should create constants for the indices into the struct to make the code more readable (and to simplify updating the EDGE layout): X0_OFF = 0 Y0_OFF = 1 // etc. I don't think you touch X0 and Y0 after initializing them and then using them for the first setCurY so you could just use them as the "curX" and "curY" and save 2 slots in the table. You initialize edges twice - once when it is declared, and once in the constructor. Note that the initialization where it is declared uses INIT_SIZE which is not a multiple of EDGE size anyway. It looks like there is a missing statement in moveTo to initialize this.x0...? I need some more time on it, but I thought I would send along these comments in the meantime... ...jim Denis Lila wrote: Hello Jim. I made the changes you point out, except for your second point (I don't have time to think about it right now). I stopped precomputing m00_2_m01_2 because it was only being used in one place. But I guess it worth it to precompute it if it's going to be used often (and computeOffset is called for every line). The new webrev is at http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/ Thanks, Denis.
Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code
Hi Denis, More thoughts on Renderer.java. -- Skipping gaps (minor optimization) -- If there is a gap in the edges in Y, say if a path consists of two subpaths, one that covers y=[0..10] and another that covers y=[1000..1010], then I think you will iterate over each y value from 10 to 1000, but have no work to do on each scan line. That could possibly waste a bit of time. On the other hand, fixing that would have to take into account whether or not you are done with a given alpha row, so the "NextY" function can't simply skip - the skipping has to be done at the higher level in endRendering - or at least with the cooperation of endRendering. Since it asks what the "current Y" is near the top of the loop, it could detect if the current Y jumped out of the given alpha row, emit it, and prepare for a new alpha row starting at the new Y...? For now, it might be simpler to ignore this and revisit that later since it isn't going to be common to have large such jumps in the middle of most paths. -- Done with skipping gaps -- -- Alpha accumulation opt -- Filling the alpha row. I had an interesting optimization here that I never got around to. Instead of filling the array entries with the alpha values, fill them with the deltas of the alpha values. The inner loop in endRendering then becomes something like: alpha[pix_x0 ] += NUM_POS - (x0 & MASK); alpha[pix_x0+1] += (x0 & MASK); alpha[pix_x1 ] -= NUM_POS - (x1 & MASK); alpha[pix_x1+1] -= (x1 & MASK); The [pix+1] lines were the gotcha that always confused me when I tried to do this in the past. Basically if you enter an inside region at 1 subpixel before the end of a pixel then you need to add one to the coverage for that pixel. But, starting with the next pixel you want the total contribution of the interior region to be NUM_POS per pixel, but you've only added 1 so far - so you have to add the additional amount so that the total amount added for each "enter" crossing ends up being NUM_POS. Similarly, when you decrement for the "exit" crossing you need to ensure that the total negative delta sums to NUM_POS across the pixel where the exit happens and the following pixel. Does that make sense? Note that you could still have the "single pixel" optimization which would look like: alpha[pix_x0 ] += (x1 - x0); alpha[pix_x0+1] -= (x1 - x0); and is equivalent to the above 4 lines. then when you do the RLE, you simply have to use an accumulation as you scan the alpha row: byte nextVal = startVal + alphaRow[i]; So, for the cost of an add per pixel as you do the RLE you can avoid having to do the loop in the "multiple pixel section" when filling the alpha array. I don't think I've ever quantified this optimization so it might be better to investigate it after the current code goes in and adopt it only if it shows a big savings. -- Done with alpha accumulation opt -- Something about the "emit last row" code bothers me. First, I would guess that the "y == boundsMaxY - 1" test would have already flushed the row, right? Also, why do the for loop? Why not just flush the row if it isn't flushed? I kind of feel like the "y == boundsMaxY-1" test could be removed from inside the loop and simply test if there is unflushed data in the alpha array then just make a call to emitRow() with the appropriate values after the loop. while (hasNext()) { ... if ((y & MASK) == MASK) { emitRow(...); pix_min,max = MAX,MIN; } } if (pix_min < pix_max) { emitRow(...); } Am I missing something? So, the upshot is that I didn't find anything wrong. You can take or leave my suggestions for improvements as you see fit and maybe save some of them for a second round after this goes back... ...jim On 7/29/2010 2:16 PM, Denis Lila wrote: Hello Jim. I made the changes you point out, except for your second point (I don't have time to think about it right now). I stopped precomputing m00_2_m01_2 because it was only being used in one place. But I guess it worth it to precompute it if it's going to be used often (and computeOffset is called for every line). The new webrev is at http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/ Thanks, Denis.