Hello Jim. I made the changes you point out, except for your second point (I don't have time to think about it right now).
I stopped precomputing m00_2_m01_2 because it was only being used in one place. But I guess it worth it to precompute it if it's going to be used often (and computeOffset is called for every line). The new webrev is at http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/ Thanks, Denis. ----- "Jim Graham" <james.gra...@oracle.com> wrote: > Hi Denis, > > Only some minor comments so far: > > Stroker.java: > > - Should det be precomputed and saved? (You calculate it in the > constructor anyway, but don't save it.) > - Should test for uniform scale be precomputed? > - (Is test for uniform scale too strict? Can a rotated uniform scale > > use the same code as upright uniform scale?) > - Why are m00_2_m01_2 et al no longer precomputed (they only need to > be > precomputed if scale is not uniform which isn't common)? > - lineLength method is "user space length" isn't it? - a more > descriptive name might help avoid confusion if someone is modifying > code > here later. > - line 614 - missing space > > Dasher.java: > > - Line 187 - use "leftInThisDashSegment" here? > > I still have to look at Renderer.java in depth, but I thought I'd send > > these minor comments along while they are fresh in my email buffer... > > ...jim > > Denis Lila wrote: > > Hello. > > > > And, here it is: > http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/ > > > > Just as I thought, it's way faster than the original. I made a new > > test, which is supposed to be more realistic than drawing 30000 > length > > lines. It consists of splitting a 1000x1000 frame in 100 10x10 > squares > > and in each square drawing 20 or so curves with random > start/end/control > > points in that square. In this case it's at least twice faster > (~0.85 > > versus ~1.95 seconds). > > > > Unfortunately I've had to do a lot of the same optimizations that I > was > > trying to remove (i.e. making everything a global variable). I > tried > > writing a higher level version of it, but it completely negated the > > performance gains of the new algorithm. Anyway, it's still not as > bad > > as before because the algorithm is inherently clearer and we only > iterate > > through scan lines instead of iterating through strips and then > scanlines > > in that strip, and then edges, all in the same method. > > > > It's also better organized, and logically separate parts of the code > don't > > really touch each other's variables much. And it's definitely > better > > commented. > > > > I also made some minor changes outside of Renderer that did not > appear in > > the first version of this patch last week: > > 1. I fixed the issue Jim pointed out in Dasher, where if > > (origLen == leftInThisDashSegment) the dash index was not being > incremented. > > 2. In dasher, I no longer copy the dash array. > > 3. I removed files PiscesMath.java and Transform4.java because they > are no > > longer needed. > > > > Thanks, > > Denis. > > > > ----- "Jim Graham" <james.gra...@oracle.com> wrote: > > > >> Woohoo, Denis! I look forward to seeing the new version! > >> > >> ...jim > >> > >> On 7/28/2010 5:51 AM, Denis Lila wrote: > >>> Hello Jim. > >>> > >>> This one performs almost identically to what is already there > >>> in openjdk6 and 7, since it's exactly what I sent for review > >>> last week, but with all the changes you suggested implemented. > >>> > >>> I would actually like to ask you to not look at either one of > >>> them. First of all, there is an ArrayIndexOutOfBoundsException > >>> possible in emitRow. > >>> And secondly, even if there wasn't, last night I implemented > >>> your algorithm from ShapeSpanIterator.c to iterate through > >>> the edges. I have yet to debug it, but it makes everything much, > >>> much simpler, and it should make it far faster, so we get the > best > >>> of both worlds. > >>> > >>> Thanks, > >>> Denis. > >>> > >>> ----- "Jim Graham"<james.gra...@oracle.com> wrote: > >>> > >>>> Hi Denis, > >>>> > >>>> I'll try to get through both versions and see if I can find > >> anything > >>>> that was hurting performance with your EdgeLists. I'm guessing > >> that > >>>> this version was created because of the performance issues you > >> found > >>>> with the EdgeList version? Does this perform more closely to > the > >>>> existing code than the EdgeList version? > >>>> > >>>> ...jim > >>>> > >>>> Denis Lila wrote: > >>>>> Hello again. > >>>>> > >>>>> This attachmet is a "can of worms" implementation without all > the > >>>> fancy (and slow) > >>>>> iteration. It also includes all of the other suggestions you > sent > >> in > >>>> your first > >>>>> review of Dasher and Renderer last week (most importantly, the > >>>> firstOrientation > >>>>> issue, horizontal lines filtering, and adding prefixes to > >> variable > >>>> names to make > >>>>> it clear whether they refer to pixels, or subpixels). > >>>>> > >>>>> Regards, > >>>>> Denis. > >>>>> > >>>>> ----- "Denis Lila"<dl...@redhat.com> wrote: > >>>>> > >>>>>> Hello Jim. > >>>>>> > >>>>>> I implemented your "can of worms" idea. It works, and it got > rid > >>>> of > >>>>>> the biasing. > >>>>>> I wasn't able to send a webrev, but there are many changes and > a > >>>> side > >>>>>> by side > >>>>>> comparison would probably be useless, so I just attached the > >> file. > >>>> I > >>>>>> hope this is > >>>>>> ok. > >>>>>> > >>>>>> I also implemented a "better" iterating structure for the > lines > >>>> and > >>>>>> the > >>>>>> strips and crossings. I think it is better in every way, > except > >>>>>> performance. > >>>>>> The new file is more than 200 lines smaller than the old one. > >> The > >>>> only > >>>>>> members of Renderer are now the AA variables and the position > >>>>>> variables > >>>>>> (sx*, sy*, x*, y*). > >>>>>> What I've done is I added an EdgeList class, which > encapsulates > >>>> all > >>>>>> the edge > >>>>>> related variables in the old Renderer. At first, I had an Edge > >>>> class > >>>>>> in addition > >>>>>> to the EdgeList class, and while this was much nicer, it > turned > >> out > >>>> to > >>>>>> be too > >>>>>> expensive (see last paragraph). > >>>>>> I've also added a ScanLineIterator, so instead of > _endRendering > >>>>>> iterating > >>>>>> through strips, and then calling renderStrip() which iterates > >>>> through > >>>>>> the > >>>>>> scanlines in that strip, and then through the crossings in > that > >>>>>> scanline, > >>>>>> what happens now is that _endRendering uses the > >> Iterator<ScanLine> > >>>> to > >>>>>> iterate through each scanline, get get its crossings and > iterate > >>>>>> through them > >>>>>> to accumulate the alpha. By the way, a ScanLine is a type > >> defined > >>>> by > >>>>>> an > >>>>>> interface which exports methods for getting the y coord of the > >>>> line, > >>>>>> the > >>>>>> number of crossings in it, the ith crossing, and a method for > >>>> sorting > >>>>>> its crossings. > >>>>>> The class that implements ScanLine is ScanLineIterator itself. > I > >>>> made > >>>>>> a > >>>>>> ScanLine class, but I was afraid performance would suffer > >> because > >>>> of > >>>>>> all the > >>>>>> object creations (this turned out not to be an issue, after I > >>>> switched > >>>>>> to the > >>>>>> current way, and remeasured things). I did not switch back > >> because > >>>>>> this is > >>>>>> only slightly worse. > >>>>>> > >>>>>> As for performance: I wrote a simple program that tries to > draw > >> a > >>>>>> dashed path > >>>>>> that consists of about 160 dashed lines of width 1 and length > >>>> 30000, > >>>>>> going > >>>>>> from the centre of the frame to some point. On my machine, > this > >>>> takes > >>>>>> about 4.9 > >>>>>> seconds in openjdk6, and 26 seconds using the attached file. > >> Back > >>>> when > >>>>>> I was using > >>>>>> the Edge class it took about 39 seconds. Everything without > >> hundres > >>>> of > >>>>>> thousands of > >>>>>> edges is not much slower > >>>>>> I have not changed any of the algorithms. ScanLineIterator > still > >>>> goes > >>>>>> through > >>>>>> strips of the same size and computes crossings in every strip > >>>> using > >>>>>> the same > >>>>>> method as before, so I don't know why it's so slow. It can't > be > >>>>>> because of anything > >>>>>> happening in _endRendering, because there are only about 9000 > >>>>>> scanlines and for each > >>>>>> of them I've just added a few calls to one line getters (which > >> used > >>>> to > >>>>>> be direct > >>>>>> accesses into arrays). > >>>>>> > >>>>>> Thanks, > >>>>>> Denis. > >>>>>> > >>>>>> ----- "Jim Graham"<james.gra...@oracle.com> wrote: > >>>>>> > >>>>>>> Denis Lila wrote: > >>>>>>>> Hello Jim. > >>>>>>>> > >>>>>>>> Thank you very much for taking the time to read through > this. > >>>>>>>> > >>>>>>>>> 169 - if origLen reaches the end of the dash exactly (the > >> "==" > >>>>>>> case) > >>>>>>>> You're right, I should. I can't just replace<= with == > >>>>>> though, > >>>>>>>> because the results will be the same: in the equal case > >> origLen > >>>>>>> will > >>>>>>>> become 0, and on the next iteration, the (origLen< > >>>>>> dash[idx]-phase) > >>>>>>> will > >>>>>>>> be true, and we will do a goTo(x1,y1), which is what we just > >> did > >>>>>> in > >>>>>>> the > >>>>>>>> previous iteration (unless dash[idx] is 0, in which case the > >>>>>> results > >>>>>>>> will be even worse). The best solution to this is to just do > a > >>>>>>> nested > >>>>>>>> check for the == case. > >>>>>>> Ah, right - because there is no "break" when origLen becomes > >>>> zero. > >>>>>>> Sounds like you're on it. > >>>>>>> > >>>>>>>>> 171 - Aren't x0,y0 stored as subpix values? You would then > >> be > >>>>>>> comparing > >>>>>>>>> a subpix value to a non-subpix value. Perhaps if the > subpix > >>>>>> calls > >>>>>>> are > >>>>>>>>> moved to the top of the function I think this should work > OK? > >>>>>>>> That's true, they are. This is very puzzling. If a > >>>> horizontal > >>>>>>> line is > >>>>>>>> added, when the crossings for it are being computed, dxBydy > >>>> should > >>>>>>> be NaN, and > >>>>>>>> wouldn't an error be thrown when we try to cast to an int in > >> the > >>>>>>> call to addCrossing? > >>>>>>> > >>>>>>> I'm not sure - I didn't trace it through very far - I just > >> noted > >>>>>> that > >>>>>>> the values were likely in different "resolutions". > >>>>>>> > >>>>>>>>> 194,197 - Shouldn't these be constants, or based on the > >>>>>>> SUB_POS_XY? > >>>>>>>> I suppose I should make a biasing constant. I don't > think > >>>> they > >>>>>>> should be based > >>>>>>>> on SUB_POS_XY though, because the biasing is done to > subpixel > >>>>>>> coordinates so > >>>>>>>> there is no danger that if our supersampling is fine enough > >> the > >>>>>>> biasing will > >>>>>>>> make the coordinates jump over some scan line. > >>>>>>> I'm guessing you punted on my "can of worms" suggestion then. > >>>> ;-) > >>>>>>>>> 216 - if you save the sx0,sy0 before subpix'ing them then > you > >>>>>> don't > >>>>>>> have > >>>>>>>>> to "unsubpix" them here. (Though you still need the subpix > >> sy0 > >>>>>> for > >>>>>>> line > >>>>>>>>> 209 - or you could just call subpix on it for that line and > >> at > >>>>>>> least > >>>>>>>>> you'd save 1 call to sub/unsub). > >>>>>>>> Ok. I'll just save subpixed and unsubpixed versions of > >> sx0, > >>>>>> sy0. > >>>>>>> That should > >>>>>>>> eliminate all sx0,sy0 related calls to tosubpix and topix > >> except > >>>>>> in > >>>>>>> moveTo. > >>>>>>> > >>>>>>> and lineTo. You may only need 3 of those values, though, if > I > >>>>>>> remember > >>>>>>> my code reading well enough. > >>>>>>> > >>>>>>>>> 256,264 - casting to int is problematic. It truncates > >> towards > >>>> 0 > >>>>>>> which > >>>>>>>>> means negatives are ceil'd and positives are floor'd. It > >> would > >>>>>> be > >>>>>>> best > >>>>>>>>> to use floor here instead. On the other hand, if negative > >>>>>> numbers > >>>>>>> are > >>>>>>>>> always "off the left side of the drawable" then this is > moot. > >>>>>>>> That's why I left it at int casting. Do you still think > I > >>>>>> should > >>>>>>> change it > >>>>>>>> to floor? > >>>>>>> If you mean floor, I think it best to use floor. Unless you > >> can > >>>>>> prove > >>>>>>> that negatives aren't really an issue and that the strange > >>>>>> truncation > >>>>>>> on > >>>>>>> them won't be numerically a problem - but I don't think it is > >>>> worth > >>>>>> it > >>>>>>> for this code. > >>>>>>> > >>>>>>>> Speaking of which, is there a good way to edit and build > >> openJDK > >>>>>>> from eclipse? > >>>>>>>> Then this sort of embarrassing error could be avoided > >> (including > >>>>>> the > >>>>>>> printStats() call). > >>>>>>> > >>>>>>> I don't use Eclipse, sorry. :-( > >>>>>>> > >>>>>>>> As for Arrays.newSize()... I can't find it here: > >>>>>>>> > >> > http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html > >>>>>>>> Is this a new function added in 7? > >>>>>>> Sorry, make that Arrays.copyOf(..., newSize). I tried to > type > >>>> the > >>>>>>> name > >>>>>>> from memory and got it wrong. > >>>>>>> > >>>>>>>>> 721 - Arrays.sort() > >>>>>>>> I thought about using this, but I did some > measurements, > >> and > >>>>>> it > >>>>>>> turns out that > >>>>>>>> Arrays.sort() is a bit slower if the portion of the array > >> being > >>>>>>> sorted has fewer > >>>>>>>> than about 70 elements. > >>>>>>> I wonder what the typical number of elements is. Is this > >> sorting > >>>>>>> crossings per line? Then simple primitives like circles > should > >>>> only > >>>>>>> have 2 per line, right? Is it worth testing for really small > >>>>>> numbers > >>>>>>> of > >>>>>>> elements (much lower than 70) and doing a manual sort? Or am > I > >>>>>>> misunderstanding what is being sorted there? > >>>>>>> > >>>>>>>>> How comfortable do you feel with that conversion? > >>>>>>>> I'll try to do it and include it in a new patch along with, > >>>>>>> hopefully, a better way > >>>>>>>> to iterate through strips, and especially crossings. Right > now > >>>> all > >>>>>>> the iteration > >>>>>>>> state is saved in global variables. This is... not good. I > >> spent > >>>>>> far > >>>>>>> too much time last > >>>>>>>> week on bugs caused by this sort of thing. Ideally, any > >> members > >>>>>> that > >>>>>>> can't be put at > >>>>>>>> the top of a class (like our edge and crossing data) should > be > >>>> put > >>>>>>> in classes of their own. > >>>>>>> > >>>>>>> That sounds good, but also consider doing it in separate > stages > >>>> to > >>>>>>> reduce churn in the code reviewing (and you then have revs to > >> go > >>>>>> back > >>>>>>> and test which stage caused a probem if we find a bug later): > >>>>>>> > >>>>>>> - first get all on floats > >>>>>>> - then change strip management > >>>>>>> - then change to open coordinate intervals > >>>>>>> - (or vice versa) > >>>>>>> > >>>>>>>> Do you have any ideas about how to iterate through edges in > a > >>>>>> strip > >>>>>>> without going through > >>>>>>>> every edge? I was thinking of maybe using some sort of tree > to > >>>>>> split > >>>>>>> the drawing surface, > >>>>>>>> but I haven't given it much thought. > >>>>>>> If you look for something like the native code for > >>>>>>> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I > >>>> typically > >>>>>>> like to do edge setup and enumeration. > >>>>>>> > >>>>>>> That code uses the "half open interval" approach for both X > and > >> Y > >>>>>>> intervals. > >>>>>>> > >>>>>>> I then sort the edge list by "leading Y" and then move > through > >>>> the > >>>>>>> edges > >>>>>>> using the following manner (kind of like an inch worm eating > >> the > >>>>>>> edges, > >>>>>>> maintaining a pointer to the beginning and end of an "active > >>>> list" > >>>>>> of > >>>>>>> edges that are in play at any given time by virtue of having > >> their > >>>> Y > >>>>>>> range intersect the current sampling Y). Note that I use an > >>>> array > >>>>>> of > >>>>>>> edges and then a parallel array of pointers into the edges so > >> that > >>>> I > >>>>>>> can > >>>>>>> sort just the pointers and avoid moving tons of edge data > >> around. > >>>>>>> Also, > >>>>>>> later when I eliminate edges from the active list I just move > >>>> their > >>>>>>> pointers into and out of view rather than having to copy the > >> edge > >>>>>>> data. > >>>>>>> It is harder to do an array of pointers in Java, though - > >>>> perhaps > >>>>>> an > >>>>>>> array of indices? Here is some basic pseudocode: > >>>>>>> > >>>>>>> start with lo and hi pointing at index 0 in edge list. > >>>>>>> until edge list is exhausted { > >>>>>>> process edges between lo and hi (empty on first pass) > >>>>>>> scan from hi to lo and toss any edges that are > exhausted > >>>>>>> (compacting remaining pointers towards hi) > >>>>>>> keep incrementing hi to accept any new edges coming > into > >>>> play > >>>>>>> process edges between lo and hi to increment to the > next > >> Y > >>>>>>> (note that new edges from previous step may not > >>>>>>> need any processing in this stage if their > starting > >>>>>>> Y equals the next Y to be sampled) > >>>>>>> } > >>>>>>> > >>>>>>> Gradually lo and hi make their way through the list. The > edges > >>>>>> above > >>>>>>> hi > >>>>>>> are always sorted in order of when they may come into play as > >> we > >>>>>> move > >>>>>>> downward in Y. The edges between lo and hi are also usually > >> kept > >>>>>>> sorted > >>>>>>> by their current X so that the stage that processes them into > >>>> spans > >>>>>>> can > >>>>>>> just run through them. The edges below lo are usually random > >>>>>> garbage > >>>>>>> because no care was taken during the "pruning" step to worry > >>>> about > >>>>>>> what > >>>>>>> happens to the pointer table down there as lo is incremented > >> (the > >>>>>>> algorithm only ever looks up the array of pointers). > >>>>>>> > >>>>>>> I hope that helps... > >>>>>>> > >>>>>>> ...jim