Hi Jim. > Just to be certain - you are still planning on putting the existing > stuff back and we're talking about future work, right? I'd love to > get a stake in the ground here.
Yes, I'll push today. > If we are really worried about the y-sort, then how about creating a > bunch of buckets and doing a bucket sort of the edges? As they are > added to the list of segments, we accumulate their indices in a row > list based on their startY so that each step of the "next()" simply moves > to the next Y and adds the edges mentioned in the list there. Some work > would have to be done on how to manage the storage simply (like a > "rownext" field in the edge structure so that they are linked in a > linked list), but then there would be no qsort at all for the cost of > new int[N_ROWS] and an extra field in every edge. I like this. > Perhaps we should work on the algorithms a little more then (I'm > talking about the numeric stuff, not the memory management stuff since the > memory management techniques will differ quite a lot in C code, but > better math helps at either level)? Indeed - especially the dashing. > 90% (guesstimate) of the time edges do not cross each other, thus if > you sort the crossings without reordering the active edges then you just > end up doing the same sorting work (same swaps) on the next scanline. My > SpanShapeIterator code actually reordered the edges on every sample > line to match their current X coordinates in a way that involved 1 compare > per edge that was processed and only occasionally a swap of 2 edge > pointers. It would basically eliminate the sort at line 149 at the > cost of only doing a compare in the nextY processing for the very very > common case. I also implemented this some time ago. I didn't keep it because it slowed things down a bit. However, I only tested it with very complex and large paths, and in hindsight, I shouldn't have based my decision on that, so I will re-implement it. Regards, Denis. ----- "Jim Graham" <james.gra...@oracle.com> wrote: > Hi Denis, > > On 10/25/2010 3:30 PM, Denis Lila wrote: > >> - Create a curve class and store an array of those so you don't > have > >> to iterate 3 different arrays of values and use array accesses to > grab > >> the data (each array access is checked for index OOB exceptions). > > > > I actually implemented something like this in my first draft of the > current > > version of renderer. I didn't stick with it because it was a slower > than > > even what we have in openjdk6. Granted, my implementation was a bit > more > > high level than simply making 3 classes to represent lines quads and > cubics, > > but it seemed pretty hopeless, so I didn't spend any time figuring > out > > exactly what it was that made it slower. > > Hmmm... Perhaps object allocation overhead was biting us there. In > native code you could cobble this up with batch allocation of space > and > pseudo-object/struct allocation out of the batches. > > >> - Or perform AFD on curves as they come into Renderer, but only > save > >> line segment edges in the edges array. This would use more > storage, > >> but the iterations of the AFD would happen in a tight loop as the > data > >> comes in rather than having to store all of their AFD data back in > the quads > > > > I considered this before abandoning the high level version I mention > above. > > I didn't go with it because, while I am not worried about the higher > memory > > consumption, I was afraid of the impact that having this many edges > would > > have on the qsort call and on lines 99-113 and 140-148 in next(). > > I can see worrying about qsort, but I think one qsort would be > inherently faster than 3 qsorts which you have anyway so the > difference > would get lost in the noise. Also, I'm not sure how the 99 to 113 and > > 140 to 148 would be affected. The path will have the same number of > crossings per sample row regardless of whether the curves have been > flattened or not. You might be adding and deleting edges from the > active list more often, though (in other words, 99 would dump more > curves and 140 would take in more curves), but the number of edges or > > curves at any given point would not be affected by flattening. Also, > > the way you've written the loops at 99, they have to copy every > edge/quad/curve that *doesn't* expire so a skipped curve is actually > less work for that loop. The loops at 140 would have to occasionally > do > more processing, but it is made up for in the fact that 99 does less > work and the "nextY" processing is simpler. > > > How about this: we change the format of the edges array to be an > array of > > sequences of edges. So, right now the edges array has this format: > > E* where E represents 6 consecutive floats > {ymin,ymax,curx,cury,or,slope}. > > I am proposing we change it to > {n,or}({ymin,ymax,curx,cury,slope})^n. > > lineTo's would add an edge sequence with n=1 to the edges array. If > a call > > to quadTo or curveTo produced N curves, it would simply put N,or at > the > > end of the edges array, and then append the data for the N produced > edges. > > I think this would give us the best of both worlds, in that we can > do all > > the AFD iterations in a tight loop close to the input methods and > it > > doesn't present any problems with respect to sorting or managing > the > > active list. It can probably be implemented completely transparently > with > > respect to ScanlineIterator. The details of > > the implementation involve an interesting speed/memory trade off, > but we > > can discuss that later. > > I think this might be overkill since sorts tend to have logN behavior > so > doubling the number of edges would not double the time taken in the > sort. Also, I would think that the sort would be a small amount of > time > compared to the rest of the processing, wasn't it? > > >> - Convert to native. Note that we use a native version of the > pisces > >> that you started with to do some rendering in FX. I tried porting > to > >> use your new (Java) renderer in FX and performance went down even > >> though you show it to be faster than what was there before. So, > your Java > >> renderer compares favorably to the old Java pisces, but both > compare > >> unfavorably to the old native pisces. Maybe we should convert > your > >> code to native and see if that gives us a performance boost. It's > nice to > >> use pure Java, but there is a lot of "shoehorning" of data going > on > >> here that could be done much more easily and naturally in native > code. > > > > I've been wanting to do this for a very long time. C and C++ are > more > > convenient for this type of work. I didn't because I've never used > the JNI > > before. I guess this is as good a time to learn as any. I'd still > like to > > keep the debugging of native code to a minimum, so we should > implement > > as much as possible in java before starting on this. > > >> How is that for "food for thought"? > > Delicious :) > > <yummy-happy-face> > > Here's another idea: > > And another non-idea: > > I tried inlining all of the work that lineTo did and it ended up > slower > than what you do where you turn it into a generic curve of length 4 in > a > points array. I scratched my head, but didn't pursue why that would > be > slower. It might have been because I had a large chunk of nearly > duplicate code in both halves of an "if (y0 < y1)" conditional rather > > than simply swapping the coordinates and using common code - maybe it > > was an instruction cache thing. > > The test case I was using to test performance was only using polygons > so > the curve stuff didn't even come into play... > > ...jim