On 10/26/2010 6:58 AM, Denis Lila wrote:
If we are really worried about the y-sort, then how about creating a
bunch of buckets and doing a bucket sort of the edges? As they are
added to the list of segments, we accumulate their indices in a row
list based on their startY so that each step of the "next()" simply moves
to the next Y and adds the edges mentioned in the list there. Some work
would have to be done on how to manage the storage simply (like a
"rownext" field in the edge structure so that they are linked in a
linked list), but then there would be no qsort at all for the cost of
new int[N_ROWS] and an extra field in every edge.
I like this.
It's even more frugal if combined with my idea that an entire full pixel
row could be processed in one batch rather than processing each
sub-pixel row separately.
90% (guesstimate) of the time edges do not cross each other, thus if
you sort the crossings without reordering the active edges then you just
end up doing the same sorting work (same swaps) on the next scanline. My
SpanShapeIterator code actually reordered the edges on every sample
line to match their current X coordinates in a way that involved 1 compare
per edge that was processed and only occasionally a swap of 2 edge
pointers. It would basically eliminate the sort at line 149 at the
cost of only doing a compare in the nextY processing for the very very
common case.
I also implemented this some time ago. I didn't keep it because it slowed
things down a bit. However, I only tested it with very complex and large
paths, and in hindsight, I shouldn't have based my decision on that, so I
will re-implement it.
First, this is less helpful if we process the edges/quads/curves in 3
separate batches because you'd still have 3 sets of crossings to
interweave and so some sorting would still be necessary, but keeping
each of the 3 sets sorted amongst themselves might still help with
reducing the swapping work of the sorting step. This technique would
work best if all of the edge types were processed from one list.
Also, be sure to test with both clockwise and couterclockwise versions
of the test paths just in case. The (constantly re)sorting hit will be
more expensive with one than the other since one of them will have the
edges already in the sorted order and the other would require maximum
swaps. I'd check:
Simple CW rectangle
Simple CCW rectangle
CW and CCW rectangles with 500(?) zig-zags along top edge
CW and CCW rectangles with 500(?) zig-zags along both top and bottom
And a note related to another suggestion I'd made, I think if we did
something like "process an entire pixel row at once" it would complicate
the "auto-edge sorting" a bit. If every curve is iterated to the bottom
of the pixel in turn before moving to the next edge then its crossings
may swap with another edge's crossings, but that next edge would not be
processed yet and then there may be a mish-mosh of N subpixel values
from the two edges to interweave. If it was done by processing each
subpixel row in turn then the edges could be kept sorted as the subpixel
rows were iterated, but it would reduce the savings on the overhead of
swapping between edges. The best compromise might be to process every
curve fully, verify its sorted position in the edge array using its X
coordinate at the bottom of the pixel, still sort the whole pixel row's
worth of crossing values, but hopefully the sorting would have very
little work to do if the original values were "mostly sorted" by having
kept the edges in order at pixel boundaries (a "mostly sorted" friendly
sort algorithm, like insertion sort, might be needed).
...jim