Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham

Hi Denis,

It looks fine.  Hopefully we can eventually figure out why the sorting 
on the fly didn't pan out.


Denis Lila wrote:
Hi Jim. 


Thanks for all your suggestions. I fixed the edge array indexing
issue, the moveTo bug (not assigning x0), and the double
initialization issue. I also improved the emission of the last row
to what you said. The link is the same as the last one I sent.


I'm guessing the test for "y == boundsMaxY-1" at line 470 could probably 
also be deleted now (since it will be handled by the end test when y 
reaches maxY)?


But everything looks in order!

...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Denis Lila
Hi Jim. 

Thanks for all your suggestions. I fixed the edge array indexing
issue, the moveTo bug (not assigning x0), and the double
initialization issue. I also improved the emission of the last row
to what you said. The link is the same as the last one I sent.

I considered one more optimization for the scan line iteration:
instead of keeping a crossings array and Arrays.sort'ing it for 
every scanline, utilize the fact that the crossings information is
already in the active list, and doesn't need to be copied, so we could
just keep the active list sorted by increasing x crossing (this is the
way things are done in ShapeSpanIterator, which is where I got the idea from).
An added benefit of this (so I thought) would be that the order of the
edges sorted by x crossing would not change very often, and not very much,
in which case keeping it sorted by insertion sort would have (on average)
linear time complexity per scan line, instead of nlogn which is what 
Arrays.sort has.
After implementing this, I tested it, and unfortunately it turns out not
to be as good as I expected. It is faster in some tests, but it is a lot
slower in some others. 
It's a shame because it simplified things quite a bit (the file became
just 440 lines). With it I was able to remove about 4 or 5 variables from 
Renderer,
flips among them so we no longer needed to keep track of orientations, since
we only do that to determine the size of the crossings array.

> For now, it might be simpler to ignore this and revisit that later
> since it isn't going to be common to have large such jumps in the
> middle of most paths.

I agree. Not only that, but even if there are gaps I don't think it
will be much of a problem since very little work has to be done
to traverse scan lines with no crossings.

> Filling the alpha row.  I had an interesting optimization here that I
> 
> never got around to.  Instead of filling the array entries with the 
> alpha values, fill them with the deltas of the alpha values.  The
> inner 
> loop in endRendering then becomes something like:
> 
>  alpha[pix_x0  ] += NUM_POS - (x0 & MASK);
>  alpha[pix_x0+1] += (x0 & MASK);
>  alpha[pix_x1  ] -= NUM_POS - (x1 & MASK);
>  alpha[pix_x1+1] -= (x1 & MASK);

That's a good idea. I implemented it but I think we should use it whether
it offers big savings or not. It's just a nicer algorithm.
Although, the way it is now, it is a bit misleading (the name of the array is
alpha). I should have commented this a bit more. I don't have time now, I'll do
it on Monday.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
> More thoughts on Renderer.java.
> 
> -- Skipping gaps (minor optimization) --
> 
> If there is a gap in the edges in Y, say if a path consists of two 
> subpaths, one that covers y=[0..10] and another that covers 
> y=[1000..1010], then I think you will iterate over each y value from
> 10 
> to 1000, but have no work to do on each scan line.  That could
> possibly 
> waste a bit of time.
> 
> On the other hand, fixing that would have to take into account whether
> 
> or not you are done with a given alpha row, so the "NextY" function 
> can't simply skip - the skipping has to be done at the higher level in
> 
> endRendering - or at least with the cooperation of endRendering. 
> Since 
> it asks what the "current Y" is near the top of the loop, it could 
> detect if the current Y jumped out of the given alpha row, emit it,
> and 
> prepare for a new alpha row starting at the new Y...?
> 

> 
> -- Done with skipping gaps --
> 
> -- Alpha accumulation opt --
> 

> 
> The [pix+1] lines were the gotcha that always confused me when I tried
> 
> to do this in the past.  Basically if you enter an inside region at 1
> 
> subpixel before the end of a pixel then you need to add one to the 
> coverage for that pixel.  But, starting with the next pixel you want
> the 
> total contribution of the interior region to be NUM_POS per pixel, but
> 
> you've only added 1 so far - so you have to add the additional amount
> so 
> that the total amount added for each "enter" crossing ends up being 
> NUM_POS.  Similarly, when you decrement for the "exit" crossing you
> need 
> to ensure that the total negative delta sums to NUM_POS across the
> pixel 
> where the exit happens and the following pixel.  Does that make
> sense?
> 
> Note that you could still have the "single pixel" optimization which 
> would look like:
> 
>  alpha[pix_x0  ] += (x1 - x0);
>  alpha[pix_x0+1] -= (x1 - x0);
> 
> and is equivalent to the above 4 lines.
> 
> then when you do the RLE, you simply have to use an accumulation as
> you 
> scan the alpha row:
> 
>  byte nextVal = startVal + alphaRow[i];
> 
> So, for the cost of an add per pixel as you do the RLE you can avoid 
> having to do the loop in the "multiple pixel section" when filling the
> 
> alpha array.
> 
> I don't think I've ever quantified this optimization so it might be 
> better to investigate 

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham
Just to clarify.  My second message that I just sent mostly contained 
some additional optimizations to consider for now or later, but this 
message below contained at least one (maybe 2) thing(s) that looked like 
a bug and a few maintenance issues that I think should be done before 
finalizing this set of changes...


...jim

On 7/29/2010 5:27 PM, Jim Graham wrote:

Hi Denis,

The changes look fine and I'm moving on to Renderer.java...

I'd make as many methods private as you can. It looks like all your new
methods are private, but some old methods are still public even though I
don't think they are accessed elsewhere (like addEdge()?). I think
"private" is enough for Hotspot to inline so that should help
performance a bit. I usually use "final", but I think "private" does the
same thing.

You should create constants for the indices into the struct to make the
code more readable (and to simplify updating the EDGE layout):

X0_OFF = 0
Y0_OFF = 1
// etc.

I don't think you touch X0 and Y0 after initializing them and then using
them for the first setCurY so you could just use them as the "curX" and
"curY" and save 2 slots in the table.

You initialize edges twice - once when it is declared, and once in the
constructor. Note that the initialization where it is declared uses
INIT_SIZE which is not a multiple of EDGE size anyway.

It looks like there is a missing statement in moveTo to initialize
this.x0...?

I need some more time on it, but I thought I would send along these
comments in the meantime...

...jim

Denis Lila wrote:

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham

Hi Denis,

More thoughts on Renderer.java.

-- Skipping gaps (minor optimization) --

If there is a gap in the edges in Y, say if a path consists of two 
subpaths, one that covers y=[0..10] and another that covers 
y=[1000..1010], then I think you will iterate over each y value from 10 
to 1000, but have no work to do on each scan line.  That could possibly 
waste a bit of time.


On the other hand, fixing that would have to take into account whether 
or not you are done with a given alpha row, so the "NextY" function 
can't simply skip - the skipping has to be done at the higher level in 
endRendering - or at least with the cooperation of endRendering.  Since 
it asks what the "current Y" is near the top of the loop, it could 
detect if the current Y jumped out of the given alpha row, emit it, and 
prepare for a new alpha row starting at the new Y...?


For now, it might be simpler to ignore this and revisit that later since 
it isn't going to be common to have large such jumps in the middle of 
most paths.


-- Done with skipping gaps --

-- Alpha accumulation opt --

Filling the alpha row.  I had an interesting optimization here that I 
never got around to.  Instead of filling the array entries with the 
alpha values, fill them with the deltas of the alpha values.  The inner 
loop in endRendering then becomes something like:


alpha[pix_x0  ] += NUM_POS - (x0 & MASK);
alpha[pix_x0+1] += (x0 & MASK);
alpha[pix_x1  ] -= NUM_POS - (x1 & MASK);
alpha[pix_x1+1] -= (x1 & MASK);

The [pix+1] lines were the gotcha that always confused me when I tried 
to do this in the past.  Basically if you enter an inside region at 1 
subpixel before the end of a pixel then you need to add one to the 
coverage for that pixel.  But, starting with the next pixel you want the 
total contribution of the interior region to be NUM_POS per pixel, but 
you've only added 1 so far - so you have to add the additional amount so 
that the total amount added for each "enter" crossing ends up being 
NUM_POS.  Similarly, when you decrement for the "exit" crossing you need 
to ensure that the total negative delta sums to NUM_POS across the pixel 
where the exit happens and the following pixel.  Does that make sense?


Note that you could still have the "single pixel" optimization which 
would look like:


alpha[pix_x0  ] += (x1 - x0);
alpha[pix_x0+1] -= (x1 - x0);

and is equivalent to the above 4 lines.

then when you do the RLE, you simply have to use an accumulation as you 
scan the alpha row:


byte nextVal = startVal + alphaRow[i];

So, for the cost of an add per pixel as you do the RLE you can avoid 
having to do the loop in the "multiple pixel section" when filling the 
alpha array.


I don't think I've ever quantified this optimization so it might be 
better to investigate it after the current code goes in and adopt it 
only if it shows a big savings.


-- Done with alpha accumulation opt --

Something about the "emit last row" code bothers me.  First, I would 
guess that the "y == boundsMaxY - 1" test would have already flushed the 
row, right?  Also, why do the for loop?  Why not just flush the row if 
it isn't flushed?  I kind of feel like the "y == boundsMaxY-1" test 
could be removed from inside the loop and simply test if there is 
unflushed data in the alpha array then just make a call to emitRow() 
with the appropriate values after the loop.


while (hasNext()) {
...
if ((y & MASK) == MASK) {
emitRow(...);
pix_min,max = MAX,MIN;
}
}
if (pix_min < pix_max) {
emitRow(...);
}

Am I missing something?

So, the upshot is that I didn't find anything wrong.  You can take or 
leave my suggestions for improvements as you see fit and maybe save some 
of them for a second round after this goes back...


...jim

On 7/29/2010 2:16 PM, Denis Lila wrote:

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at 
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.