Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-06 Thread Jim Graham
To the end of getting rid of "flips" - all they are used for is to 
resize the crossings array, right?  This is not a huge array that costs 
a lot to resize - why not simply use a default array of, say, 30 
elements and then resize it if we ever have more crossings than that? 
Only very complicated paths would have more than 30 crossings to track. 
 The check for array length is only needed once per scanline since we 
know how many active edges are on each scan line (hi-lo) and you can 
only have 1 crossing per active edge so with one test per scanline we 
can keep the crossings array within range...


...jim

Jim Graham wrote:

Hi Denis,

Well, I guess it's a good thing that Java2Demo had a path like that in 
it - not a very common case, so it's good we found it!


The fix looks fine.  It still seems like there is way more logic there 
than is needed - hopefully if we can get rid of flips at some point, 
much of it will go away.


Fixes look good to go to me...

...jim

On 8/5/2010 3:58 PM, Denis Lila wrote:

Hi Jim.

I didn't know about Java2Demo. If I did I would have run it sooner.
But I ran it a few hours ago, and everything looked fine (surprisingly
high fps too) until I got to the append test.

Apparently I introduced a bug when solving the "2 consecutive moveTos 
bug".

Basically, when there's a close() after a horizontal lineTo(), the lineTo
in close() won't be executed because it's inside the if 
(firstOrientation != 0)
test. So instead of going back to the starting point, close will stay 
where

it is, which will draw a triangle above the rectangle.

I fixed this by introducing a variable that keeps track of the last 
method
called (lineTo, moveTo, or close), and instead of checking for 
firstOrientation != 0

in close(), I check for (last == LINE_TO).

webrev (hopefully final):
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

I'm sorry about this. I wish I had known about Java2Demo sooner.

Thanks,
Denis.

- "Jim Graham"  wrote:


Hi Denis,

That's great!  I just did a last minute double-check of your last
(final) webrevs to be sure.

Have you tested Java2Demo with these changes?  I'd also run any
regression tests you can find with the changes.  If there are no
problems there, then you are good to go to push it...

...jim

On 8/5/2010 8:08 AM, Denis Lila wrote:

Hello.


Are you a registered OpenJDK developer?

I am now.
Can I go ahead and push it?

Regards,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-06 Thread Jim Graham

Hi Denis,

Well, I guess it's a good thing that Java2Demo had a path like that in 
it - not a very common case, so it's good we found it!


The fix looks fine.  It still seems like there is way more logic there 
than is needed - hopefully if we can get rid of flips at some point, 
much of it will go away.


Fixes look good to go to me...

...jim

On 8/5/2010 3:58 PM, Denis Lila wrote:

Hi Jim.

I didn't know about Java2Demo. If I did I would have run it sooner.
But I ran it a few hours ago, and everything looked fine (surprisingly
high fps too) until I got to the append test.

Apparently I introduced a bug when solving the "2 consecutive moveTos bug".
Basically, when there's a close() after a horizontal lineTo(), the lineTo
in close() won't be executed because it's inside the if (firstOrientation != 0)
test. So instead of going back to the starting point, close will stay where
it is, which will draw a triangle above the rectangle.

I fixed this by introducing a variable that keeps track of the last method
called (lineTo, moveTo, or close), and instead of checking for firstOrientation 
!= 0
in close(), I check for (last == LINE_TO).

webrev (hopefully final):
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

I'm sorry about this. I wish I had known about Java2Demo sooner.

Thanks,
Denis.

- "Jim Graham"  wrote:


Hi Denis,

That's great!  I just did a last minute double-check of your last
(final) webrevs to be sure.

Have you tested Java2Demo with these changes?  I'd also run any
regression tests you can find with the changes.  If there are no
problems there, then you are good to go to push it...

...jim

On 8/5/2010 8:08 AM, Denis Lila wrote:

Hello.


Are you a registered OpenJDK developer?

I am now.
Can I go ahead and push it?

Regards,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-05 Thread Denis Lila
Hi Jim.

I didn't know about Java2Demo. If I did I would have run it sooner.
But I ran it a few hours ago, and everything looked fine (surprisingly 
high fps too) until I got to the append test.

Apparently I introduced a bug when solving the "2 consecutive moveTos bug".
Basically, when there's a close() after a horizontal lineTo(), the lineTo
in close() won't be executed because it's inside the if (firstOrientation != 0)
test. So instead of going back to the starting point, close will stay where
it is, which will draw a triangle above the rectangle.

I fixed this by introducing a variable that keeps track of the last method
called (lineTo, moveTo, or close), and instead of checking for firstOrientation 
!= 0
in close(), I check for (last == LINE_TO).

webrev (hopefully final):
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

I'm sorry about this. I wish I had known about Java2Demo sooner.

Thanks,
Denis. 

- "Jim Graham"  wrote:

> Hi Denis,
> 
> That's great!  I just did a last minute double-check of your last 
> (final) webrevs to be sure.
> 
> Have you tested Java2Demo with these changes?  I'd also run any 
> regression tests you can find with the changes.  If there are no 
> problems there, then you are good to go to push it...
> 
>   ...jim
> 
> On 8/5/2010 8:08 AM, Denis Lila wrote:
> > Hello.
> >
> >> Are you a registered OpenJDK developer?
> > I am now.
> > Can I go ahead and push it?
> >
> > Regards,
> > Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-05 Thread Jim Graham

Hi Denis,

That's great!  I just did a last minute double-check of your last 
(final) webrevs to be sure.


Have you tested Java2Demo with these changes?  I'd also run any 
regression tests you can find with the changes.  If there are no 
problems there, then you are good to go to push it...


...jim

On 8/5/2010 8:08 AM, Denis Lila wrote:

Hello.


Are you a registered OpenJDK developer?

I am now.
Can I go ahead and push it?

Regards,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-05 Thread Denis Lila
Hello.

> Are you a registered OpenJDK developer?
I am now.
Can I go ahead and push it?

Regards,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-04 Thread Denis Lila
Hello Jim.

> I'd need to see a final webrev and approve it and then anyone with an
> OpenJDK user id can push it.  
The final webrev is 
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/
The only things that have changed since the last one you commented on
are the "y == boundsMaxY - 1" check, and a few comments in Renderer.java.

> I'll also have to create a bugid for the change set.
I've already submitted a bug that this patch would fix. The ID is 6967436.

> Are you a registered OpenJDK developer?
Not just yet. I will be soon, I think.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,

> 
> 
>   ...jim
> 
> Denis Lila wrote:
> > Hello Jim.
> > 
> >> I'm guessing the test for "y == boundsMaxY-1" at line 470 could
> >> probably also be deleted now (since it will be handled by the end 
> >> test when y reaches maxY)?
> > 
> > Indeed it can. I've done this, and also updated a few comments that
> > might have been misleading (they were written before certain
> changes
> > were made).
> > 
> >> It looks fine.  Hopefully we can eventually figure out why the
> sorting
> >> on the fly didn't pan out.
> > 
> > Wonderful! So, will it be committed?
> > 
> > Thanks,
> > Denis.
> > 
> > - "Jim Graham"  wrote:
> > 
> >> Hi Denis,
> >>
> > 
> >> Denis Lila wrote:
> >>> Hi Jim. 
> >>>
> >>> Thanks for all your suggestions. I fixed the edge array
> >> indexing
> >>> issue, the moveTo bug (not assigning x0), and the double
> >>> initialization issue. I also improved the emission of the last
> row
> >>> to what you said. The link is the same as the last one I sent.
> > 
> >> But everything looks in order!
> >>
> >>...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-03 Thread Jim Graham

Hi Denis,

I'd need to see a final webrev and approve it and then anyone with an 
OpenJDK user id can push it.  I'll also have to create a bugid for the 
change set.


Are you a registered OpenJDK developer?

...jim

Denis Lila wrote:

Hello Jim.


I'm guessing the test for "y == boundsMaxY-1" at line 470 could
probably also be deleted now (since it will be handled by the end 
test when y reaches maxY)?


Indeed it can. I've done this, and also updated a few comments that
might have been misleading (they were written before certain changes
were made).


It looks fine.  Hopefully we can eventually figure out why the sorting
on the fly didn't pan out.


Wonderful! So, will it be committed?

Thanks,
Denis.

- "Jim Graham"  wrote:


Hi Denis,




Denis Lila wrote:
Hi Jim. 


Thanks for all your suggestions. I fixed the edge array

indexing

issue, the moveTo bug (not assigning x0), and the double
initialization issue. I also improved the emission of the last row
to what you said. The link is the same as the last one I sent.



But everything looks in order!

...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-08-02 Thread Denis Lila
Hello Jim.

> I'm guessing the test for "y == boundsMaxY-1" at line 470 could
> probably also be deleted now (since it will be handled by the end 
> test when y reaches maxY)?

Indeed it can. I've done this, and also updated a few comments that
might have been misleading (they were written before certain changes
were made).

> It looks fine.  Hopefully we can eventually figure out why the sorting
> on the fly didn't pan out.

Wonderful! So, will it be committed?

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 

> 
> Denis Lila wrote:
> > Hi Jim. 
> > 
> > Thanks for all your suggestions. I fixed the edge array
> indexing
> > issue, the moveTo bug (not assigning x0), and the double
> > initialization issue. I also improved the emission of the last row
> > to what you said. The link is the same as the last one I sent.
> 

> 
> But everything looks in order!
> 
>   ...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham

Hi Denis,

It looks fine.  Hopefully we can eventually figure out why the sorting 
on the fly didn't pan out.


Denis Lila wrote:
Hi Jim. 


Thanks for all your suggestions. I fixed the edge array indexing
issue, the moveTo bug (not assigning x0), and the double
initialization issue. I also improved the emission of the last row
to what you said. The link is the same as the last one I sent.


I'm guessing the test for "y == boundsMaxY-1" at line 470 could probably 
also be deleted now (since it will be handled by the end test when y 
reaches maxY)?


But everything looks in order!

...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Denis Lila
Hi Jim. 

Thanks for all your suggestions. I fixed the edge array indexing
issue, the moveTo bug (not assigning x0), and the double
initialization issue. I also improved the emission of the last row
to what you said. The link is the same as the last one I sent.

I considered one more optimization for the scan line iteration:
instead of keeping a crossings array and Arrays.sort'ing it for 
every scanline, utilize the fact that the crossings information is
already in the active list, and doesn't need to be copied, so we could
just keep the active list sorted by increasing x crossing (this is the
way things are done in ShapeSpanIterator, which is where I got the idea from).
An added benefit of this (so I thought) would be that the order of the
edges sorted by x crossing would not change very often, and not very much,
in which case keeping it sorted by insertion sort would have (on average)
linear time complexity per scan line, instead of nlogn which is what 
Arrays.sort has.
After implementing this, I tested it, and unfortunately it turns out not
to be as good as I expected. It is faster in some tests, but it is a lot
slower in some others. 
It's a shame because it simplified things quite a bit (the file became
just 440 lines). With it I was able to remove about 4 or 5 variables from 
Renderer,
flips among them so we no longer needed to keep track of orientations, since
we only do that to determine the size of the crossings array.

> For now, it might be simpler to ignore this and revisit that later
> since it isn't going to be common to have large such jumps in the
> middle of most paths.

I agree. Not only that, but even if there are gaps I don't think it
will be much of a problem since very little work has to be done
to traverse scan lines with no crossings.

> Filling the alpha row.  I had an interesting optimization here that I
> 
> never got around to.  Instead of filling the array entries with the 
> alpha values, fill them with the deltas of the alpha values.  The
> inner 
> loop in endRendering then becomes something like:
> 
>  alpha[pix_x0  ] += NUM_POS - (x0 & MASK);
>  alpha[pix_x0+1] += (x0 & MASK);
>  alpha[pix_x1  ] -= NUM_POS - (x1 & MASK);
>  alpha[pix_x1+1] -= (x1 & MASK);

That's a good idea. I implemented it but I think we should use it whether
it offers big savings or not. It's just a nicer algorithm.
Although, the way it is now, it is a bit misleading (the name of the array is
alpha). I should have commented this a bit more. I don't have time now, I'll do
it on Monday.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
> More thoughts on Renderer.java.
> 
> -- Skipping gaps (minor optimization) --
> 
> If there is a gap in the edges in Y, say if a path consists of two 
> subpaths, one that covers y=[0..10] and another that covers 
> y=[1000..1010], then I think you will iterate over each y value from
> 10 
> to 1000, but have no work to do on each scan line.  That could
> possibly 
> waste a bit of time.
> 
> On the other hand, fixing that would have to take into account whether
> 
> or not you are done with a given alpha row, so the "NextY" function 
> can't simply skip - the skipping has to be done at the higher level in
> 
> endRendering - or at least with the cooperation of endRendering. 
> Since 
> it asks what the "current Y" is near the top of the loop, it could 
> detect if the current Y jumped out of the given alpha row, emit it,
> and 
> prepare for a new alpha row starting at the new Y...?
> 

> 
> -- Done with skipping gaps --
> 
> -- Alpha accumulation opt --
> 

> 
> The [pix+1] lines were the gotcha that always confused me when I tried
> 
> to do this in the past.  Basically if you enter an inside region at 1
> 
> subpixel before the end of a pixel then you need to add one to the 
> coverage for that pixel.  But, starting with the next pixel you want
> the 
> total contribution of the interior region to be NUM_POS per pixel, but
> 
> you've only added 1 so far - so you have to add the additional amount
> so 
> that the total amount added for each "enter" crossing ends up being 
> NUM_POS.  Similarly, when you decrement for the "exit" crossing you
> need 
> to ensure that the total negative delta sums to NUM_POS across the
> pixel 
> where the exit happens and the following pixel.  Does that make
> sense?
> 
> Note that you could still have the "single pixel" optimization which 
> would look like:
> 
>  alpha[pix_x0  ] += (x1 - x0);
>  alpha[pix_x0+1] -= (x1 - x0);
> 
> and is equivalent to the above 4 lines.
> 
> then when you do the RLE, you simply have to use an accumulation as
> you 
> scan the alpha row:
> 
>  byte nextVal = startVal + alphaRow[i];
> 
> So, for the cost of an add per pixel as you do the RLE you can avoid 
> having to do the loop in the "multiple pixel section" when filling the
> 
> alpha array.
> 
> I don't think I've ever quantified this optimization so it might be 
> better to investigate 

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham
Just to clarify.  My second message that I just sent mostly contained 
some additional optimizations to consider for now or later, but this 
message below contained at least one (maybe 2) thing(s) that looked like 
a bug and a few maintenance issues that I think should be done before 
finalizing this set of changes...


...jim

On 7/29/2010 5:27 PM, Jim Graham wrote:

Hi Denis,

The changes look fine and I'm moving on to Renderer.java...

I'd make as many methods private as you can. It looks like all your new
methods are private, but some old methods are still public even though I
don't think they are accessed elsewhere (like addEdge()?). I think
"private" is enough for Hotspot to inline so that should help
performance a bit. I usually use "final", but I think "private" does the
same thing.

You should create constants for the indices into the struct to make the
code more readable (and to simplify updating the EDGE layout):

X0_OFF = 0
Y0_OFF = 1
// etc.

I don't think you touch X0 and Y0 after initializing them and then using
them for the first setCurY so you could just use them as the "curX" and
"curY" and save 2 slots in the table.

You initialize edges twice - once when it is declared, and once in the
constructor. Note that the initialization where it is declared uses
INIT_SIZE which is not a multiple of EDGE size anyway.

It looks like there is a missing statement in moveTo to initialize
this.x0...?

I need some more time on it, but I thought I would send along these
comments in the meantime...

...jim

Denis Lila wrote:

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-30 Thread Jim Graham

Hi Denis,

More thoughts on Renderer.java.

-- Skipping gaps (minor optimization) --

If there is a gap in the edges in Y, say if a path consists of two 
subpaths, one that covers y=[0..10] and another that covers 
y=[1000..1010], then I think you will iterate over each y value from 10 
to 1000, but have no work to do on each scan line.  That could possibly 
waste a bit of time.


On the other hand, fixing that would have to take into account whether 
or not you are done with a given alpha row, so the "NextY" function 
can't simply skip - the skipping has to be done at the higher level in 
endRendering - or at least with the cooperation of endRendering.  Since 
it asks what the "current Y" is near the top of the loop, it could 
detect if the current Y jumped out of the given alpha row, emit it, and 
prepare for a new alpha row starting at the new Y...?


For now, it might be simpler to ignore this and revisit that later since 
it isn't going to be common to have large such jumps in the middle of 
most paths.


-- Done with skipping gaps --

-- Alpha accumulation opt --

Filling the alpha row.  I had an interesting optimization here that I 
never got around to.  Instead of filling the array entries with the 
alpha values, fill them with the deltas of the alpha values.  The inner 
loop in endRendering then becomes something like:


alpha[pix_x0  ] += NUM_POS - (x0 & MASK);
alpha[pix_x0+1] += (x0 & MASK);
alpha[pix_x1  ] -= NUM_POS - (x1 & MASK);
alpha[pix_x1+1] -= (x1 & MASK);

The [pix+1] lines were the gotcha that always confused me when I tried 
to do this in the past.  Basically if you enter an inside region at 1 
subpixel before the end of a pixel then you need to add one to the 
coverage for that pixel.  But, starting with the next pixel you want the 
total contribution of the interior region to be NUM_POS per pixel, but 
you've only added 1 so far - so you have to add the additional amount so 
that the total amount added for each "enter" crossing ends up being 
NUM_POS.  Similarly, when you decrement for the "exit" crossing you need 
to ensure that the total negative delta sums to NUM_POS across the pixel 
where the exit happens and the following pixel.  Does that make sense?


Note that you could still have the "single pixel" optimization which 
would look like:


alpha[pix_x0  ] += (x1 - x0);
alpha[pix_x0+1] -= (x1 - x0);

and is equivalent to the above 4 lines.

then when you do the RLE, you simply have to use an accumulation as you 
scan the alpha row:


byte nextVal = startVal + alphaRow[i];

So, for the cost of an add per pixel as you do the RLE you can avoid 
having to do the loop in the "multiple pixel section" when filling the 
alpha array.


I don't think I've ever quantified this optimization so it might be 
better to investigate it after the current code goes in and adopt it 
only if it shows a big savings.


-- Done with alpha accumulation opt --

Something about the "emit last row" code bothers me.  First, I would 
guess that the "y == boundsMaxY - 1" test would have already flushed the 
row, right?  Also, why do the for loop?  Why not just flush the row if 
it isn't flushed?  I kind of feel like the "y == boundsMaxY-1" test 
could be removed from inside the loop and simply test if there is 
unflushed data in the alpha array then just make a call to emitRow() 
with the appropriate values after the loop.


while (hasNext()) {
...
if ((y & MASK) == MASK) {
emitRow(...);
pix_min,max = MAX,MIN;
}
}
if (pix_min < pix_max) {
emitRow(...);
}

Am I missing something?

So, the upshot is that I didn't find anything wrong.  You can take or 
leave my suggestions for improvements as you see fit and maybe save some 
of them for a second round after this goes back...


...jim

On 7/29/2010 2:16 PM, Denis Lila wrote:

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at 
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-29 Thread Jim Graham

Hi Denis,

The changes look fine and I'm moving on to Renderer.java...

I'd make as many methods private as you can.  It looks like all your new 
methods are private, but some old methods are still public even though I 
don't think they are accessed elsewhere (like addEdge()?).  I think 
"private" is enough for Hotspot to inline so that should help 
performance a bit.  I usually use "final", but I think "private" does 
the same thing.


You should create constants for the indices into the struct to make the 
code more readable (and to simplify updating the EDGE layout):


X0_OFF = 0
Y0_OFF = 1
// etc.

I don't think you touch X0 and Y0 after initializing them and then using 
them for the first setCurY so you could just use them as the "curX" and 
"curY" and save 2 slots in the table.


You initialize edges twice - once when it is declared, and once in the 
constructor.  Note that the initialization where it is declared uses 
INIT_SIZE which is not a multiple of EDGE size anyway.


It looks like there is a missing statement in moveTo to initialize 
this.x0...?


I need some more time on it, but I thought I would send along these 
comments in the meantime...


...jim

Denis Lila wrote:

Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at 
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-29 Thread Denis Lila
Hello Jim.

I made the changes you point out, except for your second point
(I don't have time to think about it right now).

I stopped precomputing m00_2_m01_2 because it was only being
used in one place. But I guess it worth it to precompute it
if it's going to be used often (and computeOffset is called
for every line).

The new webrev is at 
http://icedtea.classpath.org/~dlila/webrevs/fpBetterAAv2/webrev/

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
> Only some minor comments so far:
> 
> Stroker.java:
> 
> - Should det be precomputed and saved?  (You calculate it in the 
> constructor anyway, but don't save it.)
> - Should test for uniform scale be precomputed?
> - (Is test for uniform scale too strict?  Can a rotated uniform scale
> 
> use the same code as upright uniform scale?)
> - Why are m00_2_m01_2 et al no longer precomputed (they only need to
> be 
> precomputed if scale is not uniform which isn't common)?
> - lineLength method is "user space length" isn't it? - a more 
> descriptive name might help avoid confusion if someone is modifying
> code 
> here later.
> - line 614 - missing space
> 
> Dasher.java:
> 
> - Line 187 - use "leftInThisDashSegment" here?
> 
> I still have to look at Renderer.java in depth, but I thought I'd send
> 
> these minor comments along while they are fresh in my email buffer...
> 
>   ...jim
> 
> Denis Lila wrote:
> > Hello.
> > 
> > And, here it is:
> http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/
> > 
> > Just as I thought, it's way faster than the original. I made a new
> > test, which is supposed to be more realistic than drawing 3
> length
> > lines. It consists of splitting a 1000x1000 frame in 100 10x10
> squares
> > and in each square drawing 20 or so curves with random
> start/end/control
> > points in that square. In this case it's at least twice faster
> (~0.85
> > versus ~1.95 seconds).
> > 
> > Unfortunately I've had to do a lot of the same optimizations that I
> was
> > trying to remove (i.e. making everything a global variable). I
> tried
> > writing a higher level version of it, but it completely negated the
> > performance gains of the new algorithm. Anyway, it's still not as
> bad
> > as before because the algorithm is inherently clearer and we only
> iterate
> > through scan lines instead of iterating through strips and then
> scanlines
> > in that strip, and then edges, all in the same method.
> > 
> > It's also better organized, and logically separate parts of the code
> don't
> > really touch each other's variables much. And it's definitely
> better
> > commented. 
> > 
> > I also made some minor changes outside of Renderer that did not
> appear in
> > the first version of this patch last week:
> > 1. I fixed the issue Jim pointed out in Dasher, where if 
> > (origLen == leftInThisDashSegment) the dash index was not being
> incremented.
> > 2. In dasher, I no longer copy the dash array.
> > 3. I removed files PiscesMath.java and Transform4.java because they
> are no
> > longer needed.
> > 
> > Thanks,
> > Denis.
> > 
> > - "Jim Graham"  wrote:
> > 
> >> Woohoo, Denis!  I look forward to seeing the new version!
> >>
> >>...jim
> >>
> >> On 7/28/2010 5:51 AM, Denis Lila wrote:
> >>> Hello Jim.
> >>>
> >>> This one performs almost identically to what is already there
> >>> in openjdk6 and 7, since it's exactly what I sent for review
> >>> last week, but with all the changes you suggested implemented.
> >>>
> >>> I would actually like to ask you to not look at either one of
> >>> them. First of all, there is an ArrayIndexOutOfBoundsException
> >>> possible in emitRow.
> >>> And secondly, even if there wasn't, last night I implemented
> >>> your algorithm from ShapeSpanIterator.c to iterate through
> >>> the edges. I have yet to debug it, but it makes everything much,
> >>> much simpler, and it should make it far faster, so we get the
> best
> >>> of both worlds.
> >>>
> >>> Thanks,
> >>> Denis.
> >>>
> >>> - "Jim Graham"  wrote:
> >>>
>  Hi Denis,
> 
>  I'll try to get through both versions and see if I can find
> >> anything
>  that was hurting performance with your EdgeLists.  I'm guessing
> >> that
>  this version was created because of the performance issues you
> >> found
>  with the EdgeList version?  Does this perform more closely to
> the
>  existing code than the EdgeList version?
> 
>   ...jim
> 
>  Denis Lila wrote:
> > Hello again.
> >
> > This attachmet is a "can of worms" implementation without all
> the
>  fancy (and slow)
> > iteration. It also includes all of the other suggestions you
> sent
> >> in
>  your first
> > review of Dasher and Renderer last week (most importantly, the
>  firstOrientation
> > issue, horizontal lines filtering, and adding prefixes to
> >> variable
>  names to make
> > it clear whether they refer to pixels, or subpixels).
> >
> >

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-28 Thread Jim Graham

Hi Denis,

Only some minor comments so far:

Stroker.java:

- Should det be precomputed and saved?  (You calculate it in the 
constructor anyway, but don't save it.)

- Should test for uniform scale be precomputed?
- (Is test for uniform scale too strict?  Can a rotated uniform scale 
use the same code as upright uniform scale?)
- Why are m00_2_m01_2 et al no longer precomputed (they only need to be 
precomputed if scale is not uniform which isn't common)?
- lineLength method is "user space length" isn't it? - a more 
descriptive name might help avoid confusion if someone is modifying code 
here later.

- line 614 - missing space

Dasher.java:

- Line 187 - use "leftInThisDashSegment" here?

I still have to look at Renderer.java in depth, but I thought I'd send 
these minor comments along while they are fresh in my email buffer...


...jim

Denis Lila wrote:

Hello.

And, here it is: http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/

Just as I thought, it's way faster than the original. I made a new
test, which is supposed to be more realistic than drawing 3 length
lines. It consists of splitting a 1000x1000 frame in 100 10x10 squares
and in each square drawing 20 or so curves with random start/end/control
points in that square. In this case it's at least twice faster (~0.85
versus ~1.95 seconds).

Unfortunately I've had to do a lot of the same optimizations that I was
trying to remove (i.e. making everything a global variable). I tried
writing a higher level version of it, but it completely negated the
performance gains of the new algorithm. Anyway, it's still not as bad
as before because the algorithm is inherently clearer and we only iterate
through scan lines instead of iterating through strips and then scanlines
in that strip, and then edges, all in the same method.

It's also better organized, and logically separate parts of the code don't
really touch each other's variables much. And it's definitely better
commented. 


I also made some minor changes outside of Renderer that did not appear in
the first version of this patch last week:
1. I fixed the issue Jim pointed out in Dasher, where if 
(origLen == leftInThisDashSegment) the dash index was not being incremented.

2. In dasher, I no longer copy the dash array.
3. I removed files PiscesMath.java and Transform4.java because they are no
longer needed.

Thanks,
Denis.

- "Jim Graham"  wrote:


Woohoo, Denis!  I look forward to seeing the new version!

...jim

On 7/28/2010 5:51 AM, Denis Lila wrote:

Hello Jim.

This one performs almost identically to what is already there
in openjdk6 and 7, since it's exactly what I sent for review
last week, but with all the changes you suggested implemented.

I would actually like to ask you to not look at either one of
them. First of all, there is an ArrayIndexOutOfBoundsException
possible in emitRow.
And secondly, even if there wasn't, last night I implemented
your algorithm from ShapeSpanIterator.c to iterate through
the edges. I have yet to debug it, but it makes everything much,
much simpler, and it should make it far faster, so we get the best
of both worlds.

Thanks,
Denis.

- "Jim Graham"  wrote:


Hi Denis,

I'll try to get through both versions and see if I can find

anything

that was hurting performance with your EdgeLists.  I'm guessing

that

this version was created because of the performance issues you

found

with the EdgeList version?  Does this perform more closely to the
existing code than the EdgeList version?

...jim

Denis Lila wrote:

Hello again.

This attachmet is a "can of worms" implementation without all the

fancy (and slow)

iteration. It also includes all of the other suggestions you sent

in

your first

review of Dasher and Renderer last week (most importantly, the

firstOrientation

issue, horizontal lines filtering, and adding prefixes to

variable

names to make

it clear whether they refer to pixels, or subpixels).

Regards,
Denis.

- "Denis Lila"  wrote:


Hello Jim.

I implemented your "can of worms" idea. It works, and it got rid

of

the biasing.
I wasn't able to send a webrev, but there are many changes and a

side

by side
comparison would probably be useless, so I just attached the

file.

I

hope this is
ok.

I also implemented a "better" iterating structure for the lines

and

the
strips and crossings. I think it is better in every way, except
performance.
The new file is more than 200 lines smaller than the old one.

The

only

members of Renderer are now the AA variables and the position
variables
(sx*, sy*, x*, y*).
What I've done is I added an EdgeList class, which encapsulates

all

the edge
related variables in the old Renderer. At first, I had an Edge

class

in addition
to the EdgeList class, and while this was much nicer, it turned

out

to

be too
expensive (see last paragraph).
I've also added a ScanLineIterator, so instead of _endRendering
iterating
through strips

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-28 Thread Denis Lila
Hello.

And, here it is: http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/

Just as I thought, it's way faster than the original. I made a new
test, which is supposed to be more realistic than drawing 3 length
lines. It consists of splitting a 1000x1000 frame in 100 10x10 squares
and in each square drawing 20 or so curves with random start/end/control
points in that square. In this case it's at least twice faster (~0.85
versus ~1.95 seconds).

Unfortunately I've had to do a lot of the same optimizations that I was
trying to remove (i.e. making everything a global variable). I tried
writing a higher level version of it, but it completely negated the
performance gains of the new algorithm. Anyway, it's still not as bad
as before because the algorithm is inherently clearer and we only iterate
through scan lines instead of iterating through strips and then scanlines
in that strip, and then edges, all in the same method.

It's also better organized, and logically separate parts of the code don't
really touch each other's variables much. And it's definitely better
commented. 

I also made some minor changes outside of Renderer that did not appear in
the first version of this patch last week:
1. I fixed the issue Jim pointed out in Dasher, where if 
(origLen == leftInThisDashSegment) the dash index was not being incremented.
2. In dasher, I no longer copy the dash array.
3. I removed files PiscesMath.java and Transform4.java because they are no
longer needed.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Woohoo, Denis!  I look forward to seeing the new version!
> 
>   ...jim
> 
> On 7/28/2010 5:51 AM, Denis Lila wrote:
> > Hello Jim.
> >
> > This one performs almost identically to what is already there
> > in openjdk6 and 7, since it's exactly what I sent for review
> > last week, but with all the changes you suggested implemented.
> >
> > I would actually like to ask you to not look at either one of
> > them. First of all, there is an ArrayIndexOutOfBoundsException
> > possible in emitRow.
> > And secondly, even if there wasn't, last night I implemented
> > your algorithm from ShapeSpanIterator.c to iterate through
> > the edges. I have yet to debug it, but it makes everything much,
> > much simpler, and it should make it far faster, so we get the best
> > of both worlds.
> >
> > Thanks,
> > Denis.
> >
> > - "Jim Graham"  wrote:
> >
> >> Hi Denis,
> >>
> >> I'll try to get through both versions and see if I can find
> anything
> >> that was hurting performance with your EdgeLists.  I'm guessing
> that
> >> this version was created because of the performance issues you
> found
> >> with the EdgeList version?  Does this perform more closely to the
> >> existing code than the EdgeList version?
> >>
> >>...jim
> >>
> >> Denis Lila wrote:
> >>> Hello again.
> >>>
> >>> This attachmet is a "can of worms" implementation without all the
> >> fancy (and slow)
> >>> iteration. It also includes all of the other suggestions you sent
> in
> >> your first
> >>> review of Dasher and Renderer last week (most importantly, the
> >> firstOrientation
> >>> issue, horizontal lines filtering, and adding prefixes to
> variable
> >> names to make
> >>> it clear whether they refer to pixels, or subpixels).
> >>>
> >>> Regards,
> >>> Denis.
> >>>
> >>> - "Denis Lila"  wrote:
> >>>
>  Hello Jim.
> 
>  I implemented your "can of worms" idea. It works, and it got rid
> >> of
>  the biasing.
>  I wasn't able to send a webrev, but there are many changes and a
> >> side
>  by side
>  comparison would probably be useless, so I just attached the
> file.
> >> I
>  hope this is
>  ok.
> 
>  I also implemented a "better" iterating structure for the lines
> >> and
>  the
>  strips and crossings. I think it is better in every way, except
>  performance.
>  The new file is more than 200 lines smaller than the old one.
> The
> >> only
>  members of Renderer are now the AA variables and the position
>  variables
>  (sx*, sy*, x*, y*).
>  What I've done is I added an EdgeList class, which encapsulates
> >> all
>  the edge
>  related variables in the old Renderer. At first, I had an Edge
> >> class
>  in addition
>  to the EdgeList class, and while this was much nicer, it turned
> out
> >> to
>  be too
>  expensive (see last paragraph).
>  I've also added a ScanLineIterator, so instead of _endRendering
>  iterating
>  through strips, and then calling renderStrip() which iterates
> >> through
>  the
>  scanlines in that strip, and then through the crossings in that
>  scanline,
>  what happens now is that _endRendering uses the
> Iterator
> >> to
>  iterate through each scanline, get get its crossings and iterate
>  through them
>  to accumulate the alpha. By the way, a ScanLine is a type
> defined
> >> by
>  an
>  interface which exports methods for get

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-28 Thread Jim Graham

Woohoo, Denis!  I look forward to seeing the new version!

...jim

On 7/28/2010 5:51 AM, Denis Lila wrote:

Hello Jim.

This one performs almost identically to what is already there
in openjdk6 and 7, since it's exactly what I sent for review
last week, but with all the changes you suggested implemented.

I would actually like to ask you to not look at either one of
them. First of all, there is an ArrayIndexOutOfBoundsException
possible in emitRow.
And secondly, even if there wasn't, last night I implemented
your algorithm from ShapeSpanIterator.c to iterate through
the edges. I have yet to debug it, but it makes everything much,
much simpler, and it should make it far faster, so we get the best
of both worlds.

Thanks,
Denis.

- "Jim Graham"  wrote:


Hi Denis,

I'll try to get through both versions and see if I can find anything
that was hurting performance with your EdgeLists.  I'm guessing that
this version was created because of the performance issues you found
with the EdgeList version?  Does this perform more closely to the
existing code than the EdgeList version?

...jim

Denis Lila wrote:

Hello again.

This attachmet is a "can of worms" implementation without all the

fancy (and slow)

iteration. It also includes all of the other suggestions you sent in

your first

review of Dasher and Renderer last week (most importantly, the

firstOrientation

issue, horizontal lines filtering, and adding prefixes to variable

names to make

it clear whether they refer to pixels, or subpixels).

Regards,
Denis.

- "Denis Lila"  wrote:


Hello Jim.

I implemented your "can of worms" idea. It works, and it got rid

of

the biasing.
I wasn't able to send a webrev, but there are many changes and a

side

by side
comparison would probably be useless, so I just attached the file.

I

hope this is
ok.

I also implemented a "better" iterating structure for the lines

and

the
strips and crossings. I think it is better in every way, except
performance.
The new file is more than 200 lines smaller than the old one. The

only

members of Renderer are now the AA variables and the position
variables
(sx*, sy*, x*, y*).
What I've done is I added an EdgeList class, which encapsulates

all

the edge
related variables in the old Renderer. At first, I had an Edge

class

in addition
to the EdgeList class, and while this was much nicer, it turned out

to

be too
expensive (see last paragraph).
I've also added a ScanLineIterator, so instead of _endRendering
iterating
through strips, and then calling renderStrip() which iterates

through

the
scanlines in that strip, and then through the crossings in that
scanline,
what happens now is that _endRendering uses the Iterator

to

iterate through each scanline, get get its crossings and iterate
through them
to accumulate the alpha. By the way, a ScanLine is a type defined

by

an
interface which exports methods for getting the y coord of the

line,

the
number of crossings in it, the ith crossing, and a method for

sorting

its crossings.
The class that implements ScanLine is ScanLineIterator itself. I

made

a
ScanLine class, but I was afraid performance would suffer because

of

all the
object creations (this turned out not to be an issue, after I

switched

to the
current way, and remeasured things). I did not switch back because
this is
only slightly worse.

As for performance: I wrote a simple program that tries to draw a
dashed path
that consists of about 160 dashed lines of width 1 and length

3,

going
from the centre of the frame to some point. On my machine, this

takes

about 4.9
seconds in openjdk6, and 26 seconds using the attached file. Back

when

I was using
the Edge class it took about 39 seconds. Everything without hundres

of

thousands of
edges is not much slower
I have not changed any of the algorithms. ScanLineIterator still

goes

through
strips of the same size and computes crossings in every strip

using

the same
method as before, so I don't know why it's so slow. It can't be
because of anything
happening in _endRendering, because there are only about 9000
scanlines and for each
of them I've just added a few calls to one line getters (which used

to

be direct
accesses into arrays).

Thanks,
Denis.

- "Jim Graham"  wrote:


Denis Lila wrote:

Hello Jim.

Thank you very much for taking the time to read through this.


169 - if origLen reaches the end of the dash exactly (the "=="

case)

 You're right, I should. I can't just replace<= with ==

though,

because the results will be the same: in the equal case origLen

will

become 0, and on the next iteration, the (origLen<

dash[idx]-phase)

will

be true, and we will do a goTo(x1,y1), which is what we just did

in

the

previous iteration (unless dash[idx] is 0, in which case the

results

will be even worse). The best solution to this is to just do a

nested

check for the == case.

Ah, right - because there is no "break" when origLen becomes

zero.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-28 Thread Denis Lila
Hello Jim.

This one performs almost identically to what is already there 
in openjdk6 and 7, since it's exactly what I sent for review
last week, but with all the changes you suggested implemented.

I would actually like to ask you to not look at either one of
them. First of all, there is an ArrayIndexOutOfBoundsException
possible in emitRow.
And secondly, even if there wasn't, last night I implemented 
your algorithm from ShapeSpanIterator.c to iterate through 
the edges. I have yet to debug it, but it makes everything much,
much simpler, and it should make it far faster, so we get the best
of both worlds.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
> I'll try to get through both versions and see if I can find anything 
> that was hurting performance with your EdgeLists.  I'm guessing that 
> this version was created because of the performance issues you found 
> with the EdgeList version?  Does this perform more closely to the 
> existing code than the EdgeList version?
> 
>   ...jim
> 
> Denis Lila wrote:
> > Hello again.
> > 
> > This attachmet is a "can of worms" implementation without all the
> fancy (and slow)
> > iteration. It also includes all of the other suggestions you sent in
> your first
> > review of Dasher and Renderer last week (most importantly, the
> firstOrientation
> > issue, horizontal lines filtering, and adding prefixes to variable
> names to make
> > it clear whether they refer to pixels, or subpixels).
> > 
> > Regards,
> > Denis.
> > 
> > - "Denis Lila"  wrote:
> > 
> >> Hello Jim.
> >>
> >> I implemented your "can of worms" idea. It works, and it got rid
> of
> >> the biasing.
> >> I wasn't able to send a webrev, but there are many changes and a
> side
> >> by side
> >> comparison would probably be useless, so I just attached the file.
> I
> >> hope this is
> >> ok.
> >>
> >> I also implemented a "better" iterating structure for the lines
> and
> >> the
> >> strips and crossings. I think it is better in every way, except
> >> performance.
> >> The new file is more than 200 lines smaller than the old one. The
> only
> >> members of Renderer are now the AA variables and the position
> >> variables
> >> (sx*, sy*, x*, y*).
> >> What I've done is I added an EdgeList class, which encapsulates
> all
> >> the edge
> >> related variables in the old Renderer. At first, I had an Edge
> class
> >> in addition
> >> to the EdgeList class, and while this was much nicer, it turned out
> to
> >> be too
> >> expensive (see last paragraph).
> >> I've also added a ScanLineIterator, so instead of _endRendering
> >> iterating
> >> through strips, and then calling renderStrip() which iterates
> through
> >> the
> >> scanlines in that strip, and then through the crossings in that
> >> scanline,
> >> what happens now is that _endRendering uses the Iterator
> to
> >> iterate through each scanline, get get its crossings and iterate
> >> through them
> >> to accumulate the alpha. By the way, a ScanLine is a type defined
> by
> >> an
> >> interface which exports methods for getting the y coord of the
> line,
> >> the
> >> number of crossings in it, the ith crossing, and a method for
> sorting
> >> its crossings.
> >> The class that implements ScanLine is ScanLineIterator itself. I
> made
> >> a
> >> ScanLine class, but I was afraid performance would suffer because
> of
> >> all the
> >> object creations (this turned out not to be an issue, after I
> switched
> >> to the
> >> current way, and remeasured things). I did not switch back because
> >> this is
> >> only slightly worse.
> >>
> >> As for performance: I wrote a simple program that tries to draw a
> >> dashed path
> >> that consists of about 160 dashed lines of width 1 and length
> 3,
> >> going
> >> from the centre of the frame to some point. On my machine, this
> takes
> >> about 4.9
> >> seconds in openjdk6, and 26 seconds using the attached file. Back
> when
> >> I was using
> >> the Edge class it took about 39 seconds. Everything without hundres
> of
> >> thousands of
> >> edges is not much slower
> >> I have not changed any of the algorithms. ScanLineIterator still
> goes
> >> through
> >> strips of the same size and computes crossings in every strip
> using
> >> the same
> >> method as before, so I don't know why it's so slow. It can't be
> >> because of anything
> >> happening in _endRendering, because there are only about 9000
> >> scanlines and for each
> >> of them I've just added a few calls to one line getters (which used
> to
> >> be direct
> >> accesses into arrays).
> >>
> >> Thanks,
> >> Denis.
> >>
> >> - "Jim Graham"  wrote:
> >>
> >>> Denis Lila wrote:
>  Hello Jim.
> 
>  Thank you very much for taking the time to read through this.
> 
> > 169 - if origLen reaches the end of the dash exactly (the "=="
> >>> case)
>  You're right, I should. I can't just replace <= with ==
> >> though,
>  because the results will be the same: in the equal case origLen
>

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-27 Thread Jim Graham

Hi Denis,

I'll try to get through both versions and see if I can find anything 
that was hurting performance with your EdgeLists.  I'm guessing that 
this version was created because of the performance issues you found 
with the EdgeList version?  Does this perform more closely to the 
existing code than the EdgeList version?


...jim

Denis Lila wrote:

Hello again.

This attachmet is a "can of worms" implementation without all the fancy (and 
slow)
iteration. It also includes all of the other suggestions you sent in your first
review of Dasher and Renderer last week (most importantly, the firstOrientation
issue, horizontal lines filtering, and adding prefixes to variable names to make
it clear whether they refer to pixels, or subpixels).

Regards,
Denis.

- "Denis Lila"  wrote:


Hello Jim.

I implemented your "can of worms" idea. It works, and it got rid of
the biasing.
I wasn't able to send a webrev, but there are many changes and a side
by side
comparison would probably be useless, so I just attached the file. I
hope this is
ok.

I also implemented a "better" iterating structure for the lines and
the
strips and crossings. I think it is better in every way, except
performance.
The new file is more than 200 lines smaller than the old one. The only
members of Renderer are now the AA variables and the position
variables
(sx*, sy*, x*, y*).
What I've done is I added an EdgeList class, which encapsulates all
the edge
related variables in the old Renderer. At first, I had an Edge class
in addition
to the EdgeList class, and while this was much nicer, it turned out to
be too
expensive (see last paragraph).
I've also added a ScanLineIterator, so instead of _endRendering
iterating
through strips, and then calling renderStrip() which iterates through
the
scanlines in that strip, and then through the crossings in that
scanline,
what happens now is that _endRendering uses the Iterator to
iterate through each scanline, get get its crossings and iterate
through them
to accumulate the alpha. By the way, a ScanLine is a type defined by
an
interface which exports methods for getting the y coord of the line,
the
number of crossings in it, the ith crossing, and a method for sorting
its crossings.
The class that implements ScanLine is ScanLineIterator itself. I made
a
ScanLine class, but I was afraid performance would suffer because of
all the
object creations (this turned out not to be an issue, after I switched
to the
current way, and remeasured things). I did not switch back because
this is
only slightly worse.

As for performance: I wrote a simple program that tries to draw a
dashed path
that consists of about 160 dashed lines of width 1 and length 3,
going
from the centre of the frame to some point. On my machine, this takes
about 4.9
seconds in openjdk6, and 26 seconds using the attached file. Back when
I was using
the Edge class it took about 39 seconds. Everything without hundres of
thousands of
edges is not much slower
I have not changed any of the algorithms. ScanLineIterator still goes
through
strips of the same size and computes crossings in every strip using
the same
method as before, so I don't know why it's so slow. It can't be
because of anything
happening in _endRendering, because there are only about 9000
scanlines and for each
of them I've just added a few calls to one line getters (which used to
be direct
accesses into arrays).

Thanks,
Denis.

- "Jim Graham"  wrote:


Denis Lila wrote:

Hello Jim.

Thank you very much for taking the time to read through this.


169 - if origLen reaches the end of the dash exactly (the "=="

case)

You're right, I should. I can't just replace <= with ==

though,

because the results will be the same: in the equal case origLen

will

become 0, and on the next iteration, the (origLen <

dash[idx]-phase)

will

be true, and we will do a goTo(x1,y1), which is what we just did

in

the

previous iteration (unless dash[idx] is 0, in which case the

results

will be even worse). The best solution to this is to just do a

nested

check for the == case.

Ah, right - because there is no "break" when origLen becomes zero.
Sounds like you're on it.


171 - Aren't x0,y0 stored as subpix values?  You would then be

comparing

a subpix value to a non-subpix value.  Perhaps if the subpix

calls

are

moved to the top of the function I think this should work OK?

That's true, they are. This is very puzzling. If a horizontal

line is

added, when the crossings for it are being computed, dxBydy should

be NaN, and

wouldn't an error be thrown when we try to cast to an int in the

call to addCrossing?

I'm not sure - I didn't trace it through very far - I just noted

that

the values were likely in different "resolutions".


194,197 - Shouldn't these be constants, or based on the

SUB_POS_XY?

I suppose I should make a biasing constant. I don't think they

should be based

on SUB_POS_XY though, because the biasing is done to subpixel

coor

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-26 Thread Denis Lila
Hello again.

This attachmet is a "can of worms" implementation without all the fancy (and 
slow)
iteration. It also includes all of the other suggestions you sent in your first
review of Dasher and Renderer last week (most importantly, the firstOrientation
issue, horizontal lines filtering, and adding prefixes to variable names to make
it clear whether they refer to pixels, or subpixels).

Regards,
Denis.

- "Denis Lila"  wrote:

> Hello Jim.
> 
> I implemented your "can of worms" idea. It works, and it got rid of
> the biasing.
> I wasn't able to send a webrev, but there are many changes and a side
> by side
> comparison would probably be useless, so I just attached the file. I
> hope this is
> ok.
> 
> I also implemented a "better" iterating structure for the lines and
> the
> strips and crossings. I think it is better in every way, except
> performance.
> The new file is more than 200 lines smaller than the old one. The only
> members of Renderer are now the AA variables and the position
> variables
> (sx*, sy*, x*, y*).
> What I've done is I added an EdgeList class, which encapsulates all
> the edge
> related variables in the old Renderer. At first, I had an Edge class
> in addition
> to the EdgeList class, and while this was much nicer, it turned out to
> be too
> expensive (see last paragraph).
> I've also added a ScanLineIterator, so instead of _endRendering
> iterating
> through strips, and then calling renderStrip() which iterates through
> the
> scanlines in that strip, and then through the crossings in that
> scanline,
> what happens now is that _endRendering uses the Iterator to
> iterate through each scanline, get get its crossings and iterate
> through them
> to accumulate the alpha. By the way, a ScanLine is a type defined by
> an
> interface which exports methods for getting the y coord of the line,
> the
> number of crossings in it, the ith crossing, and a method for sorting
> its crossings.
> The class that implements ScanLine is ScanLineIterator itself. I made
> a
> ScanLine class, but I was afraid performance would suffer because of
> all the
> object creations (this turned out not to be an issue, after I switched
> to the
> current way, and remeasured things). I did not switch back because
> this is
> only slightly worse.
> 
> As for performance: I wrote a simple program that tries to draw a
> dashed path
> that consists of about 160 dashed lines of width 1 and length 3,
> going
> from the centre of the frame to some point. On my machine, this takes
> about 4.9
> seconds in openjdk6, and 26 seconds using the attached file. Back when
> I was using
> the Edge class it took about 39 seconds. Everything without hundres of
> thousands of
> edges is not much slower
> I have not changed any of the algorithms. ScanLineIterator still goes
> through
> strips of the same size and computes crossings in every strip using
> the same
> method as before, so I don't know why it's so slow. It can't be
> because of anything
> happening in _endRendering, because there are only about 9000
> scanlines and for each
> of them I've just added a few calls to one line getters (which used to
> be direct
> accesses into arrays).
> 
> Thanks,
> Denis.
> 
> - "Jim Graham"  wrote:
> 
> > Denis Lila wrote:
> > > Hello Jim.
> > >
> > > Thank you very much for taking the time to read through this.
> > >
> > >> 169 - if origLen reaches the end of the dash exactly (the "=="
> > case)
> > >
> > > You're right, I should. I can't just replace <= with ==
> though,
> > > because the results will be the same: in the equal case origLen
> > will
> > > become 0, and on the next iteration, the (origLen <
> dash[idx]-phase)
> > will
> > > be true, and we will do a goTo(x1,y1), which is what we just did
> in
> > the
> > > previous iteration (unless dash[idx] is 0, in which case the
> results
> >
> > > will be even worse). The best solution to this is to just do a
> > nested
> > > check for the == case.
> >
> > Ah, right - because there is no "break" when origLen becomes zero.
> > Sounds like you're on it.
> >
> > >> 171 - Aren't x0,y0 stored as subpix values?  You would then be
> > comparing
> > >> a subpix value to a non-subpix value.  Perhaps if the subpix
> calls
> > are
> > >> moved to the top of the function I think this should work OK?
> > >
> > > That's true, they are. This is very puzzling. If a horizontal
> > line is
> > > added, when the crossings for it are being computed, dxBydy should
> > be NaN, and
> > > wouldn't an error be thrown when we try to cast to an int in the
> > call to addCrossing?
> >
> > I'm not sure - I didn't trace it through very far - I just noted
> that
> >
> > the values were likely in different "resolutions".
> >
> > >> 194,197 - Shouldn't these be constants, or based on the
> > SUB_POS_XY?
> > >
> > > I suppose I should make a biasing constant. I don't think they
> > should be based
> > > on SUB_POS_XY though, because the biasing is done to subpixel
> > coordinates so
> > >

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-26 Thread Denis Lila
Hello Jim.

I implemented your "can of worms" idea. It works, and it got rid of the biasing.
I wasn't able to send a webrev, but there are many changes and a side by side
comparison would probably be useless, so I just attached the file. I hope this 
is
ok.

I also implemented a "better" iterating structure for the lines and the
strips and crossings. I think it is better in every way, except performance. 
The new file is more than 200 lines smaller than the old one. The only 
members of Renderer are now the AA variables and the position variables 
(sx*, sy*, x*, y*).
What I've done is I added an EdgeList class, which encapsulates all the edge
related variables in the old Renderer. At first, I had an Edge class in addition
to the EdgeList class, and while this was much nicer, it turned out to be too
expensive (see last paragraph). 
I've also added a ScanLineIterator, so instead of _endRendering iterating
through strips, and then calling renderStrip() which iterates through the 
scanlines in that strip, and then through the crossings in that scanline, 
what happens now is that _endRendering uses the Iterator to
iterate through each scanline, get get its crossings and iterate through them
to accumulate the alpha. By the way, a ScanLine is a type defined by an 
interface which exports methods for getting the y coord of the line, the 
number of crossings in it, the ith crossing, and a method for sorting
its crossings. 
The class that implements ScanLine is ScanLineIterator itself. I made a 
ScanLine class, but I was afraid performance would suffer because of all the
object creations (this turned out not to be an issue, after I switched to the
current way, and remeasured things). I did not switch back because this is
only slightly worse.

As for performance: I wrote a simple program that tries to draw a dashed path
that consists of about 160 dashed lines of width 1 and length 3, going 
from the centre of the frame to some point. On my machine, this takes about 4.9
seconds in openjdk6, and 26 seconds using the attached file. Back when I was 
using
the Edge class it took about 39 seconds. Everything without hundres of 
thousands of
edges is not much slower
I have not changed any of the algorithms. ScanLineIterator still goes through 
strips of the same size and computes crossings in every strip using the same
method as before, so I don't know why it's so slow. It can't be because of 
anything
happening in _endRendering, because there are only about 9000 scanlines and for 
each
of them I've just added a few calls to one line getters (which used to be 
direct 
accesses into arrays).

Thanks,
Denis.

- "Jim Graham"  wrote:

> Denis Lila wrote:
> > Hello Jim.
> > 
> > Thank you very much for taking the time to read through this.
> > 
> >> 169 - if origLen reaches the end of the dash exactly (the "=="
> case) 
> > 
> > You're right, I should. I can't just replace <= with == though,
> > because the results will be the same: in the equal case origLen
> will
> > become 0, and on the next iteration, the (origLen < dash[idx]-phase)
> will
> > be true, and we will do a goTo(x1,y1), which is what we just did in
> the
> > previous iteration (unless dash[idx] is 0, in which case the results
> 
> > will be even worse). The best solution to this is to just do a
> nested
> > check for the == case.
> 
> Ah, right - because there is no "break" when origLen becomes zero. 
> Sounds like you're on it.
> 
> >> 171 - Aren't x0,y0 stored as subpix values?  You would then be
> comparing 
> >> a subpix value to a non-subpix value.  Perhaps if the subpix calls
> are 
> >> moved to the top of the function I think this should work OK?
> > 
> > That's true, they are. This is very puzzling. If a horizontal
> line is
> > added, when the crossings for it are being computed, dxBydy should
> be NaN, and
> > wouldn't an error be thrown when we try to cast to an int in the
> call to addCrossing?
> 
> I'm not sure - I didn't trace it through very far - I just noted that
> 
> the values were likely in different "resolutions".
> 
> >> 194,197 - Shouldn't these be constants, or based on the
> SUB_POS_XY?
> > 
> > I suppose I should make a biasing constant. I don't think they
> should be based
> > on SUB_POS_XY though, because the biasing is done to subpixel
> coordinates so
> > there is no danger that if our supersampling is fine enough the
> biasing will 
> > make the coordinates jump over some scan line.
> 
> I'm guessing you punted on my "can of worms" suggestion then.  ;-)
> 
> >> 216 - if you save the sx0,sy0 before subpix'ing them then you don't
> have 
> >> to "unsubpix" them here.  (Though you still need the subpix sy0 for
> line 
> >> 209 - or you could just call subpix on it for that line and at
> least
> >> you'd save 1 call to sub/unsub).
> > 
> > Ok. I'll just save subpixed and unsubpixed versions of sx0, sy0.
> That should
> > eliminate all sx0,sy0 related calls to tosubpix and topix except in
> moveTo.
> 
> and lineT

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-21 Thread Denis Lila
> >> 721 - Arrays.sort()
> > 
> > I thought about using this, but I did some measurements, and it
> turns out that 
> > Arrays.sort() is a bit slower if the portion of the array being
> sorted has fewer
> > than about 70 elements.
> 
> I wonder what the typical number of elements is.  Is this sorting 
> crossings per line?  Then simple primitives like circles should only 
> have 2 per line, right?  Is it worth testing for really small numbers of 
> elements (much lower than 70) and doing a manual sort?  Or am I 
> misunderstanding what is being sorted there?

That's correct. For every scan line, we're sorting the crossings on that
scan line, so filling a circle would have 2 per line (drawing it would have
4 per line (for most lines, anyway)).

I'm not sure if it's worth it. On one hand, the performance benefits would 
be
minimal (because for small numbers of elements, just about anything will be fast
and even if insertionSort is 200% faster than quicksort it won't make much 
difference).
On the other hand, checking for, say crossingIndices[i] - start < 50 
doesn't cost
us much, and I already implemented and tested insertionSort.

I think we should just use Arrays.sort all the time, just to keep the file 
as small as
possible.

> If you look for something like the native code for 
> sun/java2d/pipe/ShapeSpanIterator.c you will see the way I typically 
> like to do edge setup and enumeration.
> 
> That code uses the "half open interval" approach for both X and Y
> intervals.
> 
> I then sort the edge list by "leading Y" and then move through the
> edges 
> using the following manner (kind of like an inch worm eating the
> edges, 
> maintaining a pointer to the beginning and end of an "active list" of
> 
> edges that are in play at any given time by virtue of having their Y 
> range intersect the current sampling Y).  Note that I use an array of
> 
> edges and then a parallel array of pointers into the edges so that I
> can 
> sort just the pointers and avoid moving tons of edge data around. 
> Also, 
> later when I eliminate edges from the active list I just move their 
> pointers into and out of view rather than having to copy the edge
> data. 
>   It is harder to do an array of pointers in Java, though - perhaps an
> 
> array of indices?  Here is some basic pseudocode:
> 
> start with lo and hi pointing at index 0 in edge list.
> until edge list is exhausted {
>  process edges between lo and hi (empty on first pass)
>  scan from hi to lo and toss any edges that are exhausted
>  (compacting remaining pointers towards hi)
>  keep incrementing hi to accept any new edges coming into play
>  process edges between lo and hi to increment to the next Y
>  (note that new edges from previous step may not
>   need any processing in this stage if their starting
>   Y equals the next Y to be sampled)
> }
> 
> Gradually lo and hi make their way through the list.  The edges above
> hi 
> are always sorted in order of when they may come into play as we move
> 
> downward in Y.  The edges between lo and hi are also usually kept
> sorted 
> by their current X so that the stage that processes them into spans
> can 
> just run through them.  The edges below lo are usually random garbage
> 
> because no care was taken during the "pruning" step to worry about
> what 
> happens to the pointer table down there as lo is incremented (the 
> algorithm only ever looks up the array of pointers).
> 
> I hope that helps...

That does help.
Thanks a lot,
Denis.


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-21 Thread Jim Graham

Denis Lila wrote:

Hello Jim.

Thank you very much for taking the time to read through this.

169 - if origLen reaches the end of the dash exactly (the "==" case) 


You're right, I should. I can't just replace <= with == though,
because the results will be the same: in the equal case origLen will
become 0, and on the next iteration, the (origLen < dash[idx]-phase) will
be true, and we will do a goTo(x1,y1), which is what we just did in the
previous iteration (unless dash[idx] is 0, in which case the results 
will be even worse). The best solution to this is to just do a nested

check for the == case.


Ah, right - because there is no "break" when origLen becomes zero. 
Sounds like you're on it.


171 - Aren't x0,y0 stored as subpix values?  You would then be comparing 
a subpix value to a non-subpix value.  Perhaps if the subpix calls are 
moved to the top of the function I think this should work OK?


That's true, they are. This is very puzzling. If a horizontal line is
added, when the crossings for it are being computed, dxBydy should be NaN, and
wouldn't an error be thrown when we try to cast to an int in the call to 
addCrossing?


I'm not sure - I didn't trace it through very far - I just noted that 
the values were likely in different "resolutions".



194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?


I suppose I should make a biasing constant. I don't think they should be 
based
on SUB_POS_XY though, because the biasing is done to subpixel coordinates so
there is no danger that if our supersampling is fine enough the biasing will 
make the coordinates jump over some scan line.


I'm guessing you punted on my "can of worms" suggestion then.  ;-)

216 - if you save the sx0,sy0 before subpix'ing them then you don't have 
to "unsubpix" them here.  (Though you still need the subpix sy0 for line 
209 - or you could just call subpix on it for that line and at least

you'd save 1 call to sub/unsub).


Ok. I'll just save subpixed and unsubpixed versions of sx0, sy0. That should
eliminate all sx0,sy0 related calls to tosubpix and topix except in moveTo.


and lineTo.  You may only need 3 of those values, though, if I remember 
my code reading well enough.


256,264 - casting to int is problematic.  It truncates towards 0 which 
means negatives are ceil'd and positives are floor'd.  It would be best 
to use floor here instead.  On the other hand, if negative numbers are 
always "off the left side of the drawable" then this is moot.


That's why I left it at int casting. Do you still think I should change it
to floor?


If you mean floor, I think it best to use floor.  Unless you can prove 
that negatives aren't really an issue and that the strange truncation on 
them won't be numerically a problem - but I don't think it is worth it 
for this code.


Speaking of which, is there a good way to edit and build openJDK from eclipse? 
Then this sort of embarrassing error could be avoided (including the printStats() call).


I don't use Eclipse, sorry.  :-(

As for Arrays.newSize()... I can't find it here: 
http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html

Is this a new function added in 7?


Sorry, make that Arrays.copyOf(..., newSize).  I tried to type the name 
from memory and got it wrong.



721 - Arrays.sort()


I thought about using this, but I did some measurements, and it turns out that 
Arrays.sort() is a bit slower if the portion of the array being sorted has fewer

than about 70 elements.


I wonder what the typical number of elements is.  Is this sorting 
crossings per line?  Then simple primitives like circles should only 
have 2 per line, right?  Is it worth testing for really small numbers of 
elements (much lower than 70) and doing a manual sort?  Or am I 
misunderstanding what is being sorted there?


How comfortable do you feel with that conversion? 


I'll try to do it and include it in a new patch along with, hopefully, a better 
way
to iterate through strips, and especially crossings. Right now all the iteration
state is saved in global variables. This is... not good. I spent far too much 
time last
week on bugs caused by this sort of thing. Ideally, any members that can't be put at 
the top of a class (like our edge and crossing data) should be put in classes of their own.


That sounds good, but also consider doing it in separate stages to 
reduce churn in the code reviewing (and you then have revs to go back 
and test which stage caused a probem if we find a bug later):


- first get all on floats
- then change strip management
- then change to open coordinate intervals
- (or vice versa)


Do you have any ideas about how to iterate through edges in a strip without 
going through
every edge? I was thinking of maybe using some sort of tree to split the 
drawing surface,
but I haven't given it much thought.


If you look for something like the native code for 
sun/java2d/pipe/ShapeSpanIterator.c you will see the

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-21 Thread Denis Lila
Hello Jim.

Thank you very much for taking the time to read through this.

> 91 - is there a reason to copy the dash array?  Theoretically if this 
> object were used in an arbitrary fashion it is good sense to protect 
> against data changing out from under it, but in practice we only ever 
> feed it the array that came from a BasicStroke object which was already 
> protected from arbitrary changes and is copied every time we fetch it so 
> this is one more unfortunately extra fetch that isn't needed in 
> practice.  (Ideally we'd instrument an extractor for BasicStroke so we
> could get the raw array without the defensive copy in
> getDashArray()...)

You're right, I suppose there isn't. It's just the way it was
being done before, and it didn't occur to me to change it. I will,
in the next version of this patch.


> 169 - if origLen reaches the end of the dash exactly (the "==" case) 
> then shouldn't you flip the dashOn variable?  That would happen if you
> drop through to the following code (i.e. change the test to just "<"),
> but it would require a bit more calculation.  If you leave the phase 
> equal to the dash size (which is what happens now in the "==" case) then 
> the next time through the if will fail and you will end up with a second 
> goTo to the same coordinates that the previous dash ended on before 
> flipping the dashOn variable and circling back - which is a waste of a
> bunch of calculations.  Or was there some bookkeeping issue that arose 
> when the "==" case dropped through?

You're right, I should. I can't just replace <= with == though,
because the results will be the same: in the equal case origLen will
become 0, and on the next iteration, the (origLen < dash[idx]-phase) will
be true, and we will do a goTo(x1,y1), which is what we just did in the
previous iteration (unless dash[idx] is 0, in which case the results 
will be even worse). The best solution to this is to just do a nested
check for the == case.


> 248 - is the normalize parameter used yet?  Not a complaint - I just 
> didn't see it getting used and wanted to make sure I didn't miss
> something.

No, no yet. That's next week's work :)


> 257-261 - both Stroker and Dasher grab the t4 values into local 
> variables.  Hotspot might cover up this sin if the T4 stays in the young 
> generation, but for a long path the T4 may move into an older generation 
> and take more to collect.  Maybe it would be worth just passing in the 4 
> values to the Stroker and Dasher constructors rather than encapsulating 
> them in a T4 that is immediately stripped and ignored?

Definitely. I never liked using the Transform4 class, and I almost
removed it in this patch (but didn't because I didn't want to change that much).
This is all the reason I need to remove it. Nothing ever uses Transform4 anyway,
except to just unpack the matrix entries, so all we get from it is slightly 
reduced parameter lists.


> 367 - you left a stats printing call in here.
Oops... sorry about that.

> I note some confusion below over comparing subpixel values to pixel 
> values.  Perhaps a convention where all subpixel values would be in 
> variables with "sub" in their name or something like that would make it 
> more maintainable (and correct in the short term).

That's a good idea. I'll do that.


> 47,48 - any reason to move these fields around?  (Not a complaint - just 
> wondering if you had a reason.)

No, no reason. It happened accidentally - I made many changes to this
file before settling on this, and at one point these variables weren't even 
there
(replaced by Math.floor and Math.ceil calls that I thought did equivalent things
to the various bit operations these are used for. After running into some bugs,
I reintroduced them to keep the changes as small as possible and eliminate these
as bug sources).


> 129 - note this could overflow, but I imagine that the bounds came from 
> a device clip which means there would have to be some drawable created 
> that was larger than MAX_INT/SUB_POS_XY.  If that is true then it 
> probably isn't worth worrying about, but a comment might help point that out.

That's true. In fact, there are a few other places where ints could overflow
if the clip passed in to Renderer is large enough. I'll try to comment on them 
all.


> 157 - Personally I'd call close before I started working on the new 
> data, but that's just my coding layout style.  Also, does close() always 
> do the "right thing" if no path has been created yet?  For one thing it 
> looks like it always results in a call to "lineTo" no matter what which 
> I'm guessing "happens to" fall into the short-circuit for horizontal 
> lines in lineTo, but I haven't verified it.  In fact I think I see a 
> problem in that firstOrientation is not set to 0 and so if you have 2 
> moveto's in a row then the second one will call close() and find that 
> the orientations don't match and increment flips.  That isn't a critical 
> problem si

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-20 Thread Jim Graham

Hi Denis,

This is awesome!  Thanks for doing this.

I agree with all of your comments below.  Here are some thoughts on the 
new code:


Dasher.java:

91 - is there a reason to copy the dash array?  Theoretically if this 
object were used in an arbitrary fashion it is good sense to protect 
against data changing out from under it, but in practice we only ever 
feed it the array that came from a BasicStroke object which was already 
protected from arbitrary changes and is copied every time we fetch it so 
this is one more unfortunately extra fetch that isn't needed in 
practice.  (Ideally we'd instrument an extractor for BasicStroke so we 
could get the raw array without the defensive copy in getDashArray()...)


169 - if origLen reaches the end of the dash exactly (the "==" case) 
then shouldn't you flip the dashOn variable?  That would happen if you 
drop through to the following code (i.e. change the test to just "<"), 
but it would require a bit more calculation.  If you leave the phase 
equal to the dash size (which is what happens now in the "==" case) then 
the next time through the if will fail and you will end up with a second 
goTo to the same coordinates that the previous dash ended on before 
flipping the dashOn variable and circling back - which is a waste of a 
bunch of calculations.  Or was there some bookkeeping issue that arose 
when the "==" case dropped through?


PiscesRenderingEngine.java

248 - is the normalize parameter used yet?  Not a complaint - I just 
didn't see it getting used and wanted to make sure I didn't miss something.


257-261 - both Stroker and Dasher grab the t4 values into local 
variables.  Hotspot might cover up this sin if the T4 stays in the young 
generation, but for a long path the T4 may move into an older generation 
and take more to collect.  Maybe it would be worth just passing in the 4 
values to the Stroker and Dasher constructors rather than encapsulating 
them in a T4 that is immediately stripped and ignored?


367 - you left a stats printing call in here.

Renderer.java:

I note some confusion below over comparing subpixel values to pixel 
values.  Perhaps a convention where all subpixel values would be in 
variables with "sub" in their name or something like that would make it 
more maintainable (and correct in the short term).


47,48 - any reason to move these fields around?  (Not a complaint - just 
wondering if you had a reason.)


129 - note this could overflow, but I imagine that the bounds came from 
a device clip which means there would have to be some drawable created 
that was larger than MAX_INT/SUB_POS_XY.  If that is true then it 
probably isn't worth worrying about, but a comment might help point that 
out.


157 - Personally I'd call close before I started working on the new 
data, but that's just my coding layout style.  Also, does close() always 
do the "right thing" if no path has been created yet?  For one thing it 
looks like it always results in a call to "lineTo" no matter what which 
I'm guessing "happens to" fall into the short-circuit for horizontal 
lines in lineTo, but I haven't verified it.  In fact I think I see a 
problem in that firstOrientation is not set to 0 and so if you have 2 
moveto's in a row then the second one will call close() and find that 
the orientations don't match and increment flips.  That isn't a critical 
problem since flips is only used to allocate memory, but it would 
allocate more than needed in that case. (not to mention emitting 
unnecessary linetos)  Suggestion - set firstOrientation to 0 in moveto 
and test for firstOrientation==0 in close() so that close() doesn't add 
any geometry to something that has no deflections yet.


171 - Aren't x0,y0 stored as subpix values?  You would then be comparing 
 a subpix value to a non-subpix value.  Perhaps if the subpix calls are 
moved to the top of the function I think this should work OK?


194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?

 Warning: potential can of worms 
BTW, I never traced this through to see why it was needed, but I've seen 
a lot of renderers get afraid of vertices on pixel boundaries for 
similar reasons, but most of them are wrong because they miss the fact 
that the last coordinate on a line should be ignored by the scanline 
counts because of the rule of "any pixel on a horizontal edge should be 
outside the path if the area right below it is outside".  Basically, we 
fill from x0 up to but not including x1 and also from y0 up to but not 
including y1.  Thus, only y0 would trigger a scanline increment and y1 
would never do so.  If the following segment immediately heads up then 
both of those segments would "end" on the same scanline and be ignored. 
 If the following segment continues down then the first does not 
trigger a scanline crossing, but the second does.  If we can make sure 
the bookkeeping is done that way in the rest of the Renderer then we can 
get rid of this fudge factor entirely. 

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-19 Thread Denis Lila
Hello again.

I know I promised this last week, and I'm sorry it's late, but some nasty 
bugs popped up. The webrev is at: 
http://icedtea.classpath.org/~dlila/webrevs/floatingPoint/webrev/

I took this opportunity to make some changes that aren't related to floating
point conversion, since this effort wasn't really directed at solving any one
particular bug, but towards general refactoring and improving of Pisces 
(although
it does solve a certain bug).
1. 
I removed some write-only variables in Renderer and Stroker.

2. 
I removed Dasher and Stroker's ability for the same object to be used with
more than one output, transform, width, etc. Jim Graham said we should consider
removing the no argument constructor for Dasher in a previous e-mail (which
is equivalent to doing what I've done since you can't have this functionality
without a no argument constructor and methods like setParameters and setOutput).
I thought this was a good idea because this functionality was never being used,
it clutters things up (among other things, it necessitates making the clients 
call setOutput and setParameters before the object can be used. This sort of
thing is not a good idea since methods should be as self-contained as possible),
and even if it is used, all it does is save us the creation of some objects. 
Since
a huge number of these objects would never be needed and since they are not 
very expensive to create (Stroker is the biggest, and it has about 38 members), 
this is a premature optimization.

3.
(2) allowed me to declare a whole bunch of members final (things like 
output,
lineWidth, transformation matrix entries and anything that can't change).

4.
I changed the crossing sorting algorithm in Renderer from bubble sort to 
insertion sort. Not a huge difference, but mine is shorter, it should perform
slightly better, and I think it the algorithm is easier to understand.

5.
I inserted comments on things which used to confuse me. Please check these -
when reading code, there is nothing worse than a wrong explanation in a comment.

6.
I removed the if(false &&... block in Renderer. I tried to make it work some
time ago, but it was complicated and more trouble than it was worth - it would
only be used when filling a rectangle, and not even a rectangle that was part of
some path. The rectangle had to be the only thing being rendered. This is fast 
enough
without being treated as a special case.

I think that's it...
As for testing: I tested this fairly thoroughly. I used dashed strokes, various 
affine transformations, lines longer than 2^16, and complicated GeneralPaths.
Everything looks good.

I also did some performance measurements. Everything is faster than it used to 
be,
although AA rendering is still much slower than closed source java. One of the
problems here is that when rendering large objects, the strips in 
Renderer:_endRendering
will be small, and _endRenderer's complexity, not taking into account methods 
it calls,
is O(numStrips*numEdges), because for every strip it has to go through what's 
left of the edge 
list. A better way to iterate through edges that are in a strip is needed.

NOTE: I did not make changes (2) and (3) to Renderer.java because I don't have 
time
today, and I wanted to put out something fairly polished as soon as possible. 
Also,
I think Renderer needs quite a bit more refactoring than just that. In 
particular the 
way it stores and iterates through edges, scalines, and crossings needs to be 
improved.
Right now it is too error-prone, and many variables have a far larger scope than
they should.

I would very much appreciate it if anyone could look over my changes, or comment
on the last two paragraphs above.

Thank you, 
Denis. 

- "Denis Lila"  wrote:

> Hello.
> 
> And, I just finished it. At least it compiled successfully. I'm sure
> there
> are a few runtime bugs. I'll try to have a webrev out by today.
> 
> Regards,
> Denis.
> 
> - "Jim Graham"  wrote:
> 
> > Hi Denis,
> >
> > float operations are pretty fast and they are usually done in a
> > separate
> > part of the processor so the compiler can schedule a lot of
> > bookkeeping
> > instructions to run in parallel with waiting for the results of the
> FP
> >
> > instruction.  In the end, they often end up being "free" if there
> are
> >
> > enough bookkeeping instructions (branches, fetching data, etc.) in
> > amongst the data.
> >
> > Given how many alternate instructions you are pointing out for the
> > fixed
> > point code it would be very likely that this would be a "win".
> >
> > The main reason that Pisces is implemented heavily in fixed point is
> > that it was originally written for the mobile platform where there
> are
> >
> > no FP instructions.  We don't have to worry about that on the
> desktop
> >
> > (any more).
> >
> > I strongly support you converting things to fp if you want to do
> > it...
> >
> > ...jim
> >
> > On 7/12/2010 8:05 AM, Denis Lila wrote:
>

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-13 Thread Denis Lila
Hello.

And, I just finished it. At least it compiled successfully. I'm sure there 
are a few runtime bugs. I'll try to have a webrev out by today.

Regards,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
> float operations are pretty fast and they are usually done in a
> separate 
> part of the processor so the compiler can schedule a lot of
> bookkeeping 
> instructions to run in parallel with waiting for the results of the FP
> 
> instruction.  In the end, they often end up being "free" if there are
> 
> enough bookkeeping instructions (branches, fetching data, etc.) in 
> amongst the data.
> 
> Given how many alternate instructions you are pointing out for the
> fixed 
> point code it would be very likely that this would be a "win".
> 
> The main reason that Pisces is implemented heavily in fixed point is 
> that it was originally written for the mobile platform where there are
> 
> no FP instructions.  We don't have to worry about that on the desktop
> 
> (any more).
> 
> I strongly support you converting things to fp if you want to do
> it...
> 
>   ...jim
> 
> On 7/12/2010 8:05 AM, Denis Lila wrote:
> > Hello.
> >
> > Is it ok if I do this? I already started working on it on Friday.
> > I think I should be done by tomorrow.
> >
> > But yes, I agree that we should convert to floating point. As for
> > performance, there's also the fact that right now we're trading
> > one double multiplication for 2 casts to long, 1 long
> multiplication,
> > 1 bit shift of a long, and 1 cast back to an int. I don't know much
> > about how these are done in hardware, but it doesn't seem like
> they'd
> > be faster than the double multiplication.
> >
> > As for large coordinates, there's a bug report about it (one not
> > reported by me :) ) here:
> https://bugzilla.redhat.com/show_bug.cgi?id=597227
> > I submitted a matching bug report on bugs.sun.com (ID 6967436), but
> I
> > can't find it when I search for it.
> >
> > Denis.
> >
> > - "Jim Graham"  wrote:
> >
> >> Sigh...
> >>
> >> Pisces could really stand to upgrade to floats/doubles everywhere,
> for
> >>
> >> several reasons:
> >>
> >> - FPU ops are likely as fast if not faster on modern hardware due
> to
> >> parallel execution of FP instructions alongside regular
> instructions.
> >>
> >> - Don't have to worry about getting coordinates larger than 32K (I
> >> don't
> >> think this is well tested, but I imagine that Pisces will not deal
> >> with
> >> it very gracefully).
> >>
> >> - Lots of code is greatly simplified not having to deal with the
> >> semantics of how to do fixed point multiplies, divides, and
> >> conversions.
> >>
> >> I meant to do this during the original integration, but I wanted
> to
> >> get
> >> the task done as quickly as possible so that we could have an open
> >> source alternative to the closed Ductus code so I left that task
> for a
> >>
> >> later update.  But, now that a lot of work is being done on the
> code
> >> to
> >> fix it up, it might be better to do the float conversion now so
> that
> >> the
> >> maintenance is easier and before we end up creating a lot of new
> fixed
> >>
> >> point code.
> >>
> >> My plate is full right now, but hopefully I can interest someone
> else
> >> in
> >> doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)
> >>
> >>...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-13 Thread Jim Graham

Hi Denis,

float operations are pretty fast and they are usually done in a separate 
part of the processor so the compiler can schedule a lot of bookkeeping 
instructions to run in parallel with waiting for the results of the FP 
instruction.  In the end, they often end up being "free" if there are 
enough bookkeeping instructions (branches, fetching data, etc.) in 
amongst the data.


Given how many alternate instructions you are pointing out for the fixed 
point code it would be very likely that this would be a "win".


The main reason that Pisces is implemented heavily in fixed point is 
that it was originally written for the mobile platform where there are 
no FP instructions.  We don't have to worry about that on the desktop 
(any more).


I strongly support you converting things to fp if you want to do it...

...jim

On 7/12/2010 8:05 AM, Denis Lila wrote:

Hello.

Is it ok if I do this? I already started working on it on Friday.
I think I should be done by tomorrow.

But yes, I agree that we should convert to floating point. As for
performance, there's also the fact that right now we're trading
one double multiplication for 2 casts to long, 1 long multiplication,
1 bit shift of a long, and 1 cast back to an int. I don't know much
about how these are done in hardware, but it doesn't seem like they'd
be faster than the double multiplication.

As for large coordinates, there's a bug report about it (one not
reported by me :) ) here: https://bugzilla.redhat.com/show_bug.cgi?id=597227
I submitted a matching bug report on bugs.sun.com (ID 6967436), but I
can't find it when I search for it.

Denis.

- "Jim Graham"  wrote:


Sigh...

Pisces could really stand to upgrade to floats/doubles everywhere, for

several reasons:

- FPU ops are likely as fast if not faster on modern hardware due to
parallel execution of FP instructions alongside regular instructions.

- Don't have to worry about getting coordinates larger than 32K (I
don't
think this is well tested, but I imagine that Pisces will not deal
with
it very gracefully).

- Lots of code is greatly simplified not having to deal with the
semantics of how to do fixed point multiplies, divides, and
conversions.

I meant to do this during the original integration, but I wanted to
get
the task done as quickly as possible so that we could have an open
source alternative to the closed Ductus code so I left that task for a

later update.  But, now that a lot of work is being done on the code
to
fix it up, it might be better to do the float conversion now so that
the
maintenance is easier and before we end up creating a lot of new fixed

point code.

My plate is full right now, but hopefully I can interest someone else
in
doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)

...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-12 Thread Denis Lila
Hello.

Is it ok if I do this? I already started working on it on Friday.
I think I should be done by tomorrow.

But yes, I agree that we should convert to floating point. As for
performance, there's also the fact that right now we're trading 
one double multiplication for 2 casts to long, 1 long multiplication,
1 bit shift of a long, and 1 cast back to an int. I don't know much
about how these are done in hardware, but it doesn't seem like they'd
be faster than the double multiplication.

As for large coordinates, there's a bug report about it (one not
reported by me :) ) here: https://bugzilla.redhat.com/show_bug.cgi?id=597227
I submitted a matching bug report on bugs.sun.com (ID 6967436), but I
can't find it when I search for it.

Denis.

- "Jim Graham"  wrote:

> Sigh...
> 
> Pisces could really stand to upgrade to floats/doubles everywhere, for
> 
> several reasons:
> 
> - FPU ops are likely as fast if not faster on modern hardware due to 
> parallel execution of FP instructions alongside regular instructions.
> 
> - Don't have to worry about getting coordinates larger than 32K (I
> don't 
> think this is well tested, but I imagine that Pisces will not deal
> with 
> it very gracefully).
> 
> - Lots of code is greatly simplified not having to deal with the 
> semantics of how to do fixed point multiplies, divides, and
> conversions.
> 
> I meant to do this during the original integration, but I wanted to
> get 
> the task done as quickly as possible so that we could have an open 
> source alternative to the closed Ductus code so I left that task for a
> 
> later update.  But, now that a lot of work is being done on the code
> to 
> fix it up, it might be better to do the float conversion now so that
> the 
> maintenance is easier and before we end up creating a lot of new fixed
> 
> point code.
> 
> My plate is full right now, but hopefully I can interest someone else
> in 
> doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)
> 
>   ...jim


Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

2010-07-09 Thread Jim Graham

Sigh...

Pisces could really stand to upgrade to floats/doubles everywhere, for 
several reasons:


- FPU ops are likely as fast if not faster on modern hardware due to 
parallel execution of FP instructions alongside regular instructions.


- Don't have to worry about getting coordinates larger than 32K (I don't 
think this is well tested, but I imagine that Pisces will not deal with 
it very gracefully).


- Lots of code is greatly simplified not having to deal with the 
semantics of how to do fixed point multiplies, divides, and conversions.


I meant to do this during the original integration, but I wanted to get 
the task done as quickly as possible so that we could have an open 
source alternative to the closed Ductus code so I left that task for a 
later update.  But, now that a lot of work is being done on the code to 
fix it up, it might be better to do the float conversion now so that the 
maintenance is easier and before we end up creating a lot of new fixed 
point code.


My plate is full right now, but hopefully I can interest someone else in 
doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)


...jim