Hi Denis,

Only some minor comments so far:

Stroker.java:

- Should det be precomputed and saved? (You calculate it in the constructor anyway, but don't save it.)
- Should test for uniform scale be precomputed?
- Is the test for uniform scale too strict? Can a rotated uniform scale use the same code as an upright uniform scale? (A rough sketch of the det/uniform-scale precomputation follows this list.)
- Why are m00_2_m01_2 et al. no longer precomputed? They only need to be precomputed if the scale is not uniform, which isn't common.
- lineLength really computes "user space length", doesn't it? A more descriptive name might help avoid confusion if someone is modifying the code here later.
- line 614 - missing space
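
For the det/uniform-scale items, something along these lines in the
constructor is what I'm picturing - just a sketch using the usual 2x2
matrix entry names (m00, m01, m10, m11); the actual Stroker fields and
checks may differ:

// Sketch only: compute the determinant and the uniform-scale test once,
// up front.  Matrix entry names are the conventional ones, not
// necessarily Stroker's.
final class TransformInfoSketch {
    final float det;
    final boolean uniformScale;

    TransformInfoSketch(float m00, float m01, float m10, float m11) {
        det = m00 * m11 - m01 * m10;
        // upright uniform scale: no shear/rotation and equal axis scales.
        // (A rotated uniform scale would instead satisfy
        //  m00 == m11 && m01 == -m10 - which is the "too strict?" question.)
        uniformScale = (m01 == 0f && m10 == 0f && m00 == m11);
    }
}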

Dasher.java:

- Line 187 - use "leftInThisDashSegment" here?

I still have to look at Renderer.java in depth, but I thought I'd send these minor comments along while they are fresh in my email buffer...

                        ...jim

Denis Lila wrote:
Hello.

And, here it is: http://icedtea.classpath.org/~dlila/webrevs/fpBetterAA/webrev/

Just as I thought, it's way faster than the original. I made a new
test, which is supposed to be more realistic than drawing lines of
length 30000. It consists of splitting a 1000x1000 frame into 100
10x10 squares and, in each square, drawing 20 or so curves with random
start/end/control points inside that square. On this test it's at
least twice as fast (~0.85 versus ~1.95 seconds).

Unfortunately I've had to do a lot of the same optimizations that I was
trying to remove (i.e., making everything a global variable). I tried
writing a higher-level version of it, but that completely negated the
performance gains of the new algorithm. Anyway, it's still not as bad
as before, because the algorithm is inherently clearer and we only
iterate through scan lines, instead of iterating through strips, then
the scanlines in each strip, and then the edges, all in the same method.

It's also better organized, and logically separate parts of the code don't
really touch each other's variables much. And it's definitely better
commented.
I also made some minor changes outside of Renderer that did not appear in
the first version of this patch last week:
1. I fixed the issue Jim pointed out in Dasher, where if (origLen == leftInThisDashSegment) the dash index was not being incremented.
2. In Dasher, I no longer copy the dash array.
3. I removed files PiscesMath.java and Transform4.java because they are no
longer needed.

Thanks,
Denis.

----- "Jim Graham" <james.gra...@oracle.com> wrote:

Woohoo, Denis!  I look forward to seeing the new version!

                ...jim

On 7/28/2010 5:51 AM, Denis Lila wrote:
Hello Jim.

This one performs almost identically to what is already there
in openjdk6 and 7, since it's exactly what I sent for review
last week, but with all the changes you suggested implemented.

I would actually like to ask you not to look at either one of them.
First of all, there is a possible ArrayIndexOutOfBoundsException in
emitRow. And secondly, even if there weren't, last night I implemented
your algorithm from ShapeSpanIterator.c to iterate through the edges.
I have yet to debug it, but it makes everything much, much simpler,
and it should make it far faster, so we get the best of both worlds.

Thanks,
Denis.

----- "Jim Graham"<james.gra...@oracle.com>  wrote:

Hi Denis,

I'll try to get through both versions and see if I can find anything
that was hurting performance with your EdgeLists.  I'm guessing that
this version was created because of the performance issues you found
with the EdgeList version?  Does this perform more closely to the
existing code than the EdgeList version?

                        ...jim

Denis Lila wrote:
Hello again.

This attachment is a "can of worms" implementation without all the
fancy (and slow) iteration. It also includes all of the other
suggestions you sent in your first review of Dasher and Renderer last
week (most importantly, the firstOrientation issue, horizontal line
filtering, and adding prefixes to variable names to make it clear
whether they refer to pixels or subpixels).

Regards,
Denis.

----- "Denis Lila"<dl...@redhat.com>  wrote:

Hello Jim.

I implemented your "can of worms" idea. It works, and it got rid of
the biasing. I wasn't able to send a webrev, but there are many
changes and a side-by-side comparison would probably be useless, so I
just attached the file. I hope this is ok.

I also implemented a "better" iterating structure for the lines and
the strips and crossings. I think it is better in every way except
performance. The new file is more than 200 lines smaller than the old
one. The only members of Renderer are now the AA variables and the
position variables (sx*, sy*, x*, y*).
What I've done is add an EdgeList class, which encapsulates all the
edge-related variables in the old Renderer. At first I had an Edge
class in addition to the EdgeList class, and while this was much
nicer, it turned out to be too expensive (see the last paragraph).
I've also added a ScanLineIterator, so instead of _endRendering
iterating through strips, and then calling renderStrip() which
iterates through the scanlines in that strip and then through the
crossings in each scanline, what happens now is that _endRendering
uses the Iterator<ScanLine> to iterate through each scanline, get its
crossings, and iterate through them to accumulate the alpha. By the
way, a ScanLine is a type defined by an interface which exports
methods for getting the y coordinate of the line, the number of
crossings in it, and the i-th crossing, plus a method for sorting its
crossings.
The class that implements ScanLine is ScanLineIterator itself. I had
made a separate ScanLine class, but I was afraid performance would
suffer because of all the object creation (this turned out not to be
an issue after I switched to the current way and remeasured things).
I did not switch back because this is only slightly worse.
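
Roughly, the interface looks like this (a from-memory sketch; the
exact method names in the patch may differ slightly):

// Sketch of the ScanLine abstraction described above; names are
// illustrative, not necessarily the ones in the patch.
interface ScanLine {
    int getY();                // y coordinate (subpixel row) of this scan line
    int getNumCrossings();     // how many edge crossings fall on this line
    int getCrossing(int i);    // the i-th crossing (an x position, plus orientation if encoded)
    void sortCrossings();      // sort the crossings by x before accumulating alpha
}

_endRendering then just walks the Iterator<ScanLine>, calls
sortCrossings() on each line, and accumulates alpha between successive
crossings.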

As for performance: I wrote a simple program that tries to draw a
dashed path consisting of about 160 dashed lines of width 1 and length
30000, going from the centre of the frame to some point. On my
machine, this takes about 4.9 seconds in openjdk6 and 26 seconds using
the attached file. Back when I was using the Edge class it took about
39 seconds. Everything without hundreds of thousands of edges is not
much slower.
I have not changed any of the algorithms. ScanLineIterator still goes
through strips of the same size and computes crossings in every strip
using the same method as before, so I don't know why it's so slow. It
can't be because of anything happening in _endRendering, because there
are only about 9000 scanlines, and for each of them I've just added a
few calls to one-line getters (which used to be direct accesses into
arrays).

Thanks,
Denis.

----- "Jim Graham"<james.gra...@oracle.com>  wrote:

Denis Lila wrote:
Hello Jim.

Thank you very much for taking the time to read through this.

169 - if origLen reaches the end of the dash exactly (the "==" case)
     You're right, I should. I can't just replace <= with == though,
because the results will be the same: in the equal case origLen will
become 0, and on the next iteration the (origLen < dash[idx]-phase)
test will be true and we will do a goTo(x1,y1), which is what we just
did in the previous iteration (unless dash[idx] is 0, in which case
the results will be even worse). The best solution to this is to just
do a nested check for the == case.
Ah, right - because there is no "break" when origLen becomes zero.
Sounds like you're on it.
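
For the record, the nested check I'm picturing is something like the
following sketch.  The names (dash, idx, phase, origLen, goTo) follow
the discussion; dashOn and the surrounding method are invented for
illustration, and the real Dasher loop is more involved:

// Sketch of the "==" fix discussed above.
final class DashSketch {
    float[] dash;    // dash pattern lengths
    int idx;         // current dash segment
    float phase;     // distance already consumed within dash[idx]
    boolean dashOn;  // hypothetical on/off flag

    void lineToPiece(float origLen, float x1, float y1) {
        float leftInThisDashSegment = dash[idx] - phase;
        if (origLen <= leftInThisDashSegment) {
            goTo(x1, y1);                  // the remaining piece fits in this segment
            if (origLen == leftInThisDashSegment) {
                // we landed exactly on the segment boundary: advance to
                // the next dash segment now, so the next call doesn't
                // emit a zero-length piece in an already-finished segment
                phase = 0f;
                idx = (idx + 1) % dash.length;
                dashOn = !dashOn;
            } else {
                phase += origLen;          // partway into the current segment
            }
            return;
        }
        // ... otherwise consume the rest of dash[idx] and loop, as in the real code
    }

    private void goTo(float x, float y) { /* lineTo or moveTo depending on dashOn */ }
}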

171 - Aren't x0,y0 stored as subpix values?  You would then be
comparing a subpix value to a non-subpix value.  Perhaps if the subpix
calls are moved to the top of the function, this should work OK?
     That's true, they are. This is very puzzling. If a horizontal
line is added, then when the crossings for it are being computed,
dxBydy should be NaN, and wouldn't an error be thrown when we try to
cast it to an int in the call to addCrossing?

I'm not sure - I didn't trace it through very far - I just noted that
the values were likely in different "resolutions".

194,197 - Shouldn't these be constants, or based on SUB_POS_XY?
     I suppose I should make a biasing constant. I don't think they
should be based on SUB_POS_XY though, because the biasing is done to
subpixel coordinates, so there is no danger that, if our supersampling
is fine enough, the biasing will make the coordinates jump over some
scan line.

I'm guessing you punted on my "can of worms" suggestion then.  ;-)

216 - if you save the sx0,sy0 before subpix'ing them then you don't
have to "unsubpix" them here.  (Though you still need the subpix sy0
for line 209 - or you could just call subpix on it for that line and
at least you'd save 1 call to sub/unsub.)
     Ok. I'll just save subpixed and unsubpixed versions of sx0, sy0.
That should eliminate all sx0,sy0-related calls to tosubpix and topix
except in moveTo.

and lineTo.  You may only need 3 of those values, though, if I
remember my code reading well enough.
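
Concretely, I was picturing something like this in moveTo - just a
sketch; the SUB_POS_XY value, the helpers and the surrounding class
are for illustration only:

// Sketch of keeping both the user-space and the subpixel copies of the
// current point, as discussed above.
final class PixSketch {
    static final int SUB_POS_XY = 8;  // subpixels per pixel (illustrative value)
    float x0, y0;                     // un-subpixed current point
    float sx0, sy0;                   // the same point in subpixel coordinates

    float tosubpix(float v) { return v * SUB_POS_XY; }
    float topix(float v)    { return v / SUB_POS_XY; }

    void moveTo(float x, float y) {
        x0 = x;                       // keep the un-subpixed values...
        y0 = y;
        sx0 = tosubpix(x);            // ...and the subpixed ones, so later
        sy0 = tosubpix(y);            // code never needs topix(sx0) again
    }
}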

256,264 - casting to int is problematic.  It truncates towards 0,
which means negatives are ceil'd and positives are floor'd.  It would
be best to use floor here instead.  On the other hand, if negative
numbers are always "off the left side of the drawable" then this is
moot.
     That's why I left it at int casting. Do you still think I should
change it to floor?

If you mean floor, I think it best to use floor.  Unless you can prove
that negatives aren't really an issue and that the strange truncation
on them won't be numerically a problem - but I don't think it is worth
it for this code.
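
To make the truncation difference concrete, here is a standalone
illustration (not code from the patch):

// Why a plain (int) cast differs from floor for negative values.
public class CastVsFloor {
    public static void main(String[] args) {
        float[] xs = { 2.7f, -2.7f, -0.3f };
        for (float x : xs) {
            System.out.println(x + ": (int) cast = " + (int) x
                               + ", floor = " + (int) Math.floor(x));
        }
        // prints:  2.7: (int) cast = 2, floor = 2
        //         -2.7: (int) cast = -2, floor = -3
        //         -0.3: (int) cast = 0, floor = -1
    }
}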

Speaking of which, is there a good way to edit and build OpenJDK from
Eclipse? Then this sort of embarrassing error could be avoided
(including the printStats() call).

I don't use Eclipse, sorry.  :-(

     As for Arrays.newSize()... I can't find it here:

http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/api/java/util/Arrays.html
Is this a new function added in 7?

Sorry, make that Arrays.copyOf(..., newSize).  I tried to type the
name from memory and got it wrong.
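
For the record, the usual growth idiom with it looks like this
(Arrays.copyOf has been in java.util.Arrays since 6; the array name
below is just an example, not necessarily Renderer's):

import java.util.Arrays;

public class GrowSketch {
    public static void main(String[] args) {
        int[] crossings = new int[16];
        int needed = 40;                       // pretend we need more room
        if (needed > crossings.length) {
            // grow the array, preserving its contents
            crossings = Arrays.copyOf(crossings,
                                      Math.max(needed, crossings.length * 2));
        }
        System.out.println(crossings.length);  // 40
    }
}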

721 - Arrays.sort()
     I thought about using this, but I did some measurements, and it
turns out that Arrays.sort() is a bit slower if the portion of the
array being sorted has fewer than about 70 elements.

I wonder what the typical number of elements is.  Is this sorting
crossings per line?  Then simple primitives like circles should only
have 2 per line, right?  Is it worth testing for really small numbers
of elements (much lower than 70) and doing a manual sort?  Or am I
misunderstanding what is being sorted there?
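
A guarded manual sort could be as simple as the following sketch (the
70-element cutoff is just the figure from your measurements; insertion
sort is hard to beat for small, nearly sorted arrays):

import java.util.Arrays;

// Size-guarded sort for the crossings of one scan line (sketch).
final class CrossingSortSketch {
    static void sort(int[] a, int from, int to) {   // sorts a[from, to)
        if (to - from > 70) {
            Arrays.sort(a, from, to);
            return;
        }
        for (int i = from + 1; i < to; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= from && a[j] > v) {
                a[j + 1] = a[j];                    // shift larger values right
                j--;
            }
            a[j + 1] = v;
        }
    }
}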

How comfortable do you feel with that conversion?
I'll try to do it and include it in a new patch along with, hopefully,
a better way to iterate through strips, and especially crossings.
Right now all the iteration state is saved in global variables. This
is... not good. I spent far too much time last week on bugs caused by
this sort of thing. Ideally, any members that can't be put at the top
of a class (like our edge and crossing data) should be put in classes
of their own.

That sounds good, but also consider doing it in separate stages to
reduce churn in the code reviewing (and you then have revs to go back
and test which stage caused a problem if we find a bug later):

- first get it all on floats
- then change strip management
- then change to open coordinate intervals
- (or vice versa)

Do you have any ideas about how to iterate through the edges in a
strip without going through every edge? I was thinking of maybe using
some sort of tree to split the drawing surface, but I haven't given it
much thought.

If you look at something like the native code for
sun/java2d/pipe/ShapeSpanIterator.c, you will see the way I typically
like to do edge setup and enumeration.

That code uses the "half open interval" approach for both X and Y
intervals.

I then sort the edge list by "leading Y" and then move through the
edges in the following manner (kind of like an inch worm eating the
edges, maintaining a pointer to the beginning and the end of an
"active list" of edges that are in play at any given time by virtue of
having their Y range intersect the current sampling Y).  Note that I
use an array of edges and then a parallel array of pointers into the
edges, so that I can sort just the pointers and avoid moving tons of
edge data around.  Also, later, when I eliminate edges from the active
list, I just move their pointers into and out of view rather than
having to copy the edge data.  It is harder to do an array of pointers
in Java, though - perhaps an array of indices?  Here is some basic
pseudocode:

start with lo and hi pointing at index 0 in edge list.
until edge list is exhausted {
    process edges between lo and hi (empty on first pass)
    scan from hi to lo and toss any edges that are exhausted
        (compacting remaining pointers towards hi)
    keep incrementing hi to accept any new edges coming into play
    process edges between lo and hi to increment to the next Y
        (note that new edges from previous step may not need any
         processing in this stage if their starting Y equals the
         next Y to be sampled)
}

Gradually lo and hi make their way through the list.  The edges above
hi are always sorted in order of when they may come into play as we
move downward in Y.  The edges between lo and hi are also usually kept
sorted by their current X so that the stage that processes them into
spans can just run through them.  The edges below lo are usually
random garbage because no care was taken during the "pruning" step to
worry about what happens to the pointer table down there as lo is
incremented (the algorithm only ever looks up the array of pointers).
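
For Java, a rough transliteration of that loop using an array of
indices might look like the sketch below.  Everything here (the edge
layout, the names, the emitCrossing stub) is invented for illustration
and is not what ShapeSpanIterator.c or the patch actually does; in
particular, a real implementation would also keep the active window
sorted by current X before emitting spans:

import java.util.Arrays;

// "Inch worm" active-edge sketch: the edge data never moves, only the
// index table 'ptrs' is sorted and compacted.
final class ActiveEdgeSketch {
    // one row per edge: { y0, y1, curX, slope }, with [y0, y1) half-open
    private final float[][] edges;
    private final Integer[] ptrs;   // indices into 'edges', sorted by leading Y

    ActiveEdgeSketch(float[][] edges) {
        this.edges = edges;
        this.ptrs = new Integer[edges.length];
        for (int i = 0; i < ptrs.length; i++) ptrs[i] = i;
        // sort only the index table by leading Y; the edge data stays put
        Arrays.sort(ptrs, (a, b) -> Float.compare(edges[a][0], edges[b][0]));
    }

    void scan(int ymin, int ymax) {
        int lo = 0, hi = 0;
        for (int y = ymin; y < ymax; y++) {
            // toss exhausted edges, compacting surviving pointers towards
            // hi; slots below lo become garbage and are never read again
            int w = hi;
            for (int i = hi - 1; i >= lo; i--) {
                int e = ptrs[i];
                if (edges[e][1] > y) {
                    ptrs[--w] = e;
                }
            }
            lo = w;
            // accept any new edges whose leading Y has come into play
            while (hi < ptrs.length && edges[ptrs[hi]][0] <= y) {
                float[] e = edges[ptrs[hi]];
                e[2] += e[3] * (y - e[0]);   // advance X from y0 down to this row
                hi++;
            }
            // process the active window [lo, hi): emit one crossing per
            // edge, then step its X to the next Y
            for (int i = lo; i < hi; i++) {
                float[] e = edges[ptrs[i]];
                emitCrossing(y, e[2]);
                e[2] += e[3];
            }
            if (lo == ptrs.length) break;    // edge list exhausted
        }
    }

    private void emitCrossing(int y, float x) {
        System.out.println("y=" + y + "  crossing at x=" + x);
    }
}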

I hope that helps...

                        ...jim
