Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Jim Graham Tue, 20 Jul 2010 20:48:04 -0700

Hi Denis,

This is awesome!  Thanks for doing this.

I agree with all of your comments below. Here are some thoughts on thenew code:


Dasher.java:

91 - is there a reason to copy the dash array? Theoretically if thisobject were used in an arbitrary fashion it is good sense to protectagainst data changing out from under it, but in practice we only everfeed it the array that came from a BasicStroke object which was alreadyprotected from arbitrary changes and is copied every time we fetch it sothis is one more unfortunately extra fetch that isn't needed inpractice. (Ideally we'd instrument an extractor for BasicStroke so wecould get the raw array without the defensive copy in getDashArray()...)

169 - if origLen reaches the end of the dash exactly (the "==" case)then shouldn't you flip the dashOn variable? That would happen if youdrop through to the following code (i.e. change the test to just "<"),but it would require a bit more calculation. If you leave the phaseequal to the dash size (which is what happens now in the "==" case) thenthe next time through the if will fail and you will end up with a secondgoTo to the same coordinates that the previous dash ended on beforeflipping the dashOn variable and circling back - which is a waste of abunch of calculations. Or was there some bookkeeping issue that arosewhen the "==" case dropped through?


PiscesRenderingEngine.java

248 - is the normalize parameter used yet? Not a complaint - I justdidn't see it getting used and wanted to make sure I didn't miss something.

257-261 - both Stroker and Dasher grab the t4 values into localvariables. Hotspot might cover up this sin if the T4 stays in the younggeneration, but for a long path the T4 may move into an older generationand take more to collect. Maybe it would be worth just passing in the 4values to the Stroker and Dasher constructors rather than encapsulatingthem in a T4 that is immediately stripped and ignored?


367 - you left a stats printing call in here.

Renderer.java:

I note some confusion below over comparing subpixel values to pixelvalues. Perhaps a convention where all subpixel values would be invariables with "sub" in their name or something like that would make itmore maintainable (and correct in the short term).

47,48 - any reason to move these fields around? (Not a complaint - justwondering if you had a reason.)

129 - note this could overflow, but I imagine that the bounds came froma device clip which means there would have to be some drawable createdthat was larger than MAX_INT/SUB_POS_XY. If that is true then itprobably isn't worth worrying about, but a comment might help point thatout.

157 - Personally I'd call close before I started working on the newdata, but that's just my coding layout style. Also, does close() alwaysdo the "right thing" if no path has been created yet? For one thing itlooks like it always results in a call to "lineTo" no matter what whichI'm guessing "happens to" fall into the short-circuit for horizontallines in lineTo, but I haven't verified it. In fact I think I see aproblem in that firstOrientation is not set to 0 and so if you have 2moveto's in a row then the second one will call close() and find thatthe orientations don't match and increment flips. That isn't a criticalproblem since flips is only used to allocate memory, but it wouldallocate more than needed in that case. (not to mention emittingunnecessary linetos) Suggestion - set firstOrientation to 0 in movetoand test for firstOrientation==0 in close() so that close() doesn't addany geometry to something that has no deflections yet.

171 - Aren't x0,y0 stored as subpix values? You would then be comparinga subpix value to a non-subpix value. Perhaps if the subpix calls aremoved to the top of the function I think this should work OK?


194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?

---- Warning: potential can of worms ----

BTW, I never traced this through to see why it was needed, but I've seena lot of renderers get afraid of vertices on pixel boundaries forsimilar reasons, but most of them are wrong because they miss the factthat the last coordinate on a line should be ignored by the scanlinecounts because of the rule of "any pixel on a horizontal edge should beoutside the path if the area right below it is outside". Basically, wefill from x0 up to but not including x1 and also from y0 up to but notincluding y1. Thus, only y0 would trigger a scanline increment and y1would never do so. If the following segment immediately heads up thenboth of those segments would "end" on the same scanline and be ignored.If the following segment continues down then the first does nottrigger a scanline crossing, but the second does. If we can make surethe bookkeeping is done that way in the rest of the Renderer then we canget rid of this fudge factor entirely. I discuss some of these changesbelow, but in the end, 5 or 6 methods would need their math updated tomake this work right and I'm not sure if I've described it well enoughfor you to make all the needed changes. Perhaps this could be afollow-on project, but it would greatly simplify a lot of the code in here.

---- End can of worms ----

216 - if you save the sx0,sy0 before subpix'ing them then you don't haveto "unsubpix" them here. (Though you still need the subpix sy0 for line209 - or you could just call subpix on it for that line and at leastyou'd save 1 call to sub/unsub).

236,237 - here is what I was talking about above (can of worms). minYand maxY should probably both be ceil'd and then maxY would be the firstscanline "not" processed and then the problem above about doublecrossings at the scanlines wouldn't be an issue.


243 - "minY >= maxY" if the above is done (can of worms).

262 - "y < maxY" if the above is done (can of worms).

256,264 - casting to int is problematic. It truncates towards 0 whichmeans negatives are ceil'd and positives are floor'd. It would be bestto use floor here instead. On the other hand, if negative numbers arealways "off the left side of the drawable" then this is moot.

317,318 - blech (can of worms) - looks like more math designed for the"minY up to and including maxY" approach - I far prefer the "minY up to,but not including maxY" approach as mentioned above. ;-)_endRendering(), setCrossingsExtents, computeCrossingsForEdge (alreadymentioned above), computeBounds, and renderStrip() would all need to beupdated for that. How comfortable do you feel with that conversion?

597 - you deleted the call to arraycopy? D'oh! (Actually,Arrays.newSize() is faster than "allocate and copy" so it couldn't hurtto switch to it here. Also, expanding by 10% seems a littleconservative for large edge lists. I'd rather see "double, but nevermore than 10*INIT_EDGES" or something like that. This may be more"limited memory on mobile" thinking here.

611 - (can of worms) "ceil >= ceil" if the above scanline crossingchange is done. OK, enough for now on harping on the can of worms.


721 - Arrays.sort()

Stroker.java

- I didn't get to this file yet. I'm going to send these comments alongso you can take a look at them and then review Stroker tomorrow...


                        ...jim

Denis Lila wrote:

Hello again.

I know I promised this last week, and I'm sorry it's late, but some nastybugs popped up. The webrev is at:http://icedtea.classpath.org/~dlila/webrevs/floatingPoint/webrev/


I took this opportunity to make some changes that aren't related to floating
point conversion, since this effort wasn't really directed at solving any one
particular bug, but towards general refactoring and improving of Pisces 
(although
it does solve a certain bug).

1.I removed some write-only variables in Renderer and Stroker.

2.I removed Dasher and Stroker's ability for the same object to be used with

more than one output, transform, width, etc. Jim Graham said we should consider
removing the no argument constructor for Dasher in a previous e-mail (which
is equivalent to doing what I've done since you can't have this functionality
without a no argument constructor and methods like setParameters and setOutput).
I thought this was a good idea because this functionality was never being used,

it clutters things up (among other things, it necessitates making the clientscall setOutput and setParameters before the object can be used. This sort of

thing is not a good idea since methods should be as self-contained as possible),
and even if it is used, all it does is save us the creation of some objects. 
Since

a huge number of these objects would never be needed and since they are notvery expensive to create (Stroker is the biggest, and it has about 38 members),this is a premature optimization.


3.
    (2) allowed me to declare a whole bunch of members final (things like 
output,
lineWidth, transformation matrix entries and anything that can't change).

4.

I changed the crossing sorting algorithm in Renderer from bubble sort toinsertion sort. Not a huge difference, but mine is shorter, it should perform

slightly better, and I think it the algorithm is easier to understand.

5.
    I inserted comments on things which used to confuse me. Please check these -
when reading code, there is nothing worse than a wrong explanation in a comment.

6.
    I removed the if(false &&... block in Renderer. I tried to make it work some
time ago, but it was complicated and more trouble than it was worth - it would
only be used when filling a rectangle, and not even a rectangle that was part of
some path. The rectangle had to be the only thing being rendered. This is fast 
enough
without being treated as a special case.

I think that's it...

As for testing: I tested this fairly thoroughly. I used dashed strokes, variousaffine transformations, lines longer than 2^16, and complicated GeneralPaths.

Everything looks good.

I also did some performance measurements. Everything is faster than it used to 
be,
although AA rendering is still much slower than closed source java. One of the
problems here is that when rendering large objects, the strips in 
Renderer:_endRendering
will be small, and _endRenderer's complexity, not taking into account methods 
it calls,

is O(numStrips*numEdges), because for every strip it has to go through what's left of the edgelist. A better way to iterate through edges that are in a strip is needed.


NOTE: I did not make changes (2) and (3) to Renderer.java because I don't have 
time
today, and I wanted to put out something fairly polished as soon as possible. 
Also,

I think Renderer needs quite a bit more refactoring than just that. In particular theway it stores and iterates through edges, scalines, and crossings needs to be improved.

Right now it is too error-prone, and many variables have a far larger scope than
they should.

I would very much appreciate it if anyone could look over my changes, or comment
on the last two paragraphs above.

Thank you,Denis.

----- "Denis Lila" <[email protected]> wrote:

Hello.

And, I just finished it. At least it compiled successfully. I'm sure
there
are a few runtime bugs. I'll try to have a webrev out by today.

Regards,
Denis.

----- "Jim Graham" <[email protected]> wrote:

Hi Denis,

float operations are pretty fast and they are usually done in a
separate
part of the processor so the compiler can schedule a lot of
bookkeeping
instructions to run in parallel with waiting for the results of the

FP

instruction.  In the end, they often end up being "free" if there

are

enough bookkeeping instructions (branches, fetching data, etc.) in
amongst the data.

Given how many alternate instructions you are pointing out for the
fixed
point code it would be very likely that this would be a "win".

The main reason that Pisces is implemented heavily in fixed point is
that it was originally written for the mobile platform where there

are

no FP instructions.  We don't have to worry about that on the

desktop

(any more).

I strongly support you converting things to fp if you want to do
it...

                        ...jim

On 7/12/2010 8:05 AM, Denis Lila wrote:

Hello.

Is it ok if I do this? I already started working on it on Friday.
I think I should be done by tomorrow.

But yes, I agree that we should convert to floating point. As for
performance, there's also the fact that right now we're trading
one double multiplication for 2 casts to long, 1 long

multiplication,

1 bit shift of a long, and 1 cast back to an int. I don't know

much

about how these are done in hardware, but it doesn't seem like

they'd

be faster than the double multiplication.

As for large coordinates, there's a bug report about it (one not
reported by me :) ) here:

https://bugzilla.redhat.com/show_bug.cgi?id=597227

I submitted a matching bug report on bugs.sun.com (ID 6967436),

but

can't find it when I search for it.

Denis.

----- "Jim Graham"<[email protected]>  wrote:

Sigh...

Pisces could really stand to upgrade to floats/doubles

everywhere,

for

several reasons:

- FPU ops are likely as fast if not faster on modern hardware due

to

parallel execution of FP instructions alongside regular

instructions.

- Don't have to worry about getting coordinates larger than 32K

(I

don't
think this is well tested, but I imagine that Pisces will not

deal

with
it very gracefully).

- Lots of code is greatly simplified not having to deal with the
semantics of how to do fixed point multiplies, divides, and
conversions.

I meant to do this during the original integration, but I wanted

to

get
the task done as quickly as possible so that we could have an

open

source alternative to the closed Ductus code so I left that task

for a

later update.  But, now that a lot of work is being done on the

code

to
fix it up, it might be better to do the float conversion now so

that

the
maintenance is easier and before we end up creating a lot of new

fixed

point code.

My plate is full right now, but hopefully I can interest someone

else

in
doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)

                        ...jim

Re: [OpenJDK 2D-Dev] Various fixes to pisces stroke widening code

Reply via email to