Hi Denis,

This is awesome!  Thanks for doing this.

I agree with all of your comments below. Here are some thoughts on the new code:

Dasher.java:

91 - is there a reason to copy the dash array? Theoretically if this object were used in an arbitrary fashion it is good sense to protect against data changing out from under it, but in practice we only ever feed it the array that came from a BasicStroke object which was already protected from arbitrary changes and is copied every time we fetch it so this is one more unfortunately extra fetch that isn't needed in practice. (Ideally we'd instrument an extractor for BasicStroke so we could get the raw array without the defensive copy in getDashArray()...)

169 - if origLen reaches the end of the dash exactly (the "==" case) then shouldn't you flip the dashOn variable? That would happen if you drop through to the following code (i.e. change the test to just "<"), but it would require a bit more calculation. If you leave the phase equal to the dash size (which is what happens now in the "==" case) then the next time through the if will fail and you will end up with a second goTo to the same coordinates that the previous dash ended on before flipping the dashOn variable and circling back - which is a waste of a bunch of calculations. Or was there some bookkeeping issue that arose when the "==" case dropped through?

PiscesRenderingEngine.java

248 - is the normalize parameter used yet? Not a complaint - I just didn't see it getting used and wanted to make sure I didn't miss something.

257-261 - both Stroker and Dasher grab the t4 values into local variables. Hotspot might cover up this sin if the T4 stays in the young generation, but for a long path the T4 may move into an older generation and take more to collect. Maybe it would be worth just passing in the 4 values to the Stroker and Dasher constructors rather than encapsulating them in a T4 that is immediately stripped and ignored?

367 - you left a stats printing call in here.

Renderer.java:

I note some confusion below over comparing subpixel values to pixel values. Perhaps a convention where all subpixel values would be in variables with "sub" in their name or something like that would make it more maintainable (and correct in the short term).

47,48 - any reason to move these fields around? (Not a complaint - just wondering if you had a reason.)

129 - note this could overflow, but I imagine that the bounds came from a device clip which means there would have to be some drawable created that was larger than MAX_INT/SUB_POS_XY. If that is true then it probably isn't worth worrying about, but a comment might help point that out.

157 - Personally I'd call close before I started working on the new data, but that's just my coding layout style. Also, does close() always do the "right thing" if no path has been created yet? For one thing it looks like it always results in a call to "lineTo" no matter what which I'm guessing "happens to" fall into the short-circuit for horizontal lines in lineTo, but I haven't verified it. In fact I think I see a problem in that firstOrientation is not set to 0 and so if you have 2 moveto's in a row then the second one will call close() and find that the orientations don't match and increment flips. That isn't a critical problem since flips is only used to allocate memory, but it would allocate more than needed in that case. (not to mention emitting unnecessary linetos) Suggestion - set firstOrientation to 0 in moveto and test for firstOrientation==0 in close() so that close() doesn't add any geometry to something that has no deflections yet.

171 - Aren't x0,y0 stored as subpix values? You would then be comparing a subpix value to a non-subpix value. Perhaps if the subpix calls are moved to the top of the function I think this should work OK?

194,197 - Shouldn't these be constants, or based on the SUB_POS_XY?

---- Warning: potential can of worms ----
BTW, I never traced this through to see why it was needed, but I've seen a lot of renderers get afraid of vertices on pixel boundaries for similar reasons, but most of them are wrong because they miss the fact that the last coordinate on a line should be ignored by the scanline counts because of the rule of "any pixel on a horizontal edge should be outside the path if the area right below it is outside". Basically, we fill from x0 up to but not including x1 and also from y0 up to but not including y1. Thus, only y0 would trigger a scanline increment and y1 would never do so. If the following segment immediately heads up then both of those segments would "end" on the same scanline and be ignored. If the following segment continues down then the first does not trigger a scanline crossing, but the second does. If we can make sure the bookkeeping is done that way in the rest of the Renderer then we can get rid of this fudge factor entirely. I discuss some of these changes below, but in the end, 5 or 6 methods would need their math updated to make this work right and I'm not sure if I've described it well enough for you to make all the needed changes. Perhaps this could be a follow-on project, but it would greatly simplify a lot of the code in here.
---- End can of worms ----

216 - if you save the sx0,sy0 before subpix'ing them then you don't have to "unsubpix" them here. (Though you still need the subpix sy0 for line 209 - or you could just call subpix on it for that line and at least you'd save 1 call to sub/unsub).

236,237 - here is what I was talking about above (can of worms). minY and maxY should probably both be ceil'd and then maxY would be the first scanline "not" processed and then the problem above about double crossings at the scanlines wouldn't be an issue.

243 - "minY >= maxY" if the above is done (can of worms).

262 - "y < maxY" if the above is done (can of worms).

256,264 - casting to int is problematic. It truncates towards 0 which means negatives are ceil'd and positives are floor'd. It would be best to use floor here instead. On the other hand, if negative numbers are always "off the left side of the drawable" then this is moot.

317,318 - blech (can of worms) - looks like more math designed for the "minY up to and including maxY" approach - I far prefer the "minY up to, but not including maxY" approach as mentioned above. ;-) _endRendering(), setCrossingsExtents, computeCrossingsForEdge (already mentioned above), computeBounds, and renderStrip() would all need to be updated for that. How comfortable do you feel with that conversion?

597 - you deleted the call to arraycopy? D'oh! (Actually, Arrays.newSize() is faster than "allocate and copy" so it couldn't hurt to switch to it here. Also, expanding by 10% seems a little conservative for large edge lists. I'd rather see "double, but never more than 10*INIT_EDGES" or something like that. This may be more "limited memory on mobile" thinking here.

611 - (can of worms) "ceil >= ceil" if the above scanline crossing change is done. OK, enough for now on harping on the can of worms.

721 - Arrays.sort()

Stroker.java

- I didn't get to this file yet. I'm going to send these comments along so you can take a look at them and then review Stroker tomorrow...

                        ...jim

Denis Lila wrote:
Hello again.

I know I promised this last week, and I'm sorry it's late, but some nasty bugs popped up. The webrev is at: http://icedtea.classpath.org/~dlila/webrevs/floatingPoint/webrev/

I took this opportunity to make some changes that aren't related to floating
point conversion, since this effort wasn't really directed at solving any one
particular bug, but towards general refactoring and improving of Pisces 
(although
it does solve a certain bug).
1. I removed some write-only variables in Renderer and Stroker.

2. I removed Dasher and Stroker's ability for the same object to be used with
more than one output, transform, width, etc. Jim Graham said we should consider
removing the no argument constructor for Dasher in a previous e-mail (which
is equivalent to doing what I've done since you can't have this functionality
without a no argument constructor and methods like setParameters and setOutput).
I thought this was a good idea because this functionality was never being used,
it clutters things up (among other things, it necessitates making the clients call setOutput and setParameters before the object can be used. This sort of
thing is not a good idea since methods should be as self-contained as possible),
and even if it is used, all it does is save us the creation of some objects. 
Since
a huge number of these objects would never be needed and since they are not very expensive to create (Stroker is the biggest, and it has about 38 members), this is a premature optimization.

3.
    (2) allowed me to declare a whole bunch of members final (things like 
output,
lineWidth, transformation matrix entries and anything that can't change).

4.
I changed the crossing sorting algorithm in Renderer from bubble sort to insertion sort. Not a huge difference, but mine is shorter, it should perform
slightly better, and I think it the algorithm is easier to understand.

5.
    I inserted comments on things which used to confuse me. Please check these -
when reading code, there is nothing worse than a wrong explanation in a comment.

6.
    I removed the if(false &&... block in Renderer. I tried to make it work some
time ago, but it was complicated and more trouble than it was worth - it would
only be used when filling a rectangle, and not even a rectangle that was part of
some path. The rectangle had to be the only thing being rendered. This is fast 
enough
without being treated as a special case.

I think that's it...
As for testing: I tested this fairly thoroughly. I used dashed strokes, various affine transformations, lines longer than 2^16, and complicated GeneralPaths.
Everything looks good.

I also did some performance measurements. Everything is faster than it used to 
be,
although AA rendering is still much slower than closed source java. One of the
problems here is that when rendering large objects, the strips in 
Renderer:_endRendering
will be small, and _endRenderer's complexity, not taking into account methods 
it calls,
is O(numStrips*numEdges), because for every strip it has to go through what's left of the edge list. A better way to iterate through edges that are in a strip is needed.

NOTE: I did not make changes (2) and (3) to Renderer.java because I don't have 
time
today, and I wanted to put out something fairly polished as soon as possible. 
Also,
I think Renderer needs quite a bit more refactoring than just that. In particular the way it stores and iterates through edges, scalines, and crossings needs to be improved.
Right now it is too error-prone, and many variables have a far larger scope than
they should.

I would very much appreciate it if anyone could look over my changes, or comment
on the last two paragraphs above.

Thank you, Denis.
----- "Denis Lila" <dl...@redhat.com> wrote:

Hello.

And, I just finished it. At least it compiled successfully. I'm sure
there
are a few runtime bugs. I'll try to have a webrev out by today.

Regards,
Denis.

----- "Jim Graham" <james.gra...@oracle.com> wrote:

Hi Denis,

float operations are pretty fast and they are usually done in a
separate
part of the processor so the compiler can schedule a lot of
bookkeeping
instructions to run in parallel with waiting for the results of the
FP
instruction.  In the end, they often end up being "free" if there
are
enough bookkeeping instructions (branches, fetching data, etc.) in
amongst the data.

Given how many alternate instructions you are pointing out for the
fixed
point code it would be very likely that this would be a "win".

The main reason that Pisces is implemented heavily in fixed point is
that it was originally written for the mobile platform where there
are
no FP instructions.  We don't have to worry about that on the
desktop
(any more).

I strongly support you converting things to fp if you want to do
it...

                        ...jim

On 7/12/2010 8:05 AM, Denis Lila wrote:
Hello.

Is it ok if I do this? I already started working on it on Friday.
I think I should be done by tomorrow.

But yes, I agree that we should convert to floating point. As for
performance, there's also the fact that right now we're trading
one double multiplication for 2 casts to long, 1 long
multiplication,
1 bit shift of a long, and 1 cast back to an int. I don't know
much
about how these are done in hardware, but it doesn't seem like
they'd
be faster than the double multiplication.

As for large coordinates, there's a bug report about it (one not
reported by me :) ) here:
https://bugzilla.redhat.com/show_bug.cgi?id=597227
I submitted a matching bug report on bugs.sun.com (ID 6967436),
but
I
can't find it when I search for it.

Denis.

----- "Jim Graham"<james.gra...@oracle.com>  wrote:

Sigh...

Pisces could really stand to upgrade to floats/doubles
everywhere,
for
several reasons:

- FPU ops are likely as fast if not faster on modern hardware due
to
parallel execution of FP instructions alongside regular
instructions.
- Don't have to worry about getting coordinates larger than 32K
(I
don't
think this is well tested, but I imagine that Pisces will not
deal
with
it very gracefully).

- Lots of code is greatly simplified not having to deal with the
semantics of how to do fixed point multiplies, divides, and
conversions.

I meant to do this during the original integration, but I wanted
to
get
the task done as quickly as possible so that we could have an
open
source alternative to the closed Ductus code so I left that task
for a
later update.  But, now that a lot of work is being done on the
code
to
fix it up, it might be better to do the float conversion now so
that
the
maintenance is easier and before we end up creating a lot of new
fixed
point code.

My plate is full right now, but hopefully I can interest someone
else
in
doing a cleanup cycle?  (Donning my Tom Sawyer hat... ;-)

                        ...jim

Reply via email to