Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham

Hi Denis,

The problem is that pruning complicates the inner loop that advances the 
scanline and you can't tell without scanning every segment in play 
whether you need to do it in its own loop.  Thus, for one of my test 
cases it was really expensive without some up front record of whether or 
not the pruning was needed.


Also, I keep a count so I can amortize the "widenArray" call.  Before I 
was calling "widen(..., 1)" inside the loop that drew new edges in from 
the bucket.  I could also have added an additional loop to count the new 
edges, but that would have been expensive as well...


...jim

On 11/8/2010 3:23 PM, Denis Lila wrote:

Hi Jim.

I like the new Renderer, but I have a question about edgeBucketCount.

As far as I can see it is only used so that we can tell whether
any given bucket contains only edges that go all the way to the last
(or beyond) scanline. Is this really common enough that we gain by
treating it as a special case? Would it not be better to assume that
every bucket needs pruning and save some memory (and possibly resulting
in better cache performance)?

Thanks,
Denis.

- "Jim Graham"  wrote:


It's still a work in progress, but I've cleaned up a lot of logic and

made it faster in a number of ways.  Note that I've abstracted out the

cache stuff and created an "AlphaConsumer" interface which may evolve

over time.

In FX we actually consume alphas in larger chunks than the code in JDK

which was driven by Ductus's 32x32 mandate, so I would have had to
make
completely incompatible changes to emitRow - so I moved it behind an
interface.  For the JDK code, if you want to integrate this version, I

would have the cache implement the new interface and move your version

of emitRow into the Cache object.  I'd send you the new code for my
AlphaConsumer, but it is incompatible with what you need to cache so
it
won't help you.

You'll also need a bit of un-translation cleanup as we have mirrors of

all of java.awt.geom with special new APIs that FX needs.

...jim

On 11/8/2010 6:40 AM, Denis Lila wrote:

Hi Jim.


Also, I've gotten another 20% improvement out of the design with a

few

more tweaks.  (Though I measured the 20% in the stripped down

version

that I'm prototyping with FX so I'm not sure how much of that 20%
would show up through the layers of the 2D code.  Overall, I've

about

doubled the frame rates of the prototype since your first drop that

you

checked in to the OpenJDK repository.)


Can I see your new version?


Attached.


How about looking more at the stroking end of the process and I'll

dig

a little more into optimal rasterization code.  I have a lot of
experience with optimizing rasterizer code (and JNI if it comes to

that), but

very little with the curve manipulations involved in stroking (math

is so

*hard* at my age ;-)...


Sounds good. Have you implemented your idea of processing one pixel

row at a

time, as opposed to processing subpixel rows? If not, I could do

that.

Not yet.  Right now I've gotten a lot of mileage out of a few tweaks
of
the bookkeeping of the sample-row-at-a-time version.  I'm still
mulling
over exactly how to make that go faster.

...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham

Hi Denis,

On 11/8/2010 2:39 PM, Denis Lila wrote:

Finally, I discovered (while testing for other problems) that the
curves are not truly monotonic after slicing them.  I realized this years ago
when I was writing my Area code (see sun.awt.geom.Curve) and put in
tweaking code to make them monotonic after they were split.  They are
never off by more than a few bits, but you can't trust the curve
splitting math to generate purely monotonic segments based on a t
generated by some unrelated math.  Sometimes the truly horizontal or
vertical t value requires more precision than a float (or even a
double) can provide and splitting at the highest representable float less than
the t value produces a pair of curves on one side of the hill and
splitting at the next float value (which is greater than the true t
value) produces curves on the other side of the hill.  Also, when the
curve has been split a few times already, the t values loose accuracy
with each split.  This will all be moot if I can eliminate the
splitting code from the renderer, but it may also play a factor in the
stroke/dash
code...


Making curves monotonic is only used for optimization purposes,
so it can't see how it would affect rendering correctness.


All lines generated from a given "allegedly monotonic" curve are 
recorded with the same "or" (orientation) value.  But, if the curves are 
not truly monotonic then it might be theoretically possible to generate 
a line that is backwards with respect to the expected orientation.  It 
would then get recorded in the edges array with the wrong orientation 
and slope and then rasterization might unravel.


Fortunately, the non-monotonicity is limited to a few bits of precision 
so this may never generate an errant edge in practice unless flattening 
gets really fine-grained...


...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Denis Lila
Hi Jim.

I like the new Renderer, but I have a question about edgeBucketCount.

As far as I can see it is only used so that we can tell whether
any given bucket contains only edges that go all the way to the last
(or beyond) scanline. Is this really common enough that we gain by
treating it as a special case? Would it not be better to assume that
every bucket needs pruning and save some memory (and possibly resulting
in better cache performance)?

Thanks,
Denis.

- "Jim Graham"  wrote:

> It's still a work in progress, but I've cleaned up a lot of logic and
> 
> made it faster in a number of ways.  Note that I've abstracted out the
> 
> cache stuff and created an "AlphaConsumer" interface which may evolve
> 
> over time.
> 
> In FX we actually consume alphas in larger chunks than the code in JDK
> 
> which was driven by Ductus's 32x32 mandate, so I would have had to
> make 
> completely incompatible changes to emitRow - so I moved it behind an 
> interface.  For the JDK code, if you want to integrate this version, I
> 
> would have the cache implement the new interface and move your version
> 
> of emitRow into the Cache object.  I'd send you the new code for my 
> AlphaConsumer, but it is incompatible with what you need to cache so
> it 
> won't help you.
> 
> You'll also need a bit of un-translation cleanup as we have mirrors of
> 
> all of java.awt.geom with special new APIs that FX needs.
> 
>   ...jim
> 
> On 11/8/2010 6:40 AM, Denis Lila wrote:
> > Hi Jim.
> >
> >> Also, I've gotten another 20% improvement out of the design with a
> few
> >> more tweaks.  (Though I measured the 20% in the stripped down
> version
> >> that I'm prototyping with FX so I'm not sure how much of that 20%
> >> would show up through the layers of the 2D code.  Overall, I've
> about
> >> doubled the frame rates of the prototype since your first drop that
> you
> >> checked in to the OpenJDK repository.)
> >
> > Can I see your new version?
> 
> Attached.
> 
> >> How about looking more at the stroking end of the process and I'll
> dig
> >> a little more into optimal rasterization code.  I have a lot of
> >> experience with optimizing rasterizer code (and JNI if it comes to
> that), but
> >> very little with the curve manipulations involved in stroking (math
> is so
> >> *hard* at my age ;-)...
> >
> > Sounds good. Have you implemented your idea of processing one pixel
> row at a
> > time, as opposed to processing subpixel rows? If not, I could do
> that.
> 
> Not yet.  Right now I've gotten a lot of mileage out of a few tweaks
> of 
> the bookkeeping of the sample-row-at-a-time version.  I'm still
> mulling 
> over exactly how to make that go faster.
> 
>   ...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Denis Lila
Hi Jim.

> A couple of questions about the code that I haven't touched...
> Is there some reason why the AFD for cubics doesn't have any tests for
> dddxy (the "constants" for its equations), but the AFD for quads is 
> testing the ddxy on every loop?  I know that these values do change
> when the AFD variables are doubled or halved, but why does the cubic
> version get away with only testing out to the n-1th order differences but the
> quad version has to test out to the nth order differences?

That's because I used to have only one AFD function that worked for both
cubics and quadratics. It simply treated quadratics as cubics with the
first coefficient 0. I decided to split them up because we might want
to do different things for quadratics, but I didn't get around to doing
any of them (yet) so that's why it is the way it is.

> Also, what is the value of breaking the pieces into monotonic segments
> prior to flattening?  Is it simply amortizing the cost of determining
> if the segment is up or down?  I guess this used to be done because we 
> needed monotonic (in Y) curve segments since we did a top-down
> iteration across all segments, but now they aren't in the rasterization loop. 
> If it is simply a performance issue then I may experiment with
> eliminating that stage and seeing if I can make it go faster overall.

Well, making them monotonic makes updating edgeM[in|ax][X|Y] more efficient.
We only have to do the tests once per subdivided curve instead of once per
line. Also it helps with making our lines have y2>y1. Without monotonicity
we'd have to test each individual line, instead of just reversing the 
curve coordinate array. However, for the latter, all we need is monotonicity
in Y. We should make some measurements to determine whether the above
are enough to compensate for the subdivision costs.

> Finally, I discovered (while testing for other problems) that the
> curves are not truly monotonic after slicing them.  I realized this years ago
> when I was writing my Area code (see sun.awt.geom.Curve) and put in 
> tweaking code to make them monotonic after they were split.  They are
> never off by more than a few bits, but you can't trust the curve 
> splitting math to generate purely monotonic segments based on a t 
> generated by some unrelated math.  Sometimes the truly horizontal or 
> vertical t value requires more precision than a float (or even a
> double) can provide and splitting at the highest representable float less than
> the t value produces a pair of curves on one side of the hill and 
> splitting at the next float value (which is greater than the true t 
> value) produces curves on the other side of the hill.  Also, when the
> curve has been split a few times already, the t values loose accuracy
> with each split.  This will all be moot if I can eliminate the
> splitting code from the renderer, but it may also play a factor in the
> stroke/dash 
> code...

Making curves monotonic is only used for optimization purposes,
so it can't see how it would affect rendering correctness.

Regards,
Denis.

- "Jim Graham"  wrote:

> 
>   ...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham
Also, some things to note in the new version - things I had to "fix" not 
related to performance.


In endRendering the pixel bounds of the edges needed to be computed and 
passed along to the next stage.  Unfortunately the code in endRendering 
computed the sub-pixel bounds and then simply shifted them to get the 
pixel addresses of those bounds.  For the bottom and right edges, this 
means that a ceiling operation was performed on the sub-pixel data, but 
a floor operation was performed on the conversion to pixel addresses. 
This means that the bottom few sub-pixel rows could have been sliced off 
in these calculations.  I restructured the calculations there to make 
sure that it produced the pixel bounds of all of the subpixel rows. 
Also, I use a ceil() operation on the sub-pixel values because we really 
want to know what is the next sub-pixel sample row/column that we will 
cross and then get the (pixel) bounds of those sample row/columns.


I think there may have been a couple of other places where the "which 
pixel are we on" code got sloppy along similar lines to the endRendering 
case.


The conditions for sending the alphas for the last row changed a little. 
 I need to have every row visited in my AlphaConsumer because it is 
cleaning out an alpha tile as it incorporates the new alphas.  Thus, I 
needed the logic to call "getAndClear" even if there were no alphas to 
deliver.  I don't think that change would have affected your version 
with the PiscesCache since you can just finalize the cache when the 
alphas are done and clearing the alphaRow[] array is irrelevant at that 
final stage.  The way this works may change as I continue optimizing.


...jim

On 11/8/2010 6:40 AM, Denis Lila wrote:

Hi Jim.


Also, I've gotten another 20% improvement out of the design with a few
more tweaks.  (Though I measured the 20% in the stripped down version
that I'm prototyping with FX so I'm not sure how much of that 20%
would show up through the layers of the 2D code.  Overall, I've about
doubled the frame rates of the prototype since your first drop that you
checked in to the OpenJDK repository.)


Can I see your new version?


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham

A couple of questions about the code that I haven't touched...

Is there some reason why the AFD for cubics doesn't have any tests for 
dddxy (the "constants" for its equations), but the AFD for quads is 
testing the ddxy on every loop?  I know that these values do change when 
the AFD variables are doubled or halved, but why does the cubic version 
get away with only testing out to the n-1th order differences but the 
quad version has to test out to the nth order differences?


Also, what is the value of breaking the pieces into monotonic segments 
prior to flattening?  Is it simply amortizing the cost of determining if 
the segment is up or down?  I guess this used to be done because we 
needed monotonic (in Y) curve segments since we did a top-down iteration 
across all segments, but now they aren't in the rasterization loop.  If 
it is simply a performance issue then I may experiment with eliminating 
that stage and seeing if I can make it go faster overall.


Finally, I discovered (while testing for other problems) that the curves 
are not truly monotonic after slicing them.  I realized this years ago 
when I was writing my Area code (see sun.awt.geom.Curve) and put in 
tweaking code to make them monotonic after they were split.  They are 
never off by more than a few bits, but you can't trust the curve 
splitting math to generate purely monotonic segments based on a t 
generated by some unrelated math.  Sometimes the truly horizontal or 
vertical t value requires more precision than a float (or even a double) 
can provide and splitting at the highest representable float less than 
the t value produces a pair of curves on one side of the hill and 
splitting at the next float value (which is greater than the true t 
value) produces curves on the other side of the hill.  Also, when the 
curve has been split a few times already, the t values loose accuracy 
with each split.  This will all be moot if I can eliminate the splitting 
code from the renderer, but it may also play a factor in the stroke/dash 
code...


...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham
It's still a work in progress, but I've cleaned up a lot of logic and 
made it faster in a number of ways.  Note that I've abstracted out the 
cache stuff and created an "AlphaConsumer" interface which may evolve 
over time.


In FX we actually consume alphas in larger chunks than the code in JDK 
which was driven by Ductus's 32x32 mandate, so I would have had to make 
completely incompatible changes to emitRow - so I moved it behind an 
interface.  For the JDK code, if you want to integrate this version, I 
would have the cache implement the new interface and move your version 
of emitRow into the Cache object.  I'd send you the new code for my 
AlphaConsumer, but it is incompatible with what you need to cache so it 
won't help you.


You'll also need a bit of un-translation cleanup as we have mirrors of 
all of java.awt.geom with special new APIs that FX needs.


...jim

On 11/8/2010 6:40 AM, Denis Lila wrote:

Hi Jim.


Also, I've gotten another 20% improvement out of the design with a few
more tweaks.  (Though I measured the 20% in the stripped down version
that I'm prototyping with FX so I'm not sure how much of that 20%
would show up through the layers of the 2D code.  Overall, I've about
doubled the frame rates of the prototype since your first drop that you
checked in to the OpenJDK repository.)


Can I see your new version?


Attached.


How about looking more at the stroking end of the process and I'll dig
a little more into optimal rasterization code.  I have a lot of
experience with optimizing rasterizer code (and JNI if it comes to that), but
very little with the curve manipulations involved in stroking (math is so
*hard* at my age ;-)...


Sounds good. Have you implemented your idea of processing one pixel row at a
time, as opposed to processing subpixel rows? If not, I could do that.


Not yet.  Right now I've gotten a lot of mileage out of a few tweaks of 
the bookkeeping of the sample-row-at-a-time version.  I'm still mulling 
over exactly how to make that go faster.


...jim
package com.sun.openpisces;

/**
 * @author Flar
 */
public interface AlphaConsumer {
public int getOriginX();
public int getOriginY();
public int getWidth();
public int getHeight();
public void setMaxAlpha(int maxalpha);
public void setAndClearRelativeAlphas(int alphaDeltas[], int y,
  int firstdelta, int lastdelta);
}
package com.sun.openpisces;

import com.sun.javafx.geom.PathConsumer;
import com.sun.javafx.geom.PathIterator;
import com.sun.javafx.geom.Rectangle;
import java.util.Iterator;

/**
 * @author Flar
 */
public final class OpenPiscesRenderer implements PathConsumer {

public static void feedConsumer(PathIterator pi, PathConsumer pc) {
float[] coords = new float[6];
while (!pi.isDone()) {
int type = pi.currentSegment(coords);
switch (type) {
case PathIterator.SEG_MOVETO:
pc.moveTo(coords[0], coords[1]);
break;
case PathIterator.SEG_LINETO:
pc.lineTo(coords[0], coords[1]);
break;
case PathIterator.SEG_QUADTO:
pc.quadTo(coords[0], coords[1],
  coords[2], coords[3]);
break;
case PathIterator.SEG_CUBICTO:
pc.curveTo(coords[0], coords[1],
   coords[2], coords[3],
   coords[4], coords[5]);
break;
case PathIterator.SEG_CLOSE:
pc.closePath();
break;
}
pi.next();
}
pc.pathDone();
}

private final class ScanlineIterator {

private int[] crossings;
private int[] edgePtrs;
private int edgeCount;

// crossing bounds. The bounds are not necessarily tight (the scan line
// at minY, for example, might have no crossings). The x bounds will
// be accumulated as crossings are computed.
private final int maxY;
private int nextY;

private static final int INIT_CROSSINGS_SIZE = 10;

private ScanlineIterator() {
crossings = new int[INIT_CROSSINGS_SIZE];
edgePtrs = new int[INIT_CROSSINGS_SIZE];

// We don't care if we clip some of the line off with ceil, since
// no scan line crossings will be eliminated (in fact, the ceil is
// the y of the first scan line crossing).
nextY = getFirstScanLineCrossing();
maxY = getScanLineCrossingEnd()-1;
}

private int next() {
// TODO: make function that convert from y value to bucket idx?
int cury = nextY++;
int bucket = cury - boundsMinY;
int count = this.edgeCount;
int ptrs[] = this.edgePtrs;

Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Jim Graham

On 11/8/2010 6:34 AM, Denis Lila wrote:

Hi Clemens.


I've only followed your discussion with Jim but skipped all the
in-depth discussion.
 From my prior experiences usually  JNI is not woth the trouble, if
you don't have a serious reason why using native code would have
advantages (like the possibility of using SIMD or when value-types
would be a huge benefit), it has its own performance pitfalls
especially if the workload is small and things like Get*ArrayCritical
cause scalability problems because they have to lock the GC.


Well, Jim Graham said that a native version of the engine still runs
a lot faster than the version with all my changes. That's why I thought


Actually, that report is old.  I've now got the new Java version turning 
in double the frame rates of the old native version.



it would be a good idea. Also, when not doing antialiasing we usually
feed paths to native consumers, so I thought if pisces used JNI, we
could reduce the java->C transitions five fold. But then I realized that
with antialiasing the opposite would happen, so I'm not sure whether
JNI is a good idea.


That's a good point that the other rasterizers will end up using this 
stroking engine and they are native.  We can worry about cleaning that 
up later.  JNI might eventually be a good idea, but lets fix the 
algorithm first and then worry about whether it will help this renderer 
or if we can make the interface to the other renderers simpler.


...jim


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Mario Torre
Il giorno lun, 08/11/2010 alle 09.34 -0500, Denis Lila ha scritto:

> > Have you had a look at the Netbeans profiler? It supports sampling
> > based testing to keep the influence of the profiler at a minimum.
> 
> No, I don't use netbeans - it doesn't render underscores properly (although
> I think this is an openjdk bug). I'll try it out.

This has been fixed in OpenJDK 7, and should have been back ported to 6,
not sure when the fix will be out.

> > Thanks for working on this!

yup :)

Mario
-- 
pgp key: http://subkeys.pgp.net/ PGP Key ID: 80F240CF
Fingerprint: BA39 9666 94EC 8B73 27FA  FC7C 4086 63E3 80F2 40CF

Proud GNU Classpath developer: http://www.classpath.org/
Read About us at: http://planet.classpath.org
OpenJDK: http://openjdk.java.net/projects/caciocavallo/

Please, support open standards:
http://endsoftpatents.org/



Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Denis Lila
Hi Clemens.

> I've only followed your discussion with Jim but skipped all the
> in-depth discussion.
> From my prior experiences usually  JNI is not woth the trouble, if
> you don't have a serious reason why using native code would have
> advantages (like the possibility of using SIMD or when value-types
> would be a huge benefit), it has its own performance pitfalls
> especially if the workload is small and things like Get*ArrayCritical
> cause scalability problems because they have to lock the GC.

Well, Jim Graham said that a native version of the engine still runs
a lot faster than the version with all my changes. That's why I thought
it would be a good idea. Also, when not doing antialiasing we usually
feed paths to native consumers, so I thought if pisces used JNI, we
could reduce the java->C transitions five fold. But then I realized that
with antialiasing the opposite would happen, so I'm not sure whether
JNI is a good idea.

> Have you had a look at the Netbeans profiler? It supports sampling
> based testing to keep the influence of the profiler at a minimum.

No, I don't use netbeans - it doesn't render underscores properly (although
I think this is an openjdk bug). I'll try it out.

> Thanks for working on this!

It's been fun.

Regards,
Denis.

- "Clemens Eisserer"  wrote:

> Hi Denis,
> 
> > It's not obvious to me why this happened, so I think now I will put
> > this type of optimization aside and convert to JNI,
> 

> 
> > where profiling
> > will be easier (for me - I haven't been able to make OProfile work
> > for java yet).
> 
> - Clemens


Re: [OpenJDK 2D-Dev] X11 uniform scaled wide lines and dashed lines; STROKE_CONTROL in Pisces

2010-11-08 Thread Denis Lila
Hi Jim.

> Also, I've gotten another 20% improvement out of the design with a few
> more tweaks.  (Though I measured the 20% in the stripped down version
> that I'm prototyping with FX so I'm not sure how much of that 20%
> would show up through the layers of the 2D code.  Overall, I've about
> doubled the frame rates of the prototype since your first drop that you
> checked in to the OpenJDK repository.)

Can I see your new version?

> How about looking more at the stroking end of the process and I'll dig
> a little more into optimal rasterization code.  I have a lot of
> experience with optimizing rasterizer code (and JNI if it comes to that), but
> very little with the curve manipulations involved in stroking (math is so 
> *hard* at my age ;-)...

Sounds good. Have you implemented your idea of processing one pixel row at a
time, as opposed to processing subpixel rows? If not, I could do that.

Thanks,
Denis.

- "Jim Graham"  wrote:

> Hi Denis,
> 
>   ...jim
> 
> On 11/4/2010 9:20 AM, Clemens Eisserer wrote:
> > Hi Denis,
> >
> >> It's not obvious to me why this happened, so I think now I will
> put
> >> this type of optimization aside and convert to JNI,
> >
> > I've only followed your discussion with Jim but skipped all the
> > in-depth discussion.
> >> From my prior experiences usually  JNI is not woth the trouble, if
> you
> > don't have a serious reason why using native code would have
> > advantages (like the possibility of using SIMD or when value-types
> > would be a huge benefit), it has its own performance pitfalls
> > especially if the workload is small and things like
> Get*ArrayCritical
> > cause scalability problems because they have to lock the GC.
> >
> >> where profiling
> >> will be easier (for me - I haven't been able to make OProfile work
> >> for java yet).
> >
> > Have you had a look at the Netbeans profiler? It supports sampling
> > based testing to keep the influence of the profiler at a minimum.
> >
> > Thanks for working on this!
> >
> > - Clemens