Hi Mik,
On Feb 22, 2007, at 2:11 AM, Michele Puccini wrote:
A little follow-up...
The fragment shader code in my previous email was bloody wrong, as the docs say:
"Notice that the fragment shader has no access to the frame buffer. This implies that operations such as blending occur only after the fragment shader has run."
The big mistake is that I was thinking of reading the framebuffer contents with gl_FragColor... :(
Yep, the fact that you can't access destination pixels is about the
only downside of working with fragment shaders.
Mmmh... maybe I could implement my blending by binding a pbuffer object to a texture (or even an FBO) and then accessing the dest pixels from there, right?
If one really needs access to destination pixels from a fragment
shader, the typical approach is to copy a chunk of the destination
into a texture, and then read that from the shader. This can get
tricky, but it is exactly the technique we use in our LCD-optimized
text shader for the OGL pipeline, and it works great.
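To make the copy-then-read idea concrete, here is a CPU-side Java model of it. It is purely illustrative: arrays stand in for the framebuffer region and the texture, and DestinationSnapshotSketch, FragmentShader and the MULTIPLY blend are hypothetical names, not part of Java 2D or of the OGL pipeline's LCD text code. What it shows is that the "shader" only ever reads the incoming source pixel and a snapshot of the destination, never the live target.

```java
import java.awt.image.BufferedImage;

// Hypothetical CPU-side model of the "copy destination to a texture" trick.
// Arrays stand in for the framebuffer region and the texture; the real
// OGL pipeline performs these steps with OpenGL calls, not with this code.
public class DestinationSnapshotSketch {

    // Stand-in for a fragment shader: it sees the source pixel and the
    // matching pixel of the destination snapshot, never the live target.
    public interface FragmentShader {
        int shade(int srcArgb, int dstSnapshotArgb);
    }

    // Hypothetical example shader: per-channel "multiply" blend of opaque
    // RGB values; the output alpha is forced to 0xFF for simplicity.
    public static final FragmentShader MULTIPLY = (src, dst) -> {
        int r = (((src >> 16) & 0xFF) * ((dst >> 16) & 0xFF)) / 255;
        int g = (((src >> 8) & 0xFF) * ((dst >> 8) & 0xFF)) / 255;
        int b = ((src & 0xFF) * (dst & 0xFF)) / 255;
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    };

    // Renders a w x h block of source pixels at (x, y) into dst.
    public static void render(BufferedImage dst, int[] src,
                              int x, int y, int w, int h, FragmentShader shader) {
        // Step 1: copy the affected destination region into a "texture".
        int[] texture = dst.getRGB(x, y, w, h, null, 0, w);
        // Step 2: run the "shader" per pixel, reading only src and the copy.
        int[] out = new int[w * h];
        for (int i = 0; i < out.length; i++) {
            out[i] = shader.shade(src[i], texture[i]);
        }
        // Step 3: write the shaded results back to the destination.
        dst.setRGB(x, y, w, h, out, 0, w);
    }

    public static void main(String[] args) {
        BufferedImage dst = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        dst.setRGB(0, 0, 0x808080); // mid-gray destination pixel
        int[] src = { 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF };
        render(dst, src, 0, 0, 2, 2, MULTIPLY);
        // White multiplied with gray leaves the gray unchanged.
        System.out.printf("%08X%n", dst.getRGB(0, 0));
    }
}
```

The snapshot is taken once per rendered block rather than per pixel, which mirrors why the approach works at all: the shader reads a stable copy while its output goes to the real destination.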
But, anyway, the idea of using shaders in the java2d/opengl codepath could still be valid.
Well, this is a timely discussion. As you may already know, our
first use of fragment shaders to accelerate Java 2D operations was in
JDK 6 when I added the fragment shader (mentioned above) that handles
LCD-optimized text on the GPU, currently only for the OGL pipeline.
But that was just the beginning.
In JDK 7-b08, I checked in a chunk of code that uses fragment shaders
to accelerate ConvolveOp, RescaleOp, and LookupOp on the GPU (again,
currently only for the OGL pipeline). You can download b08 today
from jdk7.dev.java.net and give it a try. On your board (Nvidia
GeForce 7600GT), it should scream. You can get a taste by maximizing
Java2Demo and trying out the ImageOps demo (moving the sliders
around). The screen refreshes should happen in a fraction of the
time that they did in JDK 6, all while using close to 0% CPU. I
won't go into more details because I have a blog forthcoming on this
subject, with performance charts and whatnot, but if you're really
curious in the meantime you can read this bug report:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6514990
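For anyone who hasn't used these ops directly, a minimal sketch of the three BufferedImageOp classes mentioned above. It only exercises the public java.awt.image API and calls filter() on BufferedImages, so it says nothing about which call paths end up on the GPU; to try the OGL pipeline itself you would typically launch with -Dsun.java2d.opengl=true. The class and method names (ImageOpsSketch, blur, brighten, invert) are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ByteLookupTable;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.awt.image.LookupOp;
import java.awt.image.RescaleOp;

public class ImageOpsSketch {

    // ConvolveOp: 3x3 box blur, each output pixel is the average of its
    // neighborhood; edge pixels are copied unchanged (EDGE_NO_OP).
    public static BufferedImage blur(BufferedImage src) {
        float ninth = 1f / 9f;
        float[] weights = { ninth, ninth, ninth, ninth, ninth, ninth, ninth, ninth, ninth };
        ConvolveOp op = new ConvolveOp(new Kernel(3, 3, weights), ConvolveOp.EDGE_NO_OP, null);
        return op.filter(src, null); // null dst: a compatible image is created
    }

    // RescaleOp: multiplies every color component by factor (offset 0),
    // clamping results to the valid range.
    public static BufferedImage brighten(BufferedImage src, float factor) {
        return new RescaleOp(factor, 0f, null).filter(src, null);
    }

    // LookupOp: inverts each color component through a 256-entry table.
    public static BufferedImage invert(BufferedImage src) {
        byte[] table = new byte[256];
        for (int i = 0; i < 256; i++) {
            table[i] = (byte) (255 - i);
        }
        return new LookupOp(new ByteLookupTable(0, table), null).filter(src, null);
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(5, 5, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 5; y++) {
            for (int x = 0; x < 5; x++) {
                img.setRGB(x, y, 0x646464); // uniform gray, 100 per channel
            }
        }
        System.out.println(brighten(img, 1.5f).getRGB(2, 2) & 0xFF);
        System.out.println(invert(img).getRGB(2, 2) & 0xFF);
        System.out.println(blur(img).getRGB(2, 2) & 0xFF);
    }
}
```

A TYPE_INT_RGB source is used so the single-factor RescaleOp and single-table LookupOp apply uniformly to all three color bands without any alpha handling.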
I've also just checked in code that uses fragment shaders to
accelerate the new LinearGradientPaint and RadialGradientPaint
classes that were added in JDK 6. This should appear in a public JDK
7 build sometime around b09 or b10. There is some chance that we
could backport these fixes to a JDK 6 update release, once they've
been proven in JDK 7 for some time. Again, I'll have a blog entry
that discusses this work, but in the meantime:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6521533
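A minimal sketch of the two paint classes in use, filling offscreen images with a red-to-blue gradient. It runs on whatever pipeline is active; GradientSketch, linear and radial are made-up names for illustration, and the TYPE_INT_ARGB_PRE destination is an arbitrary choice, not a requirement.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.LinearGradientPaint;
import java.awt.RadialGradientPaint;
import java.awt.image.BufferedImage;

public class GradientSketch {
    // Two stops: red at the start (0.0) and blue at the end (1.0).
    private static final float[] FRACTIONS = { 0f, 1f };
    private static final Color[] COLORS = { Color.RED, Color.BLUE };

    // Horizontal LinearGradientPaint: red at x = 0, blue at x = w.
    public static BufferedImage linear(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB_PRE);
        Graphics2D g = img.createGraphics();
        g.setPaint(new LinearGradientPaint(0f, 0f, w, 0f, FRACTIONS, COLORS));
        g.fillRect(0, 0, w, h);
        g.dispose();
        return img;
    }

    // RadialGradientPaint centered in the image: red at the center, blue at
    // the radius and, with the default NO_CYCLE method, beyond it.
    public static BufferedImage radial(int size) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_ARGB_PRE);
        Graphics2D g = img.createGraphics();
        float c = size / 2f;
        g.setPaint(new RadialGradientPaint(c, c, c, FRACTIONS, COLORS));
        g.fillRect(0, 0, size, size);
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        System.out.printf("%08X%n", linear(100, 10).getRGB(0, 5));   // near red
        System.out.printf("%08X%n", radial(100).getRGB(0, 0));       // blue corner
    }
}
```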
There are still more operations that we plan to accelerate via
fragment shaders in the OGL pipeline, so stay tuned. Also, although
I've mentioned that these ops are currently only accelerated via the
OGL pipeline, there is work being done to completely rework the D3D
pipeline on Windows so that it shares a lot of the same code and
structure with the OGL pipeline. I'm hoping that these shader-based
optimizations will be portable to the new D3D pipeline when it's
ready.
Anyway, back to your original question about using shaders to
accelerate premultiplication. As you've found, it can be quite
complicated, but really it's only applicable in a very small set of
situations. In most cases, incoming pixel data is already in a
premultiplied format, so there's nothing special we have to do in the
OGL pipeline. The only case where we convert in software is when
rendering a non-premultiplied TYPE_INT_ARGB or similar image, but
this conversion step is rarely a problem for performance. What would
have helped in this case was 3Dlabs' proposal for programmable pixel
packing/unpacking, in which case we could convert from non-premult
data to premult data on-the-fly using OpenGL, but that never saw the
light of day.
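For reference, the software conversion step amounts to roughly this per-pixel arithmetic: each color component is scaled by alpha/255. This is a hypothetical sketch, not Java 2D's actual conversion loop, and its rounding (adding 127 before dividing) is an assumption that may differ from the real loops by one unit.

```java
public class PremultiplySketch {

    // Converts one non-premultiplied ARGB pixel to premultiplied form:
    // each color component c becomes approximately round(c * a / 255).
    // The rounding used here is an assumption, not Java 2D's exact rule.
    public static int premultiply(int argb) {
        int a = argb >>> 24;
        if (a == 0xFF) {
            return argb; // opaque: color components are unchanged
        }
        if (a == 0) {
            return 0;    // fully transparent: all components become zero
        }
        int r = (((argb >> 16) & 0xFF) * a + 127) / 255;
        int g = (((argb >> 8) & 0xFF) * a + 127) / 255;
        int b = ((argb & 0xFF) * a + 127) / 255;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Converts a whole array of TYPE_INT_ARGB-style pixels in place.
    public static void premultiplyAll(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = premultiply(pixels[i]);
        }
    }

    public static void main(String[] args) {
        // Half-transparent pure red: red drops from 0xFF to 0x80.
        System.out.printf("%08X%n", premultiply(0x80FF0000));
    }
}
```

The early returns for alpha 0xFF and 0 reflect that fully opaque pixels need no work at all, which is one reason the conversion is cheap in practice for mostly opaque images.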
But again, for most cases this isn't a concern. If your image can be
"managed" by Java 2D, then you only incur the conversion once when
the BufferedImage is cached as a (premultiplied) OpenGL texture.
Unless you really need to use TYPE_INT_ARGB images, I'd recommend (as
always) using createCompatibleImage() to construct your images, as
this is guaranteed to return the fastest image type for the
particular screen/pipeline. For the OGL pipeline, this will return a
premultiplied image format, which means you will avoid the minor
conversions described above.
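A minimal sketch of the createCompatibleImage() recommendation, asking the default screen configuration for a translucent image. The class name CompatibleImageSketch is hypothetical, and the headless fallback to TYPE_INT_ARGB_PRE is an assumption added only so the code also runs without a display; on a real screen the active pipeline chooses the format.

```java
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.Transparency;
import java.awt.image.BufferedImage;

public class CompatibleImageSketch {

    // Returns an image in whatever format the current screen/pipeline
    // considers fastest for translucent content.
    public static BufferedImage create(int w, int h) {
        if (GraphicsEnvironment.isHeadless()) {
            // Assumption: no screen to be compatible with, so fall back to
            // a premultiplied type rather than failing.
            return new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB_PRE);
        }
        GraphicsConfiguration gc = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getDefaultScreenDevice()
                .getDefaultConfiguration();
        return gc.createCompatibleImage(w, h, Transparency.TRANSLUCENT);
    }

    public static void main(String[] args) {
        BufferedImage img = create(256, 256);
        // Shows which format was picked, e.g. whether alpha is premultiplied.
        System.out.println(img.getColorModel());
        System.out.println("premultiplied: " + img.isAlphaPremultiplied());
    }
}
```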
BTW:
That's probably because the Porter and Duff compositing rules "work
better" with premultiplied data ;)
It's not just "work better"; premultiplied data is a requirement for
making the math work out correctly.
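A small worked example of why the premultiplied form matters, using SrcOver, where each component of the result is src + dst * (1 - srcAlpha). The sketch (hypothetical names, RGBA floats in [0, 1]) composites half-transparent red over opaque blue twice: once with the source premultiplied first, and once by feeding the non-premultiplied color straight into the same formula, which is the mistake to avoid.

```java
import java.util.Arrays;

public class SrcOverSketch {

    // Porter-Duff SrcOver on premultiplied RGBA: every component,
    // including alpha, is src + dst * (1 - srcAlpha).
    public static float[] srcOver(float[] src, float[] dst) {
        float k = 1f - src[3];
        float[] out = new float[4];
        for (int i = 0; i < 4; i++) {
            out[i] = src[i] + dst[i] * k;
        }
        return out;
    }

    // Converts non-premultiplied RGBA to premultiplied RGBA.
    public static float[] premultiply(float[] rgba) {
        float a = rgba[3];
        return new float[] { rgba[0] * a, rgba[1] * a, rgba[2] * a, a };
    }

    public static void main(String[] args) {
        float[] halfRed = { 1f, 0f, 0f, 0.5f }; // non-premultiplied
        float[] blue = { 0f, 0f, 1f, 1f };      // opaque: same in both forms
        // Correct: premultiply the source, then apply the formula.
        System.out.println(Arrays.toString(srcOver(premultiply(halfRed), blue)));
        // Incorrect: same formula on non-premultiplied source values.
        System.out.println(Arrays.toString(srcOver(halfRed, blue)));
    }
}
```

The premultiplied path prints [0.5, 0.0, 0.5, 1.0], an even mix of red and blue; the naive path prints [1.0, 0.0, 0.5, 1.0], because the source red was never weighted by its 50% alpha.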
I'll send an email to the list once those blog entries are published.
Thanks,
Chris
===========================================================================
To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
of the message "signoff JAVA2D-INTEREST". For general help, send email to
[EMAIL PROTECTED] and include in the body of the message "help".