Re: [OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow

Phil Race Wed, 24 Jun 2015 11:42:07 -0700

On 6/24/15 8:18 AM, Andrew Brygin wrote:

Hello Phil,


 please see my comments inline.

23/06/15 21:29, Phil Race wrote:

Hi Andrew,

Overall the fix looks good. A few questions.

1. Regarding translucent surfaces, do you know when Swing
has a translucent backbuffer and when it does not ?
It has been noted that we now have LCD text in some cases
in SS2 but apparently still not in NB ..

I did not notice the lcd text in the SwingSet2 demo without an explicit
switch to opaque backbuffers in the RepaintManager.

My expectation is that standard swing components should not
use lcd text on macosx at the moment. However, if there are
(custom?) components which create opaque buffers separately
from the RepaintManager, then they may be able to use lcd text.


See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.

2. Where are we likely to find (or not find) support for this extension ?
Based on your results ironically, it seems that the Nvidia card is the
one case that did not support the extension. Is that because it was
an older version of OS X than the others ?
Unfortunately, the extension is relatively new, and we need new drivers
to use it. The mbp with nvidia GF9600M is running OSX 10.8,
and there we cannot use the extension. However, the extension is listed
as supported for the GF9600M in the extension database, so we can probably
expect that an upgrade of OSX to 10.9 or 10.10 will make it available.

OK.

The limited availability of the extension is the main reason to look for alternative
solutions. The best option is to identify and eliminate the cause of the
glCopyTexSubImage() slowness. There are some reasons to think that this
is possible:
 * a separate simple OGL demo shows almost equal performance for
   glCopyTexSubImage() and re-using the FBO texture.
 * on windows, the performance of glCopyTexSubImage is much better
    in the case of FBO.

However, at the moment I do not see what we are doing wrong/non-optimal
with the standard approach.


By standard approach you mean what exactly ?

3. The performance 'lost' case.
> However, on systems where the fast path with destination texture is not
> possible for any reason, this change may cause a performance degradation
> because of more extensive usage of glCopyTexSubImage2D.
> So, we probably may want to get a means to configure the cell dimension
Is this a reference to losing performance on non-retina displays
where we would be better off with the smaller cache cell size ?


Was the answer to this 'yes' ?

I suppose the importance of this depends in part on the answer to question #2
Probably, the most important case here is old OSX (< 10.9) systems.


All the 8 updates support 10.8.3+ so I suppose that is the main case but
I expect that to 'go away' for JDK 9 or perhaps earlier once Apple stops supporting it.

Also, windows systems with OGL drivers created before 2011 - 2012.
However, OGL is an optional pipeline on windows, so it could be less critical.


I think older windows drivers are something we would encourage everyone
to get off ASAP anyway ..

4. Have you tried this on Linux .. or even a Windows OGL driver ?
I have uploaded results for a linux system with NVS5400:
http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt

Here we have the NV_texture_barrier extension, and see up to x10-x20
speedup in some tests.

On windows, I have got mixed results:
* Intel HD4000: no extension due to old drivers, so the same results
     as without the fix.
* NVS5400: with the fix we have got similar scores in the tests as on macosx,
     but standard way with glCopyTexSubImage gives better results anyway.
     I.e. with the fix, we achieve only 55% - 60% of original performance.


That is very interesting. Is that related to your earlier observation :

> * on windows, the performance of glCopyTexSubImage is much better in the case of FBO.

Any idea why ? Given that it is a unified driver it sounds like we may want
to disable this code path when on windows at least for NV but I guess we
may also want to validate that on some other cards - from Nvidia - to
see if it is a driver or h/w limitation.

-phil.

Thanks,
Andrew
-phil.

On 06/18/2015 07:40 AM, Andrew Brygin wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/

Thanks,
Andrew


18/06/15 17:39, Andrew Brygin wrote:
Hello,

 could you please review a fix for 8087201?

 The root of the problem is that we have to supply the content of the
 destination surface to the lcd shader to compose the lcd glyph correctly.
 In order to do this, we have to copy a sub-image from the destination
 buffer to an intermediate texture using the glCopyTexSubImage2D() routine.
 Unfortunately, this routine is quite slow on the majority of systems, and it
 dramatically reduces the overall speed of lcd text rendering.

 The main idea of the fix is to use a texture associated with the destination
 surface if it exists. In this case we have a chance to completely abandon the
 data copying. However, we have to avoid read-after-write in order to get
 correct results in this case. Fortunately, this can be achieved by using the
 GL_NV_texture_barrier extension:

https://www.opengl.org/registry/specs/NV/texture_barrier.txt
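
As a rough illustration of the availability check this approach depends on, here is a minimal sketch. It assumes the driver's extension list (the space-separated string returned by glGetString(GL_EXTENSIONS)) has already been fetched and handed over as a Java string. The class and method names are hypothetical and are not taken from the webrev.

```java
import java.util.Arrays;

// Hypothetical helper, not from the webrev: looks for an extension name
// in a space-separated GL extension string.
final class GlExtensions {
    static final String TEXTURE_BARRIER = "GL_NV_texture_barrier";

    static boolean isSupported(String extensionString, String name) {
        if (extensionString == null || name == null || name.isEmpty()) {
            return false;
        }
        // Compare whole tokens so that a longer name sharing the same
        // prefix does not count as a match.
        return Arrays.asList(extensionString.trim().split("\\s+")).contains(name);
    }
}
```

A caller would take the destination-texture path only when isSupported(...) returns true, and otherwise keep the glCopyTexSubImage2D() copy.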
Besides this, the suggested fix introduces the following changes in the OGL text renderer:
* Separate accelerated caches for LCD and AA glyphs
   Currently we have a single cache which is initialized either for LCD or for AA glyphs.
   If an application mixes these types of font smoothing for some reason, we
   get a significant performance degradation.
   For example, if we use J2DBench in GUI mode, then the swing GUI initializes the
   accelerated cache for AA, and subsequent rendering of LCD text always
   uses the 'no-cache' code path.
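
The idea of keeping one cache per smoothing mode can be sketched as follows. This is only an illustration under simplifying assumptions: glyphs are identified by an integer code, cells are handed out in order, and nothing is ever evicted. All names are hypothetical; the real renderer is native code and is organized differently.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one glyph cache per smoothing mode, so that
// AA rendering cannot push LCD text onto a 'no-cache' path.
final class GlyphCaches {
    enum Mode { AA, LCD }

    // glyph code -> cell index, kept separately for each mode
    private final Map<Mode, Map<Integer, Integer>> caches = new EnumMap<>(Mode.class);

    /** Returns the cell index for the glyph, allocating a cell on first use. */
    int cellFor(Mode mode, int glyphCode) {
        Map<Integer, Integer> cache = caches.computeIfAbsent(mode, m -> new HashMap<>());
        Integer cell = cache.get(glyphCode);
        if (cell == null) {
            cell = cache.size();      // next free cell in this mode's cache
            cache.put(glyphCode, cell);
        }
        return cell;
    }

    /** True if the glyph already has a cell in the cache for this mode. */
    boolean isCached(Mode mode, int glyphCode) {
        Map<Integer, Integer> cache = caches.get(mode);
        return cache != null && cache.containsKey(glyphCode);
    }
}
```

With this layout, a GUI that first renders AA text and then LCD text gets a cache hit path for both modes.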

* Increase dimension of the glyph cache cell from 16x16 to 32x32.
   This change gives a significant performance boost on systems with retina
   (because of the average size of rendered glyphs).
   However, on systems where the fast path with destination texture is not
   possible for any reason, this change may cause a performance degradation
   because of more extensive usage of glCopyTexSubImage2D.
   So, we probably may want to get a means to configure the cell dimension
   depending on system capabilities.
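
One possible shape for such a configuration is sketched below. It assumes the only input is whether the destination-texture fast path is usable, and that a fallback copy touches at most one cell-sized region; both are assumptions made for illustration. The class, constants and method names are hypothetical.

```java
// Hypothetical policy sketch; the 16 and 32 values come from the
// discussion above, the selection rule itself is an assumption.
final class GlyphCellPolicy {
    static final int SMALL_CELL = 16;
    static final int LARGE_CELL = 32;

    /** Picks the larger cell only when the copy-free fast path is usable. */
    static int cellDimension(boolean destinationTextureFastPath) {
        return destinationTextureFastPath ? LARGE_CELL : SMALL_CELL;
    }

    /** Pixels in one square cell, i.e. an upper bound for a cell-sized copy. */
    static int pixelsPerCell(int cellDimension) {
        return cellDimension * cellDimension;
    }
}
```

A 32x32 cell holds 1024 pixels against 256 for a 16x16 cell, so under the assumption above a cell-sized copy can touch up to four times as many pixels.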

Performance results overview:
* MBP with Intel Iris (retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt

* iMac with AMD HD6750M (no retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt

* MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt

Please take a look.

Thanks,
Andrew
