On 6/24/15 8:18 AM, Andrew Brygin wrote:
Hello Phil,

 please see my comments inline.

23/06/15 21:29, Phil Race wrote:
Hi Andrew,

Overall the fix looks good. A few questions.

1. Regarding translucent surfaces, do you know when Swing
has a translucent backbuffer and when it does not ?
It has been noted that we now have LCD text in some cases
in SS2 but apparently still not in NB ..
I did not notice LCD text in the SwingSet2 demo without an explicit
switch to opaque back buffers in the RepaintManager.

My expectation is that standard Swing components should not
use LCD text on macosx at the moment. However, if there are
(custom?) components which create opaque buffers separately
from the RepaintManager, then they may be able to use LCD text.

See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.



2. Where are we likely to find (or not find) support for this extension ?

Based on your results ironically, it seems that the Nvidia card is the
one case that did not support the extension. Is that because it was
an older version of OS X than the others ?

Unfortunately, the extension is relatively new, and we need new drivers
to use it. The MBP with the Nvidia GF9600M is running OSX 10.8,
and there we can not use the extension. However, this extension is listed
as supported for the GF9600M in the extension database, and we probably can
expect that an upgrade of OSX to 10.9 or 10.10 will make it available.

OK.

The limited availability of the extension is the main reason to look for
alternative solutions. The best option is to identify and eliminate the cause
of the glCopyTexSubImage() slowness. There are some reasons to think that this
is possible:
 * a separate simple OGL demo shows almost equal performance for
   glCopyTexSubImage() and re-using the FBO texture.
 * on windows, the performance of glCopyTexSubImage is much better
    in the case of FBO.

However, at the moment I do not see what we are doing wrong/non-optimal
with the standard approach.

By standard approach you mean what exactly ?


3. The performance 'lost' case.
> However, on systems where the fast path with destination texture is not
> possible for any reason, this change may cause a performance degradation
> because of more extensive usage of glCopyTexSubImage2D.
> So, we probably may want to get a means to configure the cell dimension

Is this a reference to losing performance on non-retina displays
where we would be better off with the smaller cache cell size ?

Was the answer to this 'yes' ?

I suppose the importance of this depends in part on the answer to question #2
Probably the most important case here is older OSX (< 10.9) systems.

All the 8 updates support 10.8.3+ so I suppose that is the main case but
I expect that to 'go away' for JDK 9 or perhaps earlier once Apple stop supporting it.

Also, Windows systems with OGL drivers created before 2011 - 2012.
However, OGL is an optional pipeline on Windows, so it could be less critical.

I think older windows drivers are something we would encourage everyone
to get off ASAP anyway ..

4. Have you tried this on Linux .. or even a Windows OGL driver ?
I have uploaded results for a linux system with NVS5400:
http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt

Here we have the NV_texture_barrier extension, and see a speedup of up to
10x-20x in some tests.

On Windows, I have got mixed results:
* Intel HD4000: no extension due to old drivers, so the same results as
     without the fix.
* NVS5400: with the fix we have got similar scores in the tests as on macosx,
     but the standard way with glCopyTexSubImage gives better results anyway.
     I.e. with the fix, we achieve only 55% - 60% of the original performance.

That is very interesting. Is that related to your earlier observation :
> * on windows, the performance of glCopyTexSubImage is much better in the case of FBO.

Any idea why ? Given that it is a unified driver it sounds like we may want
to disable this code path when on Windows, at least for NV, but I guess we
may also want to validate that on some other cards - from Nvidia - to
see if it is a driver or h/w limitation.

-phil.

Thanks,
Andrew


-phil.

On 06/18/2015 07:40 AM, Andrew Brygin wrote:

Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/

Thanks,
Andrew


18/06/15 17:39, Andrew Brygin wrote:
Hello,

 could you please review a fix for 8087201?

 The root of the problem is that we have to supply the content of the
 destination surface to the lcd shader to compose the lcd glyph correctly.
 In order to do this, we have to copy a sub-image from the destination
 buffer to an intermediate texture using the glCopyTexSubImage2D() routine.
 Unfortunately, this routine is quite slow on the majority of systems, and it
 dramatically reduces the overall speed of lcd text rendering.

 The main idea of the fix is to use the texture associated with the
 destination surface, if it exists. In this case we have a chance to avoid
 the data copy entirely. However, we have to avoid read-after-write hazards
 in order to get correct results in this case. Fortunately, this can be
 achieved by using the GL_NV_texture_barrier extension:

https://www.opengl.org/registry/specs/NV/texture_barrier.txt
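
Since the fast path depends on the driver advertising this extension, here is
a minimal sketch of the availability check. It is an illustration only, not
the code from the webrev: has_extension() is a hypothetical helper, and it
assumes the caller already holds the space-separated extension list (for
example, the string returned by glGetString(GL_EXTENSIONS) on a legacy
context). It matches whole tokens so that a longer extension name with the
same prefix is not mistaken for GL_NV_texture_barrier.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the webrev): returns true if `name`
 * appears as a whole space-delimited token in `ext_list`.
 */
bool has_extension(const char *ext_list, const char *name) {
    if (ext_list == NULL || name == NULL || *name == '\0') {
        return false;
    }
    size_t len = strlen(name);
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the list start or right after a space... */
        bool starts = (p == ext_list) || (p[-1] == ' ');
        /* ...and must end at a space or at the end of the list. */
        bool ends = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends) {
            return true;
        }
        p += len;
    }
    return false;
}
```

If the check fails, the renderer would have to stay on the
glCopyTexSubImage2D() path, as on the OSX 10.8 system listed below.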

Besides this, the suggested fix introduces the following changes in the OGL text renderer:

* Separate accelerated caches for LCD and AA glyphs
   Currently we have a single cache which is initialized either for LCD or
   for AA glyphs. If an application mixes these types of font smoothing for
   some reason, we get a significant performance degradation.
   For example, if we use J2DBench in GUI mode, then the Swing GUI initializes
   the accelerated cache for AA, and subsequent rendering of LCD text always
   uses the 'no-cache' code path.

* Increase the dimension of the glyph cache cell from 16x16 to 32x32.
   This change gives a significant performance boost on systems with retina
   displays (because of the average size of rendered glyphs).
   However, on systems where the fast path with destination texture is not
   possible for any reason, this change may cause a performance degradation
   because of more extensive usage of glCopyTexSubImage2D.
   So, we probably may want to get a means to configure the cell dimension
   depending on system capabilities.
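
To make the two changes above concrete, here is a rough sketch under stated
assumptions. All names (GlyphType, GlyphCache, choose_cell_dim, get_cache)
are hypothetical and are not the identifiers used in the webrev. The policy
of picking 32x32 cells only when the texture barrier is available is just
one possible way to "configure the cell dimension depending on system
capabilities", not a decision taken in the fix.

```c
#include <stdbool.h>

/* Hypothetical names throughout; not the identifiers used in the webrev. */
typedef enum { GLYPH_AA = 0, GLYPH_LCD = 1, GLYPH_TYPE_COUNT } GlyphType;

typedef struct {
    bool initialized;
    int  cell_dim;   /* width and height of one cache cell, in pixels */
} GlyphCache;

/* One cache per smoothing type, so AA and LCD text do not evict each other. */
static GlyphCache caches[GLYPH_TYPE_COUNT];

/*
 * Assumed policy: use the larger 32x32 cell only where the destination
 * texture can be read directly; otherwise keep the old 16x16 cell.
 */
int choose_cell_dim(bool has_texture_barrier) {
    return has_texture_barrier ? 32 : 16;
}

/* Lazily initializes and returns the cache for the requested glyph type. */
GlyphCache *get_cache(GlyphType type, bool has_texture_barrier) {
    GlyphCache *c = &caches[type];
    if (!c->initialized) {
        c->cell_dim = choose_cell_dim(has_texture_barrier);
        c->initialized = true;
    }
    return c;
}
```

With this layout, the J2DBench case described above would get an AA cache
for the Swing GUI and a separate LCD cache for the benchmark text, instead
of falling back to the 'no-cache' path.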

Performance results overview:
* MBP with Intel Iris (retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt

* iMac with AMD HD6750M (no retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt

* MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt

Please take a look.

Thanks,
Andrew



