On 6/24/15 8:18 AM, Andrew Brygin wrote:
Hello Phil,
please see my comments inline.
23/06/15 21:29, Phil Race wrote:
Hi Andrew,
Overall the fix looks good. A few questions.
1. Regarding translucent surfaces, do you know when Swing
has a translucent backbuffer and when it does not ?
It has been noted that we now have LCD text in some cases
in SS2 but apparently still not in NB ..
I did not notice LCD text in the SwingSet2 demo without an explicit
switch to opaque backbuffers in the RepaintManager.
My expectation is that standard Swing components should not
use LCD text on macosx at the moment. However, if there are
(custom?) components which create opaque buffers separately
from the RepaintManager, then they may be able to use LCD text.
See https://bugs.openjdk.java.net/browse/JDK-8098853 and/or ask Yuri.
2. Where are we likely to find (or not find) support for this
extension ?
Based on your results ironically, it seems that the Nvidia card is the
one case that did not support the extension. Is that because it was
an older version of OS X than the others ?
Unfortunately, the extension is relatively new, and we need new drivers
to use it. The MBP with the Nvidia GF9600M is running OSX 10.8, and
there we cannot use the extension. However, this extension is listed
as supported for the GF9600M in the extension database, so we can
probably expect that an upgrade of OSX to 10.9 or 10.10 will make it
available.
OK.
The limited availability of the extension is the main reason to look
for alternative solutions. The best option is to identify and eliminate
the cause of the glCopyTexSubImage() slowness. There are some reasons
to think that this is possible:
* a separate simple OGL demo shows almost equal performance for
glCopyTexSubImage() and re-using the FBO texture.
* on windows, the performance of glCopyTexSubImage is much better
in the case of FBO.
However, at the moment I do not see what we are doing wrong/non-optimal
with the standard approach.
By standard approach you mean what exactly ?
3. The performance 'lost' case.
> However, on systems where the fast path with destination texture is
> not possible for any reason, this change may cause a performance
> degradation because of more extensive usage of glCopyTexSubImage2D.
> So, we probably may want to get a means to configure the cell
> dimension
Is this a reference to losing performance on non-retina displays
where we would be better off with the smaller cache cell size ?
Was the answer to this 'yes' ?
I suppose the importance of this depends in part on the answer to
question #2
Probably the most important case here is old OSX (< 10.9) systems.
All the 8 updates support 10.8.3+ so I suppose that is the main case but
I expect that to 'go away' for JDK 9 or perhaps earlier once Apple stop
supporting it.
Also, Windows systems with OGL drivers created before 2011 - 2012.
However, OGL is an optional pipeline on Windows, so it could be less
critical.
I think older windows drivers are something we would encourage everyone
to get off ASAP anyway ..
4. Have you tried this on Linux .. or even a Windows OGL driver ?
I have uploaded results for a linux system with NVS5400:
http://cr.openjdk.java.net/~bae/8087201/9/linux-x64-bench.txt
Here we have the NV_texture_barrier extension, and see up to a 10x-20x
speedup in some tests.
On windows, I have got mixed results:
* Intel HD4000: no extension due to old drivers, so the same results
as without the fix.
* NVS5400: with the fix we have got similar scores in the tests as on
macosx, but the standard way with glCopyTexSubImage gives better
results anyway. I.e. with the fix, we achieve only 55% - 60% of the
original performance.
That is very interesting. Is that related to your earlier observation :
> * on windows, the performance of glCopyTexSubImage is much better in
> the case of FBO.
Any idea why ? Given that it is a unified driver it sounds like we may
want to disable this code path on Windows, at least for NV, but I guess
we may also want to validate that on some other cards - from Nvidia - to
see if it is a driver or h/w limitation.
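To make the suggestion above concrete, here is a rough sketch of what such a gate might look like. Everything in it is hypothetical: the class, the Os enum, and the assumption that the vendor string (as reported by the driver, e.g. via GL_VENDOR) contains "NVIDIA". It is not taken from the webrev, and the Windows/NV exclusion is only the idea being floated here, pending validation on other cards.

```java
import java.util.Locale;

// Hypothetical policy sketch; none of these names exist in the OGL pipeline.
final class LcdPathPolicy {
    enum Os { WINDOWS, MACOSX, LINUX }

    /**
     * Decides whether to read the destination texture directly (guarded by
     * the texture barrier) instead of copying via glCopyTexSubImage2D.
     */
    static boolean useDestinationTexture(Os os, String vendor, boolean hasTextureBarrier) {
        if (!hasTextureBarrier) {
            return false; // no barrier, so read-after-write cannot be avoided
        }
        // Assumption: Nvidia drivers report a vendor string containing "NVIDIA".
        boolean nvidia = vendor != null
                && vendor.toUpperCase(Locale.ROOT).contains("NVIDIA");
        // Assumed exclusion: the copy path measured faster on Windows with NV.
        return !(os == Os.WINDOWS && nvidia);
    }

    public static void main(String[] args) {
        System.out.println(useDestinationTexture(Os.WINDOWS, "NVIDIA Corporation", true));
        System.out.println(useDestinationTexture(Os.LINUX, "NVIDIA Corporation", true));
    }
}
```

Keeping the decision in one predicate would make it easy to relax or tighten once results from more cards are in.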
-phil.
Thanks,
Andrew
-phil.
On 06/18/2015 07:40 AM, Andrew Brygin wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8087201
Webrev: http://cr.openjdk.java.net/~bae/8087201/9/webrev.00/
Thanks,
Andrew
18/06/15 17:39, Andrew Brygin wrote:
Hello,
could you please review a fix for 8087201?
The root of the problem is that we have to supply the content of the
destination surface to the LCD shader to compose the LCD glyph correctly.
In order to do this, we have to copy a sub-image from the destination
buffer to an intermediate texture using the glCopyTexSubImage2D()
routine.
Unfortunately, this routine is quite slow on the majority of systems,
and it dramatically reduces the overall speed of LCD text rendering.
The main idea of the fix is to use the texture associated with the
destination surface, if it exists. In this case we have a chance to
completely avoid the data copying. However, we have to avoid
read-after-write hazards in order to get correct results in this case.
Fortunately, this can be achieved by using the
GL_NV_texture_barrier extension:
https://www.opengl.org/registry/specs/NV/texture_barrier.txt
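Since the fast path depends on the extension being advertised by the driver, a small sketch of the availability check may help. It assumes the classic space-separated extension list (as returned by glGetString(GL_EXTENSIONS)); the class name, method name, and sample string are made up for illustration and are not the code in the webrev.

```java
import java.util.Arrays;

// Illustrative helper only; not part of the actual OGL pipeline sources.
final class TextureBarrierCheck {

    /**
     * Returns true if 'name' appears as a whole token in a whitespace-separated
     * extension list. Matching whole tokens avoids false positives when one
     * extension name is a prefix of another.
     */
    static boolean hasExtension(String extensions, String name) {
        if (extensions == null || name == null || name.isEmpty()) {
            return false;
        }
        return Arrays.asList(extensions.trim().split("\\s+")).contains(name);
    }

    public static void main(String[] args) {
        // Hypothetical extension string, for demonstration only.
        String exts = "GL_ARB_multitexture GL_NV_texture_barrier GL_EXT_framebuffer_object";
        System.out.println(hasExtension(exts, "GL_NV_texture_barrier"));
        System.out.println(hasExtension(exts, "GL_NV_texture"));
    }
}
```

When the check fails, the renderer would have to fall back to the glCopyTexSubImage2D() copy described above.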
Besides this, the suggested fix introduces the following changes in the
OGL text renderer:
* Separate accelerated caches for LCD and AA glyphs
Currently we have a single cache which is initialized either for LCD
or for AA glyphs.
If an application mixes these types of font smoothing for some
reason, we get a significant performance degradation.
For example, if we use J2DBench in GUI mode, then the Swing GUI
initializes the accelerated cache for AA, and subsequent rendering
of LCD text always uses the 'no-cache' code path.
* Increase dimension of the glyph cache cell from 16x16 to 32x32.
This change gives a significant performance boost on systems with
retina displays (because of the average size of rendered glyphs).
However, on systems where the fast path with destination texture
is not possible for any reason, this change may cause a performance
degradation because of more extensive usage of glCopyTexSubImage2D.
So, we probably may want to get a means to configure the cell
dimension depending on system capabilities.
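The two changes above can be pictured with a minimal sketch: one lazily created cache per smoothing mode, with the cell dimension picked from a capability flag. The class names and the "32 if the fast path is available, else 16" policy are assumptions for illustration; the real renderer is native code and the configuration mechanism is still an open question.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of the idea; not the native OGL text renderer.
final class GlyphCacheSketch {
    enum Mode { AA, LCD }

    /** Stand-in for an accelerated glyph cache; it only records its cell size. */
    static final class Cache {
        final Mode mode;
        final int cellDim;
        Cache(Mode mode, int cellDim) {
            this.mode = mode;
            this.cellDim = cellDim;
        }
    }

    private final Map<Mode, Cache> caches = new EnumMap<>(Mode.class);
    private final boolean fastPathAvailable;

    GlyphCacheSketch(boolean fastPathAvailable) {
        this.fastPathAvailable = fastPathAvailable;
    }

    /** Assumed policy: larger cells only when the destination texture can be reused. */
    int cellDimension() {
        return fastPathAvailable ? 32 : 16;
    }

    /** One cache per mode, so mixing AA and LCD text never forces the no-cache path. */
    Cache cacheFor(Mode mode) {
        return caches.computeIfAbsent(mode, m -> new Cache(m, cellDimension()));
    }

    public static void main(String[] args) {
        GlyphCacheSketch sketch = new GlyphCacheSketch(true);
        System.out.println(sketch.cacheFor(Mode.AA) != sketch.cacheFor(Mode.LCD));
        System.out.println(sketch.cacheFor(Mode.LCD).cellDim);
    }
}
```

With separate caches, the J2DBench scenario (Swing GUI initializing the AA cache first) would no longer push LCD text onto the uncached path.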
Performance results overview:
* MBP with Intel Iris (retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt
* iMac with AMD HD6750M (no retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt
* MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt
Please take a look.
Thanks,
Andrew