Re: [OpenJDK 2D-Dev] [9] request for review: 8087201: OGL: rendering of lcd text is slow

Sergey Bylokhov Thu, 09 Jul 2015 11:26:23 -0700

SwingMark also shows a twofold improvement on retina.


On 25.06.15 13:40, Andrew Brygin wrote:

Hello Sergey,

24/06/15 22:45, Sergey Bylokhov wrote:
Hi, Andrew.
Thanks for this report. As far as I understand, on retina the LCD text is drawn faster after the fix than AA text was before the fix, which means that we will not get any new regressions. So the fix looks fine.
But on non-retina our results are still not so good, LCD text is slow: 485 (was 16.4) vs 16508, and the window for optimizations still exists.
I agree that there is room for further optimization.

However, I do not think it is possible to achieve the same
level of performance for LCD rendering as for AA rendering,
because LCD text rendering is inherently more complex.

Thanks,
Andrew
global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=LCD_HRGB:
9-8087201-v00: 485.2052560 (var=0.57%) (2955.82%)
**|*********************************************************
**|*********************************************************
**|*********************************************************
global.dest=VolatileImg(Opaque),text.opts.font.fsize=6.0,text.opts.graphics.textaa=On:
9-8087201-v00: 16508.76580 (var=0.66%) (99.69%)
************************************************************|
************************************************************|
*********************************************************** |


On 19.06.15 15:54, Andrew Brygin wrote:
Hello Sergey,
only one part of the fix affects the performance of the AA case: the cache cell size. On retina, 13pt and 20pt glyphs do not fit the 16x16 cache cells,
 so these benchmarks show better performance:
 13pt: 40-80 times faster
 20pt: 7-13 times faster

 6pt shows the same results, because it fits the cache in any case.
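
For illustration, the kind of fit test that decides whether a glyph can use the accelerated cache might look like the sketch below. The GlyphBox type, the glyphFitsCell() helper and the example pixel sizes are hypothetical and are not taken from the fix; only the 16x16 and 32x32 cell sizes come from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical glyph image size in device pixels (not a JDK type). */
typedef struct {
    int width;
    int height;
} GlyphBox;

/* A glyph can be cached only if its whole image fits into one cell. */
static bool glyphFitsCell(GlyphBox glyph, int cellDim)
{
    assert(cellDim > 0);
    return glyph.width <= cellDim && glyph.height <= cellDim;
}
```

With an assumed 20x24 pixel glyph image, glyphFitsCell() is false for a 16 pixel cell and true for a 32 pixel cell, so only the larger cell keeps that glyph on the cached path.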

 Full benchmark results:
http://cr.openjdk.java.net/~bae/8087201/9/ogl-lcd-aa.res

 Regarding the suggestion to create a separate method for the fast
path possibility check: please note that we do this check and calculate the dstTextureID only once per whole glyph vector, but use the dstTextureID as an indicator for every glyph. So such a change will affect performance for
 sure.
Probably we can hide the 'dstTextureID == 0' condition behind some sort of macro, like canReadDestinationDirectly() or something like this.
 Are you OK with this?
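
A minimal sketch of what such a macro could look like is given below. It assumes that a zero dstTextureID means the destination cannot be read directly. The GlyphVectorState type, the drawGlyphs() function and the two counters are hypothetical placeholders, not the renderer's code; only the macro name comes from the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-glyph-vector state (not the JDK code). */
typedef struct {
    unsigned int dstTextureID; /* assumed: 0 means no texture to read from */
    int fastGlyphs;            /* glyphs composed straight from the destination */
    int slowGlyphs;            /* glyphs that needed an intermediate copy */
} GlyphVectorState;

/* Names the intent of the raw 'dstTextureID != 0' test. */
#define canReadDestinationDirectly(dstTextureID) ((dstTextureID) != 0)

/* dstTextureID is set once by the caller; each glyph only tests it. */
static void drawGlyphs(GlyphVectorState *state, int glyphCount)
{
    assert(state != NULL);
    for (int i = 0; i < glyphCount; i++) {
        if (canReadDestinationDirectly(state->dstTextureID)) {
            state->fastGlyphs++;   /* placeholder for the fast path */
        } else {
            state->slowGlyphs++;   /* placeholder for the copy-first path */
        }
    }
}
```

Because the macro expands to a single comparison, the per-glyph cost should stay the same as with the raw condition.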

Thanks,
Andrew

19/06/15 13:57, Sergey Bylokhov wrote:
Hi, Andrew.
Can you additionally provide the bench data about AA (before/after the fix) vs the new LCD?
It would probably be more obvious if the code in OGLTextRenderer
    if (OGLC_IS_CAP_PRESENT(oglc, CAPS_EXT_TEXBARRIER) &&
        dstOps->textureTarget == GL_TEXTURE_2D)
were moved to a separate method, and the check for the possibility of a fast blit were made explicit instead of:
if (dstTextureID == 0) {
Also, your review request contains useful information like fast/slow/read-after-write etc. I think this information would be useful as comments in the code.
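
As a rough sketch of the suggestion above, the check could be pulled into a named function, with the fast/slow explanation carried as a comment. The SKETCH_* constants, the DestSurface type and both function names are hypothetical stand-ins so that the example compiles on its own; the real code uses OGLC_IS_CAP_PRESENT, CAPS_EXT_TEXBARRIER and GL_TEXTURE_2D.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the JDK and OpenGL definitions. */
enum { SKETCH_CAP_TEXBARRIER = 1 << 0 };
enum { SKETCH_TEXTURE_2D = 1, SKETCH_TEXTURE_RECTANGLE = 2 };

typedef struct {
    int textureTarget;
    unsigned int textureID;
} DestSurface;

/*
 * Fast path: the LCD shader reads the destination texture directly, which
 * needs the texture barrier extension to avoid read-after-write.
 * Slow path: the destination is first copied to an intermediate texture.
 */
static bool canUseFastPath(int caps, const DestSurface *dst)
{
    assert(dst != NULL);
    return (caps & SKETCH_CAP_TEXBARRIER) != 0 &&
           dst->textureTarget == SKETCH_TEXTURE_2D;
}

/* Returns the texture to read from, or 0 when the slow path must be used. */
static unsigned int chooseDstTextureID(int caps, const DestSurface *dst)
{
    return canUseFastPath(caps, dst) ? dst->textureID : 0;
}
```

The check runs once and its result is stored, so later code can still test the returned ID instead of repeating the capability lookup.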
On 18.06.15 17:39, Andrew Brygin wrote:
Hello,

 could you please review a fix for 8087201?

 The root of the problem is that we have to supply the content of the
destination surface to the LCD shader to compose the LCD glyph correctly.
 In order to do this, we have to copy a sub-image from the destination
buffer to an intermediate texture using the glCopyTexSubImage2D() routine. Unfortunately, this routine is quite slow on the majority of systems, and it
 dramatically reduces the overall speed of LCD text rendering.
The main idea of the fix is to use the texture associated with the destination surface, if it exists. In this case we have a chance to abandon the data copying completely. However, we have to avoid read-after-write in order to get correct results in this case. Fortunately, this can be achieved by using the
 GL_NV_texture_barrier extension:

https://www.opengl.org/registry/specs/NV/texture_barrier.txt
Besides this, the suggested fix introduces the following changes in the OGL text renderer:
* Separate accelerated caches for LCD and AA glyphs
We have a single cache which is initialized either for LCD or for AA glyphs. If an application mixes these types of font smoothing for some reason, we
   get a significant performance degradation.
For example, if we use J2DBench in GUI mode, then the Swing GUI initializes the accelerated cache for AA, and subsequent rendering of LCD text always
   uses the 'no-cache' code path.

* Increase the dimension of the glyph cache cell from 16x16 to 32x32.
This change gives a significant performance boost on systems with retina
  (because of the average size of rendered glyphs).
However, on systems where the fast path with the destination texture is not possible for any reason, this change may cause a performance degradation
   because of more extensive usage of glCopyTexSubImage2D.
So we may want a means to configure the cell dimension
  depending on system capabilities.
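
The two changes listed above can be sketched roughly as follows. All names here (SmoothingMode, GlyphCache, pickCellDim, getCache) are hypothetical and do not come from the fix. The rule of using 32 pixel cells only when the fast path is available is just one possible policy, assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the renderer's real code. */
typedef enum { MODE_AA = 0, MODE_LCD = 1, MODE_COUNT = 2 } SmoothingMode;

typedef struct {
    bool initialized;
    int cellDim;      /* cell width and height in pixels */
} GlyphCache;

/* One cache per smoothing mode, so text in one mode never forces the
 * other mode onto the 'no-cache' code path. */
static GlyphCache caches[MODE_COUNT];

/* Assumed policy: larger cells only where the fast path avoids copying. */
static int pickCellDim(bool fastPathAvailable)
{
    return fastPathAvailable ? 32 : 16;
}

/* Returns the cache for the given mode, initializing it on first use. */
static GlyphCache *getCache(SmoothingMode mode, bool fastPathAvailable)
{
    assert(mode == MODE_AA || mode == MODE_LCD);
    GlyphCache *cache = &caches[mode];
    if (!cache->initialized) {
        cache->cellDim = pickCellDim(fastPathAvailable);
        cache->initialized = true;
    }
    return cache;
}
```

In this sketch the cell dimension is fixed when a cache is first created, so later calls for the same mode return the same cache unchanged.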

Performance results overview:
* MBP with Intel Iris (retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-intel-iris.txt

* iMac with AMD HD6750M (no retina, texture barrier is available):
http://cr.openjdk.java.net/~bae/8087201/9/imac-amd-hd6750m.txt

* MBP with OSX10.8, NV GF9600M (no retina, no texture barrier):
http://cr.openjdk.java.net/~bae/8087201/9/mbp-10.8-NVGF9600M.txt

Please take a look.

Thanks,
Andrew
--
Best regards, Sergey.



--
Best regards, Sergey.
